Container Images for Large Language Models

openEuler provides container images to support large language models (LLMs) such as Baichuan, ChatGLM, and iFLYTEK Spark.

The images come with dependencies pre-installed for both CPU and GPU environments, so they can be used out of the box.

Pulling the Image (CPU Version)

docker pull openeuler/llm-server:1.0.0-oe2203sp3

Pulling the Image (GPU Version)

docker pull icewangds/llm-server:1.0.0

Downloading the Model

Download the model and convert it to GGUF format.

# Install Hugging Face Hub.
pip install huggingface-hub

# Download the model you want to deploy.
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download baichuan-inc/Baichuan2-13B-Chat --local-dir /root/models/Baichuan2-13B-Chat --local-dir-use-symlinks False

# Convert the model to GGUF format.
cd /root/models/
git clone https://github.com/ggerganov/llama.cpp.git
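# Install the Python dependencies of the conversion script.
pip install -r llama.cpp/requirements.txt
# In newer llama.cpp checkouts, the script is named convert_hf_to_gguf.py.
python llama.cpp/convert-hf-to-gguf.py ./Baichuan2-13B-Chat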
# Path to the generated GGUF model: /root/models/Baichuan2-13B-Chat/ggml-model-f16.gguf
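Before mounting the converted model into the container, it can help to confirm that the conversion really produced a GGUF file. The following is a minimal sketch, not part of the openEuler images: it relies only on the fact that GGUF files begin with the four ASCII bytes "GGUF". The helper name looks_like_gguf is hypothetical, and the path is the one used in the steps above.

```python
from pathlib import Path

# GGUF files start with these four magic bytes.
GGUF_MAGIC = b"GGUF"


def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes.

    Hypothetical helper for illustration; it checks the header only and
    does not validate the rest of the file.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    # Path taken from the conversion step; adjust it if your output differs.
    model = "/root/models/Baichuan2-13B-Chat/ggml-model-f16.gguf"
    status = "GGUF header found" if looks_like_gguf(model) else "missing or not a GGUF file"
    print(model, "->", status)
```

A failed check usually means the conversion did not finish or wrote its output under a different file name.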

Launch

Docker v25.0.0 or above is required.

To use the GPU image, first install nvidia-container-toolkit on the host. For installation instructions, see "Installing the NVIDIA Container Toolkit" in the official NVIDIA documentation.

To start the service with Docker Compose, create a docker-compose.yaml file with the following content:

version: '3'
services:
  model:
    image: <image>:<tag>   # Image name and tag
    restart: on-failure:5
    ports:
      - 8001:8000    # Host port 8001 mapped to container port 8000. Change "8001" to listen on a different host port.
    volumes:
      - /root/models:/models  # LLM mount directory
    environment:
      - MODEL=/models/Baichuan2-13B-Chat/ggml-model-f16.gguf  # Model file path inside the container
      - MODEL_NAME=baichuan13b  # Custom model name
      - KEY=sk-12345678  # Custom API Key
      - CONTEXT=8192  # Context size
      - THREADS=8    # Number of CPU threads, required only for CPU deployment
    deploy: # GPU resources, required only for GPU deployment
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

docker-compose -f docker-compose.yaml up

Alternatively, start the service with a docker run command:

# For CPU deployment
docker run -d --restart on-failure:5 \
     -p 8001:8000 \
     -v /root/models:/models \
     -e MODEL=/models/Baichuan2-13B-Chat/ggml-model-f16.gguf \
     -e MODEL_NAME=baichuan13b \
     -e KEY=sk-12345678 \
     openeuler/llm-server:1.0.0-oe2203sp3

# For GPU deployment
docker run -d --gpus all --restart on-failure:5 \
     -p 8001:8000 \
     -v /root/models:/models \
     -e MODEL=/models/Baichuan2-13B-Chat/ggml-model-f16.gguf \
     -e MODEL_NAME=baichuan13b \
     -e KEY=sk-12345678 \
     icewangds/llm-server:1.0.0
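The CPU and GPU commands differ only in the --gpus all flag and the image name. If you script deployments, a small helper can make that difference explicit. The sketch below is illustrative only: build_docker_run and its parameters are hypothetical names, and the images, port, and environment variables are the ones shown above.

```python
import shlex

# Image names taken from the pull commands in this document.
CPU_IMAGE = "openeuler/llm-server:1.0.0-oe2203sp3"
GPU_IMAGE = "icewangds/llm-server:1.0.0"


def build_docker_run(model_path, model_name, api_key,
                     host_port=8001, models_dir="/root/models", gpu=False):
    """Return the docker run argument list for a CPU or GPU deployment.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    args = ["docker", "run", "-d"]
    if gpu:
        # GPU access requires nvidia-container-toolkit on the host.
        args += ["--gpus", "all"]
    args += [
        "--restart", "on-failure:5",
        "-p", f"{host_port}:8000",      # host port : container port
        "-v", f"{models_dir}:/models",  # model directory mount
        "-e", f"MODEL={model_path}",    # model path inside the container
        "-e", f"MODEL_NAME={model_name}",
        "-e", f"KEY={api_key}",
        GPU_IMAGE if gpu else CPU_IMAGE,
    ]
    return args


cmd = build_docker_run(
    "/models/Baichuan2-13B-Chat/ggml-model-f16.gguf",
    "baichuan13b",
    "sk-12345678",
    gpu=True,
)
# Print the command as a single shell-quoted line.
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues if the API key or paths contain special characters.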

Testing

Call the LLM interface to test the deployment. A successful response indicates that the LLM service has been deployed correctly.

curl -X POST http://127.0.0.1:8001/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer sk-12345678" \
     -d '{
           "model": "baichuan13b",
           "messages": [
             {"role": "system", "content": "You are an openEuler community assistant. Please answer the following question."},
             {"role": "user", "content": "Who are you?"}
           ],
           "stream": false,
           "max_tokens": 1024
         }'
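The same request can be sent from Python using only the standard library. This is a minimal sketch rather than an official client: the function names are hypothetical, and extract_reply assumes the server returns an OpenAI-style body with the answer under choices[0].message.content, which this document does not confirm.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message,
                       system_message=None, max_tokens=1024):
    """Build the POST request that mirrors the curl command above."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text, ASSUMING an OpenAI-style response layout."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url, api_key, model, user_message, timeout=120):
    """Send one question to the service and return the reply text."""
    request = build_chat_request(base_url, api_key, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the service running, chat("http://127.0.0.1:8001", "sk-12345678", "baichuan13b", "Who are you?") sends the test question. An HTTP error such as 401 raises urllib.error.HTTPError, which typically points to a KEY mismatch between the client and the container.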
