Local Deployment Guide for ChatGLM3-6B Bilingual Language Model
ChatGLM3-6B is a powerful open-source bilingual (Chinese-English) dialogue model based on the General Language Model (GLM) architecture. Developed by Zhipu AI and Tsinghua University, it features 6.2 billion parameters and offers lower deployment requirements compared to larger models. Running this model locally ensures data privacy and full control over the inference environment.
Environment Configuration
Python Environment Setup
ChatGLM3 requires a recent Python release; this guide uses Python 3.10, which the project recommends. A package manager like Anaconda or Miniconda is recommended to manage environments and avoid conflicts with system-wide libraries.
# Download the Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Execute the installation
bash Miniconda3-latest-Linux-x86_64.sh
# Initialize the shell environment
source ~/.bashrc
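To confirm the installation succeeded and conda is on your PATH:
# Should print a version string, e.g. 'conda 24.x.x'
conda --version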
Installing Git LFS
Because the pre-trained model weights are large (exceeding 10GB), Git Large File Storage (LFS) is required to clone the repository successfully.
# For RHEL/CentOS systems
sudo yum install git git-lfs -y
# Initialize Git LFS
git lfs install
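The yum command above targets RHEL/CentOS; on Debian or Ubuntu systems the equivalent is:
# For Debian/Ubuntu systems
sudo apt-get install git git-lfs -y
git lfs install
A successful initialization prints "Git LFS initialized."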
Repository and Model Installation
Clone the Codebase
First, clone the official implementation from GitHub and set up a dedicated virtual environment.
git clone https://github.com/THUDM/ChatGLM3.git
cd ChatGLM3
# Create a Python 3.10 environment named 'glm-env'
conda create -n glm-env python=3.10 -y
conda activate glm-env
# Install required dependencies
pip install -r requirements.txt
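If you plan to run on a GPU, it is worth confirming that the installed PyTorch build can see your CUDA device before downloading the weights:
# Prints True if a CUDA-capable GPU is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available())"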
Retrieve Pre-trained Weights
You can download the weights from Hugging Face or use the ModelScope mirror for faster speeds in specific regions.
# Option A: Hugging Face
git clone https://huggingface.co/THUDM/chatglm3-6b
# Option B: ModelScope (Alternative)
git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
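Because the weight shards are fetched through Git LFS, verify that they downloaded fully rather than as small pointer stubs (the exact shard filenames may vary between mirrors):
cd chatglm3-6b
# Weight shards should each be several GB; tiny ~130-byte files are unresolved LFS pointers
ls -lh
# Fetch the real LFS objects if anything looks truncated
git lfs pull
cd ..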
Model Initialization and Usage
Before running the model, locate the script you intend to use (cli_demo.py, web_demo.py, or openai_api.py). In each one, you must update the model path and device configuration. The standard loading logic looks like this:
from transformers import AutoTokenizer, AutoModel
# Update 'local_path' to your downloaded weights folder
local_path = "./chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
# Use .cuda() for GPU or .float() for CPU inference
chat_model = AutoModel.from_pretrained(local_path, trust_remote_code=True).half().cuda()
chat_model = chat_model.eval()
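Once the weights are loaded, a quick sanity check is the chat method exposed by the model's remote code, reusing the chat_model and tokenizer objects from above:
# Single-turn test; 'history' carries multi-turn conversation state
response, history = chat_model.chat(tokenizer, "Hello! Please introduce yourself.", history=[])
print(response)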
Console-Based Interaction (CLI)
To interact with the model via a terminal, modify the cli_demo.py script to point to your local model path and run:
python cli_demo.py
Users can type queries directly at the prompt. Entering clear resets the conversation history, and entering stop exits the session.
Web Interface
ChatGLM3 includes Gradio- and Streamlit-based web interfaces for a more user-friendly experience. Update the path in web_demo.py and execute:
python web_demo.py
The script will provide a local URL (e.g., http://127.0.0.1:8501) that can be opened in any browser.
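By default the demo binds to the local machine only. If you need to reach it from elsewhere on your network, the Gradio-based script can be told to listen on all interfaces; a minimal sketch, assuming the script builds a Gradio object named demo (check web_demo.py for the actual variable name and port):
# Bind to all interfaces instead of 127.0.0.1
demo.launch(server_name="0.0.0.0", server_port=7860)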
Deploying as an OpenAI-Compatible API
For integration with third-party tools like ChatGPT-Next-Web, use the provided OpenAI format API server. Modify the model path in openai_api.py and start the service:
python openai_api.py
Once the server is running (defaulting to port 8000), you can configure your frontend applications to point to your server IP. In ChatGPT-Next-Web, set the API endpoint to http://<your-ip>:8000 and select chatglm3 as the model name.
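You can also exercise the endpoint directly. The sketch below assumes the server implements the standard OpenAI chat-completions route, which is what the demo script mimics:
import requests

# Assumes openai_api.py is running locally on its default port 8000
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "chatglm3",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])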
Hardware Acceleration Notes
- GPU Deployment: Requires NVIDIA drivers and CUDA. Ensure you use .cuda() in the loading script. The 6B model typically requires ~13GB of VRAM in FP16 mode.
- CPU Deployment: If VRAM is insufficient, use .float() or .quantize(bits=4).float(). Note that CPU inference is significantly slower. Loading variants for both modes are sketched below.
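For reference, here is how the loading line changes with the target hardware. This is a sketch based on the remote-code API the model ships with (quantize() is provided by the ChatGLM model code); the memory figures are approximate:
from transformers import AutoModel

local_path = "./chatglm3-6b"
# Pick exactly one of the loading styles below.
# FP16 on GPU (~13 GB VRAM):
model = AutoModel.from_pretrained(local_path, trust_remote_code=True).half().cuda()
# INT4-quantized on GPU (on the order of 6 GB VRAM):
# model = AutoModel.from_pretrained(local_path, trust_remote_code=True).quantize(4).cuda()
# FP32 on CPU (slow; needs substantial system RAM):
# model = AutoModel.from_pretrained(local_path, trust_remote_code=True).float()
model = model.eval()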