Deploying and Running InternLM Large Language Models: A Practical Guide
Deploying InternLM2-Chat-1.8B for Interactive Dialogue
Environment Setup
Access the InternStudio development platform and create a new development machine. Select the Cuda11.7-conda image and allocate 10% of an A100 GPU. After the machine initializes, open the terminal and execute the environment configuration command:
studio-conda -o internlm-base -t demo
# Alternative manual setup:
# conda create -n demo python==3.10 -y
# conda activate demo
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
Activate the newly created environment and install the necessary Python packages:
conda activate demo
pip install huggingface-hub==0.17.3 transformers==4.34 psutil==5.9.8 accelerate==0.24.1 streamlit==1.32.2 matplotlib==3.8.3 modelscope==1.9.5 sentencepiece==0.1.99
Model Download
Create the working directory, along with empty files for the CLI demo and the download script:
mkdir -p /root/demo
touch /root/demo/cli_demo.py
touch /root/demo/download_mini.py
cd /root/demo
Populate the download script with the following code:
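import os
from modelscope.hub.snapshot_download import snapshot_download
# Create the target directory; exist_ok avoids an error if it already exists.
model_storage_path = "/root/models"
os.makedirs(model_storage_path, exist_ok=True)
# Download the pinned model revision into the target directory.
snapshot_download(
    "Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    cache_dir=model_storage_path,
    revision='v1.1.0'
)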
Execute the script to retrieve the model weights:
python /root/demo/download_mini.py
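Before moving on, it may be worth checking that the download produced a usable model directory. The sketch below assumes the weights land under /root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b (the path the CLI demo loads from) and that a config.json is present there; the helper name missing_files and the default file list are illustrative and can be extended.

```python
from pathlib import Path


def missing_files(model_dir, expected=("config.json",)):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    model_dir = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"
    absent = missing_files(model_dir)
    if absent:
        print("Missing from", model_dir, "->", absent)
    else:
        print("Model directory looks complete for the files checked.")
```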
Running the CLI Demo
Create the inference script at /root/demo/cli_demo.py:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
MODEL_PATH = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map='cuda:0'
)
language_model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='cuda:0'
)
language_model = language_model.eval()
SYSTEM_PROMPT = """You are an AI assistant whose name is InternLM.
- InternLM is a conversational language model developed by Shanghai AI Laboratory.
- InternLM can understand and communicate fluently in English and Chinese.
"""
conversation_history = [(SYSTEM_PROMPT, '')]
print("============= InternLM Chatbot Ready. Type 'quit' to exit. =============")
while True:
    user_input = input("\nUser >>> ").strip()
    if user_input.lower() == "quit":
        break
    # Each yielded response_text is sliced from printed_length, so only the new part is printed.
    printed_length = 0
    for response_text, _ in language_model.stream_chat(tokenizer, user_input, conversation_history):
        if response_text:
            print(response_text[printed_length:], flush=True, end="")
            printed_length = len(response_text)
Run the interactive demo:
conda activate demo
python /root/demo/cli_demo.py
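The printing loop in cli_demo.py slices each response by the number of characters already printed, which suggests that every yielded value is the full response so far rather than a single token. The sketch below isolates that pattern so it can be tried without a GPU; fake_stream is a hypothetical stand-in for language_model.stream_chat, and iter_deltas is an illustrative helper name.

```python
def iter_deltas(cumulative_responses):
    """Yield only the newly added suffix of each cumulative response."""
    printed_length = 0
    for response_text in cumulative_responses:
        if response_text:
            yield response_text[printed_length:]
            printed_length = len(response_text)


def fake_stream():
    """Hypothetical stand-in for stream_chat: yields (response_so_far, history)."""
    for text in ["Hel", "Hello", "Hello, world"]:
        yield text, []


chunks = list(iter_deltas(text for text, _ in fake_stream()))
print("".join(chunks))
```

Printing the joined chunks reproduces the final response exactly once, which is the behavior the demo relies on to avoid repeating text on screen.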
Deploying Character-Finetuned Models
Fine-tuned variants of InternLM2-Chat-1.8B, such as character-based chatbots trained on specific dialogue data, can be deployed for roleplay applications. These models leverage full-parameter fine-tuning on curated datasets.
Environment Preparation
conda activate demo
cd /root/
git clone https://gitee.com/InternLM/Tutorial -b camp2
cd /root/Tutorial
Running the Character Demo
Execute the download script and launch the Streamlit application:
python /root/Tutorial/helloworld/bajie_download.py
streamlit run /root/Tutorial/helloworld/bajie_chat.py --server.address 127.0.0.1 --server.port 6006
Establish SSH port forwarding from your local machine:
ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p YOUR_PORT_NUMBER
Access the web interface at http://127.0.0.1:6006 to interact with the model.
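If the page does not load, a quick first check is whether anything is listening on the forwarded port on your local machine. This is a small sketch using only the standard library; the helper name port_is_open is an assumption, and a True result only shows that the port accepts TCP connections, not that Streamlit itself is healthy.

```python
import socket


def port_is_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if port_is_open() else "closed"
    print(f"127.0.0.1:6006 is {state}")
```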
Running Lagent Agent Framework with InternLM2-Chat-7B
Lagent is a lightweight open-source agent framework that enables large language models to function as intelligent agents with tool-calling capabilities.
Key Features
- Streaming output support via the stream_chat interface
- Unified API design supporting OpenAI, Transformers, and LMDeploy backends
- Extensible action system for custom tool integration
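To make the idea of an extensible action system concrete, here is a generic sketch of a tool registry. It is not Lagent's API: ToolRegistry, register, and call are hypothetical names chosen for illustration, and Lagent's own action classes are structured differently.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Hypothetical name-to-function registry; not Lagent's real interface."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str):
        """Decorator that stores a function under the given tool name."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = func
            return func
        return decorator

    def call(self, name: str, **kwargs) -> str:
        """Dispatch to a registered tool, or report that it is unknown."""
        if name not in self._tools:
            return f"unknown tool: {name}"
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("add")
def add(a: float, b: float) -> str:
    # Tools return strings so the result can be fed back to the model as text.
    return str(a + b)


print(registry.call("add", a=2, b=3))
```

In an agent loop, the model would emit a tool name and arguments, the framework would dispatch them roughly like registry.call does here, and the returned string would be appended to the conversation.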
Environment Configuration
This task requires 30% A100 GPU allocation. Clone the Lagent repository:
conda activate demo
cd /root/demo
git clone https://gitee.com/internlm/lagent.git
cd /root/demo/lagent
git checkout 581d9fb8987a5d9b72bb9ebd37a95efd47d479ac
pip install -e .
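One way to see why the 7B model calls for a larger GPU share than the 1.8B model is to estimate the memory taken by the weights alone. The sketch below assumes bfloat16 weights (2 bytes per parameter, as in the CLI demo) and treats the parameter counts implied by the model names as approximate; activations, the KV cache, and framework overhead come on top of this figure.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter counts taken from the model names (approximate).
for name, params in [("internlm2-chat-1_8b", 1.8e9), ("internlm2-chat-7b", 7e9)]:
    print(f"{name}: ~{weight_memory_gib(params):.1f} GiB of bfloat16 weights")
```

Compare the result with the memory your GPU allocation actually provides before launching a demo.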
Model Setup and Execution
Create a symbolic link to the shared model directory:
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b
Modify the model path in /root/demo/lagent/examples/internlm2_agent_web_demo_hf.py (approximately line 71):
value='/root/models/internlm2-chat-7b'
Launch the agent demo:
streamlit run /root/demo/lagent/examples/internlm2_agent_web_demo_hf.py --server.address 127.0.0.1 --server.port 6006
Configure SSH tunneling as previously described and enable the data analysis option to test tool-calling capabilities.
Deploying InternLM-XComposer2 Multimodal Model
InternLM-XComposer2 is a vision-language model built on InternLM2, capable of text-image composition and visual understanding tasks.
Environment Setup
This task requires 50% A100 GPU allocation. Install additional dependencies:
conda activate demo
pip install timm==0.4.12 sentencepiece==0.1.99 markdown2==2.4.10 xlsxwriter==3.1.2 gradio==4.13.0 modelscope==1.9.5
Clone the repository:
cd /root/demo
git clone https://gitee.com/internlm/InternLM-XComposer.git
cd /root/demo/InternLM-XComposer
git checkout f31220eddca2cf6246ee2ddf8e375a40457ff626
Create symbolic links:
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b /root/models/internlm-xcomposer2-7b
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b /root/models/internlm-xcomposer2-vl-7b
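A dangling link (for example, a typo in the shared path) only shows up later as a confusing model-loading error, so it can be useful to verify the links right away. A minimal sketch, with check_link as an illustrative helper name:

```python
import os


def check_link(link_path):
    """Return (is_symlink, target_exists, resolved_path) for link_path."""
    is_link = os.path.islink(link_path)
    resolved = os.path.realpath(link_path)
    return is_link, os.path.exists(resolved), resolved


if __name__ == "__main__":
    for path in (
        "/root/models/internlm-xcomposer2-7b",
        "/root/models/internlm-xcomposer2-vl-7b",
    ):
        is_link, exists, resolved = check_link(path)
        print(f"{path}: symlink={is_link}, target_exists={exists}, resolved={resolved}")
```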
Image-Text Composition Demo
cd /root/demo/InternLM-XComposer
python /root/demo/InternLM-XComposer/examples/gradio_demo_composition.py \
    --code_path /root/models/internlm-xcomposer2-7b \
    --private \
    --num_gpus 1 \
    --port 6006
Visual Question Answering Demo
conda activate demo
cd /root/demo/InternLM-XComposer
python /root/demo/InternLM-XComposer/examples/gradio_demo_chat.py \
    --code_path /root/models/internlm-xcomposer2-vl-7b \
    --private \
    --num_gpus 1 \
    --port 6006
Access the interface to upload images and submit visual queries.
Appendix: Configuration and Utility Commands
Package Manager Mirror Configuration
For pip, pass a mirror with -i for a single install, or set it as the default index for all later installs:
pip install -i https://mirrors.cernet.edu.cn/pypi/web/simple some-package
pip config set global.index-url https://mirrors.cernet.edu.cn/pypi/web/simple
For conda, write the mirror configuration to ~/.condarc (the command below overwrites any existing file):
cat <<'EOF' > ~/.condarc
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
EOF
Model Download Methods
Using Hugging Face Hub:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="internlm/internlm2-7b", filename="config.json")
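hf_hub_download returns the local path of the downloaded file, so the fetched config.json can be inspected directly. The sketch below reads a few fields from such a file; the helper name summarize_config is illustrative, and the default key names are assumptions about what the config contains (missing keys simply come back as None).

```python
import json


def summarize_config(config_path, keys=("model_type", "hidden_size", "num_hidden_layers")):
    """Load a JSON config file and return the selected keys (None if absent)."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    return {key: config.get(key) for key in keys}
```

A typical use would be to pass the path returned by hf_hub_download straight into summarize_config and print the result.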
Using ModelScope:
from modelscope import snapshot_download
model_dir = snapshot_download(
    'Shanghai_AI_Laboratory/internlm2-chat-7b',
    cache_dir='/path/to/models',
    revision='master'
)
Symbolic Link Management
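To remove a symbolic link without touching the directory it points to (here, the internlm2-chat-7b link under /root/models):
unlink /root/models/internlm2-chat-7b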
Terminal Session Management
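When running Gradio or Streamlit applications, terminate the running process (Ctrl+C in its terminal) before starting a new demo; otherwise the earlier model stays loaded and the GPU can run out of memory. If the process does not exit cleanly, close the terminal tab and open a fresh session for subsequent experiments.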
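To check whether an earlier demo is still holding GPU memory, one option is to query nvidia-smi for compute processes. The sketch below assumes nvidia-smi is on the PATH and that the environment reports per-process usage (some containers show "[N/A]" or no processes at all); parse_compute_apps and gpu_processes are illustrative helper names.

```python
import subprocess

# Query flags: one CSV line per compute process, "pid, used_memory" in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' lines into a list of (pid, mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip malformed lines and values such as "[N/A]".
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1])))
    return rows


def gpu_processes():
    """Run nvidia-smi and return (pid, mib) for each reported compute process."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling gpu_processes() on the development machine and finding a leftover demo PID is a sign that the previous process should be stopped before launching the next one.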