Deploying and Running InternLM Large Language Models: A Practical Guide

Tech May 17 2

Deploying InternLM2-Chat-1.8B for Interactive Dialogue

Environment Setup

Access the InternStudio development platform and create a new development machine. Select the Cuda11.7-conda image and allocate 10% of an A100 GPU. After the machine initializes, open the terminal and execute the environment configuration command:

studio-conda -o internlm-base -t demo
# Alternative manual setup:
# conda create -n demo python==3.10 -y
# conda activate demo
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Activate the newly created environment and install the necessary Python packages:

conda activate demo
pip install huggingface-hub==0.17.3 transformers==4.34 psutil==5.9.8 accelerate==0.24.1 streamlit==1.32.2 matplotlib==3.8.3 modelscope==1.9.5 sentencepiece==0.1.99
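
Because the demos depend on these exact pins, it can help to confirm what actually got installed before moving on. The following is a minimal sketch using only the standard library; the helper name find_mismatches and the prefix-matching rule (so that a pin of 4.34 accepts 4.34.0) are illustrative assumptions, not part of the original setup.

```python
from importlib import metadata

# Pins copied from the pip command above.
EXPECTED = {
    "huggingface-hub": "0.17.3",
    "transformers": "4.34",
    "psutil": "5.9.8",
    "accelerate": "0.24.1",
    "streamlit": "1.32.2",
    "matplotlib": "3.8.3",
    "modelscope": "1.9.5",
    "sentencepiece": "0.1.99",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, installed_or_None)} for pins that do not match."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept an exact match, or a more specific version of the same pin
        # (for example "4.34.0" satisfies a pin of "4.34").
        matches = installed is not None and (
            installed == wanted or installed.startswith(wanted + ".")
        )
        if not matches:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated demo environment; an empty report means every pin was honored.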

Model Download

Create the working directory and two empty script files, one for downloading the model and one for the CLI demo:

mkdir -p /root/demo
touch /root/demo/cli_demo.py
touch /root/demo/download_mini.py
cd /root/demo

Populate the download script with the following code:

import os
from modelscope.hub.snapshot_download import snapshot_download

# Directory that will hold the downloaded weights
model_storage_path = "/root/models"
os.makedirs(model_storage_path, exist_ok=True)

# Fetch a pinned revision of the model from ModelScope
snapshot_download(
    "Shanghai_AI_Laboratory/internlm2-chat-1_8b",
    cache_dir=model_storage_path,
    revision='v1.1.0'
)

Execute the script to retrieve the model weights:

python /root/demo/download_mini.py
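
Before loading the model, a quick look at the download directory can catch an interrupted transfer. This is a small sketch, assuming the model lands in /root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b (the path used by the CLI demo) and that a complete download contains a config.json plus at least one .safetensors or .bin weight file. The helper name describe_model_dir is hypothetical.

```python
from pathlib import Path

# Path used by the CLI demo in this guide.
MODEL_DIR = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"


def describe_model_dir(model_dir):
    """Summarize whether a directory looks like a downloaded model."""
    root = Path(model_dir)
    if not root.is_dir():
        return {"exists": False, "has_config": False, "weight_files": []}
    # Assumption: weights are stored as .safetensors or .bin files.
    weight_files = sorted(
        p.name for p in root.iterdir() if p.suffix in {".safetensors", ".bin"}
    )
    return {
        "exists": True,
        "has_config": (root / "config.json").is_file(),
        "weight_files": weight_files,
    }


if __name__ == "__main__":
    print(describe_model_dir(MODEL_DIR))
```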

Running the CLI Demo

Populate /root/demo/cli_demo.py with the inference script:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"

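# The tokenizer runs on the CPU, so it needs no device placement
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True
)

# Load the weights in bfloat16 directly onto the first GPU
language_model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='cuda:0'
)

# Switch to inference mode (disables dropout)
language_model = language_model.eval()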

SYSTEM_PROMPT = """You are an AI assistant whose name is InternLM.
- InternLM is a conversational language model developed by Shanghai AI Laboratory.
- InternLM can understand and communicate fluently in English and Chinese.
"""

conversation_history = [(SYSTEM_PROMPT, '')]

print("============= InternLM Chatbot Ready. Type 'quit' to exit. =============")

while True:
    user_input = input("\nUser >>> ").strip()

    if user_input.lower() == "quit":
        break

    printed_length = 0
    # stream_chat yields the full response so far plus the updated history;
    # print only the new suffix and keep the history for the next turn
    for response_text, conversation_history in language_model.stream_chat(tokenizer, user_input, conversation_history):
        if response_text:
            print(response_text[printed_length:], flush=True, end="")
            printed_length = len(response_text)
    print()

Run the interactive demo:

conda activate demo
python /root/demo/cli_demo.py
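
The printing loop in cli_demo.py relies on each streamed item carrying the whole response generated so far, so only the suffix beyond printed_length is new. The sketch below isolates that idea with a stand-in generator. fake_stream and iter_deltas are hypothetical names used for illustration, and no model is loaded.

```python
def fake_stream(pieces):
    """Stand-in for a streaming chat call: yields (text_so_far, history)."""
    text = ""
    for piece in pieces:
        text += piece
        yield text, []


def iter_deltas(stream):
    """Yield only the newly added suffix of each cumulative response."""
    printed_length = 0
    for response_text, _history in stream:
        if response_text:
            yield response_text[printed_length:]
            printed_length = len(response_text)


chunks = list(iter_deltas(fake_stream(["Hello", ", ", "world"])))
print(chunks)  # ['Hello', ', ', 'world']
```

Joining the chunks reproduces the final response exactly, which is why the demo can print them one after another without repeating text.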

Deploying Character-Finetuned Models

Fine-tuned variants of InternLM2-Chat-1.8B, such as character chatbots trained on a specific character's dialogue data, can be deployed for roleplay applications. These models are produced by full-parameter fine-tuning on curated datasets.

Environment Preparation

conda activate demo
cd /root/
git clone https://gitee.com/InternLM/Tutorial -b camp2
cd /root/Tutorial

Running the Character Demo

Execute the download script and launch the Streamlit application:

python /root/Tutorial/helloworld/bajie_download.py
streamlit run /root/Tutorial/helloworld/bajie_chat.py --server.address 127.0.0.1 --server.port 6006

Establish SSH port forwarding from your local machine, replacing YOUR_PORT_NUMBER with the SSH port of your development machine:

ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p YOUR_PORT_NUMBER

Access the web interface at http://127.0.0.1:6006 to interact with the model.
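
If the page does not load, it is worth checking whether anything is listening on the forwarded port at all. The following sketch, run on the local machine, attempts a TCP connection to 127.0.0.1:6006. The helper name is_port_open is an illustrative assumption.

```python
import socket


def is_port_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if is_port_open() else "closed"
    print(f"127.0.0.1:6006 is {state}")
```

A closed port usually means either the SSH tunnel is not running or the Streamlit process on the development machine has stopped.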

Running Lagent Agent Framework with InternLM2-Chat-7B

Lagent is a lightweight open-source agent framework that enables large language models to function as intelligent agents with tool-calling capabilities.

Key Features

  • Streaming output support via stream_chat interface
  • Unified API design supporting OpenAI, Transformers, and LMDeploy backends
  • Extensible action system for custom tool integration
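
To make the idea of an extensible action system concrete, here is a framework-agnostic sketch of tool registration and dispatch. It does not use Lagent's API; register_action, run_action, and the example add tool are all hypothetical names chosen for illustration.

```python
# Registry mapping tool names to plain Python callables.
ACTIONS = {}


def register_action(name):
    """Decorator that adds a function to the registry under the given name."""
    def decorator(func):
        ACTIONS[name] = func
        return func
    return decorator


@register_action("add")
def add(a, b):
    return a + b


def run_action(call):
    """Dispatch a parsed tool call of the form {'name': ..., 'args': {...}}."""
    name = call["name"]
    if name not in ACTIONS:
        return {"ok": False, "error": f"unknown action: {name}"}
    try:
        return {"ok": True, "result": ACTIONS[name](**call.get("args", {}))}
    except Exception as exc:  # report tool failures back instead of crashing
        return {"ok": False, "error": str(exc)}


print(run_action({"name": "add", "args": {"a": 2, "b": 3}}))
```

In an agent loop, the model's output would be parsed into such a call, the result fed back into the conversation, and new tools added simply by registering more functions.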

Environment Configuration

This task requires a 30% A100 GPU allocation. Clone the Lagent repository, check out the pinned commit, and install it in editable mode:

conda activate demo
cd /root/demo
git clone https://gitee.com/internlm/lagent.git
cd /root/demo/lagent
git checkout 581d9fb8987a5d9b72bb9ebd37a95efd47d479ac
pip install -e .
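
As a rough sanity check on the GPU shares requested in this guide (10% for the 1.8B model, 30% for the 7B model), the weights alone need about two bytes per parameter when held in a 16-bit format such as the torch.bfloat16 used in the CLI demo. The sketch below does that arithmetic. The 80 GB card size and the nominal parameter counts are assumptions, and activations, the KV cache, and framework overhead are not included.

```python
ASSUMED_A100_GB = 80  # assumption: the platform slices an 80 GB A100


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# (label, nominal parameter count, GPU share requested in this guide)
CASES = [
    ("internlm2-chat-1_8b", 1.8e9, 0.10),
    ("internlm2-chat-7b", 7e9, 0.30),
]

for label, params, share in CASES:
    needed = weight_memory_gb(params)
    budget = ASSUMED_A100_GB * share
    print(f"{label}: ~{needed:.1f} GB of weights, {budget:.0f} GB budget")
```

Under these assumptions the weights take roughly 3.6 GB and 14 GB, which leaves headroom within the 8 GB and 24 GB shares for everything else the demos allocate.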

Model Setup and Execution

Create symbolic links to shared model resources:

ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b
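
Running ln -s a second time fails if the link already exists. When scripting the setup, an idempotent variant can be handy. The following is a sketch using the standard library; the helper name ensure_symlink is hypothetical, and the paths in the usage note are the ones from the command above.

```python
import os


def ensure_symlink(target, link_path):
    """Make link_path point at target. Return True if the link was (re)created."""
    parent = os.path.dirname(link_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False  # already correct, nothing to do
        os.unlink(link_path)  # stale link: remove it, the target is untouched
    elif os.path.exists(link_path):
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return True
```

For the 7B model this would be called as ensure_symlink("/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b", "/root/models/internlm2-chat-7b").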

Modify the model path in /root/demo/lagent/examples/internlm2_agent_web_demo_hf.py (approximately line 71):

value='/root/models/internlm2-chat-7b'

Launch the agent demo:

streamlit run /root/demo/lagent/examples/internlm2_agent_web_demo_hf.py --server.address 127.0.0.1 --server.port 6006

Set up SSH port forwarding as described for the character demo, open http://127.0.0.1:6006 in a local browser, and enable the data analysis option in the web interface to test tool calling.

Deploying InternLM-XComposer2 Multimodal Model

InternLM-XComposer2 is a vision-language model built on InternLM2, capable of text-image composition and visual understanding tasks.

Environment Setup

This task requires a 50% A100 GPU allocation. Install the additional dependencies:

conda activate demo
pip install timm==0.4.12 sentencepiece==0.1.99 markdown2==2.4.10 xlsxwriter==3.1.2 gradio==4.13.0 modelscope==1.9.5

Clone the repository:

cd /root/demo
git clone https://gitee.com/internlm/InternLM-XComposer.git
cd /root/demo/InternLM-XComposer
git checkout f31220eddca2cf6246ee2ddf8e375a40457ff626

Create symbolic links to the shared model weights:

ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b /root/models/internlm-xcomposer2-7b
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b /root/models/internlm-xcomposer2-vl-7b

Image-Text Composition Demo

cd /root/demo/InternLM-XComposer
python /root/demo/InternLM-XComposer/examples/gradio_demo_composition.py \
    --code_path /root/models/internlm-xcomposer2-7b \
    --private \
    --num_gpus 1 \
    --port 6006

Visual Question Answering Demo

conda activate demo
cd /root/demo/InternLM-XComposer
python /root/demo/InternLM-XComposer/examples/gradio_demo_chat.py \
    --code_path /root/models/internlm-xcomposer2-vl-7b \
    --private \
    --num_gpus 1 \
    --port 6006

With SSH port forwarding set up as in the earlier demos, open http://127.0.0.1:6006 in a local browser to upload images and submit visual queries.

Appendix: Configuration and Utility Commands

Package Manager Mirror Configuration

For pip, either pass the mirror to a single install with -i, or set it as the default index for all installs:

pip install -i https://mirrors.cernet.edu.cn/pypi/web/simple some-package
pip config set global.index-url https://mirrors.cernet.edu.cn/pypi/web/simple

For conda, create or modify ~/.condarc:

cat <<'EOF' > ~/.condarc
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
EOF

Model Download Methods

Using Hugging Face Hub (this example downloads a single file, config.json, from the repository):

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="internlm/internlm2-7b", filename="config.json")

Using ModelScope:

from modelscope import snapshot_download

model_dir = snapshot_download(
    'Shanghai_AI_Laboratory/internlm2-chat-7b',
    cache_dir='/path/to/models',
    revision='master'
)

Symbolic Link Management

To remove a symbolic link (this deletes only the link, not the files it points to):

unlink /root/models/internlm2-chat-7b

Terminal Session Management

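When running Gradio or Streamlit applications, stop the running process (Ctrl+C in its terminal) before starting a new demo; otherwise the old process keeps its model in GPU memory and the next demo may run out of memory. If memory is still not released, close the terminal tab and open a fresh session for subsequent experiments.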
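
To confirm that memory has actually been freed before launching the next demo, GPU usage can be read from nvidia-smi. The sketch below wraps its CSV query output; the helper names parse_memory and gpu_memory are hypothetical, and it assumes nvidia-smi is on the PATH and reports numeric values in MiB.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn lines like '1234, 81920' into a list of (used_mib, total_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return (used_mib, total_mib) for each visible GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)
```

Calling gpu_memory() after stopping a demo should show the used figure dropping back toward its idle value. If it does not, a process is probably still holding the model.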
