Integrating and Deploying LoRA Fine-Tuned ChatGLM3-6B Models Locally
Local Model Deployment
First, acquire the ChatGLM3-6B model. The code and fine-tuning scripts live in the official GitHub repository, while the model weights are hosted on the Hugging Face Hub. After downloading, set up a Python virtual environment and install the necessary dependencies.
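A typical setup looks like the following, assuming the official repository layout (the environment name venv is arbitrary):

git clone https://github.com/THUDM/ChatGLM3.git
cd ChatGLM3
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt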
LoRA Fine-Tuning
Dataset Preparation
Construct a dataset in a conversation format suitable for instruction fine-tuning. The following script converts data from an Excel file into the required JSON Lines structure, with one conversation object per line.
import pandas as pd
import json

# Load your dataset
df = pd.read_excel('path/to/your/data.xlsx')

conversation_list = []

# Convert each row into a conversation pair
for _, row in df.iterrows():
    conv_entry = {
        'conversations': [
            {'role': 'user', 'content': row['user_input']},
            {'role': 'assistant', 'content': row['assistant_response']}
        ]
    }
    conversation_list.append(conv_entry)

# Save as JSON Lines: one conversation object per line
with open('training_data.json', 'w', encoding='utf-8') as f:
    for entry in conversation_list:
        json.dump(entry, f, ensure_ascii=False)
        f.write('\n')

print("Dataset saved to 'training_data.json'.")
Executing the Fine-Tuning
Activate your configured environment and run the fine-tuning script. Hyperparameters (the LoRA rank, learning rate, output directory, and so on) are typically adjusted in a configuration file such as lora_config.yaml, which is passed to the script along with the data and base model paths. Execute the command:
python finetune_hf.py [dataset_path] [base_model_path] [lora_config_path]
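The LoRA-specific settings in that configuration file correspond to PEFT's LoraConfig. As an illustrative sketch only (the values below are assumptions, not repository defaults):

from peft import LoraConfig, TaskType

# Illustrative hyperparameters -- tune rank, alpha, and dropout for your task
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],  # ChatGLM3's fused attention projection
    r=8,             # LoRA rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.1,
)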
Testing the Fine-Tuned Model
To test a checkpoint, use the inference script:
python inference_hf.py output/checkpoint-3000/ --prompt "Introduce yourself."
Merging the LoRA Adapter with the Base Model
After fine-tuning, integrate the LoRA adapter weights back into the original model to create a single, deployable model.
import torch
from peft import PeftModel
from transformers import AutoModel

# Paths
base_model_path = 'path/to/original/chatglm3-6b'
lora_adapter_path = 'path/to/lora/output'
merged_model_save_path = 'path/to/save/merged_model'

# Load the base model in half precision; merging is a pure weight
# operation, so it can run entirely on the CPU
base_model = AutoModel.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Attach the LoRA adapter to the base model
lora_model = PeftModel.from_pretrained(base_model, lora_adapter_path)

# Fold the adapter weights into the base weights and drop the PEFT wrapper
merged_model = lora_model.merge_and_unload()

# Save the merged model, sharded into 2 GB files, in safetensors format
merged_model.save_pretrained(
    merged_model_save_path,
    max_shard_size="2GB",
    safe_serialization=True
)

print(f"Merged model saved to {merged_model_save_path}")
Important: After merging, copy the essential tokenizer files (for ChatGLM3-6B these are tokenizer.model, tokenizer_config.json, and tokenization_chatglm.py) from the original ChatGLM3-6B directory to the new merged model directory. save_pretrained writes only the model weights and config, so skipping this step causes tokenizer-related errors during loading.
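A short helper makes the copy reproducible; the file list follows the note above, so adjust it to whatever tokenizer files your base model directory actually contains:

import shutil
from pathlib import Path

src = Path('path/to/original/chatglm3-6b')
dst = Path('path/to/save/merged_model')

# Copy the tokenizer files that save_pretrained does not write
for name in ['tokenizer.model', 'tokenizer_config.json', 'tokenization_chatglm.py']:
    shutil.copy2(src / name, dst / name)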
Loading and Testing the Merged Model
You can now load the merged model like any standard Hugging Face model for inference.
from transformers import AutoTokenizer, AutoModel
import torch
model_path = 'path/to/merged_model'
# Load tokenizer and model (half precision, matching the saved weights)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model.eval()
# Create a prompt
user_query = "Explain quantum computing in simple terms."
# Generate a response
reply, _ = model.chat(tokenizer, user_query, history=[])
print(f"Assistant: {reply}")
Troubleshooting a Common Error
Error: IndexError: index out of range in self
Cause: This often occurs when the evaluation dataset size defined in the script exceeds the actual number of samples in your validation set.
Solution: Adjust the range in the dataset selection to match your available data.
# Original line -- fails when the validation set has fewer than 50 samples
eval_dataset = val_dataset.select(list(range(50)))
# Corrected line -- never request more samples than actually exist
eval_dataset = val_dataset.select(list(range(min(50, len(val_dataset)))))