Integrating and Deploying LoRA Fine-Tuned ChatGLM3-6B Models Locally
Local Model Deployment
First, acquire the ChatGLM3-6B model. The code and fine-tuning scripts live in the official GitHub repository, while the model weights are hosted on the Hugging Face Hub. After downloading, set up a Python virtual environment and install the necessary dependencies.
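A typical setup looks like the following, assuming the official repository layout (the environment name venv is arbitrary):

git clone https://github.com/THUDM/ChatGLM3.git
cd ChatGLM3
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt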
LoRA Fine-Tuning
Dataset Preparation
Construct a dataset in a conversation format suitable for instruction fine-tuning. The following script converts data from an Excel file into the required JSON Lines structure, with one conversation object per line.
import pandas as pd
import json

# Load your dataset
df = pd.read_excel('path/to/your/data.xlsx')

conversation_list = []

# Convert each row into a conversation pair
for _, row in df.iterrows():
    conv_entry = {
        'conversations': [
            {'role': 'user', 'content': row['user_input']},
            {'role': 'assistant', 'content': row['assistant_response']}
        ]
    }
    conversation_list.append(conv_entry)

# Save as JSON Lines: one conversation object per line
with open('training_data.json', 'w', encoding='utf-8') as f:
    for entry in conversation_list:
        json.dump(entry, f, ensure_ascii=False)
        f.write('\n')

print("Dataset saved to 'training_data.json'.")
Executing the Fine-Tuning
Activate your configured environment and run the fine-tuning script. Hyperparameters (the LoRA rank, learning rate, output directory, and so on) are typically adjusted in a configuration file such as lora_config.yaml, which is passed to the script along with the data and base model paths. Execute the command:
python finetune_hf.py [dataset_path] [base_model_path] [lora_config_path]
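The LoRA-specific settings in that configuration file correspond to PEFT's LoraConfig. As an illustrative sketch only (the values below are assumptions, not repository defaults):

from peft import LoraConfig, TaskType

# Illustrative hyperparameters -- tune rank, alpha, and dropout for your task
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],  # ChatGLM3's fused attention projection
    r=8,             # LoRA rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.1,
)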
Testing the Fine-Tuned Model
To test a checkpoint, use the inference script:
python inference_hf.py output/checkpoint-3000/ --prompt "Introduce yourself."
Merging the LoRA Adapter with the Base Model
After fine-tuning, integrate the LoRA adapter weights back into the original model to create a single, deployable model.
import torch
from peft import PeftModel
from transformers import AutoModel

# Paths
base_model_path = 'path/to/original/chatglm3-6b'
lora_adapter_path = 'path/to/lora/output'
merged_model_save_path = 'path/to/save/merged_model'

# Load the base model in half precision; merging is a pure weight
# operation, so it can run entirely on the CPU
base_model = AutoModel.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Attach the LoRA adapter to the base model
lora_model = PeftModel.from_pretrained(base_model, lora_adapter_path)

# Fold the adapter weights into the base weights and drop the PEFT wrapper
merged_model = lora_model.merge_and_unload()

# Save the merged model, sharded into 2 GB files, in safetensors format
merged_model.save_pretrained(
    merged_model_save_path,
    max_shard_size="2GB",
    safe_serialization=True
)

print(f"Merged model saved to {merged_model_save_path}")
Important: After merging, copy the essential tokenizer files (for ChatGLM3-6B these are tokenizer.model, tokenizer_config.json, and tokenization_chatglm.py) from the original ChatGLM3-6B directory to the new merged model directory. save_pretrained writes only the model weights and config, so skipping this step causes tokenizer-related errors during loading.
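A short helper makes the copy reproducible; the file list follows the note above, so adjust it to whatever tokenizer files your base model directory actually contains:

import shutil
from pathlib import Path

src = Path('path/to/original/chatglm3-6b')
dst = Path('path/to/save/merged_model')

# Copy the tokenizer files that save_pretrained does not write
for name in ['tokenizer.model', 'tokenizer_config.json', 'tokenization_chatglm.py']:
    shutil.copy2(src / name, dst / name)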
Loading and Testing the Merged Model
You can now load the merged model like any standard Hugging Face model for inference.
from transformers import AutoTokenizer, AutoModel
import torch
model_path = 'path/to/merged_model'
# Load tokenizer and model (half precision, matching the saved weights)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model.eval()
# Create a prompt
user_query = "Explain quantum computing in simple terms."
# Generate a response
reply, _ = model.chat(tokenizer, user_query, history=[])
print(f"Assistant: {reply}")
Troubleshooting a Common Error
Error: IndexError: index out of range in self
Cause: This often occurs when the evaluation dataset size defined in the script exceeds the actual number of samples in your validation set.
Solution: Adjust the range in the dataset selection to match your available data.
# Original line -- fails when the validation set has fewer than 50 samples
eval_dataset = val_dataset.select(list(range(50)))
# Corrected line -- never request more samples than actually exist
eval_dataset = val_dataset.select(list(range(min(50, len(val_dataset)))))