Fading Coder

One Final Commit for the Last Sprint


Integrating and Deploying LoRA Fine-Tuned ChatGLM3-6B Models Locally


Local Model Deployment

First, acquire the ChatGLM3-6B model. It can be downloaded from its GitHub repository or the Hugging Face Hub. After downloading, set up a Python virtual environment and install the necessary dependencies.
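A typical setup looks like the following sketch. The environment name and package list are illustrative; the authoritative dependency list is the requirements.txt shipped with the ChatGLM3 repository, so prefer installing from that.

```shell
# Create and activate an isolated environment (name is illustrative)
python3 -m venv chatglm3-env
source chatglm3-env/bin/activate

# Core libraries used throughout this article; pin versions from the
# ChatGLM3 repository's requirements.txt where possible
pip install torch transformers peft datasets sentencepiece pandas openpyxl

# Or, from inside a checkout of the ChatGLM3 repository:
pip install -r requirements.txt
```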

LoRA Fine-Tuning

Dataset Preparation

Construct a dataset in a conversational format suitable for instruction fine-tuning. The following script converts data from an Excel file into the required JSON Lines format (one JSON object per line).

import pandas as pd
import json

# Load your dataset
df = pd.read_excel('path/to/your/data.xlsx')

conversation_list = []

# Convert each row into a conversation pair
for _, row in df.iterrows():
    conv_entry = {
        'conversations': [
            {'role': 'user', 'content': row['user_input']},
            {'role': 'assistant', 'content': row['assistant_response']}
        ]
    }
    conversation_list.append(conv_entry)

# Save to a JSON file
with open('training_data.json', 'w', encoding='utf-8') as f:
    for entry in conversation_list:
        json.dump(entry, f, ensure_ascii=False)
        f.write('\n')

print("Dataset saved to 'training_data.json'.")
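As a sanity check before training, you can verify that every line of the output file parses back into the expected user/assistant structure. This is a minimal sketch; the helper name is ours, not part of the ChatGLM3 tooling.

```python
import json

def validate_conversations(path):
    """Count records in a JSONL file that match the expected
    user/assistant conversation structure."""
    count = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            entry = json.loads(line)
            roles = [turn['role'] for turn in entry['conversations']]
            assert roles == ['user', 'assistant'], f"unexpected roles: {roles}"
            count += 1
    return count
```

Running `validate_conversations('training_data.json')` should return the number of rows in your spreadsheet; a parse error or assertion failure points at a malformed record.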

Executing the Fine-Tuning

Activate your configured environment and run the fine-tuning script. Hyperparameters are typically adjusted in a configuration file such as lora_config.yaml. Then execute:

python finetune_hf.py [dataset_path] [base_model_path] [lora_output_dir]
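The configuration file usually combines LoRA hyperparameters with standard training arguments. The sketch below mirrors PEFT's LoraConfig field names, but the exact schema depends on the version of finetune_hf.py you are using, so treat it as illustrative rather than canonical.

```yaml
# Illustrative lora_config.yaml -- keys mirror peft.LoraConfig fields;
# verify against the schema expected by your finetune_hf.py version
peft_config:
  peft_type: LORA
  task_type: CAUSAL_LM
  r: 8                # rank of the low-rank update matrices
  lora_alpha: 32      # scaling factor applied to the update
  lora_dropout: 0.1
training_args:
  learning_rate: 5.0e-5
  max_steps: 3000
  per_device_train_batch_size: 4
  output_dir: ./output
```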

Testing the Fine-Tuned Model

To test a checkpoint, use the inference script:

python inference_hf.py output/checkpoint-3000/ --prompt "Introduce yourself."

Merging the LoRA Adapter with the Base Model

After fine-tuning, merge the LoRA adapter weights into the base model to produce a single, deployable model.

import torch
from peft import PeftModel
from transformers import AutoModel

# Paths
base_model_path = 'path/to/original/chatglm3-6b'
lora_adapter_path = 'path/to/lora/output'
merged_model_save_path = 'path/to/save/merged_model'

# Load the base model (CPU is sufficient for merging; no GPU required)
base_model = AutoModel.from_pretrained(
    base_model_path, trust_remote_code=True, torch_dtype=torch.float16
)

# Load the LoRA adapter on top of the base model
lora_model = PeftModel.from_pretrained(base_model, lora_adapter_path)

# Merge the adapter weights into the base model and drop the PEFT wrappers
merged_model = lora_model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained(
    merged_model_save_path,
    max_shard_size="2GB",
    safe_serialization=True
)

print(f"Merged model saved to {merged_model_save_path}")

Important: After merging, copy the tokenizer files (e.g., tokenizer_config.json plus the tokenizer model/vocabulary files; exact filenames vary by model version) from the original ChatGLM3-6B directory to the new merged model directory. This prevents tokenizer-related errors during loading.
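The copying step above can be scripted. This is a small sketch of our own (the helper and the default filename list are assumptions; check which tokenizer files your model snapshot actually contains):

```python
import shutil
from pathlib import Path

def copy_tokenizer_files(base_dir, merged_dir,
                         names=('tokenizer.json', 'tokenizer_config.json',
                                'tokenizer.model', 'special_tokens_map.json')):
    """Copy whichever tokenizer files exist in base_dir into merged_dir.

    Returns the list of filenames actually copied, so you can confirm
    nothing expected was missing.
    """
    base_dir, merged_dir = Path(base_dir), Path(merged_dir)
    merged_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = base_dir / name
        if src.exists():
            shutil.copy2(src, merged_dir / name)
            copied.append(name)
    return copied
```

After running it against your directories, inspect the returned list: an empty result means the filenames in your snapshot differ from the defaults above.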

Loading and Testing the Merged Model

You can now load the merged model like any standard Hugging Face model for inference.

from transformers import AutoTokenizer, AutoModel
import torch

model_path = 'path/to/merged_model'

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
model.eval()

# Create a prompt
user_query = "Explain quantum computing in simple terms."

# Generate a response
reply, _ = model.chat(tokenizer, user_query, history=[])
print(f"Assistant: {reply}")

Troubleshooting a Common Error

Error: IndexError: index out of range in self

Cause: This often occurs when the evaluation dataset size defined in the script exceeds the actual number of samples in your validation set.

Solution: Adjust the range in the dataset selection to match your available data.

# Problematic line: fails when the validation set has fewer than 50 samples
eval_dataset = val_dataset.select(list(range(50)))

# Corrected: cap the selection at the actual dataset size
eval_dataset = val_dataset.select(list(range(min(50, len(val_dataset)))))
