Fine-tuning ChatGLM3-6B for Named Entity Recognition using LLaMA-Factory
Dataset Preparation
The experiment uses the LLaMA-Factory framework to perform named entity recognition (NER) with the ChatGLM3-6B model.
All resources related to this task—datasets, LoRA fine-tuning scripts, deployment configurations, inference code, prediction results, and evaluation metrics—are available on the ModelScope platform.
The output directory contains the trained LoRA weights. You can access the dataset at: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/files
Install required dependencies:
git clone https://github.com/hiyouga/LLaMA-Factory.git
# conda create -n llama_factory python=3.10
# conda activate llama_factory
cd LLaMA-Factory
pip install -e ".[metrics]"
Create a new folder for storing the dataset and script files:
mkdir glm_ner_scripts
cd glm_ner_scripts
git clone https://www.modelscope.cn/datasets/jieshenai/llm_clue_ner2020.git
Dataset Format
This dataset follows a format similar to DeepKE but is adapted for use with LLaMA-Factory for better compatibility and ease of learning.
Sample entry from the dataset:
{
"instruction": "You are an expert in entity extraction. Extract entities from the input according to the schema definition. Return empty list if no entities found. Respond in JSON string format. Schema: ['address', 'book', 'company', 'game', 'government', 'movie']",
"input": "Zhejiang Commercial Bank's enterprise credit department Dr. Ye Laogui offers another perspective on the five barriers. Dr. Ye believes that for currently domestic commercial banks,",
"output": "{\"address\": [], \"book\": [], \"company\": [\"Zhejiang Commercial Bank\"], \"game\": [], \"government\": [], \"movie\": []}"
}
The NER task is reformulated as a sequence-to-sequence generation problem.
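To make the reformulation concrete, the following sketch converts one raw CLUE-NER-style record (`{"text": ..., "label": {type: {entity: [[start, end]]}}}`) into the instruction/input/output format shown above. The raw record shape and the helper name `build_example` are assumptions for illustration; the actual conversion script may differ.

```python
import json

SCHEMA = ["address", "book", "company", "game", "government", "movie"]

INSTRUCTION = (
    "You are an expert in entity extraction. Extract entities from the input "
    "according to the schema definition. Return empty list if no entities found. "
    "Respond in JSON string format. Schema: " + str(SCHEMA)
)

def build_example(sample, schema=SCHEMA):
    """Convert one CLUE-NER style record into an instruction-tuning record."""
    # Collect entity surface forms per type; types absent from the label map
    # become empty lists so every schema key appears in the output.
    entities = {t: sorted(sample.get("label", {}).get(t, {}).keys()) for t in schema}
    return {
        "instruction": INSTRUCTION,
        "input": sample["text"],
        "output": json.dumps(entities, ensure_ascii=False),
    }

example = build_example({
    "text": "Zhejiang Commercial Bank's enterprise credit department ...",
    "label": {"company": {"Zhejiang Commercial Bank": [[0, 3]]}},
})
```

The span offsets in `label` are dropped on purpose: the generative formulation only asks the model to emit entity strings grouped by type.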
Add custom dataset configuration to LLaMA-Factory/data/dataset_info.json:
"llm_ner2_train": {
"file_name": "../glm_ner_scripts/llm_clue_ner2020/llm_ner_dataset2/train.json",
"file_sha1": "8dffb2d6e55ef8916f95ff7ccbcfbfe9d6865d12"
}
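If you modify the training file, the `file_sha1` value must match the file on disk or LLaMA-Factory will warn about a checksum mismatch. A small stdlib helper to recompute it:

```python
import hashlib

def file_sha1(path, chunk_size=1 << 20):
    """Compute the SHA-1 digest expected in dataset_info.json's file_sha1 field."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large dataset files don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. file_sha1("../glm_ner_scripts/llm_clue_ner2020/llm_ner_dataset2/train.json")
```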
LoRA Fine-tuning
Execute training via the script train.sh:
CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path ZhipuAI/chatglm3-6b \
--dataset_dir ../../data \
--dataset llm_ner2_train \
--template chatglm3 \
--finetuning_type lora \
--lora_target query_key_value \
--output_dir ./output/output_train \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_strategy epoch \
--learning_rate 5e-5 \
--num_train_epochs 2.0 \
--plot_loss \
--fp16
Key parameters:
- dataset_dir: Path to the directory containing dataset_info.json.
- dataset: Name of the dataset defined in dataset_info.json.
Training took approximately two hours with ~24GB VRAM.
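As a back-of-envelope sanity check, the hyperparameters above can be related to the final checkpoint name (checkpoint-2250, used in the deployment step). This assumes a single GPU and that 2250 is the total optimizer step count across both epochs:

```python
# Derive the effective batch size and implied dataset size from the
# training command's hyperparameters (single-GPU assumption).
per_device_batch = 4   # --per_device_train_batch_size
grad_accum = 4         # --gradient_accumulation_steps
epochs = 2             # --num_train_epochs

effective_batch = per_device_batch * grad_accum   # 16 examples per optimizer step

total_steps = 2250                                # from checkpoint-2250
examples_seen = total_steps * effective_batch     # examples across both epochs
train_size = examples_seen // epochs              # implied instruction records
```

This implies roughly 18,000 instruction records, consistent with each source sentence being expanded into multiple schema-specific examples.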
API Deployment
Deploy the model using the fine-tuned LoRA weights:
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python ../../src/api_demo.py \
--model_name_or_path ZhipuAI/chatglm3-6b \
--adapter_name_or_path output/output_train/checkpoint-2250 \
--template chatglm3 \
--finetuning_type lora
The trained LoRA weights are located under the output directory in ModelScope.
Use req.ipynb to interact with the deployed API endpoint. You can view the notebook here: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/file/view/master/req.ipynb?status=1
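api_demo.py serves an OpenAI-compatible chat endpoint, so a minimal stdlib client can look like the sketch below. The endpoint path and response shape follow the OpenAI chat-completions convention; the model name string and temperature are assumptions, not values taken from req.ipynb.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # API_PORT=8000 from the command above

INSTRUCTION = (
    "You are an expert in entity extraction. Extract entities from the input "
    "according to the schema definition. Return empty list if no entities found. "
    "Respond in JSON string format. Schema: ['address', 'book', 'company', "
    "'game', 'government', 'movie']"
)

def build_payload(text):
    """Build an OpenAI-style chat payload for one NER query."""
    return {
        "model": "chatglm3-6b",  # assumed model name; check the server's /v1/models
        "messages": [{"role": "user", "content": INSTRUCTION + "\n" + text}],
        "temperature": 0.1,
    }

def extract_entities(text):
    """POST one query and parse the model's JSON answer into a dict."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```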
Model Inference
Use llm_ner_dataset2/dev.json instead of test.json, since the latter has incorrect labels.
In req.ipynb, functionality includes:
- Sending requests to the model API.
- Saving predictions alongside the original data into llm_predict2.json.
Note: the notebook currently sends one request per example, so inference is not batched. Future improvements could support batched inputs for faster inference.
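Until the server supports true batched inputs, client-side concurrency can approximate batching by sending several independent requests in parallel. A generic sketch (the single-request function `query_model` is a hypothetical stand-in for whatever the notebook uses):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, max_workers=8):
    """Apply a single-request function to many inputs concurrently.

    Each call is an independent HTTP request, so thread-based concurrency
    overlaps network and generation latency without any server changes.
    Results come back in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))

# e.g. predictions = parallel_map(query_model, dev_texts)
```

Keep `max_workers` modest: a single-GPU server processes requests serially, so very high concurrency only queues work and risks timeouts.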
Evaluation
Example entry in llm_predict2.json:
{
"instruction": "{\'instruction\': \'You are an expert in entity extraction. Extract entities from the input according to the schema definition. Return empty list if no entities found. Respond in JSON string format.\', \'schema\': [\'name\', \'organization\', \'position\', \'scene\'], \'input\': \'From African raw material suppliers like Mo Tanbi, some newcomers in investment often fall victim to deliberately hyped-up \'}",
"input": "",
"output": "{\"name\": [\"Mo Tanbi\"], \"organization\": [], \"position\": [\"raw material supplier\", \"industry expert\"], \"scene\": []}",
"predict": {"name": ["Mo Tanbi"], "organization": [], "position": ["investor", "expert"], "scene": []}
}
Evaluation metrics compare:
- output: Ground-truth labels.
- predict: Predictions generated by the model.
Use eval2.ipynb for performance assessment. View the notebook here: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/file/view/master/eval2.ipynb?status=1
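The comparison of `output` against `predict` can be scored as micro-averaged precision/recall/F1 over (type, entity) pairs. This is a sketch of one reasonable scoring scheme, not necessarily the exact metric eval2.ipynb implements:

```python
import json

def to_pairs(record):
    """Flatten a {type: [entities]} record into a set of (type, entity) pairs.

    Accepts either a dict (like `predict`) or a JSON string (like `output`).
    """
    if isinstance(record, str):
        record = json.loads(record)
    return {(t, e) for t, ents in record.items() for e in ents}

def micro_f1(examples):
    """Micro precision/recall/F1 over a list of {output, predict} records."""
    tp = fp = fn = 0
    for ex in examples:
        gold = to_pairs(ex["output"])
        pred = to_pairs(ex["predict"])
        tp += len(gold & pred)   # exact (type, entity) matches
        fp += len(pred - gold)   # predicted but not in the gold labels
        fn += len(gold - pred)   # gold entities the model missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

On the llm_predict2.json sample above, only ("name", "Mo Tanbi") matches exactly, so both position predictions count as errors: exact string matching is strict, and partial-match variants are a common relaxation.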
Data Availability
Due to platform policies, datasets may be removed. A backup archive is available for download.