Fading Coder

One Final Commit for the Last Sprint


Fine-tuning ChatGLM3-6B for Named Entity Recognition using LLaMA-Factory


Dataset Preparation

The experiment uses the LLaMA-Factory framework to fine-tune the ChatGLM3-6B model for named entity recognition (NER).

All resources related to this task—datasets, LoRA fine-tuning scripts, deployment configurations, inference code, prediction results, and evaluation metrics—are available on the ModelScope platform.

The output directory contains the trained LoRA weights. You can access the dataset at: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/files

Install required dependencies:

git clone https://github.com/hiyouga/LLaMA-Factory.git
# conda create -n llama_factory python=3.10
# conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]

Create a new folder for storing the dataset and script files:

mkdir glm_ner_scripts
cd glm_ner_scripts
git clone https://www.modelscope.cn/datasets/jieshenai/llm_clue_ner2020.git

Dataset Format

This dataset follows a format similar to DeepKE but is adapted for use with LLaMA-Factory for better compatibility and ease of learning.

Sample entry from the dataset:

{
  "instruction": "You are an expert in entity extraction. Extract entities from the input according to the schema definition. Return empty list if no entities found. Respond in JSON string format. Schema: ['address', 'book', 'company', 'game', 'government', 'movie']",
  "input": "Zhejiang Commercial Bank's enterprise credit department Dr. Ye Laogui offers another perspective on the five barriers. Dr. Ye believes that for currently domestic commercial banks,",
  "output": "{\"address\": [], \"book\": [], \"company\": [\"Zhejiang Commercial Bank\"], \"game\": [], \"government\": [], \"movie\": []}"
}

The NER task is reformulated as a sequence-to-sequence generation problem.
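Records in this format can be built programmatically from raw annotations. A minimal sketch, assuming annotations already map entity types to mention strings (the helper name `build_record` and the sample sentence are illustrative, not from the original scripts):

```python
import json

# Schema taken from the sample entry above.
SCHEMA = ["address", "book", "company", "game", "government", "movie"]

def build_record(sentence: str, entities: dict) -> dict:
    """Build one instruction/input/output training record.

    entities: dict mapping entity type -> list of mention strings.
    Types missing from `entities` are emitted as empty lists, so the
    model learns to return the full schema every time.
    """
    output = {label: entities.get(label, []) for label in SCHEMA}
    return {
        "instruction": (
            "You are an expert in entity extraction. Extract entities from "
            "the input according to the schema definition. Return empty list "
            "if no entities found. Respond in JSON string format. "
            f"Schema: {SCHEMA}"
        ),
        "input": sentence,
        # The target is a JSON *string*, matching the sample entry above.
        "output": json.dumps(output, ensure_ascii=False),
    }

record = build_record(
    "Zhejiang Commercial Bank's enterprise credit department offers another perspective.",
    {"company": ["Zhejiang Commercial Bank"]},
)
```

Because the output is serialized with `json.dumps`, the model's generations can later be parsed back with `json.loads` during evaluation.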

Add custom dataset configuration to LLaMA-Factory/data/dataset_info.json:

"llm_ner2_train": {
  "file_name": "../glm_ner_scripts/llm_clue_ner2020/llm_ner_dataset2/train.json",
  "file_sha1": "8dffb2d6e55ef8916f95ff7ccbcfbfe9d6865d12"
}

LoRA Fine-tuning

Execute training via the script train.sh:

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path ZhipuAI/chatglm3-6b \
--dataset_dir ../../data \
--dataset llm_ner2_train \
--template chatglm3 \
--finetuning_type lora \
--lora_target query_key_value \
--output_dir ./output/output_train \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_strategy epoch \
--learning_rate 5e-5 \
--num_train_epochs 2.0 \
--plot_loss \
--fp16

Key parameters:

  • dataset_dir: Path to the dataset info file.
  • dataset: Name of the dataset defined in dataset_info.json.

Training took approximately two hours with ~24GB VRAM.

API Deployment

Deploy the model using the fine-tuned LoRA weights:

CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python ../../src/api_demo.py \
--model_name_or_path ZhipuAI/chatglm3-6b \
--adapter_name_or_path output/output_train/checkpoint-2250 \
--template chatglm3 \
--finetuning_type lora

The trained LoRA weights are located under the output directory in ModelScope.

Use req.ipynb to interact with the deployed API endpoint. You can view the notebook here: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/file/view/master/req.ipynb?status=1
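A minimal sketch of what such a request looks like, assuming api_demo.py exposes an OpenAI-compatible chat endpoint on port 8000 (the model name and temperature below are illustrative, not taken from req.ipynb):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(instruction: str, text: str) -> dict:
    # Instruction and sentence go into a single user turn, mirroring
    # the instruction/input fields of the training data.
    return {
        "model": "chatglm3-6b",
        "messages": [{"role": "user", "content": f"{instruction}\n{text}"}],
        "temperature": 0.1,
    }

def extract_entities(instruction: str, text: str) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(instruction, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.loads(resp.read())
    content = reply["choices"][0]["message"]["content"]
    # The fine-tuned model is trained to answer with a JSON string.
    return json.loads(content)
```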

Model Inference

Use llm_ner_dataset2/dev.json instead of test.json, since the latter has incorrect labels.

The req.ipynb notebook handles:

  • Sending requests to the model API.
  • Saving predictions alongside original data into llm_predict2.json.

Note: requests are currently sent one sample at a time, which limits throughput. Supporting batched inputs would make inference faster.

Evaluation

Example entry in llm_predict2.json:

{
  "instruction": "{\'instruction\': \'You are an expert in entity extraction. Extract entities from the input according to the schema definition. Return empty list if no entities found. Respond in JSON string format.\', \'schema\': [\'name\', \'organization\', \'position\', \'scene\'], \'input\': \'From African raw material suppliers like Mo Tanbi, some newcomers in investment often fall victim to deliberately hyped-up \'},
  "input": "",
  "output": "{\"name\": [\"Mo Tanbi\"], \"organization\": [], \"position\": [\"raw material supplier\", \"industry expert\"], \"scene\": []}",
  "predict": {"name": ["Mo Tanbi"], "organization": [], "position": ["investor", "expert"], "scene": []}
}

Evaluation metrics compare:

  • output: Ground truth labels.
  • predict: Predictions generated by the model.
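The comparison can be sketched as span-level micro precision/recall/F1 over the entity mentions. This is a hedged sketch of one reasonable metric, not necessarily the exact computation in eval2.ipynb:

```python
import json

def micro_f1(records):
    """Span-level micro precision/recall/F1 over llm_predict2.json entries.

    Each record carries `output` (gold, a JSON string or dict) and
    `predict` (a dict). A mention counts as correct only if it appears
    under the same entity type in both.
    """
    tp = fp = fn = 0
    for rec in records:
        gold = rec["output"]
        if isinstance(gold, str):
            gold = json.loads(gold)
        pred = rec["predict"]
        for label in gold:
            g = set(gold[label])
            p = set(pred.get(label, []))
            tp += len(g & p)  # correctly predicted mentions
            fp += len(p - g)  # spurious predictions
            fn += len(g - p)  # missed gold mentions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```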

Use eval2.ipynb for performance assessment. View the notebook here: https://modelscope.cn/datasets/jieshenai/llm_clue_ner2020/file/view/master/eval2.ipynb?status=1

Data Availability

Due to platform policies, datasets may be removed. A backup archive is available for download.
