
Implementing a RAG-Based Assistant with InternLM and Huixiangdou


Overview of Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation integrates an external knowledge base with a large language model to improve response quality. This approach modifies the model's generation process by first retrieving relevant information from a pre-constructed corpus. This transforms the model's task from a closed-book to an open-book scenario, where the knowledge base serves as the reference material.

Key advantages of RAG include:

  • Mitigating Hallucinations: By providing factual context, RAG increases the likelihood of generating accurate and relevant tokens.
  • Addressing Timeliness: Static model training data can become outdated. RAG allows for the integration of recent information, such as news, into the knowledge base.
  • Enhancing Data Security: Organizations can leverage private, internal documents by building a local knowledge base, avoiding both the need to share sensitive data with external models and the high cost of full model fine-tuning.

The RAG Pipeline

The workflow consists of three primary stages:

  1. Indexing: This stage builds the knowledge base. Source documents are segmented into chunks, which are then encoded into vector representations (embeddings) using an embedding model. These vectors are stored in a vector database.
  2. Retrieval: Upon receiving a user query, the same embedding model encodes the query into a vector. This query vector is compared against the stored knowledge vectors to find the most semantically similar entries. A reranker model can further refine the results by scoring and selecting the top-K most relevant knowledge chunks.
  3. Generation: The LLM generates a response based on both the original user query and the retrieved context, leading to more informed and accurate outputs. A minimal end-to-end sketch of these stages follows this list.
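To make the three stages concrete, here is a minimal sketch using sentence_transformers and faiss (both installed later in this guide). The model choice, sample chunks, and prompt template are illustrative assumptions, not Huixiangdou's internals.

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Indexing: encode document chunks and store them in a vector index.
chunks = [
    "Huixiangdou is a domain-specific knowledge assistant.",
    "It can be deployed in WeChat groups.",
]
embedder = SentenceTransformer("maidalun1020/bce-embedding-base_v1")  # illustrative model choice
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])  # inner product equals cosine on normalized vectors
index.add(np.asarray(vectors, dtype="float32"))

# 2. Retrieval: encode the query and find the most similar chunk.
query = "What is Huixiangdou?"
query_vec = embedder.encode([query], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 1)
context = chunks[ids[0][0]]

# 3. Generation: hand both the query and the retrieved context to the LLM.
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"  # prompt shape is an assumption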

RAG Optimizations

  • Indexing Optimization: Techniques include improving embedding quality and optimizing the vector index structure.
  • Retrieval Optimization: Methods involve query expansion, context window management, iterative/recursive retrieval, and adaptive retrieval strategies (query expansion is sketched after this list).
  • Generation Optimization: Fine-tuning the underlying LLM for better integration of retrieved context.
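As one example, query expansion from the retrieval bullet above can be sketched as follows; expand_and_retrieve and paraphrase_fn are hypothetical names, and the embedder/index pair is the one from the earlier sketch.

import numpy as np

def expand_and_retrieve(query, paraphrase_fn, embedder, index, k=3):
    # Search with the original query plus paraphrases of it (e.g., LLM-generated
    # rewrites), then merge the hits into one candidate pool for the reranker.
    variants = [query] + paraphrase_fn(query)
    candidates = set()
    for variant in variants:
        vec = embedder.encode([variant], normalize_embeddings=True)
        _, ids = index.search(np.asarray(vec, dtype="float32"), k)
        candidates.update(int(i) for i in ids[0] if i != -1)  # -1 marks empty result slots
    return sorted(candidates)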

RAG vs. Supervised Fine-Tuning (SFT)

  • Characteristics: RAG uses non-parametric memory, so knowledge is updated by editing the external database; it excels at knowledge-intensive tasks and produces diverse content through retrieval. SFT uses parametric memory, adapting the model to specific tasks via training on labeled data; it requires substantial labeled datasets and may suffer from catastrophic forgetting of prior knowledge.
  • Use Cases: RAG fits tasks requiring up-to-date information or involving sensitive, proprietary data. SFT fits scenarios with abundant, high-quality labeled data and a need for peak model performance on a defined task.
  • Limitations: RAG's performance depends heavily on the quality and coverage of the knowledge base and on the base LLM's capabilities. SFT incurs high costs for data labeling and must be retrained for each new task.

Deploying the Huixiangdou Assistant on InternLM Studio

Environment Setup

Create a new conda environment for the project from the internlm-base image using InternLM Studio's studio-conda tool.

studio-conda -o internlm-base -t InternLM2_Huixiangdou
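Once the command completes, activate the new environment (standard conda usage; the name matches the -t flag above):

conda activate InternLM2_Huixiangdou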

Create a models directory, then add symbolic links to the embedding and reranker models, as well as the LLM weights.

mkdir -p /root/models
ln -s /root/share/new_models/maidalun1020/bce-embedding-base_v1 /root/models/bce-embedding-base_v1
ln -s /root/share/new_models/maidalun1020/bce-reranker-base_v1 /root/models/bce-reranker-base_v1
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b

Install the required Python packages.

pip install protobuf==4.25.3 accelerate==0.28.0 aiohttp==3.9.3 auto-gptq==0.7.1 bcembedding==0.1.3 beautifulsoup4==4.8.2 einops==0.7.0 faiss-gpu==1.7.2 langchain==0.1.14 loguru==0.7.2 lxml_html_clean==0.1.0 openai==1.16.1 openpyxl==3.1.2 pandas pydantic==2.6.4 pymupdf==1.24.1 python-docx==1.1.0 pytoml==0.1.21 readability-lxml==0.8.1 redis==5.0.3 requests==2.31.0 scikit-learn==1.4.1.post1 sentence_transformers==2.2.2 textract==1.6.5 tiktoken==0.6.0 transformers==4.39.3 transformers_stream_generator==0.0.5 unstructured==0.11.2

Clone the Huixiangdou repository.

cd /root
git clone https://github.com/internlm/huixiangdou
cd huixiangdou
git checkout 447c6f7e68a1657fce1c4f7c740ea1700bde0440

Configuration

Update the configuration file with the paths to the local models (the hard-coded line numbers in the sed commands below match the commit pinned above).

sed -i '6s#.*#embedding_model_path = "/root/models/bce-embedding-base_v1"#' /root/huixiangdou/config.ini
sed -i '7s#.*#reranker_model_path = "/root/models/bce-reranker-base_v1"#' /root/huixiangdou/config.ini
sed -i '29s#.*#local_llm_path = "/root/models/internlm2-chat-7b"#' /root/huixiangdou/config.ini
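After these edits, the corresponding lines of config.ini should read:

embedding_model_path = "/root/models/bce-embedding-base_v1"
reranker_model_path = "/root/models/bce-reranker-base_v1"
local_llm_path = "/root/models/internlm2-chat-7b"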

Prepare Knowledge Base and Query Lists

Clone the project's own repository to use as sample knowledge data.

cd /root/huixiangdou
mkdir repodir
git clone https://github.com/internlm/huixiangdou --depth=1 repodir/huixiangdou

Create a custom list of acceptable queries (good_questions.json).

import json

accepted_queries = [
    "What is the purpose of Huixiangdou?",
    "How can Huixiangdou be deployed in a WeChat group?",
    "What are the supported large models?",
    "How do I configure the config.ini file?",
    "What are the application scenarios?"
]

with open('/root/huixiangdou/resource/good_questions.json', 'w') as f:
    json.dump(accepted_queries, f, ensure_ascii=False, indent=2)

Create a test query file to verify the assistant's behavior: the first entry is off-topic filler that should be rejected, while the remaining entries probe in-domain and borderline questions.

import json

test_inputs = [
    "This is just random text for rejection test.",
    "What large models does Huixiangdou support?",
    "How to integrate multimodal models with RAG?",
    "Where is Shanghai AI Lab located?"
]

with open('/root/huixiangdou/test_queries.json', 'w') as f:
    json.dump(test_inputs, f, ensure_ascii=False, indent=2)

Build the Vector Database

Run the feature store service to process the knowledge corpus and query lists into the vector database.

cd /root/huixiangdou
mkdir workdir
python3 -m huixiangdou.service.feature_store --sample ./test_queries.json
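If the build succeeds, the vector database artifacts are written under workdir; a quick sanity check:

ls /root/huixiangdou/workdir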

Run the Assistant

Modify the main script to test specific queries.

sed -i "74s/.*/    queries = [\"What is Huixiangdou?\", \"How to deploy to WeChat\", \"Shanghai AI Lab address\"]/" /root/huixiangdou/huixiangdou/main.py

Execute the assistant in standalone mode.

cd /root/huixiangdou/
python3 -m huixiangdou.main --standalone

The execution log typically reveals a multi-step process (sketched in code below):

  1. Query classification
  2. Topic summarization for refined retrieval
  3. Context retrieval based on the topic
  4. Relevance scoring and re-ranking of retrieved chunks
  5. Final answer generation using both the query and the retrieved context
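A hedged sketch of that flow, with each step passed in as a callable because these names are illustrative rather than Huixiangdou's actual API (the 0.35 threshold is likewise arbitrary):

def answer(query, classify, summarize, retrieve, rerank, generate, threshold=0.35):
    # 1. Query classification: reject queries outside the knowledge domain.
    if not classify(query):
        return None
    # 2. Topic summarization: distill the query into a retrieval-friendly topic.
    topic = summarize(query)
    # 3. Context retrieval: fetch candidate chunks for the topic.
    chunks = retrieve(topic)
    # 4. Relevance scoring and re-ranking: keep chunks scoring above the threshold.
    context = [chunk for chunk, score in rerank(topic, chunks) if score > threshold]
    # 5. Generation: produce the final answer from the query plus retained context.
    return generate(query, context)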

Building a Custom Knowledge Q&A Assistant

Local Deployment with Custom Documents

Clone the repository and install dependencies in a local environment.

import os
os.makedirs('/root/custom_assistant', exist_ok=True)  # create the working directory if it does not exist
os.chdir('/root/custom_assistant')
!git clone https://github.com/InternLM/HuixiangDou.git
os.chdir('HuixiangDou')
!git checkout 447c6f7e68a1657fce1c4f7c740ea1700bde0440
!pip install -r requirements.txt  # Assuming a requirements file exists

Download the embedding and reranker model weights.

import os
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'  # accelerated downloads; requires the hf_transfer package, unset if unavailable
# Set proxies if needed, e.g., os.environ['http_proxy'] = 'http://proxy:port'
!huggingface-cli download --resume-download maidalun1020/bce-embedding-base_v1 --local-dir ./models/bce-embedding-base_v1
!huggingface-cli download --resume-download maidalun1020/bce-reranker-base_v1 --local-dir ./models/bce-reranker-base_v1

Update the config.ini file to point to the local model directories.
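For example, the sed edits from the earlier section can be reused with the local download paths (the hard-coded line numbers again assume the pinned commit):

sed -i '6s#.*#embedding_model_path = "./models/bce-embedding-base_v1"#' config.ini
sed -i '7s#.*#reranker_model_path = "./models/bce-reranker-base_v1"#' config.ini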

Create a Custom Knowledge Base

Place your domain-specific documents (e.g., server usage manuals) in a dedicated directory, such as ./knowledge_docs. Create a new good_questions.json containing example queries from your domain, along with a test_queries.json for verification. Then build the vector database from the custom knowledge source.

python -m huixiangdou.service.feature_store --repo_dir ./knowledge_docs \
    --good_questions ./resource/good_questions.json \
    --sample ./test_queries.json

Launch the Web Interface

Install Gradio and launch the web service.

!pip install gradio==4.25.0
!python -m huixiangdou.web.gradio_server

Access the provided local URL to interact with your custom RAG assistant.
