Implementing a RAG-Based Assistant with InternLM and Huixiangdou
Overview of Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation integrates an external knowledge base with a large language model to improve response quality. Instead of generating from parametric memory alone, the model first retrieves relevant information from a pre-constructed corpus. This transforms the task from a closed-book to an open-book scenario, where the knowledge base serves as the reference material.
Key advantages of RAG include:
- Mitigating Hallucinations: By grounding generation in retrieved factual context, RAG increases the likelihood of accurate and relevant answers.
- Addressing Timeliness: Static model training data can become outdated. RAG allows for the integration of recent information, such as news, into the knowledge base.
- Enhancing Data Security: Organizations can leverage private, internal documents by building a local knowledge base, avoiding the need to share sensitive data with external models or incurring the high cost of full model fine-tuning.
The RAG Pipeline
The workflow consists of three primary stages:
- Indexing: This stage builds the knowledge base. Source documents are segmented into chunks, which are then encoded into vector representations (embeddings) using an embedding model. These vectors are stored in a vector database.
- Retrieval: Upon receiving a user query, the same embedding model encodes the query into a vector. This query vector is compared against the stored knowledge vectors to find the most semantically similar entries. A reranker model can further refine the results by scoring and selecting the top-K most relevant knowledge chunks.
- Generation: The LLM generates a response based on both the original user query and the retrieved context, leading to more informed and accurate outputs. A minimal end-to-end sketch of the pipeline follows this list.
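To make the three stages concrete, here is a minimal, illustrative sketch using sentence_transformers and plain NumPy similarity search. It is not Huixiangdou's implementation: the embedding model name is the BCE model used later in this guide, and the final generation call is a placeholder.

from sentence_transformers import SentenceTransformer
import numpy as np

# --- Indexing: encode document chunks into normalized embeddings ---
embedder = SentenceTransformer("maidalun1020/bce-embedding-base_v1")
chunks = [
    "Huixiangdou is a domain knowledge assistant built on InternLM.",
    "RAG retrieves external context before generating an answer.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

# --- Retrieval: embed the query and rank chunks by cosine similarity ---
query = "What is Huixiangdou?"
query_vec = embedder.encode([query], normalize_embeddings=True)
scores = chunk_vecs @ query_vec.T  # cosine similarity, since vectors are normalized
top_k = np.argsort(scores.squeeze())[::-1][:1]
context = "\n".join(chunks[i] for i in top_k)

# --- Generation: condition the LLM on the query plus retrieved context ---
prompt = f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
# response = llm.generate(prompt)  # placeholder: plug in InternLM2 or any chat model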
RAG Optimizations
- Indexing Optimization: Techniques include improving embedding quality and optimizing the vector index structure.
- Retrieval Optimization: Methods involve query expansion (sketched after this list), context window management, iterative/recursive retrieval, and adaptive retrieval strategies.
- Generation Optimization: Fine-tuning the underlying LLM for better integration of retrieved context.
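As one example of retrieval optimization, query expansion asks the LLM for paraphrases of the user query, retrieves candidates for each variant, and merges them before reranking. The sketch below is illustrative only; llm_complete and search are hypothetical callables, not Huixiangdou APIs.

# Query-expansion sketch: generate paraphrases, retrieve per variant, merge.
def expand_query(question, llm_complete, n_variants=3):
    prompt = (f"Rewrite the question below in {n_variants} different ways, one per line.\n"
              f"Question: {question}")
    variants = [v.strip() for v in llm_complete(prompt).splitlines() if v.strip()]
    return [question] + variants[:n_variants]

def retrieve_expanded(question, llm_complete, search, top_k=5):
    seen, merged = set(), []
    for q in expand_query(question, llm_complete):
        for chunk in search(q, top_k):   # vector search per query variant
            if chunk not in seen:        # de-duplicate across variants
                seen.add(chunk)
                merged.append(chunk)
    return merged                        # hand the merged pool to the reranker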
RAG vs. Supervised Fine-Tuning (SFT)
| Aspect | RAG | SFT |
|---|---|---|
| Characteristics | Non-parametric memory; knowledge is updated via the external database. Excels at knowledge-intensive tasks. Generates diverse content through retrieval. | Parametric memory; adapts to specific tasks via training on labeled data. Requires substantial labeled datasets. May suffer from catastrophic forgetting of prior knowledge. |
| Use Cases | Tasks requiring up-to-date information or involving sensitive, proprietary data. | Scenarios with abundant, high-quality labeled data and a need for peak model performance on a defined task. |
| Limitations | Performance heavily depends on the quality and coverage of the knowledge base and the base LLM's capabilities. | High cost due to need for data labeling and model retraining for each new task. |
Deploying the Huixiangdou Assistant on InternLM Studio
Environment Setup
Create a new conda environment for the project from the internlm-base image.
studio-conda -o internlm-base -t InternLM2_Huixiangdou
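Once creation completes, activate the environment so subsequent package installs land in it.

conda activate InternLM2_Huixiangdou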
Create symbolic links to the embedding and reranker models, as well as the LLM weights.
ln -s /root/share/new_models/maidalun1020/bce-embedding-base_v1 /root/models/bce-embedding-base_v1
ln -s /root/share/new_models/maidalun1020/bce-reranker-base_v1 /root/models/bce-reranker-base_v1
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b
Install the required Python packages.
pip install protobuf==4.25.3 accelerate==0.28.0 aiohttp==3.9.3 auto-gptq==0.7.1 bcembedding==0.1.3 beautifulsoup4==4.8.2 einops==0.7.0 faiss-gpu==1.7.2 langchain==0.1.14 loguru==0.7.2 lxml_html_clean==0.1.0 openai==1.16.1 openpyxl==3.1.2 pandas pydantic==2.6.4 pymupdf==1.24.1 python-docx==1.1.0 pytoml==0.1.21 readability-lxml==0.8.1 redis==5.0.3 requests==2.31.0 scikit-learn==1.4.1.post1 sentence_transformers==2.2.2 textract==1.6.5 tiktoken==0.6.0 transformers==4.39.3 transformers_stream_generator==0.0.5 unstructured==0.11.2
Clone the Huixiangdou repository.
cd /root
git clone https://github.com/internlm/huixiangdou
cd huixiangdou
git checkout 447c6f7e68a1657fce1c4f7c740ea1700bde0440
Configuration
Update the configuration file with the paths to the local models.
sed -i '6s#.*#embedding_model_path = "/root/models/bce-embedding-base_v1"#' /root/huixiangdou/config.ini
sed -i '7s#.*#reranker_model_path = "/root/models/bce-reranker-base_v1"#' /root/huixiangdou/config.ini
sed -i '29s#.*#local_llm_path = "/root/models/internlm2-chat-7b"#' /root/huixiangdou/config.ini
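These sed commands overwrite fixed line numbers, which only works if config.ini matches the pinned commit. A quick grep confirms the paths took effect:

grep -n "model_path\|local_llm_path" /root/huixiangdou/config.ini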
Prepare Knowledge Base and Query Lists
Clone the project's own repository to use as sample knowledge data.
cd /root/huixiangdou
mkdir repodir
git clone https://github.com/internlm/huixiangdou --depth=1 repodir/huixiangdou
Create a custom list of acceptable queries (good_questions.json).
import json
accepted_queries = [
"What is the purpose of Huixiangdou?",
"How can Huixiangdou be deployed in a WeChat group?",
"What are the supported large models?",
"How do I configure the config.ini file?",
"What are the application scenarios?"
]
with open('/root/huixiangdou/resource/good_questions.json', 'w') as f:
json.dump(accepted_queries, f, ensure_ascii=False, indent=2)
Create a test query file to verify the assistant's behavior.
import json

test_inputs = [
"This is just random text for rejection test.",
"What large models does Huixiangdou support?",
"How to integrate multimodal models with RAG?",
"Where is Shanghai AI Lab located?"
]
with open('/root/huixiangdou/test_queries.json', 'w') as f:
json.dump(test_inputs, f, ensure_ascii=False, indent=2)
Build the Vector Database
Run the feature store service to process the knowledge corpus and query lists into the vector database.
cd /root/huixiangdou
mkdir workdir
python3 -m huixiangdou.service.feature_store --sample ./test_queries.json
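If the build succeeds, the generated vector indexes are written to workdir; a quick listing is a simple sanity check (exact subdirectory names may differ across versions):

ls -lh /root/huixiangdou/workdir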
Run the Assistant
Modify the main script to test specific queries.
sed -i "74s/.*/ queries = [\"What is Huixiangdou?\", \"How to deploy to WeChat\", \"Shanghai AI Lab address\"]/" /root/huixiangdou/huixiangdou/main.py
Execute the assistant in standalone mode.
cd /root/huixiangdou/
python3 -m huixiangdou.main --standalone
The execution log typically reveals a multi-step process:
- Query classification
- Topic summarization for refined retrieval
- Context retrieval based on the topic
- Relevance scoring and re-ranking of retrieved chunks
- Final answer generation using both the query and the retrieved context
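The classification step is what lets the assistant decline chit-chat and out-of-domain inputs. The sketch below shows the general idea of such an accept/reject gate; all names (embed, vector_db, llm, REJECT_THRESHOLD) are illustrative and not the actual Huixiangdou internals.

REJECT_THRESHOLD = 0.5  # in practice tuned from accepted/rejected example questions

def answer_or_reject(query, embed, vector_db, llm):
    hits = vector_db.search(embed(query), top_k=3)  # [(chunk, similarity), ...]
    if not hits or max(score for _, score in hits) < REJECT_THRESHOLD:
        return None  # out-of-domain or chit-chat: do not answer
    context = "\n".join(chunk for chunk, _ in hits)
    return llm(f"Context:\n{context}\nQuestion: {query}")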
Building a Custom Knowledge Q&A Assistant
Local Deployment with Custom Documents
Clone the repository and install dependencies in a local environment.
import os
os.makedirs('/root/custom_assistant', exist_ok=True)  # create the working directory if it does not exist
os.chdir('/root/custom_assistant')
!git clone https://github.com/InternLM/HuixiangDou.git
os.chdir('HuixiangDou')
!git checkout 447c6f7e68a1657fce1c4f7c740ea1700bde0440
!pip install -r requirements.txt  # the repository ships a requirements.txt at this commit
Download the embedding and reranker model weights.
import os
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'  # faster downloads; requires the hf_transfer package, drop if unavailable
# Set proxies if needed, e.g., os.environ['http_proxy'] = 'http://proxy:port'
!huggingface-cli download --resume-download maidalun1020/bce-embedding-base_v1 --local-dir ./models/bce-embedding-base_v1
!huggingface-cli download --resume-download maidalun1020/bce-reranker-base_v1 --local-dir ./models/bce-reranker-base_v1
Update the config.ini file to point to the local model directories.
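If you are on the same pinned commit as above, the same line-based edits should apply; verify the line numbers against your copy of config.ini first, since they are commit-specific.

sed -i '6s#.*#embedding_model_path = "./models/bce-embedding-base_v1"#' config.ini
sed -i '7s#.*#reranker_model_path = "./models/bce-reranker-base_v1"#' config.ini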
Create a Custom Knowledge Base
Place your domain-specific documents (e.g., server usage manuals) in a dedicated directory, such as ./knowledge_docs.
Create a new good_questions.json file containing example queries related to your domain.
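For example, for a server usage manual the accepted-query list might look like the following; the questions here are hypothetical placeholders to replace with your own domain.

import json

# Hypothetical accepted queries for a server usage manual; replace with your domain.
domain_queries = [
    "How do I request GPU time on the cluster?",
    "What is the procedure for mounting shared storage?",
    "How do I reset my server account password?",
]
with open('./resource/good_questions.json', 'w') as f:
    json.dump(domain_queries, f, ensure_ascii=False, indent=2)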
Build the vector database using the custom knowledge source.
python -m huixiangdou.service.feature_store --repo_dir ./knowledge_docs \
--good_questions ./resource/good_questions.json \
--sample ./test_queries.json
Launch the Web Interface
Install Gradio and launch the web service.
!pip install gradio==4.25.0
!python -m huixiangdou.web.gradio_server
Access the provided local URL to interact with your custom RAG assistant.
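If the service runs on a remote development machine, forward the Gradio port (7860 by default; check the launch log for the actual port) to your local machine before opening the URL. A generic SSH tunnel, with placeholders for your own host and port, looks like:

ssh -CNg -L 7860:127.0.0.1:7860 <user>@<remote-host> -p <ssh-port>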