Debugging Unexpected LLM Invocations in Dify's Knowledge Retrieval Node
Context
A Dify instance was deployed locally on a GPU server, integrated with Xinference hosting the THUDM/glm-4-9b-chat model. A RAG workflow was built using Dify’s default task flow template and a local knowledge base.
During chat execution, the knowledge retrieval node failed with an error: Model 'gpt-3.5-turbo' not found. Notably:
- Knowledge retrieval should not require an LLM at all — it’s expected to be vector/keyword/BM25-based.
- No gpt-3.5-turbo model was registered or available in the environment.
This discrepancy prompted a source-level investigation to trace where and why this model was being resolved.
Root Cause Analysis
The retrieval logic resides in api/core/rag/datasource/retrieval_service.py, supporting four modes: keyword, vector, full-text (BM25), and hybrid. None of these perform LLM inference directly.
Keyword Retrieval
Implemented via Jieba in api/core/rag/datasource/keyword/jieba/jieba.py:
def search(self, query: str, **kwargs: Any) -> List[Document]:
    keyword_table = self._get_dataset_keyword_table()
    top_k = kwargs.get('top_k', 4)
    indices = self._retrieve_ids_by_query(keyword_table, query, top_k)

    documents = []
    for idx in indices:
        segment = db.session.query(DocumentSegment).filter(
            DocumentSegment.dataset_id == self.dataset.id,
            DocumentSegment.index_node_id == idx
        ).first()
        if segment:
            documents.append(Document(
                page_content=segment.content,
                metadata={
                    "doc_id": idx,
                    "doc_hash": segment.index_node_hash,
                    "document_id": segment.document_id,
                    "dataset_id": segment.dataset_id
                }
            ))
    return documents
Purely deterministic keyword matching — no LLM involvement.
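For reference, _retrieve_ids_by_query amounts to a keyword-table intersection: Jieba extracts keywords from the query, and segments are ranked by how many of those keywords appear in their precomputed keyword entries. A minimal, self-contained sketch of that idea (the function name, table layout, and ranking here are illustrative, not the exact Dify internals):

from collections import Counter

# Illustrative stand-in for the keyword-table lookup: keyword_table maps a
# keyword to the set of segment index_node_ids that contain it.
def retrieve_ids_by_query(keyword_table: dict, query_keywords: list, top_k: int) -> list:
    hits = Counter()
    for kw in query_keywords:
        for node_id in keyword_table.get(kw, set()):
            hits[node_id] += 1  # one point per matched keyword
    return [node_id for node_id, _ in hits.most_common(top_k)]

# Toy example: two query keywords, three indexed segments
table = {"retrieval": {"seg-1", "seg-2"}, "milvus": {"seg-2", "seg-3"}}
print(retrieve_ids_by_query(table, ["retrieval", "milvus"], top_k=2))  # ['seg-2', ...]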
Vector Retrieval
Delegates to vector DB clients (e.g., Milvus):
def search_by_vector(self, query_vector: List[float], **kwargs: Any) -> List[Document]:
    results = self._client.search(
        collection_name=self._collection_name,
        data=[query_vector],
        limit=kwargs.get('top_k', 4),
        output_fields=[Field.CONTENT_KEY.value, Field.METADATA_KEY.value]
    )

    docs = []
    for r in results[0]:
        meta = r['entity'].get(Field.METADATA_KEY.value, {})
        meta['score'] = r['distance']
        if r['distance'] > kwargs.get('score_threshold', 0.0):
            docs.append(Document(
                page_content=r['entity'].get(Field.CONTENT_KEY.value, ""),
                metadata=meta
            ))
    return docs
No language model used — only embedding similarity scoring.
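The only model in this path is the embedding model that turns the query string into query_vector before search_by_vector is invoked. A minimal sketch of that calling flow, where embedding_model and vector_store are illustrative stand-ins for Dify's embedding-model instance and vector store wrapper:

# Illustrative calling flow for the vector path; names are stand-ins, not
# Dify's exact objects. The embedding model produces the query vector and
# no LLM generates anything.
def retrieve_by_vector(vector_store, embedding_model, query: str, top_k: int = 4):
    query_vector = embedding_model.embed_query(query)  # embedding call only
    return vector_store.search_by_vector(
        query_vector,
        top_k=top_k,
        score_threshold=0.0,  # keep all hits above the default threshold
    )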
Full-Text (BM25) Retrieval
For Qdrant, this is implemented as a filtered scroll with a MatchText condition:
def search_by_full_text(self, query: str, **kwargs: Any) -> List[Document]:
    from qdrant_client.http import models

    filter_expr = models.Filter(must=[
        models.FieldCondition(key="group_id", match=models.MatchValue(value=self._group_id)),
        models.FieldCondition(key="page_content", match=models.MatchText(text=query))
    ])
    response = self._client.scroll(
        collection_name=self._collection_name,
        scroll_filter=filter_expr,
        limit=kwargs.get('top_k', 2),
        with_payload=True,
        with_vectors=True
    )
    return [
        self._document_from_scored_point(r, Field.CONTENT_KEY.value, Field.METADATA_KEY.value)
        for r in response[0] if r
    ]
Again, purely database-native — no LLM.
Multi-Dataset Routing Logic
The real LLM dependency lies not in retrieval per se, but in routing across multiple datasets. In api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py, two strategies are supported:
- Single-dataset routing (N-to-1): Uses an LLM to classify user intent and select the single most relevant dataset based on its description (a sketch of this routing step follows the list).
- Multi-dataset retrieval (multi-retrieval): Queries all configured datasets in parallel and re-ranks results — zero LLM involvement.
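To make the LLM dependency concrete, the N-to-1 router essentially shows the model the dataset descriptions and asks it to pick one. The following is a hedged sketch of that step; the prompt wording and helper name are illustrative, not Dify's actual code:

# Illustrative N-to-1 routing prompt: the LLM resolved by _fetch_model_config
# (below) is asked to choose the single most relevant dataset by description.
def build_router_prompt(query: str, datasets: dict) -> str:
    lines = [f"{i}. {name}: {desc}" for i, (name, desc) in enumerate(datasets.items(), start=1)]
    return (
        "Pick the knowledge base most relevant to the user question. "
        "Answer with its number only.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {query}"
    )

datasets = {
    "gpu-ops": "GPU server operations and driver troubleshooting notes",
    "hr-policies": "Internal HR policies and leave procedures",
}
prompt = build_router_prompt("How do I recover from a CUDA OOM?", datasets)
# The configured LLM answers with a dataset number; only that dataset is then queried.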
The issue arises exclusively in the N-to-1 path. Examining the model resolution logic:
def _fetch_model_config(self, node_data: KnowledgeRetrievalNodeData):
    model_name = node_data.single_retrieval_config.model.name
    provider_name = node_data.single_retrieval_config.model.provider

    model_manager = ModelManager()
    return model_manager.get_model_instance(
        tenant_id=self.tenant_id,
        model_type=ModelType.LLM,
        provider=provider_name,
        model=model_name
    )
This confirms that the LLM is sourced strictly from the node’s configuration — specifically node_data.single_retrieval_config.model.
Inspecting the frontend-provided node config revealed:
"single_retrieval_config": {
"model": {
"name": "gpt-3.5-turbo",
"provider": "openai"
}
}
That value wasn’t derived dynamically — it was baked into the default workflow template.
Further inspection confirmed that the UI’s N-to-1 configuration panel explicitly defaults to OpenAI’s gpt-3.5-turbo, even when no OpenAI provider is configured. The /api/workspaces/current/default-model endpoint correctly returns glm-4-9b-chat, but the template ignores it and hardcodes gpt-3.5-turbo for the router.
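As a sanity check, the workspace default can be queried directly. A minimal sketch against the endpoint named above; the host, auth token, and the model_type query parameter are assumptions about this local deployment, not confirmed Dify API details:

import requests

# Hedged check of the workspace default LLM for a local Dify deployment.
resp = requests.get(
    "http://localhost/api/workspaces/current/default-model",
    headers={"Authorization": "Bearer <console-session-token>"},  # placeholder token
    params={"model_type": "llm"},  # assumed query parameter
)
print(resp.json())  # expected to name glm-4-9b-chat for this deployment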
Resolution Path
The fix is manual: edit the knowledge retrieval node and either switch to multi-retrieval mode or explicitly assign glm-4-9b-chat under the N-to-1 model selector. However, the root friction stems from template design — the out-of-the-box workflow assumes cloud-hosted OpenAI models, making local deployments fragile without explicit configuration overrides.
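After the manual fix, the node configuration should point the router at the local model instead. A sketch of the corrected snippet; the exact provider string depends on how Xinference is registered in the workspace, so "xinference" below is an assumption:

"single_retrieval_config": {
    "model": {
        "name": "glm-4-9b-chat",
        "provider": "xinference"
    }
}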