Debugging Unexpected LLM Invocations in Dify's Knowledge Retrieval Node

Context

A Dify instance was deployed locally on a GPU server, integrated with Xinference hosting the THUDM/glm-4-9b-chat model. A RAG workflow was built using Dify’s default task flow template and a local knowledge base.

During chat execution, the knowledge retrieval node failed with an error: Model 'gpt-3.5-turbo' not found. Notably:

  1. Knowledge retrieval should not require an LLM at all — it’s expected to be vector/keyword/BM25-based.
  2. No gpt-3.5-turbo model was registered or available in the environment.

This discrepancy prompted a source-level investigation to trace where and why this model was being resolved.

Root Cause Analysis

The retrieval logic resides in api/core/rag/datasource/retrieval_service.py, supporting four modes: keyword, vector, full-text (BM25), and hybrid. None of these perform LLM inference directly.
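
To make the mode split concrete, here is a condensed sketch of the dispatch (illustrative only; the helper functions are hypothetical stand-ins for the per-mode searchers, and the real RetrievalService is considerably more involved):

from typing import List

# Hypothetical stand-ins for the per-mode searchers described below;
# each returns matched documents without calling a language model.
def keyword_search(query: str, top_k: int) -> List[str]:
    return []  # stand-in: Jieba keyword-table lookup

def embedding_search(query: str, top_k: int) -> List[str]:
    return []  # stand-in: vector DB similarity search

def full_text_search(query: str, top_k: int) -> List[str]:
    return []  # stand-in: BM25 / MatchText search

def rerank(docs: List[str], query: str, top_k: int) -> List[str]:
    return docs[:top_k]  # stand-in: score-based re-ranking

def retrieve(retrieval_method: str, query: str, top_k: int = 4) -> List[str]:
    # None of the four branches performs LLM inference.
    if retrieval_method == "keyword_search":
        return keyword_search(query, top_k)
    if retrieval_method == "semantic_search":
        return embedding_search(query, top_k)
    if retrieval_method == "full_text_search":
        return full_text_search(query, top_k)
    if retrieval_method == "hybrid_search":
        merged = embedding_search(query, top_k) + full_text_search(query, top_k)
        return rerank(merged, query, top_k)
    raise ValueError(f"unknown retrieval method: {retrieval_method}")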

Keyword Retrieval

Implemented via Jieba in api/core/rag/datasource/keyword/jieba/jieba.py:

def search(self, query: str, **kwargs: Any) -> List[Document]:
    keyword_table = self._get_dataset_keyword_table()
    top_k = kwargs.get('top_k', 4)
    indices = self._retrieve_ids_by_query(keyword_table, query, top_k)

    documents = []
    for idx in indices:
        segment = db.session.query(DocumentSegment).filter(
            DocumentSegment.dataset_id == self.dataset.id,
            DocumentSegment.index_node_id == idx
        ).first()
        if segment:
            documents.append(Document(
                page_content=segment.content,
                metadata={
                    "doc_id": idx,
                    "doc_hash": segment.index_node_hash,
                    "document_id": segment.document_id,
                    "dataset_id": segment.dataset_id
                }
            ))
    return documents

Purely deterministic keyword matching — no LLM involvement.
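
The _retrieve_ids_by_query step is equally mechanical: extract keywords from the query, then count overlaps against the inverted keyword table. A minimal sketch of that idea (a hypothetical reimplementation, assuming a {keyword: set of node IDs} table like the one the Jieba handler builds):

from collections import Counter

import jieba.analyse  # the same tokenizer the keyword index is built with

def retrieve_ids_by_query(keyword_table: dict, query: str, top_k: int) -> list:
    # Score each indexed segment by how many of the query's extracted
    # keywords point at it; plain counting, no model inference.
    query_keywords = jieba.analyse.extract_tags(query, topK=10)
    hits = Counter()
    for kw in query_keywords:
        for node_id in keyword_table.get(kw, set()):
            hits[node_id] += 1
    return [node_id for node_id, _ in hits.most_common(top_k)]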

Vector Retrieval

Delegates to vector DB clients (e.g., Milvus):

def search_by_vector(self, query_vector: List[float], **kwargs: Any) -> List[Document]:
    results = self._client.search(
        collection_name=self._collection_name,
        data=[query_vector],
        limit=kwargs.get('top_k', 4),
        output_fields=[Field.CONTENT_KEY.value, Field.METADATA_KEY.value]
    )

    docs = []
    for r in results[0]:
        meta = r['entity'].get(Field.METADATA_KEY.value, {})
        meta['score'] = r['distance']
        if r['distance'] > kwargs.get('score_threshold', 0.0):
            docs.append(Document(
                page_content=r['entity'].get(Field.CONTENT_KEY.value, ""),
                metadata=meta
            ))
    return docs

No language model used — only embedding similarity scoring.
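
The only model in this path is the embedding model that vectorizes the query before search_by_vector is called. A hedged sketch of the calling pattern (Embedder.embed_query is an assumed interface for the dataset's configured embedder, not a confirmed Dify API):

from typing import Any, List, Protocol

class Embedder(Protocol):
    # Assumed interface for the dataset's embedding model (e.g. one
    # served by Xinference); an embedder, not an LLM.
    def embed_query(self, text: str) -> List[float]: ...

def semantic_search(vector_store: Any, embedder: Embedder,
                    query: str, top_k: int = 4) -> List[Any]:
    query_vector = embedder.embed_query(query)  # the single model call
    # Everything past this point is nearest-neighbour math inside the
    # vector database; no text generation happens anywhere.
    return vector_store.search_by_vector(query_vector, top_k=top_k,
                                         score_threshold=0.0)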

Full-Text (BM25) Retrieval

For Qdrant, full-text retrieval is implemented as a filtered scroll with MatchText:

def search_by_full_text(self, query: str, **kwargs: Any) -> List[Document]:
    from qdrant_client.http import models
    
    filter_expr = models.Filter(must=[
        models.FieldCondition(key="group_id", match=models.MatchValue(value=self._group_id)),
        models.FieldCondition(key="page_content", match=models.MatchText(text=query))
    ])

    response = self._client.scroll(
        collection_name=self._collection_name,
        scroll_filter=filter_expr,
        limit=kwargs.get('top_k', 2),
        with_payload=True,
        with_vectors=True
    )

    return [
        self._document_from_scored_point(r, Field.CONTENT_KEY.value, Field.METADATA_KEY.value)
        for r in response[0] if r
    ]

Again, purely database-native — no LLM.
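
One operational footnote: MatchText only works when the payload field carries a full-text index, which Dify is expected to set up at collection-creation time. It can also be added by hand with the standard qdrant_client API (the endpoint, collection name, and tokenizer choice below are illustrative):

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(url="http://localhost:6333")  # illustrative endpoint

# MatchText filters require a full-text index on the payload field
# being matched.
client.create_payload_index(
    collection_name="my_collection",  # illustrative name
    field_name="page_content",
    field_schema=models.TextIndexParams(
        type=models.TextIndexType.TEXT,
        tokenizer=models.TokenizerType.MULTILINGUAL,  # handles Chinese text
        min_token_len=2,
        max_token_len=20,
        lowercase=True,
    ),
)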

Multi-Dataset Routing Logic

The real LLM dependency lies not in retrieval per se, but in routing across multiple datasets. In api/core/workflow/nodes/knowledge_retrieval/knowledge_retrieval_node.py, two strategies are supported:

  • Single-dataset routing (N-to-1): Uses an LLM to classify user intent and select the most relevant dataset based on its description.
  • Multi-dataset retrieval (multi-retrieval): Queries all configured datasets in parallel and re-ranks results — zero LLM involvement.

The issue arises exclusively in the N-to-1 path. Examining the model resolution logic:

def _fetch_model_config(self, node_data: KnowledgeRetrievalNodeData):
    model_name = node_data.single_retrieval_config.model.name
    provider_name = node_data.single_retrieval_config.model.provider

    model_manager = ModelManager()
    return model_manager.get_model_instance(
        tenant_id=self.tenant_id,
        model_type=ModelType.LLM,
        provider=provider_name,
        model=model_name
    )

This confirms that the LLM is sourced strictly from the node’s configuration — specifically node_data.single_retrieval_config.model.
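
Given that, the failure mode is mechanical: ModelManager looks up the requested (provider, model) pair among the workspace's configured providers, finds no OpenAI entry, and raises the error surfaced in the chat UI. A toy reproduction of the lookup (entirely illustrative; the real ModelManager is far more involved):

# Toy registry mirroring the failure: the node requests a
# (provider, model) pair that was never configured in the workspace.
REGISTERED_MODELS = {("xinference", "glm-4-9b-chat")}

def get_model_instance(provider: str, model: str) -> str:
    if (provider, model) not in REGISTERED_MODELS:
        raise ValueError(f"Model '{model}' not found")
    return f"{provider}/{model}"

get_model_instance("xinference", "glm-4-9b-chat")  # ok
get_model_instance("openai", "gpt-3.5-turbo")      # raises: Model 'gpt-3.5-turbo' not found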

Inspecting the frontend-provided node config revealed:

"single_retrieval_config": {
  "model": {
    "name": "gpt-3.5-turbo",
    "provider": "openai"
  }
}

That value wasn’t derived dynamically; it was baked into the default workflow template.

Further inspection confirmed that the UI’s N-to-1 configuration panel explicitly defaults to OpenAI’s gpt-3.5-turbo, even when no OpenAI provider is configured. The /api/workspaces/current/default-model endpoint correctly returns glm-4-9b-chat, but the template ignores it and hardcodes gpt-3.5-turbo for the router.
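
That default-model claim is easy to check against a local instance. A hedged sketch (the endpoint path is the one observed above; the base URL, model_type query parameter, and console bearer token are assumptions about a typical local deployment):

import requests

# Ask the workspace for its default LLM, as the frontend does.
resp = requests.get(
    "http://localhost/api/workspaces/current/default-model",
    params={"model_type": "llm"},  # assumed query parameter
    headers={"Authorization": "Bearer <console-access-token>"},  # placeholder
)
resp.raise_for_status()
print(resp.json())  # expected to report glm-4-9b-chat, not gpt-3.5-turbo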

Resolution Path

The fix is manual: edit the knowledge retrieval node and either switch to multi-retrieval mode or explicitly assign glm-4-9b-chat under the N-to-1 model selector. The root friction, however, stems from template design: the out-of-the-box workflow assumes cloud-hosted OpenAI models, leaving local deployments fragile without explicit configuration overrides.
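
For reference, the repaired node config after re-selecting the local model would look like this (the provider string xinference is an assumption; it must match however the Xinference provider is registered in the workspace):

"single_retrieval_config": {
  "model": {
    "name": "glm-4-9b-chat",
    "provider": "xinference"
  }
}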
