Zeus’s knowledge base retrieval uses a dual-path recall strategy combining vector semantic search + BM25 keyword matching, with results merged via RRF (Reciprocal Rank Fusion) to balance semantic understanding and exact matching.

Dual-path Recall + RRF Fusion

Retrieval Flow

| Stage | Description |
| --- | --- |
| 1. Vector Retrieval | pgvector cosine similarity search, filtered by `user_id` + `knowledge_base_id` metadata; retrieves `top_k × 3` candidates |
| 2. BM25 Retrieval | jieba Chinese tokenization + BM25Okapi keyword matching; also retrieves `top_k × 3` candidates |
| 3. RRF Fusion | Reciprocal Rank Fusion merges both result sets. Formula: `score = vector_weight / (rank_vec + 60) + keyword_weight / (rank_bm25 + 60)`, where each rank is the document's position in that path's result list |
| 4. Sort & Truncate | Results sorted by fusion score in descending order, returning the top `top_k` |
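The fusion step above can be sketched in a few lines of plain Python. The function and variable names here are illustrative, not taken from the codebase; the constant 60 is the conventional RRF smoothing constant:

```python
def rrf_fuse(vector_ids, keyword_ids, vector_weight=0.6,
             keyword_weight=0.4, k=60, top_k=5):
    """Weighted Reciprocal Rank Fusion over two ranked ID lists (sketch)."""
    scores = {}
    # Each path contributes weight / (rank + k) for every document it returned
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (rank + k)
    for rank, doc_id in enumerate(keyword_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + keyword_weight / (rank + k)
    # Sort by fused score, descending, and truncate to top_k
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A document that appears high in both lists accumulates score from both terms, which is how the fusion balances semantic and exact matches.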

Parameter Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| `top_k` | 5 | Maximum number of results to return |
| `vector_weight` | 0.6 | Weight of vector search in RRF |
| `keyword_weight` | 0.4 | Weight of keyword search in RRF |
| `use_rerank` | true | When enabled, retrieves 3× candidates for fusion ranking; when disabled, returns vector results directly |
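The effect of the `use_rerank` switch on the candidate budget can be pictured with a small hypothetical helper (not the actual implementation):

```python
def plan_retrieval(top_k=5, use_rerank=True):
    """Sketch: which paths run and how many candidates each fetches."""
    if not use_rerank:
        # Rerank disabled: vector results are returned directly
        return {"paths": ["vector"], "fetch": top_k}
    # Rerank enabled: both paths fetch 3x candidates for RRF fusion
    return {"paths": ["vector", "bm25"], "fetch": top_k * 3}
```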

Fallback Strategy

When rank-bm25 or jieba is not installed, the system automatically falls back to vector-only search mode, using only pgvector for semantic retrieval, and logs: "⚠️ BM25 dependencies unavailable, using vector-only search mode".
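A dependency guard like this is typically an import-time check; the sketch below shows the pattern (module and flag names are assumptions, not the project's actual identifiers):

```python
import logging

logger = logging.getLogger(__name__)

try:
    import jieba                      # Chinese tokenizer (third-party)
    from rank_bm25 import BM25Okapi   # BM25 implementation (third-party)
    BM25_AVAILABLE = True
except ImportError:
    # Missing optional dependencies: degrade to vector-only retrieval
    BM25_AVAILABLE = False
    logger.warning("⚠️ BM25 dependencies unavailable, using vector-only search mode")
```

Retrieval code can then branch on the flag instead of failing at query time.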

BM25Store 3-tier Cache

The BM25 index uses a 3-tier cache architecture, supporting lazy loading and multi-instance deployments:

Cache Tiers

| Tier | Storage | Characteristics |
| --- | --- | --- |
| L1 | Process memory | Fastest; dict structure; lost on process restart |
| L2 | Redis | Shared across instances; TTL auto-expiry; serialized storage |
| L3 | PostgreSQL (`bm25_indexes` table) | Persistent; zlib compression + pickle serialization |
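The read path checks the tiers in order and promotes hits upward. Here is a minimal stdlib sketch of that behavior, with plain dicts standing in for the Redis and PostgreSQL clients (class and method names are illustrative):

```python
import pickle
import zlib

class TieredBM25Cache:
    """Sketch of the L1 -> L2 -> L3 read path with promotion on hit."""

    def __init__(self, redis, postgres):
        self._l1 = {}        # L1: process-local, fastest
        self._redis = redis  # L2: serialized, shared across instances (dict stand-in)
        self._pg = postgres  # L3: zlib-compressed + pickled, persistent (dict stand-in)

    def get(self, kb_id):
        if kb_id in self._l1:                       # L1 hit
            return self._l1[kb_id]
        blob = self._redis.get(kb_id)               # L2 hit: deserialize, promote to L1
        if blob is not None:
            index = pickle.loads(blob)
            self._l1[kb_id] = index
            return index
        blob = self._pg.get(kb_id)                  # L3 hit: decompress, promote to L1 and L2
        if blob is not None:
            index = pickle.loads(zlib.decompress(blob))
            self._redis[kb_id] = pickle.dumps(index)
            self._l1[kb_id] = index
            return index
        return None                                 # full miss: caller rebuilds lazily

    def put(self, kb_id, index):
        self._l1[kb_id] = index
        self._redis[kb_id] = pickle.dumps(index)
        self._pg[kb_id] = zlib.compress(pickle.dumps(index))

    def invalidate(self, kb_id):
        # Clear the knowledge base's index from all 3 tiers
        self._l1.pop(kb_id, None)
        self._redis.pop(kb_id, None)
        self._pg.pop(kb_id, None)
```

In a real deployment the L2 writes would also set a TTL, and L3 reads/writes would go through the `bm25_indexes` table shown below.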

Index Lifecycle

| Event | Behavior |
| --- | --- |
| First retrieval | Lazy loading: fetches all chunks from the knowledge base, builds the BM25 index, writes to all 3 cache tiers |
| Document added/deleted | `invalidate(kb_id)` clears the index for that knowledge base from all 3 cache tiers |
| Next retrieval | Automatically rebuilds the index |
| Redis TTL expiry | Recovered from PostgreSQL, or rebuilt from scratch |
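The lazy-load-then-rebuild cycle reduces to a simple "get or build" shape. A generic sketch, assuming a dict-like cache and caller-supplied fetch/build functions (none of these names come from the codebase):

```python
def get_or_build(cache, kb_id, fetch_chunks, build_index):
    """Return the cached BM25 index for kb_id, building it lazily on a miss."""
    index = cache.get(kb_id)
    if index is None:
        # First retrieval, or first retrieval after invalidate(kb_id):
        # fetch all chunks and rebuild the index, then cache it
        index = build_index(fetch_chunks(kb_id))
        cache[kb_id] = index
    return index
```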

Index Data Structure

```python
from dataclasses import dataclass
from typing import List

from rank_bm25 import BM25Okapi  # third-party BM25 implementation

@dataclass
class BM25IndexData:
    bm25: BM25Okapi           # BM25 index instance
    corpus: List[List[str]]   # Tokenized corpus (jieba-segmented)
    documents: List[str]      # Original document content
    doc_ids: List[str]        # Document ID list (for result mapping)
    created_at: float         # Creation timestamp
    doc_count: int            # Document count
```

PostgreSQL Persistence Table

```sql
CREATE TABLE IF NOT EXISTS bm25_indexes (
    kb_id VARCHAR(255) PRIMARY KEY,
    index_data BYTEA NOT NULL,        -- zlib compressed + pickle serialized
    doc_count INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

Multi-Knowledge-Base Retrieval

When a user associates multiple knowledge bases in a conversation, the system performs dual-path retrieval independently on each knowledge base, then merges all results sorted by score. Each knowledge base maintains its own independent BM25 index, so the indexes never interfere with one another.
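The merge step can be sketched as follows, with `search_one` standing in for the per-knowledge-base dual-path retrieval (function name and hit shape are assumptions):

```python
def search_multiple(query, kb_ids, search_one, top_k=5):
    """Run retrieval independently per knowledge base, then merge by score."""
    merged = []
    for kb_id in kb_ids:
        # Each KB is searched against its own BM25 index and vector store
        merged.extend(search_one(query, kb_id))
    # Global sort across all knowledge bases, highest fused score first
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:top_k]
```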

RAGService

RAGService is the service-layer wrapper for retrieval, providing:

| Method | Description |
| --- | --- |
| `search(query, knowledge_base_id, user_id, top_k, score_threshold)` | Single-knowledge-base retrieval |
| `search_multiple_knowledge_bases(query, knowledge_base_ids, user_id, top_k)` | Multi-knowledge-base retrieval + merged ranking |
| `format_for_context(chunks, max_chars)` | Format as LLM context (with source citations) |
| `format_for_tool_output(chunks, max_chunks)` | Format as Agent tool output |
| `get_retriever(user_id, knowledge_base_id, k)` | Get a LangChain `VectorStoreRetriever` |
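As an illustration of what a context formatter like `format_for_context` might do, here is a minimal sketch with a character budget and numbered source citations. The chunk field names (`source`, `content`) are assumptions, not the service's actual schema:

```python
def format_for_context(chunks, max_chars=2000):
    """Join retrieved chunks into an LLM context string with source citations."""
    parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] (source: {chunk['source']})\n{chunk['content']}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the character budget
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)
```

Truncating by whole chunks (rather than mid-chunk) keeps each citation paired with its full text.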