Dual-path Recall + RRF Fusion
Retrieval Flow
| Stage | Description |
|---|---|
| 1. Vector Retrieval | pgvector cosine similarity search, filtered by user_id + knowledge_base_id metadata, retrieves top_k × 3 candidates |
| 2. BM25 Retrieval | jieba Chinese tokenization + BM25Okapi keyword matching, also retrieves top_k × 3 candidates |
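| 3. RRF Fusion | Reciprocal Rank Fusion merges both result sets. Formula: score = vector_weight / (rank_vector + 60) + keyword_weight / (rank_keyword + 60), where rank_vector and rank_keyword are the candidate's ranks in the vector and BM25 result lists |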
| 4. Sort & Truncate | Results sorted by fusion score in descending order, returning Top K |
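The fusion and truncation stages can be illustrated with a minimal sketch. It is not the project's actual code: the function name `rrf_fuse` is hypothetical, ranks are assumed to be 1-based and taken separately from each list, and a chunk missing from one list is assumed to receive no contribution from that list.

```python
from collections import defaultdict

RRF_K = 60  # smoothing constant taken from the RRF formula in the table


def rrf_fuse(vector_ids, keyword_ids, top_k=5, vector_weight=0.6, keyword_weight=0.4):
    """Fuse two ranked lists of chunk IDs with weighted Reciprocal Rank Fusion.

    Assumptions (not confirmed by the source): ranks start at 1, and a chunk
    that is absent from one list contributes nothing for that list.
    """
    scores = defaultdict(float)
    for rank, chunk_id in enumerate(vector_ids, start=1):
        scores[chunk_id] += vector_weight / (rank + RRF_K)
    for rank, chunk_id in enumerate(keyword_ids, start=1):
        scores[chunk_id] += keyword_weight / (rank + RRF_K)
    # Stage 4: sort by fused score in descending order, then keep the top_k.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# "b" ranks well in both lists, so it overtakes "a" after fusion.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"], top_k=3)
for chunk_id, score in fused:
    print(chunk_id, round(score, 6))
```

Because each term depends only on rank, the raw cosine similarities and BM25 scores never need to be normalized against each other.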
Parameter Configuration
| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Maximum number of results to return |
| vector_weight | 0.6 | Weight of vector search in RRF |
| keyword_weight | 0.4 | Weight of keyword search in RRF |
| use_rerank | true | When enabled, retrieves 3x candidates for fusion ranking; when disabled, returns vector results directly |
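As a rough illustration of how these parameters interact, the sketch below groups them in a dataclass. The class name `RetrievalConfig` and the method `candidate_count` are hypothetical; the assumption that the vector-only path fetches exactly top_k candidates when use_rerank is disabled is an interpretation of the table, not something the source states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:  # hypothetical container, not the project's own class
    top_k: int = 5
    vector_weight: float = 0.6
    keyword_weight: float = 0.4
    use_rerank: bool = True

    def candidate_count(self) -> int:
        """Number of candidates to request from each retrieval path.

        Assumption: with use_rerank disabled, only top_k vector results
        are fetched because no fusion step follows.
        """
        return self.top_k * 3 if self.use_rerank else self.top_k


config = RetrievalConfig()
print(config.candidate_count())  # 15 with the defaults
```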
Fallback Strategy
When rank-bm25 or jieba is not installed, the system automatically falls back to vector-only search mode, using only pgvector for semantic retrieval. A log message is emitted: ⚠️ BM25 dependencies unavailable, using vector-only search mode.
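One common way to implement this kind of fallback is an import guard, sketched below. The flag `BM25_AVAILABLE`, the helper `choose_search_mode`, and the choice of the warning log level are assumptions made for illustration; only the package names and the log text come from the description above.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Both packages are optional; either one missing disables the BM25 path.
    import jieba  # noqa: F401
    from rank_bm25 import BM25Okapi  # noqa: F401

    BM25_AVAILABLE = True
except ImportError:
    BM25_AVAILABLE = False
    logger.warning("⚠️ BM25 dependencies unavailable, using vector-only search mode")


def choose_search_mode() -> str:
    """Report which retrieval mode this process can use (hypothetical helper)."""
    return "hybrid" if BM25_AVAILABLE else "vector_only"


print(choose_search_mode())
```

Checking once at import time keeps the per-query path free of repeated import attempts.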
BM25Store 3-tier Cache
The BM25 index uses a 3-tier cache architecture, supporting lazy loading and multi-instance deployments.
Cache Tiers
| Tier | Storage | Characteristics |
|---|---|---|
| L1 | Process Memory | Fastest, Dict structure, lost on process restart |
| L2 | Redis | Shared across instances, TTL auto-expiry, serialized storage |
| L3 | PostgreSQL (bm25_indexes table) | Persistent, zlib compression + pickle serialization |
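The L3 storage format (pickle serialization followed by zlib compression) can be sketched as a round trip. The function names and the sample payload are hypothetical; the real index object and the columns of the bm25_indexes table are not shown in this document.

```python
import pickle
import zlib


def serialize_index(index) -> bytes:
    """Pickle the index object, then compress the resulting bytes."""
    return zlib.compress(pickle.dumps(index, protocol=pickle.HIGHEST_PROTOCOL))


def deserialize_index(blob: bytes):
    """Reverse of serialize_index.

    Only unpickle blobs the application wrote itself: pickle can execute
    arbitrary code when fed untrusted input.
    """
    return pickle.loads(zlib.decompress(blob))


# Hypothetical payload standing in for a real BM25 index.
sample = {"kb_id": "kb-1", "corpus_tokens": [["hybrid", "search"], ["bm25", "index"]]}
blob = serialize_index(sample)
restored = deserialize_index(blob)
print(type(blob).__name__, restored == sample)
```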
Index Lifecycle
| Event | Behavior |
|---|---|
| First retrieval | Lazy loading — fetches all chunks from the knowledge base, builds BM25 index, writes to all 3 cache tiers |
| Document added/deleted | invalidate(kb_id) — clears the index for that knowledge base from all 3 cache tiers |
| Next retrieval | Automatically rebuilds the index |
| Redis TTL expiry | Recovered from PostgreSQL, or rebuilt from scratch |
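The lifecycle in the table above can be sketched with plain dictionaries standing in for Redis and PostgreSQL. This is an illustrative model only: the class `TieredIndexCache` is hypothetical, serialization and TTL handling are omitted, and the real BM25Store may differ in detail.

```python
class TieredIndexCache:  # hypothetical stand-in for BM25Store
    def __init__(self, build_index):
        self._build_index = build_index  # callable: kb_id -> index object
        self.memory = {}    # L1: process memory
        self.redis = {}     # L2: stand-in for Redis
        self.postgres = {}  # L3: stand-in for the bm25_indexes table

    def get(self, kb_id):
        """Look up L1, then L2, then L3; build lazily if every tier misses."""
        for tier in (self.memory, self.redis, self.postgres):
            if kb_id in tier:
                index = tier[kb_id]
                break
        else:
            index = self._build_index(kb_id)
        # Refill every tier so the next lookup hits L1.
        for tier in (self.memory, self.redis, self.postgres):
            tier[kb_id] = index
        return index

    def invalidate(self, kb_id):
        """Drop the index for one knowledge base from all three tiers."""
        for tier in (self.memory, self.redis, self.postgres):
            tier.pop(kb_id, None)


builds = []


def build(kb_id):
    builds.append(kb_id)  # record each rebuild so the behaviour is observable
    return {"kb_id": kb_id}


cache = TieredIndexCache(build)
cache.get("kb-1")          # first retrieval: builds and fills all tiers
cache.get("kb-1")          # served from L1, no rebuild
cache.memory.clear()       # simulate a process restart
cache.redis.clear()        # simulate Redis TTL expiry
cache.get("kb-1")          # recovered from L3, no rebuild
cache.invalidate("kb-1")   # a document was added or deleted
cache.get("kb-1")          # next retrieval rebuilds the index
print(len(builds))
```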
Index Data Structure
PostgreSQL Persistence Table
Multi-Knowledge-Base Retrieval
When a user associates multiple knowledge bases in a conversation, the system performs dual-path retrieval independently on each knowledge base, then merges all results sorted by score. Each knowledge base has its own independent BM25 index, with no interference between them.
RAGService
RAGService is the service-layer wrapper for retrieval, providing:
| Method | Description |
|---|---|
| search(query, knowledge_base_id, user_id, top_k, score_threshold) | Single knowledge base retrieval |
| search_multiple_knowledge_bases(query, knowledge_base_ids, user_id, top_k) | Multi-knowledge-base retrieval + merged ranking |
| format_for_context(chunks, max_chars) | Format as LLM context (with source citations) |
| format_for_tool_output(chunks, max_chunks) | Format as Agent tool output |
| get_retriever(user_id, knowledge_base_id, k) | Get a LangChain VectorStoreRetriever |
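The merged ranking performed by search_multiple_knowledge_bases might look roughly like the sketch below. The function `search_many`, the `search_one` callback, and the chunk dictionary shape are all hypothetical, and truncating the merged list to top_k is an assumption based on the method signature rather than documented behaviour.

```python
def search_many(query, knowledge_base_ids, search_one, top_k=5):
    """Query each knowledge base independently, then merge by score.

    search_one(query, kb_id) is a hypothetical callback returning a list of
    dicts that each carry a numeric "score" key.
    """
    merged = []
    for kb_id in knowledge_base_ids:
        for chunk in search_one(query, kb_id):
            # Tag each chunk with its origin so citations remain possible.
            merged.append({**chunk, "knowledge_base_id": kb_id})
    merged.sort(key=lambda chunk: chunk["score"], reverse=True)
    return merged[:top_k]  # assumption: the merged list is cut to top_k


# Fake per-knowledge-base results used only for this demonstration.
fake_results = {
    "kb-a": [{"text": "alpha", "score": 0.9}, {"text": "beta", "score": 0.4}],
    "kb-b": [{"text": "gamma", "score": 0.7}],
}
results = search_many("q", ["kb-a", "kb-b"], lambda q, kb: fake_results[kb], top_k=2)
for chunk in results:
    print(chunk["knowledge_base_id"], chunk["text"], chunk["score"])
```

Since every knowledge base is scored by its own retrieval run, comparing scores across knowledge bases is only meaningful when all runs use the same weights and fusion constant.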