Hybrid Search

Hybrid search combines dense vector search (semantic similarity) with sparse keyword search (exact matching) to deliver more accurate and comprehensive retrieval results.

What Is Hybrid Search?

Hybrid search is a retrieval strategy that combines two fundamentally different search paradigms, dense vector search and sparse keyword search, to achieve better accuracy than either approach alone. Dense search uses embeddings to find semantically similar content: a query about "automobile maintenance" will match documents about "car repair" because the two phrases mean nearly the same thing. Sparse search uses keyword-matching algorithms such as BM25 to find documents that contain the exact query terms, so it excels at specific names, codes, product IDs, and technical terminology that embeddings can blur. Each approach has a characteristic weakness: dense search can miss exact matches on terms that carry little semantic signal (an order number, for instance), while sparse search misses content that is semantically equivalent but worded differently. Hybrid search runs both searches in parallel and merges the results with a fusion algorithm, typically Reciprocal Rank Fusion (RRF), which combines ranked lists without requiring their scores to be on comparable scales. The result is a retrieval system that handles both conceptual questions ("how do I handle returns") and specific queries ("order number 12345 status").

How Hybrid Search Works

Hybrid search operates through a parallel retrieval and fusion pipeline. When a query arrives, it is processed simultaneously along two paths. The dense path embeds the query with the same embedding model used during indexing, then runs a cosine similarity search against the vector index to retrieve the top-N most semantically similar chunks. The sparse path sends the query to a text search engine (a BM25 index, or PostgreSQL's tsvector/tsquery full-text search) to find chunks containing matching keywords. BM25 weights matches by term frequency and inverse document frequency; PostgreSQL's built-in ts_rank functions weigh term frequency and proximity but do not use corpus-wide statistics, so their ranking is only loosely BM25-like. Each path produces a ranked list of results.

The two ranked lists are then combined using Reciprocal Rank Fusion (RRF). Each result earns 1 / (k + rank) from every list it appears in, where rank is its 1-based position in that list and k is a constant (typically 60) that controls how strongly the top positions are emphasized; a result's final score is the sum of these contributions. Results that appear in both lists receive contributions from both, which naturally boosts documents that are both semantically and lexically relevant. The fused list, ordered by combined RRF score, is the final retrieval result. Some systems add a third stage, cross-encoder reranking, which uses a more computationally expensive model to re-score the top results for precision.
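
The fusion step can be sketched in a few lines of Python. This is a minimal illustration of the RRF formula described above, not any particular product's implementation; the function name `reciprocal_rank_fusion` and the chunk IDs are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs into one list of (doc_id, score).

    Hypothetical helper for illustration; k=60 is the commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up chunk IDs standing in for the two retrieval paths' outputs.
dense_results = ["chunk-a", "chunk-b", "chunk-c"]
sparse_results = ["chunk-c", "chunk-a", "chunk-d"]

fused = reciprocal_rank_fusion([dense_results, sparse_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

With these inputs, chunk-a and chunk-c appear in both lists and therefore rank above chunk-b and chunk-d, which each appear in only one. Note that the raw similarity and keyword scores never enter the calculation; only rank positions do, which is why no score normalization is needed.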

Why Hybrid Search Matters

Single-mode search creates blind spots that directly affect chatbot accuracy. Pure vector search may miss the right documents for queries about specific product names, error codes, or technical identifiers that need exact matching. Pure keyword search misses documents that express the same idea in different words. In production chatbot deployments both kinds of query are common: customers ask conceptual questions ("what's included in the premium plan") and specific ones ("error code E-4012") in the same conversation. Hybrid search handles both, improving retrieval recall by 10-30% compared to either method alone in typical knowledge base scenarios. Better retrieval translates directly into better chatbot answers, higher confidence scores, and fewer escalations.

How Chatloom Uses Hybrid Search

Hybrid search is a core component of Chatloom's RAG pipeline. The system stores both dense vector embeddings (in pgvector) and sparse keyword indices (PostgreSQL tsvector with GIN indexing) for every knowledge base chunk. At query time, both searches run in parallel, and results are combined using Reciprocal Rank Fusion via a custom rrf_score() database function. The hybrid results then pass through Cohere's cross-encoder reranker for additional precision before being fed to the LLM as context. This multi-stage retrieval pipeline is why Chatloom achieves high accuracy even on diverse query types.

Related Terms

Explore related concepts to deepen your understanding.

Retrieval-Augmented Generation

Embedding (AI)

Vector Database

Reranking

Frequently Asked Questions

What is Reciprocal Rank Fusion?

RRF is an algorithm that combines multiple ranked lists into a single ranked list. Each result earns 1/(k + rank) from every list it appears in, and these contributions are summed, so results that appear in multiple lists score higher. The approach is simple, requires no score normalization, and often outperforms the individual ranking methods it combines on information retrieval benchmarks.

Is hybrid search slower than single-mode search?

The latency increase is minimal because the dense and sparse searches run in parallel, not sequentially. The total time is approximately the maximum of the two search times plus a small overhead for RRF fusion. In practice, hybrid search typically adds 5-15 milliseconds compared to dense-only search, which is negligible in the context of an overall chatbot response.

When is hybrid search better than vector-only search?

Hybrid search is particularly advantageous when your knowledge base contains specific identifiers (product codes, error numbers, proper names), technical terminology, or content where exact keyword matches are important. For purely conceptual question-answering with no specific terms, vector search alone may be sufficient, but hybrid search rarely hurts and often helps.
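
The latency point above, that total time is roughly the maximum of the two search times rather than their sum, can be illustrated with a small asyncio sketch. The two search functions here are hypothetical stand-ins that simply sleep; the delays are exaggerated (200 ms and 100 ms) so the effect is easy to see, and they are not measurements of any real system.

```python
import asyncio
import time


async def dense_search(query):
    # Stand-in for embedding the query and searching a vector index.
    await asyncio.sleep(0.2)
    return ["chunk-a", "chunk-b"]


async def sparse_search(query):
    # Stand-in for a keyword / full-text search.
    await asyncio.sleep(0.1)
    return ["chunk-b", "chunk-c"]


async def hybrid_retrieve(query):
    # gather() runs both coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    return await asyncio.gather(dense_search(query), sparse_search(query))


def timed_retrieve(query):
    start = time.perf_counter()
    dense, sparse = asyncio.run(hybrid_retrieve(query))
    elapsed = time.perf_counter() - start
    return dense, sparse, elapsed


dense, sparse, elapsed = timed_retrieve("error code E-4012")
print(f"dense={dense} sparse={sparse} elapsed={elapsed:.2f}s")
```

Because the two simulated searches overlap, the elapsed time should land close to the slower of the two (about 0.2 s) rather than their 0.3 s sum. The fusion step that follows operates on two short in-memory lists, which is why its overhead is small by comparison.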
