Hybrid Search & Retrieval
“A developer tools company built RAG over their API docs. Pure vector search worked great for conceptual questions but completely missed error code lookups: 'ERR_429' returned articles about 'rate limiting' instead of the literal docs containing 'ERR_429'. Hybrid search with RRF fixed both cases.”
Production RAG Stack
Why Dense Search Alone Is Not Enough
Pure vector (dense) search captures semantic meaning but misses exact keyword matches. Search 'error code ERR_429' and dense search might return documents about 'rate limiting' (semantically similar) but miss the document that literally contains 'ERR_429' (keyword match). BM25 lexical search catches exact matches but misses paraphrases.
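Hybrid search runs both in parallel and merges results using Reciprocal Rank Fusion (RRF). The formula: score = sum(1 / (k + rank_i)), where rank_i is the document's 1-based position in result list i, the sum runs over every list the document appears in, and k=60 is a commonly used smoothing constant. A document ranked #2 in both lists scores 1/62 + 1/62 ≈ 0.0323, higher than one ranked #1 in only one list, which scores 1/61 ≈ 0.0164.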
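The fusion step above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name rrf_fuse and the document IDs are hypothetical, and each input list is assumed to be already sorted best-first by its own retriever.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant from the formula score = sum(1 / (k + rank_i)).
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists for the query "error code ERR_429".
dense = ["rate-limiting-guide", "err-429-reference", "quotas-overview"]
lexical = ["changelog", "err-429-reference"]

fused = rrf_fuse([dense, lexical])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here "err-429-reference" is ranked #2 by both retrievers and ends up first after fusion, ahead of the two documents that were ranked #1 by only one retriever.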
Hybrid search with RRF is reported to push precision from ~62% (dense only) to ~84%; the exact gain depends on the corpus and the query mix.
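If you run pgvector, you already have the pieces for hybrid search. PostgreSQL's full-text search (tsvector + GIN index) provides the lexical side. Its built-in ts_rank scoring is not BM25, but it fills the same exact-keyword role, and because RRF uses only rank positions, the difference in score scale does not affect fusion. Add CREATE INDEX ON documents USING GIN (to_tsvector('english', content)); and write a single CTE query with RRF; the query must use the same to_tsvector('english', content) expression for the index to apply. No Elasticsearch, no extra service, no data sync, and no new system to operate.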
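A single-query version of this might look like the sketch below. It is written under assumptions: the documents table is taken to have id, content, and embedding (pgvector) columns, only documents and content appear in the text above, and conn is assumed to be a DB-API connection that accepts named %(name)s parameters (as psycopg does). The function name hybrid_search and the candidates and limit defaults are illustrative.

```python
# Hybrid search as one CTE query: dense and lexical candidates, fused with RRF.
# Assumed schema: documents(id, content, embedding vector(...)).
HYBRID_SQL = """
WITH dense AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(embedding)s::vector) AS rank
    FROM documents
    ORDER BY embedding <=> %(embedding)s::vector   -- pgvector cosine distance
    LIMIT %(candidates)s
),
lexical AS (
    SELECT id,
           ROW_NUMBER() OVER (
               ORDER BY ts_rank(to_tsvector('english', content), q) DESC
           ) AS rank
    FROM documents, plainto_tsquery('english', %(query)s) AS q
    WHERE to_tsvector('english', content) @@ q     -- matches the GIN index expression
    ORDER BY ts_rank(to_tsvector('english', content), q) DESC
    LIMIT %(candidates)s
)
SELECT COALESCE(d.id, l.id) AS id,
       COALESCE(1.0 / (%(k)s + d.rank), 0.0)
     + COALESCE(1.0 / (%(k)s + l.rank), 0.0) AS rrf_score
FROM dense d
FULL OUTER JOIN lexical l ON d.id = l.id
ORDER BY rrf_score DESC
LIMIT %(limit)s
"""


def hybrid_search(conn, query_text, query_embedding, limit=10, candidates=50, k=60):
    """Run the hybrid query and return (id, rrf_score) rows, best first."""
    params = {
        "query": query_text,
        # pgvector accepts the text form '[x1,x2,...]' cast to ::vector.
        "embedding": "[" + ",".join(str(x) for x in query_embedding) + "]",
        "candidates": candidates,
        "k": k,
        "limit": limit,
    }
    cur = conn.cursor()
    try:
        cur.execute(HYBRID_SQL, params)
        return cur.fetchall()
    finally:
        cur.close()
```

The FULL OUTER JOIN keeps documents that appear in only one candidate list, and COALESCE gives the missing side a contribution of zero, which matches summing 1 / (k + rank_i) over only the lists that contain the document.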