Re-ranking

Diagnose the p95 latency problem in this RAG system

A customer support AI met its 1 s p95 latency SLA until the team added a cross-encoder re-ranker to every query, which pushed p95 to 1.8 s. The fix was conditional re-ranking: skip the re-ranker when bi-encoder confidence is above 0.85. With 63% of queries simple enough to skip, p95 came back down to 950 ms.
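One way to picture the conditional step is the minimal Python sketch below. It is an illustration under stated assumptions, not the team's actual code: "bi-encoder confidence" is taken here to mean the top candidate's similarity score, normalized to [0, 1], and `Candidate`, `maybe_rerank`, and `toy_reranker` are hypothetical names. The toy re-ranker only stands in for a real cross-encoder so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Threshold from the scenario: skip re-ranking above 0.85.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Candidate:
    doc_id: str
    text: str
    score: float  # bi-encoder similarity, assumed normalized to [0, 1]


RerankFn = Callable[[str, List[Candidate]], List[Candidate]]


def maybe_rerank(
    query: str,
    candidates: List[Candidate],
    rerank_fn: RerankFn,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[List[Candidate], bool]:
    """Return (ranked candidates, whether the re-ranker was called)."""
    if not candidates:
        return [], False
    # Order by bi-encoder score first; this is the cheap path.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    # Assumption: confidence == score of the best bi-encoder hit.
    if ranked[0].score > threshold:
        return ranked, False  # confident enough, skip the expensive step
    return rerank_fn(query, ranked), True


def toy_reranker(query: str, candidates: List[Candidate]) -> List[Candidate]:
    """Stand-in for a cross-encoder: rank by word overlap with the query."""
    query_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda c: len(query_words & set(c.text.lower().split())),
        reverse=True,
    )


if __name__ == "__main__":
    easy = [
        Candidate("a", "reset your password", 0.92),
        Candidate("b", "billing help", 0.40),
    ]
    ranked, used = maybe_rerank("reset password", easy, toy_reranker)
    print([c.doc_id for c in ranked], "re-ranked:", used)
```

Returning the `used` flag alongside the results makes it straightforward to log the skip rate, which is the number to compare against the 63% figure when tuning the threshold.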

— Level 6 · Production RAG Pipeline

Retrieve → Re-rank → Generate Pipeline
