Evaluation & CI/CD
RAGAS: The 4 Metrics That Matter
The RAGAS framework defines four core metrics for RAG evaluation:
- Faithfulness: Does the answer contain only claims supported by the retrieved context? The score is the fraction of claims in the answer that the context supports, so a score of 0.6 means 40% of the claims are unsupported. This is the hallucination detector.
- Answer Relevancy: Does the answer actually address the user's question?
- Context Precision: Of the chunks retrieved, what fraction are relevant?
- Context Recall: Of all chunks that contain the answer, what fraction did we retrieve?
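To make the ratios above concrete, here is a minimal sketch that computes three of the four metrics from hand-labeled judgments. It assumes the per-claim and per-chunk labels already exist. In practice RAGAS obtains such judgments with an LLM judge, and its exact formulas may differ from these simplified ratios. The function names and sample labels are hypothetical and are not the RAGAS API. Answer Relevancy is left out because it needs a model judgment rather than a simple count.

```python
def faithfulness(claim_supported: list[bool]) -> float:
    """Fraction of claims in the answer that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_precision(chunk_relevant: list[bool]) -> float:
    """Fraction of retrieved chunks that are relevant (simplified, not rank-aware)."""
    if not chunk_relevant:
        return 0.0
    return sum(chunk_relevant) / len(chunk_relevant)


def context_recall(retrieved_relevant: int, total_relevant: int) -> float:
    """Fraction of all answer-bearing chunks that were actually retrieved."""
    if total_relevant == 0:
        return 0.0
    return retrieved_relevant / total_relevant


# Hypothetical labels for one evaluation sample.
claims = [True, True, True, False, False]   # 3 of 5 claims supported
chunks = [True, False, True, True]          # 3 of 4 retrieved chunks relevant

print(faithfulness(claims))        # 0.6, i.e. 40% of claims unsupported
print(context_precision(chunks))   # 0.75
print(context_recall(3, 4))        # 0.75, assuming 4 answer-bearing chunks exist
```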
Faithfulness should always be the primary gate. Never use the same model family as both judge and generator: a judge tends to score output from its own family more favorably, and this judge-generator bias inflates scores by 15-25%.
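One way this gating rule could look in a CI step is sketched below. The threshold values, metric keys, family labels, and the `check_gate` function are all hypothetical choices for illustration. They are not part of RAGAS or of any CI system.

```python
PRIMARY_METRIC = "faithfulness"

# Hypothetical thresholds; tune them for your own pipeline.
THRESHOLDS = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.75,
    "context_precision": 0.70,
    "context_recall": 0.70,
}


def check_gate(
    scores: dict[str, float],
    judge_family: str,
    generator_family: str,
    thresholds: dict[str, float] = THRESHOLDS,
) -> list[str]:
    """Return a list of failure messages; an empty list means the gate passes."""
    # Scores from a judge in the generator's own family are not trusted at all.
    if judge_family == generator_family:
        return [f"judge and generator share model family '{judge_family}'"]

    # Faithfulness is checked first and blocks the build on its own.
    primary = scores.get(PRIMARY_METRIC, 0.0)
    if primary < thresholds[PRIMARY_METRIC]:
        return [f"{PRIMARY_METRIC} {primary:.2f} < {thresholds[PRIMARY_METRIC]:.2f}"]

    # Secondary metrics are only reported once the primary gate has passed.
    failures = []
    for name, minimum in thresholds.items():
        if name == PRIMARY_METRIC:
            continue
        value = scores.get(name, 0.0)
        if value < minimum:
            failures.append(f"{name} {value:.2f} < {minimum:.2f}")
    return failures


scores = {
    "faithfulness": 0.60,
    "answer_relevancy": 0.90,
    "context_precision": 0.80,
    "context_recall": 0.80,
}
print(check_gate(scores, judge_family="family_a", generator_family="family_b"))
```

In a CI job, a non-empty result would typically be printed and turned into a non-zero exit code so that the pipeline fails.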