Evaluation & Diagnosis
Overview: Evaluation & Diagnosis
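Always report delta metrics (the change in each score relative to the base model), not just final scores. Evaluate general benchmarks (MMLU, HellaSwag) alongside task-specific ones to catch catastrophic forgetting, the silent killer of fine-tuned models: a model can improve on the target task while quietly losing general capability.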
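A minimal sketch of delta reporting, assuming benchmark scores are already available as plain dictionaries mapping a benchmark name to an accuracy in [0, 1]. The function names `metric_deltas` and `flag_forgetting`, the `tolerance` threshold, and the numbers in the demo are all hypothetical placeholders for illustration, not real results or a prescribed workflow.

```python
from typing import Dict, Iterable, List


def metric_deltas(base: Dict[str, float], tuned: Dict[str, float]) -> Dict[str, float]:
    """Return tuned-minus-base score for every benchmark present in both runs."""
    return {name: tuned[name] - base[name] for name in base if name in tuned}


def flag_forgetting(
    deltas: Dict[str, float],
    general_benchmarks: Iterable[str],
    tolerance: float = 0.01,  # assumed threshold; pick one that fits your eval noise
) -> List[str]:
    """List general benchmarks whose score dropped by more than `tolerance`."""
    return sorted(
        name
        for name in general_benchmarks
        if name in deltas and deltas[name] < -tolerance
    )


if __name__ == "__main__":
    # Placeholder scores for illustration only, not measured results.
    base_scores = {"mmlu": 0.65, "hellaswag": 0.80, "task_f1": 0.50}
    tuned_scores = {"mmlu": 0.60, "hellaswag": 0.80, "task_f1": 0.75}

    deltas = metric_deltas(base_scores, tuned_scores)
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.3f}")

    regressed = flag_forgetting(deltas, ["mmlu", "hellaswag"])
    print("possible forgetting on:", regressed)
```

Reporting the signed delta per benchmark makes a regression on a general benchmark visible even when the task-specific score has improved.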