8
Dataset Curation Pipeline
+100 XP5 min8 / 11
Overview: Dataset Curation Pipeline
Overview: Dataset Curation Pipeline
Magpie (ICLR 2025): by prompting Llama 3 with only a user-turn token, the model auto-generates instruction-response pairs. 300K Magpie-filtered examples match the quality of 10M human-curated samples. Data quality beats data quantity every time.
1 of 3