SFT Fundamentals

Supervised Fine-Tuning — Hyperparameters That Matter

Overview: SFT Fundamentals

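Flash Attention 2 can make attention computation roughly 3-10x faster than a standard attention implementation, with the actual gain depending on sequence length and hardware. The learning rate is the single most critical SFT hyperparameter: set it too high and the model suffers catastrophic forgetting of what it learned in pretraining; set it too low and it barely adapts to the new data.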
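
To make the learning-rate trade-off concrete, here is a minimal toy sketch. It is an analogy rather than a model of real SFT: a single weight is trained by plain gradient descent on a quadratic loss, and the function name `fine_tune`, the starting weight, the target, and the three learning rates are all hypothetical values chosen for illustration.

```python
def fine_tune(lr, steps=100, w_pretrained=0.0, w_target=1.0):
    """Run plain gradient descent on the toy loss 0.5 * (w - w_target) ** 2.

    w_pretrained stands in for the pretrained weights and w_target for the
    optimum on the fine-tuning data (both are illustrative assumptions).
    """
    w = w_pretrained
    for _ in range(steps):
        grad = w - w_target  # derivative of the toy loss with respect to w
        w -= lr * grad
    return w


if __name__ == "__main__":
    # Three illustrative learning rates: too low, reasonable, too high.
    for lr in (1e-4, 1e-1, 2.5):
        w = fine_tune(lr)
        print(f"lr={lr:g}  final w={w:.4g}  distance from target={abs(w - 1.0):.4g}")
```

With the smallest rate the weight barely leaves its pretrained value, with the middle rate it settles at the target, and with the largest rate each step overshoots so the weight runs far away from both the pretrained value and the target. In a real model the "too high" failure shows up less dramatically, as pretrained capabilities being overwritten, but the underlying sensitivity to step size is the same.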
