SFT Fundamentals
Overview: SFT Fundamentals
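Flash Attention 2 can compute attention roughly 3-10x faster than a standard, unfused attention implementation, with the exact gain depending on sequence length and hardware. Learning rate is the single most critical SFT hyperparameter: set too high, it causes catastrophic forgetting of pretrained capabilities; set too low, the model barely adapts to the fine-tuning data.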
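The learning-rate trade-off can be illustrated with a toy sketch. This is not SFT itself: it runs plain gradient descent on a one-parameter quadratic loss, and the names `fine_tune`, `w_pretrained`, and `target` are hypothetical, chosen only for illustration. The overshoot seen at the largest rate is a loose stand-in for the instability that a too-high learning rate can cause, not a model of catastrophic forgetting.

```python
def fine_tune(lr, steps=10, w_pretrained=0.0, target=1.0):
    """Gradient descent on the toy loss (w - target)**2, starting from w_pretrained.

    Toy assumption: a single scalar weight stands in for the whole model,
    and `target` stands in for the optimum of the fine-tuning objective.
    """
    w = w_pretrained
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
    return w


# Compare a too-low, a moderate, and a too-high learning rate.
for lr in (1e-4, 1e-1, 1.5):
    w = fine_tune(lr)
    print(f"lr={lr:g}  final w={w:.4f}  distance to target={abs(w - 1.0):.4f}")
```

In this toy setting the smallest rate leaves the weight almost where it started, the moderate rate moves most of the way to the target, and the largest rate overshoots further on every step. Real SFT runs have far more complicated loss surfaces, so the specific values here should not be read as recommended settings.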