Benchmarking & Performance Tuning
Overview: Benchmarking & Performance Tuning
TTFT (Time to First Token) is the latency users consciously feel: aim for under 200 ms in interactive apps. TPS (Tokens per Second) determines throughput for long outputs such as document generation. For a 7B model on an RTX 4090, the optimal setup is all 33 layers offloaded to the GPU, an 8K context, and a thread count equal to the number of physical cores (not hyperthreaded logical cores).
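To make the two metrics concrete, here is a minimal sketch of how TTFT and TPS could be measured from any streaming token source. The names `measure_stream`, `StreamStats`, and `fake_stream` are hypothetical helpers invented for this illustration, not part of any inference library. The sketch also assumes TPS is computed over the decode phase only (after the first token arrives), which is one common convention.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class StreamStats(NamedTuple):
    ttft_s: float  # seconds from request start to the first token
    tokens: int    # total number of tokens received
    tps: float     # tokens per second after the first token (decode phase)


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Consume a token stream and report TTFT and decode-phase TPS."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # timestamp of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # With a single token there is no decode interval to divide by.
    tps = (count - 1) / decode_time if decode_time > 0 else float("nan")
    return StreamStats(first - start, count, tps)


def fake_stream(n: int, prefill_s: float, per_token_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: one prefill delay, then steady decoding."""
    time.sleep(prefill_s)
    for i in range(n):
        if i > 0:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    stats = measure_stream(fake_stream(20, prefill_s=0.05, per_token_s=0.01))
    print(f"TTFT: {stats.ttft_s * 1000:.0f} ms, TPS: {stats.tps:.1f}")
```

In practice, `fake_stream` would be replaced by whatever streaming iterator your runtime exposes. Running the same prompt several times and comparing the results gives a steadier picture than a single measurement.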