Deployment Patterns
Overview: Deployment Patterns
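Your Docker and Kubernetes experience transfers directly to LLM serving. In Kubernetes, a pod claims GPUs through nvidia.com/gpu resource limits (the resource is advertised by the NVIDIA device plugin), and the nvidia-container-toolkit exposes those GPUs to the container runtime. Pods are steered onto GPU node pools with a nodeSelector (or node affinity), plus tolerations for the taints that keep ordinary workloads off those nodes; the two are separate mechanisms that are usually used together. Horizontal scaling for stateless inference works much like scaling a stateless REST API: you add identical replicas behind a Service. The main difference is that each new replica must load model weights onto a GPU before it can serve traffic.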
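As a rough illustration of how these pieces fit into one manifest, the sketch below builds a Kubernetes Deployment as a Python dict using only the standard library. It assumes the NVIDIA device plugin is installed. The node label node-pool=gpu, the taint key nvidia.com/gpu with effect NoSchedule, the container port 8000, and the image name are all hypothetical placeholders; real clusters and cloud providers label and taint GPU nodes differently.

```python
import json


def build_gpu_deployment(name: str, image: str, replicas: int = 2, gpus_per_pod: int = 1) -> dict:
    """Return a Deployment manifest (as a dict) for a stateless GPU inference server.

    Hypothetical assumptions, adjust to your cluster:
      - GPU nodes carry the label node-pool=gpu.
      - GPU nodes are tainted nvidia.com/gpu:NoSchedule to keep other pods off.
      - The NVIDIA device plugin is installed, so nodes advertise nvidia.com/gpu.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Horizontal scaling: identical replicas, as for a stateless REST API.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # nodeSelector attracts the pod to GPU nodes (hypothetical label).
                    "nodeSelector": {"node-pool": "gpu"},
                    # The toleration permits scheduling onto the tainted GPU nodes
                    # (hypothetical taint key).
                    "tolerations": [
                        {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
                    ],
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8000}],  # hypothetical port
                            # GPUs are set via limits; when no request is given,
                            # Kubernetes uses the limit as the request.
                            "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_gpu_deployment(
        "llm-server", "registry.example.com/llm-server:latest", replicas=3
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON as well as YAML manifests. Scaling out is then a matter of changing replicas, the same as for any stateless Deployment.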