NVIDIA Dynamo on AKS: Disaggregated LLM Inference with H100 GPUs

Diego Casati
Principal Cloud Architect, Azure Global Black Belt
Mohamad Al Jazaery
Principal Solution Engineer, Azure Global Black Belt

You've got your AKS cluster, your GPU quota is approved, and you're ready to serve large language models. But picking the right inference stack — vLLM, TensorRT-LLM, SGLang, disaggregated vs. unified — can cost you days before your first token lands.

That's the gap NVIDIA Dynamo fills.