8 posts tagged with "kubernetes"

Secretless Microsoft Entra ID Authentication for AKS with Istio and oauth2-proxy

July 29, 2026 · 15 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

Dominique St-Amand

Principal Cloud Architect, Azure Global Black Belt

Sometimes, you need to add authentication to an application without modifying its source code. It might be off-the-shelf software or an older in-house application built before modern authentication protocols such as OpenID Connect or SAML 2.0 became commonplace. So how can you protect it with modern authentication without changing the application itself?

This is where a reverse proxy can handle authentication on the application's behalf.

In this post, we'll show how to protect the AKS Store Demo with an Istio Gateway (using the Gateway API), oauth2-proxy, and Microsoft Entra ID. Istio delegates every protected request to oauth2-proxy through Envoy's external authorization filter, which hands authorization decisions to an external HTTP or gRPC service and keeps access control flexible and centralized. To authenticate the user, oauth2-proxy uses the authorization code flow with PKCE (Proof Key for Code Exchange). When it redeems the authorization code, it authenticates as the app registration through AKS Workload Identity.

There is no Entra application password to create, store in Kubernetes, or rotate.
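As a rough illustration of the PKCE step mentioned above, the sketch below derives an S256 code challenge from a random code verifier, following RFC 7636. It is not oauth2-proxy's implementation; the function names (`make_code_verifier`, `make_code_challenge`, `challenge_matches`) are hypothetical and exist only for this example.

```python
import base64
import hashlib
import secrets


def _b64url_no_padding(raw: bytes) -> str:
    """Base64url-encode bytes and strip the trailing '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random, URL-safe code verifier (hypothetical helper).

    The client keeps this value private until it redeems the code.
    """
    return _b64url_no_padding(secrets.token_bytes(num_bytes))


def make_code_challenge(code_verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)), no padding.

    The challenge is sent with the authorization request.
    """
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return _b64url_no_padding(digest)


def challenge_matches(code_verifier: str, code_challenge: str) -> bool:
    """What the authorization server checks at redemption time."""
    return secrets.compare_digest(make_code_challenge(code_verifier), code_challenge)


if __name__ == "__main__":
    verifier = make_code_verifier()
    challenge = make_code_challenge(verifier)
    print("verifier :", verifier)
    print("challenge:", challenge)
    print("matches  :", challenge_matches(verifier, challenge))
```

Because only the hash travels with the initial request, an intercepted authorization code cannot be redeemed without the original verifier.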

TTFT-Driven Autoscaling for Disaggregated LLM Inference with NVIDIA Dynamo on AKS

May 11, 2026 · 16 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

Mohamad Al Jazaery

Principal Solution Engineer, Azure Global Black Belt

Most inference autoscalers react to CPU or GPU utilization. But for large language models, the metric that actually matters to users is Time To First Token (TTFT): how long they wait before the response starts streaming. A GPU can be 60% utilized and still deliver a 30-second TTFT under a burst of long-context requests.

In this post, I'll show how to wire NVIDIA Dynamo disaggregated inference together with KEDA on AKS so that the system autoscales the decode worker fleet directly on TTFT p99. Azure Managed Prometheus is the metric source, and AKS-managed GPU drivers mean there is no NVIDIA GPU Operator to maintain.
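To make the contrast between utilization-based and TTFT-based scaling concrete, here is a minimal sketch. It assumes the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, which KEDA drives: desired = ceil(current × metric / target). The targets (70% utilization, 2-second TTFT), the replica cap, and the function names are hypothetical and are not taken from the post.

```python
import math


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 99 / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def desired_replicas(current: int, metric: float, target: float, max_replicas: int) -> int:
    """HPA-style proportional rule, clamped to [1, max_replicas]."""
    wanted = math.ceil(current * metric / target)
    return max(1, min(wanted, max_replicas))


if __name__ == "__main__":
    # Hypothetical burst: most requests start streaming quickly, a few wait 30 s.
    ttft_seconds = [0.4] * 98 + [30.0] * 2
    ttft_p99 = p99(ttft_seconds)

    # Assumed targets, for illustration only.
    by_utilization = desired_replicas(current=4, metric=60.0, target=70.0, max_replicas=16)
    by_ttft = desired_replicas(current=4, metric=ttft_p99, target=2.0, max_replicas=16)

    print("TTFT p99 (s):", ttft_p99)
    print("replicas if scaling on GPU utilization:", by_utilization)
    print("replicas if scaling on TTFT p99:", by_ttft)
```

Under these assumed numbers, the utilization signal sees no reason to add workers while the TTFT signal asks for the maximum, which is the mismatch the excerpt describes.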

NVIDIA Dynamo on AKS: Disaggregated LLM Inference with H100 GPUs

May 8, 2026 · 15 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

Mohamad Al Jazaery

Principal Solution Engineer, Azure Global Black Belt

You've got your AKS cluster, your GPU quota is approved, and you're ready to serve large language models. But picking the right inference stack — vLLM, TensorRT-LLM, SGLang, disaggregated vs. unified — can cost you days before your first token lands.

That's the gap NVIDIA Dynamo fills.

Continuous Profiling on AKS with Pyroscope, Blob Storage, and Managed Grafana

May 4, 2026 · 20 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

Post Updates

2026-05-20 — Updated based on lessons learned from a live deployment:

Removed hardcoded pyroscope.image.tag from values-azure.yaml to prevent chart/image version mismatches when the chart is upgraded
Added pyroscope.extraLabels with azure.workload.identity/use: "true" to propagate the label to all pod templates (the chart uses extraLabels, not podLabels)
Pinned --version 2.0.1 in the helm upgrade --install command
Added a Troubleshooting callout documenting the two most common crash patterns and their fixes

You deploy your workloads on AKS and collect metrics with Prometheus and logs with Loki. But when latency spikes hit, you stare at dashboards knowing something is slow without knowing where in your code the time is being spent.

That's the gap continuous profiling fills.

Tools of the Trade: Adding fuzziness to the mix

February 13, 2026 · 3 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

In this Tools of the Trade post, we take a quick look at how fuzziness with fzf can enhance your day-to-day life in the terminal. It is one of those tools you don't know you need until you try it.

ARO Storage Accounts: Under the Hood

December 18, 2025 · 8 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

You create an Azure Red Hat OpenShift cluster, and minutes later, you notice something interesting in the managed resource group: two storage accounts with cryptic names like cluster1a2b3c4d5e and imageregistry1a2b3c4d5e.

What are they for? Why two? And what happens if you accidentally delete one?

Understanding ICMP Traffic on AKS

November 28, 2025 · 10 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

You open a shell inside a pod, run ping 8.8.8.8, and nothing happens. Silence. Then you run curl https://www.microsoft.com and it succeeds immediately. DNS resolution also works normally.

So why does ping fail?

Tools of the Trade: Working with Multiple Clusters

November 27, 2025 · 5 min read

Diego Casati

Principal Cloud Architect, Azure Global Black Belt

Welcome to "Tools of the Trade", a series where we share the tools and workflows that help us work more effectively. In this first post, I'll show you how I manage multiple AKS clusters without losing track of which cluster I'm working on. If you've ever accidentally deployed to the wrong cluster, this one's for you.