The Human Scale Problem in Platform Engineering

Platform engineering insights focused on scaling human capability, not just technical systems. Practical approaches to making complex infrastructure more humane and accessible.

Posted on: Jun 24, 2025

Ray Kao

Twitter: @raykao
GitHub: @raykao

Ray Kao is a Principal Cloud Architect at Microsoft, on the Digital and App Innovation, Azure Global Black Belt Team.

Diego Casati

Twitter: @ve6dpc
GitHub: @dcasati

Diego Casati is a Principal Cloud Architect at Microsoft, on the Digital and App Innovation, Azure Global Black Belt Team.

The Human Scale Problem in Platform Engineering

We keep doing this thing where we solve a problem, celebrate the victory, then realize we’ve created three new problems we didn’t even know existed.

Remember when manually configuring servers was the bottleneck? So we built containers. Great! Now we’re orchestrating thousands of them. Remember when monolithic deployments were too slow? So we built microservices. Fantastic! Now we’re drowning in distributed system complexity. We solved manual infrastructure provisioning with infrastructure as code. Perfect! Now we’re coordinating dozens of Terraform modules across environments and wondering how we got here.

Each step forward has been genuinely valuable. But we keep hitting the same pattern: our solutions outpace our ability to operate them at human scale.

The Knowledge Explosion

Think about what we’re asking people to know now. Container orchestration, service mesh configuration, observability platforms, security policy engines, AI/ML operations—roles that didn’t even exist a decade ago. No single person can be an expert in all of this, and that’s actually okay.

What we’ve created is necessary specialization. We need people who can go deep on specific problem domains because the breadth of knowledge has expanded beyond what any individual can reasonably master. Silos aren’t inherently bad—they’re a natural response to complexity that exceeds what one person can hold in their head.

The real problem isn’t that we have silos. Silos are actually fine. Creating clear roles/responsibilities is a good thing. It’s that we’re terrible at helping these specialists talk to each other.

The Coordination Crisis

Platform engineering is our attempt to build bridges between these specialists—creating interfaces and abstractions that let them work together without everyone needing to become an expert in everything. When it works, specialists can collaborate without understanding every detail of each other’s domains, while guardrails ensure their work still fits together properly.

But let’s be honest—we’re still figuring this out. We’re just now learning what the proper handoffs and demarcation lines should be between security engineers, platform engineers, SREs, and application developers. Even when we think we’ve drawn the boundaries correctly, there are still knowledge gaps and coordination challenges wherever these roles overlap each other.

And then there’s the tool problem. We have so many overlapping options that nobody knows who should own what. Should observability be owned by the platform team using Prometheus, or the SRE team using Datadog, or the application teams using Azure Monitor? Who owns the deployment pipeline—platform engineering with their GitOps approach, or the dev teams who want to use GitHub Actions and push directly?

Many organizations find themselves paralyzed by choice, afraid to commit to specific tools because of vendor lock-in concerns or the fear of having to rebuild their entire process when better alternatives emerge. So they end up with tool sprawl across teams, each solving similar problems in slightly different ways, creating even more coordination complexity.

The Monument Problem

Here’s where we keep making the same mistake: we treat software and engineering problems as one-and-done challenges. We chase perfect code and perfect execution, hoping that if we just build it right the first time, we’ll never need to touch it again. This has been the root of our collective error.

Look at something like mainframe systems (…this is not a punch down at mainframe momment by the way…) — brilliant solutions, built with the assumption they’d last forever. But the world evolved beyond the paradigm it represented, and now organizations are struggling to rebuild decades of accumulated logic and debt because the original systems had no way to introspect, understand, or extend beyond their initial design and/or we felt like the problem was solved so we didn’t bother to adjust and evolve it.

We built monuments instead of living systems. The solutions we build today must be designed differently.

The Current Moment

What I’m seeing across organizations—and I think we’re all experiencing this — is that these aren’t really new problems. They’re the same fundamental challenges we’ve always had, just amplified by cloud computing, microservices, and the reality of building software at “enterprise scale”. Platform engineering is the current term our industry is using to rally around addressing these amplified human problems.

But here’s what’s really exciting about this moment in our industry: we’re not just trying to encode operational knowledge into platforms/scripts/docs anymore. We’re discovering new ways to make systems more humane through AI assistance — creating interfaces that finally match how humans naturally want to interact with complex technology.

The difference isn’t that AI is magic; it’s that we’ve built sufficently complex systems that conversational interfaces are required and we now have tools available to us that can truly start to make sense of them WITH US.

The Fundamental Shift

We already have the fundamental tools for this job. Source control, automation runners, infrastructure as code, API calls…endless API calls. These aren’t going anywhere. They’re battle-tested, timeless, and they’ll remain at the core of what we do in platform engineering. Even AI will ultimately read from and output to this. The challenge isn’t finding new tools, it’s orchestrating the ones we have into coherent, user-friendly experiences.

What’s interesting about AI assistants is that they can become like that experienced colleague who remembers not just what we did, but why we did it. They can hold onto the collective knowledge we’ve built up over time, understanding which tools to use, grasping the reasoning behind our decisions, and remembering the rules and governance that shaped those choices. This institutional/tribal memory becomes accessible through simple conversation instead of forcing people to dig through documentation or doing intensive knowledge transfers that still don’t fully cover the breadth and depth of what we need to build and support.

What This Means for You

The common thread across all engineering challenges (or any task really) is that they’re fundamentally about scaling human capability, not just technical capability. The limiting factor in most organizations isn’t compute capacity or network bandwidth…it’s human cognitive capacity and the work items that are bound by human scale, such as time and speed.

Platform engineering recognizes this reality and focuses on amplifying human expertise rather than expecting people to become superhuman. It’s about building systems that make complex operations incrementally more understandable, moving towards enabling teams to focus on solving business problems instead of fighting with infrastructure, and create sustainable pathways for specialists to collaborate effectively.

This isn’t a problem we solve once and move on from. The challenges evolve as our organizations grow, as new technologies emerge, and as our understanding deepens. When we see a new pattern emerge that we’ve had a discussion about 5, 10, 20 times??? then we need as platform engineers to create a pattern for it. The solutions we build today must be designed to adapt and evolve with us.

Next in this series: We’ll dive into the specific challenge of scaling infrastructure complexity beyond human cognitive capacity, and explore how the productivity paradox is affecting development teams across the industry.

What’s your experience with the human scale problem in your organization? Are you seeing similar patterns where solutions create new coordination challenges?

Cross Posted to: https://www.linkedin.com/pulse/human-scale-problem-platform-engineering-ray-kao-cnjhc/

#PlatformEngineering #DevOps #DeveloperExperience #AgenticDevOps #AzureGlobalBlackBelts #FieldNotes