After auditing dozens of enterprise initiatives, platformengineering.org reports that 80% of internal developer platforms fail, and roughly 70% of platform engineering initiatives struggle with adoption. Stack Overflow's 2023 developer survey found that 47% of developers feel anxiety about job security when new tools are introduced. Here's why DIY IDPs collapse, backed by real community data, and what to do instead.
The IDP Promise vs. Reality
The pitch is seductive: build an internal developer platform, a self-service layer between your infrastructure and your developers. Developers get golden paths. Platform engineers get control. Everyone ships faster. The community calls this the "Field of Dreams" fallacy: "if we build it, they will come." They usually don't, and mandating adoption doesn't work either.
So your team spends months evaluating tools. You settle on Backstage for the portal, Crossplane for provisioning, ArgoCD for GitOps, and custom Terraform modules to wire it together. As one r/devops user put it, adopting Backstage is like receiving "all the parts of a Chevy on your desk" rather than a finished vehicle. Twelve to eighteen months in, you've got something that works — mostly — for one team's workflow.
Then the maintenance starts. Plugin upgrades break things. New teams have different requirements. The two engineers who built the platform leave, and nobody knows how the custom auth middleware works. Catalog metadata deteriorates rapidly as personnel change. What was supposed to eliminate toil has become a second full-time job — and teams measure deploys per day while ignoring real developer friction.
This isn't a hypothetical. One organization hastily implemented an IDP without involving developers in the process — it resulted in confusion, lower productivity, and ultimately a return to legacy systems. The failure mode isn't dramatic — it's slow. The platform drifts into irrelevance while developers quietly go back to doing things the old way. As the community puts it: "platform engineering IS change management" — yet teams consistently neglect the cultural aspects.
Here are the five failure modes we see again and again.
1. The Platform Team Becomes a Bottleneck
The whole point of building a platform was to eliminate the "file a ticket and wait" workflow. But here's what actually happens: instead of developers waiting on DevOps for a staging environment, they now wait on the platform team to add a new template, fix a broken golden path, or grant access to a new cluster. Every request gets labeled "urgent" to bypass the platform queue.
You didn't eliminate the bottleneck. You renamed it.
This happens because most teams treat the platform as infrastructure, not a product. The team builds what they need right now, declares victory, and moves on. But a platform is a product with users, and those users have requests, bugs, and edge cases. Developers who lose the freedom to choose their own tools will find ways to circumvent the platform. When the platform team consists of three infrastructure engineers who have never built a software product before, the backlog grows faster than they can work through it.
One pattern we see repeatedly: a platform team builds a Backstage instance with a service catalog and a few custom plugins. As one developer on r/devops noted: "The idea of Backstage is super cool, but the fact that I need to write a lot of React code instead of GitHub workflows and Terraform files made me leave the project." When the mobile team needs iOS build support, or the data team needs Spark job templates, the requests pile up. The platform team becomes the gatekeeper for every new workflow — the exact dynamic the IDP was supposed to prevent.
The fix isn't "hire more platform engineers." Forcing adoption via mandates just creates shadow IT. Successful teams learned to make the right way the easiest way — compliance through convenience, not coercion. The fix is choosing a platform that ships with extensibility built in, where adding a new template or workflow doesn't require a platform engineer to write a custom plugin.
2. Maintenance Eats All Your Capacity
This is the silent killer of DIY platforms. Your platform works on day one. By month six, you're spending most of your time keeping it alive. At scale, as one engineer warned, "Backstage is not an 'other duties as assigned' sort of tool. It will require dedicated resources" — typically 3 to 5 full-time engineers just for ongoing maintenance.
Platform Team Time Allocation — DIY IDP (typical)
- Maintaining CI/CD glue 35%
- Fixing broken env configs 25%
- Upgrading Kubernetes / Helm 15%
- Answering developer tickets 15%
- Actually building new features 10%
The numbers aren't exaggerated. When you build a platform from open-source components — Backstage, Crossplane, ArgoCD, custom Terraform modules, homegrown CLI tools — each component has its own release cycle, breaking changes, and security patches. Backstage typically requires 12 to 18 months for full implementation, ships weekly releases, and its plugin ecosystem frequently introduces compatibility issues.
The commercial alternatives aren't immune either. One team's experience with Port.io: "POC in several days and we were super excited" — but it "turned out to be insanely expensive" at scale. Another team noted that Cortex "pricing was expensive... We left Cortex for OpsLevel for half the price." The community consensus is clear: "You can't really buy an IDP, you can only build one. For portals, there are off-the-shelf offerings." Meanwhile, Backstage adoption outside Spotify averages around 10% of engineers, despite heavy investment — because the team burned out on maintenance before delivering features developers actually wanted.
The math is brutal. With a three-person platform team spending roughly 90% of its time on Kubernetes upgrades, Helm chart fixes, CI/CD pipeline debugging, and developer tickets, you effectively have about one-third of one engineer building net-new platform capabilities. At that rate, you'll never outpace the feature requests coming in from developer teams.
This is the core build-vs-buy calculation that most teams get wrong. They estimate the build cost but forget the carry cost — the ongoing maintenance that compounds every quarter. If deployment takes 5 minutes but debugging takes 4 hours due to opaque abstractions, the platform has failed. You're not saving time — you're redistributing it to places that are harder to measure.
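The carry-cost point can be made concrete with a back-of-the-envelope sketch. The numbers below are hypothetical assumptions for illustration (a 4 engineer-year initial build, maintenance starting at 1.5 FTEs and growing 5% per quarter as components age and drift), not measured data:

```python
# Back-of-the-envelope build-vs-buy sketch. All numbers are hypothetical:
# a 4 engineer-year initial build, a maintenance load starting at 1.5 FTEs
# that compounds 5% per quarter as the component stack ages.
def diy_cost_engineer_years(quarters, build=4.0, carry_fte=1.5, growth=1.05):
    total = build
    for _ in range(quarters):
        total += carry_fte * 0.25  # one quarter of maintenance effort
        carry_fte *= growth        # carry cost compounds each quarter
    return round(total, 2)

# After three years (12 quarters), accumulated maintenance (~6
# engineer-years) has already exceeded the original build effort (4).
three_year_cost = diy_cost_engineer_years(12)
```

Under these assumptions the maintenance tail overtakes the build cost within three years, which is exactly the part of the estimate teams tend to leave out.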
3. It Only Works for One Team's Workflow
Here's a pattern that plays out at nearly every company that builds a DIY platform: the platform team builds golden paths based on their own stack. If the team runs Next.js on Kubernetes with Postgres, the templates work beautifully for that exact combination. Then the payments team shows up with Spring Boot, RDS, and a completely different deployment model, and nothing fits.
This is one of the most common mistakes: over-engineering for theoretical future needs and attempting an all-in-one platform without validating. Teams focus only on Day 1 — app creation and scaffolding — while Day 2 through Day 50 operations have far greater impact on developer productivity. The industry mantra of "start with ONE critical bottleneck" gets ignored in favor of boiling the ocean.
The fundamental tension is between opinionated defaults and flexibility — what the community calls the "Golden Cage Syndrome." You need opinionated defaults to move fast — but you also need escape hatches. When abstraction without escape hatches means something breaks and developers can't see what's happening underneath, trust erodes immediately. Most DIY platforms nail the defaults and completely miss the escape hatches.
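One way to picture "defaults with escape hatches" in code: a golden-path template that applies opinionated settings but accepts raw overrides, so a team with an edge case isn't forced to abandon the platform. This is an illustrative Python sketch; `render_deployment` and its parameters are invented for the example and don't belong to any specific tool:

```python
# Illustrative sketch of "opinionated defaults with escape hatches".
# render_deployment is a hypothetical golden-path template function.
def render_deployment(app, *, replicas=2, overrides=None):
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        # Opinionated defaults the platform team maintains:
        "spec": {"replicas": replicas},
    }
    # Escape hatch: teams can set raw spec fields the golden path
    # doesn't model, instead of working around the platform entirely.
    if overrides:
        manifest["spec"].update(overrides)
    return manifest

# A team with an unusual rollout strategy stays on the golden path:
custom = render_deployment("payments", overrides={"strategy": {"type": "Recreate"}})
```

The design point is the `overrides` parameter: the default path stays simple, but the underlying structure remains visible and reachable, so trust doesn't erode the first time the abstraction leaks.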
When a team hits an edge case the platform can't handle, they have two options: (1) wait months for the platform team to add support, or (2) work around the platform entirely. Most choose option two — creating shadow IT. Developers who lose autonomy will find ways to circumvent the platform. And once a team goes around the platform, they never come back.
Successful teams learned to start with a Minimum Viable Platform — demonstrate value within weeks, not months. Allow inner sourcing so teams can contribute back. The platform should solve the most painful bottleneck first, not try to be everything. When it tries to be everything, it solves 80% of one team's problems and 20% of everyone else's — which means everyone else ignores it.
4. No One Documents It (Bus Factor of 1)
Every DIY platform starts as a hero project. One or two senior engineers who deeply understand the infrastructure decide to "build a proper platform." They work nights and weekends, stitching together custom controllers, webhook handlers, and deployment pipelines. They know every quirk of the system because they built it.
Then one of them takes a new job. The other goes on parental leave. Now you have a critical piece of infrastructure — the system that manages how code gets to production — and nobody understands how it works.
This is the bus factor problem, and it's endemic to DIY platforms. Catalog metadata in tools like Backstage deteriorates rapidly as personnel change — service ownership records become stale, API docs go out of date, and the platform slowly fills with ghost entries that nobody maintains. The platform is treated as infrastructure — something that should "just work" — and therefore doesn't get the same engineering rigor as application code.
The tribal knowledge problem compounds with time. Custom Helm charts reference internal conventions nobody wrote down. The CI pipeline has a conditional branch that handles a specific edge case from two years ago — but the comment just says "// don't remove this." The authentication middleware uses an undocumented API from your identity provider that worked in v2.3 but breaks in v3.0.
When something breaks at 2 AM — and it will — the on-call engineer is reading through uncommented Go code trying to figure out why the custom admission webhook is rejecting deployments. This is not an efficient use of engineering talent. It's an organizational risk.
5. You Can't Keep Up with the Ecosystem
The cloud-native ecosystem moves fast. Kubernetes ships three major releases per year. Helm, Terraform, and ArgoCD all have their own release cycles. New patterns emerge — GitOps, policy-as-code, eBPF networking, AI-powered infrastructure. Your DIY platform needs to keep pace with all of it.
But it won't. Because your platform team is busy fighting the fires from failures #1 through #4. They don't have time to evaluate whether the new Kubernetes gateway API should replace your custom ingress controller, or whether Karpenter would save 40% on your node costs compared to the Cluster Autoscaler you configured eighteen months ago.
Every time a company adds new applications, services, and clusters, the IDP needs changes. New deployment targets, new cloud regions, new compliance requirements — each one requires platform work. The Stack Overflow 2023 survey found that 47% of developers experience anxiety about job security when new technologies are introduced. In a fast-growing company, the platform is always six months behind what teams actually need — and the people using it are already stressed about the constant churn.
The ecosystem problem also hits developer experience. Your developers see that competitor companies have ephemeral preview environments for every PR, AI-assisted debugging, and automated cost optimization. Your DIY platform still requires developers to SSH into a shared staging server and tail logs manually. The gap between what's possible and what your platform provides widens every quarter.
This is the fundamental disadvantage of building in-house: you're competing with companies whose entire business is building developer platforms. They have dedicated teams for each capability — environment management, cost optimization, observability, security. Your three-person platform team can't match that investment, no matter how talented they are.
The Alternative: Buy the Platform, Own the Infrastructure
The build-vs-buy debate for developer platforms has shifted. In 2022, building your own IDP was defensible: the commercial options were immature and expensive. In 2026, after watching 80% of DIY platforms fail (per platformengineering.org), the industry has learned a hard lesson: voluntary adoption beats mandates every time. Making a platform so good that developers choose it is a full product effort in its own right, and your engineering team's innovation budget belongs in your actual product, not in the platform layer.
The modern approach is BYOC — Bring Your Own Cloud. You keep full ownership of your infrastructure: your AWS account, your Kubernetes clusters, your data. The platform vendor provides the orchestration layer — environment provisioning, templates, RBAC, cost controls, and developer experience — without ever touching your production data.
This model solves each of the five failures directly:
- Bottleneck eliminated: Developers get self-service from day one, with a template catalog that the platform team curates but doesn't gatekeep
- Maintenance offloaded: The vendor handles upgrades, security patches, and ecosystem compatibility. Your team focuses on defining golden paths, not fixing infrastructure
- Multi-workflow by default: Support for Docker Compose, Helm, Terraform, and Kubernetes Manifests in the same environment, not just the stack your platform team happens to know
- Documentation built in: A commercial platform has documentation, support, and a community. No tribal knowledge, no bus factor of 1
- Ecosystem keeps pace: The vendor's engineering team tracks Kubernetes releases, adds new integrations, and ships features continuously
Bunnyshell is built on this model. You connect your existing EKS, GKE, or AKS clusters. Define your environments in a single bunnyshell.yaml file. Publish templates to a service catalog. Set RBAC policies and cost limits. Developers get self-service environments — full-stack, production-like, ephemeral — without filing a single ticket.
Most teams are productive within days, not the 12-18 months a Backstage implementation typically requires. No need for 3-5 dedicated FTEs just to keep the platform running. And because Bunnyshell handles the platform layer, your engineers can focus on what they were hired to do: building your product.