Paul Twist

Posted on Jul 4

Why 88% of Agent Pilots Die: The Infrastructure Readiness Gap Nobody Talks About

#ai #agents #infrastructure #governance


TL;DR: Production agent teams aren't failing because of models. They're failing because they lack a unified control plane for governance, observability, and cross-runtime orchestration. The infrastructure problem is not optional; it's the difference between a prototype and a system that stays up when humans depend on it.

You've probably heard the statistic by now: 88% of AI agents fail to reach production. But here's what nobody emphasizes enough — it's not because the agents aren't capable.

It's because the infrastructure underneath them was never designed for production in the first place.

I've been watching this unfold across production teams, and the pattern is consistent. Teams get a coding agent (Claude Code, Cursor, GitHub Copilot) or a reasoning agent working in a sandbox environment. It works beautifully in isolation. So they ship it. Then, six weeks in, they hit the wall. Not a model wall. An infrastructure wall.

The Wall Looks Like This

You're running agents across three runtimes now — maybe Claude Managed Agents for one team, Cursor for another, a custom harness for a third. Each platform has its own session management, its own audit log format, its own credential scoping. Your engineers have three logins. Your security team has three vendor agreements. Your observability stack has three disconnected tracing systems.

Then someone asks: "Where are all our agent sessions right now? What did agent X do yesterday? Can we enforce a spend limit across all of them without redeploying?"

The answer is: you can't. Not without building an abstraction layer yourself.

That's the infrastructure readiness problem.

The Numbers Tell a Specific Story

Let me separate the data:

79% of enterprises adopted AI agents in some form, yet only 11% run them in production.
Only 15% of companies are fully prepared for production AI agent deployment, even though 60% are investing millions.
Data quality and governance are cited as the biggest obstacles (42-39%), yet teams deploy anyway on top of fragile pipelines.

The infrastructure readiness gap isn't coming from model limitations. It's coming from the absence of a unified control plane.

What a Control Plane Actually Does

When I say "control plane," I don't mean a dashboard. I mean infrastructure that solves six concrete problems simultaneously:

Multi-runtime abstraction: One API to invoke agents on OpenCode, Hermes, Claude Managed Agents, or Cursor — without rewriting your application code.
Session durability: If a pod crashes or gets replaced during a deployment, the agent's session persists. No context loss.
Access control without console sprawl: Developers create and run agents without touching Bedrock, Anthropic, or Cursor consoles. One unified credential vault.
Governance at invocation time: Per-agent budgets, per-team spend limits, tool allowlisting — enforced before the request leaves your infrastructure.
Observability as a first-class layer: Every decision, every tool call, every failure captured in one place with structured tracing.
Audit trails that satisfy compliance: Not "we logged it somewhere," but "we captured the exact identity, the exact decision, the exact authorization, at every step."

Without a control plane, you're either building all six yourself (expensive, fragile, slow) or you're leaving them unbuilt (which is why 88% of pilots never ship).
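To make the first and last items on that list concrete, here is a minimal sketch of what "one API across runtimes, one audit trail" could look like. Everything in it is an assumption for illustration: `AgentRuntime`, `StubRuntime`, and `ControlPlane` are hypothetical names, not the API of any product mentioned in this post, and a real adapter would call the vendor's SDK instead of returning a string.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol


class AgentRuntime(Protocol):
    """Hypothetical adapter interface. Real runtimes expose different APIs."""

    def invoke(self, agent_id: str, prompt: str) -> str: ...


class StubRuntime:
    """Offline stand-in for a real runtime adapter, so the sketch runs as-is."""

    def __init__(self, name: str) -> None:
        self.name = name

    def invoke(self, agent_id: str, prompt: str) -> str:
        return f"[{self.name}] {agent_id}: {prompt}"


@dataclass
class ControlPlane:
    runtimes: dict[str, AgentRuntime] = field(default_factory=dict)
    agent_runtime: dict[str, str] = field(default_factory=dict)  # agent_id -> runtime name
    audit_log: list[dict] = field(default_factory=list)

    def register_runtime(self, name: str, runtime: AgentRuntime) -> None:
        self.runtimes[name] = runtime

    def register_agent(self, agent_id: str, runtime_name: str) -> None:
        if runtime_name not in self.runtimes:
            raise KeyError(f"unknown runtime: {runtime_name}")
        self.agent_runtime[agent_id] = runtime_name

    def invoke(self, caller: str, agent_id: str, prompt: str) -> str:
        """Route the call to the agent's runtime and record who did what."""
        runtime_name = self.agent_runtime[agent_id]
        output = self.runtimes[runtime_name].invoke(agent_id, prompt)
        self.audit_log.append(
            {"ts": time.time(), "caller": caller, "agent": agent_id,
             "runtime": runtime_name, "prompt": prompt}
        )
        return output


if __name__ == "__main__":
    plane = ControlPlane()
    plane.register_runtime("runtime-a", StubRuntime("runtime-a"))
    plane.register_runtime("runtime-b", StubRuntime("runtime-b"))
    plane.register_agent("code-reviewer", "runtime-a")
    plane.register_agent("triage-bot", "runtime-b")
    print(plane.invoke("alice", "code-reviewer", "review the open PR"))
    print(plane.invoke("bob", "triage-bot", "label new issues"))
    print(len(plane.audit_log), "audit entries in one place")
```

The point of the sketch is the shape, not the implementation: application code only ever calls `plane.invoke(...)`, so swapping the runtime behind an agent is a registration change rather than a rewrite, and every call lands in the same log regardless of where it ran.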

Why the Framework Isn't Enough

Here's where production teams get confused. Frameworks like LangGraph, CrewAI, and the Claude Agent SDK solve the agent logic layer beautifully. They handle orchestration, tool calling, and memory within a single session.

But they don't solve the infrastructure layer. They can't, by design. A framework lives inside your application. It doesn't span across multiple runtimes. It doesn't manage team identity. It doesn't enforce governance across a fleet of agents. It doesn't know about compliance requirements that haven't been built yet.

The teams that make it past the 88% failure line are the ones who realized early: I need both a framework (for agent logic) and a control plane (for agent infrastructure).

The Bridge From Pilot to Production

Here's what I'm seeing work in 2026:

Step 1: Choose your frameworks and runtimes. Claude Code for some tasks, Cursor for others, Bedrock for high-volume work. Be deliberate. Optimization can wait.

Step 2: Deploy a control plane in front of them. One place where teams register agents, invoke agents, observe agent behavior, and enforce policy, regardless of the underlying runtime. This is where platforms like LiteLLM Agent Platform become essential: a single gateway and dashboard that lets your team create, schedule, and talk to coding agents across OpenCode, Claude Managed Agents, Cursor, OpenClaw, and DeepAgents without handing out console access.

Step 3: Add a fast data plane. Once agents are running at scale, latency compounds. A Rust gateway serves 15x the throughput on 11x less memory, with per-request overhead cut from 7.5ms to 0.05ms. For single agents, the Python gateway is fine. For fleets of agents making hundreds of calls in parallel, Rust starts mattering.

Step 4: Operationalize governance before chaos. Budget enforcement, tool approval, session recovery, incident response — these aren't nice-to-haves after you scale. They're prerequisites. Most agentic AI pilots stall because teams chase model capability instead of governance readiness.
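As a rough illustration of what "governance at invocation time" from Step 4 can mean in practice, here is a small sketch of a pre-flight check that runs before a request leaves your infrastructure. The names (`Governor`, `AgentPolicy`, `PolicyViolation`) and the idea of passing an estimated cost are my assumptions for the example, not a description of how any particular platform implements it.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when a call is blocked before it reaches any runtime."""


@dataclass
class AgentPolicy:
    budget_usd: float
    allowed_tools: set[str]
    spent_usd: float = 0.0


@dataclass
class Governor:
    policies: dict[str, AgentPolicy] = field(default_factory=dict)

    def authorize(self, agent_id: str, tool: str, estimated_cost_usd: float) -> None:
        """Check allowlist and budget, then reserve the estimated spend."""
        policy = self.policies.get(agent_id)
        if policy is None:
            raise PolicyViolation(f"no policy registered for {agent_id}")
        if tool not in policy.allowed_tools:
            raise PolicyViolation(f"{agent_id} may not call {tool}")
        if policy.spent_usd + estimated_cost_usd > policy.budget_usd:
            raise PolicyViolation(f"{agent_id} would exceed its budget")
        policy.spent_usd += estimated_cost_usd

    def revoke_tool(self, agent_id: str, tool: str) -> None:
        """Policy change at runtime: no agent code is redeployed."""
        self.policies[agent_id].allowed_tools.discard(tool)


if __name__ == "__main__":
    gov = Governor({"triage-bot": AgentPolicy(1.00, {"search", "shell"})})
    gov.authorize("triage-bot", "search", 0.40)  # allowed, spend recorded
    gov.revoke_tool("triage-bot", "shell")
    try:
        gov.authorize("triage-bot", "shell", 0.10)
    except PolicyViolation as err:
        print("blocked:", err)
```

Because the policy lives in the control plane rather than in the agent's code, revoking a tool or lowering a budget is a data change, which is what lets you answer "yes" to the "without redeploying" questions.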

The Readiness Checklist

Before you call your agent system "production," ask these questions:

Can I invoke an agent without giving the developer console access to my AI provider? (Governance)
If a pod crashes, is session state recovered automatically? (Durability)
Can I set per-agent budgets and enforce them without redeploying code? (Cost control)
Can I see the exact sequence of decisions that led to every agent action? (Observability)
Can I revoke agent access to a specific tool without redeploying? (Policy as infrastructure)
Do I have one place to query agent sessions across my entire fleet, regardless of runtime? (Unified control)

If you answer "no" to more than one of these, you're not ready for production. Not because the agent isn't smart enough, but because the infrastructure underneath it hasn't matured.
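If you want to run the checklist across several teams, the rule above is easy to encode. This is only a sketch of the scoring rule as stated in this post (more than one "no" means not ready); the dictionary keys and the function name `passes_checklist` are my own labels.

```python
# The six checklist questions, keyed by the label each one carries in the post.
CHECKLIST = {
    "governance": "Invoke an agent without provider console access?",
    "durability": "Session state recovered automatically after a pod crash?",
    "cost_control": "Per-agent budgets enforced without redeploying?",
    "observability": "Exact sequence of decisions visible for every action?",
    "policy_as_infrastructure": "Tool access revocable without redeploying?",
    "unified_control": "One place to query sessions across all runtimes?",
}


def passes_checklist(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ready, gaps). Unanswered questions count as 'no'."""
    gaps = [key for key in CHECKLIST if not answers.get(key, False)]
    return len(gaps) <= 1, gaps


if __name__ == "__main__":
    # Hypothetical team: two "no" answers, so it does not pass.
    answers = {key: True for key in CHECKLIST}
    answers["durability"] = False
    answers["unified_control"] = False
    ready, gaps = passes_checklist(answers)
    print("ready:", ready)
    for key in gaps:
        print("gap:", CHECKLIST[key])
```

Treating a missing answer as "no" is deliberate: if nobody on the team can answer the question, the capability probably isn't there.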

The Infrastructure Layer Matters More Than You Think

This is the insight that separates the 12% that succeed from the 88% that fail:

The organizations that ship production agents aren't the ones with the most capable models. They're the ones that invested in infrastructure before they scaled. They decided early: "One control plane for all our agents. One set of policies. One audit trail. One observability system."

That decision doesn't feel urgent when you're running one agent on one team. But by the time you're running five agents across three teams on two different runtimes, it's non-negotiable.

What You Should Do This Week

Audit your current agent setup. How many runtimes? How many consoles do your engineers need access to? How many separate observability systems?
Map your governance gaps. Can you actually enforce budgets right now? Do you have audit trails? Can you revoke tool access without code changes?
Evaluate control-plane platforms. Not based on latency benchmarks, but based on whether they solve the six infrastructure problems above. LiteLLM Agent Platform provides the canonical example — multi-runtime abstraction, session persistence, tool governance, cost tracking, observability, and audit logging as first-class features.
Plan for the data plane upgrade. If you're at 10+ agents or 100+ concurrent sessions, Rust gateways matter. But don't start there. Start with the control plane. Performance optimization is downstream of operational correctness.
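To see why the data plane is a later concern, it helps to do the arithmetic on the per-request overhead figures quoted in Step 3 (7.5ms versus 0.05ms). The sketch below just multiplies them out; the request counts are hypothetical workloads I picked for illustration, and the result is cumulative overhead summed across requests, not wall-clock latency, since parallel calls overlap.

```python
# Per-request gateway overhead figures quoted in Step 3 of this post.
PYTHON_GATEWAY_MS = 7.5
RUST_GATEWAY_MS = 0.05


def aggregate_overhead_ms(requests: int, per_request_ms: float) -> float:
    """Total gateway overhead summed across all requests, in milliseconds."""
    return requests * per_request_ms


if __name__ == "__main__":
    # Hypothetical workloads: number of requests passing through the gateway.
    for requests in (1, 100, 10_000):
        py = aggregate_overhead_ms(requests, PYTHON_GATEWAY_MS)
        rs = aggregate_overhead_ms(requests, RUST_GATEWAY_MS)
        print(f"{requests:>6} requests: Python {py:>9.2f} ms | Rust {rs:>7.2f} ms")
```

For a single request the difference is a few milliseconds next to a model call that takes seconds. At 10,000 requests it is about 75 seconds of cumulative gateway time versus about half a second, which is where the upgrade starts to pay for itself.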

The Uncomfortable Truth

Most teams don't fail at agents because of the models. They fail because they built production infrastructure as an afterthought.

The good news? The infrastructure layer is now mature. Organizations see faster returns when agent building, deployment, and monitoring live in a single governed environment. That's not hype. That's evidence from teams that made it past the 88%.

The uncomfortable part is that it requires a deliberate decision to invest in the control-plane layer upfront, not bolt it on after your first incident.

If you're planning agents for 2026, start there. Your future self will thank you.

What does your agent infrastructure look like right now? Are you feeling the governance/observability gap? Drop a comment below. I'm tracking what production teams are actually building.

Paul Twist is a European AI engineer and technical writer. He writes about production AI infrastructure, agent systems, and what it takes to move from demos to durable systems. Follow for deep dives on agent platforms, observability, governance, and the infrastructure gaps that most teams don't see coming.

Top comments (1)

mote • Jul 5

The 'control plane' framing is exactly right. What I keep seeing is that teams plan for orchestration and observability, but they skip the state layer entirely — and then they wonder why agents lose context mid-session, or why there's no durable record of what happened before the last restart.

Memory is the missing piece in most agent infrastructure diagrams. Not just 'the model needs context' but: where does session state actually live? What survives a crash? Can you replay an agent's reasoning from three sessions ago?

We're building moteDB (cargo add motedb) precisely as a unified state layer for this — multimodal memory that handles vectors, time-series events, and structured state in one place, with a Rust-native footprint that runs on the edge. Happy to go deeper if you want to dig into the storage layer angle here.