Paul Twist

Posted on Jun 28

The Four-Layer Agent Stack: When Your Framework Isn't Enough

#litellm #ai #agents #infrastructure

Models. Harnesses. Runtimes. Control plane.

By mid-2026, that's the structure most production agent teams are converging on—whether they explicitly acknowledge it or not. If you're building agents at any scale, you're already dealing with all four layers. The question is whether your infrastructure makes that separation visible or hides it until something breaks.

Why Three Layers Stopped Being Enough

A year ago, agent infrastructure looked simpler. You'd pick a model (Claude, GPT-4), write orchestration logic (LangChain, CrewAI, LangGraph), and run it somewhere (your laptop, a cloud instance, a managed platform). Three clear pieces.

That worked fine when agents were simple tools: chatbots, straightforward tool calling, single-model workflows.

But production agents don't look like that anymore. They're:

Multi-runtime (a single team often runs Claude Code, Cursor Agents, Bedrock agents, and custom agents side by side)
Stateful (sessions that survive pod restarts, memory that persists across invocations)
Tool-heavy (30+ tool calls per decision, with structured execution and cost tracking)
Governed (per-agent identity, access controls, audit trails, compliance requirements)

Three layers can't handle that complexity cleanly. So a fourth emerged.

The Four-Layer Stack

Layer 1: Models

Claude, GPT-5.5, Gemini 3.5, DeepSeek. The LLM itself. You don't control this; you consume it through APIs.

Layer 2: Harnesses

The code that wraps the model and defines how it reasons and acts. OpenCode (an open-source terminal harness), Claude Code (terminal-first), Cursor's inline model, Codex, custom harnesses you write yourself.

The harness decides: Can the agent use computer use? Can it write files to disk? Can it run arbitrary shell commands? Does it have persistence? What tools are available?

Layer 3: Runtimes

Infrastructure that runs the harness. Claude Managed Agents (Anthropic-hosted), Amazon Bedrock AgentCore (AWS-hosted), custom Kubernetes pods, local Docker containers.

The runtime handles: Sandboxing, scaling, billing, compliance boundaries, model routing.

Layer 4: Control Plane

This is the layer that surprised everyone.

The control plane sits above all runtimes and solves a problem that runtimes alone can't: how do you manage agents across multiple runtimes as a coherent system?

It handles:

Multi-runtime discovery: One API to call agents, regardless of which runtime they live on
Session persistence: Agent state survives runtime restarts, pod deployments, and hardware failures
Governance and audit: Per-agent identity, access controls, policy enforcement, tamper-evident logging
Cost and budget tracking: Spend attribution per agent, per team, with enforcement
Observability: What did the agent do? Why did it make that decision? Where did it spend money?
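
As a rough illustration of the multi-runtime discovery item, here is a minimal sketch of a registry that routes calls by agent ID. Every name in it (RuntimeAdapter, EchoRuntime, AgentRegistry) is hypothetical and does not come from any real product's API; EchoRuntime stands in for a real runtime client.

```python
from dataclasses import dataclass
from typing import Protocol


class RuntimeAdapter(Protocol):
    """Hypothetical interface each runtime integration would implement."""

    def invoke(self, agent_id: str, prompt: str) -> str: ...


@dataclass
class EchoRuntime:
    """Stand-in for a real runtime client; it only labels and echoes the prompt."""

    name: str

    def invoke(self, agent_id: str, prompt: str) -> str:
        return f"[{self.name}:{agent_id}] {prompt}"


class AgentRegistry:
    """Maps agent IDs to the runtime that hosts them."""

    def __init__(self) -> None:
        self._runtimes: dict[str, RuntimeAdapter] = {}
        self._agents: dict[str, str] = {}  # agent_id -> runtime name

    def add_runtime(self, name: str, adapter: RuntimeAdapter) -> None:
        self._runtimes[name] = adapter

    def register_agent(self, agent_id: str, runtime: str) -> None:
        if runtime not in self._runtimes:
            raise KeyError(f"unknown runtime: {runtime}")
        self._agents[agent_id] = runtime

    def call(self, agent_id: str, prompt: str) -> str:
        # Callers never name the runtime; the registry resolves it.
        runtime = self._agents[agent_id]
        return self._runtimes[runtime].invoke(agent_id, prompt)


registry = AgentRegistry()
registry.add_runtime("managed", EchoRuntime("managed"))
registry.add_runtime("k8s", EchoRuntime("k8s"))
registry.register_agent("reviewer", "managed")
registry.register_agent("deployer", "k8s")

print(registry.call("reviewer", "summarize the diff"))
print(registry.call("deployer", "roll out v2"))
```

The caller only ever sees `registry.call(agent_id, prompt)`. Moving an agent to a different runtime would be a registry change, not a caller change.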

Why Your Framework Can't Be Your Control Plane

LangChain, CrewAI, and LangGraph are excellent at handling agent logic (layer 2): the reasoning loop, tool calling, and memory management for a single agent within a single framework.

But they don't solve the control-plane problem because they can't see outside their own boundaries.

If you have a Claude Managed Agent and a Cursor agent and you want them to:

Share session memory
Enforce unified access controls
Aggregate their costs into one team budget
Audit what both of them did

...your framework can't do that. Each framework knows about its own agents, and none of them knows about agents running on other runtimes.

So teams end up doing one of three things:

Siloing agents by runtime — Each team gets access to one platform (Claude Managed Agents OR Cursor OR Bedrock), and they live in separate consoles with separate APIs. This kills code reuse and splits visibility.
Building a control plane by hand — Gluing together auth systems, session stores, cost tracking, governance policies across multiple platforms. This is where most production teams currently spend engineering time that doesn't ship features.
Waiting for a control plane — Hoping the framework will eventually handle multi-runtime orchestration. (It won't.)

The Missing Infrastructure Layer

The control plane isn't a framework problem or a runtime problem. It's an infrastructure problem—and it's expensive to build correctly.

It requires:

A durable session store that survives runtime and cloud provider boundaries
Multi-tenant isolation (different teams, different access levels, different cost pools)
A credential vault that never exposes provider consoles to developers
Per-agent identity and policy enforcement
A cost attribution system that works across runtimes
Structured observability that shows the full decision path, tool calls, and outcomes
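
To make the first two requirements concrete, here is a minimal sketch of a session store that outlives the process that wrote to it and keys every row by tenant. It uses SQLite from the Python standard library purely for illustration; a real deployment would more likely use a replicated database. The SessionStore class and its schema are assumptions, not any product's design.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class SessionStore:
    """Persists agent session state on disk, scoped by tenant."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "tenant TEXT, agent_id TEXT, session_id TEXT, state TEXT, "
            "PRIMARY KEY (tenant, agent_id, session_id))"
        )
        self._db.commit()

    def save(self, tenant: str, agent_id: str, session_id: str, state: dict) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?, ?)",
            (tenant, agent_id, session_id, json.dumps(state)),
        )
        self._db.commit()

    def load(self, tenant: str, agent_id: str, session_id: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT state FROM sessions "
            "WHERE tenant = ? AND agent_id = ? AND session_id = ?",
            (tenant, agent_id, session_id),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self._db.close()


path = os.path.join(tempfile.mkdtemp(), "sessions.db")

store = SessionStore(path)
store.save("team-a", "reviewer", "s1", {"step": 3, "notes": ["read diff"]})
store.close()  # simulate the writer process going away

# A fresh connection, as a restarted pod would open, sees the same state.
reopened = SessionStore(path)
restored = reopened.load("team-a", "reviewer", "s1")
other_tenant = reopened.load("team-b", "reviewer", "s1")
reopened.close()
```

Because the tenant is part of the primary key, a lookup under a different tenant returns nothing even when the agent and session IDs match.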

Most frameworks don't ship this because it's orthogonal to reasoning logic. Most managed platforms don't ship this because they only manage one runtime—they have no incentive to unify multiple.

So the control plane became a separate layer.

What This Means for Your Infrastructure

If you're operating agents on multiple runtimes, you need:

A control plane: One place to register agents, manage sessions, enforce governance, and track spend—regardless of which runtime they run on. This is why teams are building or adopting dedicated agent control planes.

A fast data plane: When agents make tool calls and model invocations, they need low-overhead routing, fallbacks, and cost tracking. At scale, Python-based gateways start to show limits (memory, concurrency, latency). This is why infrastructure teams are investing in fast data planes alongside their control planes.

Separation of concerns: Your orchestration layer (framework) handles logic. Your control plane handles governance and state. Your data plane handles speed. Each has a different role; mixing them is where systems get fragile.
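
The hot-path work described for the data plane (routing, fallbacks, cost tracking) can be sketched in a few lines. This is a toy under stated assumptions: providers are plain callables returning text and a cost in USD, and FallbackRouter is a hypothetical name, not LiteLLM's API. A production gateway would add timeouts, retries, and rate limits.

```python
from collections import defaultdict
from typing import Callable

# Assumption: a provider takes a prompt and returns (text, cost_usd).
Provider = Callable[[str], tuple[str, float]]


class FallbackRouter:
    """Tries providers in order and attributes spend to the calling agent."""

    def __init__(self, providers: list[tuple[str, Provider]]) -> None:
        self._providers = providers
        self.spend_by_agent: dict[str, float] = defaultdict(float)

    def complete(self, agent_id: str, prompt: str) -> tuple[str, str]:
        errors = []
        for name, provider in self._providers:
            try:
                text, cost = provider(prompt)
            except Exception as exc:  # any provider failure triggers fallback
                errors.append(f"{name}: {exc}")
                continue
            self.spend_by_agent[agent_id] += cost
            return name, text
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> tuple[str, float]:
    raise TimeoutError("upstream timed out")


def backup(prompt: str) -> tuple[str, float]:
    return f"echo: {prompt}", 0.002


router = FallbackRouter([("primary", flaky), ("backup", backup)])
used, text = router.complete("reviewer", "hello")
```

Spend is recorded only for the provider that actually answered, so a failed primary does not inflate the agent's bill.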

Evaluating Control Planes

If you're looking at agent control platforms, ask:

Does it abstract multiple runtimes? Can I call agents on Claude Managed Agents, Cursor, Bedrock, and custom runtimes through one API?
Does it persist sessions durably? Can an agent session survive a pod restart, a cloud region failover, or a model swap?
Does it enforce governance without redeployment? Can I change agent permissions, budgets, or tool access without restarting anything?
Can I audit what agents did? Full decision path, tool invocations, spend attribution, and who approved what?
Does it integrate with a fast data plane? If I need sub-millisecond overhead, does the control plane work with optimized gateway infrastructure?

The first four are table stakes. The last one is what separates systems that can scale from systems that eventually hit a wall.
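
The "governance without redeployment" question comes down to whether policy is read at call time rather than baked into the agent. A minimal sketch, assuming an in-memory policy table and hypothetical names (AgentPolicy, PolicyEngine, PolicyDenied):

```python
from dataclasses import dataclass, field


class PolicyDenied(Exception):
    """Raised when a call violates the agent's current policy."""


@dataclass
class AgentPolicy:
    allowed_tools: set[str] = field(default_factory=set)
    budget_usd: float = 0.0


class PolicyEngine:
    def __init__(self) -> None:
        self._policies: dict[str, AgentPolicy] = {}
        self._spent: dict[str, float] = {}

    def set_policy(self, agent_id: str, policy: AgentPolicy) -> None:
        # Takes effect on the next authorize() call; nothing is restarted.
        self._policies[agent_id] = policy

    def authorize(self, agent_id: str, tool: str, est_cost_usd: float) -> None:
        policy = self._policies.get(agent_id)
        if policy is None:
            raise PolicyDenied(f"no policy for {agent_id}")
        if tool not in policy.allowed_tools:
            raise PolicyDenied(f"{agent_id} may not use {tool}")
        spent = self._spent.get(agent_id, 0.0)
        if spent + est_cost_usd > policy.budget_usd:
            raise PolicyDenied(f"{agent_id} would exceed its budget")
        self._spent[agent_id] = spent + est_cost_usd


def is_allowed(engine: PolicyEngine, agent_id: str, tool: str, cost: float) -> bool:
    try:
        engine.authorize(agent_id, tool, cost)
        return True
    except PolicyDenied:
        return False


engine = PolicyEngine()
engine.set_policy("reviewer", AgentPolicy({"search"}, budget_usd=1.0))
first = is_allowed(engine, "reviewer", "search", 0.4)   # within policy
blocked = is_allowed(engine, "reviewer", "shell", 0.4)  # tool not granted

# Grant shell access while the agent keeps running.
engine.set_policy("reviewer", AgentPolicy({"search", "shell"}, budget_usd=1.0))
after_update = is_allowed(engine, "reviewer", "shell", 0.4)
over_budget = is_allowed(engine, "reviewer", "search", 0.4)  # third 0.4 passes 1.0
```

Recorded spend survives the policy update, so raising or lowering a budget never resets what an agent has already used.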

The Architecture That Works

The pattern that's emerging in production:

Developer → Control Plane (sessions, governance, audit) → Runtime Abstraction
                                    ↓
                           Fast Data Plane
                                    ↓
                    [Claude Runtime] [Bedrock Runtime] [Custom Runtime]

The control plane is usually Python or Go (you need flexibility, not raw speed). It handles state mutations, policy enforcement, multi-tenant isolation—all operations where 10ms latency is invisible.

The data plane is usually Rust or Go (you need speed and memory efficiency). It handles the hot path: model routing, fallbacks, rate limiting, cost attribution. These run on every request, so per-call overhead compounds and needs to stay under 1ms.

Both layers talk to the same config and the same data sources. No duplication, no state divergence.

This is how teams running agents on multiple runtimes, multiple regions, with multiple teams, and under regulatory constraints actually operate them.

Where LiteLLM Fits

LiteLLM has historically been known as a gateway (fast routing across 100+ LLM providers). But the company's recent moves show they're building both layers:

LiteLLM Agent Platform: A Rust-based control plane for multi-runtime agent orchestration, session management, and governance
LiteLLM-Rust: A fast data plane for agent workloads (sub-1ms overhead, 15x throughput improvement over Python)
LiteLLM core: Gateway intelligence (routing, fallbacks, cost tracking) that both layers depend on

The design is explicit: control plane (Agent Platform) for governance, data plane (LiteLLM-Rust) for speed, both backed by the same 100+ provider support.

If you're running agents on multiple runtimes and need both governance and speed, this separation is worth understanding. It's not about picking one tool; it's about making sure your infrastructure doesn't pretend to be a control plane when it's actually just a gateway, or vice versa.

The Real Cost of Skipping a Control Plane

Most production agent failures I see in the wild aren't about model capability or harness design. They're about missing control plane infrastructure:

Sessions don't persist; agents re-discover context after restarts
Cost governance isn't enforced; a tool-heavy agent burns $5K unexpectedly
Access controls are sprawling; developers have direct console access they shouldn't
Observability is fragmented; you have to check three different dashboards to understand what happened
Auditing is impossible; compliance reviews fail because there's no tamper-evident trail

These are solvable problems. But they require infrastructure above the level of frameworks and runtimes.
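
One common way to get a tamper-evident trail is a hash chain, where each audit record includes the hash of the record before it. The sketch below shows that technique with the Python standard library; it is an illustration, not how any particular control plane implements auditing, and the AuditLog class is hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(record: dict) -> str:
        # Hash everything except the stored hash itself, in a stable key order.
        body = {k: v for k, v in record.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, agent_id: str, action: str, detail: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"agent_id": agent_id, "action": action, "detail": detail, "prev": prev}
        record["hash"] = self._digest(record)
        self.entries.append(record)

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("reviewer", "tool_call", "search: open incidents")
log.append("reviewer", "tool_call", "shell: kubectl get pods")
intact_before = log.verify()

# Rewriting history breaks the chain from the edited entry onward.
log.entries[0]["detail"] = "search: nothing to see here"
intact_after = log.verify()
```

A chain like this only detects edits if the latest hash is also stored somewhere the writer cannot change, such as a separate system or a periodic signed checkpoint.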

That's the four-layer stack. If you're building agents for a team, not just yourself, you're already paying the cost of this problem. The question is whether you're paying it systematically or ad hoc.

Looking to understand your agent infrastructure needs? The evaluation questions above are a good starting point. If you're running agents on multiple runtimes or managing agents for teams, the control plane gap is probably something you've already hit.
