Paul Twist

Posted on Jun 19

The Multi-Runtime Agent Problem: Why Your Team Needs More Than One Runtime

#agents #infrastructure #llm #litellm

You're a platform lead at a 150-person company. Your ML team is building a data agent on Anthropic's Claude Managed Agents. Your DevOps team wrote their own scheduling runtime. Your security team wants a custom sandboxed environment. Your frontend team adopted Cursor's agent API for internal coding tasks.

Now your CEO asks: "Can we surface all of these agents to the company in one place?"

Welcome to the multi-runtime agent problem. It's becoming the unglamorous infrastructure challenge that few people talk about.

The Runtime Fragmentation Reality

In 2026, teams don't run all agents on one platform. They can't. Different agent types need different runtimes:

Claude Managed Agents — Anthropic's first-party platform for natural-language task agents
Bedrock AgentCore — AWS's environment for agents that access AWS services
Cursor Agents API — agents that live inside code editors
Custom/Self-hosted runtimes — Daytona, E2B, or internal frameworks
Workflow platforms — N8N, Elastic agents, Databricks agents

Each runtime has a different API. Different session models. Different cost models. Different access control approaches. Different ways of invoking agents and waiting for results.

The problem emerges when you want to:

Centralize agent discovery — developers and non-technical users need to find and run agents, regardless of where they were built
Enforce unified governance — cost controls, access policies, audit logging that work across runtimes
Manage sessions consistently — persistent conversations that survive reboots, with memory that travels with the agent
Track outcomes in one place — observability, cost attribution, success metrics across all agents
Give teams access without console sprawl — developers shouldn't need Anthropic console access, AWS console access, and custom runtime dashboards

If you solve this with point-to-point integrations (Claude API → your UI, Bedrock API → your UI, etc.), you're building a fragile, expensive custom platform. If you ask engineers to pick a single runtime and standardize on it, you lose optionality.

Most teams pick neither solution. They leave agents in silos.

Why This Matters Now

The conversation on Reddit in May and June 2026 shifted from "should we build agents?" to "which agents should we run where?" The practical builders are asking:

Where do we get the biggest ROI per token spend?
Which runtime handles our specific task best?
How do we avoid rebuilding the same agent on three platforms?

This is economic rationality. Different agents have different homes.

But the infrastructure question is still open: how do you operate them all together?

The Control Plane Pattern

AI Gateway infrastructure is moving up the stack. Model runtimes are becoming managed, harnesses are becoming specialized, and gateways are becoming the control plane for agent work. LiteLLM is experimenting with this direction through LiteLLM Agent Platform, a unified agent control plane that lets teams register, invoke, observe, and govern agents across multiple runtimes.

Here's what this looks like in practice:

Before a control plane:

Engineering: Claude Managed Agents for task automation
Security: Custom Daytona sandbox for sensitive data
DevOps: N8N workflows for infrastructure changes
Result: Three separate dashboards, three cost centers, three auth systems, zero visibility into total agent spend

After a control plane:

One dashboard where all three agents are registered
One API for invoking any agent, regardless of runtime
One cost attribution system (agent cost, runtime cost, model cost, user cost)
One access control layer (which team can run which agent, with what budgets)
One session store (agent sessions persist across invocations)

The control plane doesn't replace the runtimes. It sits above them. It also separates concerns with the LLM gateway underneath: the gateway stays responsible for model routing, cost tracking, and rate limiting, while the control plane handles sandbox lifecycle, session persistence, and the management dashboard.
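To make "one API for invoking any agent" concrete, here is a minimal sketch of a registry that hides each runtime behind a common adapter. Everything in it is hypothetical: the `AgentRegistry` class, the adapter signature, and the runtime labels are illustrations, not the LiteLLM Agent Platform API, and the stub adapters stand in for real runtime clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical adapter signature: takes a prompt, returns the agent's text output.
RuntimeAdapter = Callable[[str], str]


@dataclass
class RegisteredAgent:
    name: str
    runtime: str  # illustrative label, e.g. "claude-managed" or "n8n"
    adapter: RuntimeAdapter


class AgentRegistry:
    """Toy control plane: one registry and one invoke() entry point."""

    def __init__(self) -> None:
        self._agents: Dict[str, RegisteredAgent] = {}

    def register(self, agent: RegisteredAgent) -> None:
        if agent.name in self._agents:
            raise ValueError(f"agent already registered: {agent.name}")
        self._agents[agent.name] = agent

    def list_agents(self) -> List[str]:
        return sorted(self._agents)

    def invoke(self, name: str, prompt: str) -> dict:
        agent = self._agents[name]  # raises KeyError for unknown agents
        output = agent.adapter(prompt)
        return {"agent": name, "runtime": agent.runtime, "output": output}


# Stub adapters; a real one would call the runtime's own API.
def fake_claude_adapter(prompt: str) -> str:
    return f"[claude-managed] {prompt}"


def fake_n8n_adapter(prompt: str) -> str:
    return f"[n8n] {prompt}"


registry = AgentRegistry()
registry.register(RegisteredAgent("data-analysis", "claude-managed", fake_claude_adapter))
registry.register(RegisteredAgent("infra-change", "n8n", fake_n8n_adapter))
```

Callers only ever see `registry.invoke(name, prompt)`. Swapping the runtime behind an agent means swapping its adapter, not changing every caller.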

What a Multi-Runtime Agent Platform Needs

If you're evaluating whether a control plane fits your team, look for these capabilities:

Runtime abstraction — Can it talk to Claude Managed Agents, Bedrock AgentCore, self-hosted runtimes, and custom APIs? Or is it locked to one?

Persistent sessions — If an agent is stateful (remembers context, tools, artifacts), does the platform persist that session across reboots, or does every invocation start from scratch?

Unified access control — Can you grant "engineers can invoke the data-analysis agent, but not the financial-reporting agent" across all runtimes, using one policy language?

Cost governance — Does it track agent spend, enforce per-team budgets, and attribute costs correctly across runtimes and models?

Observability — Can you see which agent was invoked, by whom, when, with what result, and what it cost—regardless of runtime?

Easy onboarding — Do developers need to learn a new API for each runtime, or is there one interface?
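The access control and cost governance checks above can be sketched as a single authorization step that runs before any runtime is called. This is an assumption-laden illustration: `TeamPolicy`, `Governance`, and the dollar figures are made up for the example and do not describe any real product's policy language.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class TeamPolicy:
    allowed_agents: Set[str]
    monthly_budget_usd: float
    spent_usd: float = 0.0


class Governance:
    """Hypothetical runtime-agnostic policy check: who may run what, within which budget."""

    def __init__(self, policies: Dict[str, TeamPolicy]) -> None:
        self._policies = policies

    def authorize(self, team: str, agent: str, estimated_cost_usd: float) -> Tuple[bool, str]:
        policy = self._policies.get(team)
        if policy is None:
            return False, "unknown team"
        if agent not in policy.allowed_agents:
            return False, "agent not allowed for team"
        if policy.spent_usd + estimated_cost_usd > policy.monthly_budget_usd:
            return False, "budget exceeded"
        return True, "ok"

    def record_spend(self, team: str, cost_usd: float) -> None:
        # Called after an invocation completes, with the actual cost.
        self._policies[team].spent_usd += cost_usd


gov = Governance({"engineering": TeamPolicy({"data-analysis"}, monthly_budget_usd=100.0)})
```

Because the check keys on team and agent name rather than on runtime, the same rule ("engineers can invoke the data-analysis agent, but not the financial-reporting agent") applies wherever the agent actually runs.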

LiteLLM Agent Platform provides one place to call all your agents across OpenCode, Hermes, Claude Managed Agents, Cursor Agents API, and DeepAgents. It offers a unified API to create and run agents regardless of the runtime underneath, access controls so developers can create and run agents without needing Bedrock or Anthropic console access, and persistent agent sessions across runs.
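What "persistent agent sessions across runs" means in practice is easiest to see in a small sketch. The one below keeps session messages in SQLite using Python's standard `sqlite3` module, so history outlives the process that wrote it. The `SessionStore` class and its table layout are assumptions for illustration, not how any particular platform stores sessions.

```python
import json
import sqlite3
from typing import List


class SessionStore:
    """Hypothetical session store: messages are written to disk, keyed by session id."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        self._conn.commit()

    def append(self, session_id: str, role: str, content: str) -> None:
        payload = json.dumps({"role": role, "content": content})
        self._conn.execute(
            "INSERT INTO messages (session_id, payload) VALUES (?, ?)",
            (session_id, payload),
        )
        self._conn.commit()

    def history(self, session_id: str) -> List[dict]:
        # Ordered by insertion id so the conversation replays in order.
        rows = self._conn.execute(
            "SELECT payload FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    def close(self) -> None:
        self._conn.close()
```

A new process that opens the same database file and calls `history(session_id)` gets the earlier messages back, which is the property that lets a stateful agent survive a reboot.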

The Data Plane Still Matters

Here's where the other half of the infrastructure picture comes in. The control plane handles orchestration, governance, and multi-runtime abstraction. But agents make many LLM calls, and every one of them passes through the gateway.

For coding agents like Claude Code that fan out many LLM calls per task, every millisecond of gateway overhead compounds across tool calls. This is why sub-millisecond gateway overhead on the hot path matters.
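A quick back-of-the-envelope calculation shows how the overhead adds up. The numbers below are illustrative, not measurements, and the function assumes the calls in a task run one after another, so the per-call overhead simply sums.

```python
def total_gateway_overhead_ms(calls_per_task: int, overhead_ms_per_call: float) -> float:
    """Added latency per task when every call pays the same gateway overhead in sequence."""
    return calls_per_task * overhead_ms_per_call


# Hypothetical task that makes 200 sequential LLM calls.
for overhead_ms in (0.5, 5.0, 50.0):
    added = total_gateway_overhead_ms(200, overhead_ms)
    print(f"{overhead_ms} ms per call adds {added} ms per task")
```

Under these assumptions, 0.5 ms per call adds 100 ms to a 200-call task, while 50 ms per call adds 10 seconds.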

So production agent infrastructure needs both layers:

Data plane (fast gateway) — responsible for sub-millisecond routing, provider translation, and minimal overhead
Control plane (agent platform) — responsible for sessions, governance, discovery, cost attribution, and multi-runtime orchestration
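Cost attribution, one of the control-plane responsibilities listed above, comes down to recording each invocation once and slicing the totals by whichever dimension you need. Here is a rough sketch; the `InvocationRecord` fields and the sample costs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

DIMENSIONS = {"agent", "runtime", "model", "user"}


@dataclass(frozen=True)
class InvocationRecord:
    agent: str
    runtime: str
    model: str
    user: str
    cost_usd: float


def rollup(records: Iterable[InvocationRecord], dimension: str) -> Dict[str, float]:
    """Sum cost per value of one dimension (agent, runtime, model, or user)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)


# Made-up sample data.
records = [
    InvocationRecord("data-analysis", "claude-managed", "model-a", "ana", 0.25),
    InvocationRecord("data-analysis", "claude-managed", "model-a", "ben", 0.5),
    InvocationRecord("infra-change", "n8n", "model-b", "ana", 1.0),
]
```

The same list answers "what does the data-analysis agent cost?" via `rollup(records, "agent")` and "what is each user spending?" via `rollup(records, "user")`, which is only possible when every runtime reports into one record format.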

LiteLLM-Rust is a minimal, MIT-licensed Rust AI Gateway built for coding agents. It's drop-in compatible with existing LiteLLM config.yaml and database, targets sub-millisecond overhead on Claude Code calls, and includes sandboxing (E2B + Daytona) with durable sessions, memory, artifacts, and vault on the roadmap.

Teams using both together get a single control plane for agent management and a fast data plane for LLM routing.

The Practical Question

Do you need a multi-runtime agent control plane today?

Yes, if:

You have agents on more than one runtime or framework
You want one place to discover, run, and govern all agents
You're managing agent access for non-engineers
You need to enforce cost controls across agent teams
Your agents are stateful and need to survive reboots

Maybe later, if:

You have standardized on a single runtime (all Claude Managed Agents, all N8N, etc.)
You're still prototyping and the fragmentation doesn't hurt yet
You haven't hit the "we need to see total agent spend" moment

Not yet, if:

You have one agent, built in one place, run by one team

But watch the conversation. In 2026, more teams are hitting the multi-runtime case, and the infrastructure that solves it is still being designed and built.

Interested in trying the control plane pattern? LiteLLM Agent Platform is currently in alpha public preview. You can get started locally with Docker Desktop and docker compose, with no cloud credentials needed. For production, you can use EKS for sandboxes and Render for the web/worker; the LiteLLM Gateway stays as the model/router layer.

Feedback and contributions are welcome: https://github.com/BerriAI/litellm-agent-platform