Why the Inline Harness Matters: Your Agent Control Plane Just Got Lighter
Production teams are running agent control planes now. But many hit a wall: they looked at per-pod sandbox requirements and decided the infrastructure overhead wasn't worth it.
Last month, LiteLLM Agent Platform shipped the inline harness, and it changes that math. The feature looks small, but it is the difference between "we'll try this later" and "we can run this in production today."
The Pod Problem
Let me back up. When you run a coding agent (Claude Code, OpenCode, Cursor), you need:
- A control plane: one place to create agents, manage sessions, view history, enforce budgets, and handle access control
- A runtime: Claude Code, OpenCode, or Cursor executing the agent logic
- Isolation: each agent needs its own sandbox, its own environment variables, and its own file system
The standard approach is one pod per agent or per team. It gives clean isolation and an obvious deployment model. But for a team running 5–20 agents across an engineering org with one pod each, that's 5–20 pods. Each pod carries:
- Startup latency (30–60s first run)
- Memory overhead (200–500MB baseline)
- Infrastructure management complexity (horizontal scaling, crash recovery, resource requests)
- Cost multiplication across redundancy and regions
Many teams looked at this and said: "Great control plane, but we're not running 20 pods for agents. We'll stick with direct Anthropic console access."
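To make the overhead concrete, here is a back-of-the-envelope tally of baseline memory alone, using the ranges quoted above (5–20 pods, 200–500MB each). It's a rough sketch, not a measurement: the function name is illustrative, and it assumes one pod per agent with identical pods.

```python
# Rough tally of per-pod baseline memory, using the ranges quoted in this post.
# Assumption: one pod per agent, all pods the same size.

def baseline_memory_mb(num_pods: int, per_pod_mb: int) -> int:
    """Total baseline memory for a fleet of identical pods, in MB."""
    return num_pods * per_pod_mb

low = baseline_memory_mb(num_pods=5, per_pod_mb=200)    # smallest fleet, lightest pods
high = baseline_memory_mb(num_pods=20, per_pod_mb=500)  # largest fleet, heaviest pods

print(f"Baseline memory: {low} MB to {high} MB per region")
```

That range is before any agent does real work, and it multiplies again with every extra region or replica.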
What the Inline Harness Changes
The inline harness is a shared OpenCode harness that ships as a first-class option in the harness picker, with no per-agent pod required. Skills, MCP tools, system prompts, and memory all carry over.
This means:
- One OpenCode runtime handles multiple agents, shared across your team
- Session-scoped memory still works — search_memory and save_memory are available in inline sessions, with secret scrubbing on save
- Built-in MCP integration — Linear, Slack, and GitHub MCP servers are wired into the inline harness out of the box
- No infrastructure — it runs wherever your LiteLLM Agent Platform control plane runs
You get the control plane benefits (session persistence, budget enforcement, audit trails, team access, credential vault) without the per-pod cost.
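To illustrate what "session-scoped memory with secret scrubbing on save" can look like, here is a minimal sketch. It is not the platform's implementation: the SessionMemory class, the regex patterns, and the "[REDACTED]" placeholder are all assumptions for illustration. Only the search_memory and save_memory names come from the feature itself.

```python
import re

# Hypothetical patterns for illustration; a real scrubber would cover many more formats.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),    # strings shaped like GitHub "ghp_" tokens
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),   # strings shaped like "sk-" API keys
]


class SessionMemory:
    """Toy in-process store: notes are keyed by session, secrets are scrubbed on save."""

    def __init__(self):
        self._notes = {}  # session_id -> list of scrubbed notes

    def save_memory(self, session_id: str, text: str) -> str:
        scrubbed = text
        for pattern in SECRET_PATTERNS:
            scrubbed = pattern.sub("[REDACTED]", scrubbed)
        self._notes.setdefault(session_id, []).append(scrubbed)
        return scrubbed

    def search_memory(self, session_id: str, query: str) -> list:
        # Case-insensitive substring match, limited to the caller's own session.
        notes = self._notes.get(session_id, [])
        return [note for note in notes if query.lower() in note.lower()]


memory = SessionMemory()
saved = memory.save_memory("session-1", "Deploy uses token ghp_" + "a" * 36 + " for pushes")
print(saved)
print(memory.search_memory("session-2", "deploy"))  # a different session sees nothing
```

The point of the sketch is the ordering: scrubbing happens before the note is stored, so a later search can never return the raw secret.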
What This Enables
The inline harness is the inflection point where production teams move from "we'll manage agents in the console" to "we'll run them on our control plane."
For teams with 5–15 agents:
You can now run all of them on a single OpenCode harness, shared across the team. Infrastructure cost drops from "one pod per agent × baseline overhead × regions" to "one shared harness." Agents still have session isolation, memory, scheduled execution, and full LiteLLM governance.
For teams starting with agents:
You're not forced to choose between lightweight (no control plane) and heavy (per-pod infrastructure). You start with the inline harness, get control plane benefits right away, and upgrade to per-pod if you hit the scaling ceiling.
For teams evaluating LiteLLM Agent Platform:
The objection "per-pod is too heavy for our org" is now off the table. You can deploy a lightweight, shared harness inside 24 hours and gain visibility into all your agents immediately.
The Operational Impact
Let's be concrete. Suppose you're an engineering team with three coding agents:
- Agent 1: PR reviewer (runs on schedule, touches GitHub API)
- Agent 2: Code quality checker (runs ad-hoc, touches linting APIs)
- Agent 3: Dependency updater (runs on schedule, touches package manager APIs)
Without inline harness (old model):
- 3 pods (or 1 pod with 3 harnesses)
- 3 env var sets (GitHub token, linting keys, package keys)
- Agent 1 can't safely share its GitHub token with Agent 2
- If a shared pod crashes, every agent on it restarts
- Scaling is "add more pods"
With inline harness:
- 1 shared OpenCode runtime (part of your LiteLLM Agent Platform control plane)
- Environment variables are configured per agent and shown as key/value pairs on the agent detail page
- Each agent has scoped credentials (the vault proxy handles token management per agent)
- One agent's memory is isolated from another's
- Scaling is "increase control plane capacity," which is usually just database and API server
- Infrastructure cost drops by roughly 70% in this example
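The idea of scoped credentials on a shared runtime can be sketched in a few lines. This is a toy model, not how the vault proxy actually works: the ScopedVault class, the agent IDs, and the secret name are hypothetical. It only shows the property that matters, which is that an agent can read a secret only if it was explicitly granted.

```python
class ScopedVault:
    """Toy stand-in for a vault proxy: each agent sees only the secrets granted to it."""

    def __init__(self):
        self._secrets = {}  # secret name -> value
        self._grants = {}   # agent id -> set of secret names it may read

    def put(self, name: str, value: str) -> None:
        self._secrets[name] = value

    def grant(self, agent_id: str, name: str) -> None:
        self._grants.setdefault(agent_id, set()).add(name)

    def get(self, agent_id: str, name: str) -> str:
        if name not in self._grants.get(agent_id, set()):
            raise PermissionError(f"{agent_id} is not granted {name}")
        return self._secrets[name]


vault = ScopedVault()
vault.put("GITHUB_TOKEN", "example-token")   # placeholder value, not a real token
vault.grant("pr-reviewer", "GITHUB_TOKEN")   # only the PR reviewer gets GitHub access

print(vault.get("pr-reviewer", "GITHUB_TOKEN"))
try:
    vault.get("code-quality-checker", "GITHUB_TOKEN")
except PermissionError as err:
    print(f"denied: {err}")
```

With per-agent grants, three agents can share one runtime without sharing one set of tokens.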
What This Doesn't Change
The inline harness is not a replacement for per-pod isolation when you need it. If you have:
- Heavy sandboxing requirements (untrusted code)
- Complex resource isolation (one agent shouldn't affect another's latency)
- Different version constraints across agents
You still have the per-pod option. The inline harness is the pragmatic default for teams with standard isolation needs.
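The three criteria above reduce to a simple rule of thumb, sketched here as a small helper. The AgentRequirements fields and the choose_harness function are illustrative names, not part of the platform; treat it as a checklist in code form.

```python
from dataclasses import dataclass


@dataclass
class AgentRequirements:
    """Hypothetical checklist mirroring the three cases listed above."""
    runs_untrusted_code: bool = False       # heavy sandboxing requirements
    needs_latency_isolation: bool = False   # one agent must not affect another's latency
    needs_pinned_versions: bool = False     # version constraints differ across agents


def choose_harness(req: AgentRequirements) -> str:
    """Return "per-pod" if any strong isolation need applies, otherwise "inline"."""
    if req.runs_untrusted_code or req.needs_latency_isolation or req.needs_pinned_versions:
        return "per-pod"
    return "inline"


print(choose_harness(AgentRequirements()))
print(choose_harness(AgentRequirements(runs_untrusted_code=True)))
```

Any single "yes" is enough to justify a pod for that agent; the rest of the fleet can stay on the shared harness.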
The Broader Pattern
This is how production infrastructure matures:
- First version — solve the hard problem (control plane, multi-runtime support)
- Second version — remove the blocker preventing adoption (per-pod overhead)
- Third version — teams confidently run it in production
LiteLLM Agent Platform is at step two. The inline harness removes the infrastructure objection, leaving only "is this the right control plane for my team?"
If you're evaluating agent control planes, test the inline harness first. Spend two days running it on your infrastructure. See what it means to have one place to create, run, and observe agents. Then decide if the per-pod model matters for your workload.
Because for most teams, it won't.
Next Steps
- Bring your existing skills: skills attached to an agent now load in both pod and inline OpenCode sessions, so you don't have to redesign your agent logic to fit the harness
- Test it: https://docs.litellm-agent-platform.ai/quickstart
- Join the community: https://discord.gg/Nkxw3rm3EE
The inline harness looks like a small feature, but it unlocks a big change: production teams can now run agent control planes at scale without per-pod infrastructure complexity.
Paul Twist — European AI engineer & technical writer. I turn messy AI infrastructure into practical guides developers can actually use. Berlin-based, focused on production agent systems and open infrastructure.
Tag your agent control plane evaluation in the comments — what's holding you back from running agents on your infrastructure?