DEV Community

Paul Twist


The Missing Operational Layer Between Agent Prototypes and Production

You've built a working AI agent. It runs locally. It handles tool calls. It reasons through problems. You demo it to your team and everyone's impressed.

Then someone asks: "What happens when we deploy this to production?"

That's when reality hits: agents in production aren't like stateless web services. They're stateful systems that carry conversation history, tool results, intermediate reasoning, and context across multiple turns. If your container restarts—routine deployment, VM replacement, node upgrade—that entire session is gone. Your agent loses its memory. Work in progress disappears.

Add to that the operational complexity of teams: you can't let the marketing team's agent access the engineering team's secrets. You need isolation, access control, observability, and session persistence across infrastructure changes. But you don't want to rebuild all of that yourself.

This gap—between "working local agent" and "production-grade agent platform"—is exactly what most agent frameworks leave for you to solve.

What Production Agents Actually Need

If you're running agents at scale, you're managing:

  1. Session persistence: Agent state survives pod restarts, deployments, node replacements
  2. Per-team/per-context isolation: One agent can't access another's tools, secrets, or execution context
  3. Credential management: Secrets get injected safely into sandboxes without ever exposing real keys
  4. Lifecycle orchestration: Creating agent contexts, managing their execution, cleaning up when done
  5. Observability: Logs, traces, execution history—all queryable for debugging and compliance
  6. Runtime abstraction: Support multiple agent types (Claude Code, Codex, custom agents) on the same platform

Most agent frameworks handle logic. Few handle infrastructure.
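To make the first item concrete, here is a minimal sketch of session persistence: conversation history is written to a durable store on every turn, so a fresh process can pick it up after a restart. This is an illustration only, not the platform's code. The table layout and the function names (open_store, append_turn, load_history) are hypothetical, and sqlite3 stands in for Postgres so the example runs anywhere.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) the session store. sqlite3 is a stand-in for Postgres."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per session, history serialized as JSON.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "session_id TEXT PRIMARY KEY, history TEXT NOT NULL)"
    )
    return conn


def load_history(conn, session_id):
    """Return the stored turns for a session, or an empty list if none exist."""
    row = conn.execute(
        "SELECT history FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []


def append_turn(conn, session_id, turn):
    """Append one turn and commit, so the write survives a process restart."""
    history = load_history(conn, session_id)
    history.append(turn)
    conn.execute(
        "INSERT OR REPLACE INTO sessions (session_id, history) VALUES (?, ?)",
        (session_id, json.dumps(history)),
    )
    conn.commit()
```

The design choice that matters is committing after every turn rather than holding history in process memory. A restarted worker calls load_history with the same session ID and continues from there.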

The Self-Hosted Control Plane Approach

BerriAI recently open-sourced the LiteLLM Agent Platform—a self-hosted infrastructure layer for running multiple AI agents in production, built on top of the LiteLLM AI Gateway.

The design separates concerns cleanly:

  • LiteLLM Gateway handles model routing, cost tracking, rate limiting, and provider integrations across 100+ LLM APIs
  • LiteLLM Agent Platform handles sandbox lifecycle, session persistence, and the control plane dashboard

The platform uses the kubernetes-sigs/agent-sandbox CRD to teach your Kubernetes cluster how to manage agent sandboxes as first-class resources, the same way it manages pods or deployments.

This matters because it means:

  • Each agent session gets an isolated Kubernetes sandbox
  • Session history lives in a persistent Postgres store
  • Pod restarts don't lose agent context
  • Teams can't cross-contaminate secrets or data
  • The platform is a layer on top of your existing infrastructure, not a replacement
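One way to picture the secrets side of that isolation is a vault that hands each sandbox an opaque handle instead of the real key, and resolves the handle only for the team that owns it. The sketch below is an assumption about how such a scheme could look, not a description of the platform's credential vault. The CredentialVault class and its methods are hypothetical.

```python
import secrets


class CredentialVault:
    """Hypothetical vault: sandboxes hold opaque handles, never real keys."""

    def __init__(self):
        self._secrets = {}  # (team, name) -> real secret value
        self._handles = {}  # opaque handle -> (team, name)

    def store(self, team, name, value):
        """Register a real secret under a team's namespace."""
        self._secrets[(team, name)] = value

    def issue_handle(self, team, name):
        """Give a sandbox a random handle that refers to one of its team's secrets."""
        if (team, name) not in self._secrets:
            raise KeyError(f"{team} has no secret named {name}")
        handle = "handle-" + secrets.token_hex(8)
        self._handles[handle] = (team, name)
        return handle

    def resolve(self, handle, requesting_team):
        """Swap a handle for the real value. Meant to run outside the sandbox."""
        team, name = self._handles[handle]
        if team != requesting_team:
            raise PermissionError(f"{requesting_team} cannot use a handle owned by {team}")
        return self._secrets[(team, name)]
```

In this sketch the agent only ever sees the handle. A trusted component outside the sandbox would call resolve at request time, so a leaked handle is useless to another team.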

Why This Pattern Works for Teams

The platform addresses production concerns by providing two primitives: per-team/per-context sandboxes and persistent session management across pod restarts and upgrades.

Example flow:

  1. Engineering team creates a Claude Code agent to write a feature
  2. Agent runs in an isolated sandbox with only the secrets it needs (via credential vault)
  3. Agent executes tool calls—runs tests, pushes branches, reads logs
  4. Kubernetes node dies mid-execution
  5. Pod restarts, agent resumes from the exact same session state
  6. Marketing team's agent in the next sandbox over has zero access to engineering's context

No credential explosion. No session loss. No cross-contamination.
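Steps 3 through 5 of the flow above can be sketched as a checkpointed loop: each completed step is recorded durably, and a restarted run skips whatever is already done. This is a simplified illustration under stated assumptions, not the platform's mechanism. The run_steps function, the JSON checkpoint file, and the crash_before parameter (used only to simulate the node failure) are all hypothetical, and resumption here is per step rather than mid-step.

```python
import json
import os


def run_steps(steps, checkpoint_path, crash_before=None):
    """Run (name, action) steps in order, skipping steps a previous run finished."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    for name, action in steps:
        if name in done:
            continue  # already completed before the restart
        if name == crash_before:
            raise RuntimeError("simulated node failure")
        action()
        done.append(name)
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Calling run_steps a second time with the same checkpoint path plays the role of the restarted pod: completed steps are not repeated, and the remaining ones run in order.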

The quickstart is two commands: bin/kind-up.sh provisions a kind cluster, and docker compose up boots Postgres and starts the web process on port 3000. For production, the recommended path is AWS EKS for the sandbox cluster.

Where This Fits Your Stack

You don't need LiteLLM Agent Platform if:

  • You're running a single agent in a script
  • You're prototyping and don't care about state persistence yet
  • You're using a managed agent platform (AWS Bedrock Agents, Claude Agents with Anthropic hosting)

You do need it (or something like it) if:

  • You're scaling agents across teams and need strict isolation
  • You need session history to survive infrastructure changes
  • You're running agents on your own Kubernetes infrastructure
  • You care about observability and audit trails
  • You need support for multiple agent runtimes (Claude Code, Codex, etc.)

The platform is open source under the MIT license and currently in alpha preview. The architecture is designed to be straightforward to operate: Postgres for state, Kubernetes for orchestration, and a Next.js dashboard for management.

The Broader Point

The conversation around AI agents has been focused on capability: which agent can solve harder problems, which framework is easier to build with, which model is smarter. That's important, but it misses something: the infrastructure layer between prototype and production is where most of the real cost and complexity live.

Agent frameworks solve the logic problem. Hosted platforms solve the operations problem at the cost of control and data residency. Self-hosted infrastructure platforms let you solve both without vendor lock-in.

That's the operational layer production teams actually need.


Want to dig deeper? Check out the LiteLLM Agent Platform repo and the official documentation. The quickstart is genuinely two commands if you have Docker and Kubernetes locally.

If you're running agents in production today, what's your biggest operational pain point? Session persistence, isolation, observability, or something else? I'm curious how teams are solving this.
