In 2025, a researcher embedded a prompt injection in a code file. When an AI agent opened it, the agent read .env credentials and sent them over the network using commands that were on the agent's allowlist. No confirmation prompt fired. No safety check triggered. The credentials were gone. CVE-2025-55284.
That agent was running locally.
Imagine it had access to your production database.
This is the gap between "my agent works" and "my agent is safe to deploy." Every framework helps you build agents. None of them solve what happens when agents touch real data, real users, and real consequences.
This guide is about the second part: what production actually requires, which frameworks handle what, and how to ship agents that will not embarrass you at your next security review. Or, if you just want agents running safely today: skip to the fast path.
The demo-to-production gap
Here is the gap in one table:
| | Your laptop | Production |
|---|---|---|
| Auth | Your API key, hardcoded | Per-user tokens, scoped, rotated |
| Permissions | Agent can do anything | Least-privilege, per-tool, per-resource |
| Audit | `print(result)` | Immutable log: who asked, what ran, what happened |
| Errors | Restart the script | Retry, fallback, alert, degrade |
| Cost | $0.50 per demo | $50k/month without guardrails |
| Security | Trust the model | Zero-trust, sandboxed, validated |
| Users | You | 500 people, concurrently |
If any row in the "Production" column is not handled, you do not have a production agent. You have a demo with a public URL.
Every framework, one honest table
You have probably looked at some of these. Here is what they actually give you for production, and what they leave to you:
| Framework | What it handles well | What you still build |
|---|---|---|
| CrewAI | Multi-agent orchestration, role-based teams, human-in-the-loop | Auth, RBAC, audit trail, cost control, persistence (defaults to SQLite) |
| LangGraph | Stateful graphs, checkpointing, observability (via LangSmith) | Multi-tenant auth, security boundaries between agents, cost control; production features require LangSmith (proprietary) |
| OpenAI Agents SDK | Clean agent-to-agent handoffs, guardrails, minimal abstraction | Multi-tenant isolation, audit trail, cost control; locked to OpenAI models |
| Claude Agent SDK | Tool allowlists, lifecycle hooks, in-process MCP tools | Multi-agent coordination, checkpointing, managed deployment, cost management |
| Vercel AI SDK | Streaming, model-agnostic tool use, deploys anywhere | Stateless by default: no persistence, no agent registry, no human-in-the-loop |
| Mastra | TypeScript-native, multi-runtime (Node, Bun, Deno, Workers) | Auth, audit trail, multi-tenant; newer framework, smaller ecosystem |
| Hermes (Nous Research) | Self-hosted function-calling models (8B-70B+), no API costs | Everything else; Hermes is a model layer, not a framework, so you build the entire agent stack |
| Letta (MemGPT) | Persistent memory (core + archival + recall), agents that learn | Horizontal scaling, RBAC, multi-tenant; auth is server-level password only |
Notice the pattern? The "What you still build" column is almost identical across all 8. Auth. Permissions. Audit. Multi-tenant. Cost control.
That is not a framework problem. That is an infrastructure problem. And frameworks do not solve infrastructure.
5 problems that will bite you in production
I could list 10. But you will remember 5. These are the ones that cause real incidents, real cost overruns, and real failed audits.
1. Your agent has no identity
Who is this agent acting for? Most agents authenticate with a static API key, hardcoded or pulled from an env var. That key cannot be scoped per-user, cannot be revoked per-session, and if a prompt injection leaks it, your entire system is compromised.
The correct pattern: agents authenticate like users. Each agent has its own identity. When it acts on behalf of a human, it exchanges that human's token for a downscoped credential (OAuth token exchange, RFC 8693). Every action is tied to both the agent and the user who triggered it.
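In code, the exchange is a standard OAuth form POST to your IdP's token endpoint. A minimal sketch of building the request body (the client ID and scope are hypothetical, and the endpoint URL is omitted):

```python
import urllib.parse

def build_token_exchange_request(user_token: str, agent_client_id: str, scope: str) -> str:
    """Build an RFC 8693 token-exchange form body: the agent trades the
    user's token for a downscoped credential limited to `scope`."""
    params = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "client_id": agent_client_id,
        # Request LESS than the user's full grant -- this is the downscoping.
        "scope": scope,
    }
    return urllib.parse.urlencode(params)

body = build_token_exchange_request("eyJ...user-token", "agent-support-bot", "orders.read")
```

The credential that comes back is tied to both identities: revoke either one and the token stops working.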
In practice, almost nobody does this. It is too complex to implement from scratch on top of a framework.
In RootCX: agents authenticate through the same OIDC layer as humans (Okta, Microsoft Entra, Google Workspace, Auth0). Each agent gets its own identity. Actions are tied to both agent and user. One auth system, humans and agents.
2. Your agent can do too much
OWASP calls this LLM06: Excessive Agency. Three root causes: too many tools available, too-broad permissions on each tool, and no human confirmation before high-impact actions.
Your agent has access to the database. Can it read all tables? Can it write? Can it DELETE? Can it see the salary table? The HR records? The financial data?
"But my prompt says not to" is not a security control. The CVE-2025-53773 exploit against GitHub Copilot proved this: a command injection via prompt injection enabled arbitrary code execution on the developer's machine. The model did exactly what it was told. By the attacker.
The fix: tool allowlists enforced at the infrastructure level, not the model level. Not "please don't access HR data" in a system prompt, but a permission engine that rejects the query before it reaches the database.
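A minimal sketch of what infrastructure-level enforcement means (role and tool names are hypothetical): the check wraps every tool invocation, so a prompt-injected model has no way to route around it.

```python
class ToolPermissionError(Exception):
    pass

# Hypothetical allowlist: role -> set of permitted tool names.
ALLOWLIST = {
    "support": {"orders.read", "orders.update", "email.send"},
}

def invoke_tool(role: str, tool: str, call, *args, **kwargs):
    """Reject the call before it reaches the tool. Enforcement lives
    here, in the dispatcher, not in the system prompt."""
    if tool not in ALLOWLIST.get(role, set()):
        raise ToolPermissionError(f"role {role!r} may not call {tool!r}")
    return call(*args, **kwargs)
```

The model can ask for `salary.read` all it wants; the dispatcher raises before any query is issued.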
In RootCX: RBAC applies to agents and humans identically. Namespaced permissions (`orders.read`, `orders.update`, `salary.deny`). The platform enforces them on every action. An agent assigned the "support" role cannot see data that the "support" role does not allow. See how RBAC works.
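To make the namespaced-permission idea concrete, here is an illustrative matcher (not RootCX's actual engine) where wildcards grant a whole namespace and an explicit `.deny` grant overrides any allow:

```python
from fnmatch import fnmatch

def is_allowed(grants: list[str], action: str) -> bool:
    """Evaluate namespaced permissions like 'orders.*'.
    An explicit '<namespace>.deny' grant beats any wildcard allow.
    Illustrative sketch only."""
    namespace = action.rsplit(".", 1)[0]
    if f"{namespace}.deny" in grants:
        return False
    return any(fnmatch(action, g) for g in grants)

support_grants = ["orders.*", "customers.read", "salary.deny"]
```

With these grants, `orders.update` passes, `salary.read` is blocked by the deny rule, and `customers.delete` fails because nothing grants it.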
3. Nobody knows what your agent did
Your agent updated a customer record. Refunded an order. Sent a follow-up email. Three days later, the customer complains they never authorized the refund.
What happened? When? Which agent? Triggered by whom? What parameters? What was the result?
If you cannot answer in 30 seconds, you do not have a production system. SOC 2 requires demonstrating that automated systems have access controls and monitoring. HIPAA requires audit controls for any system touching patient data.
LangSmith gives you traces. That is the closest any framework gets. But traces are developer tooling, not compliance evidence. You need an immutable audit trail at the data layer, not the application layer (where the agent could theoretically bypass it).
In RootCX: every action (human or agent) is logged at the database trigger level. Immutable. Queryable by agent, user, resource, time. Not application-level logging. Built into the platform.
4. A fired employee's agent is still running
Someone leaves the company. IT disables their Okta account. Their agent? Still running. Still authenticated. Still accessing data. For hours. Maybe days. Until something crashes or someone notices.
This is the same offboarding problem from SSO, but worse. Because the agent runs in the background. Nobody is looking at it. It does not "log out" when the person leaves the building.
You need: server-side sessions with short TTLs, token refresh that checks the IdP on every renewal, and automatic session kill when refresh fails.
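A sketch of the renewal check, with `idp_is_active` standing in for a real IdP lookup (names and the 5-minute TTL are assumptions, not a specific product's API):

```python
import time

class Session:
    def __init__(self, user_id: str, ttl_seconds: int = 300):
        self.user_id = user_id
        self.expires_at = time.time() + ttl_seconds

def refresh(session: Session, idp_is_active):
    """On every renewal, re-check the IdP. If the user was disabled,
    kill the session instead of extending it."""
    if not idp_is_active(session.user_id):
        return None  # session dies; the agent loses access on its next call
    return Session(session.user_id)
```

The short TTL bounds the damage window: an orphaned agent survives at most one TTL past the moment the account is disabled.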
In RootCX: disable the user in your IdP, every agent running under their authority loses access within minutes. Sessions killed on failed token refresh. No orphaned agents running with revoked credentials.
5. One bad prompt burns $10,000
An agent without cost guardrails is a credit card with no limit, operated by a non-deterministic system.
The math: at 95% per-step accuracy, a 10-step agent succeeds 60% of the time. A 20-step agent: 36%. A 50-step agent: 8%. Failed steps still cost tokens. Retries cost more tokens. An agent stuck in a loop at 3am will keep burning money until you wake up.
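The compounding is just per-step accuracy raised to the number of steps. A two-line sanity check:

```python
def success_rate(per_step: float, steps: int) -> float:
    # Probability that every one of `steps` independent steps succeeds.
    return per_step ** steps

rates = {n: round(success_rate(0.95, n), 2) for n in (10, 20, 50)}
# rates == {10: 0.6, 20: 0.36, 50: 0.08}
```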
You need: token budgets per session, hard iteration limits (stop after N tool calls regardless of state), spending caps per agent per day, and circuit breakers (if error rate exceeds threshold, pause everything and alert).
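All four guards fit in one small class that the agent loop calls after every step. A sketch with illustrative thresholds (tune them to your workload):

```python
class BudgetExceeded(Exception):
    pass

class CostGuard:
    """Hard limits enforced outside the model: the loop stops when a
    counter trips, regardless of what the agent 'wants' to do next."""
    def __init__(self, max_tokens=50_000, max_iterations=20, max_error_rate=0.5):
        self.max_tokens = max_tokens
        self.max_iterations = max_iterations
        self.max_error_rate = max_error_rate
        self.tokens = 0
        self.iterations = 0
        self.errors = 0

    def record(self, tokens_used: int, errored: bool = False):
        self.tokens += tokens_used
        self.iterations += 1
        self.errors += int(errored)
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget exhausted: {self.tokens}")
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded(f"iteration limit hit: {self.iterations}")
        # Circuit breaker: after a warm-up, an error spike pauses everything.
        if self.iterations >= 5 and self.errors / self.iterations > self.max_error_rate:
            raise BudgetExceeded("circuit breaker: error rate too high")
```

The key property: `record()` raises, so the 3am loop dies on the spot instead of burning tokens until someone wakes up.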
No framework provides this. But your CFO will ask about it.
Real incidents that prove this is not theoretical
These all happened in 2025:
| Incident | What went wrong |
|---|---|
| CVE-2025-55284 (Claude Code) | Prompt injection in a code file triggered allowlisted commands to read credentials and send them over the network |
| CVE-2025-53773 (GitHub Copilot) | Command injection via prompt injection enabled arbitrary local code execution |
| WhatsApp MCP exfiltration (Invariant Labs) | Malicious MCP server tricked an agent into leaking private messages to an attacker-controlled endpoint |
| Cross-agent escalation | Agent A wrote malicious config to Agent B's directory, freeing Agent B from its sandbox |
| Manus AI kill chain | Prompt injection in a PDF triggered port exposure + credential exfiltration on the agent's VS Code Server |
The common thread: every exploit relied on the model deciding what to do, with security enforced at the model layer. The model followed instructions. From the attacker.
Security at the infrastructure layer (policy engines, permission systems, network isolation) would have blocked every one of these.
Agents on RootCX: what it looks like when infrastructure handles it
On RootCX, an AI agent is a first-class app. It deploys on the same platform as your internal tools and inherits everything.
Auth. OIDC with Okta, Microsoft Entra, Google Workspace, Auth0. Each agent has its own identity.
RBAC. Same permission model as humans. Namespaced, wildcards, inheritance. Define once, enforced everywhere.
Audit. Every action logged at the database trigger level. Immutable. Agent + user + resource + time.
Shared database. One PostgreSQL per project. Agents and apps read/write the same data. RBAC enforces who sees what.
Session revocation. Disable user in IdP, agent access dies in minutes.
Channels. Agents serve users via Slack and Telegram. Your team pilots operations from where they work.
MCP. Extend agent capabilities by plugging in any MCP server. No hardcoded integrations.
The agent does not need its own auth system, its own database, its own permission model, or its own logging. Those are structural. They exist before you write your first line of agent code.
Build with Claude Code, Cursor, or RootCX Studio. Deploy to your project. The infrastructure is already there.
SSO included on every plan, including free. No credit card required.
Start your project on RootCX and deploy your first agent today.
How to choose
| Situation | Path |
|---|---|
| Research prototype, just you | Any framework + your laptop |
| Production, you want control over infra | LangGraph + LangSmith (accept vendor lock-in) |
| Production, need auth + RBAC + audit now | RootCX (free tier, no credit card) |
| Custom orchestration, build the infra layer yourself | Claude Agent SDK + your own auth/audit stack |
| Full sovereignty, self-hosted models | Hermes + build everything from scratch |
Pre-launch checklist
Before your agent goes live, verify:
1. Agent has its own identity (not a shared API key)
2. Permissions follow least-privilege (only the tools and data it needs)
3. Every action is logged: who triggered, what agent, what parameters, what result
4. Disabling the user in your IdP revokes agent access within minutes
5. Token budget or iteration limit prevents runaway costs
6. High-impact actions require human confirmation
7. Agent cannot modify its own configuration or permissions
8. Data access enforced at infrastructure level (not prompt level)
9. Kill switch exists (stop all agent activity immediately)
10. Tested against adversarial inputs, not just happy-path
On RootCX, items 1-4 and 7-8 are structural. The rest depend on your agent design.
FAQ
What is the difference between an AI agent and a chatbot?
A chatbot answers questions. An agent acts. It reads data, calls tools, updates records, triggers workflows, follows up. A chatbot tells you the order status. An agent cancels the order, refunds the customer, and updates the CRM.
Which AI agent framework is best for production?
None are complete on their own. LangGraph + LangSmith is the most production-invested, but comes with vendor lock-in and still leaves auth/RBAC to you. For internal tools and enterprise agents, a platform approach removes the most infrastructure work. RootCX is purpose-built for this.
How do I secure an AI agent in production?
Security at the infrastructure layer, not the model layer. Tool allowlists (not blacklists), per-agent RBAC, short-lived tokens, immutable audit logs, network isolation. OWASP LLM06 (Excessive Agency) is the reference.
Can AI agents pass a SOC 2 audit?
Yes, if the infrastructure supports it. You need: identity-based access (not shared keys), immutable audit logs, least-privilege enforcement evidence, and immediate revocation capability. Infrastructure requirements, not framework features.
How do I prevent AI agents from running up costs?
Token budgets per session. Spending caps per agent per day. Iteration limits (max N tool calls per task). Circuit breakers (pause on error spikes). Cost attribution (tag every call to a user + agent for anomaly detection).
What is MCP and why does it matter for agents?
MCP (Model Context Protocol) standardizes how agents connect to external tools and data. Instead of hardcoding integrations, agents connect to MCP servers that expose capabilities. RootCX supports MCP natively, so agents reach any external system through one interface.
Build your agent with whatever framework you want. Deploy it on infrastructure that handles the hard parts. Ship it today.