You've wired up the LLM. Now it needs to actually do something.
You built a basic Claude script. It answers questions. It summarizes text. It's clever.
Then you try to make it do real work: read a file, execute some code, check the result, handle an error, loop back. Suddenly you're writing an agent loop from scratch. You're managing state across turns, sandboxing execution environments, isolating sessions per user, and debugging tool calls that silently fail.
That's the state problem. And it kills most agentic projects before they ship.
Claude Managed Agents is Anthropic's answer. Instead of building the harness yourself, you define what your agent should do and let the infrastructure handle the rest.
This guide is for builders who want to understand how it actually works, where it costs you, and when it's worth it.
What Managed Agents Actually Are
Forget the marketing framing for a second.
At its core, a Managed Agent is three things bound together:
- A model with a system prompt and a defined toolset
- An environment: a cloud container where the agent executes code, reads and writes files, and browses the web
- A session: a stateful run that tracks every event, every tool call, and every output
Without Managed Agents, you're responsible for all three. You manage the prompt loop, spin up your own execution sandbox, and figure out how to persist state between turns. It's doable. Teams do it. But it takes weeks to get right and longer to get safe.
Managed Agents collapses that into a single API. You define the what. Anthropic handles the where and how.
The Three Building Blocks
1. The Agent
The agent is the reusable definition. Think of it as a job description: the model to use, the system prompt, and the tools available.
import anthropic
client = anthropic.Anthropic()
agent = client.beta.agents.create(
name="Data Analysis Assistant",
model={"type": "model", "id": "claude-opus-4-7"},
system="You are a data analyst. Write clean Python, execute it, verify the output.",
tools=[{"type": "agent_toolset_20260401"}],
)
print(f"Agent ID: {agent.id}")
Save that agent.id. You'll reference it for every session you create. The agent definition is versioned, so you can pin sessions to specific versions while you iterate.
The agent_toolset_20260401 tool type gives Claude the full built-in suite: bash, file operations, web search, web browsing, code execution, and MCP server connections. You can also configure tools individually if you want tighter control.
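Version pinning itself isn't shown in this guide's examples. Here's a hedged sketch of what it plausibly looks like, reusing the agent-reference shape that appears in the Streamlit example later on; the version field is an assumption, not a documented parameter:
session = client.beta.sessions.create(
    agent={"type": "agent", "id": agent.id, "version": 3},  # "version" is assumed; check the docs
    environment_id=environment.id,  # environments are covered in the next section
)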
2. The Environment
The environment is the container where the agent does its work. Every session gets its own isolated instance.
environment = client.beta.environments.create(
name="python-data-env",
config={
"type": "cloud",
"networking": {"type": "unrestricted"},
"packages": {
"pip": ["pandas", "matplotlib", "numpy"]
}
}
)
print(f"Environment ID: {environment.id}")
A few things worth knowing here:
- Multiple sessions can reference the same environment definition, but each gets its own isolated container
- Pre-installed packages are cached across sessions that share the same environment, which cuts cold-start time
- Networking can be unrestricted or restricted (sandboxed) depending on what your agent needs, as in the sketch below
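For example, a locked-down variant of the environment above, identical except for the networking type (the restricted type also appears in the pipeline example later in this guide):
environment = client.beta.environments.create(
    name="python-data-env-restricted",
    config={
        "type": "cloud",
        "networking": {"type": "restricted"},  # no outbound network access
        "packages": {
            "pip": ["pandas", "matplotlib", "numpy"]
        }
    }
)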
The environment is where "AI that talks" becomes "AI that does."
3. The Session
The session is the actual running instance. It wires the agent to the environment and tracks everything that happens.
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
title="Q2 Sales Analysis"
)
print(f"Session ID: {session.id}")
Creating a session provisions the container but doesn't start any work. Work starts when you send an event.
Sending Work and Streaming Results
Sessions use server-sent events (SSE). You stream responses as they happen instead of waiting for a final answer.
with client.beta.sessions.events.stream(session.id) as stream:
# Send the task after the stream opens
client.beta.sessions.events.send(
session.id,
events=[{
"type": "user.message",
"content": [{
"type": "text",
"text": "Load sales_q2.csv, calculate total revenue by region, and save a summary to report.txt"
}]
}]
)
for event in stream:
match event.type:
case "agent.message":
for block in event.content:
print(block.text, end="")
case "agent.tool_use":
print(f"\n[Tool: {event.name}]")
case "session.status_idle":
print("\n\nDone.")
break
The agent will write the code, execute it, check the output file, and confirm success. You see every tool call in the stream. No black box.
The State Problem (and How Memory Stores Solve It)
By default, each session starts fresh. When it ends, everything the agent learned is gone.
That's fine for one-shot tasks. It's a problem for anything that needs to remember context across runs.
Memory stores solve this. They're persistent collections of text documents that mount directly into a session's container filesystem. The agent reads and writes them with the same file tools it uses for everything else.
# Create the store
store = client.beta.memory_stores.create(name="user-project-context")
# Attach it to a session
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
resources=[{
"type": "memory_store",
"memory_store_id": store.id,
"access": "read_write",
"instructions": "Project conventions and prior decisions. Check before starting any task."
}]
)
A few things that are easy to miss:
- Memory stores can be shared across sessions in read-only mode for reference material (team conventions, domain knowledge); see the sketch after this list
- The maximum is 8 stores per session
- Be careful with read-write access on stores that process untrusted input. If the agent handles user-supplied prompts or fetched web content, a prompt injection could write malicious content into the store. Later sessions then read it as trusted memory. Use read_only for anything the agent doesn't need to modify.
The Catch: memory stores are still in beta, so capacity and rate limits apply.
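Here's a minimal sketch of the shared, read-only pattern from the first bullet, using the same resource shape as above with only the access mode changed:
# A reference store the agent can read but never write
shared = client.beta.memory_stores.create(name="team-conventions")
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    resources=[{
        "type": "memory_store",
        "memory_store_id": shared.id,
        "access": "read_only",
        "instructions": "Team conventions and domain notes. Reference material only."
    }]
)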
The Economics: What This Actually Costs
This is the part most guides skip. Don't skip it.
Managed Agents uses two cost components:
- Standard token rates for the model you pick. For Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens. Haiku is cheaper, Opus is more expensive.
- $0.08 per session-hour for active container runtime. Idle time doesn't count.
So for a 5-step data analysis task that runs for 12 minutes:
| Component | Est. Usage | Cost |
|---|---|---|
| Input tokens | ~50,000 | ~$0.15 |
| Output tokens | ~10,000 | ~$0.15 |
| Session runtime (12 min) | 0.2 hours | ~$0.016 |
| Total | | ~$0.32 |
That's a reasonable number for a task that would take a developer 30 minutes to do manually. For lightweight tasks it stays cheap. For long-running loops without guardrails, it can surprise you.
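The arithmetic is simple enough to sanity-check inline:
# Back-of-envelope check for the table above, at Sonnet 4.6 rates
input_cost = 50_000 / 1_000_000 * 3.00     # ~$0.15
output_cost = 10_000 / 1_000_000 * 15.00   # ~$0.15
runtime_cost = 0.2 * 0.08                  # 12 minutes of active container time, ~$0.016
print(f"~${input_cost + output_cost + runtime_cost:.2f}")  # ~$0.32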
The infinite loop risk is real. Set a max_steps limit on your sessions. A loop that hallucinates a tool call, gets a bad result, and keeps retrying will burn tokens and session hours until you interrupt it. The console gives you session tracing and cost attribution per workflow, but the guardrail needs to be in your code.
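A minimal sketch of that guardrail, assuming max_steps is accepted at session creation; the parameter name comes from above, but its exact placement here is an assumption:
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    max_steps=25,  # assumption: caps agent iterations so a retry loop can't run unbounded
)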
Chatbot vs. Claude.ai vs. Managed Agents: Pick the Right Tool
The more important question isn't "how do managed agents work" but "should I even be building one?"
The answer depends on what your task actually requires. Here's a clean way to think through it.
Standard Chatbot (Claude API, Messages endpoint)
You send a message. You get a response. Done.
No state between turns unless you manually pass the conversation history. No tool execution unless you build the loop yourself. No sandboxed environment. Just tokens in, tokens out.
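For contrast, here's what tokens-in, tokens-out looks like with the Messages endpoint. Note that you carry the history yourself; the model ID is illustrative:
import anthropic
client = anthropic.Anthropic()
# Stateless by design: every turn, you resend the whole conversation
history = [{"role": "user", "content": "Summarize our Q2 sales thread in three bullets."}]
response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    messages=history,
)
history.append({"role": "assistant", "content": response.content})
# Next turn: append the new user message and call create() again with all of it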
Right choice when:
- You're building a Q&A assistant, a support bot, or a summarization tool
- Your users need fast responses (sub-second matters here)
- The task lives entirely in language: no files, no code, no external actions
- Cost sensitivity is high and you want the cheapest path
The ceiling: The moment your user says "can you run this for me" or "update that file" or "check the latest price," a standard chatbot hits a wall. You either fake it or build the infrastructure yourself.
Claude.ai (the chat product)
This is Claude with a UI, memory, file uploads, and some built-in tools like web search and code execution. It's powerful for personal use and knowledge work.
But it's a product, not a platform. You can't embed it in your own app, control the session lifecycle, route tasks programmatically, or isolate environments per user. What you see is what you get.
Right choice when:
- You're an individual or small team doing knowledge work
- You need a capable assistant without writing any code
- Your workflow is conversational, not automated
The ceiling: You can't build a product on top of it. There's no API surface for sessions, no custom agent definitions, no per-user isolation. It's a tool for using Claude, not a platform for deploying Claude.
Claude Managed Agents (API, beta)
This is where you cross from "using an AI" to "deploying an AI system."
The agent runs in a sandboxed container. It can execute code, read and write files, browse the web, call external services, and loop until a task is complete. Each user gets an isolated session. State can persist across turns via memory stores. You control the model, the tools, the environment, and the session lifecycle.
Right choice when:
- The task requires doing something, not just saying something
- You need per-user isolation at the execution layer, not just the prompt layer
- You're shipping this to real users and need production-grade session management
- The workflow is multi-step: the agent needs to check results, handle errors, and loop
The ceiling: Cost and latency. Container provisioning adds a few seconds of cold-start time. The session-hour runtime fee adds up for long-running or high-volume workloads. And the API is still in beta, so some rough edges remain.
The decision in one line
| | Chatbot | Claude.ai | Managed Agents |
|---|---|---|---|
| Stateful execution | No | Partial | Yes |
| Sandboxed environment | No | No | Yes |
| Per-user session isolation | No | No | Yes |
| Custom agent definition | Yes | No | Yes |
| Embeddable in your product | Yes | No | Yes |
| Cold-start latency | None | None | A few seconds |
| Runtime cost | Tokens only | Subscription | Tokens + $0.08/hr |
| Best for | Language tasks | Personal use | Autonomous workflows |
If your task lives entirely in language: use the Messages API.
If you're building for yourself: Claude.ai is probably enough.
If your task requires doing something in a real environment, across multiple steps, for multiple users: that's what Managed Agents is for.
A Real Workflow: File-to-Insight Pipeline
Here's a condensed end-to-end example. The agent takes an uploaded CSV, analyzes it, and writes a report.
import anthropic
client = anthropic.Anthropic()
# 1. Define the agent once
agent = client.beta.agents.create(
name="CSV Analyst",
model={"type": "model", "id": "claude-opus-4-7"},
system=(
"You are a data analyst. When given a CSV file path:\n"
"1. Load and inspect it\n"
"2. Identify key metrics\n"
"3. Write clean Python to calculate them\n"
"4. Execute the code and verify results\n"
"5. Write a plain-English summary to report.txt"
),
tools=[{"type": "agent_toolset_20260401"}]
)
# 2. Define the environment once
environment = client.beta.environments.create(
name="csv-analysis-env",
config={
"type": "cloud",
"networking": {"type": "restricted"}, # no outbound needed
"packages": {"pip": ["pandas"]}
}
)
# 3. Per-user session
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
title="Q2 Revenue Analysis"
)
# 4. Stream the task
with client.beta.sessions.events.stream(session.id) as stream:
client.beta.sessions.events.send(
session.id,
events=[{
"type": "user.message",
"content": [{"type": "text", "text": "Analyze /data/sales_q2.csv and write report.txt"}]
}]
)
for event in stream:
if event.type == "agent.tool_use":
print(f"[{event.name}]")
elif event.type == "agent.message":
for block in event.content:
print(block.text, end="")
elif event.type == "session.status_idle":
break
print("\nAnalysis complete.")
The agent handles the loop. You handle the output.
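One detail worth spelling out: report.txt lives inside the session's container. If you want its contents back in your application without reaching for any file APIs, a simple option is a follow-up message in the same session, reusing exactly the calls shown above:
# Ask the agent to echo the report back through the event stream
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text", "text": "Print the full contents of report.txt"}]
        }]
    )
    for event in stream:
        if event.type == "agent.message":
            for block in event.content:
                print(block.text, end="")
        elif event.type == "session.status_idle":
            break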
Wiring It Into an Application
Scripts are fine for testing. But at some point you need a real interface that a human can actually use.
The pattern here is straightforward. Your agent and environment are already created and saved. You reference their IDs, create a fresh session for each turn, and stream the response directly into your UI.
Here's a minimal Streamlit chat app that does exactly that:
import os
import anthropic
import streamlit as st
st.set_page_config(page_title="Agent Chat", page_icon="💬")
# Pull from env or let the user paste it in
API_KEY = os.getenv("ANTHROPIC_API_KEY") or st.text_input("API Key", type="password")
# Reference your already-created agent and environment by ID
AGENT_ID = "agent_01xxxxxxxxxxxxxxxxxxxxx"
ENV_ID = "env_014xxxxxxxxxxxxxxxxxxxxx"
if "messages" not in st.session_state:
st.session_state.messages = []
# Render conversation history
for role, content in st.session_state.messages:
with st.chat_message(role):
st.markdown(content)
prompt = st.chat_input("Message...")
if prompt and API_KEY:
st.session_state.messages.append(("user", prompt))
with st.chat_message("user"):
st.markdown(prompt)
with st.chat_message("assistant"):
placeholder = st.empty()
response = ""
client = anthropic.Anthropic(api_key=API_KEY)
# New session per conversation turn
session = client.beta.sessions.create(
agent={"type": "agent", "id": AGENT_ID},
environment_id=ENV_ID,
)
# Stream and render tokens as they arrive
with client.beta.sessions.events.stream(session_id=session.id) as stream:
client.beta.sessions.events.send(
session_id=session.id,
events=[{
"type": "user.message",
"content": [{"type": "text", "text": prompt}]
}],
)
for event in stream:
if event.type == "agent.message":
for block in event.content:
response += block.text
placeholder.markdown(response + "▌") # streaming cursor
elif event.type == "session.status_idle":
break
placeholder.markdown(response)
st.session_state.messages.append(("assistant", response))
A few things worth noting in this pattern:
The agent and environment are created once, used many times. You don't recreate them per request. You create them once during setup (or in a separate init script), save the IDs, and reference them here. Session creation is the only per-request API call.
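That one-time setup can be as small as this; the names and prompt are illustrative, the calls are the same ones from earlier sections:
# setup_once.py -- run once, save the printed IDs into your app config
import anthropic
client = anthropic.Anthropic()
agent = client.beta.agents.create(
    name="Chat Agent",
    model={"type": "model", "id": "claude-opus-4-7"},
    system="You are a helpful assistant. Use your tools to do real work.",
    tools=[{"type": "agent_toolset_20260401"}]
)
environment = client.beta.environments.create(
    name="chat-env",
    config={"type": "cloud", "networking": {"type": "restricted"}}
)
print(f"AGENT_ID = {agent.id!r}")
print(f"ENV_ID = {environment.id!r}")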
Each turn gets a fresh session. This keeps isolation clean. The conversation history lives in st.session_state, not in the session itself. If you need memory to persist across turns inside the agent's context, attach a memory store to each new session and let the agent write to it.
The streaming cursor (▌) is a small detail that matters. It gives users a visible signal that the agent is still working. Without it, a slow tool-execution step looks like the app is frozen. Token-by-token rendering with a cursor is the right default for any agent-backed UI.
The API key flow is practical for quick demos. For production, use os.getenv("ANTHROPIC_API_KEY") only and remove the st.text_input fallback. You don't want users pasting API keys into a public interface.
Failure Modes to Know Before You Ship
Session timeouts. Sessions have lifecycle limits. If a task runs longer than expected or the agent stalls, you need to handle re-connection or interruption gracefully in your client code.
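The SDK's timeout and error surface isn't covered here, so treat this as a generic client-side pattern rather than documented behavior; the exception handling is an assumption to adapt:
import time
TASK_BUDGET_SECONDS = 600  # ten minutes; tune per workflow
deadline = time.monotonic() + TASK_BUDGET_SECONDS
try:
    with client.beta.sessions.events.stream(session.id) as stream:
        for event in stream:
            if time.monotonic() > deadline:
                raise TimeoutError("task exceeded its time budget")
            # ... handle events exactly as in the earlier examples ...
except TimeoutError:
    # Interrupt or abandon the session and log the partial transcript.
    # How to resume a stalled session cleanly is API-specific -- check the docs.
    pass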
Tool-call hallucinations. The agent can call a tool with plausible but wrong parameters, get an error, and retry in a way that wastes steps. Inspect the event stream in your console during testing. The tracing is detailed enough to catch these patterns early.
State contamination. If you're using read-write memory stores with untrusted input, see the prompt injection warning above. Default to read_only until you have a reason not to.
Cold starts. First-session container provisioning takes a few seconds. Reusing environment definitions helps because packages are cached. Factor this into any latency-sensitive user-facing workflow.
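One mitigation follows directly from the session lifecycle described earlier: creating a session provisions the container without starting any work, so you can provision ahead of the first user message and pay the cold start before anyone is waiting on it:
# Provision early: the container spins up now
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
)
# ... later, when the user actually submits a task, send the first event.
# By that point the cold start has already been absorbed.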
Where to Go From Here
The API is in beta (managed-agents-2026-04-01 header required, though the SDK sets it automatically). Some features are in research preview: outcomes-based task delegation and multiagent orchestration require separate access requests.
Start small. Build a session that does one useful thing end-to-end. Watch the event stream. Check the console for cost attribution. Then extend it.
The infrastructure problem is already solved. The interesting question now is what you build with it.
Official docs: platform.claude.com/docs/en/managed-agents
Quickstart: Get started with Claude Managed Agents
Pricing: Token rates + $0.08/session-hour. Full details in the Managed Agents pricing docs.

