You've wired up the LLM. Now it needs to actually do something.
You built a basic Claude script. It answers questions. It summarizes text. It's clever.
Then you try to make it do real work: read a file, execute some code, check the result, handle an error, loop back. Suddenly you're writing an agent loop from scratch. You're managing state across turns, sandboxing execution environments, isolating sessions per user, and debugging tool calls that silently fail.
That's the state problem. And it kills most agentic projects before they ship.
Claude Managed Agents is Anthropic's answer. Instead of building the harness yourself, you define what your agent should do and let the infrastructure handle the rest.
This guide is for builders who want to understand how it actually works, where it costs you, and when it's worth it.
What Managed Agents Actually Are
Forget the marketing framing for a second.
At its core, a Managed Agent is three things bound together:
- A model with a system prompt and a defined toolset
- An environment: a cloud container where the agent executes code, reads and writes files, and browses the web
- A session: a stateful run that tracks every event, every tool call, and every output
Without Managed Agents, you're responsible for all three. You manage the prompt loop, spin up your own execution sandbox, and figure out how to persist state between turns. It's doable. Teams do it. But it takes weeks to get right and longer to get safe.
Managed Agents collapses that into a single API. You define the what. Anthropic handles the where and how.
The Three Building Blocks
1. The Agent
The agent is the reusable definition. Think of it as a job description: the model to use, the system prompt, and the tools available.
import anthropic
client = anthropic.Anthropic()
agent = client.beta.agents.create(
name="Data Analysis Assistant",
model={"type": "model", "id": "claude-opus-4-7"},
system="You are a data analyst. Write clean Python, execute it, verify the output.",
tools=[{"type": "agent_toolset_20260401"}],
)
print(f"Agent ID: {agent.id}")
Save that agent.id. You'll reference it for every session you create. The agent definition is versioned, so you can pin sessions to specific versions while you iterate.
The agent_toolset_20260401 tool type gives Claude the full built-in suite: bash, file operations, web search, web browsing, code execution, and MCP server connections. You can also configure tools individually if you want tighter control.
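Version pinning itself isn't shown in this guide's examples. Here's a hedged sketch of what it plausibly looks like, reusing the agent-reference shape that appears in the Streamlit example later on; the version field is an assumption, not a documented parameter:
session = client.beta.sessions.create(
    agent={"type": "agent", "id": agent.id, "version": 3},  # "version" is assumed; check the docs
    environment_id=environment.id,  # environments are covered in the next section
)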
2. The Environment
The environment is the container where the agent does its work. Every session gets its own isolated instance.
environment = client.beta.environments.create(
name="python-data-env",
config={
"type": "cloud",
"networking": {"type": "unrestricted"},
"packages": {
"pip": ["pandas", "matplotlib", "numpy"]
}
}
)
print(f"Environment ID: {environment.id}")
A few things worth knowing here:
- Multiple sessions can reference the same environment definition, but each gets its own isolated container
- Pre-installed packages are cached across sessions that share the same environment, which cuts cold-start time
- Networking can be unrestricted or restricted (sandboxed) depending on what your agent needs, as in the sketch below
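For example, a locked-down variant of the environment above, identical except for the networking type (the restricted type also appears in the pipeline example later in this guide):
environment = client.beta.environments.create(
    name="python-data-env-restricted",
    config={
        "type": "cloud",
        "networking": {"type": "restricted"},  # no outbound network access
        "packages": {
            "pip": ["pandas", "matplotlib", "numpy"]
        }
    }
)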
The environment is where "AI that talks" becomes "AI that does."
3. The Session
The session is the actual running instance. It wires the agent to the environment and tracks everything that happens.
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
title="Q2 Sales Analysis"
)
print(f"Session ID: {session.id}")
Creating a session provisions the container but doesn't start any work. Work starts when you send an event.
Sending Work and Streaming Results
Sessions use server-sent events (SSE). You stream responses as they happen instead of waiting for a final answer.
with client.beta.sessions.events.stream(session.id) as stream:
# Send the task after the stream opens
client.beta.sessions.events.send(
session.id,
events=[{
"type": "user.message",
"content": [{
"type": "text",
"text": "Load sales_q2.csv, calculate total revenue by region, and save a summary to report.txt"
}]
}]
)
for event in stream:
match event.type:
case "agent.message":
for block in event.content:
print(block.text, end="")
case "agent.tool_use":
print(f"\n[Tool: {event.name}]")
case "session.status_idle":
print("\n\nDone.")
break
The agent will write the code, execute it, check the output file, and confirm success. You see every tool call in the stream. No black box.
The State Problem (and How Memory Stores Solve It)
By default, each session starts fresh. When it ends, everything the agent learned is gone.
That's fine for one-shot tasks. It's a problem for anything that needs to remember context across runs.
Memory stores solve this. They're persistent collections of text documents that mount directly into a session's container filesystem. The agent reads and writes them with the same file tools it uses for everything else.
# Create the store
store = client.beta.memory_stores.create(name="user-project-context")
# Attach it to a session
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
resources=[{
"type": "memory_store",
"memory_store_id": store.id,
"access": "read_write",
"instructions": "Project conventions and prior decisions. Check before starting any task."
}]
)
A few things that are easy to miss:
- Memory stores can be shared across sessions in read-only mode for reference material (team conventions, domain knowledge); see the sketch after this list
- The maximum is 8 stores per session
- Be careful with read-write access on stores that process untrusted input. If the agent handles user-supplied prompts or fetched web content, a prompt injection could write malicious content into the store. Later sessions then read it as trusted memory. Use read_only for anything the agent doesn't need to modify.
The Catch: memory stores are still in beta, so capacity and rate limits apply.
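Here's a minimal sketch of the shared, read-only pattern from the first bullet, using the same resource shape as above with only the access mode changed:
# A reference store the agent can read but never write
shared = client.beta.memory_stores.create(name="team-conventions")
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    resources=[{
        "type": "memory_store",
        "memory_store_id": shared.id,
        "access": "read_only",
        "instructions": "Team conventions and domain notes. Reference material only."
    }]
)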
The Economics: What This Actually Costs
This is the part most guides skip. Don't skip it.
Managed Agents uses two cost components:
- Standard token rates for the model you pick. For Claude Sonnet 4.6: $3 per million input tokens, $15 per million output tokens. Haiku is cheaper, Opus is more expensive.
- $0.08 per session-hour for active container runtime. Idle time doesn't count.
So for a 5-step data analysis task that runs for 12 minutes:
| Component | Est. Usage | Cost |
|---|---|---|
| Input tokens | ~50,000 | ~$0.15 |
| Output tokens | ~10,000 | ~$0.15 |
| Session runtime (12 min) | 0.2 hours | ~$0.016 |
| Total | | ~$0.32 |
That's a reasonable number for a task that would take a developer 30 minutes to do manually. For lightweight tasks it stays cheap. For long-running loops without guardrails, it can surprise you.
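The arithmetic is simple enough to sanity-check inline:
# Back-of-envelope check for the table above, at Sonnet 4.6 rates
input_cost = 50_000 / 1_000_000 * 3.00     # ~$0.15
output_cost = 10_000 / 1_000_000 * 15.00   # ~$0.15
runtime_cost = 0.2 * 0.08                  # 12 minutes of active container time, ~$0.016
print(f"~${input_cost + output_cost + runtime_cost:.2f}")  # ~$0.32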
The infinite loop risk is real. Set a max_steps limit on your sessions. A loop that hallucinates a tool call, gets a bad result, and keeps retrying will burn tokens and session hours until you interrupt it. The console gives you session tracing and cost attribution per workflow, but the guardrail needs to be in your code.
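A minimal sketch of that guardrail, assuming max_steps is accepted at session creation; the parameter name comes from above, but its exact placement here is an assumption:
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    max_steps=25,  # assumption: caps agent iterations so a retry loop can't run unbounded
)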
Chatbot vs. Claude.ai vs. Managed Agents: Pick the Right Tool
The more important question isn't "how do managed agents work" but "should I even be building one?"
The answer depends on what your task actually requires. Here's a clean way to think through it.
Standard Chatbot (Claude API, Messages endpoint)
You send a message. You get a response. Done.
No state between turns unless you manually pass the conversation history. No tool execution unless you build the loop yourself. No sandboxed environment. Just tokens in, tokens out.
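For contrast, here's what tokens-in, tokens-out looks like with the Messages endpoint. Note that you carry the history yourself; the model ID is illustrative:
import anthropic
client = anthropic.Anthropic()
# Stateless by design: every turn, you resend the whole conversation
history = [{"role": "user", "content": "Summarize our Q2 sales thread in three bullets."}]
response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model ID
    max_tokens=1024,
    messages=history,
)
history.append({"role": "assistant", "content": response.content})
# Next turn: append the new user message and call create() again with all of it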
Right choice when:
- You're building a Q&A assistant, a support bot, or a summarization tool
- Your users need fast responses (sub-second matters here)
- The task lives entirely in language: no files, no code, no external actions
- Cost sensitivity is high and you want the cheapest path
The ceiling: The moment your user says "can you run this for me" or "update that file" or "check the latest price," a standard chatbot hits a wall. You either fake it or build the infrastructure yourself.
Claude.ai (the chat product)
This is Claude with a UI, memory, file uploads, and some built-in tools like web search and code execution. It's powerful for personal use and knowledge work.
But it's a product, not a platform. You can't embed it in your own app, control the session lifecycle, route tasks programmatically, or isolate environments per user. What you see is what you get.
Right choice when:
- You're an individual or small team doing knowledge work
- You need a capable assistant without writing any code
- Your workflow is conversational, not automated
The ceiling: You can't build a product on top of it. There's no API surface for sessions, no custom agent definitions, no per-user isolation. It's a tool for using Claude, not a platform for deploying Claude.
Claude Managed Agents (API, beta)
This is where you cross from "using an AI" to "deploying an AI system."
The agent runs in a sandboxed container. It can execute code, read and write files, browse the web, call external services, and loop until a task is complete. Each user gets an isolated session. State can persist across turns via memory stores. You control the model, the tools, the environment, and the session lifecycle.
Right choice when:
- The task requires doing something, not just saying something
- You need per-user isolation at the execution layer, not just the prompt layer
- You're shipping this to real users and need production-grade session management
- The workflow is multi-step: the agent needs to check results, handle errors, and loop
The ceiling: Cost and latency. Container provisioning adds a few seconds of cold-start time. The session-hour runtime fee adds up for long-running or high-volume workloads. And the API is still in beta, so some rough edges remain.
The decision in one line
| | Chatbot | Claude.ai | Managed Agents |
|---|---|---|---|
| Stateful execution | No | Partial | Yes |
| Sandboxed environment | No | No | Yes |
| Per-user session isolation | No | No | Yes |
| Custom agent definition | Yes | No | Yes |
| Embeddable in your product | Yes | No | Yes |
| Cold-start latency | None | None | A few seconds |
| Runtime cost | Tokens only | Subscription | Tokens + $0.08/hr |
| Best for | Language tasks | Personal use | Autonomous workflows |
If your task lives entirely in language: use the Messages API.
If you're building for yourself: Claude.ai is probably enough.
If your task requires doing something in a real environment, across multiple steps, for multiple users: that's what Managed Agents is for.
A Real Workflow: File-to-Insight Pipeline
Here's a condensed end-to-end example. The agent takes an uploaded CSV, analyzes it, and writes a report.
import anthropic
client = anthropic.Anthropic()
# 1. Define the agent once
agent = client.beta.agents.create(
name="CSV Analyst",
model={"type": "model", "id": "claude-opus-4-7"},
system=(
"You are a data analyst. When given a CSV file path:\n"
"1. Load and inspect it\n"
"2. Identify key metrics\n"
"3. Write clean Python to calculate them\n"
"4. Execute the code and verify results\n"
"5. Write a plain-English summary to report.txt"
),
tools=[{"type": "agent_toolset_20260401"}]
)
# 2. Define the environment once
environment = client.beta.environments.create(
name="csv-analysis-env",
config={
"type": "cloud",
"networking": {"type": "restricted"}, # no outbound needed
"packages": {"pip": ["pandas"]}
}
)
# 3. Per-user session
session = client.beta.sessions.create(
agent=agent.id,
environment_id=environment.id,
title="Q2 Revenue Analysis"
)
# 4. Stream the task
with client.beta.sessions.events.stream(session.id) as stream:
client.beta.sessions.events.send(
session.id,
events=[{
"type": "user.message",
"content": [{"type": "text", "text": "Analyze /data/sales_q2.csv and write report.txt"}]
}]
)
for event in stream:
if event.type == "agent.tool_use":
print(f"[{event.name}]")
elif event.type == "agent.message":
for block in event.content:
print(block.text, end="")
elif event.type == "session.status_idle":
break
print("\nAnalysis complete.")
The agent handles the loop. You handle the output.
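One detail worth spelling out: report.txt lives inside the session's container. If you want its contents back in your application without reaching for any file APIs, a simple option is a follow-up message in the same session, reusing exactly the calls shown above:
# Ask the agent to echo the report back through the event stream
with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{
            "type": "user.message",
            "content": [{"type": "text", "text": "Print the full contents of report.txt"}]
        }]
    )
    for event in stream:
        if event.type == "agent.message":
            for block in event.content:
                print(block.text, end="")
        elif event.type == "session.status_idle":
            break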
Wiring It Into an Application
Scripts are fine for testing. But at some point you need a real interface that a human can actually use.
The pattern here is straightforward. Your agent and environment are already created and saved. You reference their IDs, create a fresh session for each turn, and stream the response directly into your UI.
Here's a minimal Streamlit chat app that does exactly that:
import os
import anthropic
import streamlit as st
st.set_page_config(page_title="Agent Chat", page_icon="💬")
# Pull from env or let the user paste it in
API_KEY = os.getenv("ANTHROPIC_API_KEY") or st.text_input("API Key", type="password")
# Reference your already-created agent and environment by ID
AGENT_ID = "agent_01xxxxxxxxxxxxxxxxxxxxx"
ENV_ID = "env_014xxxxxxxxxxxxxxxxxxxxx"
if "messages" not in st.session_state:
st.session_state.messages = []
# Render conversation history
for role, content in st.session_state.messages:
with st.chat_message(role):
st.markdown(content)
prompt = st.chat_input("Message...")
if prompt and API_KEY:
st.session_state.messages.append(("user", prompt))
with st.chat_message("user"):
st.markdown(prompt)
with st.chat_message("assistant"):
placeholder = st.empty()
response = ""
client = anthropic.Anthropic(api_key=API_KEY)
# New session per conversation turn
session = client.beta.sessions.create(
agent={"type": "agent", "id": AGENT_ID},
environment_id=ENV_ID,
)
# Stream and render tokens as they arrive
with client.beta.sessions.events.stream(session_id=session.id) as stream:
client.beta.sessions.events.send(
session_id=session.id,
events=[{
"type": "user.message",
"content": [{"type": "text", "text": prompt}]
}],
)
for event in stream:
if event.type == "agent.message":
for block in event.content:
response += block.text
placeholder.markdown(response + "▌") # streaming cursor
elif event.type == "session.status_idle":
break
placeholder.markdown(response)
st.session_state.messages.append(("assistant", response))
A few things worth noting in this pattern:
The agent and environment are created once, used many times. You don't recreate them per request. You create them once during setup (or in a separate init script), save the IDs, and reference them here. Session creation is the only per-request API call.
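That one-time setup can be as small as this; the names and prompt are illustrative, the calls are the same ones from earlier sections:
# setup_once.py -- run once, save the printed IDs into your app config
import anthropic
client = anthropic.Anthropic()
agent = client.beta.agents.create(
    name="Chat Agent",
    model={"type": "model", "id": "claude-opus-4-7"},
    system="You are a helpful assistant. Use your tools to do real work.",
    tools=[{"type": "agent_toolset_20260401"}]
)
environment = client.beta.environments.create(
    name="chat-env",
    config={"type": "cloud", "networking": {"type": "restricted"}}
)
print(f"AGENT_ID = {agent.id!r}")
print(f"ENV_ID = {environment.id!r}")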
Each turn gets a fresh session. This keeps isolation clean. The conversation history lives in st.session_state, not in the session itself. If you need memory to persist across turns inside the agent's context, attach a memory store to each new session and let the agent write to it.
The streaming cursor (▌) is a small detail that matters. It gives users a visible signal that the agent is still working. Without it, a slow tool-execution step looks like the app is frozen. Token-by-token rendering with a cursor is the right default for any agent-backed UI.
The API key flow is practical for quick demos. For production, use os.getenv("ANTHROPIC_API_KEY") only and remove the st.text_input fallback. You don't want users pasting API keys into a public interface.
Failure Modes to Know Before You Ship
Session timeouts. Sessions have lifecycle limits. If a task runs longer than expected or the agent stalls, you need to handle re-connection or interruption gracefully in your client code.
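The SDK's timeout and error surface isn't covered here, so treat this as a generic client-side pattern rather than documented behavior; the exception handling is an assumption to adapt:
import time
TASK_BUDGET_SECONDS = 600  # ten minutes; tune per workflow
deadline = time.monotonic() + TASK_BUDGET_SECONDS
try:
    with client.beta.sessions.events.stream(session.id) as stream:
        for event in stream:
            if time.monotonic() > deadline:
                raise TimeoutError("task exceeded its time budget")
            # ... handle events exactly as in the earlier examples ...
except TimeoutError:
    # Interrupt or abandon the session and log the partial transcript.
    # How to resume a stalled session cleanly is API-specific -- check the docs.
    pass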
Tool-call hallucinations. The agent can call a tool with plausible but wrong parameters, get an error, and retry in a way that wastes steps. Inspect the event stream in your console during testing. The tracing is detailed enough to catch these patterns early.
State contamination. If you're using read-write memory stores with untrusted input, see the prompt injection warning above. Default to read_only until you have a reason not to.
Cold starts. First-session container provisioning takes a few seconds. Reusing environment definitions helps because packages are cached. Factor this into any latency-sensitive user-facing workflow.
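One mitigation follows directly from the session lifecycle described earlier: creating a session provisions the container without starting any work, so you can provision ahead of the first user message and pay the cold start before anyone is waiting on it:
# Provision early: the container spins up now
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
)
# ... later, when the user actually submits a task, send the first event.
# By that point the cold start has already been absorbed.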
Where to Go From Here
The API is in beta (managed-agents-2026-04-01 header required, though the SDK sets it automatically). Some features are in research preview: outcomes-based task delegation and multiagent orchestration require separate access requests.
Start small. Build a session that does one useful thing end-to-end. Watch the event stream. Check the console for cost attribution. Then extend it.
The infrastructure problem is already solved. The interesting question now is what you build with it.
Official docs: platform.claude.com/docs/en/managed-agents
Quickstart: Get started with Claude Managed Agents
Pricing: Token rates + $0.08/session-hour. Full details in the Managed Agents pricing docs.

