Claude Managed Agents — The Complete Guide: Brain/Hands/Session Decoupled Architecture, MCP Connectors, and Multi-Agent Orchestration for Production AI
Published on ManoIT Tech Blog (Korean original). On April 8, 2026 Anthropic launched Claude Managed Agents — a fully hosted agent harness that handles the execution loop, sandboxing, tool orchestration, session persistence, and event streaming. This post dissects the architecture, walks through MCP connector wiring, demonstrates multi-agent orchestration patterns, and analyzes three real production deployments (Notion, Rakuten, Sentry).
1. Why Managed Agents — The Structural Limits of Rolling Your Own Agent Loop
If you have ever shipped an agent built on top of the Messages API, you know the pattern: wrap a while loop around `client.messages.create()`, parse `tool_use` blocks, execute tools in a local sandbox, feed `tool_result` back in, repeat until `stop_reason == "end_turn"`. It works — until it doesn't.
The failure modes are predictable: a runaway loop burns through your rate budget, a tool call hangs and there is no timeout, the sandbox leaks filesystem state between users, long-running sessions exceed context and lose memory, stream consumers drop events on reconnect, and observability boils down to log scraping. Every team that runs agents in production eventually builds the same harness — and every team's harness has different bugs.
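That homegrown loop, reduced to its skeleton, looks roughly like this. This is a minimal sketch, not a real harness: `run_local_tool` and its toy `echo` registry are illustrative stand-ins for sandboxed execution, and the model name is just a plausible choice.

```python
# A minimal sketch of the homegrown harness described above. `run_local_tool`
# and its toy tool registry are illustrative stand-ins for a real sandbox.
def run_local_tool(name: str, args: dict) -> str:
    tools = {"echo": lambda a: a["text"]}
    return tools[name](args) if name in tools else f"unknown tool: {name}"

def agent_loop(client, messages: list, tools: list) -> list:
    # Call the model, run any tool_use blocks locally, feed tool_result
    # back in, and repeat until the model stops asking for tools.
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages,
            tools=tools,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return messages
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_local_tool(block.name, block.input),
                }
                for block in resp.content
                if block.type == "tool_use"
            ],
        })
```

Every one of the failure modes above lives somewhere in this skeleton: no timeout around the tool call, no iteration cap on the loop, no isolation in the dispatcher.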
Managed Agents is Anthropic's answer: a hosted runtime that owns the execution loop, the sandbox, session persistence, and the event stream, so your application owns only the product logic.
2. Architecture Deep Dive — Brain / Hands / Session
The platform splits a running agent into three decoupled planes:
| Plane | Responsibility | Anthropic Ops |
|---|---|---|
| Brain | Inference — the LLM that reasons and decides | Model inference (Opus 4.7 / Sonnet 4.6) |
| Hands | Execution — the sandboxed environment that runs tools | Ubuntu sandbox, file I/O, bash, network allowlist |
| Session | State — the thread that accumulates messages, tool results, and events | Persistent storage, event bus, stream replay |
Why decoupling matters. In a homegrown harness the three planes collapse into one process. If inference is slow, tool execution queues. If a tool hangs, the loop stalls. If the process dies, session state is gone. The Managed Agents topology isolates each plane so that p50 TTFT drops ~60% and p95 TTFT drops >90% compared to naive loops, because tool execution and model inference overlap, and streams survive process restarts.
3. The Four Core Resources
Every Managed Agents deployment revolves around four REST resources:
| Resource | Purpose | Key fields |
|---|---|---|
| Agent | The blueprint — model, tools, system prompt, beta flags, MCP servers | `model`, `toolset`, `system`, `mcp_servers`, `vault_ids` |
| Environment | The sandbox template — base image, resource limits, network policy | `base_image`, `cpu`, `memory_mb`, `network_allowlist` |
| Session | A running instance — one Agent in one Environment, with a thread | `agent_id`, `environment_id`, `thread_id` |
| Events | The stream — every model output, tool call, result, and status change | SSE stream, typed event payloads |
The required beta header is `managed-agents-2026-04-01`, and the default toolset is `agent_toolset_20260401`, which bundles `bash`, `read`, `write`, `edit`, `glob`, `grep`, `web_search`, and `web_fetch`.
4. Quickstart — Your First Session in 10 Minutes
Below is the minimum code to create an Agent, an Environment, and a Session, then send it a message and consume the event stream.
```python
import anthropic

client = anthropic.Anthropic(
    default_headers={"anthropic-beta": "managed-agents-2026-04-01"}
)

# 1) Define the Agent (the blueprint)
agent = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    system="You are a production-grade coding agent. Always write tests.",
)

# 2) Define the Environment (the sandbox template)
env = client.environments.create(
    base_image="ubuntu-24.04-dev",
    cpu=2,
    memory_mb=4096,
    network_allowlist=["github.com", "pypi.org"],
)

# 3) Launch a Session
session = client.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
)

# 4) Send a message and stream events
stream = client.sessions.messages.send(
    session_id=session.id,
    content="Clone psf/requests, find the top 3 slowest tests, propose fixes.",
    stream=True,
)
for event in stream:
    if event.type == "agent.thread_message_received":
        print(event.data.content)
```
The session keeps running after the initial turn — you can attach more messages to the same thread_id and the agent resumes with full memory of earlier turns.
5. MCP Connectors — The Standard Interface for External Systems
Managed Agents natively supports MCP (Model Context Protocol) — but only the remote HTTP streamable transport, not stdio. Credentials live in Anthropic's Vault and never enter the sandbox.
```python
import os

vault_entry = client.vaults.secrets.create(
    name="github-pat",
    value=os.environ["GITHUB_PAT"],
)
linear_secret = client.vaults.secrets.create(
    name="linear-api-key",
    value=os.environ["LINEAR_API_KEY"],
)

agent = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    mcp_servers=[
        {
            "name": "github",
            "url": "https://mcp.github.com/streamable",
            "vault_ids": [vault_entry.id],
        },
        {
            "name": "linear",
            "url": "https://mcp.linear.app/streamable",
            "vault_ids": [linear_secret.id],
        },
    ],
    system="You triage GitHub issues and sync status to Linear.",
)
```
The credential isolation pattern is the killer feature: your production PAT never touches a sandbox process, so a prompt injection that convinces the agent to cat ~/.env returns nothing because the token lives one network hop away.
6. Multi-Agent — Supervisor Pattern over Session Threads
Managed Agents implements multi-agent orchestration through callable_agents (currently one-level delegation only, research preview). The supervisor receives the top-level goal and delegates subtasks to specialized agents over session threads.
```python
reviewer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You review diffs for security, performance, and style. Be specific.",
)

test_writer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You write pytest tests with >80% coverage. Include edge cases.",
)

orchestrator = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    callable_agents=[reviewer.id, test_writer.id],
    system=(
        "You orchestrate code review. Delegate review to the reviewer agent, "
        "delegate test writing to the test_writer agent, then synthesize."
    ),
)
```
Event types like session.thread_created and agent.thread_message_sent let you observe delegation as it happens. Rakuten built their Slack/Teams bot network on this pattern — one supervisor per workspace, domain sub-agents per channel, full deployment in one week.
7. Event Stream Design — Production SSE Consumption
The event stream is SSE-based and resumable. A production consumer must handle reconnect, replay, and backpressure.
```python
import time

import httpx
from httpx_sse import connect_sse  # pip install httpx-sse

def consume(session_id: str, last_event_id: str | None = None):
    backoff = 1.0
    while True:
        try:
            headers = {
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "managed-agents-2026-04-01",
                "Accept": "text/event-stream",
            }
            if last_event_id:
                headers["Last-Event-ID"] = last_event_id
            with httpx.Client(timeout=None) as http:
                with connect_sse(
                    http,
                    "GET",
                    f"https://api.anthropic.com/v1/sessions/{session_id}/events",
                    headers=headers,
                ) as source:
                    for event in source.iter_sse():
                        last_event_id = event.id
                        yield event
                        backoff = 1.0  # reset after each healthy event
        except httpx.HTTPError:
            time.sleep(min(backoff, 30))
            backoff *= 2
```

Note that plain `httpx` responses have no `iter_sse()`; SSE parsing comes from the companion `httpx-sse` package, whose `connect_sse` yields events with `.id`, `.event`, and `.data`.
Key event types to handle: session.thread_created, session.thread_idle, agent.thread_message_sent, agent.thread_message_received, agent.tool_use_started, agent.tool_use_completed, session.closed.
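A small router keeps per-event-type handler code out of the consume loop. The dict event shape (`{"type": ..., "data": ...}`) used here is an assumption for illustration; the real stream delivers typed payloads.

```python
from typing import Callable

Handler = Callable[[dict], None]

class EventRouter:
    """Dispatch typed stream events to registered handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event: dict) -> bool:
        """Return True if at least one handler consumed the event."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return bool(handlers)

router = EventRouter()
seen: list[str] = []
router.on("agent.tool_use_started", lambda e: seen.append(e["data"]["tool"]))
router.dispatch({"type": "agent.tool_use_started", "data": {"tool": "bash"}})
```

Unhandled event types return `False` from `dispatch`, which is a convenient hook for logging unknown events as the beta surface evolves.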
8. Research Preview Features — Outcomes, Memory, Multiagent
| Feature | What it does | Status |
|---|---|---|
| Outcomes | Structured success/failure signals the agent self-reports for eval pipelines | Research preview |
| Memory | Persistent semantic memory that survives session boundaries | Research preview |
| Multiagent | `callable_agents` — 1-level delegation between agents | Research preview |
All three require an allowlist opt-in. Expect breaking changes. Production-critical work should pin to GA features only.
9. Production Case Studies — Notion, Rakuten, Sentry
Notion — Workspace agents. Every Notion workspace gets a dedicated agent that reads from the workspace databases via the Notion MCP server. Users chat with it to summarize projects, draft pages, and route action items. Key insight: one agent per workspace gives durable per-tenant memory without cross-tenant leakage.
Rakuten — Slack/Teams sub-agents per domain. A supervisor agent fans out to domain-specialist sub-agents (logistics, finance, HR). They deployed in one week because Managed Agents erased the entire harness layer. Key insight: fast time-to-value comes from offloading infra, not from writing clever prompts.
Sentry — Debug + PR agent. When an error hits production, a session launches that reads Sentry context via MCP, clones the repo, reproduces the bug, proposes a fix, and opens a PR. Key insight: long-running sessions + event streams let engineers watch the agent work and intervene mid-flight.
10. Cost Model and Budget Design
Pricing is transparent: $0.08 per session-hour for compute plus standard token costs for inference. A ballpark estimator:
```python
def estimate_monthly_cost(
    sessions_per_day: int,
    avg_session_minutes: float,
    avg_input_tokens: int,
    avg_output_tokens: int,
    model: str = "opus-4-7",
) -> float:
    prices = {
        "opus-4-7": {"in": 15.0 / 1_000_000, "out": 75.0 / 1_000_000},
        "sonnet-4-6": {"in": 3.0 / 1_000_000, "out": 15.0 / 1_000_000},
    }
    p = prices[model]
    days = 30
    session_hours = sessions_per_day * (avg_session_minutes / 60) * days
    compute_cost = session_hours * 0.08
    inference_cost = sessions_per_day * days * (
        avg_input_tokens * p["in"] + avg_output_tokens * p["out"]
    )
    return compute_cost + inference_cost
```
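Plugging a hypothetical workload into the same arithmetic (100 sessions a day, 10 minutes each, 20k input and 5k output tokens per session, Opus pricing) makes the cost split concrete:

```python
# Hypothetical workload, priced with the same arithmetic as the estimator above.
session_hours = 100 * (10 / 60) * 30                 # 500 session-hours/month
compute = session_hours * 0.08                       # $40.00 of sandbox time
inference = 100 * 30 * (
    20_000 * 15.0 / 1_000_000                        # input tokens at Opus rates
    + 5_000 * 75.0 / 1_000_000                       # output tokens at Opus rates
)                                                    # $2025.00 of tokens
total = compute + inference
print(f"${total:,.2f}/month")                        # $2,065.00/month
```

At these numbers compute is a rounding error next to Opus tokens: model choice, not session duration, dominates the bill.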
Rate limits: 60 RPM for create endpoints, 600 RPM for read endpoints. Above that, open a support ticket for a quota increase.
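To stay under the 60 RPM create limit without tripping 429s, a client-side token bucket in front of the create calls is a common pattern. This is a sketch; the injectable clock exists only to make the limiter testable.

```python
import time

class TokenBucket:
    """Client-side limiter for the 60 RPM create-endpoint quota."""

    def __init__(self, rate_per_min: float, capacity: int, now=time.monotonic):
        self.rate = rate_per_min / 60.0   # tokens refilled per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def try_acquire(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if able.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# With a frozen clock the bucket drains after `capacity` acquisitions.
bucket = TokenBucket(rate_per_min=60, capacity=2, now=lambda: 0.0)
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```

When `try_acquire()` returns `False`, queue the create call rather than dropping it; the bucket refills at one token per second at the 60 RPM rate.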
11. Messages API vs Agent SDK vs Managed Agents — Decision Matrix
| Dimension | Messages API | Agent SDK (self-hosted harness) | Managed Agents |
|---|---|---|---|
| Control | Full | High | Medium |
| Infra ownership | All yours | All yours | Anthropic |
| Time-to-production | Weeks–months | Days–weeks | Hours–days |
| Sandbox | DIY | DIY | Managed |
| Session persistence | DIY | DIY | Managed |
| Event stream | DIY | DIY | Managed SSE |
| Multi-agent | DIY | DIY | Built-in (preview) |
| Best for | Simple single-turn tools | Research, custom harnesses | Production agent products |
12. ManoIT Production Checklist
Before going to production:
- Pin `anthropic-beta: managed-agents-2026-04-01` explicitly in every client.
- Put all credentials in Vault — never in `system` prompts, never in environment variables inside the sandbox.
- Implement `Last-Event-ID` resume on every SSE consumer.
- Set `network_allowlist` as tight as the workload allows. Deny by default.
- Gate `callable_agents` behind feature flags — it is a research preview.
- Log `session.id`, `thread.id`, and `event.id` on every inbound webhook; those are your correlation keys.
- Budget guardrails: pre-compute cost per session and kill long-runners over threshold.
- For multi-tenant SaaS: one Agent per tenant (Notion pattern) — do not share Agents across tenants.
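The budget-guardrail bullet can be as simple as a pure cost check run from your event consumer. The default prices below are the Opus token rates and the $0.08 session-hour rate from section 10; treat them as parameters, not constants.

```python
def should_kill(
    session_minutes: float,
    input_tokens: int,
    output_tokens: int,
    budget_usd: float,
    in_price: float = 15.0 / 1_000_000,    # Opus input $/token
    out_price: float = 75.0 / 1_000_000,   # Opus output $/token
    compute_per_hour: float = 0.08,        # session-hour price
) -> bool:
    """Return True once a running session has exceeded its budget."""
    cost = (
        (session_minutes / 60.0) * compute_per_hour
        + input_tokens * in_price
        + output_tokens * out_price
    )
    return cost > budget_usd

# One hour of compute plus a million Opus input tokens blows a $10 budget.
print(should_kill(60, 1_000_000, 0, budget_usd=10.0))  # True ($15.08 > $10)
```

Run the check on every `agent.tool_use_completed` event and close the session when it fires; that bounds the worst case to one tool call past budget.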
13. Conclusion — Agent Infrastructure Is Standardizing
Claude Managed Agents is the first production-grade managed harness from a frontier lab. It doesn't replace the Messages API or the Agent SDK — it stacks on top, giving teams the choice between low-level control and hosted convenience. The three Brain/Hands/Session planes, the MCP connector model, and the Vault credential isolation pattern will almost certainly show up in competing platforms within the year. If you are shipping an AI agent product in 2026, evaluate Managed Agents before you write another line of harness code.
This post was originally published on the ManoIT Tech Blog in Korean. The English version was adapted for dev.to. Written with assistance from Claude (Anthropic).