Claude Managed Agents — The Complete Guide: Brain/Hands/Session Decoupled Architecture, MCP Connectors, and Multi-Agent Orchestration for Production AI
Published on ManoIT Tech Blog (Korean original). On April 8, 2026 Anthropic launched Claude Managed Agents — a fully hosted agent harness that handles the execution loop, sandboxing, tool orchestration, session persistence, and event streaming. This post dissects the architecture, walks through MCP connector wiring, demonstrates multi-agent orchestration patterns, and analyzes three real production deployments (Notion, Rakuten, Sentry).
1. Why Managed Agents — The Structural Limits of Rolling Your Own Agent Loop
If you have ever shipped an agent built on top of the Messages API, you know the pattern: wrap a while loop around `client.messages.create()`, parse `tool_use` blocks, execute tools in a local sandbox, feed `tool_result` back in, repeat until `stop_reason == "end_turn"`. It works — until it doesn't.
The failure modes are predictable: a runaway loop burns through your rate budget, a tool call hangs and there is no timeout, the sandbox leaks filesystem state between users, long-running sessions exceed context and lose memory, stream consumers drop events on reconnect, and observability boils down to log scraping. Every team that runs agents in production eventually builds the same harness — and every team's harness has different bugs.
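That homegrown loop, reduced to its skeleton, looks roughly like this. This is a minimal sketch, not a real harness: `run_local_tool` and its toy `echo` registry are illustrative stand-ins for sandboxed execution, and the model name is just a plausible choice.

```python
# A minimal sketch of the homegrown harness described above. `run_local_tool`
# and its toy tool registry are illustrative stand-ins for a real sandbox.
def run_local_tool(name: str, args: dict) -> str:
    tools = {"echo": lambda a: a["text"]}
    return tools[name](args) if name in tools else f"unknown tool: {name}"

def agent_loop(client, messages: list, tools: list) -> list:
    # Call the model, run any tool_use blocks locally, feed tool_result
    # back in, and repeat until the model stops asking for tools.
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=messages,
            tools=tools,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return messages
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_local_tool(block.name, block.input),
                }
                for block in resp.content
                if block.type == "tool_use"
            ],
        })
```

Every one of the failure modes above lives somewhere in this skeleton: no timeout around the tool call, no iteration cap on the loop, no isolation in the dispatcher.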
Managed Agents is Anthropic's answer: a hosted runtime that owns the execution loop, the sandbox, session persistence, and the event stream, so your application owns only the product logic.
2. Architecture Deep Dive — Brain / Hands / Session
The platform splits a running agent into three decoupled planes:
| Plane | Responsibility | Anthropic Ops |
|---|---|---|
| Brain | Inference — the LLM that reasons and decides | Model inference (Opus 4.7 / Sonnet 4.6) |
| Hands | Execution — the sandboxed environment that runs tools | Ubuntu sandbox, file I/O, bash, network allowlist |
| Session | State — the thread that accumulates messages, tool results, and events | Persistent storage, event bus, stream replay |
Why decoupling matters. In a homegrown harness the three planes collapse into one process. If inference is slow, tool execution queues. If a tool hangs, the loop stalls. If the process dies, session state is gone. The Managed Agents topology isolates each plane so that p50 TTFT drops ~60% and p95 TTFT drops >90% compared to naive loops, because tool execution and model inference overlap, and streams survive process restarts.
3. The Four Core Resources
Every Managed Agents deployment revolves around four REST resources:
| Resource | Purpose | Key fields |
|---|---|---|
| Agent | The blueprint — model, tools, system prompt, beta flags, MCP servers | `model`, `toolset`, `system`, `mcp_servers`, `vault_ids` |
| Environment | The sandbox template — base image, resource limits, network policy | `base_image`, `cpu`, `memory_mb`, `network_allowlist` |
| Session | A running instance — one Agent in one Environment, with a thread | `agent_id`, `environment_id`, `thread_id` |
| Events | The stream — every model output, tool call, result, and status change | SSE stream, typed event payloads |
The required beta header is `managed-agents-2026-04-01`, and the default toolset is `agent_toolset_20260401`, which bundles `bash`, `read`, `write`, `edit`, `glob`, `grep`, `web_search`, and `web_fetch`.
4. Quickstart — Your First Session in 10 Minutes
Below is the minimum code to create an Agent, an Environment, and a Session, then send it a message and consume the event stream.
```python
import anthropic

client = anthropic.Anthropic(
    default_headers={"anthropic-beta": "managed-agents-2026-04-01"}
)

# 1) Define the Agent (the blueprint)
agent = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    system="You are a production-grade coding agent. Always write tests.",
)

# 2) Define the Environment (the sandbox template)
env = client.environments.create(
    base_image="ubuntu-24.04-dev",
    cpu=2,
    memory_mb=4096,
    network_allowlist=["github.com", "pypi.org"],
)

# 3) Launch a Session
session = client.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
)

# 4) Send a message and stream events
stream = client.sessions.messages.send(
    session_id=session.id,
    content="Clone psf/requests, find the top 3 slowest tests, propose fixes.",
    stream=True,
)
for event in stream:
    if event.type == "agent.thread_message_received":
        print(event.data.content)
```
The session keeps running after the initial turn — you can attach more messages to the same thread_id and the agent resumes with full memory of earlier turns.
5. MCP Connectors — The Standard Interface for External Systems
Managed Agents natively supports MCP (Model Context Protocol) — but only the remote HTTP streamable transport, not stdio. Credentials live in Anthropic's Vault and never enter the sandbox.
```python
import os

vault_entry = client.vaults.secrets.create(
    name="github-pat",
    value=os.environ["GITHUB_PAT"],
)
linear_secret = client.vaults.secrets.create(
    name="linear-api-key",
    value=os.environ["LINEAR_API_KEY"],
)

agent = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    mcp_servers=[
        {
            "name": "github",
            "url": "https://mcp.github.com/streamable",
            "vault_ids": [vault_entry.id],
        },
        {
            "name": "linear",
            "url": "https://mcp.linear.app/streamable",
            "vault_ids": [linear_secret.id],
        },
    ],
    system="You triage GitHub issues and sync status to Linear.",
)
```
The credential isolation pattern is the killer feature: your production PAT never touches a sandbox process, so a prompt injection that convinces the agent to cat ~/.env returns nothing because the token lives one network hop away.
6. Multi-Agent — Supervisor Pattern over Session Threads
Managed Agents implements multi-agent orchestration through callable_agents (currently one-level delegation only, research preview). The supervisor receives the top-level goal and delegates subtasks to specialized agents over session threads.
```python
reviewer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You review diffs for security, performance, and style. Be specific.",
)

test_writer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You write pytest tests with >80% coverage. Include edge cases.",
)

orchestrator = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    callable_agents=[reviewer.id, test_writer.id],
    system=(
        "You orchestrate code review. Delegate review to the reviewer agent, "
        "delegate test writing to the test_writer agent, then synthesize."
    ),
)
```
Event types like session.thread_created and agent.thread_message_sent let you observe delegation as it happens. Rakuten built their Slack/Teams bot network on this pattern — one supervisor per workspace, domain sub-agents per channel, full deployment in one week.
7. Event Stream Design — Production SSE Consumption
The event stream is SSE-based and resumable. A production consumer must handle reconnect, replay, and backpressure.
```python
import time

import httpx
from httpx_sse import connect_sse  # pip install httpx-sse

def consume(session_id: str, last_event_id: str | None = None):
    backoff = 1.0
    while True:
        try:
            headers = {
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "managed-agents-2026-04-01",
                "Accept": "text/event-stream",
            }
            if last_event_id:
                headers["Last-Event-ID"] = last_event_id
            with httpx.Client(timeout=None) as http:
                with connect_sse(
                    http,
                    "GET",
                    f"https://api.anthropic.com/v1/sessions/{session_id}/events",
                    headers=headers,
                ) as source:
                    for event in source.iter_sse():
                        last_event_id = event.id
                        yield event
                        backoff = 1.0  # reset after each healthy event
        except httpx.HTTPError:
            time.sleep(min(backoff, 30))
            backoff *= 2
```

Note that plain `httpx` responses have no `iter_sse()`; SSE parsing comes from the companion `httpx-sse` package, whose `connect_sse` yields events with `.id`, `.event`, and `.data`.
Key event types to handle: session.thread_created, session.thread_idle, agent.thread_message_sent, agent.thread_message_received, agent.tool_use_started, agent.tool_use_completed, session.closed.
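A small router keeps per-event-type handler code out of the consume loop. The dict event shape (`{"type": ..., "data": ...}`) used here is an assumption for illustration; the real stream delivers typed payloads.

```python
from typing import Callable

Handler = Callable[[dict], None]

class EventRouter:
    """Dispatch typed stream events to registered handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event: dict) -> bool:
        """Return True if at least one handler consumed the event."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return bool(handlers)

router = EventRouter()
seen: list[str] = []
router.on("agent.tool_use_started", lambda e: seen.append(e["data"]["tool"]))
router.dispatch({"type": "agent.tool_use_started", "data": {"tool": "bash"}})
```

Unhandled event types return `False` from `dispatch`, which is a convenient hook for logging unknown events as the beta surface evolves.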
8. Research Preview Features — Outcomes, Memory, Multiagent
| Feature | What it does | Status |
|---|---|---|
| Outcomes | Structured success/failure signals the agent self-reports for eval pipelines | Research preview |
| Memory | Persistent semantic memory that survives session boundaries | Research preview |
| Multiagent | `callable_agents` — 1-level delegation between agents | Research preview |
All three require an allowlist opt-in. Expect breaking changes. Production-critical work should pin to GA features only.
9. Production Case Studies — Notion, Rakuten, Sentry
Notion — Workspace agents. Every Notion workspace gets a dedicated agent that reads from the workspace databases via the Notion MCP server. Users chat with it to summarize projects, draft pages, and route action items. Key insight: one agent per workspace gives durable per-tenant memory without cross-tenant leakage.
Rakuten — Slack/Teams sub-agents per domain. A supervisor agent fans out to domain-specialist sub-agents (logistics, finance, HR). They deployed in one week because Managed Agents erased the entire harness layer. Key insight: fast time-to-value comes from offloading infra, not from writing clever prompts.
Sentry — Debug + PR agent. When an error hits production, a session launches that reads Sentry context via MCP, clones the repo, reproduces the bug, proposes a fix, and opens a PR. Key insight: long-running sessions + event streams let engineers watch the agent work and intervene mid-flight.
10. Cost Model and Budget Design
Pricing is transparent: $0.08 per session-hour for compute plus standard token costs for inference. A ballpark estimator:
```python
def estimate_monthly_cost(
    sessions_per_day: int,
    avg_session_minutes: float,
    avg_input_tokens: int,
    avg_output_tokens: int,
    model: str = "opus-4-7",
) -> float:
    prices = {
        "opus-4-7": {"in": 15.0 / 1_000_000, "out": 75.0 / 1_000_000},
        "sonnet-4-6": {"in": 3.0 / 1_000_000, "out": 15.0 / 1_000_000},
    }
    p = prices[model]
    days = 30
    session_hours = sessions_per_day * (avg_session_minutes / 60) * days
    compute_cost = session_hours * 0.08
    inference_cost = sessions_per_day * days * (
        avg_input_tokens * p["in"] + avg_output_tokens * p["out"]
    )
    return compute_cost + inference_cost
```
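Plugging a hypothetical workload into the same arithmetic (100 sessions a day, 10 minutes each, 20k input and 5k output tokens per session, Opus pricing) makes the cost split concrete:

```python
# Hypothetical workload, priced with the same arithmetic as the estimator above.
session_hours = 100 * (10 / 60) * 30                 # 500 session-hours/month
compute = session_hours * 0.08                       # $40.00 of sandbox time
inference = 100 * 30 * (
    20_000 * 15.0 / 1_000_000                        # input tokens at Opus rates
    + 5_000 * 75.0 / 1_000_000                       # output tokens at Opus rates
)                                                    # $2025.00 of tokens
total = compute + inference
print(f"${total:,.2f}/month")                        # $2,065.00/month
```

At these numbers compute is a rounding error next to Opus tokens: model choice, not session duration, dominates the bill.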
Rate limits: 60 RPM for create endpoints, 600 RPM for read endpoints. Above that, open a support ticket for a quota increase.
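To stay under the 60 RPM create limit without tripping 429s, a client-side token bucket in front of the create calls is a common pattern. This is a sketch; the injectable clock exists only to make the limiter testable.

```python
import time

class TokenBucket:
    """Client-side limiter for the 60 RPM create-endpoint quota."""

    def __init__(self, rate_per_min: float, capacity: int, now=time.monotonic):
        self.rate = rate_per_min / 60.0   # tokens refilled per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def try_acquire(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if able.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# With a frozen clock the bucket drains after `capacity` acquisitions.
bucket = TokenBucket(rate_per_min=60, capacity=2, now=lambda: 0.0)
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```

When `try_acquire()` returns `False`, queue the create call rather than dropping it; the bucket refills at one token per second at the 60 RPM rate.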
11. Messages API vs Agent SDK vs Managed Agents — Decision Matrix
| Dimension | Messages API | Agent SDK (self-hosted harness) | Managed Agents |
|---|---|---|---|
| Control | Full | High | Medium |
| Infra ownership | All yours | All yours | Anthropic |
| Time-to-production | Weeks–months | Days–weeks | Hours–days |
| Sandbox | DIY | DIY | Managed |
| Session persistence | DIY | DIY | Managed |
| Event stream | DIY | DIY | Managed SSE |
| Multi-agent | DIY | DIY | Built-in (preview) |
| Best for | Simple single-turn tools | Research, custom harnesses | Production agent products |
12. ManoIT Production Checklist
Before going to production:
- Pin `anthropic-beta: managed-agents-2026-04-01` explicitly in every client.
- Put all credentials in Vault — never in `system` prompts, never in environment variables inside the sandbox.
- Implement `Last-Event-ID` resume on every SSE consumer.
- Set `network_allowlist` as tight as the workload allows. Deny by default.
- Gate `callable_agents` behind feature flags — it is a research preview.
- Log `session.id`, `thread.id`, and `event.id` on every inbound webhook; those are your correlation keys.
- Budget guardrails: pre-compute cost per session and kill long-runners over threshold.
- For multi-tenant SaaS: one Agent per tenant (Notion pattern) — do not share Agents across tenants.
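The budget-guardrail bullet can be as simple as a pure cost check run from your event consumer. The default prices below are the Opus token rates and the $0.08 session-hour rate from section 10; treat them as parameters, not constants.

```python
def should_kill(
    session_minutes: float,
    input_tokens: int,
    output_tokens: int,
    budget_usd: float,
    in_price: float = 15.0 / 1_000_000,    # Opus input $/token
    out_price: float = 75.0 / 1_000_000,   # Opus output $/token
    compute_per_hour: float = 0.08,        # session-hour price
) -> bool:
    """Return True once a running session has exceeded its budget."""
    cost = (
        (session_minutes / 60.0) * compute_per_hour
        + input_tokens * in_price
        + output_tokens * out_price
    )
    return cost > budget_usd

# One hour of compute plus a million Opus input tokens blows a $10 budget.
print(should_kill(60, 1_000_000, 0, budget_usd=10.0))  # True ($15.08 > $10)
```

Run the check on every `agent.tool_use_completed` event and close the session when it fires; that bounds the worst case to one tool call past budget.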
13. Conclusion — Agent Infrastructure Is Standardizing
Claude Managed Agents is the first production-grade managed harness from a frontier lab. It doesn't replace the Messages API or the Agent SDK — it stacks on top, giving teams the choice between low-level control and hosted convenience. The three Brain/Hands/Session planes, the MCP connector model, and the Vault credential isolation pattern will almost certainly show up in competing platforms within the year. If you are shipping an AI agent product in 2026, evaluate Managed Agents before you write another line of harness code.
This post was originally published on the ManoIT Tech Blog in Korean. The English version was adapted for dev.to. Written with assistance from Claude (Anthropic).