DEV Community

daniel jeong
daniel jeong

Posted on • Originally published at manoit.co.kr

Claude Managed Agents — The Complete Guide: Brain/Hands/Session Architecture, MCP Connectors, and Multi-Agent Orchestration

Claude Managed Agents — The Complete Guide: Brain/Hands/Session Decoupled Architecture, MCP Connectors, and Multi-Agent Orchestration for Production AI

Published on ManoIT Tech Blog (Korean original). On April 8, 2026 Anthropic launched Claude Managed Agents — a fully hosted agent harness that handles the execution loop, sandboxing, tool orchestration, session persistence, and event streaming. This post dissects the architecture, walks through MCP connector wiring, demonstrates multi-agent orchestration patterns, and analyzes three real production deployments (Notion, Rakuten, Sentry).


1. Why Managed Agents — The Structural Limits of Rolling Your Own Agent Loop

If you have ever shipped an agent built on top of the Messages API, you know the pattern: wrap a while loop around client.messages.create(), parse tool_use blocks, execute tools in a local sandbox, feed tool_result back in, repeat until stop_reason == "end_turn". It works — until it doesn't.

The failure modes are predictable: a runaway loop burns through your rate budget, a tool call hangs and there is no timeout, the sandbox leaks filesystem state between users, long-running sessions exceed context and lose memory, stream consumers drop events on reconnect, and observability boils down to log scraping. Every team that runs agents in production eventually builds the same harness — and every team's harness has different bugs.

Managed Agents is Anthropic's answer: a hosted runtime that owns the execution loop, the sandbox, session persistence, and the event stream, so your application owns only the product logic.

2. Architecture Deep Dive — Brain / Hands / Session

The platform splits a running agent into three decoupled planes:

Plane Responsibility Anthropic Ops
Brain Inference — the LLM that reasons and decides Model inference (Opus 4.7 / Sonnet 4.6)
Hands Execution — the sandboxed environment that runs tools Ubuntu sandbox, file I/O, bash, network allowlist
Session State — the thread that accumulates messages, tool results, and events Persistent storage, event bus, stream replay

Why decoupling matters. In a homegrown harness the three planes collapse into one process. If inference is slow, tool execution queues. If a tool hangs, the loop stalls. If the process dies, session state is gone. The Managed Agents topology isolates each plane so that p50 TTFT drops ~60% and p95 TTFT drops >90% compared to naive loops, because tool execution and model inference overlap, and streams survive process restarts.

3. The Four Core Resources

Every Managed Agents deployment revolves around four REST resources:

Resource Purpose Key fields
Agent The blueprint — model, tools, system prompt, beta flags, MCP servers model, toolset, system, mcp_servers, vault_ids
Environment The sandbox template — base image, resource limits, network policy base_image, cpu, memory, network_allowlist
Session A running instance — one Agent in one Environment, with a thread agent_id, environment_id, thread_id
Events The stream — every model output, tool call, result, and status change SSE stream, typed event payloads

The required beta header is managed-agents-2026-04-01, and the default toolset is agent_toolset_20260401 which bundles bash, read, write, edit, glob, grep, web_search, and web_fetch.

4. Quickstart — Your First Session in 10 Minutes

Below is the minimum code to create an Agent, an Environment, and a Session, then send it a message and consume the event stream.

import anthropic

client = anthropic.Anthropic(
    default_headers={"anthropic-beta": "managed-agents-2026-04-01"}
)

# 1) Define the Agent (the blueprint)
agent = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    system="You are a production-grade coding agent. Always write tests.",
)

# 2) Define the Environment (the sandbox template)
env = client.environments.create(
    base_image="ubuntu-24.04-dev",
    cpu=2,
    memory_mb=4096,
    network_allowlist=["github.com", "pypi.org"],
)

# 3) Launch a Session
session = client.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
)

# 4) Send a message and stream events
stream = client.sessions.messages.send(
    session_id=session.id,
    content="Clone psf/requests, find the top 3 slowest tests, propose fixes.",
    stream=True,
)

for event in stream:
    if event.type == "agent.thread_message_received":
        print(event.data.content)
Enter fullscreen mode Exit fullscreen mode

The session keeps running after the initial turn — you can attach more messages to the same thread_id and the agent resumes with full memory of earlier turns.

5. MCP Connectors — The Standard Interface for External Systems

Managed Agents natively supports MCP (Model Context Protocol) — but only the remote HTTP streamable transport, not stdio. Credentials live in Anthropic's Vault and never enter the sandbox.

vault_entry = client.vaults.secrets.create(
    name="github-pat",
    value=os.environ["GITHUB_PAT"],
)

agent = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    mcp_servers=[
        {
            "name": "github",
            "url": "https://mcp.github.com/streamable",
            "vault_ids": [vault_entry.id],
        },
        {
            "name": "linear",
            "url": "https://mcp.linear.app/streamable",
            "vault_ids": [linear_secret.id],
        },
    ],
    system="You triage GitHub issues and sync status to Linear.",
)
Enter fullscreen mode Exit fullscreen mode

The credential isolation pattern is the killer feature: your production PAT never touches a sandbox process, so a prompt injection that convinces the agent to cat ~/.env returns nothing because the token lives one network hop away.

6. Multi-Agent — Supervisor Pattern over Session Threads

Managed Agents implements multi-agent orchestration through callable_agents (currently one-level delegation only, research preview). The supervisor receives the top-level goal and delegates subtasks to specialized agents over session threads.

reviewer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You review diffs for security, performance, and style. Be specific.",
)

test_writer = client.agents.create(
    model="claude-sonnet-4-6",
    toolset="agent_toolset_20260401",
    system="You write pytest tests with >80% coverage. Include edge cases.",
)

orchestrator = client.agents.create(
    model="claude-opus-4-7",
    toolset="agent_toolset_20260401",
    callable_agents=[reviewer.id, test_writer.id],
    system=(
        "You orchestrate code review. Delegate review to the reviewer agent, "
        "delegate test writing to the test_writer agent, then synthesize."
    ),
)
Enter fullscreen mode Exit fullscreen mode

Event types like session.thread_created and agent.thread_message_sent let you observe delegation as it happens. Rakuten built their Slack/Teams bot network on this pattern — one supervisor per workspace, domain sub-agents per channel, full deployment in one week.

7. Event Stream Design — Production SSE Consumption

The event stream is SSE-based and resumable. A production consumer must handle reconnect, replay, and backpressure.

import time
import httpx

def consume(session_id: str, last_event_id: str | None = None):
    backoff = 1.0
    while True:
        try:
            headers = {
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "managed-agents-2026-04-01",
                "Accept": "text/event-stream",
            }
            if last_event_id:
                headers["Last-Event-ID"] = last_event_id

            with httpx.stream(
                "GET",
                f"https://api.anthropic.com/v1/sessions/{session_id}/events",
                headers=headers,
                timeout=None,
            ) as r:
                for event in r.iter_sse():
                    last_event_id = event.id
                    yield event
                    backoff = 1.0
        except httpx.HTTPError:
            time.sleep(min(backoff, 30))
            backoff *= 2
Enter fullscreen mode Exit fullscreen mode

Key event types to handle: session.thread_created, session.thread_idle, agent.thread_message_sent, agent.thread_message_received, agent.tool_use_started, agent.tool_use_completed, session.closed.

8. Research Preview Features — Outcomes, Memory, Multiagent

Feature What it does Status
Outcomes Structured success/failure signals the agent self-reports for eval pipelines Research preview
Memory Persistent semantic memory that survives session boundaries Research preview
Multiagent callable_agents — 1-level delegation between agents Research preview

All three require an allowlist opt-in. Expect breaking changes. Production-critical work should pin to GA features only.

9. Production Case Studies — Notion, Rakuten, Sentry

Notion — Workspace agents. Every Notion workspace gets a dedicated agent that reads from the workspace databases via the Notion MCP server. Users chat with it to summarize projects, draft pages, and route action items. Key insight: one agent per workspace gives durable per-tenant memory without cross-tenant leakage.

Rakuten — Slack/Teams sub-agents per domain. A supervisor agent fans out to domain-specialist sub-agents (logistics, finance, HR). They deployed in one week because Managed Agents erased the entire harness layer. Key insight: fast time-to-value comes from offloading infra, not from writing clever prompts.

Sentry — Debug + PR agent. When an error hits production, a session launches that reads Sentry context via MCP, clones the repo, reproduces the bug, proposes a fix, and opens a PR. Key insight: long-running sessions + event streams let engineers watch the agent work and intervene mid-flight.

10. Cost Model and Budget Design

Pricing is transparent: $0.08 per session-hour for compute plus standard token costs for inference. A ballpark estimator:

def estimate_monthly_cost(
    sessions_per_day: int,
    avg_session_minutes: float,
    avg_input_tokens: int,
    avg_output_tokens: int,
    model: str = "opus-4-7",
) -> float:
    prices = {
        "opus-4-7":   {"in": 15.0 / 1_000_000, "out": 75.0 / 1_000_000},
        "sonnet-4-6": {"in":  3.0 / 1_000_000, "out": 15.0 / 1_000_000},
    }
    p = prices[model]
    days = 30
    session_hours = sessions_per_day * (avg_session_minutes / 60) * days
    compute_cost = session_hours * 0.08
    inference_cost = sessions_per_day * days * (
        avg_input_tokens * p["in"] + avg_output_tokens * p["out"]
    )
    return compute_cost + inference_cost
Enter fullscreen mode Exit fullscreen mode

Rate limits: 60 RPM for create endpoints, 600 RPM for read endpoints. Above that, open a support ticket for a quota increase.

11. Messages API vs Agent SDK vs Managed Agents — Decision Matrix

Dimension Messages API Agent SDK (self-hosted harness) Managed Agents
Control Full High Medium
Infra ownership All yours All yours Anthropic
Time-to-production Weeks–months Days–weeks Hours–days
Sandbox DIY DIY Managed
Session persistence DIY DIY Managed
Event stream DIY DIY Managed SSE
Multi-agent DIY DIY Built-in (preview)
Best for Simple single-turn tools Research, custom harnesses Production agent products

12. ManoIT Production Checklist

Before going to production:

  1. Pin anthropic-beta: managed-agents-2026-04-01 explicitly in every client.
  2. Put all credentials in Vault — never in system prompts, never in environment variables inside the sandbox.
  3. Implement Last-Event-ID resume on every SSE consumer.
  4. Set network_allowlist as tight as the workload allows. Deny by default.
  5. Gate callable_agents behind feature flags — it is research preview.
  6. Log session.id, thread.id, and event.id on every inbound webhook; those are your correlation keys.
  7. Budget guardrails: pre-compute cost per session and kill long-runners over threshold.
  8. For multi-tenant SaaS: one Agent per tenant (Notion pattern) — do not share Agents across tenants.

13. Conclusion — Agent Infrastructure Is Standardizing

Claude Managed Agents is the first production-grade managed harness from a frontier lab. It doesn't replace the Messages API or the Agent SDK — it stacks on top, giving teams the choice between low-level control and hosted convenience. The three Brain/Hands/Session planes, the MCP connector model, and the Vault credential isolation pattern will almost certainly show up in competing platforms within the year. If you are shipping an AI agent product in 2026, evaluate Managed Agents before you write another line of harness code.


This post was originally published on the ManoIT Tech Blog in Korean. The English version was adapted for dev.to. Written with assistance from Claude (Anthropic).


Originally published at ManoIT Tech Blog.

Top comments (0)