TL;DR: Claude Managed Agents is Anthropic's new hosted agent execution environment (public beta, April 2026) that lets developers build and deploy AI agents in the cloud without managing their own runtime, sandboxing, or tool-execution infrastructure. You define the agent; Anthropic handles the rest. This deep dive covers the architecture, the API, real-world pricing, and when you should (or shouldn't) use Managed Agents.
The Problem: Why Running AI Agents Is Hard
Over the past two years, building AI agents has become both popular and unnecessarily complex. Most teams end up solving the same infrastructure problems from scratch:
- **Context window management:** Long-running agents overflow context and need summarization or chunking strategies.
- **Safe tool execution:** Running LLM-generated code in production without getting exploited.
- **Long-running sessions:** The user closes the tab — but the agent needs to keep going. Where does state live?
- **Error recovery:** The 7th LLM call fails. Does the entire workflow retry from scratch?
- **Observability:** How do you debug when the agent does something unexpected?
Existing solutions like LangGraph, AutoGen, or a custom Claude API harness all work, but each requires you to own and maintain the infrastructure. Claude Managed Agents is Anthropic's answer to this problem.
What Claude Managed Agents Actually Is
Managed Agents is not a new model or a chatbot. It's a hosted agent execution environment — Anthropic provides the full runtime for running agent loops, and you only write logic at a high level.
In simpler terms: instead of writing `while agent.is_running(): response = claude.call(...); execute_tools(response)` yourself, you declare the agent once and call an API to assign tasks. Anthropic handles all the orchestration.
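For contrast, here is roughly the loop you'd otherwise maintain yourself against the standard Messages API. This is a minimal sketch; `my_tools` and `run_tool` are stand-ins for your own tool schemas and execution layer, which are exactly the parts Managed Agents takes off your plate:

```python
import anthropic

client = anthropic.Anthropic()

# Stand-ins for your own tool schemas and (sandboxed!) execution layer.
my_tools = [{
    "name": "run_bash",
    "description": "Run a shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    raise NotImplementedError("sandboxing, retries, timeouts: your problem")

messages = [{"role": "user", "content": "Review this PR: ..."}]

# The loop you no longer have to own: call Claude, execute its tool
# calls yourself, append the results, repeat until it stops asking.
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4096,
        tools=my_tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ],
    })
```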
"We want to decouple the brain (Claude) from the hands (tool execution infrastructure). Managed Agents is the infrastructure layer." — Anthropic Engineering Blog
Architecture: Brain vs. Hands
Anthropic describes the architecture using a brain/hands separation model:
| Layer | Responsible Party | Example |
| --- | --- | --- |
| **Brain (Reasoning)** | Claude model | Decides which tool to call and with what parameters |
| **Hands (Execution)** | Managed Agents runtime | Runs bash, reads files, calls web search inside a sandbox |
| **Orchestration** | Managed Agents harness | Manages context, retries, and checkpointing |
| **Your code** | Developer | Declares the agent, sends tasks, reads results |
When you create a session and send a task, the execution flow looks like this:
1. The task is received by the Managed Agents runtime
2. The runtime spins up a sandboxed environment (an isolated container)
3. Claude receives the task, system prompt, and tool definitions
4. Claude responds with tool calls → the runtime executes them → results return to Claude
5. The loop continues until Claude completes the task or hits limits
6. Checkpoints are saved after each significant step
7. The final output is returned via SSE streaming or polling
Core Features (Generally Available)
1. Sandboxed Execution
All tool execution happens inside an isolated container. Agents can run bash commands, read and write files, and install packages — but cannot affect the host system or other sessions. Each session has its own file system and network namespace.
2. Long-Running Sessions
Sessions can run for hours, even when the client disconnects. When you reconnect, pending outputs are delivered via the SSE event stream. This is the most critical feature for production workflows.
3. Automatic Checkpointing
The runtime automatically saves checkpoints after major tool execution steps. If a session crashes or times out, you can resume from the last checkpoint instead of starting over.
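The examples later in this article don't show the resume call itself, so purely as a hypothetical sketch of the pattern (the `resume_from` parameter is my assumption, not a documented field), using the session API introduced below:

```python
# Hypothetical sketch: "resume_from" is an assumption, not a documented
# field. The idea: create a new session pinned to the last good
# checkpoint rather than replaying the task from scratch.
resumed = client.beta.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
    resume_from={"session_id": crashed_session.id},  # assumed parameter
)
```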
4. Credential Management
Secrets (API keys, tokens) are injected into the sandbox via an encrypted vault — agents can use them but cannot exfiltrate the actual values.
5. Built-in Agent Toolset
Use the `agent_toolset_20260401` tool type to enable the full default tool suite: bash, file operations, web search, web fetch, and code execution (Python/JS). No need to define individual tools.
Research Preview Features (Access Required)
Outcomes API
Instead of saying "do X", you declare the desired outcome and success criteria. Claude self-evaluates and iterates until it gets there. Think of it as writing test cases instead of implementation instructions.
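No request schema for this has been published yet, so here is a speculative sketch of what declaring an outcome might look like; everything under `outcome` is guessed, not documented:

```python
# Speculative sketch only: the Outcomes API is in research preview,
# and this shape is an assumption rather than documented behavior.
message = client.beta.sessions.messages.create(
    session_id=session.id,
    content="Migrate the test suite from unittest to pytest.",
    outcome={  # hypothetical field
        "success_criteria": [
            "all tests pass under pytest",
            "no unittest imports remain",
        ],
        "max_iterations": 5,
    },
)
```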
Multi-Agent Orchestration
An orchestrator agent can spawn and coordinate multiple sub-agents in parallel. Managed Agents handles communication and state sharing between agents.
Persistent Memory
Agents can read and write to a memory store that persists across sessions. The most obvious use case: agents that remember user context across multiple interactions.
API and Code Examples
All Managed Agents API requests require the beta header `anthropic-beta: managed-agents-2026-04-01`. The Python SDK adds this automatically when using `client.beta`.
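If you want to set the header explicitly, for example to pin the beta version yourself, the SDK's standard `default_headers` option works; a minimal sketch:

```python
import anthropic

# Attach the beta header to every request explicitly; client.beta
# endpoints add it for you, so this is only needed when pinning it.
client = anthropic.Anthropic(
    default_headers={"anthropic-beta": "managed-agents-2026-04-01"}
)
```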
Create an Agent Definition
```python
import anthropic

client = anthropic.Anthropic()

agent = client.beta.agents.create(
    name="Code Review Agent",
    model="claude-opus-4-6",
    system="""You are an expert code reviewer.
Analyze the provided code for bugs, security issues, and style problems.
Always provide specific line numbers and actionable suggestions.""",
    tool_choice={"type": "agent_toolset", "version": "20260401"},
)
```
Create an Environment and Session
```python
# Environment defines the sandbox configuration
env = client.beta.environments.create(
    name="code-review-env",
    compute={"cpu": 2, "memory_gb": 4},
    secrets=[
        {"name": "GITHUB_TOKEN", "value": "ghp_xxxx"}
    ],
)

# Session is a specific execution instance
session = client.beta.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
    metadata={"user_id": "user_123"},
)
```
Send a Task and Stream Results
```python
# Send the task
message = client.beta.sessions.messages.create(
    session_id=session.id,
    content="Review this PR: https://github.com/org/repo/pull/42",
)

# Stream output via SSE
with client.beta.sessions.stream(session.id) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "session_completed":
            print("\n✅ Done")
            break
```
Resume a Session After Disconnect
```python
# Fetch pending outputs after reconnecting
outputs = client.beta.sessions.outputs.list(
    session_id=session.id,
    since_sequence=last_seen_sequence,
)
for output in outputs:
    print(output.content)
```
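A small reconnect helper built on the same call; it assumes each output exposes the `sequence` counter implied by the `since_sequence` parameter, which isn't confirmed in the snippet above:

```python
def drain_pending(session_id: str, last_seen: int) -> int:
    """Deliver everything missed while disconnected, then return the
    new high-water mark so the caller can persist it."""
    outputs = client.beta.sessions.outputs.list(
        session_id=session_id,
        since_sequence=last_seen,
    )
    for output in outputs:
        print(output.content)
        last_seen = max(last_seen, output.sequence)  # assumed field
    return last_seen
```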
Pricing: Real-World Cost Breakdown
Claude Managed Agents has two cost components:
| Component | Rate | Notes |
| --- | --- | --- |
| **Token usage** | Standard Claude Platform rates | Input/output tokens billed per model |
| **Runtime** | **$0.08 / session-hour** | Only charged when the session is active, not idle |
To put it in perspective: a complex 30-minute task with `claude-opus-4-6` costs $0.04 in runtime fees (0.5 session-hours × $0.08) plus token cost. Switching to `claude-haiku-4-5` significantly reduces token costs, while the runtime fee stays constant.
Cost optimization tip: use `claude-haiku-4-5` for simple sub-tasks and reserve Opus for complex reasoning. A multi-agent pattern with model mixing can reduce token costs by 60–70%, as sketched below.
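One way to wire that up with the API shown above: define two agents and route by task complexity. A sketch; the agent names and the routing heuristic are placeholders of mine, not a prescribed pattern:

```python
# Sketch: mix models across agents to cut token cost. The routing
# logic here is a trivial placeholder; real heuristics are
# workload-specific.
triage_agent = client.beta.agents.create(
    name="Triage",
    model="claude-haiku-4-5",
    system="Summarize inputs and extract structured facts.",
    tool_choice={"type": "agent_toolset", "version": "20260401"},
)
reasoning_agent = client.beta.agents.create(
    name="Reasoner",
    model="claude-opus-4-6",
    system="Perform deep analysis on pre-digested summaries.",
    tool_choice={"type": "agent_toolset", "version": "20260401"},
)

def pick_agent(task: str):
    # Placeholder heuristic: long or multi-step tasks go to Opus.
    return reasoning_agent if len(task) > 500 else triage_agent
```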
Managed Agents vs. Building Your Own Agent Loop
| Criteria | Managed Agents | Self-hosted (LangGraph / Custom) |
| --- | --- | --- |
| **Time to first agent** | ~30 minutes | 1–2 weeks |
| **Sandboxing** | Built-in, hardened | DIY (Docker, gVisor, etc.) |
| **Long-running sessions** | Native support | Requires Redis + WebSocket management |
| **Scaling** | Auto-scales | You provision infrastructure |
| **Vendor lock-in** | High (Anthropic-only) | Low (portable) |
| **Customization** | Limited to the API surface | Full control |
| **Cost predictability** | Moderate (runtime fee adds up) | Higher upfront, but controllable |
| **Observability** | Built-in execution tracing | DIY (Langfuse, etc.) |
Best Use Cases
Managed Agents shines in these scenarios:
- **Internal dev tools:** Code review agents, CI/CD automation, documentation generators
- **Data processing pipelines:** Agents that analyze reports and synthesize data from multiple sources
- **Research automation:** Web research + synthesis + structured output
- **Rapid prototyping:** Proof-of-concept agents in hours instead of days
- **Teams without DevOps:** Startups and indie developers who don't want to manage Kubernetes
Conversely, avoid Managed Agents when:
- You need fine-grained control over the execution environment
- Compliance requires data to never leave your on-premise infrastructure
- You want to use models other than Claude (GPT-4, Gemini)
- Cost is the top priority at large scale
Hands-On: Build a PR Review Agent in 30 Minutes
Here's a complete working agent that reviews GitHub Pull Requests:
```python
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def create_pr_review_agent():
    return client.beta.agents.create(
        name="PR Review Bot",
        model="claude-opus-4-6",
        system="""You are a senior software engineer conducting code reviews.
For each PR:
1. Fetch the diff using the GitHub CLI (gh pr diff)
2. Identify bugs, security issues, and performance problems
3. Check for test coverage
4. Provide constructive, specific feedback with line references
5. Rate severity: CRITICAL / MAJOR / MINOR / SUGGESTION
Always end with a summary table.""",
        tool_choice={"type": "agent_toolset", "version": "20260401"},
    )

def review_pr(agent_id: str, env_id: str, pr_url: str) -> str:
    session = client.beta.sessions.create(
        agent_id=agent_id,
        environment_id=env_id,
    )
    client.beta.sessions.messages.create(
        session_id=session.id,
        content=f"Please review this pull request: {pr_url}",
    )
    result = []
    with client.beta.sessions.stream(session.id) as stream:
        for event in stream:
            if event.type == "content_block_delta":
                result.append(event.delta.text)
            elif event.type == "session_completed":
                break
    return "".join(result)

# One-time setup
agent = create_pr_review_agent()
env = client.beta.environments.create(
    name="pr-review",
    secrets=[{"name": "GITHUB_TOKEN", "value": os.environ["GITHUB_TOKEN"]}],
)

# Usage
review = review_pr(agent.id, env.id, "https://github.com/myorg/myrepo/pull/123")
print(review)
```
Community Reactions: What Developers Actually Think
After one week of public beta, the developer community has had some notable reactions:
Positive: Startups and indie hackers are particularly enthusiastic about the onboarding speed. One developer on Hacker News reported going from "zero to working agent" in 45 minutes — compared to 3 days with a self-hosted approach.
Concerns: Enterprise users are worried about vendor lock-in and data residency. Managed Agents currently doesn't support VPC peering or private endpoints — all traffic goes through Anthropic's public infrastructure.
Pricing feedback: The $0.08/session-hour rate has received mixed reactions. For simple tasks (<5 minutes), the overhead is negligible. For long-running research agents (4–8 hours), runtime cost can exceed token cost.
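That crossover point is easy to sanity-check with back-of-the-envelope math. A small helper, with token prices left as caller inputs since they vary by model; the example prices below are placeholders, not Anthropic's actual rates:

```python
RUNTIME_PER_HOUR = 0.08  # flat session-hour rate from the pricing table

def session_cost(hours: float, in_mtok: float, out_mtok: float,
                 in_price: float, out_price: float) -> dict:
    """Rough cost split for one session: flat runtime fee plus token
    spend (token prices in $/MTok, supplied by the caller)."""
    runtime = hours * RUNTIME_PER_HOUR
    tokens = in_mtok * in_price + out_mtok * out_price
    return {"runtime": round(runtime, 2),
            "tokens": round(tokens, 2),
            "total": round(runtime + tokens, 2)}

# An 8-hour research session with modest token use and Haiku-class
# placeholder prices: runtime ($0.64) exceeds token cost ($0.50).
print(session_cost(8, in_mtok=1.0, out_mtok=0.2, in_price=0.25, out_price=1.25))
```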
What's Coming Next
Based on documentation signals and Anthropic's engineering blog, features in development include:
- **Private networking:** Agents connecting to internal services via VPN or private link
- **Custom tool registration:** Register your own tools for agents to use as built-ins
- **Agent marketplace:** Share and reuse agent definitions
- **Outcomes API GA:** Automated output evaluation against success criteria
- **Regional deployments:** EU and Asia regions for compliance requirements
Final Verdict
Claude Managed Agents solves a real problem and solves it well. If you're spending more time on agent infrastructure than agent logic, that's a clear signal to try Managed Agents. The current beta is stable enough for small-to-medium production use cases.
That said, for teams with data sovereignty requirements, multi-model needs, or extreme cost optimization at scale — self-hosting is still the right call. Managed Agents isn't a silver bullet, but it's an excellent fit for the right use case.
Anthropic is directly competing with AWS Bedrock Agents and Google Vertex AI Agents in this segment. With advantages in model quality and developer experience, Managed Agents has real potential to become the standard deployment target for Claude-based agents in 2026.
To get started, visit platform.claude.com/docs/en/managed-agents/quickstart and request beta access. There's currently no waitlist — you can start immediately with an existing API key.
This article was originally published on NextFuture. Follow us for more fullstack & AI engineering content.
Top comments (4)
The “brain versus hands” framing is the clearest explanation I’ve seen of why managed agent runtimes are compelling even for teams that could build their own. The 30-minute vs. 1-2 week comparison is real — we’ve seen that gap play out exactly on teams that underestimate the work of safe tool sandboxing, session persistence, and credential management.
The enterprise data residency concern is the one I keep coming back to. The current story — where your data traverses Anthropic’s infrastructure even for on-prem-adjacent workloads — is going to be a hard blocker for financial services and healthcare customers regardless of contractual protections. The interesting design question is whether private VPC peering or bring-your-own-container options would actually solve the concern, or if the issue is the model itself being a remote API call.
For teams hitting the $0.08/session-hour ceiling: what’s your experience with session reuse patterns? Batching multiple short tasks into a single persistent session to amortize costs seems like the obvious optimization, but session state contamination is a real risk there.
The "brain vs. hands" framing is a useful shorthand — though one thing I'd add to the managed-vs-self-hosted calculus that often gets overlooked: latency variance, not just average latency.
Managed platforms optimize for p50. If your agent orchestration depends on tight coordination between multiple agents (where p99 latency of one agent becomes p50 of the next), variance compounds fast. We've seen workflows where managed infrastructure was fine in dev but fell apart under production load because tail latency on a single agent caused the entire chain to miss SLA.
The tradeoffs we've landed on: managed is excellent for exploratory work, burst capacity, and tasks where individual agent latency doesn't compound. Self-hosted wins when you need deterministic performance, data residency guarantees, or cost predictability at high volume. The cost model is also worth spelling out more explicitly — managed makes sense when agent invocations are infrequent, but the economics flip quickly as call volume scales past a few thousand daily invocations.
The "brain vs. hands" framing is genuinely useful for communicating this to teams evaluating managed agent platforms. Separating reasoning from execution also makes the pricing model more legible — you're paying for the execution environment, not the cognition itself.
The vendor lock-in concern is legitimate but I think it's often miscalibrated. The real lock-in isn't "Anthropic hosts my agents." It's "my domain logic is embedded in the agent's system prompt and tool schema rather than in ordinary functions the agent calls." That's what's actually expensive to migrate. The mitigation isn't avoiding managed infrastructure; it's keeping business logic in plain, testable functions — the agent is just the router that invokes them.
One thing I'm curious about: how does checkpointing work when a parent agent spawns parallel sub-agents? If a task fans out to 5 sub-agents, do they each get independent checkpoint contexts, or does recovery require the entire fan-out to restart? That's where the session model gets interesting for anything with meaningful concurrency.
Good writeup — the PR review example is a solid concrete use case that makes the abstract infrastructure story tangible.
The sandboxing and session persistence pieces are exactly where teams struggle most in DIY setups — you've nailed the core value proposition.
One angle I'd add to the "why Managed Agents" case: it meaningfully changes your security threat model. When you run agents on your own infrastructure, you're responsible for the isolation boundary between the agent's execution environment and your production systems. With managed execution, Anthropic handles that layer — and their threat model is more sophisticated than what most engineering teams would implement themselves (credential isolation, filesystem restrictions, network egress controls).
The failure pattern we see most in self-hosted setups is agents "escaping" their intended scope — a research agent with filesystem access starts writing to unintended locations, or a coding agent with shell access runs commands it wasn't supposed to. Managed environments constrain this at the infrastructure level, not the prompt level, which is a fundamentally stronger guarantee.
Curious about the egress controls side: can you configure which external APIs the agent is allowed to call from within the managed environment, or is that still left to the developer to enforce inside the tool definitions?