DEV Community

BeanBean

Posted on • Originally published at nextfuture.io.vn
Claude Managed Agents Deep Dive: Anthropic's New AI Agent Infrastructure (2026)


TL;DR: Claude Managed Agents is Anthropic's new hosted agent execution environment (public beta, April 2026) that lets developers build and deploy AI agents on the cloud without managing their own runtime, sandboxing, or tool execution infrastructure. You define the agent — Anthropic handles the rest. This deep dive covers the architecture, API, real-world pricing, and when you should (or shouldn't) use Managed Agents.

The Problem: Why Running AI Agents Is Hard

Over the past two years, building AI agents has become both popular and unnecessarily complex. Most teams end up solving the same infrastructure problems from scratch:

- **Context window management:** Long-running agents overflow context and need summarization or chunking strategies.

- **Safe tool execution:** Running LLM-generated code in production without getting exploited.

- **Long-running sessions:** The user closes the tab — but the agent needs to keep going. Where does state live?

- **Error recovery:** The 7th LLM call fails. Does the entire workflow retry from scratch?

- **Observability:** How do you debug when the agent does something unexpected?

Existing solutions like LangGraph, AutoGen, or custom Claude API harnesses all work — but they all require you to own and maintain the infrastructure. Claude Managed Agents is Anthropic's answer to this problem.

What Claude Managed Agents Actually Is

Managed Agents is not a new model or a chatbot. It's a hosted agent execution environment — Anthropic provides the full runtime for running agent loops, and you only write logic at a high level.

In simpler terms: instead of writing `while agent.is_running(): response = claude.call(...); execute_tools(response)`, you declare the agent once and call an API to assign tasks. Anthropic handles all the orchestration.
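The hand-rolled loop that sentence alludes to can be sketched with a stub model client. The `FakeClient` and `echo` tool below are illustrative stand-ins, not Anthropic's SDK — the point is how much orchestration (tool dispatch, message threading, sandboxing) lands on you when you self-host:

```python
# Illustrative sketch of a self-hosted agent loop (the pattern Managed
# Agents replaces). FakeClient and the "echo" tool are invented stand-ins,
# not a real SDK.

def run_tool(name, args):
    # In a real harness, this is where you'd need sandboxing.
    if name == "echo":
        return args["text"]
    raise ValueError(f"unknown tool: {name}")

class FakeClient:
    """Pretends to be an LLM: asks for one tool call, then finishes."""
    def __init__(self):
        self.turn = 0

    def call(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"stop_reason": "tool_use",
                    "tool": {"name": "echo", "args": {"text": "hello"}}}
        return {"stop_reason": "end_turn",
                "text": "done: " + messages[-1]["content"]}

def agent_loop(client, task):
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.call(messages)
        if response["stop_reason"] != "tool_use":
            return response["text"]
        tool = response["tool"]
        result = run_tool(tool["name"], tool["args"])  # you own this part
        messages.append({"role": "tool", "content": result})

print(agent_loop(FakeClient(), "say hello"))  # → done: hello
```

With Managed Agents, everything except the task submission moves to Anthropic's side of the line.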

"We want to decouple the brain (Claude) from the hands (tool execution infrastructure). Managed Agents is the infrastructure layer." — Anthropic Engineering Blog

Architecture: Brain vs. Hands

Anthropic describes the architecture using a brain/hands separation model:

| Layer | Responsible Party | Example |
|---|---|---|
| **Brain (Reasoning)** | Claude model | Decides which tool to call and with what parameters |
| **Hands (Execution)** | Managed Agents runtime | Runs bash, reads files, calls web search inside a sandbox |
| **Orchestration** | Managed Agents harness | Manages context, retries, and checkpointing |
| **Your code** | Developer | Declares the agent, sends tasks, reads results |

When you create a session and send a task, the execution flow looks like this:

1. Task is received by the Managed Agents runtime
2. Runtime spins up a sandboxed environment (isolated container)
3. Claude receives the task, system prompt, and tool definitions
4. Claude responds with tool calls → runtime executes them → results return to Claude
5. The loop continues until Claude completes the task or hits limits
6. Checkpoints are saved after each significant step
7. Final output is returned via SSE streaming or polling

Core Features (Generally Available)

1. Sandboxed Execution

All tool execution happens inside an isolated container. Agents can run bash commands, read and write files, and install packages — but cannot affect the host system or other sessions. Each session has its own file system and network namespace.

2. Long-Running Sessions

Sessions can run for hours, even when the client disconnects. When you reconnect, pending outputs are delivered via the SSE event stream. This is the most critical feature for production workflows.

3. Automatic Checkpointing

The runtime automatically saves checkpoints after major tool execution steps. If a session crashes or times out, you can resume from the last checkpoint instead of starting over.
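The checkpoint-and-resume idea can be sketched generically. This is a conceptual illustration of the pattern, not Anthropic's internal implementation — a dict stands in for durable storage:

```python
# Conceptual sketch of checkpoint-and-resume: persist progress after each
# step so a crashed run restarts from the last completed step, not step zero.
# (Illustrative pattern only, not Anthropic's internal implementation.)

checkpoints = {}  # stand-in for durable checkpoint storage

def run_steps(session_id, steps, fail_at=None):
    start = checkpoints.get(session_id, 0)
    for i in range(start, len(steps)):
        if i == fail_at:
            raise RuntimeError(f"crashed at step {i}")
        steps[i]()                       # execute the step
        checkpoints[session_id] = i + 1  # checkpoint after success
    return "completed"

log = []
steps = [lambda: log.append("fetch"),
         lambda: log.append("analyze"),
         lambda: log.append("report")]

try:
    run_steps("s1", steps, fail_at=2)  # crashes before the last step
except RuntimeError:
    pass
run_steps("s1", steps)                 # resumes at step 2, not step 0
print(log)  # → ['fetch', 'analyze', 'report']
```

Note that each step runs exactly once across both attempts — that is the property the Managed Agents runtime is promising.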

4. Credential Management

Secrets (API keys, tokens) are injected into the sandbox via an encrypted vault — agents can use them but cannot exfiltrate the actual values.

5. Built-in Agent Toolset

Use the agent_toolset_20260401 tool type to enable the full default tool suite: bash, file operations, web search, web fetch, and code execution (Python/JS). No need to define individual tools.

Research Preview Features (Access Required)

Outcomes API

Instead of saying "do X", you declare the desired outcome and success criteria. Claude self-evaluates and iterates until it gets there. Think of it as writing test cases instead of implementation instructions.
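The "test cases instead of instructions" idea looks roughly like the sketch below. The API shape here is invented for illustration — the real Outcomes API surface is not public, so treat `pursue_outcome` and the criteria format as hypothetical:

```python
# Hypothetical sketch of the outcomes pattern: declare success criteria,
# let the agent iterate until its own evaluation passes. The function and
# criteria shapes are invented; the real Outcomes API may differ.

def satisfies(output, criteria):
    return all(check(output) for check in criteria)

def pursue_outcome(generate, criteria, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        output = generate(attempt)           # agent produces a candidate
        if satisfies(output, criteria):      # self-evaluation gate
            return output, attempt
    raise RuntimeError("outcome not reached within budget")

# Declared outcome: a summary under 50 chars that mentions "tests".
criteria = [lambda o: len(o) < 50, lambda o: "tests" in o]

# Simulated agent drafts: first attempt too long, second passes.
drafts = {1: "x" * 60, 2: "All tests pass; coverage at 94%."}
output, attempts = pursue_outcome(lambda n: drafts[n], criteria)
print(attempts)  # → 2
```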

Multi-Agent Orchestration

An orchestrator agent can spawn and coordinate multiple sub-agents in parallel. Managed Agents handles communication and state sharing between agents.

Persistent Memory

Agents can read and write to a memory store that persists across sessions. The most obvious use case: agents that remember user context across multiple interactions.
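The cross-session behavior can be sketched with a dict standing in for the hosted memory store (the real API surface is not public yet, so the class below is purely illustrative):

```python
# Minimal sketch of cross-session memory. A module-level dict stands in
# for the hosted memory store; the real Managed Agents API may differ.

memory_store = {}  # keyed by (user_id, key); persists across sessions

class SessionMemory:
    def __init__(self, user_id):
        self.user_id = user_id

    def remember(self, key, value):
        memory_store[(self.user_id, key)] = value

    def recall(self, key, default=None):
        return memory_store.get((self.user_id, key), default)

# Session 1: the agent learns a preference.
SessionMemory("user_123").remember("preferred_language", "Python")

# Session 2 (later, a different session object): the preference survives.
lang = SessionMemory("user_123").recall("preferred_language")
print(lang)  # → Python
```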

API and Code Examples

All Managed Agents API requests require the beta header anthropic-beta: managed-agents-2026-04-01. The Python SDK adds this automatically when using client.beta.

Create an Agent Definition

```python
import anthropic

client = anthropic.Anthropic()

agent = client.beta.agents.create(
    name="Code Review Agent",
    model="claude-opus-4-6",
    system="""You are an expert code reviewer.
    Analyze the provided code for bugs, security issues, and style problems.
    Always provide specific line numbers and actionable suggestions.""",
    tool_choice={"type": "agent_toolset", "version": "20260401"},
)
```

Create an Environment and Session

```python
# Environment defines the sandbox configuration
env = client.beta.environments.create(
    name="code-review-env",
    compute={"cpu": 2, "memory_gb": 4},
    secrets=[
        {"name": "GITHUB_TOKEN", "value": "ghp_xxxx"}
    ]
)

# Session is a specific execution instance
session = client.beta.sessions.create(
    agent_id=agent.id,
    environment_id=env.id,
    metadata={"user_id": "user_123"}
)
```

Send a Task and Stream Results

```python
# Send the task
message = client.beta.sessions.messages.create(
    session_id=session.id,
    content="Review this PR: https://github.com/org/repo/pull/42"
)

# Stream output via SSE
with client.beta.sessions.stream(session.id) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "session_completed":
            print("\n✅ Done")
            break
```

Resume a Session After Disconnect

```python
# Fetch pending outputs after reconnecting
outputs = client.beta.sessions.outputs.list(
    session_id=session.id,
    since_sequence=last_seen_sequence
)

for output in outputs:
    print(output.content)
```
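The client-side bookkeeping behind `since_sequence` is worth making explicit: track the highest sequence number you have processed, and filter anything at or below it on reconnect. The output shapes below are illustrative, not the SDK's actual types:

```python
# Client-side bookkeeping for resumable delivery: keep a high-water mark
# of processed sequence numbers and deduplicate on reconnect.
# Output dict shapes are illustrative stand-ins for the SDK's types.

def consume(outputs, last_seen_sequence):
    """Return unseen outputs and the new high-water mark."""
    fresh = [o for o in outputs if o["sequence"] > last_seen_sequence]
    if fresh:
        last_seen_sequence = max(o["sequence"] for o in fresh)
    return fresh, last_seen_sequence

outputs = [{"sequence": 3, "content": "a"},
           {"sequence": 4, "content": "b"},
           {"sequence": 5, "content": "c"}]

# We had already processed up to sequence 4 before disconnecting.
fresh, mark = consume(outputs, last_seen_sequence=4)
print([o["content"] for o in fresh], mark)  # → ['c'] 5
```

Persist the high-water mark (a database row per session is enough) so a restarted client never double-processes output.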

Pricing: Real-World Cost Breakdown

Claude Managed Agents has two cost components:

| Component | Rate | Notes |
|---|---|---|
| **Token usage** | Standard Claude Platform rates | Input/output tokens billed per model |
| **Runtime** | **$0.08 / session-hour** | Only charged when the session is active, not idle |

To put it in perspective: a complex 30-minute task (0.5h) with claude-opus-4-6 costs ~$0.04 in runtime fees plus token cost. Switching to claude-haiku-4-5 significantly reduces token costs while runtime fees stay constant.

Cost optimization tip: Use claude-haiku-4-5 for simple sub-tasks and reserve Opus for complex reasoning. A multi-agent pattern with model mixing can reduce token costs by 60–70%.
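The arithmetic above fits in a small cost model. The runtime rate comes from this article; the token prices in the second example are placeholder assumptions, not published Anthropic pricing:

```python
# Back-of-envelope cost model for a Managed Agents task. The $0.08/hour
# runtime rate is from the article; token prices passed in the second
# example are placeholder assumptions, not real pricing.

RUNTIME_RATE = 0.08  # $ per session-hour

def task_cost(hours, input_tokens, output_tokens,
              in_price_per_mtok, out_price_per_mtok):
    runtime = hours * RUNTIME_RATE
    tokens = (input_tokens / 1e6) * in_price_per_mtok \
           + (output_tokens / 1e6) * out_price_per_mtok
    return round(runtime + tokens, 4)

# 30-minute task: runtime fee alone is 0.5 * 0.08 = $0.04.
print(task_cost(0.5, 0, 0, 0, 0))  # → 0.04

# Same task with hypothetical token prices of $15/$75 per MTok:
print(task_cost(0.5, 200_000, 50_000, 15, 75))
```

For long-running agents, runtime hours scale linearly while a cheaper model cuts only the token term — which is why the model-mixing tip above targets token cost specifically.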

Managed Agents vs. Building Your Own Agent Loop

| Criteria | Managed Agents | Self-hosted (LangGraph / Custom) |
|---|---|---|
| **Time to first agent** | ~30 minutes | 1–2 weeks |
| **Sandboxing** | Built-in, hardened | DIY (Docker, gVisor, etc.) |
| **Long-running sessions** | Native support | Requires Redis + websocket management |
| **Scaling** | Auto-scales | You provision infrastructure |
| **Vendor lock-in** | High (Anthropic-only) | Low (portable) |
| **Customization** | Limited to the API surface | Full control |
| **Cost predictability** | Moderate (runtime fee adds up) | Higher upfront, but controllable |
| **Observability** | Built-in execution tracing | DIY (Langfuse, etc.) |

Best Use Cases

Managed Agents shines in these scenarios:

- **Internal dev tools:** Code review agents, CI/CD automation, documentation generators

- **Data processing pipelines:** Agents that analyze reports and synthesize data from multiple sources

- **Research automation:** Web research + synthesis + structured output

- **Rapid prototyping:** Proof-of-concept agents in hours instead of days

- **Teams without DevOps:** Startups and indie developers who don't want to manage Kubernetes

Conversely, avoid Managed Agents when:

- You need fine-grained control over the execution environment

- Compliance requires data to never leave your on-premise infrastructure

- You want to use models other than Claude (GPT-4, Gemini)

- Cost is the top priority at large scale

Hands-On: Build a PR Review Agent in 30 Minutes

Here's a complete working agent that reviews GitHub Pull Requests:

```python
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def create_pr_review_agent():
    return client.beta.agents.create(
        name="PR Review Bot",
        model="claude-opus-4-6",
        system="""You are a senior software engineer conducting code reviews.

        For each PR:
        1. Fetch the diff using the GitHub CLI (gh pr diff <number>)
        2. Identify bugs, security issues, and performance problems
        3. Check for test coverage
        4. Provide constructive, specific feedback with line references
        5. Rate severity: CRITICAL / MAJOR / MINOR / SUGGESTION

        Always end with a summary table.""",
        tool_choice={"type": "agent_toolset", "version": "20260401"},
    )

def review_pr(agent_id: str, env_id: str, pr_url: str) -> str:
    session = client.beta.sessions.create(
        agent_id=agent_id,
        environment_id=env_id,
    )

    client.beta.sessions.messages.create(
        session_id=session.id,
        content=f"Please review this pull request: {pr_url}"
    )

    result = []
    with client.beta.sessions.stream(session.id) as stream:
        for event in stream:
            if event.type == "content_block_delta":
                result.append(event.delta.text)
            elif event.type == "session_completed":
                break

    return "".join(result)

# One-time setup
agent = create_pr_review_agent()
env = client.beta.environments.create(
    name="pr-review",
    secrets=[{"name": "GITHUB_TOKEN", "value": os.environ["GITHUB_TOKEN"]}]
)

# Usage
review = review_pr(agent.id, env.id, "https://github.com/myorg/myrepo/pull/123")
print(review)
```

Community Reactions: What Developers Actually Think

After one week of public beta, the developer community has had some notable reactions:

Positive: Startups and indie hackers are particularly enthusiastic about the onboarding speed. One developer on Hacker News reported going from "zero to working agent" in 45 minutes — compared to 3 days with a self-hosted approach.

Concerns: Enterprise users are worried about vendor lock-in and data residency. Managed Agents currently doesn't support VPC peering or private endpoints — all traffic goes through Anthropic's public infrastructure.

Pricing feedback: The $0.08/session-hour rate has received mixed reactions. For simple tasks (<5 minutes), the overhead is negligible. For long-running research agents (4–8 hours), runtime cost can exceed token cost.

What's Coming Next

Based on documentation signals and Anthropic's engineering blog, features in development include:

- **Private networking:** Agents connecting to internal services via VPN or private link

- **Custom tool registration:** Register your own tools for agents to use as built-ins

- **Agent marketplace:** Share and reuse agent definitions

- **Outcomes API GA:** Automated output evaluation against success criteria

- **Regional deployments:** EU and Asia regions for compliance requirements




Final Verdict

Claude Managed Agents solves a real problem and solves it well. If you're spending more time on agent infrastructure than agent logic, that's a clear signal to try Managed Agents. The current beta is stable enough for small-to-medium production use cases.

That said, for teams with data sovereignty requirements, multi-model needs, or extreme cost optimization at scale — self-hosting is still the right call. Managed Agents isn't a silver bullet, but it's an excellent fit for the right use case.

Anthropic is directly competing with AWS Bedrock Agents and Google Vertex AI Agents in this segment. With advantages in model quality and developer experience, Managed Agents has real potential to become the standard deployment target for Claude-based agents in 2026.

To get started, visit platform.claude.com/docs/en/managed-agents/quickstart. There's currently no waitlist for the beta — you can start immediately with an existing API key.


This article was originally published on NextFuture. Follow us for more fullstack & AI engineering content.

Top comments (1)

Ali Muwwakkil

A common pitfall with AI agents is over-relying on raw prompt engineering without integrating into existing workflows. In our experience with enterprise teams, success comes from embedding these agents into the daily ops using RAG architectures. This ensures that AI outputs are meaningful and actionable, not just impressive demos. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)