Hassann

Posted on May 19 • Originally published at apidog.com

Claude Managed Agents vs Agent SDK (2026): Which to Choose

You’ve decided to ship a production AI agent on Claude. The first architecture decision is where the agent loop runs: in Anthropic’s hosted runtime with Claude Managed Agents, or inside your own service with the Claude Agent SDK. The choice affects data residency, cost, observability, tool execution, and who gets paged when a tool call hangs.


TL;DR

Use Claude Managed Agents when you want Anthropic to host the agent loop, sandbox, and session state for long-running or asynchronous jobs.

Use the Claude Agent SDK when you need the loop inside your own process, closer to your tools, filesystem, private services, and compliance controls.

Both use Claude models and support MCP. The practical decision is operational ownership.

Introduction

In 2026, building an AI agent no longer means wrapping a chat completion in a while loop. Anthropic gives you two production paths:

Claude Managed Agents: a hosted REST API where Anthropic runs the loop, sandbox, and session state.
Claude Agent SDK: a Python or TypeScript library where the loop runs inside your own service.

Most useful agents call APIs: payments, ticketing, inventory, pricing, logs, internal admin tools, and MCP servers. That means your agent is only as reliable as the APIs it calls.

Before you choose a runtime, make sure you can design, mock, test, and debug those APIs under agent-style traffic. A platform like Apidog helps you mock dependencies, run contract tests, and validate MCP servers before an agent touches production data.

For more background on the hosted option, see the Claude Managed Agents guide.

What Claude Managed Agents is

Claude Managed Agents is a hosted agent runtime. Instead of building your own loop, sandbox, session store, and execution environment, you configure an agent and let Anthropic run it.

It launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header on requests. The official SDK can set that header for you.

Managed Agents is built around four concepts:

Concept	What it means
Agent	Model, system prompt, tools, MCP servers, and skills
Environment	Container template with installed packages and network rules
Session	A running agent instance for one task
Events	User messages, tool results, status updates, and streamed output

A typical flow looks like this:

Create agent
  -> Configure environment
  -> Start session
  -> Send user event
  -> Stream agent events
  -> Send tool results or follow-up events
  -> Fetch event history for audit/debugging

Managed Agents includes built-in tools such as:

Bash
File read/write/edit
Glob and grep
Web search and fetch
MCP server connections

It is a strong fit when you need:

Long-running execution
Asynchronous jobs
Stateful sessions
Hosted sandboxing
Less infrastructure to operate
A fetchable event log

It is also available on Claude Platform on AWS, with differences in feature availability and session behavior. Check the current docs if your deployment is cloud-constrained.

Two implementation details matter:

Custom tools still execute in your application. Claude requests the tool call, but your app performs the action and returns the result through the event stream.
Some features are gated. Outcomes and multi-agent capabilities are research-preview features behind separate access.
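The first point is easier to see in code. Here is a minimal sketch of the application side, assuming a hypothetical event shape with `tool_use_id`, `name`, and `input` fields. The real Managed Agents event schema may differ, so treat every field name and the `lookup_order` tool as placeholders.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of custom tools that run in YOUR application.
TOOLS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}


def tool(name: str):
    """Register a function as the handler for a named custom tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("lookup_order")
def lookup_order(args: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder logic; a real handler would call your order service.
    return {"order_id": args["order_id"], "status": "shipped"}


def handle_tool_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a tool-request event into a tool-result event.

    The event field names are assumptions made for illustration only.
    """
    result = {"type": "tool_result", "tool_use_id": event["tool_use_id"]}
    handler = TOOLS.get(event["name"])
    if handler is None:
        return {**result, "is_error": True,
                "content": f"unknown tool: {event['name']}"}
    try:
        return {**result, "is_error": False,
                "content": handler(event["input"])}
    except Exception as exc:
        # Report the failure back to the agent instead of crashing the stream.
        return {**result, "is_error": True, "content": str(exc)}
```

Your service would call `handle_tool_request` for each tool request it reads from the event stream and send the returned dict back as the tool result.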

For the broader architectural pattern, see agentic AI architecture.

What the Claude Agent SDK is

The Claude Agent SDK is a Python and TypeScript library that runs the agent loop in your own process. It exposes the same kind of loop, built-in tools, and context management used by Claude Code.

Install it in your service:

pip install claude-agent-sdk

or:

npm install @anthropic-ai/claude-agent-sdk

With the SDK, your process owns:

The agent loop
Tool execution
Permissions
Session state
Logging
Sandbox strategy
Deployment and scaling

A minimal Python shape looks like this:

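import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions

# Restrict the agent to read-only built-in tools.
options = ClaudeAgentOptions(
    allowed_tools=["Read", "Grep", "WebFetch"]
)


async def main():
    # query() is an async iterator, so it must be consumed inside a coroutine.
    async for message in query(
        prompt="Review this API contract and identify breaking changes.",
        options=options,
    ):
        print(message)


asyncio.run(main())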

The key difference from a plain client SDK is that you do not write the full tool_use loop yourself. The Agent SDK handles the loop and built-in tool execution.

The SDK includes:

Built-in tools: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Monitor, and AskUserQuestion
Hooks: lifecycle callbacks such as PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, and UserPromptSubmit
Subagents: specialized agents for focused subtasks
MCP support: connect APIs, databases, browsers, and internal systems
Permissions: allow, block, or require approval for tools
Sessions: resume or fork context using local JSONL state

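Example policy hook shape (the event object is simplified for illustration; check the SDK docs for the exact hook callback signature and return format):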

async def pre_tool_use_hook(event):
    if event.tool_name == "Bash" and "rm -rf" in event.input.get("command", ""):
        raise PermissionError("Blocked destructive shell command")

    if event.tool_name == "refund_payment":
        amount = event.input.get("amount", 0)
        if amount > 500:
            return {"requires_human_approval": True}

Because the loop runs locally, the SDK can also read Claude Code-style project configuration:

.claude/skills/
slash commands
CLAUDE.md
plugins

Authentication supports the Anthropic API, Amazon Bedrock, Claude Platform on AWS, Google Vertex AI, and Azure AI Foundry.

For setup examples, see setting up the Claude Agent SDK with a Claude plan and building your own Claude Code.

One billing detail to plan for: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive usage limits. Always verify current terms with Anthropic before budgeting.

Managed Agents vs Agent SDK

Check current prices on Anthropic’s pricing page and the Managed Agents docs before committing budget.

Dimension	Claude Managed Agents	Claude Agent SDK
Where the loop runs	Anthropic-managed infrastructure	Your process and infrastructure
Interface	REST API + SSE event stream	Python or TypeScript library
Control over loop	Configured and steered by events	Full in-process control
Cost model	Claude token rates + active session runtime fee	Claude token rates + your compute
Ops burden	Lower	Higher
Observability	Hosted event log	Your hooks, logs, and tracing
Latency profile	Hosted runtime network hop	You control proximity to tools/data
Data residency	Sandbox and session state in Anthropic/AWS environment	Tool execution and state stay with you
Custom tools	Your app executes and returns results over stream	In-process functions
Best fit	Long-running async agents	Private, regulated, or tightly controlled agents

Cost: runtime fee vs infrastructure cost

Managed Agents charges standard Claude token rates plus a runtime fee for active session time. A session that runs for a long time can accrue runtime cost even between tool calls.

The SDK has no Anthropic-managed runtime fee, but you pay for:

Worker nodes
Sandboxes
Queues
Autoscaling
Logs and traces
On-call support
Security controls

A simple way to evaluate cost:

Managed Agents cost =
  Claude tokens
  + active session runtime
  + custom tool infrastructure

Agent SDK cost =
  Claude tokens
  + application compute
  + sandbox/runtime infrastructure
  + engineering operations

The SDK may look cheaper until you include operational cost.
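To make the comparison concrete, here is a small calculator for the two formulas above. Every rate and monthly figure you pass in is an assumption you supply yourself; none of the numbers in the usage note are Anthropic prices.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    sessions_per_month: int
    token_cost_per_session: float    # USD; the same under both options
    active_hours_per_session: float  # time a hosted session stays active


def managed_agents_cost(w: Workload, runtime_rate_per_hour: float,
                        tool_infra_per_month: float) -> float:
    """Claude tokens + active session runtime + custom tool infrastructure."""
    tokens = w.sessions_per_month * w.token_cost_per_session
    runtime = (w.sessions_per_month * w.active_hours_per_session
               * runtime_rate_per_hour)
    return tokens + runtime + tool_infra_per_month


def agent_sdk_cost(w: Workload, compute_per_month: float,
                   sandbox_per_month: float, ops_per_month: float) -> float:
    """Claude tokens + application compute + sandbox/runtime infra + operations."""
    tokens = w.sessions_per_month * w.token_cost_per_session
    return tokens + compute_per_month + sandbox_per_month + ops_per_month
```

For example, with `Workload(1000, 0.50, 0.25)` and made-up inputs, you would compare `managed_agents_cost(w, runtime_rate_per_hour=0.10, tool_infra_per_month=200)` against `agent_sdk_cost(w, compute_per_month=300, sandbox_per_month=150, ops_per_month=1000)`. The engineering operations line is usually the hardest to estimate and the easiest to forget.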

Data residency and compliance

This is often the deciding factor.

Use the Agent SDK if:

Session state cannot leave your infrastructure
Tools must run inside a VPC
Internal APIs are not internet-accessible
Regulated data cannot sit in a hosted sandbox
You need full audit control over every tool invocation

Use Managed Agents if your compliance posture allows Anthropic-hosted or AWS-hosted sandbox/session state and you value managed execution more than infrastructure control.

Observability model

Managed Agents gives you a hosted event log that you can fetch for debugging and audits.

With the SDK, you build the observability layer yourself. Use hooks to emit structured events:

{
  "event": "tool_call",
  "session_id": "sess_123",
  "tool": "refund_payment",
  "input_hash": "9f86d081",
  "status": "approved",
  "timestamp": "2026-04-18T10:15:00Z"
}

At minimum, log:

Prompt/session IDs
Tool name
Input schema version
Output schema version
Latency
Error class
Retry count
Approval decisions
Parent tool/subagent IDs
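A minimal sketch of a log-record builder that a hook could call. The function names and the extra fields (`latency_ms`, `retry_count`, `error_class`) are illustrative choices, not part of the SDK. Hashing a canonical form of the input lets you correlate calls without writing raw payloads to your logs.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def input_hash(tool_input: Dict[str, Any]) -> str:
    """Short, stable fingerprint of a tool input, independent of key order."""
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def tool_call_event(session_id: str, tool: str, tool_input: Dict[str, Any],
                    status: str, latency_ms: float, retry_count: int = 0,
                    error_class: Optional[str] = None) -> str:
    """Build one JSON log line describing a tool call."""
    record = {
        "event": "tool_call",
        "session_id": session_id,
        "tool": tool,
        "input_hash": input_hash(tool_input),
        "status": status,
        "latency_ms": latency_ms,
        "retry_count": retry_count,
        "error_class": error_class,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return json.dumps(record)
```

Emit the returned line to whatever log pipeline you already run; one JSON object per line keeps it easy to query later.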

Testing the APIs your agents call

No matter which runtime you choose, test the dependencies first. A perfect reasoning loop still fails if the payments API, ticketing API, or MCP server returns unexpected data.

Test three layers.

1. API contracts

Every tool is an API with a schema. Mock it and assert request/response shapes.

For example, a refund tool contract might require:

{
  "transaction_id": "txn_123",
  "amount": 49.99,
  "currency": "USD",
  "reason": "duplicate_charge"
}

Expected response:

{
  "refund_id": "rf_456",
  "status": "accepted",
  "created_at": "2026-04-18T10:15:00Z"
}
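A contract check for those two payloads can be very small. This sketch uses only the field names from the examples above; the type mapping and the `contract_violations` helper are assumptions for illustration, and a real suite would more likely use JSON Schema or a contract-testing tool.

```python
from typing import Any, Dict, List

# Field name -> accepted Python type(s), taken from the example payloads.
REFUND_REQUEST = {
    "transaction_id": str,
    "amount": (int, float),
    "currency": str,
    "reason": str,
}
REFUND_RESPONSE = {"refund_id": str, "status": str, "created_at": str}


def contract_violations(payload: Dict[str, Any],
                        contract: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means a match."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif isinstance(payload[field], bool) or not isinstance(payload[field], expected):
            # bool is rejected explicitly because it is a subclass of int.
            problems.append(
                f"wrong type for {field}: {type(payload[field]).__name__}")
    for field in payload:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems
```

Run it against both the mock and the real service response. A non-empty list from the real service is the drift signal you want to catch before the agent does.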

With Apidog, you can mock payments or ticketing endpoints, define the schema, and run contract tests on a schedule. When the real service drifts, the test fails before the agent breaks production.

For a deeper workflow, see how to test AI agents that call APIs.

2. MCP servers

Both Managed Agents and the Agent SDK can use MCP. An MCP server is still a service, and it can fail in ordinary ways:

Tool name changes
Input schema changes
Output field removed
Timeout behavior changes
Error response becomes unstructured prose
Pagination changes
Auth behavior changes

Test the MCP server directly before connecting a live agent.
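Several of those failures show up as changes in the server's tool listing. One way to catch them is to snapshot the listing and diff it on every run. The sketch below assumes you have already reduced each snapshot to a `{tool_name: input_schema}` dict, where each schema is a JSON Schema object; how you fetch the listing is up to your MCP client, and `manifest_drift` is a hypothetical helper.

```python
from typing import Dict, List


def manifest_drift(baseline: Dict[str, dict], current: Dict[str, dict]) -> List[str]:
    """Compare two {tool_name: input_schema} snapshots and list breaking changes."""
    findings = []
    for name in sorted(baseline.keys() - current.keys()):
        findings.append(f"tool removed or renamed: {name}")
    for name in sorted(current.keys() - baseline.keys()):
        findings.append(f"tool added: {name}")

    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        old_required = set(old.get("required", []))
        new_required = set(new.get("required", []))
        for field in sorted(new_required - old_required):
            findings.append(f"{name}: new required input: {field}")

        old_props = old.get("properties", {})
        new_props = new.get("properties", {})
        for field in sorted(old_props.keys() - new_props.keys()):
            findings.append(f"{name}: input removed: {field}")
        for field in sorted(old_props.keys() & new_props.keys()):
            if old_props[field].get("type") != new_props[field].get("type"):
                findings.append(f"{name}: type changed for {field}")
    return findings
```

This only covers names and input schemas. Timeouts, pagination, auth, and error formats still need tests that actually call the tools.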

See MCP server testing with Apidog for a practical way to enumerate exposed tools and exercise them.

Apidog also includes an AI agent and A2A debugger so you can inspect the traffic an agent generates.

3. Agent request behavior

Agents do not call APIs like humans. They may:

Retry aggressively
Read partial data
Call the same endpoint repeatedly
Mix exploratory and mutating calls
Recover from errors in surprising ways

Replay realistic traffic against mocks before production.

Useful checks:

Does the agent retry idempotently?
Does it re-send mutation requests after a timeout?
Does it validate required fields before calling tools?
Does it stop after repeated 4xx errors?
Does it ask for approval before sensitive actions?
Does it handle pagination?
Does it handle partial failures?
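Some of these checks can run automatically over traffic recorded at your mock server. The sketch below assumes a hypothetical `Recorded` record per request and flags two of the behaviors above: mutations sent without an idempotency key, and an endpoint that keeps getting called after repeated 4xx responses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MUTATING = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass
class Recorded:
    method: str
    path: str
    status: int
    idempotency_key: Optional[str] = None


def audit(requests: List[Recorded], max_client_errors: int = 3) -> List[str]:
    """Scan recorded traffic in order and return a list of findings."""
    findings = []
    streak: Dict[Tuple[str, str], int] = {}  # consecutive 4xx count per endpoint
    for i, r in enumerate(requests):
        if r.method in MUTATING and not r.idempotency_key:
            findings.append(
                f"#{i} {r.method} {r.path}: mutation without idempotency key")

        key = (r.method, r.path)
        if 400 <= r.status < 500:
            streak[key] = streak.get(key, 0) + 1
            # Flag once, on the first call past the allowed limit.
            if streak[key] == max_client_errors + 1:
                findings.append(
                    f"#{i} {r.method} {r.path}: kept calling after "
                    f"{max_client_errors} 4xx responses")
        else:
            streak[key] = 0
    return findings
```

The threshold of three is arbitrary; pick a limit that matches how your agent is supposed to give up.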

Managed Agents hides the loop, so combine its event log with API-level tests. The SDK exposes the loop, so instrument it with hooks and still run the same API contract tests.

Either way, download Apidog and put the agent’s dependencies under test before using real customer data.

Decision framework

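Check your requirements against these two lists. If any hard constraint in the second list applies, it usually outweighs the convenience items in the first.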

Choose Claude Managed Agents if:

The agent runs for minutes or hours.
The work is asynchronous.
You do not want to operate a job runner, sandbox, and session store.
Your team is small and ops capacity is limited.
A hosted event log is enough for your audit/debugging needs.
Your data policy allows Anthropic-hosted or AWS-hosted session state.
You are comfortable with beta status and gated research-preview features.

Choose the Claude Agent SDK if:

The agent must run inside your VPC.
Tools need direct access to private databases or internal services.
Session state must stay on your infrastructure.
You need custom permissions and audit hooks.
You need in-process tool logic.
Regulatory constraints rule out a hosted sandbox.
You want to use Bedrock, Vertex, or Azure contracts while keeping the loop in-house.
You are prototyping locally against your filesystem.

Common migration path

A practical path is:

Prototype locally with Agent SDK
  -> Mock and contract-test APIs
  -> Validate tool behavior
  -> Decide whether managed hosting is acceptable
  -> Move to Managed Agents if ops savings justify migration

Do not treat this as a config switch. You are moving from a library model to REST + event streams, and custom tool execution works differently.

If you are also comparing agent/model options, see the Claude vs Codex comparison for 2026.

Real-world use cases

A payments refund agent

A fintech team wants an agent that can:

Read a support ticket.
Look up a transaction.
Check refund policy.
Issue the refund.
Write a summary back to the ticket.

This touches money, so every action needs a contract and audit trail.

The Agent SDK is the natural fit:

Run inside the VPC.
Keep session state internal.
Use PreToolUse hooks for approval.
Log every refund attempt.
Block dangerous or duplicate actions.

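Example approval policy (the event object is simplified for illustration, not the SDK’s exact hook signature):

async def pre_tool_use(event):
    # Only refund calls are subject to this policy.
    if event.tool_name != "issue_refund":
        return

    # Validate required fields before any approval logic runs.
    if not event.input.get("transaction_id"):
        raise ValueError("transaction_id is required")

    amount = event.input.get("amount")
    if amount is None:
        raise ValueError("amount is required")

    # Route large refunds to a human.
    if amount > 500:
        return {
            "status": "requires_approval",
            "reason": "Refund exceeds threshold"
        }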

Before launch:

Mock payments and ledger APIs in Apidog.
Write contract tests for lookup and refund calls.
Replay historical tickets against mocks.
Verify retry behavior after timeouts.
Confirm the agent does not duplicate successful refunds after a 504.

That last case is exactly why API-level testing matters.
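Here is a minimal sketch of that test, using an in-memory mock instead of a real payments API. `MockRefundAPI` and `refund_with_retry` are hypothetical names, and deriving the idempotency key from the transaction ID assumes at most one refund per transaction.

```python
from typing import Dict, Optional, Tuple


class MockRefundAPI:
    """In-memory stand-in for a refund endpoint (for tests only)."""

    def __init__(self, drop_first_response: bool = True):
        self.refunds: Dict[str, dict] = {}  # idempotency key -> refund record
        self._drop_next = drop_first_response

    def issue_refund(self, idempotency_key: str, transaction_id: str,
                     amount: float) -> Tuple[int, Optional[dict]]:
        # Apply the refund at most once per idempotency key.
        if idempotency_key not in self.refunds:
            self.refunds[idempotency_key] = {
                "refund_id": f"rf_{len(self.refunds) + 1}",
                "transaction_id": transaction_id,
                "amount": amount,
                "status": "accepted",
            }
        if self._drop_next:
            # Simulate a gateway timeout AFTER the refund already succeeded.
            self._drop_next = False
            return 504, None
        return 200, self.refunds[idempotency_key]


def refund_with_retry(api: MockRefundAPI, transaction_id: str, amount: float,
                      max_attempts: int = 3) -> dict:
    """Retry on non-200 responses, reusing the same key on every attempt."""
    key = f"refund:{transaction_id}"
    for _ in range(max_attempts):
        status, body = api.issue_refund(key, transaction_id, amount)
        if status == 200:
            return body
    raise TimeoutError("refund not confirmed")
```

After a retried call, the mock should hold exactly one refund record. If your agent generates a fresh key on each attempt, the same test will show two, which is the duplicate-refund bug.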

An asynchronous support-ticket triage agent

A SaaS company wants an agent to process thousands of tickets per day:

Classify the ticket.
Pull related logs.
Draft a response.
Resolve or escalate.

Each ticket takes a few minutes of tool calls, and the data is low-sensitivity.

Managed Agents fits well:

Long-running async work
Small team
No worker fleet to operate
Hosted event log per ticket
Stateful sessions

The team still tests dependencies:

Mock the logging API.
Contract-test the ticketing MCP server.
Validate schema changes before production.
Inspect agent-generated request traffic in Apidog.

Managed hosting reduces runtime work. It does not remove responsibility for API correctness.

An internal data-ops agent behind the firewall

A platform team wants an agent that responds to requests like:

Back-fill yesterday’s failed ETL partitions.

The agent needs to:

Query an internal job API.
Run remediation scripts.
Report job status.
Log all actions.

The internal services are private and sensitive.

The Agent SDK wins by requirement:

It runs where private services are reachable.
Session state stays internal.
Internal services can be exposed through MCP.
SDK hooks can log commands to the existing audit pipeline.

This is not a preference issue. The hosted sandbox cannot reach private systems unless you expose them, which may violate the security model.

For context on why agents are becoming major API consumers, see AI agents as the new API consumers.

Implementation checklist

Before shipping either option, verify:

[ ] Runtime choice documented: Managed Agents or SDK
[ ] Data residency reviewed
[ ] Tool permissions defined
[ ] Human approval rules implemented
[ ] API contracts mocked
[ ] MCP tools tested directly
[ ] Retry behavior tested
[ ] Mutating calls made idempotent
[ ] Session/event logs available
[ ] Error handling tested
[ ] Pricing verified from Anthropic source
[ ] Beta feature availability checked
[ ] Incident ownership assigned

Conclusion

The Managed Agents vs Agent SDK decision is mostly about operations and data governance.

Keep these rules in mind:

Managed Agents hosts the loop and sandbox.
The Agent SDK runs the loop in your process.
Managed Agents reduces ops burden but moves session state into hosted infrastructure.
The SDK gives control but requires you to operate the runtime.
Data residency often decides the architecture.
Cost depends on workload shape, not only token pricing.
API and MCP testing are required either way.

Next step: before wiring an agent to customer-facing systems, put its API and MCP dependencies under test. Download Apidog to mock endpoints, run contract tests, and debug the agent’s real request traffic.

FAQ

What’s the core difference between Claude Managed Agents and the Claude Agent SDK?

Managed Agents is a hosted REST API where Anthropic runs the agent loop and per-session sandbox. The Agent SDK is a Python or TypeScript library that runs the loop inside your own process.

Same Claude models. Different operational ownership.

Is the Claude Agent SDK the same as the old Claude Code SDK?

Yes. The Claude Code SDK was renamed to the Claude Agent SDK to reflect broader agent use cases beyond coding tasks.

Which option is cheaper?

It depends on workload shape.

Managed Agents charges standard Claude token rates plus active session runtime. The SDK has no hosted runtime fee, but you pay for compute, scaling, sandboxing, and operations.

Check Anthropic’s current pricing before budgeting.

Can I use MCP servers with both?

Yes. Both support MCP.

Test MCP servers before connecting them to either runtime. The MCP server testing with Apidog guide shows how to exercise each exposed tool.

How do I keep customer data out of Anthropic’s infrastructure?

Use the Agent SDK and run the loop in your own environment. Tool execution and session state stay on your infrastructure.

With Managed Agents, sandbox and event log state live in Anthropic’s environment or the AWS option, subject to current availability and constraints.

Is Claude Managed Agents production-ready?

Claude Managed Agents launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header. Some capabilities, such as outcomes and multi-agent features, are gated behind research-preview access.

Check the current docs before production use.

How do I test an agent before it touches real APIs?

Mock every API and MCP server the agent calls. Then:

Write contract tests.
Replay realistic traffic.
Inspect actual agent requests.
Validate retries and idempotency.
Test error paths.

Apidog supports mocks, contract testing, and AI agent/A2A debugging. See how to test AI agents that call APIs.

Can I start on one option and switch later?

Yes, but it is a migration project.

A common path is to prototype with the Agent SDK locally, then move to Managed Agents if hosted execution is a better production fit. Plan for interface changes, tool execution differences, and session-state migration.
