You’ve decided to ship a production AI agent on Claude. The first architecture decision is where the agent loop runs: in Anthropic’s hosted runtime with Claude Managed Agents, or inside your own service with the Claude Agent SDK. The choice affects data residency, cost, observability, tool execution, and who gets paged when a tool call hangs.
TL;DR
Use Claude Managed Agents when you want Anthropic to host the agent loop, sandbox, and session state for long-running or asynchronous jobs.
Use the Claude Agent SDK when you need the loop inside your own process, closer to your tools, filesystem, private services, and compliance controls.
Both use Claude models and support MCP. The practical decision is operational ownership.
Introduction
In 2026, building an AI agent no longer means wrapping a chat completion in a while loop. Anthropic gives you two production paths:
- Claude Managed Agents: a hosted REST API where Anthropic runs the loop, sandbox, and session state.
- Claude Agent SDK: a Python or TypeScript library where the loop runs inside your own service.
Most useful agents call APIs: payments, ticketing, inventory, pricing, logs, internal admin tools, and MCP servers. That means your agent is only as reliable as the APIs it calls.
Before you choose a runtime, make sure you can design, mock, test, and debug those APIs under agent-style traffic. A platform like Apidog helps you mock dependencies, run contract tests, and validate MCP servers before an agent touches production data.
For more background on the hosted option, see the Claude Managed Agents guide.
What Claude Managed Agents is
Claude Managed Agents is a hosted agent runtime. Instead of building your own loop, sandbox, session store, and execution environment, you configure an agent and let Anthropic run it.
It launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header on requests. The official SDK can set that header for you.
Managed Agents is built around four concepts:
| Concept | What it means |
|---|---|
| Agent | Model, system prompt, tools, MCP servers, and skills |
| Environment | Container template with installed packages and network rules |
| Session | A running agent instance for one task |
| Events | User messages, tool results, status updates, and streamed output |
A typical flow looks like this:
Create agent
-> Configure environment
-> Start session
-> Send user event
-> Stream agent events
-> Send tool results or follow-up events
-> Fetch event history for audit/debugging
Managed Agents includes built-in tools such as:
- Bash
- File read/write/edit
- Glob and grep
- Web search and fetch
- MCP server connections
It is a strong fit when you need:
- Long-running execution
- Asynchronous jobs
- Stateful sessions
- Hosted sandboxing
- Less infrastructure to operate
- A fetchable event log
It is also available on Claude Platform on AWS, with differences in feature availability and session behavior. Check the current docs if your deployment is cloud-constrained.
Two implementation details matter:
- Custom tools still execute in your application. Claude requests the tool call, but your app performs the action and returns the result through the event stream.
- Some features are gated. Outcomes and multi-agent capabilities are research-preview features behind separate access.
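The first point can be sketched as a small dispatcher that your application runs while it consumes the event stream. This is a minimal sketch, not the Managed Agents schema: the event fields (`type`, `tool_use_id`, `name`, `input`), the `tool_request` and `tool_result` type names, and the `lookup_order` tool are all hypothetical and chosen only for illustration. Check Anthropic's docs for the real event shapes.

```python
from typing import Any, Callable, Optional

# Hypothetical local tool: in a real service this would call your own API.
def lookup_order(args: dict[str, Any]) -> dict[str, Any]:
    return {"order_id": args["order_id"], "status": "shipped"}

# Registry of custom tools that run inside your application.
TOOL_HANDLERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "lookup_order": lookup_order,
}

def handle_event(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a result event for a tool request, or None for any other event."""
    if event.get("type") != "tool_request":  # assumed type name
        return None

    def reply(content: Any, is_error: bool) -> dict[str, Any]:
        return {
            "type": "tool_result",  # assumed type name
            "tool_use_id": event["tool_use_id"],
            "is_error": is_error,
            "content": content,
        }

    handler = TOOL_HANDLERS.get(event["name"])
    if handler is None:
        return reply(f"unknown tool: {event['name']}", True)
    try:
        return reply(handler(event["input"]), False)
    except Exception as exc:
        # Report the failure to the agent instead of crashing the consumer.
        return reply(str(exc), True)
```

The design point is that errors are returned as structured results, so the agent can react to a failed tool call rather than waiting on a stream that never answers.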
For the broader architectural pattern, see agentic AI architecture.
What the Claude Agent SDK is
The Claude Agent SDK is a Python and TypeScript library that runs the agent loop in your own process. It exposes the same kind of loop, built-in tools, and context management used by Claude Code.
Install it in your service:
pip install claude-agent-sdk
or:
npm install @anthropic-ai/claude-agent-sdk
With the SDK, your process owns:
- The agent loop
- Tool execution
- Permissions
- Session state
- Logging
- Sandbox strategy
- Deployment and scaling
A minimal Python shape looks like this:
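import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    # Allow only read-oriented built-in tools for this review task.
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Grep", "WebFetch"]
    )
    # query() yields messages as the agent loop runs.
    async for message in query(
        prompt="Review this API contract and identify breaking changes.",
        options=options,
    ):
        print(message)

asyncio.run(main())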
The key difference from a plain client SDK is that you do not write the full tool_use loop yourself. The Agent SDK handles the loop and built-in tool execution.
The SDK includes:
- Built-in tools: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, Monitor, and AskUserQuestion
- Hooks: lifecycle callbacks such as PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, and UserPromptSubmit
- Subagents: specialized agents for focused subtasks
- MCP support: connect APIs, databases, browsers, and internal systems
- Permissions: approve, block, or require approval for tools
- Sessions: resume or fork context using local JSONL state
Example policy hook shape:
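# Illustrative shape only: `event` is a simplified stand-in.
# Check the SDK docs for the actual hook signature and return format.
async def pre_tool_use_hook(event):
    # Block obviously destructive shell commands.
    if event.tool_name == "Bash" and "rm -rf" in event.input.get("command", ""):
        raise PermissionError("Blocked destructive shell command")
    # Route large refunds to a human reviewer.
    if event.tool_name == "refund_payment":
        amount = event.input.get("amount", 0)
        if amount > 500:
            return {"requires_human_approval": True}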
Because the loop runs locally, the SDK can also read Claude Code-style project configuration:
- .claude/skills/
- slash commands
- CLAUDE.md
- plugins
Authentication supports the Anthropic API, Amazon Bedrock, Claude Platform on AWS, Google Vertex AI, and Azure AI Foundry.
For setup examples, see setting up the Claude Agent SDK with a Claude plan and building your own Claude Code.
One billing detail to plan for: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive usage limits. Always verify current terms with Anthropic before budgeting.
Managed Agents vs Agent SDK
Check current prices on Anthropic’s pricing page and the Managed Agents docs before committing budget.
| Dimension | Claude Managed Agents | Claude Agent SDK |
|---|---|---|
| Where the loop runs | Anthropic-managed infrastructure | Your process and infrastructure |
| Interface | REST API + SSE event stream | Python or TypeScript library |
| Control over loop | Configured and steered by events | Full in-process control |
| Cost model | Claude token rates + active session runtime fee | Claude token rates + your compute |
| Ops burden | Lower | Higher |
| Observability | Hosted event log | Your hooks, logs, and tracing |
| Latency profile | Hosted runtime network hop | You control proximity to tools/data |
| Data residency | Sandbox and session state in Anthropic/AWS environment | Tool execution and state stay with you |
| Custom tools | Your app executes and returns results over stream | In-process functions |
| Best fit | Long-running async agents | Private, regulated, or tightly controlled agents |
Cost: runtime fee vs infrastructure cost
Managed Agents charges standard Claude token rates plus a runtime fee for active session time. A session that runs for a long time can accrue runtime cost even between tool calls.
The SDK has no Anthropic-managed runtime fee, but you pay for:
- Worker nodes
- Sandboxes
- Queues
- Autoscaling
- Logs and traces
- On-call support
- Security controls
A simple way to evaluate cost:
Managed Agents cost =
    Claude tokens
    + active session runtime
    + custom tool infrastructure
Agent SDK cost =
    Claude tokens
    + application compute
    + sandbox/runtime infrastructure
    + engineering operations
The SDK may look cheaper until you include operational cost.
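The two formulas above can be turned into a rough calculator for comparing scenarios. This is a sketch under stated assumptions: every rate and monthly figure is a placeholder you supply yourself, the `Workload` fields are hypothetical names, and none of the numbers come from Anthropic's pricing.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    sessions_per_month: int
    token_cost_per_session: float    # USD; assumed equal for both runtimes
    active_hours_per_session: float  # time a hosted session stays active

def managed_agents_cost(
    w: Workload,
    runtime_rate_per_hour: float,   # placeholder: look up the real rate
    tool_infra_per_month: float,    # your custom tool hosting
) -> float:
    tokens = w.sessions_per_month * w.token_cost_per_session
    runtime = w.sessions_per_month * w.active_hours_per_session * runtime_rate_per_hour
    return tokens + runtime + tool_infra_per_month

def agent_sdk_cost(
    w: Workload,
    compute_per_month: float,
    sandbox_per_month: float,
    ops_per_month: float,           # engineering and on-call time, in USD
) -> float:
    tokens = w.sessions_per_month * w.token_cost_per_session
    return tokens + compute_per_month + sandbox_per_month + ops_per_month
```

Running both functions over the same `Workload` makes the trade-off visible: the hosted total scales with active session time, while the SDK total is dominated by fixed monthly infrastructure and operations figures.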
Data residency and compliance
This is often the deciding factor.
Use the Agent SDK if:
- Session state cannot leave your infrastructure
- Tools must run inside a VPC
- Internal APIs are not internet-accessible
- Regulated data cannot sit in a hosted sandbox
- You need full audit control over every tool invocation
Use Managed Agents if your compliance posture allows Anthropic-hosted or AWS-hosted sandbox/session state and you value managed execution more than infrastructure control.
Observability model
Managed Agents gives you a hosted event log that you can fetch for debugging and audits.
With the SDK, you build the observability layer yourself. Use hooks to emit structured events:
{
  "event": "tool_call",
  "session_id": "sess_123",
  "tool": "refund_payment",
  "input_hash": "9f86d081",
  "status": "approved",
  "timestamp": "2026-04-18T10:15:00Z"
}
At minimum, log:
- Prompt/session IDs
- Tool name
- Input schema version
- Output schema version
- Latency
- Error class
- Retry count
- Approval decisions
- Parent tool/subagent IDs
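A hook can build that kind of record with the standard library alone. The sketch below is one possible shape, not an SDK API: `build_tool_event` and its parameters are hypothetical names, and it assumes you hash tool inputs rather than log them so that sensitive values stay out of the log stream.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Optional

def build_tool_event(
    session_id: str,
    tool: str,
    tool_input: dict[str, Any],
    status: str,
    latency_ms: int,
    error_class: Optional[str] = None,
    retry_count: int = 0,
    now: Optional[datetime] = None,  # must be timezone-aware if provided
) -> dict[str, Any]:
    # Canonical JSON so the same input always produces the same hash.
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    input_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "event": "tool_call",
        "session_id": session_id,
        "tool": tool,
        "input_hash": input_hash,
        "status": status,
        "latency_ms": latency_ms,
        "error_class": error_class,
        "retry_count": retry_count,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Emit the result with `json.dumps` on a single line so your log pipeline can parse it. Because the hash is computed over sorted keys, two calls with identical inputs can be correlated across retries without storing the inputs themselves.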
Testing the APIs your agents call
No matter which runtime you choose, test the dependencies first. A perfect reasoning loop still fails if the payments API, ticketing API, or MCP server returns unexpected data.
Test three layers.
1. API contracts
Every tool is an API with a schema. Mock it and assert request/response shapes.
For example, a refund tool contract might require:
{
  "transaction_id": "txn_123",
  "amount": 49.99,
  "currency": "USD",
  "reason": "duplicate_charge"
}
Expected response:
{
  "refund_id": "rf_456",
  "status": "accepted",
  "created_at": "2026-04-18T10:15:00Z"
}
With Apidog, you can mock payments or ticketing endpoints, define the schema, and run contract tests on a schedule. When the real service drifts, the test fails before the agent breaks production.
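To make the idea concrete, here is a minimal contract check for the refund request and response shown above, using only the standard library. It is a sketch: the field types are inferred from the example payloads, `contract_violations` is a hypothetical helper, and a real suite would more likely use JSON Schema or a dedicated contract-testing tool.

```python
from typing import Any

# Expected field types, inferred from the example payloads above.
REFUND_REQUEST = {
    "transaction_id": str,
    "amount": (int, float),
    "currency": str,
    "reason": str,
}
REFUND_RESPONSE = {"refund_id": str, "status": str, "created_at": str}

def contract_violations(payload: dict[str, Any], contract: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the payload conforms."""
    problems: list[str] = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    for field in sorted(payload.keys() - contract.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

Run the same function against both the mock and the real service's responses. A non-empty result from the real service is the drift signal you want to catch before the agent does.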
For a deeper workflow, see how to test AI agents that call APIs.
2. MCP servers
Both Managed Agents and the Agent SDK can use MCP. An MCP server is still a service, and it can fail in ordinary ways:
- Tool name changes
- Input schema changes
- Output field removed
- Timeout behavior changes
- Error response becomes unstructured prose
- Pagination changes
- Auth behavior changes
Test the MCP server directly before connecting a live agent.
See MCP server testing with Apidog for a practical way to enumerate exposed tools and exercise them.
Apidog also includes an AI agent and A2A debugger so you can inspect the traffic an agent generates.
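Several of the failure modes listed above (renamed tools, changed input schemas) can be caught by diffing tool definitions over time. The sketch below assumes you have already saved a baseline of the server's tool list and fetched the current one. It relies on the `name` and `inputSchema` fields of MCP tool definitions; `diff_tools` itself is a hypothetical helper, and fetching the list is out of scope here.

```python
from typing import Any

def diff_tools(baseline: list[dict[str, Any]], current: list[dict[str, Any]]) -> list[str]:
    """Report changes between two snapshots of an MCP server's tool list."""
    old = {tool["name"]: tool for tool in baseline}
    new = {tool["name"]: tool for tool in current}
    findings: list[str] = []

    for name in sorted(old.keys() - new.keys()):
        findings.append(f"tool removed or renamed: {name}")

    for name in sorted(old.keys() & new.keys()):
        old_schema = old[name].get("inputSchema", {})
        new_schema = new[name].get("inputSchema", {})
        old_props = set(old_schema.get("properties", {}))
        new_props = set(new_schema.get("properties", {}))
        for prop in sorted(old_props - new_props):
            findings.append(f"{name}: input property removed: {prop}")
        old_required = set(old_schema.get("required", []))
        new_required = set(new_schema.get("required", []))
        for prop in sorted(new_required - old_required):
            findings.append(f"{name}: new required input: {prop}")

    return findings
```

This only covers names and input schemas. Output fields, timeouts, pagination, and auth behavior still need tests that actually call each tool.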
3. Agent request behavior
Agents do not call APIs like humans. They may:
- Retry aggressively
- Read partial data
- Call the same endpoint repeatedly
- Mix exploratory and mutating calls
- Recover from errors in surprising ways
Replay realistic traffic against mocks before production.
Useful checks:
- Does the agent retry idempotently?
- Does it re-send mutation requests after a timeout?
- Does it validate required fields before calling tools?
- Does it stop after repeated 4xx errors?
- Does it ask for approval before sensitive actions?
- Does it handle pagination?
- Does it handle partial failures?
Managed Agents hides the loop, so combine its event log with API-level tests. The SDK exposes the loop, so instrument it with hooks and still run the same API contract tests.
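Two of those checks, re-sent mutations and repeated 4xx errors, can be automated over a recorded request log from a mock server. The following is a minimal sketch. The log entry fields (`method`, `path`, `status`, `idempotency_key`, `body_hash`) are assumptions about what your mock records, not a format defined by Apidog or Anthropic.

```python
from typing import Any

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

def audit_requests(log: list[dict[str, Any]], max_consecutive_4xx: int = 3) -> list[str]:
    """Flag risky patterns in an ordered log of requests made by an agent."""
    findings: list[str] = []
    seen_unkeyed: set[tuple] = set()
    consecutive_4xx = 0

    for i, req in enumerate(log):
        # Mutations without an idempotency key are unsafe to retry.
        if req["method"] in MUTATING_METHODS and not req.get("idempotency_key"):
            findings.append(f"request {i}: mutating call without idempotency key")
            signature = (req["method"], req["path"], req.get("body_hash"))
            if signature in seen_unkeyed:
                findings.append(f"request {i}: repeated mutation {req['method']} {req['path']}")
            seen_unkeyed.add(signature)

        # An agent that keeps hitting 4xx responses is not adapting.
        if 400 <= req["status"] < 500:
            consecutive_4xx += 1
            if consecutive_4xx == max_consecutive_4xx + 1:
                findings.append(
                    f"request {i}: still calling after {max_consecutive_4xx} consecutive 4xx responses"
                )
        else:
            consecutive_4xx = 0

    return findings
```

Feed it the traffic captured during a replay run and fail the test when the list is non-empty. The threshold of three consecutive 4xx responses is an arbitrary default to tune for your APIs.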
Either way, download Apidog and put the agent’s dependencies under test before using real customer data.
Decision framework
Check your requirements against the two lists below. Hard constraints such as data residency take priority over convenience.
Choose Claude Managed Agents if:
- The agent runs for minutes or hours.
- The work is asynchronous.
- You do not want to operate a job runner, sandbox, and session store.
- Your team is small and ops capacity is limited.
- A hosted event log is enough for your audit/debugging needs.
- Your data policy allows Anthropic-hosted or AWS-hosted session state.
- You are comfortable with beta status and gated research-preview features.
Choose the Claude Agent SDK if:
- The agent must run inside your VPC.
- Tools need direct access to private databases or internal services.
- Session state must stay on your infrastructure.
- You need custom permissions and audit hooks.
- You need in-process tool logic.
- Regulatory constraints rule out a hosted sandbox.
- You want to use Bedrock, Vertex, or Azure contracts while keeping the loop in-house.
- You are prototyping locally against your filesystem.
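The two lists can be condensed into a small helper for design reviews. This is a deliberately simplified sketch: the `Requirements` fields and the rule that any residency or network constraint forces the SDK are one reading of the criteria above, not an official decision procedure.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    must_run_in_vpc: bool = False
    state_must_stay_internal: bool = False
    hosted_sandbox_ruled_out: bool = False
    long_running_or_async: bool = False
    limited_ops_capacity: bool = False

def recommend_runtime(r: Requirements) -> str:
    # Hard constraints first: any of these rules out a hosted runtime.
    if r.must_run_in_vpc or r.state_must_stay_internal or r.hosted_sandbox_ruled_out:
        return "agent-sdk"
    # Otherwise the hosted runtime pays off for long jobs and small teams.
    if r.long_running_or_async and r.limited_ops_capacity:
        return "managed-agents"
    # No decisive signal: compare cost and observability needs.
    return "either"
```

Hard constraints are evaluated before preferences, so a team that wants hosted convenience but has a residency requirement still gets `"agent-sdk"`.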
Common migration path
A practical path is:
Prototype locally with Agent SDK
-> Mock and contract-test APIs
-> Validate tool behavior
-> Decide whether managed hosting is acceptable
-> Move to Managed Agents if ops savings justify migration
Do not treat this as a config switch. You are moving from a library model to REST + event streams, and custom tool execution works differently.
If you are also comparing agent/model options, see the Claude vs Codex comparison for 2026.
Real-world use cases
A payments refund agent
A fintech team wants an agent that can:
- Read a support ticket.
- Look up a transaction.
- Check refund policy.
- Issue the refund.
- Write a summary back to the ticket.
This touches money, so every action needs a contract and audit trail.
The Agent SDK is the natural fit:
- Run inside the VPC.
- Keep session state internal.
- Use PreToolUse hooks for approval.
- Log every refund attempt.
- Block dangerous or duplicate actions.
Example approval policy:
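# Illustrative shape only: `event` is a simplified stand-in for the hook input.
async def pre_tool_use(event):
    # Only refund calls are subject to this policy.
    if event.tool_name != "issue_refund":
        return
    # Validate required fields before applying the approval threshold.
    if not event.input.get("transaction_id"):
        raise ValueError("transaction_id is required")
    amount = event.input["amount"]
    if amount > 500:
        return {
            "status": "requires_approval",
            "reason": "Refund exceeds threshold"
        }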
Before launch:
- Mock payments and ledger APIs in Apidog.
- Write contract tests for lookup and refund calls.
- Replay historical tickets against mocks.
- Verify retry behavior after timeouts.
- Confirm the agent does not duplicate successful refunds after a 504.
That last case is exactly why API-level testing matters.
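One common defense against the duplicate-refund case is an idempotency key that stays the same across retries. The sketch below uses an in-memory stand-in for the payments service. `RefundLedger`, `refund_key`, and the use of integer cents are assumptions for illustration, and a real payments API will define its own idempotency mechanism.

```python
import hashlib
from typing import Any

def refund_key(ticket_id: str, transaction_id: str, amount_cents: int) -> str:
    """Derive a stable key so a retried refund maps to the same operation."""
    raw = f"{ticket_id}:{transaction_id}:{amount_cents}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

class RefundLedger:
    """In-memory stand-in for a payments service that honors idempotency keys."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict[str, Any]] = {}

    def issue_refund(self, key: str, transaction_id: str, amount_cents: int) -> dict[str, Any]:
        # A repeated key returns the original result instead of refunding again.
        if key in self._by_key:
            return self._by_key[key]
        refund = {
            "refund_id": f"rf_{len(self._by_key) + 1}",
            "transaction_id": transaction_id,
            "amount_cents": amount_cents,
            "status": "accepted",
        }
        self._by_key[key] = refund
        return refund

    @property
    def refund_count(self) -> int:
        return len(self._by_key)
```

In a test, call `issue_refund` twice with the same key to simulate a retry after a lost response, then assert that `refund_count` is still 1.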
An asynchronous support-ticket triage agent
A SaaS company wants an agent to process thousands of tickets per day:
- Classify the ticket.
- Pull related logs.
- Draft a response.
- Resolve or escalate.
Each ticket takes a few minutes of tool calls, and the data is low-sensitivity.
Managed Agents fits well:
- Long-running async work
- Small team
- No worker fleet to operate
- Hosted event log per ticket
- Stateful sessions
The team still tests dependencies:
- Mock the logging API.
- Contract-test the ticketing MCP server.
- Validate schema changes before production.
- Inspect agent-generated request traffic in Apidog.
Managed hosting reduces runtime work. It does not remove responsibility for API correctness.
An internal data-ops agent behind the firewall
A platform team wants an agent that responds to requests like:
Back-fill yesterday’s failed ETL partitions.
The agent needs to:
- Query an internal job API.
- Run remediation scripts.
- Report job status.
- Log all actions.
The internal services are private and sensitive.
The Agent SDK wins by requirement:
- It runs where private services are reachable.
- Session state stays internal.
- Internal services can be exposed through MCP.
- SDK hooks can log commands to the existing audit pipeline.
This is not a preference issue. The hosted sandbox cannot reach private systems unless you expose them, which may violate the security model.
For context on why agents are becoming major API consumers, see AI agents as the new API consumers.
Implementation checklist
Before shipping either option, verify:
- [ ] Runtime choice documented: Managed Agents or SDK
- [ ] Data residency reviewed
- [ ] Tool permissions defined
- [ ] Human approval rules implemented
- [ ] API contracts mocked
- [ ] MCP tools tested directly
- [ ] Retry behavior tested
- [ ] Mutating calls made idempotent
- [ ] Session/event logs available
- [ ] Error handling tested
- [ ] Pricing verified from Anthropic source
- [ ] Beta feature availability checked
- [ ] Incident ownership assigned
Conclusion
The Managed Agents vs Agent SDK decision is mostly about operations and data governance.
Key takeaways:
- Managed Agents hosts the loop and sandbox.
- The Agent SDK runs the loop in your process.
- Managed Agents reduces ops burden but moves session state into hosted infrastructure.
- The SDK gives control but requires you to operate the runtime.
- Data residency often decides the architecture.
- Cost depends on workload shape, not only token pricing.
- API and MCP testing are required either way.
Next step: before wiring an agent to customer-facing systems, put its API and MCP dependencies under test. Download Apidog to mock endpoints, run contract tests, and debug the agent’s real request traffic.
FAQ
What’s the core difference between Claude Managed Agents and the Claude Agent SDK?
Managed Agents is a hosted REST API where Anthropic runs the agent loop and per-session sandbox. The Agent SDK is a Python or TypeScript library that runs the loop inside your own process.
Same Claude models. Different operational ownership.
Is the Claude Agent SDK the same as the old Claude Code SDK?
Yes. The Claude Code SDK was renamed to the Claude Agent SDK to reflect broader agent use cases beyond coding tasks.
Which option is cheaper?
It depends on workload shape.
Managed Agents charges standard Claude token rates plus active session runtime. The SDK has no hosted runtime fee, but you pay for compute, scaling, sandboxing, and operations.
Check Anthropic’s current pricing before budgeting.
Can I use MCP servers with both?
Yes. Both support MCP.
Test MCP servers before connecting them to either runtime. The MCP server testing with Apidog guide shows how to exercise each exposed tool.
How do I keep customer data out of Anthropic’s infrastructure?
Use the Agent SDK and run the loop in your own environment. Tool execution and session state stay on your infrastructure.
With Managed Agents, sandbox and event log state live in Anthropic’s environment or the AWS option, subject to current availability and constraints.
Is Claude Managed Agents production-ready?
Claude Managed Agents launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header. Some capabilities, such as outcomes and multi-agent features, are gated behind research-preview access.
Check the current docs before production use.
How do I test an agent before it touches real APIs?
Mock every API and MCP server the agent calls. Then:
- Write contract tests.
- Replay realistic traffic.
- Inspect actual agent requests.
- Validate retries and idempotency.
- Test error paths.
Apidog supports mocks, contract testing, and AI agent/A2A debugging. See how to test AI agents that call APIs.
Can I start on one option and switch later?
Yes, but it is a migration project.
A common path is to prototype with the Agent SDK locally, then move to Managed Agents if hosted execution is a better production fit. Plan for interface changes, tool execution differences, and session-state migration.