Shipped 3 production agents in one weekend using the Claude Agent SDK, each under 50 lines of glue code.

- Blog syndication agent cross-posts to Dev.to and Hashnode in a single call, handled by built-in tool retries.
- Shopify inventory watcher runs on cron, wires an MCP server natively, and uses hooks for rate limiting.
- Release note writer spawns a categorizer subagent so git log diffs never bloat the main context window.
I spent one weekend building with the Claude Agent SDK and shipped three agents I actually use. Not demos. Real agents, running in production, replacing scripts I was tired of maintaining. The Claude Agent SDK turns out to be the piece I kept wishing I had every time I wrote a one-off automation with the raw Anthropic SDK. The loop, the tool calling, the subagents, the session state, the MCP plumbing. All of it pre-wired, all of it matching the harness Claude Code itself runs on.
This is a field report. What the SDK made easy, where I had to drop down to lower primitives, and what still hurts. If you are trying to decide whether this SDK deserves a weekend of your time, skip to the three agents below and read the code. That is the best pitch I can give you.
## What the Agent SDK Actually Is
The naming space around Anthropic's stack is confusing, so I want to be precise.
The raw Anthropic SDK (anthropic on pip, @anthropic-ai/sdk on npm) is a client for the API. You call messages.create, you get a response, you handle tool calls yourself, you write your own agent loop if you want one. It is a great low-level library and a terrible agent framework.
Claude Code plugins are extensions that run inside the Claude Code CLI. They ship as skills, commands, hooks, and MCP servers. You install them, they run in Claude Code, that is it.
Claude Managed Agents are Anthropic's hosted runtime. You upload an agent, they run it for you. Useful for production, less useful when you want the agent embedded inside your own app.
The Claude Agent SDK is the middle layer. It is the same harness Claude Code runs on, repackaged as a library you embed. You get the agentic loop, a tool registry, subagent spawning, the hook system (PreToolUse, PostToolUse, UserPromptSubmit), session state, MCP server wiring, and permission policy. You write a small amount of glue and ship.
Here is the difference in code. Raw SDK, bare minimum tool-calling loop:
```python
from anthropic import Anthropic

client = Anthropic()
messages = [{"role": "user", "content": "syndicate my latest post"}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        tools=my_tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        break
    # Parse the tool_use blocks, run the tools yourself, append the
    # tool_result blocks, loop again. run_tool is your own dispatcher.
    messages.append({"role": "assistant", "content": resp.content})
    results = [
        {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b)}
        for b in resp.content
        if b.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```
Same job with the Agent SDK:
```python
from claude_agent_sdk import query

async for event in query(
    prompt="syndicate my latest post",
    options={"tools": my_tools, "system_prompt": SYNDICATION_PROMPT},
):
    print(event)
```
The loop is gone. Tool calls, retries, termination, event streaming, all handled. If you already use Claude Code, you know this harness. You are 80% of the way there before you write a single line.
A few things worth knowing before you start:

- The SDK streams structured events, not raw text. You get `tool_use`, `tool_result`, `assistant_message`, and lifecycle events you can route wherever you want.
- Permissions are policy-driven. You declare which tools the agent can call, and the harness enforces it. No accidental shell access.
- Session state persists across calls if you pass a session ID. Useful for long-running agents that pick up where they left off.
- Prompt caching is on by default when you use the SDK's system prompt helpers. You do not have to remember to flip it on.
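That event stream is worth planning around before your first agent, because everything you build (logging, budgets, alerts) hangs off it. Here is a minimal sketch of fanning events out by type; the dict-shaped events are an assumption for illustration, since the SDK's real event objects are richer typed messages:

```python
# Route each structured event to a handler registered for its type.
# Dict-shaped events are an illustrative assumption, not the SDK's types.
def route_event(event: dict, sinks: dict) -> None:
    """Dispatch one event to the sink registered for its type."""
    handler = sinks.get(event.get("type"), sinks["default"])
    handler(event)

tool_events, other_events = [], []
sinks = {
    "tool_use": tool_events.append,
    "tool_result": tool_events.append,
    "default": other_events.append,
}

for ev in [
    {"type": "tool_use", "name": "post_to_devto"},
    {"type": "tool_result", "tool": "post_to_devto", "ok": True},
    {"type": "assistant_message", "text": "Posted to both platforms."},
]:
    route_event(ev, sinks)
```

The same dispatch table works whether the events go to stdout, a log file, or a metrics pipe, which is why I keep it separate from the agent definition itself.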
That is the primer. Now the three agents.
## Agent 1: Blog Syndication Agent
I publish on raxxo.shop. I want the same article on Dev.to and Hashnode within minutes of publish. I had a Node script doing this. It broke every few weeks when an API shifted. I wanted something that could read docs, adapt, and report back with human-readable output.
The whole agent is 47 lines of Python. The core:
```python
from claude_agent_sdk import query, tool

@tool
async def post_to_devto(title: str, body_markdown: str, canonical_url: str):
    """Cross-post to Dev.to with a canonical link back to raxxo.shop."""
    return await devto_client.create_article(
        title=title, body_markdown=body_markdown, canonical_url=canonical_url
    )

@tool
async def post_to_hashnode(title: str, body_md: str, canonical_url: str):
    """Cross-post to Hashnode."""
    return await hashnode_client.publish(title, body_md, canonical_url)

async for event in query(
    prompt=f"Syndicate this article to Dev.to and Hashnode: {url}",
    options={"tools": [post_to_devto, post_to_hashnode, fetch_article]},
):
    log(event)
```
What the SDK handled for free:

- Tool retries when a platform returned 502
- Loop termination once both posts returned IDs
- Event logging I can tail in real time
- Sane defaults for `max_tokens` and turn limits
What I still had to build:

- Platform clients (these are not the SDK's job)
- A custom token counter, because I want per-invocation budget tracking in my own Postgres table
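The token counter is small enough to sketch. This is a simplified version of the idea: accumulate the usage numbers the API reports, price them, and hand the total to whatever persistence you use (the Postgres write is omitted here, and the per-million-token prices are illustrative placeholders, not real Anthropic pricing):

```python
class TokenBudget:
    """Accumulate per-invocation token usage and estimate cost.

    Prices are illustrative placeholders (EUR per 1M tokens), not
    actual Anthropic rates -- swap in the current pricing."""

    def __init__(self, in_price: float = 3.0, out_price: float = 15.0):
        self.input_tokens = 0
        self.output_tokens = 0
        self.in_price = in_price
        self.out_price = out_price

    def record(self, usage: dict) -> None:
        """Add one API response's usage block to the running totals."""
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    def cost_eur(self) -> float:
        """Estimated spend for this invocation so far."""
        return (
            self.input_tokens * self.in_price
            + self.output_tokens * self.out_price
        ) / 1_000_000
```

One `record()` call per usage-bearing event, one `cost_eur()` read at the end of the run, one INSERT into the tracking table.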
The agent also does something the script never did well. When Dev.to rejects a tag ("you can only use five"), the model reads the error, trims the tag list, and retries. When Hashnode asks for a publication ID I forgot to pass, the model inspects the error and calls a list_publications tool I gave it. That kind of recovery is the whole reason I wanted an agent here instead of a script.
Cost with prompt caching on: 0.17 EUR per syndication. The script it replaced cost me about 30 minutes of debugging every two weeks. That math works. For the cost of a single cappuccino I get a full month of syndication runs.
## Agent 2: Shopify Inventory Watcher
This one runs on a cron every four hours. It reads inventory levels for every SKU on my Shopify store, flags anything under a threshold, and posts a summary to a Slack webhook. Boring. Exactly the kind of thing an agent should do.
The interesting part is the MCP server. Shopify has a decent Admin API, but wrapping it as tools one by one was tedious. I wrote a small Shopify MCP server that exposes list_products, get_inventory_levels, get_low_stock. The Agent SDK wires MCP servers in natively:
```python
options = {
    "mcp_servers": {
        "shopify": {
            "command": "node",
            "args": ["./mcp-shopify/dist/index.js"],
            "env": {"SHOPIFY_ADMIN_TOKEN": os.environ["SHOPIFY_ADMIN_TOKEN"]},
        }
    },
    "allowed_tools": ["mcp__shopify__list_products", "mcp__shopify__get_low_stock"],
}
```
No adapter code. No re-wrapping. The MCP server ships tools, the agent sees them, the loop handles the calls. This was the moment MCP clicked for me. It is not an afterthought in the Agent SDK. It is a first-class input.
The other piece I use here is hooks. Shopify rate-limits hard. I do not want an agent hammering the API. A PreToolUse hook checks a Redis counter and sleeps if the bucket is empty:
```python
async def rate_limit_hook(tool_name: str, tool_input: dict):
    if tool_name.startswith("mcp__shopify__"):
        wait = await redis.get_wait_ms("shopify")
        if wait > 0:
            await asyncio.sleep(wait / 1000)
    return {"decision": "approve"}

options["hooks"] = {"PreToolUse": [rate_limit_hook]}
```
This is the feature I think people underuse the most. Hooks let you enforce policy without polluting the prompt. Rate limits, audit logs, permission gates, budget caps. All orthogonal to what the agent is doing. The agent does not need to know a rate limiter exists. The hook just slows it down when it has to.
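A budget cap follows the same pattern as the rate limiter. Here is a sketch under stated assumptions: the `{"decision": ...}` return shape mirrors the rate-limit hook above, the cap value is hypothetical, and `spent_today` is passed in explicitly to keep the sketch self-contained, where a real hook would read it from wherever you track spend:

```python
import asyncio

DAILY_CAP_EUR = 5.00  # hypothetical cap, chosen for illustration

async def budget_cap_hook(tool_name: str, tool_input: dict, *, spent_today: float) -> dict:
    """Deny every tool call once the day's spend has crossed the cap."""
    if spent_today >= DAILY_CAP_EUR:
        return {"decision": "deny", "reason": f"daily budget of {DAILY_CAP_EUR} EUR spent"}
    return {"decision": "approve"}
```

The agent never sees the cap; it just finds that its tool calls stop being approved, which is exactly the separation the hook system is for.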
The Slack summary the agent writes is the second nice surprise. I did not prompt it to format anything specific. It just writes three or four lines per run, sorted by urgency, with the SKU names humans actually recognize instead of the cryptic IDs Shopify stores internally. Small thing, but it means I read the message instead of ignoring it.
Runtime per invocation: about 0.29 EUR. Twelve hours of runtime per week. I stopped losing sales to stockouts in week two. The Node cron job this replaced was 240 lines. The agent is 62.
## Agent 3: Release Note Writer
The third agent reads git log since the last tag, summarizes it into a changelog, and opens a PR against the CHANGELOG.md file. The trick here is that a week of commits on an active repo can be 40 kilobytes of diff. If I pass that into the main context, the summary prompt gets buried and the output goes sideways.
The fix is subagents. The main agent has one job: orchestrate. When it gets a commit list, it spawns a categorizer subagent with its own context window, hands it the raw diff, and asks for structured output:
```python
options = {
    "agents": {
        "categorizer": {
            "description": "Classify a single commit into fix/feat/docs/refactor/perf/test/chore with a one-line summary.",
            "prompt": "You read one git commit diff and return a JSON object: {type, scope, summary}. Nothing else.",
            "model": "claude-haiku-4-5",
        }
    }
}
```
The main agent spawns categorizer once per commit batch. Haiku runs each call in about 900ms. The main context only ever sees the structured output, not the raw diffs. My summary stays tight, the PR body stays under 300 lines, and the token cost drops by about 60% compared to my first version where everything lived in one context.
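The merge step on the orchestrator's side is plain code: once the categorizer returns its `{type, scope, summary}` objects, grouping them into changelog sections takes a few lines. This is a sketch of that step; the section titles and ordering are my own convention, not something the SDK prescribes:

```python
from collections import defaultdict

# Section order and titles are an illustrative convention.
SECTION_TITLES = {
    "feat": "Features", "fix": "Fixes", "perf": "Performance",
    "docs": "Docs", "refactor": "Refactoring", "test": "Tests", "chore": "Chores",
}

def render_changelog(entries: list[dict]) -> str:
    """Group categorizer output into markdown changelog sections."""
    sections = defaultdict(list)
    for e in entries:
        sections[e["type"]].append(f"- **{e['scope']}**: {e['summary']}")
    out = []
    for key, title in SECTION_TITLES.items():
        if sections[key]:
            out.append(f"### {title}")
            out.extend(sections[key])
    return "\n".join(out)
```

Because the subagent's contract is strict JSON, this function never has to parse prose, which is what keeps the PR body short and deterministic.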
Subagents are the right tool when the alternative is context bloat. I had been avoiding them because the mental model felt like overhead. It is not. It is one config block and one spawn call. That is the whole API.
Where I had to drop down: I wanted custom retry semantics on the GitHub API call (it returns 422 on a stale base ref and needs a rebase before retry). The SDK's built-in retries are generic. I added a PostToolUse hook that catches the 422, runs a rebase, and retries. Fine. Five lines.
What still hurts:

- Local file I/O permission policy is verbose. You end up writing allowlists by path prefix and it feels like 2014 seccomp rules.
- Docs are still catching up to the API surface. I read the Claude Code source a few times to confirm behavior.
- Error messages from tool validation are blunt. A bad tool schema gives you a stack trace, not a hint.
One more detail. The subagent runs on Haiku, the main agent runs on Sonnet. Mixing models is a config flag, not a rewrite. This is the thing I would have needed a weekend to build from scratch with the raw SDK. In the Agent SDK it is a single key in the agent definition.
Cost per release note run: 0.20 EUR. Runs about six times a week across three repos. The changelogs are better than the ones I used to write by hand at 11pm on a Friday. That was the bar I set, and it cleared.
## Bottom Line
Three agents, one weekend, under 150 total lines of glue code. Every one of them replaced something I was actively maintaining by hand. Every one of them costs under 0.30 EUR per invocation with prompt caching on. Combined, they now do about 90 hours of work per month for something like 7 EUR in API spend.
The Claude Agent SDK is what I wanted the first time I wrote a tool-calling loop against the raw Anthropic SDK. If you already know Claude Code, the mental model carries over directly. Hooks, subagents, and MCP are the three features worth learning first. They are what make this feel like a real agent framework instead of a thin wrapper over messages.create.
If you are still rolling your own loop, stop. Spend a Saturday with the SDK, build one real agent, and see if anything still justifies the lower-level code. For me, almost nothing did. The only places I reached for the raw Anthropic SDK after that weekend were a token counter for per-invocation budget tracking and one odd retry case. Everything else the harness does better than I would have.