Jangwook Kim

Posted on May 18 • Originally published at jangwook.net

Claude Agent SDK Subagent Orchestration Tutorial — Parallel Multi-Agent Processing in Practice

#claude #anthropicsdk #subagents #multiagent

After I published the Tool Use guide, a comment came in fairly quickly: "I get single agents now, but how do I run a code reviewer, security scanner, and doc writer at the same time?" I was actually mid-experiment at that point myself.

I installed claude-agent-sdk 0.2.82 and found the answer: an AgentDefinition dataclass and the ClaudeAgentOptions.agents dict are all it takes. I created the objects and explored the type structure hands-on. Without an API key I couldn't run actual queries, but the code structure and type system were fully testable.

This post is that exploration.

Where Single Agents Hit a Wall

The Tool Use loop is powerful. But there are three situations where it shows limits.

Context contamination. When a single agent handles code quality, security vulnerabilities, and test coverage in one PR review, the context window fills with intermediate results from all three tasks mixed together. The agent sees its earlier reasoning while forming later judgments — the fact that it spotted a code smell early can subtly shade the security analysis.

No parallelism. Say code review takes 30 seconds, the security scan 20, and doc generation 25. A single agent running them back to back needs 75 seconds. Three concurrent agents finish in about 30, the duration of the slowest task. There's no reason to run independent tasks serially.
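The comparison is just sum versus max. A minimal sketch, using the illustrative durations from this section rather than measurements (the function names are mine, not part of the SDK):

```python
# Illustrative durations in seconds, taken from the paragraph above.
durations = {"code-reviewer": 30, "security-scanner": 20, "doc-writer": 25}

def serial_seconds(task_seconds: dict[str, int]) -> int:
    """One agent runs every task back to back."""
    return sum(task_seconds.values())

def parallel_seconds(task_seconds: dict[str, int]) -> int:
    """Independent agents run together; the slowest one sets the wall time."""
    return max(task_seconds.values())

print(serial_seconds(durations))    # 75
print(parallel_seconds(durations))  # 30
```

This ignores spawn overhead and the orchestrator's own time to compile results, so treat 30 seconds as a lower bound.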

Role bleed. An agent that "thinks like a reviewer then thinks like a security expert" does both jobs worse than dedicated specialists. This is true for human teams too.

The subagent pattern solves these three problems structurally.

Installing claude-agent-sdk 0.2.82 — SDK Structure I Verified Directly

pip install claude-agent-sdk

Version confirmed after install:

Successfully installed claude-agent-sdk-0.2.82

I ran dir(claude_agent_sdk) in a temporary sandbox. The subagent-relevant names that stood out:

import claude_agent_sdk as sdk

sdk.AgentDefinition          # Subagent configuration dataclass
sdk.ClaudeAgentOptions       # Full options including agents dict
sdk.TaskBudget               # Token budget control
sdk.SubagentStartHookInput   # Hook for subagent start events
sdk.SubagentStopHookInput    # Hook for subagent stop events
sdk.list_subagents           # List subagents in a session
sdk.get_subagent_messages    # Retrieve a specific subagent's messages

I read the AgentDefinition source directly with inspect.getsource(). This is the actual dataclass in 0.2.82:

@dataclass
class AgentDefinition:
    description: str          # How the orchestrator identifies this agent
    prompt: str               # Subagent system prompt
    tools: list[str] | None = None
    disallowedTools: list[str] | None = None
    model: str | None = None  # "sonnet", "opus", "haiku", "inherit", or full model ID
    skills: list[str] | None = None
    memory: Literal["user", "project", "local"] | None = None
    mcpServers: list[str | dict[str, Any]] | None = None
    initialPrompt: str | None = None
    maxTurns: int | None = None  # Max loop count for this subagent
    background: bool | None = None
    effort: EffortLevel | int | None = None
    permissionMode: PermissionMode | None = None

One thing I noticed in the tools field comment: "Deprecated: passing 'Skill' here is deprecated; use skills instead." I hadn't seen that in the documentation. The separate skills field is the right place now.

Defining Subagents with AgentDefinition — A PR Review Pipeline

Here's the actual code. A PR auto-review pipeline needs three roles:

import asyncio
import claude_agent_sdk as sdk

# Define each role as a specialized subagent
code_reviewer = sdk.AgentDefinition(
    description="Python code quality and design review specialist",
    prompt=(
        "You're a Python senior engineer with 10 years of experience. "
        "Review code quality, readability, and design patterns. "
        "Provide concrete improvement suggestions in markdown format."
    ),
    tools=["Read", "Grep"],
    model="sonnet",
    maxTurns=8,
)

security_scanner = sdk.AgentDefinition(
    description="Security vulnerability scanner — injection risks, exposed secrets, unsafe operations",
    prompt=(
        "You're a security engineer. "
        "Find SQL injection risks, hardcoded secrets, unsafe eval/exec, "
        "and permission issues. Report each with severity level."
    ),
    tools=["Read", "Grep", "Bash"],
    model="sonnet",
    maxTurns=6,
)

doc_writer = sdk.AgentDefinition(
    description="Docstring and README writer — reads code and generates clear documentation",
    prompt=(
        "You're a technical writer. "
        "Write Google Style docstrings for functions and classes, "
        "and create usage examples for the README."
    ),
    tools=["Read", "Write", "Edit"],
    model="haiku",   # haiku is sufficient for docs and cheaper
    maxTurns=5,
)

# Orchestrator options
opts = sdk.ClaudeAgentOptions(
    system_prompt=(
        "You're a PR review orchestrator. "
        "Call code-reviewer, security-scanner, and doc-writer in parallel "
        "and compile a comprehensive review report from all results."
    ),
    allowed_tools=["Agent", "Read"],  # Agent tool is how subagents are invoked
    agents={
        "code-reviewer": code_reviewer,
        "security-scanner": security_scanner,
        "doc-writer": doc_writer,
    },
    permission_mode="bypassPermissions",
)

The dict keys in ClaudeAgentOptions.agents are the names the orchestrator uses when calling subagents. When the system prompt says "call code-reviewer," Claude invokes that agent via the Agent tool.
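Because the keys are the only names the orchestrator can call, a typo in the system prompt leaves a registered agent unreachable by name. A small pre-flight check can catch that before any tokens are spent. This is a sketch of my own: check_agent_names is a hypothetical helper, not part of the SDK, and it uses plain substring matching.

```python
def check_agent_names(system_prompt: str, agents: dict[str, object]) -> list[str]:
    """Return registered agent names that the prompt never mentions.

    Substring matching is deliberately naive: "code-reviewer" would also
    match inside "code-reviewer-v2".
    """
    return [name for name in agents if name not in system_prompt]

# Stand-in values; in the pipeline above these would be AgentDefinition objects.
registered = {"code-reviewer": None, "security-scanner": None, "doc-writer": None}

prompt_with_typo = "Call code-reviewer, security-scaner, and doc-writer in parallel."
print(check_agent_names(prompt_with_typo, registered))  # ['security-scanner']
```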

Parallel Execution Pattern — Running Three Agents Simultaneously

The most important line in the SDK documentation:

"Multiple subagents can run concurrently. When Claude identifies independent subtasks, it spawns multiple agents simultaneously using multiple Task tool calls in a single message."

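When the orchestrator issues several subagent calls in a single message, they run in parallel. (The quote names the tool Task; the options above allow it under the name Agent.) You don't write asyncio.gather() yourself. Tell the orchestrator to "call these agents in parallel" and the SDK runs whichever calls Claude emits together. Because Claude decides how to group the calls, the instruction makes parallel execution likely, not guaranteed.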

Actual query flow:

async def review_pr(pr_diff: str):
    results = []

    async for message in sdk.query(
        prompt=(
            f"Review this PR diff:\n\n{pr_diff}\n\n"
            "Run code-reviewer, security-scanner, and doc-writer simultaneously "
            "to analyze each domain in parallel, then compile a unified review report."
        ),
        options=opts,
    ):
        if isinstance(message, sdk.AssistantMessage):
            for block in message.content:
                if hasattr(block, "text"):
                    results.append(block.text)
        elif isinstance(message, sdk.ResultMessage):
            # total_cost_usd can be None, so guard before formatting
            if message.total_cost_usd is not None:
                print(f"Total cost: ${message.total_cost_usd:.4f}")
            print(f"Duration: {message.duration_ms}ms")
            break

    return "\n".join(results)

Each subagent's context window starts fresh. From the official docs:

"A subagent's context window starts fresh, and the only channel from parent to subagent is the Agent tool's prompt string."

The orchestrator only sees the final result, not the intermediate reasoning. That's what prevents context contamination.
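To make the isolation concrete, here is a toy model of the fan-out and fan-in in plain asyncio. It does not call the SDK. run_subagent and orchestrate are hypothetical stand-ins that only mimic the shape described above: a prompt string goes in, a final result comes out, and intermediate work never leaves the subagent.

```python
import asyncio

async def run_subagent(name: str, prompt: str) -> str:
    """Stand-in for a subagent: sees only `prompt`, returns only a final result."""
    scratchpad = [f"{name} step {i}" for i in range(3)]  # intermediate work stays local
    await asyncio.sleep(0)  # yield control, as a real call would while waiting on I/O
    return f"{name}: reviewed {len(prompt)} chars in {len(scratchpad)} steps"

async def orchestrate(diff: str) -> list[str]:
    names = ["code-reviewer", "security-scanner", "doc-writer"]
    # Fan out: each subagent gets its own prompt string and nothing else.
    # Fan in: gather returns the final results only, in the order of `names`.
    return await asyncio.gather(
        *(run_subagent(name, f"Review:\n{diff}") for name in names)
    )

results = asyncio.run(orchestrate("def f(): pass"))
for line in results:
    print(line)
```

The orchestrator receives three strings and never sees any scratchpad, which is the property that keeps one specialist's reasoning from shading another's.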

Controlling Costs with TaskBudget

Running three subagents concurrently doesn't just triple costs — it can amplify them unpredictably. Each agent might make redundant tool calls trying to do thorough work.

TaskBudget is the API-level fix:

opts = sdk.ClaudeAgentOptions(
    # ... same as above ...
    task_budget=sdk.TaskBudget(total=50000),  # 50K token budget
)

Actual class structure from inspect.getsource(sdk.TaskBudget):

class TaskBudget(TypedDict):
    """API-side task budget in tokens.

    When set, the model is made aware of its remaining token budget so it can
    pace tool use and wrap up before the limit. Sent as
    output_config.task_budget with the task-budgets-2026-03-13 beta header.
    """

    total: int

The task-budgets-2026-03-13 beta header is attached automatically. The agent becomes aware of its remaining budget and decides internally when to pace down and wrap up. Much cleaner than an external timeout that forces mid-task termination.

Combine with AgentDefinition.maxTurns for a two-tier safety net:

security_scanner = sdk.AgentDefinition(
    # ...
    maxTurns=6,  # Subagent level: at most 6 turns
)

opts = sdk.ClaudeAgentOptions(
    # ...
    task_budget=sdk.TaskBudget(total=100000),  # Global level: 100K token ceiling
)
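How the two limits interact is easier to see in a toy loop. The sketch below is not how the SDK enforces anything. Per the docstring above, TaskBudget makes the model aware of its remaining budget rather than cutting it off. run_with_limits is a hypothetical simulation that treats both limits as hard stops, just to show which one triggers first.

```python
def run_with_limits(
    turn_costs: list[int], max_turns: int, token_budget: int
) -> tuple[int, int, str]:
    """Toy loop: stop at whichever limit is reached first.

    Returns (turns_taken, tokens_used, stop_reason).
    """
    tokens_used = 0
    for turn, cost in enumerate(turn_costs):
        if turn >= max_turns:
            return turn, tokens_used, "max_turns"
        if tokens_used + cost > token_budget:
            return turn, tokens_used, "budget"
        tokens_used += cost
    return len(turn_costs), tokens_used, "finished"

# Cheap turns: the turn cap triggers first.
print(run_with_limits([10_000] * 10, max_turns=6, token_budget=100_000))
# Expensive turns: the token budget triggers first.
print(run_with_limits([30_000] * 10, max_turns=6, token_budget=100_000))
```

Cheap, chatty agents hit the turn cap first, while agents that read large files hit the token ceiling first. That is the case for setting both.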

Subagent Hooks — Tracking Start and Stop Events

SubagentStartHookInput and SubagentStopHookInput let you detect exactly when each subagent starts and finishes:

import time

agent_timings: dict[str, float] = {}

# Hook callbacks are async and receive (input, tool_use_id, context).
# The hook input is a TypedDict, so fields are read by key, not attribute.
async def on_agent_start(hook_input: sdk.SubagentStartHookInput, tool_use_id, context) -> dict:
    agent_timings[hook_input["agent_id"]] = time.time()
    print(f"▶ {hook_input['agent_type']} started (id: {hook_input['agent_id'][:8]})")
    return {}

async def on_agent_stop(hook_input: sdk.SubagentStopHookInput, tool_use_id, context) -> dict:
    start = agent_timings.get(hook_input["agent_id"], time.time())
    elapsed = time.time() - start
    print(f"■ {hook_input['agent_type']} done ({elapsed:.1f}s)")
    # hook_input["agent_transcript_path"] has the full subagent conversation log
    return {}

opts = sdk.ClaudeAgentOptions(
    # ...
    hooks={
        "SubagentStart": [sdk.HookMatcher(hooks=[on_agent_start])],
        "SubagentStop": [sdk.HookMatcher(hooks=[on_agent_stop])],
    },
)

agent_transcript_path from SubagentStopHookInput is invaluable for production debugging. If a subagent produces unexpected output, that's where you look first.

Also worth knowing: multiple hook matchers on the same event run concurrently, not sequentially. The docs explicitly state this. Design each hook to be independent.
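Start and stop timestamps collected by hooks like these are enough to check whether the subagents actually overlapped. The helpers below are my own sketch, not SDK functions. They assume you have stored one (start, stop) pair per agent, and the sample numbers are made up for illustration.

```python
def overlapped(intervals: dict[str, tuple[float, float]]) -> bool:
    """True if at least two agents were running at the same moment."""
    latest_end = float("-inf")
    for start, end in sorted(intervals.values()):  # sorted by start time
        if start < latest_end:
            return True
        latest_end = max(latest_end, end)
    return False

def wall_time(intervals: dict[str, tuple[float, float]]) -> float:
    """Seconds from the first start to the last stop."""
    starts = [start for start, _ in intervals.values()]
    ends = [end for _, end in intervals.values()]
    return max(ends) - min(starts)

# Hypothetical timings, in seconds since the run began.
parallel_run = {"code-reviewer": (0.0, 30.0), "security-scanner": (0.5, 20.5), "doc-writer": (1.0, 26.0)}
serial_run = {"code-reviewer": (0.0, 30.0), "security-scanner": (30.0, 50.0), "doc-writer": (50.0, 75.0)}

print(overlapped(parallel_run), wall_time(parallel_run))
print(overlapped(serial_run), wall_time(serial_run))
```

If overlapped returns False on a real run, the orchestrator called the agents one at a time despite the prompt.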

When to Use Subagents (and When Not To)

I want to be direct here: subagents aren't always the right choice.

Use them when:

You have 3+ independent tasks, each taking 10+ seconds
Different tasks need different tool access (security scanner doesn't need Write)
You've verified that context contamination actually hurts result quality

They're overkill when:

You have 2 tasks where task B depends on task A's output
Total runtime would be under 5 seconds (spawn overhead exceeds benefit)
It's a simple question-answer pattern
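The measurable parts of these two lists can be written down as a rule of thumb. This is a sketch under my own naming: Task and subagents_worthwhile are hypothetical. The thresholds are the ones stated above, and the tool-access and context-contamination criteria are left out because they need human judgment.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    est_seconds: float
    depends_on: tuple[str, ...] = ()  # names of tasks whose output this one needs

def subagents_worthwhile(
    tasks: list[Task], min_tasks: int = 3, min_seconds: float = 10.0
) -> bool:
    """True when there are enough independent tasks and each is long enough."""
    all_independent = all(not task.depends_on for task in tasks)
    long_enough = all(task.est_seconds >= min_seconds for task in tasks)
    return len(tasks) >= min_tasks and all_independent and long_enough

pr_review = [Task("code-review", 30), Task("security-scan", 20), Task("docs", 25)]
chained = [Task("extract", 15), Task("summarize", 15, depends_on=("extract",))]

print(subagents_worthwhile(pr_review))  # True
print(subagents_worthwhile(chained))    # False
```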

The same tradeoff comes up in the A2A + MCP hybrid architecture post: multi-agent structure adds complexity. More failure points, harder debugging, less predictable costs. Don't add subagents to a problem that a single agent can handle.

My personal threshold: "three or more independent tasks, each likely to consume 10K+ tokens with Opus." Below that, I stick with a single agent.

What I Couldn't Test

No API key meant I couldn't capture actual execution logs showing the three agents running in parallel. The object construction and type validation worked, but "what does the console output look like when three agents actually spawn concurrently" — I can't show you that from this run.

The fork_session function also caught my attention but didn't fit in this post. fork_session(session_id, up_to_message_id) lets you branch a session from a specific point. Useful when subagents want to try different strategies from the same base context without repeating earlier work.

Summary

The core of subagent orchestration in claude-agent-sdk 0.2.82:

AgentDefinition: Separate role, prompt, tools, and model per subagent
ClaudeAgentOptions.agents: Register subagent names for the orchestrator to call
Agent tool + parallel prompt instruction: Orchestrator spawns multiple subagents at once

Add TaskBudget and SubagentStartHookInput/SubagentStopHookInput for cost control and execution tracking.

Start with a single agent. Move to subagents when your task fits "independent, parallelizable, three or more."
