Context Explosion: The Problem You'll Eventually Hit
Ask an Agent to do something complex — say, "research our competitors and write up a report."
The Agent gets to work: searching web pages, reading files, parsing logs. After a dozen tool calls, the main conversation's context is packed with search result fragments, file contents, intermediate reasoning traces. Most of that information is just "work residue" — the final report doesn't need any of it. But it's now permanently occupying the context window.
The consequences are twofold: degraded reasoning quality (the model has to find signal in a sea of noise) and exploding token costs (every subsequent call has to carry all that history).
This is context explosion. As Agents tackle increasingly complex tasks, this isn't an edge case — it's inevitable.
At its core, SubAgents solve this problem: delegate "work the main Agent doesn't need to remember" to an isolated context, and receive back only a summary.
This is the same design principle as separation of concerns in software engineering — not because the sub-task is unimportant, but because its implementation details shouldn't pollute the state of the caller above it.
The SubAgent Core Model: Three Elements
A SubAgent isn't a different model. It's the same model running in a separately scoped execution instance. Understanding it means grasping three core elements.
Element 1: Isolated Context Window
This is the most fundamental characteristic of a SubAgent.
Every SubAgent runs in its own independent context window. It has no awareness of what the parent Agent has been discussing or doing — it only has:
- Its own System Prompt (defining who it is and what it can do)
- The description of the task it's been delegated
- Its own tool call history
The parent Agent's entire conversation history is completely invisible to the SubAgent. When the SubAgent finishes, it packages its result and returns it to the parent — who sees only the conclusion, not every step the SubAgent took.
Main conversation context SubAgent context
┌─────────────────────┐ ┌─────────────────────┐
│ User message 1 │ │ System Prompt │
│ Agent reply 1 │ delegate task │ (SubAgent role def.) │
│ Tool call A │ ─────────────→ │ Task description │
│ User message 2 │ │ Tool call X │
│ Agent reply 2 │ ←────────────── │ Tool call Y │
│ [SubAgent summary] │ return summary │ Tool call Z │
└─────────────────────┘ └─────────────────────┘
The main conversation never sees how many tool calls the SubAgent made or how much content it searched through. That execution trace stays in the SubAgent's isolated context and disappears when the task ends.
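The mechanics above can be sketched in a few lines of Python; `run_model` and `run_subagent` are hypothetical stand-ins for illustration, not Claude Code internals:

```python
def run_model(messages):
    # Hypothetical stand-in for a real LLM call; here it just
    # reports how many messages were visible to it.
    return f"summary (saw {len(messages)} messages)"

def run_subagent(system_prompt, task):
    # The SubAgent starts from a fresh message list: it never
    # sees the parent's conversation history.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]
    return run_model(messages)

# A long parent history that the SubAgent never receives.
parent_history = [{"role": "user", "content": "..."}] * 40
summary = run_subagent("You are a researcher.", "Survey competitor pricing.")
# Only the one-message summary enters the parent context,
# not the SubAgent's execution trace.
parent_history.append({"role": "assistant", "content": summary})
```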
Element 2: Capability Constraints
SubAgents can be precisely limited in what they're allowed to do, across two dimensions:
Tool Access
# Read-only SubAgent: can read, can't write
tools: Read, Grep, Glob, Bash
# Exclusion SubAgent: inherits all tools except file writes
disallowedTools: Write, Edit
# Full-access SubAgent: inherits all parent tools
# (omit the tools field — inherits by default)
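Resolving the effective toolset from such a config might look like this (the field names mirror the snippet above; the resolution logic itself is an illustrative assumption, not Claude Code's implementation):

```python
PARENT_TOOLS = {"Read", "Grep", "Glob", "Bash", "Write", "Edit"}

def resolve_tools(config):
    # An explicit allowlist wins; otherwise inherit the parent's
    # tools and subtract any exclusions.
    if "tools" in config:
        return set(config["tools"])
    return PARENT_TOOLS - set(config.get("disallowedTools", []))

read_only = resolve_tools({"tools": ["Read", "Grep", "Glob", "Bash"]})
no_writes = resolve_tools({"disallowedTools": ["Write", "Edit"]})
full = resolve_tools({})  # omitted field: inherits everything
```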
Model Selection
SubAgents can use a different model from the parent. This is the key lever for cost control:
model: haiku # fast, low cost — good for simple exploration
model: sonnet # balanced — good for medium-complexity tasks
model: opus # high capability — good for deep reasoning
model: inherit # use the same model as the parent (default)
Element 3: Description-Driven Delegation
How does the parent Agent know when to call which SubAgent?
The answer is the description field. It serves as the parent's dispatch index: the parent matches the semantics of the current task against the descriptions of all registered SubAgents and delegates to the best fit.
---
name: code-reviewer
description: >
Specialized for code review. Trigger when checking code quality,
security vulnerabilities, performance issues, or evaluating whether
code follows best practices. Not for writing or modifying code.
---
The more precise the description, the more accurately delegation gets triggered. Too broad and it fires in scenarios where it shouldn't; too narrow and legitimate delegation opportunities get missed.
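In a real system the model itself performs the semantic matching, but the idea can be approximated with a crude lexical-overlap scorer; `pick_subagent` below is only a toy stand-in:

```python
SUBAGENTS = {
    "code-reviewer": "code review quality security vulnerabilities performance best practices",
    "researcher": "research web search documentation competitors market",
}

def pick_subagent(task):
    # Score each description by how many words it shares with the
    # task; a zero score means no delegation is triggered.
    task_words = set(task.lower().split())
    scores = {
        name: len(task_words & set(desc.split()))
        for name, desc in SUBAGENTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

chosen = pick_subagent("review this code for security vulnerabilities")
```

The too-broad/too-narrow trade-off shows up directly here: a description packed with generic words fires on unrelated tasks, while a terse one never accumulates enough overlap to win.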
The Internal Flow of Delegation
With the three elements in place, here's the complete delegation flow:
Parent Agent receives a task
↓
Decision: is this worth delegating to a SubAgent?
(matches against all registered SubAgent descriptions)
↓
Matching SubAgent found
↓
Create isolated context, inject system prompt + task description
↓
SubAgent executes autonomously (tool calls, reasoning, iteration)
↓
SubAgent returns result summary
↓
Parent Agent receives summary, continues main flow
One critical constraint to remember: SubAgents cannot spawn other SubAgents.
This is an intentional design decision to prevent infinite nesting from spiraling out of control. If SubAgents could delegate to sub-SubAgents, a complex task could trigger cascading delegations, making context explosion even harder to track rather than solving it.
Claude Code's built-in SubAgents embody this constraint: the Plan SubAgent gathers codebase context during Plan mode, but doesn't spawn further SubAgents for deeper exploration.
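One way such a no-nesting rule could be enforced is a simple flag on the invocation path (a sketch of the constraint, not Claude Code's actual mechanism):

```python
def invoke_subagent(name, task, *, is_subagent=False):
    # Single-level delegation: if the caller is itself a SubAgent,
    # refuse to spawn a sub-SubAgent.
    if is_subagent:
        raise PermissionError("SubAgents cannot spawn other SubAgents")
    # ...create the isolated context and run; here we just echo a summary.
    return f"{name}: done with {task!r}"

summary = invoke_subagent("explorer", "map the repo layout")
try:
    invoke_subagent("explorer", "go deeper", is_subagent=True)
    nested_allowed = True
except PermissionError:
    nested_allowed = False
```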
Three Execution Modes
When multiple SubAgents collaborate in a system, their execution relationships follow three basic patterns.
Sequential Mode
The output of one SubAgent becomes the input to the next, passed through shared Session State.
SubAgent A               SubAgent B               SubAgent C
(data collection)   →    (data cleaning)     →    (report generation)
        ↓                        ↓                         ↓
writes state['raw']      reads state['raw'],      reads state['clean'],
                         writes state['clean']    generates final report
When to use: pipeline tasks with strict ordering dependencies. Each step needs the previous step's output; the sequence can't be shuffled.
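A minimal sketch of the sequential pattern, with plain Python functions standing in for the three SubAgents and a dict as the shared Session State:

```python
def collect(state):
    # SubAgent A: writes raw data into shared state.
    state["raw"] = ["3", "1", "2", "1"]

def clean(state):
    # SubAgent B: reads the previous step's output, writes its own.
    state["clean"] = sorted(set(state["raw"]))

def report(state):
    # SubAgent C: consumes the cleaned data, produces the final report.
    state["report"] = "items: " + ", ".join(state["clean"])

state = {}
# Strict ordering: each step depends on the key the previous one wrote.
for step in (collect, clean, report):
    step(state)
```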
Parallel Mode
Multiple SubAgents execute independent tasks simultaneously, each writing to different state keys, with a final aggregator merging the results.
┌─ SubAgent A (analyze competitor 1) → state['product_a'] ─┐
Main Agent ───┤─ SubAgent B (analyze competitor 2) → state['product_b'] ─┤─→ Aggregator
└─ SubAgent C (analyze competitor 3) → state['product_c'] ─┘
When to use: multiple independent sub-tasks that can run simultaneously and need to be combined at the end. Parallel mode can dramatically reduce end-to-end execution time.
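The fan-out/fan-in shape above can be sketched with a thread pool, again with a stub function standing in for each SubAgent:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(competitor):
    # Stand-in for one independent SubAgent run.
    return f"notes on {competitor}"

competitors = ["product_a", "product_b", "product_c"]

# Fan out: each SubAgent writes its own key, so no coordination is needed.
with ThreadPoolExecutor() as pool:
    results = dict(zip(competitors, pool.map(analyze, competitors)))

# Fan in: the aggregator merges the independently produced results.
merged = " | ".join(results[c] for c in competitors)
```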
Loop Mode
A SubAgent executes repeatedly until a termination condition is met: either a maximum iteration count is reached, or a SubAgent signals "done."
Generator SubAgent → Evaluator SubAgent → quality criteria met?
↑ │
└──────── no, iterate again ───────────────┘
│
yes, output final result
This is the Evaluator-Optimizer pattern in action: one Agent generates, another evaluates, looping until the output quality meets the bar. Best for iterative refinement — creative tasks, complex reasoning chains, anything that benefits from multiple drafts.
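A minimal Evaluator-Optimizer loop, with stub generate/evaluate functions and a hard iteration ceiling:

```python
def generate(draft, feedback):
    # Stub generator: each pass "improves" the draft.
    return draft + "+"

def evaluate(draft):
    # Stub evaluator: accept once the draft reaches the quality bar
    # (here, simply a minimum length).
    return len(draft) >= 4

draft, max_iters = "v", 10
# Hard ceiling guards against the case where quality never converges.
for _ in range(max_iters):
    if evaluate(draft):
        break
    draft = generate(draft, feedback=None)
```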
Two Communication Paradigms
After a SubAgent finishes its task, how do results flow back to the parent? There are two fundamentally different paradigms.
Paradigm 1: Agent as Tool
The parent Agent retains control throughout. The SubAgent is wrapped as a tool — the parent calls it, receives a return value, and continues its own workflow.
# Pseudocode
result = invoke_subagent(
    agent="code-reviewer",
    task="audit src/auth.py for security vulnerabilities",
)

# Parent Agent continues processing the result
summary = synthesize_results(result, other_data)
In this paradigm, the parent Agent is the active orchestrator and the SubAgent a passive executor. It suits scenarios where the parent needs to combine outputs from multiple SubAgents before making a final judgment.
Paradigm 2: Handoff
The parent Agent fully transfers control to the SubAgent. The SubAgent takes ownership of subsequent conversation or tasks. The parent exits execution and no longer participates.
User: "Help me design a complete technical proposal"
↓
Router Agent (analyzes request type)
↓
Hands off to → Technical Proposal SubAgent (owns all subsequent interaction)
This paradigm suits scenarios where the nature of the task fundamentally shifts and a specialist SubAgent should take full ownership. The SubAgent's system prompt doesn't need to hedge for "what if other task types come up"; it can be written with extreme precision and depth.
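A handoff router can be sketched as a function that dispatches once and then steps aside; `technical_proposal_agent` and `general_agent` are hypothetical specialists:

```python
def technical_proposal_agent(request):
    # Specialist: owns everything after the handoff.
    return f"proposal for: {request}"

def general_agent(request):
    return f"general answer: {request}"

def router(request):
    # The router classifies the request, hands off permanently,
    # and does not touch the result afterwards.
    if "proposal" in request:
        return technical_proposal_agent(request)
    return general_agent(request)

reply = router("design a complete technical proposal")
```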
The key distinction between the two:
| | Agent as Tool | Handoff |
|---|---|---|
| Control | Parent retains | Transferred to SubAgent |
| Best for | Merging multiple sub-results | Clean task type transition |
| Parent after | Continues execution | Exits |
| SubAgent focus | Complete one sub-task | Own all subsequent work |
Full Implementation in Claude Code
Claude Code turns these principles into a complete engineering implementation worth studying in detail.
File Format
A SubAgent is a Markdown file. YAML frontmatter defines the configuration; the file body is the System Prompt:
---
name: security-auditor
description: >
Security audit specialist. Trigger when checking code for security
vulnerabilities, OWASP risks, SQL injection, XSS, hardcoded secrets,
or insecure dependencies.
tools: Read, Grep, Glob
model: sonnet
maxTurns: 20
color: red
---
You are a professional code security auditor.
Focus your review on:
- SQL injection risks
- Insecure dependencies
- Hardcoded secrets or credentials
- XSS vulnerabilities
- Unsafe file operations
Group findings by severity: ❌ Critical / ⚠️ Warning / ℹ️ Info
The System Prompt the SubAgent receives is only the body of this file — not the full Claude Code system prompt that the parent Agent operates under. This is the source of the isolation, and why SubAgent behavior is more focused and predictable.
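Loading such a file means splitting the frontmatter from the body. The sketch below uses a deliberately naive key/value parser that only handles single-line fields (real code would use a YAML library; multi-line fields like `description: >` would break it):

```python
def parse_agent_file(text):
    # Split on the frontmatter delimiters: leading "---", header, body.
    _, header, body = text.split("---", 2)
    config = {}
    for line in header.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config, body.strip()

source = """---
name: security-auditor
model: sonnet
maxTurns: 20
---
You are a professional code security auditor."""

config, system_prompt = parse_agent_file(source)
# The body alone becomes the SubAgent's System Prompt.
```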
The Design Logic of Built-in SubAgents
Claude Code ships three built-in SubAgents. Their design clearly demonstrates the principle of "constrain exactly what's needed":
| SubAgent | Model | Tools | Design Intent |
|---|---|---|---|
| Explore | Haiku (fast, low latency) | Read-only (no Write/Edit) | Codebase exploration; results don't need to persist in main context |
| Plan | Inherits parent | Read-only | Gathers context in plan mode; cannot itself spawn sub-SubAgents |
| General-purpose | Inherits parent | All tools | Complex multi-step tasks requiring both exploration and modification |
Explore uses Haiku instead of a more capable model — a classic cost optimization call. Codebase exploration doesn't require top-tier reasoning. Using a smaller model is sufficient and preserves budget for where it actually matters.
Scope and Priority
Where a SubAgent file lives determines its scope. When multiple SubAgents share the same name, the higher-priority location wins:
Priority (high → low)
1. Organization-level (Managed Settings) — admin-deployed
2. CLI flag (--agents) — current session only, good for quick testing
3. Project-level (.claude/agents/) — current project, commit to version control
4. User-level (~/.claude/agents/) — all your projects
5. Plugin-level (plugin's agents/) — installed with plugins
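Name-conflict resolution by scope can be sketched as a merge from lowest priority to highest, so higher-priority definitions overwrite lower ones (the scope labels are shorthand for the list above; the merge logic is an illustrative assumption):

```python
# Highest-priority location first, mirroring the list above.
PRIORITY = ["managed", "cli", "project", "user", "plugin"]

def resolve(registrations):
    # registrations: {scope: {name: definition}}. Apply scopes from
    # lowest priority to highest so later updates win.
    resolved = {}
    for scope in reversed(PRIORITY):
        resolved.update(registrations.get(scope, {}))
    return resolved

agents = resolve({
    "user": {"reviewer": "user version"},
    "project": {"reviewer": "project version", "explorer": "project explorer"},
})
# The project-level "reviewer" shadows the user-level one.
```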
Model Priority Resolution
When the parent Agent invokes a SubAgent, the final model selection resolves in this order:
1. Environment variable CLAUDE_CODE_SUBAGENT_MODEL (highest priority)
2. model parameter passed at invocation time
3. model field in the SubAgent definition file
4. Parent Agent's current model (default inheritance)
This design allows overriding all SubAgents' models via an environment variable without touching their definition files — very convenient when you need to temporarily shift cost strategy across the board.
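The resolution order maps naturally onto a chain of `or` fallbacks. The function name and defaults below are illustrative; only the precedence order comes from the list above:

```python
import os

def resolve_model(definition_model=None, invocation_model=None, parent_model="sonnet"):
    # Precedence, highest first: env var > invocation parameter
    # > definition-file field > parent's current model.
    return (
        os.environ.get("CLAUDE_CODE_SUBAGENT_MODEL")
        or invocation_model
        or definition_model
        or parent_model
    )

os.environ.pop("CLAUDE_CODE_SUBAGENT_MODEL", None)
from_file = resolve_model(definition_model="haiku")
from_call = resolve_model(definition_model="haiku", invocation_model="opus")

os.environ["CLAUDE_CODE_SUBAGENT_MODEL"] = "haiku"
from_env = resolve_model(invocation_model="opus")  # env var overrides everything
```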
Using SubAgents for Cost Optimization
The SubAgent architecture provides natural cost control levers across three dimensions.
Dimension 1: Route tasks to models by complexity
# Simple code search and exploration: Haiku is plenty
---
name: file-explorer
model: haiku
---
# Code review requiring business logic understanding: Sonnet
---
name: code-reviewer
model: sonnet
---
# Architecture design and complex reasoning: Opus
---
name: architect
model: opus
---
Dimension 2: Use maxTurns to prevent runaway execution
---
name: researcher
maxTurns: 15 # auto-stop after 15 agentic turns
---
A SubAgent without a maxTurns limit can loop indefinitely on complex tasks. A reasonable ceiling is the fuse that prevents token costs from spiraling.
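The effect of a turn ceiling can be sketched as a bounded loop that stops whether or not the task signalled completion:

```python
def run_with_turn_limit(step, max_turns):
    # Run up to max_turns iterations; `step` returns True when the
    # task is done. The ceiling cuts off runaway execution.
    for turn in range(1, max_turns + 1):
        if step(turn):
            return turn, True
    return max_turns, False

# A task that would take 50 turns gets cut off at 15...
turns_used, finished = run_with_turn_limit(lambda t: t >= 50, max_turns=15)
# ...while a task that finishes early stops as soon as it's done.
turns_ok, ok = run_with_turn_limit(lambda t: t >= 3, max_turns=15)
```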
Dimension 3: Context isolation itself is savings
SubAgents contain their exploration process in an isolated context. Without them, all that intermediate work accumulates in the main conversation, and every subsequent parent Agent call carries the entire history, so per-call cost keeps climbing as tasks pile up. SubAgents cut that growth off.
When to Use SubAgents — and When Not To
Not every task belongs in a SubAgent. Misapplied, they add complexity without benefit.
Good fit for SubAgents:
- Exploratory tasks: searching a codebase, looking up docs, analyzing logs — the result is a conclusion; the process doesn't need to stay in the main context
- Tasks needing professional constraints: security audits, code formatting — a SubAgent with a restricted toolset prevents overreach
- Parallelizable independent sub-tasks: research multiple angles simultaneously, then merge
- Recurring patterns the team reuses: if you're spawning the same type of worker repeatedly, define a SubAgent and stop re-specifying it every time
Poor fit for SubAgents:
- Tasks tightly coupled to the main flow: the SubAgent's intermediate reasoning needs to inform the parent Agent's subsequent decisions — here, isolation is an obstacle rather than a benefit
- Tasks requiring continuous back-and-forth: the SubAgent would need to frequently check in with the parent mid-execution
- Tasks that are simply too small: creating and dispatching a SubAgent has overhead; a single simple step is more efficient just done directly in the main Agent
The core question for any potential delegation: does the parent Agent need to know how this task was executed? If the answer is no, it's a SubAgent candidate.
Summary
SubAgents are the "delegation and isolation" pattern in AI systems — the same design philosophy as separation of concerns in software, expressed in a different form.
The core chain:
| Concept | Essence |
|---|---|
| Isolated context window | Execution details don't pollute the caller |
| Description-driven delegation | Semantic-matching automatic dispatch |
| Capability constraints | Principle of least privilege applied to AI |
| Sequential / Parallel / Loop | Three basic topologies for composing multiple Agents |
| Agent as Tool vs Handoff | Two collaboration styles, differing in where control lives |
Understanding these principles isn't about turning every task into a SubAgent. It's about knowing when "delegate out and get back a conclusion" is more rational than "stay involved throughout." That judgment is the core competency behind building efficient Agent systems.