When to spawn a subagent (and when it just burns tokens)

#ai #claude #agents #productivity

When to spawn a subagent (and when it just burns tokens)

Every guide right now tells you the same thing: your coding agent's context is getting polluted, so offload work to a subagent with a clean context window. It's good advice. It's also the fastest way I've seen people accidentally 7x their token bill.

A subagent is a second copy of the model, spun up with its own fresh context. The parent conversation doesn't come along — the child starts with only the prompt you hand it, does one job, and returns a single string. That isolation is the whole point: the messy, half-finished exploration stays in the child, and your main thread only ever sees the tidy result.

The trap is treating "spawn a subagent" as a free reflex. It isn't. Each subagent carries its own copy of the system prompt, the tool definitions, and whatever files you pass in — every time. Community measurements put subagent-heavy workflows at roughly 7x the tokens of a single-threaded session. The horror stories are real: one financial-services team left 23 subagents analyzing a codebase unattended and came back to a $47,000 bill over three days. A single orchestrated run of ~49 parallel subagents has been estimated at $8,000–$15,000. Nobody meant to spend that. They just spawned by default.

So the real skill isn't "how do I spawn a subagent." It's when spawning one is worth the overhead — and when you're paying a tax for nothing.

The one rule

Spawn a subagent when the context it saves your main thread is worth more than the context the subagent costs to start up.

That's it. Read a task and ask two questions:

How much junk would this generate in my main thread if I did it inline? (Big grep dumps, ten files of exploration, a noisy test run, a long browser session.)
How much do I have to pay just to get the subagent going? (Its system prompt, tool schemas, and the files I have to re-hand it because it can't see my context.)

If (1) is much bigger than (2), spawn. If they're close — or (2) is bigger — do it inline.

The failure mode is delegating a two-line shell command. The subagent's startup overhead (prompt + tool definitions + an extra round trip) dwarfs the "clutter" of just running git status yourself. You paid a setup cost to avoid three lines of output. That's a loss every time.

Four patterns that pay for themselves

Here's where the math actually works out — and one anti-pattern.

1. Wide reads that you only need a summary of. Searching a large repo for "where is auth handled" can pull dozens of files into context. If your main thread only needs the answer — a file path and a two-sentence description — that's a perfect subagent job. It reads 40 files in its own context, hands back four lines, and those 40 files never touch your main thread. The overhead is trivially worth it.

# main thread, in the Agent tool prompt:
"Search this repo and tell me exactly where user
 authentication is enforced. Return: the file:line of
 the check, and one sentence on how it works.
 Do not return file contents."

You get back four lines instead of forty files. That's the trade paying off.

2. Isolating a critic from the maker. Never let the model that wrote the code also grade it — it's read its own reasoning and it's rooting for itself. A fresh subagent that only sees the diff and the acceptance criteria, with none of the maker's "here's why I did it this way," gives you a genuinely independent verdict. There's a whole class of these now: a browser critic that refuses to pass a page without a screenshot, a frontend critic that judges the rendered DOM. The value isn't just clean context — it's clean incentives.

3. Long, noisy tool sessions. A subagent that drives a browser, records a 200-step session, and captures errors will generate a firehose of output. You want the firehose contained. Let the child wade through it and return "3 flows passed, checkout throws a 500 on step 7, here's the trace." Your main thread stays legible.

4. Genuinely parallel, independent work. Three unrelated modules that don't share state can be three subagents at once — real wall-clock speedup. But "independent" is load-bearing. If they touch the same files, you'll get merge chaos and you'll pay for all three contexts anyway. Parallelize things that are actually parallel.

The anti-pattern: using a subagent as a fancy function call for something small and deterministic. Renaming a variable, running one lint command, reading a single known file. There's no context to protect and nothing to isolate — you're just paying the startup tax and adding a round trip. Do it inline.

Two habits that keep the bill sane

Pass the minimum, not "here's everything just in case." A subagent can't see your context, so it's tempting to front-load it with ten files so it "has what it needs." But you pay for every one of those tokens on every spawn. Hand it the task, the one or two files it truly needs, and a crisp definition of what to return. If it needs more, it can go read it — that's the point of giving it its own context.

Constrain the return value. The reason a subagent saves you money is that its final message is small. If you let it return a 3,000-word essay, you've moved the bloat, not removed it. Tell it the shape of the answer: "return a file path and one sentence," "return PASS/FAIL and up to three bullet reasons," "return the failing test names only." A tight return contract is what makes the whole trade worth it.

The mental model

Think of a subagent as hiring a contractor for a scoped job, not as a free helper standing next to you. You brief them (that costs something), they work in their own office where you can't see the mess (that's the benefit), and they hand you a one-page report (that's the payoff). You'd never hire a contractor to hand you a stapler. Same instinct here: delegate the wide, messy, or independent work — and keep the small, cheap, deterministic stuff on your own desk.

Get this one decision right and your long sessions stay both accurate (clean main context, no rot) and affordable (no 7x surprise). Get it wrong and you get the worst of both: a bloated bill and a bloated context.

If you're newer to all this and the words "context window" and "subagent" still feel slippery, that's exactly the gap I built the **AI Learning Ladder* to close — Level 1 walks you from "AI is intimidating" to "confident everyday user" in plain English, no jargon, for $9. It's at penloomstudio.com. Learn the mental models once and every tool after this gets easier.*

Top comments (1)

Skillselion • Jul 4

The return-contract point is the one most people skip, and it's the whole game. "You've moved the bloat, not removed it" should be on a poster.

Two things I'd add from getting this wrong. First, a prose return contract ("return one sentence") leaks - the model still pads. A typed/structured return schema holds far better because the shape is enforced, not requested. Second, the hidden multiplier in that 7x: the parent re-pays for the subagent's summary on every turn after it lands, since the summary now lives in the parent context. A "cheap" 300-word summary compounds across a long session, so the tight return contract isn't just a one-time saving.

Your point 2 - isolating a critic from the maker - is the one I'd rank highest, and less for the token savings than you'd expect. A fresh subagent that only sees the diff and the acceptance criteria changes the incentives, not just the context. The maker is rooting for its own reasoning; the critic never saw it. That's a quality win you'd want even if tokens were free.