You don't need sub-agents

Anton Vinogradov — Tue, 16 Jun 2026 20:05:08 +0000

Open any "agent architecture" post from the last year and you'll hit the same diagram: a box marked Orchestrator at the top, arrows fanning down to a Researcher, a Coder, a Tester, a Reviewer — sometimes, God help us, a CEO agent. It's drawn beautifully. Gradient nodes, clean edges, the works. And it isn't an architecture. It's an org chart. You've seen this picture before; you've just never seen it perform.

We already ran this experiment. In 1975.

Fred Brooks ran it on IBM's OS/360 and wrote it down in The Mythical Man-Month in 1975: adding manpower to a late software project makes it later. The reason wasn't laziness, it was arithmetic. Every new person on a team adds communication paths, and those paths grow as n(n−1)/2 — so the coordination cost climbs faster than the help arrives. Two people, one line. Five people, ten lines. Ten people, forty-five.
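The arithmetic takes five lines to check. A minimal sketch; the function name `communication_paths` is ours, not Brooks's, and it counts one path per pair of people.

```python
def communication_paths(n: int) -> int:
    """Pairwise communication paths in a team of n people: n(n-1)/2."""
    if n < 0:
        raise ValueError("team size cannot be negative")
    return n * (n - 1) // 2


for size in (2, 5, 10):
    # Prints 1, 10 and 45 paths for teams of 2, 5 and 10.
    print(size, communication_paths(size))
```

Doubling the team from five to ten more than quadruples the paths, which is the whole problem in one line.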

Now look at the swarm diagram again. The orchestrator's entire job is to hand out subtasks and then reconcile whatever comes back — which is to say, to be the communication overhead Brooks warned you about. We took a fifty-year-old result about why throwing bodies at a problem doesn't scale, ported it to language models, drew it in a nicer tool, and called it an agent mesh. Progress, no?

Context doesn't survive delegation

Here's the mechanism, minus the diagram. A sub-agent runs in its own isolated context window. When it finishes, it can't hand the parent its reasoning — only a compressed summary of it. Anthropic's own context-engineering guide says exactly this: the sub-agent returns a condensed result while the detailed context that produced it stays trapped in a window nobody else can see.

So to give each child what it needs, you persist state — to files, to a database, to git — and re-read it at the top of every sub-agent. That's not an architecture either. It's plumbing. You are building and maintaining infrastructure whose only purpose is to recreate the shared context a single loop already has, for free, precisely because it never split the context in the first place.

The tell is the workaround. Go read the Claude Code issues and you'll find people having a sub-agent write a temp state file, or commit to git, just so the parent can rediscover what the child was doing. Cloudflare's code-review pipeline does the disciplined version of the same thing — a shared context file plus per-file patch files written to disk — specifically so they don't have to pass the whole merge request to seven reviewers seven times. When the fix for "the agents can't see each other's context" is "write the context to disk and have everyone re-read it," you've reinvented the shared memory you threw away. Slower, and with more YAML.

And the tokens. Splitting work across agents means re-passing context across every boundary, so you pay for the same context several times over. The orchestrator isn't free either: it has to read and reconcile every child's output, so a fan-out of N agents tends to cost about N+1 context loads, not N. More moving parts, more tokens, more places to fail, all to produce the answer one loop would've produced alone. Expensive bullshit with a gradient fill.
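Here is that bill as a back-of-envelope model. It is deliberately crude: every number below is an illustrative assumption, not a measurement, and the model counts each context read once while ignoring per-turn re-sending and prompt caching. The function names and the `summary` parameter are ours.

```python
def single_loop_tokens(context: int, work: list[int]) -> int:
    # One loop reads the shared context once, then does every subtask.
    return context + sum(work)


def fan_out_tokens(context: int, work: list[int], summary: int) -> int:
    # Each child re-reads the shared context before doing its own subtask.
    children = sum(context + w for w in work)
    # The orchestrator reads the context once, plus one summary per child.
    orchestrator = context + summary * len(work)
    return children + orchestrator


# Assumed sizes: 20k tokens of shared context, four 3k-token subtasks,
# 500-token summaries handed back to the orchestrator.
context, work, summary = 20_000, [3_000] * 4, 500

print(single_loop_tokens(context, work))       # 32000
print(fan_out_tokens(context, work, summary))  # 114000
```

With four children the shared context is loaded five times, once per child and once for the orchestrator. Under these assumptions the fan-out costs roughly 3.5 times the single loop before anyone has retried anything.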

Ask the people actually shipping agents

PostHog spent a year building agents in production and landed on the same conclusion: a single loop beats sub-agents, because every layer of delegation loses context and chips away at the model's ability to chain tools and self-correct.

The loudest voice against multi-agent systems, though, is Cognition — yes, the company that builds Devin, a coding agent. In June 2025 their Walden Yan published "Don't Build Multi-Agents", and the core point was the one above: parallel agents make conflicting implicit decisions because none of them can see what the others assumed. Ask two sub-agents to build the same thing and you get a bird in one art style and a background in another. (His Flappy-Bird example, not mine.)

Then watch what happened next. Ten months later, April 2026, Yan shipped the follow-up — multi-agents that do work in production. Read the conditions. The pattern that works keeps the writes single-threaded: many agents can contribute intelligence, but one thread commits. The free-for-all swarm, arbitrary agents negotiating with each other, he still files under distraction. The biggest proponent's "we figured it out" turns out to be "we stopped letting them step on each other."
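One way to read "many agents contribute, one thread commits" is sketched below. This is our interpretation, not Cognition's implementation: `propose` is a hypothetical stand-in for a model call, and the "document" is just a list of strings. Contributors run in parallel against an immutable snapshot; only the calling thread ever writes.

```python
from concurrent.futures import ThreadPoolExecutor


def propose(agent_name: str, snapshot: tuple[str, ...]) -> str:
    # Hypothetical stand-in for a model call: reads the snapshot,
    # returns a suggestion, and never touches shared state.
    return f"{agent_name}: reviewed {len(snapshot)} lines"


def run_round(document: list[str], agents: list[str]) -> list[str]:
    snapshot = tuple(document)  # immutable view, so contributors cannot write
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of `agents`.
        proposals = list(pool.map(lambda name: propose(name, snapshot), agents))
    # Single writer: only this thread mutates the document, in a fixed order.
    for proposal in proposals:
        document.append(proposal)
    return document


print(run_round(["line one", "line two"], ["researcher", "reviewer"]))
```

The parallel part is read-only, so there is nothing for the agents to step on. Every decision about what actually lands is made in one place, in one order.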

Now the objection you're loading: but Anthropic's research system beat a single agent by 90.2%. It did, on research. That setup burns roughly 15× the tokens of a chat, and token usage alone explains about 80% of the performance variance. Translated: the win is mostly "we paid for far more compute," parallelized across independent search directions, which is the one place the work allows it. If your task decomposes into independent, read-only directions and the answer is worth the bill, do it. If it doesn't, you're paying that multiple to re-pass context one loop would've held for free. Read the conditions.

And when these systems break, they break in a specific way. The Berkeley MAST study (Cemri et al., 2025) annotated over 1,600 execution traces across seven popular frameworks — AutoGen, ChatDev, CrewAI, the usual suspects — and clocked failure rates from 41% all the way to 87%. The failures cluster around design and coordination: system-design issues and inter-agent misalignment, agents talking past each other and never agreeing the job is done — not the model being dumb. The authors say it straight: a better base model will not fix this, because it isn't a model problem. It's an org-chart problem wearing an architecture diagram. Notice the pattern?

Where you actually parallelize

When two jobs are genuinely independent, parallelize them — as two separate loops, each with its own full context and no orchestrator in the middle. That isn't a multi-agent system. That's running two programs.

The line is that simple. Parallelize independent work as separate loops. Never stand up an orchestrator to split, then reassemble, a single train of thought — because the reassembly is the entire cost, and one loop never had to pay it. A single loop self-corrects across a hundred steps in one message history, checkpoints and resumes from any step if you put something like Temporal under it, and leaves one clean linear trace instead of a group chat you get to forensically reconstruct at 2am.
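A minimal sketch of that single loop, under loud assumptions: a plain JSON file stands in for a durable-execution engine (this is not Temporal's API), each "step" is a hypothetical function of the history so far, and `max_steps` exists only to simulate an interruption.

```python
import json
import os
import tempfile


def run_loop(steps, checkpoint_path, max_steps=None):
    # Resume from the saved history if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            history = json.load(f)
    else:
        history = []

    executed = 0
    while len(history) < len(steps):
        if max_steps is not None and executed >= max_steps:
            break  # simulate a crash or a pause
        step = steps[len(history)]
        history.append(step(history))  # every step sees the full history
        with open(checkpoint_path, "w") as f:
            json.dump(history, f)  # checkpoint after each step
        executed += 1
    return history


steps = [
    lambda h: "plan",
    lambda h: f"act on {h[-1]}",
    lambda h: f"check {len(h)} prior steps",
]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

first = run_loop(steps, path, max_steps=2)  # interrupted after two steps
second = run_loop(steps, path)              # resumes at step three
print(first)
print(second)
```

The history is the only state, it lives in one list, and the trace you read back at 2am is that same list in order.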

The better tool already exists. It's the loop you skipped past on your way to drawing the mesh.

DEV Community: Anton Vinogradov
