
Virgil

Originally published at blog.virg.be

The division of labor in AI systems

When building AI systems that handle multiple kinds of tasks, do you let different agents exist side by side — each owning its own domain — or do you funnel everything through a single entrypoint that routes to the right agent? At work, this design question has come up multiple times already.

Some colleagues and I presented our first agent POCs to the team last week, which has made that question more urgent.

I’m increasingly convinced that a single entrypoint routing to specialists is the better starting point, because the pattern itself is intuitive: a coordinator handles synthesis and judgment, and each task goes to a specialist that is good enough for the job and cheaper than the coordinator. In a well-run office team, the coordinator handles the routing so the specialists can focus. AI agents have different constraints, but the structural logic is the same: routing and deep domain work are different reasoning modes, and mixing them in a single agent has a cost.

[Figure: a flowchart showing a user request entering a coordinator node at the top, which routes it down to one of four specialist nodes (code, research, document, or reasoning) before all converge back to a single response.]

The coordinator sits at the top, receives any request, and routes it to the right specialist before synthesising the response.
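Stripped of any framework, the shape is easy to sketch. Everything below is hypothetical: classify_domain() stands in for whatever call the coordinator makes to decide where a request belongs, and the specialists are plain stubs.

```python
# Minimal sketch of the coordinator-specialists shape. All names are
# hypothetical; classify_domain() is a stand-in for the coordinator's
# routing decision, which would be an LLM call in practice.

SPECIALISTS = {
    "code": lambda req: f"[code specialist] reviewed: {req}",
    "research": lambda req: f"[research specialist] looked up: {req}",
    "document": lambda req: f"[document specialist] summarised: {req}",
    "reasoning": lambda req: f"[reasoning specialist] analysed: {req}",
}

def classify_domain(request: str) -> str:
    """Decide which specialist owns this request (toy heuristic)."""
    if "review" in request or "bug" in request:
        return "code"
    if "report" in request or "summarise" in request:
        return "document"
    return "reasoning"

def coordinate(request: str) -> str:
    """Route to one specialist, then synthesise the final answer."""
    domain = classify_domain(request)
    result = SPECIALISTS[domain](request)
    return f"coordinator synthesis ({domain}): {result}"

print(coordinate("summarise the Q2 report"))
```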

The alternative

If each agent owns a clear domain — code, research, documentation — you pick the right one and off you go. The routing is done by the user, not the system. But what if the user has a request like “summarise the Q2 report and flag anything that looks wrong”? This is a retrieval task and a reasoning task at the same time. A side-by-side system would require you to split it upfront. The coordinator system handles the split for you.
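To make that concrete, here is a toy version of the split I’d expect the coordinator to make. The decompose() helper is mine and hard-coded for this one example; in a real system the coordinator model would produce the subtasks.

```python
# Hypothetical sketch of a coordinator splitting a mixed request into
# subtasks. decompose() is hard-coded here to show the shape of the
# output, not how the classification itself would be done.

def decompose(request: str) -> list[dict]:
    """Break one request into (domain, instruction) subtasks."""
    return [
        {"domain": "document", "instruction": "retrieve and summarise the Q2 report"},
        {"domain": "reasoning", "instruction": "flag anything in the summary that looks wrong"},
    ]

def handle(request: str) -> list[str]:
    results = []
    for subtask in decompose(request):
        # Each subtask goes to its own specialist; the coordinator only
        # carries the intermediate results forward for synthesis.
        results.append(f"{subtask['domain']} -> {subtask['instruction']}")
    return results

print(handle("summarise the Q2 report and flag anything that looks wrong"))
```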

What the coordinator enables

When the coordinator is responsible for understanding the request, the system can handle tasks that don’t fit neatly into one domain. Specialisation works on two levels. The first is domain: the problem space the specialist operates in, like code review or document retrieval. The second is thinking level: how deeply the specialist needs to reason. These aren’t the same thing. A simple script review and a complex distributed systems codebase are both code review, but they don’t call for the same reasoning depth. The coordinator doesn’t just pick the right domain specialist; it also has to estimate how much thinking the task requires.
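A toy way to picture routing on both axes, with made-up model tiers and a placeholder depth heuristic standing in for the coordinator’s own judgement:

```python
# Routing on two axes: domain plus thinking level. The model names and
# the estimate_depth() heuristic are placeholders, not real models or logic.

MODEL_TIERS = {"shallow": "small-fast-model", "deep": "large-reasoning-model"}

def estimate_depth(task: str) -> str:
    """Guess how much reasoning the task needs (hypothetical heuristic)."""
    return "deep" if "distributed" in task or "architecture" in task else "shallow"

def route(task: str, domain: str) -> dict:
    """Pick both the specialist domain and the model tier it should run on."""
    return {"domain": domain, "model": MODEL_TIERS[estimate_depth(task)]}

print(route("review this shell script", "code"))
print(route("review this distributed systems codebase", "code"))
```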

What the research says

Google Research, Google DeepMind, and MIT published a large study on this in late 2025. In Towards a Science of Scaling Agent Systems, they tested five architectures across four benchmarks and measured error amplification — how much a mistake in one agent’s output cascades through the rest of the system. Independent architectures (no inter-agent communication) amplified errors 17.2× over the single-agent baseline. Centralized coordination contained that to 4.4× through a validation bottleneck. Decentralized and Hybrid fell between them at 7.8× and 5.1× respectively.

The picture isn’t one-directional: performance ranged from +80.9% on decomposable financial reasoning to −70% on sequential planning, depending on task structure. The coordinator-specialists pattern I’m describing isn’t universally better, but it’s better for the kind of work I’m thinking about. Trying this in practice will tell me where the boundaries actually are.

Where this leaves me

In my own research I have found that frameworks like LangGraph and AutoGen model this topology explicitly — a coordinator routing to specialists, with data passed between them. Both are on my list to dig into.
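As a preview of where I might start, here is roughly how the topology could look in LangGraph, going by my reading of its StateGraph API rather than anything I’ve run. The specialists are stubs and the coordinator’s routing decision is hard-coded.

```python
# Rough LangGraph sketch of the coordinator-specialists topology, based on
# my reading of the StateGraph API -- treat the details as an assumption.

from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    request: str
    route: str
    answer: str

def coordinator(state: State) -> dict:
    # An LLM call in practice; stubbed to always pick the document specialist.
    return {"route": "document"}

def document_specialist(state: State) -> dict:
    return {"answer": f"summary of: {state['request']}"}

def reasoning_specialist(state: State) -> dict:
    return {"answer": f"analysis of: {state['request']}"}

graph = StateGraph(State)
graph.add_node("coordinator", coordinator)
graph.add_node("document", document_specialist)
graph.add_node("reasoning", reasoning_specialist)
graph.set_entry_point("coordinator")
graph.add_conditional_edges(
    "coordinator",
    lambda state: state["route"],
    {"document": "document", "reasoning": "reasoning"},
)
graph.add_edge("document", END)
graph.add_edge("reasoning", END)

app = graph.compile()
print(app.invoke({"request": "summarise the Q2 report"}))
```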

The design question is still open, but I am leaning towards the coordinator-specialists pattern. I’ll find out more once I start building.
