Jacob

Posted on Jun 10

The Review Bottleneck: Rethinking Software and Infrastructure Design for the Agent Era

#softwaredevelopment #agents

AI coding agents made code generation cheap. They did not make software delivery cheap. Teams adopting agents at scale keep hitting the same wall: a productivity surge at the start, then review queues that grow faster than anyone can drain them, merge conflicts between concurrent agent sessions, and humans turning into the slowest, most tired component in the pipeline. The problem is not the agents and it is not the reviewers. We are pushing machine-speed generation through architectures and workflows built for human-speed teams. Conway's Law and Brooks's Law still apply, and they bite harder now. The fix is structural. Design explicit boundaries before you deploy agents. Replace ad-hoc coordination with contracts. Move human review from output to intent. Make conformance checking mechanical. The same applies to infrastructure as code, where monolithic state and implicit coupling turn agent concurrency into an incident generator.

1. The bottleneck moved. It did not disappear.

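Brooks's Law says adding people to a late project makes it later, because newcomers need ramp-up time and because communication and integration overhead grows roughly quadratically with team size: n people have n(n-1)/2 possible pairwise channels, so 5 people have 10 and 15 people have 105. The law was never about typing speed. It was about coordination.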

A coding agent is a way of adding ten engineers to your project overnight. The typing is free. The coordination is not. Every generated change still has to be understood, integrated, verified, and trusted by someone. If generation throughput goes up 5-10x while verification throughput stays flat, the system does not get 5-10x faster. It gets a queue. That queue is your review backlog, and the person draining it is burning out.

This is Amdahl's Law applied to delivery: total speedup is capped by the part you did not accelerate. Generation was never the dominant cost of software engineering. Understanding, deciding, and verifying were. Agents made that obvious.
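To put rough numbers on that cap, here is a minimal sketch of Amdahl's Law applied to a delivery pipeline. The 20% generation share and the 10x factor are illustrative assumptions, not measurements, and `delivery_speedup` is a name made up for this example.

```python
def delivery_speedup(accelerated_share: float, factor: float) -> float:
    """Amdahl's Law: overall speedup when only part of the work gets faster.

    accelerated_share: fraction of total delivery time spent in the
                       accelerated step (0..1)
    factor:            how many times faster that step becomes
    """
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_share) + accelerated_share / factor)


# Assumed numbers: code generation is 20% of delivery time and agents
# make it 10x faster.
print(round(delivery_speedup(0.20, 10), 2))   # 1.22

# Even near-instant generation cannot beat 1 / (1 - 0.20) = 1.25x.
print(round(delivery_speedup(0.20, 1e9), 2))  # 1.25
```

Under those assumptions a 10x faster generator buys about 22% end to end. The rest of the time is still spent understanding, deciding, and verifying.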

If you have run multiple agent sessions against one repo, you know the symptoms. PRs land faster than anyone can read them properly. Two agents touch the same module and produce conflicting changes. Someone approves a 1,500-line diff because actually reading it would take a day. Interfaces drift because nobody holds the whole picture anymore. None of this is new. It is what happens when you put ten contractors on a project with no architecture and one tech lead who has to read everything. We rediscovered Brooks at machine speed.

2. Conway's Law is now a design tool

Conway's Law: systems mirror the communication structures of the organizations that build them. For decades you inherited an org chart and got the architecture it implied. Designing the org to get the architecture you want, the inverse Conway maneuver, was the advanced move. Amazon's two-pizza teams are the best-known case.

Worth being precise about why the Amazon model worked, because most people remember the wrong half. The pizza is the famous part: teams small enough that internal coordination stays cheap. The load-bearing part was the 2002 interface mandate. Every team exposes functionality through service interfaces. No shared databases. No backdoors. No reading another team's data store. The size limit bounded communication inside teams. The mandate bounded communication between them. Companies that copied the small teams and skipped the interface rule got small teams that still trampled each other.

With agents, Conway's Law stops being a constraint and becomes a free variable. An agent has no career ambitions and no reporting line. You construct its communication structure from scratch, and you should do it before scaling up, not after. The mapping is almost one-to-one. A bounded context with a published contract replaces the service team with its API. Exclusive write ownership replaces team code ownership. Context scope, meaning what fits coherently in an agent's working context, replaces headcount as the sizing constraint. The interface mandate becomes a CI rule instead of a managerial threat: an agent that imports across a boundary fails the build. No appeals process.
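As a rough illustration of the interface mandate as a CI rule, the sketch below scans a Python source file for imports that reach past another context's published interface. The layout is an assumption made for the example: each bounded context is a top-level package, `<context>.api` is its only published surface, and the context names and `boundary_violations` are hypothetical.

```python
import ast

# Hypothetical layout: each bounded context is a top-level package and
# `<context>.api` is its only published interface.
CONTEXTS = {"billing", "orders", "shipping"}


def boundary_violations(source: str, own_context: str) -> list[str]:
    """Return imports in `source` that bypass another context's `api` module."""
    imported: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Record `from a.b import c` as "a.b.c" so that
            # `from billing import api` is seen as "billing.api".
            imported.extend(f"{node.module}.{alias.name}" for alias in node.names)

    violations = []
    for name in imported:
        parts = name.split(".")
        if parts[0] not in CONTEXTS or parts[0] == own_context:
            continue  # stdlib, third-party, or our own context
        if parts[1:2] != ["api"]:
            violations.append(name)  # internals, or a bare `import billing`
    return sorted(violations)


sample = (
    "import json\n"
    "from billing.api import Invoice\n"
    "from billing.models import LedgerRow\n"
)
print(boundary_violations(sample, own_context="orders"))
# ['billing.models.LedgerRow']
```

A CI job would run this over every changed file and exit non-zero on any violation. Treating a bare `import billing` as a violation is a deliberately conservative choice in this sketch.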

The principle: interference is an architecture failure, not an agent failure. If two agents can conflict, the boundary between their domains was never real.

3. Review the intent, not the output

The deepest cause of review fatigue is that we review the wrong artifact. Generated code is the cheap output of an expensive decision. Reading it line by line means spending scarce human attention on the thing that cost nothing to produce, while the thing that mattered (the intent, the interface change, the design decision) flies past buried in a wall of diff.

Move human judgment upstream, where it has leverage. This is what spec-driven workflows like OpenSpec and Architecture Decision Records are for, and they matter more now than when they were invented.

The division of labor is straightforward. ADRs hold the slow-moving decisions: context boundaries, technology choices, invariants, the rules agents must never break. They change rarely and every agent reads them as standing doctrine. Specs hold per-change intent: what is changing, why, which interfaces are affected, what the acceptance criteria are. A human reviews and approves the spec before generation starts. That is a one-page read instead of a two-thousand-line diff. The generated code then gets checked for conformance to the approved spec, and conformance is a question machines can mostly answer.
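One small piece of that conformance check can be sketched as follows: compare the files an agent actually changed against the scope the approved spec declared. `ChangeSpec` and its fields are hypothetical and do not reflect the format of OpenSpec or any other real tool.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ChangeSpec:
    """Hypothetical approved spec; not the schema of any real tool."""
    intent: str
    allowed_paths: list[str]  # glob patterns the change may touch
    contract_paths: list[str] = field(default_factory=list)  # interfaces approved for change


def out_of_scope(spec: ChangeSpec, changed_files: list[str]) -> list[str]:
    """Files the agent touched that the approved spec did not cover."""
    patterns = spec.allowed_paths + spec.contract_paths
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


spec = ChangeSpec(
    intent="Add retry to order submission",
    allowed_paths=["orders/service/*", "orders/tests/*"],
)
changed = [
    "orders/service/submit.py",
    "orders/tests/test_submit.py",
    "billing/api.py",
]
print(out_of_scope(spec, changed))  # ['billing/api.py']
```

A non-empty result means the change went beyond the approved intent. That is the moment to pull a human back in, not to widen the spec automatically.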

Under this model, human review shifts from gatekeeping to sampling. You approve intents, spot-check outputs, and step in when a gate flags something. The per-PR question shifts from "is this good code?", which is subjective and unbounded, to "does this match what we agreed?", which is checkable.

4. Make conformance mechanical

Sampling-based review is only safe if the mechanical gates are strong. The real investment agent adoption demands is not better prompts. It is verification infrastructure.

Contract tests pin every published interface, so an agent cannot change a boundary without the change being explicit. Architecture fitness functions enforce dependency rules in CI: context A does not import context B's internals, full stop. Property-based tests check invariants instead of examples, which matters because agents are very good at making example-based tests pass for the wrong reasons. Schema and data-contract validation does the same job at the data layer. Policy-as-code (OPA, Sentinel, and friends) encodes the security and compliance rules that used to live in reviewers' heads.
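A contract test can be as small as pinning the shape of a published function. The sketch below uses only the standard library. `create_invoice`, `describe`, and the pinned list are hypothetical names for illustration, and a real suite would also pin return types, error behavior, and wire formats.

```python
import inspect


def create_invoice(customer_id: str, amount_cents: int, currency: str = "EUR") -> str:
    """Hypothetical published entry point of a billing context."""
    return f"inv-{customer_id}-{amount_cents}-{currency}"


def describe(func) -> list[tuple[str, str, bool]]:
    """Reduce a callable to (name, kind, has_default) per parameter."""
    return [
        (p.name, p.kind.name, p.default is not inspect.Parameter.empty)
        for p in inspect.signature(func).parameters.values()
    ]


# The pinned contract lives next to the tests and changes only through
# a human-approved spec.
PINNED_CREATE_INVOICE = [
    ("customer_id", "POSITIONAL_OR_KEYWORD", False),
    ("amount_cents", "POSITIONAL_OR_KEYWORD", False),
    ("currency", "POSITIONAL_OR_KEYWORD", True),
]


def test_create_invoice_contract_is_unchanged():
    assert describe(create_invoice) == PINNED_CREATE_INVOICE


test_create_invoice_contract_is_unchanged()
```

If an agent renames `customer_id` or makes `currency` mandatory, the test fails and the boundary change becomes an explicit, reviewable event instead of a line buried in a diff.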

A useful mental shift: in an agent-heavy codebase, the tests and contracts are the specification of record. The code is a derived artifact. Put your strongest review on the tests and contracts. Let the derived artifact flow through gates.

The second mechanical lever is diff discipline. A 200-line change with one intent is verifiable in minutes. A 2,000-line change touching four concerns is not verifiable at all, only approvable. Agents drift toward sprawl unless constrained, so make it doctrine: one intent per change, enforced by convention and by tooling where possible.
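Tooling for diff discipline does not need to be sophisticated. Here is a minimal sketch that reads `git diff --numstat` output and rejects sprawl. The 400-line limit and the use of the top-level directory as a stand-in for "one context" are assumptions for the example, and renamed paths are not handled.

```python
def parse_numstat(numstat: str) -> list[tuple[int, str]]:
    """Parse `git diff --numstat` output into (changed_lines, path) pairs."""
    rows = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as "-"; count them as zero lines.
        changed = sum(int(n) for n in (added, deleted) if n != "-")
        rows.append((changed, path))
    return rows


def diff_discipline_errors(numstat: str, max_lines: int = 400,
                           max_contexts: int = 1) -> list[str]:
    """Return human-readable reasons to reject a change, if any."""
    rows = parse_numstat(numstat)
    total = sum(changed for changed, _ in rows)
    # Assumption: the top-level directory identifies the bounded context.
    contexts = sorted({path.split("/", 1)[0] for _, path in rows})
    errors = []
    if total > max_lines:
        errors.append(f"{total} changed lines exceeds limit of {max_lines}")
    if len(contexts) > max_contexts:
        errors.append(f"touches {len(contexts)} contexts: {contexts}")
    return errors


focused = "120\t30\torders/service/submit.py\n15\t0\torders/tests/test_submit.py\n"
print(diff_discipline_errors(focused))  # []
```

The exact thresholds matter less than having them written down and enforced before the diff reaches a human.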

5. Infrastructure as code: same disease, worse symptoms

All of the above applies to IaC, with higher stakes, because infrastructure mistakes are measured in outages rather than bugs.

The monolithic Terraform root module is the agent-era anti-pattern. A single large state file is a global lock and a global blast radius. Two concurrent agent plans against it will conflict, and any error can touch everything. The remedy is the bounded-context logic applied to state: decompose into small, layered stacks, per domain, per environment, per concern, each with its own state. Agent concurrency then maps onto state-file parallelism instead of state-file contention.

Module interfaces are contracts and deserve the same treatment as service APIs. Explicit variables and outputs, semantic versioning, a designated owner. Consumers pin versions and code against the published interface. Only the owner changes the module. Implicit coupling, like one stack reading another's state or depending on resources it does not manage, is the IaC version of reaching into another team's database. Ban it with the same severity the Amazon mandate banned backdoors.

terraform plan deserves credit as the original spec-driven workflow: a machine-generated statement of intent, reviewed before execution. Lean into that. Humans review plans and policy reports, not HCL diffs. Policy-as-code gates encode the judgment that used to require a senior engineer's eyes on every change: deny public buckets, require tags, restrict instance types, cap cost deltas. Drift detection closes the loop, because in a multi-agent world the most dangerous state is the one nobody owns.
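In practice these rules are usually written in OPA or Sentinel. Purely to show the shape of such a gate, here is a sketch in Python that checks the JSON form of a plan, as produced by `terraform show -json` on a saved plan file. The required tag names and the list of taggable resource types are assumptions, and whether a bucket exposes an `acl` attribute depends on the AWS provider version in use.

```python
import json

REQUIRED_TAGS = {"owner", "cost-center"}           # assumed tagging policy
TAGGABLE_TYPES = {"aws_s3_bucket", "aws_instance"}  # assumed, not exhaustive
PUBLIC_ACLS = {"public-read", "public-read-write"}


def policy_violations(plan_json: str) -> list[str]:
    """Check a Terraform plan in JSON form against two example rules."""
    plan = json.loads(plan_json)
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if not {"create", "update"} & set(change["actions"]):
            continue  # deletes and no-ops have nothing to validate here
        after = change.get("after") or {}

        if rc["type"] == "aws_s3_bucket" and after.get("acl") in PUBLIC_ACLS:
            violations.append(f"{rc['address']}: public bucket ACL")

        if rc["type"] in TAGGABLE_TYPES:
            missing = sorted(REQUIRED_TAGS - set(after.get("tags") or {}))
            if missing:
                violations.append(f"{rc['address']}: missing tags {missing}")
    return violations


example_plan = json.dumps({
    "resource_changes": [{
        "address": "aws_s3_bucket.logs",
        "type": "aws_s3_bucket",
        "change": {
            "actions": ["create"],
            "after": {"acl": "public-read", "tags": {"owner": "platform"}},
        },
    }]
})
for violation in policy_violations(example_plan):
    print(violation)
# aws_s3_bucket.logs: public bucket ACL
# aws_s3_bucket.logs: missing tags ['cost-center']
```

The reviewer then reads a short policy report per plan instead of the HCL that produced it.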

Ephemeral preview environments give agents a place to prove changes empirically. An agent that deploys to an isolated environment and runs conformance checks against it converts review from reading into evidence.

6. Tier autonomy by blast radius

Not every change deserves the same scrutiny. Pretending otherwise is exactly what exhausts reviewers. Efficient agent operations need an explicit autonomy ladder, agreed in advance and encoded in tooling.

Bottom of the ladder: low-blast-radius changes such as internal refactors, test additions, and docs. If they pass all gates, they merge automatically, with humans sampling after the fact. Middle: anything touching a published contract, a schema, or production infrastructure requires human approval of the spec and the diff. Top: changes to the boundaries themselves. New contexts, deleted interfaces, security posture, anything irreversible. Those require an ADR and a human decision, every time.
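Encoding the ladder in tooling can start as a path-based classifier. The sketch below is one possible shape under stated assumptions: the path patterns, the `contexts.yaml` boundary map, and the tier names are all hypothetical. Anything not explicitly marked low-risk falls into the middle tier, so an unrecognized path never merges on gates alone.

```python
from enum import IntEnum
from fnmatch import fnmatchcase


class Tier(IntEnum):
    AUTO_MERGE = 1      # passes gates, humans sample afterwards
    HUMAN_APPROVAL = 2  # human approves the spec and the diff
    ADR_REQUIRED = 3    # boundary change: ADR plus a human decision


# Hypothetical path rules for one repository layout.
BOUNDARY_PATTERNS = ["contexts.yaml", "security/*"]
LOW_RISK_PATTERNS = ["*/tests/*", "*/internal/*", "docs/*"]


def tier_for_path(path: str) -> Tier:
    if any(fnmatchcase(path, p) for p in BOUNDARY_PATTERNS):
        return Tier.ADR_REQUIRED
    if any(fnmatchcase(path, p) for p in LOW_RISK_PATTERNS):
        return Tier.AUTO_MERGE
    return Tier.HUMAN_APPROVAL  # default for contracts, schemas, infra, unknowns


def required_tier(changed_files: list[str]) -> Tier:
    """A change needs the strictest tier of any file it touches."""
    return max((tier_for_path(p) for p in changed_files),
               default=Tier.HUMAN_APPROVAL)


print(required_tier(["orders/tests/test_submit.py", "docs/usage.md"]).name)
# AUTO_MERGE
print(required_tier(["orders/internal/retry.py", "orders/api.py"]).name)
# HUMAN_APPROVAL
print(required_tier(["contexts.yaml"]).name)
# ADR_REQUIRED
```

Paths are only a proxy for blast radius. A real classifier would also look at what the diff does, for example whether a contract test or a schema changed.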

The agent has to know which lane it is in and stop at the edge. A disciplined agent that proceeds within its mandate, asks at the boundary, and refuses what it cannot verify is worth more than a brilliant one that does everything and therefore has to be checked on everything.

The asymmetry to engineer for: fast inside the boundary, deliberate at the boundary. That asymmetry is the entire efficiency model. Internal velocity is the payoff. Boundary friction is the price of keeping it safe. Review fatigue is what you get when you refuse to choose and apply medium friction to everything.

7. What to do, starting this quarter

A realistic adoption path does not start with more agents. It starts with the substrate.

First, map your bounded contexts and write them down as ADRs: boundaries, contracts, ownership. If you cannot draw the boundaries, agents will find them for you, expensively. Second, pick one context and harden its seams with contract tests, dependency rules in CI, and a published interface. Third, adopt spec-before-code for agent changes in that context, with human approval at the spec stage. Fourth, build out the gate suite until you trust it enough to let green low-risk changes merge with sampling instead of gatekeeping. Fifth, split your IaC state along the same boundaries and put policy-as-code in front of every apply. Then scale the number of concurrent agents. At that point each extra agent adds throughput instead of queue depth.

8. Conclusion

The uncomfortable truth of the agent era: the limiting factor is no longer how fast we can write software, but how fast we can responsibly accept it. Conway's Law says interference is an architecture problem, so solve it with boundaries designed before the agents arrive. Brooks's Law says coordination is the real cost, so solve it by replacing communication with contracts. Review fatigue says we are spending human judgment on the wrong artifact, so solve it by approving intent once instead of inspecting output endlessly.

None of this is exotic. It is the two-pizza team, the API mandate, trunk-based discipline, and policy-as-code: the best ideas of the last twenty years of software organization, applied to a workforce that happens to be synthetic. The teams that win with agents will not be the ones with the best prompts. They will be the ones with the best boundaries.

Top comments (2)

Shaquille Niekerk • Jun 10

Great read, and very insightful. I think one of the biggest shifts happening right now is that AI has made code generation cheap, but good architecture, coordination, and review are still expensive.

The more I work with tools like Claude Code, the more I realize the real challenge isn’t getting code written anymore. It’s building the right boundaries, workflows, and guardrails around the agents so humans don’t become the bottleneck.

Jacob • Jun 10

totally agree, the right architecture and boundaries are getting critical in the design