DEV Community

Martin H Berwanger
The Next Frontier Isn't Writing Code. It's Reviewing It.

With agentic coding tools now mainstream, there is an emerging conversation about which parts of the software development process still make sense. Rethinking the old ceremonies is healthy and necessary. But the velocity these tools introduce brings a new pressure: writing is no longer the constraint. Reviewing is. Teams with high AI adoption merge pull requests at a significantly higher rate, but review time climbs with it. The choice becomes: lower the bar to keep pace, or hold the line and become the bottleneck to your own delivery. Most organizations are quietly doing one or the other without fully admitting it. Neither is a real solution.

Some in the industry have landed on the conclusion: if the old gate can't keep up, remove it. Automate the review. Let AI write the code, and AI review it. Shift everything upstream to specs, and let the machines handle the rest.

I think we should be more measured in our approach. The question is not whether to use these tools. The question is what we give up when we remove human judgment and taste from the loop, and whether we are being honest with ourselves about that tradeoff.

Specs Don't Precede Understanding

The spec-driven development movement is gaining momentum. The idea is clean: humans write specifications, agents generate code to match them, and deterministic tests verify the output. Code review becomes unnecessary because the spec was the real artifact all along.

The problem is that specs don't precede understanding. They emerge from the process of building. Birgitta Böckeler at Thoughtworks put it well in her recent evaluation of SDD tools: the best way to stay in control of what you're building is small, iterative steps, and up-front spec design runs counter to that. The problem is not the spec. The problem is the assumption that a spec, no matter how detailed, can fully describe a problem space well enough to close the loop entirely.

A spec can capture what you know. But the things you don't know you don't know only emerge through the process of building and the friction of real scrutiny. No document written before the work begins can anticipate them. And when they go undiscovered, the consequences are not theoretical. They affect real users, real systems, and real businesses.

Every project has unknown unknowns. That is not new. What is new is the risk of removing the mechanism by which they get discovered. In traditional development, the process of building is itself a discovery process. You encounter the edge case, you realize the spec was incomplete, you resolve it. The iteration surfaces what the spec could not anticipate. In a fully automated pipeline, that loop is closed. The agent makes decisions silently. The automated reviewer checks what it was told to check. Nobody is deeply thinking through the problem space. The unknown unknowns do not surface until something breaks.
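A toy example makes that gap concrete. Everything below is hypothetical and invented for illustration: suppose a spec says "split the bill evenly among the diners," and an agent implements exactly that.

```python
# Hypothetical sketch: an implementation that satisfies the spec as written
# while hiding an unknown unknown the spec never mentioned.
def split_bill(total_cents: int, diners: int) -> int:
    """Spec: 'split the bill evenly among the diners'."""
    return total_cents // diners

# Every test derived from the spec passes: 900 cents among 3 diners is 300.
# But 1000 cents among 3 diners returns 333 each, and one cent vanishes.
# The spec never said what to do with the remainder, so no spec-derived
# test catches it. The gap surfaces only when someone reconciles the totals.
```

The point is not this particular bug. It is that the question "what happens to the remainder?" tends to get asked by a person scrutinizing the change, not by a pipeline verifying the spec's stated cases.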

And even setting aside what the spec couldn't anticipate, there is the question of whether the agent faithfully implemented what it was given. Agentic workflows go off the rails. They make decisions that deviate from the spec in subtle ways that are hard to catch after the fact. The 2025 DORA report found that AI adoption continues to have a negative relationship with software delivery stability and that, without robust feedback loops and review processes, increased change volume leads directly to increased instability. That is not sentiment. That is production data across nearly 5,000 technology professionals.

The strongest counterargument is not that humans should be removed entirely, but that their role should shift to spec authorship and system design rather than diff review. That is a reasonable position. But it assumes the gap between a spec and its implementation is small enough to ignore. The data suggests otherwise.

The Reasoning Isn't in the Diff

AI systems make decisions constantly during implementation, and they rarely surface the rationale behind those decisions. Choosing one library over another, selecting an architectural pattern, picking a dependency. These aren't syntax errors. They won't be caught by a linter or a test suite. They represent judgment calls made silently, with no explanation attached.
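A hypothetical pair of implementations illustrates the kind of silent judgment call meant here. All names below are invented. Both variants pass the same functional tests; the difference is invisible to a linter or test suite, and only a reviewer asking "why this shape?" would surface it.

```python
from functools import lru_cache

def expensive(key: str) -> str:
    # Stand-in for a costly computation or remote call.
    return key.upper()

# Variant A: what an agent might ship. Functionally correct, passes tests,
# but the cache grows without bound under long-running production load.
_cache: dict[str, str] = {}

def lookup_a(key: str) -> str:
    if key not in _cache:
        _cache[key] = expensive(key)
    return _cache[key]

# Variant B: what a reviewer might push for. Identical behavior as far as
# the test suite can see, but memory use is bounded.
@lru_cache(maxsize=1024)
def lookup_b(key: str) -> str:
    return expensive(key)
```

Neither variant is wrong in the abstract; which one is right depends on context the diff does not contain, which is exactly the reasoning a reviewer needs surfaced.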

In traditional development, you could walk over to the engineer who wrote the code and ask why they made a decision. That conversation surfaces context that never makes it into the diff. With an agentic workflow, that session is gone. You cannot reconstruct the reasoning after the fact in any reliable way. A reviewer isn't just checking whether the code works. They're asking whether the decisions that produced it were sound.

Who Owns What Ships

There is a question that the fully automated pipeline does not answer: when something goes wrong, who is responsible?

In a spec-to-agent-to-production workflow, the spec writer defined the intent. The agent produced the implementation. The automated reviewer flagged nothing. And yet something broke in a way nobody anticipated. The spec writer may not have the depth to debug it. The agent has no accountability. The pipeline has no memory of why decisions were made.

Code review is not just a quality gate. It is how ownership gets established. When an engineer reviews a change and approves it, they are accepting responsibility for understanding what ships. That accountability is not a bureaucratic formality. It is what drives the careful thinking that catches problems before they reach production. Remove the reviewer and you do not just remove a checkpoint. You remove the person whose name is on it.

You cannot automate away accountability and expect quality to hold.

Invest in the Review, Not Around It

None of this is an argument that human code review is reliable by default. Rubber-stamp approvals are real. Reviewers share mental models with authors and miss the same things. The current state of code review is not something worth defending as-is.

But comparing current human review to a fully automated pipeline is the wrong frame. Code review has historically been a secondary activity, something engineers fit around their primary work without dedicated tooling or support. The argument here is not to preserve that. It is to invest in something better: agentic tooling designed to give the engineer who owns a system the context they need to review with confidence. That is not the same activity as scanning a diff for eight seconds. And it is not something you can compare fairly to what we have today.

The SDLC has always been a funnel. Features get scoped, code gets written, changes get reviewed, and software gets shipped. What has changed is where the value is concentrated. Writing code used to be the hard part. It is no longer. The hard part is now the review: understanding what was built, whether it was built right, and whether it fits the system it is being added to. That is where engineering judgment lives. That is where investment should go.

What that looks like in practice is agentic tooling designed to augment the human reviewer, not replace them. The industry spent years making code faster to write and underinvested in making review faster and sharper. That needs to change. Tools that surface how a change fits into the broader system. Tools that expose the reasoning behind implementation decisions so a reviewer can interrogate them. Tools that help the engineer who owns that service pressure-test what shipped against what was intended. Not tooling that rubber-stamps a diff. Tooling that maximizes truth seeking.

The engineer with accountability for a system should be equipped to ask hard questions quickly. Does this change respect the constraints of the system it touches? Does it introduce dependencies that weren't considered? Does it hold up under conditions the spec didn't anticipate? Those questions require judgment. The tooling should make that judgment faster and sharper, not obsolete.

The goal is not to keep up with the machines. The goal is to ship software that works, that holds up under real conditions, and that we understand well enough to own. Our customers depend on that. Staying in the loop is not a limitation. It is how we honor that responsibility.


The future of software engineering is judgment.


