olko

Posted on Jun 15

Code Review Used to Be a Power Game. AI Ended It.

#codereview #agents #ai #devrel

For years, the pull request was as much about politics as correctness. Something quietly changed - and we should ask what it cost.

Original article: "Code Review Used to Be a Power Game. AI Ended It. Nobody Noticed." by Oleg Koval

Picture a pull request with 47 comments.

Half of them are about variable names. Three reference a coding standard that exists only in one senior engineer's head. One of them - buried near the bottom - suggests the entire approach is wrong and should be rewritten. The author, a mid-level developer who spent two weeks on this feature, is staring at their screen wondering if they should just quit.

Now picture the same team, eighteen months later. PRs go through an AI agent first. Suggestions arrive before any human sees the diff. The author clicks through, accepts some, declines others, pushes again. When the human reviewer shows up, the conversation is shorter. Nobody's defensive. The senior engineer still leaves comments - but somehow they land differently now.

What changed? Nobody called a meeting about it. There was no announcement. It just got quieter.

The Comment Thread Was Never About the Code
Here is the thing most engineering retrospectives miss: code review was never purely a technical practice.

It was territory.

The pull request was where seniority got performed. Where "I know this codebase and you don't" was demonstrated, repeatedly, in public. Senior engineers who hadn't shipped significant new code in years could still dominate review threads. Volume of comments became a proxy for expertise. Tone became a proxy for standards. The loudest reviewer in the thread won something - even when they were wrong.

I've seen this pattern across teams at different companies. The most aggressive reviewers were rarely the best engineers. They were the ones who understood that the comment thread was the arena, and who had built their reputation there.

Research has documented this for over a decade. A landmark 2013 study by Bacchelli and Bird at Microsoft - based on interviews, observations, and hundreds of classified review comments across multiple teams - found that code reviews are "less about defects than expected" and instead serve social functions: knowledge transfer, team awareness, and relationship signaling. A follow-up 2015 study by Czerwonka and Greiler, also from Microsoft Research, concluded bluntly that "the social aspect of code reviews cannot be ignored" and that reviews often fail to catch the bugs they are supposedly there to find.

The implication is uncomfortable: we built a practice around defect detection, and what we actually built was a status ritual.

This is not unique to software. Any profession with apprenticeship structures, knowledge gatekeeping, and peer evaluation develops the same dynamics. Law firms have it. Academic peer review has it. Architecture firms have it. The medium changes, the power structure does not.

In software, the PR became the ritual space where that structure reproduced itself, daily, at scale.

The Arena Dissolved
When an AI agent pre-reviews a pull request, something structural shifts.

Suggestions arrive without ego attached. There is no career behind the comment, no relationship to manage, no status to signal. The author reads the suggestion, agrees or disagrees, and moves on. The social cost of declining a suggestion drops to zero. Nobody's feelings are hurt when you dismiss a Copilot comment.

The political charge drains out of the review process. The defensive crouch that junior developers learned to adopt before opening a PR - already anticipating the onslaught - starts to feel unnecessary.

The numbers reflect this, though the most-quoted ones are also the least interesting. Tool adoption saturated early. In 2025, 84% of developers said they use or plan to use AI tools - up from 76% the year before - and 51% reached for them every single day, according to Stack Overflow's survey of more than 49,000 developers. That curve has flattened; nearly everyone is already using something. Some organizations using AI code review report review cycles up to 40% shorter. The friction that once defined the PR process is compressing.

But the same survey carries a warning the cheerleaders skip past: trust is collapsing even as usage climbs. Only 29% of developers now say they trust the accuracy of AI output, down from 40% a year earlier. Developers are leaning on the tool harder while believing in it less - a tension worth holding onto, because it surfaces again later. By January 2026, Sonar's State of Code survey had put a hard edge on it: 96% of developers say they do not fully trust AI-generated output, and fewer than half always verify it before committing.

Source: Stack Overflow Developer Survey 2025 (49,000+ respondents)

But adoption is the boring half of the story. The interesting half is what the tools became. Through 2024, AI mostly autocompleted lines a developer was already typing. In 2025 and into 2026 the models matured - more capable, cheaper, and finally agentic - and crossed from suggesting code to writing and reviewing whole changes. That is the real 2026 inflection, and it does not show up in an adoption chart. It shows up in how much of the code is no longer being typed by a human at all: 42% of committed code by early 2026, on track for 65% by 2027.

Source: Sonar State of Code Developer Survey, January 2026 (1,100+ developers; the 2027 figure is Sonar's own projection)

The question quietly changed from "do you use AI" to "how much of this did you actually write."

And the speed gains are real too. A 2022 GitHub study found that developers completed a set coding task in about 55% less time with Copilot - 71 minutes versus 161 minutes without it.

Source: GitHub Copilot research study, 2022
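The 55% figure is a reduction in completion time, not a speed multiplier, and the two are easy to conflate. A minimal sketch of the arithmetic, using only the two durations quoted above; the function names are illustrative and do not come from the GitHub study:

```python
def time_reduction(baseline_min: float, treated_min: float) -> float:
    """Fraction of time saved relative to the baseline duration."""
    return (baseline_min - treated_min) / baseline_min


def speed_ratio(baseline_min: float, treated_min: float) -> float:
    """How many times faster the treated group finished."""
    return baseline_min / treated_min


# Durations quoted in the article (minutes).
without_copilot = 161
with_copilot = 71

print(f"time saved:  {time_reduction(without_copilot, with_copilot):.1%}")  # 55.9%
print(f"speed ratio: {speed_ratio(without_copilot, with_copilot):.2f}x")    # 2.27x
```

Read as time saved, the result is a little over half; read as a ratio, the Copilot group finished a bit more than twice as fast. Both describe the same two numbers.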

Though that headline now deserves an asterisk. A 2025 randomized controlled trial by METR found the opposite for experienced developers working in codebases they knew well: AI tools made them 19% slower, even as those same developers believed the tools had sped them up. The felt acceleration and the measured one are not always the same number - a gap that matters once you start trusting the feeling.

And the quieter review process is genuinely good. I want to be clear about that before moving on, because what comes next is uncomfortable.

Developers who were routinely silenced by hostile reviewers now get their code merged on its merits. People who were told their approach was wrong without being told why now have a patient, non-judgmental interlocutor to think through alternatives with. The "I need to wait for X to approve before I can ship" blocker, which was often less about quality and more about power, shrinks.

The relief is real. It should not be understated.

Three Places the Toxic Reviewer Went
The mistake is to assume the relief means the problem is solved.

The toxic reviewer did not retire. The arena changed, not the person. And people who built their professional identity around dominating that arena adapted - or found new arenas - with the same instincts intact.

The Migrator. The gatekeeping relocated. Now it is about AI output. "The agent missed the real issue." "This prompt is not how we do things here." "You cannot just accept whatever the model suggests." The same person who used to hold court in PR comment threads now holds court in conversations about which AI tools are acceptable, whose prompts are correct, and whether the team's AI output meets some standard that exists, again, primarily in their head.

I have watched this happen in real time. The surface area shrank. The behavior did not.

The Exposed. Remove the review arena, and the underlying technical contribution becomes visible in a way it was not before. For some engineers, the exposure is clarifying - it pushes them to build new skills, engage differently, find genuine ways to add value. For others, it turns out the arena was the value. The engineer who dominated reviews through volume and aggression, and who built a reputation on gatekeeping, turns out not to have shipped anything significant in two years. Without the ritual to hide behind, the gap is harder to obscure.

The Purist. A third path is to reject the new order entirely and gain identity from the rejection. These engineers become the craft purists - the ones who "still read every line," who insist that human review is irreplaceable, who position themselves as the last line of defense against the sloppiness that AI enables.

Sometimes this is genuine and valuable. There are real things that careful human review catches that automated tools miss. But sometimes it is the same ego in a new costume, and the costume is "I care about quality more than you do."

The three paths are not cleanly separated. Real people move between them. The point is that in none of them does the underlying orientation disappear.

What We Are Not Asking
Some friction was waste. Unambiguously.

But some was not.

Disagreement in code review caught real problems - not because the hostile reviewer was right, but because the pressure to defend and articulate a decision often revealed that the decision was wrong. The adversarial format, for all its toxicity, created a forcing function. You had to know why you made the choices you made.

Here is where the data gets interesting in a different direction. GitClear's 2025 analysis of 211 million lines of code put real numbers on what used to be a projection. Code churn - the share of new code reverted or rewritten within two weeks of being committed - climbed from 5.5% in 2020 to 7.9% in 2024. Over the same window, copy-pasted code overtook refactored code for the first time: duplicated blocks rose roughly eightfold, while the share of thoughtfully "moved" (refactored) lines collapsed from 24.1% to 9.5%. More code is being produced. More of it is being immediately corrected, and less of it is being cleaned up. When the friction in review drops and AI writes faster, the error surface does not shrink with it - it grows.

Source: GitClear - AI Copilot Code Quality report, 2025 (211M lines analyzed)
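To make the churn definition concrete, here is a minimal sketch of how such a share could be computed. It assumes a hypothetical record per added line (commit date plus the date it was later reverted or rewritten, if any) and treats the two-week boundary as inclusive; it does not reproduce GitClear's actual methodology or data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class AddedLine:
    """Hypothetical record for one newly committed line."""
    committed: date
    revised: Optional[date] = None  # None if never reverted or rewritten


def churn_share(lines: list[AddedLine], window_days: int = 14) -> float:
    """Share of added lines revised within `window_days` of being committed."""
    if not lines:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1
        for line in lines
        if line.revised is not None and line.revised - line.committed <= window
    )
    return churned / len(lines)


# Made-up sample data, for illustration only.
sample = [
    AddedLine(date(2024, 3, 1), date(2024, 3, 5)),   # revised after 4 days: churn
    AddedLine(date(2024, 3, 1), date(2024, 4, 20)),  # revised much later: not churn
    AddedLine(date(2024, 3, 2)),                     # never revised: not churn
    AddedLine(date(2024, 3, 3), date(2024, 3, 17)),  # revised on day 14: churn
]

print(churn_share(sample))  # 0.5
```

On this definition, a rise from 5.5% to 7.9% means that roughly one extra line in every forty newly added lines is being redone within two weeks.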

Google's 2025 DORA report, drawn from nearly 5,000 technology professionals, sharpens the picture. In that report, for the first time, AI adoption correlated with higher delivery throughput - teams ship more - but it still correlated negatively with delivery stability. AI does not fix a team; it amplifies what is already there. Faster output meets weaker guardrails, and the breakage moves downstream.

When AI agents review each other's output - and this is already happening in agentic pipelines where one model writes and another critiques - the question of who catches the systemic error becomes harder to answer. Agreement between agents is not the same as correctness. Two models trained on similar data, operating from similar assumptions, will miss the same things in similar ways. The dissenting voice that was sometimes just an ego trip was also sometimes the person who saw the thing everyone else normalized.

We do not have clean data yet on how this resolves. The evidence is genuinely incomplete.

And there is early evidence for where some of that friction went. In Sonar's January 2026 survey, 38% of developers said reviewing AI-generated code takes more effort than reviewing code written by a human colleague. The review burden did not disappear when the comment threads went quiet. It detached from the person and reattached to the machine's output - less personal, no less real.

What we do know: we did not fix the toxic developer. We gave them less surface area. The friction they caused in the comment thread has moved somewhere else, or gone underground, or is waiting for a new arena to emerge.

That is progress. It is not the same thing as a solution.

What happens when the new arena emerges - and what it looks like when AI agents become the territory for the same old power games - is what the next piece will cover.
