Code review's real ROI isn't catching bugs

Most teams treat code review as a defect filter. The research says that is the wrong scoreboard.

Bacchelli & Bird (ICSE 2013) studied modern code review at Microsoft. They surveyed 873 engineers and analyzed reviewer comments across multiple teams. The headline finding is uncomfortable: "finding defects" is the most-stated motivation for doing code review — but defects are not what dominates the actual review output.

Most comments fall into:

- Code improvement suggestions — refactor this, simpler approach, name it better.
- Knowledge transfer — explaining why the existing code looks the way it does, surfacing context only one teammate had.
- Awareness and team alignment — teaching the reviewer about a part of the system, socializing a design choice across the org.
- Defects — present, but a minority of comments.

That gap between stated purpose and actual output has concrete implications for how we run reviews.

1. Stop measuring reviewers by defects found. That metric optimizes for the wrong thing. A reviewer who left ten useful refactor suggestions and zero "bugs" did the high-value work. Defect-counting metrics push reviewers toward easy nitpicks (style, naming) and away from the harder structural feedback that actually compounds.
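A minimal sketch of the alternative: tally review comments by category instead of counting defects. The keyword heuristic is deliberately crude (a real setup would sample and classify by hand, or train something better), and the repo name and token are placeholders:

```python
from collections import Counter

import requests

REPO = "your-org/your-repo"   # placeholder
TOKEN = "ghp_..."             # placeholder personal access token

# Rough buckets matching the paper's comment categories.
CATEGORIES = {
    "improvement": ("refactor", "simplify", "rename", "extract", "cleaner"),
    "knowledge":   ("context", "history", "because we", "background"),
    "awareness":   ("heads up", "note that", "design decision"),
    "defect":      ("bug", "crash", "race condition", "leak", "off-by-one"),
}

def classify(body: str) -> str:
    """Crude keyword bucketing of a single review comment."""
    text = body.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"

def tally(pull_number: int) -> Counter:
    # GitHub REST: review comments on a pull request (first page only here).
    url = f"https://api.github.com/repos/{REPO}/pulls/{pull_number}/comments"
    resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}, timeout=10)
    resp.raise_for_status()
    return Counter(classify(c["body"]) for c in resp.json())

print(tally(42))  # e.g. Counter({'improvement': 7, 'knowledge': 3, 'defect': 1})
```

Even a rough tally like this shifts the conversation from "how many bugs did you catch" to "what kind of feedback is this team actually producing."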

2. Pick reviewers for change context, not for "best bug catcher." The same study found reviewer effectiveness is driven primarily by understanding the change — its history, its dependencies, the team's prior decisions. That means routing a review to the person closest to the affected subsystem beats sending it to the most senior generalist.
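"Closest to the affected subsystem" is something you can approximate from plain git history. A sketch, assuming it runs inside the repo (excluding the PR author and mapping git names to reviewer handles is left out; the file paths are hypothetical):

```python
import subprocess
from collections import Counter

def recent_authors(path: str, max_commits: int = 50) -> list[str]:
    # `git log --format=%an -- <path>` prints one author name per commit
    # that touched the file.
    out = subprocess.run(
        ["git", "log", f"-{max_commits}", "--format=%an", "--", path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def suggest_reviewers(changed_files: list[str], top_n: int = 3) -> list[str]:
    # Rank candidates by how often they've touched the files this PR changes.
    tally = Counter()
    for path in changed_files:
        tally.update(recent_authors(path))
    return [name for name, _ in tally.most_common(top_n)]

# suggest_reviewers(["src/billing/invoice.py", "src/billing/tax.py"])
# -> the people who most recently lived in that subsystem
```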

3. Use reviews for onboarding. If knowledge transfer is the dominant outcome, reviews are the cheapest onboarding mechanism you have. Pair every junior PR with a senior reviewer not because the senior will catch bugs the junior missed, but because the conversation is where the team's mental model gets transmitted.
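If you want to make that pairing a rule rather than a habit, it's a few lines in whatever bot assigns your reviewers. A toy version, with hypothetical names and an arbitrary six-month "junior" cutoff:

```python
# Hypothetical team data; in practice this lives in your review bot's config.
SENIORS = {"alice", "rahul"}
TENURE_MONTHS = {"dana": 2, "alice": 48, "rahul": 36}

def reviewers_for(author: str, default_pool: set[str]) -> set[str]:
    pool = default_pool - {author}
    if TENURE_MONTHS.get(author, 0) < 6:
        pool |= (SENIORS - {author})   # guarantee at least one senior voice
    return pool

# reviewers_for("dana", {"bo"}) -> {"bo", "alice", "rahul"}
```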

4. AI reviewer tools should optimize for the right job. Most LLM-based PR reviewers are tuned to flag "potential issues." That's the lowest-leverage quadrant of human review. The high-leverage quadrant is suggesting better approaches and surfacing context. The tools that move past defect-flagging into context-aware refactor suggestions and architectural commentary are the ones that compound team capability.
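Re-aiming an LLM reviewer is mostly a prompting decision. A sketch of what that could look like — the prompt wording is an illustration, not a benchmarked recipe, and `call_llm` stands in for whatever model client you actually use:

```python
REVIEW_PROMPT = """You are reviewing a pull request.
Prioritize, in this order:
1. A simpler or better-structured approach to the same change.
2. Context the author may be missing: prior decisions, related code, history.
3. What this change should teach the rest of the team.
Flag only severe defects. Skip style nits entirely.

Diff:
{diff}

Recent history of the touched files:
{history}
"""

def review(diff: str, history: str, call_llm) -> str:
    # call_llm: any function that takes a prompt string and returns text.
    return call_llm(REVIEW_PROMPT.format(diff=diff, history=history))
```

Note the history slot: feeding the model the same change context a good human reviewer would gather is what moves it out of the defect-flagging quadrant.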

The deeper point: code review's value lives in the team layer, not the code layer. The code is the medium. The team's shared understanding is the product.


Citation: Bacchelli, A., & Bird, C. (2013). Expectations, Outcomes, and Challenges of Modern Code Review. ICSE 2013. DOI: 10.1109/ICSE.2013.6606617

What does your team's review process actually optimize for — and is that what you want it to?
