Pavel Polívka
The Review Revolution: Why Code Review Is Now the Most Important Skill You Have

The PR came in on a Tuesday. Clean diff. All CI checks green. GitLab Duo had already run its pass — no obvious security anti-patterns, no style violations. The author was a solid mid-level developer, two years on the team.

Then I asked: "Why did you choose this approach for the concurrency handling here?"

Pause.

"I think Claude Code suggested it and it seemed right."

That was not the author lying or cutting corners. They had reviewed the output, it looked correct, and they shipped it. The code was correct — at least for the happy path. What neither of us could answer in that moment: what happens when the lock acquisition times out? What is the failure mode if the downstream call is slow? Why this implementation over the simpler one?

That is not a junior developer problem. That is a 2026 code review problem. And most teams are running their last checkpoint with 2019-era habits.


What Changed

Before AI coding tools, the reviewer's job was well-defined: check the logic, catch edge cases, flag style violations, enforce team conventions. It was a hard job, but it had a reasonable assumption built in — the author understood what they wrote, because they wrote it.

That assumption is gone.

With Claude Code, GitHub Copilot, and similar tools in daily use, the reviewer now does two jobs:

  1. Check the code itself
  2. Check whether the author understands the code

These are different skills. The second is harder. And most review processes were not designed for it.

The failure mode looks like this: a developer gets a requirement, pastes it into Claude Code, gets working code, reviews the output for obvious issues, and ships it. The PR reviewer checks the diff, finds nothing wrong, approves it. Neither person can articulate the performance characteristics, the failure modes, or the security implications of what was just merged.

The data confirms this is not edge-case behavior. Veracode's 2026 analysis found that AI-generated code introduces security vulnerabilities in 45% of cases — in Java specifically, 72% of AI-generated code has at least one CWE hit. The Harness 2026 report found that organizations that adopted AI coding tools without updating their review processes saw a 34% increase in production incidents in 2025.

This is not an argument against AI tools. It is an argument that the bar for code review just moved, and most teams have not moved with it.


What Good Review Looks Like Now

1. The New PR Description Standard

The fastest way to know if a developer understood what they shipped is to make them explain it — in their own words, before anyone reviews the code.

Add this to your team's PR template:

```markdown
## What does this do?
[Explain the intent — not the code, the problem being solved]

## How was this implemented?
[Explain the approach. If AI-generated, say so and explain why you accepted it.]

## Edge cases considered
- [ ] [At least 3. Be specific.]

## How was this tested?
[If AI-generated tests were included, note which assertions were written by hand]
```

When a developer cannot fill in "edge cases considered," they do not understand what they shipped. That is the signal — and it surfaces before the reviewer has read a single line of code.

The best-performing engineering teams in 2026 are now requiring AI-generated PRs to include this explicitly: what the approach is, what edge cases were considered, and at least one self-generated test per non-trivial function. Not as bureaucracy — as a forcing function for comprehension.

2. The Three Review Questions for AI-Generated Code

When reviewing a PR that includes AI-generated sections — Claude Code output, GitLab Duo suggestions, Copilot completions — ask these three questions. Direct them at the code, not the author.

Question 1: What is this code optimizing for?

AI-generated code is optimized for "solves the stated problem" and "compiles." It is not optimized for your specific performance SLAs, your team's error-handling conventions, or your domain's threat model.

Look at the implementation choices. Is this optimized for throughput? For readability? For minimal memory allocation? Often the answer is: it's optimized for none of those, because the prompt didn't specify any of them. That is where the mismatch lives.

Ask the author: "What constraint was this generated to solve?" If they cannot answer, the review should not proceed.
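To make the mismatch concrete, here is a hypothetical sketch: two correct implementations of the same small task, each optimizing for a different constraint. The function names and the sorted-input assumption are invented for illustration; the point is that an AI tool will happily produce either one, and only the prompt or the reviewer decides which constraint actually matters.

```python
def big_spenders_readable(purchases, threshold):
    """Optimized for readability: builds an intermediate totals dict."""
    totals = {}
    for user_id, amount in purchases:
        totals[user_id] = totals.get(user_id, 0) + amount
    return [user_id for user_id, total in totals.items() if total > threshold]


def big_spenders_streaming(purchases, threshold):
    """Optimized for memory: holds only one running total at a time.
    Hidden assumption: purchases arrive sorted by user_id."""
    current_user, running_total = None, 0
    for user_id, amount in purchases:
        if user_id != current_user:
            # Flush the previous user before starting a new one.
            if current_user is not None and running_total > threshold:
                yield current_user
            current_user, running_total = user_id, 0
        running_total += amount
    # Flush the final user.
    if current_user is not None and running_total > threshold:
        yield current_user
```

Both produce the same answer on sorted input. The second silently produces wrong answers on unsorted input, which is exactly the kind of unstated constraint Question 1 is designed to surface.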

Question 2: What happens when this fails?

AI tends to generate happy-path code. Exception handling, circuit breakers, timeouts, and retry logic are frequently omitted or wrong.

Look specifically at: what exceptions are caught and swallowed? What happens if the external call is slow or returns a 503? Is the failure mode recoverable or catastrophic? If a pod restarts mid-operation, what is the state of in-flight transactions?

These are not exotic scenarios. These are Tuesday morning incidents. And they are the exact kind of thing AI code generation consistently misses, because the prompt that generated the code did not mention them.
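A minimal sketch of what having answers to those questions looks like in code. Everything here is hypothetical (the `call_with_retries` wrapper, the `fetch` callable, the `DownstreamError` type); the shape is what matters: a timeout is passed explicitly, only transient errors are retried, the attempt count is bounded, and exhaustion fails loudly instead of being swallowed.

```python
import time


class DownstreamError(Exception):
    """Raised when the downstream call fails after all retries."""


def call_with_retries(fetch, attempts=3, timeout=2.0, base_delay=0.1):
    """Call `fetch` with a bounded retry loop.

    `fetch` is any callable that accepts a `timeout` keyword and may
    raise TimeoutError or ConnectionError on transient failures.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(timeout=timeout)
        except (TimeoutError, ConnectionError) as exc:
            # Only transient errors are retried; anything else propagates.
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    # Exhausted retries: fail loudly, preserving the original cause.
    raise DownstreamError(f"failed after {attempts} attempts") from last_error
```

A reviewer does not need the generated code to look like this. They need the author to be able to say what each of these decisions is in their PR, or why it was deliberately omitted.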

Question 3: Would a junior on this team be able to maintain this in 18 months?

AI-generated code can be locally clever and globally incomprehensible. If the implementation relies on a pattern that is not in your team's standard vocabulary — and the PR description does not explain it — that is a code smell, even if the code is technically correct.

Maintainability is not just about readability. It's about whether the next developer who touches this file will understand the intent well enough to make safe changes. AI-generated code often optimizes for correctness at the cost of clarity.

3. GitLab Duo in the Review Flow

GitLab Duo Code Review integrates at the merge request level and is genuinely useful — it catches style violations, obvious logic errors, and common security anti-patterns quickly.

The key is using it correctly: treat Duo's pass as the mechanical layer, not the full review. It handles the things a linter and a static analyzer would catch, at the speed of a bot.

The human reviewer's job starts where Duo's ends: does the author understand what they shipped? Is the failure handling correct for this domain? Does the implementation reflect team conventions that are not captured in any linter rule?

Duo is not a substitute for the three questions above. It is a first pass that frees you to ask better questions.


Code Review as a Career Signal

AI coding tools have done something unexpected to engineering career ladders: they made junior and mid-level code syntactically indistinguishable from senior code.

A developer with 18 months of experience can now generate a working microservice — clean, organized, passing all automated checks — that looks like something a staff engineer would write. The output is impressive. The signal it sends about the developer's ability is... unclear.

What AI cannot do is teach you to catch what it missed.

Recognizing exception swallowing that will cause a service to fail silently under load — that requires having seen it happen. Noticing a data access pattern that will cause N+1 queries as the dataset grows — that requires understanding how your ORM generates SQL. Catching retry logic with no jitter that will thundering-herd your downstream service at exactly the moment you least want it to — that requires having been on-call when it happened to someone else.
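The jitter instinct in particular is easy to state in code. This is a sketch, not anyone's production implementation: without jitter, every client that failed at the same moment computes the same delay and retries at the same instant; with full jitter (the variant popularized by AWS's architecture blog), retries spread across the whole backoff window.

```python
import random


def backoff_no_jitter(attempt, base=0.5):
    """Deterministic exponential backoff: every client that failed at the
    same moment retries at the same instant -- the thundering herd."""
    return base * (2 ** attempt)


def backoff_full_jitter(attempt, base=0.5, cap=30.0):
    """Full jitter: a random delay in [0, capped backoff], so retries from
    many clients spread out and the downstream service can recover."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```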

These instincts come from production experience. They come from reading incident reviews. They come from having opinions that are more specific than "this looks right."

In 2026, the clearest seniority signal is not what you can generate — it is what you can review.

If you are a mid-level developer trying to reach senior: become exceptional at code review. Not just at finding bugs, but at teaching through review. The comment that explains why a pattern is dangerous, not just that it is. The question that forces the author to reason through failure modes they had not considered. That is senior engineering. AI cannot fake it.


The Checklist

Before approving any PR that includes AI-generated code:

Comprehension checks:

  • [ ] Author can explain the approach in their own words
  • [ ] PR description lists at least 3 edge cases, with specifics
  • [ ] Implementation choice is justified (why this approach over alternatives)

Failure mode checks:

  • [ ] Exception handling is explicit — no empty catch blocks, no silent swallowing
  • [ ] External calls have timeout and retry logic appropriate for the failure profile
  • [ ] Failure modes are recoverable (or the PR explicitly notes they are not)

Quality checks:

  • [ ] At least one non-trivial test was written by the author, not generated
  • [ ] Logging is present at failure points, not just on the happy path
  • [ ] Any new DB query has been analyzed for N+1 risk

Red flags that should pause approval:

  • "I'm not sure why this works, but it does"
  • 100% test coverage where all tests are generated and assert the obvious
  • An exception handler that logs and swallows without alerting
  • An implementation that is clever in a way that is not explained anywhere
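The log-and-swallow red flag is worth showing side by side. Both functions here are hypothetical (the `gateway` object and `charge` method are invented for illustration), but the contrast is the thing to look for in review: in the first, the failure disappears and the caller gets `None`; in the second, the exception is logged with context and re-raised so callers and alerting actually see it.

```python
import logging

logger = logging.getLogger("payments")


def charge_swallowed(gateway, order):
    """Red-flag pattern: failure vanishes. No stack trace, no re-raise,
    the caller gets None and discovers the problem downstream."""
    try:
        return gateway.charge(order)
    except Exception:
        logger.warning("charge failed")
        return None


def charge_loud(gateway, order):
    """What a reviewer should push for: log with full context, then
    re-raise so the failure stays visible to callers and alerting."""
    try:
        return gateway.charge(order)
    except Exception:
        logger.exception("charge failed for order %s", order)
        raise
```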

The code review is the last checkpoint before production. For most of engineering history, it was also a reasonable proxy for whether the author understood what they wrote.

That proxy is gone. The checkpoint remains.

The teams that figure out how to update their review process — not abandon AI tools, not pretend nothing changed, but update the actual process — will have fewer incidents, faster onboarding, and a clearer signal for who is actually senior.

The code review did not get less important. It got harder. And that is exactly the kind of harder that separates good engineers from the rest.


What does your team's current process look like for reviewing AI-generated code? I'm curious what's working — share it in the comments.
