Matías Denda

Posted on Jun 3

Code review when half the PR is AI-generated

#ai #codereview #softwareengineering #career

Picture a reviewer opening a PR on Monday morning. The title says "Add user search endpoint." They click on the Files tab.

+847 −12

Eight hundred forty-seven lines added. The reviewer has maybe 15 minutes before the next meeting. They scroll through the diff. The code looks clean — variable names are descriptive, functions are reasonable length, there are tests. They skim, see nothing obviously wrong, and approve.

Three weeks later, the feature ships. Users complain the search is slow. Two weeks after that, the database team files an urgent ticket: a query is scanning 4 million rows on every request. The fix takes a day. The post-mortem blames "insufficient load testing."

The post-mortem is wrong. The problem was that a reviewer with 15 minutes can't meaningfully review 847 lines of AI-generated code. And that's the new reality for half the PRs most teams see in 2026.

The review problem AI created

Before AI, review bandwidth loosely tracked writing bandwidth. If your team opened 5 PRs a day, that code had been written at human speed, so it could be reviewed at roughly human speed. Reviewers could read at the pace code was written.

AI broke that equation. Authors can now produce something like 10x more code in the same time, while reviewers still read at human speed. Same team, same number of reviewers, 10x more code to review.

The result is something every tech lead I've talked to confirms: review quality has dropped since their teams adopted AI. Not because reviewers got lazier, but because the volume became impossible to review well at the bar they used to maintain.

What bad AI-generated code actually looks like

If you've reviewed AI-generated PRs, you've probably noticed the problem isn't the kind of thing that jumps out. AI rarely produces code that's obviously broken. The dangerous code looks fine.

Some patterns I see repeatedly:

The library that doesn't exist. AI invents a package. The code imports @acme/super-fast-cache, uses it throughout, and there's no such package. It fails at install time, so CI catches it, but the reviewer didn't. The reviewer assumed that if something is imported, it exists.

The API that doesn't work that way. AI uses methods that don't exist on real libraries, or that exist with different signatures. Take redis.mget(keys, { default: 0 }): Redis's MGET has no default option, and keys that don't exist simply come back as null. The code runs, no default is applied, and the bug surfaces in production.
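Here is a minimal TypeScript sketch of applying the default explicitly instead. It assumes a client whose MGET wrapper resolves to one string or null per key; the `MGetClient` interface and the `applyDefault` and `getCounters` helpers are hypothetical, and the fake client only stands in for Redis, so check your own client's documentation for the real method name and signature.

```typescript
// Hypothetical minimal client shape. Real clients name this differently
// (for example, node-redis v4 uses mGet and ioredis uses mget); check your client's docs.
interface MGetClient {
  mget(keys: string[]): Promise<Array<string | null>>;
}

// MGET replies with null for every key that doesn't exist, so the default
// has to be applied by the caller after the reply comes back.
function applyDefault(replies: Array<string | null>, fallback: number): number[] {
  return replies.map((reply) => {
    if (reply === null) return fallback;
    const parsed = Number(reply);
    // A non-numeric value is a bug worth surfacing, not something to paper over.
    if (Number.isNaN(parsed)) {
      throw new Error(`Expected a numeric value, got "${reply}"`);
    }
    return parsed;
  });
}

async function getCounters(client: MGetClient, keys: string[]): Promise<number[]> {
  const replies = await client.mget(keys);
  return applyDefault(replies, 0);
}

// Usage with an in-memory stand-in for Redis.
const fakeClient: MGetClient = {
  async mget(keys) {
    const store: Record<string, string> = { "views:1": "42" };
    return keys.map((key) => store[key] ?? null);
  },
};

getCounters(fakeClient, ["views:1", "views:2"]).then((counts) => console.log(counts));
```

The design choice is to keep the default where a reviewer can see it: one line in the caller, rather than an option the library never had.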

The silent assumption. AI writes code that makes assumptions it doesn't verify. It assumes input is UTF-8. It assumes timestamps are in UTC. It assumes the database timeout matches the service timeout. A reviewer reading the code sees nothing wrong because the assumptions are invisible — they exist in what wasn't written.
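A good review comment on a silent assumption often ends with the code enforcing it. Here is a minimal sketch for the timestamp case, assuming timestamps arrive as ISO 8601 strings. In JavaScript, a date-time string without an offset, such as `"2026-06-03T10:00:00"`, is parsed in the server's local time zone, so the hypothetical `parseTimestamp` helper rejects anything without an explicit `Z` or `±hh:mm` offset.

```typescript
// ISO 8601 date-time with an explicit offset: "Z" or "+hh:mm" / "-hh:mm".
// Seconds and exactly three fractional digits are optional, matching the
// date-time string format JavaScript's Date parser is specified to accept.
const ISO_WITH_OFFSET = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d{3})?)?(Z|[+-]\d{2}:\d{2})$/;

// Hypothetical helper: refuse timestamps whose time zone is only implied.
function parseTimestamp(input: string): Date {
  if (!ISO_WITH_OFFSET.test(input)) {
    // new Date("2026-06-03T10:00:00") would silently use the server's local zone,
    // so the same string could mean different instants on different machines.
    throw new Error(`Timestamp needs an explicit offset: "${input}"`);
  }
  const date = new Date(input);
  // Guard against values the Date parser rejects, such as an impossible month.
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid timestamp: "${input}"`);
  }
  return date;
}

console.log(parseTimestamp("2026-06-03T12:00:00+02:00").toISOString()); // 2026-06-03T10:00:00.000Z
```

The point isn't this particular regex; it's that the assumption now lives in the code, where the next reviewer can see it and argue with it.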

The "works in the tutorial" pattern. AI produces code that works for the tutorial case and fails at scale: the pagination that is fast on a demo table and crawls on a production one, the authentication middleware that doesn't handle token refresh, the file upload that buffers everything in memory. Every one of these looks clean in isolation.
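Pagination is the classic case. A minimal sketch of the difference, assuming PostgreSQL-style `$n` placeholders and a hypothetical `users(id, name)` table with `id` as its primary key. PostgreSQL still computes every row skipped by `OFFSET`, so deep pages get slower as the table grows; the keyset version resumes after the last `id` the client saw.

```typescript
// Hypothetical table users(id BIGINT PRIMARY KEY, name TEXT), PostgreSQL-style $n placeholders.
interface PageQuery {
  text: string;
  values: number[];
}

const MAX_PAGE_SIZE = 100;

// Clamp client input so nobody can request the whole table in one page.
function clampPageSize(requested: number): number {
  const size = Math.floor(requested) || 1; // NaN and 0 fall back to 1
  return Math.min(Math.max(size, 1), MAX_PAGE_SIZE);
}

// Tutorial version (page is zero-based): the database still computes every row
// skipped by OFFSET, so each deeper page costs more than the last.
function offsetPageQuery(page: number, pageSize: number): PageQuery {
  const limit = clampPageSize(pageSize);
  return {
    text: "SELECT id, name FROM users ORDER BY id LIMIT $1 OFFSET $2",
    values: [limit, page * limit],
  };
}

// Keyset version: resume after the last id the client saw. With the primary-key
// index, each page reads about `limit` rows no matter how deep the client pages.
function keysetPageQuery(afterId: number | null, pageSize: number): PageQuery {
  const limit = clampPageSize(pageSize);
  if (afterId === null) {
    return { text: "SELECT id, name FROM users ORDER BY id LIMIT $1", values: [limit] };
  }
  return {
    text: "SELECT id, name FROM users WHERE id > $1 ORDER BY id LIMIT $2",
    values: [afterId, limit],
  };
}

// The client sends back the last id of the previous page as its cursor.
console.log(keysetPageQuery(1040, 25));
```

Both versions look equally clean in a diff, which is exactly why this one slips through.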

These are not the kinds of bugs a junior reviewer catches. They're the kinds a senior reviewer catches because they've seen them before.

The skill that suddenly got very valuable

If AI broke the writing-to-reviewing ratio, the people who unbreak it are reviewers who can go through large diffs fast without losing signal. That's a skill. It's always been valuable. It's now scarce.

What senior reviewers actually do that juniors don't:

They read diffs top-down, not linearly. Junior reviewers start at line 1 and read through. Senior reviewers scan structure first: What files changed? What's the shape of the change? Is the scope what the PR title claims? Only after they understand the shape do they drill into code.
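That first structural pass can be partly scripted. A minimal sketch, assuming you feed it the output of `git diff --numstat <base>...HEAD` (added lines, deleted lines, and path, tab-separated, with `-` counts for binary files). `summarizeNumstat` is a hypothetical helper that groups the change by top-level directory, so the shape of the PR is visible before anyone reads a line of it.

```typescript
// Hypothetical helper: summarize `git diff --numstat <base>...HEAD` output.
// Each line is "added<TAB>deleted<TAB>path"; binary files report "-" for both counts.
// Renamed files keep whatever path string git prints for them.
interface AreaSummary {
  area: string;
  files: number;
  added: number;
  deleted: number;
}

function summarizeNumstat(numstat: string): AreaSummary[] {
  const byArea = new Map<string, AreaSummary>();
  for (const line of numstat.split("\n")) {
    if (line.trim() === "") continue;
    const [addedRaw = "-", deletedRaw = "-", ...pathParts] = line.split("\t");
    const path = pathParts.join("\t");
    // Group by top-level directory; files at the repository root go under ".".
    const slash = path.indexOf("/");
    const area = slash === -1 ? "." : path.slice(0, slash);
    const entry = byArea.get(area) ?? { area, files: 0, added: 0, deleted: 0 };
    entry.files += 1;
    entry.added += addedRaw === "-" ? 0 : Number(addedRaw);
    entry.deleted += deletedRaw === "-" ? 0 : Number(deletedRaw);
    byArea.set(area, entry);
  }
  // Biggest areas first: that's where the bulk of the change, and of the review, lives.
  return Array.from(byArea.values()).sort(
    (a, b) => b.added + b.deleted - (a.added + a.deleted),
  );
}

// Hypothetical paths for illustration.
const sample = [
  "120\t4\tsrc/search/handler.ts",
  "610\t0\tsrc/search/handler.test.ts",
  "95\t8\tdb/migrations/0042_add_search_index.sql",
  "-\t-\tdocs/diagram.png",
].join("\n");

for (const s of summarizeNumstat(sample)) {
  console.log(`${s.area}: ${s.files} file(s), +${s.added} -${s.deleted}`);
}
// src: 2 file(s), +730 -4
// db: 1 file(s), +95 -8
// docs: 1 file(s), +0 -0
```

If the shape doesn't match the title (a search endpoint whose largest change sits in an unrelated directory, for example), that's the first review comment.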

They look for what's missing, not what's there. A reviewer who reads AI code line by line will find typos and style issues. A reviewer who asks "what tests would fail if I were trying to break this?" finds the real problems. Missing error handling, missing edge cases, missing cleanup in failure paths.

They flag assumptions, not just bugs. Senior reviewers comment: "What happens if the user list is empty?" "What's the behavior when the upstream service times out?" "Is this endpoint idempotent? If not, should it be?" These questions don't point to broken code — they point to the invisible choices AI made by default.
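Here is the empty-list question as code. A minimal sketch with hypothetical helpers: in JavaScript, `Math.max()` with no arguments returns `-Infinity`, so the naive version turns "no users" into an Invalid Date instead of failing loudly.

```typescript
// Hypothetical helpers: latest activity among users, as epoch milliseconds.

// Typical first draft: fine for demo data, wrong for an empty list.
// Math.max() with no arguments returns -Infinity, and new Date(-Infinity) is an Invalid Date.
function latestActivityNaive(timestamps: number[]): Date {
  return new Date(Math.max(...timestamps));
}

// Reviewed version: "no users" is an explicit result the caller must handle.
// reduce() also sidesteps spreading a very large array into Math.max's
// argument list, which can throw.
function latestActivity(timestamps: number[]): Date | null {
  if (timestamps.length === 0) return null;
  return new Date(timestamps.reduce((latest, t) => (t > latest ? t : latest)));
}

console.log(latestActivityNaive([]).getTime()); // NaN
console.log(latestActivity([])); // null
```

Neither version is broken on the happy path, which is why the question has to come from the reviewer.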

They know the 80/20 of the codebase. A senior reviewer who's been on a codebase for a year knows which files are load-bearing, which services are latency-sensitive, which modules have caused production incidents. They weight their attention accordingly — a one-line change in the payment service gets more scrutiny than a 100-line change in the marketing page.

None of this is new. All of it got more valuable.

What the review bar has to look like now

The reviewer's mandate has shifted. It used to be "catch obvious bugs." It now has to be "verify the code is correct for this codebase, at this scale, for this business."

Concrete changes that mature teams are adopting:

Smaller PRs, enforced. If AI lets developers write 800-line PRs in a morning, the team needs to cap PR size institutionally. Many teams are landing on a limit of roughly 400 changed lines, with anything larger split across multiple PRs. The rationale: a 400-line PR can still be reviewed well in 20-30 minutes; an 800-line one cannot.

Required "context" section in the PR description. Not just "What does this do?" but "What scale does this need to handle? What failure modes did you consider? Which AI-written parts did you verify carefully?" This makes invisible decisions visible.

Pairing AI-heavy code with focused review. If a PR is substantially AI-generated, it gets a more experienced reviewer by default. Teams assign this through CODEOWNERS or routing rules.

Review budgets in schedules. Review used to be something engineers did in the cracks of their day. With AI-era volume, it has to be a first-class activity. Teams that take this seriously reserve 1-2 hours per day for deep review and protect that time like any other focused work.

A practical setup

Here's a GitHub Actions snippet that enforces PR size limits:

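# .github/workflows/pr-size-check.yml
name: PR size check
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-size:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the merge base with the target branch is available
      - name: Check PR size
        env:
          BASE_REF: ${{ github.base_ref }}
        run: |
          # --shortstat prints e.g. " 3 files changed, 847 insertions(+), 12 deletions(-)";
          # fields 4 and 6 are the insertion and deletion counts.
          CHANGES=$(git diff --shortstat "origin/${BASE_REF}...HEAD" \
            | awk '{ print $4 + $6 }')
          CHANGES=${CHANGES:-0}  # empty output means no changed lines
          echo "Total changed lines: $CHANGES"
          if [ "$CHANGES" -gt 400 ]; then
            echo "::error::PR exceeds 400 lines ($CHANGES). Split into smaller PRs."
            exit 1
          fi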

This isn't a perfect tool. Some legitimate PRs are larger (big refactors, generated code). The point isn't to be strict — it's to force a conversation every time a PR exceeds a reasonable size. Sometimes the conversation concludes "yes, this one is fine." Often it concludes "this could be three PRs."

And a template for PR descriptions that surfaces invisible AI decisions:

<!-- .github/pull_request_template.md -->

## What changed

<!-- High-level summary -->

## Why this change

<!-- Business or technical reason -->

## Context that doesn't live in the code

*Expected scale:* <!-- e.g., 10k req/min, 1M rows -->
*Failure modes considered:* <!-- e.g., upstream timeout, DB unavailable -->
*AI-assisted sections:* <!-- Which parts used AI, and what you verified -->

## Testing
<!-- How you tested this beyond CI -->

## For the reviewer
<!-- What to pay special attention to -->

This template takes an extra 2 minutes to fill out. It saves reviewers far more than that, and forces the PR author to surface the context AI couldn't generate on its own.
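If you want CI to nudge authors as well, a check can flag required sections that were left empty. A minimal sketch, assuming the PR body is passed in as a string (in GitHub Actions it is available as `github.event.pull_request.body`); `missingSections` and the section list are hypothetical and mirror the template above. HTML comments and bare field labels such as `*Expected scale:*` don't count as content.

```typescript
// Hypothetical check; the section names mirror the PR template.
const REQUIRED_SECTIONS = [
  "What changed",
  "Why this change",
  "Context that doesn't live in the code",
  "Testing",
];

// A line holding only a field label from the template, e.g. "*Expected scale:*".
const LABEL_ONLY = /^\*[^*]+:\*\s*$/;

function missingSections(body: string): string[] {
  // Placeholder hints live in HTML comments, so they must not count as content.
  const text = body.replace(/<!--[\s\S]*?-->/g, "");
  const sections = new Map<string, string>();
  let current: string | null = null;
  for (const line of text.split("\n")) {
    const heading = /^##\s+(.+?)\s*$/.exec(line);
    if (heading) {
      current = heading[1] ?? "";
      sections.set(current, "");
    } else if (current !== null && !LABEL_ONLY.test(line)) {
      sections.set(current, (sections.get(current) ?? "") + line + "\n");
    }
  }
  return REQUIRED_SECTIONS.filter((name) => (sections.get(name) ?? "").trim() === "");
}

const draft = [
  "## What changed",
  "Adds a user search endpoint.",
  "## Why this change",
  "<!-- Business or technical reason -->",
  "## Context that doesn't live in the code",
  "*Expected scale:* <!-- e.g., 10k req/min, 1M rows -->",
  "## Testing",
  "Load-tested against a copy of production data.",
].join("\n");

console.log(missingSections(draft).join(", ")); // Why this change, Context that doesn't live in the code
```

A check like this can only confirm that something was written, not that it's true; the reviewer still has to read it.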

The hardest thing about all this

The hardest thing about post-AI code review isn't technical. It's cultural.

Teams that were getting along fine with casual review are suddenly finding that their production incident rate is climbing, and the incidents trace back to merged PRs that looked fine. The temptation is to blame AI and argue that we should use it less. That's the wrong conclusion — AI is here, and the productivity gains are too real to forgo.

The right conclusion is harder: the review bar has to go up, and teams need to invest in reviewing like they invest in writing. That means explicit time budgeted, training for junior reviewers, pairing on complex reviews, and accepting that shipping 10x more code means spending more total time on quality, not less.

If you're a tech lead reading this: your team needs explicit guidance on what good review looks like in this new context. Don't expect people to figure it out on their own; the old instincts don't scale to AI-era volume.

If you're a senior IC: your review skill just became one of your most valuable assets. Invest in it. Slow down on your own writing if necessary. A thoughtful review that catches a scaling issue before production is worth more than a dozen PRs merged without one.

If you're a junior: reviewing is the fastest way to learn the codebase and develop senior judgment. Pair on reviews with people who catch what you don't. Ask them to explain their reasoning. That's where the real training happens now — not in writing code AI will increasingly write for you, but in evaluating whether AI's output is correct for your specific context.

Back to the amplifier

Two weeks ago I argued that AI amplifies what developers already are. The same applies to reviewers.

A senior reviewer using the new review practices — smaller PRs, explicit context, reserved time — catches 10x more issues than they used to, because they're reviewing higher-density code more deliberately.

A junior reviewer rubber-stamping AI-generated PRs misses 10x more issues than they used to, because the volume grew faster than their experience.

Same AI. Same reviewers. Radically different outcomes.

The fundamentals of code review — reading for missing cases, flagging assumptions, knowing your codebase — haven't changed. They've just become the thing separating teams that thrive in the AI era from teams drowning in production incidents.

Don't skip them. They're not optional. They're more important now than ever.

This post is part of a series on AI and engineering fundamentals. My book Git in Depth has a full chapter on code review — the principles that don't change whether code is human-written or AI-generated.

See all my articles on Git and engineering practice: dev.to/mdenda.

Top comments (2)

Nasif Sid • Jun 4

This is a very timely point. AI has definitely changed the writing speed, but review speed has not changed the same way.

I especially agree with the idea that AI-generated code often looks clean but can still hide risky assumptions. Things like missing edge cases, wrong API usage, performance issues, or scale problems are easy to miss when the PR is too large.

For me, the biggest takeaway is that code review now needs more structure, not less. Smaller PRs, better PR descriptions, clear testing notes, and reviewer guidance can make a big difference.

AI can help us write faster, but it does not remove the responsibility of understanding what we are merging.

Great reminder that senior-level review skills are becoming even more valuable in the AI era.

Matías Denda • Jun 4

Thanks — you summed up the core tension better than I did: writing speed moved, understanding speed didn't, so the whole bottleneck slid onto review.

One thing I'd add to your "more structure, not less" point: AI can also write the PR description and the testing notes. That sounds like a win, but it's a subtle trap — if the author didn't write the code and didn't write the explanation of it, then the reviewer is now auditing two unverified artifacts instead of one. The description stops being "here's what I did and why" and becomes "here's what the model claims it did." So the structure has to be authored by a human who actually understood the change, otherwise it's just more clean-looking surface to miss things under.

Fully agree on the last part: senior review skill is the scarce resource now, not code.