Harsh
AI Is Quietly Destroying Code Review — And Nobody Is Stopping It

It Started With a PR That Made Me Question Everything

Six months ago, I merged a pull request that I'm still not proud of.

The code looked clean. The logic seemed sound. My AI assistant had helped write it, another AI tool had reviewed it, and I — a senior developer with 5 years of experience — had approved it with a confident "LGTM 🚀".

Three weeks later, it caused a data inconsistency bug that took us 40 hours to debug.

The worst part? When I went back and actually read the code — really read it — I could see the problem. It was hiding in plain sight, beneath perfectly formatted, well-named, beautifully commented code that looked like it was written by a thoughtful engineer.

It wasn't written by a thoughtful engineer. It was generated by one AI, rubber-stamped by another, and approved by a human who had forgotten how to be skeptical.

That human was me.


The New Code Review Pipeline (And Why It's Broken)

Here's what "code review" looks like at a growing number of teams right now:

Developer → GitHub Copilot writes code
         → CodeRabbit / Cursor reviews it
         → Developer skims the AI summary
         → "Looks good!" ✅
         → Merge

We've automated the process of code review without preserving the purpose of it.

Code review was never just about catching bugs. It was about:

  • Knowledge transfer — juniors learning from seniors by reading real decisions
  • Architectural awareness — everyone understanding how the system fits together
  • Collective ownership — building a team that genuinely cares about the codebase
  • Human judgment — asking "wait, should we even be doing this?"

AI tools are shockingly good at the surface layer. They'll catch a missing null check, flag a potential SQL injection, suggest better variable names.

But they don't ask why.


What AI Can't See (But A Human Reviewer Would)

Let me give you a real example from my team.

A junior dev submitted a PR that added a new caching layer. The code was technically correct. The AI reviewer loved it — "Efficient implementation! Good use of Redis TTL! Well-documented!"

What the AI didn't ask:

  • "Hey, we already have a caching layer in the service above this. Did you know about it?"
  • "This will cache user-specific data globally. Is that a GDPR concern?"
  • "Why are we solving this with a cache? Is the underlying query just slow because of a missing index?"

A senior engineer would have asked all three questions in the first 30 seconds of reading.

The AI approved it. I almost did too.

This is the silent danger. Not that AI writes bad code. It's that AI-assisted code review is selectively blind: precise on syntax, oblivious to context.


The Psychological Shift Nobody Is Talking About

Here's what's happening inside our heads, and we need to be honest about it.

When I open a PR that was written with AI assistance, I feel a subtle but real shift. The code looks more polished. The variable names are consistent. The comments are thorough. My lizard brain whispers: "This seems fine."

I'm fighting against the halo effect — where surface quality signals deep quality.

Handwritten code with a messy variable name and a // TODO: fix this comment actually makes me more alert. I slow down. I ask questions. I engage.

AI-generated code is too clean to trigger my suspicion.

And then there's the social pressure layer. If a CodeRabbit or Copilot review says "No issues found ✅", and you leave a critical comment, you feel like you're the one being difficult. After all, the AI checked it. Who are you to disagree?

This is how we're slowly outsourcing our professional judgment.


I'm Not Anti-AI. I'm Pro-Honesty.

Let me be very clear: I use AI tools every single day. They make me faster. They catch things I miss. They're genuinely useful.

But there's a difference between:

AI as a first pass — catch obvious issues before human review

AI as a replacement — skip human judgment entirely

The problem isn't the tools. The problem is how we're positioning them.

When a company says "our AI does code review," they're making a product claim. When a developer says "the AI already checked it," they're making an excuse.

We need to stop confusing the two.


What Real Code Review Looks Like in the AI Era

Here's what I've changed on my team after that painful incident:

1. AI review is mandatory. Human review is non-negotiable.

AI tools flag the obvious. Humans review for context, architecture, and consequence. Both happen. Neither replaces the other.

2. Ask "Why" out loud, every time.

Before approving any PR, I now force myself to answer: "Why is this change being made?" If I can't answer without looking at the ticket, I don't approve it.

3. Rotate code review ownership.

Juniors review seniors' PRs. Yes, really. The code gets better AND knowledge transfers in both directions.

4. Add AI-generated code markers.

If code is substantially AI-generated, it gets tagged. Not as a punishment — as a signal for extra human scrutiny, not less.

5. Celebrate slow reviews.

A PR that sits in review for a day with 10 comments is a success story. A PR merged in 5 minutes with 0 comments should make you nervous.
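Rule 4 doesn't have to stay a social convention; it can be checked mechanically. Here's a minimal sketch in Python, assuming a hypothetical commit trailer (`AI-Assisted: yes` is my invention, not an established standard) that marks substantially AI-generated changes for extra scrutiny:

```python
# Sketch: flag PRs for extra human scrutiny when any commit declares
# substantial AI assistance via a trailer line. The trailer name
# "AI-Assisted: yes" is a made-up convention; pick your own and document it.

def needs_extra_scrutiny(commit_messages: list[str]) -> bool:
    """True if any commit message carries the (assumed) AI-assistance trailer."""
    marker = "ai-assisted: yes"
    return any(
        line.strip().lower() == marker
        for msg in commit_messages
        for line in msg.splitlines()
    )

# Example: one hand-written commit, one tagged as AI-assisted.
msgs = [
    "Fix off-by-one in pagination",
    "Add caching layer\n\nAI-Assisted: yes",
]
print(needs_extra_scrutiny(msgs))  # → True
```

A CI job could run this over a PR's commit list and apply a label like `needs-human-eyes`. The point of the tag is to route the change toward more attention, not less.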


The Thing That Keeps Me Up At Night

We are training a generation of developers who have never had to truly read someone else's code.

They open a PR, run it through AI review, skim the summary, and merge. They're not lazy — they're efficient, by the only definition of efficiency they've been taught.

But code review is where developers grow. It's where you learn to think about edge cases. It's where you absorb architectural patterns. It's where you develop the professional instinct that no AI can give you.

If we automate that away, we don't just get worse code reviews.

We get worse engineers.

And in five years, when we need someone to make a judgment call that no AI can make — someone who deeply understands the system, the business, the users — we'll look around and realize we never developed that person.

Because we let an AI do their job for them before they got the chance to learn it.


What Can You Do Right Now?

  1. Audit your team's review process. How many PRs are merged with zero human comments? That number should concern you.

  2. Set a rule: AI review assists, humans decide. Document it. Enforce it.

  3. Have the uncomfortable conversation. Tell your team that "LGTM, AI checked it" is not a valid review.

  4. Review one PR this week the old-fashioned way — no AI summary, just you and the code diff. Notice how different it feels.

  5. Share this article if it resonated. Because honestly? Most teams won't fix this until enough people start talking about it.
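For action item 1, the zero-human-comment number is easy to compute once you have the PR data in hand. A rough sketch, assuming you've exported merged PRs with something like `gh pr list --state merged --json number,comments` and then flattened it into the shape below; the bot account names are also assumptions to adapt to your repo:

```python
# Audit sketch: what fraction of merged PRs had no human comments at all?
# Comments from known review bots don't count as human review. The bot
# names here are examples; substitute the accounts your repo actually uses.

KNOWN_BOTS = {"coderabbitai", "github-actions", "copilot"}

def zero_human_comment_ratio(prs: list[dict]) -> float:
    """Fraction of PRs whose comments all came from bots (or that had none)."""
    silent = 0
    for pr in prs:
        authors = {c["author"].lower() for c in pr.get("comments", [])}
        if not (authors - KNOWN_BOTS):  # no non-bot commenter ever showed up
            silent += 1
    return silent / len(prs) if prs else 0.0

# Example data shaped like a trimmed-down export.
prs = [
    {"number": 101, "comments": []},
    {"number": 102, "comments": [{"author": "coderabbitai"}]},
    {"number": 103, "comments": [{"author": "alice"}]},
]
print(round(zero_human_comment_ratio(prs), 2))  # → 0.67
```

If your real ratio looks anything like the 0.67 in the toy data, that's the number that should concern you.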


Final Thought

AI is not destroying code review because it's malicious. It's doing it because we let it. Because "faster" felt like "better." Because we confused automation with improvement.

The best code reviewers I know don't just read code. They read between the lines. They ask uncomfortable questions. They slow things down when slowing down is the right call.

That's a human skill. Guard it like it's valuable.

Because it is.


If this hit close to home, I'd love to hear your experience in the comments. What does AI-assisted code review look like at your company? Are you navigating this well — or quietly worried, like I was?

Let's talk about it before it gets worse.


✍️ Written by me, refined with AI assistance. The opinions, experiences, and judgment calls are entirely my own.

Top comments (28)

EmberNoGlow

Excellent post! I was immediately reminded of a post about how AI-generated PRs started being pushed into the Godot repository, and it was truly disgusting. AI wasn't that powerful back then (this was probably 2023 or 2024), and even the people who wrote the code didn't know how it worked. This is probably the biggest problem with modern open source, where a team's efforts are focused not on writing code but on filtering PRs generated in a couple of minutes.

Harsh

Thank you for sharing this. I hadn't heard about the Godot repository incident, but that's both fascinating and terrifying! 😮 The fact that people were pushing AI-generated PRs without even understanding the code is exactly the kind of thing I was worried about. You're absolutely right — the problem isn't just about reviewing, it's about the signal-to-noise ratio in open source. Maintainers are now spending more time filtering out AI slop than actually reviewing quality contributions. It's like we've shifted from 'how do we write good code' to 'how do we filter out bad AI code.' Do you have a link to that Godot discussion? I'd love to read more about how they handled it. Thanks again for adding this context!

EmberNoGlow

Yes! Here is a post by Rémi Verschelde.

Harsh

Thank you so much for sharing the link, EmberNoGlow! 💖

EmberNoGlow

You're welcome

Mykola Kondratiuk

40 hours to debug a well-formatted bug is exactly the failure mode nobody talks about. clean code and correct code are different things and AI is very good at the first one. i think the LGTM problem is that we outsourced the boredom - the slow careful reading - to a tool that does not experience boredom and so moves through it at the same speed it moves through everything else.

Harsh

"We outsourced the boredom" is the most precise description of the LGTM problem I've seen. And you're right: the boredom wasn't a bug in the review process. It was the feature. The slow, careful, slightly tedious reading is exactly what catches the thing that looks right but isn't.

AI moves through code at the same speed regardless of complexity because it doesn't experience the friction that signals "wait, something feels off here." That friction, the moment a human slows down because something doesn't quite sit right, is where most subtle bugs get caught. Not by logic, just by discomfort.

"Clean code and correct code are different things" should honestly be a mandatory disclaimer on every AI coding tool. The models are trained to produce output that reads well. Correctness is a harder, different target, and the two can look identical on the surface until production tells you otherwise.

Mykola Kondratiuk

the friction as signal point is really good. when reading code feels hard that is information. AI smooths it all out into the same cognitive texture and you lose the signal entirely.

Nikita

Hey! It is a good post describing the saturation point with AI in this area. I have one question though regarding this social pressure aspect:

“And then there's the social pressure layer. If a CodeRabbit or Copilot review says "No issues found ✅", and you leave a critical comment, you feel like you're the one being difficult. After all, the AI checked it. Who are you to disagree?”

The thing is that most academics are naturally suspicious and alert when it comes to AI-generated work because of how rampantly it hallucinates data. Researchers have never really trusted the output, and suspicion and scrutiny are already intertwined with the final submissions.

So I don't understand why engineers are socially accepting AI-generated work as gospel truth where other industries are not?

Harsh

Nikita! That's a really sharp observation, and honestly it's one I hadn't thought about from the academic angle.

You're absolutely right. Academics are trained to be skeptical by default: peer review, citations, reproducibility. Suspicion is literally baked into the process. So when AI hallucinates, researchers catch it because their culture demands verification.

Engineering culture, on the other hand, has historically rewarded speed and shipping. "Move fast" is a badge of honor. I think that's exactly why engineers are more susceptible: we've been conditioned to trust tools that make us faster, without stopping to ask how the tool works or where it might fail.

Also, code "working" is a deceptively low bar. If tests pass and the app runs, it feels validated — even if the logic is subtly wrong. Academics don't have that false safety net.

Really glad you brought this up; it adds a whole new dimension to the conversation. 🙌

Nikita

Thanks for the response! I am wondering if the part - being conditioned to tools - is shifting the authority of creation of your code? I am coming from the place where the person/engineer who writes and builds has the creative-authority over the product in the ethical/moral sense.

If there is indeed a pressure where human creative-authority can be overridden by AI-authority, where AI’s judgement of your code is considered more robust than yours - that is more concerning than relegating AI to just tools or assuming that AI has to be better than human experience.

In this context if AI messes up the code, who gets the blame right now? Engineer or the AI? Anyway, this looks like a chicken-egg problem, so I will rest the question. I don’t want to cause confusion. Thanks again!

Harsh

The creative-authority framing is something I haven't seen anyone bring up before, and it's uncomfortable in the best way. You're right: there's a meaningful difference between using AI as a tool (authority stays with the engineer) and deferring to AI judgment (authority quietly transfers). And that transfer often happens without anyone consciously deciding it.

The blame question is where it gets really thorny. Right now, when AI-assisted code causes a bug, the engineer gets blamed because they approved it. But the decision was made under social and cognitive pressure to trust the AI's output. That's not full ownership, that's liability without authority. And that gap is dangerous.

Your chicken-egg framing is spot on. Did engineers stop trusting their own judgment because AI got good? Or did AI get positioned as authoritative because engineers were already overwhelmed? I genuinely don't know the answer.

What I do know is that the industry hasn't figured out accountability here yet. And until it does, engineers are carrying the risk while the tools get the credit.

Thank you for pushing this further; this thread has gone somewhere I didn't expect. 🙏

Nikita

Yes, it indeed got interesting. Thank you for the discussion!

Pavel Ishchin

Probably because tests passing feels like peer review even when it isn't. Academics don't have a green checkmark that makes them stop thinking.

Takashi Fujino

This is one of the better takes I've read on the AI-in-development conversation — mostly because you're not arguing about whether AI is good or bad. You're pointing at something more uncomfortable: what happens to us when we stop being skeptical.

Your point about AI not asking "why" really stuck with me. I've been thinking about what's underneath that, and I think it comes down to something structural — AI has no stakes in the outcome.

When a senior engineer asks "why are we solving this with a cache instead of fixing the query," that question isn't just technical knowledge. It's self-preservation. That engineer knows if this breaks in production, they're the one getting paged at 3 AM. They're reviewing with skin in the game.

AI reviews with zero consequences. It doesn't get paged. It doesn't sit in the postmortem. It doesn't have to explain to a PM why the fix took 40 hours. So it optimizes for what it can measure — syntax, patterns, style — and stays silent on everything that requires caring about what happens next.

I think that's why your rule #4 (tagging AI-generated code for extra scrutiny) works. It's not about distrust — it's about compensating for the fact that the code was produced by something that doesn't bear the cost of getting it wrong.

The generation of developers you're worried about at the end — the ones who've never had to truly read someone else's code — I'd extend that concern one level further. It's not just that they won't learn to read code. It's that they won't develop the instinct for when something matters enough to slow down. That instinct comes from experience with consequences, and AI shields them from exactly that.

Really glad you wrote this. It's the kind of conversation that needs more signal and less "AI bad" noise.

Harsh

Takashi, this might be the best comment I've ever received on anything I've written.

"AI has no stakes in the outcome." I wish I had written that line in the article itself. That's the cleanest explanation of the problem I've seen.

The 3 AM pager point is so real. That fear is what makes a senior engineer pause and ask the uncomfortable question. It's not just knowledge; it's accountability baked into instinct. And you're right, AI will never have that. It optimizes for what it can measure, and consequence is unmeasurable to it.

Your extension of my point about the next generation really hit hard too. It's not just that they won't learn to read code; it's that they'll never develop the instinct to slow down. That instinct only comes from being burned. From being the one in the postmortem. From having skin in the game.

AI shields them from exactly that experience. And we're calling it a productivity win.

Thank you for adding this; it genuinely made the article better in the comments than it was on the page. 🙏

Takashi Fujino

Appreciate that, Harsh. Your article deserved a real response — most writing on this topic stays at the surface level of "AI good" or "AI bad" without touching the structural incentive problem underneath.

The "instinct to slow down" framing you just added is sharper than anything I wrote. That's the real loss — not skill, but the feedback loop that builds judgment. You can teach someone to read code. You can't shortcut the pattern recognition that comes from being the person who shipped the bug and sat in the retro.

Looking forward to whatever you write next.

Harsh

Really appreciate you engaging with it so seriously. Your point about the feedback loop building judgment is exactly right and it's the part that's hardest to replicate with any tool. Looking forward to your next piece too. 💖

Max

The pipeline you describe — AI writes, AI reviews, human skims and merges — is exactly what we designed against. Our rule: the AI self-reviews before pushing. Re-reads every changed file, checks for debug code, typos, missing imports, logic errors. Not because it catches everything, but because it catches the easy stuff before a human has to.

The real defense though is the CI pipeline. PHPStan level 9, PHPMD, Rector — they don't care if the code was written by an AI or a senior engineer. The type mismatch ships or it doesn't. We found that static analysis is the AI's self-awareness — the agent can't tell when its own quality is dropping, but the linter can.

Your point about the generational gap is the one that worries me most. We run a guide mode for junior developers — when an intern asks the AI a question, it asks "what have you tried?" before answering. Not surveillance, just the senior dev who says "think first" before handing you the solution. Without that, you're right — they'll never develop the judgment.

Harsh

The "AI self-reviews before pushing" pattern is something I hadn't considered as a formal step. I'd been thinking of it as optional, but framing it as a rule changes the dynamic entirely. It shifts the responsibility back onto the AI before a human even looks.

The static analysis point is sharp: "static analysis is the AI's self-awareness." That framing should be in every team's onboarding doc for AI-assisted workflows. The linter doesn't care about confidence; it just checks facts. Exactly the counterweight AI needs.

The guide mode for juniors is the one I keep thinking about. "What have you tried?" before answering is such a small intervention, but it forces the cognitive step that builds judgment. The problem I see is that most teams won't implement this deliberately; they'll just give interns raw access and call it productivity. That's where the generational gap quietly widens.

What's been the hardest part of getting the team to actually follow the self-review rule consistently?

Joske Vermeulen

feels less like AI is “destroying” code review and more like it’s exposing weak review processes.
If teams are shipping more AI-generated code without adapting how they review, problems were kind of inevitable.

Harsh

That's a fair reframe, and honestly I don't fully disagree. Weak review processes were always a liability; AI just made the cost of ignoring them much higher, much faster.

But I'd push back slightly on "inevitable." The speed at which AI-generated code volume scales outpaces how quickly teams can adapt their processes. It's not just exposure; it's acceleration. A team that had 6 months to notice and fix a weak process now has 6 weeks. That's the part that feels like destruction from the inside.

klement Gunndu

I'd push back slightly on the halo effect — devs were rubber-stamping PRs long before Copilot showed up. The dual-review approach is strong though, especially rotating juniors into senior review slots.

Harsh

That's fair; rubber-stamping PRs is older than GitHub, let alone Copilot. I'm not arguing AI created the problem.

The shift I'm pointing to is the scale and speed. A dev rubber-stamping 5 PRs a day is a problem. The same dev rubber-stamping 50 AI-generated PRs a day, each one looking cleaner and more confident than hand-written code, is a different problem. The halo effect doesn't create laziness; it just removes the last friction that was slowing it down.

Glad the dual-review angle landed. The junior-into-senior-slot rotation is underrated specifically because it forces explanation, not just approval.

Apex Stack

This resonates hard. I run a large programmatic SEO site with thousands of pages where AI generates the actual content — stock analysis copy produced by a local LLM. The "halo effect" you describe applies to content generation too, not just code review.

When AI-generated analysis text looks well-structured and uses the right financial terminology, it's tempting to trust it. But I've caught it reporting dividend yields of 41% instead of 0.41% — technically plausible formatting, completely wrong data. The output looked authoritative enough that it passed multiple automated validation layers before a human eye caught the decimal error.

That experience taught me the same lesson you're describing: AI is great at surface-level correctness and terrible at domain-specific sanity checks. My solution was similar to your rule #1 — automated validation catches the obvious stuff (range checks, format validation), but a human still needs to audit samples regularly and ask "does this actually make sense?"

Nikita's point about academics vs engineers is fascinating. I think the core difference is feedback loops. In academia, peer review catches hallucinated data before publication. In engineering, the feedback loop is production — and by then the damage is done. The caching layer example in your article is a perfect case where the "feedback" was a 40-hour debugging session three weeks later.

The instinct to slow down that Takashi mentioned — that's the real skill at risk.

Harsh

The 41% vs 0.41% example is exactly the kind of failure the halo effect produces: not obviously wrong output, but plausibly formatted wrong output. That's the dangerous version. A clearly broken response gets caught. A confident, well-structured, wrong response passes through layers of validation because it looks like it deserves to pass.

The feedback loop distinction you drew between academia and engineering is something I want to think about more. Peer review in academia exists specifically to catch hallucination before the damage is done. Engineering's equivalent is production, which means the feedback is real, late, and expensive. The 40-hour debugging session wasn't a bug report; it was a delayed peer review.

"AI is great at surface-level correctness and terrible at domain-specific sanity checks" is the cleanest summary of the core problem I've seen in these comments. That line should be in every team's AI usage policy.

The instinct to slow down being the real skill at risk is the thread that connects your example, Takashi's point, and everything in the article. Formatting confidence is what makes slowing down feel unnecessary. That's the trap.

Marina Eremina

'Review one PR this week the old-fashioned way' - great example of a possible best practice in the emerging AI era 🙂

Harsh

Exactly! Sometimes the old-fashioned way is still the most effective. Glad you liked it 😊