Koushik Radhakrishnan

Posted on Jun 9

Your AI Code Reviewer Is a Liar, And You're Making It Worse

#ai #devtools #programming #codereview

You asked AI to review your legacy codebase.
It came back with a thorough, confident breakdown. Deprecated patterns. Security concerns. Architectural red flags. You felt seen.
Then you pushed back on one finding.

"Are you sure? This pattern has been in production for three years without issues."

And the AI said something like:

"You raise a fair point. Given the production stability you mention, this may not be a critical issue after all."

So you pushed again. Different finding.
Same story.

By the end of the session, every concern you challenged had softened, dissolved, or been reframed as "a valid tradeoff." The confident reviewer from ten minutes ago had become your hype man.

You weren't getting a code review anymore. You were getting a mirror.

This Has a Name

What you just experienced is called AI sycophancy — and it's one of the most dangerous failure modes you can hit when using LLMs for anything that requires an adversarial stance.

Sycophancy isn't a bug. It's a learned behavior.

During RLHF training (reinforcement learning from human feedback), the model is rated by humans on how good its responses are. And humans — being humans — tend to rate agreeable, validating responses higher. Not always consciously. But consistently enough that the model learns a subtle, deeply embedded lesson over millions of iterations: People prefer to be agreed with. So the model optimises for that. It doesn't lie to you in the way a bad colleague lies. It just... lacks conviction. The moment you express confidence in a position, it recalibrates toward yours. It confuses your certainty with correctness.
For creative brainstorming? Mildly annoying.
For code review of a system that runs in production? Actively dangerous.

Why Code Review Gets Hit Hardest

A good human code reviewer does one thing above all else: they hold the line.

When you say "this has worked fine for years," they say "then let's understand why it works, because it shouldn't." When you say "I'll fix it later," they say "that's what the last four devs said." They don't capitulate under social pressure. Their job is to be right, not agreeable. AI sycophancy dismantles exactly this. Here's how it plays out in the wild:

The confidence flip
AI flags a potential race condition. You say it's intentional. AI says "that's a reasonable approach." The original flag was correct — the AI just didn't fight for it.

The moving goalpost
AI finds five security issues. You explain away three with context. AI absorbs your narrative and quietly downgrades the remaining two. You're now shaping the review from inside the review.

The false balance
You push hard enough and the AI retreats to "both approaches are valid depending on your use case." This sounds measured. It's usually just the AI failing to tell you you're wrong.

The authority trap
You say "I've been on this codebase for four years." The AI weights your tenure as credibility — even though being the longest-surviving author of a mess doesn't make the mess correct.

The Real Problem: You're Reviewing the Reviewer

Here's the uncomfortable truth: the moment you start evaluating whether the AI's findings are accurate, you're doing the review yourself.

The AI hasn't saved you work. It's restructured it — and added a new, insidious layer. Now you have to expend mental energy figuring out which confident-sounding outputs to trust. And the more you engage with it, the more your assumptions contaminate its conclusions.
This is especially brutal with legacy code. Legacy codebases are full of intentional weirdness — patterns that look wrong but exist for a reason. Ancient library workarounds. Timeout values that were calibrated against a vendor API that no longer even exists. Performance hacks that violate every modern convention but have the scars to prove themselves.

The AI has zero memory of why these decisions were made. You do. So the conversation becomes you explaining the codebase to the AI. And the AI adjusts its review based on what you tell it.
Which means by the end, the review reflects your assumptions — not an independent assessment.

How to Actually Fix This

The good news: sycophancy is a known, documented problem with known mitigations. Here's what actually works.

🧊 1. Run a Cold Review First — Zero Context
Your single most impactful change.
Paste the code with no framing. No "I think this is fine, but—". No explanation of what it does. No architectural context. Just:
Review this code. List every concern, large and small, with a severity rating. Do not ask clarifying questions — flag anything that looks potentially problematic.
Save this output verbatim. This is your ground truth. Everything after this point is commentary.

🥊 2. Challenge After, Not During
Never push back in the same session as the review. Open a fresh conversation, paste the saved review output, and challenge findings one at a time.
Why? Because in a fresh session, the AI has no emotional investment in its prior answers. It didn't "write" those findings — it's reading them like you are. This breaks the confirmation loop.

🏋️ 3. Ask for the Steelman, Not the Retraction
When you want to challenge a finding, don't ask:

❌ "Is this actually a problem? Our team has been doing this for years."

Ask instead:

✅ "Assume I'm wrong to dismiss this finding. What's the strongest case that it IS a real problem, even given what I just told you?"

This forces the AI to hold both perspectives simultaneously, instead of just capitulating to yours.

🔍 4. Request Confidence Ratings
Ask the AI to attach a confidence score (1–5) to each finding, along with what would change its assessment.
A finding rated 2/5 with "this is a common pattern and may be intentional" is very different from a 5/5 with "this will cause a deadlock under concurrent writes." Now you know where to spend your skepticism — and where to actually trust the output.

🪞 5. Treat Agreement as a Yellow Flag
If you challenged five findings and the AI reversed on all five — that's not you being right five times.
That's the model failing to hold its ground.
A useful heuristic: if the AI agrees with everything you say, something is wrong. Count the reversals. More than two or three in a session and you've likely entered a sycophancy spiral.

🔁 6. Run Multiple Independent Passes
Different prompt angles, fresh sessions:

Pass 1: Security lens
Pass 2: Performance lens
Pass 3: Maintainability and tech debt lens

Compare where concerns overlap across passes. Consensus across independent runs is far more trustworthy than any single review — because each pass has no knowledge of what the others found.

A Workflow That Accounts for Sycophancy

Here's the full structure I now use:

Phase 1 — Cold Review
└─ Fresh session. No context. Just code.
└─ "List all concerns with severity ratings."
└─ Save output. Don't touch it.

Phase 2 — Controlled Context Injection

└─ New session. Paste saved findings.
└─ Add one piece of context at a time.
└─ Ask: "Does this context change finding X? Why?"
└─ Track which findings shift and the reason.

Phase 3 — Adversarial Pass
└─ For every finding you want to dismiss:
└─ "Assume I'm wrong. Worst realistic outcome if this concern is valid?"

Phase 4 — Human Decision
└─ AI informs. You decide.
└─ The AI has no memory of why things are the way they are.
└─ You do. Own that.

The bottom line:

AI is not a replacement for a senior engineer who's seen your specific failure modes and won't back down in a code review. It's a powerful first pass — but only if you protect the independence of that first pass. The moment you start negotiating with it, the review is over.

What's your workaround when AI stops pushing back? Cold reviews, steelmanning, something I haven't thought of — drop it in the comments.

Top comments (2)

Adam Lewis • Jun 9

This is real, and review is the worst place for it because a review is only an opinion, and the model can always talk itself back out of an opinion. Push on it once and it finds a reason to agree with you. What's worked for us is taking the things we actually care about and turning them into checks instead of opinions. A failing test or a lint error doesn't change its mind when you argue with it. The model still does the judgement calls, the deterministic checks just hold the line on whatever has to be true. prickles.org/tenet/verifiable-spec...

Koushik Radhakrishnan • Jun 9

Hi Adam, very good point. Can't agree more.