I ran a simple experiment yesterday.
I asked AI to write a function. Then I asked the same AI to review that function. Then I asked it to rate its own code.
The function was fine. Not great. Not terrible. It had an edge case bug. The variable names made no sense. There was an unnecessary loop inside that did absolutely nothing useful.
The AI's review?
"This code is clean, efficient, and well-structured. I'd give it a 10/10."
I stared at the screen for a second. Then I pushed back.
"Are you sure? What about the empty array edge case?"
It paused — that little blinking cursor moment. Then:
"You're right. Let me fix that."
It fixed the bug. Then gave itself 11/10.
That's when I stopped laughing. And started worrying.
Here's Exactly What I Did (So You Can Try It Yourself)
I kept it simple. Repeatable. No tricks.
Step 1: Asked AI to write a function that takes an array of numbers and returns the average.
Step 2: Asked the same AI — same conversation, same context — to review its own code for bugs, edge cases, and style issues.
Step 3: Asked it to rate the code from 1 to 10.
Here's what the code actually had wrong:
- Crashed on an empty array — classic divide-by-zero, completely missed
- Used `arr` as a variable name inside a function that already took `arr` as a parameter, which made it confusing to read
- Had an extra loop that served no purpose at all
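To make those three issues concrete, here's a reconstruction of the kind of code the AI produced. This is illustrative only, written in Python as an assumption; the function name and variable names are mine, not the AI's exact output:

```python
def average(arr):
    # Issue 3: a useless extra loop that just copies the values
    values = []
    for x in arr:
        values.append(x)

    # Issue 2: reuses the parameter's name, shadowing the input
    arr = values

    total = 0
    for x in arr:
        total += x

    # Issue 1: crashes with ZeroDivisionError when arr is empty
    return total / len(arr)


print(average([2, 4, 6]))  # 4.0
```

Call it with an empty list and you get the classic divide-by-zero crash the self-review never mentioned.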
Here's what the AI's self-review said:
- "Clean and readable"
- "Handles all edge cases properly"
- "No improvements needed"
- Score: 10/10
Then I tried something else. I took code written by a different AI tool and pasted it in. Asked the same AI to review that.
Suddenly it found 7 issues. Score: 6/10.
Same quality of code. Different author.
The AI is surprisingly good at reviewing other people's work. It is shockingly bad at reviewing its own.
The Problem Isn't That It's Stupid. The Problem Is That It's Confident.
This is the part that took me a while to sit with.
AI doesn't know when it's wrong. Not because it lacks intelligence — but because it's not built to know that. When AI writes code, it's not reasoning through what should work. It's pattern-matching against what code usually looks like. And its own output? Matches its own patterns perfectly. Every time. By definition.
So when you ask it to review its own work, it's not actually evaluating. It's just recognizing familiar patterns and calling them good.
That's the blind spot: AI is confident. But confidence isn't correctness.
And the 11/10 moment is proof. It wasn't being funny. It genuinely recalibrated upward after fixing a bug I caught. In its model, fixing the bug made the code better. So the score went up. It didn't occur to it that the original 10/10 was already wrong.
Here's the Part That Actually Scares Me
I've shipped AI-generated code without reviewing it carefully.
Not because I'm careless. Because the code looked clean. The AI sounded confident. It passed my quick sanity check. And I had three other tickets to close.
But think about what actually happened in those moments: I outsourced both the writing and the quality check to the same system. The same system that just gave itself 11/10.
The AI gave me confidence without comprehension. I felt productive. I shipped fast. But I built on a foundation I didn't fully understand. And if there was a bug in there — a real one, a subtle one, an empty-array-crashes-in-production one — I wouldn't have known what to look for. Because I didn't write it.
That's the trap. And I walked into it more than once.
But It Works Most of the Time
Yeah. I know. I've said this too.
For simple, well-defined tasks? AI code is usually fine. It's fast, it's clean enough, and the edge cases are rare enough that you ship before you see them.
But the problem scales. The more you rely on AI without really understanding what it's writing, the more invisible debt you accumulate. And invisible debt is the worst kind — because you don't know it's there until something breaks in production at 2 AM and you're staring at code you didn't write and can't fully reason about.
Fast is good. Confident is good.
Confident and wrong is just a bug waiting for the worst possible moment to surface.
What I Actually Changed (Small Things, Not Dramatic Ones)
I'm not quitting AI. That would be absurd and I'm not going to pretend otherwise.
But a few things changed after the 11/10 moment:
1. I stopped trusting AI's self-review entirely.
If I want code reviewed, I review it myself. Or I ask a human. I don't ask the same system that wrote it.
2. I started asking AI to review code I wrote.
This is actually where AI shines. It finds my blind spots better than I do. The asymmetry is real — AI reviewing human code is genuinely useful. AI reviewing AI code is theater.
3. I changed one question.
Instead of "does this work?" I started asking "what could go wrong?" The first question just confirms the happy path. The second one actually stress-tests the logic.
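Applied to the average example, "what could go wrong?" surfaces the empty-input case immediately. A minimal guarded version might look like this; raising an explicit `ValueError` is my choice of fix, one of several reasonable options:

```python
def safe_average(numbers):
    # Fail loudly on the edge case the AI missed, instead of dividing by zero
    if not numbers:
        raise ValueError("average of an empty sequence is undefined")
    return sum(numbers) / len(numbers)


print(safe_average([2, 4, 6]))  # 4.0
```

The point isn't this particular fix. It's that the second question forces you to enumerate failure modes before you ship.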
4. I remember the 11/10.
Every time I'm about to blindly trust an AI review, I think about that cursor blinking, the confident correction, and the upgraded score. It keeps me honest.
These aren't dramatic changes. But they've already caught real bugs I would have missed.
The Hard Truth
AI is a tool. A genuinely impressive one. But it is not a reviewer. It is not a quality checker. It is not a substitute for thinking.
When you ask AI to review its own code, you're asking the fox to guard the henhouse. It will always find itself innocent. It will always find its work clean. It will give itself 10/10 — and then 11/10 when you push back, because it interpreted your correction as improvement rather than as evidence that the original score was wrong.
The code you ship is your responsibility. Not the AI's. The AI doesn't get paged at 2 AM. You do.
And confidence without comprehension, whether it's coming from AI or from us, is just vibing with extra steps.
One Honest Question
Have you ever shipped AI-generated code without really reviewing it?
Not skimmed it. Not run a quick test. Actually reviewed it — understood every line, thought through the edge cases, caught the bugs the AI missed.
I have shipped code without doing that. More times than I'd like to admit.
What's the worst bug you've found in AI-generated code after it was already in production?
I'll go first in the comments. Your turn. 🙌
A quick note: The experiment, the 11/10 moment, the bugs, the shipped code I'm not proud of — all real. I used AI to help structure and organize these thoughts into an article. The irony of that is not lost on me.
Top comments (7)
"The asymmetry is real — AI reviewing human code is genuinely useful. AI reviewing AI code is theater" - that's baffling ... could it be because you did the code generation and the reviewing in the same session, so that "the AI" (agent?) somehow knows it's "its own code"?
That's a really smart question and I've thought about this too.
You might be right that session memory plays a role. If the AI remembers writing the code, it might be biased toward defending its own output.
But here's why I think it's not just memory: I've also pasted AI code from a different session (no memory), and the review was still softer than human code. Still higher scores.
I think the real issue is pattern matching. AI recognizes its own patterns and approves them. Human code has unfamiliar patterns, so it reviews more critically.
Same session or not, the bias seems to persist. But your question is excellent. I might actually test this properly.
Thanks for asking it. 🙌
Spot on, it must be something to do with pattern matching ...
This article honestly made my day. The concept of asking AI to review its own code and then watching it give itself a perfect 10/10 is both hilarious and thought-provoking. You've captured something very real about how AI (and even humans) see themselves. I genuinely loved this piece. Great work.
This comment made my day. Thank you.
The fact that you caught the "humans see themselves" parallel means you really read it. AI giving itself 10/10 is funny. But the reason it's funny is because we do the same thing. We overestimate our own work. We miss our own blind spots. The AI is just mirroring us.
I'm so glad you loved it. Comments like this make the writing worth it. 😊🙌
Exactly, that's one of the large issues all companies boosting productivity with AI are going to face, inevitably. Thousands of patches generated with AI and without in-depth review are getting merged every day across many different products. This is just something managers will have to learn the hard way - those workarounds and oversights are going to accumulate and snowball. That will be the moment, or window in time, when we start seeing IT hiring increase worldwide, when the people making these decisions finally understand what AI is and what its limits are. AI is a tool, a smart tool, but only if you use it right. And it only looks smart - it doesn't actually understand what it does. It's a math miracle, a statistical machine optimized to please. I'm not saying AI is useless, not at all - I think we're just wishful thinking too much, attributing properties to it that it doesn't possess, only simulates.
This is such an important point — and I think you're right. 🙏
Thousands of AI-generated patches are getting merged every day without deep review. That's happening right now. The snowball is already rolling.
"AI only looks smart, it doesn't actually understand" is the core of it. We're anthropomorphizing a statistical machine. It's optimized to please, not to understand.
The hiring increase prediction is sobering. If you're right (and I suspect you are), the cycle will be: AI acceleration → hidden debt → system failures → hiring surge.
"It only looks smart." That's the line.
Thank you for this. 🙌