I spent the last few months building an AI code reviewer, solo. It's now live on
the GitHub Marketplace, which feels like a good moment to write the honest version
of what these tools catch, what they don't, and the embarrassing bug that almost
made mine useless.
The problem that started it
AI coding assistants made everyone faster — and quietly broke code review.
More code, more PRs, landing faster than any lead can review in depth. When a
stack of PRs is piling up, humans skim. And that's exactly when the off-by-one,
the unhandled null, or the API key someone pasted "just to test" slips through.
The volume outgrew the time anyone has to review it carefully. So I built
MicroReview — a bot that reviews every PR, flags likely bugs and hardcoded
secrets as inline comments, and gives the whole PR a single 0–100 risk score
so you know at a glance which ones are safe to merge and which need real eyes.
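This post doesn't spell out how the 0–100 score is computed, so the following is only a sketch of one way to fold per-finding severities into a single number. The `Finding` type, the weights, and `riskScore` are all assumptions for illustration, not MicroReview's actual formula.

```typescript
// Hypothetical sketch: none of these names or weights come from MicroReview.
type Severity = "critical" | "warning" | "info";

interface Finding {
  severity: Severity;
  message: string;
}

// Assumed weights: a single critical should already push a PR
// into "needs real eyes" territory.
const WEIGHTS: Record<Severity, number> = { critical: 60, warning: 15, info: 3 };

// Sum the weight of every finding and cap the result at 100.
function riskScore(findings: Finding[]): number {
  const raw = findings.reduce((acc, f) => acc + WEIGHTS[f.severity], 0);
  return Math.min(100, raw);
}

const example: Finding[] = [
  { severity: "critical", message: "hardcoded API key" },
  { severity: "warning", message: "unhandled null" },
];
console.log(riskScore(example)); // 75
```

A capped sum is the simplest possible choice; it also shows why inflated severities are so damaging, since a few spurious criticals pin every PR at 100.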
What it actually catches
The genuinely useful stuff is the "obvious in hindsight" class of bug:
- A flipped conditional that inverts your auth check.
- total = items.map(i => i.price).reduce(sum) / items.length: that's the average, not the sum. Every customer gets undercharged.
- const apiKey = "sk_live_..." committed straight into a service file.
None of these are clever. They're the things a tired reviewer rubber-stamps on a
Friday. An AI that never gets tired and never skips the boring files is a genuinely
good second pair of eyes for exactly this.
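To make the total bug from the list above concrete, here is a minimal sketch of the buggy version next to a fix. The `Item` type and the function names are made up for illustration.

```typescript
// Illustrative only: Item, buggyTotal, and fixedTotal are hypothetical names.
interface Item {
  price: number;
}

const sum = (a: number, b: number): number => a + b;

// Bug: dividing by items.length turns the total into an average.
function buggyTotal(items: Item[]): number {
  return items.map(i => i.price).reduce(sum) / items.length;
}

// Fix: just sum the prices. The 0 seed also makes an empty cart return 0
// instead of throwing.
function fixedTotal(items: Item[]): number {
  return items.map(i => i.price).reduce(sum, 0);
}

const cart: Item[] = [{ price: 10 }, { price: 20 }, { price: 30 }];
console.log(buggyTotal(cart)); // 20
console.log(fixedTotal(cart)); // 60
```

The two functions differ by a few characters, which is exactly why this kind of bug survives a skim.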
What it does NOT catch (and anyone claiming otherwise is hand-waving)
- Architecture. It reviews the diff, not the design. It has no idea if your abstraction is right.
- Whole-codebase understanding. Context limits are real. On a huge diff it reviews the most relevant hunks, not literally everything.
- Intent. It can't know that the "bug" is actually a deliberate workaround.
It's a second reviewer, not a replacement for one. Selling it as more than that is
how you lose developers' trust in the first five minutes.
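The "most relevant hunks" behavior on a huge diff could look something like the greedy selection below. How MicroReview actually ranks hunks isn't described here, so the `relevance` score, the token estimate, and the budget are all assumptions.

```typescript
// Hypothetical sketch of picking hunks under a context budget.
interface Hunk {
  file: string;
  text: string;
  relevance: number; // assumed to come from some earlier ranking step
}

// Rough estimate of about 4 characters per token; an assumption,
// not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Walk hunks in descending relevance, keeping each one that still
// fits in the remaining budget.
function selectHunks(hunks: Hunk[], tokenBudget: number): Hunk[] {
  const ranked = [...hunks].sort((a, b) => b.relevance - a.relevance);
  const selected: Hunk[] = [];
  let used = 0;
  for (const hunk of ranked) {
    const cost = estimateTokens(hunk.text);
    if (used + cost > tokenBudget) continue; // too big, try smaller hunks
    selected.push(hunk);
    used += cost;
  }
  return selected;
}
```

Anything that doesn't fit is simply not reviewed, which is the honest limitation described above.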
The day it cried "20 critical issues"
Here's the bug that almost tanked the whole thing.
Early on, I let the model assign severity — critical | warning | info — and
never told it what those meant. So it did what LLMs do when everything feels
important: it slapped critical on everything. A perfectly normal PR would come
back with "🔴 20 critical issues."
That's fatal for a review tool. The entire value is signal-to-noise. If everything
is critical, developers get alert fatigue and stop trusting the score — which is
the one thing you're selling.
The fix wasn't fancier AI. It was a severity rubric in the prompt:
critical is ONLY for real production harm: security vulns, data loss, crashes
on a common path, broken core functionality. A typical PR has zero or one
critical issue. If you're marking many things critical, downgrade to warning.
Overnight, reviews went from noisy walls of red to a believable handful of
findings. Restraint, not cleverness, is what makes an AI reviewer usable.
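Here is a sketch of how a rubric like that can be wired in: the definitions go into the prompt, and a cheap check afterwards catches a response that ignores them. The warning and info wording, the `Finding` shape, the after-the-fact check, and the threshold of one critical are assumptions, not MicroReview's real code.

```typescript
// Hypothetical sketch: names, wording, and thresholds are assumptions.
type Severity = "critical" | "warning" | "info";

interface Finding {
  severity: Severity;
  message: string;
}

// The critical definition paraphrases the rubric quoted above;
// the warning and info lines are assumed wording.
const SEVERITY_RUBRIC = [
  "critical: ONLY real production harm (security vulns, data loss,",
  "  crashes on a common path, broken core functionality).",
  "  A typical PR has zero or one critical issue.",
  "warning: likely bugs that are not immediately harmful.",
  "info: style or minor suggestions.",
].join("\n");

// Put the rubric in front of the diff so the model sees the definitions first.
function buildPrompt(diff: string): string {
  return `Review this diff. Assign severities using this rubric:\n${SEVERITY_RUBRIC}\n\nDiff:\n${diff}`;
}

// Guard: a review with more criticals than expected is probably inflated
// and worth re-checking before it is posted.
function looksInflated(findings: Finding[], maxCritical = 1): boolean {
  const criticalCount = findings.filter(f => f.severity === "critical").length;
  return criticalCount > maxCritical;
}
```

The guard is deliberately dumb. It doesn't decide which findings are wrong, only that "20 critical issues" on a normal PR deserves a second look.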
Two decisions I'd defend
- Diff-only. It only ever sends the changed lines to the model — never your whole repo, history, or branches. Better for trust, and it makes each review cost a fraction of a cent instead of dollars.
- Per-repo pricing, not per-seat. The value scales with how many codebases you want guarded, not how many people are on the team. A 20-dev team reviewing 2 repos shouldn't pay for 20 seats.
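As a sketch of what "diff-only" means in practice, the function below pulls just the changed lines out of a diff and drops everything else. It assumes `git diff` style unified output; `changedLines` and `ChangedLine` are hypothetical names, and this is not MicroReview's actual parser.

```typescript
// Hypothetical sketch: extract only the changed lines from `git diff` output.
interface ChangedLine {
  kind: "added" | "removed";
  text: string;
}

function changedLines(gitDiff: string): ChangedLine[] {
  const result: ChangedLine[] = [];
  let inHunk = false;
  for (const line of gitDiff.split("\n")) {
    // A "diff --git ..." line starts a new file; its headers follow.
    if (line.slice(0, 5) === "diff ") { inHunk = false; continue; }
    // "@@ -1,3 +1,3 @@" marks the start of a hunk.
    if (line.slice(0, 2) === "@@") { inHunk = true; continue; }
    // Outside a hunk we only see file headers like "--- a/x" and "+++ b/x".
    if (!inHunk) continue;
    if (line[0] === "+") result.push({ kind: "added", text: line.slice(1) });
    else if (line[0] === "-") result.push({ kind: "removed", text: line.slice(1) });
    // Context lines (leading space) are dropped.
  }
  return result;
}
```

Tracking whether we are inside a hunk avoids mistaking the `---` and `+++` file headers for changed lines.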
Try it (no signup)
You can paste any function or diff and get a real review in a few seconds, no
account needed: https://microreview.dev/sandbox
And it's now on the GitHub Marketplace if you want it running on your PRs
automatically.
I'm building this in the open and genuinely want the hard feedback: does a single
0–100 risk score per PR actually help you triage, or is it noise on top of the
inline comments? Tell me — that answer shapes what I build next.