Most AI code reviews are useless. "Looks good, consider adding error handling." Thanks, I could have gotten that from a linter.
Here's a 3-step prompt chain that makes AI code review actually catch real bugs — the kind humans miss at 4 PM on a Friday.
Why Single-Prompt Reviews Fail
When you paste code and say "review this," the AI does what any overworked reviewer does: it skims, finds surface-level issues, and approves.
The problem isn't the AI. It's the process. A single pass can't do deep analysis, check edge cases, and verify logic — all at once.
The fix: break review into three focused passes.
Step 1: The Bug Hunter
The first prompt looks only for bugs. Not style. Not naming. Not "consider using X instead." Just bugs.
You are reviewing this code for BUGS ONLY.
Ignore style, naming, performance, and best practices.
Focus exclusively on:
- Logic errors
- Off-by-one errors
- Null/undefined access
- Race conditions
- Unhandled error paths
- Incorrect assumptions about input
Code:
[paste code]
For each bug found, output:
- Line (approximate)
- Bug description
- Why it's a bug (what breaks)
- Minimal fix
Why it works: By excluding everything except bugs, you eliminate the "consider adding..." noise. The AI goes deep on logic instead of wide on suggestions.
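To make the target concrete, here's a hypothetical snippet (invented for illustration, not from a real PR) containing two of the bug classes the Bug Hunter prompt is tuned for, plus the minimal fixes Step 1 would propose:

```python
# Hypothetical example: two bug classes from the Bug Hunter checklist.

def last_n_orders(orders, n):
    # Off-by-one: the author meant "the last n orders", but the start
    # index is one too low, so this returns n + 1 orders.
    return orders[len(orders) - n - 1:]

def total_price(order):
    # Null access: order.get("items") is None when the key is absent,
    # and iterating over None raises TypeError.
    return sum(item["price"] for item in order.get("items"))

# Minimal fixes, in the shape the prompt's output format asks for:
def last_n_orders_fixed(orders, n):
    # Negative slice handles the count directly; guard n == 0, since
    # orders[-0:] would return the whole list.
    return orders[-n:] if n > 0 else []

def total_price_fixed(order):
    # "or []" turns a missing/None items list into an empty sum.
    return sum(item["price"] for item in order.get("items") or [])
```

Neither bug is a style issue, which is exactly why a general "review this" pass tends to skip past them.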
Step 2: The Edge Case Finder
The second prompt generates test scenarios the code might fail on:
Given this function, generate 10 edge case inputs that could cause unexpected behavior:
[paste function signature + code]
For each edge case:
- Input value(s)
- Expected behavior
- What actually happens (trace through the code mentally)
- Verdict: PASS or FAIL
Focus on boundaries, empty inputs, large inputs, type mismatches, and concurrent access.
Why it works: Humans are terrible at thinking of edge cases for their own code. We wrote it with the happy path in mind. The AI has no such bias.
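The edge-case pass translates naturally into executable checks. A hypothetical sketch (the function and its cases are invented; the assertions mirror the PASS/FAIL verdicts Step 2 produces):

```python
# Hypothetical example: Step 2's edge cases written as assertions.

def parse_percentage(value):
    """Parse a '42%'-style string into a float clamped to [0, 1]."""
    number = float(value.rstrip("%"))
    return max(0.0, min(number / 100.0, 1.0))

# PASS cases: boundaries and clamping behave as expected.
assert parse_percentage("50%") == 0.5     # happy path
assert parse_percentage("0%") == 0.0      # lower boundary
assert parse_percentage("100%") == 1.0    # upper boundary
assert parse_percentage("150%") == 1.0    # over-limit input is clamped
assert parse_percentage("-10%") == 0.0    # negative input is clamped
assert parse_percentage("50") == 0.5      # missing '%' still parses

# FAIL cases the prompt would flag: empty and non-numeric input raise
# ValueError instead of returning a defined fallback.
for bad in ("", "abc%"):
    try:
        parse_percentage(bad)
        raised = False
    except ValueError:
        raised = True
    assert raised
```

The PASS lines are the happy path we all test anyway; the FAIL loop at the bottom is the part this step exists to surface.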
Step 3: The Contract Checker
The third prompt verifies the code matches its specification:
Here is a function and its specification:
SPEC:
[paste the requirement / ticket / doc]
CODE:
[paste the implementation]
Check whether the code fulfills every requirement in the spec.
For each requirement:
- Requirement (quoted from spec)
- Status: FULFILLED / MISSING / PARTIAL / CONTRADICTED
- Evidence: which line(s) address this requirement
- Gap: what's missing (if any)
Why it works: The most dangerous bugs aren't in the code — they're in the gap between what was asked and what was built. This step catches "it works but it's wrong."
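To see what a contract gap looks like at the code level, here is a small invented example (both the spec and the function are hypothetical):

```python
# Spec (invented for illustration):
#   1. Reject passwords shorter than 12 characters.  -> FULFILLED (line A)
#   2. Require at least one digit.                   -> FULFILLED (line B)
#   3. Reject passwords found in a breach list.      -> MISSING

BREACHED = {"password1234", "letmein12345"}  # stand-in breach list

def validate_password(pw):
    if len(pw) < 12:                          # line A: requirement 1
        return False
    if not any(c.isdigit() for c in pw):      # line B: requirement 2
        return False
    # Requirement 3 appears nowhere in this function. Every line here
    # is correct, yet the Contract Checker reports MISSING, because a
    # breached password like "password1234" passes validation.
    return True
```

No bug hunter would flag this function, because nothing in it is wrong; only a requirement-by-requirement comparison catches what isn't there.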
Putting It Together
Run all three steps on every PR that matters:
PR #347: Add discount calculation to checkout
Step 1 (Bug Hunter):
→ Found: negative discount not clamped → total can go below zero
→ Found: floating-point comparison using === instead of tolerance
Step 2 (Edge Cases):
→ FAIL: discount = 100% → total becomes 0.00 → payment gateway rejects
→ FAIL: multiple discounts applied → order matters, not commutative
Step 3 (Contract Check):
→ MISSING: spec says "max one discount per order" — code allows stacking
→ PARTIAL: spec says "show discount on receipt" — only shows in cart, not receipt
Three prompts. Three minutes. Four real bugs that would have shipped.
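For illustration, a repair addressing those findings might look like this (the function and its signature are hypothetical, not the actual PR #347 code):

```python
# Hypothetical fix for the findings above: clamp the discount,
# enforce "max one discount per order", and avoid exact floating-point
# comparisons by working in integer cents.

def apply_discount(total_cents, discount_percents):
    """Apply at most one percentage discount to a total given in cents."""
    if len(discount_percents) > 1:
        # Spec: "max one discount per order" -> reject stacking outright,
        # which also sidesteps the non-commutative ordering bug.
        raise ValueError("only one discount per order")
    if not discount_percents:
        return total_cents
    # Clamp so a negative or >100% discount can't push the total
    # below zero or above the original price.
    pct = max(0, min(discount_percents[0], 100))
    return total_cents * (100 - pct) // 100
```

Note that a 100% discount still yields a total of 0 here; the checkout layer would separately need to skip the payment gateway for zero-value orders, per the Step 2 finding.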
When to Use This
Use the full 3-step chain for:
- Payment/billing code
- Authentication/authorization
- Data migration scripts
- Anything user-facing that's hard to roll back
Use just Step 1 (Bug Hunter) for:
- Regular feature PRs
- Internal tooling
- Anything you'll test manually anyway
Skip AI review entirely for:
- Config changes
- Dependency bumps
- One-line fixes you fully understand
The Prompt Chain in Practice
I keep these three prompts in a file called review-chain.md in my project root. When I review a PR:
1. Copy the diff
2. Run each prompt in sequence
3. Compile findings into a review comment
4. Add my own human judgment on top
The AI finds the mechanical issues. I focus on design, architecture, and "does this even make sense?"
Together, we're a better reviewer than either of us alone.
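The workflow itself is easy to script. A minimal sketch, assuming the three prompts live in review-chain.md separated by `---` lines and each uses a single [paste code] placeholder (that file layout is my convention here, not a standard, and it simplifies the per-step placeholders shown above):

```python
# Minimal sketch: split review-chain.md into its prompts and substitute
# the diff into each. Sending the results to a model is left to whatever
# client you use; this only prepares the prompt text.
from pathlib import Path

def build_review_prompts(chain_file, diff_text):
    chain = Path(chain_file).read_text()
    # Assumption: prompts are separated by a line containing only '---'.
    prompts = [p.strip() for p in chain.split("\n---\n")]
    # Assumption: each prompt has a [paste code] placeholder for the diff.
    return [p.replace("[paste code]", diff_text) for p in prompts]
```

From there, step 3's output can be pasted straight into the PR as the mechanical half of the review comment.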
Results
Since adopting this chain:
- Caught 3 production bugs in the first week that passed human review
- Review time didn't increase (the prompts run in parallel while I read the diff)
- Junior devs on the team started using it too — review quality went up across the board
Try the Bug Hunter prompt on your last PR. I bet it finds something. Share what it catches — I'm tracking the most common bug categories for a follow-up post.