A lot of AI code reviews look sharp right up until they miss the bug that actually matters.
They catch naming noise, dead comments, maybe a missing null check. But they miss the regression caused by a cache key change, the migration that no longer matches the model, or the new flag that breaks the retry path two services away.
The pattern I've noticed is simple: the model isn't bad at review; it's under-contextualized.
Most review prompts only include the diff. Stateful bugs usually live outside the diff.
Why the diff alone isn't enough
A diff shows what changed. It does not show:
- what state existed before the change
- what surrounding invariants must still hold
- what hidden dependency the change now violates
If a PR changes this:

```python
cache.set(user.id, profile)
```

to this:

```python
cache.set(profile.email, profile)
```

the diff looks syntactically harmless. But if downstream readers still call `cache.get(user.id)`, you've just created a bug that only appears in a later request path.
The model won't reliably catch that if you only hand it the patch.
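To make the failure concrete, here is a minimal sketch of the mismatch. The `Cache` class, `user`, and `profile` values are hypothetical stand-ins for illustration, not code from any real PR:

```python
class Cache:
    """Minimal dict-backed stand-in for a real cache."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        # Returns None on a miss, like many cache clients.
        return self._store.get(key)


cache = Cache()
user = {"id": 42, "email": "a@example.com"}
profile = {"email": "a@example.com", "plan": "pro"}

# After the PR: writes are keyed by email...
cache.set(profile["email"], profile)

# ...but an unchanged caller still reads by user ID.
result = cache.get(user["id"])
print(result)  # None: the read path silently misses
```

Nothing in the diff itself is wrong; the bug lives in the unchanged read site.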
The 3-context fix
I now structure review prompts around three layers of context.
1. Change context
This is the diff itself.
```
Here is the unified diff for the PR. Identify likely logic, state, and integration risks.
```
Necessary, but not sufficient.
2. Runtime context
Tell the model what state or workflow the code participates in.
```
Runtime context:
- cache keys are always user IDs
- writes happen during login
- reads happen during profile fetch and billing sync
- stale cache entries can cause cross-user data leaks
```
This is usually the missing layer. It gives the model something to reason against.
3. Invariant context
List the rules that must stay true after the change.
```
Invariants:
- cache write key must equal cache read key
- one user may never read another user's profile
- failed sync retries must remain idempotent
```
Invariants are powerful because they shift review from "does this code look nice?" to "what rule might this break?"
The prompt template
This is the version I keep around:
```
You are reviewing a code change for bugs, not style.

## Change context
[paste diff]

## Runtime context
- describe where this code runs
- describe stateful dependencies
- describe side effects

## Invariant context
- list 3-5 rules that must remain true

## Output format
Return:
1. short summary
2. likely bug risks
3. missing tests
4. what additional file/context you would inspect next

Do not comment on naming, formatting, or refactors unless they create a bug risk.
```
That last line matters. Otherwise the model burns attention on surface-level cleanup.
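If you'd rather keep the template in code than in a snippet file, a small formatter works. This is just one way to assemble it; the function name and parameters are my own, and the section names simply mirror the template above:

```python
def build_review_prompt(diff, runtime_notes, invariants):
    """Assemble the three-layer review prompt from its parts."""
    runtime = "\n".join(f"- {n}" for n in runtime_notes)
    rules = "\n".join(f"- {r}" for r in invariants)
    return (
        "You are reviewing a code change for bugs, not style.\n\n"
        "## Change context\n" + diff + "\n\n"
        "## Runtime context\n" + runtime + "\n\n"
        "## Invariant context\n" + rules + "\n\n"
        "## Output format\n"
        "Return:\n"
        "1. short summary\n"
        "2. likely bug risks\n"
        "3. missing tests\n"
        "4. what additional file/context you would inspect next\n"
        "Do not comment on naming, formatting, or refactors "
        "unless they create a bug risk.\n"
    )


prompt = build_review_prompt(
    diff="- cache.set(user.id, profile)\n+ cache.set(profile.email, profile)",
    runtime_notes=["cache keys are always user IDs"],
    invariants=["cache write key must equal cache read key"],
)
```

Keeping it in code makes it easy to reuse the same runtime and invariant notes across every PR in a service.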
A practical example
Here's a compact example with a queue consumer:
```python
# before
if job.attempts > 3:
    mark_failed(job)

# after
if job.attempts >= 3:
    mark_failed(job)
Looks fine, right?
But if the invariant is "a job gets 3 retries after the initial run," then >= 3 changes the allowed retry count. That's a behavioral bug, not a syntax bug.
A diff-only review may miss it.
A review with runtime and invariant context usually flags it immediately.
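You can even check the invariant numerically. A sketch that counts total runs under each condition, assuming `attempts` is incremented after each failed run and checked before the next retry (a modeling assumption, not the consumer's exact code):

```python
def total_runs(should_fail):
    """Simulate an always-failing job; count runs until it is marked failed."""
    attempts = 0
    while True:
        attempts += 1           # the job runs and fails
        if should_fail(attempts):
            return attempts     # mark_failed: no further retries


original = total_runs(lambda a: a > 3)   # initial run + 3 retries
changed = total_runs(lambda a: a >= 3)   # initial run + 2 retries
print(original, changed)  # 4 3
```

Under that model, the one-character change silently drops a retry, which is exactly the kind of finding the invariant layer surfaces.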
What changed for me after using this
Once I started feeding these three context types into review prompts, the comments got noticeably better:
- fewer style nitpicks
- more integration warnings
- better test suggestions
- clearer calls for follow-up inspection
The model still doesn't replace a human reviewer. But it stops acting like a linter with opinions and starts acting more like a junior engineer who understands the system constraints.
That's a much better role.
Question for you: What's the last bug your AI review missed, and was the missing piece really model quality, or just missing runtime context?