Verification Loops for AI Coding: Make the Model Test Before You Review
One of the most expensive mistakes in AI-assisted coding is reviewing output too early.
The model writes a patch.
You skim it.
It looks plausible.
Then one of three things happens:
- the patch does not actually solve the bug
- it breaks an adjacent behavior
- it quietly ignores the constraint you cared about most
At that point, the model did not really save you time.
It just moved the debugging work into code review.
That is why I like verification loops.
A verification loop means the model does not stop at “here is the answer.”
It has to check its own work against explicit criteria before handing it to a human.
The basic idea
Instead of a one-step prompt:
Fix this bug.
Use a three-step workflow:
- identify the likely cause
- propose the smallest reasonable fix
- verify the fix against tests, constraints, and edge cases
The important part is that step 3 is not optional.
Why this works
LLMs are good at producing plausible code.
That is not the same as producing verified code.
A verification loop improves reliability because it forces a second pass focused on:
- checking assumptions
- comparing the result to the original task
- looking for regressions
- identifying missing tests
- surfacing uncertainty instead of hiding it
In other words, it separates generation from evaluation.
Humans do this naturally.
Models need to be told.
A prompt shape that works well
Here is a compact version:
Task: fix the reported bug.
Process:
1. explain the likely root cause in 3 bullets or fewer
2. propose the smallest patch that addresses it
3. verify the patch against the acceptance criteria below
4. if verification fails, revise once before returning the final answer
Acceptance criteria:
- fix addresses the reported failure mode
- no unrelated files changed
- edge cases are acknowledged
- tests to prove the fix are listed
- uncertainty is called out explicitly
Return:
- root cause
- patch summary
- tests run or proposed
- verification notes
- remaining risks
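If you drive this from code rather than pasting it by hand, a small helper can render the same structure. This is an illustrative sketch; `buildBugFixPrompt` and its parameters are hypothetical names, not a standard API:

```typescript
// Render the structured bug-fix prompt above.
// The acceptance criteria are passed in so they can vary per task.
function buildBugFixPrompt(task: string, criteria: string[]): string {
  return [
    `Task: ${task}`,
    "Process:",
    "1. explain the likely root cause in 3 bullets or fewer",
    "2. propose the smallest patch that addresses it",
    "3. verify the patch against the acceptance criteria below",
    "4. if verification fails, revise once before returning the final answer",
    "Acceptance criteria:",
    ...criteria.map((c) => `- ${c}`),
    "Return: root cause, patch summary, tests run or proposed, verification notes, remaining risks",
  ].join("\n");
}
```

Keeping the criteria as data makes it easy to reuse the same loop across different tasks.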
This is not fancy.
It is just structured.
Example: a pagination bug
Say the bug report is:
API returns duplicate records when paginating with updated_at DESC.
A weak AI response might jump straight into code changes.
A verification-loop response should first say something like:
- ordering by a non-unique column can produce unstable page boundaries
- equal timestamps likely cause duplicates across page fetches
- fix probably requires a tie-breaker in ordering or cursor logic
Then propose the patch.
Then verify it:
- does the patch introduce a stable secondary sort key?
- are duplicates still possible for equal timestamps?
- what happens on empty pages?
- what test would fail before and pass after?
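The tie-breaker fix itself is small. Here is an illustrative in-memory sketch of keyset pagination, assuming each record carries a unique `id` alongside `updatedAt`; in a real system the same idea lives in the query or cursor layer:

```typescript
type Row = { id: number; updatedAt: number };

// Sort by updatedAt DESC, then id DESC as a stable tie-breaker,
// so equal timestamps always order the same way.
function byUpdatedAtThenId(a: Row, b: Row): number {
  return b.updatedAt - a.updatedAt || b.id - a.id;
}

// Keyset pagination: the cursor is the (updatedAt, id) pair of the
// last record on the previous page, so page boundaries are stable
// even when many records share a timestamp.
function fetchPage(
  rows: Row[],
  pageSize: number,
  cursor?: { updatedAt: number; id: number }
): Row[] {
  const sorted = [...rows].sort(byUpdatedAtThenId);
  const start = cursor
    ? sorted.findIndex(
        (r) => r.updatedAt === cursor.updatedAt && r.id === cursor.id
      ) + 1
    : 0;
  return sorted.slice(start, start + pageSize);
}
```

The test that would fail before and pass after is exactly the one the loop asks for: paginate over records with equal timestamps and assert that no record appears twice.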
That is a much safer flow than “here is some code, good luck.”
The minimum verification checklist
If you want a short reusable loop, use this checklist.
1. Did we solve the actual reported problem?
Not a nearby problem. The actual one.
2. Did we respect scope?
This catches the classic AI failure mode where a small bug fix becomes a mini-refactor.
3. What evidence supports the change?
That can be:
- existing test results
- a proposed failing test
- reasoning tied directly to the code path
- log output or reproduction steps
4. What might still be wrong?
You want the model to surface uncertainty before a human discovers it the hard way.
5. What should a reviewer check first?
This focuses human attention where it matters.
Separate generation from judgment
One practical trick: make the model write both the patch and a separate review note about that patch.
For example:
After proposing the fix, switch roles and act as a skeptical reviewer.
List the top 3 ways this patch could still be wrong.
That often reveals hidden assumptions fast.
Not always.
But often enough to be worth the extra tokens.
A TypeScript helper for structured loops
If you automate coding workflows, even a tiny wrapper helps.
type VerificationResult = {
  solvedReportedIssue: boolean; // the actual reported problem, not a nearby one
  scopeRespected: boolean;      // no unrelated files or mini-refactors
  testsCovered: boolean;        // tests were run or proposed
  openRisks: string[];          // what might still be wrong
  reviewerFocus: string[];      // what a human should check first
};
Now the workflow can require those fields before passing work onward.
That is the key idea: verification should produce an artifact, not just a vibe.
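A gate over those fields might look like the sketch below. The type is repeated so the example stands alone, and `readyForReview` is an illustrative name, not part of any library:

```typescript
type VerificationResult = {
  solvedReportedIssue: boolean;
  scopeRespected: boolean;
  testsCovered: boolean;
  openRisks: string[];
  reviewerFocus: string[];
};

// Only pass the patch onward when every boolean criterion holds.
// Open risks do not block the gate; they are handed to the reviewer.
function readyForReview(v: VerificationResult): boolean {
  return v.solvedReportedIssue && v.scopeRespected && v.testsCovered;
}
```

The point is not the three lines of logic; it is that a failed verification now stops the pipeline instead of becoming a reviewer's problem.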
Common failure modes a loop catches
Missing tests
The patch looks fine until you ask, “what test proves it?”
Constraint drift
The model fixes the bug but ignores “minimal change only” or “do not touch schema.”
Fake certainty
Without a review phase, the answer may sound more confident than the evidence supports.
Local success, system failure
A patch may fix one path while breaking another.
A verification section is where those risks should show up.
When to use verification loops
Use them when:
- the change affects production code
- the cost of a wrong answer is non-trivial
- review time is expensive
- the task is narrow enough to verify concretely
You probably do not need a heavy loop for trivial transformations.
But for bug fixes, migrations, and anything user-facing, they pay for themselves quickly.
A lightweight final template
Generate the solution.
Then verify it.
Only return the final answer after checking:
- does it solve the reported issue?
- what evidence supports it?
- what tests prove it?
- what constraints were respected?
- what risks remain?
That tiny addition changes the quality of AI coding output more than most prompt tricks.
If your current workflow feels like the model is handing you half-finished thoughts, the fix may not be a smarter model.
It may just be a loop.