Verification Loops for AI Coding: Make the Model Test Before You Review
One of the most expensive mistakes in AI-assisted coding is reviewing output too early.
The model writes a patch.
You skim it.
It looks plausible.
Then one of three things happens:
- the patch does not actually solve the bug
- it breaks an adjacent behavior
- it quietly ignores the constraint you cared about most
At that point, the model did not really save you time.
It just moved the debugging work into code review.
That is why I like verification loops.
A verification loop means the model does not stop at “here is the answer.”
It has to check its own work against explicit criteria before handing it to a human.
The basic idea
Instead of a one-step prompt:
Fix this bug.
Use a three-step workflow:
- identify the likely cause
- propose the smallest reasonable fix
- verify the fix against tests, constraints, and edge cases
The important part is that step 3 is not optional.
Why this works
LLMs are good at producing plausible code.
That is not the same as producing verified code.
A verification loop improves reliability because it forces a second pass focused on:
- checking assumptions
- comparing the result to the original task
- looking for regressions
- identifying missing tests
- surfacing uncertainty instead of hiding it
In other words, it separates generation from evaluation.
Humans do this naturally.
Models need to be told.
A prompt shape that works well
Here is a compact version:
Task: fix the reported bug.
Process:
1. explain the likely root cause in 3 bullets or fewer
2. propose the smallest patch that addresses it
3. verify the patch against the acceptance criteria below
4. if verification fails, revise once before returning the final answer
Acceptance criteria:
- fix addresses the reported failure mode
- no unrelated files changed
- edge cases are acknowledged
- tests to prove the fix are listed
- uncertainty is called out explicitly
Return:
- root cause
- patch summary
- tests run or proposed
- verification notes
- remaining risks
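If you drive this from code rather than pasting it by hand, a small helper can render the same structure. This is an illustrative sketch; `buildBugFixPrompt` and its parameters are hypothetical names, not a standard API:

```typescript
// Render the structured bug-fix prompt above.
// The acceptance criteria are passed in so they can vary per task.
function buildBugFixPrompt(task: string, criteria: string[]): string {
  return [
    `Task: ${task}`,
    "Process:",
    "1. explain the likely root cause in 3 bullets or fewer",
    "2. propose the smallest patch that addresses it",
    "3. verify the patch against the acceptance criteria below",
    "4. if verification fails, revise once before returning the final answer",
    "Acceptance criteria:",
    ...criteria.map((c) => `- ${c}`),
    "Return: root cause, patch summary, tests run or proposed, verification notes, remaining risks",
  ].join("\n");
}
```

Keeping the criteria as data makes it easy to reuse the same loop across different tasks.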
This is not fancy.
It is just structured.
Example: a pagination bug
Say the bug report is:
API returns duplicate records when paginating with updated_at DESC.
A weak AI response might jump straight into code changes.
A verification-loop response should first say something like:
- ordering by a non-unique column can produce unstable page boundaries
- equal timestamps likely cause duplicates across page fetches
- fix probably requires a tie-breaker in ordering or cursor logic
Then propose the patch.
Then verify it:
- does the patch introduce a stable secondary sort key?
- are duplicates still possible for equal timestamps?
- what happens on empty pages?
- what test would fail before and pass after?
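The tie-breaker fix itself is small. Here is an illustrative in-memory sketch of keyset pagination, assuming each record carries a unique `id` alongside `updatedAt`; in a real system the same idea lives in the query or cursor layer:

```typescript
type Row = { id: number; updatedAt: number };

// Sort by updatedAt DESC, then id DESC as a stable tie-breaker,
// so equal timestamps always order the same way.
function byUpdatedAtThenId(a: Row, b: Row): number {
  return b.updatedAt - a.updatedAt || b.id - a.id;
}

// Keyset pagination: the cursor is the (updatedAt, id) pair of the
// last record on the previous page, so page boundaries are stable
// even when many records share a timestamp.
function fetchPage(
  rows: Row[],
  pageSize: number,
  cursor?: { updatedAt: number; id: number }
): Row[] {
  const sorted = [...rows].sort(byUpdatedAtThenId);
  const start = cursor
    ? sorted.findIndex(
        (r) => r.updatedAt === cursor.updatedAt && r.id === cursor.id
      ) + 1
    : 0;
  return sorted.slice(start, start + pageSize);
}
```

The test that would fail before and pass after is exactly the one the loop asks for: paginate over records with equal timestamps and assert that no record appears twice.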
That is a much safer flow than “here is some code, good luck.”
The minimum verification checklist
If you want a short reusable loop, use this checklist.
1. Did we solve the actual reported problem?
Not a nearby problem. The actual one.
2. Did we respect scope?
This catches the classic AI failure mode where a small bug fix becomes a mini-refactor.
3. What evidence supports the change?
That can be:
- existing test results
- a proposed failing test
- reasoning tied directly to the code path
- log output or reproduction steps
4. What might still be wrong?
You want the model to surface uncertainty before a human discovers it the hard way.
5. What should a reviewer check first?
This focuses human attention where it matters.
Separate generation from judgment
One practical trick: make the model write both the patch and a separate review note about that patch.
For example:
After proposing the fix, switch roles and act as a skeptical reviewer.
List the top 3 ways this patch could still be wrong.
That often reveals hidden assumptions fast.
Not always.
But often enough to be worth the extra tokens.
A TypeScript helper for structured loops
If you automate coding workflows, even a tiny wrapper helps.
type VerificationResult = {
  solvedReportedIssue: boolean; // the actual reported problem, not a nearby one
  scopeRespected: boolean;      // no unrelated files or mini-refactors
  testsCovered: boolean;        // tests were run or proposed
  openRisks: string[];          // what might still be wrong
  reviewerFocus: string[];      // what a human should check first
};
Now the workflow can require those fields before passing work onward.
That is the key idea: verification should produce an artifact, not just a vibe.
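A gate over those fields might look like the sketch below. The type is repeated so the example stands alone, and `readyForReview` is an illustrative name, not part of any library:

```typescript
type VerificationResult = {
  solvedReportedIssue: boolean;
  scopeRespected: boolean;
  testsCovered: boolean;
  openRisks: string[];
  reviewerFocus: string[];
};

// Only pass the patch onward when every boolean criterion holds.
// Open risks do not block the gate; they are handed to the reviewer.
function readyForReview(v: VerificationResult): boolean {
  return v.solvedReportedIssue && v.scopeRespected && v.testsCovered;
}
```

The point is not the three lines of logic; it is that a failed verification now stops the pipeline instead of becoming a reviewer's problem.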
Common failure modes a loop catches
Missing tests
The patch looks fine until you ask, “what test proves it?”
Constraint drift
The model fixes the bug but ignores “minimal change only” or “do not touch schema.”
Fake certainty
Without a review phase, the answer may sound more confident than the evidence supports.
Local success, system failure
A patch may fix one path while breaking another.
A verification section is where those risks should show up.
When to use verification loops
Use them when:
- the change affects production code
- the cost of a wrong answer is non-trivial
- review time is expensive
- the task is narrow enough to verify concretely
You probably do not need a heavy loop for trivial transformations.
But for bug fixes, migrations, and anything user-facing, they pay for themselves quickly.
A lightweight final template
Generate the solution.
Then verify it.
Only return the final answer after checking:
- does it solve the reported issue?
- what evidence supports it?
- what tests prove it?
- what constraints were respected?
- what risks remain?
That tiny addition changes the quality of AI coding output more than most prompt tricks.
If your current workflow feels like the model is handing you half-finished thoughts, the fix may not be a smarter model.
It may just be a loop.