
wintrover

Posted on • Originally published at wintrover.github.io

The Most Dangerous Word in AI Coding: "Verified"

Got a "Verified" result from my formal verification engine.

Problem was, it was completely wrong.


The Setup

Looking at a simple function: checkType from Bitcoin Core.

The engine generated this SMT query:

(assert (= throwsRuntimeError (not (= typ expected))))
(assert (= typ expected))
(assert throwsRuntimeError)

At first glance? Looks fine.

But there's a fatal flaw in there.


The Contradiction

Unpack it and here's what you get:

  • Error occurs when typ != expected
  • But we're assuming typ == expected
  • While also asserting "an error occurred"

Boil it down:

typ == expected
AND simultaneously: typ != expected

Logically impossible.
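You can make the impossibility concrete with a brute-force check over the two booleans involved. This is a plain-Python stand-in for the SMT query, not the engine's actual code; `types_match` models `(= typ expected)`, and the names are illustrative:

```python
# Brute-force the three assertions from the SMT query over booleans.
# types_match stands in for (= typ expected).
from itertools import product

def satisfiable():
    for types_match, throws in product([True, False], repeat=2):
        c1 = throws == (not types_match)  # error occurs iff types differ
        c2 = types_match                  # ...but types are assumed equal
        c3 = throws                       # ...and an error is asserted
        if c1 and c2 and c3:
            return True
    return False

print(satisfiable())  # False: no assignment satisfies all three
```

Four candidate assignments, and every one violates at least one assertion. That's exactly what "Unsat" means.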


What the Solver Did

Z3 (or any SMT solver) takes one look and concludes:

Unsat (Unsatisfiable)

In formal verification, this usually means:

"No execution path exists where the error occurs."

So the engine outputs:

✅ Verified


Where It Went Wrong

Here's the thing.

The solver didn't prove the code safe.

It proved the question itself was invalid.


Vacuous Truth

This is a classic trap in formal verification.

The definition is simple:

If the premise is impossible, the statement is always true.

Example:

"If the sun rises in the west, I am God."

Logically true.
Because the premise can never hold.
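Material implication makes the trap visible in code: with a false premise, the implication is true no matter what the conclusion says. A minimal sketch:

```python
# Material implication: "if P then Q" is equivalent to (not P) or Q.
def implies(p, q):
    return (not p) or q

sun_rises_in_west = False  # the premise can never hold

# With a false premise, both conclusions come out "true" -- vacuously.
print(implies(sun_rises_in_west, True))   # True
print(implies(sun_rises_in_west, False))  # True
```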


What Actually Happened

The engine effectively asked:

"In a state where typ == expected, can an error occur?
Given errors only happen when typ != expected?"

The solver's answer is clear:

"No such state exists."

Then the engine interprets:

"No error possible → safe"

That jump is the problem.


Verified ≠ Correct

Key point:

"Verified" doesn't mean correct.

Sometimes it means:

Your model is broken.

This case was exactly that:

  • Actual logic wasn't captured
  • Contradictory constraints were generated
  • Solver just short-circuited

It didn't verify the code.
The logic just crashed.


Why This Matters Now

We're in this flow:

  • AI writes code
  • AI writes tests
  • AI explains "correctness"

Problem:

None of that guarantees truth

Because:

AI optimizes for plausibility, not consistency

LLMs don't "resolve" contradictions.
They continue them.


The Real Danger: False Confidence

This is worse than a failing test.

Test fails → problem visible
Verified (fake) → problem hidden

In security or finance logic? Catastrophic.


What's Needed

Formal verification systems need at minimum:

1. Sanity Check (Premise Validation)

Before proving anything:

Are these assumptions even possible together?

2. Vacuity Detection

Catch cases that pass because "nothing could happen."

3. Mutation Testing

Break the code intentionally:

If it still verifies, your verification is broken.
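The first two checks can be sketched without a real solver. Here brute force over the example's two booleans stands in for an SMT call; the constraint names are hypothetical, not Axiom's API:

```python
# Sketch: sanity check plus vacuity detection, with brute force
# in place of an SMT solver. Variables: types_match (m), throws (t).
from itertools import product

def sat(constraints):
    return any(all(c(m, t) for c in constraints)
               for m, t in product([True, False], repeat=2))

error_iff_mismatch = lambda m, t: t == (not m)   # the error definition
types_assumed_equal = lambda m, t: m             # the engine's premise
assumptions = [error_iff_mismatch, types_assumed_equal]

# 1. Sanity check: the assumptions are mutually satisfiable.
print(sat(assumptions))                          # True

# 2. Vacuity detection: can the error's trigger (typ != expected)
#    ever fire under these assumptions? If not, "Verified" is vacuous.
print(sat(assumptions + [lambda m, t: not m]))   # False -> vacuous pass
```

The first check passes here; the second is the tell. The state that triggers the error is unreachable by construction, so any "no error occurs" verdict is vacuous.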


Axiom

That's why I'm building Axiom.

Not to replace AI.

To sit on top as a "verification layer."

Role is simple:

  • This holds
  • This doesn't
  • Or: this question is invalid from the start


Final Thought

If your system says "Verified,"

Ask this first:

Is the question itself valid?

If you can't answer that,

You don't have verification.

You have a well-crafted illusion.
