When people hear “formal verification”, they imagine a single solver run that prints Verified.
That’s not what Axiom is optimizing for.
Axiom is built around one requirement: if you say something is verified, you must be able to reproduce that claim deterministically—including the exact query shape, the exact counterexample (if SAT), and the exact Lean replay (if promoted to safe).
1) BMC Core: Bounded Models, Not Hand-Wavy Guarantees
The BMC core is the bounded-model layer of the engine: it takes a transition system, unrolls it to a fixed bound, and asks the only question that matters inside that boundary:
“Does there exist a counterexample within N steps?”
In Axiom’s reference race model, that boundary is explicit:
```nim
const
  bmcStepBound* = 10
```
That bound is not a random default—it’s the contract that defines what “safe” even means in this mode. The semantics are also strict: a race is promoted only when the final state falls outside the set of sequential outcomes computed from the same TransitionRule.
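The promotion rule above (a race counts only when the final state is not serializable) can be sketched directly. This is an illustrative model, not Axiom's implementation; the function names are hypothetical.

```python
# Hypothetical sketch of the promotion rule: a final state is flagged as a race
# only if it lies OUTSIDE the set of sequential outcomes produced by the same
# operations.
from itertools import permutations

def sequential_outcomes(init, ops):
    """All final states reachable by running the ops in some serial order."""
    outcomes = set()
    for order in permutations(ops):
        state = init
        for op in order:
            state = op(state)
        outcomes.add(state)
    return outcomes

def is_promoted_race(init, ops, observed_final):
    """Promote a race only when the observed final state is not serializable."""
    return observed_final not in sequential_outcomes(init, ops)

inc = lambda x: x + 1      # thread A: x += 1
dbl = lambda x: x * 2      # thread B: x *= 2

# Lost-update interleaving: both threads read x=0; B's write (0*2=0) lands last.
print(is_promoted_race(0, [inc, dbl], 0))   # True: 0 is not in {1, 2}
print(is_promoted_race(0, [inc, dbl], 2))   # False: 2 == dbl(inc(0)), serializable
```

The point of the set-based check is that "weird but serializable" outcomes are not promoted; only outcomes impossible under any serial order count as races.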
The important architectural decision is that the rule and the safety property are not hardcoded into the solver path. Instead:
- External behavior is injected as a `TransitionRule(op, delta)` model.
- The single source of truth (SSOT) for the safety property is `TransitionRule.invariant`.
- Encoding is performed in SMT-LIB2 (QF_LIA) and executed through Z3.
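A minimal sketch of what that injection could look like, assuming a `TransitionRule` with the fields named above (the exact shape in Axiom is an assumption here, as is the `encode_step` helper):

```python
# Illustrative TransitionRule model: op, delta, and the invariant as SSOT.
from dataclasses import dataclass

@dataclass(frozen=True)
class TransitionRule:
    op: str          # external behavior, e.g. "append"
    delta: int       # effect on the tracked quantity
    invariant: str   # SSOT of the safety property, as an SMT-LIB term

rule = TransitionRule(op="append", delta=1, invariant="(<= size 512)")

def encode_step(rule, k):
    """Encode one unrolled step in QF_LIA: size_{k+1} = size_k + delta."""
    return f"(assert (= size_{k + 1} (+ size_{k} {rule.delta})))"

print(encode_step(rule, 0))  # (assert (= size_1 (+ size_0 1)))
```

Because the rule is data rather than solver-path code, swapping in a different external behavior never touches the encoding logic.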
This is also where determinism stops being “a preference” and becomes “mechanically enforced”.
The Z3 query text is stamped with fixed-seed and single-thread constraints, and every invariant is asserted as a named clause so UNSAT cores are stable and actionable:
```
(set-option :fixed_seed 17)
(set-option :smt.threads 1)
...
(assert (! (<= size 512) :named inv_bounded_growth.max_size))
...
(get-unsat-core)
```
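"Mechanically enforced" determinism can be as simple as making the query builder itself order-independent. A sketch, assuming a hypothetical builder (option names mirror the snippet above):

```python
# Deterministic query construction: the text carries the seed/thread options,
# and every invariant gets a stable :named label so unsat cores are
# byte-for-byte reproducible.
def build_query(invariants):
    lines = [
        "(set-option :fixed_seed 17)",
        "(set-option :smt.threads 1)",
    ]
    # Sort by name so insertion order can never perturb the query text.
    for name, term in sorted(invariants.items()):
        lines.append(f"(assert (! {term} :named {name}))")
    lines.append("(check-sat)")
    lines.append("(get-unsat-core)")
    return "\n".join(lines)

invs = {"inv_bounded_growth.max_size": "(<= size 512)"}
assert build_query(invs) == build_query(dict(invs))  # same input, same bytes
```

When Z3 then reports an unsat core, the returned labels map one-to-one back to invariant names, which is what makes the core "stable and actionable".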
In other words: the BMC core is not just “Z3 integration”. It’s a deterministic boundary-model execution contract where solver outputs can be replayed and audited.
2) UAP Pipeline: Universal Automated Proof as an Orchestrated Repair Loop
BMC gives you a controlled counterexample search. But the real world is not a single bounded query; it’s a sequence:
- Extract a model.
- Find a counterexample.
- Repair the implementation and/or strengthen invariants.
- Re-prove.
- Preserve only the minimum reusable evidence.
That sequence is what Axiom calls UAP: the Universal Automated Proof pipeline.
UAP is not “just” automation. It has a strict output contract: one JSON result that includes translation, scans, repair rounds, proof artifacts, and promotion snapshots.
The core orchestrator is deliberately structured as a set of roles (Detective / Legislator / SWAT / Judge), because each stage produces different kinds of artifacts:
```
AST + Source
  → UniversalTranslation (AST → IL/SMT mapping)
  → initialScan (counterexample search + trace grounds)
  → legislate (baseline invariant draft)
  → executeRaid (repair + PoC + BDD generation hooks)
  → certify (Lean replay + invariant finalization)
  → investigate (final scan + verdict normalization)
  → writeProofArtifact + writePromotionSnapshot
```
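The scan → repair → re-prove sequence is fundamentally a loop with a budget. A minimal sketch, with every function name a hypothetical stand-in for the stages above:

```python
# Minimal repair-loop skeleton: each round either closes the counterexample
# or strengthens the system, and only minimal evidence is retained.
def uap_loop(check, repair, max_rounds=5):
    """Iterate scan -> repair -> re-prove; keep only minimal evidence."""
    evidence = []
    for round_no in range(max_rounds):
        cex = check()                    # counterexample search (Detective)
        if cex is None:
            return {"verdict": "no-counterexample", "rounds": evidence}
        repair(cex)                      # patch code / strengthen invariants (SWAT)
        evidence.append({"round": round_no, "cex": cex})
    return {"verdict": "budget-exhausted", "rounds": evidence}

# Toy model: a buffer bound that the repair step tightens until the scan passes.
state = {"bound": 1024}
check = lambda: {"size": state["bound"]} if state["bound"] > 512 else None
def repair(cex): state["bound"] //= 2

print(uap_loop(check, repair)["verdict"])  # no-counterexample
```

Note that the terminal verdict here is deliberately "no-counterexample", never "safe": the loop alone cannot promote.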
Two constraints matter here:
- “Counterexample not found” is explicitly not equal to “safe”.
- Safe promotion is prohibited unless Lean replay succeeds, even if SMT returns UNSAT.
So UAP is a self-healing pipeline, but it’s also an anti-vacuity pipeline: it refuses to treat solver silence as proof.
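The two constraints collapse into a small promotion gate. A sketch with illustrative enum values:

```python
# Anti-vacuity promotion gate: solver silence is never "safe", and Lean replay
# gates safe promotion even on UNSAT.
def promote(smt_result, lean_replay_ok):
    if smt_result == "sat":
        return "unsafe"                  # concrete counterexample exists
    if smt_result == "unsat" and lean_replay_ok:
        return "safe"                    # the only path to safe promotion
    return "unknown"                     # UNSAT without Lean replay, or timeout

assert promote("unsat", lean_replay_ok=False) == "unknown"  # UNSAT alone is a hint
assert promote("unknown", lean_replay_ok=True) == "unknown" # no vacuous promotion
```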
3) Lean 4 Reproducibility: Z3 Is the Oracle; Lean Is the Kernel
If SMT is the “search engine”, Lean 4 is the “court of law”.
The engine’s safety promotion rules explicitly treat Z3 UNSAT as a hint, not a conclusion. Final safe promotion requires a Lean-kernel-verifiable result (proof term or tactic replay).
This is why Axiom’s Lean side has hard constraints:
- Semantics are not allowed to hide inside axioms like `axiom transition`.
- Transition systems must be reconstructed through `def nextState` and `inductive Step`.
- Generated specs must be replayable under a pinned toolchain and a stable tactic strategy.
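A minimal shape of that constraint: the transition function is definable, reachability is an inductive predicate, and nothing hides behind an `axiom`. The concrete system here is an illustrative counter, not Axiom's generated output:

```lean
-- Transition semantics as a definable function...
def nextState (s : Nat) : Nat := s + 1

-- ...and reachability as an inductive step relation over it.
inductive Step : Nat → Nat → Prop where
  | mk (s : Nat) : Step s (nextState s)

-- Because the semantics are definitions, properties are provable, not assumed.
theorem step_increases (s t : Nat) (h : Step s t) : s < t := by
  cases h
  exact Nat.lt_succ_self s
```

The payoff is that the Lean kernel checks the property against the actual transition semantics; an `axiom transition` would make any such theorem vacuous.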
Operationally, Lean replay is not a one-shot script. It's a dynamic proof-search loop driven by compiler feedback through Lean's `--server` JSON-RPC (LSP-style) interface. The system treats proof search as an iterative, deterministic process: observe goals, propose tactics, replay, repeat.
The point is not that Lean always succeeds.
The point is that when it does succeed, it is reproducible—and when it fails, the failure itself is reproducible and promotable as actionable constraints.
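The observe/propose/replay loop can be sketched with the Lean server replaced by a stub, so only the control flow remains. The fixed tactic ordering is what makes the search deterministic; the tactic list and the replay interface are assumptions here:

```python
# Deterministic proof-search loop against a stubbed tactic-replay backend.
TACTIC_ORDER = ["rfl", "simp", "omega", "decide"]   # tried in a stable order

def proof_search(goal, replay):
    """Return the tactic script that closes `goal`, or raise if none does."""
    script = []
    remaining = goal
    while remaining is not None:
        for tactic in TACTIC_ORDER:
            nxt = replay(remaining, tactic)   # ask the (stubbed) server
            if nxt != remaining:              # progress: goal changed or closed
                script.append(tactic)
                remaining = nxt
                break
        else:
            raise RuntimeError(f"no tactic advances goal: {remaining}")
    return script

# Stub server: `simp` reduces the goal, `rfl` closes the reduced goal.
def replay(goal, tactic):
    if goal == "a + 0 = a" and tactic == "simp":
        return "a = a"
    if goal == "a = a" and tactic == "rfl":
        return None                           # goal closed
    return goal                               # no progress

print(proof_search("a + 0 = a", replay))      # ['simp', 'rfl']
```

Because the tactic order and the feedback are both deterministic, a successful run always yields the same script, and a failed run always fails on the same goal.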
4) Proof Vault: Fingerprint-Keyed Evidence Compression and Reuse
Most systems keep “proof results” as human-facing logs.
Axiom treats proofs as assets.
The Proof Vault is where those assets are stored under stable keys so they can be reused instead of re-derived. Two kinds of vaulting matter:
1) Promoted invariants (UAP-level): the engine preserves the latest `activeInvariant` and `refinedInvariant` as snapshots and auto-reinjects them into later runs for the same `modulePath`/`functionName`.
2) Lean replay tactics (kernel-level): successful tactics are stored keyed by theorem name plus a complexity fingerprint. On the next run, they become a warm-start strategy—preferentially reusing the path with fewer attempts and shorter tactic sequences.
That “fingerprint-first” design is what turns the vault into a compression mechanism:
- You discard bulky intermediate artifacts (full ASTs, large traces) after success.
- You keep only minimal, fingerprint-addressable evidence that can deterministically recreate the proof attempt.
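The warm-start keying above can be sketched as a small content-addressed store. The key shape and selection criteria follow the text; everything else is an illustrative assumption:

```python
# Fingerprint-keyed warm start: successful tactic scripts are stored under
# (theorem name, goal fingerprint) and replayed first, preferring fewer
# attempts and shorter scripts.
import hashlib

def fingerprint(goal_text):
    """Stable, content-addressed key component for a goal."""
    return hashlib.sha256(goal_text.encode()).hexdigest()[:16]

vault = {}

def record_success(theorem, goal, script, attempts):
    key = (theorem, fingerprint(goal))
    vault.setdefault(key, []).append({"script": script, "attempts": attempts})

def warm_start(theorem, goal):
    """Best stored script: fewest attempts, then shortest tactic sequence."""
    entries = vault.get((theorem, fingerprint(goal)))
    if not entries:
        return None                      # cold start: full search required
    best = min(entries, key=lambda e: (e["attempts"], len(e["script"])))
    return best["script"]

record_success("add_zero", "a + 0 = a", ["simp", "rfl"], attempts=3)
record_success("add_zero", "a + 0 = a", ["omega"], attempts=1)
print(warm_start("add_zero", "a + 0 = a"))  # ['omega']
```

The store holds only scripts and counters, never ASTs or traces, which is exactly the compression the bullets above describe.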
This is also why fingerprints are treated as first-class inputs across the pipeline (source checksum, SMT fingerprint, invariant fingerprint). If the fingerprint changes, Axiom does not pretend the old proof still applies—it demotes and schedules re-proof precisely where the change occurred.
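Fingerprint-gated reuse then reduces to a comparison: any drift demotes the cached proof instead of silently reusing it. The field names below mirror the three fingerprints mentioned above but are illustrative:

```python
# Demote-on-drift: reuse a cached proof only when every tracked fingerprint
# still matches; otherwise name exactly what changed and schedule re-proof.
def decide_reuse(cached, current):
    keys = ("source_checksum", "smt_fingerprint", "invariant_fingerprint")
    changed = [k for k in keys if cached.get(k) != current.get(k)]
    return "reuse" if not changed else f"demote+reprove:{','.join(changed)}"

old = {"source_checksum": "c0ffee", "smt_fingerprint": "f00d",
       "invariant_fingerprint": "beef"}
new = dict(old, smt_fingerprint="f00e")

print(decide_reuse(old, old))  # reuse
print(decide_reuse(old, new))  # demote+reprove:smt_fingerprint
```

Returning the changed key names is what lets re-proof be scheduled "precisely where the change occurred" rather than globally.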
Closing: A Kernel, Not a Collection of Tools
BMC, UAP, Lean replay, and the Proof Vault are not separate features.
They are one kernel with one invariant:
If Axiom says “verified”, it must be replayable as the same query, the same evidence, and the same proof—deterministically.
That is the only definition of verification that survives contact with AI-generated code.