DEV Community

wintrover

Posted on • Originally published at wintrover.github.io

Why We Still Don't Trust AI-Generated Code: The Archright Trinity

"You just can't trust code written by AI."

Right up until I decided to resign, this was the sentence I heard most often on real engineering teams.
The paradox was obvious: organizations wanted "10x productivity" from AI, yet deeply distrusted the output.

I stood in the middle of that contradiction, in agony, drilling into the root cause.
Why do we fail to trust AI-generated code? Is it simply because the tool is imperfect?

No.
My conclusion was this: the real problem is a distorted process that burns human labor to patch AI uncertainty.

Organizations forced engineers to review, by eye and on overtime, hundreds of lines that an AI had produced in seconds.
That was not a productivity gain.
It was the hell of review labor.
I found that this distrust consistently maps to three fundamental deficits.


1) Deficit of Intent: Is this code truly aligned with my design context?

When intent is not preserved, teams cannot prove whether generated code matches architectural decisions.
The output may look correct, yet still violate what the builder actually meant.

2) Deficit of Stability: Will it break at runtime or quietly degrade performance?

Without deterministic controls, generated code remains a probability game.
Even if it passes quickly, hidden runtime failures and regressions can emerge later.

3) Deficit of Security: Does it work now but plant future vulnerabilities?

Many AI outputs are operationally acceptable in the moment but logically under-proven.
That gap becomes a delayed risk multiplier.


Archright exists to solve this asymmetry as a system.
Instead of turning humans into disposable verification parts, I translated the commitment to deterministic integrity I declared when I resigned into three concrete technical pillars.


1. Thought Trajectory System: Freezing Intent

It freezes builder intent and context as durable records, like a GitHub commit log for reasoning.
It fixes the architect's thought flow as explicit data so AI inference does not remain a black box.
That creates transparent intent anyone can onboard from immediately, while slashing communication cost.

This system is not a simple log.
In Archright, intent is used through this flow:

Requirements input
→ Capture the architect's intent in a structured form
→ Use it as the reference point across every generation/verification step
→ Validate consistency against "intent," not just against code

In short, code is only one output that must satisfy this intent.
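As a sketch of what "capturing intent as structured data" can look like, here is a minimal illustration. All names and fields below are hypothetical; Archright's actual schema is not public.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: an intent record is immutable once captured
class IntentRecord:
    requirement: str              # what the builder asked for
    rationale: str                # why, in the architect's own words
    constraints: tuple[str, ...]  # invariants every generated artifact must satisfy

def unmet_constraints(intent: IntentRecord, artifact_properties: set[str]) -> list[str]:
    """Validate an artifact against the intent, not just against other code."""
    return [c for c in intent.constraints if c not in artifact_properties]

intent = IntentRecord(
    requirement="expose a read endpoint for user records",
    rationale="clients need self-service access to their own profile",
    constraints=("authenticated", "owner-only access"),
)

# An artifact that forgot the ownership check fails validation against intent.
print(unmet_constraints(intent, {"authenticated"}))  # → ['owner-only access']
```

The point of the frozen record: the reference point for every later generation and verification step cannot drift after it is captured.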

2. Nim Programming Language: A Trinity of Productivity, Performance, and Stability

Why Nim instead of Rust?
Rust is excellent for performance and safety, but it often sacrifices production velocity, and in my experience its denser, less human-friendly syntax is a frequent trigger for AI-agent hallucinations.
Nim preserves performance and stability while delivering exceptional readability.
It is the optimal choice for a high-efficiency engine where both agents and humans communicate with clarity.

Language choice is not a matter of taste.
In Archright's flow:

Intent (high level) → Constraints (intermediate form) → Code (low level)

These three layers must remain continuously connected.

We chose a language that minimizes the cost of maintaining this connection.
Rust is powerful at answering "is this code safe?"
However, Archright addresses an earlier question:
"is this code correct in the first place?"

We chose a language that allows this question to be handled before compile-time output.
Intent and constraints are verified first, before they are converted into code.
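A toy illustration of keeping those three layers mechanically connected. The rule format and function names here are my own inventions for the sketch, not Archright's intermediate form.

```python
def to_constraints(intent: str) -> list[str]:
    """High-level intent -> intermediate constraints (toy: split a compound intent)."""
    return [part.strip() for part in intent.split(" and ")]

def to_code(constraints: list[str]) -> str:
    """Intermediate constraints -> low-level code: one explicit check per
    constraint, so every generated line traces back to a piece of intent."""
    checks = "\n".join(f"    check({c!r})" for c in constraints)
    return f"def generated_handler():\n{checks}\n"

constraints = to_constraints("validate the session and reject foreign user IDs")
print(to_code(constraints))
```

The design choice this illustrates: because the intermediate layer is explicit data, the intent-to-code connection can be checked cheaply instead of reconstructed by a human reviewer.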

3. Formal Verification: Mathematical Proof

Trying to block probabilistic failure with another probabilistic tool is mathematically hollow.
The emerging pattern of AI code review (such as Claude Review) is still a 99%-accurate AI inspecting code produced by another 99%-accurate AI.
In that setup, combined accuracy may rise to 99.99% (assuming the two models' errors are fully independent), but it can never become 100%.
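The arithmetic behind that 99.99% figure, under the generous assumption that the two reviewers' errors are fully independent:

```python
p_miss = 0.01                  # each reviewer independently misses 1% of defects
residual = p_miss * p_miss     # a defect slips through only if BOTH miss it
combined_accuracy = 1 - residual

# combined_accuracy ≈ 0.9999: better than either reviewer alone,
# yet structurally short of 1.0 no matter how many 99% reviewers you stack.
```

In practice the two models' errors are correlated (similar architectures, similar training data), so the real residual is larger than this independent-case floor.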

Most security incidents emerge precisely from that neglected 0.01% gap.
In engineering, "almost certain" is a synonym for "not certain."
Archright embeds mathematical verification and authorization tools such as the Z3 solver, Lean 4, and Cedar directly in its engine.
Instead of probabilistic comfort—"tests passed"—it proves "no exception exists" mathematically and frees engineers from review labor.

This verification is not a post-hoc review.
In Archright, it runs in this order:

  1. Convert intent into verifiable rules
  2. Validate those rules hold across all states
  3. Search automatically for counterexamples that break the rules
  4. Stop code generation when a counterexample is found
  5. Generate code only when all constraints hold

For example, take the security condition "a user must only be allowed to read their own data."
It means:
→ regardless of the incoming request,
→ a user can only ever read their own data.

If a request is sent with another user's ID:
→ it is immediately detected as a counterexample and code generation is stopped.
→ if even one violating case exists, that logic is not generated.
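Archright uses Z3/Lean-class tools for this step. As a toy stand-in, an exhaustive counterexample search over a small state space shows the shape of the gate. The names are illustrative, and a real solver searches symbolically over all states rather than by enumeration.

```python
from itertools import product

RULE = "a user must only be allowed to read their own data"

def policy(requester: str, owner: str) -> bool:
    """Candidate access rule to verify: may `requester` read `owner`'s data?"""
    return True  # buggy candidate: allows reading anyone's data

def find_counterexample(policy, users):
    # Search every (requester, owner) state for one that violates the rule.
    for requester, owner in product(users, users):
        if policy(requester, owner) and requester != owner:
            return (requester, owner)  # violating state found
    return None

users = ["alice", "bob"]
cex = find_counterexample(policy, users)
if cex is not None:
    print(f"counterexample {cex}: code generation stopped for rule {RULE!r}")
else:
    print("all constraints hold: safe to generate code")
```

Swapping in the correct policy, `requester == owner`, makes the search return no counterexample, and only then would generation proceed.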

In short, we do not "catch bugs."
We create a state where bugs cannot exist.


Reclaiming the Joy of Building

On top of these three pillars, engineers are no longer janitors cleaning up AI-generated trash.
They step away from tedious debugging and communication bottlenecks, and return as builders focused only on business-logic design and creative architecture.

AI writes code.
Archright creates a state where that code cannot be wrong.

That is exactly the new software-engineering standard Archright proposes, and the reason I began this journey.
