Enyi Emmanuel

Posted on May 30

Building a Verification-First AI Coding Agent: Why I Abandoned "Generate-and-Pray"

#ai #programming #go #opensource

In the race to build the ultimate AI coding assistant, the industry has settled on a shared, deeply flawed paradigm. Let’s call it Generate-and-Pray.

Whether you are using Cursor, GitHub Copilot, Cline, or custom wrapper scripts, the flow is identical:

1. You prompt the LLM.
2. The LLM generates a code patch.
3. The tool writes that patch directly to your filesystem.
4. You, the human, are forced to be the verification layer. You review the diff, run the compiler, catch hallucinated package imports, execute the test suite, and roll back when things inevitably blow up.

This is chaotic, exhausting, and unsafe.

I wanted an assistant that acts like a senior engineer: someone who tests and compiles their code before showing it to me. So I built Kode, a contrarian, verification-first AI coding agent.

Here is why we need to shift from generation to verification, and the engineering details of how Kode does it.

The Thesis: No Generation Without Verification

Kode is built on a simple rule: The LLM is the generative engine, but a local Go orchestrator is the security layer.

Every time the model generates a patch, it passes through a static, pre-compiled Go binary (kode.exe) that executes 9 deterministic verification gates in under 50 milliseconds before a single byte touches your active filesystem. If a gate fails, the patch is rejected, and the compiler-grade error is fed back to the LLM to self-correct.

                  ┌─────────────────────────┐
                  │      User Prompt        │
                  └────────────┬────────────┘
                               ▼
                  ┌─────────────────────────┐
                  │  LLM Generates Patch    │
                  └────────────┬────────────┘
                               ▼
                  ┌─────────────────────────┐
                  │  9 Verification Gates   │◀───┐ (Self-Correction Loop)
                  └────────────┬────────────┘    │
                               │                 │
                      [Pass]?  ├─(No)────────────┘
                               │
                             (Yes)
                               ▼
                  ┌─────────────────────────┐
                  │    Write to Filesystem  │
                  └─────────────────────────┘

Because the safety checks shift left, directly into the editor, the user is never the debugger.
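The generate, verify, retry loop in the diagram can be sketched in a few lines of Go. This is a minimal illustration, not Kode's actual source: `Patch`, `Gate`, `Generator`, `runGates`, and `verifyLoop` are hypothetical names, the generator is a stub standing in for the LLM call, and the syntax gate uses the standard library's `go/parser` as a stand-in for the Tree-sitter parse.

```go
package main

import (
	"errors"
	"fmt"
	"go/parser"
	"go/token"
)

// Patch is a proposed change to a single file (hypothetical type).
type Patch struct {
	Path    string
	Content string
}

// Gate is one deterministic check; a non-nil error blocks the patch.
type Gate struct {
	Name  string
	Check func(Patch) error
}

// syntaxGate uses go/parser as a stand-in for a Tree-sitter parse.
func syntaxGate(p Patch) error {
	_, err := parser.ParseFile(token.NewFileSet(), p.Path, p.Content, parser.AllErrors)
	return err
}

// runGates returns the first failure, or nil if every gate passes.
func runGates(gates []Gate, p Patch) error {
	for _, g := range gates {
		if err := g.Check(p); err != nil {
			return fmt.Errorf("%s gate: %w", g.Name, err)
		}
	}
	return nil
}

// Generator stands in for the LLM call; feedback carries the last gate error.
type Generator func(prompt, feedback string) Patch

// verifyLoop regenerates until a patch passes or attempts run out.
// Only a passing patch is returned, so only a passing patch can be written.
func verifyLoop(gen Generator, gates []Gate, prompt string, maxAttempts int) (Patch, error) {
	feedback := ""
	for i := 0; i < maxAttempts; i++ {
		p := gen(prompt, feedback)
		err := runGates(gates, p)
		if err == nil {
			return p, nil
		}
		feedback = err.Error() // fed back for self-correction
	}
	return Patch{}, errors.New("no patch passed verification: " + feedback)
}

func main() {
	// Stub generator: broken on the first try, fixed once it sees feedback.
	gen := func(prompt, feedback string) Patch {
		if feedback == "" {
			return Patch{Path: "a.go", Content: "package a\nfunc f( {"}
		}
		return Patch{Path: "a.go", Content: "package a\nfunc f() {}\n"}
	}
	gates := []Gate{{Name: "syntax", Check: syntaxGate}}
	p, err := verifyLoop(gen, gates, "add f", 3)
	fmt.Println(p.Path, err)
}
```

The point of the shape is that the write step sits outside the loop: nothing reaches the filesystem unless `verifyLoop` returns without an error.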

Under the Hood: The 9 Verification Gates

To make pre-write verification viable, checks must run near-instantaneously. Here is how the compiled Go engine enforces safety:

AST Syntax Gate: Parses modified files with the official Tree-sitter bindings for precise AST parsing, falling back to regex heuristics when CGo is unavailable. Any parse error is a hard block.
Imports Gate: Cross-references every generated import path against the local dependency graph. No more hallucinated npm or Go packages.
Calls Gate: Validates that function and method call sites map to real, existing symbols with matching signatures.
Blast Radius Gate: Walks the dependency graph backward. If the patch affects more files downstream than your threshold allows, it's blocked.
Architecture Gate: Enforces module boundaries (e.g. database layers are blocked from importing route handlers).
Security Gate (SAST): Runs a compiled local SAST engine over the AST to block SQL injection, XSS, and hardcoded credentials.
Sandbox Replay Gate: Ephemerally executes code in a CPU-bounded sandbox to trap infinite loops, memory leaks, and rogue sockets.
QR Code Tunnel Gate: Boots a secure public dev tunnel for local web servers and prints a QR code in your terminal so you can preview layout changes instantly on your phone.
Browser E2E Gate: Generates and runs headless Playwright scripts on your dev server, capturing UI recordings and rolling back if console errors are caught.
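To make two of these gates concrete, here is a rough sketch of how an imports check and a blast-radius check could work. It is an illustration under assumptions, not Kode's implementation: `importsGate` and `blastRadius` are hypothetical names, the `known` set stands in for the local dependency graph, and `go/parser` again stands in for Tree-sitter.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsGate rejects source that imports a path missing from known.
// known stands in for the local dependency graph (an assumption).
func importsGate(src string, known map[string]bool) error {
	f, err := parser.ParseFile(token.NewFileSet(), "patch.go", src, parser.ImportsOnly)
	if err != nil {
		return err
	}
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return err
		}
		if !known[path] {
			return fmt.Errorf("unknown import %q", path)
		}
	}
	return nil
}

// blastRadius counts the files that transitively depend on the changed files.
// dependents maps a file to the files that import it (the reversed graph).
func blastRadius(dependents map[string][]string, changed []string) int {
	seen := map[string]bool{}
	for _, c := range changed {
		seen[c] = true
	}
	queue := append([]string(nil), changed...)
	affected := 0
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, d := range dependents[cur] {
			if !seen[d] {
				seen[d] = true
				affected++
				queue = append(queue, d)
			}
		}
	}
	return affected
}

func main() {
	known := map[string]bool{"fmt": true}
	fmt.Println(importsGate("package a\nimport \"left-pad-pro\"\n", known))

	deps := map[string][]string{
		"db.go":   {"repo.go"},
		"repo.go": {"handler.go", "jobs.go"},
	}
	n := blastRadius(deps, []string{"db.go"})
	fmt.Println(n, n > 2) // with a threshold of 2, this patch would be blocked
}
```

Both checks are plain lookups and graph walks over local data, which is what makes them deterministic and cheap enough to run before every write.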

3 Killer Features No Incumbent Offers

Building a verification engine opened the door to capabilities that standard extension wrappers simply cannot implement:

1. Ghost Branches (Survival of the Fittest)

Why run one prompt when you can run three? Kode can spawn parallel git worktrees (Ghost Branches) to explore different implementation paths. Each path runs through the verification pipeline and the test suite. Kode then scores the results and automatically merges the highest-scoring candidate back into your workspace.
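The selection step can be sketched as a concurrent fan-out followed by a pick-the-best pass. This is only an illustration: `Result`, `Candidate`, and `pickBest` are hypothetical names, the scores are made up, and the worktree creation and merge steps are left out entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is the outcome of one candidate branch (hypothetical shape).
type Result struct {
	Branch string
	Passed bool    // all gates and tests passed
	Score  float64 // higher is better; the scoring formula is an assumption
}

// Candidate builds, verifies, and scores one implementation path.
type Candidate func() Result

// pickBest runs every candidate concurrently and returns the
// highest-scoring one that passed; ok is false if none passed.
func pickBest(cands []Candidate) (best Result, ok bool) {
	results := make([]Result, len(cands))
	var wg sync.WaitGroup
	for i, c := range cands {
		wg.Add(1)
		go func(i int, c Candidate) {
			defer wg.Done()
			results[i] = c() // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	for _, r := range results {
		if r.Passed && (!ok || r.Score > best.Score) {
			best, ok = r, true
		}
	}
	return best, ok
}

func main() {
	cands := []Candidate{
		func() Result { return Result{Branch: "ghost/a", Passed: true, Score: 0.72} },
		func() Result { return Result{Branch: "ghost/b", Passed: false, Score: 0.95} }, // failed a gate
		func() Result { return Result{Branch: "ghost/c", Passed: true, Score: 0.88} },
	}
	best, ok := pickBest(cands)
	fmt.Println(best.Branch, ok)
}
```

Note that a candidate which fails verification is never eligible, however high its score: the gates filter first and the score only ranks the survivors.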

2. Blindfold Mode (Enterprise Privacy)

For corporate developers, sending proprietary code to third-party LLMs is a compliance nightmare. Blindfold Mode parses your code into an AST locally and replaces every identifier (variable names, types, functions, packages) with a SHA-256-derived token before any payload leaves your machine. A local mapping table translates the tokens back when the response arrives. The cloud model sees your code's logic, but never its intellectual property.
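A simplified version of the idea, for Go source only, might look like the sketch below. The `Blindfold` type, the `id_` prefix, and the 12-hex-digit truncation are all assumptions made for illustration, and a token scan with `go/scanner` stands in for a full AST pass. It also masks predeclared names such as `int` and leaves comments and string literals untouched, which a real implementation would need to handle.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// Blindfold holds the alias-to-identifier table; it stays on the local machine.
type Blindfold struct {
	reverse map[string]string
}

// alias derives a stable placeholder from the SHA-256 of the name.
// The "id_" prefix and truncation to 6 bytes are assumptions.
func (b *Blindfold) alias(name string) string {
	sum := sha256.Sum256([]byte(name))
	a := "id_" + hex.EncodeToString(sum[:6])
	b.reverse[a] = name
	return a
}

// Obfuscate replaces every identifier token in Go source with its alias.
func (b *Blindfold) Obfuscate(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	var out strings.Builder
	last := 0
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.IDENT {
			off := file.Offset(pos)
			out.WriteString(src[last:off]) // copy everything up to the identifier
			out.WriteString(b.alias(lit))
			last = off + len(lit)
		}
	}
	out.WriteString(src[last:])
	return out.String()
}

// Restore maps aliases in the model's response back to the real names.
func (b *Blindfold) Restore(text string) string {
	for a, name := range b.reverse {
		text = strings.ReplaceAll(text, a, name)
	}
	return text
}

func main() {
	b := &Blindfold{reverse: map[string]string{}}
	src := "func chargeCustomer(invoiceTotal int) int { return invoiceTotal * 2 }"
	masked := b.Obfuscate(src)
	fmt.Println(masked)
	fmt.Println(b.Restore(masked) == src)
}
```

Because the alias is a pure function of the name, the same identifier always maps to the same token, so the model can still follow data flow across the masked code.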

3. Hands-Free Voice Programming (`kode voice`)

No typing required. Just run kode voice, speak your task, and the local mic captures and transcribes it using Whisper. The text is immediately fed into the Plan-Generate-Verify pipeline.

Open Source Licensing: The MIT + AGPLv3 Hybrid Model

To protect against SaaS wrappers while retaining enterprise-friendly local execution, Kode adopts a dual-license model:

MIT License: The core developer tooling (CLI, TUI, internal modules, and web app) is fully permissive.
AGPLv3 License: The cloud-ready LLM gateway and routing proxy server (cmd/gateway/ and internal/gateway/) require any hosted SaaS wrappers to open-source their orchestration code.

Getting Started

Kode is a Bring Your Own Key (BYOK) platform. It compiles to a lightweight ~10MB Go binary with zero external runtime dependencies.

Installation

macOS / Linux:

  curl -fsSL https://raw.githubusercontent.com/sicario-labs/kode/master/script/install.sh | bash

Windows (PowerShell):

  irm https://raw.githubusercontent.com/sicario-labs/kode/master/script/install.ps1 | iex

Termux (Android): build from source on ARM64:

  pkg install golang nodejs git clang make
  go build -o bin/kode ./cmd/kode
  cd third_party/opencode && npm install

Once installed, scaffold your configuration with:

  kode init

And start a task loop:

  kode loop "add JWT validation to the login route"

Check out the full repository and contribute at github.com/sicario-labs/kode. We'd love to hear your thoughts on shifting the AI coding paradigm from generation to verification!

Top comments (4)

Harjot Singh • May 31

"Generate-and-pray" is the perfect name for the dominant (broken) pattern, and "verification-first" is exactly the right correction - it inverts the default so the agent has to prove the code works before it's accepted, instead of you discovering it doesn't three steps later. The hard design question I'd love your take on: what counts as verification? Tests are the obvious one, but tests the agent also wrote can be a closed loop of self-delusion (it writes the test to pass its own bug). The strongest setups mix agent-authored checks with deterministic ones the agent can't game.

This is the exact architecture I bet Moonshift on (a multi-agent pipeline shipping a prompt to a real SaaS on your own GitHub + Vercel) - generate, then gate against verification the generator doesn't control, so a bad step fails loud at the boundary instead of propagating. Verification-first is the whole difference between a demo and something shippable. ~$3 flat per build, first run free. Really aligned - how do you stop the agent from gaming its own tests? That's the failure mode I find hardest to fully close.

Enyi Emmanuel • May 31

Spot on, Harjot. That closed loop of self-delusion is the exact reason typical agents fail in production. If the LLM generates the test, it will just generate a test that perfectly validates its own hallucination.

Moonshift's approach of gating against external boundaries like GitHub or Vercel is brilliant for the deployment side. With Kode, we took that exact same philosophy and pushed it all the way left, directly into the local compiler step.

To completely close the failure mode of the agent gaming its own tests, we took the gating mechanism out of the LLM's hands entirely.

Instead of relying on agent-authored checks, Kode intercepts the generated code and forces it through a deterministic Go engine via a 9-Gate Verification Pipeline before it ever touches the main development branch. Because these gates are structural and mathematical, the agent physically cannot game them.

For example, with our AST Syntax and Calls Gates, we parse the code down to Tree-sitter Abstract Syntax Trees locally. If the LLM hallucinates a function, the Calls Gate verifies it against the actual local codebase symbols and blocks it instantly. It can't fake a symbol that doesn't exist.

Our Imports Gate cross-references the local dependency graph. If the model hallucinates an npm package, the gate fails it.

Similarly, our Architecture Gate mathematically enforces package boundaries, like stopping a database layer from importing a routing layer.

By shifting the validation away from LLM-authored tests and into a compiled static analysis pipeline, we strip the agent of its ability to mark its own homework. It becomes a pure mathematical pass/fail.

TxDesk • May 31

Generate-and-pray is the right name and the 9-gate pipeline is the right answer for code. The thing I keep getting stuck on in my domain (DeFi support agents reading on-chain state) is that verification-first only works when there's a deterministic verifier on the other end.

For code you have it: AST parsers don't disagree, compilers don't equivocate, tests pass or fail. The blast radius gate, the AST syntax gate, the imports gate - these all have ground truth.

For on-chain state the verifier is the chain, which is technically deterministic but practically not: the RPC you query might be 4 blocks behind, the mempool has pending state that affects your simulation, the protocol's own view (Aave's oracle, Compound's getPrice) diverges from CoinGecko by enough to flip a liquidation decision. So you build a verifier and now you have two answers and have to decide which one is the truth.

The pattern I landed on is similar to your self-correction loop but with multi-source agreement instead of single-gate pass/fail. The agent's interpretation has to match the protocol's own ABI-decoded view AND a recent eth_call against the live contract. Disagreement isn't a failure - it's the high-confidence signal that something is off and the user needs to see both readings.

Ghost Branches is clever. Curious whether you've thought about a similar pattern for verification itself - running the same patch through different verifier implementations and treating disagreement as a signal rather than picking a winner.

Self-Correcting Systems • Jun 1

The 9-gate pipeline solves verification at the right layer: deterministic, pre-write, applied to the output before it touches the filesystem. TxDesk's point is the hard edge: verification only holds when the verifier has ground truth to check against.

There's a different layer where the same problem appears, upstream of generation rather than downstream: the instructions that shaped what the model generated. The pipeline verifies the output is structurally sound. It doesn't verify that the instruction governing the output was authorized to govern it.

An LLM acting on a superseded policy ("always use JWT" when the team moved to OAuth six months ago) can generate code that passes all 9 gates cleanly. Valid AST, resolved imports, real symbols, architecture boundaries intact, blast radius within threshold. Perfect verification score. Wrong code. Not because generation failed but because retrieval pulled the wrong instruction and nothing in the verification layer had visibility into that.

I've been running experiments on this in agent memory systems: retrieval layers that optimize for relevance to the query rather than authority to govern the action. The finding: a retriever can select a memory that's technically correct, and the agent acts on it confidently, and the downstream output is structurally valid, and the action is still wrong because the instruction didn't have the authority metadata to say "this governs this class of action."

Ghost Branches is the right instinct pushed one layer further: run multiple instruction-retrieval paths alongside multiple generation paths, and treat disagreement between retrieved instructions as a signal before generation even starts.