When a repository is submitted to IntentGuard, the pipeline's first step is one that no other code analysis tool takes.
It does not read the code.
It reads what the code was supposed to do.
That single design decision — reading intent before reading implementation — is the architectural foundation everything else is built on. I want to explain why we made it, what it requires, and what it changes about the findings you get out the other side.
The question nobody was asking automatically
Every code analysis tool in existence — static analysers, linters, security scanners, SAST platforms — starts from the same place. It reads the code and asks: what is in here? What patterns are dangerous? What vulnerabilities exist?
These are useful questions. There are excellent tools answering them.
The question none of them ever asked is: does this code do what it was designed to do?
Not "is this code clean?" Not "is this code secure?" But: does this implementation reflect the product that was specified, promised to users, committed to investors, and stated in the compliance documents?
That is a different question. And it turns out, you cannot answer it if you start from the code — because the code itself cannot tell you what it was supposed to be.
Pass 1 — Building the intent model
The first pass of the Intent Agent never receives source code. This is an architectural constraint, not a configuration option.
It receives the human-stated intent: the product description the user writes at audit time, the README, any specification documents that have been uploaded, and the repository file tree — directory structure and file names only, no content.
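The "file names only, no content" constraint can be sketched in a few lines. This is a hypothetical helper for illustration (`collect_file_tree` and the `pass_1_inputs` shape are my names, not IntentGuard's actual ingestion code):

```python
from pathlib import Path

def collect_file_tree(repo_root: str) -> list[str]:
    """Collect directory structure and file names only -- never file contents.

    Hypothetical helper illustrating the Pass 1 input constraint;
    the real ingestion code is not public.
    """
    root = Path(repo_root)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.parts
    )

# Pass 1 receives only human-stated intent plus this tree:
pass_1_inputs = {
    "product_description": "...",  # written by the user at audit time
    "readme": "...",               # README text, if present
    "specs": [],                   # uploaded specification documents
    "file_tree": collect_file_tree("."),
}
```

Note that nothing in this input set can leak implementation details beyond what a file name reveals.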
From these inputs, it constructs what we call the Intent Model — a structured representation of what this product was designed to do. What features were claimed. What non-functional properties were promised. What deployment context was assumed. What compliance obligations were stated.
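To make that concrete, here is one plausible shape for such a model. All field names and the claim-kind taxonomy are illustrative assumptions, not IntentGuard's published schema:

```python
from dataclasses import dataclass, field

@dataclass
class IntentClaim:
    """One structured claim extracted from human-stated intent.

    Field names are illustrative, not IntentGuard's actual schema.
    """
    claim_id: str
    kind: str          # e.g. "feature" | "non_functional" | "deployment" | "compliance"
    statement: str     # e.g. "All user data is processed in the EU"
    source: str        # where the claim came from: description, README, spec
    confidence: float  # how strongly the inputs support this claim

@dataclass
class IntentModel:
    product_name: str
    claims: list[IntentClaim] = field(default_factory=list)

model = IntentModel(
    product_name="example-product",
    claims=[
        IntentClaim("C-001", "compliance",
                    "All user data is processed in the EU",
                    source="product_description", confidence=0.9),
    ],
)
```

The key structural property is that each claim is individually addressable, so later passes can anchor findings to a specific `claim_id`.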
The Intent Model is the baseline. Every finding in an IntentGuard audit is anchored to a claim in the Intent Model — not a pattern in the code, not a rule in a rulebook, but a specific thing the product was supposed to do or be.
There is an important epistemic reason why Pass 1 never reads the code. If it did, it would build an intent model anchored to what the code does — and would naturally generate claims that match the implementation. That defeats the entire purpose. The intent model must come from human-stated intent, not from what the code actually contains. The gap between those two things is the product.
When the inputs are rich — a detailed description, a thorough README, uploaded specification documents — the resulting Intent Model is high confidence and highly specific. When the inputs are thin — a two-sentence description and no documentation — the Intent Model is weaker, and the audit report says so explicitly. Garbage in, limited analysis out. We tell users when this is the case rather than pretending otherwise.
Pass 2 — Comparing intent against evidence
Pass 2 receives the Intent Model. It does not send the entire codebase to a language model. Instead, it retrieves semantically relevant code chunks.
For each claim in the Intent Model, we embed the claim and retrieve the code most likely to confirm or contradict it — using vector similarity against the embedded code chunks stored at ingestion time. The model never sees the full codebase. It sees the code that is most relevant to each specific intent claim.
This matters for two reasons. First, it is faster and cheaper than full-codebase analysis. Second, and more importantly, it produces better results — because a model asked to evaluate one specific claim against relevant evidence will outperform a model given thousands of lines of unrelated code and asked to find everything wrong with it.
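The per-claim retrieval step amounts to a nearest-neighbour search over embeddings. A minimal cosine-similarity sketch, assuming chunk vectors were computed at ingestion time (the real pipeline uses a vector store; this is not its code):

```python
import numpy as np

def top_k_chunks(claim_vec, chunk_vecs, chunk_ids, k=5):
    """Return the k code chunks most similar to one intent claim.

    Sketch of the retrieval step: normalise, score by cosine
    similarity, keep the top k. Assumes embeddings already exist.
    """
    claim = claim_vec / np.linalg.norm(claim_vec)
    chunks = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = chunks @ claim                      # cosine similarity per chunk
    order = np.argsort(scores)[::-1][:k]         # highest-scoring first
    return [(chunk_ids[i], float(scores[i])) for i in order]
```

Only the chunks this returns for a given claim, plus the claim itself, go to the model, which is what keeps each evaluation narrow and cheap.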
For each intent claim, Pass 2 produces one of two finding types: a confirmation or a violation.
A confirmation means the code evidence supports the claim. The feature was implemented as stated. The architectural constraint was respected. The compliance obligation is present in the implementation.
A violation means the code contradicts the claim. The feature was stated but not implemented. The architectural constraint was declared and silently ignored. The compliance obligation exists in the spec and is absent from the code.
Both types matter. This is one of the things that makes IntentGuard structurally different from tools that only report problems — 30 to 40 percent of every audit report is confirmations, because knowing what is solid is just as useful as knowing what needs fixing. A codebase where 85 percent of intent claims are confirmed is not a failing codebase. It is a codebase with a known, bounded set of gaps. That is a very different thing to work with.
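A finding that is anchored to a claim rather than to a code pattern might look like this. Again, this is a hypothetical shape for illustration, reusing the EU-processing example from the post:

```python
from dataclasses import dataclass
from enum import Enum

class FindingType(Enum):
    CONFIRMATION = "confirmation"  # code evidence supports the claim
    VIOLATION = "violation"        # code evidence contradicts the claim

@dataclass
class Finding:
    """Anchored to an intent claim, never to a bare code pattern.

    Hypothetical schema for illustration only.
    """
    claim_id: str
    type: FindingType
    evidence: list[str]  # chunk ids that confirmed or contradicted the claim
    summary: str

findings = [
    Finding("C-001", FindingType.VIOLATION,
            evidence=["db/config.py#L12-L30"],
            summary="EU-only processing claimed; DB defaults to a US-East endpoint"),
    Finding("C-002", FindingType.CONFIRMATION,
            evidence=["auth/mfa.py#L1-L80"],
            summary="MFA requirement implemented as stated"),
]

confirmed = sum(f.type is FindingType.CONFIRMATION for f in findings)
```

Because every finding carries a `claim_id`, a report can state not just what is wrong but which promise it breaks, and the confirmation ratio falls out of the same structure.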
Why this changes what findings mean
Most security and code analysis findings are context-free. "Hardcoded credential detected at line 47" is a finding about the code. It is real and it matters.
An IntentGuard finding is different. It is a finding about the relationship between the code and the intent behind it.
"This product stated that all user data would be processed in the EU. The database connection string defaults to a US-East endpoint" is not just a configuration finding. It is an intent mismatch — the code contradicts a specific commitment that was made about the product.
That is a categorically different kind of finding. It has different stakeholders, different urgency, and different remediation logic. A developer finding the first one fixes a config. An exec or investor seeing the second one understands a business risk.
After Pass 2 completes, the Intent Model is passed to five specialist agents — Architecture, Security, Compliance, AI Governance, and Dependency — each of which independently audits the codebase against that shared baseline. None of them receive each other's outputs. All of them work from the same Intent Model.
That shared baseline is what makes the findings from different agents comparable, composable, and trustworthy.
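The fan-out is simple to picture: the same Intent Model goes to every agent, and no agent sees another's output. A sketch under those two constraints (agent names are from the post; everything else here is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

AGENTS = ["architecture", "security", "compliance", "ai_governance", "dependency"]

def run_agent(agent_name: str, intent_model: dict) -> dict:
    """Stand-in for one specialist agent's audit pass.

    Each agent receives only the shared Intent Model -- never a
    peer's output -- so its findings are independently anchored.
    """
    return {"agent": agent_name,
            "baseline": intent_model["product_name"],
            "findings": []}

def fan_out(intent_model: dict) -> list[dict]:
    # Agents run independently against the same baseline.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        return list(pool.map(lambda a: run_agent(a, intent_model), AGENTS))

reports = fan_out({"product_name": "example-product"})
```

Independence is the point of this shape: because no agent conditions on another, agreement between agents is evidence, not echo.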
The part that surprised us most
When we started running audits on AI-generated codebases, we expected to find security issues. We expected to find dependency vulnerabilities. We expected to find compliance gaps.
What we did not expect was how consistent the intent drift pattern was.
Codebases built with AI coding assistants — Cursor, Copilot, Claude, Gemini — tend to implement features correctly in isolation. Individual functions work. Tests pass. The CI pipeline is green.
But over iterations, the implementation drifts from the intent. Architectural constraints that were stated in the original design are quietly reversed by an AI assistant that did not have that context. Compliance obligations that were present in the product description are absent from the implementation because they were never included in a prompt. Data flows that were specified as EU-only end up routing through US infrastructure because the assistant made a sensible default choice without knowing the regulatory requirement.
None of this shows up in a security scan. None of it triggers a linting rule. It only surfaces when you compare the code against the intent — which is exactly what the two-pass pipeline was designed to do.
Building IntentGuard in public from Johannesburg 🇿🇦. If you are thinking about the intent-vs-implementation gap in AI-generated codebases, or have questions about the retrieval architecture, I would like to hear from you in the comments.
Olebeng · Founder, IntentGuard · intentguard.dev