When a repository is submitted to IntentGuard, the pipeline's first step is one that no other code analysis tool takes.
It does not read the code.
It reads what the code was supposed to do.
That single design decision — reading intent before reading implementation — is the architectural foundation everything else is built on. I want to explain why we made it, what it requires, and what it changes about the findings you get out the other side.
The question nobody was asking automatically
Every code analysis tool in existence — static analysers, linters, security scanners, SAST platforms — starts from the same place. It reads the code and asks: what is in here? What patterns are dangerous? What vulnerabilities exist?
These are useful questions. There are excellent tools answering them.
The question none of them ever asked is: does this code do what it was designed to do?
Not "is this code clean?" Not "is this code secure?" But: does this implementation reflect the product that was specified, promised to users, committed to investors, and stated in the compliance documents?
That is a different question. And it turns out, you cannot answer it if you start from the code — because the code itself cannot tell you what it was supposed to be.
Pass 1 — Building the intent model
The first pass of the Intent Agent never receives source code. This is an architectural constraint, not a configuration option.
It receives the human-stated intent: the product description the user writes at audit time, the README, any specification documents that have been uploaded, and the repository file tree — directory structure and file names only, no content.
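The "file names only, no content" constraint can be sketched in a few lines. This is a hypothetical helper for illustration (`collect_file_tree` and the `pass_1_inputs` shape are my names, not IntentGuard's actual ingestion code):

```python
from pathlib import Path

def collect_file_tree(repo_root: str) -> list[str]:
    """Collect directory structure and file names only -- never file contents.

    Hypothetical helper illustrating the Pass 1 input constraint;
    the real ingestion code is not public.
    """
    root = Path(repo_root)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.parts
    )

# Pass 1 receives only human-stated intent plus this tree:
pass_1_inputs = {
    "product_description": "...",  # written by the user at audit time
    "readme": "...",               # README text, if present
    "specs": [],                   # uploaded specification documents
    "file_tree": collect_file_tree("."),
}
```

Note that nothing in this input set can leak implementation details beyond what a file name reveals.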
From these inputs, it constructs what we call the Intent Model — a structured representation of what this product was designed to do. What features were claimed. What non-functional properties were promised. What deployment context was assumed. What compliance obligations were stated.
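To make that concrete, here is one plausible shape for such a model. All field names and the claim-kind taxonomy are illustrative assumptions, not IntentGuard's published schema:

```python
from dataclasses import dataclass, field

@dataclass
class IntentClaim:
    """One structured claim extracted from human-stated intent.

    Field names are illustrative, not IntentGuard's actual schema.
    """
    claim_id: str
    kind: str          # e.g. "feature" | "non_functional" | "deployment" | "compliance"
    statement: str     # e.g. "All user data is processed in the EU"
    source: str        # where the claim came from: description, README, spec
    confidence: float  # how strongly the inputs support this claim

@dataclass
class IntentModel:
    product_name: str
    claims: list[IntentClaim] = field(default_factory=list)

model = IntentModel(
    product_name="example-product",
    claims=[
        IntentClaim("C-001", "compliance",
                    "All user data is processed in the EU",
                    source="product_description", confidence=0.9),
    ],
)
```

The key structural property is that each claim is individually addressable, so later passes can anchor findings to a specific `claim_id`.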
The Intent Model is the baseline. Every finding in an IntentGuard audit is anchored to a claim in the Intent Model — not a pattern in the code, not a rule in a rulebook, but a specific thing the product was supposed to do or be.
There is an important epistemic reason why Pass 1 never reads the code. If it did, it would build an intent model anchored to what the code does — and would naturally generate claims that match the implementation. That defeats the entire purpose. The intent model must come from human-stated intent, not from what the code actually contains. The gap between those two things is the product.
When the inputs are rich — a detailed description, a thorough README, uploaded specification documents — the resulting Intent Model is high confidence and highly specific. When the inputs are thin — a two-sentence description and no documentation — the Intent Model is weaker, and the audit report says so explicitly. Garbage in, limited analysis out. We tell users when this is the case rather than pretending otherwise.
Pass 2 — Comparing intent against evidence
Pass 2 receives the Intent Model. It does not send the entire codebase to a language model. Instead, it retrieves semantically relevant code chunks.
For each claim in the Intent Model, we embed the claim and retrieve the code most likely to confirm or contradict it — using vector similarity against the embedded code chunks stored at ingestion time. The model never sees the full codebase. It sees the code that is most relevant to each specific intent claim.
This matters for two reasons. First, it is faster and cheaper than full-codebase analysis. Second, and more importantly, it produces better results — because a model asked to evaluate one specific claim against relevant evidence will outperform a model given thousands of lines of unrelated code and asked to find everything wrong with it.
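The per-claim retrieval step amounts to a nearest-neighbour search over embeddings. A minimal cosine-similarity sketch, assuming chunk vectors were computed at ingestion time (the real pipeline uses a vector store; this is not its code):

```python
import numpy as np

def top_k_chunks(claim_vec, chunk_vecs, chunk_ids, k=5):
    """Return the k code chunks most similar to one intent claim.

    Sketch of the retrieval step: normalise, score by cosine
    similarity, keep the top k. Assumes embeddings already exist.
    """
    claim = claim_vec / np.linalg.norm(claim_vec)
    chunks = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = chunks @ claim                      # cosine similarity per chunk
    order = np.argsort(scores)[::-1][:k]         # highest-scoring first
    return [(chunk_ids[i], float(scores[i])) for i in order]
```

Only the chunks this returns for a given claim, plus the claim itself, go to the model, which is what keeps each evaluation narrow and cheap.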
For each intent claim, Pass 2 produces one of two finding types: a confirmation or a violation.
A confirmation means the code evidence supports the claim. The feature was implemented as stated. The architectural constraint was respected. The compliance obligation is present in the implementation.
A violation means the code contradicts the claim. The feature was stated but not implemented. The architectural constraint was declared and silently ignored. The compliance obligation exists in the spec and is absent from the code.
Both types matter. This is one of the things that makes IntentGuard structurally different from tools that only report problems — 30 to 40 percent of every audit report is confirmations, because knowing what is solid is just as useful as knowing what needs fixing. A codebase where 85 percent of intent claims are confirmed is not a failing codebase. It is a codebase with a known, bounded set of gaps. That is a very different thing to work with.
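A finding that is anchored to a claim rather than to a code pattern might look like this. Again, this is a hypothetical shape for illustration, reusing the EU-processing example from the post:

```python
from dataclasses import dataclass
from enum import Enum

class FindingType(Enum):
    CONFIRMATION = "confirmation"  # code evidence supports the claim
    VIOLATION = "violation"        # code evidence contradicts the claim

@dataclass
class Finding:
    """Anchored to an intent claim, never to a bare code pattern.

    Hypothetical schema for illustration only.
    """
    claim_id: str
    type: FindingType
    evidence: list[str]  # chunk ids that confirmed or contradicted the claim
    summary: str

findings = [
    Finding("C-001", FindingType.VIOLATION,
            evidence=["db/config.py#L12-L30"],
            summary="EU-only processing claimed; DB defaults to a US-East endpoint"),
    Finding("C-002", FindingType.CONFIRMATION,
            evidence=["auth/mfa.py#L1-L80"],
            summary="MFA requirement implemented as stated"),
]

confirmed = sum(f.type is FindingType.CONFIRMATION for f in findings)
```

Because every finding carries a `claim_id`, a report can state not just what is wrong but which promise it breaks, and the confirmation ratio falls out of the same structure.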
Why this changes what findings mean
Most security and code analysis findings are context-free. "Hardcoded credential detected at line 47" is a finding about the code. It is real and it matters.
An IntentGuard finding is different. It is a finding about the relationship between the code and the intent behind it.
"This product stated that all user data would be processed in the EU. The database connection string defaults to a US-East endpoint" is not just a configuration finding. It is an intent mismatch — the code contradicts a specific commitment that was made about the product.
That is a categorically different kind of finding. It has different stakeholders, different urgency, and different remediation logic. A developer finding the first one fixes a config. An exec or investor seeing the second one understands a business risk.
After Pass 2 completes, the Intent Model is passed to five specialist agents — Architecture, Security, Compliance, AI Governance, and Dependency — each of which independently audits the codebase against that shared baseline. None of them receive each other's outputs. All of them work from the same Intent Model.
That shared baseline is what makes the findings from different agents comparable, composable, and trustworthy.
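The fan-out is simple to picture: the same Intent Model goes to every agent, and no agent sees another's output. A sketch under those two constraints (agent names are from the post; everything else here is a stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

AGENTS = ["architecture", "security", "compliance", "ai_governance", "dependency"]

def run_agent(agent_name: str, intent_model: dict) -> dict:
    """Stand-in for one specialist agent's audit pass.

    Each agent receives only the shared Intent Model -- never a
    peer's output -- so its findings are independently anchored.
    """
    return {"agent": agent_name,
            "baseline": intent_model["product_name"],
            "findings": []}

def fan_out(intent_model: dict) -> list[dict]:
    # Agents run independently against the same baseline.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        return list(pool.map(lambda a: run_agent(a, intent_model), AGENTS))

reports = fan_out({"product_name": "example-product"})
```

Independence is the point of this shape: because no agent conditions on another, agreement between agents is evidence, not echo.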
The part that surprised us most
When we started running audits on AI-generated codebases, we expected to find security issues. We expected to find dependency vulnerabilities. We expected to find compliance gaps.
What we did not expect was how consistent the intent drift pattern was.
Codebases built with AI coding assistants — Cursor, Copilot, Claude, Gemini — tend to implement features correctly in isolation. Individual functions work. Tests pass. The CI pipeline is green.
But over iterations, the implementation drifts from the intent. Architectural constraints that were stated in the original design are quietly reversed by an AI assistant that did not have that context. Compliance obligations that were present in the product description are absent from the implementation because they were never included in a prompt. Data flows that were specified as EU-only end up routing through US infrastructure because the assistant made a sensible default choice without knowing the regulatory requirement.
None of this shows up in a security scan. None of it triggers a linting rule. It only surfaces when you compare the code against the intent — which is exactly what the two-pass pipeline was designed to do.
Building IntentGuard in public from Johannesburg 🇿🇦. If you are thinking about the intent-vs-implementation gap in AI-generated codebases, or have questions about the retrieval architecture, I would like to hear from you in the comments.
Olebeng · Founder, IntentGuard · intentguard.dev