<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Olebeng</title>
    <description>The latest articles on DEV Community by Olebeng (@intentguard_ole).</description>
    <link>https://dev.to/intentguard_ole</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829729%2F1de6e173-176a-4ddd-abbc-2ab5e3ebb962.jpg</url>
      <title>DEV Community: Olebeng</title>
      <link>https://dev.to/intentguard_ole</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/intentguard_ole"/>
    <language>en</language>
    <item>
      <title>Why running every compliance framework on every codebase is wrong - and how we fixed it</title>
      <dc:creator>Olebeng</dc:creator>
      <pubDate>Tue, 14 Apr 2026 06:59:33 +0000</pubDate>
      <link>https://dev.to/intentguard_ole/why-running-every-compliance-framework-on-every-codebase-is-wrong-and-how-we-fixed-it-4g40</link>
      <guid>https://dev.to/intentguard_ole/why-running-every-compliance-framework-on-every-codebase-is-wrong-and-how-we-fixed-it-4g40</guid>
      <description>&lt;p&gt;When we first built the compliance agent in IntentGuard, it ran every framework against every codebase.&lt;/p&gt;

&lt;p&gt;The result was technically thorough and practically useless.&lt;/p&gt;

&lt;p&gt;A Go REST API with no payment processing was being evaluated against PCI DSS. A Python data pipeline with no personal data handling was generating GDPR findings. A non-AI internal tool was receiving EU AI Act violations as its most prominent output.&lt;/p&gt;

&lt;p&gt;The findings were not wrong, exactly. They were irrelevant. And in audit contexts, irrelevant findings are worse than no findings - they train reviewers to ignore output, which is the opposite of what you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem with framework-agnostic scanning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most compliance tools apply frameworks uniformly. You select the frameworks you want evaluated, and the tool checks the codebase against all of them equally. This approach has a surface-level logic to it - better to check too much than too little.&lt;/p&gt;

&lt;p&gt;The problem is that compliance frameworks are not generic. PCI DSS applies to systems that process payment card data. HIPAA applies to systems handling protected health information. DORA - the EU's Digital Operational Resilience Act - applies to financial sector entities providing ICT services. Running these frameworks against a codebase that does not fall within their scope produces noise, not signal.&lt;/p&gt;

&lt;p&gt;Worse: when a finding from an inapplicable framework appears at the same severity as a finding from an applicable one, the auditor has to mentally filter. That filtering work defeats the purpose of automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How we addressed it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before any LLM call, we now run a deterministic classification step. It reads the intent model — the structured representation of what the product was designed to do — and classifies each framework as applicable or not applicable based on what the codebase actually is.&lt;/p&gt;

&lt;p&gt;The classification is deterministic: no probability, no inference, no LLM. It looks for specific signals in the product description and inferred architecture. A codebase described as processing financial account data and using PCI DSS-relevant patterns gets PCI DSS evaluated. One that does not, does not.&lt;/p&gt;

&lt;p&gt;When a framework is not applicable, the compliance agent is instructed to produce a single informational finding: "[Framework] — Not applicable to this codebase." Not a critical violation. Not a high severity gap. An informational acknowledgement that the framework was considered and excluded.&lt;/p&gt;

&lt;p&gt;The result is a compliance grid that reflects the codebase's actual regulatory context — not a generic checklist applied uniformly to everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for the findings you get&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Five frameworks are universal — they apply to every codebase regardless of type: ISO 27001, SOC 2, OWASP ASVS L2, NIST CSF, and CIS Controls v8. &lt;/p&gt;

&lt;p&gt;These are the baseline for any modern software system.&lt;/p&gt;

&lt;p&gt;The remaining eleven frameworks are conditional. GDPR activates on personal data handling. DORA activates on financial sector context. HIPAA activates on health data signals. OWASP API Top 10 activates on REST or GraphQL API patterns.&lt;/p&gt;
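&lt;p&gt;A minimal sketch of how that split can stay deterministic (the framework names come from this post; the signal strings and the function itself are hypothetical, not IntentGuard's actual code):&lt;/p&gt;

```python
# Illustrative sketch: universal frameworks always apply; conditional
# frameworks activate only when their signal appears in the intent model.
# Signal names and the function are hypothetical.

UNIVERSAL = {"ISO 27001", "SOC 2", "OWASP ASVS L2", "NIST CSF", "CIS Controls v8"}

# Each conditional framework mapped to the intent-model signal that activates it.
CONDITIONAL = {
    "PCI DSS": "payment_card_data",
    "GDPR": "personal_data",
    "HIPAA": "health_data",
    "DORA": "financial_sector",
    "OWASP API Top 10": "exposes_api",
}

def scope_frameworks(intent_signals):
    """intent_signals: set of signal strings. Returns (applicable, informational)."""
    applicable = sorted(UNIVERSAL)
    informational = []
    # Iterate in sorted order so the output never depends on dict ordering.
    for framework, signal in sorted(CONDITIONAL.items()):
        if signal in intent_signals:
            applicable.append(framework)
        else:
            informational.append(framework + " - Not applicable to this codebase.")
    return applicable, informational
```

&lt;p&gt;The point of the sketch is that this is a pure lookup: the same intent signals always produce the same grid.&lt;/p&gt;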

&lt;p&gt;This means an IT auditor reviewing a financial services platform gets a compliance grid dominated by the frameworks that matter to their client — not one where ISO 42001 and EU AI Act appear at the top because those happen to be in the list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The scope question&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The obvious challenge with deterministic scoping is edge cases. A codebase that does not explicitly declare payment processing but accepts card numbers through a generic input handler would not trigger PCI DSS through intent model signals alone — it would surface through the Security Agent's findings instead.&lt;/p&gt;

&lt;p&gt;This is by design. The scoping step uses the intent model, which comes from the product description the user provides. If the description is accurate, the scoping is accurate. If the description is incomplete, the user is told the confidence is low and prompted to provide more context.&lt;/p&gt;

&lt;p&gt;The Security Agent, the Dependency Agent, and the Architecture Agent all run regardless of framework scoping. A PCI DSS-relevant vulnerability will still appear as a security finding even if PCI DSS framework evaluation is scoped out. The framework compliance grid and the security finding list are separate outputs from separate agents.&lt;/p&gt;

&lt;p&gt;Building IntentGuard in public from Johannesburg. If you have worked on compliance tooling and have thoughts on the framework scoping problem — particularly around edge cases — I would like to hear them in the comments.&lt;/p&gt;

&lt;p&gt;The concepts discussed are my own; the presentation and formatting of this post were enhanced by an AI assistant.&lt;/p&gt;

&lt;p&gt;Olebeng · Founder, IntentGuard · &lt;a href="https://intentguard.dev/" rel="noopener noreferrer"&gt;intentguard.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>security</category>
      <category>grc</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Why we only accept .txt for document uploads - and why that is the right call for now</title>
      <dc:creator>Olebeng</dc:creator>
      <pubDate>Mon, 06 Apr 2026 16:45:26 +0000</pubDate>
      <link>https://dev.to/intentguard_ole/why-we-only-accept-txt-for-document-uploads-and-why-that-is-the-right-call-for-now-4j5k</link>
      <guid>https://dev.to/intentguard_ole/why-we-only-accept-txt-for-document-uploads-and-why-that-is-the-right-call-for-now-4j5k</guid>
      <description>&lt;p&gt;IntentGuard lets users upload specification documents alongside their repository when submitting an audit. The Intent Agent uses these documents — a product requirements document, an architecture spec, an API reference — to build a higher-confidence model of what the codebase was supposed to do before reading a single line of code.&lt;/p&gt;

&lt;p&gt;Currently, we only accept .txt files.&lt;/p&gt;

&lt;p&gt;Every few days someone asks why. The honest answer is worth a post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF is not a text format&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you open a PDF in a viewer, you see clean, readable text. What the viewer is actually doing is interpreting a stream of rendering instructions — glyph positions, font mappings, coordinate transforms — and reconstructing what looks like text from absolute positions on a page.&lt;br&gt;
pdfminer.six, the standard Python library for PDF text extraction, reverses this process. It reads the rendering instructions, maps glyphs to Unicode characters using whatever font encoding the PDF creator chose, and attempts to reconstruct reading order from the x/y coordinates of each glyph.&lt;/p&gt;

&lt;p&gt;This works well for simple, single-column, machine-generated PDFs. For anything more complex — multi-column layouts, tables, scanned documents, PDFs exported from tools that embed fonts as bitmaps — the extracted text can look plausible while being subtly corrupted. Column order gets swapped. Table cells merge. Headers appear in the middle of paragraphs.&lt;br&gt;
Corrupted structure passed to an intent analysis pipeline does not produce an obvious error. It produces quietly wrong intent claims — which is worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The security concern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PDFs can contain embedded JavaScript, OpenAction triggers that fire on open, malicious stream objects, and external URI references. Processing untrusted PDFs without a purpose-built sandboxed parser is a real attack surface. pdfminer has had CVEs. Handling untrusted binary formats in a pipeline that processes proprietary codebases is not a decision to make under time pressure.&lt;/p&gt;

&lt;p&gt;DOCX has a different surface: Office Open XML relationships to external resources, embedded objects, and macro containers. python-docx handles the common case cleanly but edge cases involving embedded objects or external references require careful sanitisation before any content reaches the analysis layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why .txt is not a cop-out&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A plain text file is deterministic. There is no binary parsing, no font mapping, no coordinate reconstruction, no embedded objects. It goes into the chunker directly. Its encoding is validated at upload. Its size is enforced client-side at 50KB per file, up to five files.&lt;/p&gt;
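&lt;p&gt;As a sketch, the whole validation path fits in a few lines (the function name and error messages are hypothetical; the limits are the ones stated above):&lt;/p&gt;

```python
# Hypothetical sketch of the upload constraints described above.
# The limits match the post; the function itself is illustrative.

MAX_FILES = 5
MAX_BYTES = 50 * 1024  # 50KB per file, per the limits stated above

def validate_txt_uploads(files):
    """files: list of (filename, raw_bytes) pairs. Returns decoded texts or raises."""
    if max(0, len(files) - MAX_FILES):
        raise ValueError("at most %d files" % MAX_FILES)
    texts = []
    for name, data in files:
        if not name.endswith(".txt"):
            raise ValueError("only .txt is accepted: %s" % name)
        if max(0, len(data) - MAX_BYTES):
            raise ValueError("file too large: %s" % name)
        # A plain text file is deterministic: decoding either succeeds or
        # fails loudly. There is no silent structural corruption.
        texts.append(data.decode("utf-8"))
    return texts
```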

&lt;p&gt;The result is that a founder who pastes their product spec into a .txt file gets more reliable intent analysis than one who uploads a beautifully formatted PDF that extracts poorly. Readable structure matters more than file format.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is coming&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PDF and DOCX upload support is in the Phase D roadmap. The correct approach is a purpose-built extraction pipeline with: sandboxed processing, content validation before the text reaches the chunker, encoding normalisation, and its own test suite. It deserves a dedicated build session and a security review — not a quick dependency add before launch.&lt;/p&gt;

&lt;p&gt;Until then: .txt, and it works well.&lt;/p&gt;

&lt;p&gt;Building IntentGuard in public from Johannesburg 🇿🇦. If you have built document ingestion pipelines that handle untrusted binary input safely, I'd like to hear how you approached the sandboxing problem.&lt;/p&gt;

&lt;p&gt;The concepts discussed are my own; the presentation and formatting of this post were enhanced by an AI assistant.&lt;/p&gt;

&lt;p&gt;Olebeng · Founder, IntentGuard · &lt;a href="https://intentguard.dev/" rel="noopener noreferrer"&gt;intentguard.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>security</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Why the same codebase should always produce the same audit score</title>
      <dc:creator>Olebeng</dc:creator>
      <pubDate>Thu, 02 Apr 2026 05:04:11 +0000</pubDate>
      <link>https://dev.to/intentguard_ole/why-the-same-codebase-should-always-produce-the-same-audit-score-1fed</link>
      <guid>https://dev.to/intentguard_ole/why-the-same-codebase-should-always-produce-the-same-audit-score-1fed</guid>
      <description>&lt;p&gt;There is a failure mode in AI-powered analysis tools that does not get talked about enough, and we ran into it directly.&lt;/p&gt;

&lt;p&gt;When you submit the same repository twice — same commit, same inputs, same everything — you should get the same score. If the score changes between runs, the audit is not an audit. It is a random sample.&lt;/p&gt;

&lt;p&gt;Early in testing, we observed score variance across consecutive runs on identical inputs. Not small variance. Meaningful swings — enough to change the risk interpretation of a codebase entirely. A score that sits in one category on one run and a different category on the next is worse than useless for the people who depend on it most: founders preparing investor materials, compliance leads building audit evidence, CTOs making remediation decisions.&lt;/p&gt;

&lt;p&gt;This is a structural problem with LLM-based analysis, not an implementation bug, and it has a structural cause.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the variance comes from&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Large language models are probabilistic by default. They sample from a probability distribution when generating output. The "temperature" setting controls how much randomness is introduced — higher temperature means more creative, more varied output. Lower temperature means more consistent, more deterministic output.&lt;/p&gt;

&lt;p&gt;For creative tasks — writing, ideation, brainstorming — temperature is a feature. For security analysis, compliance mapping, and architectural assessment, temperature is a liability.&lt;/p&gt;

&lt;p&gt;An LLM running at a non-zero temperature will produce slightly different findings on the same code across consecutive runs. Different findings feed into the scoring model. Different scores come out. The same codebase looks different on Tuesday than it did on Monday for no reason that reflects anything about the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix and what it requires&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Setting temperature to zero eliminates sampling randomness. Given the same inputs, the model produces the same outputs. That is the starting point.&lt;br&gt;
But there is a second layer of variance that temperature alone does not solve: finding confidence weighting. When multiple independent models analyse the same code, they may reach different conclusions on borderline cases. How those disagreements are resolved affects the final score — and if the resolution is inconsistent, variance returns through a different door.&lt;/p&gt;

&lt;p&gt;IntentGuard uses a consensus pipeline across up to four independent AI models per finding. For the scoring model to be deterministic, the consensus logic itself must be deterministic — the same set of model votes must always produce the same confidence-weighted outcome.&lt;/p&gt;
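&lt;p&gt;A minimal sketch of what "deterministic consensus" means in practice (the reduction rule here, a sorted median with a fixed tie-break, is illustrative, not IntentGuard's actual logic):&lt;/p&gt;

```python
# Illustrative sketch: the same multiset of model votes must always yield
# the same outcome, so sort before reducing and use a fixed tie-break
# instead of any sampled or arrival-order-dependent step.

def consensus_confidence(votes):
    """votes: list of (model_name, confidence). Returns a stable median."""
    # Sorting means the order the models happened to respond in cannot
    # change the result.
    ordered = sorted(conf for _, conf in votes)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    # Even count: fixed rule, average the two middle votes.
    return (ordered[mid - 1] + ordered[mid]) / 2
```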

&lt;p&gt;We use CVSS v3.1-derived severity scoring as the foundation. CVSS is an industry standard specifically designed for this purpose: reproducible, quantifiable risk scores that two different analysts, given the same evidence, will calculate the same way. Mapping LLM-generated findings to CVSS-derived scores gives the scoring model a deterministic anchor — the same evidence produces the same deduction, every time.&lt;/p&gt;
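&lt;p&gt;Illustratively, a CVSS-anchored deduction is just a pure lookup (the severity bands are CVSS v3.1's published qualitative rating scale; the deduction values are hypothetical):&lt;/p&gt;

```python
# Illustrative mapping only. The bands are the standard CVSS v3.1
# qualitative severity ratings; the deduction values are hypothetical,
# not IntentGuard's actual weights.

CVSS_BANDS = {
    "critical": (9.0, 10.0),
    "high": (7.0, 8.9),
    "medium": (4.0, 6.9),
    "low": (0.1, 3.9),
}

DEDUCTION = {"critical": 15, "high": 8, "medium": 3, "low": 1}

def score_deduction(findings):
    """findings: list of severity labels. Same findings in, same deduction out."""
    return sum(DEDUCTION[f] for f in findings)
```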

&lt;p&gt;&lt;strong&gt;Why this matters more for some users than others&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a developer running a quick check, score consistency is a nice-to-have. For the use cases IntentGuard is built for, it is non-negotiable. A VC performing technical due diligence on a portfolio company needs to know that the score they see reflects the actual state of the codebase — not the state it happened to be in on the particular run they triggered. A compliance lead building audit evidence needs findings that are reproducible and defensible. A founder preparing investor materials cannot present a Technical Readiness Score that might have read differently yesterday.&lt;/p&gt;

&lt;p&gt;Deterministic scoring is what separates an analytical instrument from a magic eight ball.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test that now passes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gate we set for ourselves was simple: submit the same repository three times in succession with identical inputs and confirm the score is identical across all three runs.&lt;/p&gt;

&lt;p&gt;That gate is now passing. 368 automated tests, including the determinism checks, are green.&lt;/p&gt;
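&lt;p&gt;The gate itself is small enough to show as a sketch (run_audit is a hypothetical stand-in for the real pipeline entry point; the assertion is the actual check):&lt;/p&gt;

```python
# Sketch of the determinism gate described above. run_audit is a
# hypothetical callable standing in for the pipeline entry point.

def assert_deterministic(run_audit, repo, runs=3):
    """Run the audit several times on identical inputs; all scores must match."""
    scores = [run_audit(repo) for _ in range(runs)]
    assert len(set(scores)) == 1, scores
    return scores[0]
```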

&lt;p&gt;Building IntentGuard in public from Johannesburg 🇿🇦. If deterministic analysis in multi-model AI pipelines is something you have thought about — whether you agree with the approach or see gaps — I would like to hear it in the comments. &lt;/p&gt;

&lt;p&gt;The concepts discussed are my own; the presentation and formatting of this post were enhanced by an AI text editor.&lt;/p&gt;

&lt;p&gt;Olebeng · Founder, IntentGuard · intentguard.dev&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>programming</category>
      <category>testing</category>
    </item>
    <item>
      <title>We read the spec before we read the code. Here is why that changes everything.</title>
      <dc:creator>Olebeng</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:30:08 +0000</pubDate>
      <link>https://dev.to/intentguard_ole/we-read-the-spec-before-we-read-the-code-here-is-why-that-changes-everything-4n24</link>
      <guid>https://dev.to/intentguard_ole/we-read-the-spec-before-we-read-the-code-here-is-why-that-changes-everything-4n24</guid>
      <description>&lt;p&gt;When a repository is submitted to IntentGuard, the first thing the pipeline does is nothing that any other code analysis tool does.&lt;/p&gt;

&lt;p&gt;It does not read the code.&lt;/p&gt;

&lt;p&gt;It reads what the code was supposed to do.&lt;/p&gt;

&lt;p&gt;That single design decision — reading intent before reading implementation — is the architectural foundation everything else is built on. I want to explain why we made it, what it requires, and what it changes about the findings you get out the other side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The question nobody was asking automatically&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every code analysis tool in existence — static analysers, linters, security scanners, SAST platforms — starts from the same place. It reads the code and asks: what is in here? What patterns are dangerous? What vulnerabilities exist?&lt;/p&gt;

&lt;p&gt;These are useful questions. There are excellent tools answering them.&lt;br&gt;
The question none of them ever asked is: does this code do what it was designed to do?&lt;/p&gt;

&lt;p&gt;Not "is this code clean?" Not "is this code secure?" But: does this implementation reflect the product that was specified, promised to users, committed to investors, and stated in the compliance documents?&lt;/p&gt;

&lt;p&gt;That is a different question. And it turns out, you cannot answer it if you start from the code — because the code itself cannot tell you what it was supposed to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pass 1 — Building the intent model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first pass of the Intent Agent never receives source code. This is an architectural constraint, not a configuration option.&lt;/p&gt;

&lt;p&gt;It receives the human-stated intent: the product description the user writes at audit time, the README, any specification documents that have been uploaded, and the repository file tree — directory structure and file names only, no content.&lt;/p&gt;

&lt;p&gt;From these inputs, it constructs what we call the Intent Model — a structured representation of what this product was designed to do. What features were claimed. What non-functional properties were promised. What deployment context was assumed. What compliance obligations were stated.&lt;br&gt;
The Intent Model is the baseline. Every finding in an IntentGuard audit is anchored to a claim in the Intent Model — not a pattern in the code, not a rule in a rulebook, but a specific thing the product was supposed to do or be.&lt;/p&gt;
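&lt;p&gt;As a rough sketch, an intent model of that shape might look like this (field names are hypothetical, not IntentGuard's actual schema):&lt;/p&gt;

```python
# Hypothetical sketch of an intent model per the description above:
# a set of claims, each anchored to a human-stated source, plus an
# overall confidence that reflects how rich the inputs were.
from dataclasses import dataclass, field

@dataclass
class IntentClaim:
    text: str        # e.g. "all user data is processed in the EU"
    category: str    # feature, non-functional, deployment, or compliance
    source: str      # product description, README, or uploaded spec

@dataclass
class IntentModel:
    claims: list = field(default_factory=list)
    confidence: str = "low"  # raised only when the inputs are rich
```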

&lt;p&gt;There is an important epistemic reason why Pass 1 never reads the code. If it did, it would build an intent model anchored to what the code does — and would naturally generate claims that match the implementation. That defeats the entire purpose. The intent model must come from human-stated intent, not from what the code actually contains. The gap between those two things is the product.&lt;/p&gt;

&lt;p&gt;When the inputs are rich — a detailed description, a thorough README, uploaded specification documents — the resulting Intent Model is high confidence and highly specific. When the inputs are thin — a two-sentence description and no documentation — the Intent Model is weaker, and the audit report says so explicitly. Garbage in, limited analysis out. We tell users when this is the case rather than pretending otherwise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pass 2 — Comparing intent against evidence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pass 2 receives the Intent Model and does something that sounds counterintuitive: it does not send the entire codebase to a language model.&lt;/p&gt;

&lt;p&gt;It retrieves semantically relevant code chunks.&lt;/p&gt;

&lt;p&gt;For each claim in the Intent Model, we embed the claim and retrieve the code most likely to confirm or contradict it — using vector similarity against the embedded code chunks stored at ingestion time. The model never sees the full codebase. It sees the code that is most relevant to each specific intent claim.&lt;/p&gt;

&lt;p&gt;This matters for two reasons. First, it is faster and cheaper than full-codebase analysis. Second, and more importantly, it produces better results — because a model asked to evaluate one specific claim against relevant evidence will outperform a model given thousands of lines of unrelated code and asked to find everything wrong with it.&lt;/p&gt;
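&lt;p&gt;A sketch of claim-anchored retrieval under those assumptions (embeddings shown as plain vectors; the function names are hypothetical, and the embedding model itself is out of scope here):&lt;/p&gt;

```python
# Illustrative sketch: embed the claim, rank stored chunk embeddings by
# cosine similarity, keep the top k. Vectors are plain lists of floats;
# function names are hypothetical.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_for_claim(claim_vec, chunks, k=5):
    """chunks: list of (chunk_id, vector). Returns the k most similar chunk ids."""
    ranked = sorted(chunks, key=lambda c: cosine(claim_vec, c[1]), reverse=True)
    return [cid for cid, _ in ranked[:k]]
```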

&lt;p&gt;For each intent claim, Pass 2 produces one of two finding types: confirmation or violation.&lt;/p&gt;

&lt;p&gt;A confirmation means the code evidence supports the claim. The feature was implemented as stated. The architectural constraint was respected. The compliance obligation is present in the implementation.&lt;/p&gt;

&lt;p&gt;A violation means the code contradicts the claim. The feature was stated but not implemented. The architectural constraint was declared and silently ignored. The compliance obligation exists in the spec and is absent from the code.&lt;/p&gt;

&lt;p&gt;Both types matter. This is one of the things that makes IntentGuard structurally different from tools that only report problems — 30 to 40 percent of every audit report is confirmations, because knowing what is solid is just as useful as knowing what needs fixing. A codebase where 85 percent of intent claims are confirmed is not a failing codebase. It is a codebase with a known, bounded set of gaps. That is a very different thing to work with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this changes what findings mean&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most security and code analysis findings are context-free. "Hardcoded credential detected at line 47" is a finding about the code. It is real and it matters.&lt;/p&gt;

&lt;p&gt;An IntentGuard finding is different. It is a finding about the relationship between the code and the intent behind it.&lt;/p&gt;

&lt;p&gt;"This product stated that all user data would be processed in the EU. The database connection string defaults to a US-East endpoint" is not just a configuration finding. It is an intent mismatch — the code contradicts a specific commitment that was made about the product.&lt;/p&gt;

&lt;p&gt;That is a categorically different kind of finding. It has different stakeholders, different urgency, and different remediation logic. A developer finding the first one fixes a config. An exec or investor seeing the second one understands a business risk.&lt;/p&gt;

&lt;p&gt;After Pass 2 completes, the Intent Model is passed to five specialist agents — Architecture, Security, Compliance, AI Governance, and Dependency — each of which independently audits the codebase against that shared baseline. None of them receive each other's outputs. All of them work from the same Intent Model.&lt;/p&gt;

&lt;p&gt;That shared baseline is what makes the findings from different agents comparable, composable, and trustworthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The part that surprised us most&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When we started running audits on AI-generated codebases, we expected to find security issues. We expected to find dependency vulnerabilities. We expected to find compliance gaps.&lt;/p&gt;

&lt;p&gt;What we did not expect was how consistent the intent drift pattern was.&lt;br&gt;
Codebases built with AI coding assistants — Cursor, Copilot, Claude, Gemini — tend to implement features correctly in isolation. Individual functions work. Tests pass. The CI pipeline is green.&lt;/p&gt;

&lt;p&gt;But over iterations, the implementation drifts from the intent. Architectural constraints that were stated in the original design are quietly reversed by an AI assistant that did not have that context. Compliance obligations that were present in the product description are absent from the implementation because they were never included in a prompt. Data flows that were specified as EU-only end up routing through US infrastructure because the assistant made a sensible default choice without knowing the regulatory requirement.&lt;/p&gt;

&lt;p&gt;None of this shows up in a security scan. None of it triggers a linting rule. It only surfaces when you compare the code against the intent — which is exactly what the two-pass pipeline was designed to do.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building IntentGuard in public from Johannesburg 🇿🇦. If you are thinking about the intent-vs-implementation gap in AI-generated codebases, or have questions about the retrieval architecture, I would like to hear from you in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The concepts discussed are my own; the presentation and formatting of this post were enhanced by an AI text editor.&lt;/p&gt;

&lt;p&gt;Olebeng · Founder, IntentGuard · &lt;a href="https://intentguard.dev/" rel="noopener noreferrer"&gt;intentguard.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>programming</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Hello Dev.to - we are building the world's first automated Intent Audit platform</title>
      <dc:creator>Olebeng</dc:creator>
      <pubDate>Tue, 17 Mar 2026 16:05:02 +0000</pubDate>
      <link>https://dev.to/intentguard_ole/hello-devto-we-are-building-the-worlds-first-automated-intent-audit-platform-1gg2</link>
      <guid>https://dev.to/intentguard_ole/hello-devto-we-are-building-the-worlds-first-automated-intent-audit-platform-1gg2</guid>
      <description>&lt;p&gt;Hi Dev.to&lt;/p&gt;

&lt;p&gt;I am Olebeng, a solo founder based in Johannesburg, South Africa, and this is the first post from the IntentGuard account.&lt;/p&gt;

&lt;p&gt;I want to start by being direct about what we are, what we are not, and why I think the problem we are solving matters to this community specifically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What IntentGuard is&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;IntentGuard is an automated Intent Audit platform.&lt;/p&gt;

&lt;p&gt;That is a category that does not exist yet. We are building it.&lt;/p&gt;

&lt;p&gt;The core question we answer is one that no tool has ever been able to answer automatically:&lt;/p&gt;

&lt;p&gt;Does your code do what it was supposed to do?&lt;/p&gt;

&lt;p&gt;Not "does your code have vulnerabilities?" Not "does your code pass your linting rules?" Those questions already have excellent tools answering them.&lt;/p&gt;

&lt;p&gt;The question nobody has answered automatically is whether your code still reflects the intent behind it — the product description, the architecture decisions, the compliance obligations, the promises made to users.&lt;br&gt;
That gap is what IntentGuard audits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters right now&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have been building with Cursor, Copilot, Claude, or any AI coding assistant, you already know the speed is extraordinary. You can go from idea to working prototype in hours.&lt;/p&gt;

&lt;p&gt;What you might not know yet - but will find out at the worst possible moment - is that AI-generated code has a specific failure mode that no existing tool catches: intent drift.&lt;/p&gt;

&lt;p&gt;The code works. The tests pass. The CI pipeline is green.&lt;/p&gt;

&lt;p&gt;But the code no longer reflects what the product was designed to do. Data flows that were never supposed to exist. Compliance obligations that were stated in the spec and silently dropped in implementation. Architecture decisions that made sense in week one and were quietly reversed by an AI assistant in week six.&lt;/p&gt;

&lt;p&gt;This is not a criticism of AI coding tools. It is the next problem to solve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we have built so far&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;IntentGuard is eight sessions into a ten-session build. Here is where we are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A two-pass Intent Agent that constructs a model of what a product was supposed to do — before reading a single line of code&lt;/li&gt;
&lt;li&gt;Five specialist agents (Architecture, Security, Compliance, AI Governance, Dependency) that each independently audit the codebase against that intent model&lt;/li&gt;
&lt;li&gt;A multi-LLM consensus pipeline — up to four independent models per finding, so no single model's hallucination makes it into a report&lt;/li&gt;
&lt;li&gt;Four persona-specific reports from one scan: Executive, Developer, Auditor, Investor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am building this in public because I think the architecture decisions we have made - particularly around the intent reconstruction pipeline and the zero-data-retention sandbox - are worth discussing openly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I will be posting here&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Technical articles. How the Intent Agent actually works. How we do deterministic diffing without hallucinated PRs. How we enforce multi-LLM consensus without producing contradictory outputs. Real architecture decisions with real trade-offs.&lt;/p&gt;

&lt;p&gt;No marketing. No "10 reasons you need IntentGuard." If the technical work is not interesting enough to stand on its own, no amount of copy will fix that.&lt;/p&gt;

&lt;p&gt;If you are building with AI coding tools, dealing with vibe-coded codebases, investing in start-ups, or thinking about the intent-vs-implementation gap - I would like to hear from you.&lt;/p&gt;

&lt;p&gt;What is the hardest part of maintaining alignment between what you intended to build and what the code actually does?&lt;/p&gt;

&lt;p&gt;The concepts discussed are my own; the presentation and formatting of this post were enhanced by an AI text editor.&lt;/p&gt;

&lt;p&gt;Olebeng&lt;br&gt;
Founder, IntentGuard · intentguard.dev&lt;/p&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>startup</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
