DEV Community

Olebeng

We audited the same codebase twice. The score went down. The audit got better. Here is why.

Score Down, Audit Better

On 12 April I ran an Intent Audit on envelope-zero/backend, an open-source Go REST API for personal envelope budgeting. The score came back at 79 with three Critical findings: no authentication at the API layer, no encryption for financial data, and an unprotected Delete Everything endpoint.

On 25 April I re-audited the same codebase with a corrected product description. The score dropped to 71.5. The three Critical findings became two High findings. The confirmation rate went from 57% to 67%. The Technical Readiness Score went from 70 to 76. Architecture maturity went from Level 2 to Level 3.

The code did not change between the two audits. Here is what did, and why it produced a more accurate result.

How an Intent Audit actually works

An Intent Audit operates on two separate inputs simultaneously. The first is the stated intent: what you say the codebase is designed to do, derived from the product description you provide. The second is the implementation evidence: what the code analysis independently surfaces in the sections of the codebase evaluated for each domain.

The audit evaluates both inputs and measures the distance between them. A finding is not simply "this code has a problem." A finding is "this code does not do what it was stated to do" or "this code has a characteristic that creates risk given its stated purpose."

The product description does not control what the code analysis finds. It establishes the intent baseline against which findings are contextualised. A precise description produces findings calibrated to the actual system. A generic description produces findings calibrated to a generic system that may not match what was actually built.
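The gap-measurement idea above can be sketched as a toy model. Everything here is illustrative: the names, the set representation, and the function are my assumptions, not IntentGuard's actual data model. The point is only that the stated intent and the implementation evidence are two independent inputs, and findings live in the distance between them.

```go
package main

import "fmt"

// gap returns evidence items not accounted for by the stated intent:
// a toy model of "measure the distance between what was stated and
// what the code analysis found". All names are illustrative.
func gap(intent map[string]bool, evidence []string) []string {
	var findings []string
	for _, e := range evidence {
		if !intent[e] {
			findings = append(findings, e)
		}
	}
	return findings
}

func main() {
	// Stated intent: what the product description says the system does.
	intent := map[string]bool{
		"rest-api":        true,
		"envelope-budget": true,
	}
	// Implementation evidence: what the code analysis surfaced on its own.
	evidence := []string{"rest-api", "envelope-budget", "unauthenticated-delete-all"}
	fmt.Println(gap(intent, evidence)) // → [unauthenticated-delete-all]
}
```

A sharper intent set does not shrink the evidence set; it only changes which gaps remain, which is why a better description can lower the score while improving the audit.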

What changed between the two audits

The first audit used a minimal product description. Without context about the deployment model, the system type, or the specific compliance obligations that apply, the intent model evaluated the codebase as a generic financial API. Three findings were classified as Critical against that generic baseline.

The second description stated precisely what this system is: a self-hosted Go REST API for personal envelope budgeting, deployed on private infrastructure, with specific compliance obligations under GDPR Art. 32 and OWASP ASVS. Intent model confidence went from 74% to 82%.

With a more accurate intent baseline, two of the three Critical findings were reclassified. They were not wrong findings. They were correctly identified characteristics of the codebase that, when evaluated against the actual stated purpose of the system, carried lower severity than a generic Critical classification implied.

The Delete Everything endpoint at internal/controllers/v4/cleanup.go:13–18 remained in both audits. The code analysis identified it independently in both scans. The correct description did not make it go away. It made the other findings more precise, so this one stands out as it should.

The practical lesson

Before submitting a codebase for an Intent Audit, write the product description as the primary input it is. State what the system is and what it is not. State what data it handles, with specific sensitivity classification. State the compliance obligations that apply by name. State what the system deliberately does not implement and what it delegates to other layers. If there are AI components, name them explicitly.

A description that answers those questions establishes an accurate intent baseline. The audit then measures the gap between that baseline and what the code analysis finds in the sections evaluated. That gap is the finding set that is worth acting on.
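As a checklist rendered as code: the struct below is my own sketch of the questions above, with illustrative values drawn from this article's second audit. IntentGuard takes a free-text product description; this is not its input format.

```go
package main

import "fmt"

// ProductDescription is a hypothetical checklist for an intent baseline.
// The field names and values are illustrative, not an IntentGuard schema.
type ProductDescription struct {
	What         string   // what the system is
	WhatItIsNot  string   // what it deliberately is not
	DataHandled  string   // data plus sensitivity classification
	Compliance   []string // obligations named explicitly
	Delegated    []string // concerns handed to other layers
	AIComponents []string // named explicitly, empty if none
}

func main() {
	d := ProductDescription{
		What:        "self-hosted Go REST API for personal envelope budgeting on private infrastructure",
		WhatItIsNot: "not a multi-tenant, internet-facing SaaS",
		DataHandled: "personal financial records, sensitive, single user",
		Compliance:  []string{"GDPR Art. 32", "OWASP ASVS"},
		Delegated:   []string{"authentication and TLS handled at the infrastructure layer"},
	}
	fmt.Printf("%+v\n", d)
}
```

Whether the delegation claim in `Delegated` is true for your deployment is exactly what the audit will then check the code against.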

IntentGuard is in final pre-launch hardening. Waitlist at intentguard.dev.
