Score Down, Audit Better
On 12 April I ran an Intent Audit on envelope-zero/backend. The score came back at 79 with three Critical findings. No authentication at the API layer. No encryption for financial data. An unprotected Delete Everything endpoint.
On 25 April I re-audited the same codebase with a corrected product description. The score dropped to 71.5. The three Critical findings became two High findings and one Medium. The confirmation rate went from 57% to 67%. The Technical Readiness Score went from 70 to 76. Architecture maturity went from Level 2 to Level 3.
The code did not change between the two audits. Here is what did.
Why the product description is the primary input
An Intent Audit does not just run static analysis against a codebase. It first builds an intent model. This is a structured representation of what the codebase is supposed to do, derived from the product description you provide alongside the codebase's own README and documentation. The intent model determines what the findings are evaluated against.
The first audit used a minimal product description that described a generic REST API. The second used a precise description that stated the deployment model (self-hosted Go binary, personal infrastructure), the compliance obligations that apply (GDPR Art. 32, OWASP ASVS), the specific audit concerns (authentication, encryption, destructive endpoint access control), and what the codebase deliberately does not handle (multi-tenancy, payment processing, PII beyond financial records). Intent model confidence went from 74% to 82%.
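IntentGuard's internal intent-model schema is not public, so the following is a purely hypothetical sketch. It shows how the description elements named above (deployment model, compliance obligations, audit concerns, exclusions, confidence) might be captured as structured fields; every name here is an assumption:

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: IntentGuard's actual intent-model schema is
# not public. Field names are assumptions drawn from the description
# elements discussed in the text.
@dataclass
class IntentModel:
    deployment: str                          # how the system is run
    compliance: list[str] = field(default_factory=list)
    audit_concerns: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    confidence: float = 0.0                  # model's fit to the codebase

model = IntentModel(
    deployment="self-hosted Go binary, personal infrastructure",
    compliance=["GDPR Art. 32", "OWASP ASVS"],
    audit_concerns=["authentication", "encryption",
                    "destructive endpoint access control"],
    out_of_scope=["multi-tenancy", "payment processing",
                  "PII beyond financial records"],
    confidence=0.82,
)
print(model.confidence)  # 0.82
```

The point of the structure is the last field: findings are evaluated against these declared expectations, and the confidence value reflects how well the description and the codebase agree.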
Why the score went down when the findings improved
The three Critical findings in the first audit were severity overclassifications that the corrected description resolved. "No authentication at the API layer" was Critical in the first audit. In the second, the corrected description gave the Intent Agent the context to evaluate whether authentication was expected at this specific layer given the deployment model. The finding was reclassified as a Medium architecture observation.
The Delete Everything endpoint remained. Both audits identified it. Both confirmed it across two independent models. The corrected description did not make it go away. It sharpened the classification, placing the finding correctly as a High compliance finding under OWASP ASVS V4.2 rather than as a generic Critical risk.
A Critical finding carries a higher score deduction than a High finding under CVSS-derived scoring. Reclassifying the three overclassified Criticals as two Highs and a Medium shrank the total deduction. That is why the score dropped.
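To make the arithmetic concrete, here is a minimal sketch of severity-weighted scoring. IntentGuard's actual formula is not public, so the weights below are assumptions loosely based on CVSS severity bands; they illustrate the direction of the change, not the real 79 and 71.5 figures:

```python
# Assumed weights, loosely based on CVSS severity bands
# (Critical 9.0-10.0, High 7.0-8.9, Medium 4.0-6.9). Not IntentGuard's
# actual scoring, which is not public.
ASSUMED_WEIGHTS = {"critical": 9.5, "high": 7.5, "medium": 5.0, "low": 2.0}

def severity_total(findings):
    """Sum the per-finding penalties for a list of severity labels."""
    return sum(ASSUMED_WEIGHTS[s] for s in findings)

first_audit  = ["critical", "critical", "critical"]  # 12 April
second_audit = ["high", "high", "medium"]            # 25 April

print(severity_total(first_audit))   # 28.5
print(severity_total(second_audit))  # 20.0
```

Same codebase, same underlying issues: downgrading the severities shrinks the aggregate, and a score derived from it falls.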
The practical lesson
The time you spend writing a precise product description before running an Intent Audit is the highest-leverage work in the entire process. A description that answers these questions consistently produces better findings:
What is this exactly — a library, a deployed application, an API, a framework? What data does it handle, with specific sensitivity classification? What compliance obligations apply by name? What does it deliberately not implement, and what does it delegate to the application layer or platform? Are there AI components, declared specifically?
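As one way to answer those questions, a description along the lines of the corrected one might read like this. This is a reconstruction from the details above, not the actual text used in the second audit:

```text
envelope-zero/backend is a self-hosted REST API, shipped as a single Go
binary and run on personal infrastructure. It handles personal financial
records; that is its only sensitive data class. Compliance obligations:
GDPR Art. 32, OWASP ASVS. Audit concerns: authentication, encryption,
access control on destructive endpoints. Deliberately out of scope:
multi-tenancy, payment processing, PII beyond financial records. No AI
components.
```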
A generic description produces findings calibrated to a generic system. A precise description produces findings calibrated to what this specific codebase was actually designed to do. The score is a summary. Getting the description right is the prerequisite for a summary that means something.
IntentGuard is in final pre-launch hardening. Waitlist at intentguard.dev.
