Vibe coding is everywhere. Cursor, Lovable, Bolt, v0, Replit Agent... the tools keep shipping and the code keeps flowing. But here's the thing nobody wants to talk about: most of that code has never been reviewed by anyone.
Not by a human. Not by a linter. Not by anything.
We found out the hard way. After shipping a dozen products built almost entirely with AI tools, we started noticing patterns. Security headers missing. Lighthouse scores in the 40s. Meta tags that made zero sense. Accessibility? Forget about it.
So we asked ourselves: if AI can write the code, can another AI catch what the first one missed?
The Meta Problem
There's something almost funny about it. You prompt an AI to build your app. It generates thousands of lines of code in minutes. You deploy it because it works in the preview. And then you wonder why Google isn't indexing it, why your Core Web Vitals are red, and why someone on Reddit found an open API endpoint.
The AI that wrote the code optimized for one thing: making it work. It didn't think about your robots.txt. It didn't check if your images have alt text. It didn't verify that your authentication flow actually prevents session hijacking.
This is the gap we set out to close with Audit Vibecoding.
What We Actually Check
The audit runs dozens of automated checks across five categories:
Security - Headers, Content-Security-Policy rules, exposed endpoints, authentication patterns, dependency vulnerabilities. The stuff that keeps you up at night (or should).
SEO - Meta tags, canonical URLs, structured data, sitemap validity, robots.txt, Open Graph tags. Most AI-generated sites ship with generic or completely broken SEO.
Performance - Core Web Vitals, bundle size, image optimization, lazy loading, render-blocking resources. AI loves to import entire libraries when you need one function.
Accessibility - ARIA labels, color contrast, keyboard navigation, screen reader compatibility, heading hierarchy. This is where AI-generated code fails the hardest. Almost every project we've audited scores below 50% on accessibility.
Code Quality - TypeScript strictness, error handling, unused imports, console.logs left in production, hardcoded values. The kind of stuff a senior dev would catch in code review but your AI pair programmer doesn't care about.
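Each of these categories boils down to mechanical checks you could run yourself. As a hedged sketch (the names and thresholds here are illustrative, not the audit's actual API), here is what one Code Quality check might look like: scanning a shipped JS bundle for console.log calls that survived the production build.

```typescript
// Hypothetical sketch of a single Code Quality check. The status values
// mirror the pass/fail/warning model described in this post; the
// threshold of 5 is an arbitrary illustration.

type CheckResult = {
  status: "pass" | "fail" | "warning";
  recommendation: string;
};

function checkConsoleLogs(bundleSource: string): CheckResult {
  // Count console.log / console.debug occurrences in the bundle text.
  const hits = bundleSource.match(/console\.(log|debug)\(/g) ?? [];
  if (hits.length === 0) {
    return { status: "pass", recommendation: "No stray console calls found." };
  }
  return {
    status: hits.length > 5 ? "fail" : "warning",
    recommendation:
      `Remove ${hits.length} console.log/debug call(s); ` +
      `strip them at build time (e.g. terser's drop_console option).`,
  };
}
```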
Each check produces a pass/fail/warning result with a specific recommendation. You get an overall Audit Score from 0 to 100. The average across all projects we've audited? 31. Production-ready is 80+.
Let that sink in. The average vibecoded project scores 31 out of 100.
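The exact weighting behind the 0-100 score isn't something we publish, but conceptually it's a weighted pass ratio. A minimal sketch, assuming pass counts full, warning counts half, and fail counts nothing:

```typescript
// Hypothetical scoring sketch; the real audit's weighting differs,
// this just shows how pass/warning/fail collapse into a 0-100 number.

type Status = "pass" | "fail" | "warning";

function auditScore(results: Status[]): number {
  if (results.length === 0) return 0;
  // pass = full credit, warning = half credit, fail = none.
  const points = results.reduce(
    (sum, s) => sum + (s === "pass" ? 1 : s === "warning" ? 0.5 : 0),
    0,
  );
  return Math.round((points / results.length) * 100);
}
```

Under this toy weighting, a project passing 10 checks, warning on 6, and failing 14 lands at 43 — squarely in vibecoded-average territory.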
The Pipeline
You give us a URL. That's it. No account, no setup, no GitHub integration needed.
Behind the scenes, the audit crawls your deployed site and analyzes what it finds. It's looking at your actual production output, not your source code. This matters because a lot of problems only show up after build and deploy. Your local dev server might work fine while your production site is leaking environment variables.
Within 24 hours you get a full report: every check with its result, a prioritized action plan (what to fix first), and an overall score. The whole thing costs between $4 and $9. Less than a coffee and a sandwich.
Why Not Just Use Lighthouse?
Lighthouse is great. We use it ourselves. But its checks stop at what the browser can see: performance, baseline accessibility, and a handful of surface-level SEO hints. It doesn't check your security headers. It doesn't validate your structured data. It doesn't tell you that your sitemap returns a 404 or that your canonical URL points to localhost.
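That localhost-canonical check, for instance, is trivial to express in code. A rough sketch (the function name and rules are illustrative, not our exact implementation):

```typescript
// Illustrative canonical-URL sanity check: given the href from
// <link rel="canonical"> and the site's real origin, flag the
// failure modes we see most often.

function canonicalProblems(canonicalHref: string, siteOrigin: string): string[] {
  const problems: string[] = [];
  let url: URL;
  try {
    // Resolve relative hrefs against the site's origin.
    url = new URL(canonicalHref, siteOrigin);
  } catch {
    return ["canonical href is not a valid URL"];
  }
  if (/^(localhost|127\.0\.0\.1)$/.test(url.hostname)) {
    problems.push("canonical URL points to localhost");
  } else if (url.protocol !== "https:") {
    problems.push("canonical URL is not https");
  } else if (url.origin !== siteOrigin) {
    problems.push("canonical URL points to a different origin");
  }
  return problems;
}
```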
We built this because we needed something that covers the full surface area of a shipped product, not just the parts that Chrome DevTools can see.
What We've Learned From Auditing Real Projects
After running audits on projects built with Lovable, Cursor, Bolt, and others, a few patterns keep showing up:
Security is always the worst category. AI-generated code almost never includes proper security headers. Content-Security-Policy? Missing. X-Frame-Options? Missing. Rate limiting? What's that?
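Checking for those missing headers is a one-request job. Here's a hedged sketch; the header list below is a common baseline, not our exact rule set:

```typescript
// Flag the security headers most often missing from AI-generated
// deployments. The list is an illustrative baseline.

const REQUIRED_HEADERS = [
  "content-security-policy",
  "x-frame-options",
  "strict-transport-security",
  "x-content-type-options",
];

function missingSecurityHeaders(headers: Record<string, string>): string[] {
  // Header names are case-insensitive, so normalize before comparing.
  const present = new Set(Object.keys(headers).map((h) => h.toLowerCase()));
  return REQUIRED_HEADERS.filter((h) => !present.has(h));
}
```

In practice you'd feed it the headers from a fetch of the deployed site, e.g. `Object.fromEntries(response.headers)`.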
SEO is broken in predictable ways. Missing meta descriptions, duplicate title tags across pages, no sitemap, robots.txt blocking everything. The AI generates working pages but doesn't think about discoverability.
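Two of those failure modes — missing title, missing meta description — can be caught with a crude scan of the rendered HTML. A rough sketch (regex-based for brevity; a real crawler would parse the DOM properly):

```typescript
// Approximate SEO check: flag pages shipping without a <title>
// or a meta description. Regexes are a simplification.

function seoIssues(html: string): string[] {
  const issues: string[] = [];
  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  if (!title || title[1].trim() === "") {
    issues.push("missing or empty <title>");
  }
  if (!/<meta[^>]+name=["']description["'][^>]*>/i.test(html)) {
    issues.push("missing meta description");
  }
  return issues;
}
```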
Accessibility gets ignored entirely. This is the one that bothers me most. Screen reader support, keyboard navigation, ARIA labels... AI tools just don't prioritize this unless you explicitly ask. And even then, the implementation is often wrong.
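The most basic accessibility check of all — do your images have alt text — is the kind of thing that fails constantly. A minimal sketch (again regex-based for brevity):

```typescript
// Count <img> tags with no alt attribute at all. A real audit would
// also catch empty or meaningless alt text via DOM parsing.

function imagesMissingAlt(html: string): number {
  const imgs = html.match(/<img\b[^>]*>/gi) ?? [];
  return imgs.filter((tag) => !/\balt\s*=/i.test(tag)).length;
}
```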
Performance problems come from over-engineering. AI loves to add dependencies. A simple landing page ends up importing React, three animation libraries, a state management solution, and a CSS-in-JS framework. The result loads in 8 seconds on mobile.
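The fix is usually boring: when you need one function, write the function. As an illustration of the pattern (not a recommendation for any specific package), a conditional class-name joiner — something AI output routinely pulls in a whole utility library for — is a few lines:

```typescript
// Illustrative replacement for a class-name utility dependency:
// joins truthy class names, skipping false/null/undefined.

type ClassValue = string | false | null | undefined;

function cx(...values: ClassValue[]): string {
  return values.filter(Boolean).join(" ");
}
```

Ten seconds of code, one less entry in the dependency graph, and nothing extra in the bundle.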
The Uncomfortable Question
"Quis custodiet ipsos custodes?" Who watches the watchmen?
If AI writes the code and AI audits the code, where does the human fit in? Honestly, the human fits in the same place they always did: making decisions. The audit gives you a prioritized list of what's broken. You decide what to fix, when, and how much it matters for your specific use case.
A personal blog with a security score of 40? Probably fine. A SaaS handling payment data with that same score? Fix it now.
The point isn't to replace human judgment. The point is to give you the information you need to make good decisions, especially when the code was written by something that doesn't understand the context of your business.
Try It
If you've shipped anything built with AI tools, run an audit. Not because I'm selling you something (though yes, auditvibecoding.com is the product). But because you should know what you shipped.
The average score is 31. Yours might be better. It might be worse. Either way, you should know.
I'm Jakub, founder of Inithouse. We build AI-powered products and occasionally write about what we learn along the way. This is one of those learnings: ship fast, but know what you shipped.