Three days ago VibeDoctor launched a scanner for AI-generated apps. It runs a battery of tools in parallel (SonarQube, Gitleaks, Trivy, Lighthouse, plus custom checks), and when they scanned open-lovable, devika, and bolt.new they found hundreds of issues. It's good work.
But there's a whole class of bug their approach can't see. I built Vibe Check to catch that class. Here's what's different.
## The gap
Static scanners pattern-match. They look for `eval(`, `os.system(`, hardcoded API keys, outdated dependencies. They're good at this, and they catch a lot.
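To make that concrete, here's a toy version of pattern-based scanning. The regexes are illustrative only, not any real scanner's rules:

```python
import re

# Toy pattern-based scanner: flag known-dangerous tokens.
# Illustrative patterns, not SonarQube's or Gitleaks' actual rules.
DANGEROUS_PATTERNS = [
    r"\beval\(",                                  # dynamic code execution
    r"\bos\.system\(",                            # shelling out
    r"(?i)api[_-]?key\s*=\s*['\"][^'\"]+['\"]",   # hardcoded API key
]

def naive_scan(source: str) -> list[str]:
    """Return the patterns that match anywhere in the source."""
    return [p for p in DANGEROUS_PATTERNS if re.search(p, source)]
```

Run a scan like this over the `_auth_ok` function below and it comes back empty: nothing pattern-shaped is wrong there.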
What they can't catch is intent.
Consider this real function from a repo I scanned this week:
```python
def _auth_ok(req: Request) -> bool:
    """Accept multiple common secret formats so GHL/curl both work."""
    secret = (WEBHOOK_SECRET or "").strip()
    if not secret:
        return True  # no secret set -> allow all
    ...
```
Grep for known vuln patterns: nothing. SonarQube: clean. Gitleaks: no secrets here (that's the point). Trivy: no CVEs. Every static tool I threw at it said OK.
The bug is that when WEBHOOK_SECRET is unset in the environment, the function returns True and the webhook is fully open. In development WEBHOOK_SECRET is often unset. In production, a simple env-var typo becomes an unauthenticated remote action vector.
This is a semantic bug. You only catch it by reading the code and asking "what happens when the inputs are missing?" That's a human pen-tester mindset, not a regex.
## How Vibe Check reads code
Vibe Check sends files to Claude via the Anthropic API with a custom prompt focused on six categories: secrets, auth, injection, data exposure, dependencies, config. The prompt has specific guardrails for LLM failure modes:
- **Default-allow patterns.** Explicit instructions to flag `if not secret: return True` as critical.
- **Dynamic SQL in column names.** Parameterized queries are safe until the column name or `ORDER BY` clause comes from an f-string. The prompt flags this explicitly.
- **Privacy invariant.** Claude is told never to echo actual secret values in findings. The scanner itself never persists source code, raw responses, or the secrets it finds.
Output is structured via Anthropic's tool-use API. Every finding has category, severity, file, line, title, description, suggested_fix, and a verbatim code_snippet field that Vibe Check uses to auto-correct line numbers post-hoc (Claude hallucinates line numbers; the snippet search fixes that).
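Sketched in code, the tool schema and the snippet-based line fix might look like this. The field names come from the list above; the tool name, helper function, and matching strategy are my guesses at an implementation, not Vibe Check's actual code:

```python
# Tool-use schema: forces Claude to emit findings as structured JSON.
REPORT_FINDING_TOOL = {
    "name": "report_finding",  # assumed tool name
    "description": "Report one security finding in the scanned file.",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": [
                "secrets", "auth", "injection",
                "data_exposure", "dependencies", "config"]},
            "severity": {"type": "string", "enum": [
                "low", "medium", "high", "critical"]},
            "file": {"type": "string"},
            "line": {"type": "integer"},
            "title": {"type": "string"},
            "description": {"type": "string"},
            "suggested_fix": {"type": "string"},
            "code_snippet": {"type": "string"},
        },
        "required": ["category", "severity", "file", "line",
                     "title", "description", "suggested_fix", "code_snippet"],
    },
}

def fix_line_number(source: str, finding: dict) -> dict:
    """Re-anchor a finding by searching for its verbatim snippet.

    Claude's `line` is unreliable; the first line of `code_snippet` is not.
    """
    snippet_lines = finding.get("code_snippet", "").strip().splitlines()
    if not snippet_lines:
        return finding
    needle = snippet_lines[0].strip()
    for i, line in enumerate(source.splitlines(), start=1):
        if needle and needle in line:
            finding = {**finding, "line": i}
            break
    return finding
```

A schema like this goes to the Messages API as `tools=[REPORT_FINDING_TOOL]`, and the finding comes back in a `tool_use` content block rather than free text.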
## False positives are the hard part
LLMs are noisy. Over a day of self-scanning my own repo, I watched Claude emit gems like:
"render.yaml contains
sync: falsefor secrets — could be misconfigured"
(sync: falseis the correct Render.com setting)"compare_digest is correctly implemented"
(flagged as a finding even though it's literally the fix)"SQL query logging during development could expose sensitive data"
(it's development; that's the whole point ofecho=True)
The solution isn't more prompting. Claude ignores prompt guardrails about a third of the time. The real fix is a hard post-parse filter that drops findings matching known false-positive patterns:
- Infrastructure config files (`render.yaml`, `.github/workflows/*`, `Dockerfile`, `*.tf`) at low/medium severity
- Template files (`.env.example`) for secrets-category findings
- Test files (`tests/`, `conftest.py`) entirely
- Self-contradictory descriptions ("X is correct, but...")
- Speculative hedging at low/medium ("could be a", "if this were", "potentially allowing")
- Dev-environment-only warnings ("during development", "in non-production")
Critical severity never gets filtered. We want false positives at critical to surface for human review, not be silently dropped.
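Here's a sketch of what such a filter could look like. The finding fields and exact pattern lists are assumptions based on the rules above, not Vibe Check's real implementation:

```python
SPECULATIVE = ("could be a", "if this were", "potentially allowing")
DEV_ONLY = ("during development", "in non-production")
INFRA_SUFFIXES = ("render.yaml", "dockerfile", ".tf")

def keep_finding(finding: dict) -> bool:
    """Post-parse filter: True if the finding survives."""
    severity = finding.get("severity", "low").lower()
    if severity == "critical":
        return True  # never silently drop critical; humans review those
    path = finding.get("file", "").lower()
    desc = finding.get("description", "").lower()
    # Test files: dropped entirely.
    if path.startswith("tests/") or path.endswith("conftest.py"):
        return False
    # Template files: dropped for secrets-category findings.
    if finding.get("category") == "secrets" and path.endswith(".env.example"):
        return False
    # Self-contradictory descriptions ("X is correct, but...").
    if "is correct" in desc or "correctly implemented" in desc:
        return False
    # Dev-environment-only warnings.
    if any(p in desc for p in DEV_ONLY):
        return False
    if severity in ("low", "medium"):
        # Infrastructure config at low/medium severity.
        if path.endswith(INFRA_SUFFIXES) or ".github/workflows/" in path:
            return False
        # Speculative hedging at low/medium.
        if any(p in desc for p in SPECULATIVE):
            return False
    return True
```

Note the control flow: the critical check comes first, so every later rule only ever drops low/medium/high noise.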
After this filter, my own repo scans as 100/100 with zero findings. Without it, the score bounced between 79 and 94 across seven runs with completely different findings each time. Filtering is the product.
## What I found in the wild
I scanned a 24K-star AI-coding repo (responsible disclosure in flight; this post will be updated with the repo name after Thursday 2026-04-17 EOD UTC). Top findings:
| Finding | File | Why static scanners miss it |
|---|---|---|
| Unauthenticated command execution | `app/api/run-command-v2/route.ts` | No auth gate before shelling out. No `eval()` or `child_process.exec()` with obviously-tainted input. Just a route handler that trusts any caller. |
| Arbitrary file write via AI-generated content | `app/api/apply-ai-code-stream/route.ts` | Writes to disk. Static tools see `fs.writeFile` and don't flag it. |
| Missing auth on package installation | `app/api/install-packages-v2/route.ts` | The action is "install arbitrary npm package". No auth. |
| API key leaked in error responses | `app/api/search/route.ts` | Regex scanners look for hardcoded keys in source. This one is echoed back to the client on failure paths. |
Combined, these form a complete "submit code → write it → execute it" chain. A separate scanner ran against this same repo three days ago and flagged hundreds of issues. None of these four. They require reading the route handlers end-to-end and asking what's actually authenticated.
## Background
I'm a non-technical founder. I can't write code. I built two production apps using AI coding tools and realized I had no way to know if they were safe. 65% of vibe-coded apps have security vulnerabilities. 35 CVEs were traced to AI-generated code in March 2026 alone.
So I built the tool I needed. Vibe Check uses Claude to understand what code is trying to do, and to catch the places where it silently fails. Built by someone who can't code, for everyone who can't code.
## Try it
Vibe Check is free, no signup required for basic scans. Sign in with GitHub for scan history. Your code is never stored.
https://chat-api-19ij.onrender.com
Code is open at evance1227/chat. Feedback welcome, especially on the prompt and the false-positive filter.
— Elise Vance (@shecantcode)