Elise Vance
I scanned every major vibe coding tool for security. None scored above 90.

I'm a non-technical founder. I can't write code. I built two production apps entirely with AI.

Last week I scanned my own app for security. It scored 20/100, with 8 vulnerabilities found, including a critical auth bypass where a missing config value silently allows all requests.

So I built Vibe Check — an AI-powered security scanner for vibe-coded apps. Then I pointed it at the tools themselves.

The Scorecard

| Repo | Score | Critical Finding |
| --- | --- | --- |
| open-lovable (Firecrawl) | 0/100 | Unauthenticated command execution across 3 API routes |
| Devika | 40/100 | LLM responses executed via `subprocess.run()` without validation |
| Cloudflare VibeSDK | 78/100 | API tokens logged in plain text, missing OAuth CSRF protection |
| Bolt.new | 90/100 | Command injection via user-controlled shell action content |
| Cline | 96/100 | Minor CI script command injection |
| Cursor | 100/100 | Clean |

The tools millions of people use to build apps have their own security issues. If their code has vulnerabilities, what about yours?

Why Static Scanners Miss These

Three days ago, VibeDoctor launched a scanner and scanned some of these same repos. It runs a suite of static tools in parallel, including SonarQube, Gitleaks, Trivy, and Lighthouse. They scored Devika 66/100.

I scored it 40/100 and found a CRITICAL command injection they missed entirely.

The difference is approach. Static scanners pattern-match for known vulnerabilities. They look for `eval()` calls, hardcoded API keys, and outdated dependencies. They're good at this.

What they can't catch is intent.

Consider this real code from Devika:

```python
# LLM generates a list of commands
commands = response.split("\n")
for cmd in commands:
    subprocess.run(cmd.split(), capture_output=True)
```

No `eval()`. No obviously dangerous function call. SonarQube sees `subprocess.run()` with a list argument — that's the safe way to call subprocess. Clean.

But the commands come from an LLM response. Whatever the AI decides to generate gets executed on the server. That's not a syntax bug. That's an intent bug. The code does exactly what it was written to do — it just shouldn't have been written to do that.
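Catching this means treating model output as untrusted input: validate each command against an allowlist before anything executes. A minimal sketch of the idea — the allowlist contents and the helper name are my own illustration, not Devika's code:

```python
import shlex
import subprocess
from typing import Optional

# Hypothetical allowlist: only these executables may ever run.
ALLOWED_COMMANDS = {"ls", "cat", "echo", "grep"}

def run_llm_command(cmd: str) -> Optional[subprocess.CompletedProcess]:
    """Execute an LLM-suggested command only if it passes validation."""
    parts = shlex.split(cmd)
    # Reject empty output and anything whose executable isn't allowlisted
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return None
    return subprocess.run(parts, capture_output=True, timeout=30)
```

The point isn't this particular allowlist — it's that there is now a deliberate boundary between "what the model said" and "what the server runs."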

The Bug That Started Everything

Here's the function from my own production app that Vibe Check caught:

```python
def _auth_ok(req: Request) -> bool:
    secret = (WEBHOOK_SECRET or "").strip()
    if not secret:
        return True  # no secret set -> allow all
    ...
```

Every static tool I ran against this said it was clean. No hardcoded secrets. No SQL injection. No XSS. The code is syntactically perfect.

The bug: when WEBHOOK_SECRET isn't set in the environment, the function returns True and every webhook request is authorized. In development, that variable is often unset. In production, one env var typo means your backend is wide open.

This is a semantic bug. You only catch it by reading the code and asking "what happens when the inputs are missing?"

How Vibe Check Works

Vibe Check sends your code to Claude (Anthropic's AI) with a security-focused prompt covering six categories: secrets, auth, injection, data exposure, dependencies, and config.

The prompt has specific instructions for the bugs AI coding tools create:

  • Default-allow patterns. Flag `if not secret: return True` as critical.
  • LLM output as execution input. Flag any path where model output reaches subprocess, eval, or file write without validation.
  • Missing auth on action endpoints. Flag routes that perform destructive actions without authentication checks.

Output is structured via Anthropic's tool-use API. Every finding has a category, severity, file, line number, description, and suggested fix — all in plain English.
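The tool definition passed to the API is essentially a JSON schema describing one finding. Here's roughly what that shape looks like — the field names and enums are my guess at the structure, not Vibe Check's exact schema:

```python
# Illustrative Anthropic tool-use definition for structured findings.
# Field names and enum values are assumptions for this sketch.
REPORT_FINDINGS_TOOL = {
    "name": "report_findings",
    "description": "Report security findings from the code review.",
    "input_schema": {
        "type": "object",
        "properties": {
            "findings": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["secrets", "auth", "injection",
                                     "data_exposure", "dependencies", "config"],
                        },
                        "severity": {
                            "type": "string",
                            "enum": ["low", "medium", "high", "critical"],
                        },
                        "file": {"type": "string"},
                        "line": {"type": "integer"},
                        "description": {"type": "string"},
                        "suggested_fix": {"type": "string"},
                    },
                    "required": ["category", "severity", "file",
                                 "line", "description", "suggested_fix"],
                },
            }
        },
        "required": ["findings"],
    },
}
```

Forcing output through a schema like this is what makes the findings machine-filterable downstream, instead of free-form prose you'd have to regex apart.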

False Positives Are the Real Product

LLMs are noisy security reviewers. During a day of self-scanning, Claude flagged:

> "render.yaml contains `sync: false` for secrets — could be misconfigured"

(`sync: false` is the correct Render.com setting for secrets.)

> "`compare_digest` is correctly implemented"

(Flagged as a finding even though it's literally the fix.)

The solution isn't more prompting — Claude ignores guardrails about a third of the time. The real fix is a hard post-parse filter that drops findings matching known false-positive patterns:

  • Infrastructure config files at low/medium severity
  • Template files for secrets-category findings
  • Test files entirely
  • Self-contradictory descriptions ("X is correct, but...")
  • Speculative hedging ("could be a", "potentially allowing")
  • Dev-environment-only warnings

Critical severity never gets filtered. We want false positives at critical to surface for human review.
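The filter itself is a short post-parse pass over the structured findings. A sketch with abbreviated pattern lists — the real rules are more extensive, and the file paths here are illustrative:

```python
import re

# Hedging phrases that mark a speculative finding
HEDGING = re.compile(r"could be a|potentially allowing", re.IGNORECASE)
# Infrastructure config files (illustrative subset)
INFRA_FILES = ("render.yaml", "docker-compose.yml", "Dockerfile")

def keep_finding(finding: dict) -> bool:
    """Return True if a finding survives the false-positive filter."""
    # Critical findings always surface for human review
    if finding["severity"] == "critical":
        return True
    desc = finding["description"]
    if HEDGING.search(desc):
        return False  # speculative hedging
    if finding["file"].endswith(INFRA_FILES) and finding["severity"] in ("low", "medium"):
        return False  # infra config at low/medium severity
    if finding["file"].startswith("tests/") or "test_" in finding["file"]:
        return False  # test files entirely
    if "is correct" in desc and "but" in desc:
        return False  # self-contradictory description
    return True
```

Note the ordering: the critical-severity check runs first, so no later rule can swallow a critical finding.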

After this filter, Vibe Check's own repo scans as 100/100 with zero findings. Without the filter, it bounced between 79 and 94 with different findings each run.

The Numbers Behind the Problem

This isn't theoretical:

  • 65% of vibe-coded production apps have security issues (Escape.tech, 1,400 apps scanned)
  • 45% of AI-generated code contains OWASP Top 10 vulnerabilities (Veracode, 100+ LLMs tested)
  • 35 CVEs traced to AI-generated code in March 2026 alone (Georgia Tech Vibe Security Radar)
  • 1.5 million API keys leaked from one vibe-coded app within 3 days of launch (Moltbook)
  • 63% of vibe coding users are not developers

That last number is the one that matters most. Most vibe coders can't read the code they're shipping. They can't audit it. They can't tell if the login page can be bypassed.

I'm one of them. I built two production apps without writing a line of code. I had no idea my code was vulnerable until I scanned it.

The Loop

Here's what Vibe Check enables:

  1. Scan — paste a GitHub URL, get a score in 60 seconds
  2. Download — get the findings as a markdown file
  3. Fix — paste the findings into your AI coding tool (Cursor, Claude Code, Lovable, whatever you used to build it)
  4. Re-scan — verify the fixes worked and your score went up

My own app started at 20/100 and is climbing as I work through the fixes. The scanner caught what I couldn't see. The AI coding tool fixed what the scanner found. The re-scan verified the fixes worked.

That's the product. Scan. Fix. Verify.

Try It

Vibe Check is free. No signup needed for your first scan. Your code is never stored — we scan it, report findings, and forget it.

https://chat-api-19ij.onrender.com

Code is open source at evance1227/chat.

Scan your repo. You might be surprised what's hiding in code that works perfectly.

— Elise Vance (@shecantcode)
