A pen testing firm audited 15 applications built primarily through vibe coding. They found 69 vulnerabilities. Six were critical. Not "theoretically possible." Not "in a lab environment." Critical as in: an attacker can read the database, hijack sessions, or escalate to root.
On shipped apps. Handling real user data.
That report would be bad enough on its own. But it landed alongside code quality metrics showing 41% churn rates, a 4x spike in duplication, and Apple straight-up rejecting vibe-coded apps from the App Store. The pattern here isn't subtle.
What the Vulnerability Report Actually Found
The 69 vulnerabilities weren't novel attack vectors. They were boring. Textbook OWASP Top 10 stuff that any mid-level developer would catch in code review: SQL injection, broken authentication, hardcoded secrets, missing input validation, insecure direct object references.
That's the damning part. These aren't hard problems. They're solved problems. Every security linter on the market flags them.
Here's what a typical vibe-coded Express.js login endpoint looks like in the wild:
```javascript
// Real pattern found in vibe-coded apps (DO NOT use this)
app.post('/login', async (req, res) => {
  const { email, password } = req.body;
  // No input validation. No rate limiting. No CSRF token.
  const user = await db.query(
    `SELECT * FROM users WHERE email = '${email}' AND password = '${password}'`
  );
  if (user.rows.length > 0) {
    // Password stored in plaintext. JWT secret hardcoded.
    const token = jwt.sign({ id: user.rows[0].id }, 'my-secret-key-123');
    res.json({ token });
  } else {
    res.status(401).json({ error: 'Invalid credentials' });
  }
});
```
Count the problems. String interpolation in SQL (injection). Plaintext password comparison (no hashing). Hardcoded JWT secret. No rate limiting. No input sanitization. No CSRF protection. That's six security failures in 15 lines, and every single one of them showed up across the audited apps.
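To make the injection concrete: here's what happens when an attacker feeds a crafted email into that interpolated query. This is an illustration of the string being built, not a working exploit against any real app:

```javascript
// Illustration only: how the interpolated query above gets exploited.
// The attacker's "email" closes the string literal and comments out
// the rest of the WHERE clause.
const email = "admin@example.com' --";
const password = 'anything';

const query = `SELECT * FROM users WHERE email = '${email}' AND password = '${password}'`;
console.log(query);
// → SELECT * FROM users WHERE email = 'admin@example.com' --' AND password = 'anything'
// In SQL, `--` starts a comment, so the password check is never evaluated.
// The attacker logs in as admin without knowing any password.
```

Parameterized queries (`$1` placeholders) avoid this entirely because the driver never treats user input as SQL.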
What that endpoint should look like
```javascript
import rateLimit from 'express-rate-limit';
import bcrypt from 'bcrypt';
import jwt from 'jsonwebtoken';
import { doubleCsrf } from 'csrf-csrf';

const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 10,
  message: 'Too many login attempts. Try again later.'
});

const { doubleCsrfProtection } = doubleCsrf({
  getSecret: () => process.env.CSRF_SECRET,
});

// Precomputed valid bcrypt hash, so the "user not found" path costs the
// same as a real comparison (prevents timing-based user enumeration).
const DUMMY_HASH = bcrypt.hashSync('dummy-password', 12);

app.post('/login', loginLimiter, doubleCsrfProtection, async (req, res) => {
  const { email, password } = req.body;

  if (!email || typeof email !== 'string' || !password) {
    return res.status(400).json({ error: 'Invalid input' });
  }

  // Parameterized query: user input never becomes SQL
  const result = await db.query(
    'SELECT id, password_hash FROM users WHERE email = $1',
    [email]
  );

  if (result.rows.length === 0) {
    // Burn a bcrypt comparison anyway to keep response times constant
    await bcrypt.compare(password, DUMMY_HASH);
    return res.status(401).json({ error: 'Invalid credentials' });
  }

  const user = result.rows[0];
  const valid = await bcrypt.compare(password, user.password_hash);
  if (!valid) {
    return res.status(401).json({ error: 'Invalid credentials' });
  }

  const token = jwt.sign(
    { id: user.id },
    process.env.JWT_SECRET,
    { expiresIn: '1h' }
  );
  res.json({ token });
});
```
The gap between those two snippets is the gap between "it works" and "it won't get you sued." Vibe coding tools consistently produce the first version. The person prompting them doesn't know enough to demand the second.
The Metrics Tell a Structural Story
Security holes grab headlines. The code quality numbers tell a quieter, arguably worse story.
41% code churn. That's the percentage of code rewritten shortly after being written. In a healthy codebase, churn sits around 15-20%. At 41%, the codebase is thrashing. Code gets generated, breaks, gets regenerated, breaks differently, gets regenerated again. It's not iteration. It's flailing.
4x increase in duplication. When a developer needs a payment form on two pages, the instinct is to extract a shared component. An AI prompted with "add a payment form to settings" doesn't check whether a payment form already exists. It generates a new one. Across an entire codebase, this turns into hundreds of near-identical code blocks that all need to be updated independently when requirements change.
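The duplication pattern looks innocuous in isolation. Here's a minimal sketch (the validator names and rules are illustrative, not from the audit) of what repeated generation produces versus what a developer would extract:

```javascript
// What repeated generation tends to produce — one validator per page,
// nearly identical but already drifting apart:
function validateSettingsPayment(card) {
  return /^\d{16}$/.test(card.number) && card.cvv.length === 3;   // rejects 4-digit CVVs
}
function validateCheckoutPayment(card) {
  return /^\d{16}$/.test(card.number) && card.cvv.length >= 3;    // accepts "12345"
}

// What a developer extracts instead — one source of truth:
function validatePaymentCard(card) {
  return /^\d{16}$/.test(card.number) && /^\d{3,4}$/.test(card.cvv);
}
```

Notice the two generated copies already disagree about CVV handling. When a validation rule changes, the shared helper changes once; the duplicates each need to be found and updated independently, and one of them will be missed.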
Refactoring collapsed from 25% to under 10%. This one's the slow killer. In 2021, about a quarter of changed lines in a typical codebase were refactoring (restructuring without changing behavior). That discipline is what keeps codebases livable over time. Vibe coding tools don't refactor. They bolt on. New features get stacked onto existing code with no structural consideration, and the architectural debt compounds silently.
Worth noting: These metrics come from codebases where AI generated the majority of code with minimal human review. Teams using AI as an assistant while maintaining code review practices show significantly healthier numbers.
Together, these numbers describe software that's being accumulated, not engineered. There's a difference, and it's the difference between a house and a pile of lumber shaped like a house.
Apple Said No
App Store rejections made this concrete in a way that statistics couldn't.
Apple blocked updates for apps built with vibe coding platforms (specifically flagging Replit and Vibecode) under Guideline 2.5.2. The guideline requires apps to be "self-contained" and bars apps whose core functionality depends on external code generation services that bypass Apple's review process.
Think about what that implies. Apple's review process assumes a developer who understands their own codebase. Someone who can explain what the app does and why. When the "developer" typed a description into a chat window and got back a compiled binary, that assumption evaporates.
There's a liability angle too. When an app leaks data or bricks a device, Apple wants someone who can actually diagnose and fix the problem. A vibe coder who can't read their own source code can't debug it either. Their only recourse is to re-prompt and hope the next version is better. That's not a fix. That's a coin flip.
The rejections hit real people with real apps and real users. Updates blocked. Bug fixes stuck. Revenue frozen. The vibe coding subreddits lit up with people (most of them non-developers) who suddenly realized they'd built a business on foundations they couldn't inspect.
Open Source Is Bleeding Out
An academic paper published this year makes an argument that's gotten less press but might matter more long-term: vibe coding is draining the open source ecosystem.
The mechanism isn't complicated. Open source runs on a cycle: developers use a library, hit a bug, read the source, understand the architecture, submit a fix. That cycle depends on people actually engaging with existing code.
Vibe coding breaks every step. When something doesn't work, the vibe coder doesn't open a GitHub issue. They re-prompt. The AI might swap the library entirely. It might reimplement the functionality from scratch. It might copy code from the library's source without attribution or license compliance.
The paper tracked contribution patterns and found issue reports from AI-heavy projects dropped sharply, while first-time PR contributions cratered even harder. The pipeline of new open source maintainers is drying up.
Here's the irony that should keep people up at night: vibe coding tools are trained on open source code. They consume the ecosystem's output while starving its input. Every library that goes unmaintained because nobody's contributing anymore is a dependency that eventually becomes a vulnerability. Check your node_modules folder. Count the packages maintained by one person. Now imagine that person quits because nobody files useful bug reports anymore.
```bash
# Want to see how many of your dependencies are maintained by solo devs?
# Check maintainer counts for your top-level dependencies
npm ls --depth=0 --json 2>/dev/null | \
  jq -r '.dependencies | keys[]' | \
  while read -r pkg; do
    maintainers=$(npm view "$pkg" maintainers --json 2>/dev/null | jq 'length')
    echo "$pkg: $maintainers maintainer(s)"
  done
```
The open source supply chain is already fragile. Vibe coding is applying steady, distributed pressure to its weakest points.
The Regeneration Death Spiral
There's a feedback loop here that makes things worse over time, and it's worth spelling out.
Vibe-coded app breaks. The vibe coder can't manually debug it because they don't understand the code. So they re-prompt. The AI generates new code that might fix the immediate symptom but introduces new complexity, new duplication, new blind spots. The codebase gets harder to reason about. The next bug is harder to fix. The next regeneration is more destructive.
Traditional development accumulates technical debt, the known cost of shortcuts. Vibe coding accumulates something different. Call it technical confusion. Nobody understands the codebase. Not the AI (it has no persistent memory of architectural decisions). Not the developer (they can't read code). Not a future maintainer (good luck onboarding onto a codebase that was never designed, only precipitated).
The 41% churn metric is this loop, measured. Every regeneration cycle makes the next one more likely and more damaging.
"It'll Get Better" and Other Comforting Fictions
The standard defense goes like this: "It's a tool. Users should review the output."
Technically correct. Practically absurd.
The entire selling point of vibe coding is that non-developers can build software. Telling them to audit the generated code for SQL injection and CSRF vulnerabilities is like selling someone a kit airplane and telling them to inspect the welds. The product's marketing directly undermines its safety requirements.
Then there's the "models will improve" argument. Maybe they will. But improving code generation quality isn't the same as improving code generation safety. The incentive structure for vibe coding platforms rewards speed and demo-ability. "Ship in 5 minutes" sells subscriptions. "Ship safely in 5 days after a security review" does not.
Even with perfect code generation (which is nowhere close), there's a deeper problem. Good software requires understanding constraints, failure modes, edge cases, and tradeoffs. Those things are hard to capture in a natural language prompt. They come from experience. From debugging production incidents. From reading stack traces at 3 AM. Vibe coding's bet is that none of that knowledge matters. The accumulating data says otherwise.
A more honest framing: AI-assisted coding (Copilot, Cursor, Claude Code) where an experienced developer drives and reviews is productive in ways that show up in the data. AI-replaced coding where the AI drives and nobody reviews is dangerous in ways that also show up in the data. The industry keeps conflating these two things. They aren't the same.
What Would Actually Help
If the industry is serious about not turning the next decade of software into a security landfill, a few things need to happen.
Mandatory security scanning on vibe coding platforms. Not optional. Not paid tier. Run every generated app through SAST and DAST before deployment. If the platform makes it trivially easy to create software, it should make it equally hard to deploy vulnerable software.
```bash
# Minimum viable security scanning for any project.
# These should be non-negotiable before deployment.
# (semgrep and trufflehog are standalone tools, installed via pip/brew,
# not npm packages — don't run them through npx.)

# Static analysis for known vulnerability patterns
semgrep --config=auto ./src

# Dependency vulnerability check (production deps only)
npm audit --omit=dev

# Secret detection in the working tree
trufflehog filesystem ./src --no-update

# For Python projects
pip-audit
bandit -r ./src
```
Hard boundaries between prototype and production. Vibe coding is great for prototypes, hackathons, internal tools, and throwaway experiments. The danger starts when those prototypes get domain names and payment forms. Platforms should enforce this boundary: maybe a mandatory security gate before enabling custom domains or user authentication.
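Platform-side, that gate could be as simple as refusing to flip production features on while high-severity findings are open. A hypothetical sketch — the `scanReport` shape is an assumption, not any real platform's API:

```javascript
// Hypothetical deploy gate: custom domains / auth stay locked until the
// latest security scan comes back clean of high-severity findings.
function canEnableProductionFeatures(scanReport) {
  const blocking = scanReport.findings.filter(
    (f) => f.severity === 'critical' || f.severity === 'high'
  );
  return {
    allowed: blocking.length === 0,
    blocking, // surfaced to the user so "fix these first" is actionable
  };
}
```

The point isn't the ten lines of code; it's that the check runs on the platform, where the non-developer can't skip it.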
Honest marketing. Stop showing someone typing "build me an Uber clone" and celebrating when it renders a map. Show what happens six months later when there's a data breach and the founder can't explain their own codebase to a regulator. The current marketing for vibe coding platforms borders on negligent.
Education that includes AI failure modes. If coding programs teach prompt engineering, they should also teach what AI gets wrong, how to audit generated code, and when to throw AI output away and write it by hand. AI literacy isn't "how to use ChatGPT." It's knowing when not to trust it.
Where This Lands
The vibe coding wave produced a lot of software fast. Some of it was useful. Some was impressive as demos. And some, per the audits, the metrics, the App Store rejections, and the academic research, was a liability waiting to detonate.
Software engineering became a discipline because building reliable systems is genuinely difficult. The difficulty doesn't evaporate when the code writes itself. It just hides. It hides in the SQL injection on line 47. In the hardcoded secret on line 12. In the duplicated payment logic that handles refunds correctly in one place and incorrectly in another.
The question isn't whether vibe coding survives. Some version of it will. The question is whether the industry builds guardrails before the breach headlines start. Before the class-action lawsuits over AI-generated apps that leaked medical data. Before someone's vibe-coded IoT firmware lets an attacker into a hospital network.
Based on current trajectory, the guardrails are losing the race.
Have you inherited or audited a vibe-coded codebase? What did you find? I'd like to hear war stories, especially from anyone who's had to rewrite one for production. Drop your experience in the comments.
Top comments (1)
This hits hard. The scary part isn't the complexity of the vulnerabilities; it's how basic they are. I've seen similar patterns even in small front-end tooling: when AI generates code without strong constraints, it tends to ignore consistency, reuse, and safety entirely. It optimizes for "works now," not "survives production."
What worries me most is the false sense of confidence. Non-devs (and even some devs) assume that if something compiles and runs, it's "good enough." But as you said, the real gap is between working code and responsible code.
AI is incredibly useful when you already know what to look for, but without that, it's basically accelerating bad practices at scale.
It feels like we're heading toward a phase where "AI-generated" needs the same skepticism that "untested" used to get.