A Hacker News post titled "Toward automated verification of unreviewed AI-generated code" hit 70 points and 57 comments today. The discussion confirmed something I've been seeing firsthand: developers are shipping AI-generated code without meaningful review, and the failure modes are predictable.
I've spent the last 3 weeks building a security scanner specifically for AI-generated code. After scanning hundreds of code samples, I can tell you exactly what breaks first — and it's not what most people expect.
The Real Problem Isn't "Bad AI"
The HN thread has the usual debates: "just review the code" vs. "nobody has time for that." Both sides miss the point.
The problem isn't that AI writes bad code. The problem is that AI writes plausible-looking code that passes a quick glance. A human skimming a PR will see clean formatting, reasonable variable names, and familiar patterns. The dangerous stuff hides in the details.
I learned this the hard way. Early on, I tried using an LLM to detect vulnerabilities in AI-generated code. I ran the same scan 5 times and got 5 different severity scores. That's when I realized: you can't fight nondeterminism with more nondeterminism.
The 5 Patterns That Break First
After building 93 detection rules across 14 categories, here's what I keep finding in AI-generated code, ranked by frequency:
1. Hardcoded Secrets (found in ~70% of samples)
AI assistants love generating "working examples" with real-looking API keys, database URLs, and tokens. The developer copies the pattern, replaces some values, and misses others. I've seen AWS keys (AKIA...), Stripe keys, and database connection strings sitting in plain JavaScript files.
Why AI gets this wrong: It optimizes for "code that runs immediately." Environment variables add friction.
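A secret-detection rule can be as simple as a regex pass over raw source. This is an illustrative sketch, not the scanner's actual rules; the pattern names and the Stripe/Postgres regexes are assumptions:

```javascript
// Hypothetical secret-detection rules: plain regex matching over source text.
// The pattern list is illustrative, not the author's actual rule set.
const SECRET_PATTERNS = [
  { name: "aws-access-key", re: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "stripe-secret-key", re: /\bsk_live_[0-9a-zA-Z]{24,}\b/ },
  { name: "postgres-url", re: /postgres(?:ql)?:\/\/\w+:[^@\s]+@/ },
];

function findSecrets(source) {
  const findings = [];
  for (const { name, re } of SECRET_PATTERNS) {
    if (re.test(source)) findings.push(name);
  }
  return findings;
}
```

A real rule would also report line numbers and entropy-check candidate strings, but even this shape catches the copy-pasted "working example" case.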
2. Empty Catch Blocks (found in ~60% of samples)
try {
  const data = await fetchUserData(id);
  return processData(data);
} catch (e) {
  // handle error
}
That comment is a lie. There's no handling. The function silently returns undefined, and three components downstream crash with unhelpful errors. I spent an entire afternoon debugging a dashboard that showed blank data — traced it back to an empty catch block that swallowed a 401.
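For contrast, here is a minimal sketch of what non-empty handling could look like. The function name, the injected helpers, and the 401 branch are illustrative, not from the post:

```javascript
// Hypothetical handling for the snippet above: distinguish the auth failure,
// log with context, and rethrow everything else instead of swallowing it.
async function loadUser(id, { fetchUserData, processData, onAuthFailure }) {
  try {
    const data = await fetchUserData(id);
    return processData(data);
  } catch (e) {
    if (e && e.status === 401) {
      onAuthFailure();        // surface the auth failure instead of hiding it
      return null;
    }
    console.error(`loadUser(${id}) failed:`, e);
    throw e;                  // let callers and error boundaries see it
  }
}
```

The point is not this exact shape; it is that the catch block makes a decision the caller can observe.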
3. Missing Input Validation on API Routes
AI-generated Next.js API routes almost never validate input properly. They'll destructure req.body and pass values straight to database queries. No type checking, no sanitization, no length limits.
I found this pattern so consistently that it became one of my highest-confidence detection rules.
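What validation looks like in practice is a reject-before-touching-the-database check. This is a hand-rolled sketch (a schema library would work too); the field names and limits are assumptions for illustration:

```javascript
// Hypothetical input validation for a Next.js-style API route.
// Field names and limits are illustrative.
function validateCreateUser(body) {
  if (typeof body !== "object" || body === null) return ["body must be an object"];
  const errors = [];
  if (typeof body.email !== "string" || !/^[^@\s]+@[^@\s]+$/.test(body.email)) {
    errors.push("email must be a valid address");
  }
  if (typeof body.name !== "string" || body.name.length < 1 || body.name.length > 100) {
    errors.push("name must be 1-100 characters");
  }
  return errors;
}

// In the route handler, validation runs before the query:
// export default function handler(req, res) {
//   const errors = validateCreateUser(req.body);
//   if (errors.length) return res.status(400).json({ errors });
//   // ...safe to query the database here
// }
```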
4. Overly Permissive CORS
res.setHeader('Access-Control-Allow-Origin', '*');
When AI generates an API endpoint, it wants the code to work. CORS restrictions make development harder, so AI defaults to wide-open access. The developer gets it working in development and ships it.
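The fix is an explicit origin allowlist rather than the wildcard. A minimal sketch, with placeholder origins:

```javascript
// Hypothetical origin allowlist; the origins are placeholders.
const ALLOWED_ORIGINS = new Set([
  "https://app.example.com",
  "http://localhost:3000", // dev only
]);

function corsOrigin(requestOrigin) {
  // Echo the origin back only when it is explicitly allowed.
  return ALLOWED_ORIGINS.has(requestOrigin) ? requestOrigin : null;
}

// const origin = corsOrigin(req.headers.origin);
// if (origin) res.setHeader('Access-Control-Allow-Origin', origin);
```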
5. Console.log with Sensitive Data
AI-generated debugging code frequently logs request bodies, user objects, and tokens. These logs end up in production monitoring services, log aggregators, and error tracking tools — all places where sensitive data shouldn't be.
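A cheap mitigation is redacting known-sensitive keys before anything reaches a logger. Sketch only; the key list is illustrative and not exhaustive:

```javascript
// Hypothetical recursive redaction of sensitive keys before logging.
const SENSITIVE_KEYS = new Set(["password", "token", "authorization", "apiKey"]);

function redact(obj) {
  if (typeof obj !== "object" || obj === null) return obj;
  const out = Array.isArray(obj) ? [] : {};
  for (const [k, v] of Object.entries(obj)) {
    out[k] = SENSITIVE_KEYS.has(k) ? "[REDACTED]" : redact(v);
  }
  return out;
}

// console.log("incoming request", redact(req.body));
```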
Why Static Analysis Beats LLMs for This
The HN article discusses formal verification approaches, which are great in theory but heavy in practice. Here's what actually works at scale:
Pattern matching + AST parsing. That's it. No LLM, no API costs, no variance.
When I was building my scanner, I tried three approaches:
- LLM-based analysis — Inconsistent results. Same code, different verdicts. Expensive at scale. I killed this after week 1.
- Semgrep/existing tools — Good for human-written code patterns, but they miss AI-specific patterns like phantom package imports and AI-style error handling.
- Custom static analysis — Deterministic, fast (under 2 seconds for most files), and tunable. I can encode exactly the patterns I keep seeing in AI output.
The key insight: AI-generated code has recognizable patterns. It's not random — it follows the training distribution. That makes it detectable with rules, not AI.
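To make the shape of a rule concrete, here is a toy, regex-based approximation of the empty-catch check. A real implementation would walk an AST (with a parser like acorn) rather than match text, but the rule's shape — match, then emit a finding with an id and severity, identical output on identical input — is the same. This is my sketch, not the scanner's actual rule:

```javascript
// Toy approximation of one detection rule: an empty catch block,
// optionally containing only line comments. Real rules should use an AST.
const EMPTY_CATCH = /catch\s*\([^)]*\)\s*\{\s*(?:\/\/[^\n]*\s*)*\}/;

function scan(source) {
  const findings = [];
  if (EMPTY_CATCH.test(source)) {
    findings.push({ rule: "empty-catch-block", severity: "medium" });
  }
  return findings; // deterministic: same input, same findings, every run
}
```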
The Uncomfortable Truth
The 57 comments on that HN thread reveal a split:
- Camp A: "We need formal verification for AI code" (correct but impractical for most teams)
- Camp B: "Just review the code yourself" (correct but doesn't scale when AI generates 10x more code)
- Camp C: "Ship it and fix bugs later" (this is what's actually happening)
Camp C is winning by default. And that means automated scanning isn't optional anymore — it's the minimum viable safety net.
The code doesn't need to be perfect. It needs to be checked. Automatically, consistently, every time.
What I'm Watching
This HN discussion signals a shift. Six months ago, the discourse was "AI code is amazing." Now it's "how do we verify AI code?" That's a healthier conversation.
The tools will catch up. The question is how many silent failures ship in the meantime.
Scan Your Code
I built CodeHeal to catch exactly these patterns — 93 rules across 14 categories, zero LLM, deterministic results every time. Paste your AI-generated code and see what it finds.