Yuvraj Angad Singh

I Scanned 31 AI-Built Repos. Each Tool Leaves Behind a Different Mess.

46% of all the issues I found were the same thing: deep nesting. AI models keep stuffing logic into the same function instead of breaking it apart, and that pattern showed up in every tool I tested.

I scanned 31 public JS/TS repos with vibecheck, a linter with 34 rules for AI-specific code smells. 10 repos from Cursor, 11 from Lovable, 10 from Bolt.new. Only public repos with real application code, no scaffolds or starters.

This is not a scientific benchmark. But it's a real sample of real repos people shipped. I manually reviewed every error-level finding and noted which patterns were real issues vs template boilerplate. The full repo list and raw scan data are public if you want to verify.

The numbers

| Tool    | Repos | Issues | Files | Issues/file | Errors |
|---------|-------|--------|-------|-------------|--------|
| Cursor  | 10    | 9,534  | 1,499 | 6.36        | 77     |
| Lovable | 11    | 4,832  | 1,886 | 2.56        | 27     |
| Bolt    | 10    | 329    | 212   | 1.55        | 2      |

14,695 issues across 3,597 files. Cursor had the most issues by far, but also the biggest repos. Bolt looked cleanest by volume but had the smallest codebases.

What broke most often

Three patterns dominated everything else:

  1. Deep nesting: ~46% of all issues. The most AI-shaped pattern in the dataset. Models keep appending branches because it keeps context intact. Humans usually extract functions earlier.

  2. Console.log pollution: hundreds of hits across every tool. No AI coding tool cleans up debug logging.

  3. God functions: the largest was 3,579 lines in a Cursor repo (easy-kanban's AppContent). Lovable's biggest was 1,810 lines. Bolt's was 820 lines.
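Pattern 2, the console.log pollution, is usually a one-function fix: gate debug output behind an environment flag so it never ships by default. A generic sketch (my own, not code from the scanned repos):

```javascript
// A tiny logger gated by an environment flag, so debug output
// never reaches production by default. Generic sketch.
function makeLogger(env = process.env) {
  const enabled = env.DEBUG === '1';
  return {
    // Debug messages only print when DEBUG=1 is set.
    debug: (...args) => { if (enabled) console.log('[debug]', ...args); },
    // Errors always log: those you do want in production.
    error: (...args) => console.error('[error]', ...args),
  };
}
```

Swapping raw `console.log` calls for something like `logger.debug` also gives a linter a single choke point to check.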

Here's what nested AI code looks like in practice:

```javascript
if (newChunkCount === 0 && this.queue.size > 0) {
  if (currentProgress >= 99.9) {
    console.log('[PackageQueue] Progress 100%, no new chunks...');
    this.reset();
    return;
  }

  this.emptyRetryCount++;
  console.log(`[PackageQueue] No new chunks. Retry: ${this.emptyRetryCount}`);

  if (this.emptyRetryCount >= this.maxEmptyRetries) {
    console.error('[PackageQueue] PKG installation failed');
  }
}
```

Nested control flow plus debug logging left in production. This showed up everywhere.
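For contrast, here's roughly how a human reviewer would flatten that same logic: guard clauses instead of nesting, and failure surfaced as state instead of a stray `console.error`. This is my sketch, not code from the repo; the class is a minimal stand-in:

```javascript
// Minimal stand-in for the repo's queue class, restructured with
// guard clauses. The extraction is mine, not the original author's.
class PackageQueue {
  constructor(maxEmptyRetries = 3) {
    this.queue = new Set();
    this.emptyRetryCount = 0;
    this.maxEmptyRetries = maxEmptyRetries;
    this.failed = false;
    this.wasReset = false;
  }

  reset() { this.wasReset = true; }

  handleEmptyChunk(newChunkCount, currentProgress) {
    // Guard clause replaces the outer if-nesting.
    if (newChunkCount !== 0 || this.queue.size === 0) return;

    // Early return replaces the first inner branch.
    if (currentProgress >= 99.9) {
      this.reset();
      return;
    }

    this.emptyRetryCount++;
    if (this.emptyRetryCount >= this.maxEmptyRetries) {
      this.failed = true; // surface failure as state, not a log line
    }
  }
}
```

Same behavior, one level of indentation, and the failure is something callers can actually check.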

Each tool had a distinct pattern

Cursor had the highest issue density (6.36/file) and 77 error-level findings. All SQL injection hits came from Cursor repos. It also had the most `as any` usage (490 in one repo alone). My read: Cursor users were building bigger, more ambitious apps, and the mess scaled with them. This is inference, not proof. Bigger projects have more surface area.
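The SQL injection hits were the classic string-built query. The fix is parameterization: let the driver handle the value. A generic sketch in node-postgres placeholder style (not code quoted from the scanned repos):

```javascript
// The shape vibecheck flags: user input interpolated into SQL.
function findUserUnsafe(db, email) {
  return db.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// Parameterized version: the driver escapes the value,
// so the query text never contains user input.
function findUserSafe(db, email) {
  return db.query('SELECT * FROM users WHERE email = $1', [email]);
}
```

The unsafe version ships the injection payload straight into the query string; the safe one keeps query text and data separate.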

Lovable had innerHTML or dangerouslySetInnerHTML in 100% of repos (11 out of 11). That's a confirmed pattern at scale. It also produced the only eval-class calls:

```jsx
<code dangerouslySetInnerHTML={{ __html: highlightedCode }} />
```

```javascript
const script = new Function(consoleProxyScript);
```

Some of these are template-level patterns (syntax highlighting, chart CSS) that aren't exploitable in isolation. But they're the kind of code that drifts toward real XSS if someone later feeds user input into the same path. Worth reviewing, not worth panicking about.
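If user input can ever reach one of those paths, escape it first. A minimal escaping sketch (my own; real apps rendering rich HTML should use a sanitizer library like DOMPurify instead, since escaping is only right when you want plain text rendered as text):

```javascript
// Minimal HTML escaping before anything goes near innerHTML.
// Replaces the five characters that break out of text context.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')   // must run first, or it re-escapes the rest
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

The ampersand replacement has to come first; otherwise the entities produced by the later replacements get double-escaped.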

Bolt had the lowest issues per file (1.55) but was the only tool that shipped a hardcoded database credential and an error info leak in the same repo:

```javascript
const pool = new Pool({
  host: '5.75.154.79',
  user: 'postgres',
  password: 'Jk5h...redacted...',
  database: 'postgres',
});
```

```javascript
res.status(500).json({ error: error.message });
```

Bolt looked cleaner by volume but not necessarily safer; small repo size helped its totals. The second batch of repos I added had zero security findings, though, so the batch-1 hardcoded credential might be an outlier.
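Both findings have boring, mechanical fixes: read credentials from the environment, and return a generic error message while keeping the real one in your logs. A sketch under those assumptions (the env var names follow pg's conventions; the helpers are mine):

```javascript
// Build a DB config from the environment instead of source control.
// Fails fast if a variable is missing rather than connecting with
// undefined credentials.
function poolConfigFromEnv(env) {
  const required = ['PGHOST', 'PGUSER', 'PGPASSWORD', 'PGDATABASE'];
  for (const key of required) {
    if (!env[key]) throw new Error(`Missing environment variable: ${key}`);
  }
  return {
    host: env.PGHOST,
    user: env.PGUSER,
    password: env.PGPASSWORD,
    database: env.PGDATABASE,
  };
}

// Log the real error server-side; send a generic message to the client
// so stack traces and internals never leak in the response body.
function sendError(res, err) {
  console.error(err);
  res.status(500).json({ error: 'Internal server error' });
}
```

With `error.message` out of the response, an attacker probing the endpoint learns nothing about the schema or the stack.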

What actually mattered

The tools did not fail in the same way. But the overlap mattered more than the differences.

  • None of them naturally refactor.
  • None of them clean up logs.
  • None of them ask "should this really be one 3,500-line function?"

Scope and review mattered more than tool choice. Small apps were cleaner. Lovable had the single cleanest repo in the entire set (humanise-ai, 0.48 issues/file). Cursor had the dirtiest (easy-kanban, 14.04 issues/file). One of the cleanest Bolt repos was actually a hybrid built with Bolt + Cursor + Cline.

The bad outcome was never "AI touched the repo." The bad outcome was "AI wrote it, nobody looked at it after."

vibecheck flagged patterns, not confirmed vulnerabilities. Some of these findings are real bugs. Some are smells that might never cause a problem. A linter doesn't know intent. It flags what it sees. I doubled the sample from 15 to 31 repos, and the same patterns held.

Try it

If you're shipping AI-generated code, run a check before you push:

```shell
npx @yuvrajangadsingh/vibecheck .
```

34 rules. JS/TS and Python. Also runs as a GitHub Action, VS Code extension, and MCP server for AI coding agents.
