Yuvraj Angad Singh

I Scanned 31 AI-Built Repos. Each Tool Leaves Behind a Different Mess.

46% of all the issues I found were the same thing: deep nesting. AI models keep stuffing logic into the same function instead of breaking it apart, and that pattern showed up in every tool I tested.

I scanned 31 public JS/TS repos with vibecheck, a linter with 34 rules for AI-specific code smells. 10 repos from Cursor, 11 from Lovable, 10 from Bolt.new. Only public repos with real application code, no scaffolds or starters.

This is not a scientific benchmark. But it's a real sample of real repos people shipped. I manually reviewed every error-level finding and noted which patterns were real issues vs template boilerplate. The full repo list and raw scan data are public if you want to verify.

The numbers

| Tool    | Repos | Issues | Files | Issues/file | Errors |
|---------|-------|--------|-------|-------------|--------|
| Cursor  | 10    | 9,534  | 1,499 | 6.36        | 77     |
| Lovable | 11    | 4,832  | 1,886 | 2.56        | 27     |
| Bolt    | 10    | 329    | 212   | 1.55        | 2      |

14,695 issues across 3,597 files. Cursor had the most issues by far, but also the biggest repos. Bolt looked cleanest by volume but had the smallest codebases.

What broke most often

Three patterns dominated everything else:

  1. Deep nesting: ~46% of all issues. The most AI-shaped pattern in the dataset. Models keep appending branches because it keeps context intact. Humans usually extract functions earlier.

  2. Console.log pollution: hundreds of hits across every tool. No AI coding tool cleans up debug logging.

  3. God functions: the largest was 3,579 lines in a Cursor repo (easy-kanban's AppContent). Lovable's biggest was 1,810 lines. Bolt's was 820 lines.
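Pattern 2, the console.log pollution, is usually a one-function fix: gate debug output behind an environment flag so it never ships by default. A generic sketch (my own, not code from the scanned repos):

```javascript
// A tiny logger gated by an environment flag, so debug output
// never reaches production by default. Generic sketch.
function makeLogger(env = process.env) {
  const enabled = env.DEBUG === '1';
  return {
    // Debug messages only print when DEBUG=1 is set.
    debug: (...args) => { if (enabled) console.log('[debug]', ...args); },
    // Errors always log: those you do want in production.
    error: (...args) => console.error('[error]', ...args),
  };
}
```

Swapping raw `console.log` calls for something like `logger.debug` also gives a linter a single choke point to check.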

Here's what nested AI code looks like in practice:

```javascript
if (newChunkCount === 0 && this.queue.size > 0) {
  if (currentProgress >= 99.9) {
    console.log('[PackageQueue] Progress 100%, no new chunks...');
    this.reset();
    return;
  }

  this.emptyRetryCount++;
  console.log(`[PackageQueue] No new chunks. Retry: ${this.emptyRetryCount}`);

  if (this.emptyRetryCount >= this.maxEmptyRetries) {
    console.error('[PackageQueue] PKG installation failed');
  }
}
```

Nested control flow plus debug logging left in production. This showed up everywhere.
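For contrast, here's roughly how a human reviewer would flatten that same logic: guard clauses instead of nesting, and failure surfaced as state instead of a stray `console.error`. This is my sketch, not code from the repo; the class is a minimal stand-in:

```javascript
// Minimal stand-in for the repo's queue class, restructured with
// guard clauses. The extraction is mine, not the original author's.
class PackageQueue {
  constructor(maxEmptyRetries = 3) {
    this.queue = new Set();
    this.emptyRetryCount = 0;
    this.maxEmptyRetries = maxEmptyRetries;
    this.failed = false;
    this.wasReset = false;
  }

  reset() { this.wasReset = true; }

  handleEmptyChunk(newChunkCount, currentProgress) {
    // Guard clause replaces the outer if-nesting.
    if (newChunkCount !== 0 || this.queue.size === 0) return;

    // Early return replaces the first inner branch.
    if (currentProgress >= 99.9) {
      this.reset();
      return;
    }

    this.emptyRetryCount++;
    if (this.emptyRetryCount >= this.maxEmptyRetries) {
      this.failed = true; // surface failure as state, not a log line
    }
  }
}
```

Same behavior, one level of indentation, and the failure is something callers can actually check.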

Each tool had a distinct pattern

Cursor had the highest issue density (6.36/file) and 77 error-level findings. All SQL injection hits came from Cursor repos. It also had the most `as any` usage (490 in one repo alone). My read: Cursor users were building bigger, more ambitious apps, and the mess scaled with them. This is inference, not proof. Bigger projects have more surface area.
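The SQL injection hits were the classic string-built query. The fix is parameterization: let the driver handle the value. A generic sketch in node-postgres placeholder style (not code quoted from the scanned repos):

```javascript
// The shape vibecheck flags: user input interpolated into SQL.
function findUserUnsafe(db, email) {
  return db.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// Parameterized version: the driver escapes the value,
// so the query text never contains user input.
function findUserSafe(db, email) {
  return db.query('SELECT * FROM users WHERE email = $1', [email]);
}
```

The unsafe version ships the injection payload straight into the query string; the safe one keeps query text and data separate.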

Lovable had innerHTML or dangerouslySetInnerHTML in 100% of repos (11 out of 11). That's a confirmed pattern at scale. It also produced the only eval-class calls:

```jsx
<code dangerouslySetInnerHTML={{ __html: highlightedCode }} />
```

```javascript
const script = new Function(consoleProxyScript);
```

Some of these are template-level patterns (syntax highlighting, chart CSS) that aren't exploitable in isolation. But they're the kind of code that drifts toward real XSS if someone later feeds user input into the same path. Worth reviewing, not worth panicking about.
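If user input can ever reach one of those paths, escape it first. A minimal escaping sketch (my own; real apps rendering rich HTML should use a sanitizer library like DOMPurify instead, since escaping is only right when you want plain text rendered as text):

```javascript
// Minimal HTML escaping before anything goes near innerHTML.
// Replaces the five characters that break out of text context.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;')   // must run first, or it re-escapes the rest
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
```

The ampersand replacement has to come first; otherwise the entities produced by the later replacements get double-escaped.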

Bolt had the lowest issues per file (1.55) but was the only tool that shipped a hardcoded database credential and an error info leak in the same repo:

```javascript
const pool = new Pool({
  host: '5.75.154.79',
  user: 'postgres',
  password: 'Jk5h...redacted...',
  database: 'postgres',
});
```

```javascript
res.status(500).json({ error: error.message });
```

Bolt looked cleaner by volume but not necessarily safer; small repo size helped its totals. The second batch of repos I added had zero security findings, though, so the batch-1 hardcoded credential might be an outlier.
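Both findings have boring, mechanical fixes: read credentials from the environment, and return a generic error message while keeping the real one in your logs. A sketch under those assumptions (the env var names follow pg's conventions; the helpers are mine):

```javascript
// Build a DB config from the environment instead of source control.
// Fails fast if a variable is missing rather than connecting with
// undefined credentials.
function poolConfigFromEnv(env) {
  const required = ['PGHOST', 'PGUSER', 'PGPASSWORD', 'PGDATABASE'];
  for (const key of required) {
    if (!env[key]) throw new Error(`Missing environment variable: ${key}`);
  }
  return {
    host: env.PGHOST,
    user: env.PGUSER,
    password: env.PGPASSWORD,
    database: env.PGDATABASE,
  };
}

// Log the real error server-side; send a generic message to the client
// so stack traces and internals never leak in the response body.
function sendError(res, err) {
  console.error(err);
  res.status(500).json({ error: 'Internal server error' });
}
```

With `error.message` out of the response, an attacker probing the endpoint learns nothing about the schema or the stack.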

What actually mattered

The tools did not fail in the same way. But the overlap mattered more than the differences.

  • None of them naturally refactor.
  • None of them clean up logs.
  • None of them ask "should this really be one 3,500-line function?"

Scope and review mattered more than tool choice. Small apps were cleaner. Lovable had the single cleanest repo in the entire set (humanise-ai, 0.48 issues/file). Cursor had the dirtiest (easy-kanban, 14.04 issues/file). One of the cleanest Bolt repos was actually a hybrid built with Bolt + Cursor + Cline.

The bad outcome was never "AI touched the repo." The bad outcome was "AI wrote it, nobody looked at it after."

vibecheck flagged patterns, not confirmed vulnerabilities. Some of these findings are real bugs. Some are smells that might never cause a problem. A linter doesn't know intent. It flags what it sees. I doubled the sample from 15 to 31 repos, and the same patterns held.

Try it

If you're shipping AI-generated code, run a check before you push:

```shell
npx @yuvrajangadsingh/vibecheck .
```

34 rules. JS/TS and Python. Also runs as a GitHub Action, VS Code extension, and MCP server for AI coding agents.
