Vincenzo Rubino

Posted on May 4

161 verified AI package hallucinations across 8.5M indexed — open dataset

#ai #security #supplychain #mcp


TL;DR: DepScope is a free MCP server + REST API that AI coding agents call before installing packages. We index 8.5M+ packages across 19 ecosystems and track 45K+ vulnerabilities, refreshed daily. We also publish a verified open corpus of LLM-hallucinated package names; every entry is cross-validated, re-verified daily, and licensed CC-BY-NC-SA. Cite us in your research, and integrate the MCP server in your agent.

Why this matters

When AI coding agents (Claude, GPT, Cursor, Aider, Copilot, Windsurf) generate code, they sometimes invent package names that don't exist. If a developer runs pip install fastapi-turbo blindly and an attacker has already registered that name, the attacker's code runs on the developer's machine.

This is called slopsquatting, and published research puts the share of hallucinated names at 3–25% of generated dependencies (JFrog 2024, Lasso Security 2024).

DepScope was built to be the infrastructure layer AI agents query before installing — fast, free, MCP-native, and at a scale that matches the real registry ecosystem.

The numbers

Metric	Value
Packages indexed	8.5M+
Ecosystems covered	19 (npm, PyPI, Cargo, Go, Maven, NuGet, RubyGems, Composer, Pub, Hex, Swift, CocoaPods, CPAN, Hackage, CRAN, Conda, Homebrew, JSR, Julia)
Vulnerabilities tracked	45K+ (OSV mirror, daily refresh)
EPSS-enriched advisories	330,000+
KEV (CISA actively exploited)	1,587 entries synced
Verified hallucination corpus	161 entries
Of which observed in real AI agent traffic	133
Of which from peer-reviewed slopsquat research	28
Update cadence	daily — packages, vulns, severity, hallucinations

How DepScope compares

	DepScope	Snyk	Socket	deps.dev
Packages indexed	8.5M+	~30M	~10M	~5M
Ecosystems	19	12	5	7
Free + no auth	✅	❌ ($25/dev/mo)	❌ enterprise	✅
MCP-native	✅	❌	❌	❌
Hallucination corpus	✅ public	❌	❌	❌
Real-time API	✅	✅	✅	✅

We're not the biggest — we're the most accessible for the AI agent era.

The hallucination corpus — methodology

Every entry passes a multi-stage validation pipeline before it's published:

Live observation: an AI agent calls /api/check and the upstream registry returns 404.
Plausibility filter: names that look like URLs, image paths, scanner probes, or scheme-prefixed garbage are dropped at ingest.
Cross-validation: for entries from observed traffic, the name must recur across multiple callers and multiple days.
Daily re-verifier: every flagged entry is re-checked nightly. If the registry now resolves the name, the flag is reverted and the entry is removed from the public corpus.

What you get in /api/benchmark/hallucinations is the result after this pipeline. Most public hallucination datasets don't disclose their filtering — ours does.
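The four stages above can be sketched roughly as follows. This is a minimal illustration, not DepScope's implementation: the Observation record, the regular expression, and the min_callers / min_days thresholds are all assumptions made for the example, since the post does not publish the exact rules.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Set

# Hypothetical patterns for URL-like names, image paths and scanner probes.
_IMPLAUSIBLE = re.compile(
    r"://|\.(?:png|jpe?g|gif|svg|php|env)$|\.\.|\s", re.IGNORECASE
)


@dataclass(frozen=True)
class Observation:
    name: str             # e.g. "pypi/fastapi-turbo"
    caller: str           # opaque agent fingerprint
    day: date             # day the lookup was seen
    registry_status: int  # HTTP status returned by the upstream registry


def is_plausible(name: str) -> bool:
    """Stage 2: reject names that cannot be genuine package names."""
    return bool(name) and _IMPLAUSIBLE.search(name) is None


def build_corpus(
    observations: Iterable[Observation],
    min_callers: int = 2,  # assumed threshold
    min_days: int = 2,     # assumed threshold
) -> Set[str]:
    """Stages 1-3: keep 404s that are plausible and persist across callers and days."""
    callers, days = defaultdict(set), defaultdict(set)
    for obs in observations:
        if obs.registry_status != 404:  # stage 1: registry must not know the name
            continue
        if not is_plausible(obs.name):  # stage 2: plausibility filter
            continue
        callers[obs.name].add(obs.caller)
        days[obs.name].add(obs.day)
    return {
        name for name in callers
        if len(callers[name]) >= min_callers and len(days[name]) >= min_days
    }


def reverify(corpus: Set[str], resolves_now: Callable[[str], bool]) -> Set[str]:
    """Stage 4: drop entries that the registry now resolves."""
    return {name for name in corpus if not resolves_now(name)}
```

Here resolves_now stands in for a live registry lookup; passing it as a callable keeps the sketch testable without network access.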

The slopsquat economy

LLMs don't invent names randomly. They invent plausible-sounding variants of real packages. The signature suffixes:

-easy   -pro    -turbo   -plus
-simple -fast   -advanced -extended
-ultra  -enhanced -enterprise -optimized
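As a rough illustration, the suffix list above can be turned into a simple heuristic that flags a name and proposes the base package to look up instead. The function name strip_slop_suffix is hypothetical, and the check is only a first-pass signal: legitimate packages can also end in these suffixes.

```python
from typing import Optional

# Signature suffixes taken from the list above.
SLOP_SUFFIXES = frozenset({
    "easy", "pro", "turbo", "plus",
    "simple", "fast", "advanced", "extended",
    "ultra", "enhanced", "enterprise", "optimized",
})


def strip_slop_suffix(name: str) -> Optional[str]:
    """Return the base name if `name` ends in a signature suffix, else None."""
    base, sep, suffix = name.rpartition("-")
    if sep and base and suffix.lower() in SLOP_SUFFIXES:
        return base
    return None
```

For fastapi-turbo this yields fastapi. For torch-lightning-easy it yields torch-lightning, which shows why suffix stripping alone is not enough to recover the real package name.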

Top entries (verified against live registries with did_you_mean resolution):

Hallucinated name	Hits	Real package
`conda/torch-lightning-easy`	25	`pytorch-lightning`
`pypi/fastapi-turbo`	17	`fastapi`
`cargo/tokio-stream-extras`	17	`tokio-stream`
`npm/typescript-utility-pack-pro`	17	`type-fest`
`pypi/pandas-easy-pivot`	13	`pandas`
`npm/react-hooks-essential`	13	`react` (built-in hooks)
`npm/jwt-token-validator-easy`	1	`jsonwebtoken`
`composer/laravel/auth-pro`	1	`laravel/sanctum`
`pypi/numpy-extensions-plus`	1	`numpy`
`pypi/reqeusts`	1	`requests` (typo)
`npm/lodsh`	1	`lodash` (typo)

If you maintain a package and see your name with a -pro or -turbo suffix on a registry, treat it as a likely slopsquat waiting for an LLM-generated install command to land.

Cross-validation: the multi-agent test

The strongest signal isn't volume — it's multiple agents independently inventing the same fake name:

torch-lightning-easy — invented across 7 different agent fingerprints
fastapi-turbo — 7 different agents
tokio-stream-extras — 5 different agents

When 7 independent agents converge on the same fake name, that name is structurally plausible to the underlying models, which means attackers can predict and pre-register it. This is the real danger.
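The difference between volume and convergence can be made concrete with a small sketch. It assumes lookups arrive as (package_name, agent_fingerprint) pairs, and the min_agents cutoff is an arbitrary value chosen for illustration, not DepScope's actual threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def agents_per_name(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct agent fingerprints per name, ignoring repeat hits."""
    seen = defaultdict(set)
    for name, agent in events:
        seen[name].add(agent)
    return {name: len(agents) for name, agents in seen.items()}


def convergent(events: Iterable[Tuple[str, str]], min_agents: int = 5) -> List[str]:
    """Names invented by at least `min_agents` distinct agents, widest first."""
    counts = agents_per_name(events)
    hits = [name for name, n in counts.items() if n >= min_agents]
    return sorted(hits, key=lambda name: (-counts[name], name))
```

A name requested ten times by one agent scores 1 here, while a name requested once each by seven agents scores 7, which is the signal this section describes.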

The dataset

Live JSON: depscope.dev/api/benchmark/hallucinations
License: CC-BY-NC-SA 4.0 (attribution, non-commercial, share-alike)
Update: daily 05:00 UTC, with last_updated_at field
Cite: Rubino, V. (2026). DepScope hallucinations dataset. depscope.dev
GitHub mirror: github.com/cuttalo/depscope-hallucinations-dataset (daily snapshots + research scripts)
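A minimal sketch for pulling the live JSON with the Python standard library follows. The https scheme is assumed, and so is the response shape: the only field named in this post is last_updated_at, and the sketch guesses that it sits at the top level of a JSON object. Inspect the real payload before relying on any other key.

```python
import json
import urllib.request
from typing import Any, Optional

# Endpoint from this post; the https scheme is an assumption.
DATASET_URL = "https://depscope.dev/api/benchmark/hallucinations"


def fetch_dataset(url: str = DATASET_URL, timeout: float = 10.0) -> Any:
    """Download and decode the dataset JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def snapshot_timestamp(payload: Any) -> Optional[str]:
    """Return last_updated_at if it is a top-level key (assumed), else None."""
    if isinstance(payload, dict):
        return payload.get("last_updated_at")
    return None
```

Recording the value of snapshot_timestamp(fetch_dataset()) alongside any analysis makes it clear which daily snapshot a result was computed from.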

How to integrate DepScope MCP in your agent

Add to Claude Desktop / Cursor / Windsurf config:

{
  "mcpServers": {
    "depscope": {
      "url": "https://mcp.depscope.dev/mcp"
    }
  }
}

Or local stdio:

npx -y depscope-mcp
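If your client config already lists other MCP servers, the remote-server entry shown above should be merged in rather than pasted over the file. The sketch below does that. The helper names are hypothetical, and the config path is left to the caller because its location differs between clients.

```python
import json
from pathlib import Path

# Remote-server entry from the JSON snippet above.
DEPSCOPE_ENTRY = {"url": "https://mcp.depscope.dev/mcp"}


def add_depscope(config: dict) -> dict:
    """Return a copy of `config` with the depscope server added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["depscope"] = dict(DEPSCOPE_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the client's JSON config (if any), merge the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_depscope(config), indent=2) + "\n", encoding="utf-8")
```

Back up the config file before running update_config_file on it; the function rewrites the whole file with two-space indentation.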

22 tools exposed: check_package, package_exists, find_alternatives, check_typosquat, check_malicious, scan_project, get_vulnerabilities, +15 more. Free, no auth, no rate limit.

Try it before integrating: paste your package.json at depscope.dev → instant verdict + hallucination check.

For researchers and tool builders

If you build an AI coding tool, integrate DepScope MCP. Every blocked hallucination is one less compromised developer machine.
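One way a tool might wire this in is a pre-install gate that refuses any dependency the checker does not confirm. In the sketch below, package_exists is a plain callable standing in for whatever lookup the tool uses. It borrows the name of the MCP tool listed above, but that tool's real inputs and outputs are not described in this post, so treat the signature as an assumption.

```python
from typing import Callable, Iterable, List, Tuple


def gate_install(
    requested: Iterable[str],
    package_exists: Callable[[str], bool],  # stand-in for a real lookup
) -> Tuple[List[str], List[str]]:
    """Split requested names into (allowed, blocked) before any install runs."""
    allowed: List[str] = []
    blocked: List[str] = []
    for name in requested:
        try:
            ok = bool(package_exists(name))
        except Exception:
            ok = False  # fail closed: an unverifiable name is not installed
        (allowed if ok else blocked).append(name)
    return allowed, blocked
```

Existence is necessary but not sufficient: once an attacker registers a hallucinated name, it exists. A real gate would also consult typosquat and malicious-package checks before allowing the install.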

If you research AI safety, the dataset is yours under CC-BY-NC-SA — please cite us. If an entry looks wrong, open an issue: every false positive caught makes the dataset more useful for the whole community.

If you maintain a package and worry your name could be hallucinated as yourpkg-pro:

Pre-register the variant on the relevant registry: publish a placeholder, then deprecate it (npm) or yank the release (PyPI, Cargo) so the name stays taken.
Or watch depscope.dev/benchmark — patterns are shown live.

Built by DepScope. Data: CC-BY-NC-SA 4.0. SDKs: AGPL. Backend: proprietary.

#mcp #ai #security #supplychain #slopsquatting #llm #aitools
