161 verified AI package hallucinations across 8.5M indexed — open dataset
TL;DR: DepScope is a free MCP server + REST API that AI coding agents call before installing packages. We index 8.5M+ packages across 19 ecosystems and track 45K+ vulnerabilities in real time. We also publish a verified open corpus of LLM-hallucinated package names: every entry is cross-validated daily and licensed CC-BY-NC-SA. Cite us in your research and integrate the MCP server in your agent.
Why this matters
When AI coding agents (Claude, GPT, Cursor, Aider, Copilot, Windsurf) generate code, they sometimes invent package names that don't exist. If a developer runs `pip install fastapi-turbo` blindly, an attacker who has already registered that name can run arbitrary code on the developer's machine.
This is called slopsquatting, and published studies put the hallucination rate at 3–25% of generated dependencies (JFrog 2024, Lasso Security 2024).
DepScope was built to be the infrastructure layer AI agents query before installing — fast, free, MCP-native, and at a scale that matches the real registry ecosystem.
The numbers
| Metric | Value |
|---|---|
| Packages indexed | 8.5M+ |
| Ecosystems covered | 19 (npm, PyPI, Cargo, Go, Maven, NuGet, RubyGems, Composer, Pub, Hex, Swift, CocoaPods, CPAN, Hackage, CRAN, Conda, Homebrew, JSR, Julia) |
| Vulnerabilities tracked | 45K+ (OSV mirror, daily refresh) |
| EPSS-enriched advisories | 330,000+ |
| KEV (CISA actively exploited) | 1,587 entries synced |
| Verified hallucination corpus | 161 entries |
| Of which observed in real AI agent traffic | 133 |
| Of which from peer-reviewed slopsquat research | 28 |
| Update cadence | daily — packages, vulns, severity, hallucinations |
How DepScope compares
| | DepScope | Snyk | Socket | deps.dev |
|---|---|---|---|---|
| Packages indexed | 8.5M+ | ~30M | ~10M | ~5M |
| Ecosystems | 19 | 12 | 5 | 7 |
| Free + no auth | ✅ | ❌ ($25/dev/mo) | ❌ enterprise | ✅ |
| MCP-native | ✅ | ❌ | ❌ | ❌ |
| Hallucination corpus | ✅ public | ❌ | ❌ | ❌ |
| Real-time API | ✅ | ✅ | ✅ | ✅ |
We're not the biggest — we're the most accessible for the AI agent era.
The hallucination corpus — methodology
Every entry passes a multi-stage validation pipeline before it's published:
- Live observation: an AI agent calls `/api/check` and the upstream registry returns 404
- Plausibility filter: names that look like URLs, image paths, scanner probes, or scheme-prefixed garbage are dropped at ingest
- Cross-validation: multi-caller / multi-day persistence required for the `observed` source
- Daily re-verifier: every flagged entry is re-checked nightly. If the registry now resolves, the flag is reverted and the entry is removed from the public corpus
What you get in /api/benchmark/hallucinations is the result after this pipeline. Most public hallucination datasets don't disclose their filtering — ours does.
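The filtering stages above can be sketched roughly as follows. This is a minimal illustration, not DepScope's actual implementation: the function names, the marker list, and the thresholds (`min_callers`, `min_days`) are all assumptions made for the example.

```python
import re

# Hypothetical rules for illustration; the real filter rules are not shown in this post.
_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)
_IMAGE_EXT = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".webp")
_PROBE_MARKERS = ("..", "wp-admin", "phpmyadmin", "/.env", "/.git")


def is_plausible_package_name(name: str) -> bool:
    """Return False for inputs that look like URLs, image paths, or scanner probes."""
    candidate = name.strip().lower()
    if not candidate or len(candidate) > 214:  # 214 is npm's name length limit
        return False
    if _SCHEME.match(candidate) or candidate.startswith(("www.", ".", "/")):
        return False
    if candidate.endswith(_IMAGE_EXT):
        return False
    if any(marker in candidate for marker in _PROBE_MARKERS):
        return False
    # Allow characters that package names commonly use, plus "/" and "@" for scopes.
    return re.fullmatch(r"[a-z0-9@/._-]+", candidate) is not None


def should_publish(registry_status: int, distinct_callers: int, distinct_days: int,
                   min_callers: int = 2, min_days: int = 2) -> bool:
    """Publish only if the registry returns 404 and the name persists across callers and days."""
    return (
        registry_status == 404
        and distinct_callers >= min_callers
        and distinct_days >= min_days
    )
```

The nightly re-verifier would then amount to re-running the registry lookup and dropping any entry for which `should_publish` no longer holds.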
The slopsquat economy
LLMs don't invent names randomly. They invent plausible-sounding variants of real packages. The signature suffixes:
-easy -pro -turbo -plus
-simple -fast -advanced -extended
-ultra -enhanced -enterprise -optimized
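As a rough sketch, a tool could flag names that end in one of these suffixes before installing them. The helper name `split_signature_suffix` is hypothetical, and a match is only a hint that the name deserves a registry lookup, not proof of a hallucination.

```python
from typing import Optional, Tuple

# The twelve signature suffixes listed above.
SIGNATURE_SUFFIXES = (
    "easy", "pro", "turbo", "plus",
    "simple", "fast", "advanced", "extended",
    "ultra", "enhanced", "enterprise", "optimized",
)


def split_signature_suffix(name: str) -> Optional[Tuple[str, str]]:
    """Return (base, suffix) if the name ends in a signature suffix, else None."""
    base, sep, last = name.lower().rpartition("-")
    if sep and base and last in SIGNATURE_SUFFIXES:
        return base, last
    return None
```

Note that the stripped base is not always the real package: `torch-lightning-easy` strips to `torch-lightning`, while the real package is `pytorch-lightning`, so a did-you-mean lookup is still needed.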
Top entries (verified against live registries with did_you_mean resolution):
| Hallucinated name | Hits | Real package |
|---|---|---|
| `conda/torch-lightning-easy` | 25 | pytorch-lightning |
| `pypi/fastapi-turbo` | 17 | fastapi |
| `cargo/tokio-stream-extras` | 17 | tokio-stream |
| `npm/typescript-utility-pack-pro` | 17 | type-fest |
| `pypi/pandas-easy-pivot` | 13 | pandas |
| `npm/react-hooks-essential` | 13 | react (built-in hooks) |
| `npm/jwt-token-validator-easy` | 1 | jsonwebtoken |
| `composer/laravel/auth-pro` | 1 | laravel/sanctum |
| `pypi/numpy-extensions-plus` | 1 | numpy |
| `pypi/reqeusts` | 1 | requests (typo) |
| `npm/lodsh` | 1 | lodash (typo) |
If you maintain a package and see your name with a -pro or -turbo suffix on a registry, that's almost always a slopsquat waiting for an LLM-generated pip install to land.
Cross-validation: the multi-agent test
The strongest signal isn't volume — it's multiple agents independently inventing the same fake name:
- `torch-lightning-easy`: invented across 7 different agent fingerprints
- `fastapi-turbo`: 7 different agents
- `tokio-stream-extras`: 5 different agents
When 7 different agents converge on the same fake name, that name is structurally plausible to neural networks, which means attackers can predict and pre-register it. This is the real danger.
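The multi-agent signal can be sketched as a count of distinct fingerprints per name rather than a count of raw hits. The input shape here, `(name, fingerprint)` pairs, and the `min_agents` threshold are assumptions for illustration; how DepScope actually fingerprints agents is not described in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_by_distinct_agents(observations: Iterable[Tuple[str, str]],
                            min_agents: int = 2) -> List[Tuple[str, int]]:
    """Rank hallucinated names by how many distinct agents produced them."""
    agents_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, fingerprint in observations:
        # A set ignores repeat hits from the same agent.
        agents_by_name[name].add(fingerprint)
    ranked = [
        (name, len(agents))
        for name, agents in agents_by_name.items()
        if len(agents) >= min_agents
    ]
    # Most agents first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Counting sets instead of hits keeps a single noisy caller from inflating an entry, which is why volume alone is the weaker signal.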
The dataset
- Live JSON: depscope.dev/api/benchmark/hallucinations
- License: CC-BY-NC-SA 4.0 (attribution + non-commercial + share-alike)
- Update: daily 05:00 UTC, with `last_updated_at` field
- Cite: `Rubino, V. (2026). DepScope hallucinations dataset. depscope.dev`
- GitHub mirror: github.com/cuttalo/depscope-hallucinations-dataset (daily snapshots + research scripts)
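If you consume the dataset in a pipeline, it may be worth checking `last_updated_at` before trusting a cached copy. The sketch below assumes that field sits at the top level of the JSON and holds an ISO 8601 timestamp; neither detail is confirmed here, so adjust it to the real payload. The 36-hour tolerance is also an arbitrary choice.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def corpus_is_fresh(raw_json: str,
                    now: Optional[datetime] = None,
                    max_age: timedelta = timedelta(hours=36)) -> bool:
    """Return True if the corpus was updated within max_age of now."""
    payload = json.loads(raw_json)
    stamp = payload["last_updated_at"]  # assumed: top-level ISO 8601 string
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    updated = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if updated.tzinfo is None:
        updated = updated.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    current = now or datetime.now(timezone.utc)
    return current - updated <= max_age
```

With a daily refresh at 05:00 UTC, a tolerance somewhat above 24 hours avoids false alarms when a refresh runs late.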
How to integrate DepScope MCP in your agent
Add to Claude Desktop / Cursor / Windsurf config:
{
"mcpServers": {
"depscope": {
"url": "https://mcp.depscope.dev/mcp"
}
}
}
Or local stdio:
npx -y depscope-mcp
22 tools exposed: check_package, package_exists, find_alternatives, check_typosquat, check_malicious, scan_project, get_vulnerabilities, +15 more. Free, no auth, no rate limit.
Try it before integrating: paste your package.json at depscope.dev → instant verdict + hallucination check.
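On the agent side, the integration boils down to a gate that runs before any install command. The sketch below keeps the lookup abstract: `exists` is a hypothetical callable standing in for whatever wrapper you write around a tool such as `package_exists`, whose exact request and response shape is not documented in this post.

```python
from typing import Callable, Iterable, List, Tuple


def partition_installs(packages: Iterable[str],
                       exists: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split requested packages into (allowed, blocked) using an injected existence check."""
    allowed: List[str] = []
    blocked: List[str] = []
    for name in packages:
        # Anything the lookup cannot confirm is held back instead of installed.
        (allowed if exists(name) else blocked).append(name)
    return allowed, blocked
```

Injecting the check keeps the gate testable offline and lets the same code run against the hosted MCP server or the local stdio one.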
For researchers and tool builders
If you build an AI coding tool, integrate DepScope MCP. Every blocked hallucination is one less compromised developer machine.
If you research AI safety, the dataset is yours under CC-BY-NC-SA — please cite us. If an entry looks wrong, open an issue: every false positive caught makes the dataset more useful for the whole community.
If you maintain a package and worry your name could be hallucinated as yourpkg-pro:
- Pre-register the variant on the relevant registry (npm/PyPI/Cargo all let you publish + immediately deprecate).
- Or watch depscope.dev/benchmark — patterns are shown live.
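To decide which variants to pre-register, a maintainer could enumerate the signature suffixes against their own package name. This is only a sketch: `defensive_variants` is a hypothetical helper, and the suffix list is the one given earlier in this post, not an exhaustive set.

```python
from typing import List

# The signature suffixes listed earlier in this post.
SIGNATURE_SUFFIXES = (
    "easy", "pro", "turbo", "plus",
    "simple", "fast", "advanced", "extended",
    "ultra", "enhanced", "enterprise", "optimized",
)


def defensive_variants(package: str) -> List[str]:
    """Return candidate slopsquat names for a package, one per signature suffix."""
    base = package.strip().lower()
    return [f"{base}-{suffix}" for suffix in SIGNATURE_SUFFIXES]
```

Registering all twelve is rarely practical, so it makes sense to prioritize the variants that actually show up in the live patterns.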
Built by DepScope. Data: CC-BY-NC-SA 4.0. SDKs: AGPL. Backend: proprietary.