SpiderRating

Posted on • Originally published at spiderrating.com
State of MCP Security 2026: We Scanned 15,923 AI Tools. Here's What We Found.

We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. Here's the complete security landscape of the AI tool ecosystem.

TL;DR: 36% of MCP servers scored F (failing). 42 skills confirmed malicious (0.4%), with 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% earned a B grade or higher.

The Dataset

SpiderRating analyzed 15,923 AI tools across two ecosystems:

  • 5,725 MCP servers (Model Context Protocol — the standard for connecting AI agents to external tools)
  • 10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)

Each tool was rated on three dimensions: Description Quality, Security, and Metadata — combined into a SpiderScore (0-10) and letter grade (A-F).
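Based on the weights given in the Methodology section below (Description 45%, Security 35%, Metadata 20%) and the grade bands in the table that follows, the combination can be sketched roughly like this — the function names and rounding are illustrative assumptions, not SpiderRating's actual code:

```python
def spider_score(description: float, security: float, metadata: float) -> float:
    """Combine three 0-10 dimension scores into a 0-10 SpiderScore.

    Weights are taken from the article's Methodology section; the
    function name and rounding are illustrative assumptions.
    """
    return round(0.45 * description + 0.35 * security + 0.20 * metadata, 2)


def letter_grade(score: float) -> str:
    """Map a SpiderScore to a letter grade using the article's bands."""
    if score >= 9.0:
        return "A"
    if score >= 7.0:
        return "B"
    if score >= 5.0:
        return "C"
    if score >= 3.0:
        return "D"
    return "F"
```

One consequence of these weights: a server with the ecosystem's average description score of 3.13 tops out at 6.91 even with perfect security and metadata, so it cannot reach a B at all.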

This is the largest independent security analysis of the MCP/AI tool ecosystem to date.

Key Findings

1. Most AI Tools Are Mediocre — Only 2% Score B or Higher

| Grade | MCP Servers | Skills | What It Means |
|-------|-------------|--------|---------------|
| A (9.0+) | 0 (0%) | 0 (0%) | No tool meets "exemplary" standards |
| B (7.0-8.9) | 116 (2%) | 95 (1%) | Production-ready with good practices |
| C (5.0-6.9) | 1,995 (35%) | 9,050 (89%) | Adequate but room for improvement |
| D (3.0-4.9) | 1,546 (27%) | 1,052 (10%) | Significant quality/security gaps |
| F (<3.0) | 2,068 (36%) | 1 (0%) | Failing — serious issues |

Zero tools scored A. MCP servers have a bimodal distribution: either decent (C) or terrible (F).

2. Token Leakage Is the #1 Vulnerability

We found 32,691 security findings across the ecosystem.

| Rank | Vulnerability | Servers Affected | Findings |
|------|---------------|------------------|----------|
| 1 | Token Leakage | 757 (13%) | 6,632 |
| 2 | Command Injection | 269 (5%) | 1,007 |
| 3 | SQL Injection | 105 (2%) | 787 |
| 4 | Path Traversal | 244 (4%) | 761 |
| 5 | Prototype Pollution | 145 (3%) | 489 |
| 6 | Hardcoded Credentials | 163 (3%) | 389 |
| 7 | Secret Leakage (metadata) | 114 (2%) | 376 |
| 8 | Command Injection (os) | 112 (2%) | 263 |

Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs.
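A cheap mitigation is to scrub anything credential-shaped from tool output before it reaches the model or the logs. A minimal sketch — the regexes and function name are illustrative assumptions, not spidershield's rule set:

```python
import re

# Illustrative patterns for common credential shapes; a real scanner
# would use a much larger, tested rule set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),            # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),            # GitHub personal access tokens
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),  # Authorization headers
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access key IDs
]


def redact(text: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running every tool result and every log line through a filter like this closes the most common leak path found in the scan.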

3. 36% of MCP Servers Score F

More than a third of MCP servers are fundamentally unsafe:

  • Average MCP score: 4.11/10
  • Average skill score: 5.91/10

Why do MCP servers score worse? A description quality crisis: the average description score is just 3.13/10, meaning most servers don't tell AI agents what their tools do.

4. 552 Skills Flagged, 42 Confirmed Malicious

We used a two-pass security analysis:

  1. Automated Threat Scanner — pattern matching for known malicious behaviors
  2. LLM Verification — Claude Haiku reviews each finding to distinguish "security tool describing attacks" from "malicious skill executing attacks"

Results:

  • 552 skills initially flagged with critical security issues
  • 42 confirmed malicious after LLM verification (0.4% of ecosystem)
  • 97% of automated findings were false positives — mostly legitimate security tools whose descriptions triggered keyword-based detection
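The first pass is exactly the kind of keyword matching that produces those false positives; the value is in the second pass. A toy sketch of pass one — the keyword list and function name are assumptions, not spidershield's implementation:

```python
# Deliberately over-broad attack vocabulary for the first pass.
MALICIOUS_KEYWORDS = [
    "exfiltrate", "keylogger", "reverse shell", "steal credentials",
]


def flag_skill(description: str) -> list[str]:
    """Pass 1: flag any skill whose description mentions attack terms.

    This intentionally over-matches; pass 2 (an LLM reviewer) decides
    whether the skill *performs* the attack or merely *describes* it.
    """
    text = description.lower()
    return [kw for kw in MALICIOUS_KEYWORDS if kw in text]
```

A legitimate pentest helper whose description says it "detects reverse shell payloads" gets flagged by pass 1 — exactly the false-positive case the LLM pass exists to filter out.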

5. The Description Quality Crisis

97% of tools lack a scenario trigger — they don't tell the AI when to use them.

| Signal | Coverage |
|--------|----------|
| Has action verb | ~60% |
| Has scenario trigger | ~3% |
| Has param documentation | ~45% |
| Has error guidance | ~8% |

AI agents frequently choose the wrong tool — not because AI is dumb, but because tool documentation is broken.
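The four signals above are approximable with cheap heuristics. A rough sketch — the verb list, trigger phrases, and regex are assumptions about what such a checker might look for, not SpiderRating's detector:

```python
import re

ACTION_VERBS = {"search", "fetch", "create", "delete", "list", "convert", "send"}
TRIGGER_PHRASES = ("use this when", "use when", "call this when")


def description_signals(desc: str) -> dict[str, bool]:
    """Check a tool description for the four quality signals."""
    text = desc.lower()
    return {
        # Substring match so "searches" still counts as "search".
        "action_verb": any(verb in text for verb in ACTION_VERBS),
        "scenario_trigger": any(phrase in text for phrase in TRIGGER_PHRASES),
        "param_docs": bool(re.search(r"\bparam|argument|accepts\b", text)),
        "error_guidance": "error" in text or "fails" in text,
    }
```

By these heuristics, a one-line description like "GitHub integration." scores zero on every signal — which, per the table above, is closer to the norm than the exception.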

What This Means for Developers

If you build MCP servers:

  1. Write scenario triggers — tell AI agents when to use each tool
  2. Don't log tokens — use structured error handling that strips secrets
  3. Use parameterized queries — SQL injection is #3
  4. Add a README and license — it's 20% of your score
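Point 1 in practice: MCP tool definitions carry a name, a description, and a JSON Schema `inputSchema`, and the description is the only place an agent learns when to call the tool. A sketch of one that includes a scenario trigger, parameter docs, and error guidance — the tool itself is invented for illustration:

```python
# Hypothetical MCP tool definition with all four description signals.
SEARCH_ISSUES_TOOL = {
    "name": "search_issues",
    "description": (
        "Search open GitHub issues by keyword. "
        "Use this when the user asks about bug reports, feature requests, "
        "or the status of a known problem. "
        "Returns at most `limit` results; returns an empty list (not an "
        "error) when nothing matches."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to search for"},
            "limit": {"type": "integer", "description": "Max results (default 10)"},
        },
        "required": ["query"],
    },
}
```

The "Use this when…" sentence is the scenario trigger that 97% of scanned tools lack.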

If you install AI tools:

  1. Check the SpiderScore before installing — anything rated below C (5.0) has known issues
  2. Be cautious with skills rated critical — 0.4% are confirmed malicious
  3. Prefer tools with B grade — they've demonstrated security best practices

Methodology

  • Scanner: spidershield (open source, MIT)
  • Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
  • Precision: 93.6% calibrated accuracy
  • Scoring: Description (45%) + Security (35%) + Metadata (20%)

Data updated daily. Full methodology available at spiderrating.com.


What's the worst MCP security issue you've encountered? Share in the comments.
