DEV Community

AgentsID

We Scanned 100 MCP Servers. Anthropic's Own Reference Implementations Scored F.

We scanned 100 MCP server packages — including the official reference implementations from Anthropic, Microsoft, and Notion — and published the results.

Every vendor-maintained server that exposed tools scored F.

The Numbers

  • 100 MCP server packages scanned
  • 41 exposed tool definitions (59% were opaque to security review)
  • 485 tools analyzed
  • 893 total findings
  • 71% scored F. Zero scored A.

The Gold Standard Failure

We didn't just scan random community packages. We targeted the servers that developers copy as templates:

Server               Maintainer   Tools   Grade
server-github        Anthropic    26      F
server-filesystem    Anthropic    14      F
@playwright/mcp      Microsoft    22      F
notion-mcp-server    Notion       22      F
server-puppeteer     Anthropic    7       F
server-memory        Anthropic    9       F
server-everything    Anthropic    13      F

Anthropic's GitHub MCP server exposes 26 tools — push_files, merge_pull_request, fork_repository — with zero input validation, zero per-tool permissions, and zero scope boundaries. An agent with a GitHub PAT can push to any repo, merge any PR, and fork any project the token can access. No guardrails.
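To make "per-tool permissions" and "scope boundaries" concrete, here is a minimal Python sketch of a deny-by-default check that could sit in front of a tool call. The `ToolPolicy` class, the `authorize` function, and the `owner`/`repo` argument names are hypothetical and are not part of server-github or any MCP SDK; only the tool names come from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy: which tools may run, and against which repos."""
    allowed_tools: frozenset = field(default_factory=frozenset)
    allowed_repos: frozenset = field(default_factory=frozenset)  # "owner/repo"


def authorize(policy: ToolPolicy, tool: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a single tool call. Denies by default."""
    # Per-tool permission: the tool must be explicitly allowed.
    if tool not in policy.allowed_tools:
        return False, f"tool '{tool}' is not permitted"
    # Scope boundary: the target repo must be explicitly allowed.
    repo = f"{args.get('owner', '')}/{args.get('repo', '')}"
    if repo not in policy.allowed_repos:
        return False, f"repo '{repo}' is outside the allowed scope"
    return True, "ok"


# Illustrative policy: the agent may push files, but only to acme/docs.
policy = ToolPolicy(
    allowed_tools=frozenset({"push_files"}),
    allowed_repos=frozenset({"acme/docs"}),
)

print(authorize(policy, "push_files", {"owner": "acme", "repo": "docs"}))
print(authorize(policy, "merge_pull_request", {"owner": "acme", "repo": "docs"}))
print(authorize(policy, "push_files", {"owner": "acme", "repo": "billing"}))
```

With a check like this, the token's reach no longer defines the agent's reach: a PAT that can access fifty repos still only lets the agent touch the ones the policy names.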

These aren't theoretical risks. The related @modelcontextprotocol/server-git was hit with CVE-2025-68143 (path traversal) and CVE-2025-68144 (argument injection) in early 2026. Our scanner identifies exactly the structural preconditions — unbounded strings, no schema constraints — that made those CVEs inevitable.
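A check for one of those preconditions, unbounded strings, can be sketched in a few lines of Python. This is an illustration of the idea, not the scanner's actual implementation; the function name `find_unbounded_strings` and both example schemas are assumptions made up for this sketch.

```python
def find_unbounded_strings(schema: dict, path: str = "") -> list[str]:
    """Return paths of string fields that carry no maxLength, pattern, enum, or const."""
    findings = []
    if schema.get("type") == "string":
        if not any(key in schema for key in ("maxLength", "pattern", "enum", "const")):
            findings.append(path or "<root>")
    # Recurse into object properties.
    for name, sub in schema.get("properties", {}).items():
        findings.extend(find_unbounded_strings(sub, f"{path}.{name}" if path else name))
    # Recurse into array items (single-schema form only).
    items = schema.get("items")
    if isinstance(items, dict):
        findings.extend(find_unbounded_strings(items, f"{path}[]"))
    return findings


# Hypothetical schemas, written for this example only.
loose = {
    "type": "object",
    "properties": {
        "repo_path": {"type": "string"},
        "args": {"type": "array", "items": {"type": "string"}},
    },
}
strict = {
    "type": "object",
    "properties": {
        "repo_path": {"type": "string", "maxLength": 255, "pattern": "^[A-Za-z0-9_./-]+$"},
    },
    "additionalProperties": False,
}

print(find_unbounded_strings(loose))   # ['repo_path', 'args[]']
print(find_unbounded_strings(strict))  # []
```

A schema constraint narrows what reaches the handler, but it is not a complete defense: the pattern above still admits `../`, so the server would need its own path checks as well.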

Hallucination-Based Vulnerabilities: A New Vulnerability Class

We identified something no one else is scanning for: hallucination-based vulnerabilities (HBVs) — security weaknesses that exist in the semantic space between what a tool description says and what the LLM infers.

163 HBVs across 41 servers. Seven classes:

  1. Vague descriptions: "manages user data" could mean read or delete. The LLM picks whichever reading fits the prompt.
  2. Ambiguous tool names: manage_users gives the model no signal about whether it creates or destroys.
  3. Missing scope boundaries: "access files" without specifying which files.
  4. Short descriptions: a 17-character description forces the LLM to hallucinate capabilities.
  5. No description: behavior is entirely inferred from the name.
  6. Implicit authority escalation: a dangerous tool described as a "helper utility."
  7. Overlapping descriptions: two tools with 92% description overlap. The LLM picks one non-deterministically.
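Three of these classes (no description, short descriptions, overlapping descriptions) lend themselves to simple heuristics. The Python sketch below is a rough approximation rather than the scanner's method: the thresholds are assumed values, the tool list is invented, and character-level similarity from the standard library's `difflib.SequenceMatcher` stands in for whatever overlap measure the scanner uses.

```python
from difflib import SequenceMatcher
from itertools import combinations

MIN_DESCRIPTION_CHARS = 20  # assumed threshold, not the scanner's
OVERLAP_THRESHOLD = 0.9     # assumed threshold, not the scanner's


def description_findings(tools: list[dict]) -> list[str]:
    """Flag missing, short, and near-duplicate tool descriptions."""
    findings = []
    for tool in tools:
        desc = (tool.get("description") or "").strip()
        if not desc:
            findings.append(f"{tool['name']}: no description")
        elif len(desc) < MIN_DESCRIPTION_CHARS:
            findings.append(f"{tool['name']}: short description ({len(desc)} chars)")
    # Compare every pair of non-empty descriptions for near-duplicates.
    for a, b in combinations(tools, 2):
        da = (a.get("description") or "").strip().lower()
        db = (b.get("description") or "").strip().lower()
        if da and db and SequenceMatcher(None, da, db).ratio() >= OVERLAP_THRESHOLD:
            findings.append(f"{a['name']} / {b['name']}: overlapping descriptions")
    return findings


# Invented tool definitions for illustration.
tools = [
    {"name": "manage_users", "description": "Manages user data"},
    {"name": "helper", "description": ""},
    {"name": "read_file", "description": "Read the contents of a file inside the workspace directory."},
    {"name": "get_file", "description": "Read the contents of a file inside the workspace directory"},
]

for finding in description_findings(tools):
    print(finding)
```

The remaining classes (vague wording, ambiguous names, missing scope, implicit authority escalation) depend on meaning rather than length or string similarity, so a character-level check like this would not catch them.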

HBVs are invisible to traditional scanners (SAST, DAST). They can't be fixed by patching code — they require rewriting tool descriptions. And they work even with perfect authentication. OAuth doesn't help when the tool schema allows anything.

The Thesis

The MCP specification is vulnerable by default. It allows — and through its reference implementations, actively encourages — empty schemas, unbounded inputs, and vague tool descriptions. Schema strictness and semantic validation must move from optional best practice to protocol-level mandatory.
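One way to picture "mandatory" strictness is a gate that refuses to register a tool whose definition is too loose. The Python sketch below is hypothetical: `strictness_violations`, `register`, and the in-memory registry are invented for illustration and do not exist in the MCP specification or its SDKs. The `name`, `description`, and `inputSchema` keys follow the shape of an MCP tool definition.

```python
def strictness_violations(tool: dict) -> list[str]:
    """List the reasons a tool definition would fail a hypothetical strictness gate."""
    problems = []
    if not (tool.get("description") or "").strip():
        problems.append("missing description")
    schema = tool.get("inputSchema") or {}
    # Simplification: a real gate would need an exemption for zero-argument tools.
    if not schema.get("properties"):
        problems.append("empty input schema")
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties is not false")
    return problems


def register(registry: dict, tool: dict) -> None:
    """Add the tool to the registry, or raise ValueError listing every violation."""
    problems = strictness_violations(tool)
    if problems:
        raise ValueError(f"{tool.get('name', '<unnamed>')}: " + "; ".join(problems))
    registry[tool["name"]] = tool


# Invented tool definitions for illustration.
strict_tool = {
    "name": "read_file",
    "description": "Read a UTF-8 text file inside the configured workspace directory.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string", "maxLength": 255}},
        "required": ["path"],
        "additionalProperties": False,
    },
}
loose_tool = {"name": "do_stuff", "inputSchema": {"type": "object"}}

registry = {}
register(registry, strict_tool)
try:
    register(registry, loose_tool)
except ValueError as err:
    print(err)
```

Enforcing this at registration time, rather than in a linter, is the difference between an optional best practice and a protocol-level requirement: a tool that fails the gate is never exposed to the model at all.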

Try It Yourself

The scanner is open source:

npx @agentsid/scanner -- npx @modelcontextprotocol/server-filesystem ./

Full paper, methodology, and all 100 scan reports:
https://github.com/stevenkozeniesky02/agentsid-scanner/blob/master/docs/state-of-agent-security-2026.md


Steven Kozeniesky — AgentsID Research (agentsid.dev)
