
ecap0


We Scanned the Top 20 MCP Servers for Security Vulnerabilities — Here's What We Found

TL;DR: We scanned the 20 most popular MCP servers with multiple AI models. 60% had at least one real security finding. Anthropic's official servers (Playwright, Slack, SQLite, Fetch) all scored 99-100/100 — here's what they did right. Two packages have critical vulnerabilities you should know about.

Scan your own package: agentaudit.dev


Why We Did This

The MCP (Model Context Protocol) ecosystem is exploding. Thousands of developers are installing MCP servers daily to connect AI agents to tools, databases, and APIs.

But here's the problem: Most MCP servers have never been security audited.

These servers often have access to:

  • 🔐 Your source code repositories
  • 🗄️ Your databases
  • 📧 Your email and communication tools
  • ☁️ Your cloud infrastructure

One vulnerable MCP server can expose everything your AI agent is connected to.

So we decided to scan the top 20 MCP servers ourselves using AgentAudit — an open-source security scanner specifically designed for AI agent packages.


How AgentAudit Works

AgentAudit isn't your typical SAST tool. Here's what makes it different:

1. LLM-Powered Analysis (Not Just Regex)

Traditional scanners use regex patterns and AST analysis. AgentAudit uses LLMs that can understand context, intent, and semantic meaning.

Example: A regex scanner sees exec() and flags it. AgentAudit understands:

  • Is the input sanitized?
  • Is there a whitelist?
  • What's the threat model?
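To make the contrast concrete, here is a minimal sketch of the kind of pattern rule a regex scanner might apply. The two snippets and the function name `regexFlagsExec` are made up for illustration and are not taken from AgentAudit or any scanned package.

```typescript
// Hypothetical snippets to scan; neither comes from a real package.
const unsafeSnippet = "exec(`ls ${userInput}`);";
const safeSnippet = 'exec("ls -la"); // fixed string, no user input';

// A naive pattern-based rule: flag every call to exec().
function regexFlagsExec(source: string): boolean {
  return /\bexec\s*\(/.test(source);
}

// Both snippets are flagged, although only the first interpolates user input.
console.log(regexFlagsExec(unsafeSnippet)); // true
console.log(regexFlagsExec(safeSnippet)); // true
```

The rule cannot tell the two calls apart. Answering the questions above takes an analysis that looks at where the argument comes from, not only at the function name.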

2. 12 Structured Detection Patterns

The scanner checks for AI-agent-specific vulnerabilities:

  • Prompt injection
  • Tool poisoning
  • Capability escalation
  • Credential exposure
  • Path traversal
  • Command injection
  • MCP protocol abuse
  • Supply chain attacks
  • And more...

3. Multi-Model Verification

You can scan the same package with different LLMs. Findings confirmed by multiple models have higher confidence.
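One way to think about multi-model confirmation is to count how many distinct models reported the same issue. The sketch below assumes a simplified `Finding` shape and a `pattern@file` key, both invented here; AgentAudit's actual report format and matching logic may differ.

```typescript
// Hypothetical shape of one model's finding; the real report format may differ.
interface Finding {
  model: string;
  pattern: string; // e.g. "command-injection"
  file: string;
}

// Count how many distinct models reported each (pattern, file) pair.
function countConfirmations(findings: Finding[]): Record<string, number> {
  const modelsByKey: Record<string, string[]> = {};
  for (const f of findings) {
    const key = `${f.pattern}@${f.file}`;
    const models = modelsByKey[key] ?? [];
    if (models.indexOf(f.model) === -1) models.push(f.model);
    modelsByKey[key] = models;
  }
  const counts: Record<string, number> = {};
  for (const key of Object.keys(modelsByKey)) {
    counts[key] = modelsByKey[key].length;
  }
  return counts;
}

// Illustrative input: two models agree on one issue, one model reports another.
const confirmations = countConfirmations([
  { model: "gemini-2.5-flash", pattern: "command-injection", file: "src/exec.ts" },
  { model: "claude-opus-4", pattern: "command-injection", file: "src/exec.ts" },
  { model: "gemini-2.5-flash", pattern: "path-traversal", file: "src/upload.ts" },
]);
console.log(confirmations);
```

A finding with a count of 2 or more would be treated as higher confidence than one reported by a single model.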

4. Community Trust Registry

Results are uploaded to agentaudit.dev, where packages get a Trust Score (0-100). Other users can review, vote, and comment on findings.

5. ASF-IDs (Like CVEs for AI Agents)

Each finding gets an ASF-ID (AgentAudit Security Finding), e.g., ASF-2026-2019 — a standardized identifier for tracking.


The Scan: What We Did

We selected the 20 most popular MCP servers based on:

  • GitHub stars
  • Official status (Anthropic, Microsoft, etc.)
  • Community adoption

Each package was scanned with multiple models:

| Model | Reports | Cost/Scan | Performance |
|---|---|---|---|
| Gemini 2.5 Flash | 20 | ~$0.02 | Best scanner; found the most real issues |
| Claude Opus 4 | 20 | ~$1-2 | Balanced; fewer findings, higher precision |
| GPT-4o | 15 | ~$0.10 | Nearly useless; found almost nothing |
| Claude Haiku 4.5 | 8 | ~$0.01 | Too conservative; misses real issues |

Total: 68 reports across 4 models, ~$37 total cost.

Model Performance (Benchmark on 9 Known-Vulnerable Packages)

| Model | Recall | Precision | F1 Score |
|---|---|---|---|
| Gemini 2.5 Flash | 85% | 83% | 84% |
| Claude Haiku 4.5 | 82% | 81% | 82% |
| Claude Sonnet 4 | 79% | 76% | 78% |
| Claude Sonnet 4.6 | 78% | 76% | 77% |
| GPT-4o | 65% | 66% | 65% |

Key finding: GPT-4o is widely considered a top model, yet it scored lowest in this benchmark (65% F1). Gemini 2.5 Flash scored highest (84% F1) at roughly $0.02 per scan, which makes it the best value.
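For readers who want to check the F1 column: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small sketch using the rounded Gemini 2.5 Flash and GPT-4o figures from the benchmark; the function name `f1` is ours.

```typescript
// F1 is the harmonic mean of precision and recall (both as fractions in [0, 1]).
function f1(precision: number, recall: number): number {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

const geminiF1 = f1(0.83, 0.85);
const gpt4oF1 = f1(0.66, 0.65);

console.log(Math.round(geminiF1 * 100)); // 84
console.log(Math.round(gpt4oF1 * 100)); // 65
```

Recomputing from the rounded percentages can land a point away from the published F1 for some rows, presumably because the published values were computed from unrounded inputs.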


The Results: Trust Scores for Top 20 MCP Servers

✅ Clean Bill of Health (Trust Score: 99-100)

These packages had NO findings from ANY model:

| Package | Publisher | Trust Score |
|---|---|---|
| Playwright MCP | Anthropic/Microsoft | 100 |
| Stripe Agent Toolkit | Stripe | 100 |
| Supabase MCP | Supabase | 99 |
| Slack MCP Server | Anthropic | 99 |
| Linear MCP Server | Linear | 100 |
| Sentry MCP Server | Sentry | 100 |
| Cloudflare MCP Server | Cloudflare | 100 |
| Firebase MCP | Google | 100 |
| MCP Server SQLite | Anthropic | 100 |
| MCP Server Fetch | Anthropic | 100 |

10 out of 20 packages passed with flying colors. These are well-built with good security practices.


⚠️ Moderate Risk (Trust Score: 65-94)

Findings exist but are manageable:

| Package | Trust Score | Findings |
|---|---|---|
| MongoDB MCP Server | 94 | 2 findings (low severity) |
| MCP Server Qdrant | 85 | 1 active finding (runtime dependency injection) |
| Git-MCP | 80 | 2 findings (unauthenticated R2 endpoint) |
| MCP Grafana | 80 | 4 findings (medium severity) |
| GitHub MCP Server | 78 | 4 findings (unsanitized exec.Command input) |
| Notion MCP Server | 65 | 5 findings (path traversal in file uploads) |

🔴 Needs Attention (Trust Score: 15-50)

These packages have serious issues:

| Package | Trust Score | Findings |
|---|---|---|
| Terraform MCP Server | 50 | 4 findings (shell injection, insecure TLS, unverified binaries) |
| Chrome DevTools MCP | 33 | 7 findings (arbitrary file writes, command injection) |
| MCP Server Kubernetes | 15 | 5 findings (2 CRITICAL) |

Critical Findings You Should Know About

🔴 CRITICAL #1: Kubernetes MCP — Arbitrary Command Execution

Package: mcp-server-kubernetes

Trust Score: 15/100

Findings: 5 total (2 CRITICAL)

Vulnerability 1: Arbitrary Command Execution via KUBECONFIG_COMMAND

The server allows setting KUBECONFIG_COMMAND environment variable, which executes arbitrary shell commands:

// Vulnerable pattern found
const command = process.env.KUBECONFIG_COMMAND;
execSync(command); // Arbitrary command execution!

Impact: Anyone who can set this env var can run arbitrary commands on the host system.

Vulnerability 2: Unauthenticated HTTP/SSE Transport

The server listens on 0.0.0.0 without authentication:

// Listening on all interfaces, no auth
const server = createServer(handler);
server.listen(3000, '0.0.0.0');

Impact: Anyone on the network can send kubectl commands to the server.

Recommendation: Do not use in production until fixed.
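For the unauthenticated transport, one possible mitigation is to require a bearer token and bind to loopback only. This is a sketch under our own assumptions, not the maintainers' fix: `isAuthorized`, `createGuardedServer`, and the `MCP_TOKEN` environment variable are hypothetical names.

```typescript
import * as http from "node:http";
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks an Authorization header against an expected bearer token.
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  if (expectedToken.length === 0) return false; // never accept an unset token
  if (!authorizationHeader || !authorizationHeader.startsWith("Bearer ")) return false;
  const presented = Buffer.from(authorizationHeader.slice("Bearer ".length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return presented.length === expected.length && timingSafeEqual(presented, expected);
}

// Wraps a request handler so unauthenticated requests get a 401.
function createGuardedServer(token: string, handler: http.RequestListener): http.Server {
  return http.createServer((req, res) => {
    if (!isAuthorized(req.headers.authorization, token)) {
      res.writeHead(401);
      res.end();
      return;
    }
    handler(req, res);
  });
}

// Usage (not started here): bind to loopback instead of 0.0.0.0.
//   createGuardedServer(process.env.MCP_TOKEN ?? "", handler).listen(3000, "127.0.0.1");
```

Binding to `127.0.0.1` keeps the port off the network, and the token check covers setups where the port has to be reachable from other hosts.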


🔴 CRITICAL #2: Chrome DevTools MCP — File Write + Command Injection

Package: chrome-devtools-mcp

Trust Score: 33/100

Findings: 7 total

Vulnerability 1: Arbitrary File Writes

File write operations don't sanitize paths:

// Unsanitized path from user
await fs.writeFile(userProvidedPath, content);

Impact: Can write files outside intended directory (path traversal).

Vulnerability 2: Command Injection via Chrome Args

Chrome launch arguments allow command injection:

// User-controlled args passed to Chrome
launchChrome(userArgs);

Impact: Arbitrary command execution via crafted Chrome arguments.
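A common defense for user-supplied launch arguments is an allowlist of flags. The sketch below is illustrative only: the list in `ALLOWED_FLAGS` and the function `filterChromeArgs` are assumptions, not code from chrome-devtools-mcp.

```typescript
// Hypothetical allowlist; a real server would choose flags to match its own needs.
const ALLOWED_FLAGS = ["--headless", "--disable-gpu", "--window-size"];

// Keep only arguments whose flag name (the part before "=") is on the allowlist.
function filterChromeArgs(userArgs: string[]): string[] {
  return userArgs.filter((arg) => {
    const flag = arg.split("=")[0];
    return ALLOWED_FLAGS.indexOf(flag) !== -1;
  });
}

const filtered = filterChromeArgs([
  "--headless",
  "--window-size=1280,720",
  "--renderer-cmd-prefix=/bin/sh -c evil", // dropped: not on the allowlist
]);
console.log(filtered); // [ '--headless', '--window-size=1280,720' ]
```

Rejecting the whole request when an unknown flag appears is stricter than dropping it silently, and may be the better default for a security-sensitive tool.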

Vulnerability 3: Arbitrary Extension Installs

Can install arbitrary browser extensions:

// No validation on extension ID
await installExtension(userProvidedExtensionId);

Impact: Malicious extensions could be installed.

Recommendation: Use with extreme caution. Review all inputs.


🟡 HIGH: Notion MCP — Path Traversal in Uploads

Package: notion-mcp-server

Trust Score: 65/100

Findings: 5 total

Vulnerability: Path Traversal in File Uploads

Local file upload operations don't sanitize paths:

// User-provided path not sanitized
const filePath = path.join(uploadDir, userFilename);
await fs.copyFile(userFile, filePath);

Impact: Can write files outside upload directory using ../../../ patterns.

Fix: Normalize and validate paths before use.
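A minimal sketch of what "normalize and validate" can look like, assuming the upload target is built from an upload directory plus a user-supplied filename. The function name `resolveInside` and the example paths are ours, not from notion-mcp-server.

```typescript
import * as path from "node:path";

// Resolve a user-supplied filename inside uploadDir, rejecting anything that escapes it.
function resolveInside(uploadDir: string, userFilename: string): string {
  if (userFilename.indexOf("\0") !== -1) {
    throw new Error("null byte in filename");
  }
  const base = path.resolve(uploadDir);
  const target = path.resolve(base, userFilename);
  const relative = path.relative(base, target);
  const escapes =
    relative === "" || // the directory itself is not a valid upload target
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`path escapes upload directory: ${userFilename}`);
  }
  return target;
}

console.log(resolveInside("/srv/uploads", "report.pdf"));
// resolveInside("/srv/uploads", "../../../etc/passwd") throws
```

This check is purely lexical. It does not follow symlinks, so a server that allows links inside the upload directory would also need to resolve real paths before comparing.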


🟡 HIGH: Terraform MCP — Shell Injection

Package: terraform-mcp-server

Trust Score: 50/100

Findings: 4 total

Vulnerability: Shell Injection in Build Arguments

Build arguments passed to shell without sanitization:

// User input passed to shell
execSync(`terraform ${userCommand} ${userArgs}`);

Impact: Arbitrary command execution via crafted arguments.

Additional Issues:

  • Downloads and executes unverified binaries in CI
  • Insecure TLS configuration

Recommendation: Use array-based command execution instead of shell strings.
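As a rough illustration of that recommendation, the sketch below builds an argument array and hands it to `execFileSync`, which does not invoke a shell. The subcommand allowlist and the helper names `buildTerraformArgv` and `runTerraform` are assumptions made for this example.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist of subcommands this server is willing to run.
const ALLOWED_SUBCOMMANDS = ["init", "plan", "validate", "fmt"];

// Build the argv array, rejecting any subcommand that is not explicitly allowed.
function buildTerraformArgv(subcommand: string, args: string[]): string[] {
  if (ALLOWED_SUBCOMMANDS.indexOf(subcommand) === -1) {
    throw new Error(`subcommand not allowed: ${subcommand}`);
  }
  return [subcommand, ...args];
}

// execFileSync passes argv directly to the process; no shell parses the string,
// so characters like ";" or "&&" are treated as plain argument text.
function runTerraform(subcommand: string, args: string[]): string {
  return execFileSync("terraform", buildTerraformArgv(subcommand, args), {
    encoding: "utf8",
  });
}

console.log(buildTerraformArgv("plan", ["-out=plan.tfplan; rm -rf /"]));
// [ 'plan', '-out=plan.tfplan; rm -rf /' ]
```

Array-based execution removes shell injection, but the arguments still reach Terraform itself, so option values that select files or plugins may need their own validation.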


What Anthropic's Servers Do Right

Anthropic's official MCP servers all scored 99-100/100. Here's what they do differently:

Pattern 1: Path Traversal Protection (server-filesystem)

The official filesystem server has six layers of path validation:

export function isPathWithinAllowedDirectories(
  absolutePath: string,
  allowedDirectories: string[]
): boolean {
  // 1. Null byte rejection
  if (absolutePath.includes('\x00')) return false;

  // 2. Normalization
  const normalizedPath = path.resolve(path.normalize(absolutePath));

  // 3. Check containment
  return allowedDirectories.some(dir => {
    const normalizedDir = path.resolve(path.normalize(dir));
    return normalizedPath.startsWith(normalizedDir + path.sep);
  });
}

Plus:

  • Symlink resolution
  • Atomic writes with race condition prevention
  • Proper error handling
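Symlink resolution can be sketched as follows: resolve the real path of both the candidate and the allowed directory before the containment check. This is our own simplified illustration using Node's `fs.realpathSync`, not the official server's code, and the function name `isRealPathWithin` is hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Resolve symlinks on both sides before the containment check, so a link
// inside the allowed directory cannot point the check at an outside target.
function isRealPathWithin(candidate: string, allowedDir: string): boolean {
  const realDir = fs.realpathSync(allowedDir);
  let realCandidate: string;
  try {
    realCandidate = fs.realpathSync(candidate);
  } catch {
    return false; // nonexistent or unreadable paths are rejected in this sketch
  }
  return realCandidate === realDir || realCandidate.startsWith(realDir + path.sep);
}

// Usage with a throwaway directory containing a file and an escaping symlink.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "mcp-demo-"));
const inside = path.join(dir, "note.txt");
fs.writeFileSync(inside, "hello");
const link = path.join(dir, "escape");
fs.symlinkSync(os.tmpdir(), link); // points outside the allowed directory

const fileAllowed = isRealPathWithin(inside, dir);
const linkAllowed = isRealPathWithin(link, dir);
console.log(fileAllowed); // true
console.log(linkAllowed); // false

fs.rmSync(dir, { recursive: true, force: true });
```

A check like this still leaves a window between validation and use, which is why the list above also mentions atomic writes with race condition prevention.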

Pattern 2: Command Execution via Arrays (NOT Strings)

Anthropic's servers use array-based command execution:

// SECURE (used by Anthropic)
const command = "kubectl";
const args = ["delete", resourceType, name];
execFileSync(command, args);

// INSECURE (NOT found in Anthropic servers)
execSync(`kubectl delete ${resourceType} ${name}`);

One server explicitly validates array types:

if (!Array.isArray(input.command)) {
  throw new McpError(
    ErrorCode.InvalidParams,
    "Command must be an array. String commands not supported for security."
  );
}

Takeaway: These patterns should be copied by all MCP developers.


Success Stories: Security Done Right

octocode-mcp: Fixed All 5 Findings in 48 Hours

When we scanned octocode-mcp, we found 5 security issues. The maintainer's response?

Within 48 hours:

  • ✅ All 5 findings fixed
  • ✅ 64 regression tests added
  • ✅ Public verification report posted

Read the full case study →

This is how you do open source security right. 👏


Sentry: Added AgentAudit Badge to XcodeBuildMCP

Sentry added the AgentAudit security badge to their XcodeBuildMCP repo.

What this means: Users can instantly see the security status before installing.

Why it matters: Established developer-tooling companies like Sentry are leading by example, and transparency builds trust.

View the repo →


IBM: PR Submitted for mcp-context-forge (10k+ stars)

IBM has a pending PR to add the AgentAudit security badge to their mcp-context-forge repo.

Status: PR under review. Once merged, thousands of users will see the security status before installing.

View the PR →


Important Disclaimers

1. LLM-Based Scanning Is NOT Perfect

We manually reviewed all findings and removed false positives. But some may remain. Trust scores are relative, not absolute.

2. Findings Represent a Point in Time

These scans were conducted in February 2026. Maintainers may have already fixed issues. Check the live reports for updates.

3. A Score of 100 Doesn't Guarantee Zero Vulnerabilities

It means no findings were detected by our scanners. Traditional vulnerabilities (buffer overflows, etc.) may still exist.

4. We Responsibly Disclosed Critical Findings

Critical findings were disclosed to maintainers before publication to give them time to fix.


What Should You Do?

For MCP Server Maintainers

1. Scan your package NOW

npx agentaudit scan https://github.com/your-org/your-mcp-server

2. Add the AgentAudit Badge

[![AgentAudit: Safe](https://img.shields.io/badge/AgentAudit-Safe-green)](https://agentaudit.dev/package/your-org/your-mcp-server)

3. Fix High-Risk Findings Before Release

  • Critical/High findings = block release
  • Medium findings = document or fix ASAP
  • Low findings = track in backlog
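The policy above is simple enough to encode as a release gate. A minimal sketch, with type and function names (`Severity`, `Action`, `actionFor`, `shouldBlockRelease`) invented for illustration:

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Action = "block-release" | "fix-or-document" | "backlog";

// Map each severity to the action described in the policy above.
function actionFor(severity: Severity): Action {
  switch (severity) {
    case "critical":
    case "high":
      return "block-release";
    case "medium":
      return "fix-or-document";
    default:
      return "backlog";
  }
}

// A release is blocked if any finding maps to "block-release".
function shouldBlockRelease(severities: Severity[]): boolean {
  return severities.some((s) => actionFor(s) === "block-release");
}

console.log(shouldBlockRelease(["low", "medium"])); // false
console.log(shouldBlockRelease(["low", "critical"])); // true
```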

4. Copy Anthropic's Security Patterns

  • Path traversal protection (6 layers)
  • Array-based command execution
  • Symlink resolution
  • Atomic writes

For AI Developers

1. Check Before You Install

Look for AgentAudit badges in READMEs. No badge? Scan it yourself:

npx agentaudit scan https://github.com/org/package

2. Use Safe Defaults

These packages scored 99-100:

  • ✅ Playwright MCP (Anthropic)
  • ✅ Stripe Agent Toolkit (Stripe)
  • ✅ Supabase MCP (Supabase)
  • ✅ Slack MCP Server (Anthropic)
  • ✅ Sentry MCP Server (Sentry)

3. Avoid High-Risk Packages

Until fixed, avoid:

  • ❌ MCP Server Kubernetes (Trust: 15)
  • ❌ Chrome DevTools MCP (Trust: 33)
  • ❌ Terraform MCP Server (Trust: 50)

For Security Teams

1. Implement Automated Scanning

Add AgentAudit to your CI/CD pipeline:

# GitHub Action example
- name: Security Scan
  run: npx agentaudit scan . --fail-on high

2. Use the Right Model

  • Gemini 2.5 Flash for screening (cheap, high recall)
  • Claude Opus 4 for verification (precise, low FP)
  • Skip GPT-4o (not reliable for security)

3. Understand the Limitations

  • Single-model findings may be false positives
  • Multi-model consensus = high confidence
  • Context matters (e.g., MD5 for non-crypto is OK)

The Cost Breakdown

Total cost for 68 scans: ~$37

| Model | Scans | Cost |
|---|---|---|
| Gemini 2.5 Flash | 40 | ~$0.80 |
| Claude Opus 4 | 20 | ~$35 |
| GPT-4o | 15 | ~$1.50 |
| Claude Haiku 4.5 | 8 | ~$0.10 |

You can scan your package for ~$0.02 with Gemini. That's less than a cup of coffee for peace of mind.
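The arithmetic behind these numbers is easy to reproduce. The sketch below sums the approximate per-model costs from the breakdown and derives the per-scan figure for Gemini; the variable names are ours.

```typescript
// Approximate per-model totals taken from the cost breakdown above.
const costTable = [
  { model: "Gemini 2.5 Flash", scans: 40, totalUsd: 0.8 },
  { model: "Claude Opus 4", scans: 20, totalUsd: 35 },
  { model: "GPT-4o", scans: 15, totalUsd: 1.5 },
  { model: "Claude Haiku 4.5", scans: 8, totalUsd: 0.1 },
];

// Sum of all model costs, and the average cost of one Gemini scan.
const totalUsd = costTable.reduce((sum, row) => sum + row.totalUsd, 0);
const geminiPerScan = costTable[0].totalUsd / costTable[0].scans;

console.log(totalUsd.toFixed(2)); // "37.40"
console.log(geminiPerScan.toFixed(2)); // "0.02"
```

Claude Opus 4 accounts for nearly all of the spend, which is why screening with a cheaper model first keeps the bill low.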


What's Next?

We're continuing to scan more MCP servers and AI agent packages. Our goal:

  • 100+ MCP servers scanned by Q2 2026
  • Public reports for every package
  • Badge program for security-transparent projects
  • CI/CD integration for automated pre-release audits

Want to scan your package? Visit agentaudit.dev and enter your GitHub repo URL.




Questions? Drop them in the comments! 👇

Scan your package now: agentaudit.dev
