ecap0

Posted on Feb 26

We Scanned the Top 20 MCP Servers for Security Vulnerabilities — Here's What We Found

#mcp #ai #security #opensource

TL;DR: We scanned the 20 most popular MCP servers with multiple AI models. 60% had at least one real security finding. The official servers from Anthropic and Microsoft (Playwright, Slack, SQLite, Fetch) all scored 99-100/100; here's what they did right. Two packages have critical vulnerabilities you should know about.

Scan your own package: agentaudit.dev

Why We Did This

The MCP (Model Context Protocol) ecosystem is exploding. Thousands of developers are installing MCP servers daily to connect AI agents to tools, databases, and APIs.

But here's the problem: Most MCP servers have never been security audited.

These servers often have access to:

🔐 Your source code repositories
🗄️ Your databases
📧 Your email and communication tools
☁️ Your cloud infrastructure

One vulnerable MCP server can mean game over for everything your AI agent can reach.

So we decided to scan the top 20 MCP servers ourselves using AgentAudit — an open-source security scanner specifically designed for AI agent packages.

How AgentAudit Works

AgentAudit isn't your typical SAST tool. Here's what makes it different:

1. LLM-Powered Analysis (Not Just Regex)

Traditional scanners use regex patterns and AST analysis. AgentAudit uses LLMs that can understand context, intent, and semantic meaning.

Example: A regex scanner sees exec() and flags it. AgentAudit also asks:

Is the input sanitized?
Is there a whitelist?
What's the threat model?

2. 12 Structured Detection Patterns

The scanner checks for AI-agent-specific vulnerabilities:

Prompt injection
Tool poisoning
Capability escalation
Credential exposure
Path traversal
Command injection
MCP protocol abuse
Supply chain attacks
And more...

3. Multi-Model Verification

You can scan the same package with different LLMs. Findings confirmed by multiple models have higher confidence.
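As a rough illustration of the consensus idea, here is a minimal sketch that groups findings by file and pattern and counts how many distinct models reported each one. The `Finding` shape, the `scoreByConsensus` name, and the two-model threshold are assumptions made for this example, not AgentAudit's actual schema or logic.

```typescript
// Hypothetical shape of a single scanner finding; not AgentAudit's real schema.
interface Finding {
  model: string;   // which LLM reported it
  file: string;    // file the finding points at
  pattern: string; // detection pattern, e.g. "command-injection"
}

type Confidence = "high" | "low";

// Group findings by (file, pattern) and count how many distinct models agree.
function scoreByConsensus(
  findings: Finding[],
  minModels = 2 // assumed threshold for "confirmed by multiple models"
): Map<string, Confidence> {
  const modelsPerKey = new Map<string, Set<string>>();
  for (const f of findings) {
    const key = `${f.file}::${f.pattern}`;
    const models = modelsPerKey.get(key) ?? new Set<string>();
    models.add(f.model);
    modelsPerKey.set(key, models);
  }

  const result = new Map<string, Confidence>();
  modelsPerKey.forEach((models, key) => {
    result.set(key, models.size >= minModels ? "high" : "low");
  });
  return result;
}
```

A finding reported by only one model is not discarded here; it is just marked low confidence so a human can look at it.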

4. Community Trust Registry

Results are uploaded to agentaudit.dev, where packages get a Trust Score (0-100). Other users can review, vote, and comment on findings.

5. ASF-IDs (Like CVEs for AI Agents)

Each finding gets an ASF-ID (AgentAudit Security Finding), e.g., ASF-2026-2019 — a standardized identifier for tracking.

The Scan: What We Did

We selected the 20 most popular MCP servers based on:

GitHub stars
Official status (Anthropic, Microsoft, etc.)
Community adoption

Each package was scanned with multiple models:

Model	Reports	Cost/Scan	Performance
Gemini 2.5 Flash	20	~$0.02	Best scanner — found most real issues
Claude Opus 4	20	~$1-2	Balanced — fewer findings, higher precision
GPT-4o	15	~$0.10	Nearly useless — found almost nothing
Claude Haiku 4.5	8	~$0.01	Too conservative — misses real issues

Total: 68 reports across 4 models, ~$37 total cost.

Model Performance (Benchmark on 9 Known-Vulnerable Packages)

Model	Recall	Precision	F1 Score
Gemini 2.5 Flash	85%	83%	84%
Claude Haiku 4.5	82%	81%	82%
Claude Sonnet 4	79%	76%	78%
Claude Sonnet 4.6	78%	76%	77%
GPT-4o	65%	66%	65%

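Key finding: GPT-4o is widely regarded as a top general-purpose model, yet it had the lowest recall (65%) and precision (66%) of the five models benchmarked. Gemini 2.5 Flash is the best value: the highest F1 (84%) at roughly $0.02 per scan.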
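For readers who want to check the F1 column, F1 is the harmonic mean of precision and recall. The small function below recomputes it for the Gemini 2.5 Flash row; the function name is ours.

```typescript
// F1 is the harmonic mean of precision and recall (both as fractions in [0, 1]).
function f1Score(precision: number, recall: number): number {
  if (precision + recall === 0) return 0; // avoid dividing by zero
  return (2 * precision * recall) / (precision + recall);
}

// Gemini 2.5 Flash row from the table above: recall 85%, precision 83%.
const geminiF1 = f1Score(0.83, 0.85);
console.log(`${Math.round(geminiF1 * 100)}%`); // prints "84%"
```

Recomputing from the rounded percentages in the table can land a point away from the published F1 for some rows, presumably because the published values were calculated before rounding.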

The Results: Trust Scores for Top 20 MCP Servers

✅ Clean Bill of Health (Trust Score: 99-100)

These packages had NO findings from ANY model:

Package	Publisher	Trust Score
Playwright MCP	Anthropic/Microsoft	100
Stripe Agent Toolkit	Stripe	100
Supabase MCP	Supabase	99
Slack MCP Server	Anthropic	99
Linear MCP Server	Linear	100
Sentry MCP Server	Sentry	100
Cloudflare MCP Server	Cloudflare	100
Firebase MCP	Google	100
MCP Server SQLite	Anthropic	100
MCP Server Fetch	Anthropic	100

10 out of 20 packages passed with flying colors. These are well-built with good security practices.

⚠️ Moderate Risk (Trust Score: 65-94)

Findings exist but are manageable:

Package	Trust Score	Findings
MongoDB MCP Server	94	2 findings (low severity)
MCP Server Qdrant	85	1 active finding (runtime dependency injection)
Git-MCP	80	2 findings (unauthenticated R2 endpoint)
MCP Grafana	80	4 findings (medium severity)
GitHub MCP Server	78	4 findings (unsanitized exec.Command input)
Notion MCP Server	65	5 findings (path traversal in file uploads)

🔴 Needs Attention (Trust Score: 15-50)

These packages have serious issues:

Package	Trust Score	Findings
Terraform MCP Server	50	4 findings (shell injection, insecure TLS, unverified binaries)
Chrome DevTools MCP	33	7 findings (arbitrary file writes, command injection)
MCP Server Kubernetes	15	5 findings (2 CRITICAL)

Critical Findings You Should Know About

🔴 CRITICAL #1: Kubernetes MCP — Arbitrary Command Execution

Package: mcp-server-kubernetes

Trust Score: 15/100

Findings: 5 total (2 CRITICAL)

Vulnerability 1: Arbitrary Command Execution via KUBECONFIG_COMMAND

The server reads the KUBECONFIG_COMMAND environment variable and executes its value as a shell command:

// Vulnerable pattern found
const command = process.env.KUBECONFIG_COMMAND;
execSync(command); // Arbitrary command execution!

Impact: Anyone who can set this env var can run arbitrary commands on the host system.

Vulnerability 2: Unauthenticated HTTP/SSE Transport

The server listens on 0.0.0.0 without authentication:

// Listening on all interfaces, no auth
const server = createServer(handler);
server.listen(3000, '0.0.0.0');

Impact: Anyone on the network can send kubectl commands to the server.

Recommendation: Do not use in production until fixed.
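One possible mitigation for the unauthenticated transport, sketched under our own assumptions rather than taken from the package: bind to the loopback interface only and require a bearer token on every request. `isAuthorized` and `startLocalServer` are hypothetical names, and the request handling is a placeholder.

```typescript
import { timingSafeEqual } from "node:crypto";
import { createServer, Server } from "node:http";

// Compare the Authorization header against the expected token in constant time.
function isAuthorized(header: string | undefined, token: string): boolean {
  if (token.length === 0) return false; // never accept an empty secret
  if (!header || !header.startsWith("Bearer ")) return false;
  const presented = Buffer.from(header.slice("Bearer ".length));
  const expected = Buffer.from(token);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return presented.length === expected.length && timingSafeEqual(presented, expected);
}

// Hypothetical wrapper: loopback only, and reject unauthenticated requests.
function startLocalServer(port: number, token: string): Server {
  const server = createServer((req, res) => {
    if (!isAuthorized(req.headers.authorization, token)) {
      res.writeHead(401, { "WWW-Authenticate": "Bearer" });
      res.end();
      return;
    }
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok"); // real request handling would go here
  });
  server.listen(port, "127.0.0.1"); // loopback, not 0.0.0.0
  return server;
}
```

Calling `startLocalServer(3000, someSecret)` keeps the port unreachable from other hosts, and local callers still have to present the token.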

🔴 CRITICAL #2: Chrome DevTools MCP — File Write + Command Injection

Package: chrome-devtools-mcp

Trust Score: 33/100

Findings: 7 total

Vulnerability 1: Arbitrary File Writes

File write operations don't sanitize paths:

// Unsanitized path from user
await fs.writeFile(userProvidedPath, content);

Impact: An attacker can write files outside the intended directory (path traversal).

Vulnerability 2: Command Injection via Chrome Args

Chrome launch arguments allow command injection:

// User-controlled args passed to Chrome
launchChrome(userArgs);

Impact: Arbitrary command execution via crafted Chrome arguments.

Vulnerability 3: Arbitrary Extension Installs

The server can be made to install arbitrary browser extensions:

// No validation on extension ID
await installExtension(userProvidedExtensionId);

Impact: Malicious extensions could be installed.

Recommendation: Use with extreme caution. Review all inputs.

🟡 HIGH: Notion MCP — Path Traversal in Uploads

Package: notion-mcp-server

Trust Score: 65/100

Findings: 5 total

Vulnerability: Path Traversal in File Uploads

Local file upload operations don't sanitize paths:

// User-provided path not sanitized
const filePath = path.join(uploadDir, userFilename);
await fs.copyFile(userFile, filePath);

Impact: An attacker can write files outside the upload directory using ../../../ patterns.

Fix: Normalize and validate paths before use.
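A minimal sketch of that fix, assuming the upload directory is trusted and only the filename is attacker-controlled. `resolveInside` is a hypothetical helper, not code from the Notion server.

```typescript
import * as path from "node:path";

// Resolve a user-supplied filename inside uploadDir, or throw if it escapes.
function resolveInside(uploadDir: string, userFilename: string): string {
  if (userFilename.includes("\0")) {
    throw new Error("Null byte in filename");
  }
  const root = path.resolve(uploadDir);
  const target = path.resolve(root, userFilename);

  // path.relative returns ".." or "../..." when target is outside root, and an
  // absolute path when the two are on different Windows drives.
  const rel = path.relative(root, target);
  const escapes =
    rel === "" || // the directory itself is not a valid upload target
    rel === ".." ||
    rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes upload directory: ${userFilename}`);
  }
  return target;
}
```

This covers `../` sequences and absolute paths, but not symlinks that already exist inside the upload directory; those need a separate real-path check.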

🟡 HIGH: Terraform MCP — Shell Injection

Package: terraform-mcp-server

Trust Score: 50/100

Findings: 4 total

Vulnerability: Shell Injection in Build Arguments

Build arguments passed to shell without sanitization:

// User input passed to shell
execSync(`terraform ${userCommand} ${userArgs}`);

Impact: Arbitrary command execution via crafted arguments.

Additional Issues:

Downloads and executes unverified binaries in CI
Insecure TLS configuration

Recommendation: Use array-based command execution instead of shell strings.
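To make that recommendation concrete, here is a sketch that pairs array-based execution with a subcommand allowlist. The allowlist contents and the function names are assumptions for illustration; a real server would choose its own.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist; a real server would choose its own subcommands.
const ALLOWED_SUBCOMMANDS = new Set(["init", "plan", "validate", "fmt"]);

// Build an argv array for terraform, rejecting unknown subcommands.
function buildTerraformArgv(subcommand: string, args: string[]): string[] {
  if (!ALLOWED_SUBCOMMANDS.has(subcommand)) {
    throw new Error(`Subcommand not allowed: ${subcommand}`);
  }
  return [subcommand, ...args];
}

// execFileSync runs the binary directly with no shell, so characters such as
// ";" or "$(...)" in an argument reach terraform as literal text.
function runTerraform(subcommand: string, args: string[]): string {
  return execFileSync("terraform", buildTerraformArgv(subcommand, args), {
    encoding: "utf8",
  });
}
```

Removing the shell closes the injection path, but the arguments are still attacker-influenced, so flags that change terraform's behavior may need their own validation.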

What Anthropic's Servers Do Right

Anthropic's official MCP servers all scored 99-100/100. Here's what they do differently:

Pattern 1: Path Traversal Protection (server-filesystem)

The official filesystem server has six layers of path validation:

export function isPathWithinAllowedDirectories(
  absolutePath: string,
  allowedDirectories: string[]
): boolean {
  // 1. Null byte rejection
  if (absolutePath.includes('\x00')) return false;

  // 2. Normalization
  const normalizedPath = path.resolve(path.normalize(absolutePath));

  // 3. Check containment (the directory itself, or anything beneath it)
  return allowedDirectories.some(dir => {
    const normalizedDir = path.resolve(path.normalize(dir));
    return normalizedPath === normalizedDir ||
      normalizedPath.startsWith(normalizedDir + path.sep);
  });
}

Plus:

Symlink resolution
Atomic writes with race condition prevention
Proper error handling
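The symlink step is not part of the excerpt above, so here is a rough sketch of what such a check can look like. This is our own illustration, not the official server's code, and `realPathWithin` is a hypothetical name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve symlinks on both sides before checking containment, so a link inside
// the allowed directory cannot point the caller somewhere else.
function realPathWithin(allowedDir: string, candidate: string): boolean {
  const realRoot = fs.realpathSync(allowedDir);
  let realTarget: string;
  try {
    realTarget = fs.realpathSync(candidate);
  } catch {
    return false; // nonexistent or unreadable paths are rejected in this sketch
  }
  return realTarget === realRoot || realTarget.startsWith(realRoot + path.sep);
}
```

For a file that does not exist yet, the same idea applies to its parent directory. The check on its own does not close the window between validation and use, which is where the atomic-write handling comes in.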

Pattern 2: Command Execution via Arrays (NOT Strings)

Anthropic's servers use array-based command execution:

// SECURE (used by Anthropic)
const command = "kubectl";
const args = ["delete", resourceType, name];
execFileSync(command, args);

// INSECURE (NOT found in Anthropic servers)
execSync(`kubectl delete ${resourceType} ${name}`);

One server explicitly validates array types:

if (!Array.isArray(input.command)) {
  throw new McpError(
    ErrorCode.InvalidParams,
    "Command must be an array. String commands not supported for security."
  );
}

Takeaway: Every MCP developer should adopt these patterns.

Success Stories: Security Done Right

octocode-mcp: Fixed All 5 Findings in 48 Hours

When we scanned octocode-mcp, we found 5 security issues. The maintainer's response?

Within 48 hours:

✅ All 5 findings fixed
✅ 64 regression tests added
✅ Public verification report posted

Read the full case study →

This is how you do open source security right. 👏

Sentry: Added AgentAudit Badge to XcodeBuildMCP

Sentry added the AgentAudit security badge to their XcodeBuildMCP repo.

What this means: Users can instantly see the security status before installing.

Why it matters: Established vendors like Sentry are leading by example, and transparency builds trust.

View the repo →

IBM: PR Submitted for mcp-context-forge (10k+ stars)

IBM has a pending PR to add the AgentAudit security badge to their mcp-context-forge repo.

Status: PR under review. Once merged, thousands of users will see the security status before installing.

View the PR →

Important Disclaimers

1. LLM-Based Scanning Is NOT Perfect

We manually reviewed all findings and removed the false positives we identified, but some may remain. Trust scores are relative, not absolute.

2. Findings Represent a Point in Time

These scans were conducted in February 2026. Maintainers may have already fixed issues. Check the live reports for updates.

3. A Score of 100 Doesn't Guarantee Zero Vulnerabilities

It means no findings were detected by our scanners. Traditional vulnerabilities (buffer overflows, etc.) may still exist.

4. We Responsibly Disclosed Critical Findings

Critical findings were disclosed to maintainers before publication to give them time to fix.

What Should You Do?

For MCP Server Maintainers

1. Scan your package NOW

npx agentaudit scan https://github.com/your-org/your-mcp-server

2. Add the AgentAudit Badge

[![AgentAudit: Safe](https://img.shields.io/badge/AgentAudit-Safe-green)](https://agentaudit.dev/package/your-org/your-mcp-server)

3. Fix High-Risk Findings Before Release

Critical/High findings = block release
Medium findings = document or fix ASAP
Low findings = track in backlog
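The severity policy above can be encoded as a small release gate. This is a sketch of the policy only; the type and function names are ours, and it does not parse real AgentAudit output.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Action = "block-release" | "fix-or-document" | "backlog";

// Map each severity to the policy from the list above.
function actionFor(severity: Severity): Action {
  switch (severity) {
    case "critical":
    case "high":
      return "block-release";
    case "medium":
      return "fix-or-document";
    case "low":
      return "backlog";
  }
}

// A release is blocked if any finding maps to "block-release".
function releaseBlocked(severities: Severity[]): boolean {
  return severities.some((s) => actionFor(s) === "block-release");
}
```

For example, `releaseBlocked(["low", "medium"])` is false, while adding a single `"high"` finding flips it to true.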

4. Copy Anthropic's Security Patterns

Path traversal protection (6 layers)
Array-based command execution
Symlink resolution
Atomic writes

For AI Developers

1. Check Before You Install

Look for AgentAudit badges in READMEs. No badge? Scan it yourself:

npx agentaudit scan https://github.com/org/package

2. Use Safe Defaults

These packages scored 99-100:

✅ Playwright MCP (Anthropic/Microsoft)
✅ Stripe Agent Toolkit (Stripe)
✅ Supabase MCP (Supabase)
✅ Slack MCP Server (Anthropic)
✅ Sentry MCP Server (Sentry)

3. Avoid High-Risk Packages

Until fixed, avoid:

❌ MCP Server Kubernetes (Trust: 15)
❌ Chrome DevTools MCP (Trust: 33)
❌ Terraform MCP Server (Trust: 50)

For Security Teams

1. Implement Automated Scanning

Add AgentAudit to your CI/CD pipeline:

# GitHub Action example
- name: Security Scan
  run: npx agentaudit scan . --fail-on high

2. Use the Right Model

Gemini 2.5 Flash for screening (cheap, high recall)
Claude Opus 4 for verification (precise, few false positives)
Skip GPT-4o (not reliable for security)

3. Understand the Limitations

Single-model findings may be false positives
Multi-model consensus = high confidence
Context matters (e.g., MD5 used for non-cryptographic checksums is OK)

The Cost Breakdown

Total cost for 68 scans: ~$37

Model	Scans	Cost
Gemini 2.5 Flash	40	~$0.80
Claude Opus 4	20	~$35
GPT-4o	15	~$1.50
Claude Haiku 4.5	8	~$0.10

You can scan your package for ~$0.02 with Gemini. That's less than a cup of coffee for peace of mind.
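The arithmetic behind those numbers is easy to reproduce. The sketch below takes the scan counts and per-model totals from the cost table as-is and derives the average cost per scan and the overall total.

```typescript
// Per-model totals copied from the cost table above (approximate USD).
const costRows = [
  { model: "Gemini 2.5 Flash", scans: 40, totalUsd: 0.8 },
  { model: "Claude Opus 4", scans: 20, totalUsd: 35 },
  { model: "GPT-4o", scans: 15, totalUsd: 1.5 },
  { model: "Claude Haiku 4.5", scans: 8, totalUsd: 0.1 },
];

// Average cost per scan for one row.
function perScanUsd(row: { scans: number; totalUsd: number }): number {
  return row.totalUsd / row.scans;
}

const totalUsd = costRows.reduce((sum, row) => sum + row.totalUsd, 0);

for (const row of costRows) {
  console.log(`${row.model}: ~$${perScanUsd(row).toFixed(2)} per scan`);
}
console.log(`Total: ~$${totalUsd.toFixed(2)}`);
```

The Claude Opus 4 row works out to about $1.75 per scan, within the ~$1-2 range quoted earlier, and the four rows sum to roughly $37.40.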

What's Next?

We're continuing to scan more MCP servers and AI agent packages. Our goal:

✅ 100+ MCP servers scanned by Q2 2026
✅ Public reports for every package
✅ Badge program for security-transparent projects
✅ CI/CD integration for automated pre-release audits

Want to scan your package? Visit agentaudit.dev and enter your GitHub repo URL.

Resources

AgentAudit Website — Scan your package
CLI on npm — npx agentaudit scan
CLI GitHub — Source code
Skill (IDE integration) — Auto-check before install
GitHub Action — CI/CD integration
Live Reports — Browse all scans

Questions? Drop them in the comments! 👇

Scan your package now: agentaudit.dev
