TL;DR: We scanned the 20 most popular MCP servers with multiple AI models. 60% had at least one real security finding. Anthropic's official servers (Playwright, Slack, SQLite, Fetch) all scored 99-100/100 — here's what they did right. Two packages have critical vulnerabilities you should know about.
Scan your own package: agentaudit.dev
Why We Did This
The MCP (Model Context Protocol) ecosystem is exploding. Thousands of developers are installing MCP servers daily to connect AI agents to tools, databases, and APIs.
But here's the problem: Most MCP servers have never been security audited.
These servers often have access to:
- 🔐 Your source code repositories
- 🗄️ Your databases
- 📧 Your email and communication tools
- ☁️ Your cloud infrastructure
One vulnerable MCP server = Game over for your entire AI agent security.
So we decided to scan the top 20 MCP servers ourselves using AgentAudit — an open-source security scanner specifically designed for AI agent packages.
How AgentAudit Works
AgentAudit isn't your typical SAST tool. Here's what makes it different:
1. LLM-Powered Analysis (Not Just Regex)
Traditional scanners use regex patterns and AST analysis. AgentAudit uses LLMs that can understand context, intent, and semantic meaning.
Example: A regex scanner sees exec() and flags it. AgentAudit goes further and asks:
- Is the input sanitized?
- Is there a whitelist?
- What's the threat model?
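To make the contrast concrete, here is a minimal sketch of a context-free check. The `naiveScan` function and its pattern are hypothetical illustrations, not code from AgentAudit or any real scanner.

```typescript
// Hypothetical naive scanner: flags every exec-like call,
// with no notion of where the argument comes from.
const EXEC_CALL = /\b(exec|execSync)\s*\(/;

export function naiveScan(source: string): number[] {
  // Returns the 1-based line numbers that match the pattern.
  return source
    .split("\n")
    .map((line, i) => (EXEC_CALL.test(line) ? i + 1 : -1))
    .filter((n) => n !== -1);
}

const sample = [
  'execSync("git status");',           // constant command: low risk
  "execSync(`git clone ${userUrl}`);", // user input in a shell string: high risk
].join("\n");

// Both lines are reported the same way, although only the second
// one interpolates untrusted input.
console.log(naiveScan(sample));
```

The pattern cannot tell the two calls apart, which is the gap the questions above are meant to close.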
2. 12 Structured Detection Patterns
The scanner checks for AI-agent-specific vulnerabilities:
- Prompt injection
- Tool poisoning
- Capability escalation
- Credential exposure
- Path traversal
- Command injection
- MCP protocol abuse
- Supply chain attacks
- And more...
3. Multi-Model Verification
You can scan the same package with different LLMs. Findings confirmed by multiple models have higher confidence.
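One way to picture multi-model confirmation is to count how many distinct models reported the same issue. The sketch below is an assumption about how such a tally could work; the `Finding` shape, the grouping key, and the sample data are hypothetical and not AgentAudit's actual data model.

```typescript
interface Finding {
  model: string;   // which LLM reported the finding
  file: string;    // file the finding points at
  pattern: string; // detection pattern, e.g. "command-injection"
}

// Maps each (pattern, file) pair to the number of distinct models that reported it.
export function consensus(findings: Finding[]): Map<string, number> {
  const modelsByKey = new Map<string, Set<string>>();
  for (const f of findings) {
    const key = `${f.pattern}:${f.file}`;
    const models = modelsByKey.get(key) ?? new Set<string>();
    models.add(f.model);
    modelsByKey.set(key, models);
  }
  const counts = new Map<string, number>();
  modelsByKey.forEach((models, key) => counts.set(key, models.size));
  return counts;
}

// Hypothetical reports from two models.
const counts = consensus([
  { model: "gemini-2.5-flash", file: "src/run.ts", pattern: "command-injection" },
  { model: "claude-opus-4", file: "src/run.ts", pattern: "command-injection" },
  { model: "gemini-2.5-flash", file: "src/io.ts", pattern: "path-traversal" },
]);
console.log(counts.get("command-injection:src/run.ts")); // reported by two models
```

A finding with a count of 1 would then be a candidate for manual review rather than an automatic failure.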
4. Community Trust Registry
Results are uploaded to agentaudit.dev, where packages get a Trust Score (0-100). Other users can review, vote, and comment on findings.
5. ASF-IDs (Like CVEs for AI Agents)
Each finding gets an ASF-ID (AgentAudit Security Finding), e.g., ASF-2026-2019 — a standardized identifier for tracking.
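If you want to pull these identifiers out of logs or issue trackers, a small parser helps. This sketch assumes the format is `ASF-<4-digit year>-<number>`, which is inferred from the single example above and not from a published specification.

```typescript
// Assumed format: ASF-<4-digit year>-<sequence number>.
const ASF_ID = /^ASF-(\d{4})-(\d+)$/;

export function parseAsfId(id: string): { year: number; sequence: number } | null {
  const match = ASF_ID.exec(id);
  if (match === null) return null;
  return { year: Number(match[1]), sequence: Number(match[2]) };
}

console.log(parseAsfId("ASF-2026-2019")); // parsed into year and sequence
console.log(parseAsfId("CVE-2026-2019")); // not an ASF-ID, so null
```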
The Scan: What We Did
We selected the 20 most popular MCP servers based on:
- GitHub stars
- Official status (Anthropic, Microsoft, etc.)
- Community adoption
Each package was scanned with multiple models:
| Model | Reports | Cost/Scan | Performance |
|---|---|---|---|
| Gemini 2.5 Flash | 20 | ~$0.02 | Best scanner — found most real issues |
| Claude Opus 4 | 20 | ~$1-2 | Balanced — fewer findings, higher precision |
| GPT-4o | 15 | ~$0.10 | Nearly useless — found almost nothing |
| Claude Haiku 4.5 | 8 | ~$0.01 | Too conservative — misses real issues |
Total: 68 reports across 4 models, ~$37 total cost.
Model Performance (Benchmark on 9 Known-Vulnerable Packages)
| Model | Recall | Precision | F1 Score |
|---|---|---|---|
| Gemini 2.5 Flash | 85% | 83% | 84% |
| Claude Haiku 4.5 | 82% | 81% | 82% |
| Claude Sonnet 4 | 79% | 76% | 78% |
| Claude Sonnet 4.6 | 78% | 76% | 77% |
| GPT-4o | 65% | 66% | 65% |
Key finding: GPT-4o is widely considered a top model, yet it scored lowest on this benchmark (65% F1). Gemini 2.5 Flash scored highest (84% F1) at roughly $0.02 per scan, which makes it the best value.
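For readers who want to check the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages shown above, so a result can differ by a point from the published column where the underlying values were not rounded.

```typescript
// F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
export function f1Score(precision: number, recall: number): number {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// Gemini 2.5 Flash row: precision 83%, recall 85%.
const geminiF1 = f1Score(0.83, 0.85);
console.log(`${Math.round(geminiF1 * 100)}%`); // 84%

// GPT-4o row: precision 66%, recall 65%.
const gpt4oF1 = f1Score(0.66, 0.65);
console.log(`${Math.round(gpt4oF1 * 100)}%`); // 65%
```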
The Results: Trust Scores for Top 20 MCP Servers
✅ Clean Bill of Health (Trust Score: 99-100)
These packages had NO findings from ANY model:
| Package | Publisher | Trust Score |
|---|---|---|
| Playwright MCP | Anthropic/Microsoft | 100 |
| Stripe Agent Toolkit | Stripe | 100 |
| Supabase MCP | Supabase | 99 |
| Slack MCP Server | Anthropic | 99 |
| Linear MCP Server | Linear | 100 |
| Sentry MCP Server | Sentry | 100 |
| Cloudflare MCP Server | Cloudflare | 100 |
| Firebase MCP | | 100 |
| MCP Server SQLite | Anthropic | 100 |
| MCP Server Fetch | Anthropic | 100 |
10 out of 20 packages passed with flying colors. These are well-built with good security practices.
⚠️ Moderate Risk (Trust Score: 65-94)
Findings exist but are manageable:
| Package | Trust Score | Findings |
|---|---|---|
| MongoDB MCP Server | 94 | 2 findings (low severity) |
| MCP Server Qdrant | 85 | 1 active finding (runtime dependency injection) |
| Git-MCP | 80 | 2 findings (unauthenticated R2 endpoint) |
| MCP Grafana | 80 | 4 findings (medium severity) |
| GitHub MCP Server | 78 | 4 findings (unsanitized exec.Command input) |
| Notion MCP Server | 65 | 5 findings (path traversal in file uploads) |
🔴 Needs Attention (Trust Score: 15-50)
These packages have serious issues:
| Package | Trust Score | Findings |
|---|---|---|
| Terraform MCP Server | 50 | 4 findings (shell injection, insecure TLS, unverified binaries) |
| Chrome DevTools MCP | 33 | 7 findings (arbitrary file writes, command injection) |
| MCP Server Kubernetes | 15 | 5 findings (2 CRITICAL) |
Critical Findings You Should Know About
🔴 CRITICAL #1: Kubernetes MCP — Arbitrary Command Execution
Package: mcp-server-kubernetes
Trust Score: 15/100
Findings: 5 total (2 CRITICAL)
Vulnerability 1: Arbitrary Command Execution via KUBECONFIG_COMMAND
The server reads the KUBECONFIG_COMMAND environment variable and passes its value to the shell unchanged:
// Vulnerable pattern found
const command = process.env.KUBECONFIG_COMMAND;
execSync(command); // Arbitrary command execution!
Impact: Anyone who can set this env var can run arbitrary commands on the host system.
Vulnerability 2: Unauthenticated HTTP/SSE Transport
The server listens on 0.0.0.0 without authentication:
// Listening on all interfaces, no auth
const server = createServer(handler);
server.listen(3000, '0.0.0.0');
Impact: Anyone on the network can send kubectl commands to the server.
Recommendation: Do not use in production until fixed.
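If you need to run an HTTP-based MCP server in the meantime, two generic mitigations are binding to the loopback interface and requiring a token. The following is a minimal sketch under those assumptions, not the project's fix; `isAuthorized`, `createLocalServer`, and the `MCP_TOKEN` variable are hypothetical names.

```typescript
import { createServer } from "node:http";
import { timingSafeEqual } from "node:crypto";

// Checks an "Authorization: Bearer <token>" header in constant time.
export function isAuthorized(header: string | undefined, expectedToken: string): boolean {
  if (expectedToken.length === 0) return false; // never accept an empty secret
  if (header === undefined || !header.startsWith("Bearer ")) return false;
  const given = Buffer.from(header.slice("Bearer ".length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

export function createLocalServer(token: string) {
  return createServer((req, res) => {
    if (!isAuthorized(req.headers.authorization, token)) {
      res.writeHead(401).end("unauthorized");
      return;
    }
    res.writeHead(200).end("ok"); // real request handling would go here
  });
}

// Bind to loopback only, for example from your entry point:
// createLocalServer(process.env.MCP_TOKEN ?? "").listen(3000, "127.0.0.1");
```

Binding to 127.0.0.1 keeps other machines on the network from reaching the port, and the token check covers other local processes that should not have access.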
🔴 CRITICAL #2: Chrome DevTools MCP — File Write + Command Injection
Package: chrome-devtools-mcp
Trust Score: 33/100
Findings: 7 total
Vulnerability 1: Arbitrary File Writes
File write operations don't sanitize paths:
// Unsanitized path from user
await fs.writeFile(userProvidedPath, content);
Impact: Can write files outside intended directory (path traversal).
Vulnerability 2: Command Injection via Chrome Args
Chrome launch arguments allow command injection:
// User-controlled args passed to Chrome
launchChrome(userArgs);
Impact: Arbitrary command execution via crafted Chrome arguments.
Vulnerability 3: Arbitrary Extension Installs
Can install arbitrary browser extensions:
// No validation on extension ID
await installExtension(userProvidedExtensionId);
Impact: Malicious extensions could be installed.
Recommendation: Use with extreme caution. Review all inputs.
🟡 HIGH: Notion MCP — Path Traversal in Uploads
Package: notion-mcp-server
Trust Score: 65/100
Findings: 5 total
Vulnerability: Path Traversal in File Uploads
Local file upload operations don't sanitize paths:
// User-provided path not sanitized
const filePath = path.join(uploadDir, userFilename);
await fs.copyFile(userFile, filePath);
Impact: Can write files outside upload directory using ../../../ patterns.
Fix: Normalize and validate paths before use.
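A minimal sketch of that fix, assuming uploads must stay inside a single base directory. The helper name `resolveInsideDir` and the example paths are hypothetical, and this is not the Notion server's code.

```typescript
import * as path from "node:path";

// Resolves userFilename against baseDir and rejects anything that escapes it.
export function resolveInsideDir(baseDir: string, userFilename: string): string {
  if (userFilename.includes("\0")) {
    throw new Error("Null byte in filename");
  }
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userFilename);
  const rel = path.relative(base, target);
  // rel is "" for the directory itself, begins with ".." when the target is
  // outside, and can be absolute on Windows when the drive differs.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`Path escapes upload directory: ${userFilename}`);
  }
  return target;
}

console.log(resolveInsideDir("/srv/uploads", "report.pdf")); // stays inside the base
// resolveInsideDir("/srv/uploads", "../../etc/passwd") throws
```

This check does not resolve symlinks. If the upload directory can contain them, comparing real paths (for example via `fs.realpath`) would be a further step.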
🟡 HIGH: Terraform MCP — Shell Injection
Package: terraform-mcp-server
Trust Score: 50/100
Findings: 4 total
Vulnerability: Shell Injection in Build Arguments
Build arguments are passed to the shell without sanitization:
// User input passed to shell
execSync(`terraform ${userCommand} ${userArgs}`);
Impact: Arbitrary command execution via crafted arguments.
Additional Issues:
- Downloads and executes unverified binaries in CI
- Insecure TLS configuration
Recommendation: Use array-based command execution instead of shell strings.
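As a sketch of that recommendation, the subcommand can be checked against an allowlist and the arguments passed as an array, so no shell ever parses them. The allowlist contents and the helper names are assumptions for illustration, not the Terraform MCP server's code.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist; pick the subcommands your tool actually needs.
const ALLOWED_SUBCOMMANDS = new Set(["init", "plan", "validate", "fmt"]);

export function buildTerraformArgs(subcommand: string, extraArgs: string[]): string[] {
  if (!ALLOWED_SUBCOMMANDS.has(subcommand)) {
    throw new Error(`Subcommand not allowed: ${subcommand}`);
  }
  return [subcommand, ...extraArgs];
}

export function runTerraform(subcommand: string, extraArgs: string[]): string {
  // execFileSync does not spawn a shell by default, so characters such as
  // ";" or "$()" inside an argument are not interpreted.
  return execFileSync("terraform", buildTerraformArgs(subcommand, extraArgs), {
    encoding: "utf8",
  });
}

console.log(buildTerraformArgs("plan", ["-no-color"]));
```

Array-based execution removes shell injection, but the extra arguments still reach Terraform itself, so flags that change behavior may need their own validation.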
What Anthropic's Servers Do Right
Anthropic's official MCP servers all scored 99-100/100. Here's what they do differently:
Pattern 1: Path Traversal Protection (server-filesystem)
The official filesystem server has six layers of path validation:
export function isPathWithinAllowedDirectories(
  absolutePath: string,
  allowedDirectories: string[]
): boolean {
  // 1. Null byte rejection
  if (absolutePath.includes('\x00')) return false;
  // 2. Normalization
  const normalizedPath = path.resolve(path.normalize(absolutePath));
  // 3. Check containment
  return allowedDirectories.some(dir => {
    const normalizedDir = path.resolve(path.normalize(dir));
    return normalizedPath.startsWith(normalizedDir + path.sep);
  });
}
Plus:
- Symlink resolution
- Atomic writes with race condition prevention
- Proper error handling
Pattern 2: Command Execution via Arrays (NOT Strings)
Anthropic's servers use array-based command execution:
// SECURE (used by Anthropic)
const command = "kubectl";
const args = ["delete", resourceType, name];
execFileSync(command, args);
// INSECURE (NOT found in Anthropic servers)
execSync(`kubectl delete ${resourceType} ${name}`);
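To see why the array form is safer, the sketch below passes a hostile-looking value through `execFileSync`. It uses the current Node.js binary as a stand-in child process (an assumption made so the example runs without kubectl installed).

```typescript
import { execFileSync } from "node:child_process";

// A value that would inject a second command if a shell parsed it.
const hostile = "pod; echo INJECTED";

// The child simply prints its first argument. With node -e, extra arguments
// start at process.argv[1].
const echoed = execFileSync(
  process.execPath,
  ["-e", "console.log(process.argv[1])", hostile],
  { encoding: "utf8" },
);

// The value arrives as one literal argument; nothing after ";" is executed.
console.log(JSON.stringify(echoed.trim()));
```

With the string form, the same value would be handed to a shell, which would treat `;` as a command separator.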
One server explicitly validates array types:
if (!Array.isArray(input.command)) {
  throw new McpError(
    ErrorCode.InvalidParams,
    "Command must be an array. String commands not supported for security."
  );
}
Takeaway: These patterns should be copied by all MCP developers.
Success Stories: Security Done Right
octocode-mcp: Fixed All 5 Findings in 48 Hours
When we scanned octocode-mcp, we found 5 security issues. The maintainer's response?
Within 48 hours:
- ✅ All 5 findings fixed
- ✅ 64 regression tests added
- ✅ Public verification report posted
This is how you do open source security right. 👏
Sentry: Added AgentAudit Badge to XcodeBuildMCP
Sentry added the AgentAudit security badge to their XcodeBuildMCP repo.
What this means: Users can instantly see the security status before installing.
Why it matters: Established developer-tooling companies like Sentry are leading by example, and transparency builds trust.
IBM: PR Submitted for mcp-context-forge (10k+ stars)
IBM has a pending PR to add the AgentAudit security badge to their mcp-context-forge repo.
Status: PR under review. Once merged, thousands of users will see the security status before installing.
Important Disclaimers
1. LLM-Based Scanning Is NOT Perfect
We manually reviewed all findings and removed false positives. But some may remain. Trust scores are relative, not absolute.
2. Findings Represent a Point in Time
These scans were conducted in February 2026. Maintainers may have already fixed issues. Check the live reports for updates.
3. A Score of 100 Doesn't Guarantee Zero Vulnerabilities
It means no findings were detected by our scanners. Traditional vulnerabilities (buffer overflows, etc.) may still exist.
4. We Responsibly Disclosed Critical Findings
Critical findings were disclosed to maintainers before publication to give them time to fix.
What Should You Do?
For MCP Server Maintainers
1. Scan your package NOW
npx agentaudit scan https://github.com/your-org/your-mcp-server
2. Add the AgentAudit Badge
[](https://agentaudit.dev/package/your-org/your-mcp-server)
3. Fix High-Risk Findings Before Release
- Critical/High findings = block release
- Medium findings = document or fix ASAP
- Low findings = track in backlog
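The policy above can be encoded as a small release gate. This is a sketch of one possible encoding; the type names and the `releaseBlocked` helper are hypothetical and not part of the AgentAudit CLI.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Action = "block-release" | "fix-or-document" | "backlog";

// Mirrors the three rules listed above.
const POLICY: Record<Severity, Action> = {
  critical: "block-release",
  high: "block-release",
  medium: "fix-or-document",
  low: "backlog",
};

export function releaseBlocked(severities: Severity[]): boolean {
  return severities.some((s) => POLICY[s] === "block-release");
}

console.log(releaseBlocked(["low", "medium"])); // false
console.log(releaseBlocked(["low", "critical"])); // true
```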
4. Copy Anthropic's Security Patterns
- Path traversal protection (6 layers)
- Array-based command execution
- Symlink resolution
- Atomic writes
For AI Developers
1. Check Before You Install
Look for AgentAudit badges in READMEs. No badge? Scan it yourself:
npx agentaudit scan https://github.com/org/package
2. Use Safe Defaults
These packages scored 99-100:
- ✅ Playwright MCP (Anthropic/Microsoft)
- ✅ Stripe Agent Toolkit (Stripe)
- ✅ Supabase MCP (Supabase)
- ✅ Slack MCP Server (Anthropic)
- ✅ Sentry MCP Server (Sentry)
3. Avoid High-Risk Packages
Until fixed, avoid:
- ❌ MCP Server Kubernetes (Trust: 15)
- ❌ Chrome DevTools MCP (Trust: 33)
- ❌ Terraform MCP Server (Trust: 50)
For Security Teams
1. Implement Automated Scanning
Add AgentAudit to your CI/CD pipeline:
# GitHub Action example
- name: Security Scan
  run: npx agentaudit scan . --fail-on high
2. Use the Right Model
- Gemini 2.5 Flash for screening (cheap, high recall)
- Claude Opus 4 for verification (precise, low FP)
- Skip GPT-4o (not reliable for security)
3. Understand the Limitations
- Single-model findings may be false positives
- Multi-model consensus = high confidence
- Context matters (e.g., MD5 for non-crypto is OK)
The Cost Breakdown
Total cost for 68 scans: ~$37
| Model | Scans | Cost |
|---|---|---|
| Gemini 2.5 Flash | 40 | ~$0.80 |
| Claude Opus 4 | 20 | ~$35 |
| GPT-4o | 15 | ~$1.50 |
| Claude Haiku 4.5 | 8 | ~$0.10 |
You can scan your package for ~$0.02 with Gemini. That's less than a cup of coffee for peace of mind.
What's Next?
We're continuing to scan more MCP servers and AI agent packages. Our goal:
- ✅ 100+ MCP servers scanned by Q2 2026
- ✅ Public reports for every package
- ✅ Badge program for security-transparent projects
- ✅ CI/CD integration for automated pre-release audits
Want to scan your package? Visit agentaudit.dev and enter your GitHub repo URL.
Resources
- AgentAudit Website — Scan your package
- CLI on npm — npx agentaudit scan
- CLI GitHub — Source code
- Skill (IDE integration) — Auto-check before install
- GitHub Action — CI/CD integration
- Live Reports — Browse all scans
Questions? Drop them in the comments! 👇
Scan your package now: agentaudit.dev