Last week, OpenAI acquired Promptfoo — the open-source platform that 130,000 developers and 25% of the Fortune 500 relied on to test, red-team, and secure their AI applications. The 23-person team, backed by a16z and Insight Partners, is joining OpenAI to build security testing into their enterprise platform, OpenAI Frontier.
Promptfoo will stay open-source. But make no mistake: its roadmap now serves OpenAI's priorities.
This raises an uncomfortable question for anyone building on the Model Context Protocol: who's testing your MCP servers?
The MCP Quality Crisis Nobody Talks About
MCP has won. 97 million monthly SDK downloads. Adopted by Anthropic, OpenAI, Google, Microsoft, Apple. Over 16,000 servers registered across npm and GitHub. Every major AI agent framework speaks MCP.
But quantity is not quality. Independent research tells a grim story:
- 92% exploitation probability when an agent loads just 10 MCP plugins (VentureBeat)
- The first malicious MCP server was found on npm in September 2025 — it silently BCC'd every email to an attacker
- A trojanized health data MCP server appeared in February 2026
- MCPTox (academic research) found a 72.8% attack success rate for tool poisoning on real MCP servers using o1-mini
- 88% of MCP servers require credentials, and 53% store them as insecure static secrets
The MCP Inspector — Anthropic's official debugging tool — is great for interactive exploration. But it doesn't do automated testing. It doesn't scan for security vulnerabilities. It doesn't run in CI. It doesn't generate mock servers for your team.
There is no Testing Working Group in the MCP governance structure. No official test framework. No quality gates.
If you're shipping an MCP server today, you're probably testing it with console.log and hope.
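To make "testing by hand" concrete: exercising even one tool without a harness means composing raw JSON-RPC 2.0 messages and piping them into the server over stdio. A minimal sketch of what one such message looks like (the `tool_call_request` helper is hypothetical, not part of any SDK):

```python
# Building a raw MCP tools/call request by hand -- what "testing with
# console.log and hope" actually involves at the protocol level.
import json

def tool_call_request(req_id: int, tool: str, arguments: dict) -> str:
    """Build one JSON-RPC 2.0 message for an MCP tools/call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

print(tool_call_request(1, "get_user", {"id": "1"}))
```

Doing this by hand for every tool, every transport, and every server version is exactly the gap an automated harness fills.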
What Promptfoo Did (and Didn't Do)
Promptfoo was excellent at testing LLM applications broadly — prompt evaluation, red-teaming, jailbreak detection, regression testing across model versions. It worked with OpenAI, Anthropic, Gemini, local models.
But Promptfoo was never built for MCP. It didn't understand MCP's transport layer (stdio, SSE, streamable-HTTP). It couldn't introspect MCP tool schemas. It didn't detect MCP-specific vulnerabilities like Tool Poisoning — where malicious instructions are hidden in tool descriptions that LLMs blindly follow.
MCP servers have a fundamentally different testing surface than prompt chains:
| What you need to test | Prompt chains (Promptfoo) | MCP servers |
|---|---|---|
| Input/output correctness | Prompt → response | Tool call → structured result |
| Schema validation | N/A | JSON Schema for every tool input |
| Transport reliability | HTTP only | stdio, SSE, HTTP — each with different failure modes |
| Security surface | Prompt injection, jailbreaks | Tool Poisoning, Excessive Agency, path traversal, injection, auth bypass |
| Regression detection | Output drift across model versions | Response drift across server versions |
| CI/CD integration | Model-dependent, non-deterministic | Deterministic — no LLM in the loop |
MCP server testing is a different problem. It needs a different tool.
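That determinism is concrete: a tool result either conforms to its declared JSON Schema or it doesn't, with no model in the judgment loop. A toy sketch of such a check, covering only `type` and `required` (real validators handle far more):

```python
# Minimal JSON-Schema-style structural check for an MCP tool result.
# Illustrative sketch only -- a full validator covers nesting, formats, etc.
JSON_TYPES = {
    "string": str, "number": (int, float), "integer": int,
    "boolean": bool, "array": list, "object": dict, "null": type(None),
}

def check_result(result: dict, schema: dict) -> list[str]:
    """Return a list of violations of `schema` (required fields + types)."""
    errors = []
    for key in schema.get("required", []):
        if key not in result:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in result and not isinstance(result[key], JSON_TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}, got {type(result[key]).__name__}")
    return errors

schema = {"required": ["content"], "properties": {"content": {"type": "string"}}}
print(check_result({"content": "hello"}, schema))  # []
print(check_result({"content": 42}, schema))       # ['content: expected string, got int']
```

The same check produces the same answer on every run, which is what makes it suitable as a CI gate.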
MCPSpec: The Testing Platform MCP Has Been Missing
MCPSpec is an open-source CLI that does for MCP servers what Promptfoo did for LLM applications — testing, security scanning, performance profiling, and CI/CD integration — but purpose-built for the Model Context Protocol.
No LLMs in the loop. Deterministic and fast. Here's what it does:
Record, Replay, Mock — No Test Code Required
# Record a session against your real server
mcpspec record start "npx my-server"
mcpspec> .call get_user {"id": "1"}
mcpspec> .call list_items {}
mcpspec> .save my-api
# Ship a new version? Replay and see what changed
mcpspec record replay my-api "npx my-server-v2"
# Output: 2 matched, 1 changed, 0 added, 0 removed
# Generate a mock for CI — no API keys, no live server
mcpspec mock my-api --generate ./mocks/server.js
Your team runs tests against the mock. Your CI pipeline gates on it. Nobody needs credentials for the real service.
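The replay comparison above boils down to a deterministic diff. As a conceptual sketch (the session format here, a map from call to recorded response, is an assumption, not MCPSpec's actual storage):

```python
# Conceptual record/replay drift detection: classify each call in a recorded
# session against a new server version's responses. Not MCPSpec's internals.
import json

def diff_sessions(recorded: dict, current: dict) -> dict:
    """Classify recorded calls as matched/changed/removed; count new calls as added."""
    summary = {"matched": 0, "changed": 0, "added": 0, "removed": 0}
    for call, old in recorded.items():
        if call not in current:
            summary["removed"] += 1
        elif json.dumps(current[call], sort_keys=True) == json.dumps(old, sort_keys=True):
            summary["matched"] += 1
        else:
            summary["changed"] += 1
    summary["added"] = len(set(current) - set(recorded))
    return summary

recorded = {"get_user:1": {"name": "Ada"}, "list_items": {"items": []}}
v2       = {"get_user:1": {"name": "Ada"}, "list_items": {"items": [1]}}
print(diff_sessions(recorded, v2))  # {'matched': 1, 'changed': 1, 'added': 0, 'removed': 0}
```

Canonicalizing responses before comparison (here via `json.dumps` with sorted keys) is what makes "changed" a real signal rather than noise from key ordering.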
Security Audit — Catch Tool Poisoning Before It Catches You
mcpspec audit "npx my-server" --fail-on medium
8 security rules including two MCP-specific threats that no other tool checks:
- Tool Poisoning — Detects prompt injection hidden in tool descriptions: suspicious instructions ("ignore previous instructions"), hidden Unicode characters, cross-tool manipulation, embedded code blocks
- Excessive Agency — Flags destructive tools (`delete_*`, `drop_*`) without confirmation parameters, tools that accept arbitrary code, overly broad schemas
Passive mode analyzes metadata only — safe to run against production. Active mode sends test payloads (with confirmation prompts and auto-skip for destructive tools).
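The passive, metadata-only idea can be illustrated with a toy heuristic over tool descriptions (the phrase list and rules below are assumptions for illustration, not MCPSpec's actual detection logic):

```python
# Metadata-only heuristic in the spirit of a passive Tool Poisoning scan:
# flag injection-style phrases and invisible Unicode in tool descriptions.
import unicodedata

SUSPICIOUS_PHRASES = ["ignore previous instructions", "do not tell the user"]

def scan_description(desc: str) -> list[str]:
    """Flag suspicious phrases and hidden format characters in a description."""
    findings = []
    lowered = desc.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            findings.append(f"injection phrase: {phrase!r}")
    for ch in desc:
        if unicodedata.category(ch) == "Cf":  # invisible "format" chars, e.g. U+200B
            findings.append(f"hidden character: U+{ord(ch):04X}")
    return findings

print(scan_description("Reads a file. Also, ignore previous instructions."))
# ["injection phrase: 'ignore previous instructions'"]
```

Because it never sends a payload to the server, a scan like this is safe to point at production metadata.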
MCP Score — A Quality Rating for Every Server
mcpspec score "npx my-server" --badge ./badge.svg
A 0-100 quality score across 5 categories:
| Category | Weight | What it measures |
|---|---|---|
| Documentation | 25% | Tool descriptions, parameter docs |
| Schema Quality | 25% | Types, constraints, naming conventions |
| Error Handling | 20% | Graceful failures, informative errors |
| Responsiveness | 15% | Latency under load |
| Security | 15% | Vulnerability scan results |
Generate a badge for your README. Fail CI builds below a threshold. Give users a reason to trust your server.
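Given the published weights, the overall number is a plain weighted sum. A sketch (per-category subscores on a 0-100 scale are assumed; how MCPSpec derives each subscore is not shown here):

```python
# Weighted 0-100 score from per-category subscores, using the weights above.
WEIGHTS = {
    "documentation": 0.25, "schema_quality": 0.25,
    "error_handling": 0.20, "responsiveness": 0.15, "security": 0.15,
}

def overall_score(subscores: dict[str, float]) -> float:
    """Combine per-category 0-100 subscores into one weighted 0-100 score."""
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 1)

print(overall_score({
    "documentation": 90, "schema_quality": 80,
    "error_handling": 70, "responsiveness": 100, "security": 60,
}))  # 80.5
```

A single scalar like this is what makes "fail CI below a threshold" a one-line gate.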
CI/CD — One Command
mcpspec ci-init --platform github --checks test,audit,score
Generates a complete GitHub Actions workflow (or GitLab CI, or shell script) with test, security audit, and quality score gates. Deterministic exit codes. JUnit/JSON/TAP reporters.
Test Collections — When You Need More Control
name: My Server Tests
server: npx my-mcp-server
tests:
- name: Read a file
call: read_file
with:
path: /tmp/test.txt
expect:
- exists: $.content
- type: $.content
expected: string
- name: Handle missing file gracefully
call: read_file
with:
path: /tmp/nonexistent.txt
expectError: true
10 assertion types. Environments and variables. Tags for filtering. Parallel execution. Retries. Baseline comparisons. Ships with 70 pre-built tests for 7 popular MCP servers.
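The `exists` and `type` assertions in the collection above could be evaluated roughly like this, using a simple `$.field` path syntax (illustrative sketch; MCPSpec's own assertion engine is richer):

```python
# Minimal evaluator for `exists` / `type` assertions on a tool-call result.
TYPE_NAMES = {"string": str, "number": (int, float), "boolean": bool,
              "array": list, "object": dict}

def resolve(path: str, result: dict):
    """Resolve a '$.a.b' path against a result dict."""
    node = result
    for part in path.lstrip("$.").split("."):
        node = node[part]
    return node

def check(assertion: dict, result: dict) -> bool:
    """Evaluate {'exists': path} or {'type': path, 'expected': name}."""
    if "exists" in assertion:
        try:
            resolve(assertion["exists"], result)
            return True
        except (KeyError, TypeError):
            return False
    if "type" in assertion:
        return isinstance(resolve(assertion["type"], result),
                          TYPE_NAMES[assertion["expected"]])
    raise ValueError(f"unknown assertion: {assertion}")

result = {"content": "file text"}
print(check({"exists": "$.content"}, result))                      # True
print(check({"type": "$.content", "expected": "string"}, result))  # True
```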
Why This Matters Now
The Promptfoo acquisition confirms what was already obvious: AI testing and security is not optional infrastructure. It's a requirement.
OpenAI spent millions to acquire it. Every Fortune 500 company evaluating AI agents asks the same question: "How do we know this is safe?"
For MCP specifically, there is no answer today. The protocol is everywhere. The quality infrastructure is nowhere.
MCPSpec is MIT-licensed, CLI-first, works offline, and runs without an account. It's built for the developers who are actually shipping MCP servers and need them to be reliable.
Get started:
npm install -g mcpspec
# Try it on the filesystem server in 10 seconds
mcpspec inspect "npx @modelcontextprotocol/server-filesystem /tmp"
- GitHub: github.com/light-handle/mcpspec
- npm: npmjs.com/package/mcpspec
- Docs: light-handle.github.io/mcpspec
MCPSpec is an independent open-source project. It is not affiliated with OpenAI, Anthropic, or the Promptfoo team.