The MCP (Model Context Protocol) ecosystem has exploded. awesome-mcp-servers lists 200+ servers — but there was no way to know if any of them actually worked.
So I built mcp-probe: a zero-config CLI that validates MCP servers in one command.
The problem
You add a server to Claude Desktop and it silently fails. You check the logs and get "connection closed". You have no idea whether it's a network issue, a broken dependency, or a server that simply doesn't implement the protocol correctly.
What mcp-probe does
npx @k08200/mcp-probe @modelcontextprotocol/server-memory
mcp-probe @modelcontextprotocol/server-memory
────────────────────────────────────────────────────
✓ MCP protocol handshake 1392ms — memory-server v0.6.3
✓ Tools discovery 33ms — Found 9 tools
✓ Tool schema validation — All tool schemas are valid
────────────────────────────────────────────────────
Server memory-server v0.6.3
Caps tools
Tools
▸ create_entities Create multiple new entities in the knowledge graph
▸ read_graph Read the entire knowledge graph
▸ search_nodes Search for nodes in the knowledge graph
▸ ...and 6 more
✓ PASS 1455ms total
For a server with resources and prompts too (server-everything):
✓ Tools discovery 22ms — Found 14 tools
✓ Resources discovery 2ms — Found 7 resources
✓ Prompts discovery 5ms — Found 4 prompts
It catches real bugs
@modelcontextprotocol/server-filesystem — one of the most well-known MCP servers — currently has a broken dependency:
✗ MCP protocol handshake — Error: Cannot find module 'ajv'
Before mcp-probe, this would show as "connection closed" with no indication of why.
CI integration
Exit code 1 on failure means it works as a CI gate:
- name: Validate MCP server
  run: npx @k08200/mcp-probe @your-org/your-mcp-server
  timeout-minutes: 2
JSON output for scripting:
npx @k08200/mcp-probe @scope/server --output json
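Since the exit code carries pass/fail, a script can gate on it and keep the report around for inspection. A minimal sketch (the report's field names aren't shown here, so the gate relies only on the exit code):

# Exits non-zero on failure; the saved report can be inspected afterwards.
npx @k08200/mcp-probe @scope/server --output json > probe-report.json || exit 1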
How it works
Under the hood it uses the official @modelcontextprotocol/sdk to run the actual protocol handshake. It pipes stderr from the spawned process so when a server crashes on startup, you see the real error.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['--yes', target],
  stderr: 'pipe', // capture crash output
});

const client = new Client(
  { name: 'mcp-probe', version: '0.1.0' },
  { capabilities: { roots: { listChanged: false } } }
);

await client.connect(transport);
const tools = await client.listTools();
// also listResources() and listPrompts() if server advertises them
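Resources and prompts are only queried when the server advertises them. A minimal sketch of that guard, using the SDK's getServerCapabilities() (the logging is illustrative):

const caps = client.getServerCapabilities();
if (caps?.resources) {
  const { resources } = await client.listResources();
  console.log(`Found ${resources.length} resources`);
}
if (caps?.prompts) {
  const { prompts } = await client.listPrompts();
  console.log(`Found ${prompts.length} prompts`);
}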
Get it
npx @k08200/mcp-probe @modelcontextprotocol/server-memory
GitHub: k08200/mcp-probe
npm: @k08200/mcp-probe
Would love to hear what servers you try it on — especially if you find one where the output is confusing or wrong.
Top comments (5)
The "connection closed" debug story is the right pain to address. We run several MCP servers in agent workflows (Supabase, Datadog, Gmail)
and the failure modes that bite hardest aren't dependency or protocol issues, they're auth handoff. Server starts fine, lists tools fine,
but every call returns 401 because the OAuth flow needed a browser redirect the agent can't trigger.
Two things that would make this lethal as a CI gate:
Are you planning a mode that takes a sample input per tool and validates the call path end-to-end? That'd close the gap between "server
started" and "server actually works in an agent loop.
This is exactly the feedback I needed — thanks for the detailed breakdown.
The auth handoff gap is real and I've been thinking about how to close it. The initialize + list* checks catch the "server won't start" class of failures, but you're right that they completely miss the "server starts but is useless without browser auth" class. OAuth redirect failures are silent from the outside.
On your two points:
Tool-call dry-runs: Yes, this is on the roadmap. The plan is an optional tools.json sidecar where you declare sample inputs per tool — mcp-probe then calls each tool with those inputs and reports pass/fail/latency per call. Something like:
{
  "tools": {
    "read_file": { "path": "/tmp/test.txt" },
    "search": { "query": "hello world" }
  }
}
The tricky part is making it not require a sidecar for basic use — thinking of a --probe-tools flag that calls each tool with auto-generated minimal inputs from the schema (empty strings, zero values) just to verify the call path doesn't 500.
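A rough sketch of what schema-driven minimal inputs could look like (hypothetical helper, not the shipped implementation):

// Hypothetical: derive the smallest input a tool's JSON Schema allows,
// e.g. minimalInput(tool.inputSchema).
function minimalValue(schema) {
  switch (schema?.type) {
    case 'string':  return '';
    case 'number':
    case 'integer': return 0;
    case 'boolean': return false;
    case 'array':   return [];
    case 'object':  return minimalInput(schema);
    default:        return null;
  }
}

function minimalInput(schema) {
  const args = {};
  // Fill only required fields; optional ones stay unset.
  for (const key of schema?.required ?? []) {
    args[key] = minimalValue(schema.properties?.[key]);
  }
  return args;
}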
Stderr noise vs fatal errors: Currently I use a heuristic (Error: prefix, skip stack frames) but it's fragile. I'm planning to let server authors ship a .mcp-probe.json at their package root that declares expected startup warnings — anything matching those patterns gets downgraded to warn instead of fail. Open to other approaches if you've seen patterns that work.
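A sketch of what that declaration could look like (field name and patterns are hypothetical, not a shipped format):

{
  "expectedStderr": [
    "^\\(node:\\d+\\) ExperimentalWarning",
    "^Debugger listening on"
  ]
}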
Which of the servers you run (Supabase, Datadog, Gmail) would be most useful to target first for the dry-run feature? Happy to use those as test cases.
Datadog first. The OAuth-handoff failure is the highest-signal class because it's silent from outside today. Every check passes until the first real call. Once that lands, Supabase is the cleanest positive-control surface (big tool catalog, stable token auth, varied input shapes) to validate the dry-run pipeline against a server that actually works.
Sidecar over auto-fallback for real tool validation. Empty-string fuzz tends to land on input validation, not the call path you're trying to exercise. Structured stderr on the spec side would obviate the noise-vs-fatal heuristic entirely.
Sidecar is shipped in v0.3.0.
.mcp-probe.json in your project root (or --tools-file):
{
  "tools": {
    "logs_query": {
      "input": { "query": "service:web status:error", "timeframe": "1h" },
      "expect": { "not_error_code": [401, 403] }
    }
  }
}
Sidecar inputs are used when available; auto-generated minimal inputs are fallback only. The dry-run output now shows which calls used sidecar vs auto. expect.not_error_code treats those HTTP codes as warn instead of fail — covers the OAuth handoff case.
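The per-call loop is roughly this shape (simplified sketch, not the exact internals; sidecar is the parsed .mcp-probe.json, and callTool is the real SDK method):

for (const [name, spec] of Object.entries(sidecar.tools)) {
  const started = Date.now();
  const result = await client.callTool({ name, arguments: spec.input });
  const ms = Date.now() - started;
  // MCP tool results carry isError; expect rules can downgrade a fail to warn.
  console.log(`${result.isError ? '✗' : '✓'} ${name} ${ms}ms`);
}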
npx @k08200/mcp-probe@latest @scope/server --probe-tools
Would use Datadog as the first real test case if you're willing to share the server package name.
cool :)