I built the npm audit for MCP servers

The MCP (Model Context Protocol) ecosystem has exploded. awesome-mcp-servers lists 200+ servers — but there was no way to know if any of them actually worked.

So I built mcp-probe: a zero-config CLI that validates MCP servers in one command.

The problem

You add a server to Claude Desktop and it silently fails. The logs just say "connection closed". You have no idea whether it's a network issue, a broken dependency, or a server that doesn't implement the protocol correctly.

What mcp-probe does

npx @k08200/mcp-probe @modelcontextprotocol/server-memory
mcp-probe  @modelcontextprotocol/server-memory
────────────────────────────────────────────────────
  ✓  MCP protocol handshake  1392ms — memory-server v0.6.3
  ✓  Tools discovery  33ms — Found 9 tools
  ✓  Tool schema validation — All tool schemas are valid
────────────────────────────────────────────────────
  Server   memory-server v0.6.3
  Caps     tools

  Tools
    ▸ create_entities  Create multiple new entities in the knowledge graph
    ▸ read_graph  Read the entire knowledge graph
    ▸ search_nodes  Search for nodes in the knowledge graph
    ▸ ...and 6 more

  ✓  PASS  1455ms total

For a server with resources and prompts too (server-everything):

  ✓  Tools discovery  22ms — Found 14 tools
  ✓  Resources discovery  2ms — Found 7 resources
  ✓  Prompts discovery  5ms — Found 4 prompts

It catches real bugs

@modelcontextprotocol/server-filesystem, one of the best-known MCP servers, currently ships with a broken dependency:

  ✗  MCP protocol handshake — Error: Cannot find module 'ajv'

Before mcp-probe, this would show as "connection closed" with no indication of why.

CI integration

Exit code 1 on failure means it works as a CI gate:

- name: Validate MCP server
  run: npx @k08200/mcp-probe @your-org/your-mcp-server
  timeout-minutes: 2

JSON output for scripting:

npx @k08200/mcp-probe @scope/server --output json
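
The JSON report is handy to post-process with jq. The field names below are illustrative guesses, not a documented schema, so inspect the real output once before scripting against it:

# pretty-print the full report
npx @k08200/mcp-probe @scope/server --output json | jq .

# pull a per-check summary (assumes a top-level "checks" array)
npx @k08200/mcp-probe @scope/server --output json | jq '.checks[] | {name, status}'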

How it works

Under the hood it uses the official @modelcontextprotocol/sdk to run the actual protocol handshake. It pipes stderr from the spawned process so when a server crashes on startup, you see the real error.

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['--yes', target],
  stderr: 'pipe',  // capture crash output
});

const client = new Client(
  { name: 'mcp-probe', version: '0.1.0' },
  { capabilities: { roots: { listChanged: false } } }
);

await client.connect(transport);
const tools = await client.listTools();
// also listResources() and listPrompts() if the server advertises them
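
When the handshake fails, that piped stderr is what turns "connection closed" into the real error. A sketch of the idea (not mcp-probe's exact code; transport.stderr is the SDK's handle to the child's piped stream):

try {
  await client.connect(transport);
} catch (err) {
  // drain whatever the crashed server wrote to stderr before it died
  transport.stderr?.on('data', (chunk) => process.stderr.write(chunk));
  console.error(`handshake failed: ${err.message}`);
  process.exitCode = 1;  // exit code 1 is what makes the CI gate work
}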

Get it

npx @k08200/mcp-probe @modelcontextprotocol/server-memory

GitHub: k08200/mcp-probe
npm: @k08200/mcp-probe

Would love to hear what servers you try it on — especially if you find one where the output is confusing or wrong.

Top comments (5)

Vadym Arnaut

The "connection closed" debug story is the right pain to address. We run several MCP servers in agent workflows (Supabase, Datadog, Gmail)
and the failure modes that bite hardest aren't dependency or protocol issues, they're auth handoff. Server starts fine, lists tools fine,
but every call returns 401 because the OAuth flow needed a browser redirect the agent can't trigger.

Two things that would make this lethal as a CI gate:

  • Tool-call dry-runs, not just discovery. A server that registers 20 tools but errors on every actual invocation passes a "list tools" check.
  • Distinguish stderr noise from real init failures. Several official servers log warnings on startup (missing optional config, version checks) that look identical to fatal errors if you're capturing stderr verbatim.

Are you planning a mode that takes a sample input per tool and validates the call path end-to-end? That'd close the gap between "server started" and "server actually works in an agent loop".

yongrean

This is exactly the feedback I needed — thanks for the detailed breakdown.

The auth handoff gap is real and I've been thinking about how to close it. The initialize + list* checks catch the "server won't start" class of failures, but you're right that they completely miss the "server starts but is useless without browser auth" class. OAuth redirect failures are silent from the outside.

On your two points:

Tool-call dry-runs: Yes, this is on the roadmap. The plan is an optional tools.json sidecar where you declare sample inputs per tool — mcp-probe then calls each tool with those inputs and reports pass/fail/latency per call. Something like:

{
  "tools": {
    "read_file": { "path": "/tmp/test.txt" },
    "search": { "query": "hello world" }
  }
}

The tricky part is making it not require a sidecar for basic use. I'm thinking of a --probe-tools flag that calls each tool with auto-generated minimal inputs from the schema (empty strings, zero values) just to verify the call path doesn't 500.
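
Roughly this shape (a sketch of the planned fallback, not shipped code; it only covers the common primitive types):

// derive a minimal argument object from a tool's JSON Schema
function minimalInput(schema) {
  const input = {};
  for (const key of schema?.required ?? []) {
    switch (schema.properties?.[key]?.type) {
      case 'number':
      case 'integer': input[key] = 0; break;
      case 'boolean': input[key] = false; break;
      case 'array': input[key] = []; break;
      case 'object': input[key] = {}; break;
      default: input[key] = ''; break;  // strings and anything untyped
    }
  }
  return input;
}

// call every discovered tool just to check the call path responds
for (const tool of (await client.listTools()).tools) {
  await client.callTool({ name: tool.name, arguments: minimalInput(tool.inputSchema) });
}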

Stderr noise vs fatal errors: Currently I use a heuristic (Error: prefix, skip stack frames) but it's fragile. I'm planning to let server authors ship a .mcp-probe.json at their package root that declares expected startup warnings — anything matching those patterns gets downgraded to warn instead of fail. Open to other approaches if you've seen patterns that work.
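
The declared-warnings file could look something like this (an illustrative shape for the planned format, nothing shipped yet):

{
  "expectedStderr": [
    "DeprecationWarning: .*",
    "^\\[warn\\] optional config .* not found$"
  ]
}

Anything the server prints to stderr at startup that matches one of these patterns would be reported as warn; everything else still fails the probe.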

Which of the servers you run (Supabase, Datadog, Gmail) would be most useful to target first for the dry-run feature? Happy to use those as test cases.

Vadym Arnaut

Datadog first. The OAuth-handoff failure is the highest-signal class because it's silent from outside today. Every check passes until the first real call. Once that lands, Supabase is the cleanest positive-control surface (big tool catalog, stable token auth, varied input shapes) to validate the dry-run pipeline against a server that actually works.

Sidecar over auto-fallback for real tool validation. Empty-string fuzz tends to land on input validation, not the call path you're trying to exercise. Structured stderr on the spec side would obviate the noise-vs-fatal heuristic entirely.

yongrean

Sidecar is shipped in v0.3.0.

.mcp-probe.json in your project root (or pass --tools-file):

{
  "tools": {
    "logs_query": {
      "input": { "query": "service:web status:error", "timeframe": "1h" },
      "expect": { "not_error_code": [401, 403] }
    }
  }
}

Sidecar inputs are used when available; auto-generated minimal inputs are fallback only. The dry-run output now shows which calls used sidecar vs auto. expect.not_error_code treats those HTTP codes as warn instead of fail — covers the OAuth handoff case.

npx @k08200/mcp-probe@latest --probe-tools

Would use Datadog as the first real test case if you're willing to share the server package name.

KUSHAL BARAL

cool :)