TL;DR: I built a tiny tool that speaks MCP and ran it against 11 public Model Context Protocol servers. Handshake latency ranged from 97 ms to nearly 21 seconds, a 215× spread, and the bigger problem isn't downtime at all. Free live index + open-source CLI at the bottom.
The itch
The Model Context Protocol went from a proposal to 10,000+ public servers in about a year. Agents now lean on these servers the way web apps lean on APIs. But I kept hitting flaky failures while building on them and couldn't tell: was it my code, or the server?
There's no Pingdom for MCP. So I built one — and the first thing I did was point it at a set of well-known public servers.
How it works (no API, just the protocol)
The trick is that MCP is just a protocol: JSON-RPC 2.0, which remote servers carry over HTTP. So instead of calling some third-party API, the tool pretends to be an agent: it runs the real handshake (initialize, then tools/list), exactly like Claude or any MCP client would, and measures the round trip. Zero dependencies, ~150 lines.
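To make the handshake concrete, here is a minimal Python sketch of the message sequence and the timing around it. It is not the tool's actual code. The names `handshake_messages`, `timed_handshake`, and `fake_send` are hypothetical, the protocol version string is an assumption, and the transport is a stand-in function rather than a real HTTP client. The sketch also sends the `notifications/initialized` notification that the MCP lifecycle places between `initialize` and later requests.

```python
import time
from typing import Any, Callable, Optional

# Assumption: one published MCP protocol revision; use whichever your client targets.
PROTOCOL_VERSION = "2025-03-26"


def make_request(req_id: int, method: str,
                 params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg: dict = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def handshake_messages() -> list:
    """Messages a client sends to initialize a session and list tools."""
    return [
        make_request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "probe-sketch", "version": "0.0.1"},
        }),
        # Notifications carry no "id" and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        make_request(2, "tools/list"),
    ]


def timed_handshake(send: Callable[[dict], Optional[dict]]) -> tuple:
    """Run the handshake through `send`; return (elapsed_ms, tool_count)."""
    start = time.perf_counter()
    tool_count = 0
    for msg in handshake_messages():
        reply = send(msg)
        if msg["method"] == "tools/list" and reply is not None:
            tool_count = len(reply["result"]["tools"])
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, tool_count


def fake_send(msg: dict) -> Optional[dict]:
    """Stand-in transport that answers like a tiny two-tool server."""
    if msg["method"] == "initialize":
        return {"jsonrpc": "2.0", "id": msg["id"],
                "result": {"protocolVersion": PROTOCOL_VERSION,
                           "capabilities": {},
                           "serverInfo": {"name": "fake", "version": "0"}}}
    if msg["method"] == "tools/list":
        return {"jsonrpc": "2.0", "id": msg["id"],
                "result": {"tools": [{"name": "search"}, {"name": "fetch"}]}}
    return None  # notification: nothing comes back


if __name__ == "__main__":
    ms, tools = timed_handshake(fake_send)
    print(f"handshake took {ms:.2f} ms, {tools} tools listed")
```

Against a real server, `send` would POST each message to the server's MCP endpoint and parse the response. That part is left out here so the sketch runs without a network.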
Finding 1: a 215× latency spread
| Server | Handshake latency | Tools |
|---|---|---|
| Hugging Face | 97 ms | 8 |
| Context7 | 108 ms | 2 |
| Cloudflare Docs | 148 ms | 2 |
| SpaceMolt | 194 ms | 183 |
| Exa | 239 ms | 2 |
| CrashStory | 522 ms | 18 |
| Roundtable | 551 ms | 13 |
| DeepWiki | 605 ms | 3 |
| Chainflip Broker | 634 ms | 6 |
| Microsoft Learn | 664 ms | 3 |
| GitMCP | 20,820 ms | 5 |
The median was 522 ms. If your agent has to handshake with a 21-second server, that's 21 seconds your user stares at a spinner before the first tool call even starts, or the request times out. Latency here isn't vanity; it's whether the agent works or hangs.
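The spread and median can be recomputed from the table with a few lines of Python. The numbers are copied from the table above; the variable names are only illustrative.

```python
from statistics import median

# Handshake latency in milliseconds, copied from the table above.
latencies_ms = {
    "Hugging Face": 97,
    "Context7": 108,
    "Cloudflare Docs": 148,
    "SpaceMolt": 194,
    "Exa": 239,
    "CrashStory": 522,
    "Roundtable": 551,
    "DeepWiki": 605,
    "Chainflip Broker": 634,
    "Microsoft Learn": 664,
    "GitMCP": 20820,
}

fastest = min(latencies_ms.values())
slowest = max(latencies_ms.values())
spread = slowest / fastest               # ratio of slowest to fastest handshake
med = median(latencies_ms.values())      # middle value of the 11 measurements

print(f"fastest={fastest} ms, slowest={slowest} ms")
print(f"spread={spread:.1f}x, median={med} ms")
```

The ratio comes out just under 215, which is where the rounded 215× figure comes from.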
Finding 2: the real blind spot — contract drift
Those 11 servers expose 245 tools between them. Every tool is a contract: a name and a set of required inputs that agents depend on.
Here's what nobody's watching. A normal uptime monitor sees 200 OK and says healthy. But MCP servers rarely fail by going down — they fail by quietly changing the contract: a tool gets renamed in a redeploy, an optional param becomes required, a tool disappears. The server still returns 200 OK, and every agent calling it silently breaks.
Uptime tells you the server answered. It can't tell you the server still does what your agent expects.
Catching that requires actually speaking the protocol and diffing the tool schemas over time. That's the whole point of the tool.
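As a rough illustration of what diffing tool schemas can look like, here is a Python sketch that compares two `tools/list` snapshots. It is not the tool's actual implementation. The function names and the sample tools are hypothetical, and it checks only tool names and the `required` list inside each tool's `inputSchema`.

```python
def required_inputs(tool: dict) -> set:
    """Names of the inputs a tool's JSON Schema marks as required."""
    return set(tool.get("inputSchema", {}).get("required", []))


def diff_contracts(old_tools: list, new_tools: list) -> dict:
    """Compare two tools/list snapshots and report contract changes."""
    old = {t["name"]: t for t in old_tools}
    new = {t["name"]: t for t in new_tools}
    newly_required = {}
    for name in sorted(old.keys() & new.keys()):
        extra = required_inputs(new[name]) - required_inputs(old[name])
        if extra:
            newly_required[name] = sorted(extra)
    return {
        # A rename shows up as one removed tool plus one added tool.
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "newly_required": newly_required,
    }


# Hypothetical snapshots taken before and after a redeploy.
before = [
    {"name": "search_docs",
     "inputSchema": {"type": "object", "required": ["query"]}},
    {"name": "get_page",
     "inputSchema": {"type": "object", "required": ["url"]}},
]
after = [
    {"name": "search",  # renamed from search_docs
     "inputSchema": {"type": "object", "required": ["query"]}},
    {"name": "get_page",  # "format" used to be optional
     "inputSchema": {"type": "object", "required": ["url", "format"]}},
]

drift = diff_contracts(before, after)
print(drift)
```

Both snapshots would come back with 200 OK from the server, yet the diff flags a missing tool and a newly required parameter. A fuller check would also compare parameter types and descriptions.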
Try it
- Free live index (continuously updated): https://mcpwatch.app/reliability-index.html
- Full report: https://mcpwatch.app/report.html
- Open-source CLI (MIT): npx mcpwatch — https://github.com/ClaudefoustCEO/mcpwatch
If you run an MCP server, I'd genuinely love to add it to the index — drop the URL in the comments. And I'm curious what you'd want monitored that I'm not thinking about.
Top comments (1)
The latency spread is the signal here. MCP servers are often discussed like they are just capability catalogs, but handshake and tool-call latency become part of the agent's reasoning budget.
I would love to see this kind of index include a "safe to put in an interactive loop" threshold, not only uptime. A 20-second server can still be useful, but probably not in the same workflow as a 100ms one.