Dinesh Kumar

I checked 22,561 MCP servers. Almost none have a reliability record. Here's how to vet one before you ship.

If you are wiring MCP servers into an agent, you are taking on a dependency with no SLA, no uptime history, and no failure record. It works in the demo. Then six weeks later it starts failing half its calls, or its latency triples, and nobody notices until a workflow breaks.

I wanted to know how bad this actually is, so I built a neutral index of the whole ecosystem. Here is what the data says, and a 30-second way to protect yourself.

The data
We deduplicated every MCP server we could find across the major registries. The count: 22,561 servers.

How many have any independent reliability data, meaning a third party has actually observed whether they work at runtime? About 0.5%, or roughly one server in 200.

That is not just hobby projects. Real companies ship MCP servers too (Databricks, Snowflake, PayPal, Netlify, and Appwrite all do), and the same gap applies across the board: independent runtime reliability data is the exception, not the rule. And here is the part that should bother you more than the coverage gap. Among the servers we can measure, most score in the low 40s out of 100. The ecosystem optimized for quantity of servers and skipped whether they work.

Composition, for the curious: ~30% is code and dev tooling (the biggest category by far); the rest is fragmented across search, data, AI, productivity, and a long tail.

Why GitHub stars do not help you
The instinct is to trust a server because the repo has stars or the company is well known. Stars measure popularity at a point in time. They tell you nothing about:

- whether the endpoint is up right now
- its success rate when called with real arguments
- latency, especially the p95 tail that wrecks agent loops
- whether the tool descriptions changed (a real prompt-injection vector: a server can swap its tool description after you trusted it)

A reputable company can ship an MCP server that is slow, flaky, or abandoned. Static signals will not catch it. You need runtime behavior.
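
One way to catch the description-swap problem is to pin a fingerprint of the tool list when you first review a server and compare it on every later connection. The sketch below is illustrative only: `fingerprint_tools` and `tools_changed` are hypothetical helper names, and it assumes each tool is a dict with `name`, `description`, and `inputSchema` keys, as in a typical MCP `tools/list` result.

```python
import hashlib
import json


def fingerprint_tools(tools):
    """Return a SHA-256 hex digest over the model-facing parts of a tool list.

    Assumes each tool is a dict with name / description / inputSchema keys.
    """
    canonical = sorted(
        (
            {
                "name": t.get("name"),
                "description": t.get("description"),
                "inputSchema": t.get("inputSchema"),
            }
            for t in tools
        ),
        key=lambda t: t["name"] or "",  # order-independent fingerprint
    )
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def tools_changed(pinned_fingerprint, current_tools):
    """True if the current tool list no longer matches the pinned fingerprint."""
    return fingerprint_tools(current_tools) != pinned_fingerprint
```

Store the fingerprint next to the server URL when you approve it. If `tools_changed` later returns True, treat the server as unreviewed until someone has read the new descriptions.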

How to vet an MCP server (practical checklist)
1. Call it yourself before you trust it. Do a real initialize handshake and a representative tool call. Measure latency and whether it actually returns valid results.
2. Look at the tail, not the average. A 50ms median with a 6s p95 means about one in twenty agent steps stalls.
3. Check recency. When was the repo last touched? An abandoned server is a latent outage.
4. Treat tool descriptions as untrusted input. They are model-facing instructions; a malicious or compromised server can poison them.
5. Get an independent signal. A marketplace cannot neutrally rate the servers it hosts and sells (conflict of interest), so look for a third party that measures runtime behavior.

That last point is the gap we are filling. You can look up any server's independent trust score here: dominionobservatory.com/atlas/score. Servers with no measured data show as "unrated" rather than a fake number, because pretending to know is worse than admitting you do not.
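
To make the tail point from the checklist concrete, here is a minimal sketch that summarizes latencies you collected from your own test calls. The function names and the sample numbers are made up for illustration, the p95 uses the simple nearest-rank method, and the stall estimate assumes each agent step is independent.

```python
import statistics


def summarize(latencies_ms, failures=0):
    """Summarize latencies of successful calls plus a count of failed calls."""
    if not latencies_ms:
        raise ValueError("need at least one successful call")
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return {
        "success_rate": len(ordered) / (len(ordered) + failures),
        "mean_ms": sum(ordered) / len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
    }


def chance_of_slow_step(n_steps, slow_fraction=0.05):
    """Probability that at least one of n independent steps lands in the slow tail."""
    return 1 - (1 - slow_fraction) ** n_steps


# Hypothetical sample: 94 fast calls and 6 stalled ones.
report = summarize([50] * 94 + [6000] * 6)
```

With that sample the median is 50 ms and the p95 is 6000 ms, while the mean (407 ms) hides both. A 20-step agent run that hits the slow 5% independently on each step has about a 64% chance of at least one stall, per `chance_of_slow_step(20)`.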

The 30-second version: gate tool calls on trust
You do not want to do this by hand on every call. Gate it. Query an independent trust score before your agent calls a tool, and block anything below your threshold:

import requests

def trust_ok(server_url, min_score=70):
    """Return True if the server meets the trust threshold or has no rating yet."""
    r = requests.get("https://dominionobservatory.com/atlas/server",
                     params={"url": server_url}, timeout=5)
    if r.status_code == 404:
        return True  # not indexed yet, allow but log
    d = r.json()
    score = d.get("trust_score")
    if score is None or not d.get("total_calls"):
        return True  # unrated: no independent data yet
    return score >= min_score

# before calling a tool:
if not trust_ok("https://some-mcp-server.com/mcp"):
    raise RuntimeError("Blocked: MCP server below trust threshold")

JavaScript and the full pattern are here: dominionobservatory.com/atlas/gate.
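
If the gate runs before every tool call, you probably do not want a network round trip each time. A small time-based cache around the check is one option. This is a sketch, not part of the published pattern: `cached_gate`, its `ttl_seconds` default, and the injectable `clock` are all hypothetical choices made for illustration.

```python
import time


def cached_gate(check, ttl_seconds=300, clock=time.monotonic):
    """Wrap a trust check so each server is looked up at most once per TTL window.

    `check` is any callable that takes a server URL and returns a truthy or
    falsy verdict. `clock` is injectable so the cache can be tested without
    waiting for real time to pass.
    """
    cache = {}  # server_url -> (expires_at, verdict)

    def gate(server_url):
        now = clock()
        hit = cache.get(server_url)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip the lookup
        verdict = bool(check(server_url))
        cache[server_url] = (now + ttl_seconds, verdict)
        return verdict

    return gate
```

You would wrap the check once, for example `gate = cached_gate(trust_ok)`, and call `gate(url)` before each tool call. A short TTL keeps the verdict reasonably current; exceptions raised by the check are not cached and propagate to the caller.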

Why this matters more every month
Regulation is catching up. Singapore's IMDA agentic-AI governance is in force, the EU AI Act's transparency obligations apply from August 2026, and MiCA record-keeping is live. If your agents act on third-party tools, you increasingly have to prove what they used and that it was verified. A firm's own internal logs are not a neutral record. That is a different post, but it is coming fast.

For now: stop trusting MCP servers because they are popular. Measure them, or use someone who does.

Full ecosystem data: dominionobservatory.com/atlas/report. If you build or run an MCP server, I would genuinely like your take: what would make a reliability score you would actually trust?