We Scored 39,752 MCP Servers — Here's What We Found
We built an open-source platform that scores MCP servers on three independent components: quality, community, and trust. After scoring nearly 40K tools, the data revealed some uncomfortable truths, and one big problem with how we were measuring quality.
The Problem
Our first scoring system evaluated tools on static analysis alone: schema correctness, token efficiency, description quality, security, and install reliability. Five dimensions. One composite quality score.
The result: 85.7% of all tools scored Grade B.
No differentiation. No motivation to improve. A platform that tells everyone "you're average" is a platform nobody needs.
The Fix: Additive Scoring Model
We rebuilt the system from scratch. The new model recognizes that real-world adoption matters just as much as code quality.
Composite Grade = Quality Score (0-100)
+ Community Bonus (0-60)
+ Trust Bonus (0-30)
Quality Score — Static analysis of your tool definition. Five dimensions: Schema (25%), Token Efficiency (25%), Description (20%), Security (15%), Install (15%).
Community Bonus — How agents actually gauge reliability. Stars (log-scale, 0-30), Activity recency (0-20), Official/Verified status (0-10).
Trust Bonus — Real execution data. No data = no bonus (not a penalty). Proven tools earn up to 30 extra points.
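The additive model above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the platform's actual code: the weights and point ranges come from the text, but every function and field name here is hypothetical, and the log-scale star curve (saturating at 10,000 stars) is an assumption, since the post does not give the real formula.

```typescript
// Each quality dimension is assumed to be scored 0-100 before weighting.
interface QualityDimensions {
  schema: number;
  tokenEfficiency: number;
  description: number;
  security: number;
  install: number;
}

// Weights as stated in the post (they sum to 1.0).
const QUALITY_WEIGHTS: QualityDimensions = {
  schema: 0.25,
  tokenEfficiency: 0.25,
  description: 0.2,
  security: 0.15,
  install: 0.15,
};

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Weighted sum of the five dimensions, yielding a 0-100 Quality Score.
function qualityScore(d: QualityDimensions): number {
  const keys = Object.keys(QUALITY_WEIGHTS) as (keyof QualityDimensions)[];
  return keys.reduce(
    (sum, key) => sum + clamp(d[key], 0, 100) * QUALITY_WEIGHTS[key],
    0,
  );
}

// ASSUMPTION: stars map to 0-30 points on a log10 curve that tops out at
// `saturation` stars. The real curve is not published in the post.
function starPoints(stars: number, saturation = 10000): number {
  if (stars <= 0) return 0;
  const raw = (30 * Math.log10(1 + stars)) / Math.log10(1 + saturation);
  return clamp(raw, 0, 30);
}

// Stars (0-30) + activity recency (0-20) + official/verified status (0-10).
function communityBonus(
  stars: number,
  activityPoints: number,
  verifiedPoints: number,
): number {
  return (
    starPoints(stars) + clamp(activityPoints, 0, 20) + clamp(verifiedPoints, 0, 10)
  );
}

// `trust` is null when there is no execution data: no bonus, not a penalty.
function compositeScore(
  quality: number,
  community: number,
  trust: number | null,
): number {
  const trustBonus = trust === null ? 0 : clamp(trust, 0, 30);
  return clamp(quality, 0, 100) + clamp(community, 0, 60) + trustBonus;
}
```

Under these assumptions, a tool with a Quality Score of 80, no stars, and no execution data keeps a composite of exactly 80, which matches the "no data = no bonus" rule: missing signals never subtract points.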
Quality Floor
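Popularity cannot outrank genuine quality. Each tool's composite is capped according to its Quality Score, so a tool with 10K stars but poor engineering cannot exceed its Quality Floor cap, no matter how large its bonuses are.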
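One way to picture the cap is as a lookup from Quality Score to a maximum composite. The sketch below is purely illustrative: the post does not publish the real thresholds, so the tier values in `FLOOR_CAPS` and the names `capFor` and `applyQualityFloor` are all hypothetical.

```typescript
// HYPOTHETICAL tiers, ordered from highest to lowest minimum quality.
// The real cap values are not given in the post.
const FLOOR_CAPS: { minQuality: number; maxComposite: number }[] = [
  { minQuality: 80, maxComposite: 190 },
  { minQuality: 60, maxComposite: 120 },
  { minQuality: 0, maxComposite: 70 },
];

// Return the highest composite a tool with this Quality Score may reach.
function capFor(quality: number): number {
  for (const tier of FLOOR_CAPS) {
    if (quality >= tier.minQuality) return tier.maxComposite;
  }
  return 0; // only reached for negative (invalid) quality scores
}

// Add the bonuses, then clip the total to the cap for this quality tier.
function applyQualityFloor(quality: number, bonuses: number): number {
  return Math.min(quality + bonuses, capFor(quality));
}
```

With these made-up tiers, a tool with a Quality Score of 40 and the full 60-point Community Bonus would land at 70 rather than 100, while a tool scoring 90 on quality keeps every bonus point it earns.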
The Results (39,762 tools)
| Grade | Distribution | What it means |
|---|---|---|
| B+ | 2.8% | Very good — close to A, highly motivated |
| B | 18.1% | Good — solid quality, room to grow community |
| C+ | 13.8% | OK — decent quality, needs promotion |
| C | 54.0% | Average — good foundation, no community signal yet |
| D | 10.9% | Needs work — quality gaps but still active |
| F | 0.4% | Critical — abandoned or serious issues |
Key insight: 54% of tools have solid quality but zero community adoption. They're invisible to agents. The path from C to B is simple: get 10 stars. From B to B+: get 50 stars and stay active. The scoring system tells you exactly what to do.
Check Your Grade
Paste your GitHub repo:
https://agent-tool-intel-production.up.railway.app
Then embed your badge:
[](https://agent-tool-intel-production.up.railway.app)
Agent Tool Intelligence is open source (MIT). GitHub · Methodology · Monthly Report
Tags: #mcp #ai #agents #opensource #typescript #developertools
Top comments (1)
Large-scale MCP scoring is a very useful direction because the ecosystem is growing faster than most teams can manually evaluate. I’d be interested in how you separate static quality signals, like schema completeness and tool descriptions, from runtime behavior signals, like tool-call failures, retries, latency, and unsafe outputs. In my experience, the most useful agent tooling combines both: preflight quality checks and post-run execution traces. I’m exploring the runtime side through agent-inspect, so this kind of MCP quality dataset is very relevant to how developers might decide which tools are safe enough to try.