DEV Community

Agent Tool Intelligence

We Scored 39,752 MCP Servers — Here's What We Found

We built an open-source platform that scores MCP servers across three independent dimensions. After scoring nearly 40K tools, the data revealed some uncomfortable truths — and one big problem with how we were measuring quality.


The Problem

Our first scoring system evaluated tools on static analysis alone: schema correctness, token efficiency, description quality, security, and install reliability. Five dimensions. One composite quality score.

The result: 85.7% of all tools scored Grade B.

No differentiation. No motivation to improve. A platform that tells everyone "you're average" is a platform nobody needs.


The Fix: Additive Scoring Model

We rebuilt the system from scratch. The new model recognizes that real-world adoption matters just as much as code quality.

Composite Grade = Quality Score (0-100)
                + Community Bonus (0-60)
                + Trust Bonus (0-30)

Quality Score — Static analysis of your tool definition. Five dimensions: Schema (25%), Token Efficiency (25%), Description (20%), Security (15%), Install (15%).
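As a rough illustration of the weighting above, here is a minimal sketch. The weights come from the list above; the assumption that each dimension is scored 0-100 before weighting, and every name in the code (`Dimension`, `WEIGHTS`, `qualityScore`), is hypothetical and not taken from the project's source.

```typescript
// Hypothetical sketch: a weighted sum over the five dimensions.
// Weights are from the article; the 0-100 per-dimension scale is an assumption.
type Dimension = "schema" | "tokenEfficiency" | "description" | "security" | "install";

const WEIGHTS: Record<Dimension, number> = {
  schema: 0.25,
  tokenEfficiency: 0.25,
  description: 0.2,
  security: 0.15,
  install: 0.15,
};

function qualityScore(scores: Record<Dimension, number>): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    // Clamp each input so one out-of-range value cannot skew the result.
    const clamped = Math.min(100, Math.max(0, scores[dim]));
    total += WEIGHTS[dim] * clamped;
  }
  // Round to one decimal place to hide floating-point noise.
  return Math.round(total * 10) / 10;
}

const exampleQuality = qualityScore({
  schema: 90,
  tokenEfficiency: 70,
  description: 80,
  security: 60,
  install: 100,
});
console.log(exampleQuality); // 80
```

Because the weights sum to 1, a tool that scores 100 on every dimension lands at exactly 100.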

Community Bonus — How agents actually gauge reliability. Stars (log-scale, 0-30), Activity recency (0-20), Official/Verified status (0-10).
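The three ranges above can be combined as in the sketch below. Only the point ranges (0-30, 0-20, 0-10) come from the article. The exact log curve, the 10,000-star saturation point, the 365-day linear decay, and treating Official/Verified as a yes/no flag are all assumptions made for illustration, as are the names used.

```typescript
// Hypothetical sketch of the Community Bonus (0-60).
// Only the point ranges come from the article; the curves are assumptions.
interface CommunitySignals {
  stars: number;
  daysSinceLastCommit: number;
  officialOrVerified: boolean;
}

const STAR_SATURATION = 10000; // assumed: star count that earns the full 30 points
const STALE_AFTER_DAYS = 365;  // assumed: inactivity after which recency earns 0

function starPoints(stars: number): number {
  const s = Math.max(0, stars);
  // Log scale: going from 10 to 100 stars is worth as much as 100 to 1,000.
  const ratio = Math.log10(s + 1) / Math.log10(STAR_SATURATION + 1);
  return Math.min(30, 30 * ratio);
}

function activityPoints(daysSinceLastCommit: number): number {
  const days = Math.min(Math.max(daysSinceLastCommit, 0), STALE_AFTER_DAYS);
  // Linear decay from 20 points (active today) to 0 (stale).
  return 20 * (1 - days / STALE_AFTER_DAYS);
}

function communityBonus(c: CommunitySignals): number {
  const official = c.officialOrVerified ? 10 : 0;
  return Math.round(starPoints(c.stars) + activityPoints(c.daysSinceLastCommit) + official);
}
```

The log scale is what keeps a handful of very popular repositories from dominating: the first few stars move the score much more than the ten-thousandth.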

Trust Bonus — Real execution data. No data = no bonus (not a penalty). Proven tools earn up to 30 extra points.
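Putting the three parts together, the additive model might look like the sketch below. The ranges and the "no data = no bonus" rule are from the article; the `ScoreParts` shape and function names are hypothetical.

```typescript
// Hypothetical sketch of the additive composite described above.
interface ScoreParts {
  quality: number;   // 0-100, static analysis
  community: number; // 0-60, stars + activity + official status
  trust?: number;    // 0-30, left undefined when there is no execution data
}

function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

function compositeScore(parts: ScoreParts): number {
  // Missing trust data contributes 0: no bonus, but no penalty either.
  const trust = parts.trust ?? 0;
  return clamp(parts.quality, 0, 100) + clamp(parts.community, 0, 60) + clamp(trust, 0, 30);
}

console.log(compositeScore({ quality: 80, community: 12 }));            // 92
console.log(compositeScore({ quality: 80, community: 12, trust: 30 })); // 122
```

Making `trust` optional rather than defaulting it to a low number is the design point: a new tool with no execution history is scored exactly as if the dimension did not exist.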

Quality Floor

Popularity cannot outrank genuine quality. A tool with 10K stars but poor engineering cannot exceed its Quality Floor cap.
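The article does not spell out how the cap is computed, so the following is only one possible reading: limit the total bonus a tool can collect to a fraction of its own Quality Score. The 90% ratio and all names here are assumptions chosen for illustration, not the platform's actual rule.

```typescript
// Hypothetical sketch: bonuses are capped in proportion to quality,
// so popularity alone cannot lift a weak tool past a strong one.
// The 9/10 ratio is an assumption, not the platform's documented rule.
function cappedComposite(quality: number, bonus: number): number {
  const maxBonus = Math.floor((quality * 9) / 10);
  return quality + Math.min(bonus, maxBonus);
}

// A popular tool with weak engineering: bonus of 60 is cut to 36.
const popularButWeak = cappedComposite(40, 60);
// A well-built tool with no community signal yet keeps its full quality score.
const solidButUnknown = cappedComposite(80, 0);

console.log(popularButWeak, solidButUnknown); // 76 80
```

Under this assumed rule a tool at quality 100 can still earn the full 90 bonus points, while the 10K-star tool with poor engineering stays below an unknown tool that is built well.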


The Results (39,752 tools)

| Grade | Distribution | What it means |
|-------|--------------|---------------|
| B+ | 2.8% | Very good: close to A, highly motivated |
| B | 18.1% | Good: solid quality, room to grow community |
| C+ | 13.8% | OK: decent quality, needs promotion |
| C | 54.0% | Average: good foundation, no community signal yet |
| D | 10.9% | Needs work: quality gaps but still active |
| F | 0.4% | Critical: abandoned or serious issues |

Key insight: 54% of tools have solid quality but zero community adoption. They're invisible to agents. The path from C to B is simple: get 10 stars. From B to B+: get 50 stars + stay active. The scoring system tells you exactly what to do.


Check Your Grade

Paste your GitHub repo into the checker:

https://agent-tool-intel-production.up.railway.app

Then embed your badge:

[![Grade](https://agent-tool-intel-production.up.railway.app/badge/YOUR_ORG%2FYOUR_REPO)](https://agent-tool-intel-production.up.railway.app)
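If you generate READMEs from a script, the badge line can be built as in this small sketch. The base URL and the `/badge/ORG%2FREPO` path are copied from the snippet above; the helper name `badgeMarkdown` is hypothetical.

```typescript
// Hypothetical helper that builds the badge markdown for a given repo.
const BASE_URL = "https://agent-tool-intel-production.up.railway.app";

function badgeMarkdown(org: string, repo: string): string {
  // The slash between org and repo must be percent-encoded as %2F.
  const slug = encodeURIComponent(`${org}/${repo}`);
  return `[![Grade](${BASE_URL}/badge/${slug})](${BASE_URL})`;
}

console.log(badgeMarkdown("YOUR_ORG", "YOUR_REPO"));
```

Using `encodeURIComponent` rather than hand-writing `%2F` also covers repo names that contain other characters needing escapes.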

Agent Tool Intelligence is open source (MIT). GitHub · Methodology · Monthly Report

Tags: #mcp #ai #agents #opensource #typescript #developertools

Top comments (1)

Raju Dandigam

Large-scale MCP scoring is a very useful direction because the ecosystem is growing faster than most teams can manually evaluate. I’d be interested in how you separate static quality signals, like schema completeness and tool descriptions, from runtime behavior signals, like tool-call failures, retries, latency, and unsafe outputs. In my experience, the most useful agent tooling combines both: preflight quality checks and post-run execution traces. I’m exploring the runtime side through agent-inspect, so this kind of MCP quality dataset is very relevant to how developers might decide which tools are safe enough to try.