DEV Community: yansen zhu

How to vet an MCP server before you install it (I graded 130,000 to find out)

yansen zhu — Fri, 10 Jul 2026 09:04:30 +0000

Installing an MCP server is running a stranger's code with your agent's permissions. Your shell, your environment variables, your filesystem — and increasingly your agent's own config and memory.

Discovery is the easy part; there are tens of thousands of servers to pick from. Knowing whether the one you just found is safe to run is not. I spent the last few months building a rule-based scanner and grading the entire public catalog — 130,000+ open-source agent skills and MCP servers — to figure out what "vetting" actually looks like at scale.

Here's the short version: four manual checks that catch most bad ones, and the data on why you need them.

Why bother: the risk lives exactly where you least expect it

Three numbers from grading the catalog:

83% of the public catalog has never been audited by anyone. No marketplace flag, no review, nothing.
Among the ~21,500 repos popular enough to grade, 3.3% are unsafe or worse — credential harvesting, data exfiltration, curl | sh installers.
The unsafe rate is ~9× higher in the long tail (3.8% at 5–20 stars) than among popular repos (0.4% at 1,000+ stars).

That last one is the trap. The server you've heard of is almost certainly fine. The danger is the obscure 7-star repo you grab from a search for some niche task — which is exactly the moment a directory is supposed to help you, and usually doesn't.

Check 1 — Read the tool definitions, not the README

The README tells you what the author wants you to think the server does. The tool definitions tell you what it can actually do.

Open the source and find where tools are registered (in TypeScript servers, look for server.tool(...) or the tools array; in Python, the @mcp.tool() decorators). Ask one question per tool: does this tool's capability match the server's stated purpose?

A weather server that registers a run_shell_command tool is not a weather server. A "read-only" database inspector that registers execute_query with no statement filtering is not read-only. This check takes five minutes and catches the single most common failure mode: capability creep.

Check 2 — Follow the environment variables

Grep the source for process.env (or os.environ):

grep -rn "process.env\|os.environ" src/ | grep -iv "NODE_ENV\|LOG"

Two things to look for:

What does it read? An API key for the service it wraps: fine. AWS_SECRET_ACCESS_KEY, SSH_AUTH_SOCK, or a loop over all of process.env: walk away.
Where does it send them? Trace every outbound request. If an env var ends up in a request body to a domain that isn't the service the server claims to integrate with, that's exfiltration — I flagged hundreds of these.

Check 3 — Check the install path for `curl | sh`

If the README's install instructions pipe a remote script into your shell, the author is asking you to run unreviewable code before you've even run their reviewable code. Same red flag for postinstall scripts in package.json that fetch remote payloads.

Prefer servers you can run with a plain npx <package> or uvx <package> — the code that executes is at least the code that was published.

Check 4 — Maintenance beats stars

Stars measure marketing. Maintenance measures whether someone will fix the CVE.

Thirty seconds on the repo page: when was the last commit? Do issues get responses? Is there exactly one giant "initial commit"? In the graded data, abandonment correlates with risk far better than star count does — a 40-star repo with weekly commits is a better bet than a 900-star repo untouched for a year.

Or: let the grading be done before you search

The four checks above are what my scanner automates, at catalog scale. Every entry on Agent Skills Hub — all 130,000+ — carries a SAFE / CAUTION / UNSAFE grade plus a quality score, computed from 35 rule-based flags (the SlowMist agent-security taxonomy, extended), refreshed every 8 hours.

From the terminal, the same grades are one command away:

npx @agentskillshub/cli search "browser automation" --safe

It's free, no login, and the full graded dataset is open (CC-BY-4.0) on Hugging Face if you want to run your own analysis or challenge the methodology — it's rule-based, not perfect, and false-positive reports genuinely improve it.

The honest caveat

No static scanner catches intent. A grade of SAFE means "no known bad pattern," not "audited line by line." For anything touching production credentials, do checks 1–2 yourself regardless of what any directory says — including mine.

But the baseline matters. Right now the ecosystem's default is that nobody has looked at 83% of what you can install. Five minutes of vetting — yours or automated — beats that default by a lot.

We security-graded 117,854 AI agent skills. Here's what we found.

yansen zhu — Tue, 23 Jun 2026 01:21:45 +0000

Only 17.7% of the catalog is popular enough to be graded, 1 in 32 graded skills is unsafe, and the risk lives in the long tail — plus a new agent-native attack surface.

We Security-Graded 117,854 AI Agent Skills. Here's What We Found. | Agent Skills Hub

Only 17.7% are popular enough to be graded. Among graded skills, 1 in 32 is unsafe. The risk lives in the long tail.

agentskillshub.top

The uncomfortable part isn't the skills that are unsafe. It's how few have been checked at all.

Installing an AI agent skill or MCP server means handing untrusted code your shell, your environment variables, and increasingly your agent's own config and memory. Discovery is easy — there are tens of thousands to pick from. Knowing whether the one you found is safe to run is not.

So we scanned the whole catalog. Here's the honest picture.

📄 This is a cross-post. Canonical version (with charts): agentskillshub.top/blog/securing-117k-ai-skills

How we scanned

A rule-based scanner, modeled on SlowMist's Agent Security Framework and its 11 red-flag categories. It runs static checks over each skill's README and code, looking for concrete patterns: outbound data exfiltration (curl -d $(...)), credential harvesting (env | grep -i token), reading .env / .ssh / .aws, curl | sh install scripts, privilege escalation, persistence, and secret-exfil combos. Each skill gets a grade — safe / caution / unsafe / reject — plus the specific flags it tripped. Skills with no README or too new to fetch stay unknown.

This is deliberately a first layer: it catches patterns, not intent. At 117K scale, the pattern layer is what makes the catalog auditable at all.

Finding 1 — 82% of the catalog has never been graded

Of 117,854 indexed skills, only 20,853 (17.7%) clear 5 stars — the threshold where a skill is popular enough to be worth grading. The other ~97,000 are effectively unaudited.

"We have 117K skills" is not a feature. The number that matters is how many you can actually trust, and for the long tail the honest answer is: nobody has looked.

Finding 2 — Among graded skills, 1 in 32 is unsafe or worse

Grade	Share
🟢 safe	85.5%
🟡 caution	5.3%
🔴 unsafe	3.0%
⛔ reject	0.1%
⚪ unknown	6.1%

8.4% carry a security concern. 3.1% — about 1 in 32 — are unsafe or reject. At this catalog's size that's ~650 graded skills you genuinely should not run blind, sitting in the same search results as everything else.

Finding 3 — Popularity predicts safety. The risk lives in the long tail.

Stars	Unsafe / reject
5–20★	4.1%
20–100★	3.7%
100–1,000★	0.9%
1,000★+	0.4%

The skill you've heard of is almost certainly fine. The danger is the obscure 7-star repo you'd grab from a search for a niche task — exactly the moment a directory is supposed to help, and usually doesn't.

Finding 4 — The red flags include a new, agent-native attack surface

Most common flags among a sample of 1,000 flagged skills:

Flag	Count
sudo usage	483
background service install	152
curl \| shell	99
agent config theft	87
tunnel service	66
eval()	52
sensitive env vars	34
agent memory theft	23
backdoor install	11

The classic shell risks dominate. But look at agent config theft (87) and agent memory theft (23): skills that read your agent's configuration and memory files. That's not a server exploit — it's a new attack surface that only exists because you're running an agent. Your Claude/MCP config, your stored context, your credentials-by-proxy. The threat model moved, and most directories haven't noticed.

What to do about it

Check the trust signal before you install, from where you already work:

npx @agentskillshub/cli search "postgres mcp" --safe
npx @agentskillshub/cli audit owner/repo

Every result carries its grade and the specific flags it tripped. --safe hides anything unaudited or worse.

The honest caveats (because that's the whole point)

Our 3% is a floor, not a ceiling. Academic deep-analysis (Liu et al., 2026, arXiv:2601.10338) puts the agent-skill vulnerability rate at 26.1%, because they analyze semantics, not just patterns. Our rule-based first pass deliberately under-claims. Read 3% as the lower bound of a bigger problem.
⚪ unknown is not "probably fine." It means no one has checked. 97K of the catalog is unknown. We label it gray and don't dress it up.
All numbers are reproducible. Every grade is visible on the site and via the CLI. Re-derive them yourself.

A trust layer that only told you the good news wouldn't be one. The most useful thing we can say about 97,000 skills is that we don't yet know — and we'll tell you that to your face.

Full writeup with charts: We security-graded 117,854 AI agent skills. Check any skill before you install: npx @agentskillshub/cli audit owner/repo.