The MCP ecosystem crossed 20,000 servers this month. We security-graded every single one.
When Anthropic shipped the Model Context Protocol, it solved the "how do I give my agent tools?" problem overnight. Suddenly anyone could publish an MCP server, and anyone's agent could install it. The ecosystem grew from a few dozen servers to thousands in weeks. That growth is exciting. It is also a security problem that nobody was tracking systematically.
We built Loaditout to index the entire MCP ecosystem and make it searchable. But indexing is not enough. If an agent blindly installs an MCP server that contains prompt injection in its description, or one that exfiltrates data to an external URL, the developer who trusted that server has a real problem. So we built an automated security grading pipeline that scans every server we index and assigns it a letter grade: A, B, C, or F.
This post covers the methodology, the data, and what patterns we actually found in the wild.
How the grading works
A server earns an A grade only if it passes ALL seven criteria. Not six. All seven.
The seven criteria for an A grade
| # | Criterion | What we check | Why it matters |
|---|---|---|---|
| 1 | Zero injection flags | 15 prompt injection patterns: ignore previous instructions, role overrides, system prompt injection, data exfiltration attempts |
A single injection pattern in a description or README means the server is either malicious or dangerously careless |
| 2 | Zero capability flags | No shell, exec, sudo, filesystem manipulation, or process.env access in description/metadata |
Servers requesting shell access or raw env vars are a supply chain risk |
| 3 | README content present | The repo must have a README with actual content | No documentation = no way to verify what the server does before installing |
| 4 | Description present | The server must have a non-empty description | If the author cannot explain what their server does in one sentence, you should not run it |
| 5 | Committed within 12 months | Last commit must be within the past year | Abandoned servers accumulate unpatched vulnerabilities. Stale code is risky code |
| 6 | At least 5 GitHub stars | Minimum community validation | Zero-star repos have zero community eyes on them. Stars are not a guarantee of safety, but they are a floor |
| 7 | No secret env vars required | No API keys, tokens, or credentials needed in the base configuration | Servers that require your API keys in their config have access to those credentials at runtime |
Grade scale
| Grade | Criteria |
|---|---|
| A | All 7 criteria met. Clean scan, documented, maintained, community-validated, no credentials required. |
| B | Clean security scan (no injection/capability flags) but missing one or more quality criteria -- no README, low stars, stale commits, or requires env vars. |
| C | Minor injection or capability flags detected. Usually HTML in code examples or soft manipulation patterns. |
| F | Critical injection pattern detected. Prompt injection, data exfiltration, role overrides. Instant fail. |
We also build a safety manifest for each server that evaluates data access scope (read/write/delete), network access (specific domains vs unrestricted), filesystem access, required environment variables, and overall risk level. Servers with both filesystem and network access get flagged as "high risk" with a recommendation for human approval before execution.
The data
Here is the current grade distribution across 20,652 servers:
| Grade | Count | Percentage | What it means |
|---|---|---|---|
| A | 4,230 | 20.5% | All 7 criteria met. Fully vetted, documented, maintained, community-validated. |
| B | 13,439 | 65.1% | Clean security scan, but missing documentation, stars, or freshness criteria. |
| C | 2,954 | 14.3% | Capability flags detected (shell, exec, filesystem) or stacked transparency gaps. |
| F | 29 | 0.1% | Critical injection patterns. Prompt injection, data exfiltration, role overrides. |
Only 20.5% of MCP servers pass all seven criteria. An A grade should mean something. If you install an A-graded server from Loaditout, you know it has been scanned for injection patterns, checked for dangerous capabilities, has documentation, is actively maintained, has community validation, and does not require your API keys.
The 65% in the B tier are not necessarily unsafe -- most are underdocumented or low-traffic repos that have not yet earned community trust. They pass the security scan but fail on quality signals. As these repos mature and gain stars/documentation, they will graduate to A.
The 14.3% in the C tier have capability flags -- their descriptions or READMEs reference shell execution, filesystem access, process.env, sudo, or similar dangerous capabilities. Some of these are legitimate tools that need those permissions (a terminal MCP server obviously needs shell access). But combined with poor transparency (no README, low stars), we flag them as needing manual review.
The 29 servers in the F tier have critical injection flags. We found servers with ignore previous instructions buried in their README, servers with data exfiltration patterns trying to POST user data to external URLs, and role override attempts. These are not theoretical risks.
Why each criterion matters
The biggest filter is README content -- only 17.8% of servers have a README indexed. Most MCP servers are published with a repo and nothing else. No documentation, no explanation of what the server does or what permissions it needs. We think this is unacceptable for code that runs with your agent's permissions.
The second biggest filter is dangerous capabilities -- 14.3% of servers reference shell execution, filesystem access, or credential access in their descriptions. Many of these are legitimate tools, but without documentation, there is no way to evaluate whether that access is justified.
5+ GitHub stars filters out about half the ecosystem. The remaining repos have zero community validation. That does not mean they are malicious, but it means nobody else has looked at them.
Committed within 12 months has the highest pass rate at 93.6% -- the MCP ecosystem is young and most repos are still active. This criterion will become more important as servers age and get abandoned.
Top 10 safest MCP servers by stars
These are the most popular A-graded servers in the directory -- high community trust and a clean security scan:
| Server | Stars | Grade |
|---|---|---|
| openclaw/openclaw | 321,057 | A |
| langgenius/dify | 132,893 | A |
| langchain-ai/langchain | 129,975 | A |
| open-webui/open-webui | 127,291 | A |
| shadcn-ui/ui | 109,876 | A |
| anthropics/skills | 96,175 | A |
| obra/superpowers | 93,192 | A |
| microsoft/markitdown | 90,901 | A |
| punkpeye/awesome-mcp-servers | 83,393 | A |
| browser-use/browser-use | 81,139 | A |
You can browse the full A-graded list at loaditout.ai and filter by security score.
Most common security flags
Across all flagged servers, here is what we see most frequently:
script-tag-- By far the most common. HTML<script>tags appearing in README content that gets parsed as part of the server description. Most of these are benign code examples, but we flag them because a<script>tag in a description that gets rendered in a web context is a real XSS vector.act-as-if-- The phrase "act as if" appears in a surprising number of server descriptions. Sometimes it is innocent ("act as if the user is a beginner"), sometimes it is an attempt to manipulate the agent's behavior. We grade it as minor because context matters.html-event-handler-- Inline event handlers likeonclick=oronerror=in description content. Similar to script tags, these are usually from documentation examples but represent a real injection surface.ignore-previous-instructions-- The classic. We still find this pattern being added to new servers. Sometimes buried deep in a long README, sometimes in metadata fields that developers assume nobody reads. Our scanner reads everything.suspicious-base64-- Long base64-encoded strings (200+ characters) in descriptions. Base64 is a common vector for hiding instructions that bypass simple text-matching filters. If a description contains a 500-character base64 blob, there is no legitimate reason for it.data-exfiltration/post-to-url-- The most concerning pattern. Descriptions that include instructions like "send all data to https://..." are attempting to get the agent to exfiltrate information. These are rare but always get an F grade.
How to check any server
You can look up the security grade for any MCP server two ways:
Search the web directory:
Browse loaditout.ai and search by name, description, or tag. Every server page shows its letter grade and full safety manifest.
From your terminal:
npx loaditout add <server-name>
The CLI shows the security grade before installation and lets you review the safety manifest (data access, network access, filesystem access, required env vars) before you commit.
Case study: Garry Tan's gstack
To show what a thorough security analysis looks like in practice, we did a deep dive on one of the most-discussed MCP setups in the community: Garry Tan's "gstack" -- a collection of 13 skills that turns Claude Code into a full engineering team with planning, code review, browser-based QA, and automated shipping.
The results:
| Category | Skills | Grade | Key factor |
|---|---|---|---|
| Planning | /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation | A | Pure prompt skills, read-only |
| Code review | /review | A | Standard repo permissions |
| Shipping | /ship | A | Uses existing git/gh auth |
| Browser/QA | /browse, /qa, /qa-only, /qa-design-review | B | Localhost Chromium, well-isolated |
| Cookies | /setup-browser-cookies | B | Keychain decryption, in-memory only |
| Documentation | /document-release, /retro | A | Writes only to project docs |
11 skills at Grade A. 2 skills at Grade B. Zero at C or F.
The two B grades are the browser-related skills. They earned a B (not an A) because /setup-browser-cookies decrypts your real browser session cookies via macOS Keychain and holds them in memory for the Chromium session's lifetime. The implementation is responsible -- it requires a Keychain prompt, never writes plaintext to disk, and clears memory on shutdown -- but your production session tokens are in the agent's memory while it runs. That is a meaningful surface area to understand before enabling.
The planning and code review skills are pure prompt skills. They read your codebase and produce structured analysis. No file writes outside your project, no network calls, no credentials access. The shipping skill (/ship) calls git and gh under the hood, so it inherits your local auth setup without requesting anything extra.
This is what a security-conscious stack looks like. The gstack analysis is available in full at loaditout.ai, and you can install the whole pack with:
npx loaditout add-pack gstack
What this means for the ecosystem
The MCP ecosystem is growing fast, and that is a good thing. But growth without visibility creates risk. When an agent installs a tool, it is granting that tool access to act on the developer's behalf. The developer deserves to know what they are installing.
We think three things need to happen:
Every MCP directory should security-grade its listings. The patterns are known. The scanning is automatable. There is no reason to ship an ungraded directory.
Agents should surface security metadata before installation. Not after. Not buried in a details page. Before the install happens, in the terminal, where the developer can make an informed decision.
Server authors should publish safety manifests. Declare your data access, network access, and filesystem access upfront. The servers that do this earn more trust and more installs.
We are building Loaditout to make all of this the default experience. Every server graded. Every manifest surfaced. Every install informed.
Browse the full graded directory at loaditout.ai.
Loaditout indexes and security-grades the entire MCP ecosystem. Search 20,000+ servers, review safety manifests, and install with one command. loaditout.ai
Top comments (0)