Auditing an MCP Server Against the OWASP MCP Top 10

#security #ai #mcp #owasp

The OWASP MCP Top 10 is now the taxonomy people reach for when they talk about MCP risk. It is the framework a security team will bring into a procurement conversation, and the one practitioners increasingly cite by number. It is still a beta — Phase 3, under active community revision — but the categories are stable enough to design an audit around.

So here is the practical question. You operate an MCP server. Someone hands you the Top 10 and asks how you stand against it. What does an audit actually check, category by category?

Eight of the ten are testable against a running server. The remaining two are not really about a single server at all — they live in the build pipeline and in org-level governance. That split is most of the work, so it is how the rest of this is organized.

A note on what "testable" means here. An audit from the network sees the deployed endpoint, with its real auth, its real TLS, its real manifest — not a config file on a developer's laptop. That vantage point decides which risks an audit can reach. It is the difference between reading what a server claims and observing what it does.

The eight an audit covers

MCP01 — Token Mismanagement and Secret Exposure. This is the risk that credentials leak through the protocol surface: tokens in URLs, secrets echoed in tool output, access tokens with no sane expiry, credentials written into log notifications. Every one of those is observable. An audit sends real requests and reads what comes back — whether a token survives in a response body, whether the access-token lifetime is bounded, whether an error path quietly returns the credential it just rejected.
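As a rough illustration of those three checks, here is a minimal Python sketch. The function names and the `SENSITIVE_PARAMS` list are hypothetical, and the one-hour ceiling is an arbitrary assumption rather than a value taken from the framework; `expires_in` is the standard OAuth 2.0 token-response field.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of query parameter names that commonly carry credentials.
SENSITIVE_PARAMS = {"access_token", "token", "api_key", "apikey", "secret"}


def token_in_url(url: str) -> list[str]:
    """Return the query parameter names in `url` that look like credentials."""
    query = urlsplit(url).query
    return [
        name
        for name, _ in parse_qsl(query, keep_blank_values=True)
        if name.lower() in SENSITIVE_PARAMS
    ]


def token_echoed(sent_token: str, response_body: str) -> bool:
    """True if the exact token the audit sent shows up in the response body."""
    return bool(sent_token) and sent_token in response_body


def lifetime_is_bounded(token_response: dict, max_seconds: int = 3600) -> bool:
    """True if the token response declares a positive `expires_in` within the ceiling.

    The 3600-second default is an assumption chosen for illustration.
    """
    expires_in = token_response.get("expires_in")
    if isinstance(expires_in, bool) or not isinstance(expires_in, int):
        return False
    return 0 < expires_in <= max_seconds
```

An audit would call `token_echoed` on error responses in particular, since the rejection path is where a credential is most likely to be reflected back.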

MCP06 — Prompt Injection via Contextual Payloads. MCP turns tool descriptions, prompt templates, and resource content into a second instruction channel pointed at the model. A description that contains an injection string, an invisible Unicode payload, or a known jailbreak phrase is a finding you can read straight off the manifest — no execution required. This is the part of the framework that maps most cleanly onto passive inspection, because the attack lives in text the server already serves.
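A passive scan of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete detector: the phrase list is a tiny hypothetical sample, and the manifest is assumed to be the list of tool objects (each with `name` and `description`) that a tools listing returns.

```python
import unicodedata

# Illustrative phrases only; a real audit would maintain a larger, curated list.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "do not tell the user",
)


def scan_description(text: str) -> list[str]:
    """Return findings for one description: hidden characters and known phrases."""
    findings = []
    # Unicode category "Cf" (format) covers zero-width characters,
    # bidirectional controls, and the invisible "tag" characters.
    hidden = sorted(
        {f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"}
    )
    if hidden:
        findings.append("invisible characters: " + ", ".join(hidden))
    lowered = text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            findings.append(f"injection phrase: {phrase!r}")
    return findings


def scan_manifest(tools: list[dict]) -> dict[str, list[str]]:
    """Map each tool name to its findings, omitting tools with none."""
    report = {}
    for tool in tools:
        findings = scan_description(tool.get("description", ""))
        if findings:
            report[tool.get("name", "<unnamed>")] = findings
    return report
```

The same `scan_description` function could be pointed at prompt templates and resource content, since those are the other text channels the paragraph above names.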

MCP07 — Insufficient Authentication and Authorization. The largest testable surface, and the one with the least ambiguity. Does an unauthenticated caller get rejected? Do administrative tools require identity? Is PKCE enforced on the authorization code flow? Are expired, invalid, and revoked bearer tokens actually turned away, or does any non-empty token open the door? These are concrete request-response facts. A server either rejects the malformed call or it doesn't.
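The shape of such a probe set might look like the following Python sketch. The `send` callable is a hypothetical transport hook (it takes an `Authorization` header value, or `None` for no header, and returns the HTTP status), and treating only 401 and 403 as valid rejections is a simplifying assumption.

```python
from typing import Callable, Optional

# Each probe is a label plus the Authorization header value to send.
# The token strings are placeholders; a real audit would also replay
# tokens it knows to be expired or revoked.
PROBES: list[tuple[str, Optional[str]]] = [
    ("no credentials", None),
    ("empty bearer", "Bearer "),
    ("garbage bearer", "Bearer not-a-real-token"),
]


def run_auth_probes(send: Callable[[Optional[str]], int]) -> list[str]:
    """Return a finding for every probe the server failed to reject.

    `send` is a hypothetical hook supplied by the caller; it performs the
    request and returns the HTTP status code.
    """
    accepted = []
    for label, header in PROBES:
        status = send(header)
        # Assumption: a correct rejection is 401 or 403.
        if status not in (401, 403):
            accepted.append(f"{label}: got HTTP {status}")
    return accepted
```

Keeping the transport behind a callable means the same probe list can be exercised against a stub in tests and against the deployed endpoint in an audit.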

MCP02 — Privilege Escalation via Scope Creep. Scope creep is harder to catch than missing auth, because it is about permissions that are technically granted but quietly too broad. The audit examines the OAuth contract: whether granted scopes match what was requested, whether resource indicators are enforced, whether the scopes a server advertises are the scopes it honors. That catches the server that hands back more than it was asked for.
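The core comparison is a set difference. A minimal sketch, assuming scopes arrive as the space-delimited strings OAuth 2.0 uses; the function name and the scope names in the usage line are hypothetical.

```python
def excess_scopes(requested: str, granted: str) -> set[str]:
    """Return scopes present in the grant that the client never requested.

    Both arguments are space-delimited scope strings, as in OAuth 2.0
    authorization requests and token responses.
    """
    return set(granted.split()) - set(requested.split())
```

For example, `excess_scopes("files:read", "files:read files:write admin")` returns the two scopes the server added on its own, and an empty set means the grant stayed within the request.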

MCP05 — Command Injection and Execution. A scanner can't see your shell. What it can see is whether the server validates the inputs that would feed one. Does it reject undeclared properties, enforce declared types, hold string and array bounds, refuse path traversal in resource URIs? Input validation is the observable proxy for injection resistance, and a server that enforces its declared schema closes the door that most injection walks through.
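One way to derive those probes is to mutate a known-good argument set against the tool's declared input schema. The sketch below is illustrative only: `negative_probes` is a hypothetical helper, it handles just three violation types, and it assumes a flat JSON Schema with a `properties` map.

```python
import copy


def negative_probes(schema: dict, valid_args: dict) -> list[tuple[str, dict]]:
    """Build argument sets that each violate the declared schema in one way.

    A server that enforces its schema should reject every one of them.
    """
    probes = []

    # 1. A property the schema never declared.
    extra = copy.deepcopy(valid_args)
    extra["__undeclared__"] = "x"
    probes.append(("undeclared property", extra))

    for name, spec in schema.get("properties", {}).items():
        if name not in valid_args:
            continue

        # 2. A value of the wrong type for a declared property.
        wrong = copy.deepcopy(valid_args)
        wrong[name] = 12345 if spec.get("type") == "string" else "not-the-declared-type"
        probes.append((f"wrong type for {name}", wrong))

        # 3. A string one character past its declared maxLength.
        if spec.get("type") == "string" and "maxLength" in spec:
            too_long = copy.deepcopy(valid_args)
            too_long[name] = "A" * (spec["maxLength"] + 1)
            probes.append((f"over maxLength for {name}", too_long))

    return probes
```

Each probe changes exactly one thing, so an accepted call points directly at the validation rule the server skipped.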

MCP10 — Context Injection and Over-Sharing. When context is shared across sessions or agents, one task's data can surface in another's. The audit probes isolation between authorization contexts: whether task records leak across identities, whether task IDs are guessable on a no-auth deployment, whether an elicitation form quietly collects more than it should. These are boundaries you can test by holding two identities and checking what one can see of the other.
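The two-identity test reduces to a small loop. In this sketch, `can_read` is a hypothetical hook that attempts to fetch a task while authenticated as a given identity and reports whether it succeeded; the identity and task names are placeholders.

```python
from typing import Callable


def cross_identity_leaks(
    task_owner: dict[str, str],
    can_read: Callable[[str, str], bool],
) -> list[tuple[str, str]]:
    """Return (identity, task_id) pairs where an identity read a task it does not own.

    `task_owner` maps each task ID to the identity that created it.
    `can_read(identity, task_id)` is a hypothetical hook supplied by the caller.
    """
    leaks = []
    identities = set(task_owner.values())
    for task_id, owner in task_owner.items():
        for identity in identities:
            if identity != owner and can_read(identity, task_id):
                leaks.append((identity, task_id))
    return sorted(leaks)
```

An empty result means every cross-identity read was refused; any pair in the list is a concrete, reproducible isolation failure.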

MCP03 — Tool Poisoning. Tool poisoning is an adversary corrupting the tools, interface definitions, or outputs a model depends on. It has three observable forms, and the audit reads all three. Schema poisoning and malicious descriptions show up in the manifest on the first read — the audit checks descriptions, parameter definitions, and namespaces the way the model will, before an agent acts on them. Rug pulls — a trusted tool that changes its description after you approved it — are caught by fingerprinting each tool on first sight and re-verifying that fingerprint on every reconnect, so drift becomes a finding rather than a surprise. Tool shadowing, where one server impersonates another's tools, surfaces when the audit compares tool and namespace claims across the set of servers an agent can reach.
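The fingerprint-and-reverify step can be sketched as a hash over a canonical encoding of each tool definition. This is one plausible approach, not a prescribed one: the choice of SHA-256, the three fields hashed, and the function names are all assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the fields a model sees."""
    visible = {key: tool.get(key) for key in ("name", "description", "inputSchema")}
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(
        visible, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(approved: dict[str, str], current_tools: list[dict]) -> list[str]:
    """Return names of tools that are new or whose fingerprint has changed.

    `approved` maps tool name to the fingerprint recorded at approval time.
    """
    drifted = []
    for tool in current_tools:
        name = tool["name"]
        if approved.get(name) != fingerprint(tool):
            drifted.append(name)
    return drifted
```

Running `detect_drift` on every reconnect turns a silent description change into a named finding, while a server that merely reorders keys in its JSON produces no noise.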

MCP08 — Lack of Audit and Telemetry. The framework calls for disciplined logging of tool invocations and context changes. From the network, the audit checks the hygiene of that telemetry: that log notifications don't carry credentials, that log levels come from the standard set, that the logging surface follows the spec rather than leaking internal state. Good telemetry that quietly exfiltrates secrets is its own finding, and this is where it surfaces.
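A hygiene check on a single log notification might look like the sketch below. It assumes the notification carries `level` and `data` under `params`, and that the standard set is the eight syslog severities; the credential patterns are rough, illustrative heuristics that will miss many formats and flag some harmless strings.

```python
import json
import re

# The eight syslog severity levels (RFC 5424).
STANDARD_LEVELS = {
    "debug", "info", "notice", "warning",
    "error", "critical", "alert", "emergency",
}

# Rough, illustrative heuristics; not an exhaustive credential detector.
CREDENTIAL_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]{8,}=*", re.IGNORECASE),
    re.compile(r"(api[_-]?key|secret|password|access_token)\W{1,4}\S+", re.IGNORECASE),
]


def check_log_notification(message: dict) -> list[str]:
    """Return hygiene findings for one log notification."""
    findings = []
    params = message.get("params", {})

    level = params.get("level")
    if level not in STANDARD_LEVELS:
        findings.append(f"non-standard log level: {level!r}")

    # Serialize the payload so nested structures are searched as text too.
    text = json.dumps(params.get("data"), ensure_ascii=False)
    if any(pattern.search(text) for pattern in CREDENTIAL_PATTERNS):
        findings.append("log data looks like it carries a credential")

    return findings
```

Pattern matches are a prompt for human review rather than proof of a leak, which is why the finding is worded as "looks like".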

The two that live at a different layer

MCP04 — Software Supply Chain and Dependency Tampering. This category is about signed components, dependency provenance, and catching a tampered package before it ships. It is a build-pipeline and SBOM concern by nature — the integrity question is settled at build time, in your CI and artifact signing, not at the running endpoint. A network audit flags adjacent signals like disclosed framework and runtime versions; the provenance work belongs upstream where the dependency tree actually lives.

MCP09 — Shadow MCP Servers. The risk here is the server nobody told the security team about — spun up for a demo, running default credentials, never inventoried. This is a discovery and governance problem: by definition it is about the servers an organization doesn't yet know it has. The lever that helps most is making the audit of known internal servers cheap enough to run everywhere, so bringing a server into governance stops being a chore.

What to take from this

Most of the OWASP MCP Top 10 is testable against a running server, and the testable part skews toward the risks that have produced real CVEs this year — auth gaps, token exposure, injection through tool descriptions. That is the part an audit gives you a clean, reproducible answer on.

The remaining two are worth naming precisely. Supply-chain integrity is settled in your build pipeline; shadow servers are a discovery problem that sits above any one endpoint. The framework is most useful when you know which layer each of its risks belongs to, and which tool in your stack owns it.