What mcpwall Does and Doesn't Protect Against

#ai #security #mcp #opensource

What mcpwall Does and Doesn't Protect Against

Security tools that hide their limitations aren't security tools.

I published mcpwall's full threat model. Here's the summary: what's covered, what isn't, and what's next.

Where mcpwall sits

mcpwall is a transparent stdio proxy between your AI coding tool and the MCP server. Every JSON-RPC message from the client passes through the policy engine. Rules are YAML, evaluated top-to-bottom, first match wins.

The key word is request firewall. In v0.1.x, mcpwall inspects what your AI agent asks to do. It does not yet inspect what the server sends back.

Inbound (inspected):   Claude Code → mcpwall → MCP Server
Outbound (logged only): Claude Code ← mcpwall ← MCP Server

8 attack classes blocked out of the box

No configuration needed. These default rules apply automatically and scan every argument value recursively:

SSH key theft — blocks .ssh/, id_rsa, id_ed25519, id_ecdsa in any argument
.env file access — blocks .env and all variants
Credential files — AWS credentials, .npmrc, Docker config, kube config, .gnupg
Browser data — Chrome, Firefox, Safari profiles, cookies, login data
Destructive commands — rm -rf, mkfs, dd if=, format C:
Pipe-to-shell — curl/wget/fetch piped to bash/sh/python/node
Reverse shells — netcat, /dev/tcp/, bash -i, mkfifo, socat
Secret / API key leakage — 10 patterns (AWS, GitHub, OpenAI, Stripe, etc.) + Shannon entropy threshold

Plus: JSON-RPC batch bypass fixed, ReDoS mitigation, symlink resolution, crash protection.

Known limitations

These are attack classes that mcpwall v0.1.x does not mitigate. We're publishing them because hiding limitations is worse than having them.

High severity

Response-side attacks — Server responses forwarded unfiltered. A compromised server can return secrets in tool results. Planned: v0.2.0.
Base64 / URL encoding bypass — Rules match literal strings only. Encoded secrets or commands pass through.
Rate limiting / DoS — No throttling on tool call volume. Planned: v0.4.0.

Medium severity

Tool description poisoning / rug pulls — mcpwall doesn't inspect tool metadata. A server can change descriptions after trust is established. Planned: v0.3.0.
Prompt injection — Can't detect semantic LLM manipulation. Sees the resulting tool call, not the manipulation — but may still catch the dangerous arguments.
Shell metacharacter bypass — Pipes caught; semicolons, &&, backticks, $() not covered by default rules.
Unicode / DNS exfiltration / env leakage — Out of scope for v0.1.x.

Low severity

Config tampering (TOCTOU), log integrity, timing side-channels, deep nesting stack overflow.

Defense in depth

mcpwall is one layer, not the whole stack:

Install-time scanning — Tools like mcp-scan check tool descriptions before you use a server.
Runtime firewall (mcpwall) — Enforces policy on every tool call as it happens.
Container isolation — Limits blast radius if a server is compromised.

What's next

Version	Feature
v0.2.0	Response inspection — scan server responses for secrets
v0.3.0	Tool integrity / rug pull detection
v0.3-4	HTTP/SSE proxy mode — support remote MCP servers
v0.4.0	Rate limiting

Read the full threat model

The complete reference includes component-by-component analysis, all default rule details, severity ratings, trust boundary diagrams, and the full list of assumptions.

mcpwall.dev/threat-model

GitHub: github.com/behrensd/mcp-firewall
npm: npmjs.com/package/mcpwall
Website: mcpwall.dev

Originally published at mcpwall.dev/blog/mcpwall-threat-model.