The Security Gap in MCP Tool Servers (And What I Built to Fix It)

#security #mcp #ai #python

MCP has no security model. I built Heddle — a policy-and-trust layer that turns YAML configs into validated, policy-enforced MCP tool servers.

MCP (Model Context Protocol) is how AI agents connect to tools. Claude Desktop uses it, Cursor uses it, and thousands of developers are building MCP servers to give AI access to their APIs, databases, and infrastructure.

There's one problem: MCP has no security model.

The protocol defines how a client talks to a server, but says almost nothing about what that server is allowed to do. The spec's OAuth-based authorization is optional and only covers HTTP transports, so a stdio server gets no authentication between client and server. There is no authorization on which tools can be called and no audit trail of what happened. The spec assumes you'll handle all of that yourself.

Most people don't.

What Actually Goes Wrong

I run a self-hosted server with Prometheus, Grafana, Ollama, Gitea, and a handful of other services. I wanted Claude Desktop to query all of them through MCP. The standard approach is to write a Python FastMCP server for each one — a few dozen lines per service, hardcode the API key, register the tools, done.

That works until you think about what you've actually built:

Every MCP server has full access to whatever its process can reach. Your Prometheus tool can also hit your Grafana API, your Gitea API, and anything else on localhost. There's no scoping.

API keys live in environment variables or config files. If you have 9 MCP servers, you have 9 places where credentials sit in plaintext with no access policy.

Nothing is logged. If Claude calls a tool that restarts a service or deletes data, there's no record of which tool was called, with what parameters, by which agent, at what time.

There's no concept of read-only vs. write. A tool either exists or it doesn't. MCP doesn't know that query_prometheus is safe to call freely but restart_service should require approval.

Tool composition creates emergent risks. When Claude has access to multiple MCP servers, it can chain calls across them. Server A reads sensitive data, Server B posts to an external API — Claude could combine them in ways neither server was designed for.

These aren't theoretical risks. During development, I declared an agent as read-only (Trust Tier 1) but gave it a tool that used HTTP POST. The system I built caught it — blocked the call, logged a trust violation, and forced me to either fix the config or explicitly upgrade the trust level. Without that enforcement, the tool would have silently worked and I'd never have known my security model was wrong.

What I Built

Heddle is a runtime that sits between your YAML config and the MCP protocol. You define your tools in a config file, and Heddle validates, secures, and serves them — with policy enforcement on every call.

Here's a complete tool server for Prometheus:

agent:
  name: prometheus-bridge
  version: "1.0.0"
  exposes:
    - name: query_prometheus
      access: read
      description: "Run a PromQL query"
      parameters:
        query: { type: string, required: true }
    - name: get_alerts
      access: read
      description: "List active Prometheus alerts"
  http_bridge:
    - tool_name: query_prometheus
      method: GET
      url: "http://localhost:9090/api/v1/query"
      query_params: { query: query }
    - tool_name: get_alerts
      method: GET
      url: "http://localhost:9090/api/v1/alerts"
  runtime:
    trust_tier: 1

Run heddle run agents/prometheus-bridge.yaml and Claude can query Prometheus in natural language. But every call goes through a six-layer dispatch pipeline before it reaches the API:

Rate limiting → Access mode check → Escalation rules → Input validation → Trust tier enforcement → HTTP bridge execution

Each layer can independently block the call and log why.
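To make "each layer can independently block the call and log why" concrete, here is a minimal sketch of that pattern. It is not Heddle's actual code: ToolCall, run_pipeline, and the two example layers are hypothetical names I'm using for illustration, and a real pipeline would have all six layers.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    tool: str
    params: dict
    method: str = "GET"


# A layer returns None to allow the call, or a string saying why it blocked.
Layer = Callable[[ToolCall], Optional[str]]


def run_pipeline(call: ToolCall, layers: list[tuple[str, Layer]], log: list[dict]) -> bool:
    """Run layers in order; stop at the first one that blocks and record why."""
    for name, layer in layers:
        reason = layer(call)
        if reason is not None:
            log.append({"tool": call.tool, "blocked_by": name, "reason": reason})
            return False
    log.append({"tool": call.tool, "blocked_by": None, "reason": None})
    return True


# Two hypothetical layers, standing in for the real ones.
def access_mode_check(call: ToolCall) -> Optional[str]:
    if call.method != "GET":
        return f"{call.method} not allowed for a read tool"
    return None


def input_validation(call: ToolCall) -> Optional[str]:
    if not isinstance(call.params.get("query"), str):
        return "query must be a string"
    return None


LAYERS = [("access_mode", access_mode_check), ("input_validation", input_validation)]
```

The useful property is that every layer has the same shape, so adding a control means adding one function to the list, and a blocked call always leaves a log entry naming the layer that stopped it.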

The Security Controls

The dispatch pipeline enforces these controls on every tool call:

Trust Tiers (T1–T4). Each config declares a trust level. T1 (observer) can only use GET: any POST, PUT, or DELETE is blocked at runtime, not merely flagged with a warning. T2 (worker) allows scoped writes. T3 (operator) allows cross-agent invocation. T4 (privileged) requires human approval. I caught a real misconfiguration with this: a T1 agent tried to POST, and the enforcer blocked it before the request ever left the process.
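The T1 rule is simple enough to sketch in a few lines. This is an illustration of the idea, not Heddle's enforcer: TrustTier, TrustViolation, and enforce_http_method are hypothetical names, and the sketch only covers the T1 method check described above.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    OBSERVER = 1
    WORKER = 2
    OPERATOR = 3
    PRIVILEGED = 4


# Assumption: GET is the only method a T1 (observer) config may send.
READ_ONLY_METHODS = frozenset({"GET"})


class TrustViolation(Exception):
    """Raised when a call exceeds what its trust tier allows."""


def enforce_http_method(tier: TrustTier, method: str) -> None:
    """Raise TrustViolation before any request object is built."""
    method = method.upper()
    if tier == TrustTier.OBSERVER and method not in READ_ONLY_METHODS:
        raise TrustViolation(f"T{int(tier)} config may not send {method}")
```

Raising an exception rather than returning a flag means a caller cannot forget to check the result: the request code after the check is simply never reached.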

Access Mode Annotations. Every tool is declared as access: read or access: write. T1 configs with write tools are rejected at load time — before the server even starts. This is the schema-level version of least privilege.
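A load-time check like this could look roughly as follows. This is a sketch under assumptions: it takes the already-parsed YAML as a dict with the same keys as the Prometheus example (agent, exposes, access, runtime, trust_tier), and validate_config is a hypothetical name, not Heddle's API.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of load-time errors; an empty list means the config may start."""
    agent = config.get("agent", {})
    tier = agent.get("runtime", {}).get("trust_tier")
    errors = []
    for tool in agent.get("exposes", []):
        name = tool.get("name", "<unnamed>")
        access = tool.get("access")
        if access not in ("read", "write"):
            errors.append(f"{name}: access must be 'read' or 'write'")
        elif tier == 1 and access == "write":
            errors.append(f"{name}: write tool not allowed in a T1 config")
    return errors
```

Collecting every error instead of stopping at the first one makes the validate step more useful: you fix the whole config in one pass.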

Credential Broker. API keys are stored in ~/.heddle/secrets.json with per-config access policies. Configs reference them as {{secret:prometheus-token}}; the reference is resolved at runtime, so the key itself is never written to the YAML file. A config can only access secrets it has been explicitly granted. Unauthorized access is denied and logged, and the config receives a placeholder instead of the real value.
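Here is a rough sketch of how that kind of reference resolution can work. Only the {{secret:name}} syntax comes from Heddle; the in-memory secrets and grants dicts, the placeholder string, and the resolve_secrets function are assumptions for illustration, and the real layout of secrets.json may differ.

```python
import re

# Matches references such as {{secret:prometheus-token}}.
SECRET_REF = re.compile(r"\{\{secret:([A-Za-z0-9_.-]+)\}\}")
PLACEHOLDER = "<secret-denied>"  # hypothetical placeholder value


def resolve_secrets(text: str, config_name: str, secrets: dict,
                    grants: dict, log: list) -> str:
    """Replace secret references that config_name has been granted; log every access."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        granted = name in grants.get(config_name, set())
        log.append({"config": config_name, "secret": name, "granted": granted})
        if granted and name in secrets:
            return secrets[name]
        # Denied or unknown secrets never leak: the caller gets a placeholder.
        return PLACEHOLDER

    return SECRET_REF.sub(substitute, text)
```

Usage, with made-up values:

```python
log = []
secrets = {"prometheus-token": "abc123"}
grants = {"prometheus-bridge": {"prometheus-token"}}
resolve_secrets("Bearer {{secret:prometheus-token}}", "prometheus-bridge", secrets, grants, log)
resolve_secrets("Bearer {{secret:prometheus-token}}", "grafana-bridge", secrets, grants, log)
```

The first call resolves the token; the second returns the placeholder and leaves a denied entry in the log.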

Escalation Rules. Declarative conditions that hold a tool call for review instead of executing it. For example, my VRAM orchestrator has a rule that holds any smart_load call whose model name contains "27b", because loading a 27-billion-parameter model consumes most of my 24GB of GPU memory. The rule triggers, the call is held, and the audit log records why.

escalation_rules:
  - name: large-model-load
    reason: "Loading a model that will consume most of the 24GB VRAM"
    tool: "smart_load"
    param_contains:
      model_name: "27b"
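Evaluating a rule like this comes down to a tool-name match plus a substring test per parameter. The sketch below assumes the rules have been parsed into dicts with the same keys as the YAML above and that matching is case-sensitive; matching_rule is a hypothetical helper, not Heddle's API.

```python
from typing import Optional


def matching_rule(rules: list[dict], tool: str, params: dict) -> Optional[dict]:
    """Return the first rule that says this call should be held, or None."""
    for rule in rules:
        if rule.get("tool") != tool:
            continue
        conditions = rule.get("param_contains", {})
        # Every listed parameter must contain its substring. A rule with no
        # conditions matches every call to the tool.
        if all(needle in str(params.get(key, "")) for key, needle in conditions.items()):
            return rule
    return None
```

Returning the rule itself, rather than a boolean, lets the caller copy the rule's name and reason straight into the audit log entry for the held call.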

Input Validation. Type checking, length limits, and injection pattern detection on every parameter. The validator catches shell injection (; rm -rf /), SQL injection (' OR 1=1), path traversal (../../etc/passwd), and LLM prompt injection (ignore previous instructions). In strict mode, these are blocked. In permissive mode, they're logged and passed through.
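A stripped-down version of this kind of validator might look like the following. The patterns are deliberately small illustrations that cover the four examples above, not Heddle's real rule set, and validate_param is a hypothetical name. Pattern-based detection like this produces both false positives and false negatives, so treat it as one layer among several.

```python
import re

# Illustrative patterns only; a production list would be longer and tuned per tool.
INJECTION_PATTERNS = {
    "shell": re.compile(r"[;&|]\s*(?:rm|curl|wget|sh|bash)\b|\$\("),
    "sql": re.compile(r"'\s*or\s+\d+\s*=\s*\d+", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
    "prompt": re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE),
}


def validate_param(name: str, value: object, *, max_length: int = 1024,
                   strict: bool = True) -> tuple[bool, list[str]]:
    """Return (allowed, findings). Type and length failures always block."""
    if not isinstance(value, str):
        return False, [f"{name}: expected string, got {type(value).__name__}"]
    if len(value) > max_length:
        return False, [f"{name}: longer than {max_length} characters"]
    findings = [
        f"{name}: possible {label} injection"
        for label, pattern in INJECTION_PATTERNS.items()
        if pattern.search(value)
    ]
    # Strict mode blocks on any finding; permissive mode reports but allows.
    return (not findings or not strict), findings
```

In both modes the findings list is returned, so the caller can write it to the audit log whether or not the call was blocked.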

Hash-Chained Audit Log. Every tool call, trust violation, credential access, and escalation hold is logged as a JSON Lines entry. Each entry includes a SHA-256 hash of the previous entry — if anyone modifies or deletes a log entry, the chain breaks and verification fails.
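The chaining itself needs nothing beyond the standard library. This is a minimal sketch of the technique, assuming each JSON Lines entry stores the SHA-256 of the previous raw line in a prev_hash field; the field names and the all-zero genesis value are my assumptions here, not Heddle's log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def _line_hash(line: str) -> str:
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def append_entry(lines: list[str], event: dict) -> None:
    """Append an event as one JSON line that commits to the previous line."""
    prev_hash = _line_hash(lines[-1]) if lines else GENESIS
    entry = {"event": event, "prev_hash": prev_hash}
    lines.append(json.dumps(entry, sort_keys=True, separators=(",", ":")))


def verify_chain(lines: list[str]) -> bool:
    """Recompute every link; any edited or removed earlier line breaks the chain."""
    expected = GENESIS
    for line in lines:
        if json.loads(line).get("prev_hash") != expected:
            return False
        expected = _line_hash(line)
    return True
```

One caveat with a plain chain like this sketch: nothing commits to the final entry, so truncating the tail of the log (or editing only the last line) still verifies. Catching that requires storing the latest hash somewhere outside the log file.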

Config Signing. All YAML configs are signed with HMAC-SHA256. If a config is modified after signing, the runtime detects the tampering. AI-generated configs (from Heddle's natural language generator) are automatically quarantined in a staging directory until explicitly promoted.
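HMAC-SHA256 signing and verification is a few lines with Python's standard hmac module. The sketch below shows the general technique only; how Heddle stores the key and the signature is not covered here, and sign_config and verify_config are hypothetical names.

```python
import hashlib
import hmac


def sign_config(config_bytes: bytes, key: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw config file contents."""
    return hmac.new(key, config_bytes, hashlib.sha256).hexdigest()


def verify_config(config_bytes: bytes, key: bytes, signature: str) -> bool:
    """Check a stored signature using a constant-time comparison."""
    expected = sign_config(config_bytes, key)
    return hmac.compare_digest(expected, signature)
```

Signing the raw bytes rather than the parsed YAML means even a whitespace-only edit invalidates the signature, which is the conservative choice for a tamper check. Because HMAC is symmetric, anyone who can read the key can also produce valid signatures, so the key needs the same protection as the secrets store.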

What It Looks Like Running

I'm currently running 46 tools from 9 configs through a single MCP connection to Claude Desktop. The configs cover Prometheus, Grafana, Ollama, Gitea, an RSS aggregator, a RAG search API, a GPU VRAM orchestrator, and a daily operations briefing agent.

Every one of those 46 tools goes through the same dispatch pipeline. The Prometheus tools are T1 (read-only, 5 tools). The Ollama bridge is T2 (can POST for text generation). The VRAM orchestrator is T3 (can invoke other agents, has escalation rules on destructive operations).

The trust tiers aren't just labels; they're enforced at runtime. A T1 config cannot make a POST request, even if the HTTP bridge URL is correct and the API would accept it. The enforcer blocks the call before the request is constructed.

Framework Mapping

Every security control maps to at least one industry framework. This matters if you're in an organization that needs to demonstrate compliance, or if you're building a portfolio that shows applied security architecture (the latter is why I built this):

Control            OWASP Agentic Top 10         NIST AI RMF
Trust tiers        #3 Excessive Agency          GV-1.3
Credential broker  #7 Unsafe Credential Mgmt    MAP-3.4
Audit logging      #9 Insufficient Logging      MS-2.6
Input validation   #1 Prompt Injection          MS-2.5
Config signing     #8 Supply Chain              GV-6.1
Escalation rules   #3 Excessive Agency          GV-1.3

The full threat model with 8 threat categories is in the repo at docs/threat-model.md.

Getting Started

git clone https://github.com/goweft/heddle.git
cd heddle
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"

# Try a starter pack
cp packs/prometheus.yaml agents/
heddle validate agents/prometheus.yaml
heddle run agents/prometheus.yaml --port 8200

Heddle ships with 6 starter packs (Prometheus, Grafana, Gitea/GitHub, Ollama, Sonarr, and Radarr) that you can drop into agents/ and run immediately. All are read-only (T1) except Ollama, which is T2 because text generation needs POST.

Or generate a config from natural language:

heddle generate "agent that wraps the Home Assistant API" --model qwen3:14b

Works with Claude Desktop, Cursor, and any MCP client that supports stdio transport.

Heddle is open source (MIT) at github.com/goweft/heddle. 126 tests, 15 security controls, and a threat model mapped to OWASP Agentic Top 10 and NIST AI RMF. If you're exposing APIs to AI agents, I'd like to know what security controls you wish existed.