Donnyb369

Posted on Apr 25

MCP Spine v0.2.5: I Built a Full Middleware Stack for MCP Tool Calls

#mcp #python #security #ai

Last month I shipped MCP Spine v0.1 — a basic proxy that sat between Claude Desktop and MCP servers. It did schema minification and security basics.

Since then, it's grown into a full middleware stack. Here's everything in v0.2.5 and why each piece exists.

The Starting Point

57 tools. 5 servers. Claude Desktop config file with one entry pointing to Spine. Everything routes through the proxy.

pip install mcp-spine
mcp-spine init

The setup wizard detects your installed servers (npx, node, Python), asks what features you want, and writes a tailored config.

Schema Minification: 61% Fewer Tokens

Every tool call starts with the LLM reading tool schemas. With 57 tools, that's thousands of tokens before the conversation even begins.

Spine's minifier strips $schema, additionalProperties, parameter descriptions, titles, and defaults — keeping only what the LLM actually needs. Level 2 cuts 61% of schema tokens with zero information loss.

The web dashboard shows real-time savings:

State Guard: No More Stale Edits

In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it "edits" the old version — silently overwriting your current code.

State Guard watches your project files, computes SHA-256 hashes, and injects compact version pins into every tool response. When Claude's cached version doesn't match, it knows to re-read.

Prompt Injection Detection

This one surprised me. Tool responses can contain text that looks like instructions to the LLM — "ignore previous instructions", "[SYSTEM]", or encoded payloads.

Spine now scans every tool response for 8 categories of injection patterns before it reaches the model. Detections are logged as security events and can trigger webhook alerts to Slack or Discord.

# spine/injection.py detects:
# - System prompt overrides
# - Role injection ("you are now a...")
# - Instruction hijacking
# - Jailbreak attempts (DAN, developer mode)
# - Data exfiltration URLs
# - Base64-encoded payloads

Plugin System: The Compliance Layer

This is the feature I'm most excited about. Spine plugins are Python files that hook into the tool call pipeline:

from spine.plugins import SpinePlugin

class SlackFilter(SpinePlugin):
    name = "slack-filter"
    deny_channels = ["hr-private", "exec-salary"]

    def on_tool_response(self, tool_name, arguments, response):
        if "slack" not in tool_name:
            return response
        # Filter messages from denied channels
        content = response.get("content", [])
        filtered = [b for b in content
                    if not any(ch in b.get("text", "").lower()
                              for ch in self.deny_channels)]
        return {**response, "content": filtered}

Drop it in your plugins/ directory, enable in config, done. The LLM never sees messages from those channels.

Four hook points: on_tool_call (transform args or block calls), on_tool_response (filter responses), on_tool_list (hide tools), and lifecycle hooks.

Web Dashboard

Zero-dependency browser dashboard at localhost:8777:

mcp-spine web --db spine_audit.db

Shows tool calls, security events, token budget usage, schema token savings, server latency, request log, and client sessions. Auto-refreshes every 3 seconds.

Tool Response Caching

Read-only tools like read_file and list_directory often get called with the same arguments multiple times in a conversation. Spine now caches these responses:

[tool_cache]
enabled = true
cacheable_tools = ["read_file", "read_query", "list_directory"]
ttl_seconds = 300

Cache hits skip the downstream server call entirely. LRU eviction with TTL expiration.

Everything Else in v0.2.5

Token budget: daily limits, per-server limits, warn/block actions, persistent tracking, spine_budget meta-tool
Tool aliasing: create_or_update_file → edit_github_file
Config hot-reload: edit config while running, changes apply in seconds
Webhook notifications: Slack/Discord/JSON alerts on security events
Multi-user audit: session-tagged entries, mcp-spine audit --sessions
Analytics export: CSV/JSON with time and event filtering
Streamable HTTP: MCP 2025-03-26 transport support
Interactive wizard: mcp-spine init detects your setup
Latency monitoring: per-server tracking with degradation alerts

The Numbers

20 source files
190+ tests
CI on Windows + Linux, Python 3.11-3.13
AAA score on Glama
Approved on mcpservers.org
MIT licensed

Try It

pip install mcp-spine
mcp-spine init
mcp-spine doctor --config spine.toml
mcp-spine serve --config spine.toml
mcp-spine web --db spine_audit.db

GitHub: https://github.com/Donnyb369/mcp-spine

What would you build with a plugin system for MCP tool calls?

Top comments (6)

Ken W Alger • Apr 27

Building a middleware stack for MCP tool calls is exactly what the ecosystem needs right now to move past the 'discovery bottleneck.' I’m a big fan of the 'Spine' metaphor—we need a central nervous system to handle things like rate-limiting and context-shaping before the LLM even sees the tool output.

In my own work on the 'Sovereign Synapse,' I’ve been looking at similar 'Thin Proxy' architectures to prevent context rot. Curious—how are you handling the latency overhead as the middleware stack grows? Looking forward to following the progress on v0.3.

ArkForge • Apr 29

Prompt injection detection at the proxy layer (your spine/injection.py approach) solves the input side. The gap that stays open: once a tool call passes through and an action gets executed, there is no immutable record of what the model received, what it decided, and what actually ran. State Guard with SHA-256 is a good step toward version integrity - curious whether you log those state transitions in a way that survives outside the local session. We ran into exactly this audit-trail problem building Trust Layer for multi-agent workflows (arkforge.tech), where proving what an agent did post-hoc matters as much as filtering what it sees pre-execution.

ArkForge • Apr 30

The State Guard approach works well for single-agent sessions, but there's a TOCTOU gap worth flagging: the SHA-256 is computed at tool-response time and injected as a version pin, but in multi-agent or concurrent-session setups, another process can modify the file between when State Guard hashes it and when the LLM acts on that pin. The LLM sees a "consistent" hash that was already stale before the write arrived. A simple mitigation is including a monotonic read timestamp alongside the hash so the write tool can reject not just content mismatches but also pins that are older than a configurable threshold.

Mykola Kondratiuk • May 3

the middleware approach is elegant until you’re debugging a tool call that fails silently three layers deep. 61% token reduction is real but every proxy adds a failure mode that’s harder to trace in production.

Laura Ashaley • Apr 27

Solid systems work adding a middleware layer for tool calls improves control, observability, and reliability in complex agent pipelines. That’s exactly where scalable AI tooling starts to matter.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.