<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Donnyb369 </title>
    <description>The latest articles on DEV Community by Donnyb369  (@donnyb369422e67b98e4b668da).</description>
    <link>https://dev.to/donnyb369422e67b98e4b668da</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883179%2F494a4589-cb53-4f8a-a54f-7dfd8c1f33f7.png</url>
      <title>DEV Community: Donnyb369 </title>
      <link>https://dev.to/donnyb369422e67b98e4b668da</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/donnyb369422e67b98e4b668da"/>
    <language>en</language>
    <item>
      <title>MCP Spine v0.2.5: I Built a Full Middleware Stack for MCP Tool Calls</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Sat, 25 Apr 2026 21:41:14 +0000</pubDate>
      <link>https://dev.to/donnyb369422e67b98e4b668da/mcp-spine-v025-i-built-a-full-middleware-stack-for-mcp-tool-calls-49h7</link>
      <guid>https://dev.to/donnyb369422e67b98e4b668da/mcp-spine-v025-i-built-a-full-middleware-stack-for-mcp-tool-calls-49h7</guid>
      <description>&lt;p&gt;Last month I shipped MCP Spine v0.1 — a basic proxy that sat between Claude Desktop and MCP servers. It did schema minification and security basics.&lt;/p&gt;

&lt;p&gt;Since then, it's grown into a full middleware stack. Here's everything in v0.2.5 and why each piece exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Starting Point
&lt;/h2&gt;

&lt;p&gt;57 tools. 5 servers. Claude Desktop config file with one entry pointing to Spine. Everything routes through the proxy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
mcp-spine init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The setup wizard detects your installed servers (npx, node, Python), asks what features you want, and writes a tailored config.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schema Minification: 61% Fewer Tokens
&lt;/h2&gt;

&lt;p&gt;Every tool call starts with the LLM reading tool schemas. With 57 tools, that's thousands of tokens before the conversation even begins.&lt;/p&gt;

&lt;p&gt;Spine's minifier strips &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;additionalProperties&lt;/code&gt;, parameter descriptions, titles, and defaults — keeping only what the LLM actually needs. Level 2 cuts 61% of schema tokens with zero information loss.&lt;/p&gt;

&lt;p&gt;The web dashboard shows real-time savings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuastb8hzc3s4hv30rph5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuastb8hzc3s4hv30rph5.png" alt="Dashboard" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  State Guard: No More Stale Edits
&lt;/h2&gt;

&lt;p&gt;In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it "edits" the old version — silently overwriting your current code.&lt;/p&gt;

&lt;p&gt;State Guard watches your project files, computes SHA-256 hashes, and injects compact version pins into every tool response. When Claude's cached version doesn't match, it knows to re-read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Injection Detection
&lt;/h2&gt;

&lt;p&gt;This one surprised me. Tool responses can contain text that looks like instructions to the LLM — "ignore previous instructions", "[SYSTEM]", or encoded payloads.&lt;/p&gt;

&lt;p&gt;Spine now scans every tool response for 8 categories of injection patterns before it reaches the model. Detections are logged as security events and can trigger webhook alerts to Slack or Discord.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# spine/injection.py detects:
# - System prompt overrides
# - Role injection ("you are now a...")
# - Instruction hijacking
# - Jailbreak attempts (DAN, developer mode)
# - Data exfiltration URLs
# - Base64-encoded payloads
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Plugin System: The Compliance Layer
&lt;/h2&gt;

&lt;p&gt;This is the feature I'm most excited about. Spine plugins are Python files that hook into the tool call pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;spine.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SpinePlugin&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SlackFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SpinePlugin&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack-filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;deny_channels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exec-salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
        &lt;span class="c1"&gt;# Filter messages from denied channels
&lt;/span&gt;        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="n"&gt;filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                              &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deny_channels&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop it in your &lt;code&gt;plugins/&lt;/code&gt; directory, enable in config, done. The LLM never sees messages from those channels.&lt;/p&gt;

&lt;p&gt;Four hook points: &lt;code&gt;on_tool_call&lt;/code&gt; (transform args or block calls), &lt;code&gt;on_tool_response&lt;/code&gt; (filter responses), &lt;code&gt;on_tool_list&lt;/code&gt; (hide tools), and lifecycle hooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Web Dashboard
&lt;/h2&gt;

&lt;p&gt;Zero-dependency browser dashboard at &lt;code&gt;localhost:8777&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mcp-spine web &lt;span class="nt"&gt;--db&lt;/span&gt; spine_audit.db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Shows tool calls, security events, token budget usage, schema token savings, server latency, request log, and client sessions. Auto-refreshes every 3 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Response Caching
&lt;/h2&gt;

&lt;p&gt;Read-only tools like &lt;code&gt;read_file&lt;/code&gt; and &lt;code&gt;list_directory&lt;/code&gt; often get called with the same arguments multiple times in a conversation. Spine now caches these responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool_cache]&lt;/span&gt;
&lt;span class="py"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;cacheable_tools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"read_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"read_query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"list_directory"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;ttl_seconds&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache hits skip the downstream server call entirely. LRU eviction with TTL expiration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything Else in v0.2.5
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Token budget&lt;/strong&gt;: daily limits, per-server limits, warn/block actions, persistent tracking, &lt;code&gt;spine_budget&lt;/code&gt; meta-tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool aliasing&lt;/strong&gt;: &lt;code&gt;create_or_update_file&lt;/code&gt; → &lt;code&gt;edit_github_file&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config hot-reload&lt;/strong&gt;: edit config while running, changes apply in seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook notifications&lt;/strong&gt;: Slack/Discord/JSON alerts on security events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user audit&lt;/strong&gt;: session-tagged entries, &lt;code&gt;mcp-spine audit --sessions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics export&lt;/strong&gt;: CSV/JSON with time and event filtering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamable HTTP&lt;/strong&gt;: MCP 2025-03-26 transport support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive wizard&lt;/strong&gt;: &lt;code&gt;mcp-spine init&lt;/code&gt; detects your setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency monitoring&lt;/strong&gt;: per-server tracking with degradation alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;20 source files&lt;/li&gt;
&lt;li&gt;190+ tests&lt;/li&gt;
&lt;li&gt;CI on Windows + Linux, Python 3.11-3.13&lt;/li&gt;
&lt;li&gt;AAA score on Glama&lt;/li&gt;
&lt;li&gt;Approved on mcpservers.org&lt;/li&gt;
&lt;li&gt;MIT licensed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
mcp-spine init
mcp-spine doctor &lt;span class="nt"&gt;--config&lt;/span&gt; spine.toml
mcp-spine serve &lt;span class="nt"&gt;--config&lt;/span&gt; spine.toml
mcp-spine web &lt;span class="nt"&gt;--db&lt;/span&gt; spine_audit.db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;https://github.com/Donnyb369/mcp-spine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What would you build with a plugin system for MCP tool calls?&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>I Built the Middleware Layer MCP is Missing</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:02:52 +0000</pubDate>
      <link>https://dev.to/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</link>
      <guid>https://dev.to/donnyb369422e67b98e4b668da/i-built-the-middleware-layer-mcp-is-missing-eo</guid>
      <description>&lt;p&gt;Every MCP tutorial shows the same thing: connect Claude to your filesystem, your database, your GitHub. Five servers, 57 tools, infinite power.&lt;/p&gt;

&lt;p&gt;Nobody talks about what happens next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problems Nobody Mentions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Token waste.&lt;/strong&gt; With 40+ tools loaded, you're burning thousands of tokens on JSON schemas every turn. Before Claude even reads your question, it's consumed half its context window on tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context rot.&lt;/strong&gt; In long coding sessions, Claude memorizes file contents from earlier in the conversation. Then it edits the old version — silently overwriting your latest changes. You don't notice until the code breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero security boundary.&lt;/strong&gt; MCP servers run with full access. No audit trail. No rate limits. No secret scrubbing. Your GitHub token shows up in logs. There's nothing between the LLM and your tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No compliance layer.&lt;/strong&gt; Claude wants to read Slack? Hope you're okay with it seeing your DMs with your boss. There's no way to filter what reaches the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Spine: One Proxy, Full Control
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;MCP Spine&lt;/a&gt; — a local-first middleware proxy that sits between your LLM client and your MCP servers. One config file, one entry point in &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, and everything routes through it.&lt;/p&gt;

&lt;p&gt;Here's what it does:&lt;/p&gt;

&lt;h3&gt;
  
  
  61% Token Savings
&lt;/h3&gt;

&lt;p&gt;The schema minifier strips unnecessary fields from tool definitions — &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;additionalProperties&lt;/code&gt;, verbose descriptions, defaults. Level 2 cuts token usage by 61% with zero information loss.&lt;/p&gt;

&lt;h3&gt;
  
  
  State Guard Stops Context Rot
&lt;/h3&gt;

&lt;p&gt;Spine watches your project files, tracks SHA-256 hashes, and injects version pins into every tool response. When Claude has a stale cached version, the pin tells it to re-read. Context rot solved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security That Actually Works
&lt;/h3&gt;

&lt;p&gt;Rate limiting (per-tool and global), path traversal jails, secret scrubbing (AWS keys, GitHub tokens, private keys), HMAC-fingerprinted audit trails, and circuit breakers on failing servers. Defense-in-depth — every layer assumes the others might fail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plugin System for Compliance
&lt;/h3&gt;

&lt;p&gt;Drop-in Python plugins hook into the tool call pipeline. The included Slack filter example strips messages from sensitive channels before the LLM ever sees them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;spine.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SpinePlugin&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SlackFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SpinePlugin&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack-filter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;deny_channels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hr-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exec-salary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
        &lt;span class="c1"&gt;# Filter out messages from denied channels
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Everything Else
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic routing&lt;/strong&gt; with local embeddings (no API calls) — only relevant tools reach the LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop&lt;/strong&gt; confirmation for destructive tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token budget&lt;/strong&gt; tracking with daily limits and warn/block enforcement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config hot-reload&lt;/strong&gt; — edit your config while Spine is running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-user audit&lt;/strong&gt; with session-tagged entries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three transports&lt;/strong&gt;: stdio, SSE, and Streamable HTTP (MCP 2025-03-26)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive setup wizard&lt;/strong&gt; (&lt;code&gt;mcp-spine init&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
mcp-spine init
mcp-spine doctor &lt;span class="nt"&gt;--config&lt;/span&gt; spine.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add one entry to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"spine"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spine.cli"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/spine.toml"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Battle-Tested on Windows
&lt;/h2&gt;

&lt;p&gt;Most MCP tooling assumes macOS. Spine is battle-tested on Windows with MSIX sandbox paths, &lt;code&gt;npx.cmd&lt;/code&gt; resolution, paths with spaces and parentheses, environment variable merging, and unbuffered stdout to prevent pipe hangs. It also runs on macOS and Linux.&lt;/p&gt;

&lt;p&gt;190+ tests, CI on Windows + Linux across Python 3.11-3.13.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;https://github.com/Donnyb369/mcp-spine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;code&gt;pip install mcp-spine&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Glama: AAA score&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;What security or compliance problems are you running into with MCP? I'd love to hear what features would be most useful.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>I routed 60 MCP tools through a single proxy — here's what I learned about token waste and security</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:05:54 +0000</pubDate>
      <link>https://dev.to/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</link>
      <guid>https://dev.to/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</guid>
      <description>&lt;p&gt;I've been building MCP servers for Claude Desktop for a few months now. At one point I had five servers running: filesystem, GitHub, SQLite, a knowledge graph, and Brave Search. Sixty tools total, all piped into one LLM.&lt;/p&gt;

&lt;p&gt;It worked. But three things kept going wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The token problem
&lt;/h2&gt;

&lt;p&gt;Every time Claude makes a tool call, it sends the full schema of every available tool in the context window. Sixty tools means sixty JSON schema definitions, every single request. I measured it: &lt;strong&gt;over 4,800 tokens of schema overhead per request&lt;/strong&gt;, before Claude even starts thinking about your question.&lt;/p&gt;

&lt;p&gt;That's money. At API rates, those wasted tokens add up fast across a workday of tool calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  The security problem
&lt;/h2&gt;

&lt;p&gt;I found out the hard way that my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; was passing environment variables to child processes — and a bug in how I was merging env vars meant the entire system PATH, including tokens and API keys, was getting passed through. One of my GitHub tokens ended up in a log file. Twice.&lt;/p&gt;

&lt;p&gt;MCP servers run as child processes with whatever permissions your user account has. There's no audit trail, no rate limiting, no secret scrubbing. If a tool call returns sensitive data, it goes straight into the LLM context with no filtering.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context rot problem
&lt;/h2&gt;

&lt;p&gt;Claude would read a file, modify it three tool calls later, then reference the stale version from its context. The file had changed on disk but Claude was still working with the old content. I called this "context rot" — the LLM's view of the world drifts from reality over a long session.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I built a proxy
&lt;/h2&gt;

&lt;p&gt;MCP Spine sits between Claude Desktop and all your MCP servers. One proxy, one connection, all traffic flows through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Desktop ◄──stdio──► MCP Spine ◄──stdio──► filesystem
                                      ◄──stdio──► GitHub
                                      ◄──stdio──► SQLite
                                      ◄──stdio──► memory
                                      ◄──stdio──► Brave Search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it does at each layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security proxy&lt;/strong&gt; — validates every JSON-RPC message, scrubs secrets from tool outputs (AWS keys, GitHub tokens, bearer tokens, private keys, connection strings), rate limits tool calls, blocks command injection and path traversal, and writes an HMAC-fingerprinted audit trail to SQLite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema minifier&lt;/strong&gt; — strips verbose descriptions, defaults, and metadata from tool schemas before they reach the LLM. The type information and required fields stay intact. Real measured savings on 12 representative tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 (off)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 (light)&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 (default)&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best individual tool (&lt;code&gt;read_file&lt;/code&gt;) went from 586 characters down to 242 — a 59% reduction. The savings compound: with 60 tools, Level 2 saves roughly 1,500 tokens per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State guard&lt;/strong&gt; — watches files on disk with SHA-256 hashes. When Claude references a file that's changed since it last read it, Spine injects a version pin into the response: "this file has changed since you last saw it." No more context rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic router&lt;/strong&gt; — uses local embeddings (ChromaDB + MiniLM) to figure out which tools are relevant to the current task. Instead of showing all 60 tools, it shows the 5-10 that matter. This is optional and currently experimental — the ML dependencies add startup time, so I made them lazy-loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned building it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Environment variable handling is a minefield.&lt;/strong&gt; The biggest bug I hit was &lt;code&gt;env=self.config.env or None&lt;/code&gt; in the subprocess spawn. When a server config had custom env vars (like &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;), this replaced the entire process environment instead of extending it. Every server that needed a custom env var was silently missing &lt;code&gt;PATH&lt;/code&gt;, &lt;code&gt;HOME&lt;/code&gt;, and everything else. The fix was one line: &lt;code&gt;{**os.environ, **self.config.env}&lt;/code&gt;. But it took hours to diagnose because the error messages were about missing executables, not missing env vars.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows is a different world.&lt;/strong&gt; Python's asyncio on Windows uses a Proactor event loop that can't do &lt;code&gt;connect_read_pipe&lt;/code&gt; / &lt;code&gt;connect_write_pipe&lt;/code&gt; on stdio handles from piped processes. The workaround is raw binary I/O with &lt;code&gt;run_in_executor&lt;/code&gt; for reads. I also had to handle paths with spaces and parentheses (my project lives in &lt;code&gt;MCP (The Spine)&lt;/code&gt;), UNC paths, and the MSIX sandbox that Claude Desktop runs in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;npx is slow, node is fast.&lt;/strong&gt; Spawning MCP servers via &lt;code&gt;npx @modelcontextprotocol/server-github&lt;/code&gt; takes 10-15 seconds because npx checks for updates every time. Switching to &lt;code&gt;node C:\path\to\node_modules\...\dist\index.js&lt;/code&gt; connects in under a second. This matters because MCP clients have handshake timeouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread safety in audit logging is easy to get wrong.&lt;/strong&gt; The semantic router runs a background thread for model loading. That thread calls the audit logger, which tries to use a SQLite connection created in the main thread. SQLite doesn't allow cross-thread connection sharing. Fix: &lt;code&gt;check_same_thread=False&lt;/code&gt; plus a &lt;code&gt;threading.Lock()&lt;/code&gt; around all DB operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;Running on Windows with Python 3.14 and Claude Desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 MCP servers connected through one proxy&lt;/li&gt;
&lt;li&gt;60 tools total, routed and minified&lt;/li&gt;
&lt;li&gt;32% average schema token savings (up to 59% on verbose tools)&lt;/li&gt;
&lt;li&gt;135+ tests, CI green on Windows + Linux&lt;/li&gt;
&lt;li&gt;Sub-second server connections (with node direct path)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure your servers in a TOML file, point Claude Desktop at Spine, and all your MCP traffic gets security hardening, token savings, and an audit trail.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;github.com/Donnyb369/mcp-spine&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/mcp-spine" rel="noopener noreferrer"&gt;pypi.org/project/mcp-spine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's open source, local-first, and works on Windows and Linux. No cloud, no accounts, no telemetry.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm an independent developer building open-source MCP tooling. If you're using MCP servers with Claude Desktop or any other LLM client, I'd love to hear what problems you're hitting. Drop a comment or open an issue on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
