<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Donnyb369 </title>
    <description>The latest articles on DEV Community by Donnyb369  (@donnyb369422e67b98e4b668da).</description>
    <link>https://dev.to/donnyb369422e67b98e4b668da</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883179%2F494a4589-cb53-4f8a-a54f-7dfd8c1f33f7.png</url>
      <title>DEV Community: Donnyb369 </title>
      <link>https://dev.to/donnyb369422e67b98e4b668da</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/donnyb369422e67b98e4b668da"/>
    <language>en</language>
    <item>
      <title>I routed 60 MCP tools through a single proxy — here's what I learned about token waste and security</title>
      <dc:creator>Donnyb369 </dc:creator>
      <pubDate>Thu, 16 Apr 2026 21:05:54 +0000</pubDate>
      <link>https://dev.to/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</link>
      <guid>https://dev.to/donnyb369422e67b98e4b668da/i-routed-60-mcp-tools-through-a-single-proxy-heres-what-i-learned-about-token-waste-and-security-2mej</guid>
      <description>&lt;p&gt;I've been building MCP servers for Claude Desktop for a few months now. At one point I had five servers running: filesystem, GitHub, SQLite, a knowledge graph, and Brave Search. Sixty tools total, all piped into one LLM.&lt;/p&gt;

&lt;p&gt;It worked. But three things kept going wrong.&lt;/p&gt;

&lt;h2&gt;The token problem&lt;/h2&gt;

&lt;p&gt;Every request to the model carries the full schema of every available tool in the context window. Sixty tools means sixty JSON schema definitions, every single request. I measured it: &lt;strong&gt;over 4,800 tokens of schema overhead per request&lt;/strong&gt;, before Claude even starts thinking about your question.&lt;/p&gt;

&lt;p&gt;That's money. At API rates, those wasted tokens add up fast across a workday of tool calls.&lt;/p&gt;
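&lt;p&gt;If you want to sanity-check that number yourself, a rough estimate is easy to script. This sketch uses the common chars/4 heuristic and a hypothetical &lt;code&gt;read_file&lt;/code&gt;-style tool definition; real tokenizer counts will differ:&lt;/p&gt;

```python
import json

def estimate_schema_tokens(tools):
    """Rough token estimate for a list of tool schemas.

    Uses the common chars/4 heuristic; real tokenizers differ,
    so treat this as an order-of-magnitude check only.
    """
    payload = json.dumps(tools, separators=(",", ":"))
    return len(payload) // 4

# Hypothetical tool definition, shaped like an MCP tool schema.
tool = {
    "name": "read_file",
    "description": "Read the complete contents of a file from disk.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File path"}},
        "required": ["path"],
    },
}

overhead = estimate_schema_tokens([tool] * 60)  # 60 identical tools, for scale
print(overhead)
```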

&lt;h2&gt;The security problem&lt;/h2&gt;

&lt;p&gt;I found out the hard way that my &lt;code&gt;claude_desktop_config.json&lt;/code&gt; was passing environment variables to child processes — and a bug in how I was merging env vars meant the entire process environment, including tokens and API keys, was getting passed through. One of my GitHub tokens ended up in a log file. Twice.&lt;/p&gt;

&lt;p&gt;MCP servers run as child processes with whatever permissions your user account has. There's no audit trail, no rate limiting, no secret scrubbing. If a tool call returns sensitive data, it goes straight into the LLM context with no filtering.&lt;/p&gt;

&lt;h2&gt;The context rot problem&lt;/h2&gt;

&lt;p&gt;Claude would read a file, modify it three tool calls later, then reference the stale version from its context. The file had changed on disk but Claude was still working with the old content. I called this "context rot" — the LLM's view of the world drifts from reality over a long session.&lt;/p&gt;

&lt;h2&gt;So I built a proxy&lt;/h2&gt;

&lt;p&gt;MCP Spine sits between Claude Desktop and all your MCP servers. One proxy, one connection, all traffic flows through it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Desktop ◄──stdio──► MCP Spine ◄──stdio──► filesystem
                                      ◄──stdio──► GitHub
                                      ◄──stdio──► SQLite
                                      ◄──stdio──► memory
                                      ◄──stdio──► Brave Search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what it does at each layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security proxy&lt;/strong&gt; — validates every JSON-RPC message, scrubs secrets from tool outputs (AWS keys, GitHub tokens, bearer tokens, private keys, connection strings), rate limits tool calls, blocks command injection and path traversal, and writes an HMAC-fingerprinted audit trail to SQLite.&lt;/p&gt;
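&lt;p&gt;The secret scrubbing layer is, at its core, pattern matching on tool output before it re-enters the context. A minimal sketch with illustrative patterns only (the real rule set covers more credential formats):&lt;/p&gt;

```python
import re

# Illustrative patterns only; the production rule set is more extensive.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),       # GitHub personal access tokens
    re.compile(r"(?i)bearer\s+[a-z0-9._-]+"), # Bearer tokens in headers
]

def scrub(text):
    """Replace anything that looks like a credential before it reaches the LLM."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(scrub("Authorization: Bearer abc123.def"))  # Authorization: [REDACTED]
```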

&lt;p&gt;&lt;strong&gt;Schema minifier&lt;/strong&gt; — strips verbose descriptions, defaults, and metadata from tool schemas before they reach the LLM. The type information and required fields stay intact. Real measured savings on 12 representative tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0 (off)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 (light)&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 (default)&lt;/td&gt;
&lt;td&gt;32%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best individual tool (&lt;code&gt;read_file&lt;/code&gt;) went from 586 characters down to 242 — a 59% reduction. The savings compound: with 60 tools, Level 2 saves roughly 1,500 tokens per request.&lt;/p&gt;
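&lt;p&gt;The mechanics are simple enough to sketch. This toy version drops a fixed set of keys recursively; it shows the shape of the idea, not the exact rule set:&lt;/p&gt;

```python
import json

def minify_schema(schema, drop=("description", "default", "title", "examples")):
    """Recursively drop verbose keys from a JSON Schema while keeping type
    information and required fields. A sketch of the idea only: a real
    minifier must be schema-aware so a property that happens to be
    named "description" is not dropped by accident."""
    if isinstance(schema, dict):
        return {
            key: minify_schema(value, drop)
            for key, value in schema.items()
            if key not in drop
        }
    if isinstance(schema, list):
        return [minify_schema(item, drop) for item in schema]
    return schema

verbose = {
    "type": "object",
    "properties": {
        "path": {"type": "string", "description": "Absolute path to the file."}
    },
    "required": ["path"],
}

before = len(json.dumps(verbose))
after = len(json.dumps(minify_schema(verbose)))
print(before, after)
```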

&lt;p&gt;&lt;strong&gt;State guard&lt;/strong&gt; — watches files on disk with SHA-256 hashes. When Claude references a file that's changed since it last read it, Spine injects a version pin into the response: "this file has changed since you last saw it." No more context rot.&lt;/p&gt;
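&lt;p&gt;The core of the state guard fits in a few lines: hash on read, compare on later use. A minimal sketch (the real guard also watches the filesystem and injects the warning into tool responses):&lt;/p&gt;

```python
import hashlib

class StateGuard:
    """Track file content hashes so stale reads can be flagged.
    A minimal sketch of the idea, not the full implementation."""

    def __init__(self):
        self.seen = {}  # path: sha256 digest at last read

    def record_read(self, path, content):
        self.seen[path] = hashlib.sha256(content).hexdigest()

    def is_stale(self, path, current_content):
        digest = hashlib.sha256(current_content).hexdigest()
        return path in self.seen and self.seen[path] != digest

guard = StateGuard()
guard.record_read("notes.txt", b"version 1")
print(guard.is_stale("notes.txt", b"version 2"))  # True: file changed on disk
```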

&lt;p&gt;&lt;strong&gt;Semantic router&lt;/strong&gt; — uses local embeddings (ChromaDB + MiniLM) to figure out which tools are relevant to the current task. Instead of showing all 60 tools, it shows the 5-10 that matter. This is optional and currently experimental — the ML dependencies add startup time, so I made them lazy-loading.&lt;/p&gt;
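&lt;p&gt;To show the routing shape without pulling in the ML stack, here's a toy version that scores tools by word overlap instead of embeddings. The tool names and descriptions are made up for the example:&lt;/p&gt;

```python
def score(query_words, tool_words):
    """Toy relevance score: Jaccard word overlap. The real router uses
    MiniLM embeddings via ChromaDB; this stand-in just shows the shape."""
    q, t = set(query_words.lower().split()), set(tool_words.lower().split())
    if not q or not t:
        return 0.0
    return len(q.intersection(t)) / len(q.union(t))

# Hypothetical tool descriptions for the example.
TOOLS = {
    "read_file": "read the contents of a file from disk",
    "create_issue": "create a new issue in a github repository",
    "web_search": "search the web for recent results",
}

def route(query, top_n=2):
    """Return the top_n tools most relevant to the query."""
    ranked = sorted(TOOLS, key=lambda name: score(query, TOOLS[name]), reverse=True)
    return ranked[:top_n]

print(route("open an issue on the github repo"))  # ['create_issue', 'web_search']
```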

&lt;h2&gt;What I learned building it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Environment variable handling is a minefield.&lt;/strong&gt; The biggest bug I hit was &lt;code&gt;env=self.config.env or None&lt;/code&gt; in the subprocess spawn. When a server config had custom env vars (like &lt;code&gt;GITHUB_TOKEN&lt;/code&gt;), this replaced the entire process environment instead of extending it. Every server that needed a custom env var was silently missing &lt;code&gt;PATH&lt;/code&gt;, &lt;code&gt;HOME&lt;/code&gt;, and everything else. The fix was one line: &lt;code&gt;{**os.environ, **self.config.env}&lt;/code&gt;. But it took hours to diagnose because the error messages were about missing executables, not missing env vars.&lt;/p&gt;
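&lt;p&gt;In code, the difference between the broken and fixed spawn is one expression. The config dict here is a stand-in; the merge idiom is the actual fix:&lt;/p&gt;

```python
import os
import subprocess
import sys

custom_env = {"GITHUB_TOKEN": "ghp_example"}  # stand-in for a server config

# Buggy: env=custom_env replaces the whole environment, so the child
# silently loses PATH, HOME, and everything else it inherited.
# subprocess.run(cmd, env=custom_env or None)

# Fixed: extend the inherited environment instead of replacing it.
merged = {**os.environ, **custom_env}

proc = subprocess.run(
    [sys.executable, "-c", "import os; print('PATH' in os.environ)"],
    env=merged,
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())  # True: PATH survived the merge
```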

&lt;p&gt;&lt;strong&gt;Windows is a different world.&lt;/strong&gt; Python's asyncio on Windows uses a Proactor event loop that can't do &lt;code&gt;connect_read_pipe&lt;/code&gt; / &lt;code&gt;connect_write_pipe&lt;/code&gt; on stdio handles from piped processes. The workaround is raw binary I/O with &lt;code&gt;run_in_executor&lt;/code&gt; for reads. I also had to handle paths with spaces and parentheses (my project lives in &lt;code&gt;MCP (The Spine)&lt;/code&gt;), UNC paths, and the MSIX sandbox that Claude Desktop runs in.&lt;/p&gt;
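&lt;p&gt;The read path of that workaround looks roughly like this. &lt;code&gt;io.BytesIO&lt;/code&gt; stands in for the child's stdout pipe so the sketch runs anywhere:&lt;/p&gt;

```python
import asyncio
import io

async def read_line(stream):
    """Read one line from a blocking binary stream without stalling the
    event loop. On Windows the Proactor loop can't wrap piped stdio
    handles, so blocking reads are pushed to a worker thread instead."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, stream.readline)

async def main():
    # Stand-in for the MCP server's stdout pipe.
    pipe = io.BytesIO(b'{"jsonrpc":"2.0","id":1}\n')
    line = await read_line(pipe)
    print(line)

asyncio.run(main())
```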

&lt;p&gt;&lt;strong&gt;npx is slow, node is fast.&lt;/strong&gt; Spawning MCP servers via &lt;code&gt;npx @modelcontextprotocol/server-github&lt;/code&gt; takes 10-15 seconds because npx checks for updates every time. Switching to &lt;code&gt;node C:\path\to\node_modules\...\dist\index.js&lt;/code&gt; connects in under a second. This matters because MCP clients have handshake timeouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread safety in audit logging is easy to get wrong.&lt;/strong&gt; The semantic router runs a background thread for model loading. That thread calls the audit logger, which tries to use a SQLite connection created in the main thread. SQLite doesn't allow cross-thread connection sharing. Fix: &lt;code&gt;check_same_thread=False&lt;/code&gt; plus a &lt;code&gt;threading.Lock()&lt;/code&gt; around all DB operations.&lt;/p&gt;
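&lt;p&gt;The fix in miniature: &lt;code&gt;check_same_thread=False&lt;/code&gt; lets the connection cross threads, and the lock makes sure it never does so concurrently:&lt;/p&gt;

```python
import sqlite3
import threading

# check_same_thread=False lets other threads use this connection;
# the lock ensures they never use it concurrently.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE audit (event TEXT)")
db_lock = threading.Lock()

def log_event(event):
    with db_lock:
        conn.execute("INSERT INTO audit (event) VALUES (?)", (event,))
        conn.commit()

# A background thread (like the router's model loader) can now log safely.
worker = threading.Thread(target=log_event, args=("model_loaded",))
worker.start()
worker.join()

count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
print(count)  # 1
```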

&lt;h2&gt;The numbers&lt;/h2&gt;

&lt;p&gt;Running on Windows with Python 3.14 and Claude Desktop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 MCP servers connected through one proxy&lt;/li&gt;
&lt;li&gt;60 tools total, routed and minified&lt;/li&gt;
&lt;li&gt;32% average schema token savings (up to 59% on verbose tools)&lt;/li&gt;
&lt;li&gt;135+ tests, CI green on Windows + Linux&lt;/li&gt;
&lt;li&gt;Sub-second server connections (with node direct path)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp-spine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure your servers in a TOML file, point Claude Desktop at Spine, and all your MCP traffic gets security hardening, token savings, and an audit trail.&lt;/p&gt;
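&lt;p&gt;To give the flavor, a minimal config looks something like this (field names simplified for illustration; the README has the full schema):&lt;/p&gt;

```toml
# Simplified for illustration; see the mcp-spine README for the real schema.
[servers.filesystem]
command = "node"
args = ["C:/tools/node_modules/@modelcontextprotocol/server-filesystem/dist/index.js"]

[servers.github]
command = "node"
args = ["C:/tools/node_modules/@modelcontextprotocol/server-github/dist/index.js"]
# Prefer referencing an environment variable over inlining a real token.
env = { GITHUB_TOKEN = "ghp_your_token_here" }
```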

&lt;p&gt;GitHub: &lt;a href="https://github.com/Donnyb369/mcp-spine" rel="noopener noreferrer"&gt;github.com/Donnyb369/mcp-spine&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/mcp-spine" rel="noopener noreferrer"&gt;pypi.org/project/mcp-spine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's open source, local-first, and works on Windows and Linux. No cloud, no accounts, no telemetry.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm an independent developer building open-source MCP tooling. If you're using MCP servers with Claude Desktop or any other LLM client, I'd love to hear what problems you're hitting. Drop a comment or open an issue on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>llm</category>
      <category>mcp</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
