<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daino</title>
    <description>The latest articles on DEV Community by Daino (@daino).</description>
    <link>https://dev.to/daino</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3871259%2Fdad9a05b-1894-4ddb-855c-a222b4a34faa.png</url>
      <title>DEV Community: Daino</title>
      <link>https://dev.to/daino</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daino"/>
    <language>en</language>
    <item>
      <title>Building mcp-shield: Production-grade resilience for MCP servers</title>
      <dc:creator>Daino</dc:creator>
      <pubDate>Fri, 10 Apr 2026 08:13:28 +0000</pubDate>
      <link>https://dev.to/daino/building-mcp-shield-production-grade-resilience-for-mcp-servers-57ci</link>
      <guid>https://dev.to/daino/building-mcp-shield-production-grade-resilience-for-mcp-servers-57ci</guid>
      <description>&lt;h3&gt;The Problem&lt;/h3&gt;

&lt;p&gt;If you're building AI agents with MCP (Model Context Protocol), you've hit this wall: MCP servers have zero resilience built in.&lt;/p&gt;

&lt;p&gt;A slow GitHub API? Your agent waits 600 seconds. A flaky database connection? The entire chain crashes. A dead server? It keeps getting hammered request after request.&lt;/p&gt;

&lt;p&gt;MCP is great for connecting agents to tools. But it assumes every server is always fast, always available, and always correct. In production, that's never true.&lt;/p&gt;

&lt;h3&gt;The Solution&lt;/h3&gt;

&lt;p&gt;I built &lt;strong&gt;mcp-shield&lt;/strong&gt; — a transparent stdio proxy that wraps any MCP server with production-grade middleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent ←→ mcp-shield ←→ MCP Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command, zero code changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @daino/mcp-shield wrap &lt;span class="nt"&gt;--timeout&lt;/span&gt; 30s &lt;span class="nt"&gt;--retries&lt;/span&gt; 3 &lt;span class="nt"&gt;--&lt;/span&gt; npx @modelcontextprotocol/server-github
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;How It Works&lt;/h3&gt;

&lt;p&gt;MCP communicates via JSON-RPC 2.0 over stdio with Content-Length framing (like LSP). mcp-shield sits in the middle, intercepting &lt;code&gt;tools/call&lt;/code&gt; messages and applying a middleware chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Incoming tools/call request
  → Logger (start)
    → Timeout (AbortController)
      → Retry (exponential backoff + jitter)
        → Circuit Breaker (fail fast if server is down)
          → Forward to real MCP server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything else — &lt;code&gt;initialize&lt;/code&gt;, &lt;code&gt;tools/list&lt;/code&gt;, notifications — passes through untouched.&lt;/p&gt;

&lt;h3&gt;The Middleware&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Timeout&lt;/strong&gt; — Wraps each tool call with an AbortController. If the server doesn't respond within the configured time, the agent gets a clear error instead of hanging forever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retry&lt;/strong&gt; — Exponential backoff with jitter. Smart enough to skip deterministic errors (invalid params, method not found) — those will never succeed on retry.&lt;/p&gt;
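&lt;p&gt;A minimal sketch of the retry decision and delay calculation described above. The helper names, the 200 ms base, and the 10 s cap are illustrative assumptions, not mcp-shield's actual API; the error codes are the standard JSON-RPC 2.0 ones. Treating every other code as transient is also an assumption.&lt;/p&gt;

```typescript
// JSON-RPC 2.0 codes for deterministic errors: retrying cannot help.
// -32700 parse error, -32600 invalid request,
// -32601 method not found, -32602 invalid params.
const NON_RETRYABLE_CODES = [-32700, -32600, -32601, -32602];

// Hypothetical helper: is this JSON-RPC error code worth retrying?
function isRetryable(code: number): boolean {
  return NON_RETRYABLE_CODES.indexOf(code) === -1;
}

// "Full jitter" exponential backoff: a random delay between 0 and
// min(capMs, baseMs * 2^attempt). `random` is injectable for testing.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 200,
  capMs: number = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

&lt;p&gt;Jitter spreads retries out so that several clients hitting the same failing server do not retry in lockstep.&lt;/p&gt;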

&lt;p&gt;&lt;strong&gt;Circuit Breaker&lt;/strong&gt; — Classic state machine: closed → open → half-open. After N consecutive failures, stop calling the server entirely. Try again after a cooldown period. This prevents hammering a dead server and wasting tokens.&lt;/p&gt;
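&lt;p&gt;The state machine can be sketched roughly as follows. The class name, method names, and default thresholds are assumptions for illustration; mcp-shield's real implementation may differ. The clock is injected so the transitions can be tested without waiting.&lt;/p&gt;

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Illustrative circuit breaker; not mcp-shield's actual class.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private failureThreshold: number;
  private cooldownMs: number;
  private now: () => number;

  constructor(
    failureThreshold: number = 5,
    cooldownMs: number = 30000,
    now: () => number = Date.now,
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True if a call may be forwarded to the server right now.
  allowRequest(): boolean {
    if (this.state !== "open") return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let a probe through. This sketch does not
      // limit how many concurrent probes are allowed while half-open.
      this.state = "half-open";
      return true;
    }
    return false; // still cooling down: fail fast
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe, or N consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```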

&lt;p&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt; — Sliding window per tool. Prevents runaway agent loops where the AI keeps calling the same failing tool.&lt;/p&gt;
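&lt;p&gt;One way to implement a per-tool sliding window is to keep the timestamps of recent calls and count how many still fall inside the window. This is a sketch under that assumption; the class name and parameters are hypothetical.&lt;/p&gt;

```typescript
// Illustrative sliding-window limiter keyed by tool name.
class SlidingWindowLimiter {
  // Timestamps (ms) of accepted calls per tool. Object.create(null)
  // avoids collisions with inherited keys such as "constructor".
  private calls: { [tool: string]: number[] } = Object.create(null);
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records the call and returns true if the tool is under its limit.
  tryAcquire(tool: string): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.calls[tool] || []).filter((stamp) => stamp > cutoff);
    const allowed = this.limit > recent.length;
    if (allowed) recent.push(current);
    this.calls[tool] = recent;
    return allowed;
  }
}
```

&lt;p&gt;Rejected calls are not recorded here, so a looping agent regains access as soon as its older calls age out of the window.&lt;/p&gt;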

&lt;p&gt;&lt;strong&gt;Tool Filtering&lt;/strong&gt; — Allow/deny lists so you can restrict which tools the agent can actually use. The proxy filters both &lt;code&gt;tools/list&lt;/code&gt; responses and &lt;code&gt;tools/call&lt;/code&gt; requests.&lt;/p&gt;
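&lt;p&gt;A rough sketch of the filtering logic. It assumes that the deny list wins over the allow list and that a missing allow list means "everything not denied"; the post does not state these rules, and the type and function names are hypothetical.&lt;/p&gt;

```typescript
interface ToolFilter {
  allow?: string[];
  deny?: string[];
}

interface ToolInfo {
  name: string;
}

// Assumed semantics: deny takes precedence; no allow list means allow all.
function isToolAllowed(name: string, filter: ToolFilter): boolean {
  if (filter.deny !== undefined) {
    if (filter.deny.indexOf(name) !== -1) return false;
  }
  if (filter.allow !== undefined) {
    return filter.allow.indexOf(name) !== -1;
  }
  return true;
}

// Applied to the `tools` array of a tools/list result, so the agent
// never sees tools it is not permitted to call.
function filterToolList(tools: ToolInfo[], filter: ToolFilter): ToolInfo[] {
  return tools.filter((tool) => isToolAllowed(tool.name, filter));
}
```

&lt;p&gt;The same &lt;code&gt;isToolAllowed&lt;/code&gt; check would also guard &lt;code&gt;tools/call&lt;/code&gt; requests, since an agent can still name a tool that was hidden from the list.&lt;/p&gt;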

&lt;p&gt;&lt;strong&gt;Response Validation&lt;/strong&gt; — Checks that MCP responses conform to the expected schema. Two modes: "warn" (log and pass through) or "enforce" (reject invalid responses).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt; — Prometheus-compatible &lt;code&gt;/metrics&lt;/code&gt; endpoint. Counters for calls/errors/retries, histograms for latency, per server and tool.&lt;/p&gt;

&lt;h3&gt;Per-Tool Configuration&lt;/h3&gt;

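&lt;p&gt;Not all tools are equal. One tool may need a longer timeout than the default, while another benefits from extra retries. mcp-shield supports per-tool config via YAML; here &lt;code&gt;get_file_contents&lt;/code&gt; gets a 60-second timeout and &lt;code&gt;search_repositories&lt;/code&gt; gets up to 5 retries, and everything else falls back to the defaults:&lt;/p&gt;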

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;backoff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;exponential&lt;/span&gt;
    &lt;span class="na"&gt;jitter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;@modelcontextprotocol/server-github"&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;get_file_contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
      &lt;span class="na"&gt;search_repositories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;max&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
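&lt;p&gt;To illustrate how such a file might be resolved at runtime, here is a sketch that merges a tool's overrides onto the defaults. It assumes overrides apply field by field and that durations use &lt;code&gt;ms&lt;/code&gt;, &lt;code&gt;s&lt;/code&gt;, or &lt;code&gt;m&lt;/code&gt; suffixes; only the &lt;code&gt;30s&lt;/code&gt; style appears in the post, and all names below are hypothetical.&lt;/p&gt;

```typescript
interface RetryConfig { max: number; backoff: string; jitter: boolean; }
interface RetryOverride { max?: number; backoff?: string; jitter?: boolean; }
interface Defaults { timeout: string; retries: RetryConfig; }
interface ToolOverride { timeout?: string; retries?: RetryOverride; }
interface ResolvedConfig { timeoutMs: number; retries: RetryConfig; }

// Parses strings such as "30s" into milliseconds (assumed units: ms, s, m).
function parseDurationMs(text: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (match === null) throw new Error("Unsupported duration: " + text);
  const value = Number(match[1]);
  if (match[2] === "ms") return value;
  if (match[2] === "s") return value * 1000;
  return value * 60000;
}

// Tool-level values win; anything not overridden comes from defaults.
function resolveToolConfig(defaults: Defaults, override: ToolOverride = {}): ResolvedConfig {
  const retries: RetryOverride = override.retries || {};
  return {
    timeoutMs: parseDurationMs(override.timeout || defaults.timeout),
    retries: {
      max: retries.max !== undefined ? retries.max : defaults.retries.max,
      backoff: retries.backoff !== undefined ? retries.backoff : defaults.retries.backoff,
      jitter: retries.jitter !== undefined ? retries.jitter : defaults.retries.jitter,
    },
  };
}
```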



&lt;h3&gt;Claude Desktop Integration&lt;/h3&gt;

&lt;p&gt;Drop it into your &lt;code&gt;claude_desktop_config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"github"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"@daino/mcp-shield"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wrap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--retries"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-github"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Claude Desktop, Cursor, or any MCP client.&lt;/p&gt;

&lt;h3&gt;What I Learned&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;stdout is sacred.&lt;/strong&gt; MCP's stdio transport uses stdout for protocol messages, so any stray &lt;code&gt;console.log&lt;/code&gt; in the proxy corrupts the message stream. All logging goes to stderr via pino.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Circuit breaker should be per-server, not per-tool.&lt;/strong&gt; If one tool on a server keeps failing, the whole server is probably down. No point trying other tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't retry deterministic errors.&lt;/strong&gt; Invalid params or method-not-found will never succeed on retry. Only retry transient failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test with real stdio framing.&lt;/strong&gt; Content-Length headers matter: they count bytes, not characters, and any string containing non-ASCII characters has a UTF-8 byte length that differs from its character count. A mock server that speaks the real protocol catches bugs that unit tests miss.&lt;/p&gt;
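&lt;p&gt;A small sketch of the byte-versus-character pitfall, assuming the LSP-style Content-Length framing this post describes. The function names are hypothetical; &lt;code&gt;TextEncoder&lt;/code&gt; is the standard API for UTF-8 encoding.&lt;/p&gt;

```typescript
// UTF-8 byte length of a string, which is what Content-Length must carry.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Frames a JSON-RPC message LSP-style: header, blank line, then the body.
function frameMessage(message: object): string {
  const body = JSON.stringify(message);
  return "Content-Length: " + utf8ByteLength(body) + "\r\n\r\n" + body;
}

// The body {"a":"é"} is 9 characters but 10 bytes, because "é" takes
// two bytes in UTF-8. Using body.length here would truncate the message.
const framed = frameMessage({ a: "é" });
console.error(framed.split("\r\n")[0]); // logs "Content-Length: 10" to stderr
```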

&lt;h3&gt;Try It&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @daino/mcp-shield wrap &lt;span class="nt"&gt;--&lt;/span&gt; npx @modelcontextprotocol/server-github
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/DainoJung/mcp-shield" rel="noopener noreferrer"&gt;https://github.com/DainoJung/mcp-shield&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;code&gt;@daino/mcp-shield&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MIT license, TypeScript, 90 tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running MCP in production, I'd love to hear what resilience features you need. Open an issue or PR!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
