<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: pengspirit</title>
    <description>The latest articles on DEV Community by pengspirit (@incultnitollc).</description>
    <link>https://dev.to/incultnitollc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3914325%2Fb37d323f-e828-4db9-a08f-3eb6b60fbaaf.png</url>
      <title>DEV Community: pengspirit</title>
      <link>https://dev.to/incultnitollc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/incultnitollc"/>
    <language>en</language>
    <item>
      <title>What does a missing description on an MCP tool actually do? Four failure modes I traced from real MCP servers</title>
      <dc:creator>pengspirit</dc:creator>
      <pubDate>Tue, 12 May 2026 12:46:23 +0000</pubDate>
      <link>https://dev.to/incultnitollc/what-does-a-missing-description-on-an-mcp-tool-actually-do-four-failure-modes-i-traced-from-real-4jn2</link>
      <guid>https://dev.to/incultnitollc/what-does-a-missing-description-on-an-mcp-tool-actually-do-four-failure-modes-i-traced-from-real-4jn2</guid>
      <description>&lt;p&gt;This is the third article in a series. The first established that &lt;strong&gt;schema descriptions are load-bearing&lt;/strong&gt; — if you ship an MCP tool with &lt;code&gt;{ "type": "string" }&lt;/code&gt; and no &lt;code&gt;description&lt;/code&gt;, the model has to guess at a contract that doesn't exist. The second pushed further: &lt;strong&gt;tool descriptions are runtime policy, not documentation&lt;/strong&gt; — the absence of a "do not use for X" clause is a permission to use the tool for X.&lt;/p&gt;

&lt;p&gt;This one answers the engineering question that sits underneath both: &lt;strong&gt;what specifically happens, mechanically, when an MCP tool's description is missing?&lt;/strong&gt; Not in the abstract — in the four failure modes I have actually watched a Claude-class agent produce against real MCP servers I've run &lt;code&gt;mcp-probe&lt;/code&gt; over.&lt;/p&gt;

&lt;p&gt;The short version is that a missing description does not produce one failure. It produces a hierarchy of four, each one further away from where the bug appears to come from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure mode 1 — selection failure (the tool is invisible)
&lt;/h2&gt;

&lt;p&gt;The cheapest failure, and the one nobody notices, is that &lt;strong&gt;the tool simply doesn't get called&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When Claude looks at a tool list, it reads &lt;code&gt;name + description + inputSchema.properties[].description&lt;/code&gt; as a single decision packet. The name alone is rarely enough. &lt;code&gt;fetch_data&lt;/code&gt; could mean "fetch from the database," "fetch from the API," "fetch from cache," or "read a file." Without a description that disambiguates, the agent treats the tool as a noisy candidate and picks something else.&lt;/p&gt;

&lt;p&gt;I have a server in front of me right now where one of the tools is named &lt;code&gt;lookup&lt;/code&gt;. No description on the tool. The schema's single string parameter has no description either. Across maybe 30 attempts to use it through Claude over a week, the model called it twice. Both times, the tool was wrong. The other 28 times, the model went elsewhere — usually to a tool with a clearer description, even when that tool was a worse fit.&lt;/p&gt;

&lt;p&gt;The signal you'd want here — "the model would have used my tool but doesn't know what it does" — is invisible. The tool doesn't error. It's not slow. It just doesn't show up in the trace, because the trace only records calls that happened.&lt;/p&gt;
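
&lt;p&gt;To see how little the model has to go on, here is a minimal Python sketch that renders the decision packet described above for a bare tool. The &lt;code&gt;decision_packet&lt;/code&gt; helper, the parameter name &lt;code&gt;q&lt;/code&gt;, and the packet layout are all hypothetical; real clients serialize tool lists in their own ways.&lt;/p&gt;

```python
def decision_packet(tool):
    """Render the text a model could plausibly use to decide on a tool.

    Assumption: the packet is name + description + per-parameter
    descriptions, as the article describes. Real clients may differ.
    """
    lines = ["tool: " + tool["name"]]
    lines.append("description: " + tool.get("description", "(none)"))
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, schema in properties.items():
        detail = schema.get("description", "(none)")
        kind = schema.get("type", "any")
        lines.append("param " + param + " (" + kind + "): " + detail)
    return "\n".join(lines)


# Hypothetical tool shaped like the undocumented `lookup` tool.
bare_lookup = {
    "name": "lookup",
    "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}},
}

print(decision_packet(bare_lookup))
```

&lt;p&gt;Every line after the name comes back as &lt;code&gt;(none)&lt;/code&gt;, so the name is the only signal left to select on.&lt;/p&gt;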

&lt;h2&gt;
  
  
  Failure mode 2 — argument shape failure (the model picks, the schema rejects)
&lt;/h2&gt;

&lt;p&gt;If the model does pick the tool, the next thing it has to do is fill in arguments. With no parameter descriptions, &lt;strong&gt;it makes the argument shape up from the parameter name and type&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Real example from &lt;code&gt;@modelcontextprotocol/server-filesystem&lt;/code&gt;. The server has a &lt;code&gt;read_file&lt;/code&gt; tool. The schema declares one required property, &lt;code&gt;path: { type: "string" }&lt;/code&gt;, and as shipped that parameter carries no description. Watch what happens when you try to use it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model has to decide: absolute path or relative? Relative to what — workspace, server CWD, user home?&lt;/li&gt;
&lt;li&gt;It has to decide: is the path expected to be inside an allowed root, or anywhere on disk?&lt;/li&gt;
&lt;li&gt;It has to decide: is &lt;code&gt;~/foo.txt&lt;/code&gt; allowed, or does it need to be expanded?&lt;/li&gt;
&lt;li&gt;It has to decide whether forward-slashes or backslashes matter on the platform it thinks it's running on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are answerable from &lt;code&gt;path: string&lt;/code&gt;. The model will pick something (usually &lt;code&gt;/Users/&amp;lt;name&amp;gt;/&amp;lt;project&amp;gt;/&amp;lt;file&amp;gt;&lt;/code&gt; for absolute, or &lt;code&gt;./&amp;lt;file&amp;gt;&lt;/code&gt; for relative), but the choice is a coin flip against your real path-resolution logic. Half the time, the call succeeds. Half the time, it returns "permission denied" or "file not found," and the model has to retry with a different shape, burning 1–2 turns of context to make up for a missing description that should have been one sentence.&lt;/p&gt;

&lt;p&gt;The fix on &lt;code&gt;read_file&lt;/code&gt; is exactly one line of schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt; path: {
   type: "string",
&lt;span class="gi"&gt;+  description: "Absolute path inside one of the allowed roots configured at server startup. Use forward slashes. Tilde expansion is not performed."
&lt;/span&gt; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add that, and the failure mode goes away. The argument lands right on the first try.&lt;/p&gt;
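
&lt;p&gt;The one-line description encodes rules a server would otherwise enforce silently. As a rough illustration, the Python sketch below checks an argument against those same rules. The helper name &lt;code&gt;path_argument_problems&lt;/code&gt; and the root &lt;code&gt;/srv/project&lt;/code&gt; are hypothetical, and this is not &lt;code&gt;server-filesystem&lt;/code&gt;'s actual resolution logic.&lt;/p&gt;

```python
import posixpath


def path_argument_problems(path, allowed_roots):
    """Return the reasons `path` breaks the contract stated in the description.

    Hypothetical helper for illustration only.
    """
    problems = []
    if path.startswith("~"):
        problems.append("tilde expansion is not performed")
    if "\\" in path:
        problems.append("use forward slashes")
    if not posixpath.isabs(path):
        problems.append("path must be absolute")
        return problems
    # Collapse ".." segments before comparing against the allowed roots.
    normalized = posixpath.normpath(path)
    inside = any(
        normalized == root or normalized.startswith(root.rstrip("/") + "/")
        for root in allowed_roots
    )
    if not inside:
        problems.append("path is outside the allowed roots")
    return problems


roots = ["/srv/project"]  # hypothetical allowed root
print(path_argument_problems("/srv/project/notes.txt", roots))
print(path_argument_problems("~/notes.txt", roots))
print(path_argument_problems("/srv/project/../secrets.txt", roots))
```

&lt;p&gt;Each rejection reason maps to a clause in the description, which is the point: the model can only satisfy rules it has been told about.&lt;/p&gt;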

&lt;h2&gt;
  
  
  Failure mode 3 — LLM-side validator rejection (the call never leaves the client)
&lt;/h2&gt;

&lt;p&gt;This is the failure mode I had not seen until I started running &lt;code&gt;mcp-probe&lt;/code&gt; against real servers, and it's the one that surprised me.&lt;/p&gt;

&lt;p&gt;Several MCP clients — Claude Desktop in particular at certain config thresholds — apply a &lt;strong&gt;secondary validator&lt;/strong&gt; on top of the schema you ship. Not the JSON Schema validation that runs server-side after the call. A pre-flight check that runs before the call leaves the client.&lt;/p&gt;

&lt;p&gt;That validator looks for two things: (a) is &lt;code&gt;description&lt;/code&gt; present at the tool level, and (b) is &lt;code&gt;description&lt;/code&gt; present on every required parameter. When either is missing, the client doesn't refuse the tool outright; it down-weights it heavily, and in some configurations the call gets rewritten to an "ask the user" path instead.&lt;/p&gt;

&lt;p&gt;I do not have a public spec to point at for this. It is behavior I observed across multiple MCP clients while building the scorecards published in the &lt;code&gt;mcp-probe&lt;/code&gt; repo's &lt;code&gt;docs/scorecards/&lt;/code&gt; directory. Servers with full descriptions consistently saw 2–3× more tool invocations through the same agent task than servers without, holding everything else constant. The mechanism, as best I can reconstruct it, is the client treating description-completeness as a quality signal and routing around tools that score low.&lt;/p&gt;

&lt;p&gt;If that's right — and the scorecard data is the evidence I have — then a missing description doesn't just degrade tool selection. It degrades it twice: once at the model layer (failure mode 1) and once at the client layer (failure mode 3). Stacked, those move a tool from "occasionally used wrong" to "effectively unreachable."&lt;/p&gt;
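
&lt;p&gt;Since the mechanism is a reconstruction, the following Python sketch only illustrates what "description-completeness as a quality signal" could look like. The function name and the scoring formula are assumptions made for this example, not the documented behavior of any MCP client.&lt;/p&gt;

```python
def description_completeness(tool):
    """Fraction of checked description slots that are filled (0.0 to 1.0).

    Slots: the tool-level description plus one per required parameter.
    Illustrative only; no client is known to use this exact formula.
    """
    schema = tool.get("inputSchema", {})
    properties = schema.get("properties", {})
    slots = [bool(tool.get("description", "").strip())]
    for name in schema.get("required", []):
        text = properties.get(name, {}).get("description", "")
        slots.append(bool(text.strip()))
    return sum(slots) / len(slots)


# Hypothetical tool definitions for comparison.
documented = {
    "name": "read_file",
    "description": "Read a text file from an allowed root.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Absolute path."}},
        "required": ["path"],
    },
}
bare = {
    "name": "lookup",
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

print(description_completeness(documented))
print(description_completeness(bare))
```

&lt;p&gt;A client that sorted or filtered on a number like this would push the bare tool to the bottom without ever raising an error, which matches the silent drop in invocations described above.&lt;/p&gt;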

&lt;h2&gt;
  
  
  Failure mode 4 — routing collapse (your tool should get used, the wrong tool gets used instead)
&lt;/h2&gt;

&lt;p&gt;The last failure mode is the one that tool authors notice last and find most painful, because it shows up as "another team's tool is eating my tool's traffic."&lt;/p&gt;

&lt;p&gt;When two MCP tools have overlapping intent surfaces — say, your &lt;code&gt;send_email&lt;/code&gt; and another server's &lt;code&gt;notify_user&lt;/code&gt; — the description is the only thing the model uses to route between them. If yours has a sharp description ("transactional email triggered by an explicit user action; do not use for marketing or broadcast") and the other has nothing, the routing collapses &lt;em&gt;toward the vague one&lt;/em&gt;, not away from it.&lt;/p&gt;

&lt;p&gt;This is counterintuitive. You would expect "more specific description = more likely to be picked." It works the other way. A vague description has no negative scope. The model sees "could plausibly handle this" and picks it for everything within the envelope, including cases your tool would have handled better. Yours, with the sharp scope, only gets picked when the model is sure your case applies — which is rare, because being sure is expensive.&lt;/p&gt;

&lt;p&gt;The defense is the anti-purpose clause from the second article in this series: write what your tool is &lt;strong&gt;not&lt;/strong&gt; for, by name, pointing at the specific other tool you want the routing to go to instead. &lt;em&gt;"Do not use this for marketing campaigns or one-off broadcasts — those go through &lt;code&gt;marketing_send&lt;/code&gt;."&lt;/em&gt; The other tool's vagueness is now your contract. If they don't add an anti-purpose clause back, you've at least claimed the boundary unilaterally.&lt;/p&gt;
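
&lt;p&gt;Whether a description actually carries such a clause can be checked mechanically. The Python sketch below is a rough heuristic under stated assumptions: the function name is hypothetical, the sentence splitting is naive, and it only recognizes the literal phrase "do not use".&lt;/p&gt;

```python
import re


def anti_purpose_targets(description, known_tools):
    """Return the sibling tools named in a "do not use" sentence.

    Rough heuristic: split into sentences, keep the ones containing
    "do not use", and report which known tool names appear in them.
    """
    targets = []
    for sentence in re.split(r"[.!?]\s+", description):
        if "do not use" not in sentence.lower():
            continue
        for tool in known_tools:
            named = re.search(r"\b" + re.escape(tool) + r"\b", sentence)
            if named and tool not in targets:
                targets.append(tool)
    return targets


siblings = ["marketing_send", "notify_user"]
sharp = (
    "Transactional email triggered by an explicit user action. "
    "Do not use this for marketing campaigns or one-off broadcasts; "
    "those go through marketing_send."
)
print(anti_purpose_targets(sharp, siblings))
print(anti_purpose_targets("Sends an email", siblings))
```

&lt;p&gt;An empty result for a tool that has overlapping siblings is the case worth flagging in review.&lt;/p&gt;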

&lt;h2&gt;
  
  
  What this means for the schema you ship
&lt;/h2&gt;

&lt;p&gt;Three small rules that fall out of the four failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every tool gets a description, period.&lt;/strong&gt; Not "TODO: add description." Actually describe what the tool does, in one sentence, in the first 80 characters — that's the part the agent's selection packet uses most heavily.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Every required parameter gets a description that pins the shape.&lt;/strong&gt; Not "the path." A description like "Absolute path inside an allowed root, forward slashes, no tilde expansion" packs four constraints into eleven words. If you can't write that sentence, you don't fully understand the parameter, and your server will fail in failure mode 2 anyway.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;For any tool whose intent overlaps another tool you know about, write the anti-purpose clause.&lt;/strong&gt; Name the other tool. Point at it. Vagueness is a vacuum that the routing fills with whichever tool sounds adjacent enough.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The contract framing
&lt;/h2&gt;

&lt;p&gt;If I had to compress the whole series into one line, it would be this: &lt;strong&gt;the description fields in an MCP tool's schema are the only contract the model sees at runtime&lt;/strong&gt;. Not the README, not the docs site, not the GitHub issues. The schema. Anything you don't write into the description doesn't exist for the agent.&lt;/p&gt;

&lt;p&gt;The four failure modes above are what happens when that contract has gaps. Each gap looks like a different bug — selection went wrong, arguments went wrong, the call never left the client, traffic went to a competitor — but the root cause is the same one-line fix every time.&lt;/p&gt;




&lt;p&gt;I built &lt;a href="https://www.npmjs.com/package/@incultnitollc/mcp-probe" rel="noopener noreferrer"&gt;&lt;code&gt;mcp-probe&lt;/code&gt;&lt;/a&gt; to make these failures visible before they ship. It enumerates every tool a server exposes, flags missing descriptions on tools and required parameters, runs every callable tool with auto-generated arguments matching the declared schema, and exits non-zero if any of failure modes 1–4 are statically detectable. It's not a replacement for &lt;a href="https://github.com/modelcontextprotocol/inspector" rel="noopener noreferrer"&gt;Anthropic's MCP Inspector&lt;/a&gt; — Inspector is the right tool for interactive debugging when something has already gone wrong. &lt;code&gt;mcp-probe&lt;/code&gt; is the pre-publish CLI for catching the four failures above before the model ever sees the server.&lt;/p&gt;

&lt;p&gt;Both tools are useful. They sit on different sides of the same problem.&lt;/p&gt;

&lt;p&gt;If you're shipping an MCP server, the one specific thing I'd ask is this: before you publish, run something that fails on missing descriptions. It can be &lt;code&gt;mcp-probe&lt;/code&gt;, it can be a homemade lint, it can be a code review checklist. The failure modes above are not theoretical — they're the four actual ways a missing description shows up in production. Catch them at lint time and your server enters the ecosystem at the top of the routing surface, not invisible at the bottom.&lt;/p&gt;
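
&lt;p&gt;A homemade lint can be quite small. The Python sketch below assumes the tool definitions are already available as a list of dicts, for example parsed from a &lt;code&gt;tools/list&lt;/code&gt; response. The function names and the placeholder list are hypothetical, and this is not how &lt;code&gt;mcp-probe&lt;/code&gt; is implemented.&lt;/p&gt;

```python
PLACEHOLDERS = ("todo", "tbd", "fixme")


def is_missing(text):
    """True when a description is absent, blank, or an obvious placeholder."""
    cleaned = (text or "").strip().lower()
    return cleaned == "" or cleaned.startswith(PLACEHOLDERS)


def lint_tools(tools):
    """Return one finding per missing tool or required-parameter description."""
    findings = []
    for tool in tools:
        if is_missing(tool.get("description")):
            findings.append(tool["name"] + ": missing tool description")
        schema = tool.get("inputSchema", {})
        properties = schema.get("properties", {})
        for param in schema.get("required", []):
            if is_missing(properties.get(param, {}).get("description")):
                findings.append(
                    tool["name"] + "." + param + ": missing parameter description"
                )
    return findings


# Hypothetical tool list for illustration.
tools = [
    {
        "name": "send_email",
        "description": "TODO: add description",
        "inputSchema": {
            "type": "object",
            "properties": {"to": {"type": "string"}},
            "required": ["to"],
        },
    }
]
findings = lint_tools(tools)
for finding in findings:
    print("WARN  " + finding)
exit_code = 1 if findings else 0  # a CI wrapper would pass this to sys.exit
print("exit code:", exit_code)
```

&lt;p&gt;Treating placeholder text as missing matters here, because a "TODO" description passes a naive presence check while telling the model nothing.&lt;/p&gt;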

&lt;p&gt;The next article in this series will walk through the same four failure modes from the &lt;strong&gt;client author's&lt;/strong&gt; side — what an MCP client should do when it sees a tool with no description, beyond just rendering it. That's where the secondary validator in failure mode 3 lives, and it's where the load-bearing-descriptions framing has its sharpest implication.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Tool descriptions are load-bearing too: the anti-purpose pattern in MCP</title>
      <dc:creator>pengspirit</dc:creator>
      <pubDate>Thu, 07 May 2026 14:33:09 +0000</pubDate>
      <link>https://dev.to/incultnito_llc/tool-descriptions-are-load-bearing-too-the-anti-purpose-pattern-in-mcp-15m2</link>
      <guid>https://dev.to/incultnito_llc/tool-descriptions-are-load-bearing-too-the-anti-purpose-pattern-in-mcp-15m2</guid>
      <description>&lt;p&gt;A few days ago I posted &lt;a href="https://dev.to/incultnitollc/schema-descriptions-are-load-bearing-why-missing-parameter-descriptions-break-mcp-clients-4l42"&gt;Schema descriptions are load-bearing: why missing parameter descriptions break MCP clients&lt;/a&gt;. The argument: every parameter without a description is a load-bearing element silently absent from the schema, and agents fail in ways that look like model problems but are actually contract problems.&lt;/p&gt;

&lt;p&gt;The post got a comment from &lt;a class="mentioned-user" href="https://dev.to/mickyarun"&gt;@mickyarun&lt;/a&gt; that's worth its own essay:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The "load-bearing" framing is the right shape — the same observation applies one level up at the tool level. Most MCP catalogues we've audited had perfectly described parameters but no description of when not to call this tool, which is the bit that actually decides whether an agent reaches for the right surface. The half-hour we spent adding "anti-purpose" descriptions to about a dozen of our internal tools cut the wrong-tool-selected rate roughly in half. Arguably the parameter case in this post is just the most visible instance of a broader rule: every field of every schema an agent reads is doing structural work whether you specified it or not.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He's right, and the pattern deserves a name. Call it the &lt;strong&gt;anti-purpose pattern&lt;/strong&gt;: every tool description should specify not just what the tool is for, but what it is &lt;em&gt;not&lt;/em&gt; for.&lt;/p&gt;

&lt;h2&gt;
  
  
  HOW vs WHETHER
&lt;/h2&gt;

&lt;p&gt;Parameter descriptions answer &lt;strong&gt;HOW&lt;/strong&gt; to call a tool — what types, what shape, what valid values.&lt;/p&gt;

&lt;p&gt;Tool descriptions answer &lt;strong&gt;WHETHER&lt;/strong&gt; to call a tool — does this surface match the user's intent at all.&lt;/p&gt;

&lt;p&gt;Both are schema. Both are load-bearing. The first is usually under-specified. The second is almost always under-specified.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Searches the web" fails
&lt;/h2&gt;

&lt;p&gt;Most MCP tool descriptions read like marketing copy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"Searches the web for information"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"Retrieves data from the database"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"Sends an email"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is fine in isolation. It collapses the moment an agent has three search tools, two database tools, and four messaging tools loaded at once — which is the actual production scenario.&lt;/p&gt;

&lt;p&gt;The agent has to disambiguate. The schema gave it nothing to disambiguate with. So it picks the first plausible match, or the one with the cleanest parameter list, or the one whose name lexically matches the user's phrasing. None of these correlate with correctness.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anti-purpose pattern
&lt;/h2&gt;

&lt;p&gt;The fix is mechanical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: "Searches the web for information"

After:  "Searches the public web for current events,
         news, and recently published content.
         Do not use for: code lookup (use code_search),
         internal documentation (use docs_search),
         or queries answerable from training data."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Specific scope&lt;/strong&gt; — "public web" not "the web", "current events" not "information"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disambiguation pointers&lt;/strong&gt; — names the sibling tools the agent might confuse this with&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit exclusions&lt;/strong&gt; — the "do not use for" clause&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;@mickyarun reports roughly 50% fewer wrong-tool-selection errors after adding clauses like this to about a dozen internal tools. That's a half-hour edit producing a measurable behavior shift, with no model change and no prompt-engineering tax on the consumer side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why tool authors skip this
&lt;/h2&gt;

&lt;p&gt;Two reasons, both fixable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The author knows what the tool is for, so the description is implicit.&lt;/strong&gt; Authors write descriptions that document the tool's positive purpose because that's what they were thinking about while writing it. The negative purpose — what they consciously decided this tool would &lt;em&gt;not&lt;/em&gt; do — never makes it onto the page.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP examples don't model it.&lt;/strong&gt; Look at any MCP server template or quickstart and tool descriptions are one-line declaratives. There's no canonical example that says "here's what a production tool description looks like with anti-purpose."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first is fixed by a checklist. The second is fixed by people writing posts like this one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concrete checklist
&lt;/h2&gt;

&lt;p&gt;When writing or auditing a tool description, the description should answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; What specifically does this operate on? ("public web", "this user's calendar", "Postgres tables in the analytics schema")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; What user intent should select this tool?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-trigger:&lt;/strong&gt; What user intent looks similar but should select a different tool?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sibling pointer:&lt;/strong&gt; Which neighboring tools are the most likely confusion sources, and what should send the agent there instead?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have more than one tool in your MCP server, all four are load-bearing. Skipping any of them outsources the disambiguation to whatever the model happens to guess.&lt;/p&gt;
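
&lt;p&gt;One way to keep the checklist from being skipped is to assemble the description from its answers. The Python sketch below is an illustration only: &lt;code&gt;build_description&lt;/code&gt; and the trigger wording are hypothetical, while &lt;code&gt;code_search&lt;/code&gt; and &lt;code&gt;docs_search&lt;/code&gt; are the sibling names from the example earlier in this post.&lt;/p&gt;

```python
def build_description(scope, trigger, anti_triggers):
    """Assemble a tool description from the checklist answers.

    `anti_triggers` maps a look-alike intent to the sibling tool that
    should handle it, covering both the anti-trigger and the sibling pointer.
    """
    parts = [scope.rstrip(".") + ".", "Use when " + trigger.rstrip(".") + "."]
    if anti_triggers:
        exclusions = ", ".join(
            intent + " (use " + sibling + ")"
            for intent, sibling in anti_triggers.items()
        )
        parts.append("Do not use for: " + exclusions + ".")
    return " ".join(parts)


print(
    build_description(
        scope="Searches the public web for current events, news, and recently published content",
        trigger="the user asks about something recent or time-sensitive",
        anti_triggers={
            "code lookup": "code_search",
            "internal documentation": "docs_search",
        },
    )
)
```

&lt;p&gt;Making each checklist item a required argument means an author cannot ship a description without at least deciding what the scope and trigger are.&lt;/p&gt;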

&lt;h2&gt;
  
  
  Coming to mcp-probe
&lt;/h2&gt;

&lt;p&gt;This is the next axis I'm adding to &lt;a href="https://www.npmjs.com/package/@incultnitostudiosllc/mcp-probe" rel="noopener noreferrer"&gt;mcp-probe&lt;/a&gt;. Parameter-description coverage is already scored. Tool-description quality — including a heuristic for anti-purpose clauses — belongs in the same scorecard.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;a class="mentioned-user" href="https://dev.to/mickyarun"&gt;@mickyarun&lt;/a&gt; for the comment that pulled the framing one level up. Schema descriptions are load-bearing. So is every other field of the contract an agent is asked to read.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>tooling</category>
      <category>agents</category>
    </item>
    <item>
      <title>Schema descriptions are load-bearing: why missing parameter descriptions break MCP clients</title>
      <dc:creator>pengspirit</dc:creator>
      <pubDate>Tue, 05 May 2026 16:16:49 +0000</pubDate>
      <link>https://dev.to/incultnitollc/schema-descriptions-are-load-bearing-why-missing-parameter-descriptions-break-mcp-clients-4l42</link>
      <guid>https://dev.to/incultnitollc/schema-descriptions-are-load-bearing-why-missing-parameter-descriptions-break-mcp-clients-4l42</guid>
      <description>&lt;p&gt;I shipped &lt;a href="https://www.npmjs.com/package/@incultnitollc/mcp-probe" rel="noopener noreferrer"&gt;&lt;code&gt;mcp-probe&lt;/code&gt;&lt;/a&gt; — a CLI that points at any MCP server, enumerates every tool, resource, and prompt, calls each with auto-generated arguments, validates against declared schemas, prints a pass/fail scorecard, and exits 0/1 for CI.&lt;/p&gt;

&lt;p&gt;The plan for launch week: run it against the official Node MCP servers and post results. The first run made me look like I'd broken half the ecosystem. The second, after I read my own output, told a different story — most failures were bugs in my client, not the servers. The rest collapsed into one finding about schema design.&lt;/p&gt;

&lt;p&gt;This post is the corrected version. Three sections: what mcp-probe does, what the scorecards say, and the three bugs I fixed in my own client first.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What mcp-probe does
&lt;/h2&gt;

&lt;p&gt;One command. stdio, SSE, or Streamable HTTP transport. No config file required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @incultnitollc/mcp-probe &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-memory"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is a scorecard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tools callable:      9/9
Resources readable:  n/a
Prompts callable:    n/a
Schema warnings:     4
ALL CHECKS PASSED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit code 0 if everything passes, 1 if anything fails. Drop it in CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx -y @incultnitollc/mcp-probe test "node dist/index.js"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install globally if you'd rather not &lt;code&gt;npx&lt;/code&gt; every time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @incultnitollc/mcp-probe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mental model is &lt;code&gt;curl&lt;/code&gt; for MCP servers. You don't open Claude Desktop, hand-write a config, restart the app, and stare at the tool list to see whether anything broke. You run one command and get a scorecard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuvitnyao76ow5kklqn2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuuvitnyao76ow5kklqn2.gif" alt="mcp-probe demo" width="720" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. What I found across the four official Node servers
&lt;/h2&gt;

&lt;p&gt;Here is the actual scorecard from &lt;code&gt;docs/scorecards/SUMMARY.md&lt;/code&gt;, re-run on &lt;code&gt;@incultnitollc/mcp-probe@1.0.1&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Resources&lt;/th&gt;
&lt;th&gt;Prompts&lt;/th&gt;
&lt;th&gt;Schema warns&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@modelcontextprotocol/server-memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;9 / 9&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@modelcontextprotocol/server-sequential-thinking&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 / 1&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@modelcontextprotocol/server-everything&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;12 / 13&lt;/td&gt;
&lt;td&gt;7 / 7&lt;/td&gt;
&lt;td&gt;3 / 4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@modelcontextprotocol/server-filesystem&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8 / 14&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;partial&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Aggregate: 30 of 37 tools callable across four servers, 81%. Two servers fully pass. The other two have a single failure pattern between them.&lt;/p&gt;

&lt;p&gt;A scope note before the finding, because I got this wrong the first time: Anthropic's &lt;code&gt;fetch&lt;/code&gt; MCP server is Python-only, installed via &lt;code&gt;uvx mcp-server-fetch&lt;/code&gt;. It has never been published to npm. mcp-probe runs against any stdio MCP server regardless of language — only this scorecard is scoped to the official Node servers. Earlier launch copy of mine that called &lt;code&gt;server-fetch&lt;/code&gt; "broken on npm" was wrong, and I want to flag it explicitly here because I almost shipped that draft.&lt;/p&gt;

&lt;p&gt;Now the real finding. Every remaining failure on the partial-pass servers traces to the same root cause: &lt;strong&gt;missing &lt;code&gt;description&lt;/code&gt; fields on schema properties&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;server-filesystem&lt;/code&gt;, six of the fourteen tools fail because mcp-probe doesn't know which arguments are supposed to be file paths versus directory paths versus arbitrary strings. The &lt;code&gt;path&lt;/code&gt; parameter on &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;read_text_file&lt;/code&gt;, &lt;code&gt;read_media_file&lt;/code&gt;, &lt;code&gt;edit_file&lt;/code&gt;, and &lt;code&gt;write_file&lt;/code&gt; has no description in the schema, so my client defaults to the allowed sandbox directory itself. The server correctly returns &lt;code&gt;EISDIR&lt;/code&gt; (you tried to read a directory as a file) or &lt;code&gt;EACCES&lt;/code&gt; (you tried to write to one). &lt;code&gt;move_file&lt;/code&gt; fails the same way — both &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;destination&lt;/code&gt; resolve to the same directory, and the server correctly refuses the no-op rename. The server is doing its job. The schema is the gap.&lt;/p&gt;

&lt;p&gt;On &lt;code&gt;server-everything&lt;/code&gt;, one prompt fails because the &lt;code&gt;resourceType&lt;/code&gt; argument has no description. In practice it accepts only two values, &lt;code&gt;"Text"&lt;/code&gt; or &lt;code&gt;"Blob"&lt;/code&gt;, but with no declared enum, no description, and no examples, my client passes the literal string &lt;code&gt;"test"&lt;/code&gt; and the server correctly returns &lt;code&gt;Invalid resourceType: test&lt;/code&gt;. The schema validator inside mcp-probe even raises a warning on this property before the call fires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WARN  get-resource-reference — Property "resourceType" missing description
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That warning is the diagnostic working as intended — mcp-probe still attempts the call, then surfaces both the warning and the resulting failure side-by-side so you can see the connection.&lt;/p&gt;
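
&lt;p&gt;The failure is easy to reproduce with a toy argument generator. The Python sketch below is hypothetical and is not mcp-probe's actual code; it treats the argument as a JSON Schema property for simplicity and shows why a property with no hints ends up as &lt;code&gt;"test"&lt;/code&gt;.&lt;/p&gt;

```python
def sample_argument(prop):
    """Pick a sample value for one schema property.

    Hypothetical generator: prefer declared hints, fall back to a
    type-based default when the schema offers nothing else.
    """
    if prop.get("enum"):
        return prop["enum"][0]
    if prop.get("examples"):
        return prop["examples"][0]
    defaults = {
        "string": "test",
        "integer": 0,
        "number": 0,
        "boolean": False,
        "array": [],
        "object": {},
    }
    return defaults.get(prop.get("type", "string"), "test")


# No hints at all: the generator can only fall back to "test".
print(sample_argument({"type": "string"}))
# A declared enum gives it a valid value to pick.
print(sample_argument({"type": "string", "enum": ["Text", "Blob"]}))
```

&lt;p&gt;The generator is not being careless in the first call. There is simply nothing in the schema that distinguishes a valid value from an invalid one.&lt;/p&gt;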

&lt;p&gt;The substantive insight, and the line I'll repeat at every MCP-related event for the next year: &lt;strong&gt;when an MCP server ships parameter properties without descriptions, no automated tool can guess valid arguments.&lt;/strong&gt; Not mcp-probe. Not your IDE's autocomplete. Not an LLM trying to call the tool from Claude Desktop. Schema descriptions aren't documentation polish. They're the instruction manual the model is reading every time it picks an argument. They're load-bearing.&lt;/p&gt;

&lt;p&gt;If you maintain an MCP server and you want a quick win, add &lt;code&gt;"description"&lt;/code&gt; to every property in every input schema. The 18 schema warnings on &lt;code&gt;server-filesystem&lt;/code&gt; are not 18 separate problems — they're 18 instances of the same one-line fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The three bugs I fixed in my own client first
&lt;/h2&gt;

&lt;p&gt;Here's the part I want to be honest about. The first time I ran mcp-probe against &lt;code&gt;server-filesystem&lt;/code&gt;, I got 2 of 14 tools passing and a scorecard that screamed FAIL. My instinct was to write a launch post saying "the official filesystem server is broken." I almost did.&lt;/p&gt;

&lt;p&gt;Then I actually read my own output. Most of those failures were because my client was sending arguments the server had no way to accept. A diagnostic tool is only credible if it can distinguish "your server is broken" from "I sent garbage." Stress-testing forced that distinction, and three commits came out of it before I trusted the scorecard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commit &lt;code&gt;3825170&lt;/code&gt; — show the args we sent on every failure.&lt;/strong&gt; When a tool or prompt call fails, mcp-probe now prints the exact JSON it sent alongside the server's error response. Before this, a failure looked like &lt;code&gt;MCP error -32603: Invalid resourceType: test&lt;/code&gt; with no indication that &lt;code&gt;"test"&lt;/code&gt; was something my client had auto-generated. After this, you can read the failure and immediately tell whether the server rejected something reasonable or something nonsense. This is the smallest of the three changes and the most important one for the trust story.&lt;/p&gt;
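
&lt;p&gt;The idea is simple enough to sketch in a few lines of Python. The function name and the report layout below are assumptions made for illustration and do not reproduce mcp-probe's real output format; the error string is the one quoted above.&lt;/p&gt;

```python
import json


def format_failure(name, sent_args, error_message):
    """Render a failed call together with the exact arguments that were sent."""
    return "\n".join(
        [
            "FAIL  " + name,
            "  sent:  " + json.dumps(sent_args, sort_keys=True),
            "  error: " + error_message,
        ]
    )


report = format_failure(
    "get-resource-reference",
    {"resourceType": "test"},
    "MCP error -32603: Invalid resourceType: test",
)
print(report)
```

&lt;p&gt;With the sent arguments on the line above the error, a reader can tell at a glance that &lt;code&gt;"test"&lt;/code&gt; came from the client and not from the server.&lt;/p&gt;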

&lt;p&gt;&lt;strong&gt;Commit &lt;code&gt;ce4f55e&lt;/code&gt; — sandbox-aware paths.&lt;/strong&gt; &lt;code&gt;server-filesystem&lt;/code&gt; enforces an allowed-directory sandbox. mcp-probe now calls &lt;code&gt;list_allowed_directories&lt;/code&gt; before generating sample arguments and uses one of those directories as the default for any &lt;code&gt;path&lt;/code&gt;-shaped parameter. On macOS, where &lt;code&gt;/tmp&lt;/code&gt; is a symlink to &lt;code&gt;/private/tmp&lt;/code&gt;, it normalizes via &lt;code&gt;realpath&lt;/code&gt; so the path the server receives matches what the sandbox check expects. This single commit moved &lt;code&gt;server-filesystem&lt;/code&gt; from 2 of 14 passing to 8 of 14. The remaining 6 are the missing-description cases I already covered — the bugs that aren't mine.&lt;/p&gt;
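
&lt;p&gt;The symlink normalization can be shown in isolation. The Python sketch below uses a temporary directory and a symlink as a stand-in for &lt;code&gt;/tmp&lt;/code&gt; pointing at &lt;code&gt;/private/tmp&lt;/code&gt;. The helper name &lt;code&gt;inside_allowed&lt;/code&gt; is hypothetical, the example assumes a POSIX system where creating symlinks is permitted, and it is not the code from the commit.&lt;/p&gt;

```python
import os
import tempfile


def inside_allowed(path, allowed_dirs):
    """True when `path` resolves to a location under one of `allowed_dirs`.

    Both sides go through realpath so that a symlinked prefix compares
    equal to its target.
    """
    real = os.path.realpath(path)
    for allowed in allowed_dirs:
        root = os.path.realpath(allowed)
        if real == root or real.startswith(root + os.sep):
            return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "sandbox")
    os.mkdir(target)
    link = os.path.join(workdir, "alias")
    os.symlink(target, link)  # stands in for /tmp pointing at /private/tmp
    # The path goes through the symlink, yet still resolves inside the sandbox.
    print(inside_allowed(os.path.join(link, "notes.txt"), [target]))
```

&lt;p&gt;Comparing the raw strings instead would reject the path, because the symlinked prefix and the resolved sandbox directory do not share a textual prefix.&lt;/p&gt;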

&lt;p&gt;&lt;strong&gt;Prompt-argument enum extractor.&lt;/strong&gt; When a prompt argument is described in prose like &lt;code&gt;"one of: Text, Blob"&lt;/code&gt; instead of as a JSON Schema enum, mcp-probe now tries to parse the allowed values out of the description string and pick one. Partial — it works on the prompts that have prose-level documentation, and it does nothing for arguments like &lt;code&gt;resourceType&lt;/code&gt; on &lt;code&gt;server-everything&lt;/code&gt; that have neither schema enum nor prose description. This is why the schema-description finding above isn't theoretical: I built the workaround, and the workaround can't help when there's no text to read.&lt;/p&gt;
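
&lt;p&gt;A prose extractor of this kind can be approximated with a regular expression. The Python sketch below is a hypothetical version, not mcp-probe's implementation, and it only recognizes the "one of" phrasing.&lt;/p&gt;

```python
import re


def enum_from_prose(description):
    """Extract allowed values from text like "one of: Text, Blob".

    Hypothetical parser; returns an empty list when the pattern is absent.
    """
    match = re.search(r"one of:?\s*([^.;\n]+)", description or "", re.IGNORECASE)
    if not match:
        return []
    raw_values = re.split(r",|\bor\b", match.group(1))
    return [value.strip().strip("\"'") for value in raw_values if value.strip()]


print(enum_from_prose("The type of resource, one of: Text, Blob"))
print(enum_from_prose(None))
```

&lt;p&gt;The second call is the case that matters: with no description there is no text to parse, so the extractor returns nothing and the generator is back to guessing.&lt;/p&gt;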

&lt;p&gt;The loop, in one sentence: I had to make my client honest about what it was sending before I could call any server's failure a server bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @incultnitollc/mcp-probe
mcp-probe &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="s2"&gt;"npx -y @modelcontextprotocol/server-memory"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/incultnitollc/mcp-probe" rel="noopener noreferrer"&gt;github.com/incultnitollc/mcp-probe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/@incultnitollc/mcp-probe" rel="noopener noreferrer"&gt;@incultnitollc/mcp-probe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Raw scorecards from this post: &lt;a href="https://github.com/incultnitollc/mcp-probe/tree/main/docs/scorecards" rel="noopener noreferrer"&gt;&lt;code&gt;docs/scorecards/&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pre-publish checklist for MCP server maintainers: &lt;a href="https://github.com/incultnitollc/mcp-probe/blob/main/docs/checklist.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/checklist.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you maintain an MCP server and you want a scorecard run against it, open an issue with the &lt;a href="https://github.com/incultnitollc/mcp-probe/issues/new?template=test_my_server.yml" rel="noopener noreferrer"&gt;test-my-server template&lt;/a&gt; and I'll post the results as a comment. If mcp-probe reports something that looks like a server bug and isn't, open an issue against mcp-probe instead — that's the loop that produced commits &lt;code&gt;3825170&lt;/code&gt; and &lt;code&gt;ce4f55e&lt;/code&gt;, and it's the only way the diagnostic gets more trustworthy.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>claude</category>
      <category>devtools</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
