Martin Havel

Posted on Jun 10

Your MCP Server Passes Every Test — and Claude Still Rejects the Tool

#mcp #testing #typescript #ai

We shipped what looked like a routine improvement to one of our MCP tools: a declared outputSchema, generated from our existing Zod types. Server-side smoke tests passed. The structured output validated cleanly against JSON Schema 2020-12 with an independent validator. We deployed.

Then Claude Desktop refused to call the tool at all.

This post is a write-up of that incident—what failed, why all our tests missed it, and what an actual end-to-end test for an MCP server needs to look like. One caveat up front: this is n=1, observed on our specific setup (@modelcontextprotocol/sdk ^1.10, zodToJsonSchema, Claude Desktop as the client, June 2026). Treat it as a mechanism worth knowing about, not a universal law.

"MCP tool not showing up": outputSchema rejected by the client ingest layer

That heading is the literal symptom, because it's what you'll be searching for at 11 p.m. The variants we tried ourselves: MCP tool not showing up, outputSchema rejected, MCP tool ingest failed, Claude Desktop tool error request_id.

Here's what we saw, concretely:

Calling the tool from Claude Desktop produced an error carrying a request_id—the request reached Anthropic's side and was rejected there.
Our server logs showed a new session line and then... nothing. No [tool] invocation line. The call never arrived at our handler.
A curl against our own server with the same payload? Worked perfectly. Valid response, valid structuredContent.

So the tool definition itself was being rejected somewhere between the client and our server—at what I'll call the tool-ingest layer: the validation Anthropic's infrastructure runs on tool definitions before the model is ever allowed to use them. Our server was never consulted.

The setup: adding outputSchema via zodToJsonSchema

The tool in question was watch_entity from our open-source Czech company due-diligence MCP server (cz-agents on GitHub). We already returned structuredContent and wanted to declare its shape, which the MCP spec supports via outputSchema:

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const watchEntityOutput = z.object({
  ico: z.string(),
  status: z.literal("watching"),
  expires_at: z.string().nullable(),  // <- this matters later
});

server.registerTool(
  "watch_entity",
  {
    description: "Watch a Czech company for changes",
    inputSchema: watchEntityInput,
    outputSchema: zodToJsonSchema(watchEntityOutput), // <- the change
  },
  handler
);

Looks innocent. But look at what zodToJsonSchema actually emits by default:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "ico": { "type": "string" },
    "status": { "const": "watching" },
    "expires_at": {
      "anyOf": [
        { "type": "string" },
        { "type": "null" }
      ]
    }
  },
  "required": ["ico", "status", "expires_at"]
}

Three constructs in this schema are worth noticing:

The $schema URI is draft-07, not the JSON Schema 2020-12 dialect the MCP spec gravitates toward.
z.null() and .nullable() produce "type": "null" branches.
z.literal() produces const.

All three are completely standard JSON Schema. Every off-the-shelf validator we threw at this—including an independent jsonschema check of our structuredContent against the schema—passed without complaint.

But the ingest layer that vets tool definitions on Anthropic's side was, at the time we hit this, stricter than a generic validator. It didn't accept this combination, and the failure mode wasn't "schema ignored"—it was the entire tool being rejected.

I want to be careful with framing here: this isn't a story about anyone doing something wrong. The SDK emitted valid draft-07. Our validator correctly validated it. The ingest layer enforced a tighter profile than "any valid JSON Schema." Validation layers differ, and the only place all of them meet is a live round-trip. That lesson survives even if the ingest layer accepts these constructs tomorrow.
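If you want to know whether a generated schema contains any of the three constructs before it goes anywhere near a client, a small walker is enough. This is a minimal sketch, not part of any SDK: findRiskyConstructs and the Finding type are names I made up for illustration, and the list of "risky" constructs reflects only what we observed, not a documented rule of the ingest layer.

```typescript
// Hypothetical helper: reports where the three constructs discussed above occur.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Finding {
  path: string; // JSON-pointer-like location inside the schema
  construct: "draft-07 $schema" | "type: null" | "const";
}

function findRiskyConstructs(node: Json, path = "#"): Finding[] {
  const findings: Finding[] = [];
  if (Array.isArray(node)) {
    node.forEach((child, i) => findings.push(...findRiskyConstructs(child, `${path}/${i}`)));
    return findings;
  }
  if (node === null || typeof node !== "object") return findings;

  const schemaUri = node["$schema"];
  if (typeof schemaUri === "string" && schemaUri.indexOf("draft-07") !== -1) {
    findings.push({ path, construct: "draft-07 $schema" });
  }
  const type = node["type"];
  if (type === "null" || (Array.isArray(type) && type.some((t) => t === "null"))) {
    findings.push({ path, construct: "type: null" });
  }
  if ("const" in node) findings.push({ path, construct: "const" });

  for (const [key, child] of Object.entries(node)) {
    if (key === "properties" && child !== null && typeof child === "object" && !Array.isArray(child)) {
      // Keys under "properties" are field names, not keywords, so step over the map itself.
      for (const [name, sub] of Object.entries(child)) {
        findings.push(...findRiskyConstructs(sub, `${path}/properties/${name}`));
      }
    } else {
      findings.push(...findRiskyConstructs(child, `${path}/${key}`));
    }
  }
  return findings;
}

// The schema shown earlier in this post.
const generated: Json = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    ico: { type: "string" },
    status: { const: "watching" },
    expires_at: { anyOf: [{ type: "string" }, { type: "null" }] },
  },
  required: ["ico", "status", "expires_at"],
};

const findings = findRiskyConstructs(generated);
for (const f of findings) console.log(`${f.path}: ${f.construct}`);
```

It is a heuristic (it does not special-case every keyword that holds plain data), but it turns "inspect the generated schema" into something a unit test can assert on.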

The diagnostic key: asymmetry

What saved us from days of wrong theories was one observation: only the tool with outputSchema failed. Our text-only tool on the same server, same session, same deploy—get_dd_report—kept working the whole time.

That asymmetry rules out almost everything else you'd suspect first:

A network or transport issue would hit both tools.
A server crash or bad deploy would hit both tools.
A client-side transient would not reproduce selectively, every time, on exactly one tool.

A generic outage doesn't aim. When one tool fails deterministically and its siblings don't, diff the tool definitions, not the infrastructure. In our case, the diff was one field: outputSchema.
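The "diff the definitions" step can be as crude as comparing which top-level fields each tool declares. A minimal sketch, assuming you have both definitions as plain objects (for example from a tools/list response); fieldsOnlyIn is a hypothetical helper and the descriptions and input schemas below are placeholders, not our real ones.

```typescript
type ToolDefinition = Record<string, unknown>;

// Lists top-level fields present on `a` but missing from `b`.
function fieldsOnlyIn(a: ToolDefinition, b: ToolDefinition): string[] {
  return Object.keys(a)
    .filter((key) => !(key in b))
    .sort();
}

// Placeholder definitions: only the set of fields matters here.
const failingTool: ToolDefinition = {
  name: "watch_entity",
  description: "Watch a Czech company for changes",
  inputSchema: { type: "object" },
  outputSchema: { type: "object" },
};
const workingTool: ToolDefinition = {
  name: "get_dd_report",
  description: "placeholder description",
  inputSchema: { type: "object" },
};

const suspects = fieldsOnlyIn(failingTool, workingTool);
console.log(suspects);
```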

The second diagnostic key was the log shape. A new session line with no [tool] line means the handshake happened but the tool call was never dispatched to us. Combined with an error that carries a request_id, that places the rejection firmly on the ingest side—not in our process, not in the user's network.
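That log-shape check is mechanical enough to script. The sketch below classifies a window of log lines by the two markers described above. The marker strings are assumptions: substitute whatever your server actually prints for a new session and for a tool invocation. classifyLogWindow is a hypothetical name.

```typescript
// Assumed markers; adjust to your own log format.
const SESSION_MARKER = "new session";
const TOOL_MARKER = "[tool]";

type Verdict = "no-session" | "session-without-tool-call" | "tool-call-reached-server";

function classifyLogWindow(lines: string[]): Verdict {
  const sawSession = lines.some((line) => line.indexOf(SESSION_MARKER) !== -1);
  const sawTool = lines.some((line) => line.indexOf(TOOL_MARKER) !== -1);
  if (sawTool) return "tool-call-reached-server";
  // A session with no tool line suggests the call was stopped before it reached this process.
  return sawSession ? "session-without-tool-call" : "no-session";
}

// Illustrative log lines, not copied from our real logs.
const rejectedUpstream = classifyLogWindow(["new session abc123"]);
const reachedHandler = classifyLogWindow(["new session abc123", "[tool] watch_entity"]);
console.log(rejectedUpstream, reachedHandler);
```

A "session-without-tool-call" verdict on its own is only a hint. It becomes meaningful next to a client error that carries a request_id.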

The fix: drop outputSchema, keep structuredContent

Here's the part that surprised me: structuredContent works fine without a declared outputSchema. The schema declaration is metadata; the structured payload travels regardless.

server.registerTool(
  "watch_entity",
  {
    description: "Watch a Czech company for changes",
    inputSchema: watchEntityInput,
    // outputSchema removed — structuredContent still flows
  },
  async (args) => {
    const result = await watchEntity(args);
    return {
      content: [{ type: "text", text: summarize(result) }],
      structuredContent: result, // <- still delivered to the client
    };
  }
);

We removed the outputSchema declaration, redeployed, and the end-to-end flow worked perfectly: Claude Desktop called the tool, the [tool] line showed up in docker logs, and structured content arrived at the client. You lose client-side schema validation of your output, which is a real (if modest) loss—but you keep the structured data, and you keep a working tool.

If you do want to keep outputSchema, the cautious path based on what we observed is to post-process the generated schema—strip the draft-07 $schema URI, replace "type": "null" branches with a non-null type plus optionality, replace const with a single-value enum—and then test it through a real client before trusting it. Which brings us to the actual point.
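For reference, here is roughly what that post-processing could look like. It is a sketch under the assumptions above, not something we run in production: sanitizeSchema and stripNull are hypothetical helpers, they only handle the constructs from this post, and nothing here guarantees that the result will be accepted by any particular client.

```typescript
type Schema = { [key: string]: unknown };

const isObject = (v: unknown): v is Schema =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Removes a {"type": "null"} alternative from a property schema, if present.
function stripNull(schema: Schema): { schema: Schema; wasNullable: boolean } {
  const anyOf = schema["anyOf"];
  if (Array.isArray(anyOf)) {
    const kept = anyOf.filter((b) => !(isObject(b) && b["type"] === "null"));
    if (kept.length > 0 && kept.length < anyOf.length) {
      const rest: Schema = { ...schema };
      delete rest["anyOf"];
      const only = kept[0];
      // A single remaining branch is inlined; several remain as a smaller anyOf.
      const merged: Schema =
        kept.length === 1 && isObject(only) ? { ...rest, ...only } : { ...rest, anyOf: kept };
      return { schema: merged, wasNullable: true };
    }
  }
  const type = schema["type"];
  if (Array.isArray(type) && type.some((t) => t === "null")) {
    const types = type.filter((t) => t !== "null");
    return { schema: { ...schema, type: types.length === 1 ? types[0] : types }, wasNullable: true };
  }
  return { schema, wasNullable: false };
}

function sanitizeSchema(input: Schema): Schema {
  const out: Schema = {};
  for (const [key, value] of Object.entries(input)) {
    if (key === "$schema" || key === "properties" || key === "required") continue;
    if (key === "const") {
      out["enum"] = [value]; // const -> single-value enum
    } else if (Array.isArray(value)) {
      out[key] = value.map((v) => (isObject(v) ? sanitizeSchema(v) : v));
    } else {
      out[key] = isObject(value) ? sanitizeSchema(value) : value;
    }
  }
  const nowOptional = new Set<string>();
  const properties = input["properties"];
  if (isObject(properties)) {
    const props: Schema = {};
    for (const [name, sub] of Object.entries(properties)) {
      if (!isObject(sub)) { props[name] = sub; continue; }
      const { schema, wasNullable } = stripNull(sub);
      if (wasNullable) nowOptional.add(name); // nullable -> optional
      props[name] = sanitizeSchema(schema);
    }
    out["properties"] = props;
  }
  const required = input["required"];
  if (Array.isArray(required)) {
    out["required"] = required.filter((name) => !nowOptional.has(name));
  }
  return out;
}

// The generated schema shown earlier in this post.
const generated: Schema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    ico: { type: "string" },
    status: { const: "watching" },
    expires_at: { anyOf: [{ type: "string" }, { type: "null" }] },
  },
  required: ["ico", "status", "expires_at"],
};

const sanitized = sanitizeSchema(generated);
console.log(JSON.stringify(sanitized, null, 2));
```

One consequence to keep in mind: once expires_at is optional rather than nullable, the handler has to omit the field instead of returning null, otherwise the structuredContent no longer matches the declared schema.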

Why your curl smoke test will never catch this

Our smoke test did what most MCP smoke tests do: hit the server over HTTP, list tools, call each one, and validate the response. It's a fine test of our code. It exercises exactly zero of the validation that happens on the client/platform side.

The chain for a hosted MCP tool call looks roughly like this:

Claude (model) → Anthropic tool-ingest/validation → your MCP server → back

A curl test starts at step 3. Everything that can reject your tool in steps 1–2—schema dialect restrictions, definition size limits, naming rules, whatever else the platform enforces—is invisible to it. Server-side green means "my half works," and nothing more.

So our deploy checklist gained one non-negotiable step. The real E2E test for an MCP server is a live client round-trip:

Deploy to a staging endpoint.
Connect a real Claude client (Claude Desktop, claude.ai, or Claude Code) to it.
Ask it to invoke every tool—especially any tool whose definition changed, not just its handler.
Watch your server logs for the invocation marker. For us: docker logs -f mcp-server | grep '\[tool\]'. A session line without a tool line indicates a rejection upstream of your server.
Confirm the result rendered correctly in the client.

It's manual, it's slightly annoying, and it takes five minutes. It is also the only test in our suite that would have caught this—and the only one that exercises the same path your users do.
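The round-trip itself stays manual, but knowing which tools most need it does not have to be. One option is to keep a snapshot of the tool definitions from the last good deploy and compare it with the current one. The sketch below assumes both snapshots are arrays of plain objects with a name field (for instance saved tools/list results); toolsNeedingRoundTrip and canonical are hypothetical helpers, and the sample definitions are placeholders.

```typescript
interface ToolDef {
  name: string;
  [field: string]: unknown;
}

// Serializes with sorted keys so that key order alone does not count as a change.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonical(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns the names of tools that are new or whose definition differs from the snapshot.
function toolsNeedingRoundTrip(previous: ToolDef[], current: ToolDef[]): string[] {
  const before = new Map<string, string>();
  for (const tool of previous) before.set(tool.name, canonical(tool));
  return current.filter((tool) => before.get(tool.name) !== canonical(tool)).map((tool) => tool.name);
}

// Placeholder snapshots.
const lastGoodDeploy: ToolDef[] = [
  { name: "watch_entity", description: "Watch a Czech company for changes", inputSchema: { type: "object" } },
  { name: "get_dd_report", description: "placeholder", inputSchema: { type: "object" } },
];
const thisDeploy: ToolDef[] = [
  {
    name: "watch_entity",
    description: "Watch a Czech company for changes",
    inputSchema: { type: "object" },
    outputSchema: { type: "object" }, // the definition change
  },
  { inputSchema: { type: "object" }, description: "placeholder", name: "get_dd_report" }, // reordered only
];

const changed = toolsNeedingRoundTrip(lastGoodDeploy, thisDeploy);
console.log(changed);
```

This only narrows down where to look first. It does not replace invoking every tool through a real client.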

Takeaways

A tool definition change is riskier than a handler change: it gets re-validated by every layer between you and the model, and the failure mode is the whole tool disappearing.
zodToJsonSchema defaults to draft-07 with type: "null" and const—valid JSON Schema that stricter ingest profiles may not accept. Inspect the generated schema; don't assume.
Selective failure is information. One tool down, siblings up → diff the definitions.
A new session log line without a [tool] log line localizes the rejection upstream of your server.
structuredContent does not require outputSchema. When in doubt, ship the data without the declaration.
Server-side smoke tests validate your half of the contract. Only a live Claude client round-trip validates the whole thing. Put one in your deploy checklist.

Observed June 2026 on @modelcontextprotocol/sdk ^1.10 with Claude Desktop. If the ingest behavior has changed since, the specific constructs may pass; the testing lesson stands either way. The server involved is open source: cz-agents MCP servers.

Top comments (6)

Alex Shev • Jun 11

This is a good reminder that MCP testing needs to include the client’s decision layer, not just the server contract. A tool can be technically valid and still be unattractive or ambiguous to the model.

I would test three things separately: schema validity, tool-call ergonomics, and end-to-end model behavior. If Claude refuses the tool, the failure may be naming, output shape, ambiguity, or missing examples rather than the server endpoint itself.

Martin Havel • Jun 12

Exactly — those three layers are the right split, and in my case all three passed "on paper." The failure lived entirely in the last one: end-to-end model behavior.

The actual culprit was output shape — a declared outputSchema. The schema was legal and the tool validated, but Claude wouldn't reliably accept and surface it. The fix wasn't tightening the schema, it was dropping structuredContent altogether: return two content blocks instead — a human-readable markdown summary first, then a byte-identical raw JSON block. The model surfaces the summary, agents and power-users still get the raw.

And on the endpoint not being the issue — what finally caught it was talking to the real client, not curl. A schema test can't tell you the model quietly declined to use the tool, or surfaced its own paraphrase instead of your output. So "does Claude actually pick it up and surface it as-is" is now a separate manual check for me on every change.

Alex Shev • Jun 12

That output-shape detail is really useful. It is a good example of why "valid schema" and "usable by the model client" are different checks.

I like the two-block pattern: human summary first, raw JSON second. It gives the model something stable to surface without forcing it to paraphrase the machine-readable part. The manual client check is probably unavoidable until MCP test suites include model-facing acceptance, not only protocol acceptance.

Martin Havel • Jun 12

Agreed ... "model-facing acceptance vs protocol acceptance" is a clean way to name the gap, and it's exactly the line current test suites don't cross. Until they do, the manual client check is just the cost of doing business; I'd rather pay it than ship a tool the model silently won't touch.

If anyone's found a decent way to automate that model-facing step — actually driving a model client in CI, not just asserting the protocol — I'd be glad to hear it.

Alex Shev • Jun 13

The closest pattern I have seen is a smoke test that drives the real client with a tiny fixture task and asserts on the transcript, not just the tool response. It is still imperfect and slower than protocol tests, but it catches the “model saw it and still refused to use it” class of failure. I think that layer will become normal for serious MCP tools.

Adam Lewis • Jun 11

This is a tidy real-world case of the gap between validating against the spec and validating against the client. The tests checked the schema was legal, but the thing that mattered was whether one specific consumer would accept it, and only a test that talks to that consumer could have caught it. We've hit the same thing outside MCP, contract tests passing while the actual integration fails, which is why our critical paths now run against the real dependency on every merge.