DEV Community

FlareCanary

MCP Servers Are APIs — Monitor Them Like APIs

Your AI agent discovers tools via MCP. Those tools change. Your agent doesn't crash — it confidently returns wrong results.

If that sounds familiar, it's the same problem REST APIs have had for years. But MCP makes it worse.

The discovery flow that breaks silently

Here's how MCP works in practice:

  1. Agent connects to an MCP server
  2. Agent calls tools/list to discover available tools
  3. Agent reads tool schemas — names, parameters, return types
  4. Agent calls tools as needed
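Under the hood, steps 1–3 are plain JSON-RPC 2.0 over the Streamable HTTP transport. A minimal sketch of the two messages an agent sends during discovery (method and field names follow the MCP spec; the client name and endpoint URL are placeholders):

```python
import json

MCP_URL = "https://example.com/mcp"  # hypothetical Streamable HTTP endpoint

# First message: the initialize handshake.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "demo-agent", "version": "0.1.0"},
    },
}

# Second message: discover the tool catalog.
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {},
}

# The server's tools/list result carries the schemas the agent caches:
# [{"name": ..., "description": ..., "inputSchema": {...}}, ...]
print(json.dumps(list_tools_request, indent=2))
```

Everything the agent knows about the tools comes from that one `tools/list` response — which is exactly why a stale cache of it is dangerous.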

This works beautifully... until the MCP server updates.

Tools get renamed. Parameters become required. Return schemas evolve. The server doesn't version these changes. There's no changelog. There's no deprecation header.

Your agent's cached understanding of the tool catalog goes stale.

Why MCP drift is worse than REST drift

When a REST API changes, your code usually fails loudly:

```
TypeError: Cannot read property 'tracking_number' of undefined
HTTP 400: Missing required parameter 'format'
```

Noisy failures. You notice them. You debug them.

When an MCP tool changes, the failure is different. The LLM receives unexpected data and adapts. It doesn't crash. It doesn't throw an error. It processes the wrong data and confidently returns incorrect results.

Your monitoring dashboard shows green. Your agent is silently broken.

What MCP tool drift looks like

Here are real patterns we're seeing as the MCP ecosystem matures:

Tool renamed: search_docs → query_knowledge_base

The agent calls the old name. The server returns "tool not found." The LLM wraps this in a helpful-sounding response: "I wasn't able to find any relevant documents." The user thinks there are no results. There are — the agent just called the wrong tool.

Required parameter added

A new format parameter becomes required. The agent doesn't know about it, omits it, and gets whatever the default behavior is. Maybe it was returning JSON and now returns XML. The LLM parses XML tags as content and delivers garbled results.
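A toy reproduction of that failure mode (the tool, its format parameter, and the default are all hypothetical): the old agent omits the newly added argument and silently gets the new default behavior.

```python
# Hypothetical server-side tool: a `format` parameter was added and the
# default response format changed to XML. Older agents never send it.
def search_docs(query: str, format: str = "xml") -> str:
    if format == "json":
        return '{"results": ["doc1"]}'
    return "<results><item>doc1</item></results>"

# Old agent call, written back when JSON was the only behavior:
response = search_docs("deployment guide")
# The agent's JSON parsing now sees XML tags as content -- no exception,
# just garbled downstream output.
print(response)  # <results><item>doc1</item></results>
```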

Output schema changed: results → matches

The agent's prompt says "extract items from the results array." The server now returns a matches array. The agent finds no results, and tells the user "no results found." Zero errors in your logs.
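The cruel part is that defensive-looking code makes this worse, not better. A sketch (the extraction helper is hypothetical; the field names come from the example above):

```python
# A tolerant .get() turns a schema rename into a silent "no results".
def extract_items(payload: dict) -> list:
    return payload.get("results", [])  # hides the drift instead of failing

old_response = {"results": ["doc-a", "doc-b"]}
new_response = {"matches": ["doc-a", "doc-b"]}  # server renamed the key

print(extract_items(old_response))  # ['doc-a', 'doc-b']
print(extract_items(new_response))  # [] -- no error, no results
```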

The monitoring gap

LLM observability tools — Langfuse, Arize, LangSmith — monitor your agent's behavior: traces, token usage, latency, evaluation scores. They're watching your application.

But none of them monitor the MCP servers your agent depends on. When a tool schema changes upstream, your observability dashboard catches the symptoms (degraded output quality, user complaints) but not the cause (the tool schema drifted).

By the time you notice, the damage is done. Users got wrong answers. Workflows failed silently. Trust eroded.

How to monitor MCP tool schemas

The approach is the same one that works for REST APIs: establish a baseline and continuously diff against it.

For MCP servers specifically:

  1. Connect to the MCP endpoint via Streamable HTTP
  2. Call tools/list to discover the full tool catalog
  3. Snapshot the schemas — tool names, parameter names, types, required flags, return types
  4. Poll on a schedule — hourly, daily, whatever matches your risk tolerance
  5. Diff each poll against the baseline — flag additions, removals, and modifications
  6. Classify severity — new optional tool = informational, removed tool = breaking, renamed parameter = breaking
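The diff-and-classify core of those steps fits in a few dozen lines. A minimal sketch, assuming catalogs are stored as `{tool_name: schema}` dicts keyed off the `tools/list` response (the severity labels are illustrative):

```python
def diff_catalogs(baseline: dict, current: dict) -> list:
    """Compare two {tool_name: schema} snapshots and classify each change."""
    changes = []
    # Removed tools break any agent still calling them.
    for name in baseline.keys() - current.keys():
        changes.append(("breaking", f"tool removed: {name}"))
    # New tools are informational -- existing agents ignore them.
    for name in current.keys() - baseline.keys():
        changes.append(("info", f"tool added: {name}"))
    # For surviving tools, watch the required-parameter set.
    for name in baseline.keys() & current.keys():
        old_req = set(baseline[name].get("required", []))
        new_req = set(current[name].get("required", []))
        for param in new_req - old_req:
            changes.append(("breaking", f"{name}: parameter now required: {param}"))
        for param in old_req - new_req:
            changes.append(("info", f"{name}: parameter no longer required: {param}"))
    return changes

baseline = {"search_docs": {"required": ["query"]}}
current = {"query_knowledge_base": {"required": ["query", "format"]}}
for severity, detail in diff_catalogs(baseline, current):
    print(severity, detail)
```

Note that a rename shows up as a removal plus an addition — both worth alerting on, since the removal is what breaks the agent.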

This is what we built into FlareCanary. You register an MCP endpoint the same way you register a REST endpoint — point it at the Streamable HTTP URL, set a check interval, and FlareCanary handles the initialize → tools/list lifecycle and tracks the tool catalog over time.

When a tool changes, you get an alert with the exact diff: what changed, what severity, and when it happened.

The MCP ecosystem is early — that's the risk

MCP servers are changing fast right now. The protocol is new. Implementations are evolving. Tool schemas are being refactored as maintainers figure out the right abstractions.

The scale is already real. MCP SDKs see 97 million monthly downloads across Python and TypeScript. Over 10,000 active servers are in the wild. Pinterest just published their production MCP deployment — a fleet of domain-specific servers handling 66,000 monthly invocations across 844 engineers. That's not experimentation. That's infrastructure.

And infrastructure needs monitoring.

This is exactly the period when monitoring matters most. Stable, mature APIs rarely surprise you. Fast-moving, actively-developed integrations surprise you constantly.

If your AI agents depend on MCP servers — and increasingly, they do — monitoring the tool schemas is not optional. It's the same operational hygiene that mature teams apply to REST API dependencies, adapted for a new protocol.

Getting started

If you want to try this today:

  1. Sign up at flarecanary.com — free tier, no credit card
  2. Add your MCP endpoint — paste the Streamable HTTP URL
  3. FlareCanary discovers the tool catalog and establishes a baseline
  4. You get alerts when tools change — email or webhook

Five endpoints free, daily checks, 7-day history. Enough to cover the MCP servers your most critical agents depend on.


FlareCanary monitors REST APIs and MCP servers for schema drift. Catch breaking changes before your users do.
