FlareCanary

CI Tests Won't Save You from MCP Schema Drift

There's a growing category of tools that validate MCP server schemas in CI/CD pipelines. Run them on pull requests and catch schema mismatches before you deploy.

This is genuinely useful. But it solves the wrong half of the problem.

The MCP drift problem has two halves

Half 1: Your code drifts from the server. You change your agent code, but the MCP server's tool schemas haven't changed. CI testing catches this — run tests, verify your code still matches the tool definitions.

Half 2: The server drifts from your code. The MCP server updates its tool schemas, but you haven't deployed anything. CI doesn't run because you didn't change anything. Your agent keeps calling tools with the old parameter names, and the LLM silently adapts (or silently fails).

Half 2 is the dangerous one. And CI can't catch it by definition.

Why LLMs make this worse

When a REST API changes, your code throws an error. A missing field causes a TypeError. A renamed endpoint returns a 404. The failure is loud.

When an MCP tool schema changes, the LLM doesn't crash. It adapts. If a parameter gets renamed from search_query to query_text, the LLM might:

  • Pass the old parameter name and get an empty result
  • Interpret the empty result as "no data found" instead of "wrong parameter"
  • Tell the user "I couldn't find any matching documents" — a plausible lie

The agent looks healthy. No errors in your logs. No alerts from your monitoring. The user gets a wrong answer and has no way to know.
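A minimal sketch of that failure mode, using a hypothetical tool whose parameter was renamed. Everything here is illustrative: the tool name, the parameter names, and the lenient unknown-key handling are assumptions, not any specific server's behavior.

```python
# Hypothetical MCP-style tool after a schema change: the parameter was
# renamed from `search_query` to `query_text`. Unknown keys are silently
# ignored rather than rejected, so the stale call "succeeds".
def search_documents(**params):
    query = params.get("query_text")  # old name `search_query` is dropped
    if not query:
        return []  # indistinguishable from "no documents matched"
    return [f"doc matching {query!r}"]

# The agent still uses the pre-rename parameter: no exception, no log
# entry, just an empty result the LLM will narrate as "nothing found".
results = search_documents(search_query="quarterly report")
print(results)  # []
```

The point is that nothing in this code path raises: the only signal is a semantically wrong result.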

What CI testing actually catches

CI-based MCP schema validation is good at:

  • Schema-implementation mismatches: The tool says a parameter is optional, but the server actually requires it.
  • Regression testing: After you change something, verify it still works.
  • Type validation: Ensure your inputs match declared types.

These are real problems worth solving. If you're building MCP servers, you should absolutely have CI tests for your tool schemas.

What CI testing misses

  • Third-party MCP server changes: You don't run CI for someone else's server. When Stripe's MCP server renames a tool parameter, your pipeline doesn't trigger.
  • Between-deploy drift: The MCP server you depend on ships a breaking change on Saturday night. Your agent is broken from Saturday to Monday morning when someone finally notices.
  • Gradual schema evolution: A tool starts accepting a new optional parameter. Two weeks later, the old parameter gets deprecated. A month later, it's removed. CI only sees one snapshot.
  • Runtime behavior changes: The schema says a field is a string. It was always a URL. Now it's a UUID. The type didn't change, but your agent's downstream logic breaks.
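The last bullet is worth making concrete. A sketch of a format probe that catches a string field whose declared type never changed but whose contents shifted from a URL to a UUID; the field values, patterns, and labels are all illustrative assumptions.

```python
import re

# A type check passes in both cases: the schema says "string" before and
# after. A format probe on sampled response values catches the shift.
URL_RE = re.compile(r"^https?://")
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def looks_like(value: str) -> str:
    if URL_RE.match(value):
        return "url"
    if UUID_RE.match(value):
        return "uuid"
    return "unknown"

baseline_format = looks_like("https://example.com/doc/42")
current_format = looks_like("3f2a9c1e-8b4d-4e6f-9a1b-2c3d4e5f6a7b")
drifted = baseline_format != current_format  # True: same type, new format
```

A schema validator sees no diff here; only something watching real response values does.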

The monitoring gap

Most teams have this setup:

```
CI/CD:            Schema tests on deploy ✓
Staging:          Smoke tests ✓
Production:       Uptime checks ✓        (is the server responding?)
Drift monitoring: ???                    (is the server responding *correctly*?)
```

The gap is in the last line. Your uptime check confirms the MCP server returns 200 OK. It doesn't check whether tools/list returns the same tool definitions it returned last week.
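The cheapest version of that last-week comparison is a fingerprint: hash the canonicalized tools/list payload and compare against a stored baseline. A sketch, where `fetch_tools_list` is a stand-in for actually calling the server's tools/list method:

```python
import hashlib
import json

def fetch_tools_list():
    # Stand-in for a real tools/list call over the MCP transport.
    return [{"name": "search", "inputSchema": {
        "type": "object",
        "properties": {"query_text": {"type": "string"}}}}]

def schema_fingerprint(tools) -> str:
    # Canonical JSON (sorted keys) so key ordering can't cause
    # false-positive "drift".
    payload = json.dumps(tools, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

baseline = schema_fingerprint(fetch_tools_list())
# Later, on a schedule: a 200 OK tells you nothing about this.
drifted = schema_fingerprint(fetch_tools_list()) != baseline
```

A fingerprint only tells you *that* something changed, not what; for that you need an actual diff, as described below.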

What continuous MCP monitoring looks like

Instead of (or in addition to) CI-time validation:

  1. Poll tools/list on a schedule — hourly, daily, whatever fits your risk tolerance.
  2. Diff the tool schemas against a known baseline — parameter names, types, required flags, descriptions.
  3. Classify changes by severity — new optional parameter = informational. Renamed required parameter = breaking.
  4. Alert on breaking changes — Slack, email, webhook, whatever gets to the right person.
  5. Maintain a timeline — know when the change happened, not just that it happened.
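Steps 2 and 3 can be sketched as a diff over two tools/list snapshots. The field names follow the MCP tool schema shape (`name`, `inputSchema.properties`, `inputSchema.required`); the severity labels are this sketch's own convention, not anything the protocol defines.

```python
def classify_drift(baseline, current):
    """Diff two tools/list snapshots; return (severity, message) pairs."""
    changes = []
    base = {t["name"]: t for t in baseline}
    curr = {t["name"]: t for t in current}

    for name in base.keys() - curr.keys():
        changes.append(("breaking", f"tool removed: {name}"))
    for name in curr.keys() - base.keys():
        changes.append(("info", f"tool added: {name}"))

    for name in base.keys() & curr.keys():
        b, c = base[name]["inputSchema"], curr[name]["inputSchema"]
        b_props = set(b.get("properties", {}))
        c_props = set(c.get("properties", {}))
        b_req, c_req = set(b.get("required", [])), set(c.get("required", []))
        for p in b_props - c_props:
            sev = "breaking" if p in b_req else "warning"
            changes.append((sev, f"{name}: parameter removed: {p}"))
        for p in c_props - b_props:
            # A newly required parameter breaks existing callers;
            # a new optional one is merely informational.
            sev = "breaking" if p in c_req else "info"
            changes.append((sev, f"{name}: parameter added: {p}"))
    return changes

baseline = [{"name": "search", "inputSchema": {
    "properties": {"search_query": {"type": "string"}},
    "required": ["search_query"]}}]
current = [{"name": "search", "inputSchema": {
    "properties": {"query_text": {"type": "string"}},
    "required": ["query_text"]}}]

for sev, msg in classify_drift(baseline, current):
    print(sev, "-", msg)
# A rename surfaces as a breaking removal plus a breaking addition.
```

From here, step 4 is just filtering for `"breaking"` before firing the alert, and step 5 is appending each non-empty diff to a timestamped log.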

This catches the Saturday-night breaking change before Monday morning. It catches the gradual deprecation cycle. It catches the third-party server update that your CI pipeline will never see.

CI and continuous monitoring are complementary

This isn't an either/or choice.

CI testing validates that your code works with the current MCP server schemas at deploy time. It's a pre-deployment gate.

Continuous monitoring validates that the MCP server schemas haven't changed since your last deploy. It's a post-deployment safety net.

If you only have CI tests, you're assuming the world doesn't change between your deploys. In a world where MCP servers are updated independently by different teams (or different companies), that assumption is broken.

The minimum viable setup:

  1. CI schema validation for MCP servers you build
  2. Continuous monitoring for MCP servers you depend on
  3. Severity-based alerting so you're not drowning in noise

The first catches your mistakes. The second catches everyone else's.


FlareCanary monitors REST APIs and MCP servers for schema drift. Free tier covers 5 endpoints with daily checks — enough to monitor your most critical MCP dependencies.
