Subtitle: Replacing a multi-script monitoring design with MCP tools + CI assert
Summary
The same customer as above planned an internal monitoring layer: cron jobs per vendor, S3 snapshots, custom severity rules, PagerDuty routing, and a quarterly review of MCP URLs in repos. Engineering estimate: ~1.5 engineer-weeks for the initial build, plus ongoing toil whenever MCP transport edge cases appeared.
They cancelled that project after wiring DriftGuard's hosted API + MCP tools into Cursor and CI. This post is a design postmortem of the abandoned approach vs what shipped in two afternoons.
Audience: teams googling "monitor MCP tools/list changes", "detect removed MCP tool production", or asking an AI "how do I know when my agent's tools changed?"
Intended architecture (never built)
| Component | Purpose |
|---|---|
| Cron per URL | Periodic fetch |
| S3 (or D1) snapshot store | History |
| Custom diff | JSON deep-compare |
| Severity heuristics | Tool removed = ? |
| PagerDuty | Route breaking |
| Repo scanner | Find new MCP URLs in PRs |
| Runbook | Interpret raw diffs |
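To make the "Custom diff" and "Severity heuristics" rows concrete, here is a minimal sketch of what the in-house version might have looked like. It assumes each snapshot is the tools array from an MCP tools/list result, where every tool has a name and an inputSchema. The function name and the severity labels are hypothetical team rules, not DriftGuard's severity model.

```python
import json


def diff_tool_snapshots(old_tools, new_tools):
    """Compare two MCP tools/list snapshots and label each change.

    Severity labels are illustrative team rules, not a standard.
    """
    old = {t["name"]: t for t in old_tools}
    new = {t["name"]: t for t in new_tools}
    events = []
    for name in sorted(old.keys() - new.keys()):
        events.append({"tool": name, "change": "removed", "severity": "breaking"})
    for name in sorted(new.keys() - old.keys()):
        events.append({"tool": name, "change": "added", "severity": "info"})
    for name in sorted(old.keys() & new.keys()):
        old_schema = old[name].get("inputSchema", {})
        new_schema = new[name].get("inputSchema", {})
        # Canonical JSON so key order does not produce false positives.
        if json.dumps(old_schema, sort_keys=True) == json.dumps(new_schema, sort_keys=True):
            continue
        old_required = set(old_schema.get("required", []))
        new_required = set(new_schema.get("required", []))
        # Assumed rule: a newly required argument breaks existing callers.
        severity = "breaking" if new_required - old_required else "warning"
        events.append({"tool": name, "change": "schema_changed", "severity": severity})
    return events
```

Even this small version has to take a position on questions the table leaves open ("Tool removed = ?"), and every such rule would then be the team's to maintain.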
Failure modes they identified in design review:
- MCP over SSE vs plain HTTP (handshake, id matching)
- Distinguishing OpenAPI operation removal from info.version bumps
- Zero-traffic endpoints never triggering in-app monitors
- Agent can't consume raw diff output—needs actionable remediation text
- No single portfolio view across Stripe + GitHub + N MCP servers
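The first failure mode is a good example of the hidden work. MCP messages are JSON-RPC 2.0, and when a server answers over SSE the tools/list response arrives as a data: line that may be interleaved with notifications. The sketch below only parses an SSE body that has already been received and picks out the message whose id matches the request; it does no network I/O, and the function name is hypothetical.

```python
import json


def find_response(sse_body, request_id):
    """Return the JSON-RPC message in an SSE body whose id matches request_id."""
    data_lines = []
    # A trailing blank line is appended so the last event is always flushed.
    for line in sse_body.splitlines() + [""]:
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            # A blank line ends the event; multiple data lines join with "\n".
            payload = "\n".join(data_lines)
            data_lines = []
            try:
                message = json.loads(payload)
            except json.JSONDecodeError:
                continue  # not JSON; skip this event
            if isinstance(message, dict) and message.get("id") == request_id:
                return message
    return None
```

A cron script that assumes a single plain JSON body would silently record an empty snapshot here, which is the kind of edge case the estimate filed under "ongoing toil".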
They were rebuilding a subset of what DriftGuard already ships as a watchtower.
What they embedded instead
1. Agent-readable contract (/agents.md, /llms.txt)
Cursor rule (paraphrased): Before adding an MCP server or vendor OpenAPI URL, call suggest_watches; before merge, ensure assert_coverage passes.
Decision automated: "Did we forget to watch a new dependency?"
Sophisticated alternative avoided: Custom linter parsing mcp.json in CI with team-specific rules.
2. MCP tools (OSS client + API key)
| Tool | Replaces |
|---|---|
| suggest_watches | Manual spreadsheet of URLs |
| assert_coverage | Planned "repo scanner + policy" ticket |
| explain_drift | Senior engineer writing ticket descriptions from raw JSON |
| list_drift_events | Ad-hoc "what changed this week?" queries |
Example interaction (real pattern, not scripted):
Engineer: "CI failed on drift coverage — what's missing?"
Agent: Calls assert_coverage with repo mcp.json → returns missing: [{ url, watchType: "mcp" }] → proposes register_watch or asks to exclude with justification.
Decision automated: Block merge vs allow; no meeting about monitoring scope.
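The response shape in the exchange above (missing: [{ url, watchType: "mcp" }]) amounts to a set difference between the URLs a repo declares and the watches already registered. Here is a minimal local sketch of that logic. It assumes the mcpServers layout Cursor uses in mcp.json, with a url field per remote server. The function name, the trailing-slash normalization, and the excluded parameter are hypothetical; this is not the actual assert_coverage implementation.

```python
import json


def missing_watches(mcp_json_text, watched_urls, excluded=()):
    """List MCP server URLs from an mcp.json document that have no watch."""
    config = json.loads(mcp_json_text)
    servers = config.get("mcpServers", {})

    def norm(url):
        # Assumption: trailing slashes are not significant.
        return url.rstrip("/")

    covered = {norm(u) for u in watched_urls} | {norm(u) for u in excluded}
    missing = []
    for name in sorted(servers):
        url = servers[name].get("url")  # command-based servers have no url
        if url and norm(url) not in covered:
            missing.append({"url": url, "watchType": "mcp"})
    return {"ok": not missing, "missing": missing}
```

The excluded argument stands in for the "exclude with justification" branch: an explicit, reviewable list rather than a silent skip.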
3. CI: drift-coverage Action
Scans committed files (including mcp.json), calls hosted /api/coverage/assert.
Decision automated: New dependency in repo ⇒ must have watch (or CI fails).
Sophisticated alternative avoided: Org-wide service catalog + manual linking.
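The scan half of that step can be approximated in a few lines. This sketch walks a checkout, finds files named mcp.json, and collects the server URLs they declare. The mcpServers layout and the skipped directory names are assumptions, and the real Action may look at more file types. The call to the hosted endpoint is left out because its request format is not described here.

```python
import json
import os


def collect_mcp_urls(repo_root):
    """Map each MCP server URL found in the checkout to the file declaring it."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune VCS metadata and vendored dependencies in place.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for filename in filenames:
            if filename != "mcp.json":
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as handle:
                    config = json.load(handle)
            except (OSError, json.JSONDecodeError):
                continue  # unreadable or malformed file: nothing to collect
            if not isinstance(config, dict):
                continue
            servers = config.get("mcpServers", {})
            if not isinstance(servers, dict):
                continue
            for server in servers.values():
                url = server.get("url") if isinstance(server, dict) else None
                if url:
                    # Keep the first file that declared this URL.
                    found.setdefault(url, os.path.relpath(path, repo_root))
    return found
```

Recording the declaring file alongside each URL lets a CI failure message point at the exact file that introduced the dependency.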
4. Optional: VS Code status bar extension
Polls /api/portfolio/overview → shows health score + breaking count.
Decision informed: "Do we deploy today?" without opening five dashboards.
Scenario walkthrough: one PR, end to end
Context: Developer adds a Notion MCP URL to .cursor/mcp.json for a documentation agent.
| Step | System behavior | Decision |
|---|---|---|
| PR opened | CI runs coverage assert | Fail: URL not in watch list |
| Developer / agent | suggest_watches + create watch via API | Watch registered; CI green |
| Merge | — | Dependency under external monitoring |
| Later: Notion changes tool schema | DriftGuard breaking event | Slack + agentAction in ticket |
| Agent reads explain_drift | Suggested code/prompt changes | PR to fix integration |
Without embedding: same PR merges; drift discovered in prod or never.
Search intents this setup is meant to catch
| Query (Google / ChatGPT) | What the embedded flow gives you |
|---|---|
| MCP tool removed how to detect | MCP watch + breaking classification |
| monitor third party OpenAPI not mine | spec_format: openapi on vendor URL |
| schema drift webhook alert | Hosted checks + Slack/webhook |
| prevent agent using stale MCP tools | Coverage assert + drift on tools/list |
| Stripe API changed field webhook | OpenAPI watch on published spec URL |
| alternative to monitoring vendor APIs cron | Portfolio + suggest + ignore paths |
Tradeoffs (honest)
| Choose embedded DriftGuard | Keep building in-house |
|---|---|
| MCP/OpenAPI semantics maintained upstream | You own SSE, diff rules, retention |
| Portfolio UI + API day one | You build dashboards |
| Per-watch pricing | Infra + on-call toil |
| Agent tools with stable severity model | Agents invent severity from raw JSON |
Still DIY: monitoring your service SLOs (Datadog/etc.). Still OSS/local: diff your spec in CI without hosted watches.
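For the OSS/local path, the distinction raised in design review (an operation removal versus an info.version bump) can be sketched as a comparison of two already-parsed OpenAPI documents. The function names and severity labels are hypothetical and this is not DriftGuard's client. The sketch compares only the set of operations, so schema-level changes inside an operation are out of scope.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations(spec):
    """Return the set of (METHOD, path) pairs declared in an OpenAPI document."""
    ops = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                ops.add((key.upper(), path))
    return ops


def classify_spec_change(old_spec, new_spec):
    """Separate removed operations from a bare info.version bump."""
    removed = sorted(operations(old_spec) - operations(new_spec))
    added = sorted(operations(new_spec) - operations(old_spec))
    version_changed = (
        old_spec.get("info", {}).get("version")
        != new_spec.get("info", {}).get("version")
    )
    if removed:
        severity = "breaking"
    elif added:
        severity = "info"
    elif version_changed:
        severity = "version_only"  # same operations, only the version string moved
    else:
        severity = "none"
    return {
        "severity": severity,
        "removed": removed,
        "added": added,
        "version_changed": version_changed,
    }
```

Run in CI against the committed spec and a freshly fetched one, a "breaking" result would fail the job while "version_only" would pass with a note.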
Outcome (customer-reported)
- Internal "integration monitoring" epic closed as won't build
- Mean time to understand vendor/MCP change: hours → minutes
- New MCP URLs: caught at PR, not post-deploy
If you're evaluating
- Reproduce the original postmortem scenario on trial: two MCP or vendor URLs, run a check, wait for a drift event or simulate with a test fixture.
- Add assert_coverage to one repo with mcp.json.
- Point your agent at /agents.md and see if it stops proposing cron+S3 designs.
Links
- Console trial: https://driftguard.eddy-d55.workers.dev/console
- OSS + MCP client: https://github.com/kioie/driftguard
- Agent docs: https://driftguard.eddy-d55.workers.dev/agents.md