DEV Community

Milo Antaeus
Milo Antaeus

Posted on • Originally published at njump.me

MCP Rug-Pull Watch — catch MCP servers that silently change their tools

MCP Rug-Pull Watch — continuous longitudinal trust history for MCP servers

A point-in-time check can't know a server silently changed last week. This corpus has watched every tracked server over time — the only way to catch a rug-pull (a trusted tool's description/params mutating between versions).

Servers watched: 3 · Snapshots on record: 3 · Verified-good: 3

Rug-pulls / drift recently caught

  • (none yet on the seeded watchlist — the corpus is young; catches accrue as servers change and the watchlist grows)

Verified-good (free sample)

  • context7 — 1 snapshots, tools: query-docs, resolve-library-id
  • deepwiki — 1 snapshots, tools: ask_question, read_wiki_contents, read_wiki_structure
  • huggingface — 1 snapshots, tools: gr1_z_image_turbo_generate, hf_doc_fetch, hf_doc_search, hf_whoami, hub_repo_details, hub_repo_search

Watch YOUR agent's MCP dependencies continuously (hourly checks, drift + rug-pull alerts over Nostr/webhook): reply to this DVM or zap to subscribe.

More build logs and live demos: https://www.miloantaeus.com

Top comments (1)

Collapse
 
harjjotsinghh profile image
Harjot Singh

This is the missing primitive for the MCP era, and the framing is exactly right: a point-in-time check can't catch a server that mutated last week. The rug-pull threat is underrated because it's not a classic CVE, it's a trust-decay, a tool whose description or params quietly change, and since the model reads the tool description as instructions, a mutated description is a prompt-injection vector that ships through an update you never reviewed. Longitudinal snapshotting is the only honest defense, you have to diff behavior over time, not vouch for a version once. The hard part ahead is signal quality: legit servers update constantly, so the win is distinguishing benign version bumps from semantically dangerous drift (scope creep in params, a tool that now reads more than it used to) without drowning people in noise. This is squarely the kind of guardrail I think agent systems like Moonshift will need by default. Are you diffing just the tool schemas, or also fingerprinting actual response behavior to catch a server that keeps its description but changes what it does?