xaip-agent

A Claude Code hook that warns you before calling a low-trust MCP server

Last week researchers at Ox published findings showing that the MCP STDIO transport lets arbitrary command execution slip through unchecked, and that 9 of 11 MCP marketplaces they tested were poisonable. Anthropic's response: STDIO is out of scope for protocol-level fixes, the ecosystem is responsible for operational trust.

Fair — Anthropic donated MCP to the Linux Foundation's Agentic AI Foundation in December 2025 specifically so independent infrastructure could grow around it. But that leaves a real gap for anyone running Claude Code today: how do you know whether an MCP server you're about to invoke is trustworthy?

Anthropic's official registry is pure metadata (license, commit count, popularity). mcp-scorecard.ai scores repos, not behavior. BlueRock runs OWASP-style static scans. None of these ask the one question that actually matters:

Does this MCP server, in real call-time use, work?

So I built a small thing to answer it.

The hook

A zero-config Claude Code hook that does two things on every MCP tool call:

  1. Before the call, it queries a public trust API for that server. If the score is low, Claude shows an inline warning:

     ⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts) Risk: high_error_rate

  2. After the call, it emits an Ed25519-signed receipt (success, latency, hashed input/output) to a public aggregator that updates the score.
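Under the hood, the installer registers itself as PreToolUse and PostToolUse hooks in Claude Code's settings. A minimal sketch of what that registration could look like, assuming the standard Claude Code hooks schema (the `pre`/`post` subcommand names are hypothetical, not necessarily what the package writes):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "mcp__.*",
        "hooks": [
          { "type": "command", "command": "xaip-claude-hook pre" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "mcp__.*",
        "hooks": [
          { "type": "command", "command": "xaip-claude-hook post" }
        ]
      }
    ]
  }
}
```

The `mcp__.*` matcher is what scopes the hook to MCP tool calls only; built-in tools like Bash or Edit never hit the trust API.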

Install:

npm install -g xaip-claude-hook
xaip-claude-hook install

Next MCP call fires the hook. That's the whole UX.

What a receipt looks like

No raw content leaves your machine — only hashes.

{
  "agentDid":      "did:web:context7",
  "callerDid":     "did:key:a1c6cd34…",
  "toolName":      "resolve-library-id",
  "taskHash":      "9f3e…",   // sha256(input).slice(0,16)
  "resultHash":    "1b78…",   // sha256(response).slice(0,16)
  "success":       true,
  "latencyMs":     668,
  "failureType":   "",
  "timestamp":     "2026-04-17T04:24:59.925Z",
  "signature":     "...",     // Ed25519 over canonical JSON (agent key)
  "callerSignature": "..."    // Ed25519 over canonical JSON (caller key)
}

The aggregator rejects anything that fails signature verification. The trust API computes a Bayesian score across all verified receipts per server, weighted by caller diversity — so one enthusiastic installer can't fake a reputation.
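The post doesn't publish XAIP's exact formula, but the two ideas it names (a Bayesian score, weighted by caller diversity) can be sketched in a few lines. Everything below is an illustration of the concept, not the aggregator's actual math:

```javascript
// Sketch: Beta(1,1)-prior success rate, with each caller's receipts
// down-weighted to sqrt(count) effective observations so one prolific
// installer can't single-handedly build (or wreck) a reputation.
function trustScore(receipts) {
  const byCaller = new Map();
  for (const r of receipts) {
    const list = byCaller.get(r.callerDid) || [];
    list.push(r);
    byCaller.set(r.callerDid, list);
  }
  let success = 0, total = 0;
  for (const list of byCaller.values()) {
    const w = Math.sqrt(list.length) / list.length; // per-receipt weight
    for (const r of list) {
      total += w;
      if (r.success) success += w;
    }
  }
  // Posterior mean under a uniform Beta(1,1) prior:
  // sparse data gets pulled toward 0.5 instead of jumping to 0 or 1
  return (success + 1) / (total + 2);
}

// 100 successes from a single caller collapse to sqrt(100) = 10 effective
// observations, so the score lands near 11/12 ≈ 0.917 rather than ~0.99
const solo = Array.from({ length: 100 }, () => ({ callerDid: "did:key:a", success: true }));
console.log(trustScore(solo).toFixed(3));
```

The same structure also explains the insufficient_data verdict: with zero receipts the posterior mean is exactly the prior, 0.5, and no sane system issues a verdict off a prior.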

What the scores actually look like right now

Being transparent: the dataset is small. A curl against the live trust API today:

Server       Trust   Verdict     Receipts   Flag
memory       0.800   trusted     112
git          0.775   trusted     35
sqlite       0.753   trusted     42
puppeteer    0.671   caution     32         high_error_rate
context7     0.618   caution     560        low_caller_diversity
filesystem   0.579   caution     610        low_caller_diversity
playwright   0.394   low_trust   37         high_error_rate
fetch        0.365   low_trust   36         high_error_rate

Verify any of these yourself:

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

The low_caller_diversity flag on high-volume servers is the single most honest number in that table. It means: I'm the biggest caller right now, and that's exactly the problem this tool is supposed to solve. The flag only clears when independent installers start generating receipts, which is what the npm package is for.

Why this is architecturally different from existing approaches

Every other "MCP trust" project I've seen scores the repository:

  • Commit frequency, license, stars, contributor count (mcp-scorecard.ai)
  • Static source-code vulnerability scans (BlueRock)
  • Registry inclusion as implicit trust (official MCP registry)

These are useful proxies, but none of them tell you whether a server works in practice. A well-maintained repo can have a buggy release; a single-author repo can be rock solid; a newly-forked malicious repo looks identical to the original under static scan.

XAIP scores observed behavior. Every call is a signed attestation. The scoring is Bayesian, so:

  • Servers with few receipts get insufficient_data — no verdict, no warning
  • High-variance patterns (mixed success/failure) get lower confidence
  • The high_error_rate flag is computed from real response content, classifying quota exceeded, rate limit, unauthorized, and "isError": true as failures
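The failure classification in that last bullet is simple pattern matching over the response. A sketch, with an illustrative pattern list (the real hook's rules may differ):

```javascript
// Heuristic success inference over an MCP tool result.
// Pattern list is illustrative; the shipped hook may classify differently.
const FAILURE_PATTERNS = [
  /quota exceeded/i,
  /rate limit/i,
  /unauthorized/i,
];

function inferSuccess(response) {
  // MCP tool results can carry an explicit isError flag
  if (response.isError === true) return false;
  const text = JSON.stringify(response.content ?? "");
  return !FAILURE_PATTERNS.some((p) => p.test(text));
}

console.log(inferSuccess({ content: [{ type: "text", text: "done" }] }));           // true
console.log(inferSuccess({ content: [{ type: "text", text: "Rate limit hit" }] })); // false
console.log(inferSuccess({ isError: true, content: [] }));                          // false
```

This is also where the false-positive risk discussed later comes from: a page that legitimately contains the words "rate limit" would be misclassified by a naive substring approach like this one.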

This is the same philosophy as OpenSSF Scorecard vs. runtime attestation in supply chain: you want both, but only one of them catches regressions in production.

What's missing / where this could go wrong

I want to be specific about limitations, because "AI trust protocol" posts tend to overpromise:

  • ~10 servers, ~1500 receipts total. Small. This post is partly an ask for installers to fix that.
  • One aggregator node. Byzantine fault tolerance requires quorum; right now there's one Cloudflare Worker. Quorum needs multiple operators, which is the next milestone.
  • Client-side inferSuccess is heuristic. We look at response text for error patterns. False positives and negatives are possible — fetch's 36% error rate might be over-counted (legit 404s shouldn't hurt the server's score) or real.
  • Privacy model relies on hashes, not ZK. Inputs and outputs are hashed before transmission, but statistical correlation across taskHashes is possible in principle. Migration to ZK receipt aggregation is a future idea, not a current feature.
  • I personally generated most of the high-volume receipts. The low_caller_diversity flag you see on context7 and filesystem is me.

Running it yourself

npm install -g xaip-claude-hook
xaip-claude-hook install
xaip-claude-hook status

Open a new Claude Code session. Call any MCP tool. Check:

cat ~/.xaip/hook.log

You'll see lines like:

2026-04-17T04:24:59Z POST context7/resolve-library-id ok=true lat=668ms → 200

And the next time you (or Claude) invoke a low-trust server, the warning shows up inline.

Uninstall is a single command. Keys under ~/.xaip/ persist after uninstall; delete the directory manually to wipe them.

Links

Issues, scoring bugs, angry takes — all welcome on GitHub. If you maintain an MCP server and your score looks wrong, I want to hear about it first.
