This week the agent economy narrative crystallized in three posts.
Cameron Winklevoss (Gemini): "Humans may have built crypto, but crypto is not so much money for humans as it is money for machines."
Brian Armstrong (Coinbase): launched Agentic.market, a discovery layer where AI agents find and pay for services over x402.
t54.ai: "Every check in today's financial stack was designed around a human. Signatures, IDs, clicks, chargebacks. When an AI agent is the one transacting, each of those checks has a gap."
Three different angles, one convergent thesis: agents are becoming first-class economic actors, and the existing stack doesn't fit them.
Payments have a shipped answer (x402). Discovery now has a shipped answer (Agentic.market). The question I keep coming back to is what sits underneath both:
When an agent calls a service, how does it know the service is trustworthy in practice, not just in documentation?
That's the trust layer. It's the one that's still missing — and it's the one I've been building.
The gap
A signed transaction proves an agent authorized a call. It doesn't prove the call was safe to make.
- The repo can look well-maintained and still ship a buggy release.
- The marketplace listing can be legitimate and still be an attack (see the Ox Security research on MCP marketplace poisoning published April 16).
- The provider can be fine at T=0 and compromised at T=30 days.
These are problems payments don't solve. Discovery doesn't solve them either — an agent finding a service via Agentic.market still needs to know if that service has been acting suspiciously over the last 1,000 calls.
t54.ai's framing ("each of those checks has a gap") applies one layer below the one they were writing about: the same gap exists in deciding which services an agent should call at all.
What a trust layer actually is
Three things, in order of difficulty:
- Signed receipts — an attestation that agent A called server B, dual-signed, hashes only (no raw content).
- Aggregation with defense — receipts feed a score. The scoring must be Byzantine-robust or the whole thing is theater.
- Live scores agents can query before calling — one HTTP GET, no auth, no SDK.
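A minimal sketch of the first item, a dual-signed, hashes-only receipt, using Node's built-in `node:crypto` Ed25519 support. The `Receipt` field names, the JSON-array canonical encoding, and the helper names (`makeReceipt`, `dualSign`, `verifyReceipt`) are hypothetical illustrations, not the XAIP wire format; the repo spec is the authority on the real schema.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical receipt shape: field names are illustrative, not the XAIP schema.
interface Receipt {
  agent: string;      // caller identifier
  server: string;     // MCP server identifier
  tool: string;       // tool name invoked
  inputHash: string;  // SHA-256 of the request payload (raw content is never stored)
  outputHash: string; // SHA-256 of the response payload
  ok: boolean;        // whether the call succeeded
  timestamp: number;  // Unix epoch milliseconds
}

interface SignedReceipt {
  receipt: Receipt;
  agentSig: string;  // base64 Ed25519 signature by the caller
  serverSig: string; // base64 Ed25519 signature by the server
}

const sha256 = (data: string): string =>
  createHash("sha256").update(data, "utf8").digest("hex");

// Build a receipt that carries only digests of the I/O, never the payloads.
function makeReceipt(
  agent: string, server: string, tool: string,
  input: string, output: string, ok: boolean, timestamp: number,
): Receipt {
  return { agent, server, tool, inputHash: sha256(input), outputHash: sha256(output), ok, timestamp };
}

// Fixed field order so both parties sign exactly the same bytes.
// (Hypothetical encoding; a real protocol must specify canonicalization.)
function canonical(r: Receipt): Buffer {
  const fields = [r.agent, r.server, r.tool, r.inputHash, r.outputHash, r.ok, r.timestamp];
  return Buffer.from(JSON.stringify(fields), "utf8");
}

// Ed25519 uses `null` as the digest algorithm in Node's crypto.sign / crypto.verify.
function dualSign(receipt: Receipt, agentKey: KeyObject, serverKey: KeyObject): SignedReceipt {
  const bytes = canonical(receipt);
  return {
    receipt,
    agentSig: sign(null, bytes, agentKey).toString("base64"),
    serverSig: sign(null, bytes, serverKey).toString("base64"),
  };
}

// Valid only if BOTH signatures cover the current receipt contents.
function verifyReceipt(s: SignedReceipt, agentPub: KeyObject, serverPub: KeyObject): boolean {
  const bytes = canonical(s.receipt);
  return (
    verify(null, bytes, agentPub, Buffer.from(s.agentSig, "base64")) &&
    verify(null, bytes, serverPub, Buffer.from(s.serverSig, "base64"))
  );
}

// Usage: each side signs the same receipt; anyone holding both public keys can verify.
const demoAgentKeys = generateKeyPairSync("ed25519");
const demoServerKeys = generateKeyPairSync("ed25519");
const demoReceipt = makeReceipt(
  "agent-a", "server-b", "read_file", '{"path":"README.md"}', '{"text":"hello"}', true, Date.now(),
);
const demoSigned = dualSign(demoReceipt, demoAgentKeys.privateKey, demoServerKeys.privateKey);
console.log(verifyReceipt(demoSigned, demoAgentKeys.publicKey, demoServerKeys.publicKey)); // true
```

Because both signatures cover the same canonical bytes, changing any field after the fact (for example flipping `ok` to hide a failure) makes verification fail, while the aggregator only ever sees digests of the payloads.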
Code is the easy part. The hard parts are:
- Cold start. A trust layer with no receipts is useless. A trust layer with 10 receipts is misleading.
- Caller diversity. If one participant dominates the dataset, you're scoring their experience, not the server's.
- Adversarial robustness. Someone will try to tank a competitor's score. The math has to make that expensive.
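One way to make those three problems concrete, sketched under assumptions: a Beta-Bernoulli posterior whose prior pseudo-counts keep a handful of receipts from producing an extreme score (cold start), a per-caller cap that scales any single caller down to at most `perCallerCap` receipts (domination and score-tanking), and an inverse-Simpson "effective number of callers" as a diversity signal. The cap is a simple robustness heuristic, not a full Byzantine-fault-tolerant protocol, and every name and parameter here is hypothetical rather than XAIP's actual scoring.

```typescript
// Hypothetical per-receipt summary: who called, and whether the call succeeded.
interface Outcome {
  caller: string;
  ok: boolean;
}

interface ScoreOptions {
  priorSuccess: number; // Beta prior alpha (pseudo-successes)
  priorFailure: number; // Beta prior beta (pseudo-failures)
  perCallerCap: number; // max effective receipts any one caller contributes
}

// Count receipts and successes per caller.
function tally(outcomes: Outcome[]): Map<string, { ok: number; total: number }> {
  const byCaller = new Map<string, { ok: number; total: number }>();
  for (const o of outcomes) {
    const t = byCaller.get(o.caller) ?? { ok: 0, total: 0 };
    t.total += 1;
    if (o.ok) t.ok += 1;
    byCaller.set(o.caller, t);
  }
  return byCaller;
}

// Posterior mean of a Beta-Bernoulli model in which each caller's receipts are
// scaled down to at most perCallerCap, preserving that caller's success ratio.
function robustScore(outcomes: Outcome[], opts: ScoreOptions): number {
  let successes = opts.priorSuccess;
  let failures = opts.priorFailure;
  for (const { ok, total } of tally(outcomes).values()) {
    const weight = Math.min(1, opts.perCallerCap / total);
    successes += ok * weight;
    failures += (total - ok) * weight;
  }
  return successes / (successes + failures);
}

// Inverse Simpson index: equals k when k callers contribute equally,
// and approaches 1 when a single caller dominates the dataset.
function effectiveCallers(outcomes: Outcome[]): number {
  if (outcomes.length === 0) return 0;
  let sumSq = 0;
  for (const { total } of tally(outcomes).values()) {
    sumSq += (total / outcomes.length) ** 2;
  }
  return 1 / sumSq;
}
```

With a Beta(2, 2) prior, ten successes from one caller score 12/14 ≈ 0.857 instead of a perfect 1.0. With a Beta(1, 1) prior, five honest callers with 20 successes each plus one caller reporting 1,000 failures still scores above 0.8 when `perCallerCap` is 20, but drops below 0.1 with no cap. The cap only raises the cost of an attack: an adversary can still mint many caller identities, which is why identity and caller diversity matter as much as the formula.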
The XAIP receipt layer
I shipped one implementation of this. For the hook-level walkthrough, my previous article covers installation and the developer-facing side.
Briefly:
- Ed25519-signed receipts per MCP tool call (hashed I/O only)
- Public Cloudflare Worker aggregator, Bayesian scoring, per-server flags (high_error_rate, low_caller_diversity, etc.)
- One-command Claude Code hook that consumes the scores and contributes receipts
Live scores at the time of writing (8 servers, ~1,500 receipts; small but real):
memory 0.800 trusted
git 0.775 trusted
sqlite 0.753 trusted
puppeteer 0.671 caution (high_error_rate)
context7 0.618 caution (low_caller_diversity)
filesystem 0.579 caution (low_caller_diversity)
playwright 0.394 low_trust (high_error_rate)
fetch 0.365 low_trust (high_error_rate)
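The verdict column above is consistent with simple score bands. The sketch below assumes cut-offs of 0.7 (trusted) and 0.5 (caution), picked only because they reproduce the eight published rows, plus hypothetical flag thresholds on error rate and top-caller share; XAIP's real thresholds and flag rules may differ.

```typescript
type Verdict = "trusted" | "caution" | "low_trust";
type Flag = "high_error_rate" | "low_caller_diversity";

// Hypothetical per-server aggregates; field names are illustrative.
interface ServerStats {
  errorRate: number;      // fraction of receipts that reported an error
  topCallerShare: number; // fraction of receipts from the single largest caller
}

// Hypothetical cut-offs chosen to match the published table, not taken from XAIP.
const TRUSTED_MIN = 0.7;
const CAUTION_MIN = 0.5;
const MAX_ERROR_RATE = 0.2;
const MAX_TOP_CALLER_SHARE = 0.8;

// Map an aggregated score in [0, 1] to a verdict band.
function verdictFor(score: number): Verdict {
  if (score >= TRUSTED_MIN) return "trusted";
  if (score >= CAUTION_MIN) return "caution";
  return "low_trust";
}

// Flags are computed independently of the verdict band, so a "caution" server
// can carry high_error_rate (puppeteer) or low_caller_diversity (context7).
function flagsFor(stats: ServerStats): Flag[] {
  const flags: Flag[] = [];
  if (stats.errorRate > MAX_ERROR_RATE) flags.push("high_error_rate");
  if (stats.topCallerShare > MAX_TOP_CALLER_SHARE) flags.push("low_caller_diversity");
  return flags;
}
```

Keeping flags separate from the verdict means a reader sees why a score is low (errors) or why it should not be believed yet (one dominant caller), which a single number cannot express.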
To query a single server's score directly:
curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
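The same single GET can gate a tool call from code. This sketch uses the global `fetch` (Node 18+ or a Worker runtime) against the endpoint pattern from the curl example. The response fields (`score`, `verdict`, `flags`) and the allow/warn/block policy are assumptions, so inspect the real JSON before relying on them.

```typescript
// Endpoint pattern taken from the curl example.
const TRUST_API = "https://xaip-trust-api.kuma-github.workers.dev/v1/trust";

// ASSUMPTION: hypothetical response shape; verify against the live API output.
interface TrustResponse {
  score?: number;
  verdict?: string;
  flags?: string[];
}

type Decision = "allow" | "warn" | "block";

// Pure policy, separated from I/O so it can be tested offline.
function decide(trust: TrustResponse | null): Decision {
  if (trust === null) return "warn"; // lookup failed: warn rather than silently allow
  const lowDiversity = trust.flags?.includes("low_caller_diversity") ?? false;
  switch (trust.verdict) {
    case "trusted":
      // A trusted score built from too few callers is not yet reliable.
      return lowDiversity ? "warn" : "allow";
    case "low_trust":
      return "block";
    default:
      return "warn"; // "caution", missing, or unrecognized verdicts
  }
}

// One unauthenticated GET per server; returns null on any HTTP or network error.
async function fetchTrust(server: string): Promise<TrustResponse | null> {
  try {
    const res = await fetch(`${TRUST_API}/${encodeURIComponent(server)}`);
    if (!res.ok) return null;
    return (await res.json()) as TrustResponse;
  } catch {
    return null;
  }
}

async function checkBeforeCall(server: string): Promise<Decision> {
  return decide(await fetchTrust(server));
}

// Usage (makes a real network request):
// checkBeforeCall("context7").then((d) => console.log(d));
```

Treating `low_caller_diversity` as a reason to downgrade even a trusted verdict mirrors how that flag is meant to be read: the score itself is not yet reliable. Whether an unreachable API should warn or block is a fail-open versus fail-closed choice each agent has to make for itself.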
Why this is an ecosystem problem, not a product
A trust layer only works if many independent participants contribute receipts. One person running it alone — which is the current state of XAIP — triggers low_caller_diversity on every high-volume server. That's not a bug; that's the flag working correctly. It's literally telling you not to trust the scores until more callers are in the dataset.
So I'm not pitching a product. I'm asking: if you're building in the agent space and you think trust scoring is a layer that should exist, contribute receipts. Or run an aggregator node (the spec is in the repo; BFT quorum is the next milestone). Or tell me why the design is wrong.
Stack picture
Agent economy layers (rough)
───────────────────────────────
Payments → x402 (shipped)
Discovery → Agentic.market (shipped)
Trust scoring → XAIP + ? (small, needs company)
Identity → DID / passkeys (fragmented)
XAIP is one attempt at the trust row. Almost certainly not the final one — but the row has to get filled, and waiting for Anthropic or a well-funded startup to do it means the first large-scale MCP compromise happens before the layer exists.
Links
- Live dashboard: https://xkumakichi.github.io/xaip-protocol/ (scores auto-refresh, no auth)
- Previous article: https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk
- Repo: https://github.com/xkumakichi/xaip-protocol (MIT, zero deps)
- npm: https://www.npmjs.com/package/xaip-claude-hook
- Trust API: https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
If you're working on adjacent layers — payment, discovery, identity for agents — I'd be glad to compare notes. The interesting question isn't whose trust layer wins; it's whether any trust layer exists by the time the stack starts mattering.
— xkumakichi