What the agent stack is still missing


This week the agent economy narrative crystallized in three posts.

Cameron Winklevoss (Gemini): "Humans may have built crypto, but crypto is not so much money for humans as it is money for machines."

Brian Armstrong (Coinbase): launched Agentic.market, a discovery layer where AI agents find and pay for services over x402.

t54.ai: "Every check in today's financial stack was designed around a human. Signatures, IDs, clicks, chargebacks. When an AI agent is the one transacting, each of those checks has a gap."

Three different angles, one convergent thesis: agents are becoming first-class economic actors, and the existing stack doesn't fit them.

Payments have a shipped answer (x402). Discovery now has a shipped answer (Agentic.market). The question I keep coming back to is what sits underneath both:

When an agent calls a service, how does it know the service is trustworthy in practice, not just in documentation?

That's the trust layer. It's the one that's still missing — and it's the one I've been building.

The gap

A signed transaction proves an agent authorized a call. It doesn't prove the call was safe to make.

The repo can look well-maintained and still ship a buggy release.
The marketplace listing can look legitimate and still be an attack (see the OX Security research on MCP marketplace poisoning published April 16).
The provider can be fine at T=0 and compromised at T=30 days.

These are problems payments don't solve. Discovery doesn't solve them either — an agent finding a service via Agentic.market still needs to know if that service has been acting suspiciously over the last 1,000 calls.

t54.ai's framing, "each of those checks has a gap," applies one layer lower than they were writing about. The same gap exists in deciding which services an agent should call at all.

What a trust layer actually is

Three things, in order of difficulty:

Signed receipts — an attestation that agent A called server B, dual-signed, hashes only (no raw content).
Aggregation with defense — receipts feed a score. The scoring must be Byzantine-robust or the whole thing is theater.
Live scores agents can query before calling — one HTTP GET, no auth, no SDK.
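To make the first item concrete, here is a minimal Python sketch of a hashes-only, dual-signed receipt. The field names (`caller`, `server`, `tool`, `input_hash`, `output_hash`, `ok`) and the canonical-JSON encoding are assumptions for illustration, not the XAIP wire format. Signing is left as a pluggable callable because Ed25519 is not in the Python standard library.

```python
import hashlib
import json
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest; only this digest is stored, never the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_receipt(caller: str, server: str, tool: str,
                  raw_input: bytes, raw_output: bytes, ok: bool) -> dict:
    """Build a receipt body with hashed I/O only (no raw content).

    All field names are hypothetical, chosen for this sketch.
    """
    return {
        "caller": caller,
        "server": server,
        "tool": tool,
        "input_hash": sha256_hex(raw_input),
        "output_hash": sha256_hex(raw_output),
        "ok": ok,
    }


def canonical_bytes(receipt: dict) -> bytes:
    """Deterministic encoding so both signers sign identical bytes."""
    return json.dumps(receipt, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def dual_sign(receipt: dict,
              sign_caller: Callable[[bytes], bytes],
              sign_server: Callable[[bytes], bytes]) -> dict:
    """Attach both signatures over the same canonical bytes.

    sign_caller and sign_server stand in for real Ed25519 signers.
    """
    payload = canonical_bytes(receipt)
    return {
        "receipt": receipt,
        "caller_sig": sign_caller(payload).hex(),
        "server_sig": sign_server(payload).hex(),
    }
```

Sorting keys before signing matters here: if the two parties serialized the same receipt in different key orders, their signatures would cover different bytes and the attestation could not be verified as one object.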

Code is the easy part. The hard parts are:

Cold start. A trust layer with no receipts is useless. A trust layer with 10 receipts is misleading.
Caller diversity. If one participant dominates the dataset, you're scoring their experience, not the server's.
Adversarial robustness. Someone will try to tank a competitor's score. The math has to make that expensive.
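A rough sketch of how these three problems show up in the scoring math, in Python. This is not XAIP's actual formula: the Beta prior, the per-caller cap of 50, and the 50% share threshold are all assumptions picked to illustrate the idea.

```python
from collections import Counter
from typing import List, Tuple

# Each receipt reduced to (caller_id, succeeded). Hypothetical shape.
Receipt = Tuple[str, bool]


def trust_score(receipts: List[Receipt],
                prior_success: float = 2.0,
                prior_failure: float = 2.0,
                per_caller_cap: int = 50) -> float:
    """Posterior mean of a Beta-Bernoulli model with a per-caller cap.

    The prior pulls servers with few receipts toward 0.5 (cold start);
    the cap limits how far any single caller can move the score.
    """
    seen: Counter = Counter()
    successes = failures = 0
    for caller, ok in receipts:
        if seen[caller] >= per_caller_cap:
            continue  # this caller has used up its influence
        seen[caller] += 1
        if ok:
            successes += 1
        else:
            failures += 1
    return (prior_success + successes) / (
        prior_success + prior_failure + successes + failures)


def low_caller_diversity(receipts: List[Receipt],
                         max_share: float = 0.5) -> bool:
    """True when one caller contributes more than max_share of receipts."""
    counts = Counter(caller for caller, _ in receipts)
    total = sum(counts.values())
    if total == 0:
        return True  # no data is treated as the least diverse case
    return max(counts.values()) / total > max_share
```

With these numbers, a server with 100 successful receipts spread over 10 callers scores about 0.66 after one hostile caller floods it with 1,000 failures, because only 50 of them count. A cap like this does not stop an attacker who controls many identities; it only makes the attack cost scale with the number of identities they must create.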

The XAIP receipt layer

I shipped one implementation of this: XAIP. For the hook-level walkthrough, see the previous article (under Links), which covers installation and the developer-facing side.

Briefly:

Ed25519-signed receipts per MCP tool call (hashed I/O only)
Public Cloudflare Worker aggregator, Bayesian scoring, per-server flags (high_error_rate, low_caller_diversity, etc.)
One-command Claude Code hook that consumes the scores and contributes receipts

Live scores right now (8 servers, ~1,500 receipts, small but real):

memory      0.800  trusted
git         0.775  trusted
sqlite      0.753  trusted
puppeteer   0.671  caution    (high_error_rate)
context7    0.618  caution    (low_caller_diversity)
filesystem  0.579  caution    (low_caller_diversity)
playwright  0.394  low_trust  (high_error_rate)
fetch       0.365  low_trust  (high_error_rate)

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7
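The same lookup can be done from an agent before it makes a call. Below is a minimal Python sketch using only the standard library. The endpoint is the one from the curl command; the response schema is not shown in this article, so the `score` field name and the 0.5 cutoff are assumptions, not documented API behavior.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://xaip-trust-api.kuma-github.workers.dev/v1/trust/"


def trust_url(server: str) -> str:
    """URL for one server's trust record (same path as the curl call)."""
    return BASE_URL + urllib.parse.quote(server, safe="")


def fetch_trust(server: str, timeout: float = 5.0) -> dict:
    """One unauthenticated HTTP GET; returns the parsed JSON body."""
    with urllib.request.urlopen(trust_url(server), timeout=timeout) as resp:
        return json.load(resp)


def should_call(record: dict, min_score: float = 0.5) -> bool:
    """Gate a tool call on the score.

    Assumes the body carries a numeric "score" field (hypothetical name);
    a missing or non-numeric score is treated as "do not call".
    """
    score = record.get("score")
    return isinstance(score, (int, float)) and score >= min_score
```

A caller would then write something like `if should_call(fetch_trust("context7")): ...`. Failing closed on a missing score is a deliberate choice in this sketch: an agent that cannot read the score should not behave as if the server were trusted.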

Why this is an ecosystem problem, not a product

A trust layer only works if many independent participants contribute receipts. One person running it alone, which is the current state of XAIP, triggers low_caller_diversity on every high-volume server. That's not a bug; it's the flag working as intended, telling you not to trust the scores until more callers are in the dataset.

So I'm not pitching a product. I'm asking: if you're building in the agent space and you think trust scoring is a layer that should exist, contribute receipts. Or run an aggregator node (the spec is in the repo; BFT quorum is the next milestone). Or tell me why the design is wrong.

Stack picture

Agent economy layers (rough)
───────────────────────────────
Payments       → x402 (shipped)
Discovery      → Agentic.market (shipped)
Trust scoring  → XAIP + ?          (small, needs company)
Identity       → DID / passkeys    (fragmented)

XAIP is one attempt at the trust row. It is almost certainly not the final one, but the row has to get filled, and waiting for Anthropic or a well-funded startup to do it risks the first large-scale MCP compromise happening before the layer exists.

Links

Live dashboard: https://xkumakichi.github.io/xaip-protocol/ (scores auto-refresh, no auth)
Previous article: https://dev.to/xkumakichi/a-claude-code-hook-that-warns-you-before-calling-a-low-trust-mcp-server-ckk
Repo: https://github.com/xkumakichi/xaip-protocol (MIT, zero deps)
npm: https://www.npmjs.com/package/xaip-claude-hook
Trust API: https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

If you're working on adjacent layers — payment, discovery, identity for agents — I'd be glad to compare notes. The interesting question isn't whose trust layer wins; it's whether any trust layer exists by the time the stack starts mattering.

— xkumakichi