DEV Community

Andy Salvo

Originally published at agentrank.info

Your AI agent's trust score can be faked. Here's the one check that can't.

If you are building agents that pay other agents — x402, AP2, agent-to-agent commerce — you have already hit the question that has no clean answer: before my agent sends money to a counterparty it has never seen, how does it know that counterparty is real?

The industry's answer, so far, is a trust score. Identity, activity, reputation, work history, consistency — bundle them into a 0-100 number, show it to the agent, let the agent decide. It reads well. It also does not work, and we can now show that with data.

We ran the experiment

We built a preregistered study (design and hypotheses sealed to a public hash chain before any data were collected) across up to 13 frontier and low-cost models and over 2,600 real agent payment decisions. The setup was simple: put an autonomous agent in front of two counterparties and ask it who to pay.

  • One counterparty was honest and genuinely settlement-backed — it had actually been paid, on-chain, by real participants.
  • The other was a counterfeit: it displayed the surface of trust — impressive figures, an on-chain-styled but invalid reference — with nothing real behind it.

The agents chose the counterfeit 99% of the time.

They were not reasoning about whether the trust was real. They were pattern-matching the costume of verifiability. A claim merely labeled "verified" was obeyed even when it was false.

Then we changed one thing: instead of letting the agent read the displayed signal, we made it perform an actual verification — go check the real history. The correct-choice rate moved from 1% to 81%. And under exact surface mimicry, where the counterfeit copied the honest agent's display byte-for-byte, every displayed signal collapsed to chance (46%) while only the performed check recovered the truth.
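To see why exact mimicry pushes any display-based chooser to chance, here is a toy simulation. It is not the study's harness: the `Counterparty` fields, the ledger dict, and both chooser functions are hypothetical. When the counterfeit copies the honest display byte-for-byte, nothing computed from the displays alone can separate the two, so the best a display reader can do is a coin flip. A chooser that consults a record the counterparty does not control has no such problem.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Counterparty:
    wallet: str   # hypothetical identifier the independent check looks up
    display: str  # whatever the counterparty chooses to show about itself


def choose_by_display(a: Counterparty, b: Counterparty, rng: random.Random) -> Counterparty:
    """Pick using only the displays. Identical displays leave a random tie-break."""
    if a.display != b.display:
        # Stand-in heuristic: prefer the more elaborate-looking display.
        return a if len(a.display) >= len(b.display) else b
    return rng.choice([a, b])


def choose_by_check(a: Counterparty, b: Counterparty, ledger: dict[str, float]) -> Counterparty:
    """Pick using settled USD from a ledger the counterparties cannot edit."""
    return a if ledger.get(a.wallet, 0.0) >= ledger.get(b.wallet, 0.0) else b


def run(trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    honest = Counterparty("0xhonest", "verified | 4.9 stars | 1,200 jobs")
    fake = Counterparty("0xfake", honest.display)  # byte-for-byte surface mimicry
    ledger = {"0xhonest": 50_000.0}  # only the honest wallet has real settlements
    display_hits = check_hits = 0
    for _ in range(trials):
        pair = [honest, fake]
        rng.shuffle(pair)  # randomize presentation order each trial
        display_hits += choose_by_display(pair[0], pair[1], rng) is honest
        check_hits += choose_by_check(pair[0], pair[1], ledger) is honest
    return display_hits / trials, check_hits / trials


print(run())  # (display-only correct rate, performed-check correct rate)
```

The display-only rate lands close to 0.5 and the ledger check picks the honest counterparty every time, which is the shape of the mimicry result above in miniature.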

Full paper and preregistration: Counterfeit Verifiability in Autonomous Agent Payments (DOI 10.5281/zenodo.21042364).

Why displayed trust is structurally broken

The result is not a quirk of one model. It is structural, and once you see it you cannot unsee it:

A signal that is cheap to display is cheap to fake.

Identity can be minted. Reputation feedback can be self-vouched — rings of agents rating each other five stars. Work history can be manufactured by a fresh wallet running a loop. Every dimension in a typical multi-dimensional trust score is a display, and a motivated counterparty controls its own display. Aggregating five fakeable signals into one number does not make the number harder to fake. It makes it easier to trust.
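A toy version of a multi-dimensional score makes the problem concrete. The five dimensions and the equal weights below are assumptions for illustration, not any vendor's formula. The point is only that every input is read from the counterparty's own display, so a counterfeit can report the maximum on every axis.

```python
from dataclasses import dataclass, fields


@dataclass
class DisplayedProfile:
    # Hypothetical dimensions, each 0-100, all supplied by the counterparty itself.
    identity: float
    activity: float
    reputation: float
    work_history: float
    consistency: float


def naive_trust_score(p: DisplayedProfile) -> float:
    """Equal-weight average of the displayed dimensions, on a 0-100 scale."""
    values = [getattr(p, f.name) for f in fields(p)]
    return sum(values) / len(values)


honest = DisplayedProfile(identity=80, activity=60, reputation=70, work_history=65, consistency=75)
# The counterfeit controls its own display, so it simply claims the top of every range.
counterfeit = DisplayedProfile(identity=100, activity=100, reputation=100, work_history=100, consistency=100)

print(naive_trust_score(honest))       # 70.0
print(naive_trust_score(counterfeit))  # 100.0
```

Changing the weights or adding dimensions does not help, because the counterfeit's cost of raising any displayed input stays near zero.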

The one exception: settlement

There is exactly one signal a counterparty cannot cheaply fake: money it has actually received from participants who themselves have standing.

To fake "I have been paid $50,000 by 200 reputable agents," you have to actually be paid $50,000 by 200 reputable agents — which is the same thing the score is measuring. You cannot sybil real settlement the way you can sybil free feedback, because real settlement costs real money paid by parties who had something to lose.
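The asymmetry can be sketched as a toy ledger query. This is not AgentRank's scoring function: the `Settlement` record, the fixed `standing` weights, and the rule that self-payments are ignored are all assumptions. It illustrates one property only: payments from wallets without standing add nothing, so a sybil ring of fresh wallets cannot raise the result without real money from payers that do have standing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    payer: str
    payee: str
    usd: float


def settled_value(payee: str, ledger: list[Settlement], standing: dict[str, float]) -> float:
    """USD received by `payee`, each payment weighted by the payer's standing in [0, 1].
    Self-payments and payers without standing contribute nothing."""
    return float(sum(
        s.usd * standing.get(s.payer, 0.0)
        for s in ledger
        if s.payee == payee and s.payer != payee
    ))


# Hypothetical standing weights, e.g. derived from each payer's own settlement history.
standing = {"alice": 1.0, "bob": 0.5}
ledger = [
    Settlement("alice", "honest.example", 30_000),
    Settlement("bob", "honest.example", 25_000),
    # A sybil ring: fresh wallets with no standing, plus a self-payment.
    Settlement("sybil-1", "counterfeit.example", 100_000),
    Settlement("sybil-2", "counterfeit.example", 100_000),
    Settlement("counterfeit.example", "counterfeit.example", 1_000_000),
]

print(settled_value("honest.example", ledger, standing))       # 42500.0
print(settled_value("counterfeit.example", ledger, standing))  # 0.0
```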

This is why the reliable check is not "read the badge" but "perform the settlement check at decision time." Before your agent settles, ask an unfakeable question: has this counterparty actually been paid, by whom, and how much?
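In agent code, "perform the check at decision time" amounts to a gate that sits directly in front of the payment call. A minimal sketch, assuming you already have some lookup for settled value that the counterparty cannot influence and some function that sends the payment; `pay_with_settlement_check`, `UnverifiedCounterparty`, and the 1,000 USD default threshold are hypothetical names and values, not part of any published API.

```python
from typing import Callable, Optional


class UnverifiedCounterparty(Exception):
    """Raised when the settlement check does not support paying this counterparty."""


def pay_with_settlement_check(
    counterparty: str,
    amount_usd: float,
    lookup_settled_usd: Callable[[str], Optional[float]],
    send_payment: Callable[[str, float], str],
    min_settled_usd: float = 1_000.0,  # hypothetical policy threshold
) -> str:
    """Run the settlement check immediately before paying.

    `lookup_settled_usd` must query a source the counterparty does not control,
    never a figure the counterparty displayed about itself. `None` means nothing
    was found or the lookup failed, and is treated as "not verified".
    """
    settled = lookup_settled_usd(counterparty)  # performed now, not read off a badge
    if settled is None or settled < min_settled_usd:
        raise UnverifiedCounterparty(f"{counterparty}: settled={settled!r}")
    return send_payment(counterparty, amount_usd)


# Usage with an in-memory stand-in for the independent lookup and the payment rail.
ledger = {"honest.example": 50_000.0}
sent: list[tuple[str, float]] = []


def record_send(cp: str, amount: float) -> str:
    sent.append((cp, amount))
    return "tx-1"


print(pay_with_settlement_check("honest.example", 25.0, ledger.get, record_send))  # tx-1
try:
    pay_with_settlement_check("counterfeit.example", 25.0, ledger.get, record_send)
except UnverifiedCounterparty as e:
    print("blocked:", e)  # blocked: counterfeit.example: settled=None
```

Failing closed on `None` is deliberate: a lookup that errors out should not quietly become a payment.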

Doing it in practice

You do not have to build a settlement indexer. We run one over the x402 economy and expose it free and read-only. Any of these returns a 0-1000 settlement-grounded score for a wallet or domain:

# HTTP
curl https://api.agentrank.info/resolve/blockrun.ai
# -> { verified, score, settlement: { usd, payers }, verdict }
# MCP: add the server, call check_agent_trust before you settle
https://api.agentrank.info/mcp

# A2A: agent card at
https://agentrank.info/.well-known/agent-card.json  (skill: verify_counterparty_reputation)
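For agents written in Python, the HTTP route can be wrapped in a small client. A minimal sketch, assuming the endpoint returns JSON with the fields listed in the `# ->` comment above (`verified`, `score`, `settlement.usd`, `settlement.payers`, `verdict`). Their types, and the `Resolution`, `parse_resolve_response`, and `resolve` names, are assumptions, not a documented SDK.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any, Optional

RESOLVE_URL = "https://api.agentrank.info/resolve/"


@dataclass(frozen=True)
class Resolution:
    verified: bool
    score: int                     # 0-1000 per the post
    settled_usd: Optional[float]   # assumed numeric; None if absent
    payers: Any                    # count or list; shape assumed unknown, passed through
    verdict: Optional[str]


def parse_resolve_response(raw: bytes) -> Resolution:
    """Map the JSON body onto the fields named in the post; types are assumptions."""
    body = json.loads(raw)
    settlement = body.get("settlement") or {}
    return Resolution(
        verified=bool(body.get("verified", False)),
        score=int(body.get("score") or 0),
        settled_usd=settlement.get("usd"),
        payers=settlement.get("payers"),
        verdict=body.get("verdict"),
    )


def resolve(target: str, timeout: float = 10.0) -> Resolution:
    """GET /resolve/<wallet-or-domain>. Network and HTTP errors propagate
    as urllib.error.URLError (HTTPError is a subclass)."""
    url = RESOLVE_URL + urllib.parse.quote(target, safe="")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_resolve_response(resp.read())
```

Calling `resolve("blockrun.ai")` issues the same request as the curl line above. Parsing is kept separate from the network call so it can be exercised offline against a saved response.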

A verified, high score means the counterparty has settled real value. A 0 means no settlement was found — not proof of fraud, but nobody has vouched with money, so treat it as unverified. The score is deterministic and recomputable from public inputs, and it is continuously stress-tested against collusion (in the latest sealed adversarial sweep, structured collusion rings at budgets up to $750 could not buy a top-10 position).
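Those semantics translate into a three-way payment policy. A minimal sketch: the post does not define what counts as "high", so `min_score=600` is a hypothetical threshold the caller should set, and treating an unverified nonzero score as unverified is a conservative assumption rather than documented behavior.

```python
from enum import Enum


class Decision(Enum):
    PAY = "pay"
    UNVERIFIED = "unverified"  # no settlement found: not proof of fraud, but nobody vouched with money
    REVIEW = "review"          # some settlement, but below the caller's own bar


def gate(verified: bool, score: int, min_score: int = 600) -> Decision:
    """Turn a 0-1000 settlement-grounded score into a payment decision.

    `min_score` is a hypothetical policy threshold chosen by the caller.
    """
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    if not verified or score == 0:
        return Decision.UNVERIFIED  # assumption: fail closed when not verified
    if score >= min_score:
        return Decision.PAY
    return Decision.REVIEW


print(gate(True, 850))  # Decision.PAY
print(gate(False, 0))   # Decision.UNVERIFIED
print(gate(True, 120))  # Decision.REVIEW
```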

The takeaway

If you are shipping paying agents, do not gate payments on a displayed trust score — the data says your agent will believe a costume. Gate them on a performed check against something that cannot be cheaply faked. In the agent economy, that thing is settlement.

Free, read-only, no API key: agentrank.info/verify. By Crest Deployment Systems.
