Five ways an x402 payment can go wrong — and which ones you can catch before your agent pays

#x402 #ai #security #webdev

x402 turns an HTTP 402 Payment Required into something an autonomous agent can actually act on: the server quotes a price, your agent pays in stablecoin, the request goes through. No human in the loop. That is the whole point — and also the whole problem.

The moment a program can move money without you watching, "discovery" and "payment" collapse into a single step. Your agent finds an endpoint and pays it in the same breath. If anything about that endpoint is wrong — the address, the price, the destination — the money is already gone by the time you would have noticed.

There is now a small but real body of work on how this gets exploited. "Five Attacks on x402" and "A402" (both on arXiv), Halborn's writeup, and AgentLISA's position paper all converge on the same handful of attack vectors. I read through them while building Frisk, an open-source pre-transaction screening library, and I want to do something specific in this post: walk through the documented attacks and, for each one, be honest about whether you can catch it locally, before the payment, in your own code, or whether it fundamentally requires data you don't have on your machine.

That line — local-and-deterministic vs. needs-reputation-data — turns out to be the most useful way to think about agent payment safety. So let's draw it.

The attack surface

Pulling from the papers above, the recurring vectors are:

1. Dynamic payTo swap. In x402 V2 the destination address can change per request. A seller (or a man-in-the-middle) quotes you address A, then returns address B in the actual payment requirement. Your agent pays B.
2. Malicious 402 / overcharging. The endpoint quotes an absurd price, or a price that drifts upward across calls, and a naive agent just pays whatever the 402 says.
3. Insecure transport. The quote, including the address you're about to pay, arrives over plaintext HTTP, where anyone on the path can rewrite it.
4. Sybil-induced discovery. An attacker floods a discovery surface with fake, well-reviewed-looking endpoints to steer your agent toward a wallet they control.
5. Prompt-injection-to-payment. Content the agent reads convinces it to send funds somewhere it shouldn't.

Here's the part nobody says out loud: some of these are checkable with pure local logic, and some are not. Conflating them is why "agent payment security" sounds harder than it is.

What you can catch locally, before the payment

Vectors 1, 2, and 3 are structural. You don't need a reputation graph or a threat feed to catch them — you need a few deterministic checks run against the request the instant before your agent signs it. No network call, no service dependency, no trust in a third party.

This is exactly the slice Frisk's lite mode handles. It runs entirely on your machine, ships with zero runtime dependencies, and returns a verdict — allow, review, or block — with reasons. Here's the whole thing in use:

import { Client } from "frisk-screen";

const client = new Client(); // lite mode, no API key

// `quote` is assumed to be the payment requirement parsed from the endpoint's 402 response
const result = await client.screen("0x9a3f1b2c3d4e5f60718293a4b5c6d7e8f9a0bc12", {
  endpoint: "https://api.seller.x402/quote",
  amount: 2.5,
  asset: "USDC",
  observedPayTo: quote.payTo,        // what the endpoint actually told us to pay
  policy: { maxPerCall: 5.0, allowedAssets: ["USDC"] },
});

if (!result.allowed) {
  console.log(result.verdict, result.reasons);
  // e.g. "block", ["payTo differs from the expected counterparty"]
}

(There's a Python package with the identical API — pip install frisk-screen, same screen() call.)

Now map each check back to an attack:

Dynamic payTo swap → catch it. You know the counterparty you intended to pay. You also have the payTo the endpoint actually returned. If they differ, that's the V2 swap attack, and it's a one-line comparison:

if (request.observedPayTo &&
    request.observedPayTo.toLowerCase() !== counterparty.toLowerCase()) {
  // normalize both sides so checksum casing alone never looks like a swap
  // the address moved between quote and payment: don't pay
}

This is the single most valuable local check, because the swap is invisible to a human reviewing code — it only happens at runtime, per request.
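
To make the comparison concrete, here is a minimal standalone sketch of the same idea. It is not Frisk's actual implementation: checkPayTo and the PayToCheck type are hypothetical names, and it assumes EVM-style hex addresses, where letter case carries no meaning beyond the optional checksum.

```typescript
// Hypothetical helper for illustration; not taken from the Frisk source.
type PayToCheck = { ok: true } | { ok: false; reason: string };

function checkPayTo(expectedCounterparty: string, observedPayTo?: string): PayToCheck {
  // If the caller did not capture the quoted payTo, there is nothing to compare.
  if (observedPayTo === undefined) {
    return { ok: true };
  }

  // Assumption: EVM hex addresses, so lowercasing both sides is a safe normalization.
  const expected = expectedCounterparty.trim().toLowerCase();
  const observed = observedPayTo.trim().toLowerCase();

  return expected === observed
    ? { ok: true }
    : { ok: false, reason: "payTo differs from the expected counterparty" };
}
```

Normalizing both sides matters: a checksummed address and its lowercase form are the same destination, and a check that flags them as a swap will train you to ignore it.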

Overcharging → catch it with policy. You can't know the "fair" price of an arbitrary endpoint without market data, but you absolutely know your own limits. A per-call ceiling and an asset allowlist are deterministic and offline:

if (policy.maxPerCall !== undefined && amount > policy.maxPerCall) { /* review */ }
if (policy.allowedAssets && !policy.allowedAssets.includes(asset)) { /* review */ }

This won't tell you a $2 call should cost $0.05. It will stop your agent from silently paying $400 because a malicious 402 said so. Most overcharging damage is just the absence of a spending limit.
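
A slightly fuller sketch of the policy idea, under stated assumptions: maxPerCall and allowedAssets mirror the fields in the example above, while SpendGuard and maxPerSession are hypothetical additions of mine (not Frisk fields) to show how a running total could also bound a price that creeps upward across calls. Amounts are plain numbers here to match the example; a production version would more likely use integer minor units.

```typescript
// Sketch only: SpendGuard and maxPerSession are hypothetical, not part of Frisk.
interface SpendPolicy {
  maxPerCall?: number;
  allowedAssets?: string[];
  maxPerSession?: number; // assumption: a cumulative cap across calls
}

class SpendGuard {
  private spent = 0;
  private readonly policy: SpendPolicy;

  constructor(policy: SpendPolicy) {
    this.policy = policy;
  }

  // Returns the reasons to hold a payment for review; an empty list means within policy.
  check(amount: number, asset: string): string[] {
    const reasons: string[] = [];
    const { maxPerCall, allowedAssets, maxPerSession } = this.policy;

    // NaN would slip past every ">" comparison below, so reject it explicitly.
    if (!Number.isFinite(amount) || amount <= 0) {
      reasons.push("amount is not a positive finite number");
    }
    if (maxPerCall !== undefined && amount > maxPerCall) {
      reasons.push(`amount ${amount} exceeds maxPerCall ${maxPerCall}`);
    }
    if (allowedAssets && !allowedAssets.includes(asset)) {
      reasons.push(`asset ${asset} is not in the allowlist`);
    }
    if (maxPerSession !== undefined && this.spent + amount > maxPerSession) {
      reasons.push(`session total would exceed maxPerSession ${maxPerSession}`);
    }
    return reasons;
  }

  // Call only after a payment has actually gone through.
  record(amount: number): void {
    this.spent += amount;
  }
}
```

The per-call ceiling stops the single absurd quote; the session cap is one possible way to stop many individually reasonable quotes from adding up.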

Insecure transport → catch it. If the quote that carries the payment address came over http://, the address is untrustworthy on arrival. Refuse to act on it:

if (endpoint && !endpoint.toLowerCase().startsWith("https://")) { /* downgrade */ }
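
If you are writing this check yourself, one option is to let the standard URL parser do the work instead of matching a string prefix. This is a sketch, and isSecureEndpoint is a hypothetical name rather than anything in Frisk; URL is the WHATWG URL class available globally in Node and browsers.

```typescript
// Hypothetical helper: treats anything that is not a parseable https URL as untrusted.
function isSecureEndpoint(endpoint: string): boolean {
  try {
    // The parser normalizes the scheme to lowercase and throws on malformed input.
    return new URL(endpoint).protocol === "https:";
  } catch {
    // Unparseable endpoints get the same treatment as plaintext ones.
    return false;
  }
}
```

The parser-based version also rejects strings that merely begin with "https://" but are not valid URLs, which a prefix check would let through.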

Plus the obvious hygiene: is the counterparty even a well-formed address? A malformed counterparty is either a bug or a probe, and either way you shouldn't pay it. (Lite also runs a local seed blocklist — an offline check against known-bad addresses; the live, continuously updated list is the one thing here that belongs to the hosted service.)
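
For a sense of how small these two checks are, here is a sketch that assumes EVM-style addresses like the one in the example above. The function names and the blocklist entry are placeholders of mine, not Frisk's code or data, and the format check deliberately skips EIP-55 checksum validation, since that needs a Keccak hash and therefore a dependency.

```typescript
// Assumption: EVM-style addresses ("0x" followed by 40 hex characters).
const EVM_ADDRESS = /^0x[0-9a-fA-F]{40}$/;

function isWellFormedAddress(address: string): boolean {
  return EVM_ADDRESS.test(address);
}

// Placeholder entry for illustration only; a real seed list would ship with the library.
const SEED_BLOCKLIST: ReadonlySet<string> = new Set([
  "0x" + "0".repeat(36) + "dead",
]);

function isBlocklisted(address: string): boolean {
  // Entries are stored lowercase, so normalize before the lookup.
  return SEED_BLOCKLIST.has(address.toLowerCase());
}
```

An agent paying on a non-EVM network would need a different format rule; the point is only that both checks are offline lookups with no moving parts.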

That's a handful of deterministic checks, all running before a single token moves, all in code you can read in one file. No service to trust. This is the floor every x402 agent should have, and it's the part I made free and MIT precisely because it shouldn't be behind anyone's API — including mine.
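
One way the individual checks could roll up into a single allow / review / block verdict is "worst finding wins". The sketch below is my own illustration of that pattern; the Finding type, the severity ordering, and the rule that only "allow" counts as allowed are all assumptions, not a description of how Frisk combines its checks.

```typescript
type Verdict = "allow" | "review" | "block";

// Hypothetical shape for the output of one check.
interface Finding {
  verdict: Verdict;
  reason: string;
}

// Assumed ordering: block outranks review, review outranks allow.
const SEVERITY: Record<Verdict, number> = { allow: 0, review: 1, block: 2 };

function combineFindings(findings: Finding[]): { verdict: Verdict; allowed: boolean; reasons: string[] } {
  let verdict: Verdict = "allow";
  for (const finding of findings) {
    if (SEVERITY[finding.verdict] > SEVERITY[verdict]) {
      verdict = finding.verdict;
    }
  }
  return {
    verdict,
    // Assumption: anything short of a clean "allow" should stop the payment.
    allowed: verdict === "allow",
    reasons: findings.filter((f) => f.verdict !== "allow").map((f) => f.reason),
  };
}
```

Keeping the reasons alongside the verdict is what makes a "review" actionable: whoever looks at it can see which check fired without rerunning anything.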

What you cannot catch locally — and where I'll be honest

Vectors 4 and 5 — Sybil discovery and prompt-injection-to-payment — are different in kind. A locally-running function genuinely cannot know that an address belongs to a Sybil cluster, or that an endpoint with a clean-looking history has been quietly draining wallets for a week. That requires reputation data: a graph of who-paid-whom across many agents, accumulated over time. No amount of clever offline code substitutes for it.

And there's a third category the papers above actually spend most of their pages on, which no screening library — lite or hosted — should claim to fix: the protocol- and settlement-layer attacks. Payment replay; the settlement races where a server delivers before payment finalizes (the "paid-but-denied" and "unpaid-service" outcomes the Five Attacks paper centers on); facilitator trust and economic DoS against endpoints. Those live in the x402 spec, the facilitator, and the on-chain settlement path — not in the request your agent is about to sign. Frisk screens the counterparty and the shape of the transaction; it does not, and cannot, repair the protocol underneath it. So this post is deliberately scoped to the vectors a pre-payment check can actually touch — pretending a screening call closes a replay or atomicity hole would be the other half of how agent-payment security gets oversold.

So lite mode is upfront about this: it always reports "low" confidence and it does not claim to detect Sybil attacks. Pretending a local check can catch a reputation problem is how you ship false confidence, which is worse than no check at all.

This is the line between the open-source library and the hosted service, and I'd rather state it plainly than blur it for a pitch. The hosted side of Frisk is where reputation history and threat intelligence would live, and it is early; the part I'm comfortable telling every x402 developer to install today is the deterministic floor above. If you're shipping an agent that pays, start there. Those local checks cost you nothing and close the attacks that are actually closeable in your own process.

Takeaways

Agent payment safety isn't one problem. It's three: structural checks you can do before paying, reputation you have to source from data, and protocol/settlement holes that sit below any screening call. Solve them separately — and don't let a tool for one pretend to cover the others.
The deterministic floor — payTo-swap detection, spending policy, transport and address sanity — catches three of the five documented x402 attack classes, with no network call and no trust in anyone. Ship it.
Be skeptical of anything claiming to detect Sybil/reputation attacks with purely local logic. That category needs data, and honesty about the boundary is the whole game.

Frisk is MIT and the lite engine is dependency-free: npm i frisk-screen / pip install frisk-screen. Source, threat-model notes, and the (short, readable) check logic are on GitHub. If you're building on x402 and you catch a vector I missed, open an issue — that's exactly the kind of thing this should accrete.

The reputation-backed hosted tier is in early access. If that's the part you need, email support@tryfrisk.dev — but the deterministic floor above is free, MIT, and yours to ship today regardless.

Sources referenced: "Five Attacks on x402" and "A402" (arXiv); Halborn, "x402 Explained: Security Risks and Controls"; AgentLISA x402 security position paper.