This is our third post in a series on agentic commerce. Previously: AI shopping agents have no standard way to verify merchants — so we built one and AI Agents Need a Trust Layer Before They Can Transact.
Last month, Anthropic published something quietly significant.
They called it Project Deal. For one week in December 2025, they created a Craigslist-style internal marketplace — but with a twist: every transaction was handled entirely by Claude agents acting on behalf of 69 employees. No human intervention once the experiment started. Agents posted listings, made offers, countered, and closed deals autonomously via Slack.
The result: 186 deals, $4,000+ transacted, across 500+ listed items.
It worked.
But buried in their findings is something that points directly at an unresolved infrastructure problem, one we've been building toward.
What Project Deal actually demonstrated
The headline finding is that agent-to-agent commerce is real and closer than most people think. But the more interesting finding is what happened when agents weren't equally matched.
Anthropic ran a parallel secret experiment: half the participants were randomly assigned Claude Opus 4.5 (their frontier model), half got Claude Haiku 4.5 (their smallest model). The results were measurable and consistent:
- Opus sellers extracted $2.68 more per item on average
- Opus buyers paid $2.45 less per item on average
- Opus agents completed roughly 2 more deals overall
The same broken folding bike sold for $38 when represented by Haiku and $65 when represented by Opus.
Here's the uncomfortable part: participants on the losing end didn't notice. Perceived fairness scores were virtually identical across both groups — 4.05 for Opus deals, 4.06 for Haiku deals, on a 1–7 scale.
As the authors put it, the inequality was "imperceptible to the participants."
The gap Project Deal doesn't address
Project Deal was a controlled experiment: 69 Anthropic employees, known participants, a closed Slack environment. Every agent on both sides was Claude. The marketplace was trusted by definition.
That's not what the open web looks like.
In the real world, an agent given a shopping task like "find me black running shoes under $200" isn't operating in a closed, trusted environment. It's being pointed at the open web, where merchants range from legitimate operators to outright fraudulent storefronts. The agent has to decide who to transact with.
And right now, there is no standard way for it to make that determination.
The trust signals that humans use — brand recognition, visual design, review scores, word of mouth — are largely invisible to agents. Agents parse structure, policies, and machine-readable signals. They don't "feel" trust. They either have a signal to evaluate or they don't.
Project Deal proved the commerce layer works. What it didn't address is the verification layer underneath it.
What we built
We've been building GenGEO specifically for this gap: a machine-readable merchant verification registry that agents can query before transacting.
The API is intentionally simple:
GET https://api.gengeo.co/api/verify?domain=example.com
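As a rough sketch, the request above could be issued from Python with only the standard library. The endpoint and the `domain` query parameter come from the request line; the helper names, the lowercasing of the domain, and the five-second timeout are illustrative assumptions, not part of the published API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the request line above.
VERIFY_ENDPOINT = "https://api.gengeo.co/api/verify"


def build_verify_url(domain: str) -> str:
    """Return the verification URL with the domain URL-encoded.

    Assumption: trimming and lowercasing the domain is harmless because
    DNS names are case-insensitive; the registry may normalise differently.
    """
    query = urllib.parse.urlencode({"domain": domain.strip().lower()})
    return f"{VERIFY_ENDPOINT}?{query}"


def fetch_verification(domain: str, timeout: float = 5.0) -> dict:
    """Call the registry and return the decoded JSON body.

    Assumption: the endpoint answers a plain GET with a JSON object.
    Network and decoding errors propagate to the caller.
    """
    with urllib.request.urlopen(build_verify_url(domain), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Encoding the domain through `urlencode` rather than string concatenation keeps a malformed or hostile domain string from injecting extra query parameters.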
Verified merchant:
{
  "domain": "example.com",
  "verified": true,
  "status": "active",
  "eligible_for_ai_agent_purchase": "yes",
  "decision": "verified",
  "registry": "GenGEO"
}
Unverified merchant:
{
  "domain": "example.com",
  "verified": false,
  "status": "not_found",
  "eligible_for_ai_agent_purchase": "unknown",
  "decision": "verification_required",
  "registry": "GenGEO"
}
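One way an agent might consume these two responses is to parse them into a typed record before acting on them. The field names below are copied from the sample payloads; the `VerificationResult` class, the `parse_verification` helper, and the strict validation rules are hypothetical and only assume the response shape shown above.

```python
import json
from dataclasses import dataclass

# Field names as they appear in the sample responses above.
FIELDS = (
    "domain",
    "verified",
    "status",
    "eligible_for_ai_agent_purchase",
    "decision",
    "registry",
)


@dataclass(frozen=True)
class VerificationResult:
    domain: str
    verified: bool
    status: str
    eligible_for_ai_agent_purchase: str
    decision: str
    registry: str


def parse_verification(body: str) -> VerificationResult:
    """Parse a registry response, rejecting anything that looks malformed."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # A string such as "true" must not be mistaken for a verified merchant.
    if not isinstance(data["verified"], bool):
        raise ValueError("'verified' must be a JSON boolean")
    return VerificationResult(**{name: data[name] for name in FIELDS})
```

Rejecting a non-boolean `verified` value is a deliberate choice in this sketch: a loosely typed truthiness check would treat the string "false" as true.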
We deliberately chose a binary signal over a score. Agents work better with deterministic signals. A score creates a secondary decision problem: what does 67/100 mean, and at what threshold does the agent proceed? Binary keeps the logic clean:
if verified → proceed
if not verified → flag / fallback / surface to user
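The two branches above could be sketched as a small gate function. This is a minimal illustration, assuming the response has already been decoded into a dictionary; the `Action` enum and the choice to fail closed on a missing or malformed response are assumptions, and the pseudocode's "flag / fallback / surface to user" is collapsed into a single action for brevity.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCEED = "proceed"
    SURFACE_TO_USER = "surface_to_user"


def decide(response: Optional[dict]) -> Action:
    """Map a registry response to an agent action.

    Assumption: anything other than an explicit boolean True in
    'verified' (timeouts, errors, odd payloads) is treated as unverified.
    """
    if isinstance(response, dict) and response.get("verified") is True:
        return Action.PROCEED
    return Action.SURFACE_TO_USER
```

Because there is no threshold to tune, the only policy decision left to the agent builder is what happens in the unverified branch.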
We also built an MCP server so agents can call verification directly as a tool, without HTTP plumbing:
verify_store(domain)
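For readers curious what that tool call looks like on the wire, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message; the tool name comes from the signature above, while the `domain` argument key is assumed to match the parameter name, and the helper itself is hypothetical. In practice an MCP client library would construct and send this for you.

```python
import json
from itertools import count

# Simple monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_verify_store_call(domain: str) -> str:
    """Serialise an MCP 'tools/call' request for the verify_store tool.

    Assumption: the tool accepts a single named argument, 'domain'.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "verify_store",
            "arguments": {"domain": domain},
        },
    }
    return json.dumps(message)
```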
The full implementation is open source:
👉 github.com/warwickwood-cell/gengeo-agent-registry
Why Project Deal makes this more urgent, not less
Anthropic's authors end their paper with a note that's worth sitting with:
"The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet. But this experiment shows that such a world is plausible. More than that, it shows that such a world isn't far away."
If that's true — and the trajectory suggests it is — then the verification layer needs to exist before agentic commerce scales, not after. The same way payment infrastructure had to exist before ecommerce could scale. The same way SSL had to exist before people would enter card numbers online.
Trust infrastructure is boring until it isn't.
Project Deal was a closed system with known participants and no adversarial merchants. The open web has none of those properties. As agents begin transacting at scale on behalf of users, the question of who they're transacting with becomes one of the most commercially and ethically important questions in the stack.
What we're looking for
We're early. Most of this is still experimental. But we're actively looking to talk to:
- Developers building shopping or commerce agents
- Teams working on MCP integrations
- Anyone who has hit this problem in their own agent workflows
If you're building in this space and want to integrate verification into your agent flow, the MCP server is ready to use. It takes one tool call.
And if you think the framing is wrong — that agents will handle trust differently than we're assuming, or that platform-level solutions will absorb this entirely — we'd genuinely like to hear that argument.
The paper that prompted this is worth reading in full: anthropic.com/features/project-deal. Credit to Kevin K. Troy, Dylan Shields, Keir Bradwell, and Peter McCrory for running an experiment that surfaces questions the industry needs to be asking.
GenGEO is a merchant verification registry for AI agents. API docs and MCP server: github.com/warwickwood-cell/gengeo-agent-registry