DEV Community: GenGEO

Anthropic just proved agent commerce works. Their own data shows why verification infrastructure needs to exist.

GenGEO — Tue, 19 May 2026 21:57:32 +0000

This is our third post in a series on agentic commerce. Previously: "AI shopping agents have no standard way to verify merchants — so we built one" and "AI Agents Need a Trust Layer Before They Can Transact".

Last month, Anthropic published something quietly significant.

They called it Project Deal. For one week in December 2025, they created a Craigslist-style internal marketplace — but with a twist: every transaction was handled entirely by Claude agents acting on behalf of 69 employees. No human intervention once the experiment started. Agents posted listings, made offers, countered, and closed deals autonomously via Slack.

The result: 186 deals, $4,000+ transacted, across 500+ listed items.

It worked.

But buried in their findings is something that points directly at an unresolved infrastructure problem — one we've been building into.

What Project Deal actually demonstrated

The headline finding is that agent-to-agent commerce is real and closer than most people think. But the more interesting finding is what happened when agents weren't equally matched.

Anthropic ran a parallel secret experiment: half the participants were randomly assigned Claude Opus 4.5 (their frontier model), half got Claude Haiku 4.5 (their smallest model). The results were measurable and consistent:

Opus sellers extracted $2.68 more per item on average
Opus buyers paid $2.45 less per item on average
Opus agents completed roughly 2 more deals overall

The same broken folding bike sold for $38 when represented by Haiku and for $65 when represented by Opus.

Here's the uncomfortable part: participants on the losing end didn't notice. Perceived fairness scores were virtually identical across both groups — 4.05 for Opus deals, 4.06 for Haiku deals, on a 1–7 scale.

As the authors put it, the inequality was "imperceptible to the participants."

The gap Project Deal doesn't address

Project Deal was a controlled experiment: 69 Anthropic employees, known participants, and a closed Slack environment. Every agent on both sides was Claude. The marketplace was trusted by definition.

That's not what the open web looks like.

In the real world, an agent being given a shopping task — "find me black running shoes under $200" — isn't operating in a closed trusted environment. It's being pointed at the open web, where merchants range from legitimate operators to outright fraudulent storefronts. The agent has to decide who to transact with.

And right now, there is no standard way for it to make that determination.

The trust signals that humans use — brand recognition, visual design, review scores, word of mouth — are largely invisible to agents. Agents parse structure, policies, and machine-readable signals. They don't "feel" trust. They either have a signal to evaluate or they don't.

Project Deal proved the commerce layer works. What it didn't address is the verification layer underneath it.

What we built

We've been building GenGEO specifically for this gap: a machine-readable merchant verification registry that agents can query before transacting.

The API is intentionally simple:

GET https://api.gengeo.co/api/verify?domain=example.com

Verified merchant:

{
  "domain": "example.com",
  "verified": true,
  "status": "active",
  "eligible_for_ai_agent_purchase": "yes",
  "decision": "verified",
  "registry": "GenGEO"
}

Unverified merchant:

{
  "domain": "example.com",
  "verified": false,
  "status": "not_found",
  "eligible_for_ai_agent_purchase": "unknown",
  "decision": "verification_required",
  "registry": "GenGEO"
}
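The request and the two responses above can be handled along these lines. This is a minimal Python sketch, not a published client: the endpoint URL and field names come from the examples in this post, while the helper names (`build_verify_url`, `parse_verification`, `fetch_verification`), the `Verification` dataclass, and the 5-second timeout are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Endpoint as shown in this post.
VERIFY_ENDPOINT = "https://api.gengeo.co/api/verify"


@dataclass(frozen=True)
class Verification:
    """Hypothetical container for fields shown in the example responses."""
    domain: str
    verified: bool
    status: str
    decision: str


def build_verify_url(domain: str) -> str:
    """Build the GET URL, URL-encoding the domain query parameter."""
    return VERIFY_ENDPOINT + "?" + urllib.parse.urlencode({"domain": domain})


def parse_verification(body: str) -> Verification:
    """Parse a JSON body; missing fields default to the unverified case."""
    data = json.loads(body)
    return Verification(
        domain=data.get("domain", ""),
        # Only a literal JSON true counts as verified.
        verified=data.get("verified") is True,
        status=data.get("status", "not_found"),
        decision=data.get("decision", "verification_required"),
    )


def fetch_verification(domain: str, timeout: float = 5.0) -> Verification:
    """Perform the HTTP call (assumes the endpoint returns JSON as shown)."""
    with urllib.request.urlopen(build_verify_url(domain), timeout=timeout) as resp:
        return parse_verification(resp.read().decode("utf-8"))
```

Defaulting missing fields to the unverified values is a design choice in this sketch: a malformed response then reads as "not verified" rather than raising partway through an agent run.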

We deliberately chose binary over scored. Agents work better with deterministic signals. A score creates a secondary decision problem — what does 67/100 mean, and at what threshold does the agent proceed? Binary keeps the logic clean:

if verified → proceed
if not verified → flag / fallback / surface to user
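In Python, the two-branch rule above might look like the following sketch. The `Action` enum and `next_action` function are hypothetical names, and "surface to user" is just one of the fallbacks listed above, picked here for illustration.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    SURFACE_TO_USER = "surface_to_user"


def next_action(response: dict) -> Action:
    """Map a verification response to the agent's next step."""
    if response.get("verified") is True:
        return Action.PROCEED
    # Anything else (false, missing, or malformed) is treated as not verified.
    return Action.SURFACE_TO_USER
```

Checking `is True` rather than truthiness means a stray string such as `"true"` does not pass the gate.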

We also built an MCP server so agents can call verification directly as a tool, without HTTP plumbing:

verify_store(domain)
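Under MCP, a tool invocation travels as a JSON-RPC 2.0 request with the method `tools/call`. An MCP client library normally builds this message for you; the sketch below only shows its shape. The argument key `domain` is assumed from the tool signature above, and `build_tool_call` is a hypothetical helper name.

```python
import json


def build_tool_call(domain: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the verify_store tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "verify_store",
            # Argument name assumed from the signature verify_store(domain).
            "arguments": {"domain": domain},
        },
    }
    return json.dumps(message)
```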

The full implementation is open source:
👉 github.com/warwickwood-cell/gengeo-agent-registry

Why Project Deal makes this more urgent, not less

Anthropic's authors end their paper with a note that's worth sitting with:

"The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet. But this experiment shows that such a world is plausible. More than that, it shows that such a world isn't far away."

If that's true — and the trajectory suggests it is — then the verification layer needs to exist before agentic commerce scales, not after. The same way payment infrastructure had to exist before ecommerce could scale. The same way SSL had to exist before people would enter card numbers online.

Trust infrastructure is boring until it isn't.

Project Deal was a closed system with known participants and no adversarial merchants. The open web has none of those properties. As agents begin transacting at scale on behalf of users, the question of who they're transacting with becomes one of the most commercially and ethically important questions in the stack.

What we're looking for

We're early. Most of this is still experimental. But we're actively looking to talk to:

Developers building shopping or commerce agents
Teams working on MCP integrations
Anyone who has hit this problem in their own agent workflows

If you're building in this space and want to integrate verification into your agent flow, the MCP server is ready to use. It takes one tool call.

And if you think the framing is wrong — that agents will handle trust differently than we're assuming, or that platform-level solutions will absorb this entirely — we'd genuinely like to hear that argument.

The paper that prompted this is worth reading in full: anthropic.com/features/project-deal. Credit to Kevin K. Troy, Dylan Shields, Keir Bradwell, and Peter McCrory for running an experiment that surfaces questions the industry needs to be asking.

GenGEO is a merchant verification registry for AI agents. API docs and MCP server: github.com/warwickwood-cell/gengeo-agent-registry

AI shopping agents have no standard way to verify merchants — so we built one (MCP + verification API)

GenGEO — Fri, 15 May 2026 01:54:06 +0000

AI agents are beginning to make purchasing and recommendation decisions on behalf of users.

But there's a quiet infrastructure problem nobody's solved yet.

The gap

Most ecommerce trust systems were built for humans. Branding, visual design, reviews, SEO, reputation signals — all of it assumes a person is evaluating the store and making a judgment call.

Agents don't do that.

When an AI agent is tasked with finding and buying something, it's parsing structured data, operational signals, and machine-readable policy indicators. It's not "feeling" trust. It's looking for signals it can interpret deterministically.

Here's the problem: there's currently no standard verification layer for this.

Imagine an agent receives:

Find me black running shoes under $200

It might:

1. Search products
2. Compare pricing
3. Evaluate policies
4. Identify candidate merchants
5. Potentially execute a transaction

At step 4 — how does the agent know whether a merchant is verified? Right now, it doesn't. There's no infrastructure for this. The agent is essentially guessing, or falling back to heuristics that weren't designed for agentic use.

That's the gap we're building into.

What we built

GenGEO is a machine-readable merchant verification registry, exposed via a simple API and an MCP server so agents can call it directly.

The design goal was deliberately narrow: don't build a ranking system, a recommendation engine, or a quality score. Just answer one question cleanly.

Is this merchant verified?

Binary. Deterministic. That's it.

The verification API

GET https://api.gengeo.co/api/verify?domain=example.com

Verified merchant response:

{
  "domain": "example.com",
  "verified": true,
  "status": "active",
  "eligible_for_ai_agent_purchase": "yes",
  "decision": "verified",
  "registry": "GenGEO"
}

Unverified merchant:

{
  "domain": "example.com",
  "verified": false,
  "status": "not_found",
  "eligible_for_ai_agent_purchase": "unknown",
  "decision": "verification_required",
  "registry": "GenGEO"
}

Why binary, not scored?

We thought hard about this.

Scoring systems feel more informative — but they introduce ambiguity at exactly the wrong moment. If a score comes back 67/100, what does the agent do with that? It now needs a secondary decision layer to interpret what 67 means in context. That's complexity you're pushing into every agent that integrates with you.

Binary verification keeps the signal simple, deterministic, and easy to build conditional logic around:

if verified → proceed
if not verified → flag / fallback / surface to user

Agents generally work better with deterministic inputs. We designed for that.
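To make the contrast concrete, here is a small Python sketch of the two designs. The function names and the thresholds (80 and 50) are arbitrary and only there for illustration; GenGEO does not expose a score.

```python
def proceed_with_score(score: int, threshold: int) -> bool:
    """Scored design: every integrating agent must choose its own threshold."""
    return score >= threshold


def proceed_with_binary(verified: bool) -> bool:
    """Binary design: the registry's answer is the decision input."""
    return verified is True


# The same hypothetical score of 67 leads to opposite outcomes,
# depending on a threshold the registry does not control.
cautious_agent = proceed_with_score(67, threshold=80)
permissive_agent = proceed_with_score(67, threshold=50)
```

Two agents reading the same score can reach different decisions. With a binary signal, that variation can only come from logic the agent adds on top.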

The MCP server

Beyond the REST API, we built an MCP server so agents can call verification directly as a tool — no HTTP plumbing required.

Tool:

verify_store(domain)

Agent flow:

1. Agent identifies merchant domain
2. Calls verify_store(domain) via MCP
3. Receives verification status
4. Incorporates signal into decision workflow
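The four steps above can be sketched in Python as follows. The tool is passed in as a plain callable so that the example stays independent of any particular MCP client. `merchant_domain` and `check_merchant` are hypothetical helpers, and stripping a leading `www.` is an assumption about how domains are registered.

```python
from typing import Callable
from urllib.parse import urlsplit


def merchant_domain(product_url: str) -> str:
    """Step 1: derive the merchant domain from a product URL."""
    host = urlsplit(product_url).hostname or ""
    # Dropping a leading "www." is an assumption, not documented behaviour.
    return host[4:] if host.startswith("www.") else host


def check_merchant(product_url: str, verify_store: Callable[[str], dict]) -> dict:
    """Steps 2-4: call the tool, read the status, attach it to the candidate."""
    domain = merchant_domain(product_url)
    result = verify_store(domain)
    return {
        "url": product_url,
        "domain": domain,
        "verified": result.get("verified") is True,
    }
```

In an agent, `verify_store` would be whatever function the MCP client exposes for the tool. In tests it can be a stub that returns a canned response.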

This matters more than it might look.

There's a shift happening in how agents interact with infrastructure. Agents are moving away from passive web browsing — discovering information through search — toward direct tool invocation. If verification infrastructure has to be discovered through search, it's fragile and inconsistent. If it's a callable tool, it's reliable, fast, and composable.

MCP changes the distribution model for infrastructure like this. Agents don't find you — they call you.

What GenGEO doesn't do

Worth being explicit about scope:

Does not rank merchants
Does not recommend merchants
Does not guarantee merchant behaviour
Does not guarantee transaction outcomes

It provides verification status only. The agent's decision logic — what to do with that status — stays with the agent. We're not trying to be the decision layer, just a signal in it.

The bigger picture

Traditional ecommerce infrastructure was built for humans discovering and evaluating stores. As agentic commerce grows, that infrastructure is increasingly mismatched with how agents actually work.

We think the category of "agent-native commerce infrastructure" is very early — and that verification is a foundational layer, not a nice-to-have. Before agents can reliably transact on behalf of users at scale, there needs to be a trust layer they can query.

What that layer ultimately looks like (centralised registries like this, decentralised protocols, or something built into agent frameworks themselves) is genuinely an open question. We're not claiming to have the final answer. We're putting infrastructure up and seeing what the actual usage patterns look like.

Repo + feedback

The MCP server is open source:
👉 github.com/warwickwood-cell/gengeo-agent-registry

Would genuinely love feedback from people working on:

AI / commerce agents
MCP tooling and integrations
Agentic infrastructure
Trust and verification primitives

Specifically curious: if you're building agents that interact with ecommerce, how are you currently handling merchant trust signals — or are you not handling them at all?

This is early infrastructure for an early category. The interesting part isn't the API — it's whether the problem framing holds as agentic commerce matures.

Happy to dig into the technical design decisions in the comments.

AI Agents Need a Trust Layer Before They Can Transact

GenGEO — Wed, 06 May 2026 03:26:32 +0000

AI agents are starting to do more than search.

They’re beginning to make purchasing decisions on behalf of users.

But there’s a critical gap that isn’t being solved yet:

When an AI agent is about to execute a transaction, how does it know the merchant is safe to buy from?

The Problem: Trust at the Point of Execution

Most existing systems in commerce focus on:

Discovery (search, recommendations)
Data (catalogs, product attributes)
Payments (checkout, wallets)

Emerging protocols like UCP improve how structured commerce data is shared.

But none of these solve the decision moment:

Should this agent trust this merchant enough to complete a transaction?

Today, agents rely on:

Heuristics designed for human browsing
Incomplete or inconsistent signals
Platform-specific assumptions

This works for recommendations.

It does not work for autonomous execution.

What’s Missing: A Machine-Native Trust Decision

Humans infer trust through:

brand recognition
reviews
UI cues

AI agents don’t “experience” any of that.

They need something different:

deterministic
machine-readable
real-time verifiable

A Different Approach: Binary Trust

Instead of ranking or scoring merchants, you reduce the problem to a single decision:

{
  "eligible_for_purchase": true
}

No rankings.
No scores.
No preference signals.

Just:

Can this agent safely transact with this merchant?

Why Not Use Scores?

Because the moment you expose:

trust scores
rankings
weighted signals

You introduce bias.

Agents will:

prefer higher scores
concentrate traffic
recreate marketplace dynamics

A binary model does something different:

All verified merchants are equal
Selection is handled by the agent (price, availability, intent)
The trust layer simply filters out unsafe options

Introducing GenGEO (Concept)

GenGEO is a machine-readable trust registry for AI commerce.

It provides:

A public registry of verified merchants
A real-time verification endpoint
Immediate revocation signals

How It Works (Integration)

1. Discovery (optional)

Agents can preload eligible merchants:

GET https://gengeo.co/.well-known/gengeo-registry.json

Used for:

caching
pre-filtering candidates
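One way to use a preloaded list for caching and pre-filtering is sketched below in Python. This post does not show the schema of `gengeo-registry.json`, so extracting domains from that file is left to the caller. The `RegistryCache` class, its method names, and the 300-second TTL are all hypothetical.

```python
import time
from typing import Iterable, List, Optional


class RegistryCache:
    """Hypothetical in-memory cache of preloaded registry domains."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._domains = frozenset()
        self._loaded_at: Optional[float] = None

    def load(self, domains: Iterable[str], now: Optional[float] = None) -> None:
        """Store domains the caller extracted from the registry file."""
        self._domains = frozenset(d.lower() for d in domains)
        self._loaded_at = time.monotonic() if now is None else now

    def is_fresh(self, now: Optional[float] = None) -> bool:
        """True while the last load is younger than the TTL."""
        if self._loaded_at is None:
            return False
        current = time.monotonic() if now is None else now
        return current - self._loaded_at < self.ttl_seconds

    def prefilter(self, candidates: Iterable[str], now: Optional[float] = None) -> List[str]:
        """Keep candidates listed in a fresh cache. With a stale or empty
        cache, keep everything and leave the decision to the real-time check."""
        if not self.is_fresh(now):
            return list(candidates)
        return [c for c in candidates if c.lower() in self._domains]
```

Pre-filtering only narrows the candidate set. It does not replace the real-time verification call, which is still required before a transaction.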

2. Real-Time Verification (required before transaction)

GET https://api.gengeo.co/api/verify?domain=example.com

Example response:

{
  "verified": true,
  "status": "active",
  "eligible_for_purchase": true
}

Decision Rule

Agents should only proceed when:

verified == true
AND status == "active"
AND eligible_for_purchase == true

If any condition fails → do not transact
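Expressed in Python, the rule above is a single conjunction. The field names are taken from the example response; `may_transact` is a hypothetical function name.

```python
def may_transact(response: dict) -> bool:
    """Return True only when all three conditions of the decision rule hold."""
    return (
        response.get("verified") is True
        and response.get("status") == "active"
        and response.get("eligible_for_purchase") is True
    )
```

A missing field fails the check, so an incomplete response is treated the same way as a negative one.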

Revocation Model

Trust is not static.

A merchant becomes immediately ineligible if:

status = "uninstalled"
verified = false
verification expires
lookup fails

This ensures agents don’t transact with stale or invalid merchants.
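On the agent side, these conditions amount to failing closed. The Python sketch below treats any lookup error, unverified result, or non-active status as ineligible. This post does not show how an expired verification is reported, so the sketch assumes it surfaces as `verified: false` or a non-active `status`. `is_eligible_now` is a hypothetical name, and `lookup` stands for whatever function performs the verification call.

```python
from typing import Callable


def is_eligible_now(domain: str, lookup: Callable[[str], dict]) -> bool:
    """Fail closed: any error or non-active state means ineligible."""
    try:
        response = lookup(domain)
    except Exception:
        # "lookup fails" is one of the listed revocation conditions.
        return False
    if not isinstance(response, dict):
        return False
    if response.get("verified") is not True:
        return False
    # Covers "uninstalled" and any other non-active status.
    return response.get("status") == "active"
```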

Example Flow

User: "Find me a black hoodie under $100"

Agent:
1. Query products across merchants
2. Identify candidate stores
3. Verify each merchant via GenGEO
4. Remove ineligible merchants
5. Execute purchase
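The flow above can be sketched end to end in Python. Offers are plain dicts with hypothetical `domain` and `price` keys, `is_eligible` stands in for the GenGEO check, and the purchase itself is out of scope, so the function only returns the offer it would buy.

```python
from typing import Callable, Optional


def choose_offer(
    offers: list,
    budget: float,
    is_eligible: Callable[[str], bool],
) -> Optional[dict]:
    """Return the cheapest in-budget offer from an eligible merchant, or None."""
    # Steps 1-2: the caller has gathered offers; keep those under budget.
    in_budget = [o for o in offers if o["price"] < budget]
    # Step 3: verify each distinct merchant domain once.
    verdicts = {d: is_eligible(d) for d in {o["domain"] for o in in_budget}}
    # Step 4: remove offers from ineligible merchants.
    eligible = [o for o in in_budget if verdicts[o["domain"]]]
    # Step 5 (stand-in): select by price; the purchase call is not shown.
    return min(eligible, key=lambda o: o["price"], default=None)
```

The trust check only filters. The choice among the remaining offers (price, in this sketch) stays with the agent.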

Important: What This Is NOT

GenGEO is not:

a ranking system
a recommendation engine
a marketplace

It does not:

influence which merchant wins
assign quality scores
bias selection

It is strictly:

An eligibility gate for transaction safety

Why This Matters

As agents become more autonomous:

they will execute transactions
they will need deterministic trust signals
they cannot rely on human-centric signals

The missing piece in agentic commerce isn’t more data.

It’s a trusted decision layer at the point of execution.

Open Question

If you’re building AI agents:

How are you currently deciding whether a merchant is safe to transact with?

Would be keen to hear how others are approaching this.

References

GitHub (spec + integration):
https://github.com/warwickwood-cell/gengeo-agent-registry
Specification:
https://gengeo.co/.well-known/gengeo.json