In April, the story was a platform pulling a lever: Shopify migrated its entire UCP fleet to v2026-04-08 in four days, BigCommerce showed up with three stores, and we said the question for May was which platform ships next — because every prior jump in the directory had been a step function caused by a platform-level deployment.
May's answer: none, and it didn't matter. No platform shipped a UCP wave this month. BigCommerce still has three verified stores. WooCommerce still has three. Salesforce Commerce Cloud still has none verified, though a custom build is reportedly in development. And the directory still grew ~32% — the same rate as April — because the baseline discovery rate stepped up. For the first time since we started this report, UCP grew on a slope instead of a staircase.
This is the fourth monthly state-of-the-ecosystem report from UCP Checker. Here's what the data says as of May 12, 2026.
The numbers
- 5,294 verified UCP stores (up from 4,014 in April, +32%)
- 5,892 total domains tracked
- 1,829 new merchants discovered this month; 775 this week alone
- 5,264 verified stores on the latest v2026-04-08 spec (99.4%)
- 5,235 verified stores at A grade on UCP Score (98.9%)
Three consecutive months of ~30% growth is a real curve now, not a launch artifact. But the shape changed. February was discovery (first 1,000 Shopify stores). March was expansion (crossed 3,000, first non-Shopify manifests). April was consolidation (the four-day Shopify spec migration). May is the first month where the headline growth came from neither a new platform nor a spec event — it came from crawler optimisations we shipped in early May. The stores were always out there; we just got faster at finding them.
That distinction matters for forecasting. If May's growth had been platform-driven, you'd model the next jump as "wait for SFCC." Since it's discovery-rate-driven, the model is different: the directory keeps filling at a steady clip until either we exhaust the discoverable Shopify long tail, or a platform finally ships a wave and the staircase resumes. Both will happen; the order is the open question.
Shopify's head start, four months in
| Platform | Monitored | Verified | Verified % | Avg score (verified) | Avg manifest latency |
|---|---|---|---|---|---|
| Shopify | 5,242 | 5,241 | ~100% | 92.5 | 178 ms |
| Custom & Headless | 642 | 45 | 7.0% | 83.0 | 356 ms |
| WooCommerce | 3 | 3 | 100% | 92.3 | 1,023 ms |
| BigCommerce | 3 | 3 | 100% | 88.3 | 993 ms |
| Magento | 1 | 1 | 100% | 85.0 | 218 ms |
| PrestaShop | 1 | 1 | 100% | 84.0 | 548 ms |
Shopify is 99% of the verified directory — unchanged from April. All non-Shopify platforms combined account for 53 verified stores, the same as last month. The head start is still the dominant signal in the data, and the Custom & Headless cohort is the mirror image: 642 domains attempted UCP, only 45 got to verified (a 7% completion rate). When a platform hands you the boilerplate, you compound; when you build it yourself, most attempts stall before validation. That's a tooling gap, not a spec problem.
The more interesting movement came from two more platforms shipping UCP support — Bareconnect and Selly.io — both of which already have verified stores live in the directory, not roadmap promises. The numbers are still small, and how either platform exposes UCP (default for every storefront, opt-in, or a paid tier) will decide whether this stays a handful or turns into a wave — a detail we don't yet know. But it's the first new platform movement since the Shopify migration.
Two structural notes on the table. BigCommerce and WooCommerce manifests run ~1 second versus Shopify's 178 ms because they're served from the storefront origin rather than a CDN-cached endpoint — a meaningful handicap as agent response budgets tighten. And geographically the directory is still a US/.com story: 4,720 of 5,294 verified stores ship under generic TLDs; the largest attributable ccTLD cohorts are .uk (229), .au (120), and .ca (66); continental Europe is under 2% by ccTLD (a floor, not a true distribution).
Capability coverage: the ceiling, and the edges
| Capability | Verified adopters |
|---|---|
| `dev.ucp.shopping.checkout` | 5,269 |
| `dev.ucp.shopping.fulfillment` | 5,264 |
| `dev.ucp.shopping.catalog.lookup` | 5,257 |
| `dev.ucp.shopping.catalog.search` | 5,256 |
| `dev.ucp.shopping.order` | 5,256 |
| `dev.ucp.shopping.discount` | 5,253 |
| `dev.ucp.shopping.cart` | 5,249 |
| — the cliff — | |
| `dev.ucp.common.identity_linking` | 6 |
| `dev.ucp.shopping.buyer_consent` | 3 |
| `dev.ucp.shopping.checkout.embedded` | 2 |
| `dev.ucp.shopping.ap2_mandate` | 1 |
| `dev.ucp.shopping.payment` | 0 |
Identical pattern to March and April: the seven core shopping capabilities ship together as a Shopify-side bundle (~5,250 adopters each), then an 800× cliff. Identity linking: 6. AP2 mandate — the primitive that makes an agentic transaction auditably user-authorised — still 1 (houseofparfum.nl, WooCommerce, scoring 100). Payment capability: still 0. Of 5,294 verified stores, 5,161 (~97%) sit at Tier 2, one is Tier 3, one is Tier 4. The deeper primitives aren't slow-adopting; they're not adopting yet. When demand for AP2 turns into pressure (regulators, payment networks, the working group's eventual requirements), this number moves fast — the way checkout did once Shopify bundled it. Until then, "UCP store" means "agent-shoppable," not "mandate-credentialed."
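For anyone reproducing these counts from the published dataset, the per-capability table is a simple aggregation over manifests. A minimal sketch in Python, assuming each manifest exposes a flat `capabilities` list of name strings (an illustrative shape, not the exact UCP manifest schema):

```python
from collections import Counter

def capability_adoption(manifests):
    """Count, per capability name, how many verified manifests declare it.

    `manifests` is a list of dicts with a "capabilities" list of names
    (an assumed shape for illustration, not the exact UCP schema).
    """
    counts = Counter()
    for m in manifests:
        # A store counts once per capability, even if a name repeats.
        counts.update(set(m.get("capabilities", [])))
    # Sorted descending, so the core bundle surfaces first and the
    # cliff to the deeper primitives is visible at a glance.
    return counts.most_common()

manifests = [
    {"capabilities": ["dev.ucp.shopping.checkout", "dev.ucp.shopping.cart"]},
    {"capabilities": ["dev.ucp.shopping.checkout"]},
    {"capabilities": ["dev.ucp.shopping.checkout", "dev.ucp.shopping.ap2_mandate"]},
]
print(capability_adoption(manifests))  # checkout leads; the deep primitives trail
```

Run against the full verified dataset, this is the computation behind the table above.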
Where the movement was: the edges of the spec
The new signals in May's data sit at the edges of the spec rather than its core. The first is in the capability namespace itself: below the standard dev.ucp.* entries, a handful of non-standard, vendor-prefixed capabilities are now appearing on real verified manifests:
- `com.pwc.accelerator.loyalty.rewards` — 2 stores. PwC's agentic-commerce accelerator (more below).
- `com.appointedd.schedule` / `.booking` / `.intent` — 1 store. Appointment-scheduling primitives — booking-vertical UCP, not retail.
- `com.woocommerce.ai_storefront` — 1 store. A WooCommerce-specific storefront extension.
- `sh.agentscore.identity` — 1 store. An identity primitive from a third party.
- `com.agoragentic.x402.checkout` — 1 store. A checkout extension referencing x402 (the HTTP-402 micropayment pattern).
None is adopted at scale yet — 1–2 stores each, almost certainly vendors' own test deployments — but it's the first month the namespace long tail has held anything other than Shopify defaults. It's the leading indicator of a UCP extension ecosystem: third parties shipping vertical capabilities (loyalty, booking, identity, micropayments) on top of the core spec, a more realistic near-term diversification path than "another commerce platform ships a wave."
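Spotting that long tail mechanically is just a namespace partition: anything outside the standard `dev.ucp.*` prefix is a vendor extension. A minimal sketch (the reverse-DNS naming convention is taken from the examples above; the helper itself is ours):

```python
STANDARD_PREFIX = "dev.ucp."

def split_namespaces(capabilities):
    """Partition capability names into standard dev.ucp.* entries and
    vendor-prefixed extensions (reverse-DNS names like com.pwc.* or
    sh.agentscore.*). Order within each bucket is preserved."""
    standard = [c for c in capabilities if c.startswith(STANDARD_PREFIX)]
    vendor = [c for c in capabilities if not c.startswith(STANDARD_PREFIX)]
    return standard, vendor

caps = [
    "dev.ucp.shopping.checkout",
    "com.pwc.accelerator.loyalty.rewards",
    "sh.agentscore.identity",
]
std, ext = split_namespaces(caps)
# std keeps the one core capability; ext holds the two vendor extensions
```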
The PwC entry is worth pulling out, because it isn't a platform — it's a consultancy. PwC has launched an agentic-commerce accelerator: a practice that stands up custom UCP-enabled storefronts for enterprise clients, with its own capability extensions (the com.pwc.accelerator.* namespace) layered on the core spec. That's a third adoption channel, distinct from "platform ships a wave" and "developer hand-builds" — call it consulting-led. It's slower per engagement, but each accelerator that standardises on UCP arrives with a portfolio of enterprise clients attached. PwC is the leading edge; Deloitte, EY, KPMG, Accenture, McKinsey, BCG, and the systems integrators (Capgemini, IBM, TCS, Infosys) all face the same build-it-once, deploy-to-many incentive.
Transports and payment handlers: the monoculture, and the experiments tier
| Transport | Verified declarations |
|---|---|
| MCP | 5,258 |
| Embedded | 5,243 |
| REST | 47 |
| A2A | 2 |
MCP and Embedded are universal because Shopify declares both. REST shows up on 47 stores — the non-Shopify hand-builds, REST being the natural fit for anyone implementing without an MCP server. A2A (Google's Agent2Agent transport, formally added in v2026-04-08) holds at two. Payment handlers tell the same monoculture story: 5,250 verified stores declare Google Pay and 5,241 declare Shopify Card — the same shared Shopify-managed handler IDs we flagged in February as a single point of failure. Everything else is a rounding error. The payment partner ecosystem (Stripe, Adyen, Visa, Mastercard, PayPal, Affirm, Splitit — all on the registry) is mature on paper; the live handler declarations are two Shopify-managed IDs and a handful of experiments.
The experiments are the part worth zooming in on, because the same small set of builders is populating the spec's newer transport, its newer handler shapes, and its newer capability namespaces simultaneously. Both A2A adopters are agent-native rather than retail: one is an agent-identity storefront running pure A2A with a cryptographically signed manifest (JWS / EdDSA) and two custom payment handlers on crypto rails — an mpp rail on Tempo mainnet and an x402 rail on Base; the other is an agent-to-agent service exposed across MCP + A2A + REST, selling a USDC-priced audit via a com.agoragentic.x402 handler plus a direct USDC receive address. Both ship the custom capability namespaces flagged in the capability section above (sh.agentscore.identity, com.agoragentic.x402.checkout).
Separately, payment processors are starting to run dev UCP endpoints with fully custom handler integrations — their own handler IDs, their own init / verify protocol shapes, declared at v2026-04-08 over REST against real merchants, iterating against the Checker as they build. Still dev, not live, but for the first time the gap between the partner roster and the live handler declarations has something in it that's neither Shopify-default nor mock fixture — and it's coming from processors with the scale to move real merchant bases. Two data points in each direction don't make a trend, but the pattern is coherent: the spec's newer surfaces (A2A transport, custom handler shapes, third-party namespaces) are populated by a small set of builders doing novel work in parallel, while the core carries volume. That's the shape of a protocol leaving its launch phase.
How agents actually perform
The numbers above tell you which stores have UCP. This section is about which stores work when an agent shops them. UCP Playground Evals passed 1,000 recorded agent sessions this month and is already well past the mark: more than a thousand end-to-end agent shopping runs across 105 unique stores and 16 frontier models, totalling ~57M tokens, ~12 hours of cumulative agent runtime, and roughly $119,000 in aggregate cart value.
Outcomes: where the agent stops
| Outcome | Sessions | Share |
|---|---|---|
| `checkout_reached` | 475 | 37.9% |
| `search_only` (browsed, didn't cart) | 344 | 27.4% |
| `failed` (provider error, refusal, max turns) | 261 | 20.8% |
| `cart_created` (carted, didn't proceed) | 172 | 13.7% |
62% of sessions end without a completed checkout — and that ratio has stayed stable as the dataset grew, which is itself the finding. As we add models and stores, the shape of failure doesn't change: agents find products fine (search works nearly everywhere), build carts often, then ~14% of sessions stall at a cart that won't convert and ~21% fail outright (about half of those are variant-shape problems — the agent picks a variant ID the cart rejects and flails until it hits the turn limit). We dug into exactly that this month in UCP Variant Data: The #1 Reason Agent Checkouts Fail — the single largest categorisable cause of the gap between "has a manifest" and "agent can buy from it," and almost entirely fixable in the merchant's variant data without touching any tooling.
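To make the variant failure mode concrete: the robust move is to resolve a variant ID by exact match against the user's option selections, and to treat "no match" as a signal to re-ask rather than guessing an ID the cart will reject. A sketch under assumed field shapes (illustrative, not the exact UCP schema):

```python
def resolve_variant(product, selections):
    """Return the ID of the variant whose options exactly match
    `selections` (e.g. {"Size": "M", "Color": "Blue"}), or None when
    nothing matches -- the case where a guessed ID would bounce off
    the cart endpoint until the agent hits its turn limit."""
    for variant in product.get("variants", []):
        opts = {o["name"]: o["value"] for o in variant.get("options", [])}
        if opts == selections:
            return variant["id"]
    return None

product = {
    "options": ["Size", "Color"],
    "variants": [
        {"id": "v1", "options": [{"name": "Size", "value": "M"},
                                 {"name": "Color", "value": "Blue"}]},
        {"id": "v2", "options": [{"name": "Size", "value": "L"},
                                 {"name": "Color", "value": "Blue"}]},
    ],
}
assert resolve_variant(product, {"Size": "M", "Color": "Blue"}) == "v1"
assert resolve_variant(product, {"Size": "S", "Color": "Blue"}) is None
```

Exact-match resolution only works when every variant actually populates `options[]`, which is precisely the data-hygiene point the variant-data guide makes.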
Model leaderboard
Checkout-conversion rate by model, from the UCP Playground model leaderboard — sessions where the agent reached a checkout URL ÷ total sessions for that model (the live leaderboard breaks out search, cart, and speed too):
| Model | Sessions | Checkout % | Avg session | Vendor |
|---|---|---|---|---|
| Claude Sonnet 4.5 | 256 | 52.0% | ~38 s | Anthropic |
| Llama 3.3 70B | 75 | 49.3% | ~48 s | Meta |
| DeepSeek V3.2 | 60 | 45.0% | ~46 s | DeepSeek |
| Gemini 3 Flash | 174 | 42.0% | ~21 s | |
| Grok 4 | 53 | 39.6% | ~77 s | xAI |
| Claude Opus 4.6 | 123 | 39.0% | ~30 s | Anthropic |
| Gemini 2.5 Flash | 125 | 36.0% | ~12 s | |
| GPT-4o | 63 | 31.7% | ~15 s | OpenAI |
| Gemini 3.1 Pro | 96 | 29.2% | ~48 s | |
| Gemini 2.5 Pro | 79 | 27.8% | ~34 s | |
| GPT-5.2 | 63 | 20.6% | ~36 s | OpenAI |
| DeepSeek R1 | 19 | 15.8% | ~60 s | DeepSeek |
| o4-mini | 21 | 14.3% | ~42 s | OpenAI |
| Grok 3 Mini | 21 | 9.5% | ~57 s | xAI |
| QwQ 32B | 25 | 0% | ~61 s | Alibaba |
Three things hold from April, plus one shift:
Search works everywhere. Checkout completion is the next frontier. Every model that runs to completion finds products. Checkout conversion ranges from 0% to 52% — a 50-point spread across the field, which is exactly where the work-to-do sits. The best model in the field completes checkout about half the time today; the headroom from there is the frontier the next quarter gets to push.
Reasoning-tuned models still underperform. QwQ 32B: 0% across 25 sessions. Grok 3 Mini: 9.5%. o4-mini: 14.3%. DeepSeek R1: 15.8%. Models that burn tokens on deliberation struggle with the fast, sequential, low-ambiguity tool-calling that shopping requires. Shopping rewards decisive, not thoughtful — true in April, true with 3× the data. (GPT-5.2 also lands below the median at 20.6%.)
Speed and success are decoupled. Gemini 2.5 Flash finishes a session in ~12 seconds; Grok 4 takes ~77. Their checkout rates are 36% and 40% — basically a wash. Being fast doesn't make you good at this; being slow doesn't either. The Claude models sit mid-pack on speed (~30–38 s) and top on conversion, which is the combination that actually matters when the agent is spending someone's money.
The shift: in April we reported DeepSeek V3.2 leading the composite shopping score. With ~3× the sessions, Claude Sonnet 4.5 is now clearly out front on checkout completion — 52% over 256 sessions, by far the largest sample — with Meta's Llama 3.3 70B the surprise second. Treat any single month's ranking as provisional until the eval dataset gets to the point — soon — where it stops being indicative and becomes authoritative.
The reliability gap, one more time
We've made this the editorial spine of every one of these reports, and the May data doesn't let us retire it. 98.9% of verified stores carry an A on UCP Score (5,235 of 5,294; the rest are 57 B's and two C's). By conformance, the directory is in excellent shape. But conformance isn't end-to-end agent-readiness, and that's the gap UCP Score doesn't grade.
A clean schema doesn't tell you whether the cart endpoint accepts the variant the agent picked, whether response-time budgets hold under load, whether payment-handler tokenisation completes inside the agent's timeout window, or whether the checkout URL drops the agent into an auth loop a browser would have handled with cookies. UCP Playground is the test harness developers use to exercise that second layer — replay sessions, probe edge cases, see exactly where an agent trips. By design it surfaces failure modes, not steady-state performance; treating Playground completion rates as a consumer-shopping success metric misreads the tool. But the categories of failure it surfaces — variant mismatch, slow tokenisation, malformed cart responses, checkout redirect loops — are real, and they're what separate an A-graded manifest from a store an agent can reliably transact against in production.
That's the gap we'd point a platform team at — and it isn't a percentage, it's a posture. The protocol's first phase, call it the first four months, was about getting the schema right, and the ecosystem did that. The next phase is the unglamorous second-order work: error recovery, schema robustness, response-time SLAs, variant-data hygiene, the long tail of edge cases that separate "manifest valid" from "agent transacts without anything tripping it up." That work is happening — the Playground sessions above are senior engineers doing exactly it. The open question is whether the posture spreads from the engineering teams already running this loop to the long tail of merchants still on bundled defaults. That's where the next quarter's competitive distance gets built.
The demand side: AI traffic is converting
For four months this report has focused on supply — which stores have UCP, what capabilities they declare, the shape of their manifests, what agents do against them in testing. On May 11 Shopify published its first real demand-side dataset, and the numbers reframe the urgency of everything above.
Across Shopify storefronts in Q1 2026, by Shopify's analysis:
- AI-referred orders grew nearly 13× year-over-year. Referral sessions from AI chatbots (ChatGPT, Perplexity, Gemini, Copilot, Claude, Grok) grew more than 8× YoY.
- AI-referred sessions convert at ~50% higher rates than organic search when they start on product pages.
- Average order value is 14% higher for AI-referred than for organic-search orders.
- More than half of AI-referred sessions start on a product detail page, vs ~20% for organic — "journey compression," the buyer arrives ready to buy because the AI did the research first.
- AI-referred conversion outperforms organic SEO in 23 of 25 merchant categories.
Caveat: this is Shopify's analysis of Shopify storefronts with undisclosed methodology, so treat the precise numbers as Shopify-published rather than independently verified. But the direction is the story: agentic commerce isn't theoretical traffic any more. It's converting at premium rates, in volume, growing fast — and that's the demand signal that explains why every TC member is racing to ship at the productisation layer right now. Shopify Field CTO Sandy Jeong framed the operational work in three buckets: data readiness (machine-readable catalog with structured attributes), channel infrastructure (direct API syndication to AI platforms), and organisational alignment (a named DRI, not a committee). The teams that get those three right capture the 13× curve; the teams that don't watch it route around them.
Spec and ecosystem
Attribution landed in core. On May 5 the Technical Council merged a top-level attribution field into cart, checkout, catalog, and order operations — campaign IDs, click identifiers (gclid, fbclid, ttclid), source/medium markers, as an open string-keyed map. It's the first time advertising-and-measurement infrastructure has landed in UCP core, and the trajectory implication is the story: a protocol that carries attribution context is a protocol being built for commercial-scale deployment, not just technical demos.
The council expanded — and the regional question got sharper. Amazon, Meta, Microsoft, Salesforce, and Stripe joined the Technical Council at the end of April — a governance signal as much as an adoption one (none of the five has shipped a UCP store wave yet), but a notable one: the steering group now includes the company building the leading proprietary alternative (Amazon's "Buy for Me") and the company behind the leading rival protocol (Stripe, ACP). Convergence pressure, formalised.
Two German commerce trade publications picked up the expansion within a day of each other and used our breakdown of the 16-seat composition as a primary source: Exciting Commerce on April 27 (which drove the European enterprise retail audience UCP Alerts was built for), and Shoptechblog the next day. Both lead with the same regional point — "Keine Rolle spielen weiter europäische und asiatische Unternehmen" ("European and Asian companies continue to play no role") — and Shoptechblog adds the analytical layer: the new members sent senior engineers and architects rather than C-suite executives (implementation work, not press); each company's participation reads as defensive; and the real contest isn't the standardised protocol but the layers above it — ranking, paid placement, customer ownership. Which is exactly why attribution-in-core is more than plumbing: it's the first of those upper layers getting wired into the spec itself.
Two TC members shipped at the productisation layer. The contest moving up the stack got two concrete examples this month. On May 5 Google expanded UCP-powered checkout out of AI Mode into the main shopping section of standard Search results, with Wayfair the first live retailer on the new surface — a "Buy" button on listings inside Google Search itself, Google Pay tokenisation, checkout completing without leaving the page. Zero-click search results just became zero-click purchases. The two-track adoption story we drew in February has its first major convergence event.
Shopify, separately, started rolling out an Agentic Storefronts dashboard in merchant admin this week (live docs) — surfaces ChatGPT / Microsoft Copilot / AI Mode traffic, offers an "Allow Shopify to manage for me" toggle that auto-generates the AI-readability files (llms.txt, llms-full.txt, agents.md) for stores that opt in. The dashboard is protocol-agnostic: it covers ChatGPT (ACP), Copilot, and UCP-powered Search inside one admin view. UCP is one of the protocols Shopify is now monetising on the agentic-readiness layer, not the whole product. For Shopify it's the natural next step after the v2026-04-08 fleet migration; for everyone else watching the head start, it's the answer to what the next phase of it looks like.
A potential spec gap, still being validated. In the variant-data guide we noted that v2026-04-08 makes variant.options[] optional even on products where product.options[] is non-empty and there are multiple variants — meaning two fully spec-compliant manifests can produce identical-looking payloads where one is unambiguous and the other is agent-unresolvable. The candidate fix would be a conditional MUST ("when product.options is non-empty and variants.length > 1, every variant MUST populate options[]"). It's a working hypothesis from one analysis, not a filed proposal — we want to sweep more of the live dataset for real-world incidence and check the edge cases (single-variant simple products, productGroup behaviour, platforms that already populate options by default) before raising it formally. If the pattern holds, it's a candidate for a future minor release.
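Expressed as a lint rule, the candidate conditional MUST is only a few lines. A sketch, again with illustrative field shapes rather than the exact schema:

```python
def check_variant_options(product):
    """Candidate lint for the gap above: when product.options is non-empty
    and there are multiple variants, every variant must populate its own
    options[]. Returns the IDs of variants that fail the check; an empty
    list means the product passes (or is exempt)."""
    if not product.get("options") or len(product.get("variants", [])) <= 1:
        return []  # single-variant and option-less products are exempt
    return [v.get("id") for v in product["variants"] if not v.get("options")]

ambiguous = {
    "options": ["Size"],
    "variants": [
        {"id": "v1", "options": [{"name": "Size", "value": "M"}]},
        {"id": "v2"},  # spec-compliant today, but agent-unresolvable
    ],
}
assert check_variant_options(ambiguous) == ["v2"]
assert check_variant_options({"options": [], "variants": []}) == []
```

This is the shape of check we'd run across the live dataset to measure real-world incidence before filing anything formally.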
No v2026-05. v2026-04-08 remains current. On the cadence so far, the next minor release more likely lands late summer (a notional v2026-08), probably bundling AP2 mandate refinements, schema corrections shaken out by running validators against thousands of real stores, and whatever the council formalises over the next two months. On the partner side: the registry now lists 61 merchants, 11 agents, and 8 extensions; the payment-handler roster (Adyen, Amex, Mastercard, Stripe, Visa, Checkout.com, Affirm, Splitit, PayPal) is unchanged and still almost entirely unrepresented in live manifest declarations.
What we shipped — and what developers are doing with it
- UCP Variant Data: The #1 Reason Agent Checkouts Fail — the five variant-data anti-patterns, what clean variant data looks like, and the spec gap that lets compliant stores still be broken.
- How to Test Your UCP Implementation — the three-layer validation workflow: static audit, live agent test, continuous monitoring.
- UCP Score is doing exactly what it was built to do. This is the one we're proudest of this quarter. The Score turns "is my manifest agent-ready?" into a concrete, category-by-category checklist — and developers are using it that way: we've watched a failing manifest climb to an A grade in the space of a few hours, the developer iterating against the score breakdown between checks. That's the loop it was designed for, and it's now the loop it runs.
- UCP Playground got sharper as a development tool. Two halves of the same loop: the agent-inspection tooling — replay any session, see the exact tool call where an agent tripped — and the runtime shopping evals, now past 1,000 recorded sessions and more than 12 hours of cumulative agent runtime against real stores. Together they take the build → test → fix cycle for an agent-ready storefront down from a sprint to an afternoon. Every improvement that got us there is in the changelog.
- Crawler throughput — we roughly tripled the hourly crawl rate in early May (and added per-IP and global throttles to the expensive public routes so the directory stays fast under load). That's what moved the discovery curve this month.
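For the curious, the per-IP throttle is a standard token-bucket shape; a minimal sketch (the rate and capacity numbers here are placeholders, not our production config):

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests/second per client, with bursts
    up to `capacity`. Tokens refill continuously between requests."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per client IP

def throttle(ip, rate=5, capacity=10):
    """Return True if this request from `ip` should be served."""
    bucket = buckets.setdefault(ip, TokenBucket(rate, capacity))
    return bucket.allow()
```

A global bucket guarding the expensive routes is the same class with a single shared instance instead of the per-IP map.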
What to watch in June
Second adopters at every edge. May produced first adopters across multiple novel patterns — non-Shopify platforms shipping UCP (Bareconnect, Selly.io), a consultancy-built accelerator (PwC), non-default payment-handler integrations (the processors in dev), AP2 mandate (still one), third-party capability namespaces (each at 1–2 stores). The diagnostic for June is whether any doubles up. Each is a distinct watch item; the meta-question is the same: did May's first adopters survive contact with month two?
Google's next live partner on main Search. Wayfair is first up on Google's UCP-checkout expansion into standard Search results. The other co-developing TC retailers — Etsy, Target, Walmart — are the next-most-likely to follow. The cadence of those rollouts is the diagnostic for how fast Google is willing to push agent-completed transactions onto its highest-traffic surface.
The platform-level integration question. SFCC, Adobe Commerce, Wix, Squarespace — any of them shipping a platform-level UCP integration is still the single highest-impact possible event, and still hasn't happened. The one-platform structure is four months old.
Whether the eval leaderboard holds its shape. Claude Sonnet 4.5 leads checkout completion on the largest sample; Llama 3.3 70B is the surprise second. Another month of sessions either confirms that or reshuffles it.
Sources
All data is from the UCP Checker crawler (re-checks every tracked domain at least every 24 hours) and UCP Playground's eval sessions, as of May 12, 2026. The verified-merchant dataset is published monthly on Hugging Face under CC-BY 4.0; the same data, a public REST API, the bulk checker, and the rest of our developer tools are all ungated.
- Browse the directory: ucpchecker.com/directory
- Track adoption live: ucpchecker.com/stats
- Run a UCP Score: ucpchecker.com/score
- Model + store leaderboard: ucpplayground.com/evals
- Public dataset, REST API & developer tools: ucpchecker.com/developer-tools
- Previous report: State of Agentic Commerce — April 2026
External coverage cited in this report:
- Jochen Krisch, "Amazon schließt sich Googles Universal Commerce Protocol an," Exciting Commerce, April 27, 2026
- Roman Zenner, "Agentic Commerce: Das UCP Council wächst," Shoptechblog, April 28, 2026
- Google expands UCP Checkout to main Search shopping results, Search Engine Land, May 2026
- New tech and tools for retailers to succeed in an agentic shopping era, Google blog (Ads & Commerce)
- Shopify Agentic commerce developer docs — Agentic Storefronts, `llms.txt` / `llms-full.txt` / `agents.md` reference
- What Shopify checks for agentic readiness, WISLR Research
- Kyle Risley, "AI-referred shoppers convert better and spend more (2026)", Shopify Enterprise Blog, May 11, 2026