<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benji Fisher</title>
    <description>The latest articles on DEV Community by Benji Fisher (@benjifisher).</description>
    <link>https://dev.to/benjifisher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3787687%2F0c8176d8-b238-43f2-b0af-71689e955123.jpg</url>
      <title>DEV Community: Benji Fisher</title>
      <link>https://dev.to/benjifisher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benjifisher"/>
    <language>en</language>
    <item>
      <title>How to Test Your UCP Implementation with AI Agents</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Fri, 15 May 2026 09:11:04 +0000</pubDate>
      <link>https://dev.to/benjifisher/how-to-test-your-ucp-implementation-with-ai-agents-180g</link>
      <guid>https://dev.to/benjifisher/how-to-test-your-ucp-implementation-with-ai-agents-180g</guid>
      <description>&lt;p&gt;You ship a UCP manifest. The validator returns green. The schema parses cleanly. Every required field is present, every URL resolves, every transport responds. You declare the work done and move on.&lt;/p&gt;

&lt;p&gt;Three weeks later, you find out your store has been quietly failing every agent shopping session. The cart endpoint accepts adds but rejects checkouts. A specific variant ID throws a 400 on &lt;code&gt;update_cart&lt;/code&gt;. The agent reaches &lt;code&gt;ready_for_complete&lt;/code&gt; and stalls because your payment handler doesn't recognise the token format. None of these issues showed up in static validation. All of them block real users on agent-mediated flows.&lt;/p&gt;

&lt;p&gt;This post is about how to actually test your UCP implementation — not as a schema document, but as a runtime surface that real frontier agents have to operate against. The short version: &lt;strong&gt;schema validation is necessary but not sufficient&lt;/strong&gt;. The long version is the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What validators catch and what they miss
&lt;/h2&gt;

&lt;p&gt;A UCP validator (including ours, &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;the validator at ucpchecker.com/ucp-validator&lt;/a&gt;) checks structural things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manifest is valid JSON&lt;/li&gt;
&lt;li&gt;Required fields are present (&lt;code&gt;spec&lt;/code&gt;, &lt;code&gt;services&lt;/code&gt;, &lt;code&gt;signing_keys&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Declared spec version is one we recognise&lt;/li&gt;
&lt;li&gt;Transport endpoints return non-error responses&lt;/li&gt;
&lt;li&gt;Schema URLs resolve&lt;/li&gt;
&lt;li&gt;Capability namespaces match the spec catalogue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the things you can verify without actually running an agent flow against the store. They're table-stakes, and the &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; bakes them into the structural-conformance dimension of its grade.&lt;/p&gt;

&lt;p&gt;What static validation doesn't catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether &lt;code&gt;update_cart&lt;/code&gt; rejects valid variant IDs intermittently&lt;/li&gt;
&lt;li&gt;Whether the cart endpoint's success response contains the line items it claims to contain&lt;/li&gt;
&lt;li&gt;Whether the checkout flow surfaces the buyer-specific payment instruments your customer can actually use&lt;/li&gt;
&lt;li&gt;Whether your &lt;code&gt;search_catalog&lt;/code&gt; returns more than 8 KB of HTML in a &lt;code&gt;description&lt;/code&gt; field that crashes Claude's tool-calling layer&lt;/li&gt;
&lt;li&gt;Whether two different models pick the same variant ID for "Medium" against your product (the &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;variant-data problem&lt;/a&gt; we cover separately)&lt;/li&gt;
&lt;li&gt;Whether the agent can recover when one of your tool calls returns a 500 mid-flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are runtime properties. They only surface when you run an actual agent against an actual checkout. And they're where the gap between "store passes validation" and "agent can buy" lives. The April &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce report&lt;/a&gt; sized that gap concretely: of 4,014 verified UCP stores, only &lt;strong&gt;9 delivered a flawless end-to-end agent experience&lt;/strong&gt;. A 0.2% flawless rate against a 98%+ conformance rate. The runtime gap is the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three-layer testing pyramid
&lt;/h2&gt;

&lt;p&gt;The right way to test UCP is not "validator or no validator" — it's three layers, each catching a different class of problem, in increasing order of cost and fidelity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Catches&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Schema validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;&lt;code&gt;/ucp-validator&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Manifest parse errors, missing required fields, malformed URLs&lt;/td&gt;
&lt;td&gt;seconds, free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Capability score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;&lt;code&gt;/score&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Surface signals, declared capabilities, transport reachability, robots/sitemap hygiene&lt;/td&gt;
&lt;td&gt;seconds, free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Live agent eval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Variant resolution, cart/checkout shape, error recovery, multi-model behaviour, attribution flow&lt;/td&gt;
&lt;td&gt;dollars per session, paid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each layer feeds the next. If layer 1 fails, layer 2 has nothing to score. If layer 2 reports gaps, layer 3 will find them magnified in real agent runs. Skipping layers wastes layer 3's time on bugs the cheaper layers would have caught — that's the case for running them in order rather than going straight to live agents.&lt;/p&gt;

&lt;p&gt;Most teams stop at layer 2. &lt;strong&gt;Stopping at layer 2 is what produces the 98%+-conformant / 0.2%-flawless gap.&lt;/strong&gt; A clean Score gets you to "the agent has a fair chance." A clean Score plus a clean eval gets you to "the agent reliably completes the flow you care about."&lt;/p&gt;

&lt;h2&gt;
  
  
  What live agent testing actually looks like
&lt;/h2&gt;

&lt;p&gt;Layer 3 is the layer most readers are least familiar with, so this section walks through what running an agent test against your own store actually involves.&lt;/p&gt;

&lt;p&gt;The shape: you point a frontier agent (Claude, GPT, Gemini, Grok, Llama — whichever model you want to evaluate against) at your store's UCP manifest endpoint and give it a multi-turn shopping prompt. The agent does what an agent does — discovers your tools via the manifest, calls &lt;code&gt;search_catalog&lt;/code&gt; against your products, evaluates the results, picks something, calls &lt;code&gt;update_cart&lt;/code&gt;, navigates checkout. The framework records every tool call, every response, every model decision, the full token-by-token event stream.&lt;/p&gt;

&lt;p&gt;At the end of the session you get a structured report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the agent reach &lt;code&gt;checkout_reached&lt;/code&gt; (full transaction completion)?&lt;/li&gt;
&lt;li&gt;Or did it stop at &lt;code&gt;cart_created&lt;/code&gt;, &lt;code&gt;search_only&lt;/code&gt;, or &lt;code&gt;failed&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;How many tool calls did it make? How many succeeded? Which ones errored?&lt;/li&gt;
&lt;li&gt;How many tokens did the model consume?&lt;/li&gt;
&lt;li&gt;How long did the session take?&lt;/li&gt;
&lt;li&gt;If the agent failed, why?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the data layer 1 and layer 2 can't produce. &lt;strong&gt;Schema validation tells you what your store says; agent eval tells you what an agent does with what your store says.&lt;/strong&gt; They're answering different questions.&lt;/p&gt;
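&lt;p&gt;To make the report concrete, here is a minimal sketch of how a batch of session outcomes could be rolled up into a checkout rate. The four outcome labels come from the list above; the session dictionaries, the &lt;code&gt;outcome&lt;/code&gt; key, and the &lt;code&gt;summarise_outcomes&lt;/code&gt; helper are assumptions made for illustration, not the Playground's actual report format.&lt;/p&gt;

```python
from collections import Counter

# Outcome labels taken from the report description above.
OUTCOMES = ("checkout_reached", "cart_created", "search_only", "failed")


def summarise_outcomes(sessions):
    """Count sessions per outcome and compute the checkout rate in percent."""
    counts = Counter(session["outcome"] for session in sessions)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = len(sessions)
    checkout_rate = 100.0 * counts["checkout_reached"] / total if total else 0.0
    return {
        "total": total,
        "counts": {name: counts[name] for name in OUTCOMES},
        "checkout_rate": checkout_rate,
    }


# Hypothetical sessions, not real Playground output.
sessions = [
    {"model": "model-a", "outcome": "checkout_reached"},
    {"model": "model-a", "outcome": "cart_created"},
    {"model": "model-b", "outcome": "checkout_reached"},
    {"model": "model-b", "outcome": "failed"},
]
summary = summarise_outcomes(sessions)
print(summary["checkout_rate"])  # 50.0 for the four sessions above
```

&lt;p&gt;Keeping the per-outcome counts alongside the rate matters: a run that stalls at &lt;code&gt;cart_created&lt;/code&gt; points at a different fix than one that ends in &lt;code&gt;failed&lt;/code&gt;.&lt;/p&gt;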

&lt;p&gt;For most stores, the first eval session is uncomfortable. The agent picks the wrong variant. Or it adds something to the cart and then stalls because the response shape isn't quite what it expected. Or it reaches &lt;code&gt;ready_for_complete&lt;/code&gt; and can't move forward because your payment-handler declaration doesn't match what the agent has been trained to handle. Each of those is a fix you can make, and each fix lifts your real conversion rate the next time an actual user-facing agent shops your store.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why testing on one model isn't enough
&lt;/h2&gt;

&lt;p&gt;A useful pattern from the &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;Playground 1,000-session dataset&lt;/a&gt;: the same store gets meaningfully different outcomes across different models. A store that completes checkout 65% of the time on Claude Sonnet 4.5 might complete only 18% of the time on GPT-5.2 — the same UCP implementation, the same shopping prompt, just a different model.&lt;/p&gt;

&lt;p&gt;That spread isn't because one model is "better." It's because each frontier model has its own quirks in how it handles tool calls, schemas, error responses, and ambiguous data. Models differ on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How they handle empty arrays vs missing fields&lt;/li&gt;
&lt;li&gt;Whether they follow up on a 4xx response or move on&lt;/li&gt;
&lt;li&gt;How aggressively they retry failed tool calls&lt;/li&gt;
&lt;li&gt;How they parse multi-line strings in description fields&lt;/li&gt;
&lt;li&gt;Whether they pass through optional metadata fields verbatim&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real-world implication: &lt;strong&gt;your customers don't all use the same agent&lt;/strong&gt;. Some use ChatGPT-routed flows; some use Anthropic's; some use Google AI Mode; some use a custom agent built on Llama. Testing against just one model means catching only the bugs that one model surfaces, while shipping silent failures to everyone using a different one. Multi-model coverage is what gets you from "this passes for our internal demo" to "this works for real customer traffic."&lt;/p&gt;

&lt;p&gt;UCP Playground supports head-to-head testing across 15+ frontier models. The &lt;a href="https://ucpplayground.com/models/compare?models=claude-sonnet-4-5%2Cgpt-5-2" rel="noopener noreferrer"&gt;comparison view&lt;/a&gt; lets you run the same store against any two models on the same workload. We'd suggest at minimum testing against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One Anthropic model (Claude Opus or Sonnet)&lt;/li&gt;
&lt;li&gt;One OpenAI model (GPT-5.2 or GPT-4o)&lt;/li&gt;
&lt;li&gt;One Google model (Gemini 3.1 Pro or 2.5 Flash)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three models cover most of the deployed-agent universe. If any of the three behaves badly against your store, you have a real problem worth fixing before more traffic arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring tests into your deploy pipeline
&lt;/h2&gt;

&lt;p&gt;Manual eval is fine for one-off audits. If you're shipping changes regularly, you want this in CI. The Playground exposes a &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;headless API&lt;/a&gt; for exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/v1/collections          — define a test (sequence of prompts + models + stores)
POST /api/v1/collections/{id}/run — trigger the test
GET  /api/v1/collection-runs/{id} — poll status + results
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
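&lt;p&gt;The "poll status + results" step can be sketched as a small loop. This is a sketch under stated assumptions: &lt;code&gt;fetch_run&lt;/code&gt; is a function you supply that performs the &lt;code&gt;GET /api/v1/collection-runs/{id}&lt;/code&gt; call and returns the decoded JSON, and the &lt;code&gt;status&lt;/code&gt; field with its &lt;code&gt;queued&lt;/code&gt; and &lt;code&gt;running&lt;/code&gt; values is hypothetical, since the response shape isn't shown here.&lt;/p&gt;

```python
import time


def wait_for_run(fetch_run, run_id, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep):
    """Poll fetch_run(run_id) until the run is no longer in progress.

    fetch_run is a caller-supplied function that performs
    GET /api/v1/collection-runs/{id} and returns the decoded JSON as a dict.
    The "status" field and its values are assumptions for illustration.
    """
    waited = 0.0
    while True:
        run = fetch_run(run_id)
        if run.get("status") not in ("queued", "running"):
            return run  # finished, whether it succeeded or failed
        if waited >= timeout_seconds:
            raise TimeoutError(f"run {run_id} still in progress after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

&lt;p&gt;Passing the HTTP call and the sleep function in as arguments keeps the loop testable without network access, and the timeout stops a stuck run from holding a CI job open indefinitely.&lt;/p&gt;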



&lt;p&gt;The pattern most teams ship first: a deploy-time test that triggers an eval after every UCP-related code change, asserts on key metrics, and fails the build if any of them regress. A reasonable assertion shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/ucp-eval.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run UCP eval&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;curl -X POST $PLAYGROUND_API/v1/collections/$COLLECTION_ID/run \\&lt;/span&gt;
      &lt;span class="s"&gt;-H "Authorization: Bearer $PLAYGROUND_TOKEN"&lt;/span&gt;
    &lt;span class="s"&gt;# Poll, then assert:&lt;/span&gt;
    &lt;span class="s"&gt;# - checkout_rate &amp;gt;= 80&lt;/span&gt;
    &lt;span class="s"&gt;# - errors.total == 0&lt;/span&gt;
    &lt;span class="s"&gt;# - avg_duration_ms &amp;lt; 30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape as Lighthouse CI for web performance. A regression catch you bolt onto your pipeline rather than rediscover in production. The &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;UCP Playground Evals launch post&lt;/a&gt; walks through the full pattern with a worked example.&lt;/p&gt;
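&lt;p&gt;The "poll, then assert" comments in the workflow can be turned into a small check. A minimal sketch, assuming the finished run is available as a dictionary with &lt;code&gt;checkout_rate&lt;/code&gt;, &lt;code&gt;errors.total&lt;/code&gt;, and &lt;code&gt;avg_duration_ms&lt;/code&gt; fields named as in those comments; the exact response shape and the &lt;code&gt;check_eval_metrics&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
def check_eval_metrics(result, min_checkout_rate=80.0, max_avg_duration_ms=30000):
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    if min_checkout_rate > result["checkout_rate"]:
        failures.append(
            f"checkout_rate {result['checkout_rate']} is below {min_checkout_rate}"
        )
    if result["errors"]["total"] != 0:
        failures.append(f"errors.total is {result['errors']['total']}, expected 0")
    if result["avg_duration_ms"] >= max_avg_duration_ms:
        failures.append(
            f"avg_duration_ms {result['avg_duration_ms']} is not under {max_avg_duration_ms}"
        )
    return failures


# Hypothetical run result; the real response shape may differ.
example = {"checkout_rate": 72.5, "errors": {"total": 1}, "avg_duration_ms": 18400}
for message in check_eval_metrics(example):
    print("FAIL:", message)
```

&lt;p&gt;In a CI step, exiting with a non-zero status whenever the returned list is non-empty is what fails the build. Collecting every failure before exiting, rather than stopping at the first, shows all regressions from a single run.&lt;/p&gt;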

&lt;h2&gt;
  
  
  The order to do this in
&lt;/h2&gt;

&lt;p&gt;If you're starting from a fresh UCP implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;validator&lt;/a&gt;&lt;/strong&gt; against your manifest. Fix any structural errors. This is the cheapest layer; do it first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get a &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;&lt;/strong&gt; for your domain. Aim for B+ (70+) before moving to live testing. Below that, you have surface-level gaps that'll dominate the eval results and waste your test budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run a &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;Playground eval&lt;/a&gt;&lt;/strong&gt; against your store with two different frontier models on a single shopping sequence. Fix whatever fails. Common first-time failures: variant-data ambiguity, response-shape inconsistencies, tool argument validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand to three models&lt;/strong&gt; once your two-model baseline works. Multi-model coverage is what catches the long-tail issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire the eval into CI&lt;/strong&gt; once your implementation is stable. From this point on, every code change that touches UCP runs against real agents before it ships.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you've already got a UCP implementation in production and are trying to figure out why agents aren't completing checkouts, skip step 2 and go straight to step 3. The eval will show you the specific failure mode, and you can backfill the score work later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What good looks like
&lt;/h2&gt;

&lt;p&gt;A store that's passed all three layers cleanly looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validator&lt;/strong&gt;: green&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt;: A grade (85+) across Discovery, Conformance, and Capability Coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eval&lt;/strong&gt;: 80%+ checkout rate against Claude Sonnet 4.5, Gemini 3 Flash, and one other model of your choice; &amp;lt;5s average tool-call latency; zero categorised errors across at least 20 sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the bar. The &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce&lt;/a&gt; is tracking how many stores hit that bar — currently fewer than 1% of verified stores. The work to get from 99% conformance to 1% bar-clearing is mostly testing work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validator&lt;/strong&gt; (free, instant): &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;ucpchecker.com/ucp-validator&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt; (free, instant): &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live agent eval&lt;/strong&gt; (paid per session): &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model comparison view&lt;/strong&gt;: &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;ucpplayground.com/models&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI-ready eval API&lt;/strong&gt;: documented at &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Schema validation is necessary. It is not sufficient. The agents your customers use will run real flows against your store, and the only way to know whether those flows succeed is to run them yourself first.&lt;/p&gt;

&lt;p&gt;Test before they do.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>The State of Agentic Commerce — May 2026</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 14 May 2026 09:45:37 +0000</pubDate>
      <link>https://dev.to/benjifisher/the-state-of-agentic-commerce-may-2026-500g</link>
      <guid>https://dev.to/benjifisher/the-state-of-agentic-commerce-may-2026-500g</guid>
      <description>&lt;p&gt;In April, the story was a platform pulling a lever: Shopify migrated its entire UCP fleet to v2026-04-08 in four days, BigCommerce showed up with three stores, and we said the question for May was &lt;em&gt;which platform ships next&lt;/em&gt; — because every prior jump in the directory had been a step function caused by a platform-level deployment.&lt;/p&gt;

&lt;p&gt;May's answer: none, and it didn't matter. No platform shipped a UCP wave this month. BigCommerce still has three verified stores. WooCommerce still has three. Salesforce Commerce Cloud still has none verified, though a custom build is reportedly in development. And the directory still grew ~32% — the same rate as April — because the &lt;em&gt;baseline&lt;/em&gt; discovery rate stepped up. For the first time since we started this report, UCP grew on a slope instead of a staircase.&lt;/p&gt;

&lt;p&gt;This is the fourth monthly state-of-the-ecosystem report from UCP Checker. Here's what the data says as of May 12, 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5,294&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;verified UCP stores&lt;/a&gt; (up from 4,014 in April, &lt;strong&gt;+32%&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,892&lt;/strong&gt; total domains tracked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,829&lt;/strong&gt; new merchants discovered this month; &lt;strong&gt;775&lt;/strong&gt; this week alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,264&lt;/strong&gt; verified stores on the latest &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt; (99.4%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,235&lt;/strong&gt; verified stores at A grade on &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; (98.9%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three consecutive months of ~30% growth is a real curve now, not a launch artifact. But the &lt;em&gt;shape&lt;/em&gt; changed. February was discovery (first 1,000 Shopify stores). March was expansion (crossed 3,000, first non-Shopify manifests). April was consolidation (the four-day Shopify spec migration). May is the first month where the headline growth came from neither a new platform nor a spec event — it came from crawler optimisations we shipped in early May. The stores were always out there; we just got faster at finding them.&lt;/p&gt;

&lt;p&gt;That distinction matters for forecasting. If May's growth had been platform-driven, you'd model the next jump as "wait for SFCC." Since it's discovery-rate-driven, the model is different: the directory keeps filling at a steady clip until either we exhaust the discoverable Shopify long tail, or a platform finally ships a wave and the staircase resumes. Both will happen; the order is the open question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shopify's head start, four months in
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Monitored&lt;/th&gt;
&lt;th&gt;Verified&lt;/th&gt;
&lt;th&gt;Verified %&lt;/th&gt;
&lt;th&gt;Avg score (verified)&lt;/th&gt;
&lt;th&gt;Avg manifest latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5,242&lt;/td&gt;
&lt;td&gt;5,241&lt;/td&gt;
&lt;td&gt;~100%&lt;/td&gt;
&lt;td&gt;92.5&lt;/td&gt;
&lt;td&gt;178 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;Custom &amp;amp; Headless&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;642&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;7.0%&lt;/td&gt;
&lt;td&gt;83.0&lt;/td&gt;
&lt;td&gt;356 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;92.3&lt;/td&gt;
&lt;td&gt;1,023 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;88.3&lt;/td&gt;
&lt;td&gt;993 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;85.0&lt;/td&gt;
&lt;td&gt;218 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;84.0&lt;/td&gt;
&lt;td&gt;548 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Shopify is 99% of the verified directory — unchanged from April. Every non-Shopify platform combined sums to 53 verified stores, the same as last month. The head start is still the dominant signal in the data, and the Custom &amp;amp; Headless cohort is the mirror image: 642 domains attempted UCP, only 45 got to verified (a 7% completion rate). When a platform hands you the boilerplate, you compound; when you build it yourself, most attempts stall before validation. That's a tooling gap, not a spec problem.&lt;/p&gt;
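&lt;p&gt;The derived figures here can be re-checked with a few lines of arithmetic. A minimal sketch, using only the monitored and verified counts from the table and April's 4,014 verified stores from the numbers section; the variable names are illustrative.&lt;/p&gt;

```python
# (monitored, verified) per platform, copied from the table above.
PLATFORMS = {
    "Shopify": (5242, 5241),
    "Custom and Headless": (642, 45),
    "WooCommerce": (3, 3),
    "BigCommerce": (3, 3),
    "Magento": (1, 1),
    "PrestaShop": (1, 1),
}
APRIL_VERIFIED = 4014  # verified stores reported for April

total_monitored = sum(monitored for monitored, _ in PLATFORMS.values())
total_verified = sum(verified for _, verified in PLATFORMS.values())

# Per-platform completion rate: verified as a share of monitored domains.
for name, (monitored, verified) in PLATFORMS.items():
    print(f"{name}: {100 * verified / monitored:.1f}% verified")

shopify_verified = PLATFORMS["Shopify"][1]
shopify_share = 100 * shopify_verified / total_verified
non_shopify = total_verified - shopify_verified
growth = 100 * (total_verified - APRIL_VERIFIED) / APRIL_VERIFIED

print(f"totals: {total_monitored} monitored, {total_verified} verified")
print(f"Shopify share of verified: {shopify_share:.1f}%")
print(f"non-Shopify verified: {non_shopify}")
print(f"month-over-month growth: {growth:.0f}%")
```

&lt;p&gt;The totals come out to 5,892 monitored and 5,294 verified, matching the headline figures, with 53 verified stores outside Shopify.&lt;/p&gt;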

&lt;p&gt;The more interesting movement came from two more platforms shipping UCP support — &lt;strong&gt;Bareconnect&lt;/strong&gt; and &lt;strong&gt;Selly.io&lt;/strong&gt; — both of which already have verified stores live in the directory today, not roadmap promises. The numbers are still small. How either platform is exposing UCP (default for every storefront, opt-in, or a paid tier) decides whether this stays a handful or turns into a wave — that detail we don't know yet. But it's the first new platform movement since the Shopify migration.&lt;/p&gt;

&lt;p&gt;Two structural notes on the table. BigCommerce and WooCommerce manifests run ~1 second versus Shopify's 178 ms because they're served from the storefront origin rather than a CDN-cached endpoint — a meaningful handicap as agent response budgets tighten. And geographically the directory is still a US/&lt;code&gt;.com&lt;/code&gt; story: 4,720 of 5,294 verified stores ship under generic TLDs; the largest attributable ccTLD cohorts are &lt;code&gt;.uk&lt;/code&gt; (229), &lt;code&gt;.au&lt;/code&gt; (120), and &lt;code&gt;.ca&lt;/code&gt; (66); continental Europe is under 2% by ccTLD (a floor, not a true distribution).&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability coverage: the ceiling, and the edges
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Verified adopters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.checkout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,269&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.fulfillment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,264&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.catalog.lookup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,257&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.catalog.search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.order&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.discount&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,253&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.cart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,249&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;— the cliff —&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.common.identity_linking&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.buyer_consent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.checkout.embedded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.ap2_mandate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.payment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Identical pattern to March and April: the seven core shopping capabilities ship together as a Shopify-side bundle (~5,250 adopters each), then an 800× cliff. Identity linking: 6. AP2 mandate, the primitive that makes an agentic transaction auditably user-authorised, is still 1 (&lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt;, WooCommerce, scoring 100). Payment capability: still 0. Of 5,294 verified stores, &lt;strong&gt;5,161 (about 97.5%) sit at Tier 2&lt;/strong&gt;, one is Tier 3, one is Tier 4. The deeper primitives aren't slow-adopting, they're &lt;em&gt;not adopting yet&lt;/em&gt;. When demand for AP2 turns into pressure (regulators, payment networks, the working group's eventual requirements), this number moves fast, the way checkout did once Shopify bundled it. Until then, "UCP store" means "agent-shoppable," not "mandate-credentialed."&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the movement was: the edges of the spec
&lt;/h3&gt;

&lt;p&gt;The new signals in May's data sit at the edges of the spec rather than its core. The first is in the capability namespace itself: below the standard &lt;code&gt;dev.ucp.*&lt;/code&gt; entries, a handful of non-standard, vendor-prefixed capabilities are now appearing on real verified manifests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;com.pwc.accelerator.loyalty.rewards&lt;/code&gt; — 2 stores. PwC's agentic-commerce accelerator (more below).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.appointedd.schedule&lt;/code&gt; / &lt;code&gt;.booking&lt;/code&gt; / &lt;code&gt;.intent&lt;/code&gt; — 1 store. Appointment-scheduling primitives — booking-vertical UCP, not retail.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.woocommerce.ai_storefront&lt;/code&gt; — 1 store. A WooCommerce-specific storefront extension.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sh.agentscore.identity&lt;/code&gt; — 1 store. An identity primitive from a third party.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.agoragentic.x402.checkout&lt;/code&gt; — 1 store. A checkout extension referencing x402 (the HTTP-402 micropayment pattern).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None is adopted at scale yet — 1–2 stores each, almost certainly vendors' own test deployments — but it's the first month the namespace long tail has held anything other than Shopify defaults. It's the leading indicator of a UCP &lt;em&gt;extension&lt;/em&gt; ecosystem: third parties shipping vertical capabilities (loyalty, booking, identity, micropayments) on top of the core spec, a more realistic near-term diversification path than "another commerce platform ships a wave."&lt;/p&gt;

&lt;p&gt;The PwC entry is worth pulling out, because it isn't a platform — it's a consultancy. PwC has launched an &lt;strong&gt;agentic-commerce accelerator&lt;/strong&gt;: a practice that stands up custom UCP-enabled storefronts for enterprise clients, with its own capability extensions (the &lt;code&gt;com.pwc.accelerator.*&lt;/code&gt; namespace) layered on the core spec. That's a third adoption channel, distinct from "platform ships a wave" and "developer hand-builds" — call it &lt;strong&gt;consulting-led&lt;/strong&gt;. It's slower per engagement, but each accelerator that standardises on UCP arrives with a portfolio of enterprise clients attached. PwC is the leading edge; Deloitte, EY, KPMG, Accenture, McKinsey, BCG, and the systems integrators (Capgemini, IBM, TCS, Infosys) all face the same build-it-once, deploy-to-many incentive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transports and payment handlers: the monoculture, and the experiments tier
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Verified declarations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;5,258&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedded&lt;/td&gt;
&lt;td&gt;5,243&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2A&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCP and Embedded are universal because Shopify declares both. REST shows up on 47 stores — the non-Shopify hand-builds, REST being the natural fit for anyone implementing without an MCP server. A2A (Google's Agent2Agent transport, formally added in v2026-04-08) holds at two. Payment handlers tell the same monoculture story: &lt;strong&gt;5,250 verified stores declare Google Pay and 5,241 declare Shopify Card&lt;/strong&gt; — the same shared Shopify-managed handler IDs we &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-february-2026" rel="noopener noreferrer"&gt;flagged in February&lt;/a&gt; as a single point of failure. Everything else is a rounding error. The payment &lt;em&gt;partner&lt;/em&gt; ecosystem (Stripe, Adyen, Visa, Mastercard, PayPal, Affirm, Splitit — all on the &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;registry&lt;/a&gt;) is mature on paper; the live &lt;em&gt;handler declarations&lt;/em&gt; are two Shopify-managed IDs and a handful of experiments.&lt;/p&gt;

&lt;p&gt;The experiments are the part worth zooming in on, because the same small set of builders is populating the spec's newer transport, its newer handler shapes, and its newer capability namespaces simultaneously. Both A2A adopters are agent-native rather than retail: one is an agent-identity storefront running pure A2A with a cryptographically signed manifest (JWS / EdDSA) and two custom payment handlers on crypto rails — an &lt;code&gt;mpp&lt;/code&gt; rail on Tempo mainnet and an &lt;code&gt;x402&lt;/code&gt; rail on Base; the other is an agent-to-agent service exposed across MCP + A2A + REST, selling a USDC-priced audit via a &lt;code&gt;com.agoragentic.x402&lt;/code&gt; handler plus a direct USDC receive address. Both ship the custom capability namespaces flagged in the capability section above (&lt;code&gt;sh.agentscore.identity&lt;/code&gt;, &lt;code&gt;com.agoragentic.x402.checkout&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Separately, payment processors are starting to run dev UCP endpoints with fully custom handler integrations — their own handler IDs, their own &lt;code&gt;init&lt;/code&gt; / &lt;code&gt;verify&lt;/code&gt; protocol shapes, declared at v2026-04-08 over REST against real merchants, iterating against the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;Checker&lt;/a&gt; as they build. Still dev, not live, but for the first time the gap between the partner roster and the live handler declarations has something in it that's neither Shopify-default nor mock fixture — and it's coming from processors with the scale to move real merchant bases. Two data points in each direction don't make a trend, but the &lt;em&gt;pattern&lt;/em&gt; is coherent: the spec's newer surfaces (A2A transport, custom handler shapes, third-party namespaces) are populated by a small set of builders doing novel work in parallel, while the core carries volume. That's the shape of a protocol leaving its launch phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  How agents actually perform
&lt;/h2&gt;

&lt;p&gt;The numbers above tell you which stores &lt;em&gt;have&lt;/em&gt; UCP. This section covers which stores &lt;em&gt;work&lt;/em&gt; when an agent shops them. &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground Evals&lt;/a&gt; &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;passed 1,000 recorded agent sessions&lt;/a&gt; this month and is now well past that mark: more than a thousand end-to-end agent shopping runs across 105 unique stores and 16 frontier models, totalling ~57M tokens, &lt;strong&gt;~12 hours of cumulative agent runtime&lt;/strong&gt;, and roughly $119,000 in aggregate cart value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Outcomes: where the agent stops
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout_reached&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;475&lt;/td&gt;
&lt;td&gt;37.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;search_only&lt;/code&gt; (browsed, didn't cart)&lt;/td&gt;
&lt;td&gt;344&lt;/td&gt;
&lt;td&gt;27.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failed&lt;/code&gt; (provider error, refusal, max turns)&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;20.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cart_created&lt;/code&gt; (carted, didn't proceed)&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;td&gt;13.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;62% of sessions end without reaching checkout&lt;/strong&gt;, and that ratio has stayed stable as the dataset grew, which is itself the finding. As we add models and stores, the &lt;em&gt;shape&lt;/em&gt; of failure doesn't change: agents find products fine (search works nearly everywhere) and build carts often, but ~27% of sessions never get past browsing, ~14% stall at a cart that won't convert, and ~21% fail outright (about half of those are variant-shape problems: the agent picks a variant ID the cart rejects and flails until it hits the turn limit). We dug into exactly that this month in &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;UCP Variant Data: The #1 Reason Agent Checkouts Fail&lt;/a&gt;, which covers the single largest categorisable cause of the gap between "has a manifest" and "agent can buy from it," a cause that is almost entirely fixable in the merchant's variant data without touching any tooling.&lt;/p&gt;
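
&lt;p&gt;The headline figure follows directly from the outcomes table. As a minimal sketch (the function and variable names are ours, for illustration only), the shares can be recomputed from the session counts; recomputed values can differ from the published column in the last digit depending on how they are rounded.&lt;/p&gt;

```python
# Session counts copied from the outcomes table above.
OUTCOME_COUNTS = {
    "checkout_reached": 475,
    "search_only": 344,
    "failed": 261,
    "cart_created": 172,
}


def outcome_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each outcome's share of all sessions, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sessions recorded")
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = outcome_shares(OUTCOME_COUNTS)
# Share of sessions that ended anywhere other than at checkout.
no_checkout = 100.0 - shares["checkout_reached"]

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"ended without reaching checkout: {no_checkout:.1f}%")
```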

&lt;h3&gt;
  
  
  Model leaderboard
&lt;/h3&gt;

&lt;p&gt;Checkout-conversion rate by model, from the &lt;a href="https://ucpplayground.com/leaderboard" rel="noopener noreferrer"&gt;UCP Playground model leaderboard&lt;/a&gt; — sessions where the agent reached a checkout URL ÷ total sessions for that model (the live leaderboard breaks out search, cart, and speed too):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Checkout %&lt;/th&gt;
&lt;th&gt;Avg session&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~38 s&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;49.3%&lt;/td&gt;
&lt;td&gt;~48 s&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;45.0%&lt;/td&gt;
&lt;td&gt;~46 s&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;174&lt;/td&gt;
&lt;td&gt;42.0%&lt;/td&gt;
&lt;td&gt;~21 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;39.6%&lt;/td&gt;
&lt;td&gt;~77 s&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;123&lt;/td&gt;
&lt;td&gt;39.0%&lt;/td&gt;
&lt;td&gt;~30 s&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;125&lt;/td&gt;
&lt;td&gt;36.0%&lt;/td&gt;
&lt;td&gt;~12 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;31.7%&lt;/td&gt;
&lt;td&gt;~15 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;29.2%&lt;/td&gt;
&lt;td&gt;~48 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;27.8%&lt;/td&gt;
&lt;td&gt;~34 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;20.6%&lt;/td&gt;
&lt;td&gt;~36 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;15.8%&lt;/td&gt;
&lt;td&gt;~60 s&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;14.3%&lt;/td&gt;
&lt;td&gt;~42 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;9.5%&lt;/td&gt;
&lt;td&gt;~57 s&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~61 s&lt;/td&gt;
&lt;td&gt;Alibaba&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three things hold from April, plus one shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search works everywhere. Checkout completion is the next frontier.&lt;/strong&gt; Every model that runs to completion finds products. Checkout conversion ranges from 0% to 52%, a 52-point spread across the field, which is exactly where the remaining work sits. The best model in the field reaches checkout about half the time today; the headroom above that is what the next quarter gets to push.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning-tuned models still underperform.&lt;/strong&gt; QwQ 32B: 0% across 25 sessions. Grok 3 Mini: 9.5%. o4-mini: 14.3%. DeepSeek R1: 15.8%. Models that burn tokens on deliberation struggle with the fast, sequential, low-ambiguity tool-calling that shopping requires. Shopping rewards decisiveness, not deliberation: true in April, true with 3× the data. (GPT-5.2 also lands below the median at 20.6%.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed and success are decoupled.&lt;/strong&gt; Gemini 2.5 Flash finishes a session in ~12 seconds; Grok 4 takes ~77. Their checkout rates are 36% and 40%, basically a wash. Being fast doesn't make you good at this; being slow doesn't either. Claude Sonnet 4.5 sits mid-pack on speed (~38 s) and top on conversion, which is the combination that actually matters when the agent is spending someone's money; Claude Opus 4.6 is slightly faster (~30 s) but converts at 39.0%, in the middle of the field.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shift:&lt;/strong&gt; in April we reported DeepSeek V3.2 leading the composite shopping score. With ~3× the sessions, Claude Sonnet 4.5 is now clearly out front on checkout completion — 52% over 256 sessions, by far the largest sample — with Meta's Llama 3.3 70B the surprise second. Treat any single month's ranking as provisional until the eval dataset gets to the point — soon — where it stops being indicative and becomes authoritative.&lt;/p&gt;
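
&lt;p&gt;One way to see why a single month's ranking is provisional is to put an uncertainty range around each checkout rate. The sketch below uses a Wilson score interval, which is our illustrative choice and not how the leaderboard itself is computed. The checkout counts are back-computed from the rounded percentages in the table, so treat them as approximations.&lt;/p&gt;

```python
import math


def wilson_interval(successes: int, sessions: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if sessions == 0:
        raise ValueError("need at least one session")
    p = successes / sessions
    denom = 1.0 + z * z / sessions
    centre = (p + z * z / (2 * sessions)) / denom
    half = z * math.sqrt(p * (1 - p) / sessions + z * z / (4 * sessions * sessions)) / denom
    return centre - half, centre + half


# (model, sessions, checkouts); checkouts are approximations derived from the
# rounded "Checkout %" column, not figures published in the report.
SAMPLES = [
    ("Claude Sonnet 4.5", 256, 133),
    ("Llama 3.3 70B", 75, 37),
    ("Grok 3 Mini", 21, 2),
]

for model, n, k in SAMPLES:
    low, high = wilson_interval(k, n)
    print(f"{model}: {100 * k / n:.1f}% (approx. 95% range {100 * low:.0f}-{100 * high:.0f}%)")
```

&lt;p&gt;On these approximate counts the ranges for the top two models overlap substantially, which is consistent with reading the current order as indicative, not settled.&lt;/p&gt;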

&lt;h2&gt;
  
  
  The reliability gap, one more time
&lt;/h2&gt;

&lt;p&gt;We've made this the editorial spine of every one of these reports, and the May data doesn't let us retire it. &lt;strong&gt;98.9% of verified stores carry an A on UCP Score&lt;/strong&gt; (5,235 of 5,294; the rest are 57 B's and two C's). By conformance, the directory is in excellent shape. But conformance isn't end-to-end agent-readiness, and that's the gap UCP Score doesn't grade.&lt;/p&gt;

&lt;p&gt;A clean schema doesn't tell you whether the cart endpoint accepts the variant the agent picked, whether response-time budgets hold under load, whether payment-handler tokenisation completes inside the agent's timeout window, or whether the checkout URL drops the agent into an auth loop a browser would have handled with cookies. &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; is the test harness developers use to exercise that second layer — replay sessions, probe edge cases, see exactly where an agent trips. By design it surfaces failure modes, not steady-state performance; treating Playground completion rates as a consumer-shopping success metric mis-reads the tool. But the &lt;em&gt;categories&lt;/em&gt; of failure it surfaces — variant mismatch, slow tokenisation, malformed cart responses, checkout redirect loops — are real, and they're what separate an A-graded manifest from a store an agent can reliably transact against in production.&lt;/p&gt;

&lt;p&gt;That's the gap we'd point a platform team at, and it isn't a percentage, it's a posture. The protocol's first phase, call it the first four months, was about getting the schema right, and the ecosystem did that. The next phase is the unglamorous second-order work: error recovery, schema robustness, response-time SLAs, variant-data hygiene, the long tail of edge cases that separate "manifest valid" from "agent transacts without anything tripping it up." That work &lt;em&gt;is&lt;/em&gt; happening: the Playground sessions above are senior engineers doing exactly that. The open question is whether the posture spreads from the engineering teams already running this loop to the long tail of merchants still on bundled defaults. That's where the next quarter's competitive distance gets built.&lt;/p&gt;

&lt;h2&gt;
  
  
  The demand side: AI traffic is converting
&lt;/h2&gt;

&lt;p&gt;For four months this report has focused on supply — which stores have UCP, what capabilities they declare, the shape of their manifests, what agents do against them in testing. On May 11 Shopify &lt;a href="https://www.shopify.com/enterprise/blog/ai-search-insights" rel="noopener noreferrer"&gt;published its first real demand-side dataset&lt;/a&gt;, and the numbers reframe the urgency of everything above.&lt;/p&gt;

&lt;p&gt;Across Shopify storefronts in Q1 2026, by Shopify's analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-referred orders grew nearly 13× year-over-year.&lt;/strong&gt; Referral sessions from AI chatbots (ChatGPT, Perplexity, Gemini, Copilot, Claude, Grok) grew more than &lt;strong&gt;8× YoY&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-referred sessions convert at ~50% higher rates&lt;/strong&gt; than organic search when they start on product pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average order value is 14% higher&lt;/strong&gt; for AI-referred than for organic-search orders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More than half of AI-referred sessions start on a product detail page&lt;/strong&gt;, vs ~20% for organic — "journey compression," the buyer arrives ready to buy because the AI did the research first.&lt;/li&gt;
&lt;li&gt;AI-referred conversion outperforms organic SEO in &lt;strong&gt;23 of 25 merchant categories&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caveat: this is Shopify's analysis of Shopify storefronts with undisclosed methodology, so treat the precise numbers as Shopify-published rather than independently verified. But the &lt;em&gt;direction&lt;/em&gt; is the story: agentic commerce isn't theoretical traffic any more. It's converting at premium rates, in volume, growing fast — and that's the demand signal that explains why every TC member is racing to ship at the productisation layer right now. Shopify Field CTO Sandy Jeong framed the operational work in three buckets: &lt;strong&gt;data readiness&lt;/strong&gt; (machine-readable catalog with structured attributes), &lt;strong&gt;channel infrastructure&lt;/strong&gt; (direct API syndication to AI platforms), and &lt;strong&gt;organisational alignment&lt;/strong&gt; (a named DRI, not a committee). The teams that get those three right capture the 13× curve; the teams that don't watch it route around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec and ecosystem
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attribution landed in core.&lt;/strong&gt; On May 5 the Technical Council &lt;a href="https://ucpchecker.com/blog/ucp-tc-ships-attribution-into-core" rel="noopener noreferrer"&gt;merged a top-level &lt;code&gt;attribution&lt;/code&gt; field&lt;/a&gt; into cart, checkout, catalog, and order operations — campaign IDs, click identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;), source/medium markers, as an open string-keyed map. It's the first time advertising-and-measurement infrastructure has landed in UCP core, and the trajectory implication is the story: a protocol that carries attribution context is a protocol being built for commercial-scale deployment, not just technical demos.&lt;/p&gt;
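
&lt;p&gt;To make the "open string-keyed map" concrete, here is a minimal sketch of how a client might assemble such a map from a landing URL. Only the click identifier names (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;) come from the text above; the UTM keys, the function name, and the example URL are assumptions for illustration, and the actual key names in the spec may differ.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Click identifiers named in the report.
CLICK_ID_KEYS = ("gclid", "fbclid", "ttclid")
# Common source/medium convention, assumed here; not taken from the UCP spec.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def attribution_from_url(landing_url: str) -> dict[str, str]:
    """Collect known attribution parameters from a URL into a string-keyed map."""
    query = dict(parse_qsl(urlsplit(landing_url).query))
    return {key: query[key] for key in CLICK_ID_KEYS + UTM_KEYS if key in query}


# Hypothetical landing URL, built with urlencode so the query string is valid.
params = {"gclid": "abc123", "utm_source": "google", "utm_medium": "cpc", "page": "2"}
landing_url = "https://shop.example/products/tee?" + urlencode(params)

attribution = attribution_from_url(landing_url)
print(attribution)
```

&lt;p&gt;Unrecognised parameters such as &lt;code&gt;page&lt;/code&gt; are dropped in this sketch; because the map is open, a real implementation could just as well pass additional keys through.&lt;/p&gt;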

&lt;p&gt;&lt;strong&gt;The council expanded — and the regional question got sharper.&lt;/strong&gt; Amazon, Meta, Microsoft, Salesforce, and Stripe &lt;a href="https://ucpchecker.com/blog/ucp-tech-council-expands-amazon-meta-microsoft-salesforce-stripe" rel="noopener noreferrer"&gt;joined the Technical Council&lt;/a&gt; at the end of April — a governance signal as much as an adoption one (none of the five has shipped a UCP store wave yet), but a notable one: the steering group now includes the company building the leading proprietary alternative (Amazon's "Buy for Me") and the company behind the leading rival protocol (Stripe, ACP). Convergence pressure, formalised.&lt;/p&gt;

&lt;p&gt;Two German commerce trade publications picked up the expansion within a day of each other and used our breakdown of the 16-seat composition as a primary source: &lt;a href="https://excitingcommerce.de/2026/04/27/amazon-schliesst-sich-googles-universal-commerce-protocol-an/" rel="noopener noreferrer"&gt;Exciting Commerce&lt;/a&gt; on April 27 (which drove the European enterprise retail audience &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; was built for), and &lt;a href="https://www.shoptechblog.com/2026/04/28/agentic-commerce-das-ucp-council-wachst/" rel="noopener noreferrer"&gt;Shoptechblog&lt;/a&gt; the next day. Both lead with the same regional point — &lt;em&gt;"Keine Rolle spielen weiter europäische und asiatische Unternehmen"&lt;/em&gt; ("European and Asian companies continue to play no role") — and Shoptechblog adds the analytical layer: the new members sent senior engineers and architects rather than C-suite executives (implementation work, not press); each company's participation reads as defensive; and the real contest isn't the standardised protocol but the layers &lt;em&gt;above&lt;/em&gt; it — ranking, paid placement, customer ownership. Which is exactly why attribution-in-core is more than plumbing: it's the first of those upper layers getting wired into the spec itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two TC members shipped at the productisation layer.&lt;/strong&gt; The contest moving up the stack got two concrete examples this month. On May 5 Google &lt;a href="https://searchengineland.com/google-expands-ucp-checkout-to-main-search-shopping-results-476540" rel="noopener noreferrer"&gt;expanded UCP-powered checkout out of AI Mode into the main shopping section of standard Search results&lt;/a&gt;, with &lt;a href="https://www.thekeyword.co/news/google-ucp-checkout-main-search" rel="noopener noreferrer"&gt;Wayfair the first live retailer on the new surface&lt;/a&gt; — a "Buy" button on listings inside Google Search itself, Google Pay tokenisation, checkout completing without leaving the page. Zero-click search results just became zero-click &lt;em&gt;purchases&lt;/em&gt;. The &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-february-2026" rel="noopener noreferrer"&gt;two-track adoption story&lt;/a&gt; we drew in February has its first major convergence event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F02-google-wayfair-flow.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F02-google-wayfair-flow.webp" alt="Google AI Mode shopping flow on Wayfair: AI Mode query, product detail with Buy button, Google Pay order review, order complete confirmation" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Google's UCP-powered checkout flow on Wayfair: AI Mode query → product page with Buy button → Google Pay review → order complete. Source: Google.&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;Shopify, separately, started rolling out an &lt;strong&gt;Agentic Storefronts dashboard&lt;/strong&gt; in merchant admin this week (&lt;a href="https://shopify.dev/docs/agents" rel="noopener noreferrer"&gt;live docs&lt;/a&gt;). It surfaces ChatGPT / Microsoft Copilot / AI Mode traffic and offers an "Allow Shopify to manage for me" toggle that auto-generates the AI-readability files (&lt;code&gt;llms.txt&lt;/code&gt;, &lt;code&gt;llms-full.txt&lt;/code&gt;, &lt;code&gt;agents.md&lt;/code&gt;) for stores that opt in. The dashboard is &lt;strong&gt;protocol-agnostic&lt;/strong&gt;: it covers ChatGPT (ACP), Copilot, and UCP-powered Search inside one admin view. UCP is one of the protocols Shopify is now monetising on the agentic-readiness layer, not the whole product. For Shopify it's the natural next step after the &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 fleet migration&lt;/a&gt;; for everyone else watching Shopify's head start, it shows what the &lt;em&gt;next&lt;/em&gt; phase of that head start looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F01-shopify-agentic-dashboard.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F01-shopify-agentic-dashboard.webp" alt="Shopify Agentic Storefronts dashboard in merchant admin showing 2,060 agentic sessions and $6,447 earned in the last 30 days, split by ChatGPT, Microsoft Copilot, and Shop Channel, with an 'Allow Shopify to manage for me' toggle and agentic readiness checklist" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Shopify Agentic Storefronts in merchant admin: ChatGPT / Microsoft Copilot / Shop Channel split, "Allow Shopify to manage for me" toggle, agentic-readiness checklist.&lt;/em&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;A potential spec gap, still being validated.&lt;/strong&gt; In &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;the variant-data guide&lt;/a&gt; we noted that v2026-04-08 makes &lt;code&gt;variant.options[]&lt;/code&gt; optional even on products where &lt;code&gt;product.options[]&lt;/code&gt; is non-empty and there are multiple variants — meaning two fully spec-compliant manifests can produce identical-looking payloads where one is unambiguous and the other is agent-unresolvable. The candidate fix would be a conditional &lt;code&gt;MUST&lt;/code&gt; ("when &lt;code&gt;product.options&lt;/code&gt; is non-empty and &lt;code&gt;variants.length &amp;gt; 1&lt;/code&gt;, every variant MUST populate &lt;code&gt;options[]&lt;/code&gt;"). It's a working hypothesis from one analysis, not a filed proposal — we want to sweep more of the live dataset for real-world incidence and check the edge cases (single-variant simple products, productGroup behaviour, platforms that already populate &lt;code&gt;options&lt;/code&gt; by default) before raising it formally. If the pattern holds, it's a candidate for a future minor release.&lt;/p&gt;
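
&lt;p&gt;The candidate rule is simple enough to check mechanically. The sketch below is one way a merchant could test their own product data against it; the payload shape (a product dict with &lt;code&gt;options&lt;/code&gt; and &lt;code&gt;variants&lt;/code&gt; lists, and an &lt;code&gt;id&lt;/code&gt; on each variant) and the function name are assumptions for illustration, not the spec's schema or an existing validator.&lt;/p&gt;

```python
def variants_missing_options(product: dict) -> list[str]:
    """Return IDs of variants that would violate the candidate conditional MUST.

    The rule applies only when the product declares options and has more
    than one variant; otherwise nothing is flagged.
    """
    options = product.get("options") or []
    variants = product.get("variants") or []
    rule_applies = bool(options) and len(variants) > 1
    if not rule_applies:
        return []
    return [
        str(variant.get("id", f"index-{i}"))
        for i, variant in enumerate(variants)
        if not variant.get("options")
    ]


# Hypothetical product: two sizes, but the second variant omits options[].
ambiguous = {
    "options": [{"name": "Size", "values": ["S", "M"]}],
    "variants": [
        {"id": "v1", "options": [{"name": "Size", "value": "S"}]},
        {"id": "v2"},  # an agent cannot tell which size this variant is
    ],
}

# Single-variant simple product: the rule does not apply.
simple = {"options": [], "variants": [{"id": "only"}]}

print(variants_missing_options(ambiguous))
print(variants_missing_options(simple))
```

&lt;p&gt;Both example products are allowed by the current wording; only the first would be flagged under the candidate rule.&lt;/p&gt;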

&lt;p&gt;&lt;strong&gt;No v2026-05.&lt;/strong&gt; v2026-04-08 remains current. On the cadence so far, the next minor release more likely lands late summer (a notional v2026-08), probably bundling AP2 mandate refinements, schema corrections shaken out by running validators against thousands of real stores, and whatever the council formalises over the next two months. On the partner side: the &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;registry&lt;/a&gt; now lists 61 merchants, 11 agents, and 8 extensions; the payment-handler roster (Adyen, Amex, Mastercard, Stripe, Visa, Checkout.com, Affirm, Splitit, PayPal) is unchanged and still almost entirely unrepresented in live manifest declarations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we shipped — and what developers are doing with it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;UCP Variant Data: The #1 Reason Agent Checkouts Fail&lt;/a&gt;&lt;/strong&gt; — the five variant-data anti-patterns, what clean variant data looks like, and the spec gap that lets compliant stores still be broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/blog/how-to-test-ucp-implementation" rel="noopener noreferrer"&gt;How to Test Your UCP Implementation&lt;/a&gt;&lt;/strong&gt; — the three-layer validation workflow: static audit, live agent test, continuous monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; is doing exactly what it was built to do.&lt;/strong&gt; This is the one we're proudest of this quarter. The Score turns "is my manifest agent-ready?" into a concrete, category-by-category checklist — and developers are using it that way: we've watched a failing manifest climb to an A grade in the space of a few hours, the developer iterating against the score breakdown between checks. That's the loop it was designed for, and it's now the loop it runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; got sharper as a development tool.&lt;/strong&gt; Two halves of the same loop: the agent-inspection tooling — replay any session, see the exact tool call where an agent tripped — and the runtime shopping evals, now past &lt;strong&gt;1,000 recorded sessions&lt;/strong&gt; and &lt;strong&gt;more than 12 hours of cumulative agent runtime&lt;/strong&gt; against real stores. Together they take the build → test → fix cycle for an agent-ready storefront down from a sprint to an afternoon. Every improvement that got us there is in the &lt;a href="https://ucpplayground.com/changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crawler throughput&lt;/strong&gt; — we roughly tripled the hourly crawl rate in early May (and added per-IP and global throttles to the expensive public routes so the directory stays fast under load). That's what moved the discovery curve this month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to watch in June
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Second adopters at every edge.&lt;/strong&gt; May produced first adopters across multiple novel patterns — non-Shopify platforms shipping UCP (Bareconnect, Selly.io), a consultancy-built accelerator (PwC), non-default payment-handler integrations (the processors in dev), AP2 mandate (still one), third-party capability namespaces (each at 1–2 stores). The diagnostic for June is whether any doubles up. Each is a distinct watch item; the meta-question is the same: did May's first adopters survive contact with month two?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google's next live partner on main Search.&lt;/strong&gt; Wayfair is first up on Google's UCP-checkout expansion into standard Search results. The other co-developing TC retailers — Etsy, Target, Walmart — are the next-most-likely to follow. The cadence of those rollouts is the diagnostic for how fast Google is willing to push agent-completed transactions onto its highest-traffic surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The platform-level integration question.&lt;/strong&gt; SFCC, Adobe Commerce, Wix, Squarespace — any of them shipping a platform-level UCP integration is still the single highest-impact possible event, and still hasn't happened. The one-platform structure is four months old.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whether the eval leaderboard holds its shape.&lt;/strong&gt; Claude Sonnet 4.5 leads checkout completion on the largest sample; Llama 3.3 70B is the surprise second. Another month of sessions either confirms that or reshuffles it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;All data is from the UCP Checker crawler (re-checks every tracked domain at least every 24 hours) and UCP Playground's eval sessions, as of May 12, 2026. The verified-merchant dataset is published monthly on &lt;a href="https://huggingface.co/datasets/UCPChecker/ucp-merchants" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; under CC-BY 4.0; the same data, a public REST API, the bulk checker, and the rest of our &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;developer tools&lt;/a&gt; are all ungated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse the directory: &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Track adoption live: &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;ucpchecker.com/stats&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run a UCP Score: &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model + store leaderboard: &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Public dataset, REST API &amp;amp; developer tools: &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;ucpchecker.com/developer-tools&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Previous report: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce — April 2026&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;External coverage cited in this report:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jochen Krisch, &lt;em&gt;"Amazon schließt sich Googles Universal Commerce Protocol an,"&lt;/em&gt; &lt;a href="https://excitingcommerce.de/2026/04/27/amazon-schliesst-sich-googles-universal-commerce-protocol-an/" rel="noopener noreferrer"&gt;Exciting Commerce&lt;/a&gt;, April 27, 2026&lt;/li&gt;
&lt;li&gt;Roman Zenner, &lt;em&gt;"Agentic Commerce: Das UCP Council wächst,"&lt;/em&gt; &lt;a href="https://www.shoptechblog.com/2026/04/28/agentic-commerce-das-ucp-council-wachst/" rel="noopener noreferrer"&gt;Shoptechblog&lt;/a&gt;, April 28, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://searchengineland.com/google-expands-ucp-checkout-to-main-search-shopping-results-476540" rel="noopener noreferrer"&gt;Google expands UCP Checkout to main Search shopping results&lt;/a&gt;, Search Engine Land, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/products/ads-commerce/agentic-commerce-ai-tools-protocol-retailers-platforms/" rel="noopener noreferrer"&gt;New tech and tools for retailers to succeed in an agentic shopping era&lt;/a&gt;, Google blog (Ads &amp;amp; Commerce)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://shopify.dev/docs/agents" rel="noopener noreferrer"&gt;Shopify Agentic commerce developer docs&lt;/a&gt; — Agentic Storefronts, &lt;code&gt;llms.txt&lt;/code&gt;, &lt;code&gt;llms-full.txt&lt;/code&gt;, &lt;code&gt;agents.md&lt;/code&gt; reference&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.wislr.com/research/what-shopify-checks-for-agentic-readiness" rel="noopener noreferrer"&gt;What Shopify checks for agentic readiness&lt;/a&gt;, WISLR Research&lt;/li&gt;
&lt;li&gt;Kyle Risley, &lt;a href="https://www.shopify.com/enterprise/blog/ai-search-insights" rel="noopener noreferrer"&gt;"AI-referred shoppers convert better and spend more (2026)"&lt;/a&gt;, Shopify Enterprise Blog, May 11, 2026&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>data</category>
      <category>ucp</category>
    </item>
    <item>
      <title>UCP Variant Data: The #1 Reason Agent Checkouts Fail</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Wed, 13 May 2026 11:12:54 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-variant-data-the-1-reason-agent-checkouts-fail-4jp5</link>
      <guid>https://dev.to/benjifisher/ucp-variant-data-the-1-reason-agent-checkouts-fail-4jp5</guid>
      <description>&lt;p&gt;A user asks an AI shopping agent for "a medium grey t-shirt." The agent finds the product. It picks a variant. It adds it to the cart. The merchant rejects the cart. The agent retries with a different variant. The merchant rejects that one too. The session ends in &lt;code&gt;cart_created&lt;/code&gt; without a checkout — the user's $40 purchase quietly disappears, and nobody on the merchant side ever sees the failure.&lt;/p&gt;

&lt;p&gt;This pattern is the &lt;strong&gt;single largest source of agent checkout failures we see across the 4,500+ verified UCP stores in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;directory&lt;/a&gt;&lt;/strong&gt;. More than schema invalidity, more than tool errors, more than payment-handler problems. Variant mismatch — the agent and the merchant disagreeing on which SKU corresponds to "Medium" — is responsible for a meaningful fraction of the gap between "store has a UCP manifest" and "agent can actually buy from it."&lt;/p&gt;

&lt;p&gt;The good news: it's almost entirely fixable on the merchant side, in your variant data structure, without changing any tooling. This post walks through the failure pattern, the five most common variant data anti-patterns we observe, and what clean variant data looks like in practice.&lt;/p&gt;

&lt;h2&gt;Anatomy of a variant mismatch&lt;/h2&gt;

&lt;p&gt;Here's the cleanest way to see the failure:&lt;/p&gt;

&lt;p&gt;Two frontier agents — call them Agent A and Agent B — get the same prompt against the same store: &lt;em&gt;"Add a medium grey t-shirt to my cart."&lt;/em&gt; Both agents call &lt;code&gt;search_catalog&lt;/code&gt;, both get the same product back, both see three variants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Large"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent A picks &lt;code&gt;var_5572&lt;/code&gt;. Agent B picks &lt;code&gt;var_5572&lt;/code&gt;. Both add to cart. Both succeed. &lt;strong&gt;Clean data, predictable behaviour.&lt;/strong&gt; Each variant declares its options as an array of &lt;code&gt;{name, label}&lt;/code&gt; pairs — the spec's &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; shape — so the agent matches "medium" against the &lt;code&gt;Size&lt;/code&gt; axis unambiguously.&lt;/p&gt;

&lt;p&gt;Now the broken version. Same prompt, same product, but the variant data looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"L"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5574"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium / Regular Fit"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5575"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium / Slim Fit"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent A picks &lt;code&gt;var_5572&lt;/code&gt; (interpreting "M" as the canonical "Medium"). Agent B picks &lt;code&gt;var_5574&lt;/code&gt; (interpreting "Medium / Regular Fit" as the more explicit match). &lt;strong&gt;Neither is wrong.&lt;/strong&gt; The user said "medium" and both interpretations are defensible. But because the variant data conflates two different axes — size and fit — into a single &lt;code&gt;Size&lt;/code&gt; label, the two agents diverge, and the user's experience depends on which model they're using. The spec form makes the bug obvious: &lt;code&gt;Fit&lt;/code&gt; should be its own &lt;code&gt;selected_option&lt;/code&gt;, not crammed into the &lt;code&gt;Size&lt;/code&gt; label.&lt;/p&gt;

&lt;p&gt;Worse: many real implementations don't even include the option labels. They expose only opaque variant IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent has no way to know which variant corresponds to "Medium" at all. It guesses. Sometimes it guesses right. Often it doesn't. That's how checkout sessions end up in &lt;code&gt;cart_created&lt;/code&gt; without ever reaching &lt;code&gt;checkout_reached&lt;/code&gt;.&lt;/p&gt;
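&lt;p&gt;To make the three cases concrete, here is a minimal Python sketch of a deliberately strict consumer. It is an illustration, not how any frontier agent actually resolves variants: &lt;code&gt;resolve_variant&lt;/code&gt; and the &lt;code&gt;tee&lt;/code&gt; helper are hypothetical names, and the matching rule (an exact, case-insensitive label match on every requested axis) is an assumption. It returns a variant ID only when exactly one variant satisfies the request.&lt;/p&gt;

```python
from typing import Optional


def resolve_variant(variants: list[dict], wanted: dict[str, str]) -> Optional[str]:
    """Return the single variant id whose options match every wanted axis, else None."""
    matches = []
    for variant in variants:
        # Build an axis -> label map from the variant's {name, label} pairs.
        labels = {o["name"].casefold(): o["label"].strip().casefold()
                  for o in variant.get("options", [])}
        if all(labels.get(axis.casefold()) == label.casefold()
               for axis, label in wanted.items()):
            matches.append(variant["id"])
    # Exactly one match is a safe pick; zero or several means the data is ambiguous.
    return matches[0] if len(matches) == 1 else None


def tee(variant_id: str, size: str) -> dict:
    """Hypothetical helper: a grey tee variant with the given Size label."""
    return {"id": variant_id, "options": [{"name": "Color", "label": "Grey"},
                                          {"name": "Size", "label": size}]}


clean = [tee("var_5571", "Small"), tee("var_5572", "Medium"), tee("var_5573", "Large")]
conflated = [tee("var_5572", "M"), tee("var_5574", "Medium / Regular Fit"),
             tee("var_5575", "Medium / Slim Fit")]
opaque = [{"id": "var_5571"}, {"id": "var_5572"}, {"id": "var_5573"}]

wanted = {"Color": "Grey", "Size": "Medium"}
print(resolve_variant(clean, wanted))      # var_5572
print(resolve_variant(conflated, wanted))  # None
print(resolve_variant(opaque, wanted))     # None
```

&lt;p&gt;A strict consumer like this refuses both broken shapes. A lenient one guesses instead, which is where the divergence between agents comes from.&lt;/p&gt;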

&lt;h2&gt;Why this is the #1 failure mode&lt;/h2&gt;

&lt;p&gt;Across the &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;Playground session dataset&lt;/a&gt;, roughly &lt;strong&gt;62% of sessions end without a completed checkout&lt;/strong&gt;. The breakdown is informative:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout_reached&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;search_only&lt;/code&gt; (browsed, didn't add)&lt;/td&gt;
&lt;td&gt;27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failed&lt;/code&gt; (provider error, model refusal, max turns)&lt;/td&gt;
&lt;td&gt;22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cart_created&lt;/code&gt; (added, didn't proceed)&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;cart_created&lt;/code&gt; cohort — sessions where the agent successfully picked something but couldn't finish — is the variant-mismatch signal. The agent had enough information to add to cart but the cart contents weren't valid for checkout. That's the structural shape of "wrong variant picked."&lt;/p&gt;

&lt;p&gt;Roughly half of the categorised &lt;code&gt;failed&lt;/code&gt; sessions are also variant-shape problems: the agent picked a variant ID that the cart endpoint rejected, retried with another, and hit &lt;code&gt;max_turns_exceeded&lt;/code&gt; while flailing through the variant list. Add those in and &lt;strong&gt;variant-related failures account for somewhere around a fifth of all sessions&lt;/strong&gt;, which is more than any other categorisable failure mode.&lt;/p&gt;

&lt;p&gt;The thing that makes this pattern so consistent: &lt;strong&gt;clean variant data is not part of UCP Score or schema validation&lt;/strong&gt;. A store can pass &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; at A grade and still emit variant data that breaks every agent in the field. The validator looks at whether the manifest parses; it doesn't look at whether the variants are agent-resolvable. That gap is exactly why this post exists.&lt;/p&gt;

&lt;h2&gt;A spec gap that compounds the problem&lt;/h2&gt;

&lt;p&gt;Even when a store is fully UCP-compliant, the protocol leaves room for ambiguity. The 2026-04-08 schema makes &lt;code&gt;variant.options[]&lt;/code&gt; optional — including on products where &lt;code&gt;product.options[]&lt;/code&gt; is non-empty and there are multiple variants. So a payload like &lt;code&gt;{"options": [{"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]}], "variants": [{"id": "var_a"}, {"id": "var_b"}]}&lt;/code&gt; is technically valid but agent-unresolvable: nothing links &lt;code&gt;var_a&lt;/code&gt; to "Small" rather than "Medium." Two consumers looking at this payload can defensibly pick different variants for the same prompt.&lt;/p&gt;

&lt;p&gt;A conditional &lt;code&gt;MUST&lt;/code&gt; in the spec — &lt;em&gt;"when &lt;code&gt;product.options&lt;/code&gt; is non-empty and &lt;code&gt;variants.length &amp;gt; 1&lt;/code&gt;, every variant MUST populate &lt;code&gt;options[]&lt;/code&gt;"&lt;/em&gt; — would close this cleanly. Until that lands, agent-resolvability is on the merchant rather than the protocol.&lt;/p&gt;
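&lt;p&gt;Until the spec enforces it, the same rule is easy to check on the merchant side. The sketch below is one possible lint, assuming a product payload shaped like the example above; &lt;code&gt;unlinked_variants&lt;/code&gt; is a hypothetical name, not part of any UCP tooling.&lt;/p&gt;

```python
def unlinked_variants(product: dict) -> list[str]:
    """Return ids of variants that would break the proposed conditional MUST."""
    variants = product.get("variants", [])
    # The rule only applies when axes are declared and there is a choice to make.
    applies = bool(product.get("options")) and len(variants) > 1
    if not applies:
        return []
    # A variant with a missing or empty options[] cannot be linked to any axis value.
    return [v["id"] for v in variants if not v.get("options")]


payload = {
    "options": [{"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]}],
    "variants": [{"id": "var_a"}, {"id": "var_b"}],
}
print(unlinked_variants(payload))  # ['var_a', 'var_b']
```

&lt;p&gt;An empty result means every variant carries at least one option entry. It does not prove the labels are correct, only that the link exists.&lt;/p&gt;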

&lt;h2&gt;The five variant anti-patterns&lt;/h2&gt;

&lt;p&gt;In rough order of frequency observed across the dataset:&lt;/p&gt;

&lt;h3&gt;1. Opaque variant IDs with no option metadata&lt;/h3&gt;

&lt;p&gt;The shape from the third example above — variants exposed only as &lt;code&gt;var_5572&lt;/code&gt;, no &lt;code&gt;options&lt;/code&gt;, no &lt;code&gt;attributes&lt;/code&gt;, no human-readable axis. Agents have no way to map a user's "Medium" to a specific ID. They either guess or pick the first variant, both of which produce wrong outcomes routinely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; every variant must carry the axis values that distinguish it from siblings, in the spec's &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; array form: &lt;code&gt;"options": [{"name": "Size", "label": "Medium"}, {"name": "Color", "label": "Grey"}]&lt;/code&gt;. The &lt;code&gt;name&lt;/code&gt; field tells the agent which axis the value belongs to; &lt;code&gt;label&lt;/code&gt; is what gets matched against the user's request. Whatever the product's options page shows to a human shopper — size, colour, material, fit — the variant data should expose programmatically with one &lt;code&gt;selected_option&lt;/code&gt; entry per axis.&lt;/p&gt;

&lt;p&gt;The corollary: descriptive attributes that aren't selection axes belong in &lt;code&gt;metadata&lt;/code&gt;, not &lt;code&gt;product.options[]&lt;/code&gt;. A one-variant simple product with "Color: Gray" should expose Gray as &lt;code&gt;metadata.attributes&lt;/code&gt;, not as a single-value entry in &lt;code&gt;product.options[]&lt;/code&gt;; otherwise consumer UIs render a one-button picker that looks selectable but isn't. The split: &lt;code&gt;product.options[]&lt;/code&gt; is for axes the buyer chooses across; &lt;code&gt;metadata&lt;/code&gt; is for descriptive properties of the (only) variant.&lt;/p&gt;

&lt;h3&gt;2. Conflated axes in a single string&lt;/h3&gt;

&lt;p&gt;The shape from the second example — &lt;code&gt;"Medium / Regular Fit"&lt;/code&gt; as a single option value where size and fit are two separate user choices. Agents can parse this, but inconsistently across models, because the conflation is ambiguous. Different models split the string differently, and the variant they end up picking depends on which side of the slash they prioritise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; each variant attribute lives in its own field. Don't compose. If your product has size + fit as two axes, the variant data should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5574"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two clean axes, two unambiguous values, no string parsing required. Agents pick consistently. The array-of-&lt;code&gt;selected_option&lt;/code&gt; form is the shape UCP &lt;code&gt;2026-04-08&lt;/code&gt; defines for &lt;code&gt;variant.options&lt;/code&gt; — see &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option.json&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
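&lt;p&gt;Conflated labels can usually be spotted mechanically before an agent ever sees them. The following is a rough heuristic sketch, not a spec rule: it assumes a slash or pipe inside a label signals two axes packed together, and &lt;code&gt;conflated_labels&lt;/code&gt; and &lt;code&gt;SEPARATOR&lt;/code&gt; are hypothetical names. Expect false positives on labels that legitimately contain a slash.&lt;/p&gt;

```python
import re

# Assumption: a slash or pipe inside a label suggests two axes in one string.
SEPARATOR = re.compile(r"\s*[/|]\s*")


def conflated_labels(variants: list[dict]) -> list[tuple[str, str, str]]:
    """Return (variant id, axis name, label) for labels that look like two axes in one."""
    findings = []
    for variant in variants:
        for option in variant.get("options", []):
            if SEPARATOR.search(option["label"]):
                findings.append((variant["id"], option["name"], option["label"]))
    return findings


variants = [
    {"id": "var_5572", "options": [{"name": "Size", "label": "M"}]},
    {"id": "var_5574", "options": [{"name": "Size", "label": "Medium / Regular Fit"}]},
]
print(conflated_labels(variants))  # [('var_5574', 'Size', 'Medium / Regular Fit')]
```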

&lt;h3&gt;3. Inconsistent labelling between sibling variants&lt;/h3&gt;

&lt;p&gt;Not all variants on the same product use the same option vocabulary. One says &lt;code&gt;"M"&lt;/code&gt;, another says &lt;code&gt;"Medium"&lt;/code&gt;, another says &lt;code&gt;"med"&lt;/code&gt;. We see this on stores that have grown organically: different teams added variants over different years, naming conventions drifted, and the inconsistency stays invisible to the merchandising team because the storefront UI hides it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; one canonical label per axis value, applied consistently across every variant on every product. If "Medium" is the canonical label, every Medium variant uses exactly &lt;code&gt;"Medium"&lt;/code&gt;. No &lt;code&gt;"M"&lt;/code&gt;, no &lt;code&gt;"med"&lt;/code&gt;, no &lt;code&gt;"Medium "&lt;/code&gt; (trailing space). Agents reason by string match; consistency is what makes the match reliable.&lt;/p&gt;
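&lt;p&gt;Label drift is easy to audit once you group the raw spellings by what they are meant to say. This is a small sketch under stated assumptions: the &lt;code&gt;ALIASES&lt;/code&gt; table is hypothetical (a real store would maintain its own per axis), and &lt;code&gt;label_drift&lt;/code&gt; is an illustrative name rather than an existing tool.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical alias table mapping known shorthand to a canonical value.
ALIASES = {"s": "small", "m": "medium", "med": "medium", "l": "large"}


def label_drift(variants: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Map (axis, canonical value) to the raw spellings seen, when more than one exists."""
    seen = defaultdict(set)
    for variant in variants:
        for option in variant.get("options", []):
            # Normalise case and stray whitespace before consulting the alias table.
            key = option["label"].strip().casefold()
            seen[(option["name"], ALIASES.get(key, key))].add(option["label"])
    return {axis_value: sorted(raw) for axis_value, raw in seen.items() if len(raw) > 1}


variants = [
    {"id": "v1", "options": [{"name": "Size", "label": "M"}]},
    {"id": "v2", "options": [{"name": "Size", "label": "Medium"}]},
    {"id": "v3", "options": [{"name": "Size", "label": "med"}]},
    {"id": "v4", "options": [{"name": "Size", "label": "Large"}]},
]
print(label_drift(variants))  # {('Size', 'medium'): ['M', 'Medium', 'med']}
```

&lt;p&gt;Every key in the result is a value that needs one spelling picked and applied everywhere.&lt;/p&gt;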

&lt;h3&gt;4. Missing or inconsistent stock / availability flags&lt;/h3&gt;

&lt;p&gt;A variant exists in the catalogue but is sold out, and the variant data doesn't say so. The agent picks it, the cart accepts the add, the checkout endpoint rejects it. The agent doesn't know to retry with a different variant — it had no signal that the variant was unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; every variant declares its &lt;code&gt;availability&lt;/code&gt; object. The spec shape is &lt;code&gt;{"available": true, "status": "in_stock"}&lt;/code&gt;, with well-known status values &lt;code&gt;in_stock&lt;/code&gt;, &lt;code&gt;backorder&lt;/code&gt;, &lt;code&gt;preorder&lt;/code&gt;, &lt;code&gt;out_of_stock&lt;/code&gt;, and &lt;code&gt;discontinued&lt;/code&gt;. Agents can skip unavailable variants only when the data flags them, and &lt;code&gt;status&lt;/code&gt; gives them enough signal to decide whether to wait, substitute, or surface an out-of-stock message to the user.&lt;/p&gt;
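&lt;p&gt;Here is roughly what a consumer can do with that object, as a hedged Python sketch. &lt;code&gt;purchasable&lt;/code&gt; and its &lt;code&gt;allow&lt;/code&gt; parameter are hypothetical, the sample data is invented for illustration, and treating a missing &lt;code&gt;availability&lt;/code&gt; as "unknown, skip it" is an assumption about consumer policy rather than something the spec mandates.&lt;/p&gt;

```python
def purchasable(variants: list[dict],
                allow: frozenset = frozenset({"in_stock"})) -> list[str]:
    """Return ids of variants whose availability says they can be bought now."""
    ids = []
    for variant in variants:
        availability = variant.get("availability")
        if availability is None:
            continue  # No signal at all: treat as unknown rather than guessing.
        if availability.get("available") and availability.get("status") in allow:
            ids.append(variant["id"])
    return ids


# Illustrative data only; the flag values are assumptions, not observed payloads.
variants = [
    {"id": "var_5571", "availability": {"available": True, "status": "in_stock"}},
    {"id": "var_5572", "availability": {"available": False, "status": "out_of_stock"}},
    {"id": "var_5573"},
    {"id": "var_5574", "availability": {"available": True, "status": "backorder"}},
]
print(purchasable(variants))  # ['var_5571']
print(purchasable(variants, frozenset({"in_stock", "backorder"})))  # ['var_5571', 'var_5574']
```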

&lt;h3&gt;5. Declared axes that variants don't honor&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;product.options[]&lt;/code&gt; declares the selectable axes; &lt;code&gt;variants[]&lt;/code&gt; is the universe of actual purchasable combinations. When the cardinality of declared axes doesn't match what variants actually carry — e.g., &lt;code&gt;product.options&lt;/code&gt; declares Color × Size = 9 combinations but only 3 color-only variants exist — agents try to satisfy a Size selection that no variant honors. Strict consumers return null and refuse to add; lenient consumers guess and pick wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; keep &lt;code&gt;product.options[]&lt;/code&gt; and &lt;code&gt;variants[]&lt;/code&gt; in sync. Either every declared axis combination has a corresponding variant, or the axis shouldn't be in &lt;code&gt;product.options[]&lt;/code&gt;. If sizes aren't actually configurable for this product, drop &lt;code&gt;Size&lt;/code&gt; from the axes; don't leave it dangling.&lt;/p&gt;
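&lt;p&gt;One way to check the two stay in sync is to expand the declared axes into every combination and subtract what the variants actually carry. The sketch below assumes the &lt;code&gt;product.options[]&lt;/code&gt; and &lt;code&gt;variant.options[]&lt;/code&gt; shapes used in this post; &lt;code&gt;missing_combinations&lt;/code&gt; is a hypothetical name and the sample product is invented to mirror the Color × Size case above.&lt;/p&gt;

```python
from itertools import product as cartesian


def missing_combinations(item: dict) -> list[tuple[str, ...]]:
    """Return declared option combinations that no variant carries."""
    declared_axes = item.get("options", [])
    axes = [axis["name"] for axis in declared_axes]
    if not axes:
        return []
    declared = set(cartesian(*[[v["label"] for v in axis["values"]]
                               for axis in declared_axes]))
    carried = set()
    for variant in item.get("variants", []):
        labels = {o["name"]: o["label"] for o in variant.get("options", [])}
        # A variant missing an axis yields None in that slot and never matches.
        carried.add(tuple(labels.get(axis) for axis in axes))
    return sorted(declared - carried)


colors = ["Grey", "Black", "White"]
item = {
    "options": [
        {"name": "Color", "values": [{"label": c} for c in colors]},
        {"name": "Size", "values": [{"label": s} for s in ["Small", "Medium", "Large"]]},
    ],
    # Only colour-only variants exist, so no declared Size selection can be honoured.
    "variants": [{"id": f"var_{i}", "options": [{"name": "Color", "label": c}]}
                 for i, c in enumerate(colors)],
}
print(len(missing_combinations(item)))  # 9
```

&lt;p&gt;A non-empty result means either variants are missing or an axis is declared that the product does not really offer.&lt;/p&gt;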

&lt;h2&gt;What clean variant data looks like&lt;/h2&gt;

&lt;p&gt;Here's the shape that resolves cleanly across every frontier model we test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prod_42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight Crew Tee"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight cotton crew-neck tee."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price_range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal / Small / Regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight crew tee, charcoal, size small, regular fit."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"availability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_stock"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal / Medium / Regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight crew tee, charcoal, size medium, regular fit."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"availability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_stock"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four spec fields make this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;product.options&lt;/code&gt;&lt;/strong&gt; at the product level — declares the axes (&lt;code&gt;Color&lt;/code&gt;, &lt;code&gt;Size&lt;/code&gt;, &lt;code&gt;Fit&lt;/code&gt;) and their valid values as an array of &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/product_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;product_option&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{name, values: [{label}]}&lt;/code&gt;. Agents know upfront how many dimensions a variant occupies and what values are valid on each axis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;variant.options&lt;/code&gt;&lt;/strong&gt; as an array of &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{name, label}&lt;/code&gt; — each axis has its own entry, no string parsing, no conflation. The &lt;code&gt;name&lt;/code&gt; matches the product-level axis; the &lt;code&gt;label&lt;/code&gt; matches the user's request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;variant.availability&lt;/code&gt;&lt;/strong&gt; with &lt;code&gt;available&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; — agents skip unavailable variants without trial-and-error, and &lt;code&gt;status&lt;/code&gt; (&lt;code&gt;in_stock&lt;/code&gt;, &lt;code&gt;backorder&lt;/code&gt;, &lt;code&gt;preorder&lt;/code&gt;, &lt;code&gt;out_of_stock&lt;/code&gt;, &lt;code&gt;discontinued&lt;/code&gt;) gives them enough signal to wait, substitute, or surface the right message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Required scaffolding&lt;/strong&gt; — &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;price&lt;/code&gt; on every &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/variant.json" rel="noopener noreferrer"&gt;variant&lt;/a&gt;, and &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;price_range&lt;/code&gt;, &lt;code&gt;variants&lt;/code&gt; on the &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/product.json" rel="noopener noreferrer"&gt;product&lt;/a&gt;. These aren't "nice to have"; they're the schema's required fields. Variants missing any of them won't validate.&lt;/li&gt;
&lt;/ol&gt;
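
&lt;p&gt;To make the matching rule concrete, here is a minimal Python sketch of how a consumer might resolve a request against this shape. The function name &lt;code&gt;resolve_variant&lt;/code&gt; and the &lt;code&gt;wanted&lt;/code&gt; argument are hypothetical and not part of the UCP spec; the sketch assumes the product JSON has already been parsed into a dict and that an exact label match is the desired behaviour.&lt;/p&gt;

```python
from typing import Optional

# Trimmed copy of the product above; only the fields the matcher reads.
PRODUCT = {
    "options": [
        {"name": "Color", "values": [{"label": "Charcoal"}]},
        {"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]},
        {"name": "Fit", "values": [{"label": "Regular"}]},
    ],
    "variants": [
        {"id": "var_5571", "availability": {"available": True},
         "options": [{"name": "Color", "label": "Charcoal"},
                     {"name": "Size", "label": "Small"},
                     {"name": "Fit", "label": "Regular"}]},
        {"id": "var_5572", "availability": {"available": True},
         "options": [{"name": "Color", "label": "Charcoal"},
                     {"name": "Size", "label": "Medium"},
                     {"name": "Fit", "label": "Regular"}]},
    ],
}


def resolve_variant(product: dict, wanted: dict) -> Optional[dict]:
    """Return the single available variant whose options match `wanted`.

    `wanted` maps axis name to label, e.g. {"Size": "Medium"}.
    Returns None when zero or several variants match, rather than guessing.
    """
    # Reject requests that name an axis the product does not declare.
    declared = {opt["name"] for opt in product.get("options", [])}
    if not set(wanted).issubset(declared):
        return None

    matches = []
    for variant in product.get("variants", []):
        # Skip variants that are flagged unavailable (or carry no flag).
        if not variant.get("availability", {}).get("available", False):
            continue
        selected = {o["name"]: o["label"] for o in variant.get("options", [])}
        if all(selected.get(axis) == label for axis, label in wanted.items()):
            matches.append(variant)

    # Exactly one match is a resolution; anything else is ambiguous.
    return matches[0] if len(matches) == 1 else None
```

&lt;p&gt;Asking for &lt;code&gt;{"Size": "Medium"}&lt;/code&gt; resolves to one variant, while asking only for &lt;code&gt;{"Color": "Charcoal"}&lt;/code&gt; returns nothing because both variants match. Returning nothing on ambiguity is a deliberate choice in this sketch: it surfaces the data gap instead of hiding it.&lt;/p&gt;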

&lt;p&gt;&lt;strong&gt;Bonus stability:&lt;/strong&gt; when present, &lt;code&gt;option_value.id&lt;/code&gt; and &lt;code&gt;selected_option.id&lt;/code&gt; give stable identifiers that survive label drift. If your platform supports it (most do — Shopify uses GIDs, WooCommerce uses &lt;code&gt;pa_*&lt;/code&gt; taxonomy slugs), populate &lt;code&gt;id&lt;/code&gt; alongside &lt;code&gt;label&lt;/code&gt; and consumers can match on the stable key when labels change.&lt;/p&gt;

&lt;p&gt;Stores running variant data in this shape resolve user prompts to specific variants reliably across every model we've benchmarked. The pattern isn't novel — it's the same shape Shopify uses internally, the same shape WooCommerce variations use when properly structured, the same shape every traditional e-commerce platform ends up at after enough years of evolution. UCP just exposes it programmatically to the agent layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to validate your variant data
&lt;/h2&gt;

&lt;p&gt;Three layers, in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Static audit.&lt;/strong&gt; Run your store through &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker&lt;/a&gt;. The validator surfaces variants with missing &lt;code&gt;options&lt;/code&gt; data, conflated axes, inconsistent labels across sibling variants, and missing availability flags. None of this is part of strict UCP-spec conformance, but our &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; flags variant-quality issues as part of the Capability Coverage score because they materially affect whether agents can transact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live agent test.&lt;/strong&gt; Run a multi-model agent session against your store via &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;. The framework drives frontier agents, across &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;15+ models&lt;/a&gt;, through the full search → variant-pick → cart → checkout flow. If your variant data is ambiguous, you'll see different models pick different variants for the same prompt, which is the exact pattern we walk through in &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;the Playground 1,000-sessions analysis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Continuous monitoring.&lt;/strong&gt; Variant data changes over time as you add products and SKUs. Set up &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; so you get notified when a variant audit starts surfacing new issues — typically a sign that a recent merchandising change introduced inconsistent labelling at scale.&lt;/p&gt;

&lt;p&gt;The order matters. Static audit catches the easy cases (missing fields, schema-shaped problems) cheaply. Live agent test catches the cases where the schema is fine but agents disagree (the conflated-axis cases, the inconsistent-label cases). Monitoring catches drift over time. Skipping any of the three leaves a class of variant problems undetected.&lt;/p&gt;
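
&lt;p&gt;For a sense of what the static layer can catch, here is a rough Python sketch of a local pre-check. It is not UCPChecker's implementation; &lt;code&gt;audit_variants&lt;/code&gt; is a hypothetical helper that assumes a parsed product dict in the shape shown earlier and only checks required variant fields, the availability flag, and whether each variant's labels are declared on the product-level axes.&lt;/p&gt;

```python
REQUIRED_VARIANT_FIELDS = ("id", "title", "description", "price")

# A well-formed single-variant product used as a baseline.
SAMPLE = {
    "options": [{"name": "Size", "values": [{"label": "Small"}]}],
    "variants": [
        {"id": "var_5571", "title": "Small", "description": "Size small.",
         "price": {"amount": 4500, "currency": "USD"},
         "availability": {"available": True, "status": "in_stock"},
         "options": [{"name": "Size", "label": "Small"}]},
    ],
}


def audit_variants(product: dict) -> list:
    """Return human-readable issues found in a product's variant data."""
    issues = []
    # Map each declared axis to the set of labels valid on it.
    axes = {
        opt["name"]: {value["label"] for value in opt.get("values", [])}
        for opt in product.get("options", [])
    }
    for variant in product.get("variants", []):
        vid = variant.get("id", "(no id)")
        for field in REQUIRED_VARIANT_FIELDS:
            if field not in variant:
                issues.append(f"{vid}: missing required field '{field}'")
        if "availability" not in variant:
            issues.append(f"{vid}: missing availability")
        selected = {o.get("name"): o.get("label") for o in variant.get("options", [])}
        for axis, labels in axes.items():
            if axis not in selected:
                issues.append(f"{vid}: no value for axis '{axis}'")
            elif selected[axis] not in labels:
                issues.append(f"{vid}: label '{selected[axis]}' not declared for axis '{axis}'")
    return issues
```

&lt;p&gt;An empty list means the product passed these particular checks, nothing more. Conflated axes and cross-model disagreement still need the live agent test.&lt;/p&gt;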

&lt;h2&gt;
  
  
  What to fix in your store, and how to verify it
&lt;/h2&gt;

&lt;p&gt;If you're a merchant reading this and your store is running on &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, or &lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;, the variant data structure is mostly determined by your platform's defaults. The platform-specific fixes are documented in the platform guides — but the meta-pattern is the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit&lt;/strong&gt; at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt; — get a list of variant-data issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; the most common one first (usually missing &lt;code&gt;options&lt;/code&gt; metadata)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; at &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt; with two different models against the same product, asking for the same variant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; that both models pick the same variant ID consistently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; weekly — variant drift is the most common reason a store's UCP Score regresses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Variant data is a back-office data-quality problem dressed up as an agentic commerce problem. The fix is mostly editorial — get your axis labels consistent, expose your option values structurally, mark sold-out variants as such. None of this is technically hard. It's the kind of work that adds up to "agents can buy from your store" rather than "agents try to buy from your store and quietly fail."&lt;/p&gt;

&lt;p&gt;If you fix one thing on the agent-readiness side this quarter, fix variant data. The conversion lift is bigger than any other single change you can make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing worth naming:&lt;/strong&gt; consumer tools that silently paper over variant data problems (substring matching, positional guessing, falling back to &lt;code&gt;variants[0]&lt;/code&gt;) make this worse, not better. They hide the failure mode from merchants who would otherwise see it and fix the data. Faithful rendering — null when the match is ambiguous, errors when the data is inconsistent — is what produces correct merchant behaviour. If your variant data only works in some agents, that's a signal the data is the problem, not the agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit your variants now&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt; — flags variant issues alongside the rest of the UCP Score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test variant resolution with real agents&lt;/strong&gt;: &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt; — run two models against your store on the same prompt, see if they pick the same variant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the broader failure-mode taxonomy&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;Common UCP Errors and How to Fix Them&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track ecosystem-wide variant adoption&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce — April 2026&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>The UCP Technical Council Just Shipped Attribution into Core. Here's What That Means.</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Wed, 06 May 2026 07:43:57 +0000</pubDate>
      <link>https://dev.to/benjifisher/the-ucp-technical-council-just-shipped-attribution-into-core-heres-what-that-means-2cnh</link>
      <guid>https://dev.to/benjifisher/the-ucp-technical-council-just-shipped-attribution-into-core-heres-what-that-means-2cnh</guid>
      <description>&lt;p&gt;On &lt;strong&gt;May 5, 2026&lt;/strong&gt;, the UCP Technical Council merged &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/391" rel="noopener noreferrer"&gt;PR #391&lt;/a&gt; into the spec's &lt;code&gt;main&lt;/code&gt; branch — adding a top-level &lt;code&gt;attribution&lt;/code&gt; field to cart, checkout, catalog, and order operations. The field carries platform-emitted referral and conversion-event context: campaign IDs, click identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;), source/medium markers. Open string-keyed map. Universal across requests; not gated by capability negotiation.&lt;/p&gt;

&lt;p&gt;As UCP matures, attribution landing in core was always going to happen. Agentic commerce can't operate as commercial infrastructure without a path for advertising and measurement context to flow alongside the transactional data — and the longer that gap stayed open, the more pressure would have built for vendors to ship incompatible parallel solutions. The merge isn't the surprising part. &lt;strong&gt;The interesting part is the specific shape of what shipped, and what its presence in core tells us about where the spec is heading.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two things to dig into: the technical detail of the field itself, and the trajectory implication of advertising and measurement infrastructure landing in UCP core for the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipped
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;attribution&lt;/code&gt; field is structurally simple. From Grigorik's own example in the PR:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attribution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"18234567890"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_medium"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cpc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spring_2026"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gclid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EAIaIQobChMI..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No prescribed schema beyond "string-keyed object." Platforms populate it with whatever conventions they already use — GA4 campaign parameters, click identifiers, custom tracking keys. Businesses receive the data and process per their own analytics needs. UCP itself does &lt;strong&gt;not&lt;/strong&gt; prescribe attribution windows, models, or assignment logic. The protocol carries the data; attribution math happens downstream.&lt;/p&gt;

&lt;p&gt;The field plays two roles across the four operation types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;catalog&lt;/code&gt; (search, lookup)&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;order&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Business-emitted snapshot&lt;/td&gt;
&lt;td&gt;Merchant → platform&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The asymmetry matters. On catalog/cart/checkout, the platform writes attribution as it would write a UTM string into a browser URL — referral context flowing forward. On &lt;code&gt;order&lt;/code&gt;, the business preserves the originating attribution as a snapshot — closing the loop between agent-mediated conversion and the platform that produced it.&lt;/p&gt;

&lt;p&gt;Grigorik's framing in the PR is the cleanest one-line summary of intent: the field "carries the same parameters platforms communicate via URL query parameters in browser-based flows, in the same flat key-value form." Attribution in agent-mediated commerce is the agent counterpart of UTM strings. Same parameters, same model, different transport layer.&lt;/p&gt;
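
&lt;p&gt;As an illustration of that equivalence, the sketch below turns the query string of a browser landing URL into a flat attribution map. The mapping from &lt;code&gt;utm_*&lt;/code&gt; parameters to the &lt;code&gt;campaign_*&lt;/code&gt; keys in the PR example is an assumption made for this example; the spec leaves key naming to each platform, and &lt;code&gt;attribution_from_url&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical mapping from UTM query parameters to attribution keys.
UTM_TO_ATTRIBUTION = {
    "utm_id": "campaign_id",
    "utm_source": "campaign_source",
    "utm_medium": "campaign_medium",
    "utm_campaign": "campaign_name",
}
# Click identifiers are passed through under their own names.
CLICK_IDS = ("gclid", "fbclid", "ttclid")


def attribution_from_url(url: str) -> dict:
    """Build a flat, string-keyed attribution map from a landing URL."""
    attribution = {}
    # parse_qsl yields (key, value) pairs and drops blank values by default.
    for key, value in parse_qsl(urlsplit(url).query):
        if key in UTM_TO_ATTRIBUTION:
            attribution[UTM_TO_ATTRIBUTION[key]] = value
        elif key in CLICK_IDS:
            attribution[key] = value
    return attribution
```

&lt;p&gt;Unrelated query parameters are ignored, so the result stays a small flat object that can be attached to a catalog, cart, or checkout request as-is.&lt;/p&gt;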

&lt;p&gt;Thirteen files changed. The core addition is &lt;code&gt;source/schemas/shopping/types/attribution.json&lt;/code&gt; — the new type definition. Schemas for cart, catalog_lookup, catalog_search, checkout, and order all gain the field as an optional property. Specification docs across cart, catalog, checkout, order, and the overview were updated to describe the field's purpose and semantics.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural decision: core field, not extension
&lt;/h2&gt;

&lt;p&gt;The substantively interesting part of this PR is not what got added. It's how it got added.&lt;/p&gt;

&lt;p&gt;PR #391 was Grigorik's alternative proposal to &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/295" rel="noopener noreferrer"&gt;PR #295&lt;/a&gt;, which James Andersen had opened earlier proposing an &lt;code&gt;event_context&lt;/code&gt; extension. Both proposals tried to solve the same problem — give platforms a way to pass referral/attribution data through to merchants in agent flows — but with very different architectural shapes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#295 (Andersen, Meta):&lt;/strong&gt; Attribution as a &lt;strong&gt;structured extension&lt;/strong&gt;. Capability-negotiated. Validated against a defined schema. Standardised vocabulary across platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;#391 (Grigorik, Shopify):&lt;/strong&gt; Attribution as a &lt;strong&gt;top-level core field&lt;/strong&gt;. Open key-value map. No capability negotiation. Each platform uses its own conventions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Andersen formally approved Grigorik's alternative, writing &lt;em&gt;"thanks for finding a better home for attribution data than the original proposal"&lt;/em&gt;, and the rearchitected proposal went on to merge after TC discussion. That cross-vendor pattern (one TC member proposes; another offers a structurally different alternative; the original proposer endorses it) is the dynamic that produces robust standards rather than fragmented vendor extensions.&lt;/p&gt;

&lt;p&gt;The PR discussion pivots on which architectural shape this kind of data deserves. Amit Handa wrote the canonical comment on May 3 establishing the decision framework — worth quoting because it'll likely be cited as governance precedent in future spec discussions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Use a UCP Extension&lt;/th&gt;
&lt;th&gt;Use Optional Flat Key-Value Pairs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Impact on Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Changes state or execution of the operation&lt;/td&gt;
&lt;td&gt;Purely informational&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stable, standardized vocabulary&lt;/td&gt;
&lt;td&gt;Volatile, platform-specific, rapidly evolving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capability Negotiation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires mutual agreement + active parent capability&lt;/td&gt;
&lt;td&gt;Best-effort, consumed at-will, no gating&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strict — transaction integrity matters&lt;/td&gt;
&lt;td&gt;Flexible — validation happens downstream&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Platform Scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data normalization across diverse platforms&lt;/td&gt;
&lt;td&gt;Low friction; normalization burden on receiver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typical Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;discount&lt;/code&gt;, &lt;code&gt;fulfillment&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;attribution&lt;/code&gt;, referral tracking, session tags&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Attribution falls cleanly in the right-hand column of every row. Marketing identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;) are volatile and platform-specific: every adtech vendor invents its own, so any standardised list in the spec would be obsolete the moment a new platform launches. Attribution doesn't change protocol behaviour; it's read-only context that some downstream pipeline cares about, with no transactional consequence. There's nothing for a merchant to negotiate; either you record it or you don't.&lt;/p&gt;

&lt;p&gt;The merged PR locks this decision in. Future contributors proposing similar volatile, informational, platform-specific data structures now have a precedent: &lt;strong&gt;the spec prefers flat optional key-value pairs over structured extensions for non-state-changing context.&lt;/strong&gt; That's a piece of governance documentation as much as a feature merge, and Handa's table will be the reference for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trajectory implication
&lt;/h2&gt;

&lt;p&gt;UCP up to this point has been &lt;strong&gt;protocol mechanics&lt;/strong&gt;. How agents discover stores. How they shop. How they pay. How they identify users. How they handle returns. The mechanics are necessary, but they don't directly produce commercial value for the ecosystem participants. A merchant with a perfectly conformant UCP implementation but no attribution can't measure agent-driven conversions, can't optimise marketing spend, can't close the loop between platform investment and merchant outcomes.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;attribution&lt;/code&gt; closes that loop. With the field in core, the entire adtech infrastructure that powers current ecommerce extends naturally into agent-mediated commerce. Platforms attribute conversions to specific campaigns. Click identifiers persist across the agent flow. Businesses run their existing analytics pipelines on agent-driven traffic with no special handling. The bridge that makes UCP commercially usable for marketing teams — not just engineering teams — now exists in the core spec.&lt;/p&gt;

&lt;p&gt;The trajectory implication is the part worth sitting with: &lt;strong&gt;UCP is evolving from protocol mechanics into commercial infrastructure.&lt;/strong&gt; Each subsequent spec addition probably bridges another piece of existing commerce infrastructure into the agent layer. Loyalty programs. Customer data platforms. Marketing automation triggers. Inventory hooks. Each one makes UCP more complete as commercial infrastructure rather than just protocol mechanics.&lt;/p&gt;

&lt;p&gt;The architectural-precedent decision in #391 makes that trajectory more efficient. Future contributors proposing similar bridges (attribution-adjacent measurement primitives, marketing identifiers, session metadata) now have a clear template: flat key-value pairs into core, governance precedent already established. The spec doesn't need to relitigate the core-vs-extension decision every time a volatile, informational primitive comes up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it means in practice
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;merchants&lt;/strong&gt;: your UCP implementation should accept the &lt;code&gt;attribution&lt;/code&gt; field on incoming cart, checkout, and catalog requests, preserve it through to order records, and surface it through your analytics pipeline. The lift is small — it's a string-keyed JSON object on existing endpoints — but missing it means agent-driven conversions arrive at your analytics with no source attribution, which means your marketing team can't measure the channel.&lt;/p&gt;
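
&lt;p&gt;A minimal sketch of the merchant-side handling described above, assuming the field arrives as a flat object of string pairs on the request body. The function names, the &lt;code&gt;line_items&lt;/code&gt; shape, and the sample keys are hypothetical and not taken from the spec; only the &lt;code&gt;attribution&lt;/code&gt; field name comes from the text.&lt;/p&gt;

```python
import json


def extract_attribution(request_body):
    """Return the optional attribution object as a flat dict of strings."""
    raw = request_body.get("attribution")
    if not isinstance(raw, dict):
        return {}  # the field is optional; absence is not an error
    # Assumption: values are plain strings. Anything nested is dropped here.
    return {k: v for k, v in raw.items() if isinstance(k, str) and isinstance(v, str)}


def build_order_record(order_id, checkout_request):
    """Copy attribution from the checkout request onto the stored order."""
    return {
        "order_id": order_id,
        "line_items": checkout_request.get("line_items", []),
        "attribution": extract_attribution(checkout_request),
    }


# Hypothetical request body; the keys inside "attribution" are illustrative.
checkout_request = {
    "line_items": [{"variant_id": "v-123", "quantity": 1}],
    "attribution": {"gclid": "example-click-id", "campaign": "spring-sale"},
}
order = build_order_record("order-001", checkout_request)
print(json.dumps(order["attribution"], sort_keys=True))
```

&lt;p&gt;In this sketch a missing or malformed field degrades to an empty object instead of rejecting the request, which fits data that is informational and carries no transactional consequence.&lt;/p&gt;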

&lt;p&gt;For &lt;strong&gt;platform vendors&lt;/strong&gt; (&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, and others): rolling attribution support into the next platform-side compatibility release is now table-stakes work. The stores running on your stack will need to accept and preserve attribution by the time the next published spec version makes this part of conformance.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;agent platforms&lt;/strong&gt; (those of us building or testing agents that shop UCP stores): pass platform-emitted attribution forward into every cart/checkout/catalog request. The data is informational, not state-changing — your agent doesn't need to do anything with it beyond passing it through. The merchant decides what to do with it on the receive side.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;evaluators&lt;/strong&gt; (us): the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; will incorporate attribution-acceptance and attribution-preservation conformance in its next release. A store that accepts attribution on cart/checkout/catalog and threads it through to order records will score higher than one that drops it. The &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; page will reflect the rule update when the next score-version drops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timing: in core today, in the published spec next
&lt;/h2&gt;

&lt;p&gt;One distinction is worth making explicit. PR #391 merged into the spec's &lt;code&gt;main&lt;/code&gt; branch, not into a currently published spec version. The latest released spec is &lt;strong&gt;v2026-04-08&lt;/strong&gt;, which does not include &lt;code&gt;attribution&lt;/code&gt;. The field becomes part of conformance in whichever spec version is published next (there is no fixed cadence; a release is expected in the next few months). Until then, attribution sits in the working draft on &lt;code&gt;main&lt;/code&gt;: implementers can adopt it ahead of the release if they want, but it's not yet part of conformance for the published spec.&lt;/p&gt;

&lt;p&gt;That distinction shapes how we're rolling out support across our tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;&lt;/strong&gt; will adopt attribution support when the next spec version drops — agents will pass platform attribution through to merchants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;&lt;/strong&gt; will incorporate attribution-acceptance and attribution-preservation rules in the score release that aligns with the next published spec.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;The validator&lt;/a&gt;&lt;/strong&gt; will support the new field as soon as the next spec ships, and the &lt;a href="https://ucpchecker.com/bulk-check" rel="noopener noreferrer"&gt;bulk checker&lt;/a&gt; will surface attribution conformance per-merchant after that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architectural certainty is already here — the schema is locked, the field is documented, the design pattern is settled. The spec drop is the &lt;strong&gt;conformance trigger&lt;/strong&gt;, not the design moment. Implementers who start work today against the working draft are operating against a known target.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to read more
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The PR itself: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/391" rel="noopener noreferrer"&gt;#391 on Universal-Commerce-Protocol/ucp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The merge commit: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/commit/76a35394051222bcef8169c9c5c4c03072542a98" rel="noopener noreferrer"&gt;&lt;code&gt;76a3539&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The new schema type: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/blob/main/source/schemas/shopping/types/attribution.json" rel="noopener noreferrer"&gt;&lt;code&gt;source/schemas/shopping/types/attribution.json&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Updated authoring guidance: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/blob/main/docs/documentation/schema-authoring.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/documentation/schema-authoring.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest on the open web, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt; and the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, publish the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; and &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and track major spec events like this one as they ship.&lt;/p&gt;

&lt;p&gt;If you're building on UCP and want to know whether your store is ready for the next spec version: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;run a check&lt;/a&gt;. If you're tracking the spec's evolution professionally: subscribe to our &lt;a href="https://ucpchecker.com/stats/sample-report" rel="noopener noreferrer"&gt;weekly digest&lt;/a&gt; — we cover spec changes like this one within a week of merge.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>ai</category>
      <category>ucp</category>
    </item>
    <item>
      <title>UCP Playground at 1,000+ Agent Sessions: What 16 Models and 97 Real Stores Reveal About AI Shopping</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Tue, 05 May 2026 09:11:37 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-playground-at-1000-agent-sessions-what-16-models-and-97-real-stores-reveal-about-ai-shopping-155p</link>
      <guid>https://dev.to/benjifisher/ucp-playground-at-1000-agent-sessions-what-16-models-and-97-real-stores-reveal-about-ai-shopping-155p</guid>
      <description>&lt;p&gt;Two and a half months ago we &lt;a href="https://ucpchecker.com/blog/why-we-built-ucp-playground" rel="noopener noreferrer"&gt;published Why We Built UCP Playground&lt;/a&gt;, which closed on 114 agent sessions and an honest acknowledgement that the dataset was thin — most models had single-digit sample sizes, store coverage was uneven, and the headline rates moved meaningfully with every new run. A month later we crossed a different threshold: the &lt;a href="https://ucpchecker.com/blog/first-autonomous-ai-agent-purchase-ucp" rel="noopener noreferrer"&gt;first fully autonomous AI agent purchase through UCP&lt;/a&gt; — a Gemini agent searching, adding to cart, linking identity, paying, and completing checkout at &lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt; without a human past the initial prompt.&lt;/p&gt;

&lt;p&gt;Eighty days on from the first post, and roughly forty days after that autonomous purchase, the dataset is in a different shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Over 1,000 agent shopping sessions&lt;/strong&gt; captured end-to-end with full tool-call timelines and replayable event streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16 frontier models&lt;/strong&gt; — every major lab, plus a reasoning-tuned subset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;97 distinct UCP-enabled stores&lt;/strong&gt; across Shopify, WooCommerce, BigCommerce, Magento, PrestaShop, and custom stacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$96,032 of agent-driven cart value&lt;/strong&gt; generated, primarily in USD with a long tail across EUR, GBP, INR, ILS, PKR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80 days of run history&lt;/strong&gt; since Feb 14, 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the reference dataset for this post. Eight findings emerge from it. Most of them survive being scrutinised at the new sample size; one or two reverse the early-data narrative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 1 — Claude Sonnet 4.5 leads on aggregate checkout rate
&lt;/h2&gt;

&lt;p&gt;With sample sizes now large enough to take seriously, the per-model checkout-rate leaderboard looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Share of dataset&lt;/th&gt;
&lt;th&gt;Checkout rate&lt;/th&gt;
&lt;th&gt;Avg tokens&lt;/th&gt;
&lt;th&gt;Avg duration&lt;/th&gt;
&lt;th&gt;Fail rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;20.7%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;71,195&lt;/td&gt;
&lt;td&gt;38.1s&lt;/td&gt;
&lt;td&gt;17.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;td&gt;49.3%&lt;/td&gt;
&lt;td&gt;57,676&lt;/td&gt;
&lt;td&gt;47.7s&lt;/td&gt;
&lt;td&gt;14.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5.1%&lt;/td&gt;
&lt;td&gt;45.0%&lt;/td&gt;
&lt;td&gt;32,502&lt;/td&gt;
&lt;td&gt;46.0s&lt;/td&gt;
&lt;td&gt;21.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;12.5%&lt;/td&gt;
&lt;td&gt;44.6%&lt;/td&gt;
&lt;td&gt;46,520&lt;/td&gt;
&lt;td&gt;21.8s&lt;/td&gt;
&lt;td&gt;15.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4.5%&lt;/td&gt;
&lt;td&gt;39.6%&lt;/td&gt;
&lt;td&gt;34,297&lt;/td&gt;
&lt;td&gt;77.1s&lt;/td&gt;
&lt;td&gt;9.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;td&gt;38.8%&lt;/td&gt;
&lt;td&gt;44,611&lt;/td&gt;
&lt;td&gt;29.7s&lt;/td&gt;
&lt;td&gt;25.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;9.9%&lt;/td&gt;
&lt;td&gt;36.8%&lt;/td&gt;
&lt;td&gt;32,394&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5.2%&lt;/td&gt;
&lt;td&gt;29.5%&lt;/td&gt;
&lt;td&gt;32,811&lt;/td&gt;
&lt;td&gt;14.7s&lt;/td&gt;
&lt;td&gt;24.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;7.9%&lt;/td&gt;
&lt;td&gt;29.0%&lt;/td&gt;
&lt;td&gt;30,971&lt;/td&gt;
&lt;td&gt;48.7s&lt;/td&gt;
&lt;td&gt;28.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;td&gt;27.6%&lt;/td&gt;
&lt;td&gt;31,566&lt;/td&gt;
&lt;td&gt;34.4s&lt;/td&gt;
&lt;td&gt;22.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4.7%&lt;/td&gt;
&lt;td&gt;23.6%&lt;/td&gt;
&lt;td&gt;30,585&lt;/td&gt;
&lt;td&gt;37.4s&lt;/td&gt;
&lt;td&gt;27.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.4%&lt;/td&gt;
&lt;td&gt;17.6%&lt;/td&gt;
&lt;td&gt;35,360&lt;/td&gt;
&lt;td&gt;61.4s&lt;/td&gt;
&lt;td&gt;29.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.4%&lt;/td&gt;
&lt;td&gt;12.5%&lt;/td&gt;
&lt;td&gt;64,055&lt;/td&gt;
&lt;td&gt;38.1s&lt;/td&gt;
&lt;td&gt;37.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.7%&lt;/td&gt;
&lt;td&gt;10.0%&lt;/td&gt;
&lt;td&gt;58,386&lt;/td&gt;
&lt;td&gt;55.6s&lt;/td&gt;
&lt;td&gt;35.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;25,525&lt;/td&gt;
&lt;td&gt;63.9s&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Sonnet 4.5 leads on aggregate checkout rate at 50.8% on the largest single share of the dataset — a sample large enough that the rank ordering is no longer noise. Llama 3.3 70B sits a fraction below at 49.3% on a smaller but still meaningful share. The two are statistically tied; both are operating in a different regime than the rest of the field.&lt;/p&gt;
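
&lt;p&gt;The "statistically tied" claim can be sanity-checked with a pooled two-proportion z-test. This is a rough sketch, not the analysis behind the table: it assumes exactly 1,000 total sessions (the dataset is only described as "over 1,000"), so the per-model session counts derived from the share column are approximations.&lt;/p&gt;

```python
import math

TOTAL_SESSIONS = 1000  # assumption: the dataset is described only as "over 1,000"


def approx_sessions(share_pct):
    """Estimate a model's session count from its share of the dataset."""
    return round(TOTAL_SESSIONS * share_pct / 100)


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for the difference p1 - p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


n_sonnet = approx_sessions(20.7)  # Claude Sonnet 4.5 share from the table
n_llama = approx_sessions(6.4)    # Llama 3.3 70B share from the table
z = two_proportion_z(0.508, n_sonnet, 0.493, n_llama)
print(f"n={n_sonnet} vs n={n_llama}, z={z:.2f}")
```

&lt;p&gt;Under these assumptions the z statistic lands far below the conventional 1.96 threshold, which is what a tie between the top two rows looks like numerically.&lt;/p&gt;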

&lt;p&gt;The most interesting result on this table is &lt;strong&gt;GPT-5.2&lt;/strong&gt;, which at 23.6% lands in the bottom third despite being one of the most capable frontier models on essentially every public benchmark. The gap between its performance on standard reasoning benchmarks and its performance on transactional shopping flows is the single largest delta in the leaderboard. We dig into why in the development notes below.&lt;/p&gt;

&lt;p&gt;One caveat worth flagging up-front: GPT-5.2's 23.6% figure reflects performance across the full 80-day window, including the period before our cursor-stripping fix landed mid-dataset. Sessions after that fix show GPT-5.2 performing meaningfully more competitively. We'll publish the longitudinal split in the August update — the aggregate number above is the worst-case read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 2 — Reasoning-tuned models continue to underperform
&lt;/h2&gt;

&lt;p&gt;The cohort of reasoning-tuned models (DeepSeek R1, o4-mini, Grok 3 Mini, QwQ 32B) sits unambiguously at the bottom of the leaderboard: together they occupy the bottom four places. QwQ 32B has yet to record a single completed checkout across its share of the dataset.&lt;/p&gt;

&lt;p&gt;The pattern was visible in the &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;original four-session sample report&lt;/a&gt; shipped with the eval-framework launch in April; it has only sharpened as the dataset grew two orders of magnitude. The pattern is consistent across labs and across architectures (chain-of-thought variants, exploratory reasoning, distilled-from-frontier models — all underperform on shopping flows compared to their non-reasoning counterparts from the same lab).&lt;/p&gt;

&lt;p&gt;The working hypothesis remains: shopping requires fast tool-use rhythm, not deliberation. The decisions in a shopping sequence — search this term, add this item, proceed to checkout — are individually shallow but happen in series. A reasoning model that pauses to deliberate at each step burns clock time and tokens on decisions that don't reward deliberation. Combined with reasoning models' tendency to over-question their own outputs, the result is sessions that hit &lt;code&gt;max_turns_exceeded&lt;/code&gt; before completing.&lt;/p&gt;

&lt;p&gt;Worth noting what isn't in this hypothesis: reasoning models are not bad at commerce in general. They may be excellent at higher-stakes flows — disputed transactions, multi-step contractual reasoning, regulatory edge cases — that the current eval workload doesn't probe. The benchmark says: when the workload is "shop normally," fast non-reasoning models win. Other workloads will tell different stories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 3 — Speed and accuracy aren't correlated
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt; finishes the average shopping session in &lt;strong&gt;11.8 seconds&lt;/strong&gt;, the fastest in the field; GPT-4o (14.7s) is the only other model under 15s. Its checkout rate is 36.8%, which is middling. &lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt; takes 38.1s on average and lands a 50.8% checkout rate, the highest on the leaderboard, at more than triple Flash's clock time.&lt;/p&gt;

&lt;p&gt;That splits deployments into two real surfaces. &lt;strong&gt;Latency-bound use cases&lt;/strong&gt; (voice agents, mobile commerce, conversational checkout where the user is waiting in real time) effectively must use Gemini 2.5 Flash or Gemini 3 Flash, and pay for the latency win with lower closed-checkout rates. &lt;strong&gt;Throughput-bound use cases&lt;/strong&gt; (batch agents, scheduled buying, autonomous shopping where wall-clock time is mostly hidden) should use Claude Sonnet 4.5 or Llama 3.3 70B and accept the latency cost for the conversion lift.&lt;/p&gt;

&lt;p&gt;The naive intuition merchants reach for — "the better model is faster and more accurate" — doesn't survive contact with this data. The two axes are essentially independent within this corpus. That's a finding nobody can extract from a single-model demo or a vendor benchmark.&lt;/p&gt;
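
&lt;p&gt;One way to probe the independence claim is a Pearson correlation over the per-model rows of the Finding 1 table. This is a sketch under a strong simplification: it treats each model as one unweighted data point and ignores the very different sample sizes behind each row.&lt;/p&gt;

```python
import math

# (avg duration in seconds, checkout rate in %) per model, copied from the
# Finding 1 table in the same row order.
ROWS = [
    (38.1, 50.8), (47.7, 49.3), (46.0, 45.0), (21.8, 44.6), (77.1, 39.6),
    (29.7, 38.8), (11.8, 36.8), (14.7, 29.5), (48.7, 29.0), (34.4, 27.6),
    (37.4, 23.6), (61.4, 17.6), (38.1, 12.5), (55.6, 10.0), (63.9, 0.0),
]


def pearson_r(pairs):
    """Unweighted Pearson correlation coefficient for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(ROWS)
print(f"Pearson r over {len(ROWS)} models: {r:.2f}")
```

&lt;p&gt;A coefficient near zero would support treating the two axes as independent; a value closer to -1 or +1 would mean session duration carries real information about checkout rate. With only 15 rows, a weak coefficient in either direction should be read cautiously.&lt;/p&gt;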

&lt;h2&gt;
  
  
  Finding 4 — The failure mode taxonomy is dominated by tool errors, not model refusals
&lt;/h2&gt;

&lt;p&gt;Across the 256 failed sessions in the dataset, the categorised error taxonomy is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error type&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;% of categorised failures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openrouter_error&lt;/code&gt; (provider-side)&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;56%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model_refused&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_turns_exceeded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single-largest categorised failure mode is &lt;strong&gt;provider-side errors&lt;/strong&gt; — the routing layer between the agent and the model returning a non-200 before the session can complete. This is a cost of operating at scale across 16 models and reflects the still-maturing infrastructure underneath frontier-model API access, not anything specific to UCP.&lt;/p&gt;

&lt;p&gt;The second-largest, &lt;strong&gt;model refusals&lt;/strong&gt;, is more interesting. Twenty-two refusals across the dataset is a refusal rate of roughly 2%. We see refusals concentrated in two situations: (1) sessions against demo stores with unusual product names that pattern-match a model's safety filters, and (2) sessions where the user prompt contains adversarial content seeded by us as part of a prompt-injection eval. We've recorded &lt;strong&gt;6/6 prompt-injection resistance&lt;/strong&gt; across the dedicated injection-eval runs to date, so the &lt;code&gt;model_refused&lt;/code&gt; category is partly capturing models doing exactly what they should.&lt;/p&gt;

&lt;p&gt;The third, &lt;strong&gt;max_turns_exceeded&lt;/strong&gt;, is concentrated in the reasoning-model cohort and is the empirical signal for the over-deliberation pattern in Finding 2.&lt;/p&gt;

&lt;p&gt;The remaining 165 failures don't carry a categorised &lt;code&gt;error_type&lt;/code&gt;; typically these are sessions where the model abandoned the flow without raising an explicit error. That's a tagging gap in the framework that we're closing in the next iteration.&lt;/p&gt;
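
&lt;p&gt;The percentages in the table use the categorised failures as the denominator, not all 256 failed sessions. The arithmetic can be reproduced directly from the figures above:&lt;/p&gt;

```python
TOTAL_FAILED = 256  # failed sessions in the dataset, from the text above

# Session counts per categorised error type, from the table above.
CATEGORISED = {
    "openrouter_error": 51,
    "model_refused": 22,
    "max_turns_exceeded": 18,
}

categorised_total = sum(CATEGORISED.values())
uncategorised = TOTAL_FAILED - categorised_total

for error_type, count in CATEGORISED.items():
    share = 100 * count / categorised_total
    print(f"{error_type}: {count} sessions, {share:.0f}% of categorised failures")
print(f"uncategorised: {uncategorised} of {TOTAL_FAILED} failed sessions")
```

&lt;p&gt;The three categories sum to 91 sessions, which leaves the 165 uncategorised failures as the larger group.&lt;/p&gt;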

&lt;h2&gt;
  
  
  Finding 5 — Store implementation explains most of the cross-store variance
&lt;/h2&gt;

&lt;p&gt;The benchmark's most strategically important finding doesn't come from the per-model column. It comes from the per-store one.&lt;/p&gt;

&lt;p&gt;Across the 97 stores in the dataset, the same model produces dramatically different outcomes. Between the most agent-friendly and least agent-friendly implementations at meaningful sample sizes, the checkout-rate spread exceeds &lt;strong&gt;60 percentage points&lt;/strong&gt;, wider than any model-versus-model gap on the leaderboard. &lt;strong&gt;No model in the field, at any sample size, produces a 60-point spread purely on its own merits.&lt;/strong&gt; Almost all of that variance is store-side, and a run history of more than 1,000 sessions makes the pattern hard to attribute to anything else.&lt;/p&gt;

&lt;p&gt;The cleanest predictor we've found is whether the store's MCP implementation is &lt;strong&gt;stateless&lt;/strong&gt; or &lt;strong&gt;stateful&lt;/strong&gt;, and how it handles the boundary between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless implementations&lt;/strong&gt; treat every tool call as self-contained. Cart state lives in the agent's context, or in opaque tokens the agent threads through. Identity is established once and re-asserted on each call. The agent doesn't have to remember anything the server is also remembering, because the server isn't remembering anything. Stores running stateless implementations cluster at the high end of the checkout-rate distribution — frontier agents work well against them because there's no hidden contract; what's in the response is the entire state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateful implementations&lt;/strong&gt; persist server-side session, cart, and auth across calls, exposed to the agent through session IDs, cookies, or scoped tokens. When this works, it works well. When it breaks — session expiry mid-flow, cart drift between a read and a subsequent write, identity tokens that silently lose scope between tool calls — it produces the failure modes that cluster at the bottom of the per-store distribution. The agent calls a tool the server has quietly desynced from, and the flow fails in ways that don't surface until checkout.&lt;/p&gt;

&lt;p&gt;The hybrid case is the most error-prone: stores that are stateless in some tools and stateful in others, without making the boundary explicit in the manifest or the tool response shapes. Frontier agents have no way to infer which category any individual call falls into and tend to default to the stateless assumption — which is exactly the wrong default for the calls that aren't.&lt;/p&gt;
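
&lt;p&gt;To make the stateless pattern concrete, here is a minimal sketch of a cart tool that keeps no server-side session: the whole cart travels in an opaque token that the agent threads from one call to the next. The function names, the token encoding, and the response shape are illustrative assumptions, not anything defined by UCP or MCP.&lt;/p&gt;

```python
import base64
import json


def encode_state(cart):
    """Pack the whole cart into an opaque token the agent threads through."""
    payload = json.dumps(cart, sort_keys=True).encode("utf-8")
    # A real server would sign or encrypt this token; omitted in this sketch.
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_state(token):
    """Rebuild the cart from the token; no token means an empty cart."""
    if not token:
        return {"items": {}}
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def add_to_cart(token, variant_id, quantity):
    """Stateless tool call: token in, full cart and a fresh token out."""
    cart = decode_state(token)
    items = cart["items"]
    items[variant_id] = items.get(variant_id, 0) + quantity
    return {"cart": cart, "cart_token": encode_state(cart)}


first = add_to_cart(None, "v-123", 1)
second = add_to_cart(first["cart_token"], "v-456", 2)
print(second["cart"])
```

&lt;p&gt;Every response carries the complete cart, so there is nothing for the agent and the server to drift apart on. A hybrid store would mix calls like this with calls that depend on hidden session state, which is the boundary the paragraph above argues should be made explicit.&lt;/p&gt;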

&lt;p&gt;Beyond the state axis, the rigorous testing surfaces a consistent set of secondary trip-wires: variant IDs without human-readable axis labels, description strings exceeding 8K tokens for a single product, tool responses including nested HTML in fields agents expect to be plain text, cart endpoints returning success codes for failed mutations. None of these break &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; validation. All of them break agent flows.&lt;/p&gt;
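
&lt;p&gt;The last trip-wire, a success code on a failed mutation, is the kind of problem a runtime check can catch and a schema check cannot. A hedged sketch of such a check follows; the &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;cart&lt;/code&gt;, and &lt;code&gt;line_items&lt;/code&gt; field names are hypothetical stand-ins for whatever a given store actually returns.&lt;/p&gt;

```python
def mutation_applied(requested, response):
    """Return True only if the returned cart actually reflects the request."""
    if response.get("status") != "success":
        return False
    lines = response.get("cart", {}).get("line_items", [])
    return any(
        line.get("variant_id") == requested["variant_id"]
        and line.get("quantity") == requested["quantity"]
        for line in lines
    )


# Hypothetical payloads: both claim success, only one changed the cart.
requested = {"variant_id": "v-123", "quantity": 2}
honest = {
    "status": "success",
    "cart": {"line_items": [{"variant_id": "v-123", "quantity": 2}]},
}
silent_failure = {"status": "success", "cart": {"line_items": []}}

print(mutation_applied(requested, honest))          # True
print(mutation_applied(requested, silent_failure))  # False
```

&lt;p&gt;The check compares what was asked for with what came back instead of trusting the status field, which is the difference between validating a response shape and validating a behaviour.&lt;/p&gt;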

&lt;p&gt;These are merchant-side fixes, not model-side ones. The strategic implication for any team operating a UCP-enabled store: &lt;strong&gt;fixing your manifest and tool responses produces more conversion lift than choosing the right model.&lt;/strong&gt; That's load-bearing — it's why the integrated &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals#how-evals-fit-the-broader-development-cycle" rel="noopener noreferrer"&gt;Score → Check → Eval workflow&lt;/a&gt; exists, and it's where we'd point a team starting from zero on UCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 6 — Cart value generated is concentrated in USD and high-AOV verticals
&lt;/h2&gt;

&lt;p&gt;Of the 1,000+ sessions, 97 produced a non-zero cart value. The breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Currency&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Total cart value&lt;/th&gt;
&lt;th&gt;Avg cart value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;USD&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;$95,647.23&lt;/td&gt;
&lt;td&gt;$1,125.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INR&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;₹3,845.00&lt;/td&gt;
&lt;td&gt;₹1,922.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PKR&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;₨4,490.00&lt;/td&gt;
&lt;td&gt;₨2,245.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EUR&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;€296.74&lt;/td&gt;
&lt;td&gt;€59.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ILS&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;₪189.60&lt;/td&gt;
&lt;td&gt;₪189.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GBP&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;£47.99&lt;/td&gt;
&lt;td&gt;£24.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;USD cart value totals &lt;strong&gt;$95,647 across 85 sessions&lt;/strong&gt; with an average cart value of $1,125. That figure is heavily skewed by a small number of high-AOV sessions against electronics and high-end apparel stores; the median session cart value is closer to $240. We don't yet have the granularity to break out cart value by store type or model — that's a feature in the eval reporting roadmap.&lt;/p&gt;

&lt;p&gt;The cross-currency long tail (EUR/GBP/INR/PKR/ILS) is small but informative. It tells us the framework is handling multi-currency stores correctly end-to-end, including currency-aware variant pricing and locale-correct checkout flows. Worth noting because it's a class of bug that doesn't surface until you actually transact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 7 — Session volume is now meaningful enough to reveal trajectory
&lt;/h2&gt;

&lt;p&gt;Plotted week-over-week, session volume has three distinct phases over the 80-day window:&lt;/p&gt;

&lt;p&gt;[Figure: UCP Playground weekly session volume, mid-February through late April 2026. Trend line showing three phases: a small founding wave in mid-February, a steady-state oscillation through March and mid-April, and a sharp acceleration in late April that produces the largest single week of the dataset. Horizontal axis: Feb 14 to Apr 27.]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Founding wave (mid-February).&lt;/strong&gt; A small launch surge coinciding with the &lt;a href="https://ucpchecker.com/blog/why-we-built-ucp-playground" rel="noopener noreferrer"&gt;Why We Built UCP Playground&lt;/a&gt; post — first publishers running first sessions, signal that the framework worked end-to-end against real stores.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steady state (March through mid-April).&lt;/strong&gt; Weekly volume oscillating in a tight band as more frontier models came online and the eval framework matured. Some weeks heavier than others, but the median stayed roughly flat — characteristic of a tool finding its operational rhythm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acceleration (late April).&lt;/strong&gt; The largest single week of the dataset, driven mostly by a batch of &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;eval-collection runs&lt;/a&gt; against stores onboarded after the council expansion announcement. The line bends upward at the end of the window.&lt;/p&gt;

&lt;p&gt;The trajectory matters mostly because it lets us start tracking model drift. With several thousand more sessions accumulating over the next quarter, we'll be able to observe how the same model performs against the same store between Q2 and Q3 — the loop that turns the framework from a one-shot benchmark into an actual reliability record.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 8 — The 0.2% flawless-end-to-end rate has improved, slightly
&lt;/h2&gt;

&lt;p&gt;The April &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce report&lt;/a&gt; flagged that of 4,014 verified UCP stores, only 9 delivered a flawless end-to-end agent shopping experience. That's the 0.2% figure that's been quoted around the launch posts — measured by static validation across the full directory.&lt;/p&gt;

&lt;p&gt;Eighty days later, with 97 stores tested directly through the eval framework, roughly &lt;strong&gt;0.5–0.7%&lt;/strong&gt; reach the same bar. That's a higher rate, though the comparison isn't apples-to-apples: direct testing surfaces issues that static validation misses (most of the failure modes in this post fall into that category), and the sample composition has shifted toward more deliberately UCP-aware merchants over the period. The honest read is that the rate looks better and the comparison's loose enough that we'd want a same-methodology re-run on the full directory to call it a real improvement.&lt;/p&gt;

&lt;p&gt;What we can say cleanly: for every store running a clean, agent-friendly UCP implementation, there are still 100+ that pass conformance but stumble somewhere in the agent flow. The gap continues to be on the merchant side. We haven't yet seen a model-side improvement large enough to close meaningful ground on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Playground stays neutral
&lt;/h2&gt;

&lt;p&gt;Every finding above hinges on one design choice: the system prompt and the orchestration loop are &lt;strong&gt;generic&lt;/strong&gt;. Same for every model. Same for every store. No store-specific scaffolding, no model-specific workarounds. That's what makes the framework work as a testing environment.&lt;/p&gt;

&lt;p&gt;The temptation to add a workaround when a particular model trips on a particular store is real — there's almost always a one-line patch that would push that store's checkout rate up by ten points against that one model. We don't ship those patches, on principle. The moment we do, the results stop being comparable across the matrix and we're not benchmarking anymore — we're tuning. Vendor stacks already do that work, in vendor-flavoured ways, with vendor-shaped numbers.&lt;/p&gt;

&lt;p&gt;Independence here means a specific thing: &lt;strong&gt;the orchestration is neutral, the protocol layer is full-featured.&lt;/strong&gt; Stores get the tools they declare. Identity linking works. Payment handlers pass through. Multi-turn context flows the way the &lt;a href="https://ucpchecker.com/specs" rel="noopener noreferrer"&gt;spec&lt;/a&gt; defines. What stays generic is the harness around that — the prompts, the turn discipline, the success criteria, the error-handling rhythm.&lt;/p&gt;

&lt;p&gt;The reason that design choice matters can be put in two sentences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a model doesn't follow the checkout flow, that's signal about the model.&lt;/li&gt;
&lt;li&gt;If a store returns the wrong status, that's signal about the store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both signals are useful. Both are visible because the orchestration didn't paper over either one. Hiding either defeats the purpose of running the test.&lt;/p&gt;

&lt;p&gt;It's expected, and good, that companies build their own internal infrastructure to evaluate agent behaviour against their own stores. Every serious commerce platform will eventually have something like that running in CI against its own merchants, and the &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals#how-evals-fit-the-broader-development-cycle" rel="noopener noreferrer"&gt;Score → Check → Eval workflow&lt;/a&gt; is exactly the surface they should plug into. But the comparison layer (the one that asks how Anthropic's frontier model performs on the same workload, against the same stores, as the models from Google, OpenAI, xAI, DeepSeek, and Meta) has to sit outside all of those organisations. &lt;strong&gt;Vendors can't credibly benchmark themselves; the platform layer has the same problem one level down.&lt;/strong&gt; Independence is the only way the comparisons aggregate into a record anyone can quote.&lt;/p&gt;

&lt;p&gt;That's the niche this layer occupies. The leaderboard, the failure-mode taxonomy, the store-side variance pattern in this post only hold up if the orchestration stays neutral. The moment it doesn't, the framework loses the property that made any of it worth publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we learned building this
&lt;/h2&gt;

&lt;p&gt;The framework didn't ship in May the same shape it shipped in February. Eighty days of running it against real stores produced a steady stream of bugs and surprises that drove the development work — many of them documented in the &lt;a href="https://ucpplayground.com/changelog" rel="noopener noreferrer"&gt;public changelog&lt;/a&gt;. Five worth surfacing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor stripping unlocked GPT-5.2 search.&lt;/strong&gt; Through February we had &lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt; at a 0% search success rate on Shopify stores. The cause was a model-side tic: GPT-5.2 always included the optional &lt;code&gt;after&lt;/code&gt; cursor parameter on &lt;code&gt;search_shop_catalog&lt;/code&gt; calls, filling it with placeholders like &lt;code&gt;""&lt;/code&gt;, &lt;code&gt;"null"&lt;/code&gt;, or &lt;code&gt;"__NONE__"&lt;/code&gt; — values Shopify always rejects. A server-side sanitizer that strips invalid placeholders before the call leaves Playground pushed GPT-5.2's search success from 0% to 100% overnight. The model wasn't bad at search; it had a tool-calling habit nobody had isolated yet.&lt;/p&gt;
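
&lt;p&gt;A minimal sketch of that kind of sanitizer, not Playground's actual code. The three placeholder values are the ones named above; the function name &lt;code&gt;strip_placeholder_cursor&lt;/code&gt; and the &lt;code&gt;query&lt;/code&gt; argument in the usage note are hypothetical.&lt;/p&gt;

```python
from typing import Any

# Placeholder values named in the post. Treating exactly this set as invalid
# is an assumption of the sketch, not a documented Playground rule.
PLACEHOLDER_CURSORS = {"", "null", "__NONE__"}


def strip_placeholder_cursor(arguments: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the tool-call arguments without a placeholder `after` cursor."""
    cleaned = dict(arguments)  # copy, so the caller's dict is never mutated
    cursor = cleaned.get("after")
    # Drop the optional cursor when it carries no real value.
    if cursor is None or (isinstance(cursor, str) and cursor.strip() in PLACEHOLDER_CURSORS):
        cleaned.pop("after", None)
    return cleaned
```

&lt;p&gt;Calling it with &lt;code&gt;{"query": "shoes", "after": "__NONE__"}&lt;/code&gt; drops the cursor and leaves the rest of the call untouched; a real cursor value passes through unchanged.&lt;/p&gt;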

&lt;p&gt;&lt;strong&gt;Failed tool calls used to inflate conversion metrics.&lt;/strong&gt; An earlier version of step detection counted a failed &lt;code&gt;update_cart&lt;/code&gt; as a &lt;code&gt;cart_created&lt;/code&gt; completion. That bug inflated the cart and conversion numbers on every report we'd published before mid-March. Fixed in 0.9.3 by gating step detection on the tool response's &lt;code&gt;isError&lt;/code&gt; flag, plus the same gate on cart-data extraction. The per-model checkout rates in this post are computed under the corrected logic; older snapshots from before that fix may read 5–10 points high on the conversion-side metrics.&lt;/p&gt;
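
&lt;p&gt;The gate itself is small. The sketch below shows the idea under stated assumptions: &lt;code&gt;detect_step&lt;/code&gt; and &lt;code&gt;STEP_FOR_TOOL&lt;/code&gt; are hypothetical names, and only the &lt;code&gt;update_cart&lt;/code&gt; to &lt;code&gt;cart_created&lt;/code&gt; pairing and the &lt;code&gt;isError&lt;/code&gt; flag come from the description above.&lt;/p&gt;

```python
from typing import Any, Optional

# Hypothetical mapping from tool name to the funnel step it completes.
# Only update_cart -> cart_created is taken from the post.
STEP_FOR_TOOL = {"update_cart": "cart_created"}


def detect_step(tool_name: str, response: dict[str, Any]) -> Optional[str]:
    """Return the completed funnel step, or None when the tool call failed."""
    # A response flagged isError never counts as a completed step.
    if response.get("isError", False):
        return None
    return STEP_FOR_TOOL.get(tool_name)
```

&lt;p&gt;With the gate in place, a failed &lt;code&gt;update_cart&lt;/code&gt; yields no step at all, so it can no longer be counted toward cart or conversion numbers.&lt;/p&gt;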

&lt;p&gt;&lt;strong&gt;REST-only stores forced a transport rework.&lt;/strong&gt; The &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 spec drop&lt;/a&gt; in early April brought new tool names (&lt;code&gt;search_catalog&lt;/code&gt; replacing &lt;code&gt;search_shop_catalog&lt;/code&gt;), new response shapes (price as &lt;code&gt;{amount, currency}&lt;/code&gt; objects, descriptions as &lt;code&gt;{plain, html}&lt;/code&gt; objects), and a wave of WooCommerce stores that exposed REST-only endpoints rather than MCP. The 0.10.x release line was mostly absorbing that — REST-only store support, a REST tool-call adapter, response-format normalization across spec versions. Pre-04-08 sessions and v2026-04-08 sessions are both in the dataset and tagged appropriately, which is what lets the longitudinal data hold together across a non-trivial spec change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GPay token wall built ECP.&lt;/strong&gt; In a February session, &lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt; reached &lt;code&gt;ready_for_complete&lt;/code&gt; correctly — and stalled, because the merchant's checkout required a Google Pay payment token the agent couldn't produce. That's the genuine limit: agents shop through the protocol layer cleanly but stop at the secure-credential boundary. The Embedded Commerce Protocol shipped in 0.8.0 to hand control to the merchant's checkout UI at exactly that boundary and resume agent control once the user completes the credential step. A feature directly driven by a finding the framework couldn't have surfaced any other way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Playground session became a spec proposal.&lt;/strong&gt; A live test against &lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt; exposed a different gap: an identity-linked buyer with a wallet balance hit the checkout, the OAuth flow completed cleanly, the buyer object came back populated — but the wallet was nowhere the agent could see it. &lt;code&gt;payment.instruments&lt;/code&gt; was empty, the only declared handler (&lt;code&gt;dev.ucp.delegate_payment&lt;/code&gt;) didn't accept the wallet, and the session escalated to the merchant's continue_url every time. Authenticated checkout was provably blocked, by spec. We wrote it up and submitted &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/issues/358" rel="noopener noreferrer"&gt;Proposal #358 to the UCP spec repository&lt;/a&gt; — &lt;code&gt;payment.available_instruments&lt;/code&gt;, a per-buyer per-session list of usable payment methods (wallet, saved cards, loyalty, gift cards) resolved at runtime from the identity-linked session. Submitted by Benji Fisher (&lt;a href="https://github.com/appdrops" rel="noopener noreferrer"&gt;@appdrops&lt;/a&gt;) and co-authored with Almin Zolotic (&lt;a href="https://github.com/zologic" rel="noopener noreferrer"&gt;@zologic&lt;/a&gt;) of UCPReady, who'd seen the same wall from the merchant side. Currently submitted to the UCP technical council for review. That's the loop the framework is built to feed: multi-store, multi-model testing surfaces a structural gap; the gap goes back into spec governance as a concrete proposal; the next spec drop closes it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology, briefly
&lt;/h2&gt;

&lt;p&gt;Each session is a real frontier-model agent shopping run against a real UCP-enabled store, captured end-to-end via MCP tool calls. Sessions are initiated either through the public &lt;a href="https://ucpplayground.com/playground" rel="noopener noreferrer"&gt;Playground UI&lt;/a&gt; (user-initiated, ad-hoc prompts) or through the &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;Evals framework&lt;/a&gt; (scripted multi-turn sequences across pre-selected store/model matrices).&lt;/p&gt;

&lt;p&gt;Outcomes are tagged at session close: &lt;code&gt;checkout_reached&lt;/code&gt; (full transaction completion), &lt;code&gt;cart_created&lt;/code&gt; (added items, didn't proceed), &lt;code&gt;search_only&lt;/code&gt; (browsed, didn't add), &lt;code&gt;failed&lt;/code&gt; (provider error, model refusal, or max-turn exceeded), or &lt;code&gt;info_provided&lt;/code&gt; (informational query, no transactional intent).&lt;/p&gt;
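
&lt;p&gt;For readers who export session data and want to recompute the per-outcome shares, a sketch along these lines would do it. The five tag values are the ones listed above; the &lt;code&gt;Outcome&lt;/code&gt; enum and &lt;code&gt;outcome_rates&lt;/code&gt; helper are illustrative names, not part of any published API.&lt;/p&gt;

```python
from collections import Counter
from enum import Enum


class Outcome(str, Enum):
    """Session outcome tags as listed in the methodology."""
    CHECKOUT_REACHED = "checkout_reached"
    CART_CREATED = "cart_created"
    SEARCH_ONLY = "search_only"
    FAILED = "failed"
    INFO_PROVIDED = "info_provided"


def outcome_rates(outcomes: list[Outcome]) -> dict[str, float]:
    """Share of sessions per outcome tag, as fractions of the total."""
    if not outcomes:
        return {}
    counts = Counter(outcomes)
    total = len(outcomes)
    # Every tag appears in the result, including tags with zero sessions.
    return {outcome.value: counts[outcome] / total for outcome in Outcome}
```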

&lt;p&gt;Every session is identified by a ULID, and that ULID links to a clickable replay. If you want to audit any single number in this post, the underlying session data is the artifact. That's intentional: independent reproducibility is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Three concrete next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run a benchmark against your own store.&lt;/strong&gt; Create a collection at &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;, pick a sequence, pick two models, and compare your store's per-model performance against the aggregate above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;See where individual models stand.&lt;/strong&gt; Each model on the leaderboard has its own &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;shopping profile&lt;/a&gt; with detailed performance data, known issues, and store-by-store breakdowns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare two models head-to-head.&lt;/strong&gt; The &lt;a href="https://ucpplayground.com/models/compare?models=claude-sonnet-4-5%2Cgemini-3-flash" rel="noopener noreferrer"&gt;comparison view&lt;/a&gt; lets you pit any two models against each other on the same workload — useful before you commit to a primary model for a deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next data update — likely 2,000+ sessions, refreshed model lineup, and a fuller error-tagging surface — drops in early August.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>ai</category>
      <category>data</category>
    </item>
    <item>
      <title>UCP Requirements: What Your Store Needs Before Going Live</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Mon, 04 May 2026 12:23:16 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-requirements-what-your-store-needs-before-going-live-9ag</link>
      <guid>https://dev.to/benjifisher/ucp-requirements-what-your-store-needs-before-going-live-9ag</guid>
      <description>&lt;p&gt;What do you need for UCP? There are two levels of UCP readiness. The first is the &lt;strong&gt;minimum viable manifest&lt;/strong&gt; — the bare requirements to pass validation and appear in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;UCP directory&lt;/a&gt;. The second is the &lt;strong&gt;agent-ready setup&lt;/strong&gt; — what it actually takes for an AI agent to browse, cart, and check out at your store without friction.&lt;/p&gt;

&lt;p&gt;Think of this as your UCP checklist — the minimum requirements plus the recommended prerequisites that separate stores agents can find from stores agents can actually shop. Most guides only cover the first level. This one covers both, grounded in data from &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;4,024 verified merchants&lt;/a&gt; and hundreds of &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;agent testing sessions&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimum requirements (pass validation)
&lt;/h2&gt;

&lt;p&gt;These are the fields required to produce a valid UCP manifest on the current &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. A JSON file at /.well-known/ucp
&lt;/h3&gt;

&lt;p&gt;The manifest must be publicly accessible at &lt;code&gt;https://yourdomain.com/.well-known/ucp&lt;/code&gt;, served with &lt;code&gt;Content-Type: application/json&lt;/code&gt;, and reachable without authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;: handled automatically&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;: manual publish via plugin or custom route&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;: manual, served from storefront origin&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;: manual, typically via custom module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full publishing guide with code examples: &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;/.well-known/ucp developer reference&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ucp.version (required)
&lt;/h3&gt;

&lt;p&gt;A string identifying which spec version the manifest is written against. Current latest: &lt;code&gt;"2026-04-08"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;99.4% of verified stores&lt;/a&gt; are on this version. If you're starting fresh, use it. If you're on an older version, the &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;spec update post&lt;/a&gt; walks through the migration.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. ucp.services (required)
&lt;/h3&gt;

&lt;p&gt;At least one service entry declaring a transport (&lt;code&gt;mcp&lt;/code&gt;, &lt;code&gt;rest&lt;/code&gt;, &lt;code&gt;a2a&lt;/code&gt;, or &lt;code&gt;embedded&lt;/code&gt;) and an endpoint URL. This tells agents where to send requests.&lt;/p&gt;

&lt;p&gt;MCP is the dominant transport — &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;~100% of verified stores declare it&lt;/a&gt;. If you're building from scratch, start with MCP. See the &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transport comparison&lt;/a&gt; for the tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. ucp.payment_handlers (required)
&lt;/h3&gt;

&lt;p&gt;A map of payment handler namespaces. Can be an empty object &lt;code&gt;{}&lt;/code&gt; if your store uses checkout-link redirects instead of tokenized payments (common on &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If you declare handlers, use reverse-domain namespaces like &lt;code&gt;com.stripe.card&lt;/code&gt; or &lt;code&gt;dev.shopify.card&lt;/code&gt;. See the &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers directory&lt;/a&gt; for examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. signing_keys (required, at root level)
&lt;/h3&gt;

&lt;p&gt;An array of JWK objects at the &lt;strong&gt;document root&lt;/strong&gt; (not nested inside &lt;code&gt;ucp&lt;/code&gt;). An empty array &lt;code&gt;[]&lt;/code&gt; is valid if you're not signing payloads yet, but the key must be present.&lt;/p&gt;

&lt;p&gt;This field moved from &lt;code&gt;ucp.signing_keys&lt;/code&gt; to the root in v2026-04-08 — the most &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common validation warning&lt;/a&gt; we see is stores that still nest it.&lt;/p&gt;
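
&lt;p&gt;The four manifest fields above can be checked locally before you publish. This is a rough sketch rather than the official validator: &lt;code&gt;check_minimum_manifest&lt;/code&gt; is a hypothetical helper, and it assumes the shape of &lt;code&gt;ucp.services&lt;/code&gt; only loosely, treating any non-empty value as "at least one entry".&lt;/p&gt;

```python
from typing import Any


def check_minimum_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the minimum fields look present."""
    problems: list[str] = []
    ucp = manifest.get("ucp")
    if not isinstance(ucp, dict):
        return ["missing top-level 'ucp' object"]

    if not isinstance(ucp.get("version"), str):
        problems.append("ucp.version must be a string")
    # Assumption: any non-empty container counts as "at least one service entry".
    if not ucp.get("services"):
        problems.append("ucp.services needs at least one entry")
    # An empty object is valid, so only the type is checked.
    if not isinstance(ucp.get("payment_handlers"), dict):
        problems.append("ucp.payment_handlers must be an object (may be empty)")
    # signing_keys belongs at the document root; an empty array is valid.
    if not isinstance(manifest.get("signing_keys"), list):
        if "signing_keys" in ucp:
            problems.append("signing_keys is nested under ucp; move it to the root")
        else:
            problems.append("signing_keys must be an array at the document root")
    return problems
```

&lt;p&gt;A manifest that still nests &lt;code&gt;signing_keys&lt;/code&gt; under &lt;code&gt;ucp&lt;/code&gt; gets a specific message, since that is the case the migration most often leaves behind.&lt;/p&gt;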

&lt;h2&gt;
  
  
  Recommended setup (agent-ready)
&lt;/h2&gt;

&lt;p&gt;Passing validation gets you into the directory. The requirements below determine whether agents can actually &lt;em&gt;shop&lt;/em&gt; your store — the difference between a &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;B+ grade and an A grade&lt;/a&gt; in our benchmarks.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Capabilities declaration
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ucp.capabilities&lt;/code&gt; field is optional per spec but strongly recommended. Without it, agents know your store exists but not what it can do.&lt;/p&gt;

&lt;p&gt;Declare every capability you support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/checkout" rel="noopener noreferrer"&gt;checkout&lt;/a&gt;&lt;/strong&gt; — 99.5% adoption across verified stores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/cart" rel="noopener noreferrer"&gt;cart&lt;/a&gt;&lt;/strong&gt; — 99.1% adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/catalog-search" rel="noopener noreferrer"&gt;catalog-search&lt;/a&gt;&lt;/strong&gt; — required for &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/identity-linking" rel="noopener noreferrer"&gt;identity-linking&lt;/a&gt;&lt;/strong&gt; — 3 stores, massive first-mover opportunity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/payment" rel="noopener noreferrer"&gt;payment&lt;/a&gt;&lt;/strong&gt; — 0 stores, the frontier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full list: &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capability registry&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Clean variant data
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;Variant mismatches are the #1 failure mode&lt;/a&gt; in agent shopping sessions. Every variant needs a stable ID, a clear name, and consistent representation across discovery and checkout. This is the single highest-impact fix you can make.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Responsive MCP endpoint
&lt;/h3&gt;

&lt;p&gt;Latency matters. The average &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify store&lt;/a&gt; responds in ~130ms. &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce stores&lt;/a&gt; average ~890ms. Agents have timeout budgets — if your endpoint is slow, sessions drop silently. Target under 500ms for tool responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. robots.txt allowing AI crawlers
&lt;/h3&gt;

&lt;p&gt;Make sure &lt;code&gt;/.well-known/ucp&lt;/code&gt; is explicitly allowed in your robots.txt. Some WAFs and CDN configurations block well-known paths by default. Check the &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common errors guide&lt;/a&gt; for the fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. supported_versions for backward compatibility
&lt;/h3&gt;

&lt;p&gt;Declare &lt;code&gt;supported_versions&lt;/code&gt; in your manifest, listing both the current and the previous spec version. This lets agents that haven't migrated yet still find a valid endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"supported_versions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2026-04-08"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://yourstore.com/.well-known/ucp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2026-01-23"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://yourstore.com/.well-known/ucp/2026-01-23"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
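
&lt;p&gt;On the agent side, reading that map might look like the sketch below. &lt;code&gt;resolve_manifest_url&lt;/code&gt; is a hypothetical helper, and the fallback rule (use the newest listed version that is not newer than the agent's) is an assumption made for illustration, not behaviour defined by the spec. It relies on version strings being ISO dates, which sort correctly as plain text.&lt;/p&gt;

```python
from typing import Optional


def resolve_manifest_url(supported_versions: dict[str, str], agent_version: str) -> Optional[str]:
    """Pick a manifest URL for the agent's spec version, if the store lists a usable one."""
    if agent_version in supported_versions:
        return supported_versions[agent_version]
    # Assumed fallback: the newest listed version that is not newer than the agent's.
    # ISO date strings compare correctly as plain strings.
    older = [version for version in supported_versions if agent_version >= version]
    return supported_versions[max(older)] if older else None
```

&lt;p&gt;With the map above, an agent on &lt;code&gt;2026-01-23&lt;/code&gt; resolves to the versioned path, and an agent older than every listed version gets no URL back.&lt;/p&gt;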



&lt;h2&gt;
  
  
  The UCP readiness checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Required?&lt;/th&gt;
&lt;th&gt;% of stores that have it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Manifest at /.well-known/ucp&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100% (by definition)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.version&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.services with transport + endpoint&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.payment_handlers&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;signing_keys at root&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;~97% (rest have it nested)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.capabilities&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~99% (Shopify default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clean variant data&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Unknown (runtime issue)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency &amp;lt; 500ms&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~95% (Shopify), ~30% (others)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;robots.txt allows /.well-known/ucp&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;supported_versions&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Validate your setup
&lt;/h2&gt;

&lt;p&gt;Not sure if you pass? Start with &lt;a href="https://ucpchecker.com/blog/is-my-store-ucp-ready" rel="noopener noreferrer"&gt;Is My Store UCP Ready?&lt;/a&gt; — it walks through the full diagnostic in 60 seconds. Or jump straight to the tool:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Run a live check&lt;/a&gt; on your domain — it tests every requirement above in seconds. For runtime issues (variant mismatches, checkout failures), &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;test with real agents in Playground&lt;/a&gt;. For ongoing monitoring, &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up alerts&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Once you're verified, make sure your listing on &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; is accurate — that's what agents see when deciding which stores to route customers to. And if you're a developer building agents rather than stores, the &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;Build an Agent quickstart&lt;/a&gt; covers the other side of the equation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check your store now at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com&lt;/a&gt;. See how you compare: &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;side-by-side store comparison&lt;/a&gt;. Platform guides: &lt;a href="https://ucpchecker.com/blog/shopify-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>AI Commerce Needs MLPerf — and Here's an Early Attempt</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Fri, 01 May 2026 12:07:45 +0000</pubDate>
      <link>https://dev.to/benjifisher/ai-commerce-needs-mlperf-and-heres-an-early-attempt-2lg1</link>
      <guid>https://dev.to/benjifisher/ai-commerce-needs-mlperf-and-heres-an-early-attempt-2lg1</guid>
      <description>&lt;p&gt;Validating a UCP manifest takes a second. &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;Scoring it for agent-readiness&lt;/a&gt; takes another. Neither of those answers the harder question: when a real frontier agent — &lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude&lt;/a&gt; or &lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT&lt;/a&gt; or &lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, picked by a user three weeks from now — walks up to your store with an ordinary shopping prompt, does it actually complete a checkout? Compared to the next implementation? Across the models people are actually using?&lt;/p&gt;

&lt;p&gt;Today there's no shared way to find out. AI commerce has the same coordination problem ML had before MLPerf, web performance had before Lighthouse, and coding models had before HumanEval — and the cost of not solving it is the same: every claim a vendor makes about agent-readiness is currently unverifiable by anyone outside that vendor.&lt;/p&gt;

&lt;p&gt;This post is about what we've been building to close that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pre-benchmark moment
&lt;/h2&gt;

&lt;p&gt;Every category that grew up around AI has gone through a pre-benchmark moment.&lt;/p&gt;

&lt;p&gt;Machine learning before MLPerf was a pile of vendor-flavoured numbers. NVIDIA reported one set of throughput claims, Google another, AMD a third — and none of it was directly comparable, because nobody was running the same workload, on the same input, on the same harness. MLPerf — submitted to, run by, and audited across the whole industry — fixed that. Buyers could finally compare. The category matured.&lt;/p&gt;

&lt;p&gt;Web performance before Lighthouse was the same. "Fast website" was vibes. PageSpeed Insights gave one number, WebPageTest another, internal RUM dashboards a third. Lighthouse — graded, reproducible, open — fixed it. Today nobody ships a serious site without checking their score.&lt;/p&gt;

&lt;p&gt;Coding models before HumanEval were even worse. Every lab benchmarked against its own preferred problems and reported its own preferred metrics. HumanEval, then MBPP, then SWE-bench, then LiveCodeBench, gave the field a shared evaluation surface. Comparisons stopped being marketing.&lt;/p&gt;

&lt;p&gt;Agentic commerce is in exactly the place those categories were before their benchmarks landed. The standard has converged — UCP is the open spec the industry is building against, and the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;public directory&lt;/a&gt; tracks 4,500+ verified stores. Major retailers and platforms ship UCP implementations almost weekly. The recent &lt;a href="https://ucpchecker.com/blog/ucp-tech-council-expands-amazon-meta-microsoft-salesforce-stripe" rel="noopener noreferrer"&gt;tech council expansion&lt;/a&gt; brings in most of the rest. &lt;strong&gt;But there is still no neutral, reproducible way to evaluate how well any of those implementations actually work when a real frontier agent tries to shop them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't get this from inside a vendor. Shopify cannot credibly benchmark Shopify stores. OpenAI cannot credibly benchmark OpenAI agents. Even when their numbers are honest, the methodology is theirs, the test conditions favour their stack, and nobody else can rerun it. AI commerce has the same coordination problem ML had before MLPerf, and the fix is the same: a shared evaluation layer, run by a third party, that anyone can audit and reproduce.&lt;/p&gt;

&lt;p&gt;Agentic commerce can't mature without that layer. We've built a first credible attempt at one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What UCP Playground Evals does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground Evals&lt;/a&gt; is a benchmark framework for agentic commerce. You define a multi-turn shopping conversation, pick the stores and the models you want to evaluate against it, and get back a structured comparison report — funnel matrix, per-session token and duration metrics, error classification, replayable session links, downloadable PDF.&lt;/p&gt;

&lt;p&gt;The point isn't the report format. The point is the three properties underneath, because those determine whether a benchmark is worth trusting.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Standardised, multi-turn sequences
&lt;/h3&gt;

&lt;p&gt;Agentic commerce is conversational, not single-prompt. A real shopping session looks like &lt;em&gt;"Show me products under $60"&lt;/em&gt; → &lt;em&gt;"Add both to my cart"&lt;/em&gt; → &lt;em&gt;"Proceed to checkout"&lt;/em&gt;, with full context carried across turns. That's the unit an eval has to operate on.&lt;/p&gt;

&lt;p&gt;Each eval is a scripted sequence of turns. Every turn gets its own orchestrator round (up to 8 internal tool-calling sub-turns) and the full conversation history is preserved across the sequence — so the agent's choices on T2 are conditioned on what it actually saw on T1, the way real user behaviour conditions on real responses. Four collections ship today: &lt;strong&gt;Browse &amp;amp; Buy&lt;/strong&gt; (4 turns, generic shopping journey), &lt;strong&gt;Multi-Item&lt;/strong&gt; (3 turns, multi-product cart composition and checkout), &lt;strong&gt;Price Constrained&lt;/strong&gt; (3 turns, budget-anchored reasoning across a single purchase), and &lt;strong&gt;Custom&lt;/strong&gt; for user-defined sequences.&lt;/p&gt;
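
&lt;p&gt;The turn structure described above can be sketched roughly as follows. This is an illustration of the control flow only, not the Playground orchestrator: &lt;code&gt;EvalRun&lt;/code&gt;, &lt;code&gt;run_sequence&lt;/code&gt;, and the &lt;code&gt;AgentStep&lt;/code&gt; callable are hypothetical, and the model and tool calls are collapsed into a single stand-in function.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_SUB_TURNS = 8  # per-turn cap on tool-calling sub-turns, as described above

# Hypothetical agent step: receives the history so far, returns (message, done).
AgentStep = Callable[[list[str]], tuple[str, bool]]


@dataclass
class EvalRun:
    history: list[str] = field(default_factory=list)

    def run_turn(self, prompt: str, step: AgentStep) -> int:
        """Run one scripted turn; return how many sub-turns it used."""
        self.history.append(f"user: {prompt}")
        for used in range(1, MAX_SUB_TURNS + 1):
            message, done = step(self.history)
            self.history.append(f"agent: {message}")
            if done:
                return used
        return MAX_SUB_TURNS  # cap reached without the agent finishing


def run_sequence(prompts: list[str], step: AgentStep) -> EvalRun:
    """Run every turn against one shared history, so later turns see earlier ones."""
    run = EvalRun()
    for prompt in prompts:
        run.run_turn(prompt, step)
    return run
```

&lt;p&gt;The point of the shared &lt;code&gt;history&lt;/code&gt; list is that the second turn is answered with the first turn's exchange already in view, which is what lets an eval measure multi-turn behaviour rather than three unrelated prompts.&lt;/p&gt;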

&lt;h3&gt;
  
  
  2. Cross-store comparability
&lt;/h3&gt;

&lt;p&gt;The sequences are intentionally generic. Not &lt;em&gt;"Find Nike Air Max 90 in size 10"&lt;/em&gt; but &lt;em&gt;"Show me products under $60"&lt;/em&gt;. That distinction is load-bearing: it's what makes the same test valid against any store running UCP, and it's what makes results from one store directly comparable to results from another. Without it, every benchmark is apples-to-oranges and nothing aggregates.&lt;/p&gt;

&lt;p&gt;The eval runner discovers MCP endpoints automatically from each store's &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;&lt;code&gt;/.well-known/ucp&lt;/code&gt;&lt;/a&gt; manifest, so any UCP-conformant store works without per-store wiring — &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;, and &lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;Custom &amp;amp; Headless&lt;/a&gt; stacks all work the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Multi-model coverage
&lt;/h3&gt;

&lt;p&gt;The same sequence runs against any of &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;15 frontier models&lt;/a&gt; currently wired up — every major lab, plus a reasoning-tuned subset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Alibaba&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model is part of the test matrix. Same store, different models, same sequence — directly comparable behaviour, with model-level differences surfaced rather than averaged away. Any two can also be &lt;a href="https://ucpplayground.com/models/compare?models=gemini-3-1-pro%2Cclaude-sonnet-4-5" rel="noopener noreferrer"&gt;compared side-by-side&lt;/a&gt; outside the eval framework, on the same workload.&lt;/p&gt;

&lt;h3&gt;
  
  
  The math is straightforward
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;stores × models × sequences = sessions&lt;/code&gt;. Two stores × two models × one sequence = four sessions. Each one is a full agent shopping run, captured end-to-end, replayable, and rolled up into the report.&lt;/p&gt;
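&lt;p&gt;As a rough illustration of that arithmetic, the session matrix is just the Cartesian product of the three lists. The sketch below uses the store and model names from the sample report; the function name &lt;code&gt;plan_sessions&lt;/code&gt; and the dict keys are assumptions made for this example, not part of the Playground API.&lt;/p&gt;

```python
from itertools import product

# Illustrative inputs taken from the sample report; the function name and
# dict keys are assumptions for this sketch, not the Playground's API.
stores = ["oakywood.shop", "ugmonk.com"]
models = ["Gemini 3 Flash", "Gemini 3.1 Pro"]
sequences = ["multi-item checkout"]


def plan_sessions(stores, models, sequences):
    """Return one planned session per (store, model, sequence) combination."""
    return [
        {"store": store, "model": model, "sequence": sequence}
        for store, model, sequence in product(stores, models, sequences)
    ]


sessions = plan_sessions(stores, models, sequences)
for session in sessions:
    print(session["store"], "|", session["model"], "|", session["sequence"])
print(len(sessions), "sessions")
```

&lt;p&gt;Adding a third model to the list would grow the matrix to six sessions without touching anything else, which is the property that keeps runs comparable.&lt;/p&gt;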

&lt;p&gt;&lt;strong&gt;Standardised, reproducible, vendor-neutral. The three properties that make a benchmark worth trusting.&lt;/strong&gt; Everything else in the framework is built to defend those three.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the framework actually surfaces
&lt;/h2&gt;

&lt;p&gt;The clearest way to show what evals do is to walk through one. Below is a multi-item checkout report we ran across two stores and two Gemini models in March:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpplayground.com/examples/eval-report-sample.pdf" rel="noopener noreferrer"&gt;Download the full multi-item checkout report (PDF) →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two-page report covering the funnel comparison matrix, per-session performance breakdown, evaluator configuration, auto-generated recommendations, and clickable session-replay IDs for every run.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two stores (&lt;a href="https://ucpchecker.com/status/oakywood.shop" rel="noopener noreferrer"&gt;oakywood.shop&lt;/a&gt;, &lt;a href="https://ucpchecker.com/status/ugmonk.com" rel="noopener noreferrer"&gt;ugmonk.com&lt;/a&gt;). Two models (Gemini 3 Flash, Gemini 3.1 Pro). One sequence (multi-item checkout: search → add → checkout). Four sessions total. The headline numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% checkout rate&lt;/strong&gt; across all four sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;95,513 average tokens&lt;/strong&gt; per session&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;48.3s average duration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 errors&lt;/strong&gt; across the matrix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the boring summary. The interesting parts are in the per-session table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Store&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Turns&lt;/th&gt;
&lt;th&gt;Cart value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;oakywood.shop&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;85,614&lt;/td&gt;
&lt;td&gt;93.4s&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;EUR 82.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;oakywood.shop&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;154,294&lt;/td&gt;
&lt;td&gt;34.7s&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ugmonk.com&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;46,084&lt;/td&gt;
&lt;td&gt;35.1s&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;USD 77.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ugmonk.com&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;96,058&lt;/td&gt;
&lt;td&gt;29.9s&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same sequence, same stores, two models. On each store, Gemini 3.1 Pro completes the run in fewer turns and roughly half the tokens of Flash, but it is slower on both, and the gap widens sharply on the slower store (93.4s against 34.7s on oakywood.shop, versus 35.1s against 29.9s on ugmonk.com). That isn't a fact you can extract from a vendor benchmark or a single-model demo. It only shows up when the same scripted run hits multiple models head-to-head, with the numbers landing side by side in the same table.&lt;/p&gt;
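&lt;p&gt;The headline averages and the "roughly half the tokens" comparison can be re-derived from the per-session table. This is a small worked check using only the figures printed above; the tuple layout is just a convenience for the example.&lt;/p&gt;

```python
from statistics import mean

# Per-session figures copied from the table above: (store, model, tokens, seconds, turns).
sessions = [
    ("oakywood.shop", "Gemini 3.1 Pro", 85614, 93.4, 7),
    ("oakywood.shop", "Gemini 3 Flash", 154294, 34.7, 12),
    ("ugmonk.com", "Gemini 3.1 Pro", 46084, 35.1, 6),
    ("ugmonk.com", "Gemini 3 Flash", 96058, 29.9, 11),
]

avg_tokens = mean(tokens for _, _, tokens, _, _ in sessions)
avg_seconds = mean(seconds for _, _, _, seconds, _ in sessions)
print(f"average tokens: {avg_tokens:,.1f}")
print(f"average duration: {avg_seconds:.1f}s")

# Pro-to-Flash token ratio on each store.
by_key = {(store, model): tokens for store, model, tokens, _, _ in sessions}
for store in ("oakywood.shop", "ugmonk.com"):
    ratio = by_key[(store, "Gemini 3.1 Pro")] / by_key[(store, "Gemini 3 Flash")]
    print(f"{store}: Pro uses {ratio:.0%} of Flash's tokens")
```

&lt;p&gt;The mean comes out at 95,512.5 tokens and 48.3s, matching the report's headline figures after rounding, and Pro lands at about 55% and 48% of Flash's token usage on the two stores.&lt;/p&gt;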

&lt;p&gt;The auto-generated recommendations point at where the real engineering work is, and they're grounded in the actual run data:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Average token usage is 95,513 — above the 40K baseline. Product descriptions may be inflating context. Consider truncating descriptions in MCP responses.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Average session duration is 48.3s — above the 15s target. Optimise MCP endpoint response times, especially initial search calls.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are concrete engineering actions. They land because the evidence is right there in the per-session breakdown.&lt;/p&gt;

&lt;p&gt;The deeper signal shows up across runs against richer stores. In a separate eval against a single shop, two models picked &lt;em&gt;different variant IDs for "Medium"&lt;/em&gt; — one mapped Medium to one variant ID, the other to a different one, and neither is provably correct because the store doesn't expose a human-readable size axis in its variant data. That isn't a bug in either model. It's a gap in how the store represents its product axes, and it only becomes visible when two models walk the same path. &lt;strong&gt;This is the kind of behavioural divergence between frontier models that evals surface — and that vendor-internal benchmarks can't credibly report.&lt;/strong&gt;&lt;/p&gt;
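&lt;p&gt;One way to spot this kind of divergence mechanically is to group the variant each model chose by the label the user asked for, then flag labels where the models disagree. The sketch below assumes you have already pulled those selections out of the session replays; the field names, model names, and variant IDs are all made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical selections extracted from two session replays on the same store.
# The field names and variant IDs are made up for illustration.
selections = [
    {"model": "model-a", "requested": "Medium", "variant_id": "variant-101"},
    {"model": "model-b", "requested": "Medium", "variant_id": "variant-205"},
    {"model": "model-a", "requested": "Large", "variant_id": "variant-102"},
    {"model": "model-b", "requested": "Large", "variant_id": "variant-102"},
]


def find_divergences(selections):
    """Return requested labels that different models resolved to different variant IDs."""
    chosen = defaultdict(dict)
    for row in selections:
        chosen[row["requested"]][row["model"]] = row["variant_id"]
    return {
        label: by_model
        for label, by_model in chosen.items()
        if len(set(by_model.values())) != 1
    }


for label, by_model in find_divergences(selections).items():
    print(f"{label}: {by_model}")
```

&lt;p&gt;A flagged label doesn't say which model is right. It says the store's variant data leaves room for more than one reading, which is the thing worth fixing.&lt;/p&gt;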

&lt;p&gt;The same run logged 6/6 prompt-injection resistance across every session, against benchmark prompts seeded in product descriptions and review fields. Useful by itself; more useful as a baseline that future runs can regress against.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's on the evals roadmap
&lt;/h2&gt;

&lt;p&gt;This is v1. A few things on the roadmap, in priority order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More eval collections.&lt;/strong&gt; The four built-in sequences cover the core shopping flow. The next batch is more diagnostic: single-item flow (the simplest path), variant selection accuracy (the size-label gap above, formalised), prompt-injection resistance (already running, becoming its own collection), escalation handling (&lt;code&gt;requires_escalation&lt;/code&gt; compliance), attribution accuracy (UTM and referrer handling at checkout hand-off), and return policy surfacing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Public benchmark leaderboards.&lt;/strong&gt; Same pattern as the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;UCP Score leaderboard&lt;/a&gt; — by-store and by-model rankings against the standard sequences, refreshed on schedule, indexed and shareable. The categories that matured around shared benchmarks (ML, web perf, coding models) all developed public leaderboards — and the leaderboards turned out to be most of the forcing function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Headless API and CI/CD integration.&lt;/strong&gt; Already shipped. The full automation surface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/v1/collections          — create
POST /api/v1/collections/{id}/run — trigger
GET  /api/v1/collection-runs/{id} — poll status + results
GET  /api/v1/collection-runs/{id}/pdf — download report
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first integration we expect anyone to ship is a deploy-time check: trigger an eval after every UCP manifest deploy, assert &lt;code&gt;checkout_rate &amp;gt;= 80&lt;/code&gt;, &lt;code&gt;errors.total == 0&lt;/code&gt;, &lt;code&gt;avg_duration_ms &amp;lt; 30000&lt;/code&gt;, fail the build otherwise. Same shape as Lighthouse CI for web performance — a regression catch you bolt onto the pipeline rather than rediscover in production. Full developer documentation — authentication, rate limits, and a worked GitHub Actions example — lives at &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;ucpchecker.com/developer-tools&lt;/a&gt;, alongside the rest of the public API surface.&lt;/p&gt;
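&lt;p&gt;A minimal sketch of that deploy-time gate, assuming the run result has already been fetched and parsed into a dict. The three metric names and thresholds come from the paragraph above; the nested shape of the result (&lt;code&gt;errors&lt;/code&gt; containing &lt;code&gt;total&lt;/code&gt;) and the helper names are assumptions, so check the developer documentation for the real response schema.&lt;/p&gt;

```python
import operator

# Thresholds from the paragraph above. The nested result shape is an assumption
# for this sketch; check the developer documentation for the real response schema.
ASSERTIONS = [
    ("checkout_rate", operator.ge, 80),
    ("errors.total", operator.eq, 0),
    ("avg_duration_ms", operator.lt, 30000),
]


def lookup(result, dotted_key):
    """Resolve a dotted key such as 'errors.total' against nested dicts."""
    value = result
    for part in dotted_key.split("."):
        value = value[part]
    return value


def failed_assertions(result):
    """Return a message for every assertion the run result does not satisfy."""
    failures = []
    for key, compare, threshold in ASSERTIONS:
        actual = lookup(result, key)
        if not compare(actual, threshold):
            failures.append(f"{key}={actual} fails {compare.__name__} {threshold}")
    return failures


# Hypothetical result; the numbers echo the sample report's averages.
sample = {"checkout_rate": 100, "errors": {"total": 0}, "avg_duration_ms": 48300}
problems = failed_assertions(sample)
for problem in problems:
    print("FAIL:", problem)
```

&lt;p&gt;In a pipeline you would exit non-zero whenever &lt;code&gt;problems&lt;/code&gt; is non-empty. Note that a run averaging 48.3s, like the sample report, would pass the checkout and error checks but trip the 30-second duration gate.&lt;/p&gt;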

&lt;p&gt;&lt;strong&gt;Scheduled runs and version tracking.&lt;/strong&gt; Also shipped. Collections auto-increment versions when their config changes, runs snapshot the config they used, and a cron field on each collection lets you run the same eval on a regular cadence: the same Monday-9am sequence every week, plus before-and-after comparisons whenever the underlying UCP implementation changes. This is how a benchmark becomes a track record instead of a one-shot demo.&lt;/p&gt;
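&lt;p&gt;To make the versioning behaviour concrete, here is a toy in-memory model of it: the version bumps only when the config actually changes, and each run stores its own copy of the config it ran against. The class and method names are hypothetical and say nothing about how the Playground implements this internally.&lt;/p&gt;

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory model of a versioned eval collection."""
    config: dict
    version: int = 1
    runs: list = field(default_factory=list)

    def update_config(self, new_config):
        # Bump the version only when the configuration actually changes.
        if new_config != self.config:
            self.config = copy.deepcopy(new_config)
            self.version += 1

    def record_run(self):
        # Each run keeps its own copy of the config, so later edits cannot alter it.
        run = {"version": self.version, "config": copy.deepcopy(self.config)}
        self.runs.append(run)
        return run


# "0 9 * * 1" is standard cron syntax for 09:00 every Monday.
collection = Collection(config={"models": ["Gemini 3 Flash"], "cron": "0 9 * * 1"})
before = collection.record_run()
collection.update_config({"models": ["Gemini 3 Flash", "Gemini 3.1 Pro"], "cron": "0 9 * * 1"})
after = collection.record_run()
print(before["version"], after["version"])
```

&lt;p&gt;The first run stays pinned to version 1 and its single-model config even after the collection moves on to version 2, which is what makes a before-and-after comparison trustworthy.&lt;/p&gt;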

&lt;p&gt;&lt;strong&gt;Cloning and team scoping.&lt;/strong&gt; Public collections can be cloned into any team workspace; quotas are scoped per team. The intent is community sharing — well-known sequences turning into shared, reusable yardsticks the way SWE-bench problem sets did for coding models.&lt;/p&gt;

&lt;h2&gt;
  
  
  How evals fit the broader development cycle
&lt;/h2&gt;

&lt;p&gt;Evals don't sit alone. They're the runtime testing surface in a development loop that starts earlier in UCP Checker, with manifest validation, agent-readiness scoring, and capability coverage analysis. The web performance world solved a problem of the same shape with three tools used in sequence: Lighthouse to grade pages, PageSpeed Insights to drill into specific issues, synthetic monitoring to verify behaviour over time. UCP implementations follow the same arc: validate the manifest at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;&lt;code&gt;/check&lt;/code&gt;&lt;/a&gt;, score it against agent-readiness criteria with the &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, then run evals against it to see how it actually behaves when a real frontier agent shops it.&lt;/p&gt;

&lt;p&gt;Each tool surfaces something different. Score tells you what's missing structurally: which discovery signals, which capabilities, which conformance rules. Check confirms the manifest validates after fixes land. Evals confirm the agent actually behaves correctly when it tries to complete a real flow. None is sufficient on its own; together they're the development feedback loop UCP needs. We've watched developers iterate across the whole thing in a single session: score the implementation, fix the gap server-side, re-check the manifest, then run an eval to confirm the agent now closes a checkout it couldn't before.&lt;/p&gt;

&lt;p&gt;If you're starting from zero on a UCP implementation, the natural sequence is: get a Score first to see what's missing, fix the highest-impact issues, run a Check to confirm the manifest validates cleanly, then run Evals to confirm real agents complete the flows you care about. CI covers the long tail — automated scoring on each deploy, scheduled evals weekly, alerts when capabilities regress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology and verification
&lt;/h2&gt;

&lt;p&gt;Three properties separate a credible benchmark from a marketing claim. UCP Playground Evals are designed around all three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every result links to a replayable session.&lt;/strong&gt; Each eval session generates the same &lt;code&gt;agent_sessions&lt;/code&gt; data the public Playground UI produces — full tool-call timeline, model responses, token-by-token event stream, every retrieved page. The session IDs in any report are clickable. Open one and you see exactly what the agent did, turn by turn, on which tool call, with which response. The sample report above lists four such IDs (e.g. &lt;code&gt;01KMJZM5MG2CA4QN5M983H19E1&lt;/code&gt;) and each resolves to a full replay at &lt;code&gt;ucpplayground.com/sessions/{id}&lt;/code&gt;. &lt;strong&gt;This isn't a marketing claim; it's a verifiable test you can audit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every collection is versioned.&lt;/strong&gt; When the configuration of a collection changes — turns added, models swapped, store list updated — the version increments and every run snapshots the config it ran against. Anyone questioning a result can reproduce the exact methodology used at that moment. The PDF report itself prints the collection version at the bottom of every page; the sample above is &lt;code&gt;Collection v3&lt;/code&gt;. Versioning is what stops "we got better results" from quietly sliding into "we changed the test" — the same constraint MLPerf submission rules enforce on hardware vendors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The methodology is open.&lt;/strong&gt; The framework configuration shape is documented — the turns, the orchestrator loop, the stop conditions, the success metrics, the PDF schema. Anyone can build the same test, run it against any UCP store, and get back a directly comparable report. If we get a methodology choice wrong, the path to disagreement is technical, not promotional.&lt;/p&gt;

&lt;p&gt;That's the credibility floor. Everything else in the product builds on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  About UCP Checker and UCP Playground
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest on the open web, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt; and the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, publish the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; and &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and ship developer tools: the &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;validator&lt;/a&gt;, &lt;a href="https://ucpchecker.com/bulk-check" rel="noopener noreferrer"&gt;bulk checker&lt;/a&gt;, &lt;a href="https://ucpchecker.com/extension" rel="noopener noreferrer"&gt;browser extension&lt;/a&gt;, &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;public dataset&lt;/a&gt;, and a public REST API. The whole dataset is open, indexed, and ungated.&lt;/p&gt;

&lt;p&gt;UCP Playground is the agent shopping layer that sits next to it — same data model, same &lt;code&gt;/.well-known/ucp&lt;/code&gt; discovery, same replayable session format. UCP Playground Evals is the benchmark surface on top of that. Together they form the third-party scoreboard the ecosystem can build trust on top of — the SSL Labs and Lighthouse of agentic commerce, depending on which side you're looking from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The interesting eval gaps are the ones nobody's tested yet.&lt;/strong&gt; If a result surprises you — your own store, a competitor's, a model you assumed was a clear winner that turns out not to be — &lt;a href="https://ucpchecker.com/contact" rel="noopener noreferrer"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Three concrete next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run an eval against your own UCP store.&lt;/strong&gt; Create a collection at &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;, pick a sequence, pick two models, run it. The four-session example above is the shape most first runs take.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read a public eval report.&lt;/strong&gt; Sample reports are linked from the framework page. Each has clickable session IDs you can replay end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire it into CI.&lt;/strong&gt; The &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;developer tools page&lt;/a&gt; covers authentication, rate limits, and a GitHub Actions worked example. The assertion shape is the same one Lighthouse CI uses for web performance — &lt;code&gt;checkout_rate&lt;/code&gt;, &lt;code&gt;errors.total&lt;/code&gt;, &lt;code&gt;avg_duration_ms&lt;/code&gt; instead of LCP and TBT.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>product</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Is My Store UCP Ready? How to Check in 60 Seconds</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:25:51 +0000</pubDate>
      <link>https://dev.to/benjifisher/is-my-store-ucp-ready-how-to-check-in-60-seconds-4fco</link>
      <guid>https://dev.to/benjifisher/is-my-store-ucp-ready-how-to-check-in-60-seconds-4fco</guid>
      <description>&lt;p&gt;The short answer: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;enter your domain here&lt;/a&gt; and you'll know in under 60 seconds. This UCP ready check runs the same validation that AI agents use to decide whether your store is worth shopping.&lt;/p&gt;

&lt;p&gt;The longer answer — what "UCP ready" actually means, why it matters, and what to do about the result — is what this post covers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What UCP readiness means
&lt;/h2&gt;

&lt;p&gt;A store is "UCP ready" when it publishes a valid manifest at &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;&lt;code&gt;/.well-known/ucp&lt;/code&gt;&lt;/a&gt; that AI shopping agents can discover, parse, and act on. That's the technical definition.&lt;/p&gt;

&lt;p&gt;In practice, there are three levels:&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 1: Verified
&lt;/h3&gt;

&lt;p&gt;Your manifest exists, returns valid JSON, and passes &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;schema validation&lt;/a&gt; against the current &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt;. You appear in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;UCP directory&lt;/a&gt;. Agents can find you.&lt;/p&gt;

&lt;p&gt;As of this month, &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;4,024 stores&lt;/a&gt; are at this level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 2: Agent-functional
&lt;/h3&gt;

&lt;p&gt;Agents can actually &lt;em&gt;shop&lt;/em&gt; your store — not just discover it. Your MCP endpoint responds, your product data is clean, your checkout flow completes without errors. You score B+ or higher on the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;Playground leaderboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;422 stores are at this level. The gap between "verified" and "agent-functional" is where most &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common errors&lt;/a&gt; live.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 3: Optimized
&lt;/h3&gt;

&lt;p&gt;Agents complete purchases reliably across multiple models. Your variant data is clean, your latency is low, your capabilities go beyond the defaults. You score A. Only 9 stores are here today.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/blog/ucp-requirements" rel="noopener noreferrer"&gt;UCP requirements checklist&lt;/a&gt; breaks down exactly what each level requires.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to check your store
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Run the checker
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com/check&lt;/a&gt; and enter your domain. When you check your UCP status, the checker will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetch &lt;code&gt;/.well-known/ucp&lt;/code&gt; from your domain&lt;/li&gt;
&lt;li&gt;Validate the JSON against the current spec&lt;/li&gt;
&lt;li&gt;Check your robots.txt for AI bot policies&lt;/li&gt;
&lt;li&gt;Inventory your declared capabilities, transports, and payment handlers&lt;/li&gt;
&lt;li&gt;Verify your UCP compliance and report every error and warning with specific error codes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole process takes about 1 second. You'll get a full diagnostic report on your &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;status page&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Read the result
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Verified&lt;/strong&gt; (green) — your manifest is valid. You're in the directory. Agents can find you. Check the warnings section for things to improve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invalid&lt;/strong&gt; (amber) — your manifest exists but fails validation. The diagnostic panel shows exactly which fields are wrong or missing. Most invalid manifests are one fix away from passing — usually a &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;missing required field or a misplaced signing_keys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not Detected&lt;/strong&gt; (grey) — no manifest found at &lt;code&gt;/.well-known/ucp&lt;/code&gt;. Your store isn't UCP ready yet. See the &lt;a href="https://ucpchecker.com/blog/ucp-requirements" rel="noopener noreferrer"&gt;requirements post&lt;/a&gt; for what to publish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blocked&lt;/strong&gt; (orange) — your robots.txt or firewall is preventing access to the manifest. The diagnostic will tell you whether it's a robots.txt rule or an HTTP-level block.&lt;/p&gt;
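&lt;p&gt;Read as a decision procedure, the four statuses above fall out of three observations: was access blocked, was a manifest found, and did validation report errors. The function below is only a sketch of that reading. Its inputs and the order in which it checks them are assumptions for illustration, not the checker's actual logic.&lt;/p&gt;

```python
def classify_status(manifest_found, blocked, validation_errors):
    """Map checker observations onto the four status labels described above.

    The inputs and their order of precedence are assumptions for illustration;
    the real checker's logic is not documented here.
    """
    if blocked:
        return "Blocked"
    if not manifest_found:
        return "Not Detected"
    if validation_errors:
        return "Invalid"
    return "Verified"


# Hypothetical observations for four stores.
print(classify_status(manifest_found=True, blocked=False, validation_errors=[]))
print(classify_status(True, False, ["missing required field"]))
print(classify_status(False, False, []))
print(classify_status(False, True, []))
```

&lt;p&gt;Checking for a block first matters in this sketch: a blocked store and a store with no manifest both look like "nothing came back", and only the first one is fixed by changing robots.txt or firewall rules.&lt;/p&gt;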

&lt;h3&gt;
  
  
  Step 3: Fix what's broken
&lt;/h3&gt;

&lt;p&gt;The checker tells you &lt;em&gt;what&lt;/em&gt; is wrong. Here's where to go for &lt;em&gt;how&lt;/em&gt; to fix it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Platform-specific guides:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/shopify-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifest reference:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;/.well-known/ucp developer guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-by-error fixes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;Common UCP errors&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec changes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 update&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Test with real agents
&lt;/h3&gt;

&lt;p&gt;Schema validation tells you if your manifest is syntactically correct. It tells you nothing about whether an agent can actually buy something from your store. For that, you need &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; — it runs real AI agent sessions against your store and shows you exactly where the flow breaks.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;agent testing data&lt;/a&gt; shows that &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;variant mismatches&lt;/a&gt; are the most common runtime failure: clean product data matters more than a perfect schema.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Monitor
&lt;/h3&gt;

&lt;p&gt;Your UCP endpoint is a live API. Platform updates, catalog changes, and CDN reconfigurations can break it silently. Set up &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; to get emailed the moment your status changes — before agents notice.&lt;/p&gt;

&lt;h2&gt;
  
  
  How you compare
&lt;/h2&gt;

&lt;p&gt;Once you're verified, see how your store stacks up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;Compare side-by-side&lt;/a&gt;&lt;/strong&gt; with a competitor or partner store — capabilities, transports, payment handlers, latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;Browse your platform&lt;/a&gt;&lt;/strong&gt; — see all verified &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, or &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt; stores ranked by capability depth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;Check the leaderboard&lt;/a&gt;&lt;/strong&gt; — stores graded A through F on real agent shopping performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters now
&lt;/h2&gt;

&lt;p&gt;UCP adoption is accelerating. &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;1,400+ new merchants&lt;/a&gt; were discovered in April alone. Shopify &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;migrated its entire fleet&lt;/a&gt; to the latest spec in four days. BigCommerce, WooCommerce, and Magento stores are appearing every week.&lt;/p&gt;

&lt;p&gt;Am I UCP ready? The question isn't whether your store will need UCP. It's whether you'll be ready when agents start shopping — and &lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;they already are&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before you check, it helps to understand the building blocks: &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capabilities&lt;/a&gt; define what your store can do for agents, &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers&lt;/a&gt; define how agents pay, &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transports&lt;/a&gt; define how agents connect, and &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt; is the flow agents actually run when they shop.&lt;/p&gt;

&lt;p&gt;Make sure your listing on &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; is accurate once you're verified — that's how agents find you in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Check your store now →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Build your own agent: &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;developer quickstart&lt;/a&gt;. Understand the protocol stack: &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;MCP vs UCP vs AP2&lt;/a&gt;. Monthly ecosystem data: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Introducing the UCP Score: A 0–100 Agent-Readiness Grade for Every UCP Store</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Wed, 29 Apr 2026 09:41:44 +0000</pubDate>
      <link>https://dev.to/benjifisher/introducing-the-ucp-score-a-0-100-agent-readiness-grade-for-every-ucp-store-1851</link>
      <guid>https://dev.to/benjifisher/introducing-the-ucp-score-a-0-100-agent-readiness-grade-for-every-ucp-store-1851</guid>
      <description>&lt;p&gt;After every status check on UCPChecker, the same follow-up question lands in our inbox: &lt;strong&gt;"OK, my manifest is verified. But is it actually any good?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That question comes from everywhere. Engineering leads who shipped a manifest last quarter and want to know if it would actually carry an agent through checkout. Platform teams pitching agent-readiness to merchants who need a number, not a status pill. Analysts trying to chart "&lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;how Shopify compares to WooCommerce&lt;/a&gt;" and finding that "verified" tells them next to nothing. &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;Developers&lt;/a&gt; picking which UCP store to integrate with first. AI agent builders deciding whose endpoints to feature in demo flows. &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Store owners&lt;/a&gt; benchmarking against direct competitors before a quarterly review.&lt;/p&gt;

&lt;p&gt;None of these audiences really care that a manifest exists. They care about how good it is. Whether it has the surface signals that keep AI shopping agents finding it. Whether the declared transports actually respond when you call them. Whether the spec and schema URLs in the manifest resolve, or quietly 404 the moment a strict agent tries to validate the response shape. The interesting answer is always graded.&lt;/p&gt;

&lt;p&gt;Until today, the only way to answer that question on UCPChecker was to read every line of the validator output and squint. So we built the thing people were already trying to do manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;Get a UCP Score for any domain at ucpchecker.com/score →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the UCP Score is
&lt;/h2&gt;

&lt;p&gt;A 0–100 composite grade that measures how agent-ready any UCP store actually is. Not "does the manifest exist" — that's the status page. &lt;strong&gt;How well does it work for agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The score maps to a single letter grade you can share, embed, or watch over time. Bands are deliberately strict, in the spirit of Lighthouse and SSL Labs, so an A is hard to earn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A (85–100)&lt;/strong&gt; — Agent-ready. Valid manifest, strong discovery, broad capability coverage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B (70–84)&lt;/strong&gt; — Solid. Minor gaps or one weak category, agents can still transact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C (50–69)&lt;/strong&gt; — Partial. Manifest works but missing capabilities or surface signals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;D (30–49)&lt;/strong&gt; — Weak. Manifest reachable but invalid or near-empty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;F (0–29)&lt;/strong&gt; — Failing. Blocked, unreachable, or no manifest detected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every score breaks down into three weighted categories so you can see exactly where the points come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent Discovery (30%)&lt;/strong&gt; — Can agents find and reach you? HTTPS, reachability, agent-friendly &lt;code&gt;robots.txt&lt;/code&gt;, plus the surface signals that keep you in the conversation: &lt;code&gt;/llms.txt&lt;/code&gt;, &lt;code&gt;sitemap.xml&lt;/code&gt;, Open Graph tags, Organization JSON-LD, mobile viewport meta.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UCP Conformance (40%)&lt;/strong&gt; — Does the manifest validate against the &lt;a href="https://ucpchecker.com/specs" rel="noopener noreferrer"&gt;spec&lt;/a&gt;? Validity is 3× weighted in this category — an invalid manifest cannot score above ~50 here, regardless of how good the surface polish is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability Coverage (30%)&lt;/strong&gt; — What can an agent actually do at your store? Declared &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transports&lt;/a&gt; (REST/MCP/A2A), checkout, &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers&lt;/a&gt;, and breadth of &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capabilities&lt;/a&gt;. When functional probes run, declared transport endpoints that don't actually respond drag this score down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The composite is a straight weighted average: &lt;code&gt;Discovery × 0.30 + Conformance × 0.40 + Capabilities × 0.30&lt;/code&gt;. No tricks, no hidden weights. The full ruleset is documented in our &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt;.&lt;/p&gt;
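
&lt;p&gt;As a minimal sketch of the arithmetic above, the weighted average and the grade bands can be written in a few lines of Python. The function names are illustrative, and rounding the composite to an integer before applying the bands is an assumption, not something the methodology states here.&lt;/p&gt;

```python
# Category weights as stated in the post; they sum to 1.0.
WEIGHTS = {"discovery": 0.30, "conformance": 0.40, "capabilities": 0.30}

# Lower bound of each grade band, highest band first.
GRADE_FLOORS = [(85, "A"), (70, "B"), (50, "C"), (30, "D"), (0, "F")]


def composite_score(discovery: float, conformance: float, capabilities: float) -> float:
    """Weighted average of the three 0-100 category sub-scores."""
    return (
        discovery * WEIGHTS["discovery"]
        + conformance * WEIGHTS["conformance"]
        + capabilities * WEIGHTS["capabilities"]
    )


def letter_grade(score: float) -> str:
    """Map a 0-100 composite onto the A-F bands."""
    rounded = round(score)  # assumption: bands apply to the rounded integer score
    for floor, grade in GRADE_FLOORS:
        if rounded >= floor:
            return grade
    return "F"
```

&lt;p&gt;If an invalid manifest holds Conformance near 50, perfect scores in the other two categories still only give 100 × 0.30 + 50 × 0.40 + 100 × 0.30 = 80, which lands in the B band.&lt;/p&gt;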

&lt;h2&gt;
  
  
  What you actually get
&lt;/h2&gt;

&lt;p&gt;Every score URL is a live page at &lt;code&gt;/score/{your-domain}&lt;/code&gt;, indexed and shareable. Open one and you don't just see a number:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top priorities&lt;/strong&gt; — The three highest-impact issues we found, ranked by impact × effort. Start here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact vs Effort matrix&lt;/strong&gt; — Quick Wins / Strategic / Incremental / Consider Later quadrants so you can plan a sprint instead of staring at a wall of warnings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendations with copy-paste fixes&lt;/strong&gt; — Every flagged issue surfaces a snippet you can drop straight into your manifest, &lt;code&gt;robots.txt&lt;/code&gt;, sitemap, or HTML &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt;. Hit "Show fix", copy, paste, redeploy, re-check.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform-aware percentile&lt;/strong&gt; — "You're at p72 latency vs the median Shopify store." Because comparing your latency against the whole directory is meaningless when half of it runs on a fundamentally different infrastructure profile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full check breakdown&lt;/strong&gt; — Every signal we evaluate, grouped by category, with a "why it matters" paragraph alongside each check. No black boxes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Save this report&lt;/strong&gt; — We re-run the full check weekly and email you only when something material changes. Score drops, capability regresses, status flips. Free, no marketing, unsubscribe anytime.&lt;/li&gt;
&lt;/ul&gt;
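
&lt;p&gt;To make the matrix concrete, here is a hedged sketch of how issues could be sorted into the four quadrants. The post names the quadrants but not their definitions, so the 1-10 scales, the threshold, the quadrant mapping, and the tie-breaking rule below are all assumptions, and &lt;code&gt;Issue&lt;/code&gt; is a hypothetical type.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    code: str    # hypothetical issue identifier
    impact: int  # assumed 1-10 scale, higher means more score gained
    effort: int  # assumed 1-10 scale, higher means more work to fix


def quadrant(issue: Issue, threshold: int = 5) -> str:
    """Place an issue in one of the four quadrants (assumed mapping)."""
    high_impact = issue.impact > threshold
    high_effort = issue.effort > threshold
    if high_impact and not high_effort:
        return "Quick Wins"
    if high_impact and high_effort:
        return "Strategic"
    if not high_effort:
        return "Incremental"
    return "Consider Later"


def top_priorities(issues: list[Issue], limit: int = 3) -> list[Issue]:
    """Assumed ordering: highest impact first, ties broken by lowest effort."""
    return sorted(issues, key=lambda i: (-i.impact, i.effort))[:limit]
```

&lt;p&gt;Under these assumptions a high-impact, low-effort issue sorts to the top of the priorities list and lands in Quick Wins, which matches the "start here" advice.&lt;/p&gt;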

&lt;p&gt;The page is ungated. No signup, no paywall, no "create an account to see the breakdown." We're indexing every score — just like SSL Labs grades and PageSpeed scores. Public scores create a baseline and pressure for the ecosystem to improve, in the same way SSL grades did for HTTPS adoption.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why we built it
&lt;/h2&gt;

&lt;p&gt;The honest answer: &lt;strong&gt;verified-or-not is the wrong question now.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the UCP spec first landed in January (v2026-01-11), finding a verified store at all was novel. The bar was "did anyone publish a manifest." The status page was the right product for that moment, and it still is for the discovery layer.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;directory&lt;/a&gt; has 4,500+ verified domains today. Verified isn't novel. The interesting question shifted to &lt;strong&gt;"how well does this thing actually work for agents,"&lt;/strong&gt; and nobody had a good answer to that — including us.&lt;/p&gt;

&lt;p&gt;When we ran a deeper analysis for our &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;April State of Agentic Commerce report&lt;/a&gt;, the gap was stark: out of &lt;strong&gt;4,014 verified UCP stores, only 9 delivered a flawless end-to-end agent experience&lt;/strong&gt;. A 0.2% flawless rate. The other 99.8% had a manifest published — they just didn't actually work as well as that manifest suggested. That gap between "verified" and "actually works" is the central infrastructure problem in agentic commerce today. The UCP Score makes that gap visible, measurable, and addressable.&lt;/p&gt;

&lt;p&gt;There's a clear analogue: PageSpeed before Lighthouse. Pre-Lighthouse, web performance optimisation was vibes. People knew slow sites were bad and fast sites were good but couldn't quantify "how slow" or "compared to what." Lighthouse gave them three things — a graded score, a category breakdown, and copy-paste optimisations — and the field changed overnight. Nobody ships a serious site today without checking their Lighthouse score first.&lt;/p&gt;

&lt;p&gt;The agentic commerce ecosystem is at exactly that pre-Lighthouse moment. There's no shared yardstick for agent-readiness. Stores have no way to tell whether the integration they shipped last month is competitive. Platform teams have no way to back up "our merchants are more agent-ready" with a number. AI agent builders have no way to filter "show me the stores most likely to actually complete a transaction."&lt;/p&gt;

&lt;p&gt;The UCP Score is meant to be that yardstick. &lt;strong&gt;Lighthouse for agentic commerce.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How we built it (the short version)
&lt;/h2&gt;

&lt;p&gt;Three signal sources, one composite:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis&lt;/strong&gt; — The same manifest validator that powers &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;&lt;code&gt;/check&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;&lt;code&gt;/ucp-validator&lt;/code&gt;&lt;/a&gt;. Validity, version format, signing keys, payment handlers — every spec rule turned into a check row.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Surface signals&lt;/strong&gt; — Five public files and meta tags fetched in parallel: &lt;code&gt;/llms.txt&lt;/code&gt;, &lt;code&gt;/sitemap.xml&lt;/code&gt;, Open Graph, Organization JSON-LD, viewport. Presence + content captured (with a content hash for change detection on &lt;code&gt;llms.txt&lt;/code&gt; so we can spot when a brand updates their LLM brief).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functional probes&lt;/strong&gt; (opt-in) — Two probe families. Transport probes hit each declared transport endpoint with a benign request (MCP gets a &lt;code&gt;tools/list&lt;/code&gt;, REST/A2A get a GET). URL resolution probes fetch every &lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;schema&lt;/code&gt; URL declared in the manifest. Probes only run on user-triggered checks — not on the 24h cron sweep, because hammering 4,500 merchants daily with a dozen extra HTTP requests each isn't neighbourly.&lt;/li&gt;
&lt;/ol&gt;
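
&lt;p&gt;The transport probes described above can be sketched as a function that picks a benign request per transport type. This only builds the request and sends nothing. &lt;code&gt;ProbeRequest&lt;/code&gt; and &lt;code&gt;build_probe&lt;/code&gt; are hypothetical names, sending the MCP probe as an HTTP POST is an assumption, and the sketch ignores any session setup a real MCP server might require.&lt;/p&gt;

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeRequest:
    method: str          # HTTP method to use
    url: str             # declared transport endpoint
    body: Optional[str]  # JSON string for POST probes, None for GET probes


def build_probe(transport: str, endpoint: str) -> ProbeRequest:
    """Return a benign request for one declared transport endpoint."""
    kind = transport.lower()
    if kind == "mcp":
        # MCP messages are JSON-RPC 2.0; tools/list only enumerates tools.
        payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
        return ProbeRequest("POST", endpoint, json.dumps(payload))
    if kind in ("rest", "a2a"):
        # REST and A2A endpoints get a plain GET, as described in the post.
        return ProbeRequest("GET", endpoint, None)
    raise ValueError(f"unknown transport: {transport}")
```

&lt;p&gt;For example, &lt;code&gt;build_probe("mcp", "https://shop.example/mcp")&lt;/code&gt; yields a POST carrying the &lt;code&gt;tools/list&lt;/code&gt; call, while &lt;code&gt;build_probe("rest", "https://shop.example/api")&lt;/code&gt; yields a body-less GET. Both URLs are placeholders.&lt;/p&gt;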

&lt;p&gt;Each signal feeds one category sub-score (0–100), and the composite is the weighted average. Recommendations join error codes against a fix library so every flagged issue surfaces a copy-paste snippet — the same pattern Lighthouse uses for its audit list. The whole pipeline runs on the same 24h cycle as the rest of the directory; checks you trigger manually run the full probe stack.&lt;/p&gt;

&lt;p&gt;If you want the deep version, the &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology page&lt;/a&gt; walks through every category, every check, every grade band, and the "what we don't score" list.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do with it
&lt;/h2&gt;

&lt;p&gt;A few workflows the score unlocks immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-merge gate&lt;/strong&gt; — Add a check in your CI that fails the build if your &lt;code&gt;/score/{domain}&lt;/code&gt; drops below B. Same pattern as Lighthouse CI. The score URL is stable and the JSON breakdown lands in the API soon.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform comparison&lt;/strong&gt; — The &lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;&lt;code&gt;/platforms&lt;/code&gt;&lt;/a&gt; page now shows average UCP Score by platform — Shopify vs WooCommerce vs BigCommerce vs Magento at a glance. Useful both for picking a stack and for benchmarking the one you're on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leaderboard&lt;/strong&gt; — The &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; is now ranked by UCP Score with sortable columns for each sub-score. Filter by platform to see the top stores on your stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt; — Save any report against your email. We re-run it weekly and alert you on regressions. Score drops, capability disappears, status flips — one email, free, no marketing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive benchmarking&lt;/strong&gt; — Run &lt;a href="https://ucpchecker.com/compare/allbirds.com/vs/casper.com" rel="noopener noreferrer"&gt;Allbirds vs Casper&lt;/a&gt; and see grades side by side. The compare page picks up score data automatically.&lt;/li&gt;
&lt;/ul&gt;
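
&lt;p&gt;For the pre-merge gate, a minimal sketch of the pass/fail logic might look like the following. The JSON breakdown is not in the API yet, so the payload shape (a top-level numeric &lt;code&gt;score&lt;/code&gt; field) is an assumption, and &lt;code&gt;gate_exit_code&lt;/code&gt; is a hypothetical helper. Only the grade floors come from the bands published above.&lt;/p&gt;

```python
import json

# Lower bound of each grade band, taken from the published bands.
GRADE_FLOORS = {"A": 85, "B": 70, "C": 50, "D": 30, "F": 0}


def gate_exit_code(report_json: str, minimum_grade: str = "B") -> int:
    """Return 0 if the score meets the minimum grade, 1 otherwise."""
    report = json.loads(report_json)
    score = float(report["score"])  # assumed field name in the score payload
    floor = GRADE_FLOORS[minimum_grade]
    if score >= floor:
        print(f"UCP Score {score:.0f} meets grade {minimum_grade} (floor {floor})")
        return 0
    print(f"UCP Score {score:.0f} is below grade {minimum_grade} (floor {floor})")
    return 1
```

&lt;p&gt;A CI step would fetch the report for its domain, pass the response text to &lt;code&gt;gate_exit_code&lt;/code&gt;, and hand the result to &lt;code&gt;sys.exit&lt;/code&gt;, so a non-zero code fails the build.&lt;/p&gt;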

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This is v1. A few things already on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Score history &amp;amp; sparkline&lt;/strong&gt; — Save a report and you'll see your score trend over time. We're tracking every check in our history table from day one, so the data exists; the visual lands shortly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score API&lt;/strong&gt; — &lt;code&gt;GET /api/v1/score/{domain}&lt;/code&gt; returning the full breakdown as JSON. The &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;data feed&lt;/a&gt; is already public; the score endpoint is the same data behind a stable contract.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec-version-aware scoring weights&lt;/strong&gt; — As new UCP spec versions land with new emphasis, scoring rules for each version live in config and absorb cleanly. Already version-aware for validation; widening to scoring weights too.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've also taken pains to make the system absorb future spec releases without a rewrite. Static check copy lives in config, not hardcoded; new error codes plug into the recommendations engine via a single config entry. The next spec drop should land as a configuration change, not a refactor.&lt;/p&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest on the open web, run the public &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt;, publish the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; and &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and ship developer tools: the &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;validator&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/bulk-check" rel="noopener noreferrer"&gt;bulk checker&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/extension" rel="noopener noreferrer"&gt;browser extension&lt;/a&gt;, and now the UCP Score. Everything is free, indexed, and ungated; the dataset is published openly under CC-BY 4.0. Think of us as the SSL Labs of agentic commerce: the third-party scoreboard the ecosystem can build trust on top of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Pick any domain. Type it into &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt; and you'll have a graded report in under a second. If you find a score that surprised you — yours or a competitor's — &lt;a href="https://ucpchecker.com/contact" rel="noopener noreferrer"&gt;let us know&lt;/a&gt;. The interesting score gaps are the ones nobody's looked at yet.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Get a score:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;See the leaderboard:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;ucpchecker.com/leaderboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How it's calculated:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;ucpchecker.com/methodology&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare two stores:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;ucpchecker.com/compare&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track adoption live:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;ucpchecker.com/stats&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get notified on changes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;ucpchecker.com/alerts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>product</category>
      <category>ucp</category>
    </item>
    <item>
      <title>UCP Tech Council Expands: What the Meeting Minutes Tell Us About Where the Protocol Is Heading</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Sun, 26 Apr 2026 21:37:00 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-tech-council-expands-what-the-meeting-minutes-tell-us-about-where-the-protocol-is-heading-5a92</link>
      <guid>https://dev.to/benjifisher/ucp-tech-council-expands-what-the-meeting-minutes-tell-us-about-where-the-protocol-is-heading-5a92</guid>
      <description>&lt;p&gt;On Friday, April 24, five of the largest technology companies in the world quietly joined the governing body of the Universal Commerce Protocol. No press release. No blog post. Just a &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/commit/80ea01c" rel="noopener noreferrer"&gt;commit to MAINTAINERS.md&lt;/a&gt; in the spec repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon. Meta. Microsoft. Salesforce. Stripe.&lt;/strong&gt; All now have seats on the UCP Tech Council — the body that reviews, debates, and approves every change to the protocol that AI shopping agents use to buy things.&lt;/p&gt;

&lt;p&gt;We know this because we read the meeting minutes. Every week, the TC meets to debate spec changes, vote on PRs, and argue about how agent commerce should work. Most people in the industry don't read these minutes. We do — and what they reveal about where UCP is heading is more interesting than any announcement.&lt;/p&gt;

&lt;p&gt;This is what the minutes tell us.&lt;/p&gt;

&lt;h2&gt;
  
  
  The expansion: who joined and why it matters
&lt;/h2&gt;

&lt;p&gt;The Tech Council grew from roughly 12 seats to &lt;strong&gt;16 members across 10 companies&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Representatives&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 seats&lt;/td&gt;
&lt;td&gt;Founding sponsor, spec steward&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Shopify&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 seats (incl. 2 new)&lt;/td&gt;
&lt;td&gt;Largest platform implementer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Greg Smith (new)&lt;/td&gt;
&lt;td&gt;The world's largest online retailer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Meta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;James Andersen (new)&lt;/td&gt;
&lt;td&gt;Social commerce, Instagram Shopping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsoft&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Patrick Jordan (new)&lt;/td&gt;
&lt;td&gt;Copilot, enterprise commerce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stripe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prasad Wangikar (new)&lt;/td&gt;
&lt;td&gt;Payment infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Salesforce&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scot DeDeo (new)&lt;/td&gt;
&lt;td&gt;Commerce Cloud, enterprise retail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Etsy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Imran Hoosain&lt;/td&gt;
&lt;td&gt;Marketplace commerce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maxime Najim&lt;/td&gt;
&lt;td&gt;Enterprise retail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wayfair&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Naga Malepati&lt;/td&gt;
&lt;td&gt;Furniture/home goods&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't ceremonial. The TC has binding authority over spec changes — every PR that ships in a UCP release has been reviewed and voted on by this group. When Amazon and Stripe join that table, it changes what gets prioritised, what gets debated, and ultimately what the protocol becomes.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-03-13.md" rel="noopener noreferrer"&gt;meeting minutes from March 13&lt;/a&gt; first mentioned the election process: seats rotating every six months, with growing partner interest. By &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-03-27.md" rel="noopener noreferrer"&gt;March 27&lt;/a&gt;, six nominations had been received. The final review was scheduled for April 10. The MAINTAINERS.md update landed April 24.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The new members are already contributing.&lt;/strong&gt; James Andersen (Meta) submitted &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/367" rel="noopener noreferrer"&gt;PR #367&lt;/a&gt; on April 17 — a documentation PR clarifying network token usage and PCI scope in card credentials. Patrick Jordan (Microsoft) contributed documentation accuracy fixes the same day. These aren't advisory seats. They're engineering seats.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the meeting minutes actually say
&lt;/h2&gt;

&lt;p&gt;We reviewed the six TC meetings from March 6 through April 17. Here's what's being debated, decided, and built — translated for a merchant audience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Identity linking is the top priority — and it's hard
&lt;/h3&gt;

&lt;p&gt;The single most discussed topic across all six meetings is &lt;strong&gt;identity linking&lt;/strong&gt; — how an agent knows who the customer is across sessions, stores, and platforms.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-04-17.md" rel="noopener noreferrer"&gt;April 17 minutes&lt;/a&gt; show an active debate about OAuth 2.0 scope design: nested scopes vs flat scopes vs config maps. The TC favoured flat. &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/354" rel="noopener noreferrer"&gt;PR #354&lt;/a&gt; implements OAuth 2.0 as the foundation for identity linking with capability-driven scopes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for merchants:&lt;/strong&gt; Identity linking is the missing piece that would let an agent complete a purchase without a checkout-page handoff. Right now, agents can browse and cart — but paying requires redirecting the customer to a human checkout flow. Identity linking + &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers&lt;/a&gt; would close that loop. Until then, agents rely on the &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transport layer&lt;/a&gt; to reach the store and the &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;manifest endpoint&lt;/a&gt; for discovery. Our &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;April state-of-commerce report&lt;/a&gt; showed only 3 stores out of 4,024 currently declare identity linking capability. The spec work happening now is what will eventually bring that number up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Loyalty is being trimmed to ship faster
&lt;/h3&gt;

&lt;p&gt;The TC has been debating loyalty schemas since March. &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/340" rel="noopener noreferrer"&gt;PR #340&lt;/a&gt; implements a loyalty extension for the checkout capability. The &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-04-10.md" rel="noopener noreferrer"&gt;April 10 minutes&lt;/a&gt; note that the extension is being "trimmed to baseline use cases" — a pragmatic decision to ship something that works for simple loyalty programs now, rather than waiting for a comprehensive solution that handles every edge case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; If your store has a loyalty or rewards program, the spec is building the infrastructure for agents to verify loyalty status and redeem points as part of the checkout flow. This is early — don't build against it yet — but understand that it's coming and it's being shaped by people at Google, Shopify, Etsy, and Target who run real loyalty programs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local commerce is on the roadmap
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-04-03.md" rel="noopener noreferrer"&gt;April 3 minutes&lt;/a&gt; list Q2 priorities. Among them: &lt;strong&gt;local commerce&lt;/strong&gt;. &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/375" rel="noopener noreferrer"&gt;PR #375&lt;/a&gt; proposes store-based local inventory and fulfilment options — the infrastructure an agent would need to answer "is this product available at a store near me?"&lt;/p&gt;

&lt;p&gt;This is Target and Wayfair territory. Both have TC seats. Both have store networks. The fact that local commerce is a Q2 priority with retail representation on the council suggests it's not theoretical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Returns are "incredibly complicated"
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/blob/main/tc/2026/2026-04-17.md" rel="noopener noreferrer"&gt;April 17 minutes&lt;/a&gt; include the most honest assessment we've seen in any spec discussion: returns are acknowledged as an "incredibly complicated domain." This is refreshing. Most protocol specs pretend returns are simple. UCP's TC is saying out loud that they're not, and that getting them right will take time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/257" rel="noopener noreferrer"&gt;PR #257&lt;/a&gt; from the February cycle introduced a returns extension. It's still in review. The complexity is in modelling return windows, refund methods, partial returns, and eligibility rules — all of which vary by merchant, product, and jurisdiction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Don't expect agent-managed returns in 2026. But understand that the protocol is building toward it, and the merchants who implement return policies as structured data (not just PDF links) will be ahead when it ships.&lt;/p&gt;

&lt;h3&gt;
  
  
  The spec itself just shipped its biggest release ever
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08&lt;/a&gt; landed with &lt;strong&gt;60+ merged PRs&lt;/strong&gt; — the largest release since the protocol launched. Key additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cart capability&lt;/strong&gt; — basket building for agents, a prerequisite for multi-item flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catalog search + lookup&lt;/strong&gt; — formalised &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt; as a spec capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request/response signing&lt;/strong&gt; — cryptographic integrity for agent-store communication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error handling overhaul&lt;/strong&gt; — first-class errors, business logic error types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eligibility claims&lt;/strong&gt; — for loyalty, membership, and verification-gated pricing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discount extension to cart&lt;/strong&gt; — discounts now apply pre-checkout, not just at checkout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk signals&lt;/strong&gt; — authorization and abuse metadata for fraud prevention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our crawler showed &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;Shopify migrating its entire fleet&lt;/a&gt; to v2026-04-08 in four days. &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;99.4% of verified stores&lt;/a&gt; are now on the latest spec.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for you
&lt;/h2&gt;

&lt;h3&gt;
  
  
  If you're a merchant
&lt;/h3&gt;

&lt;p&gt;The governance expansion doesn't change what you need to do today. Your &lt;a href="https://ucpchecker.com/blog/ucp-requirements" rel="noopener noreferrer"&gt;UCP requirements&lt;/a&gt; are the same: valid manifest, declared &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capabilities&lt;/a&gt;, clean variant data. &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Check your store&lt;/a&gt;, fix any &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common errors&lt;/a&gt;, &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;compare against competitors&lt;/a&gt;, and &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up alerts&lt;/a&gt; so you know if anything breaks.&lt;/p&gt;

&lt;p&gt;What it does change is the timeline and the confidence. When Amazon, Microsoft, and Salesforce have engineering seats on the governing body, the protocol is not going away. If you've been waiting for a signal that UCP is "real enough" to invest in, five of the world's largest technology companies joining the TC in a single commit is that signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you're a platform
&lt;/h3&gt;

&lt;p&gt;If you run &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, you're covered — platform-level UCP support is mature. If you run &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, or a &lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;custom stack&lt;/a&gt;, watch the identity linking and loyalty PRs. These are the capabilities that will differentiate agent-ready platforms from agent-compatible ones in H2 2026.&lt;/p&gt;

&lt;p&gt;Salesforce Commerce Cloud now has a seat at the table. If you're on SFCC, this is the clearest signal yet that platform-level UCP support is coming. Our &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;April report&lt;/a&gt; noted that we've already seen SFCC engineering work in progress.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you're building agents
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;Build an Agent quickstart&lt;/a&gt; still works — the protocol surface you're building against is stable. But start tracking the identity linking PRs. When that capability ships, the agent flow goes from "browse + cart + redirect to checkout" to "browse + cart + pay" — end-to-end autonomous purchasing. That's the step change.&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;store leaderboard&lt;/a&gt; to find the highest-performing targets, and understand how &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt; works. Test your agent against real stores in &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;, and use &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; for production discovery. Both will surface the new capabilities as they ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reading list
&lt;/h2&gt;

&lt;p&gt;For anyone who wants to follow the protocol's evolution themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Meeting minutes:&lt;/strong&gt; &lt;a href="https://github.com/Universal-Commerce-Protocol/meeting-minutes/tree/main/tc/2026" rel="noopener noreferrer"&gt;github.com/Universal-Commerce-Protocol/meeting-minutes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec repo:&lt;/strong&gt; &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp" rel="noopener noreferrer"&gt;github.com/Universal-Commerce-Protocol/ucp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v2026-04-08 release notes:&lt;/strong&gt; &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/releases/tag/v2026-04-08" rel="noopener noreferrer"&gt;github.com/Universal-Commerce-Protocol/ucp/releases/tag/v2026-04-08&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MAINTAINERS.md:&lt;/strong&gt; &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/blob/main/MAINTAINERS.md" rel="noopener noreferrer"&gt;github.com/Universal-Commerce-Protocol/ucp/blob/main/MAINTAINERS.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active PRs:&lt;/strong&gt; &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pulls" rel="noopener noreferrer"&gt;github.com/Universal-Commerce-Protocol/ucp/pulls&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll continue monitoring the spec, the TC minutes, and the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;4,500+ merchants&lt;/a&gt; building on the protocol. If any of the Q2 priorities (identity, loyalty, local commerce) ship in spec form, we'll cover them in the &lt;a href="https://ucpchecker.com/blog" rel="noopener noreferrer"&gt;May state-of-commerce report&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Check your store's UCP status at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com&lt;/a&gt;. Browse verified stores at &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCPRegistry.com&lt;/a&gt;. Test agent performance at &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCPPlayground.com&lt;/a&gt;. Read the full protocol stack: &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;MCP vs UCP vs AP2&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>data</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Agentic Commerce Optimization: What 4,491 Merchants Reveal About UCP Readiness</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 23 Apr 2026 11:53:42 +0000</pubDate>
      <link>https://dev.to/benjifisher/agentic-commerce-optimization-what-4491-merchants-reveal-about-ucp-readiness-3fk</link>
      <guid>https://dev.to/benjifisher/agentic-commerce-optimization-what-4491-merchants-reveal-about-ucp-readiness-3fk</guid>
      <description>&lt;h1&gt;
  
  
  Agentic Commerce Optimization: What 4,491 Merchants Reveal About UCP Readiness
&lt;/h1&gt;

&lt;p&gt;Every UCP technical guide tells you how to get UCP ready. We decided to measure who actually is.&lt;/p&gt;

&lt;p&gt;Since UCP launched, UCP Checker has tracked 4,491 merchants, 4,024 of which are verified and actively serving &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;UCP endpoints&lt;/a&gt;. We maintain the largest UCP index of live merchant implementations, and the data tells a story that no theoretical guide can. We've run more than 1,000 agent testing sessions in &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;, consumed 43 million tokens doing it, and watched real AI agents attempt to browse, cart, and buy products across every major ecommerce platform. The result isn't a theoretical framework for agentic commerce optimization. It's a field report.&lt;/p&gt;

&lt;p&gt;And the field looks very different from what the guides tell you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Agentic Commerce Optimization" Actually Means When You Have Data
&lt;/h2&gt;

&lt;p&gt;The term "agentic commerce optimization" — or ACO — has entered the SEO lexicon as a catch-all for making your store ready for &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;AI-powered shopping agents&lt;/a&gt;. Most of the early writing treats it like a checklist: add Schema.org markup, update your Merchant Center feed, structure your product data. That advice isn't wrong. It's just incomplete, because it's built on assumptions about how agents will behave rather than observations of &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;how they actually do&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;ACO, measured empirically, is the practice of optimizing your ecommerce stack for the specific patterns that AI agents exhibit when they interact with &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;UCP endpoints&lt;/a&gt;. Those patterns are surprising. Agents don't browse the way humans do. They don't use carts the way humans do. And the failure modes that block them from completing purchases are not the ones you'd predict from reading the spec alone.&lt;/p&gt;

&lt;p&gt;The data we've collected across 4,024 verified UCP merchants tells a concrete story about what matters, what doesn't, and where the real optimization opportunities are hiding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjw4qi18947rhvgq1up0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffjw4qi18947rhvgq1up0.webp" alt="UCP Stack Layers — capability adoption across verified merchants" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real State of UCP Readiness
&lt;/h2&gt;

&lt;p&gt;Let's start with what's working. Of the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;4,024 verified merchants&lt;/a&gt; in &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; — the open UCP directory where agents discover merchants — capability adoption breaks down like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/checkout" rel="noopener noreferrer"&gt;Checkout&lt;/a&gt;:&lt;/strong&gt; 4,003 merchants (99.5%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/cart" rel="noopener noreferrer"&gt;Cart&lt;/a&gt;:&lt;/strong&gt; 3,987 merchants (99.1%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product discovery:&lt;/strong&gt; Near-universal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/identity-linking" rel="noopener noreferrer"&gt;Identity&lt;/a&gt;:&lt;/strong&gt; 3 merchants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/payment" rel="noopener noreferrer"&gt;Payment&lt;/a&gt;:&lt;/strong&gt; 0 merchants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read those last two numbers again. Three merchants support identity. Zero support native payment. This is the defining feature of UCP's current state: the bottom of the funnel is wide open, but the capabilities that would make agentic commerce truly autonomous — knowing who the customer is and processing payment without a handoff — are functionally nonexistent.&lt;/p&gt;

&lt;p&gt;The spec migration numbers are more encouraging. When the &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 specification&lt;/a&gt; dropped, 3,994 out of 4,022 tracked merchants had migrated within four days. That's a &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;99.3% adoption rate&lt;/a&gt; in under a week, which speaks to the platform-driven nature of UCP rollout. Most merchants aren't manually implementing UCP. Their platform is doing it for them, and the platforms shipped the update fast.&lt;/p&gt;
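&lt;p&gt;For anyone who wants to reproduce the percentages above from the raw counts, here is a minimal Python sketch. The counts are the ones quoted in this post; the &lt;code&gt;share&lt;/code&gt; helper and the dictionary layout are illustrative names, not part of any UCP tooling.&lt;/p&gt;

```python
# Counts quoted in this post; the layout of this dictionary is illustrative.
VERIFIED_MERCHANTS = 4024

capability_counts = {
    "checkout": 4003,
    "cart": 3987,
    "identity": 3,
    "payment": 0,
}


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in capability_counts.items():
    print(f"{name}: {share(count, VERIFIED_MERCHANTS)}%")

# Spec migration: 3,994 of 4,022 merchants moved to v2026-04-08 in four days.
print(f"migrated: {share(3994, 4022)}%")
```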

&lt;h2&gt;
  
  
  Platform-by-Platform Reality
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2ok71gdkl7riwv3935o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2ok71gdkl7riwv3935o.webp" alt="UCP Transport Comparison — REST vs MCP vs Embedded by platform" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The theoretical guides will tell you that UCP readiness is about your structured data and feed configuration. In practice, it's mostly about &lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;which platform you're on&lt;/a&gt;. Here's what we've seen across the major players.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;: The Default Winner
&lt;/h3&gt;

&lt;p&gt;Shopify accounts for roughly 74% of the merchants in our dataset whose platform we could identify (898 stores). This dominance isn't because Shopify merchants are more proactive about UCP; it's because Shopify rolled out UCP support at the platform level, giving every store baseline compliance automatically.&lt;/p&gt;

&lt;p&gt;Out of the box, a Shopify store gets functional product discovery, cart, and checkout endpoints. The Schema.org markup is handled. The Merchant Center feed attributes are populated. For the average merchant, getting UCP ready on Shopify means verifying that your product data is clean rather than building anything from scratch.&lt;/p&gt;

&lt;p&gt;The downside: Shopify's one-size-fits-all approach means limited customization of UCP behavior. If you need to implement conversational commerce attributes like substitution logic or compatibility data, you're working within Shopify's constraints. But for baseline agentic commerce readiness, nothing else comes close to the out-of-the-box experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;: Flexible but Inconsistent
&lt;/h3&gt;

&lt;p&gt;WooCommerce stores show the widest variance in UCP readiness. The open-source model means implementation quality depends entirely on which plugins a merchant has installed and how they've configured their stack. We've seen WooCommerce stores with excellent structured data and smooth agent interactions right next to stores where basic product attributes are missing or malformed.&lt;/p&gt;

&lt;p&gt;The flexibility is a genuine advantage for merchants who want to implement advanced ACO features — conversational attributes, detailed return policies, rich product relationships. But the inconsistency is a problem for agents, which need predictable data structures to operate reliably. If you're on WooCommerce and serious about agentic commerce optimization, an audit of your specific UCP endpoint output is essential, not optional. Run your store through &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCP Checker&lt;/a&gt; and see what an agent actually encounters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;: Strong APIs, Broken Images
&lt;/h3&gt;

&lt;p&gt;BigCommerce has a genuine technical advantage in its API architecture. The platform's API-first design translates well to UCP's endpoint model, and the stores we've tracked generally produce clean, well-structured UCP responses.&lt;/p&gt;

&lt;p&gt;But there's a specific, persistent issue: BigCommerce's S3-hosted image URLs break agent image parsing. This is a real failure mode we've observed in Playground sessions. When an agent can't parse product images, it loses a significant input signal for product matching and variant selection. For a platform that otherwise has strong UCP fundamentals, this is an unfortunate gap — and one that BigCommerce merchants should pressure their platform to fix. For now, it's worth investigating whether your image delivery pipeline produces URLs that agents can reliably consume. Our &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce guide&lt;/a&gt; walks through the specifics.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt; (Adobe Commerce): Enterprise Muscle, Enterprise Complexity
&lt;/h3&gt;

&lt;p&gt;Magento implementations tend to be enterprise-grade, which means the UCP output is thorough but the setup complexity is high. These stores generally have rich product data, detailed catalog structures, and the kind of attribute depth that agents love. But the implementation burden falls more heavily on the merchant's development team compared to Shopify or BigCommerce, where the platform handles the heavy lifting.&lt;/p&gt;

&lt;p&gt;If you're on Magento and aren't UCP ready yet, expect a meaningful engineering investment. If you have started, you're probably in good shape — the platform's data model maps well to what UCP expects, especially for multi-variant products and complex catalog hierarchies. See our &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento guide&lt;/a&gt; for implementation specifics.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Agents Actually Do (vs. What Guides Tell You to Optimize For)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo5hqo00o5k38qggwxbd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foo5hqo00o5k38qggwxbd.webp" alt="Agent Shopping Flow — MCP tool call sequence" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's where our data diverges most sharply from the advisory content circulating about UCP preparation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents Skip the Cart
&lt;/h3&gt;

&lt;p&gt;The conventional model of ecommerce (browse, add to cart, review cart, checkout) doesn't describe how AI agents behave. In our Playground data, we've recorded 395 checkout operations versus just 104 cart operations. Agents go direct to checkout nearly four times as often as they use the cart.&lt;/p&gt;

&lt;p&gt;This has major implications for agentic commerce optimization. If you've invested heavily in cart-level features (upsells, cross-sells, minimum order messaging, cart-based promotions), agents are likely bypassing all of it. The checkout endpoint is where the action happens, so weight your optimization effort accordingly: make sure checkout handles single-product and multi-product flows cleanly, with clear variant specification and unambiguous pricing. To see where you stand, &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;compare your store against competitors&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Variant Mismatches Are the Top Failure Mode
&lt;/h3&gt;

&lt;p&gt;Cart variant mismatches remain the most common reason agent sessions fail to complete a purchase. An agent selects a product, identifies the desired variant (size, color, configuration), and submits a cart or checkout request with a variant ID that doesn't match what the endpoint expects. The session stalls or errors out.&lt;/p&gt;

&lt;p&gt;This isn't an agent intelligence problem — it's a data clarity problem. Stores with clean, unambiguous variant structures and consistent ID schemes see dramatically higher agent completion rates. Stores with complex variant matrices, inconsistent naming, or variant IDs that change between API responses create confusion that even the best models struggle to resolve.&lt;/p&gt;

&lt;p&gt;If you do one thing for ACO today: audit your variant data. Make sure every variant has a stable identifier, a clear human-readable name, and consistent representation across your discovery and checkout endpoints.&lt;/p&gt;
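&lt;p&gt;A variant audit of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already pulled variant records from your discovery and checkout responses; the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;title&lt;/code&gt; field names and the sample records are hypothetical and are not taken from the UCP spec.&lt;/p&gt;

```python
def audit_variants(discovery_variants, checkout_variants):
    """Compare variant records exposed by two endpoints.

    Each argument is a list of dicts with hypothetical "id" and "title" keys.
    Returns IDs missing on either side, IDs whose titles differ, and IDs
    that have no human-readable title in the discovery data.
    """
    discovery = {v["id"]: v.get("title", "") for v in discovery_variants}
    checkout = {v["id"]: v.get("title", "") for v in checkout_variants}

    shared_ids = set(discovery).intersection(checkout)
    return {
        "missing_in_checkout": sorted(set(discovery) - set(checkout)),
        "missing_in_discovery": sorted(set(checkout) - set(discovery)),
        "title_mismatches": sorted(
            vid for vid in shared_ids if discovery[vid] != checkout[vid]
        ),
        "unnamed": sorted(
            vid for vid, title in discovery.items() if not title.strip()
        ),
    }


# Hypothetical sample data, for illustration only.
discovery_sample = [
    {"id": "v-red-m", "title": "Red / M"},
    {"id": "v-red-l", "title": "Red / L"},
    {"id": "v-blue-m", "title": ""},
]
checkout_sample = [
    {"id": "v-red-m", "title": "Red / Medium"},
    {"id": "v-blue-m", "title": ""},
    {"id": "v-green-s", "title": "Green / S"},
]

report = audit_variants(discovery_sample, checkout_sample)
for issue, variant_ids in report.items():
    print(issue, variant_ids)
```

&lt;p&gt;Any non-empty list in the report is a place where an agent could pick a variant in discovery that checkout then fails to recognise.&lt;/p&gt;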

&lt;h3&gt;
  
  
  Token Consumption Tells You Where Agents Struggle
&lt;/h3&gt;

&lt;p&gt;We've consumed 43 million tokens across more than 1,000 Playground sessions. The per-session cost varies dramatically based on store complexity and model choice, but a telling pattern emerges in checkout flows: completing a purchase takes approximately 55,000 tokens with the best-performing models.&lt;/p&gt;

&lt;p&gt;That number is a proxy for friction. A 55K-token checkout means the agent is making multiple round-trips, parsing product data, resolving variants, handling errors, and re-trying. Stores that produce clean, predictable UCP responses see lower token counts — which directly translates to faster agent interactions and lower cost for the platforms running these agents at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Performance Varies Significantly
&lt;/h3&gt;

&lt;p&gt;Not all AI models handle UCP interactions equally. Claude Sonnet 4.5 leads our &lt;a href="https://ucpplayground.com/leaderboard" rel="noopener noreferrer"&gt;Playground leaderboard&lt;/a&gt; with 205 sessions, and the checkout completion rate across all sessions sits at 41%. That might sound low, but consider what it represents: four out of ten fully autonomous purchase attempts succeed end-to-end, without any human intervention, across a diverse set of merchants with varying UCP implementation quality.&lt;/p&gt;

&lt;p&gt;The model performance gap matters for merchants because it signals where your UCP implementation has rough edges. If top-tier models struggle with your checkout flow, every agent will struggle. Testing your store in &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; with multiple models gives you a direct read on where your implementation creates unnecessary friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Capabilities Gap That Will Define Winners
&lt;/h2&gt;

&lt;p&gt;Go back to those adoption numbers: identity at 3 merchants, payment at 0. These aren't just gaps — they're the entire frontier of competitive differentiation in agentic commerce.&lt;/p&gt;

&lt;p&gt;Right now, every UCP checkout ends with a handoff. The agent gets the customer to the point of purchase, then drops them into a traditional checkout flow to enter their identity and payment information. That handoff is where conversion dies. Every redirect, every form field, every authentication step is a chance for the customer to abandon.&lt;/p&gt;

&lt;p&gt;The merchants who figure out identity and payment first — who let an agent complete a purchase end-to-end without a handoff — will have a structural conversion advantage that no amount of Schema.org optimization can match. This is where UCP's roadmap points: loyalty integration, post-purchase management, multi-vertical capabilities. But the foundation is identity and payment.&lt;/p&gt;

&lt;p&gt;We don't yet know what the winning implementation pattern looks like for these capabilities. The spec supports them, but the ecosystem hasn't built them. This is the space to watch, and the space where early investment will pay disproportionate returns.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Optimization Checklist Grounded in Data
&lt;/h2&gt;

&lt;p&gt;Most ACO checklists are derived from the spec. This one is derived from watching more than 1,000 agent sessions succeed and fail across 4,024 merchants. Here's what actually moves the needle, ranked by observed impact:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Fix your variant data first.&lt;/strong&gt; Stable IDs, clear names, consistent representation across endpoints. This is the single highest-impact fix based on our failure-mode analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Optimize for direct-to-checkout flows.&lt;/strong&gt; Agents skip the cart. Make sure your checkout endpoint handles product selection, variant specification, and pricing in a single clean interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Audit your product images.&lt;/strong&gt; If you're on BigCommerce or any platform using CDN-hosted images with complex URL structures, verify that agents can parse your image URLs. Broken image parsing degrades product matching accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Migrate to the latest spec version immediately.&lt;/strong&gt; The &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 migration&lt;/a&gt; happened in four days across the ecosystem. If you're still on an older version, you're already behind 99.3% of verified merchants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Test with actual agents, not just validators.&lt;/strong&gt; Schema validation tells you if your markup is syntactically correct. It tells you nothing about whether an agent can actually complete a purchase. Run your store through &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; and watch what happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Validate your full UCP endpoint output.&lt;/strong&gt; Use &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCP Checker&lt;/a&gt; to see exactly what your store exposes to agents (capabilities, product data, structured attributes) and where the gaps are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Clean up your Merchant Center feed.&lt;/strong&gt; Return policies, product identifiers, and the native commerce attributes that feed into UCP discovery. This is table-stakes, but our data confirms that stores with complete feed data see higher agent engagement in discovery flows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Start thinking about identity and payment.&lt;/strong&gt; You won't implement these today; almost nobody has. But understanding the spec's identity and payment capabilities now positions you to move fast when the ecosystem catches up. Our &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;April ecosystem report&lt;/a&gt; tracks adoption monthly. The jump from 0 to first-mover will be worth more than incremental improvements to discovery or checkout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Monitor your platform's UCP updates.&lt;/strong&gt; If you're on Shopify, WooCommerce, BigCommerce, or Magento, your platform is doing most of the UCP work. Stay current with their releases — &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up domain alerts&lt;/a&gt; to get notified when your store's status changes. Platform-level updates drove 99.3% spec migration in four days — the single most effective "optimization" most merchants can do is simply keeping their platform current.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10. Get listed in the UCP directory.&lt;/strong&gt; &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; is the open UCP index where agents discover merchants. Your listing is what agents see when deciding which merchants to route a customer to. Make sure you're listed, your data is accurate, and your capabilities are competitive with peers in your vertical.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Agentic commerce optimization isn't a theoretical exercise anymore. UCP ecommerce is live, it's measurable, and it's growing fast. Our UCP index tracks &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;4,024 verified merchants&lt;/a&gt; serving &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;UCP endpoints&lt;/a&gt; today. AI agents are completing purchases 41% of the time. The gap between being UCP ready and being UCP optimized is measurable in variant data quality, checkout flow design, and capabilities adoption.&lt;/p&gt;

&lt;p&gt;The merchants who treat ACO as a data problem — not just a markup problem — are the ones who'll convert when agents come shopping. And agents are already shopping. We've got 43 million tokens of proof.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Check if your store is UCP ready at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com&lt;/a&gt;. Browse the UCP directory at &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCPRegistry&lt;/a&gt;. Test agent interactions in &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCPPlayground&lt;/a&gt;. Platform-specific implementation guides: &lt;a href="https://ucpchecker.com/blog/shopify-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>ai</category>
      <category>data</category>
    </item>
    <item>
      <title>The State of Agentic Commerce — April 2026</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Sat, 18 Apr 2026 09:48:53 +0000</pubDate>
      <link>https://dev.to/benjifisher/the-state-of-agentic-commerce-april-2026-l93</link>
      <guid>https://dev.to/benjifisher/the-state-of-agentic-commerce-april-2026-l93</guid>
      <description>&lt;p&gt;In &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-march-2026" rel="noopener noreferrer"&gt;March&lt;/a&gt;, we crossed 3,000 verified stores and started seeing the first non-Shopify platforms in the directory. We said the next question was whether UCP would remain a Shopify story or become a real multi-platform standard.&lt;/p&gt;

&lt;p&gt;April answered that. We crossed &lt;strong&gt;4,000 verified stores&lt;/strong&gt;, Shopify migrated its entire fleet to the new v2026-04-08 spec in a four-day window, BigCommerce entered the directory with its first three stores, and WooCommerce and Magento integrations started appearing from independent developers. The ecosystem grew 33% in one month while simultaneously upgrading the protocol underneath.&lt;/p&gt;

&lt;p&gt;This is the third monthly state-of-the-ecosystem report from UCP Checker. Here's what the data says.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;As of April 17, 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4,014&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;verified UCP stores&lt;/a&gt; (up from ~3,000 in March, +33%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4,481&lt;/strong&gt; total domains tracked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;47,154&lt;/strong&gt; total checks run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,436&lt;/strong&gt; new merchants discovered this month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;866&lt;/strong&gt; new merchants this week alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3,988&lt;/strong&gt; stores on the latest &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt; (99.4%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The growth curve is worth examining. February was discovery: we scanned our first thousand Shopify stores and found UCP everywhere on the platform. March was expansion: we broadened the crawler, crossed 3,000, and started seeing non-Shopify manifests for the first time. April is consolidation: the store count grew 33%, but the more significant movement was the spec migration and the first signs of platform diversification.&lt;/p&gt;

&lt;p&gt;The weekly run rate matters here. At 866 new merchants discovered this week alone, the ecosystem is adding roughly 125 stores per day. But the growth isn't organic in the way a consumer product grows — it comes in waves, driven by platform-level deployments. When Shopify flips a switch, hundreds of stores appear overnight. When BigCommerce ships UCP, three appear. The question for May isn't "how many stores" but "which platforms ship next" — because each platform deployment is a step function, not a slope.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shopify spec migration
&lt;/h2&gt;

&lt;p&gt;This is the story of the month. Between April 13 and April 17, Shopify migrated nearly its entire UCP fleet from v2026-01-23 to &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On April 13, our crawler showed &lt;strong&gt;2 stores&lt;/strong&gt; on the new spec. By April 17: &lt;strong&gt;3,988&lt;/strong&gt;. That's 3,986 stores upgraded in roughly four days — a coordinated platform-level migration, not individual merchants updating their manifests.&lt;/p&gt;

&lt;p&gt;The v2026-04-08 spec introduced three breaking changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;signing_keys&lt;/code&gt; moved from nested to root level.&lt;/strong&gt; Previously at &lt;code&gt;ucp.signing_keys&lt;/code&gt;, now at the document root alongside &lt;code&gt;ucp&lt;/code&gt;. This is the structural change that required a manifest rewrite, not just a version bump.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business profile distinction.&lt;/strong&gt; The spec now formally separates business profiles (individual store manifests at &lt;code&gt;/.well-known/ucp&lt;/code&gt;) from platform profiles, with different requirements for &lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;schema&lt;/code&gt; fields on services and capabilities. Business profiles are lighter — &lt;code&gt;spec&lt;/code&gt; and &lt;code&gt;schema&lt;/code&gt; are optional.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;a2a&lt;/code&gt; transport formally added.&lt;/strong&gt; Google's Agent2Agent Protocol is now a recognised transport alongside REST, MCP, and Embedded, though adoption is effectively zero in the wild.&lt;/li&gt;
&lt;/ol&gt;
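&lt;p&gt;The first change in the list above, moving &lt;code&gt;signing_keys&lt;/code&gt; from &lt;code&gt;ucp.signing_keys&lt;/code&gt; to the document root, can be sketched as a small Python transformation on a parsed manifest. This is a rough illustration rather than an official migration tool: the sample manifest is hypothetical, its &lt;code&gt;version&lt;/code&gt; and &lt;code&gt;kid&lt;/code&gt; fields are placeholders, and the function does not touch the version string or anything else the new spec may require.&lt;/p&gt;

```python
import copy
import json


def migrate_signing_keys(manifest):
    """Return a copy of manifest with signing_keys at the document root.

    Moves manifest["ucp"]["signing_keys"] (the older, nested location) to
    manifest["signing_keys"]. If a root-level signing_keys already exists,
    nothing is moved. The input dict is never modified.
    """
    migrated = copy.deepcopy(manifest)
    ucp_section = migrated.get("ucp", {})
    if "signing_keys" in ucp_section and "signing_keys" not in migrated:
        migrated["signing_keys"] = ucp_section.pop("signing_keys")
    return migrated


# Hypothetical manifest in the older layout; field values are placeholders.
old_manifest = {
    "ucp": {
        "version": "2026-01-23",
        "signing_keys": [{"kid": "example-key"}],
    }
}

new_manifest = migrate_signing_keys(old_manifest)
print(json.dumps(new_manifest, indent=2))
```

&lt;p&gt;Running the function on an already migrated manifest returns an equal copy, so it is safe to apply more than once.&lt;/p&gt;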

&lt;p&gt;The migration means &lt;strong&gt;99.4% of the verified directory is now on the latest spec&lt;/strong&gt;. Only 26 stores remain on older versions: 19 on v2026-01-11, 6 on v2026-01-23, and 1 on v2026-01-14. These are almost entirely non-Shopify stores that need to upgrade manually.&lt;/p&gt;

&lt;p&gt;For the full spec breakdown, see our &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 spec announcement&lt;/a&gt; and the &lt;a href="https://ucpchecker.com/specs" rel="noopener noreferrer"&gt;spec versions page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Shopify: platform diversification accelerates
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; still dominates at 3,982 of 4,014 verified stores (99.2%). But the other 32 verified stores tell a more interesting story — these are developers who chose to publish a UCP manifest without a platform-level integration doing it for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; entered the directory&lt;/strong&gt; with its first three verified stores: &lt;a href="https://ucpchecker.com/status/untilgone.com" rel="noopener noreferrer"&gt;untilgone.com&lt;/a&gt;, &lt;a href="https://ucpchecker.com/status/touchupdirect.com" rel="noopener noreferrer"&gt;touchupdirect.com&lt;/a&gt;, and &lt;a href="https://ucpchecker.com/status/midwoodflowershop.com" rel="noopener noreferrer"&gt;midwoodflowershop.com&lt;/a&gt;. All three are on v2026-04-08 with checkout and cart capabilities declared. Notably, their average manifest latency (~890ms) is roughly seven times Shopify's (~130ms), because BigCommerce manifests are served from the storefront origin rather than a CDN-cached endpoint. Platform-level latency differences like this will matter as agent response budgets tighten.&lt;/p&gt;
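
&lt;p&gt;For readers who want to spot-check manifest latency on their own store, here is a minimal sketch using only the Python standard library. It is not how the UCP Checker crawler measures latency. The &lt;code&gt;fetch_manifest&lt;/code&gt; and &lt;code&gt;median_latency_ms&lt;/code&gt; helpers, the five-second timeout, and the choice of the median are all assumptions made for this example.&lt;/p&gt;

```python
import statistics
import time
import urllib.request


def fetch_manifest(domain, timeout=5.0):
    """Fetch the raw manifest bytes from the well-known path of a domain."""
    url = "https://" + domain + "/.well-known/ucp"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def median_latency_ms(fetch, repeats=5):
    """Call fetch() several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Example (performs real network requests, so it is left commented out):
# print(median_latency_ms(lambda: fetch_manifest("example.com")))
```

&lt;p&gt;Passing the fetch step in as a callable keeps the timing loop separate from the network call, so the same loop can time a cached and an uncached endpoint side by side.&lt;/p&gt;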

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;&lt;/strong&gt; now has 3 verified stores, up from zero in March. These are hand-built integrations — WooCommerce doesn't have native UCP support, so each merchant published their manifest manually. We fixed a &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;validation bug&lt;/a&gt; this month that was incorrectly rejecting WooCommerce manifests with &lt;code&gt;payment_handlers: []&lt;/code&gt; (valid for stores using checkout-link redirect flows).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;&lt;/strong&gt; has 1 verified store. &lt;strong&gt;Custom/headless&lt;/strong&gt; stacks account for 25 verified stores — the most architecturally diverse group, including our own &lt;a href="https://ucpchecker.com/status/ucpchecker.com" rel="noopener noreferrer"&gt;ucpchecker.com&lt;/a&gt; manifest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Salesforce Commerce Cloud&lt;/strong&gt; has zero verified stores in the directory today. But industry signals suggest SFCC is exploring UCP support at the platform level — not as a one-off client integration, but as a feature that would ship to all Commerce Cloud merchants. If it follows the Shopify pattern — a single platform-level deployment bringing thousands of enterprise storefronts (Puma, Ralph Lauren, Under Armour, Adidas) into the ecosystem in one wave — the directory composition would shift significantly. SFCC is natively REST-based, so a REST-first UCP transport would be the natural fit, compared to Shopify's MCP-first approach. We're watching this closely.&lt;/p&gt;

&lt;p&gt;The full platform breakdown is live on our new &lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;/platforms&lt;/a&gt; page.&lt;/p&gt;

&lt;h2&gt;
  
  
  How agents actually perform
&lt;/h2&gt;

&lt;p&gt;The numbers above tell you which stores &lt;em&gt;have&lt;/em&gt; UCP. This section tells you which stores &lt;em&gt;work&lt;/em&gt; when an AI agent actually tries to shop them — and which models do it best.&lt;/p&gt;

&lt;h3&gt;
  
  
  Store benchmarks
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;Playground benchmarks&lt;/a&gt; grade stores A through F on end-to-end agent shopping performance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Grade&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Agent completes the full flow flawlessly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;422&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Works with minor issues — the largest cohort&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;222&lt;/td&gt;
&lt;td&gt;Cart succeeds, checkout has friction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C+ / C&lt;/td&gt;
&lt;td&gt;225&lt;/td&gt;
&lt;td&gt;Discovery and browse work, deeper flow breaks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Significant failures across the flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;289&lt;/td&gt;
&lt;td&gt;Manifest validates but the agent can't complete any step&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The B+ tier at 422 stores is the most important number here. These stores are &lt;em&gt;close&lt;/em&gt;: an agent can reliably discover, search, and cart them, but checkout friction (slow responses, variant mismatches, payment handler quirks) sometimes stops the flow short. The path from B+ to A is usually a single fix. The 289 F-grade stores are the other end: technically verified but functionally broken when an agent actually tries to shop them.&lt;/p&gt;
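
&lt;p&gt;To put the grade counts in proportion, the short calculation below sums the table and expresses each grade as a share of graded stores. It assumes the six rows cover every benchmarked store. The counts add up to fewer than the 4,014 verified stores, which suggests that not every verified store has a benchmark grade yet.&lt;/p&gt;

```python
# Grade counts copied from the table above.
grade_counts = {"A": 9, "B+": 422, "B": 222, "C+ / C": 225, "D": 16, "F": 289}

total_graded = sum(grade_counts.values())

# Share of each grade among graded stores, as a percentage rounded to 1 place.
shares = {
    grade: round(100.0 * count / total_graded, 1)
    for grade, count in grade_counts.items()
}

print(total_graded)
for grade, share in shares.items():
    print(grade, share)
```

&lt;p&gt;On these numbers the graded total is 1,183 stores, with B+ at about 35.7% of them and F at about 24.4%.&lt;/p&gt;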

&lt;h3&gt;
  
  
  Model leaderboard
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; now supports &lt;strong&gt;15 frontier LLMs&lt;/strong&gt; from 7 vendors, tested against &lt;strong&gt;76 unique stores&lt;/strong&gt;, generating over &lt;strong&gt;$114,000 in aggregate cart value&lt;/strong&gt;. The &lt;a href="https://ucpplayground.com/leaderboard" rel="noopener noreferrer"&gt;model leaderboard&lt;/a&gt; scores every model on search, cart completion, and checkout conversion:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Shopping Score&lt;/th&gt;
&lt;th&gt;Checkout %&lt;/th&gt;
&lt;th&gt;Search %&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;63&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53.1%&lt;/td&gt;
&lt;td&gt;85.7%&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;59&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;51.4%&lt;/td&gt;
&lt;td&gt;90.3%&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;59&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;42.0%&lt;/td&gt;
&lt;td&gt;92.0%&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;41.9%&lt;/td&gt;
&lt;td&gt;80.0%&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;54.6%&lt;/td&gt;
&lt;td&gt;86.8%&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
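
&lt;p&gt;The composite score does not simply track any single column, which is easy to check from the five rows shown. The snippet below picks the leader per column. It uses only the values in the table, and the &lt;code&gt;top_by&lt;/code&gt; helper is introduced here for illustration. How the Playground weights the composite score is not covered in this post, so no weighting is assumed.&lt;/p&gt;

```python
# Rows copied from the leaderboard table above:
# (model, shopping score, checkout %, search %)
rows = [
    ("DeepSeek V3.2", 63, 53.1, 85.7),
    ("Gemini 3 Flash", 59, 51.4, 90.3),
    ("Grok 4", 59, 42.0, 92.0),
    ("Claude Opus 4.6", 52, 41.9, 80.0),
    ("Claude Sonnet 4.5", 50, 54.6, 86.8),
]


def top_by(column_index):
    """Return the model name with the highest value in the given column."""
    return max(rows, key=lambda row: row[column_index])[0]


print("Best shopping score:", top_by(1))
print("Best checkout rate:", top_by(2))
print("Best search rate:", top_by(3))
```

&lt;p&gt;Among these five rows, three different models lead the three columns: DeepSeek V3.2 on shopping score, Claude Sonnet 4.5 on checkout rate, and Grok 4 on search rate.&lt;/p&gt;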

&lt;p&gt;And the speed rankings — because latency is the other dimension that matters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Avg Session&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;~12s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;~14s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;~17s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;~31s&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4&lt;/td&gt;
&lt;td&gt;~76s&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Three takeaways
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek V3.2 leads the leaderboard.&lt;/strong&gt; An open-weight model tops the composite shopping score at 63 — ahead of every Anthropic, Google, and OpenAI model. The agentic commerce stack is genuinely model-agnostic in practice, not just in spec language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search works everywhere. Checkout is the bottleneck.&lt;/strong&gt; Every model scores above 70% on product search. But checkout conversion ranges from 0% to 56% depending on the model. The gap between "can find products" and "can actually buy them" is the reliability frontier for the ecosystem. This is where the work is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning models underperform.&lt;/strong&gt; QwQ 32B (0% checkout), o4-mini (16.7%), Grok 3 Mini (13.3%), and DeepSeek R1 (21.4%) all score below 40. Models optimised for chain-of-thought reasoning burn tokens on deliberation and struggle to execute the simple, sequential tool-call patterns shopping requires. The best shopping agents are fast and decisive, not thoughtful.&lt;/p&gt;

&lt;p&gt;Full model profiles are on the &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;Playground models page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reliability gap: verified is not ready
&lt;/h2&gt;

&lt;p&gt;This is the editorial point we want to make clearly, because the headline number (4,014 verified stores) obscures the more important one: &lt;strong&gt;9 stores score A&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Four thousand stores have valid UCP manifests. Nine of them deliver a flawless end-to-end agent shopping experience. That's 9 of 4,014, a flawless rate of about 0.2%. The gap between "technically verified" and "actually shoppable by an AI agent without friction" is the central infrastructure problem for agentic commerce in 2026.&lt;/p&gt;

&lt;p&gt;The B+ tier — 422 stores — is where the leverage is. These stores work &lt;em&gt;most&lt;/em&gt; of the time. An agent can discover them, search their catalog, build a cart, and usually reach a checkout URL. But "usually" isn't good enough when the agent is spending someone's money. The failures at B+ level are specific and fixable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cart variant mismatches&lt;/strong&gt; — the agent selects a size/colour variant that doesn't match the store's internal variant ID scheme. The cart call succeeds but adds the wrong item.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payment handler timeouts&lt;/strong&gt; — the tokenization step takes longer than the agent's timeout window, and the session drops silently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stale product data&lt;/strong&gt; — the catalog returns products that are out of stock by the time the agent tries to cart them. No error — just an empty cart.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkout redirect loops&lt;/strong&gt; — the checkout URL the store returns sends the agent into an authentication loop that a human browser would handle with cookies but an MCP client can't.&lt;/li&gt;
&lt;/ul&gt;
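
&lt;p&gt;Two of the failure modes above, the wrong variant landing in the cart and the silently empty cart, can be caught by reading the cart back after every add. The sketch below shows one way to do that. The response shape (&lt;code&gt;lines&lt;/code&gt;, &lt;code&gt;variant_id&lt;/code&gt;, &lt;code&gt;quantity&lt;/code&gt;) and the &lt;code&gt;check_cart_add&lt;/code&gt; helper are hypothetical, so adapt the field names to whatever your store actually returns.&lt;/p&gt;

```python
def check_cart_add(requested_variant_id, cart):
    """Return a list of problems found after an add-to-cart call.

    `cart` is assumed to look like {"lines": [{"variant_id": ..., "quantity": ...}]}.
    """
    lines = cart.get("lines", [])
    if not lines:
        # Matches the "stale product data" case: no error, just an empty cart.
        return ["empty cart after add"]
    problems = []
    variant_ids = [line.get("variant_id") for line in lines]
    if requested_variant_id not in variant_ids:
        # Matches the "variant mismatch" case: the call succeeded but a
        # different item was added.
        problems.append("requested variant not in cart")
    return problems


# Hypothetical responses used only to exercise the checks.
print(check_cart_add("v-123", {"lines": []}))
print(check_cart_add("v-123", {"lines": [{"variant_id": "v-999", "quantity": 1}]}))
print(check_cart_add("v-123", {"lines": [{"variant_id": "v-123", "quantity": 1}]}))
```

&lt;p&gt;A check like this turns a silent failure into an explicit one, which is the difference between an agent that retries and an agent that checks out the wrong item.&lt;/p&gt;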

&lt;p&gt;Each of these is a single-fix problem for the store operator. But at scale, across 422 stores, the aggregate effect is that agents fail more often than they succeed at the final step. &lt;strong&gt;The ecosystem doesn't need more stores. It needs the stores it has to work more reliably.&lt;/strong&gt; That's the infrastructure investment that will actually unlock agent commerce at scale — and it's where we're focusing our tooling work for May.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability coverage: the ceiling hasn't moved
&lt;/h2&gt;

&lt;p&gt;Across 4,014 verified stores:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;th&gt;Stores&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/capabilities/checkout" rel="noopener noreferrer"&gt;Checkout&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;99.6%&lt;/td&gt;
&lt;td&gt;3,996&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/capabilities/cart" rel="noopener noreferrer"&gt;Cart&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;99.3%&lt;/td&gt;
&lt;td&gt;3,985&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/capabilities/identity-linking" rel="noopener noreferrer"&gt;Identity linking&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;0.07%&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/capabilities/payment" rel="noopener noreferrer"&gt;Payment&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same pattern as March. Checkout and cart are effectively universal because Shopify ships them by default. The advanced capabilities — identity, loyalty, payment — haven't moved. The gap between "technically verified" and "deeply agent-ready" is still the story. Until more stores declare capabilities beyond the Shopify defaults, the ecosystem depth chart stays flat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The broader ecosystem
&lt;/h2&gt;

&lt;p&gt;April was quieter on the announcements front than March — which saw Splitit, PayPal, and Google all making public UCP commitments in a single week. But the signals that matter in April are structural, not press-release-shaped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shopify's fleet-wide spec migration is itself an ecosystem signal.&lt;/strong&gt; It demonstrates that a major platform can coordinate a breaking spec upgrade across thousands of stores in days, not months. Every other platform considering UCP adoption now has a reference point for what a managed migration looks like. The v2026-04-08 changes (&lt;code&gt;signing_keys&lt;/code&gt; relocation, business profile distinction) were non-trivial, and Shopify shipped them to its entire fleet without a single store going offline. That's the kind of platform engineering confidence that accelerates the next platform's decision to build UCP support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The endorsed partner roster continues to grow.&lt;/strong&gt; &lt;a href="https://ucpregistry.com/vendor/adyen" rel="noopener noreferrer"&gt;Adyen&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/american-express" rel="noopener noreferrer"&gt;American Express&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/mastercard" rel="noopener noreferrer"&gt;Mastercard&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/stripe" rel="noopener noreferrer"&gt;Stripe&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/visa" rel="noopener noreferrer"&gt;Visa&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/checkout-com" rel="noopener noreferrer"&gt;Checkout.com&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/affirm" rel="noopener noreferrer"&gt;Affirm&lt;/a&gt;, &lt;a href="https://ucpregistry.com/vendor/splitit" rel="noopener noreferrer"&gt;Splitit&lt;/a&gt;, and &lt;a href="https://ucpregistry.com/vendor/paypal" rel="noopener noreferrer"&gt;PayPal&lt;/a&gt; are all publicly committed to the protocol's payment layer. For any platform evaluating UCP, the payment handler ecosystem is no longer a gap — it's arguably the most mature part of the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model ecosystem is widening faster than the store ecosystem.&lt;/strong&gt; In February, we tested 3 models. In March, 8. In April, 16 — from 7 vendors across the US, China, and Europe. The number of AI models that can speak MCP and execute a UCP shopping flow is growing faster than the number of stores that can serve one. This suggests the bottleneck is shifting from "agents that can shop" to "stores that can be shopped reliably" — which circles back to the reliability gap above.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we shipped
&lt;/h2&gt;

&lt;p&gt;Heavy shipping month on the tooling side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;Side-by-side store comparison&lt;/a&gt;&lt;/strong&gt; — compare any two stores head-to-head on metrics, capabilities, transports, and payment handlers. &lt;a href="https://ucpchecker.com/blog/introducing-side-by-side-ucp-store-compare" rel="noopener noreferrer"&gt;Embeddable via iframe&lt;/a&gt; for blog posts and docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;Platform pages&lt;/a&gt;&lt;/strong&gt; — live landing pages for &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, and &lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;Custom&lt;/a&gt;. Leaderboards, capability coverage, and transport adoption — auto-populates as stores verify.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;/.well-known/ucp developer guide&lt;/a&gt;&lt;/strong&gt; — field reference, minimal examples, publishing guides for Nginx/Cloudflare/Node, the six most common validation mistakes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;Product discovery guide&lt;/a&gt;&lt;/strong&gt; — the MCP tool call sequence agents use to find and buy products. Live demo, discovery-ready stores, three-way CTA to &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;Playground&lt;/a&gt; + &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;Registry&lt;/a&gt; + &lt;a href="https://ucprails.com" rel="noopener noreferrer"&gt;Rails&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;Build an Agent quickstart&lt;/a&gt;&lt;/strong&gt; — from zero to a working agent in 30 minutes. Copy-paste code in Python and TypeScript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec validation fixes&lt;/strong&gt; — accepted the &lt;code&gt;payment.handlers&lt;/code&gt; nested format (WooCommerce), downgraded empty &lt;code&gt;payment_handlers: []&lt;/code&gt; from hard fail to warning, upgraded our own manifest to v2026-04-08.&lt;/li&gt;
&lt;/ul&gt;
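
&lt;p&gt;The validation fixes in the last item can be pictured with a small check like the one below. This is a sketch of the idea, not the UCP Checker validator itself. The two manifest layouts, the message strings, and the choice to treat a missing declaration as an error are assumptions made for the example.&lt;/p&gt;

```python
def check_payment_handlers(manifest):
    """Return (errors, warnings) for the payment handler section.

    Accepts either a root-level "payment_handlers" list or a nested
    {"payment": {"handlers": [...]}} shape. Both layouts are assumptions
    made for this sketch.
    """
    errors, warnings = [], []
    if "payment_handlers" in manifest:
        handlers = manifest["payment_handlers"]
    else:
        payment = manifest.get("payment")
        handlers = payment.get("handlers") if isinstance(payment, dict) else None

    if handlers is None:
        # Assumption: a manifest with no declaration at all still fails.
        errors.append("no payment handlers declared")
    elif not isinstance(handlers, list):
        errors.append("payment handlers must be a list")
    elif not handlers:
        # An empty list is treated as a warning rather than a hard failure.
        warnings.append("payment handlers list is empty")
    return errors, warnings


print(check_payment_handlers({"payment_handlers": []}))
print(check_payment_handlers({"payment": {"handlers": ["example-handler"]}}))
```

&lt;p&gt;Returning errors and warnings separately lets a store with an empty handler list still count as verified while the report flags it for review.&lt;/p&gt;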

&lt;h2&gt;
  
  
  What to watch in May
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Salesforce Commerce Cloud.&lt;/strong&gt; First platform-level deployment from the enterprise tier would be the most significant ecosystem event since Shopify's initial rollout. We'll catch any SFCC store that publishes on the next crawl.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The B+ → A path.&lt;/strong&gt; 422 stores are one fix away from flawless agent shopping. We're building tooling to surface the specific issue per store so operators can action it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-Shopify growth rate.&lt;/strong&gt; 32 non-Shopify stores this month vs ~15 last month. If this doubles again in May, UCP stops being a "Shopify project" and becomes a genuine multi-platform standard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AP2 / A2A adoption.&lt;/strong&gt; Zero stores declare either protocol. The v2026-04-08 spec formally added &lt;code&gt;a2a&lt;/code&gt; as a transport. First adopter will be notable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;All data comes from the UCP Checker crawler, which re-checks every tracked domain at least every 24 hours. The raw verified-merchant dataset is published monthly on &lt;a href="https://huggingface.co/datasets/UCPChecker/ucp-merchants" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; under CC-BY 4.0.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browse the directory:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track adoption live:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;ucpchecker.com/stats&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare two stores:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;ucpchecker.com/compare&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform breakdown:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;ucpchecker.com/platforms&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build your own agent:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;ucpchecker.com/agents&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>data</category>
      <category>ucp</category>
    </item>
  </channel>
</rss>
