<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benji Fisher</title>
    <description>The latest articles on DEV Community by Benji Fisher (@benjifisher).</description>
    <link>https://dev.to/benjifisher</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3787687%2F0c8176d8-b238-43f2-b0af-71689e955123.jpg</url>
      <title>DEV Community: Benji Fisher</title>
      <link>https://dev.to/benjifisher</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benjifisher"/>
    <language>en</language>
    <item>
      <title>Introducing the UCP Playground Extension: An AI Shopping Agent in Your Side Panel</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 28 May 2026 10:01:33 +0000</pubDate>
      <link>https://dev.to/benjifisher/introducing-the-ucp-playground-extension-an-ai-shopping-agent-in-your-side-panel-4964</link>
      <guid>https://dev.to/benjifisher/introducing-the-ucp-playground-extension-an-ai-shopping-agent-in-your-side-panel-4964</guid>
      <description>&lt;p&gt;Our &lt;a href="https://ucpchecker.com/blog/how-a-browser-extension-became-our-biggest-discovery-engine" rel="noopener noreferrer"&gt;first extension&lt;/a&gt; answered one question while you browse: &lt;em&gt;is this store agent-ready?&lt;/em&gt; A green dot in your toolbar, fired by a single probe to &lt;code&gt;/.well-known/ucp&lt;/code&gt;. It quietly became our biggest discovery engine.&lt;/p&gt;

&lt;p&gt;Today we're shipping our second extension — and it answers the obvious follow-up: &lt;strong&gt;okay, so what actually happens when an agent shops here?&lt;/strong&gt; — without making you leave the page.&lt;/p&gt;

&lt;p&gt;Meet the &lt;strong&gt;&lt;a href="https://chromewebstore.google.com/detail/ucp-playground-%E2%80%94-ai-shopp/hlblkioegnephlgkmgdlemlbkaimhdpp" rel="noopener noreferrer"&gt;UCP Playground — AI Shopping Agent&lt;/a&gt;&lt;/strong&gt; extension. Open Chrome's side panel on any UCP-enabled store, pick a model, and an AI agent shops it live — right beside the page. It searches, compares, builds a cart, and reaches checkout, driving the store through its &lt;em&gt;own published UCP tools&lt;/em&gt;. No screen-scraping, no separate tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://chromewebstore.google.com/detail/ucp-playground-%E2%80%94-ai-shopp/hlblkioegnephlgkmgdlemlbkaimhdpp" rel="noopener noreferrer"&gt;Install it free from the Chrome Web Store →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From a green dot to a live agent
&lt;/h2&gt;

&lt;p&gt;We've been building toward this in three steps.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://ucpchecker.com/blog/how-a-browser-extension-became-our-biggest-discovery-engine" rel="noopener noreferrer"&gt;first extension&lt;/a&gt;&lt;/strong&gt; is passive detection: one HTTP request per domain, a badge that tells you whether a store speaks the protocol. &lt;strong&gt;&lt;a href="https://ucpplayground.com/?utm_source=ucpchecker&amp;amp;utm_medium=blog&amp;amp;utm_campaign=playground-extension" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;&lt;/strong&gt; on the web is the runtime layer: it runs live agent shopping sessions against real stores and shows every JSON-RPC message flowing between agent and merchant — now across &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;16 frontier models and dozens of real stores&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The gap was the hop between them. You'd spot a UCP store while browsing, then go to a separate site to watch an agent try it. We even teased closing that gap in the first extension's write-up — &lt;em&gt;"see the green dot, click Open in Playground, and you're watching Claude shop."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This extension closes it. Detection and action now live in the same tab.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A side panel, side by side.&lt;/strong&gt; Click the toolbar icon and Playground opens in Chrome's native side panel, next to the storefront. You see the human page and the agent's run at the same time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat with any UCP store.&lt;/strong&gt; Type what you'd ask a shop assistant — &lt;em&gt;"find a waterproof jacket under £100"&lt;/em&gt; — and the agent executes the store's UCP tools to answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The full tool-call timeline.&lt;/strong&gt; Every step is visible: search, product lookup, cart, checkout — with raw responses, timing, product cards, cart state, and the final checkout URL. It's the same observability the web Playground is known for, docked beside the page.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick your model.&lt;/strong&gt; Run any model exposed by your Playground account, so you can compare how different agents handle the same store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured, not scraped.&lt;/strong&gt; The extension drives the store through its &lt;a href="https://ucpchecker.com/blog/ucp-identity-linking-agentic-commerce" rel="noopener noreferrer"&gt;published MCP tools&lt;/a&gt; — the exact path a real agent takes — so what you see is the &lt;em&gt;protocol&lt;/em&gt; behaving, not a DOM hack that breaks on the next redesign.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8hd152ipyvnnttpcfpb.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8hd152ipyvnnttpcfpb.webp" alt="The UCP Playground side panel running a full agent shopping flow on Everlane — product lookup, add-to-cart, and a generated checkout link, beside the live storefront" width="800" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The side panel drives the store's own UCP tools — get_product_details, update_cart, checkout — beside the page, with timing on every call.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a side panel
&lt;/h2&gt;

&lt;p&gt;Side-by-side is the whole point. When you can see the store's own page and the agent's tool calls together, the gaps jump out: a product the agent can't resolve, an option that doesn't map, a checkout that stalls where the website sails through.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;builders&lt;/strong&gt;, that's a debugging loop measured in seconds — validate product schemas, option resolution, and checkout flows against a live agent without wiring up anything. For &lt;strong&gt;merchants&lt;/strong&gt;, it's the first time you can &lt;em&gt;watch how an agent experiences your store&lt;/em&gt; the way a customer's assistant soon will — the same kind of run that produced &lt;a href="https://ucpchecker.com/blog/first-autonomous-ai-agent-purchase-ucp" rel="noopener noreferrer"&gt;the first fully autonomous UCP purchase&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l7e5vj9fbx40qjk69zp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l7e5vj9fbx40qjk69zp.webp" alt="The side panel handling a multi-item request on Kylie Cosmetics — two search_catalog calls, product cards and shade selection, next to the storefront" width="800" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Multi-item search and option resolution on Kylie Cosmetics, side by side with the store.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/blog/how-a-browser-extension-became-our-biggest-discovery-engine" rel="noopener noreferrer"&gt;first extension&lt;/a&gt; was pure client-side: one &lt;code&gt;fetch()&lt;/code&gt; to &lt;code&gt;/.well-known/ucp&lt;/code&gt;, a badge, done. This one is a thin client in front of the &lt;a href="https://ucpplayground.com/?utm_source=ucpchecker&amp;amp;utm_medium=blog&amp;amp;utm_campaign=playground-extension" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; engine — your browser handles detection and display; the agent itself runs server-side. Four moving parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Detection.&lt;/strong&gt; On navigation, the extension probes the current host's &lt;code&gt;/.well-known/ucp&lt;/code&gt; — the same one-request check the first extension uses, with the host permission scoped to that exact path. It only ever reads the &lt;em&gt;hostname&lt;/em&gt; to know where to look.&lt;/p&gt;
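
&lt;p&gt;As a rough illustration of the detection step, here is a minimal Python sketch (not the extension's actual code). It keeps only the scheme and host of the page URL and builds the &lt;code&gt;/.well-known/ucp&lt;/code&gt; probe address from them. The helper names &lt;code&gt;probe_url&lt;/code&gt; and &lt;code&gt;looks_like_manifest&lt;/code&gt; are hypothetical, and treating the manifest as a JSON object is an assumption.&lt;/p&gt;

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN_PATH = "/.well-known/ucp"  # path named in the post


def probe_url(page_url):
    """Build the probe URL from the page's scheme and host only (hypothetical helper)."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None  # not a web page, e.g. about:blank or chrome://
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    # Path, query string and fragment of the page are deliberately dropped.
    return f"{parts.scheme}://{host}{WELL_KNOWN_PATH}"


def looks_like_manifest(status_code, body_text):
    """Assumption: a UCP-enabled store answers 200 with a JSON object."""
    if status_code != 200:
        return False
    try:
        return isinstance(json.loads(body_text), dict)
    except ValueError:
        return False
```

&lt;p&gt;Whatever page you are on, only the host survives into the probe: &lt;code&gt;probe_url("https://shop.example.com/products/jacket?color=red")&lt;/code&gt; yields &lt;code&gt;https://shop.example.com/.well-known/ucp&lt;/code&gt;.&lt;/p&gt;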

&lt;p&gt;&lt;strong&gt;2. The side panel.&lt;/strong&gt; Clicking the toolbar icon opens Chrome's native side panel (&lt;code&gt;sidepanel.html&lt;/code&gt;) docked beside the page. The &lt;code&gt;tabs&lt;/code&gt; permission lets the panel follow you as you move between stores, so it always knows which store to target — without touching anything on the page itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The agent runs on the Headless API — not in your browser.&lt;/strong&gt; The extension doesn't run a model locally or scrape the DOM. It sends your message and the store's hostname to the &lt;strong&gt;UCP Playground Headless API&lt;/strong&gt;, authenticated with your personal access token. Playground orchestrates the chosen model and executes the store's &lt;em&gt;published MCP tools&lt;/em&gt; server-side — the same engine behind the web Playground and our &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;evals&lt;/a&gt; — then streams the session back to the panel. That's why setup needs a free account and a token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The timeline.&lt;/strong&gt; Every step renders in order — &lt;code&gt;search_catalog&lt;/code&gt;, &lt;code&gt;get_product_details&lt;/code&gt;, &lt;code&gt;update_cart&lt;/code&gt;, &lt;code&gt;complete_checkout&lt;/code&gt; — each with its raw response, latency, product cards, cart state, and the final checkout URL.&lt;/p&gt;

&lt;p&gt;Everything heavy — the model, the tool execution, the session record — lives in Playground. The extension is the lens.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it reads — and what it doesn't
&lt;/h2&gt;

&lt;p&gt;The first extension made a point of its ultra-narrow permissions; we hold the same line here, with one honest difference — a side panel that follows you across stores needs &lt;code&gt;tabs&lt;/code&gt; to know which store you're on.&lt;/p&gt;

&lt;p&gt;Here's the full ledger: &lt;code&gt;storage&lt;/code&gt; (your token and settings), &lt;code&gt;activeTab&lt;/code&gt; + &lt;code&gt;tabs&lt;/code&gt; (the current store's hostname), &lt;code&gt;sidePanel&lt;/code&gt; (the panel itself), and host access scoped to &lt;code&gt;/.well-known/ucp&lt;/code&gt; plus &lt;code&gt;ucpplayground.com&lt;/code&gt;. What it never requests: content-script access to store pages, your cookies, your history, or &lt;code&gt;&amp;lt;all_urls&amp;gt;&lt;/code&gt;. It reads &lt;em&gt;which&lt;/em&gt; store you're looking at, not &lt;em&gt;what's on the page&lt;/em&gt; — and that hostname is the only thing about your browsing that ever leaves the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up in under a minute
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://chromewebstore.google.com/detail/ucp-playground-%E2%80%94-ai-shopp/hlblkioegnephlgkmgdlemlbkaimhdpp" rel="noopener noreferrer"&gt;Install from the Chrome Web Store&lt;/a&gt;&lt;/strong&gt; (Chrome, Edge, Brave — any Chromium browser).&lt;/li&gt;
&lt;li&gt;Create a free account at &lt;a href="https://ucpplayground.com/?utm_source=ucpchecker&amp;amp;utm_medium=blog&amp;amp;utm_campaign=playground-extension" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Generate a personal access token in settings and paste it into the extension options.&lt;/li&gt;
&lt;li&gt;Browse. The extension auto-detects UCP-enabled stores; click the toolbar icon to open the side panel and start shopping.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where it fits in the stack
&lt;/h2&gt;

&lt;p&gt;Each of our tools answers a different question, and now they chain cleanly from your toolbar:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Detect&lt;/strong&gt; — the &lt;a href="https://ucpchecker.com/blog/how-a-browser-extension-became-our-biggest-discovery-engine" rel="noopener noreferrer"&gt;UCP Checker extension&lt;/a&gt;: is this store agent-ready?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt; — &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Checker&lt;/a&gt;: how agent-ready is it?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act &amp;amp; observe&lt;/strong&gt; — &lt;em&gt;this&lt;/em&gt; extension + &lt;a href="https://ucpplayground.com/?utm_source=ucpchecker&amp;amp;utm_medium=blog&amp;amp;utm_campaign=playground-extension" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;: what happens when an agent actually shops it?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; — &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt;: did anything change?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first extension turned passive browsing into a discovery map. This one turns it into a test bench — point an agent at any store on the web and watch the protocol work, live, beside the page.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://chromewebstore.google.com/detail/ucp-playground-%E2%80%94-ai-shopp/hlblkioegnephlgkmgdlemlbkaimhdpp" rel="noopener noreferrer"&gt;UCP Playground — AI Shopping Agent&lt;/a&gt; — Chrome Web Store&lt;/li&gt;
&lt;li&gt;Our analysis — &lt;a href="https://ucpchecker.com/blog/how-a-browser-extension-became-our-biggest-discovery-engine" rel="noopener noreferrer"&gt;How a browser extension became our biggest discovery engine&lt;/a&gt; and &lt;a href="https://ucpchecker.com/blog/why-we-built-ucp-playground" rel="noopener noreferrer"&gt;Why we built UCP Playground&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;UCP Playground at 1,000+ agent sessions&lt;/a&gt; — what 16 models and real stores reveal&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucpchecker.com/protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate and grade every public UCP manifest we can find, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; and live &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and — through &lt;a href="https://ucpplayground.com/?utm_source=ucpchecker&amp;amp;utm_medium=blog&amp;amp;utm_campaign=playground-extension" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; — test how real AI agents behave against real stores.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install the Playground extension:&lt;/strong&gt; &lt;a href="https://chromewebstore.google.com/detail/ucp-playground-%E2%80%94-ai-shopp/hlblkioegnephlgkmgdlemlbkaimhdpp" rel="noopener noreferrer"&gt;Chrome Web Store&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check your store:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grade it (UCP Score):&lt;/strong&gt; &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browse the directory:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get notified on changes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;ucpchecker.com/alerts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>product</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Google Opened UCP for Lodging. We Already Tested What It Needs to Handle.</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Tue, 26 May 2026 09:39:24 +0000</pubDate>
      <link>https://dev.to/benjifisher/google-opened-ucp-for-lodging-we-already-tested-what-it-needs-to-handle-3m7b</link>
      <guid>https://dev.to/benjifisher/google-opened-ucp-for-lodging-we-already-tested-what-it-needs-to-handle-3m7b</guid>
      <description>&lt;p&gt;Google has opened the door to agentic hotel booking. A new developer page — &lt;a href="https://developers.google.com/hotels/ucp" rel="noopener noreferrer"&gt;UCP for Lodging&lt;/a&gt; — describes an open standard to "turn AI interactions into instant bookings for room reservations." But there is no spec to read yet: the page says plainly that detailed onboarding and specs are "coming soon," and points you to a waitlist rather than documentation. This is an announcement, not a release.&lt;/p&gt;

&lt;p&gt;That makes now the moment to write the piece nobody else has. We have already run five frontier models against expiring travel inventory and documented exactly where agents break when an offer times out mid-session. So instead of paraphrasing a waitlist, here is the prescriptive read: what UCP for Lodging will have to specify for agentic hotel booking to actually work — and what hotels should test before they integrate. Our bias throughout: &lt;strong&gt;we saw it in the data first.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google actually published — and what it didn't
&lt;/h2&gt;

&lt;p&gt;The page is short on detail by design. UCP for Lodging is positioned as an open standard for "direct, instant reservations" that reduces "friction and abandonment," compatible with AP2, A2A and MCP. (New to how those layers fit together? &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;We break the stack down here&lt;/a&gt;.) Two things it does &lt;strong&gt;not&lt;/strong&gt; contain: a published spec, and any named partners. The only call to action is a "Join the waitlist" button.&lt;/p&gt;

&lt;p&gt;One organizational tell worth registering: the page lives at &lt;code&gt;/hotels/ucp&lt;/code&gt;, not &lt;code&gt;/merchant/ucp&lt;/code&gt; where the retail protocol sits. UCP for Lodging is being driven by Google's hotels team — the group behind hotel ads — not its general commerce team. Different stakeholders, different roadmap. The travel-distribution names from Google Marketing Live — Booking.com, Expedia, Hilton, Marriott, IHG, Accor, Choice Hotels, Trip.com, Wyndham and Amadeus — are the supply context, though none appear on this page.&lt;/p&gt;

&lt;p&gt;We covered the demand side switching on across Google's surfaces in &lt;a href="https://ucpchecker.com/blog/google-universal-cart-ucp" rel="noopener noreferrer"&gt;Google Universal Cart &amp;amp; UCP&lt;/a&gt;, and why travel is agentic commerce's hardest vertical in &lt;a href="https://ucpchecker.com/blog/agentic-ai-travel-ucp" rel="noopener noreferrer"&gt;our travel analysis&lt;/a&gt;. Lodging is the hardest case inside the hardest vertical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lodging is harder than retail — and harder than flights
&lt;/h2&gt;

&lt;p&gt;A retail SKU is a fixed thing with a fixed price. A hotel "offer" is generated on the fly from a property, a room type, a rate plan, a date range and an occupancy — wrapped in a pile of taxes and fees — and it changes by the second.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The offer is a combination, not a SKU.&lt;/strong&gt; Property × room type × rate plan × dates × guests produces a vast, dynamic offer space. There is no stable identifier to drop in a cart and trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Length-of-stay pricing.&lt;/strong&gt; Three nights is not three times the nightly rate; pricing varies by date and stay length.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate-plan semantics.&lt;/strong&gt; Refundable vs non-refundable, prepaid vs pay-at-property, member rates, packages — each carries different money and different rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cancellation and modification policies.&lt;/strong&gt; Free-cancel deadlines and penalty schedules are part of the offer, and getting servicing wrong has real financial stakes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All-in price transparency.&lt;/strong&gt; Resort fees plus city and occupancy taxes mean the nightly rate an agent sees is not the price the guest pays. An agent quoting the wrong number is a broken booking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perishable availability.&lt;/strong&gt; Rooms sell out mid-session; an offer from thirty seconds ago may be gone or repriced.&lt;/li&gt;
&lt;/ul&gt;
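
&lt;p&gt;To make the length-of-stay and all-in pricing points concrete, here is a small Python sketch. Every number in it is hypothetical, and so is the rule that tax applies to room rate plus resort fee; real properties and jurisdictions differ. The function name &lt;code&gt;all_in_total&lt;/code&gt; is ours, not part of any spec.&lt;/p&gt;

```python
from decimal import Decimal


def all_in_total(nightly_rates, resort_fee_per_night, tax_rate):
    """Sum per-night rates (which may differ by date), add fees, then tax.

    Assumption for illustration: tax is charged on rates plus fees.
    """
    nights = len(nightly_rates)
    room_subtotal = sum(nightly_rates, Decimal("0"))
    fees = resort_fee_per_night * nights
    taxable = room_subtotal + fees
    taxes = (taxable * tax_rate).quantize(Decimal("0.01"))
    return taxable + taxes


# Hypothetical three-night stay where the rate rises towards the weekend.
rates = [Decimal("240.00"), Decimal("265.00"), Decimal("310.00")]
naive_quote = rates[0] * len(rates)  # first night's rate times three
guest_pays = all_in_total(rates, Decimal("25.00"), Decimal("0.12"))
```

&lt;p&gt;With these made-up inputs the naive quote is 720.00 while the guest pays 996.80. That gap is what an agent hides when it reports the nightly rate as the price.&lt;/p&gt;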

&lt;h2&gt;
  
  
  What we already know breaks: the data
&lt;/h2&gt;

&lt;p&gt;The perishable-availability and repricing problems are not hypothetical — we measured them. We built a mock travel server that issued offers with explicit expiry windows and ran &lt;a href="https://ucpchecker.com/blog/we-tested-5-ai-models-expiring-travel-inventory" rel="noopener noreferrer"&gt;five frontier models against it&lt;/a&gt;. Not one checked an offer's time-to-live before booking; only one survived a three-second expiry window; none surfaced a price change unless asked. The agents treated time-boxed travel inventory exactly like a static retail SKU.&lt;/p&gt;

&lt;p&gt;Apply that to lodging. An agent that does not re-validate will happily "confirm" a room that sold out two minutes ago, or quote a pre-tax nightly rate as the total. Those are not edge cases in hotel booking — they are the median path.&lt;/p&gt;

&lt;h2&gt;
  
  
  What UCP for Lodging will need to specify
&lt;/h2&gt;

&lt;p&gt;From the gaps above, here is the shortlist the spec has to cover for agentic hotel booking to be safe. Concretely, a lodging offer needs to look more like this than a retail product:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"marriott-LHR-deluxe-king-20260704"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"room_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deluxe_king"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rate_plan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"non_refundable_prepaid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stay"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"check_in"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-07-04"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"nights"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"guests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;921.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"includes_taxes_fees"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"free_cancel_until"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ttl_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"revalidate_before_booking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;lodging offer object&lt;/strong&gt; carrying room type, rate plan, stay dates, occupancy and an &lt;strong&gt;all-in total&lt;/strong&gt; (taxes and fees included), with a &lt;strong&gt;TTL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine-readable cancellation and modification policy&lt;/strong&gt; — deadlines and penalties as structured data, not prose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mandatory re-validation before booking&lt;/strong&gt; — confirm the room is still available at the quoted total before commit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Servicing capabilities&lt;/strong&gt; for modify, cancel and no-show, with full request context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity, loyalty and payment timing&lt;/strong&gt; — member rates, loyalty numbers, deposits and pay-at-property.&lt;/li&gt;
&lt;/ul&gt;
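
&lt;p&gt;The TTL and re-validation requirements above can be sketched as a pre-booking guard. This is an illustration in Python under stated assumptions, not anything from the unpublished spec: the field names come from the example offer above, while &lt;code&gt;safe_to_book&lt;/code&gt;, the &lt;code&gt;revalidate&lt;/code&gt; callback and the reason codes are hypothetical.&lt;/p&gt;

```python
import time

# Subset of the example offer shown above.
OFFER = {
    "offer_id": "marriott-LHR-deluxe-king-20260704",
    "total_price": {"amount": 921.00, "currency": "USD", "includes_taxes_fees": True},
    "ttl_seconds": 120,
    "revalidate_before_booking": True,
}


def seconds_left(offer, issued_at, now=None):
    """Remaining validity of an offer, based on its ttl_seconds field."""
    now = time.time() if now is None else now
    return offer["ttl_seconds"] - (now - issued_at)


def safe_to_book(offer, issued_at, revalidate, now=None):
    """Return (ok, reason) before committing a booking.

    `revalidate` is a hypothetical callable that asks the booking engine
    for the current state of the offer and returns None if it is gone.
    """
    if not seconds_left(offer, issued_at, now) > 0:
        return False, "expired"
    if not offer["total_price"].get("includes_taxes_fees"):
        return False, "not_all_in"
    if offer.get("revalidate_before_booking", True):
        fresh = revalidate(offer["offer_id"])
        if fresh is None:
            return False, "unavailable"
        if fresh["total_price"] != offer["total_price"]:
            return False, "repriced"
    return True, "ok"
```

&lt;p&gt;The point of returning a reason rather than a bare boolean is that each failure calls for a different agent behaviour: fetch a fresh offer on expiry, or show the guest the new total on a reprice, instead of silently confirming.&lt;/p&gt;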

&lt;p&gt;A retail offer carries none of these. The lodging extension lives or dies on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What hotels should test before integrating
&lt;/h2&gt;

&lt;p&gt;You cannot test against a spec that is not published yet — but you can test the thing the spec will sit on top of: your booking engine and the agents that will drive it. Before you join the waitlist and integrate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Does your offer carry a clock?&lt;/strong&gt; Confirm your booking engine can expose a time-boxed offer with a validity window. &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;Validate your endpoint →&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is your price all-in?&lt;/strong&gt; Make sure an agent can read the total with taxes and fees, not just the nightly rate. &lt;a href="https://ucpchecker.com/catalog-validator" rel="noopener noreferrer"&gt;Check your catalog →&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do you re-validate before commit?&lt;/strong&gt; An agent must be able to confirm availability and price before it books.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are your cancellation policies machine-readable?&lt;/strong&gt; Deadlines and penalties an agent can parse, not a paragraph of terms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you fast enough?&lt;/strong&gt; Agents abandon after roughly two seconds; the verified retail median response time is 156 ms. &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;How we measure →&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch an agent break.&lt;/strong&gt; Run a frontier model through your booking flow and see what it does when an offer expires mid-session — because &lt;a href="https://ucpchecker.com/blog/we-tested-5-ai-models-expiring-travel-inventory" rel="noopener noreferrer"&gt;our data says it will not check&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
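
&lt;p&gt;For the speed check in the list above, a rough timing harness is enough to get started. The Python sketch below measures the median latency of any callable against the two-second abandonment figure quoted above; &lt;code&gt;median_latency&lt;/code&gt; and &lt;code&gt;within_budget&lt;/code&gt; are hypothetical names, and the callable you pass in (a request to your own booking endpoint, say) is up to you.&lt;/p&gt;

```python
import statistics
import time

ABANDON_AFTER_SECONDS = 2.0  # "roughly two seconds" from the checklist


def median_latency(call, runs=5):
    """Time `call` several times and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def within_budget(call, budget=ABANDON_AFTER_SECONDS, runs=5):
    """True when the median latency does not exceed the budget."""
    return budget >= median_latency(call, runs)
```

&lt;p&gt;A median over a handful of runs hides tail latency, so treat a pass here as a floor rather than proof that agents will wait for you.&lt;/p&gt;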

&lt;h2&gt;
  
  
  Two waves: now, and when the spec drops
&lt;/h2&gt;

&lt;p&gt;The spec is "coming soon," and the language suggests weeks, not months. This piece is the first wave — the prescriptive read from the announcement and the data. When the actual UCP for Lodging spec lands, we will publish the side-by-side: what we said it needed versus what Google shipped, with conformance checks against the first live lodging endpoints. If you want that the day it is ready, &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up alerts&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we're watching
&lt;/h2&gt;

&lt;p&gt;Whether the published spec models offers as time-boxed objects with all-in pricing; which waitlist partners ship a live endpoint first; whether cancellation and servicing make the first cut or get deferred; and — the one we will keep testing — whether agents stop being blind to time. We track the ecosystem monthly in &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-may-2026" rel="noopener noreferrer"&gt;the State of Agentic Commerce census&lt;/a&gt;, and we will report what the data shows, not what the page promises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Google for Developers — &lt;a href="https://developers.google.com/hotels/ucp" rel="noopener noreferrer"&gt;UCP for Lodging&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google blog — &lt;a href="https://blog.google/products-and-platforms/products/shopping/shopping-updates-google-marketing-live/" rel="noopener noreferrer"&gt;Shopping updates from Marketing Live&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Our analysis — &lt;a href="https://ucpchecker.com/blog/agentic-ai-travel-ucp" rel="noopener noreferrer"&gt;Agentic AI in Travel: Why UCP Isn't Travel-Ready Yet&lt;/a&gt; and &lt;a href="https://ucpchecker.com/blog/google-universal-cart-ucp" rel="noopener noreferrer"&gt;Google Universal Cart &amp;amp; UCP&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Our testing — &lt;a href="https://ucpchecker.com/blog/we-tested-5-ai-models-expiring-travel-inventory" rel="noopener noreferrer"&gt;We Tested 5 AI Models on Expiring Travel Inventory&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucpchecker.com/protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate and grade every public UCP manifest we can find, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; and live &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and test how real AI agents behave against real stores. We track spec and ecosystem events — retail today, travel and lodging as they arrive — as they ship.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Check your store or booking endpoint:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grade it (UCP Score):&lt;/strong&gt; &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate your catalog:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/catalog-validator" rel="noopener noreferrer"&gt;ucpchecker.com/catalog-validator&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browse the directory:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get the side-by-side when the spec drops:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;ucpchecker.com/alerts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lodging is the hardest booking an agent will attempt. We will be measuring it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Retail merchant figures are live from the UCP Checker index and update every 24 hours; agent-testing methodology is on our &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>data</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Agentic AI in Travel: Why UCP Isn't Travel-Ready Yet — and What We Measured</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Fri, 22 May 2026 14:51:28 +0000</pubDate>
      <link>https://dev.to/benjifisher/agentic-ai-in-travel-why-ucp-isnt-travel-ready-yet-and-what-we-measured-19h4</link>
      <guid>https://dev.to/benjifisher/agentic-ai-in-travel-why-ucp-isnt-travel-ready-yet-and-what-we-measured-19h4</guid>
      <description>&lt;p&gt;In a new post, Amadeus — the system of record behind roughly 3 billion flight searches a day, 400-plus airlines and 2 million hotel properties — laid out its read on agentic AI in travel. The headline from CTO Sylvain Roy: protocols, not chatbots, are what make agentic travel work, and the protocols that exist today are not ready. MCP, he argues, is "only a first step"; the Universal Commerce Protocol "could open large new distribution and conversion opportunities" but remains "largely retail-native and is still in the process of being adapted to handle the full complexity of travel."&lt;/p&gt;

&lt;p&gt;He is right — and we can put data behind it. Before Amadeus weighed in, we built a mock travel server with expiring offers and pointed five frontier models at it. Not one checked an offer's time-to-live before booking. Only one survived a three-second expiry window. None flagged a price change unless explicitly asked. Travel's defining feature — perishable inventory priced in real time — is precisely where today's agents fall over.&lt;/p&gt;

&lt;p&gt;So here is the state of play from the monitoring seat: travel's &lt;strong&gt;supply side just showed up to UCP&lt;/strong&gt;, but the &lt;strong&gt;demand side cannot reliably transact on it yet&lt;/strong&gt;, and the protocol needs travel-specific work before it can. This is the read on agentic commerce's hardest vertical. Our bias throughout: &lt;strong&gt;we saw it in the data first.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What just happened: travel's supply side standardized on UCP
&lt;/h2&gt;

&lt;p&gt;Two signals landed in quick succession. At Google Marketing Live 2026, Google extended UCP into hotels and local food delivery, naming a launch roster that reads like the entire travel-distribution stack: Accor, Amadeus, Booking.com, Choice Hotels, Expedia Group, Hilton, IHG, Marriott, Trip.com and Wyndham. Booking a hotel "right from AI Mode in Search" is the demo. (We covered the full package — &lt;a href="https://ucpchecker.com/blog/google-universal-cart-ucp" rel="noopener noreferrer"&gt;Universal Cart, multi-item checkout, BNPL and the rest&lt;/a&gt; — separately.)&lt;/p&gt;

&lt;p&gt;Then Amadeus published its position. Notably, it is &lt;strong&gt;not&lt;/strong&gt; an adoption announcement. Roy frames Amadeus as the "embedded and neutral execution layer for travel" and says it is ready to collaborate on travel-ready agentic protocols — but commits to no UCP timeline. The subtext is the honest one: the supply side is interested, the rails are forming, and nobody is claiming travel is solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why travel is agentic commerce's hardest vertical
&lt;/h2&gt;

&lt;p&gt;Retail is relatively simple: a product has a fixed SKU, a stable price and a quantity. Travel has none of those guarantees. Amadeus calls out five gaps, and each is a place a retail-native protocol strains.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic offers, not fixed SKUs.&lt;/strong&gt; A flight or room is not a catalog entry — it is a dynamically generated Offer ID built from origin, destination, dates, fare family, ancillaries and disruption rules. There is no stable identifier for an agent to drop in a cart and trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perishable inventory, real-time pricing.&lt;/strong&gt; Availability and price change by the second, with no true SKU equivalent. An offer an agent saw thirty seconds ago may already be gone or repriced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex servicing.&lt;/strong&gt; Changes, cancellations and disruption handling demand full request context, and the stakes of getting it wrong are far higher than a returned T-shirt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity, settlement and regulation.&lt;/strong&gt; Multi-party fulfillment, traveler identity verification and compliance obligations ride on every booking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality and control.&lt;/strong&gt; Personalization needs traveler context shared across parties while suppliers keep visibility and control of it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;UCP models the retail case well — fixed catalog, cart, checkout, payment handlers. (New to how UCP sits next to MCP and AP2? &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;We break the stack down here&lt;/a&gt;.) Travel needs primitives the retail spec does not have yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data: agents are blind to time
&lt;/h2&gt;

&lt;p&gt;The perishable-inventory and real-time-pricing problems are not theoretical. They are measurable, and we measured them.&lt;/p&gt;

&lt;p&gt;We built a mock travel server that issued offers with explicit expiry windows and ran &lt;a href="https://ucpchecker.com/blog/we-tested-5-ai-models-expiring-travel-inventory" rel="noopener noreferrer"&gt;five frontier models against it across 21 sessions&lt;/a&gt;. The results were stark: not one model checked an offer's TTL before attempting to book; only one survived a three-second expiry window; none surfaced a price change unless we asked. The agents treated time-sensitive travel inventory exactly like a static retail SKU — and that is the failure mode that turns a "confirmed" booking into a charge for an offer that no longer exists.&lt;/p&gt;

&lt;p&gt;The same blindness shows up on the retail side. Across 33 sessions on a live store, only two of five models caught a stale price even when handed yesterday's number — and the deeper problem was structural: the merchant data layer did not expose temporal metadata at all. A model cannot reason about expiry if nothing in the stack tells it the offer expires.&lt;/p&gt;

&lt;p&gt;These are Amadeus's "perishable inventory" and "live search at scale" challenges, observed in our own test sessions. The protocol that wins travel has to carry time (TTLs, price-as-of timestamps, re-validation before commit) as a first-class signal, and agents have to be built to respect it. Neither is true today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retail-native, not travel-ready: what UCP still needs
&lt;/h2&gt;

&lt;p&gt;The encouraging part is that UCP is extensible by design. The spec grows through capability extensions — the Technical Council has been adding cart, order, loyalty and identity primitives since launch. Travel is simply the next, harder extension surface. From the gaps above, the shortlist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;dynamic-offer primitive&lt;/strong&gt; — an offer object carrying a TTL and a price-validity window, replacing the assumption of a stable SKU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-validation in the checkout flow&lt;/strong&gt; — an agent must confirm the offer is still valid and still the quoted price before it commits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Servicing capabilities&lt;/strong&gt; for change, cancel and disruption, with enough context to handle them safely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity and settlement&lt;/strong&gt; extensions for multi-party, regulated fulfillment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Concretely, the dynamic-offer primitive looks something like this — a travel offer that carries its own clock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LHR-NRT-20260612-A1F2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;842.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ttl_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price_valid_until"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-06-12T19:30:03Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"revalidate_before_checkout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A retail SKU carries none of those fields, and that is the whole problem: nothing tells an agent the offer it is holding has already expired. None of the four extensions on the shortlist is exotic. They are the difference between a protocol that can sell a tote bag and one that can sell a Tokyo itinerary. And the backers are aligned to build them: the companies behind UCP (Google, Shopify, Amazon, Microsoft, Meta, Stripe and Salesforce, the group whose &lt;a href="https://ucpchecker.com/blog/ucp-tech-council-expands-amazon-meta-microsoft-salesforce-stripe" rel="noopener noreferrer"&gt;Technical Council recently expanded&lt;/a&gt;) now have the travel-distribution establishment at the table.&lt;/p&gt;
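
&lt;p&gt;To make the agent-side half concrete, here is a minimal Python sketch of a pre-commit check against an offer shaped like the illustrative JSON above. The field names come from that example, not from any published UCP schema, and &lt;code&gt;check_before_commit&lt;/code&gt;, &lt;code&gt;held&lt;/code&gt; and &lt;code&gt;fresh&lt;/code&gt; are hypothetical names. It assumes the agent can re-fetch the offer just before checkout.&lt;/p&gt;

```python
from datetime import datetime, timezone

# Illustrative offer; field names mirror the example above and are assumptions,
# not part of any published UCP schema.
HELD_OFFER = {
    "offer_id": "LHR-NRT-20260612-A1F2",
    "price": {"amount": 842.00, "currency": "USD"},
    "ttl_seconds": 3,
    "price_valid_until": "2026-06-12T19:30:03Z",
    "revalidate_before_checkout": True,
}


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def check_before_commit(held, fresh, now):
    """Compare the quoted offer with a freshly re-fetched copy before paying.

    held  -- the offer the agent quoted to the user
    fresh -- the same offer re-fetched immediately before checkout
    now   -- current time as a timezone-aware datetime
    """
    if now >= parse_utc(fresh["price_valid_until"]):
        return "expired"
    if fresh["price"] != held["price"]:
        return "price_changed"
    return "ok"


if __name__ == "__main__":
    in_window = datetime(2026, 6, 12, 19, 30, 1, tzinfo=timezone.utc)
    print(check_before_commit(HELD_OFFER, dict(HELD_OFFER), in_window))
```

&lt;p&gt;Anything other than &lt;code&gt;"ok"&lt;/code&gt; would send the agent back to search or to the human, rather than on to payment.&lt;/p&gt;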

&lt;h2&gt;
  
  
  What it means
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;travel suppliers&lt;/strong&gt; (airlines, hotels, OTAs, GDSs): being named at a keynote is not the same as being agent-ready. The supply side is forming the rails, but the offer, pricing and servicing semantics that make travel safe for agents are not in the published spec yet. The window to shape those primitives — rather than inherit retail's — is open now, and Amadeus is signalling it wants in.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;demand-side players&lt;/strong&gt; (agents, assistants, AI shopping surfaces): do not ship autonomous travel booking on retail assumptions. Our data says agents will book expired offers and miss price changes unless both the protocol and the agent treat time as first-class. Until travel primitives land, keep a human on the commit step.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;the ecosystem&lt;/strong&gt; (us): we do not yet monitor live travel UCP endpoints — there are not any in production to monitor. What we can measure today is the agent side, and we have shown the gap is real. As travel manifests go live, the same validation, scoring and monitoring we run across more than 6,500 retail merchants — tracked monthly in &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-may-2026" rel="noopener noreferrer"&gt;the State of Agentic Commerce census&lt;/a&gt; — will extend to them. The neutral measurement layer travel is asking for is the one we already operate.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we're watching
&lt;/h2&gt;

&lt;p&gt;Whether the Technical Council opens a travel working group or a dynamic-offer extension; which of the GML travel partners ships a live, parseable endpoint first; whether Amadeus moves from "ready to collaborate" to a concrete commitment; and — the one we will keep testing — whether the next generation of models stops being blind to time. We will report what the data shows, not what the slides promise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Amadeus — &lt;a href="https://amadeus.com/en/blog/articles/agentic-ai-travel-mcp-ucp-standards" rel="noopener noreferrer"&gt;Agentic AI in travel: why MCP and UCP standards matter&lt;/a&gt; (Sylvain Roy, CTO)&lt;/li&gt;
&lt;li&gt;Google blog — &lt;a href="https://blog.google/products-and-platforms/products/shopping/shopping-updates-google-marketing-live/" rel="noopener noreferrer"&gt;Shopping updates from Marketing Live&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Our testing — &lt;a href="https://ucpchecker.com/blog/we-tested-5-ai-models-expiring-travel-inventory" rel="noopener noreferrer"&gt;We Tested 5 AI Models on Expiring Travel Inventory&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucpchecker.com/protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate and grade every public UCP manifest we can find, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; and live &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and test how real AI agents behave against real stores. We track spec and ecosystem events as they ship: retail today, travel as it arrives.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Check your store:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grade it (UCP Score):&lt;/strong&gt; &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;See capability coverage:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;ucpchecker.com/capabilities&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browse the directory:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get notified when an endpoint changes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;ucpchecker.com/alerts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Travel is the hardest test agentic commerce has set itself. We will be measuring it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Retail merchant figures are live from the UCP Checker index and update every 24 hours; agent-testing methodology is on our &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>data</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Shopify Just Shipped a UCP CLI. It Buys Anywhere — But Only Finds Shopify.</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 21 May 2026 17:09:41 +0000</pubDate>
      <link>https://dev.to/benjifisher/shopify-just-shipped-a-ucp-cli-it-buys-anywhere-but-only-finds-shopify-38lf</link>
      <guid>https://dev.to/benjifisher/shopify-just-shipped-a-ucp-cli-it-buys-anywhere-but-only-finds-shopify-38lf</guid>
      <description>&lt;p&gt;On &lt;strong&gt;May 18, 2026&lt;/strong&gt;, Shopify published &lt;a href="https://www.npmjs.com/package/@shopify/ucp-cli" rel="noopener noreferrer"&gt;&lt;code&gt;@shopify/ucp-cli&lt;/code&gt;&lt;/a&gt; to npm — a command-line tool and MCP server it describes as "a shopping skill for AI agents, powered by the &lt;a href="https://ucpchecker.com/protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;." Within days it's at v0.5.0, MIT-licensed, &lt;a href="https://github.com/Shopify/ucp-cli" rel="noopener noreferrer"&gt;open on GitHub&lt;/a&gt;. The pitch, from Ilya Grigorik's launch post: millions of Shopify merchants natively speak UCP, billions of products are discoverable through a global catalog, and now any agent can learn to shop them with two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @shopify/ucp-cli
ucp skills add
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI shipping is not the interesting part. A reference client for a maturing protocol was always coming. &lt;strong&gt;The interesting part is that &lt;code&gt;@shopify/ucp-cli&lt;/code&gt; is really two tools wearing one binary — and only one of them leaves Shopify.&lt;/strong&gt; One layer is pure, platform-neutral UCP that will transact against any conformant merchant on any stack. The other is a discovery engine hardwired to Shopify's own catalog. Pull them apart and you get a clear read on where the protocol is genuinely open today, and where the gravity still pulls toward the platform with the biggest index.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipped
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;@shopify/ucp-cli&lt;/code&gt; is agent-first by design. Every command takes and returns structured JSON, so a model can compose payloads and parse results without scraping human-readable output. &lt;code&gt;ucp skills add&lt;/code&gt; installs a bundled &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;&lt;code&gt;SKILL.md&lt;/code&gt;&lt;/a&gt; — the operating manual that teaches an agent &lt;em&gt;when&lt;/em&gt; to search versus discover, how to render totals, how to surface required disclosures, and when to hand off to a human. The full command surface mirrors the UCP shopping service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ucp catalog search&lt;/code&gt; / &lt;code&gt;catalog lookup&lt;/code&gt; / &lt;code&gt;catalog get_product&lt;/code&gt; — find products&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ucp cart create | update | get | cancel&lt;/code&gt; — build a cart with confirmed pricing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ucp checkout create | update | complete | cancel&lt;/code&gt; — convert and pay&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ucp order get&lt;/code&gt; — post-purchase status&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ucp discover --business &amp;lt;url&amp;gt;&lt;/code&gt; — ask a store what it actually supports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two design choices stand out. First, &lt;strong&gt;live introspection&lt;/strong&gt;: &lt;code&gt;ucp discover&lt;/code&gt; and &lt;code&gt;--input-schema&lt;/code&gt; make real network calls to the merchant, not static doc lookups. The agent composes against whatever the merchant advertises &lt;em&gt;right now&lt;/em&gt; — including extensions added since it last shopped there. Second, &lt;strong&gt;every response carries a &lt;code&gt;cta&lt;/code&gt;&lt;/strong&gt; — the CLI tracks where you are in the flow and returns the next-best commands as structured recommendations, so the agent doesn't have to memorize the operating model. There's also a configurable escalation hook (&lt;code&gt;UCP_ON_ESCALATION&lt;/code&gt;) for the moments a merchant needs the human back in the loop.&lt;/p&gt;

&lt;p&gt;It's a genuinely well-built piece of agent tooling. Which is exactly why the boundary inside it is worth tracing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two layers, two answers
&lt;/h2&gt;

&lt;p&gt;The question that matters for everyone who isn't Shopify: &lt;strong&gt;does it work outside Shopify?&lt;/strong&gt; The honest answer is that the CLI has two layers, and they answer differently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The transaction layer is pure UCP
&lt;/h3&gt;

&lt;p&gt;Cart, checkout, order, and per-merchant catalog operations route through the generic &lt;code&gt;dev.ucp.shopping&lt;/code&gt; service. There is no Shopify branching in the dispatch path. When you pass &lt;code&gt;--business https://store.example.com&lt;/code&gt;, the CLI fetches that store's &lt;code&gt;/.well-known/ucp&lt;/code&gt; profile, negotiates a compatible protocol version and transport, and dispatches against the endpoint the merchant advertises — exactly as the spec describes. The capabilities it negotiates against are the neutral ones on &lt;code&gt;ucp.dev&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dev.ucp.shopping.checkout   → ucp.dev/2026-04-08/schemas/shopping/checkout.json
dev.ucp.shopping.cart       → ucp.dev/2026-04-08/schemas/shopping/cart.json
dev.ucp.shopping.catalog.*  → ucp.dev/2026-04-08/schemas/shopping/catalog_*.json
dev.ucp.shopping.order      → ucp.dev/2026-04-08/schemas/shopping/order.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means a WooCommerce, Magento, BigCommerce, or fully custom store that publishes a valid UCP profile can be carted and checked out by this CLI today. The README says as much in plain language: build carts and complete checkouts "against any UCP-supporting merchant." The transaction layer is real, standards-based, and platform-neutral.&lt;/p&gt;

&lt;h3&gt;
  
  
  The discovery layer is Shopify
&lt;/h3&gt;

&lt;p&gt;Now run &lt;code&gt;ucp catalog search&lt;/code&gt; with no &lt;code&gt;--business&lt;/code&gt;. Where does it go? Straight to a hardcoded default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"ucp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"default_catalog_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://catalog.shopify.com"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "search across millions of merchants" headline resolves against &lt;a href="https://shopify.dev/docs/agents/catalog/global-catalog-extension" rel="noopener noreferrer"&gt;&lt;code&gt;catalog.shopify.com&lt;/code&gt;&lt;/a&gt;. And the catalog tools aren't even part of core UCP — they're surfaced through two &lt;strong&gt;Shopify-namespaced extensions&lt;/strong&gt;, &lt;code&gt;dev.shopify.catalog&lt;/code&gt; and &lt;code&gt;dev.shopify.catalog.global&lt;/code&gt;, with schemas served from &lt;code&gt;shopify.dev&lt;/code&gt;, not &lt;code&gt;ucp.dev&lt;/code&gt;. The results name their sellers as &lt;code&gt;*.myshopify.com&lt;/code&gt; domains. (The bundled skill even warns agents to check &lt;code&gt;variants[*].seller.domain&lt;/code&gt; rather than trust the brand in a product title — a Keychron keyboard in the global catalog might be sold by a third-party Shopify reseller, not Keychron itself.)&lt;/p&gt;

&lt;p&gt;So the asymmetry, stated plainly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Backed by&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; (&lt;code&gt;catalog search&lt;/code&gt;, no &lt;code&gt;--business&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Shopify merchants only&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;catalog.shopify.com&lt;/code&gt;, &lt;code&gt;dev.shopify.catalog.global&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Transaction&lt;/strong&gt; (&lt;code&gt;--business &amp;lt;url&amp;gt;&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Any UCP merchant, any platform&lt;/td&gt;
&lt;td&gt;generic &lt;code&gt;dev.ucp.shopping&lt;/code&gt;, &lt;code&gt;ucp.dev&lt;/code&gt; schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;It can buy anywhere. It can only find Shopify.&lt;/strong&gt; To transact with a non-Shopify store, an agent has to already know that store's URL — the CLI will happily check out there, but it will never surface the store in a search.&lt;/p&gt;
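
&lt;p&gt;The asymmetry can be modelled in a few lines of Python. This is a sketch of the routing described above, not the CLI's actual source: &lt;code&gt;resolve_target&lt;/code&gt; and &lt;code&gt;profile_url&lt;/code&gt; are hypothetical names, and returning &lt;code&gt;None&lt;/code&gt; for a transaction command with no merchant is an assumption made for illustration.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit

# Default taken from the package.json excerpt quoted above.
DEFAULT_CATALOG_URL = "https://catalog.shopify.com"


def profile_url(business):
    """Build the /.well-known/ucp profile URL for a merchant origin."""
    parts = urlsplit(business)
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/ucp", "", ""))


def resolve_target(command, business=None):
    """Hypothetical model of where a command resolves (not the CLI's code).

    With a merchant URL, any command targets that merchant's own profile.
    Without one, only catalog commands have somewhere to go: the default index.
    """
    if business:
        return profile_url(business)
    if command.startswith("catalog"):
        return DEFAULT_CATALOG_URL
    return None  # assumption: nothing to transact against without a merchant


if __name__ == "__main__":
    print(resolve_target("catalog search"))
    print(resolve_target("checkout create", "https://store.example.com"))
```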

&lt;p&gt;One smaller detail for implementers: the CLI negotiates a protocol range of &lt;code&gt;2026-01-23&lt;/code&gt; to &lt;code&gt;2026-04-08&lt;/code&gt;, both dated, published releases, and it doesn't speak the working-draft &lt;code&gt;"draft"&lt;/code&gt; version. Reference tooling is tracking published spec versions, not &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;
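
&lt;p&gt;For implementers wondering what that range means for their own profile, here is a minimal Python sketch of dated-version negotiation. The two bounds come from the paragraph above; the idea that a merchant advertises a list of version strings, and the &lt;code&gt;negotiate&lt;/code&gt; helper itself, are assumptions for illustration rather than the CLI's real logic.&lt;/p&gt;

```python
from datetime import date

# Range quoted above for the CLI; everything else here is illustrative.
CLIENT_MIN = "2026-01-23"
CLIENT_MAX = "2026-04-08"


def is_dated(version):
    """True for YYYY-MM-DD release labels, False for labels such as 'draft'."""
    try:
        date.fromisoformat(version)
        return True
    except ValueError:
        return False


def negotiate(merchant_versions, lowest=CLIENT_MIN, highest=CLIENT_MAX):
    """Pick the newest dated version inside the client's range, or None.

    ISO dates sort correctly as plain strings, so no date arithmetic is needed.
    """
    candidates = [
        v for v in merchant_versions
        if is_dated(v) and highest >= v >= lowest
    ]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    print(negotiate(["draft", "2026-01-23", "2026-04-08"]))
```

&lt;p&gt;Under these assumptions, a merchant that only advertises &lt;code&gt;"draft"&lt;/code&gt; gets no match, which is the practical consequence of reference tooling tracking published versions.&lt;/p&gt;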

&lt;h2&gt;
  
  
  Why discovery is the layer that's hard to open
&lt;/h2&gt;

&lt;p&gt;This isn't a knock on Shopify, and it isn't an oversight. It's structural. &lt;strong&gt;A transaction is a one-to-one conversation; discovery is an index.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standardizing a checkout means agreeing on a message shape between two parties who already found each other. UCP does that well, and the CLI proves it — point it at any conformant endpoint and it works. But standardizing &lt;em&gt;discovery&lt;/em&gt; means someone has to build and host the index of who sells what, keep it fresh, and serve it at query time. That is expensive, and it accrues to whoever already has the catalog. Shopify has millions of merchants and their product data in one place, so a Shopify-backed global catalog is the path of least resistance — for Shopify.&lt;/p&gt;

&lt;p&gt;The consequence is that as agent commerce matures, the transaction layer commoditizes while discovery concentrates. The protocol gives every merchant an equal ability to &lt;em&gt;be transacted with&lt;/em&gt;. It does not, on its own, give every merchant an equal ability to &lt;em&gt;be found&lt;/em&gt;. That gap isn't in the spec — it's in who builds the index on top of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it means in practice
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;Shopify merchants&lt;/strong&gt;: you're discoverable by default. Your products sit in the global catalog an agent searches before it knows which store to visit, with no work on your part. That's a real distribution advantage, and it's the clearest argument in the launch.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;non-Shopify merchants&lt;/strong&gt; (&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, and custom stacks): your UCP endpoint works fine with this CLI — an agent that has your URL can cart and check out against you today. But you will not appear in the default &lt;code&gt;catalog search&lt;/code&gt;. Conformance buys you transactability; it does not buy you discoverability inside a Shopify-owned index. If agent-driven discovery matters to your channel, you need to live in a catalog or directory that spans platforms — and you need your manifest to actually be valid, because an agent that can't parse your profile can't transact with you no matter how it arrived. (&lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Run a check&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;agent builders&lt;/strong&gt;: treat the global catalog as a Shopify discovery surface, not a universal one. It's an excellent default for "find me X" with no merchant named. For cross-platform coverage you'll need to bring merchant URLs from elsewhere and lean on &lt;code&gt;discover --business&lt;/code&gt;. The transaction primitives are solid and genuinely portable; the discovery primitive is platform-scoped. Plan your retrieval layer accordingly.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;evaluators&lt;/strong&gt; (us): this sharpens why we run an independent, cross-platform &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;directory&lt;/a&gt;. We index UCP merchants by &lt;em&gt;verification&lt;/em&gt; — does this domain serve a valid, working manifest — not by membership in any one platform's catalog. The WooCommerce and Magento long tail that a Shopify global catalog structurally can't surface is exactly the part of the ecosystem we track. We measured the platform split in our &lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;readiness data&lt;/a&gt; and revisit it every month in the &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-may-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce&lt;/a&gt; census; a Shopify-only discovery default is a meaningful shift in how that long tail gets found.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to read more
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The package: &lt;a href="https://www.npmjs.com/package/@shopify/ucp-cli" rel="noopener noreferrer"&gt;&lt;code&gt;@shopify/ucp-cli&lt;/code&gt; on npm&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The source: &lt;a href="https://github.com/Shopify/ucp-cli" rel="noopener noreferrer"&gt;github.com/Shopify/ucp-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The bundled agent skill: &lt;a href="https://github.com/Shopify/ucp-cli/blob/main/skills/ucp/SKILL.md" rel="noopener noreferrer"&gt;&lt;code&gt;skills/ucp/SKILL.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Shopify's global catalog extension: &lt;a href="https://shopify.dev/docs/agents/catalog/global-catalog-extension" rel="noopener noreferrer"&gt;global-catalog-extension&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;How the three protocol layers fit together: &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;MCP vs UCP vs AP2&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucpchecker.com/protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest we can find — across &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt; and beyond — and run the cross-platform &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt;, the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, and the &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you want to know whether agents can actually transact with your store, on any platform: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;run a check&lt;/a&gt;. If you want to catch the moment your store's UCP support changes — a platform update, a spec bump, an endpoint breaking silently — &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up UCP Alerts&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>ucp</category>
      <category>shopify</category>
    </item>
    <item>
      <title>How to Test Your UCP Implementation with AI Agents</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Fri, 15 May 2026 09:11:04 +0000</pubDate>
      <link>https://dev.to/benjifisher/how-to-test-your-ucp-implementation-with-ai-agents-180g</link>
      <guid>https://dev.to/benjifisher/how-to-test-your-ucp-implementation-with-ai-agents-180g</guid>
      <description>&lt;p&gt;You ship a UCP manifest. The validator returns green. The schema parses cleanly. Every required field is present, every URL resolves, every transport responds. You declare the work done and move on.&lt;/p&gt;

&lt;p&gt;Three weeks later, you find out your store has been quietly failing every agent shopping session. The cart endpoint accepts adds but rejects checkouts. A specific variant ID throws a 400 on &lt;code&gt;update_cart&lt;/code&gt;. The agent reaches &lt;code&gt;ready_for_complete&lt;/code&gt; and stalls because your payment handler doesn't recognise the token format. None of these issues showed up in static validation. All of them block real users on agent-mediated flows.&lt;/p&gt;

&lt;p&gt;This post is about how to actually test your UCP implementation — not as a schema document, but as a runtime surface that real frontier agents have to operate against. The short version: &lt;strong&gt;schema validation is necessary but not sufficient&lt;/strong&gt;. The long version is the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What validators catch and what they miss
&lt;/h2&gt;

&lt;p&gt;A UCP validator (including ours, &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;the validator at ucpchecker.com/ucp-validator&lt;/a&gt;) checks structural things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manifest is valid JSON&lt;/li&gt;
&lt;li&gt;Required fields are present (&lt;code&gt;spec&lt;/code&gt;, &lt;code&gt;services&lt;/code&gt;, &lt;code&gt;signing_keys&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Declared spec version is one we recognise&lt;/li&gt;
&lt;li&gt;Transport endpoints return non-error responses&lt;/li&gt;
&lt;li&gt;Schema URLs resolve&lt;/li&gt;
&lt;li&gt;Capability namespaces match the spec catalogue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the things you can verify without actually running an agent flow against the store. They're table stakes, and the &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; bakes them into the structural-conformance dimension of its grade.&lt;/p&gt;
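
&lt;p&gt;To make "structural" concrete, here is a minimal Python sketch of the first two checks in that list. It is an illustration, not how our validator is implemented: the function name &lt;code&gt;structural_problems&lt;/code&gt; is made up for this example, the sample manifest values are placeholders, and the required-field list covers only the three fields named above (the real list is longer).&lt;/p&gt;

```python
import json

# Only the three fields named in the list above; the spec requires more.
REQUIRED_FIELDS = ("spec", "services", "signing_keys")


def structural_problems(raw_manifest: str) -> list[str]:
    """Return a list of structural problems; an empty list means the checks passed."""
    try:
        manifest = json.loads(raw_manifest)
    except json.JSONDecodeError as exc:
        return [f"manifest is not valid JSON: {exc.msg}"]
    if not isinstance(manifest, dict):
        return ["manifest must be a JSON object"]
    return [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in manifest]


if __name__ == "__main__":
    # Placeholder manifest: parses cleanly but omits signing_keys.
    sample = '{"spec": "2026-04-08", "services": {}}'
    print(structural_problems(sample))
```

&lt;p&gt;Everything this function can see is in the document itself. Nothing in it places an order, which is the limit of this layer.&lt;/p&gt;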

&lt;p&gt;What static validation doesn't catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether &lt;code&gt;update_cart&lt;/code&gt; rejects valid variant IDs intermittently&lt;/li&gt;
&lt;li&gt;Whether the cart endpoint's success response contains the line items it claims to contain&lt;/li&gt;
&lt;li&gt;Whether the checkout flow surfaces the buyer-specific payment instruments your customer can actually use&lt;/li&gt;
&lt;li&gt;Whether your &lt;code&gt;search_catalog&lt;/code&gt; returns more than 8 KB of HTML in a &lt;code&gt;description&lt;/code&gt; field that crashes Claude's tool-calling layer&lt;/li&gt;
&lt;li&gt;Whether two different models pick the same variant ID for "Medium" against your product (the &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;variant-data problem&lt;/a&gt; we cover separately)&lt;/li&gt;
&lt;li&gt;Whether the agent can recover when one of your tool calls returns a 500 mid-flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are runtime properties. They only surface when you run an actual agent against an actual checkout. And they're where the gap between "store passes validation" and "agent can buy" lives. The April &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce report&lt;/a&gt; sized that gap concretely: of 4,014 verified UCP stores, only &lt;strong&gt;9 delivered a flawless end-to-end agent experience&lt;/strong&gt;. A 0.2% flawless rate against a 98%+ conformance rate. The runtime gap is the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three-layer testing pyramid
&lt;/h2&gt;

&lt;p&gt;The right way to test UCP is not "validator or no validator" — it's three layers, each catching a different class of problem, in increasing order of cost and fidelity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Catches&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Schema validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;&lt;code&gt;/ucp-validator&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Manifest parse errors, missing required fields, malformed URLs&lt;/td&gt;
&lt;td&gt;seconds, free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Capability score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;&lt;code&gt;/score&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Surface signals, declared capabilities, transport reachability, robots/sitemap hygiene&lt;/td&gt;
&lt;td&gt;seconds, free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Live agent eval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Variant resolution, cart/checkout shape, error recovery, multi-model behaviour, attribution flow&lt;/td&gt;
&lt;td&gt;dollars per session, paid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each layer feeds the next. If layer 1 fails, layer 2 has nothing to score. If layer 2 reports gaps, layer 3 will find them magnified in real agent runs. Skipping layers wastes layer 3's time on bugs the cheaper layers would have caught — that's the case for running them in order rather than going straight to live agents.&lt;/p&gt;
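
&lt;p&gt;The ordering argument can be sketched as a short-circuiting runner. This is a hypothetical helper, not part of any UCP Checker or Playground tooling: each layer is assumed to be a callable that returns a list of problems, and &lt;code&gt;run_layers&lt;/code&gt; stops at the first layer that reports any, so the paid layer only runs when the free ones are clean.&lt;/p&gt;

```python
from typing import Callable

# Each check returns a list of problems; an empty list means the layer passed.
Check = Callable[[], list[str]]


def run_layers(layers: list[tuple[str, Check]]) -> tuple[str, list[str]]:
    """Run layers in order and stop at the first one that reports problems.

    Returns (name of the layer that stopped the run, its problems),
    or ("all_passed", []) when every layer is clean.
    """
    for name, check in layers:
        problems = check()
        if problems:
            return name, problems
    return "all_passed", []


if __name__ == "__main__":
    # Stub checks standing in for the validator, the Score, and a live eval.
    layers = [
        ("validator", lambda: []),
        ("score", lambda: ["transport unreachable"]),
        ("eval", lambda: []),  # never called in this run
    ]
    print(run_layers(layers))
```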

&lt;p&gt;Most teams stop at layer 2. &lt;strong&gt;Stopping at layer 2 is what produces the 98%-conformant / 0.2%-flawless gap.&lt;/strong&gt; A clean Score gets you to "the agent has a fair chance." A clean Score plus a clean eval gets you to "the agent reliably completes the flow you care about."&lt;/p&gt;

&lt;h2&gt;
  
  
  What live agent testing actually looks like
&lt;/h2&gt;

&lt;p&gt;Layer 3 is the layer most readers are least familiar with, so this section walks through what running an agent test against your own store actually involves.&lt;/p&gt;

&lt;p&gt;The shape: you point a frontier agent (Claude, GPT, Gemini, Grok, Llama — whichever model you want to evaluate against) at your store's UCP manifest endpoint and give it a multi-turn shopping prompt. The agent does what an agent does — discovers your tools via the manifest, calls &lt;code&gt;search_catalog&lt;/code&gt; against your products, evaluates the results, picks something, calls &lt;code&gt;update_cart&lt;/code&gt;, navigates checkout. The framework records every tool call, every response, every model decision, the full token-by-token event stream.&lt;/p&gt;

&lt;p&gt;At the end of the session you get a structured report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the agent reach &lt;code&gt;checkout_reached&lt;/code&gt; (full transaction completion)?&lt;/li&gt;
&lt;li&gt;Or did it stop at &lt;code&gt;cart_created&lt;/code&gt;, &lt;code&gt;search_only&lt;/code&gt;, or &lt;code&gt;failed&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;How many tool calls did it make? How many succeeded? Which ones errored?&lt;/li&gt;
&lt;li&gt;How many tokens did the model consume?&lt;/li&gt;
&lt;li&gt;How long did the session take?&lt;/li&gt;
&lt;li&gt;If the agent failed, why?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the data layer 1 and layer 2 can't produce. &lt;strong&gt;Schema validation tells you what your store says; agent eval tells you what an agent does with what your store says.&lt;/strong&gt; They're answering different questions.&lt;/p&gt;
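
&lt;p&gt;If you want to reason about those reports in your own tooling, one way to model them is sketched below. The four outcome labels come from the list above; the class and field names (&lt;code&gt;SessionReport&lt;/code&gt;, &lt;code&gt;tool_errors&lt;/code&gt;, &lt;code&gt;duration_ms&lt;/code&gt; and so on) are assumptions for illustration and not the Playground's actual response schema.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Outcome labels as named in the report description above.
    CHECKOUT_REACHED = "checkout_reached"
    CART_CREATED = "cart_created"
    SEARCH_ONLY = "search_only"
    FAILED = "failed"


@dataclass
class SessionReport:
    # Hypothetical field names, one per question in the list above.
    model: str
    outcome: Outcome
    tool_calls: int
    tool_errors: int
    tokens: int
    duration_ms: int
    failure_reason: str = ""


def checkout_rate(reports: list[SessionReport]) -> float:
    """Percentage of sessions that reached checkout (0.0 for an empty list)."""
    if not reports:
        return 0.0
    reached = sum(1 for r in reports if r.outcome is Outcome.CHECKOUT_REACHED)
    return 100.0 * reached / len(reports)
```

&lt;p&gt;Parsing the outcome with &lt;code&gt;Outcome("cart_created")&lt;/code&gt; raises a &lt;code&gt;ValueError&lt;/code&gt; on any label the enum does not know, which is a cheap way to notice when a report format changes underneath you.&lt;/p&gt;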

&lt;p&gt;For most stores, the first eval session is uncomfortable. The agent picks the wrong variant. Or it adds something to the cart and then stalls because the response shape isn't quite what it expected. Or it reaches &lt;code&gt;ready_for_complete&lt;/code&gt; and can't move forward because your payment-handler declaration doesn't match what the agent has been trained to handle. Each of those is a fix you can make, and each fix lifts your real conversion rate the next time an actual user-facing agent shops your store.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why testing on one model isn't enough
&lt;/h2&gt;

&lt;p&gt;A useful pattern from the &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;Playground 1,000-session dataset&lt;/a&gt;: the same store gets meaningfully different outcomes across different models. A store that completes checkout 65% of the time on Claude Sonnet 4.5 might complete only 18% of the time on GPT-5.2 — the same UCP implementation, the same shopping prompt, just a different model.&lt;/p&gt;

&lt;p&gt;That spread isn't because one model is "better." It's because each frontier model has its own quirks in how it handles tool calls, schemas, error responses, and ambiguous data. Models differ on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How they handle empty arrays vs missing fields&lt;/li&gt;
&lt;li&gt;Whether they follow up on a 4xx response or move on&lt;/li&gt;
&lt;li&gt;How aggressively they retry failed tool calls&lt;/li&gt;
&lt;li&gt;How they parse multi-line strings in description fields&lt;/li&gt;
&lt;li&gt;Whether they pass through optional metadata fields verbatim&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real-world implication: &lt;strong&gt;your customers don't all use the same agent&lt;/strong&gt;. Some use ChatGPT-routed flows; some use Anthropic's; some use Google AI Mode; some use a custom agent built on Llama. Testing against just one model means catching only the bugs that one model surfaces, while shipping silent failures to everyone using a different one. Multi-model coverage is what gets you from "this passes for our internal demo" to "this works for real customer traffic."&lt;/p&gt;

&lt;p&gt;UCP Playground supports head-to-head testing across 15+ frontier models. The &lt;a href="https://ucpplayground.com/models/compare?models=claude-sonnet-4-5%2Cgpt-5-2" rel="noopener noreferrer"&gt;comparison view&lt;/a&gt; lets you run the same store against any two models on the same workload. We'd suggest at minimum testing against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One Anthropic model (Claude Opus or Sonnet)&lt;/li&gt;
&lt;li&gt;One OpenAI model (GPT-5.2 or GPT-4o)&lt;/li&gt;
&lt;li&gt;One Google model (Gemini 3.1 Pro or 2.5 Flash)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three models cover most of the deployed-agent universe. If any of the three behaves badly against your store, you have a real problem worth fixing before more traffic arrives.&lt;/p&gt;
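
&lt;p&gt;A small sketch of how you might summarise a multi-model run yourself, assuming you have reduced each session to a model name plus whether it reached checkout. The function names and the &lt;code&gt;model-a&lt;/code&gt; / &lt;code&gt;model-b&lt;/code&gt; labels are placeholders, and the sample data is invented for the example.&lt;/p&gt;

```python
from collections import defaultdict


def checkout_rate_by_model(sessions: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each model name to the percentage of its sessions that reached checkout."""
    totals: dict[str, int] = defaultdict(int)
    reached: dict[str, int] = defaultdict(int)
    for model, reached_checkout in sessions:
        totals[model] += 1
        if reached_checkout:
            reached[model] += 1
    return {model: 100.0 * reached[model] / totals[model] for model in totals}


def weakest_model(rates: dict[str, float]) -> str:
    """Return the model with the lowest checkout rate."""
    return min(rates, key=lambda model: rates[model])


if __name__ == "__main__":
    # Invented sample: four sessions per model.
    sessions = [
        ("model-a", True), ("model-a", True), ("model-a", True), ("model-a", False),
        ("model-b", True), ("model-b", False), ("model-b", False), ("model-b", False),
    ]
    rates = checkout_rate_by_model(sessions)
    print(rates, weakest_model(rates))
```

&lt;p&gt;The weakest model is the one to debug first: a failure that only shows up there is exactly the kind a single-model test would have shipped.&lt;/p&gt;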

&lt;h2&gt;
  
  
  Wiring tests into your deploy pipeline
&lt;/h2&gt;

&lt;p&gt;Manual eval is fine for one-off audits. If you're shipping changes regularly, you want this in CI. The Playground exposes a &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;headless API&lt;/a&gt; for exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/v1/collections          — define a test (sequence of prompts + models + stores)
POST /api/v1/collections/{id}/run — trigger the test
GET  /api/v1/collection-runs/{id} — poll status + results
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern most teams ship first: a deploy-time test that triggers an eval after every UCP-related code change, asserts on key metrics, and fails the build if any of them regress. A reasonable assertion shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/ucp-eval.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run UCP eval&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;curl -X POST $PLAYGROUND_API/v1/collections/$COLLECTION_ID/run \\&lt;/span&gt;
      &lt;span class="s"&gt;-H "Authorization: Bearer $PLAYGROUND_TOKEN"&lt;/span&gt;
    &lt;span class="s"&gt;# Poll, then assert:&lt;/span&gt;
    &lt;span class="s"&gt;# - checkout_rate &amp;gt;= 80&lt;/span&gt;
    &lt;span class="s"&gt;# - errors.total == 0&lt;/span&gt;
    &lt;span class="s"&gt;# - avg_duration_ms &amp;lt; 30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape as Lighthouse CI for web performance. A regression catch you bolt onto your pipeline rather than rediscover in production. The &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;UCP Playground Evals launch post&lt;/a&gt; walks through the full pattern with a worked example.&lt;/p&gt;
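
&lt;p&gt;The "poll, then assert" comments in the workflow could be filled in along these lines. Treat it as a sketch under assumptions: the metric names &lt;code&gt;checkout_rate&lt;/code&gt;, &lt;code&gt;errors.total&lt;/code&gt; and &lt;code&gt;avg_duration_ms&lt;/code&gt; are taken from the comments above, but the payload layout, the &lt;code&gt;status&lt;/code&gt; key and its &lt;code&gt;completed&lt;/code&gt; / &lt;code&gt;failed&lt;/code&gt; values are guesses, and &lt;code&gt;fetch&lt;/code&gt; stands in for whatever HTTP call you make to the collection-runs endpoint.&lt;/p&gt;

```python
import time
from typing import Callable

# Thresholds copied from the workflow comments above.
MIN_CHECKOUT_RATE = 80
MAX_AVG_DURATION_MS = 30000


def regressions(result: dict) -> list[str]:
    """Compare one finished run against the thresholds; an empty list means pass."""
    failures = []
    if MIN_CHECKOUT_RATE > result["checkout_rate"]:
        failures.append(f"checkout_rate {result['checkout_rate']} is below {MIN_CHECKOUT_RATE}")
    if result["errors"]["total"] != 0:
        failures.append(f"errors.total is {result['errors']['total']}, expected 0")
    if result["avg_duration_ms"] >= MAX_AVG_DURATION_MS:
        failures.append(f"avg_duration_ms {result['avg_duration_ms']} is not under {MAX_AVG_DURATION_MS}")
    return failures


def wait_for_run(fetch: Callable[[], dict], interval_s: float = 5.0, max_polls: int = 60) -> dict:
    """Call fetch() until the run reports a terminal status, then return the payload.

    The "status" key and the "completed" / "failed" values are assumptions.
    """
    for _ in range(max_polls):
        payload = fetch()
        if payload.get("status") in ("completed", "failed"):
            return payload
        time.sleep(interval_s)
    raise TimeoutError("collection run did not finish in time")
```

&lt;p&gt;In a CI step you would call &lt;code&gt;wait_for_run&lt;/code&gt;, pass the payload to &lt;code&gt;regressions&lt;/code&gt;, print whatever comes back, and exit non-zero if the list is not empty.&lt;/p&gt;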

&lt;h2&gt;
  
  
  The order to do this in
&lt;/h2&gt;

&lt;p&gt;If you're starting from a fresh UCP implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;validator&lt;/a&gt;&lt;/strong&gt; against your manifest. Fix any structural errors. This is the cheapest layer; do it first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Get a &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;&lt;/strong&gt; for your domain. Aim for B+ (70+) before moving to live testing. Below that, you have surface-level gaps that'll dominate the eval results and waste your test budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run a &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;Playground eval&lt;/a&gt;&lt;/strong&gt; against your store with two different frontier models on a single shopping sequence. Fix whatever fails. Common first-time failures: variant-data ambiguity, response-shape inconsistencies, tool argument validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expand to three models&lt;/strong&gt; once your two-model baseline works. Multi-model coverage is what catches the long-tail issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire the eval into CI&lt;/strong&gt; once your implementation is stable. From this point on, every code change that touches UCP runs against real agents before it ships.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you've already got a UCP implementation in production and are trying to figure out why agents aren't completing checkouts, skip step 2 and go straight to step 3. The eval will show you the specific failure mode, and you can backfill the score work later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What good looks like
&lt;/h2&gt;

&lt;p&gt;A store that's passed all three layers cleanly looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validator&lt;/strong&gt;: green&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt;: A grade (85+) across Discovery, Conformance, and Capability Coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eval&lt;/strong&gt;: 80%+ checkout rate against Claude Sonnet 4.5, Gemini 3 Flash, and one other model of your choice; &amp;lt;5s average tool-call latency; zero categorised errors across at least 20 sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the bar. The &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce&lt;/a&gt; is tracking how many stores hit that bar: currently fewer than 1% of verified stores. Closing the distance between the 98%+ of stores that conform and the fewer than 1% that clear this bar is mostly testing work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validator&lt;/strong&gt; (free, instant): &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;ucpchecker.com/ucp-validator&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt; (free, instant): &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live agent eval&lt;/strong&gt; (paid per session): &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model comparison view&lt;/strong&gt;: &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;ucpplayground.com/models&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI-ready eval API&lt;/strong&gt;: documented at &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Schema validation is necessary. It is not sufficient. The agents your customers use will run real flows against your store, and the only way to know whether those flows succeed is to run them yourself first.&lt;/p&gt;

&lt;p&gt;Test before they do.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>The State of Agentic Commerce — May 2026</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 14 May 2026 09:45:37 +0000</pubDate>
      <link>https://dev.to/benjifisher/the-state-of-agentic-commerce-may-2026-500g</link>
      <guid>https://dev.to/benjifisher/the-state-of-agentic-commerce-may-2026-500g</guid>
      <description>&lt;p&gt;In April, the story was a platform pulling a lever: Shopify migrated its entire UCP fleet to v2026-04-08 in four days, BigCommerce showed up with three stores, and we said the question for May was &lt;em&gt;which platform ships next&lt;/em&gt; — because every prior jump in the directory had been a step function caused by a platform-level deployment.&lt;/p&gt;

&lt;p&gt;May's answer: none, and it didn't matter. No platform shipped a UCP wave this month. BigCommerce still has three verified stores. WooCommerce still has three. Salesforce Commerce Cloud still has none verified, though a custom build is reportedly in development. And the directory still grew ~32% — the same rate as April — because the &lt;em&gt;baseline&lt;/em&gt; discovery rate stepped up. For the first time since we started this report, UCP grew on a slope instead of a staircase.&lt;/p&gt;

&lt;p&gt;This is the fourth monthly state-of-the-ecosystem report from UCP Checker. Here's what the data says as of May 12, 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;5,294&lt;/strong&gt; &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;verified UCP stores&lt;/a&gt; (up from 4,014 in April, &lt;strong&gt;+32%&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,892&lt;/strong&gt; total domains tracked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,829&lt;/strong&gt; new merchants discovered this month; &lt;strong&gt;775&lt;/strong&gt; this week alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,264&lt;/strong&gt; verified stores on the latest &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt; (99.4%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,235&lt;/strong&gt; verified stores at A grade on &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; (98.9%)&lt;/li&gt;
&lt;/ul&gt;
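
&lt;p&gt;For anyone who wants to check the percentages, the arithmetic is short enough to reproduce. The helper names below are ours for this example; the inputs are the figures in the list above.&lt;/p&gt;

```python
def pct_change(previous: int, current: int) -> float:
    """Growth from previous to current, as a percentage of previous."""
    return 100.0 * (current - previous) / previous


def share(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


print(round(pct_change(4014, 5294)))  # verified stores, April to May: 32
print(round(share(5264, 5294), 1))    # on the v2026-04-08 spec: 99.4
print(round(share(5235, 5294), 1))    # at A grade: 98.9
```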

&lt;p&gt;Three consecutive months of ~30% growth is a real curve now, not a launch artifact. But the &lt;em&gt;shape&lt;/em&gt; changed. February was discovery (first 1,000 Shopify stores). March was expansion (crossed 3,000, first non-Shopify manifests). April was consolidation (the four-day Shopify spec migration). May is the first month where the headline growth came from neither a new platform nor a spec event — it came from crawler optimisations we shipped in early May. The stores were always out there; we just got faster at finding them.&lt;/p&gt;

&lt;p&gt;That distinction matters for forecasting. If May's growth had been platform-driven, you'd model the next jump as "wait for SFCC." Since it's discovery-rate-driven, the model is different: the directory keeps filling at a steady clip until either we exhaust the discoverable Shopify long tail, or a platform finally ships a wave and the staircase resumes. Both will happen; the order is the open question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shopify's head start, four months in
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Monitored&lt;/th&gt;
&lt;th&gt;Verified&lt;/th&gt;
&lt;th&gt;Verified %&lt;/th&gt;
&lt;th&gt;Avg score (verified)&lt;/th&gt;
&lt;th&gt;Avg manifest latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5,242&lt;/td&gt;
&lt;td&gt;5,241&lt;/td&gt;
&lt;td&gt;~100%&lt;/td&gt;
&lt;td&gt;92.5&lt;/td&gt;
&lt;td&gt;178 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;Custom &amp;amp; Headless&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;642&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;td&gt;7.0%&lt;/td&gt;
&lt;td&gt;83.0&lt;/td&gt;
&lt;td&gt;356 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;92.3&lt;/td&gt;
&lt;td&gt;1,023 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;88.3&lt;/td&gt;
&lt;td&gt;993 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;85.0&lt;/td&gt;
&lt;td&gt;218 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;84.0&lt;/td&gt;
&lt;td&gt;548 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Shopify is 99% of the verified directory — unchanged from April. Every non-Shopify platform combined sums to 53 verified stores, the same as last month. The head start is still the dominant signal in the data, and the Custom &amp;amp; Headless cohort is the mirror image: 642 domains attempted UCP, only 45 got to verified (a 7% completion rate). When a platform hands you the boilerplate, you compound; when you build it yourself, most attempts stall before validation. That's a tooling gap, not a spec problem.&lt;/p&gt;
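
&lt;p&gt;The table and the headline numbers can be tied together with a few sums. This is only a cross-check on the figures already given above; the dictionary layout and the &lt;code&gt;verified_pct&lt;/code&gt; helper are ours for this example, and the second platform label is shortened to keep the snippet simple.&lt;/p&gt;

```python
# (monitored, verified) per platform, copied from the table above.
PLATFORMS = {
    "Shopify": (5242, 5241),
    "Custom/Headless": (642, 45),
    "WooCommerce": (3, 3),
    "BigCommerce": (3, 3),
    "Magento": (1, 1),
    "PrestaShop": (1, 1),
}


def verified_pct(platform: str) -> float:
    """Verified stores as a percentage of monitored domains for one platform."""
    monitored, verified = PLATFORMS[platform]
    return round(100.0 * verified / monitored, 1)


total_monitored = sum(monitored for monitored, _ in PLATFORMS.values())
total_verified = sum(verified for _, verified in PLATFORMS.values())
non_shopify_verified = total_verified - PLATFORMS["Shopify"][1]
shopify_share = round(100.0 * PLATFORMS["Shopify"][1] / total_verified, 1)

print(total_monitored, total_verified)  # 5892 5294
print(non_shopify_verified)             # 53
print(shopify_share)                    # 99.0
print(verified_pct("Custom/Headless"))  # 7.0
```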

&lt;p&gt;The more interesting movement came from two more platforms shipping UCP support — &lt;strong&gt;Bareconnect&lt;/strong&gt; and &lt;strong&gt;Selly.io&lt;/strong&gt; — both of which already have verified stores live in the directory today, not roadmap promises. The numbers are still small. How either platform is exposing UCP (default for every storefront, opt-in, or a paid tier) decides whether this stays a handful or turns into a wave — that detail we don't know yet. But it's the first new platform movement since the Shopify migration.&lt;/p&gt;

&lt;p&gt;Two structural notes on the table. BigCommerce and WooCommerce manifests run ~1 second versus Shopify's 178 ms because they're served from the storefront origin rather than a CDN-cached endpoint — a meaningful handicap as agent response budgets tighten. And geographically the directory is still a US/&lt;code&gt;.com&lt;/code&gt; story: 4,720 of 5,294 verified stores ship under generic TLDs; the largest attributable ccTLD cohorts are &lt;code&gt;.uk&lt;/code&gt; (229), &lt;code&gt;.au&lt;/code&gt; (120), and &lt;code&gt;.ca&lt;/code&gt; (66); continental Europe is under 2% by ccTLD (a floor, not a true distribution).&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability coverage: the ceiling, and the edges
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Verified adopters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.checkout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,269&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.fulfillment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,264&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.catalog.lookup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,257&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.catalog.search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.order&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,256&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.discount&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,253&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.cart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5,249&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;— the cliff —&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.common.identity_linking&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.buyer_consent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.checkout.embedded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.ap2_mandate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dev.ucp.shopping.payment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Identical pattern to March and April: the seven core shopping capabilities ship together as a Shopify-side bundle (~5,250 adopters each), then an 800× cliff. Identity linking: 6. AP2 mandate — the primitive that makes an agentic transaction auditably user-authorised — still 1 (&lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt;, WooCommerce, scoring 100). Payment capability: still 0. Of 5,294 verified stores, &lt;strong&gt;5,161 (&amp;gt;99%) sit at Tier 2&lt;/strong&gt;, one is Tier 3, one is Tier 4. The deeper primitives aren't slow-adopting, they're &lt;em&gt;not adopting yet&lt;/em&gt;. When demand for AP2 turns into pressure (regulators, payment networks, the working group's eventual requirements), this number moves fast — the way checkout did once Shopify bundled it. Until then, "UCP store" means "agent-shoppable," not "mandate-credentialed."&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the movement was: the edges of the spec
&lt;/h3&gt;

&lt;p&gt;The new signals in May's data sit at the edges of the spec rather than its core. The first is in the capability namespace itself: below the standard &lt;code&gt;dev.ucp.*&lt;/code&gt; entries, a handful of non-standard, vendor-prefixed capabilities are now appearing on real verified manifests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;com.pwc.accelerator.loyalty.rewards&lt;/code&gt; — 2 stores. PwC's agentic-commerce accelerator (more below).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.appointedd.schedule&lt;/code&gt; / &lt;code&gt;.booking&lt;/code&gt; / &lt;code&gt;.intent&lt;/code&gt; — 1 store. Appointment-scheduling primitives — booking-vertical UCP, not retail.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.woocommerce.ai_storefront&lt;/code&gt; — 1 store. A WooCommerce-specific storefront extension.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sh.agentscore.identity&lt;/code&gt; — 1 store. An identity primitive from a third party.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;com.agoragentic.x402.checkout&lt;/code&gt; — 1 store. A checkout extension referencing x402 (the HTTP-402 micropayment pattern).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None is adopted at scale yet — 1–2 stores each, almost certainly vendors' own test deployments — but it's the first month the namespace long tail has held anything other than Shopify defaults. It's the leading indicator of a UCP &lt;em&gt;extension&lt;/em&gt; ecosystem: third parties shipping vertical capabilities (loyalty, booking, identity, micropayments) on top of the core spec, a more realistic near-term diversification path than "another commerce platform ships a wave."&lt;/p&gt;

&lt;p&gt;The PwC entry is worth pulling out, because it isn't a platform — it's a consultancy. PwC has launched an &lt;strong&gt;agentic-commerce accelerator&lt;/strong&gt;: a practice that stands up custom UCP-enabled storefronts for enterprise clients, with its own capability extensions (the &lt;code&gt;com.pwc.accelerator.*&lt;/code&gt; namespace) layered on the core spec. That's a third adoption channel, distinct from "platform ships a wave" and "developer hand-builds" — call it &lt;strong&gt;consulting-led&lt;/strong&gt;. It's slower per engagement, but each accelerator that standardises on UCP arrives with a portfolio of enterprise clients attached. PwC is the leading edge; Deloitte, EY, KPMG, Accenture, McKinsey, BCG, and the systems integrators (Capgemini, IBM, TCS, Infosys) all face the same build-it-once, deploy-to-many incentive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transports and payment handlers: the monoculture, and the experiments tier
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Transport&lt;/th&gt;
&lt;th&gt;Verified declarations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;5,258&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedded&lt;/td&gt;
&lt;td&gt;5,243&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2A&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCP and Embedded are universal because Shopify declares both. REST shows up on 47 stores — the non-Shopify hand-builds, REST being the natural fit for anyone implementing without an MCP server. A2A (Google's Agent2Agent transport, formally added in v2026-04-08) holds at two. Payment handlers tell the same monoculture story: &lt;strong&gt;5,250 verified stores declare Google Pay and 5,241 declare Shopify Card&lt;/strong&gt; — the same shared Shopify-managed handler IDs we &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-february-2026" rel="noopener noreferrer"&gt;flagged in February&lt;/a&gt; as a single point of failure. Everything else is a rounding error. The payment &lt;em&gt;partner&lt;/em&gt; ecosystem (Stripe, Adyen, Visa, Mastercard, PayPal, Affirm, Splitit — all on the &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;registry&lt;/a&gt;) is mature on paper; the live &lt;em&gt;handler declarations&lt;/em&gt; are two Shopify-managed IDs and a handful of experiments.&lt;/p&gt;

&lt;p&gt;The experiments are the part worth zooming in on, because the same small set of builders is populating the spec's newer transport, its newer handler shapes, and its newer capability namespaces simultaneously. Both A2A adopters are agent-native rather than retail: one is an agent-identity storefront running pure A2A with a cryptographically signed manifest (JWS / EdDSA) and two custom payment handlers on crypto rails — an &lt;code&gt;mpp&lt;/code&gt; rail on Tempo mainnet and an &lt;code&gt;x402&lt;/code&gt; rail on Base; the other is an agent-to-agent service exposed across MCP + A2A + REST, selling a USDC-priced audit via a &lt;code&gt;com.agoragentic.x402&lt;/code&gt; handler plus a direct USDC receive address. Both ship the custom capability namespaces flagged in the capability section above (&lt;code&gt;sh.agentscore.identity&lt;/code&gt;, &lt;code&gt;com.agoragentic.x402.checkout&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Separately, payment processors are starting to run dev UCP endpoints with fully custom handler integrations — their own handler IDs, their own &lt;code&gt;init&lt;/code&gt; / &lt;code&gt;verify&lt;/code&gt; protocol shapes, declared at v2026-04-08 over REST against real merchants, iterating against the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;Checker&lt;/a&gt; as they build. Still dev, not live, but for the first time the gap between the partner roster and the live handler declarations has something in it that's neither Shopify-default nor mock fixture — and it's coming from processors with the scale to move real merchant bases. Two data points in each direction don't make a trend, but the &lt;em&gt;pattern&lt;/em&gt; is coherent: the spec's newer surfaces (A2A transport, custom handler shapes, third-party namespaces) are populated by a small set of builders doing novel work in parallel, while the core carries volume. That's the shape of a protocol leaving its launch phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  How agents actually perform
&lt;/h2&gt;

&lt;p&gt;The numbers above tell you which stores &lt;em&gt;have&lt;/em&gt; UCP. This section covers which stores &lt;em&gt;work&lt;/em&gt; when an agent shops them. &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground Evals&lt;/a&gt; &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;passed 1,000 recorded agent sessions&lt;/a&gt; this month, and the dataset is well past that now: a thousand-plus end-to-end agent shopping runs across 105 unique stores and 16 frontier models, totalling ~57M tokens, &lt;strong&gt;~12 hours of cumulative agent runtime&lt;/strong&gt;, and roughly $119,000 in aggregate cart value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Outcomes: where the agent stops
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout_reached&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;475&lt;/td&gt;
&lt;td&gt;37.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;search_only&lt;/code&gt; (browsed, didn't cart)&lt;/td&gt;
&lt;td&gt;344&lt;/td&gt;
&lt;td&gt;27.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failed&lt;/code&gt; (provider error, refusal, max turns)&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;20.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cart_created&lt;/code&gt; (carted, didn't proceed)&lt;/td&gt;
&lt;td&gt;172&lt;/td&gt;
&lt;td&gt;13.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;62% of sessions end without a completed checkout&lt;/strong&gt; — and that ratio has stayed stable as the dataset grew, which is itself the finding. As we add models and stores, the &lt;em&gt;shape&lt;/em&gt; of failure doesn't change: agents find products fine (search works nearly everywhere), build carts often, then ~14% of sessions stall at a cart that won't convert and ~21% fail outright (about half of those are variant-shape problems — the agent picks a variant ID the cart rejects and flails until it hits the turn limit). We dug into exactly that this month in &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;UCP Variant Data: The #1 Reason Agent Checkouts Fail&lt;/a&gt; — the single largest categorisable cause of the gap between "has a manifest" and "agent can buy from it," and almost entirely fixable in the merchant's variant data without touching any tooling.&lt;/p&gt;
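
&lt;p&gt;As a rough cross-check of the 62% figure, the sketch below recomputes the non-checkout share from the four rows of the outcomes table. It assumes those four outcomes are exhaustive, so the denominator is simply their sum; the names &lt;code&gt;OUTCOMES&lt;/code&gt; and &lt;code&gt;non_checkout_share&lt;/code&gt; are ours for illustration and not part of any UCP tooling.&lt;/p&gt;

```python
# Session counts copied from the outcomes table above.
OUTCOMES = {
    "checkout_reached": 475,
    "search_only": 344,
    "failed": 261,
    "cart_created": 172,
}


def non_checkout_share(outcomes):
    """Fraction of sessions that ended anywhere other than checkout_reached.

    Assumption: the listed outcomes are exhaustive, so the total is their sum.
    """
    total = sum(outcomes.values())
    return (total - outcomes["checkout_reached"]) / total


if __name__ == "__main__":
    # Print the share as a percentage with one decimal place.
    print(f"{non_checkout_share(OUTCOMES):.1%}")
```

&lt;p&gt;Under that assumption the result rounds to 62%, in line with the headline figure.&lt;/p&gt;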

&lt;h3&gt;
  
  
  Model leaderboard
&lt;/h3&gt;

&lt;p&gt;Checkout-conversion rate by model, from the &lt;a href="https://ucpplayground.com/leaderboard" rel="noopener noreferrer"&gt;UCP Playground model leaderboard&lt;/a&gt; — sessions where the agent reached a checkout URL ÷ total sessions for that model (the live leaderboard breaks out search, cart, and speed too):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Checkout %&lt;/th&gt;
&lt;th&gt;Avg session time&lt;/th&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;52.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~38 s&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;49.3%&lt;/td&gt;
&lt;td&gt;~48 s&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;45.0%&lt;/td&gt;
&lt;td&gt;~46 s&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;174&lt;/td&gt;
&lt;td&gt;42.0%&lt;/td&gt;
&lt;td&gt;~21 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;39.6%&lt;/td&gt;
&lt;td&gt;~77 s&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;123&lt;/td&gt;
&lt;td&gt;39.0%&lt;/td&gt;
&lt;td&gt;~30 s&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;125&lt;/td&gt;
&lt;td&gt;36.0%&lt;/td&gt;
&lt;td&gt;~12 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;31.7%&lt;/td&gt;
&lt;td&gt;~15 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;29.2%&lt;/td&gt;
&lt;td&gt;~48 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;27.8%&lt;/td&gt;
&lt;td&gt;~34 s&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;63&lt;/td&gt;
&lt;td&gt;20.6%&lt;/td&gt;
&lt;td&gt;~36 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;15.8%&lt;/td&gt;
&lt;td&gt;~60 s&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;14.3%&lt;/td&gt;
&lt;td&gt;~42 s&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;9.5%&lt;/td&gt;
&lt;td&gt;~57 s&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~61 s&lt;/td&gt;
&lt;td&gt;Alibaba&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
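
&lt;p&gt;The checkout rate defined above (sessions reaching a checkout URL ÷ total sessions for that model) can be sanity-checked against the outcomes table. The sketch below backs out a whole-number checkout count for each row and pools them. It assumes that rounding to the nearest integer recovers the real per-model count; the helper names are hypothetical, not Playground APIs.&lt;/p&gt;

```python
# (model, sessions, checkout %) copied from the leaderboard table above.
LEADERBOARD = [
    ("Claude Sonnet 4.5", 256, 52.0),
    ("Llama 3.3 70B", 75, 49.3),
    ("DeepSeek V3.2", 60, 45.0),
    ("Gemini 3 Flash", 174, 42.0),
    ("Grok 4", 53, 39.6),
    ("Claude Opus 4.6", 123, 39.0),
    ("Gemini 2.5 Flash", 125, 36.0),
    ("GPT-4o", 63, 31.7),
    ("Gemini 3.1 Pro", 96, 29.2),
    ("Gemini 2.5 Pro", 79, 27.8),
    ("GPT-5.2", 63, 20.6),
    ("DeepSeek R1", 19, 15.8),
    ("o4-mini", 21, 14.3),
    ("Grok 3 Mini", 21, 9.5),
    ("QwQ 32B", 25, 0.0),
]


def implied_checkouts(sessions, checkout_pct):
    """Back out a whole-number checkout count from a published percentage.

    Assumption: rounding to the nearest integer recovers the real count.
    """
    return round(sessions * checkout_pct / 100)


def pooled_rate(rows):
    """Return (checkouts, sessions, session-weighted checkout rate)."""
    total_sessions = sum(sessions for _, sessions, _ in rows)
    total_checkouts = sum(implied_checkouts(s, pct) for _, s, pct in rows)
    return total_checkouts, total_sessions, total_checkouts / total_sessions


if __name__ == "__main__":
    checkouts, sessions, rate = pooled_rate(LEADERBOARD)
    print(checkouts, sessions, f"{rate:.1%}")
```

&lt;p&gt;With the fifteen rows shown, the implied checkout counts sum to 475 over 1,253 sessions, which lines up with the &lt;code&gt;checkout_reached&lt;/code&gt; row in the outcomes table.&lt;/p&gt;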

&lt;p&gt;Three things hold from April, plus one shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search works everywhere. Checkout completion is the next frontier.&lt;/strong&gt; Every model that runs to completion finds products. Checkout conversion ranges from 0% to 52%, a 52-point spread across the field, which is exactly where the work-to-do sits. The best model in the field completes checkout about half the time today; the headroom from there is the frontier the next quarter gets to push.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning-tuned models still underperform.&lt;/strong&gt; QwQ 32B: 0% across 25 sessions. Grok 3 Mini: 9.5%. o4-mini: 14.3%. DeepSeek R1: 15.8%. Models that burn tokens on deliberation struggle with the fast, sequential, low-ambiguity tool-calling that shopping requires. Shopping rewards decisiveness, not deliberation: true in April, true with 3× the data. (GPT-5.2 also lands below the median at 20.6%.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed and success are decoupled.&lt;/strong&gt; Gemini 2.5 Flash finishes a session in ~12 seconds; Grok 4 takes ~77. Their checkout rates are 36% and 40%, basically a wash. Being fast doesn't make you good at this; being slow doesn't either. The Claude models sit mid-pack on speed (~30–38 s); Claude Sonnet 4.5 pairs that with the top conversion rate, which is the combination that actually matters when the agent is spending someone's money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The shift:&lt;/strong&gt; in April we reported DeepSeek V3.2 leading the composite shopping score. With ~3× the sessions, Claude Sonnet 4.5 is now clearly out front on checkout completion — 52% over 256 sessions, by far the largest sample — with Meta's Llama 3.3 70B the surprise second. Treat any single month's ranking as provisional until the eval dataset gets to the point — soon — where it stops being indicative and becomes authoritative.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reliability gap, one more time
&lt;/h2&gt;

&lt;p&gt;We've made this the editorial spine of every one of these reports, and the May data doesn't let us retire it. &lt;strong&gt;98.9% of verified stores carry an A on UCP Score&lt;/strong&gt; (5,235 of 5,294; the rest are 57 B's and two C's). By conformance, the directory is in excellent shape. But conformance isn't end-to-end agent-readiness, and that's the gap UCP Score doesn't grade.&lt;/p&gt;

&lt;p&gt;A clean schema doesn't tell you whether the cart endpoint accepts the variant the agent picked, whether response-time budgets hold under load, whether payment-handler tokenisation completes inside the agent's timeout window, or whether the checkout URL drops the agent into an auth loop a browser would have handled with cookies. &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; is the test harness developers use to exercise that second layer — replay sessions, probe edge cases, see exactly where an agent trips. By design it surfaces failure modes, not steady-state performance; treating Playground completion rates as a consumer-shopping success metric mis-reads the tool. But the &lt;em&gt;categories&lt;/em&gt; of failure it surfaces — variant mismatch, slow tokenisation, malformed cart responses, checkout redirect loops — are real, and they're what separate an A-graded manifest from a store an agent can reliably transact against in production.&lt;/p&gt;

&lt;p&gt;That's the gap we'd point a platform team at, and it isn't a percentage, it's a posture. The protocol's first phase, call it the first four months, was about getting the schema right, and the ecosystem did that. The next phase is the unglamorous second-order work: error recovery, schema robustness, response-time SLAs, variant-data hygiene, the long tail of edge cases that separate "manifest valid" from "agent transacts without anything tripping it up." That work &lt;em&gt;is&lt;/em&gt; happening: the Playground sessions above are senior engineers doing exactly that. The open question is whether the posture spreads from the engineering teams already running this loop to the long tail of merchants still on bundled defaults. That's where the next quarter's competitive distance gets built.&lt;/p&gt;

&lt;h2&gt;
  
  
  The demand side: AI traffic is converting
&lt;/h2&gt;

&lt;p&gt;For four months this report has focused on supply — which stores have UCP, what capabilities they declare, the shape of their manifests, what agents do against them in testing. On May 11 Shopify &lt;a href="https://www.shopify.com/enterprise/blog/ai-search-insights" rel="noopener noreferrer"&gt;published its first real demand-side dataset&lt;/a&gt;, and the numbers reframe the urgency of everything above.&lt;/p&gt;

&lt;p&gt;Across Shopify storefronts in Q1 2026, by Shopify's analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-referred orders grew nearly 13× year-over-year.&lt;/strong&gt; Referral sessions from AI chatbots (ChatGPT, Perplexity, Gemini, Copilot, Claude, Grok) grew more than &lt;strong&gt;8× YoY&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-referred sessions convert at ~50% higher rates&lt;/strong&gt; than organic search when they start on product pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Average order value is 14% higher&lt;/strong&gt; for AI-referred than for organic-search orders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More than half of AI-referred sessions start on a product detail page&lt;/strong&gt;, vs ~20% for organic — "journey compression," the buyer arrives ready to buy because the AI did the research first.&lt;/li&gt;
&lt;li&gt;AI-referred conversion outperforms organic SEO in &lt;strong&gt;23 of 25 merchant categories&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caveat: this is Shopify's analysis of Shopify storefronts with undisclosed methodology, so treat the precise numbers as Shopify-published rather than independently verified. But the &lt;em&gt;direction&lt;/em&gt; is the story: agentic commerce isn't theoretical traffic any more. It's converting at premium rates, in volume, growing fast — and that's the demand signal that explains why every TC member is racing to ship at the productisation layer right now. Shopify Field CTO Sandy Jeong framed the operational work in three buckets: &lt;strong&gt;data readiness&lt;/strong&gt; (machine-readable catalog with structured attributes), &lt;strong&gt;channel infrastructure&lt;/strong&gt; (direct API syndication to AI platforms), and &lt;strong&gt;organisational alignment&lt;/strong&gt; (a named DRI, not a committee). The teams that get those three right capture the 13× curve; the teams that don't watch it route around them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec and ecosystem
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Attribution landed in core.&lt;/strong&gt; On May 5 the Technical Council &lt;a href="https://ucpchecker.com/blog/ucp-tc-ships-attribution-into-core" rel="noopener noreferrer"&gt;merged a top-level &lt;code&gt;attribution&lt;/code&gt; field&lt;/a&gt; into cart, checkout, catalog, and order operations — campaign IDs, click identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;), source/medium markers, as an open string-keyed map. It's the first time advertising-and-measurement infrastructure has landed in UCP core, and the trajectory implication is the story: a protocol that carries attribution context is a protocol being built for commercial-scale deployment, not just technical demos.&lt;/p&gt;
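
&lt;p&gt;To make the shape concrete, here is a minimal sketch of attaching such a map to a request payload. Everything beyond the field name &lt;code&gt;attribution&lt;/code&gt; and the click-identifier keys named above is an assumption: the payload shape, the &lt;code&gt;source&lt;/code&gt; key, the helper name &lt;code&gt;with_attribution&lt;/code&gt;, and the choice to require string values (the field is described only as an open string-keyed map).&lt;/p&gt;

```python
def with_attribution(payload, attribution):
    """Return a copy of payload with a top-level "attribution" map attached.

    Assumption: keys and values are plain strings. The merged field is
    described only as an open string-keyed map, so the value type is a guess.
    """
    for key, value in attribution.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"attribution entry {key!r} must map a string to a string")
    # Copy rather than mutate, so the caller's payload is left untouched.
    return {**payload, "attribution": dict(attribution)}


if __name__ == "__main__":
    # Hypothetical cart payload; "line_items" is an illustrative key only.
    cart = {"line_items": []}
    tagged = with_attribution(cart, {"gclid": "abc123", "source": "google"})
    print(tagged)
```

&lt;p&gt;Because the map is open, unknown keys pass through unchanged in this sketch; a real implementation would follow whatever constraints the merged schema actually sets.&lt;/p&gt;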

&lt;p&gt;&lt;strong&gt;The council expanded — and the regional question got sharper.&lt;/strong&gt; Amazon, Meta, Microsoft, Salesforce, and Stripe &lt;a href="https://ucpchecker.com/blog/ucp-tech-council-expands-amazon-meta-microsoft-salesforce-stripe" rel="noopener noreferrer"&gt;joined the Technical Council&lt;/a&gt; at the end of April — a governance signal as much as an adoption one (none of the five has shipped a UCP store wave yet), but a notable one: the steering group now includes the company building the leading proprietary alternative (Amazon's "Buy for Me") and the company behind the leading rival protocol (Stripe, ACP). Convergence pressure, formalised.&lt;/p&gt;

&lt;p&gt;Two German commerce trade publications picked up the expansion within a day of each other and used our breakdown of the 16-seat composition as a primary source: &lt;a href="https://excitingcommerce.de/2026/04/27/amazon-schliesst-sich-googles-universal-commerce-protocol-an/" rel="noopener noreferrer"&gt;Exciting Commerce&lt;/a&gt; on April 27 (which drove the European enterprise retail audience &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; was built for), and &lt;a href="https://www.shoptechblog.com/2026/04/28/agentic-commerce-das-ucp-council-wachst/" rel="noopener noreferrer"&gt;Shoptechblog&lt;/a&gt; the next day. Both lead with the same regional point — &lt;em&gt;"Keine Rolle spielen weiter europäische und asiatische Unternehmen"&lt;/em&gt; ("European and Asian companies continue to play no role") — and Shoptechblog adds the analytical layer: the new members sent senior engineers and architects rather than C-suite executives (implementation work, not press); each company's participation reads as defensive; and the real contest isn't the standardised protocol but the layers &lt;em&gt;above&lt;/em&gt; it — ranking, paid placement, customer ownership. Which is exactly why attribution-in-core is more than plumbing: it's the first of those upper layers getting wired into the spec itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two TC members shipped at the productisation layer.&lt;/strong&gt; The contest moving up the stack got two concrete examples this month. On May 5 Google &lt;a href="https://searchengineland.com/google-expands-ucp-checkout-to-main-search-shopping-results-476540" rel="noopener noreferrer"&gt;expanded UCP-powered checkout out of AI Mode into the main shopping section of standard Search results&lt;/a&gt;, with &lt;a href="https://www.thekeyword.co/news/google-ucp-checkout-main-search" rel="noopener noreferrer"&gt;Wayfair the first live retailer on the new surface&lt;/a&gt; — a "Buy" button on listings inside Google Search itself, Google Pay tokenisation, checkout completing without leaving the page. Zero-click search results just became zero-click &lt;em&gt;purchases&lt;/em&gt;. The &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-february-2026" rel="noopener noreferrer"&gt;two-track adoption story&lt;/a&gt; we drew in February has its first major convergence event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F02-google-wayfair-flow.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F02-google-wayfair-flow.webp" alt="Google AI Mode shopping flow on Wayfair: AI Mode query, product detail with Buy button, Google Pay order review, order complete confirmation" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Google's UCP-powered checkout flow on Wayfair: AI Mode query → product page with Buy button → Google Pay review → order complete. Source: Google.&lt;/p&gt;



&lt;p&gt;Shopify, separately, started rolling out an &lt;strong&gt;Agentic Storefronts dashboard&lt;/strong&gt; in merchant admin this week (&lt;a href="https://shopify.dev/docs/agents" rel="noopener noreferrer"&gt;live docs&lt;/a&gt;). It surfaces ChatGPT / Microsoft Copilot / AI Mode traffic and offers an "Allow Shopify to manage for me" toggle that auto-generates the AI-readability files (&lt;code&gt;llms.txt&lt;/code&gt;, &lt;code&gt;llms-full.txt&lt;/code&gt;, &lt;code&gt;agents.md&lt;/code&gt;) for stores that opt in. The dashboard is &lt;strong&gt;protocol-agnostic&lt;/strong&gt;: it covers ChatGPT (ACP), Copilot, and UCP-powered Search inside one admin view. UCP is one of the protocols Shopify is now monetising on the agentic-readiness layer, not the whole product. For Shopify it's the natural next step after the &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 fleet migration&lt;/a&gt;; for everyone else watching the head start, it's the answer to what the &lt;em&gt;next&lt;/em&gt; phase of that head start looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F01-shopify-agentic-dashboard.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fucpchecker.s3.eu-west-1.amazonaws.com%2Fblog%2Fstate-of-agentic-commerce-may-2026%2F01-shopify-agentic-dashboard.webp" alt="Shopify Agentic Storefronts dashboard in merchant admin showing 2,060 agentic sessions and $6,447 earned in the last 30 days, split by ChatGPT, Microsoft Copilot, and Shop Channel, with an 'Allow Shopify to manage for me' toggle and agentic readiness checklist" width="800" height="661"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Shopify Agentic Storefronts in merchant admin: ChatGPT / Microsoft Copilot / Shop Channel split, "Allow Shopify to manage for me" toggle, agentic-readiness checklist.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;A potential spec gap, still being validated.&lt;/strong&gt; In &lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;the variant-data guide&lt;/a&gt; we noted that v2026-04-08 makes &lt;code&gt;variant.options[]&lt;/code&gt; optional even on products where &lt;code&gt;product.options[]&lt;/code&gt; is non-empty and there are multiple variants — meaning two fully spec-compliant manifests can produce identical-looking payloads where one is unambiguous and the other is agent-unresolvable. The candidate fix would be a conditional &lt;code&gt;MUST&lt;/code&gt; ("when &lt;code&gt;product.options&lt;/code&gt; is non-empty and &lt;code&gt;variants.length &amp;gt; 1&lt;/code&gt;, every variant MUST populate &lt;code&gt;options[]&lt;/code&gt;"). It's a working hypothesis from one analysis, not a filed proposal — we want to sweep more of the live dataset for real-world incidence and check the edge cases (single-variant simple products, productGroup behaviour, platforms that already populate &lt;code&gt;options&lt;/code&gt; by default) before raising it formally. If the pattern holds, it's a candidate for a future minor release.&lt;/p&gt;
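
&lt;p&gt;The candidate rule is simple enough to express as a check. The sketch below is one way to sweep a product payload for it, under stated assumptions: products and variants are plain dictionaries, each variant carries an &lt;code&gt;id&lt;/code&gt; key, and the entries inside &lt;code&gt;options&lt;/code&gt; can have any shape. The function name &lt;code&gt;variants_missing_options&lt;/code&gt; is hypothetical, and this is not how the Checker itself validates manifests.&lt;/p&gt;

```python
def variants_missing_options(product):
    """Return ids of variants that would violate the candidate conditional MUST.

    The rule applies only when product["options"] is non-empty and the
    product has more than one variant. The "id" key is an assumption.
    """
    product_options = product.get("options") or []
    variants = product.get("variants") or []
    rule_applies = bool(product_options) and len(variants) > 1
    if not rule_applies:
        return []
    # A variant with a missing or empty options list cannot be told apart.
    return [variant.get("id") for variant in variants if not variant.get("options")]


if __name__ == "__main__":
    # Illustrative payloads only; the option entry shape is made up.
    ambiguous = {
        "options": [{"name": "Size"}],
        "variants": [
            {"id": "v-small", "options": [{"name": "Size", "value": "S"}]},
            {"id": "v-medium"},
        ],
    }
    simple = {"options": [], "variants": [{"id": "v-only"}]}
    print(variants_missing_options(ambiguous))
    print(variants_missing_options(simple))
```

&lt;p&gt;Single-variant and option-less products fall outside the rule and return an empty list here, which matches the edge cases listed above as still needing review.&lt;/p&gt;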

&lt;p&gt;&lt;strong&gt;No v2026-05.&lt;/strong&gt; v2026-04-08 remains current. On the cadence so far, the next minor release more likely lands late summer (a notional v2026-08), probably bundling AP2 mandate refinements, schema corrections shaken out by running validators against thousands of real stores, and whatever the council formalises over the next two months. On the partner side: the &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;registry&lt;/a&gt; now lists 61 merchants, 11 agents, and 8 extensions; the payment-handler roster (Adyen, Amex, Mastercard, Stripe, Visa, Checkout.com, Affirm, Splitit, PayPal) is unchanged and still almost entirely unrepresented in live manifest declarations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we shipped — and what developers are doing with it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/blog/ucp-variant-data-guide" rel="noopener noreferrer"&gt;UCP Variant Data: The #1 Reason Agent Checkouts Fail&lt;/a&gt;&lt;/strong&gt; — the five variant-data anti-patterns, what clean variant data looks like, and the spec gap that lets compliant stores still be broken.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/blog/how-to-test-ucp-implementation" rel="noopener noreferrer"&gt;How to Test Your UCP Implementation&lt;/a&gt;&lt;/strong&gt; — the three-layer validation workflow: static audit, live agent test, continuous monitoring.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; is doing exactly what it was built to do.&lt;/strong&gt; This is the one we're proudest of this quarter. The Score turns "is my manifest agent-ready?" into a concrete, category-by-category checklist — and developers are using it that way: we've watched a failing manifest climb to an A grade in the space of a few hours, the developer iterating against the score breakdown between checks. That's the loop it was designed for, and it's now the loop it runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; got sharper as a development tool.&lt;/strong&gt; Two halves of the same loop: the agent-inspection tooling — replay any session, see the exact tool call where an agent tripped — and the runtime shopping evals, now past &lt;strong&gt;1,000 recorded sessions&lt;/strong&gt; and &lt;strong&gt;more than 12 hours of cumulative agent runtime&lt;/strong&gt; against real stores. Together they take the build → test → fix cycle for an agent-ready storefront down from a sprint to an afternoon. Every improvement that got us there is in the &lt;a href="https://ucpplayground.com/changelog" rel="noopener noreferrer"&gt;changelog&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crawler throughput&lt;/strong&gt; — we roughly tripled the hourly crawl rate in early May (and added per-IP and global throttles to the expensive public routes so the directory stays fast under load). That's what moved the discovery curve this month.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to watch in June
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Second adopters at every edge.&lt;/strong&gt; May produced first adopters across multiple novel patterns — non-Shopify platforms shipping UCP (Bareconnect, Selly.io), a consultancy-built accelerator (PwC), non-default payment-handler integrations (the processors in dev), AP2 mandate (still one), third-party capability namespaces (each at 1–2 stores). The diagnostic for June is whether any doubles up. Each is a distinct watch item; the meta-question is the same: did May's first adopters survive contact with month two?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google's next live partner on main Search.&lt;/strong&gt; Wayfair is first up on Google's UCP-checkout expansion into standard Search results. The other co-developing TC retailers — Etsy, Target, Walmart — are the next-most-likely to follow. The cadence of those rollouts is the diagnostic for how fast Google is willing to push agent-completed transactions onto its highest-traffic surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The platform-level integration question.&lt;/strong&gt; SFCC, Adobe Commerce, Wix, Squarespace — any of them shipping a platform-level UCP integration is still the single highest-impact possible event, and still hasn't happened. The one-platform structure is four months old.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whether the eval leaderboard holds its shape.&lt;/strong&gt; Claude Sonnet 4.5 leads checkout completion on the largest sample; Llama 3.3 70B is the surprise second. Another month of sessions either confirms that or reshuffles it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;All data is from the UCP Checker crawler (re-checks every tracked domain at least every 24 hours) and UCP Playground's eval sessions, as of May 12, 2026. The verified-merchant dataset is published monthly on &lt;a href="https://huggingface.co/datasets/UCPChecker/ucp-merchants" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; under CC-BY 4.0; the same data, a public REST API, the bulk checker, and the rest of our &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;developer tools&lt;/a&gt; are all ungated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse the directory: &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;ucpchecker.com/directory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Track adoption live: &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;ucpchecker.com/stats&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run a UCP Score: &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;ucpchecker.com/score&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model + store leaderboard: &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Public dataset, REST API &amp;amp; developer tools: &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;ucpchecker.com/developer-tools&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Previous report: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce — April 2026&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;External coverage cited in this report:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jochen Krisch, &lt;em&gt;"Amazon schließt sich Googles Universal Commerce Protocol an,"&lt;/em&gt; &lt;a href="https://excitingcommerce.de/2026/04/27/amazon-schliesst-sich-googles-universal-commerce-protocol-an/" rel="noopener noreferrer"&gt;Exciting Commerce&lt;/a&gt;, April 27, 2026&lt;/li&gt;
&lt;li&gt;Roman Zenner, &lt;em&gt;"Agentic Commerce: Das UCP Council wächst,"&lt;/em&gt; &lt;a href="https://www.shoptechblog.com/2026/04/28/agentic-commerce-das-ucp-council-wachst/" rel="noopener noreferrer"&gt;Shoptechblog&lt;/a&gt;, April 28, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://searchengineland.com/google-expands-ucp-checkout-to-main-search-shopping-results-476540" rel="noopener noreferrer"&gt;Google expands UCP Checkout to main Search shopping results&lt;/a&gt;, Search Engine Land, May 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.google/products/ads-commerce/agentic-commerce-ai-tools-protocol-retailers-platforms/" rel="noopener noreferrer"&gt;New tech and tools for retailers to succeed in an agentic shopping era&lt;/a&gt;, Google blog (Ads &amp;amp; Commerce)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://shopify.dev/docs/agents" rel="noopener noreferrer"&gt;Shopify Agentic commerce developer docs&lt;/a&gt; — Agentic Storefronts, &lt;code&gt;llms.txt&lt;/code&gt;, &lt;code&gt;llms-full.txt&lt;/code&gt;, &lt;code&gt;agents.md&lt;/code&gt; reference&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.wislr.com/research/what-shopify-checks-for-agentic-readiness" rel="noopener noreferrer"&gt;What Shopify checks for agentic readiness&lt;/a&gt;, WISLR Research&lt;/li&gt;
&lt;li&gt;Kyle Risley, &lt;a href="https://www.shopify.com/enterprise/blog/ai-search-insights" rel="noopener noreferrer"&gt;"AI-referred shoppers convert better and spend more (2026)"&lt;/a&gt;, Shopify Enterprise Blog, May 11, 2026&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>ai</category>
      <category>data</category>
      <category>ucp</category>
    </item>
    <item>
      <title>UCP Variant Data: The #1 Reason Agent Checkouts Fail</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Wed, 13 May 2026 11:12:54 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-variant-data-the-1-reason-agent-checkouts-fail-4jp5</link>
      <guid>https://dev.to/benjifisher/ucp-variant-data-the-1-reason-agent-checkouts-fail-4jp5</guid>
      <description>&lt;p&gt;A user asks an AI shopping agent for "a medium grey t-shirt." The agent finds the product. It picks a variant. It adds it to the cart. The merchant rejects the cart. The agent retries with a different variant. The merchant rejects that one too. The session ends in &lt;code&gt;cart_created&lt;/code&gt; without a checkout — the user's $40 purchase quietly disappears, and nobody on the merchant side ever sees the failure.&lt;/p&gt;

&lt;p&gt;This pattern is the &lt;strong&gt;single largest source of agent checkout failures we see across the 4,500+ verified UCP stores in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;directory&lt;/a&gt;&lt;/strong&gt;. More than schema invalidity, more than tool errors, more than payment-handler problems. Variant mismatch — the agent and the merchant disagreeing on which SKU corresponds to "Medium" — is responsible for a meaningful fraction of the gap between "store has a UCP manifest" and "agent can actually buy from it."&lt;/p&gt;

&lt;p&gt;The good news: it's almost entirely fixable on the merchant side, in your variant data structure, without changing any tooling. This post walks through the failure pattern, the five most common variant data anti-patterns we observe, and what clean variant data looks like in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anatomy of a variant mismatch
&lt;/h2&gt;

&lt;p&gt;Here's the cleanest way to see the failure:&lt;/p&gt;

&lt;p&gt;Two frontier agents — call them Agent A and Agent B — get the same prompt against the same store: &lt;em&gt;"Add a medium grey t-shirt to my cart."&lt;/em&gt; Both agents call &lt;code&gt;search_catalog&lt;/code&gt;, both get the same product back, both see three variants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Large"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent A picks &lt;code&gt;var_5572&lt;/code&gt;. Agent B picks &lt;code&gt;var_5572&lt;/code&gt;. Both add to cart. Both succeed. &lt;strong&gt;Clean data, predictable behaviour.&lt;/strong&gt; Each variant declares its options as an array of &lt;code&gt;{name, label}&lt;/code&gt; pairs — the spec's &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; shape — so the agent matches "medium" against the &lt;code&gt;Size&lt;/code&gt; axis unambiguously.&lt;/p&gt;

&lt;p&gt;Now the broken version. Same prompt, same product, but the variant data looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"L"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5574"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium / Regular Fit"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5575"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Grey"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium / Slim Fit"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent A picks &lt;code&gt;var_5572&lt;/code&gt; (interpreting "M" as the canonical "Medium"). Agent B picks &lt;code&gt;var_5574&lt;/code&gt; (interpreting "Medium / Regular Fit" as the more explicit match). &lt;strong&gt;Neither is wrong.&lt;/strong&gt; The user said "medium" and both interpretations are defensible. But because the variant data conflates two different axes — size and fit — into a single &lt;code&gt;Size&lt;/code&gt; label, the two agents diverge, and the user's experience depends on which model they're using. The spec form makes the bug obvious: &lt;code&gt;Fit&lt;/code&gt; should be its own &lt;code&gt;selected_option&lt;/code&gt;, not crammed into the &lt;code&gt;Size&lt;/code&gt; label.&lt;/p&gt;

&lt;p&gt;Worse: many real implementations don't even include the option labels. They expose only opaque variant IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5573"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent has no way to know which variant corresponds to "Medium" at all. It guesses. Sometimes it guesses right. Often it doesn't. That's how checkout sessions end up in &lt;code&gt;cart_created&lt;/code&gt; without ever reaching &lt;code&gt;checkout_reached&lt;/code&gt;.&lt;/p&gt;
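
&lt;p&gt;To make the difference between the three payloads concrete, here is a minimal Python sketch of how a strict consumer might resolve a request against &lt;code&gt;variant.options&lt;/code&gt;. The function name &lt;code&gt;resolve_variant&lt;/code&gt; and the exact, case-insensitive matching rule are assumptions for illustration, not how any particular agent works.&lt;/p&gt;

```python
def resolve_variant(variants, wanted):
    """Return the single variant whose options satisfy every wanted axis, else None.

    `wanted` maps an axis name to the label the user asked for,
    e.g. {"Size": "Medium"}. Matching is exact and case-insensitive.
    """
    matches = []
    for variant in variants:
        # Build an axis-name to label lookup from the {name, label} pairs.
        labels = {
            opt["name"].lower(): opt["label"].lower()
            for opt in variant.get("options", [])
        }
        if all(labels.get(axis.lower()) == label.lower() for axis, label in wanted.items()):
            matches.append(variant)
    # Zero matches or several matches are both unresolvable: refuse to guess.
    return matches[0] if len(matches) == 1 else None


# The clean payload and the opaque payload from the examples above.
clean = [
    {"id": "var_5571", "options": [{"name": "Color", "label": "Grey"}, {"name": "Size", "label": "Small"}]},
    {"id": "var_5572", "options": [{"name": "Color", "label": "Grey"}, {"name": "Size", "label": "Medium"}]},
    {"id": "var_5573", "options": [{"name": "Color", "label": "Grey"}, {"name": "Size", "label": "Large"}]},
]
opaque = [{"id": "var_5571"}, {"id": "var_5572"}, {"id": "var_5573"}]

print(resolve_variant(clean, {"Size": "Medium", "Color": "Grey"}))
print(resolve_variant(opaque, {"Size": "Medium"}))
```

&lt;p&gt;Under these assumptions the clean payload resolves to one variant, while the opaque payload gives the resolver nothing to match on, so it returns nothing rather than guessing. A lenient consumer would guess at that point, which is where the divergence between agents comes from.&lt;/p&gt;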

&lt;h2&gt;
  
  
  Why this is the #1 failure mode
&lt;/h2&gt;

&lt;p&gt;Across the &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;Playground session dataset&lt;/a&gt;, roughly &lt;strong&gt;62% of sessions end without a completed checkout&lt;/strong&gt;. The breakdown is informative:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout_reached&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;38%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;search_only&lt;/code&gt; (browsed, didn't add)&lt;/td&gt;
&lt;td&gt;27%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failed&lt;/code&gt; (provider error, model refusal, max turns)&lt;/td&gt;
&lt;td&gt;22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cart_created&lt;/code&gt; (added, didn't proceed)&lt;/td&gt;
&lt;td&gt;13%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;cart_created&lt;/code&gt; cohort — sessions where the agent successfully picked something but couldn't finish — is the variant-mismatch signal. The agent had enough information to add to cart but the cart contents weren't valid for checkout. That's the structural shape of "wrong variant picked."&lt;/p&gt;

&lt;p&gt;Roughly half of the categorised &lt;code&gt;failed&lt;/code&gt; sessions are also variant-shape problems: the agent picked a variant ID that the cart endpoint rejected, retried with another, and hit &lt;code&gt;max_turns_exceeded&lt;/code&gt; while working through the variant list. Add those in and &lt;strong&gt;variant-related failures account for somewhere around a fifth of all sessions&lt;/strong&gt;, more than any other categorisable failure mode.&lt;/p&gt;

&lt;p&gt;The thing that makes this pattern so consistent: &lt;strong&gt;clean variant data is not part of UCP Score or schema validation&lt;/strong&gt;. A store can pass &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; at A grade and still emit variant data that breaks every agent in the field. The validator looks at whether the manifest parses; it doesn't look at whether the variants are agent-resolvable. That gap is exactly why this post exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  A spec gap that compounds the problem
&lt;/h2&gt;

&lt;p&gt;Even when a store is fully UCP-compliant, the protocol leaves room for ambiguity. The 2026-04-08 schema makes &lt;code&gt;variant.options[]&lt;/code&gt; optional — including on products where &lt;code&gt;product.options[]&lt;/code&gt; is non-empty and there are multiple variants. So a payload like &lt;code&gt;{"options": [{"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]}], "variants": [{"id": "var_a"}, {"id": "var_b"}]}&lt;/code&gt; is technically valid but agent-unresolvable: nothing links &lt;code&gt;var_a&lt;/code&gt; to "Small" rather than "Medium." Two consumers looking at this payload can defensibly pick different variants for the same prompt.&lt;/p&gt;

&lt;p&gt;A conditional &lt;code&gt;MUST&lt;/code&gt; in the spec — &lt;em&gt;"when &lt;code&gt;product.options&lt;/code&gt; is non-empty and &lt;code&gt;variants.length &amp;gt; 1&lt;/code&gt;, every variant MUST populate &lt;code&gt;options[]&lt;/code&gt;"&lt;/em&gt; — would close this cleanly. Until that lands, agent-resolvability is on the merchant rather than the protocol.&lt;/p&gt;
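
&lt;p&gt;Until the spec carries that rule, a merchant can enforce it locally. The sketch below is one possible pre-publish check in Python; the function name &lt;code&gt;violates_conditional_must&lt;/code&gt; is hypothetical, and it assumes the product payload is a plain dictionary with &lt;code&gt;options&lt;/code&gt; and &lt;code&gt;variants&lt;/code&gt; keys as in the example above.&lt;/p&gt;

```python
def violates_conditional_must(product):
    """Return the IDs of variants that break the proposed conditional rule.

    Proposed rule: when product["options"] is non-empty and the product has
    more than one variant, every variant must populate its own options list.
    """
    declared_axes = product.get("options") or []
    variants = product.get("variants") or []
    # The rule only applies to multi-variant products that declare axes.
    if not declared_axes or len(variants) in (0, 1):
        return []
    # A missing or empty options list on a variant counts as a violation.
    return [variant.get("id") for variant in variants if not variant.get("options")]


# The technically valid but unresolvable payload from the paragraph above.
payload = {
    "options": [{"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]}],
    "variants": [{"id": "var_a"}, {"id": "var_b"}],
}
print(violates_conditional_must(payload))
```

&lt;p&gt;An empty result means every variant is linked to its axis values; anything else lists the variants an agent would have to guess at.&lt;/p&gt;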

&lt;h2&gt;
  
  
  The five variant anti-patterns
&lt;/h2&gt;

&lt;p&gt;In rough order of frequency observed across the dataset:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Opaque variant IDs with no option metadata
&lt;/h3&gt;

&lt;p&gt;The shape from the third example above — variants exposed only as &lt;code&gt;var_5572&lt;/code&gt;, no &lt;code&gt;options&lt;/code&gt;, no &lt;code&gt;attributes&lt;/code&gt;, no human-readable axis. Agents have no way to map a user's "Medium" to a specific ID. They either guess or pick the first variant, both of which produce wrong outcomes routinely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; every variant must carry the axis values that distinguish it from siblings, in the spec's &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; array form: &lt;code&gt;"options": [{"name": "Size", "label": "Medium"}, {"name": "Color", "label": "Grey"}]&lt;/code&gt;. The &lt;code&gt;name&lt;/code&gt; field tells the agent which axis the value belongs to; &lt;code&gt;label&lt;/code&gt; is what gets matched against the user's request. Whatever the product's options page shows to a human shopper — size, colour, material, fit — the variant data should expose programmatically with one &lt;code&gt;selected_option&lt;/code&gt; entry per axis.&lt;/p&gt;

&lt;p&gt;The corollary: descriptive attributes that aren't selection axes belong in &lt;code&gt;metadata&lt;/code&gt;, not &lt;code&gt;product.options[]&lt;/code&gt;. A one-variant simple product with "Color: Gray" should expose Gray as &lt;code&gt;metadata.attributes&lt;/code&gt;, not as a single-value entry in &lt;code&gt;product.options[]&lt;/code&gt;; otherwise consumer UIs render a one-button picker that looks selectable but isn't. The split: &lt;code&gt;product.options[]&lt;/code&gt; is for axes the buyer chooses across; &lt;code&gt;metadata&lt;/code&gt; is for descriptive properties of the (only) variant.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Conflated axes in a single string
&lt;/h3&gt;

&lt;p&gt;The shape from the second example — &lt;code&gt;"Medium / Regular Fit"&lt;/code&gt; as a single option value where size and fit are two separate user choices. Agents can parse this, but inconsistently across models, because the conflation is ambiguous. Different models split the string differently, and the variant they end up picking depends on which side of the slash they prioritise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; each variant attribute lives in its own field. Don't compose. If your product has size + fit as two axes, the variant data should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5574"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two clean axes, two unambiguous values, no string parsing required. Agents pick consistently. The array-of-&lt;code&gt;selected_option&lt;/code&gt; form is the shape UCP &lt;code&gt;2026-04-08&lt;/code&gt; defines for &lt;code&gt;variant.options&lt;/code&gt; — see &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option.json&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Inconsistent labelling between sibling variants
&lt;/h3&gt;

&lt;p&gt;Sibling variants on the same product don't always use the same option vocabulary. One says &lt;code&gt;"M"&lt;/code&gt;, another says &lt;code&gt;"Medium"&lt;/code&gt;, another says &lt;code&gt;"med"&lt;/code&gt;. We see this on stores that have grown organically: different teams added variants over different years and naming conventions drifted. The inconsistency stays invisible to the merchandising team because the storefront UI hides it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; one canonical label per axis value, applied consistently across every variant on every product. If "Medium" is the canonical label, every Medium variant uses exactly &lt;code&gt;"Medium"&lt;/code&gt;. No &lt;code&gt;"M"&lt;/code&gt;, no &lt;code&gt;"med"&lt;/code&gt;, no &lt;code&gt;"Medium "&lt;/code&gt; (trailing space). Agents reason by string match; consistency is what makes the match reliable.&lt;/p&gt;
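
&lt;p&gt;Label drift is easy to detect mechanically. The following Python sketch flags any variant label that is not declared verbatim in &lt;code&gt;product.options[]&lt;/code&gt;; the function name &lt;code&gt;undeclared_labels&lt;/code&gt; is hypothetical, and treating the declared &lt;code&gt;values&lt;/code&gt; as the canonical vocabulary is an assumption of this example.&lt;/p&gt;

```python
def undeclared_labels(product):
    """Return (variant_id, axis, label) tuples whose label is not declared verbatim."""
    # Canonical vocabulary: the labels declared per axis in product.options[].
    declared = {
        axis["name"]: {value["label"] for value in axis.get("values", [])}
        for axis in product.get("options", [])
    }
    problems = []
    for variant in product.get("variants", []):
        for opt in variant.get("options", []):
            # Exact comparison on purpose: "Medium " and "med" should both be flagged.
            if opt["label"] not in declared.get(opt["name"], set()):
                problems.append((variant["id"], opt["name"], opt["label"]))
    return problems


# Illustrative product with drifted Size labels (IDs are made up).
drifted = {
    "options": [{"name": "Size", "values": [{"label": "Medium"}]}],
    "variants": [
        {"id": "var_1", "options": [{"name": "Size", "label": "Medium"}]},
        {"id": "var_2", "options": [{"name": "Size", "label": "M"}]},
        {"id": "var_3", "options": [{"name": "Size", "label": "Medium "}]},
    ],
}
print(undeclared_labels(drifted))
```

&lt;p&gt;The comparison is deliberately strict, so abbreviations and a trailing space are reported alongside outright mismatches.&lt;/p&gt;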

&lt;h3&gt;
  
  
  4. Missing or inconsistent stock / availability flags
&lt;/h3&gt;

&lt;p&gt;A variant exists in the catalogue but is sold out, and the variant data doesn't say so. The agent picks it, the cart accepts the add, the checkout endpoint rejects it. The agent doesn't know to retry with a different variant — it had no signal that the variant was unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; every variant declares its &lt;code&gt;availability&lt;/code&gt; object. The spec shape is &lt;code&gt;{"available": true, "status": "in_stock"}&lt;/code&gt;, with well-known status values &lt;code&gt;in_stock&lt;/code&gt;, &lt;code&gt;backorder&lt;/code&gt;, &lt;code&gt;preorder&lt;/code&gt;, &lt;code&gt;out_of_stock&lt;/code&gt;, and &lt;code&gt;discontinued&lt;/code&gt;. Agents skip unavailable variants when the data marks them as unavailable, and &lt;code&gt;status&lt;/code&gt; gives them enough signal to decide whether to wait, substitute, or surface an out-of-stock message to the user.&lt;/p&gt;
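
&lt;p&gt;As a rough illustration of what an agent can do once the flag is present, the Python sketch below filters a variant list by its &lt;code&gt;availability&lt;/code&gt; object. The function name &lt;code&gt;purchasable&lt;/code&gt;, the choice to treat &lt;code&gt;backorder&lt;/code&gt; and &lt;code&gt;preorder&lt;/code&gt; as buyable, and the choice to skip variants with no availability data are all assumptions of this example, not spec requirements.&lt;/p&gt;

```python
# Assumption: statuses this example agent is willing to add to a cart.
BUYABLE_STATUSES = {"in_stock", "backorder", "preorder"}


def purchasable(variants):
    """Keep variants whose availability object says they can be ordered."""
    kept = []
    for variant in variants:
        availability = variant.get("availability")
        # Assumption: a missing availability object means "unknown", so skip it.
        if not availability:
            continue
        if availability.get("available") is True and availability.get("status") in BUYABLE_STATUSES:
            kept.append(variant)
    return kept


# Illustrative variants (IDs are made up).
stock = [
    {"id": "var_1", "availability": {"available": True, "status": "in_stock"}},
    {"id": "var_2", "availability": {"available": False, "status": "out_of_stock"}},
    {"id": "var_3"},
]
print([variant["id"] for variant in purchasable(stock)])
```

&lt;p&gt;Without the flag, the sold-out variant and the in-stock one are indistinguishable at selection time, and the failure only surfaces at checkout.&lt;/p&gt;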

&lt;h3&gt;
  
  
  5. Declared axes that variants don't honor
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;product.options[]&lt;/code&gt; declares the selectable axes; &lt;code&gt;variants[]&lt;/code&gt; is the universe of actual purchasable combinations. When the cardinality of declared axes doesn't match what variants actually carry — e.g., &lt;code&gt;product.options&lt;/code&gt; declares Color × Size = 9 combinations but only 3 color-only variants exist — agents try to satisfy a Size selection that no variant honors. Strict consumers return null and refuse to add; lenient consumers guess and pick wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; keep &lt;code&gt;product.options[]&lt;/code&gt; and &lt;code&gt;variants[]&lt;/code&gt; in sync. Either every declared axis combination has a corresponding variant, or the axis shouldn't be in &lt;code&gt;product.options[]&lt;/code&gt;. If sizes aren't actually configurable for this product, drop &lt;code&gt;Size&lt;/code&gt; from the axes; don't leave it dangling.&lt;/p&gt;
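
&lt;p&gt;A sync check along these lines can run before a catalogue is published. This Python sketch enumerates every combination that &lt;code&gt;product.options[]&lt;/code&gt; declares and reports the ones no variant carries; the function name &lt;code&gt;missing_combinations&lt;/code&gt; and the sample product are hypothetical.&lt;/p&gt;

```python
from itertools import product as cartesian


def missing_combinations(item):
    """Return declared axis combinations that no variant actually carries."""
    axes = item.get("options", [])
    if not axes:
        return []
    names = [axis["name"] for axis in axes]
    # Every combination the declared axes promise, e.g. 3 colors x 3 sizes = 9.
    declared = set(cartesian(*[[value["label"] for value in axis["values"]] for axis in axes]))
    actual = set()
    for variant in item.get("variants", []):
        chosen = {opt["name"]: opt["label"] for opt in variant.get("options", [])}
        # A variant that omits an axis gets None there and so matches no combination.
        actual.add(tuple(chosen.get(name) for name in names))
    return sorted(declared - actual)


# The case from the text: Color x Size declared, but only color-only variants exist.
tee = {
    "options": [
        {"name": "Color", "values": [{"label": "Grey"}, {"label": "Black"}, {"label": "Navy"}]},
        {"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}, {"label": "Large"}]},
    ],
    "variants": [
        {"id": "var_1", "options": [{"name": "Color", "label": "Grey"}]},
        {"id": "var_2", "options": [{"name": "Color", "label": "Black"}]},
        {"id": "var_3", "options": [{"name": "Color", "label": "Navy"}]},
    ],
}
print(len(missing_combinations(tee)))
```

&lt;p&gt;An empty result means the declared axes and the variants agree. A non-empty result points at either missing variants or an axis that should be dropped from &lt;code&gt;product.options[]&lt;/code&gt;.&lt;/p&gt;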

&lt;h2&gt;
  
  
  What clean variant data looks like
&lt;/h2&gt;

&lt;p&gt;Here's the shape that resolves cleanly across every frontier model we test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prod_42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight Crew Tee"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight cotton crew-neck tee."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price_range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"variants"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5571"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal / Small / Regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight crew tee, charcoal, size small, regular fit."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"availability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_stock"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Small"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var_5572"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal / Medium / Regular"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Heavyweight crew tee, charcoal, size medium, regular fit."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"availability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"in_stock"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Color"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charcoal"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Size"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Medium"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Regular"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four spec fields make this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;product.options&lt;/code&gt;&lt;/strong&gt; at the product level — declares the axes (&lt;code&gt;Color&lt;/code&gt;, &lt;code&gt;Size&lt;/code&gt;, &lt;code&gt;Fit&lt;/code&gt;) and their valid values as an array of &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/product_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;product_option&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{name, values: [{label}]}&lt;/code&gt;. Agents know upfront how many dimensions a variant occupies and what values are valid on each axis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;variant.options&lt;/code&gt;&lt;/strong&gt; as an array of &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/selected_option.json" rel="noopener noreferrer"&gt;&lt;code&gt;selected_option&lt;/code&gt;&lt;/a&gt; &lt;code&gt;{name, label}&lt;/code&gt;: each axis has its own entry, so there is no string parsing and no conflation. The &lt;code&gt;name&lt;/code&gt; matches the product-level axis; the &lt;code&gt;label&lt;/code&gt; is one of that axis's declared values, which the agent compares against the user's request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;variant.availability&lt;/code&gt;&lt;/strong&gt; with &lt;code&gt;available&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; — agents skip unavailable variants without trial-and-error, and &lt;code&gt;status&lt;/code&gt; (&lt;code&gt;in_stock&lt;/code&gt;, &lt;code&gt;backorder&lt;/code&gt;, &lt;code&gt;preorder&lt;/code&gt;, &lt;code&gt;out_of_stock&lt;/code&gt;, &lt;code&gt;discontinued&lt;/code&gt;) gives them enough signal to wait, substitute, or surface the right message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Required scaffolding&lt;/strong&gt; — &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;price&lt;/code&gt; on every &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/variant.json" rel="noopener noreferrer"&gt;variant&lt;/a&gt;, and &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, &lt;code&gt;price_range&lt;/code&gt;, &lt;code&gt;variants&lt;/code&gt; on the &lt;a href="https://ucp.dev/2026-04-08/schemas/shopping/types/product.json" rel="noopener noreferrer"&gt;product&lt;/a&gt;. These aren't "nice to have"; they're the schema's required fields. Variants missing any of them won't validate.&lt;/li&gt;
&lt;/ol&gt;
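
&lt;p&gt;As a rough illustration of how these fields fit together, here is a minimal Python sketch that cross-checks each variant's selected options against the product-level axes. The function name &lt;code&gt;audit_variant_options&lt;/code&gt; and the wording of the messages are hypothetical; this is not UCPChecker's implementation, only one way such a check might look for product data shaped like the example above.&lt;/p&gt;

```python
def audit_variant_options(product):
    """Return a list of human-readable problems found in one product dict.

    Hypothetical helper: assumes the product/variant shape shown in the
    JSON example (options with name + values, variants with name + label).
    """
    problems = []
    # Map each declared axis name to the set of labels it allows.
    declared = {
        option["name"]: {value["label"] for value in option.get("values", [])}
        for option in product.get("options", [])
    }
    for variant in product.get("variants", []):
        vid = variant.get("id", "(missing id)")
        # Fields the article lists as required on every variant.
        for field in ("id", "title", "description", "price"):
            if field not in variant:
                problems.append(f"{vid}: missing required field '{field}'")
        # Treated here as a data-quality issue, not a schema requirement.
        if "availability" not in variant:
            problems.append(f"{vid}: missing availability")
        selected = {opt["name"]: opt["label"] for opt in variant.get("options", [])}
        for axis, labels in declared.items():
            if axis not in selected:
                problems.append(f"{vid}: no value for axis '{axis}'")
            elif selected[axis] not in labels:
                problems.append(f"{vid}: label '{selected[axis]}' not declared for '{axis}'")
        for axis in selected:
            if axis not in declared:
                problems.append(f"{vid}: undeclared axis '{axis}'")
    return problems


product = {
    "id": "prod_42",
    "options": [
        {"name": "Size", "values": [{"label": "Small"}, {"label": "Medium"}]},
    ],
    "variants": [
        {
            "id": "var_5571",
            "title": "Small",
            "description": "Size small.",
            "price": {"amount": 4500, "currency": "USD"},
            "availability": {"available": True, "status": "in_stock"},
            "options": [{"name": "Size", "label": "S"}],  # drifted label
        },
    ],
}
for problem in audit_variant_options(product):
    print(problem)
```

&lt;p&gt;The drifted label &lt;code&gt;S&lt;/code&gt; is reported because the product declares only &lt;code&gt;Small&lt;/code&gt; and &lt;code&gt;Medium&lt;/code&gt; on the &lt;code&gt;Size&lt;/code&gt; axis.&lt;/p&gt;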

&lt;p&gt;&lt;strong&gt;Bonus stability:&lt;/strong&gt; when present, &lt;code&gt;option_value.id&lt;/code&gt; and &lt;code&gt;selected_option.id&lt;/code&gt; give stable identifiers that survive label drift. If your platform supports it (most do — Shopify uses GIDs, WooCommerce uses &lt;code&gt;pa_*&lt;/code&gt; taxonomy slugs), populate &lt;code&gt;id&lt;/code&gt; alongside &lt;code&gt;label&lt;/code&gt; and consumers can match on the stable key when labels change.&lt;/p&gt;

&lt;p&gt;Stores running variant data in this shape resolve user prompts to specific variants reliably across every model we've benchmarked. The pattern isn't novel — it's the same shape Shopify uses internally, the same shape WooCommerce variations use when properly structured, the same shape every traditional e-commerce platform ends up at after enough years of evolution. UCP just exposes it programmatically to the agent layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to validate your variant data
&lt;/h2&gt;

&lt;p&gt;Three layers, in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Static audit.&lt;/strong&gt; Run your store through &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker&lt;/a&gt;. The validator surfaces variants with missing &lt;code&gt;options&lt;/code&gt; data, conflated axes, inconsistent labels across sibling variants, and missing availability flags. These checks go beyond strict UCP-spec conformance, but our &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; flags variant-quality issues as part of the Capability Coverage score because they materially affect whether agents can transact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Live agent test.&lt;/strong&gt; Run a multi-model agent session against your store via &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;. The framework exercises the full search → variant-pick → cart → checkout flow against frontier agents across &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;15+ models&lt;/a&gt;. If your variant data is ambiguous, you'll see different models pick different variants for the same prompt — the exact pattern we walk through in &lt;a href="https://ucpchecker.com/blog/ucp-playground-1000-agent-sessions" rel="noopener noreferrer"&gt;the Playground 1,000-sessions analysis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Continuous monitoring.&lt;/strong&gt; Variant data changes over time as you add products and SKUs. Set up &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; so you get notified when a variant audit starts surfacing new issues — typically a sign that a recent merchandising change introduced inconsistent labelling at scale.&lt;/p&gt;

&lt;p&gt;The order matters. Static audit catches the easy cases (missing fields, schema-shaped problems) cheaply. Live agent test catches the cases where the schema is fine but agents disagree (the conflated-axis cases, the inconsistent-label cases). Monitoring catches drift over time. Skipping any of the three leaves a class of variant problems undetected.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to fix in your store, and how to verify it
&lt;/h2&gt;

&lt;p&gt;If you're a merchant reading this and your store is running on &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, or &lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;, the variant data structure is mostly determined by your platform's defaults. The platform-specific fixes are documented in the platform guides — but the meta-pattern is the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit&lt;/strong&gt; at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt; — get a list of variant-data issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix&lt;/strong&gt; the most common one first (usually missing &lt;code&gt;options&lt;/code&gt; metadata)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; at &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt; with two different models against the same product, asking for the same variant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; that both models pick the same variant ID consistently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; weekly — variant drift is the most common reason a store's UCP Score regresses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Variant data is a back-office data-quality problem dressed up as an agentic commerce problem. The fix is mostly editorial — get your axis labels consistent, expose your option values structurally, mark sold-out variants as such. None of this is technically hard. It's the kind of work that adds up to "agents can buy from your store" rather than "agents try to buy from your store and quietly fail."&lt;/p&gt;

&lt;p&gt;If you fix one thing on the agent-readiness side this quarter, fix variant data. The conversion lift is bigger than any other single change you can make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing worth naming:&lt;/strong&gt; consumer tools that silently paper over variant data problems (substring matching, positional guessing, falling back to &lt;code&gt;variants[0]&lt;/code&gt;) make this worse, not better. They hide the failure mode from merchants who would otherwise see it and fix the data. Faithful rendering — null when the match is ambiguous, errors when the data is inconsistent — is what produces correct merchant behaviour. If your variant data only works in some agents, that's a signal the data is the problem, not the agent.&lt;/p&gt;
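
&lt;p&gt;To make "null when the match is ambiguous" concrete, here is a minimal Python sketch of a strict resolver. The name &lt;code&gt;resolve_variant&lt;/code&gt; and the &lt;code&gt;wanted&lt;/code&gt; argument are hypothetical, and the sketch assumes variants carry &lt;code&gt;{name, label}&lt;/code&gt; options as described earlier. It matches on exact axis names and labels only, and refuses to guess when zero or several variants fit.&lt;/p&gt;

```python
def resolve_variant(product, wanted):
    """Return the single variant matching every wanted axis, else None.

    `wanted` maps axis name to label, e.g. {"Size": "Medium"}.
    No substring matching, no positional guessing, no variants[0] fallback.
    """
    matches = []
    for variant in product.get("variants", []):
        selected = {opt["name"]: opt["label"] for opt in variant.get("options", [])}
        if all(selected.get(axis) == label for axis, label in wanted.items()):
            matches.append(variant)
    # Exactly one match is a resolution; zero or several is ambiguity.
    if len(matches) == 1:
        return matches[0]
    return None


product = {
    "variants": [
        {"id": "var_5571", "options": [{"name": "Color", "label": "Charcoal"},
                                        {"name": "Size", "label": "Small"}]},
        {"id": "var_5572", "options": [{"name": "Color", "label": "Charcoal"},
                                        {"name": "Size", "label": "Medium"}]},
    ],
}
medium = resolve_variant(product, {"Color": "Charcoal", "Size": "Medium"})
print(medium["id"])                                     # var_5572
print(resolve_variant(product, {"Color": "Charcoal"}))  # None: two variants match
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for the under-specified request keeps the ambiguity visible to the caller instead of hiding it behind an arbitrary pick.&lt;/p&gt;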

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit your variants now&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;ucpchecker.com/check&lt;/a&gt; — flags variant issues alongside the rest of the UCP Score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test variant resolution with real agents&lt;/strong&gt;: &lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;ucpplayground.com&lt;/a&gt; — run two models against your store on the same prompt, see if they pick the same variant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the broader failure-mode taxonomy&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;Common UCP Errors and How to Fix Them&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track ecosystem-wide variant adoption&lt;/strong&gt;: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce — April 2026&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>The UCP Technical Council Just Shipped Attribution into Core. Here's What That Means.</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Wed, 06 May 2026 07:43:57 +0000</pubDate>
      <link>https://dev.to/benjifisher/the-ucp-technical-council-just-shipped-attribution-into-core-heres-what-that-means-2cnh</link>
      <guid>https://dev.to/benjifisher/the-ucp-technical-council-just-shipped-attribution-into-core-heres-what-that-means-2cnh</guid>
      <description>&lt;p&gt;On &lt;strong&gt;May 5, 2026&lt;/strong&gt;, the UCP Technical Council merged &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/391" rel="noopener noreferrer"&gt;PR #391&lt;/a&gt; into the spec's &lt;code&gt;main&lt;/code&gt; branch — adding a top-level &lt;code&gt;attribution&lt;/code&gt; field to cart, checkout, catalog, and order operations. The field carries platform-emitted referral and conversion-event context: campaign IDs, click identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;), source/medium markers. Open string-keyed map. Universal across requests; not gated by capability negotiation.&lt;/p&gt;

&lt;p&gt;As UCP matures, attribution landing in core was always going to happen. Agentic commerce can't operate as commercial infrastructure without a path for advertising and measurement context to flow alongside the transactional data — and the longer that gap stayed open, the more pressure would have built for vendors to ship incompatible parallel solutions. The merge isn't the surprising part. &lt;strong&gt;The interesting part is the specific shape of what shipped, and what its presence in core tells us about where the spec is heading.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two things to dig into: the technical detail of the field itself, and the trajectory implication of advertising and measurement infrastructure landing in UCP core for the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipped
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;attribution&lt;/code&gt; field is structurally simple. From Grigorik's own example in the PR:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attribution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"18234567890"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_medium"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cpc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"campaign_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spring_2026"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gclid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EAIaIQobChMI..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No prescribed schema beyond "string-keyed object." Platforms populate it with whatever conventions they already use — GA4 campaign parameters, click identifiers, custom tracking keys. Businesses receive the data and process per their own analytics needs. UCP itself does &lt;strong&gt;not&lt;/strong&gt; prescribe attribution windows, models, or assignment logic. The protocol carries the data; attribution math happens downstream.&lt;/p&gt;

&lt;p&gt;The field plays two roles across four operations in the request lifecycle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;catalog&lt;/code&gt; (search, lookup)&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cart&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;checkout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Platform-emitted input&lt;/td&gt;
&lt;td&gt;Platform → merchant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;order&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Business-emitted snapshot&lt;/td&gt;
&lt;td&gt;Merchant → platform&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The asymmetry matters. On catalog/cart/checkout, the platform writes attribution as it would write a UTM string into a browser URL — referral context flowing forward. On &lt;code&gt;order&lt;/code&gt;, the business preserves the originating attribution as a snapshot — closing the loop between agent-mediated conversion and the platform that produced it.&lt;/p&gt;

&lt;p&gt;Grigorik's framing in the PR is the cleanest one-line summary of intent: the field "carries the same parameters platforms communicate via URL query parameters in browser-based flows, in the same flat key-value form." Attribution in agent-mediated commerce is the agent counterpart of UTM strings. Same parameters, same model, different transport layer.&lt;/p&gt;
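
&lt;p&gt;As an illustration of that "same flat key-value form", here is a small Python sketch that lifts tracking parameters from a landing URL's query string into a flat map. Which keys a platform forwards, and under what names, is the platform's own convention; the &lt;code&gt;utm_&lt;/code&gt; prefix filter, the &lt;code&gt;CLICK_ID_KEYS&lt;/code&gt; set, and the function name are assumptions made for this example, not something the spec prescribes.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: which keys count as attribution is the platform's choice.
CLICK_ID_KEYS = {"gclid", "fbclid", "ttclid"}


def attribution_from_url(url):
    """Build a flat string-keyed map from a landing URL's query string."""
    attribution = {}
    for key, value in parse_qsl(urlsplit(url).query):
        if key.startswith("utm_") or key in CLICK_ID_KEYS:
            attribution[key] = value  # last occurrence wins
    return attribution


# Hypothetical landing URL, assembled with urlencode for readability.
query = urlencode({"utm_source": "google", "utm_medium": "cpc",
                   "gclid": "abc123", "page": "2"})
print(attribution_from_url("https://shop.example/p/42?" + query))
```

&lt;p&gt;The non-tracking &lt;code&gt;page&lt;/code&gt; parameter is dropped; everything kept stays a plain string under its original key, with no renaming or nesting.&lt;/p&gt;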

&lt;p&gt;Thirteen files changed. The core addition is &lt;code&gt;source/schemas/shopping/types/attribution.json&lt;/code&gt; — the new type definition. Schemas for cart, catalog_lookup, catalog_search, checkout, and order all gain the field as an optional property. Specification docs across cart, catalog, checkout, order, and the overview were updated to describe the field's purpose and semantics.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural decision: core field, not extension
&lt;/h2&gt;

&lt;p&gt;The substantively interesting part of this PR is not what got added. It's how it got added.&lt;/p&gt;

&lt;p&gt;PR #391 was Grigorik's alternative proposal to &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/295" rel="noopener noreferrer"&gt;PR #295&lt;/a&gt;, which James Andersen had opened earlier proposing an &lt;code&gt;event_context&lt;/code&gt; extension. Both proposals tried to solve the same problem — give platforms a way to pass referral/attribution data through to merchants in agent flows — but with very different architectural shapes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#295 (Andersen, Meta):&lt;/strong&gt; Attribution as a &lt;strong&gt;structured extension&lt;/strong&gt;. Capability-negotiated. Validated against a defined schema. Standardised vocabulary across platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;#391 (Grigorik, Shopify):&lt;/strong&gt; Attribution as a &lt;strong&gt;top-level core field&lt;/strong&gt;. Open key-value map. No capability negotiation. Each platform uses its own conventions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Andersen formally approved Grigorik's alternative (&lt;em&gt;"thanks for finding a better home for attribution data than the original proposal"&lt;/em&gt;), and the rearchitected proposal went on to merge after TC discussion. That cross-vendor pattern (one TC member proposes; another offers a structurally different alternative; the original proposer endorses it) is the dynamic that produces robust standards rather than fragmented vendor extensions.&lt;/p&gt;

&lt;p&gt;The PR discussion pivots on which architectural shape this kind of data deserves. Amit Handa wrote the canonical comment on May 3 establishing the decision framework — worth quoting because it'll likely be cited as governance precedent in future spec discussions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Use a UCP Extension&lt;/th&gt;
&lt;th&gt;Use Optional Flat Key-Value Pairs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Impact on Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Changes state or execution of the operation&lt;/td&gt;
&lt;td&gt;Purely informational&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stable, standardized vocabulary&lt;/td&gt;
&lt;td&gt;Volatile, platform-specific, rapidly evolving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Capability Negotiation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires mutual agreement + active parent capability&lt;/td&gt;
&lt;td&gt;Best-effort, consumed at-will, no gating&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strict — transaction integrity matters&lt;/td&gt;
&lt;td&gt;Flexible — validation happens downstream&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Platform Scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data normalization across diverse platforms&lt;/td&gt;
&lt;td&gt;Low friction; normalization burden on receiver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typical Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;discount&lt;/code&gt;, &lt;code&gt;fulfillment&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;attribution&lt;/code&gt;, referral tracking, session tags&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Attribution falls cleanly in the right-hand column of every row. Marketing identifiers (&lt;code&gt;gclid&lt;/code&gt;, &lt;code&gt;fbclid&lt;/code&gt;, &lt;code&gt;ttclid&lt;/code&gt;) are volatile and platform-specific: every adtech vendor invents its own, so any list standardised in the spec would be obsolete the moment a new platform launches. Attribution doesn't change protocol behaviour; it's read-only context that some downstream pipeline cares about, with no transactional consequence. There's nothing for a merchant to negotiate; either you record it or you don't.&lt;/p&gt;

&lt;p&gt;The merged PR locks this decision in. Future contributors proposing similar volatile, informational, platform-specific data structures now have a precedent: &lt;strong&gt;the spec prefers flat optional key-value pairs over structured extensions for non-state-changing context.&lt;/strong&gt; That's a piece of governance documentation as much as a feature merge, and Handa's table will be the reference for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trajectory implication
&lt;/h2&gt;

&lt;p&gt;UCP up to this point has been &lt;strong&gt;protocol mechanics&lt;/strong&gt;. How agents discover stores. How they shop. How they pay. How they identify users. How they handle returns. The mechanics are necessary, but they don't directly produce commercial value for the ecosystem participants. A merchant with a perfectly conformant UCP implementation but no attribution can't measure agent-driven conversions, can't optimise marketing spend, can't close the loop between platform investment and merchant outcomes.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;attribution&lt;/code&gt; closes that loop. With the field in core, the entire adtech infrastructure that powers current ecommerce extends naturally into agent-mediated commerce. Platforms attribute conversions to specific campaigns. Click identifiers persist across the agent flow. Businesses run their existing analytics pipelines on agent-driven traffic with no special handling. The bridge that makes UCP commercially usable for marketing teams — not just engineering teams — now exists in the core spec.&lt;/p&gt;

&lt;p&gt;The trajectory implication is the part worth sitting with: &lt;strong&gt;UCP is evolving from protocol mechanics into commercial infrastructure.&lt;/strong&gt; Each subsequent spec addition probably bridges another piece of existing commerce infrastructure into the agent layer. Loyalty programs. Customer data platforms. Marketing automation triggers. Inventory hooks. Each one makes UCP more complete as commercial infrastructure rather than just protocol mechanics.&lt;/p&gt;

&lt;p&gt;The architectural-precedent decision in #391 makes that trajectory more efficient. Future contributors proposing similar bridges (attribution-adjacent measurement primitives, marketing identifiers, session metadata) now have a clear template: flat key-value pairs into core, governance precedent already established. The spec doesn't need to relitigate the core-vs-extension decision every time a volatile, informational primitive comes up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it means in practice
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;merchants&lt;/strong&gt;: your UCP implementation should accept the &lt;code&gt;attribution&lt;/code&gt; field on incoming cart, checkout, and catalog requests, preserve it through to order records, and surface it through your analytics pipeline. The lift is small — it's a string-keyed JSON object on existing endpoints — but missing it means agent-driven conversions arrive at your analytics with no source attribution, which means your marketing team can't measure the channel.&lt;/p&gt;
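
&lt;p&gt;A minimal Python sketch of that merchant-side handling, assuming only what the PR establishes: &lt;code&gt;attribution&lt;/code&gt; is an optional, flat, string-keyed JSON object. The function names, the order-record shape, the assumption that values are strings, and the choice to drop malformed entries rather than reject the request are all illustrative, not spec behaviour.&lt;/p&gt;

```python
from typing import Any


def extract_attribution(payload: dict[str, Any]) -> dict[str, str]:
    """Return a flat copy of the optional attribution object.

    Assumption: values are strings. Anything else is dropped rather
    than failing the request, since the data is informational only.
    """
    raw = payload.get("attribution")
    if not isinstance(raw, dict):
        return {}
    return {k: v for k, v in raw.items() if isinstance(k, str) and isinstance(v, str)}


def build_order_record(order_id: str, checkout_payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical order record that carries attribution through unchanged."""
    return {
        "order_id": order_id,
        "line_items": checkout_payload.get("line_items", []),
        "attribution": extract_attribution(checkout_payload),
    }


request = {
    "line_items": [{"sku": "SKU-1", "quantity": 1}],
    "attribution": {"gclid": "abc123", "campaign": "spring-sale"},
}
order = build_order_record("order-001", request)
print(order["attribution"])
```

&lt;p&gt;The point of the sketch is the shape of the work: read the field if present, copy it onto the order, and let the analytics pipeline pick it up from there.&lt;/p&gt;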

&lt;p&gt;For &lt;strong&gt;platform vendors&lt;/strong&gt; (&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, and others): rolling attribution support into the next platform-side compatibility release is now table-stakes work. The stores running on your stack will need to accept and preserve attribution by the time the next published spec version makes this part of conformance.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;agent platforms&lt;/strong&gt; (those of us building or testing agents that shop UCP stores): pass platform-emitted attribution forward into every cart/checkout/catalog request. The data is informational, not state-changing — your agent doesn't need to do anything with it beyond passing it through. The merchant decides what to do with it on the receive side.&lt;/p&gt;
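
&lt;p&gt;On the agent side, pass-through can be as small as the following sketch. The request bodies and the helper name &lt;code&gt;with_attribution&lt;/code&gt; are hypothetical; the only thing taken from the spec change is that attribution is a flat key-value object attached to cart, checkout, and catalog requests.&lt;/p&gt;

```python
import copy
from typing import Any


def with_attribution(request_body: dict[str, Any], attribution: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the request with platform attribution attached.

    The agent does not interpret the values; it only forwards them.
    An empty attribution dict leaves the request untouched.
    """
    body = copy.deepcopy(request_body)
    if attribution:
        body["attribution"] = dict(attribution)
    return body


platform_attribution = {"gclid": "abc123"}
cart_request = with_attribution({"items": [{"sku": "SKU-1"}]}, platform_attribution)
checkout_request = with_attribution({"cart_id": "cart-42"}, platform_attribution)
print(cart_request)
print(checkout_request)
```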

&lt;p&gt;For &lt;strong&gt;evaluators&lt;/strong&gt; (us): the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; will incorporate attribution-acceptance and attribution-preservation conformance in its next release. A store that accepts attribution on cart/checkout/catalog and threads it through to order records will score higher than one that drops it. The &lt;a href="https://ucpchecker.com/methodology" rel="noopener noreferrer"&gt;methodology&lt;/a&gt; page will reflect the rule update when the next score-version drops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timing: in core today, in the published spec next
&lt;/h2&gt;

&lt;p&gt;One distinction is worth making explicit. PR #391 merged into the spec's &lt;code&gt;main&lt;/code&gt; branch, not into a currently published spec version. The latest released spec is &lt;strong&gt;v2026-04-08&lt;/strong&gt;, which does not include &lt;code&gt;attribution&lt;/code&gt;. The field becomes part of conformance with the next published spec version (no fixed cadence; expected in the next few months). Until then, attribution sits in the working draft on &lt;code&gt;main&lt;/code&gt;: implementers can adopt it ahead of the release if they want, but it's not yet part of conformance for the published spec.&lt;/p&gt;

&lt;p&gt;That distinction shapes how we're rolling out support across our tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpplayground.com/" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt;&lt;/strong&gt; will adopt attribution support when the next spec version drops — agents will pass platform attribution through to merchants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;&lt;/strong&gt; will incorporate attribution-acceptance and attribution-preservation rules in the score release that aligns with the next published spec.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;The validator&lt;/a&gt;&lt;/strong&gt; will support the new field as soon as the next spec ships, and the &lt;a href="https://ucpchecker.com/bulk-check" rel="noopener noreferrer"&gt;bulk checker&lt;/a&gt; will surface attribution conformance per-merchant after that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architectural certainty is already here — the schema is locked, the field is documented, the design pattern is settled. The spec drop is the &lt;strong&gt;conformance trigger&lt;/strong&gt;, not the design moment. Implementers who start work today against the working draft are operating against a known target.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to read more
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The PR itself: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/pull/391" rel="noopener noreferrer"&gt;#391 on Universal-Commerce-Protocol/ucp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The merge commit: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/commit/76a35394051222bcef8169c9c5c4c03072542a98" rel="noopener noreferrer"&gt;&lt;code&gt;76a3539&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The new schema type: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/blob/main/source/schemas/shopping/types/attribution.json" rel="noopener noreferrer"&gt;&lt;code&gt;source/schemas/shopping/types/attribution.json&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Updated authoring guidance: &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/blob/main/docs/documentation/schema-authoring.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/documentation/schema-authoring.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About UCP Checker
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest in the open web, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt; and the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, publish the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; and &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and track major spec events like this one as they ship.&lt;/p&gt;

&lt;p&gt;If you're building on UCP and want to know whether your store is ready for the next spec version: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;run a check&lt;/a&gt;. If you're tracking the spec's evolution professionally: subscribe to our &lt;a href="https://ucpchecker.com/stats/sample-report" rel="noopener noreferrer"&gt;weekly digest&lt;/a&gt; — we cover spec changes like this one within a week of merge.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>ai</category>
      <category>ucp</category>
    </item>
    <item>
      <title>UCP Playground at 1,000+ Agent Sessions: What 16 Models and 97 Real Stores Reveal About AI Shopping</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Tue, 05 May 2026 09:11:37 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-playground-at-1000-agent-sessions-what-16-models-and-97-real-stores-reveal-about-ai-shopping-155p</link>
      <guid>https://dev.to/benjifisher/ucp-playground-at-1000-agent-sessions-what-16-models-and-97-real-stores-reveal-about-ai-shopping-155p</guid>
      <description>&lt;p&gt;Two and a half months ago we &lt;a href="https://ucpchecker.com/blog/why-we-built-ucp-playground" rel="noopener noreferrer"&gt;published Why We Built UCP Playground&lt;/a&gt;, which closed on 114 agent sessions and an honest acknowledgement that the dataset was thin — most models had single-digit sample sizes, store coverage was uneven, and the headline rates moved meaningfully with every new run. A month later we crossed a different threshold: the &lt;a href="https://ucpchecker.com/blog/first-autonomous-ai-agent-purchase-ucp" rel="noopener noreferrer"&gt;first fully autonomous AI agent purchase through UCP&lt;/a&gt; — a Gemini agent searching, adding to cart, linking identity, paying, and completing checkout at &lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt; without a human past the initial prompt.&lt;/p&gt;

&lt;p&gt;Eighty days on from the first post, and roughly forty days after that autonomous purchase, the dataset is in a different shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Over 1,000 agent shopping sessions&lt;/strong&gt; captured end-to-end with full tool-call timelines and replayable event streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;16 frontier models&lt;/strong&gt; — every major lab, plus a reasoning-tuned subset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;97 distinct UCP-enabled stores&lt;/strong&gt; across Shopify, WooCommerce, BigCommerce, Magento, PrestaShop, and custom stacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$96,032 of agent-driven cart value&lt;/strong&gt; generated, primarily in USD with a long tail across EUR, GBP, INR, ILS, PKR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80 days of run history&lt;/strong&gt; since Feb 14, 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the reference dataset for this post. Eight findings emerge from it. Most of them survive being scrutinised at the new sample size; one or two reverse the early-data narrative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 1 — Claude Sonnet 4.5 leads on aggregate checkout rate
&lt;/h2&gt;

&lt;p&gt;With sample sizes now large enough to take seriously, the per-model checkout-rate leaderboard looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Share of dataset&lt;/th&gt;
&lt;th&gt;Checkout rate&lt;/th&gt;
&lt;th&gt;Avg tokens&lt;/th&gt;
&lt;th&gt;Avg duration&lt;/th&gt;
&lt;th&gt;Fail rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;20.7%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;71,195&lt;/td&gt;
&lt;td&gt;38.1s&lt;/td&gt;
&lt;td&gt;17.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;td&gt;49.3%&lt;/td&gt;
&lt;td&gt;57,676&lt;/td&gt;
&lt;td&gt;47.7s&lt;/td&gt;
&lt;td&gt;14.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5.1%&lt;/td&gt;
&lt;td&gt;45.0%&lt;/td&gt;
&lt;td&gt;32,502&lt;/td&gt;
&lt;td&gt;46.0s&lt;/td&gt;
&lt;td&gt;21.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;12.5%&lt;/td&gt;
&lt;td&gt;44.6%&lt;/td&gt;
&lt;td&gt;46,520&lt;/td&gt;
&lt;td&gt;21.8s&lt;/td&gt;
&lt;td&gt;15.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4.5%&lt;/td&gt;
&lt;td&gt;39.6%&lt;/td&gt;
&lt;td&gt;34,297&lt;/td&gt;
&lt;td&gt;77.1s&lt;/td&gt;
&lt;td&gt;9.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;td&gt;38.8%&lt;/td&gt;
&lt;td&gt;44,611&lt;/td&gt;
&lt;td&gt;29.7s&lt;/td&gt;
&lt;td&gt;25.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;9.9%&lt;/td&gt;
&lt;td&gt;36.8%&lt;/td&gt;
&lt;td&gt;32,394&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;5.2%&lt;/td&gt;
&lt;td&gt;29.5%&lt;/td&gt;
&lt;td&gt;32,811&lt;/td&gt;
&lt;td&gt;14.7s&lt;/td&gt;
&lt;td&gt;24.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;7.9%&lt;/td&gt;
&lt;td&gt;29.0%&lt;/td&gt;
&lt;td&gt;30,971&lt;/td&gt;
&lt;td&gt;48.7s&lt;/td&gt;
&lt;td&gt;28.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;td&gt;27.6%&lt;/td&gt;
&lt;td&gt;31,566&lt;/td&gt;
&lt;td&gt;34.4s&lt;/td&gt;
&lt;td&gt;22.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4.7%&lt;/td&gt;
&lt;td&gt;23.6%&lt;/td&gt;
&lt;td&gt;30,585&lt;/td&gt;
&lt;td&gt;37.4s&lt;/td&gt;
&lt;td&gt;27.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.4%&lt;/td&gt;
&lt;td&gt;17.6%&lt;/td&gt;
&lt;td&gt;35,360&lt;/td&gt;
&lt;td&gt;61.4s&lt;/td&gt;
&lt;td&gt;29.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.4%&lt;/td&gt;
&lt;td&gt;12.5%&lt;/td&gt;
&lt;td&gt;64,055&lt;/td&gt;
&lt;td&gt;38.1s&lt;/td&gt;
&lt;td&gt;37.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1.7%&lt;/td&gt;
&lt;td&gt;10.0%&lt;/td&gt;
&lt;td&gt;58,386&lt;/td&gt;
&lt;td&gt;55.6s&lt;/td&gt;
&lt;td&gt;35.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2.0%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;25,525&lt;/td&gt;
&lt;td&gt;63.9s&lt;/td&gt;
&lt;td&gt;50.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Claude Sonnet 4.5 leads on aggregate checkout rate at 50.8% on the largest single share of the dataset, a sample large enough that its place at the top is no longer noise. Llama 3.3 70B sits a fraction below at 49.3% on a smaller but still meaningful share. The two are statistically tied with each other; both are operating in a different regime than the rest of the field.&lt;/p&gt;
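
&lt;p&gt;One way to sanity-check the "statistically tied" reading is a two-proportion z-test. The sketch below assumes the dataset is exactly 1,000 sessions, so the 20.7% and 6.4% shares become roughly 207 and 64 sessions; the true per-model counts aren't listed in this post, so treat the inputs as approximations.&lt;/p&gt;

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Assumed session counts: dataset share times 1,000 sessions (an approximation).
sonnet_n, llama_n = 207, 64
z = two_proportion_z(0.508, sonnet_n, 0.493, llama_n)

# An absolute z below 1.96 means no significant difference at the 5% level.
print(f"z = {z:.2f}")
```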

&lt;p&gt;The most interesting result on this table is &lt;strong&gt;GPT-5.2&lt;/strong&gt;, which at 23.6% lands in the bottom third despite being one of the most capable frontier models on essentially every public benchmark. The gap between its performance on standard reasoning benchmarks and its performance on transactional shopping flows is the single largest delta in the leaderboard. We dig into why in the development notes below.&lt;/p&gt;

&lt;p&gt;One caveat worth flagging up-front: GPT-5.2's 23.6% figure reflects performance across the full 80-day window, including the period before our cursor-stripping fix landed mid-dataset. Sessions after that fix show GPT-5.2 performing meaningfully more competitively. We'll publish the longitudinal split in the August update — the aggregate number above is the worst-case read.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 2 — Reasoning-tuned models continue to underperform
&lt;/h2&gt;

&lt;p&gt;The cohort of reasoning-tuned models (DeepSeek R1, o4-mini, Grok 3 Mini, QwQ 32B) sits unambiguously at the bottom of the leaderboard: the four of them occupy the bottom four places in the table above. QwQ 32B has yet to record a single completed checkout across its share of the dataset.&lt;/p&gt;

&lt;p&gt;The pattern was visible in the &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;original four-session sample report&lt;/a&gt; shipped with the eval-framework launch in April; it has only sharpened as the dataset grew two orders of magnitude. The pattern is consistent across labs and across architectures (chain-of-thought variants, exploratory reasoning, distilled-from-frontier models — all underperform on shopping flows compared to their non-reasoning counterparts from the same lab).&lt;/p&gt;

&lt;p&gt;The working hypothesis remains: shopping requires fast tool-use rhythm, not deliberation. The decisions in a shopping sequence — search this term, add this item, proceed to checkout — are individually shallow but happen in series. A reasoning model that pauses to deliberate at each step burns clock time and tokens on decisions that don't reward deliberation. Combined with reasoning models' tendency to over-question their own outputs, the result is sessions that hit &lt;code&gt;max_turns_exceeded&lt;/code&gt; before completing.&lt;/p&gt;

&lt;p&gt;Worth noting what isn't in this hypothesis: reasoning models are not bad at commerce in general. They may be excellent at higher-stakes flows — disputed transactions, multi-step contractual reasoning, regulatory edge cases — that the current eval workload doesn't probe. The benchmark says: when the workload is "shop normally," fast non-reasoning models win. Other workloads will tell different stories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 3 — Speed and accuracy aren't correlated
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt; finishes the average shopping session in &lt;strong&gt;11.8 seconds&lt;/strong&gt;, the fastest in the field and one of only two models under 15s (GPT-4o, at 14.7s, is the other). Its checkout rate is 36.8%, which is middling. &lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt; takes 38.1s on average and lands a 50.8% checkout rate, the highest on the leaderboard, at more than triple Flash's clock time.&lt;/p&gt;

&lt;p&gt;This splits deployments into two profiles. &lt;strong&gt;Latency-bound use cases&lt;/strong&gt; (voice agents, mobile commerce, conversational checkout where the user is waiting in real time) effectively must use Gemini 2.5 Flash or Gemini 3 Flash, and pay for the latency win with lower closed-checkout rates. &lt;strong&gt;Throughput-bound use cases&lt;/strong&gt; (batch agents, scheduled buying, autonomous shopping where wall-clock time is mostly hidden) should use Claude Sonnet 4.5 or Llama 3.3 70B and accept the latency cost for the conversion lift.&lt;/p&gt;

&lt;p&gt;The naive intuition merchants reach for — "the better model is faster and more accurate" — doesn't survive contact with this data. The two axes are essentially independent within this corpus. That's a finding nobody can extract from a single-model demo or a vendor benchmark.&lt;/p&gt;
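
&lt;p&gt;Readers who want to test the independence claim against the published numbers can compute a correlation coefficient directly from the Finding 1 table. The sketch below is an unweighted Pearson correlation over per-model averages: it treats each model as a single point and ignores how many sessions sit behind it, so it is a rough check rather than a session-level analysis.&lt;/p&gt;

```python
import math

# (avg duration in seconds, checkout rate in %) per model, from the Finding 1 table.
rows = [
    (38.1, 50.8), (47.7, 49.3), (46.0, 45.0), (21.8, 44.6), (77.1, 39.6),
    (29.7, 38.8), (11.8, 36.8), (14.7, 29.5), (48.7, 29.0), (34.4, 27.6),
    (37.4, 23.6), (61.4, 17.6), (38.1, 12.5), (55.6, 10.0), (63.9, 0.0),
]


def pearson(pairs):
    """Unweighted Pearson correlation coefficient for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r = pearson(rows)
print(f"duration vs checkout rate: r = {r:.2f}")
```

&lt;p&gt;With only 15 points, a coefficient of modest magnitude is hard to distinguish from zero, which is why we'd read any such number alongside the per-model sample sizes.&lt;/p&gt;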

&lt;h2&gt;
  
  
  Finding 4 — The failure mode taxonomy is dominated by provider errors, not model refusals
&lt;/h2&gt;

&lt;p&gt;Across the 256 failed sessions in the dataset, the categorised error taxonomy is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error type&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;% of categorised failures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openrouter_error&lt;/code&gt; (provider-side)&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;56%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model_refused&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_turns_exceeded&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single-largest categorised failure mode is &lt;strong&gt;provider-side errors&lt;/strong&gt; — the routing layer between the agent and the model returning a non-200 before the session can complete. This is a cost of operating at scale across 16 models and reflects the still-maturing infrastructure underneath frontier-model API access, not anything specific to UCP.&lt;/p&gt;

&lt;p&gt;The second-largest, &lt;strong&gt;model refusals&lt;/strong&gt;, is more interesting. Twenty-two refusals across the dataset is a refusal rate of roughly 2%. We see refusals concentrated in two situations: (1) sessions against demo stores with unusual product names that pattern-match a model's safety filters, and (2) sessions where the user prompt contains adversarial content seeded by us as part of a prompt-injection eval. We've recorded &lt;strong&gt;6/6 prompt-injection resistance&lt;/strong&gt; across the dedicated injection-eval runs to date, so the &lt;code&gt;model_refused&lt;/code&gt; category is partly capturing models doing exactly what they should.&lt;/p&gt;

&lt;p&gt;The third, &lt;strong&gt;max_turns_exceeded&lt;/strong&gt;, is concentrated in the reasoning-model cohort and is the empirical signal for the over-deliberation pattern in Finding 2.&lt;/p&gt;

&lt;p&gt;The remaining 165 of the 256 failures don't carry a categorised &lt;code&gt;error_type&lt;/code&gt;. These are typically sessions where the model abandoned the flow without raising an explicit error. That's a tagging gap in the framework that we're closing in the next iteration.&lt;/p&gt;
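
&lt;p&gt;The arithmetic behind the table and the uncategorised remainder can be reproduced in a few lines. The counts are the ones quoted above; nothing else is assumed.&lt;/p&gt;

```python
categorised = {"openrouter_error": 51, "model_refused": 22, "max_turns_exceeded": 18}
total_failed = 256

categorised_total = sum(categorised.values())
uncategorised = total_failed - categorised_total

# Shares are computed against categorised failures only, as in the table.
for error_type, count in categorised.items():
    share = 100 * count / categorised_total
    print(f"{error_type}: {count} sessions, {share:.0f}% of categorised failures")

print(f"uncategorised: {uncategorised} of {total_failed} failed sessions")
```

&lt;p&gt;Note that the percentages in the table are shares of the 91 categorised failures, not of all 256 failed sessions.&lt;/p&gt;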

&lt;h2&gt;
  
  
  Finding 5 — Store implementation explains most of the cross-store variance
&lt;/h2&gt;

&lt;p&gt;The benchmark's most strategically important finding doesn't come from the per-model column. It comes from the per-store one.&lt;/p&gt;

&lt;p&gt;Across the 97 stores in the dataset, the same model produces dramatically different outcomes. Between the most agent-friendly and least agent-friendly implementations at meaningful sample sizes, the checkout-rate spread exceeds &lt;strong&gt;60 percentage points&lt;/strong&gt;, wider than any model-versus-model gap on the leaderboard. &lt;strong&gt;No model in the field, at any sample size, produces a 60-point spread purely on its own merits.&lt;/strong&gt; Almost all of that variance is store-side, and a run history of more than 1,000 sessions makes the pattern hard to attribute to anything else.&lt;/p&gt;

&lt;p&gt;The cleanest predictor we've found is whether the store's MCP implementation is &lt;strong&gt;stateless&lt;/strong&gt; or &lt;strong&gt;stateful&lt;/strong&gt;, and how it handles the boundary between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless implementations&lt;/strong&gt; treat every tool call as self-contained. Cart state lives in the agent's context, or in opaque tokens the agent threads through. Identity is established once and re-asserted on each call. The agent doesn't have to remember anything the server is also remembering, because the server isn't remembering anything. Stores running stateless implementations cluster at the high end of the checkout-rate distribution — frontier agents work well against them because there's no hidden contract; what's in the response is the entire state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateful implementations&lt;/strong&gt; persist server-side session, cart, and auth across calls, exposed to the agent through session IDs, cookies, or scoped tokens. When this works, it works well. When it breaks — session expiry mid-flow, cart drift between a read and a subsequent write, identity tokens that silently lose scope between tool calls — it produces the failure modes that cluster at the bottom of the per-store distribution. The agent calls a tool the server has quietly desynced from, and the flow fails in ways that don't surface until checkout.&lt;/p&gt;

&lt;p&gt;The hybrid case is the most error-prone: stores that are stateless in some tools and stateful in others, without making the boundary explicit in the manifest or the tool response shapes. Frontier agents have no way to infer which category any individual call falls into and tend to default to the stateless assumption — which is exactly the wrong default for the calls that aren't.&lt;/p&gt;
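
&lt;p&gt;To make the stateless pattern concrete, here is a toy Python sketch in which the entire cart travels in an opaque token that the agent threads through each call. The tool name &lt;code&gt;add_to_cart&lt;/code&gt;, the token encoding, and the response shape are all hypothetical; they are not taken from the UCP spec or from any store in the dataset.&lt;/p&gt;

```python
import base64
import json


def encode_cart(cart: dict) -> str:
    """Serialise the whole cart into an opaque token returned to the agent."""
    return base64.urlsafe_b64encode(json.dumps(cart, sort_keys=True).encode()).decode()


def decode_cart(token: str) -> dict:
    """Recover the cart from a token the agent passed back."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def add_to_cart(token: str, sku: str, quantity: int) -> dict:
    """Stateless tool call: everything needed arrives in the request.

    The server keeps no session; the response carries the full new state.
    """
    cart = decode_cart(token) if token else {"items": {}}
    cart["items"][sku] = cart["items"].get(sku, 0) + quantity
    return {"cart_token": encode_cart(cart), "items": cart["items"]}


first = add_to_cart("", "SKU-1", 1)
second = add_to_cart(first["cart_token"], "SKU-2", 2)
print(second["items"])
```

&lt;p&gt;Because every response contains the complete state, there is nothing for the agent and the server to disagree about. A production version would at least sign the token so the agent can't tamper with it; the sketch leaves that out to keep the state-handling visible.&lt;/p&gt;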

&lt;p&gt;Beyond the state axis, the rigorous testing surfaces a consistent set of secondary trip-wires: variant IDs without human-readable axis labels, description strings exceeding 8K tokens for a single product, tool responses including nested HTML in fields agents expect to be plain text, cart endpoints returning success codes for failed mutations. None of these break &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt; validation. All of them break agent flows.&lt;/p&gt;

&lt;p&gt;These are merchant-side fixes, not model-side ones. The strategic implication for any team operating a UCP-enabled store: &lt;strong&gt;fixing your manifest and tool responses produces more conversion lift than choosing the right model.&lt;/strong&gt; That's load-bearing — it's why the integrated &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals#how-evals-fit-the-broader-development-cycle" rel="noopener noreferrer"&gt;Score → Check → Eval workflow&lt;/a&gt; exists, and it's where we'd point a team starting from zero on UCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 6 — Cart value generated is concentrated in USD and high-AOV verticals
&lt;/h2&gt;

&lt;p&gt;Of the 1,000+ sessions, 97 produced a non-zero cart value. The breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Currency&lt;/th&gt;
&lt;th&gt;Sessions&lt;/th&gt;
&lt;th&gt;Total cart value&lt;/th&gt;
&lt;th&gt;Avg cart value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;USD&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;$95,647.23&lt;/td&gt;
&lt;td&gt;$1,125.26&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INR&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;₹3,845.00&lt;/td&gt;
&lt;td&gt;₹1,922.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PKR&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;₨4,490.00&lt;/td&gt;
&lt;td&gt;₨2,245.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EUR&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;€296.74&lt;/td&gt;
&lt;td&gt;€59.35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ILS&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;₪189.60&lt;/td&gt;
&lt;td&gt;₪189.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GBP&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;£47.99&lt;/td&gt;
&lt;td&gt;£24.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;USD cart value totals &lt;strong&gt;$95,647 across 85 sessions&lt;/strong&gt; with an average cart value of $1,125. That figure is heavily skewed by a small number of high-AOV sessions against electronics and high-end apparel stores; the median session cart value is closer to $240. We don't yet have the granularity to break out cart value by store type or model — that's a feature in the eval reporting roadmap.&lt;/p&gt;

&lt;p&gt;The cross-currency long tail (EUR/GBP/INR/PKR/ILS) is small but informative. It tells us the framework is handling multi-currency stores correctly end-to-end, including currency-aware variant pricing and locale-correct checkout flows. Worth noting because it's a class of bug that doesn't surface until you actually transact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 7 — Session volume is now meaningful enough to reveal trajectory
&lt;/h2&gt;

&lt;p&gt;Plotted week-over-week, session volume has three distinct phases over the 80-day window:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure: UCP Playground weekly session volume, mid-February through late April 2026 (x-axis from Feb 14 to Apr 27). The trend line shows three phases: a small founding wave in mid-February, a steady-state oscillation through March and mid-April, and a sharp acceleration in late April that produces the largest single week of the dataset.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Founding wave (mid-February).&lt;/strong&gt; A small launch surge coinciding with the &lt;a href="https://ucpchecker.com/blog/why-we-built-ucp-playground" rel="noopener noreferrer"&gt;Why We Built UCP Playground&lt;/a&gt; post — first publishers running first sessions, signal that the framework worked end-to-end against real stores.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steady state (March through mid-April).&lt;/strong&gt; Weekly volume oscillating in a tight band as more frontier models came online and the eval framework matured. Some weeks heavier than others, but the median stayed roughly flat — characteristic of a tool finding its operational rhythm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acceleration (late April).&lt;/strong&gt; The largest single week of the dataset, driven mostly by a batch of &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals" rel="noopener noreferrer"&gt;eval-collection runs&lt;/a&gt; against stores onboarded after the council expansion announcement. The line bends upward at the end of the window.&lt;/p&gt;

&lt;p&gt;The trajectory matters mostly because it lets us start tracking model drift. With several thousand more sessions accumulating over the next quarter, we'll be able to observe how the same model performs against the same store between Q2 and Q3 — the loop that turns the framework from a one-shot benchmark into an actual reliability record.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding 8 — The 0.2% flawless-end-to-end rate has improved, slightly
&lt;/h2&gt;

&lt;p&gt;The April &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce report&lt;/a&gt; flagged that of 4,014 verified UCP stores, only 9 delivered a flawless end-to-end agent shopping experience. That's the 0.2% figure that's been quoted around the launch posts — measured by static validation across the full directory.&lt;/p&gt;

&lt;p&gt;Eighty days later, with 97 stores tested directly through the eval framework, roughly &lt;strong&gt;0.5–0.7%&lt;/strong&gt; reach the same bar. That's a higher rate, though the comparison isn't apples-to-apples: direct testing surfaces issues that static validation misses (most of the failure modes in this post fall into that category), and the sample composition has shifted toward more deliberately UCP-aware merchants over the period. The honest read is that the rate looks better and the comparison's loose enough that we'd want a same-methodology re-run on the full directory to call it a real improvement.&lt;/p&gt;

&lt;p&gt;What we can say cleanly: for every store running a clean, agent-friendly UCP implementation, there are still 100+ that pass conformance but stumble somewhere in the agent flow. The gap continues to be on the merchant side. We haven't yet seen a model-side improvement large enough to close meaningful ground on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Playground stays neutral
&lt;/h2&gt;

&lt;p&gt;Every finding above hinges on one design choice: the system prompt and the orchestration loop are &lt;strong&gt;generic&lt;/strong&gt;. Same for every model. Same for every store. No store-specific scaffolding, no model-specific workarounds. That's what makes the framework work as a testing environment.&lt;/p&gt;

&lt;p&gt;The temptation to add a workaround when a particular model trips on a particular store is real — there's almost always a one-line patch that would push that store's checkout rate up by ten points against that one model. We don't ship those patches, on principle. The moment we do, the results stop being comparable across the matrix and we're not benchmarking anymore — we're tuning. Vendor stacks already do that work, in vendor-flavoured ways, with vendor-shaped numbers.&lt;/p&gt;

&lt;p&gt;Independence here means a specific thing: &lt;strong&gt;the orchestration is neutral, the protocol layer is full-featured.&lt;/strong&gt; Stores get the tools they declare. Identity linking works. Payment handlers pass through. Multi-turn context flows the way the &lt;a href="https://ucpchecker.com/specs" rel="noopener noreferrer"&gt;spec&lt;/a&gt; defines. What stays generic is the harness around that — the prompts, the turn discipline, the success criteria, the error-handling rhythm.&lt;/p&gt;

&lt;p&gt;The reason that design choice matters can be put in two sentences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a model doesn't follow the checkout flow, that's signal about the model.&lt;/li&gt;
&lt;li&gt;If a store returns the wrong status, that's signal about the store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both signals are useful. Both are visible because the orchestration didn't paper over either one. Hiding either defeats the purpose of running the test.&lt;/p&gt;

&lt;p&gt;It is expected, and good, that companies build their own internal infrastructure to evaluate agent behaviour against their own stores. Every serious commerce platform will eventually have something like that running in CI against its own merchants, and the &lt;a href="https://ucpchecker.com/blog/ucp-playground-evals#how-evals-fit-the-broader-development-cycle" rel="noopener noreferrer"&gt;Score → Check → Eval workflow&lt;/a&gt; is exactly the surface they should plug into. But the comparison layer, the one that asks how Anthropic's frontier model performs on the same workload and against the same stores as the models from Google, OpenAI, xAI, DeepSeek, and Meta, has to sit outside all of those organisations. &lt;strong&gt;Vendors can't credibly benchmark themselves; the platform layer has the same problem one level down.&lt;/strong&gt; Independence is the only way the comparisons aggregate into a record anyone can quote.&lt;/p&gt;

&lt;p&gt;That's the niche this layer occupies. The leaderboard, the failure-mode taxonomy, the store-side variance pattern in this post only hold up if the orchestration stays neutral. The moment it doesn't, the framework loses the property that made any of it worth publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we learned building this
&lt;/h2&gt;

&lt;p&gt;The framework that shipped in May is not the same shape as the one that shipped in February. Eighty days of running it against real stores produced a steady stream of bugs and surprises that drove the development work, many of them documented in the &lt;a href="https://ucpplayground.com/changelog" rel="noopener noreferrer"&gt;public changelog&lt;/a&gt;. Five are worth surfacing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor stripping unlocked GPT-5.2 search.&lt;/strong&gt; Through February we had &lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt; at a 0% search success rate on Shopify stores. The cause was a model-side tic: GPT-5.2 always included the optional &lt;code&gt;after&lt;/code&gt; cursor parameter on &lt;code&gt;search_shop_catalog&lt;/code&gt; calls, filling it with placeholders like &lt;code&gt;""&lt;/code&gt;, &lt;code&gt;"null"&lt;/code&gt;, or &lt;code&gt;"__NONE__"&lt;/code&gt; — values Shopify always rejects. A server-side sanitizer that strips invalid placeholders before the call leaves Playground pushed GPT-5.2's search success from 0% to 100% overnight. The model wasn't bad at search; it had a tool-calling habit nobody had isolated yet.&lt;/p&gt;
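
&lt;p&gt;A before-and-after sketch of what the sanitizer does to one call. Only the tool name, the &lt;code&gt;after&lt;/code&gt; parameter, and the placeholder value come from the sessions described above; the &lt;code&gt;query&lt;/code&gt; argument, its value, and the two wrapper keys are assumptions made for illustration, not Playground's internal format.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "as_sent_by_model": {
    "name": "search_shop_catalog",
    "arguments": { "query": "wool scarf", "after": "__NONE__" }
  },
  "as_forwarded_to_store": {
    "name": "search_shop_catalog",
    "arguments": { "query": "wool scarf" }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real pagination cursor would pass through untouched; only the known placeholder values are dropped.&lt;/p&gt;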

&lt;p&gt;&lt;strong&gt;Failed tool calls used to inflate conversion metrics.&lt;/strong&gt; An earlier version of step detection counted a failed &lt;code&gt;update_cart&lt;/code&gt; as a &lt;code&gt;cart_created&lt;/code&gt; completion. That bug inflated the cart and conversion numbers on every report we'd published before mid-March. Fixed in 0.9.3 by gating step detection on the tool response's &lt;code&gt;isError&lt;/code&gt; flag, plus the same gate on cart-data extraction. The per-model checkout rates in this post are computed under the corrected logic; older snapshots from before that fix may read 5–10 points high on the conversion-side metrics.&lt;/p&gt;
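
&lt;p&gt;For reference, a failed MCP tool call comes back as an ordinary result with &lt;code&gt;isError&lt;/code&gt; set, which is the flag the gate reads. A sketch of such a response to &lt;code&gt;update_cart&lt;/code&gt;; the error text is invented for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "content": [
    { "type": "text", "text": "Variant not found" }
  ],
  "isError": true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under the corrected logic, a response like this no longer counts toward &lt;code&gt;cart_created&lt;/code&gt;.&lt;/p&gt;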

&lt;p&gt;&lt;strong&gt;REST-only stores forced a transport rework.&lt;/strong&gt; The &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 spec drop&lt;/a&gt; in early April brought new tool names (&lt;code&gt;search_catalog&lt;/code&gt; replacing &lt;code&gt;search_shop_catalog&lt;/code&gt;), new response shapes (price as &lt;code&gt;{amount, currency}&lt;/code&gt; objects, descriptions as &lt;code&gt;{plain, html}&lt;/code&gt; objects), and a wave of WooCommerce stores that exposed REST-only endpoints rather than MCP. The 0.10.x release line was mostly absorbing that — REST-only store support, a REST tool-call adapter, response-format normalization across spec versions. Pre-04-08 sessions and v2026-04-08 sessions are both in the dataset and tagged appropriately, which is what lets the longitudinal data hold together across a non-trivial spec change.&lt;/p&gt;
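
&lt;p&gt;As a rough illustration of the new response shapes, a product fragment under v2026-04-08 might look like the following. The inner keys (&lt;code&gt;amount&lt;/code&gt;, &lt;code&gt;currency&lt;/code&gt;, &lt;code&gt;plain&lt;/code&gt;, &lt;code&gt;html&lt;/code&gt;) are the ones named above; the outer key names, the values, and whether &lt;code&gt;amount&lt;/code&gt; is a string or a number are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "price": { "amount": "59.00", "currency": "USD" },
  "description": {
    "plain": "Soft merino scarf",
    "html": "&amp;lt;p&amp;gt;Soft merino scarf&amp;lt;/p&amp;gt;"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;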

&lt;p&gt;&lt;strong&gt;The GPay token wall built ECP.&lt;/strong&gt; In a February session, &lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt; reached &lt;code&gt;ready_for_complete&lt;/code&gt; correctly — and stalled, because the merchant's checkout required a Google Pay payment token the agent couldn't produce. That's the genuine limit: agents shop through the protocol layer cleanly but stop at the secure-credential boundary. The Embedded Commerce Protocol shipped in 0.8.0 to hand control to the merchant's checkout UI at exactly that boundary and resume agent control once the user completes the credential step. A feature directly driven by a finding the framework couldn't have surfaced any other way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Playground session became a spec proposal.&lt;/strong&gt; A live test against &lt;a href="https://ucpchecker.com/status/houseofparfum.nl" rel="noopener noreferrer"&gt;houseofparfum.nl&lt;/a&gt; exposed a different gap: an identity-linked buyer with a wallet balance hit the checkout, the OAuth flow completed cleanly, and the buyer object came back populated, but the wallet was nowhere the agent could see it. &lt;code&gt;payment.instruments&lt;/code&gt; was empty, the only declared handler (&lt;code&gt;dev.ucp.delegate_payment&lt;/code&gt;) didn't accept the wallet, and the session escalated to the merchant's &lt;code&gt;continue_url&lt;/code&gt; every time. Authenticated checkout was provably blocked, by spec. We wrote it up and submitted &lt;a href="https://github.com/Universal-Commerce-Protocol/ucp/issues/358" rel="noopener noreferrer"&gt;Proposal #358 to the UCP spec repository&lt;/a&gt;: &lt;code&gt;payment.available_instruments&lt;/code&gt;, a per-buyer per-session list of usable payment methods (wallet, saved cards, loyalty, gift cards) resolved at runtime from the identity-linked session. It was submitted by Benji Fisher (&lt;a href="https://github.com/appdrops" rel="noopener noreferrer"&gt;@appdrops&lt;/a&gt;) and co-authored with Almin Zolotic (&lt;a href="https://github.com/zologic" rel="noopener noreferrer"&gt;@zologic&lt;/a&gt;) of UCPReady, who'd seen the same wall from the merchant side. It is currently with the UCP technical council for review. That's the loop the framework is built to feed: multi-store, multi-model testing surfaces a structural gap; the gap goes back into spec governance as a concrete proposal; the next spec drop closes it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology, briefly
&lt;/h2&gt;

&lt;p&gt;Each session is a real frontier-model agent shopping run against a real UCP-enabled store, captured end-to-end via MCP tool calls. Sessions are initiated either through the public &lt;a href="https://ucpplayground.com/playground" rel="noopener noreferrer"&gt;Playground UI&lt;/a&gt; (user-initiated, ad-hoc prompts) or through the &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;Evals framework&lt;/a&gt; (scripted multi-turn sequences across pre-selected store/model matrices).&lt;/p&gt;

&lt;p&gt;Outcomes are tagged at session close: &lt;code&gt;checkout_reached&lt;/code&gt; (full transaction completion), &lt;code&gt;cart_created&lt;/code&gt; (added items, didn't proceed), &lt;code&gt;search_only&lt;/code&gt; (browsed, didn't add), &lt;code&gt;failed&lt;/code&gt; (provider error, model refusal, or max-turn exceeded), or &lt;code&gt;info_provided&lt;/code&gt; (informational query, no transactional intent).&lt;/p&gt;

&lt;p&gt;Every session has a clickable replay link in its source ULID. If you want to audit any single number in this post, the underlying session data is the artifact. That's intentional — independent reproducibility is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Three concrete next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run a benchmark against your own store.&lt;/strong&gt; Create a collection at &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;, pick a sequence, pick two models, and compare your store's per-model performance against the aggregate above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;See where individual models stand.&lt;/strong&gt; Each model on the leaderboard has its own &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;shopping profile&lt;/a&gt; with detailed performance data, known issues, and store-by-store breakdowns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare two models head-to-head.&lt;/strong&gt; The &lt;a href="https://ucpplayground.com/models/compare?models=claude-sonnet-4-5%2Cgemini-3-flash" rel="noopener noreferrer"&gt;comparison view&lt;/a&gt; lets you pit any two models against each other on the same workload — useful before you commit to a primary model for a deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next data update — likely 2,000+ sessions, refreshed model lineup, and a fuller error-tagging surface — drops in early August.&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>ai</category>
      <category>data</category>
    </item>
    <item>
      <title>UCP Requirements: What Your Store Needs Before Going Live</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Mon, 04 May 2026 12:23:16 +0000</pubDate>
      <link>https://dev.to/benjifisher/ucp-requirements-what-your-store-needs-before-going-live-9ag</link>
      <guid>https://dev.to/benjifisher/ucp-requirements-what-your-store-needs-before-going-live-9ag</guid>
      <description>&lt;p&gt;What do you need for UCP? There are two levels of UCP readiness. The first is the &lt;strong&gt;minimum viable manifest&lt;/strong&gt; — the bare requirements to pass validation and appear in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;UCP directory&lt;/a&gt;. The second is the &lt;strong&gt;agent-ready setup&lt;/strong&gt; — what it actually takes for an AI agent to browse, cart, and check out at your store without friction.&lt;/p&gt;

&lt;p&gt;Think of this as your UCP checklist — the minimum requirements plus the recommended prerequisites that separate stores agents can find from stores agents can actually shop. Most guides only cover the first level. This one covers both, grounded in data from &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;4,024 verified merchants&lt;/a&gt; and hundreds of &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;agent testing sessions&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimum requirements (pass validation)
&lt;/h2&gt;

&lt;p&gt;These are the fields required to produce a valid UCP manifest on the current &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. A JSON file at /.well-known/ucp
&lt;/h3&gt;

&lt;p&gt;The manifest must be publicly accessible at &lt;code&gt;https://yourdomain.com/.well-known/ucp&lt;/code&gt;, served with &lt;code&gt;Content-Type: application/json&lt;/code&gt;, and reachable without authentication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;: handled automatically&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;: manual publish via plugin or custom route&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;: manual, served from storefront origin&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;: manual, typically via custom module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full publishing guide with code examples: &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;/.well-known/ucp developer reference&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ucp.version (required)
&lt;/h3&gt;

&lt;p&gt;A string identifying which spec version the manifest is written against. Current latest: &lt;code&gt;"2026-04-08"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;99.4% of verified stores&lt;/a&gt; are on this version. If you're starting fresh, use it. If you're on an older version, the &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;spec update post&lt;/a&gt; walks through the migration.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. ucp.services (required)
&lt;/h3&gt;

&lt;p&gt;At least one service entry declaring a transport (&lt;code&gt;mcp&lt;/code&gt;, &lt;code&gt;rest&lt;/code&gt;, &lt;code&gt;a2a&lt;/code&gt;, or &lt;code&gt;embedded&lt;/code&gt;) and an endpoint URL. This tells agents where to send requests.&lt;/p&gt;

&lt;p&gt;MCP is the dominant transport — &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;~100% of verified stores declare it&lt;/a&gt;. If you're building from scratch, start with MCP. See the &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transport comparison&lt;/a&gt; for the tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. ucp.payment_handlers (required)
&lt;/h3&gt;

&lt;p&gt;A map of payment handler namespaces. Can be an empty object &lt;code&gt;{}&lt;/code&gt; if your store uses checkout-link redirects instead of tokenized payments (common on &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If you declare handlers, use reverse-domain namespaces like &lt;code&gt;com.stripe.card&lt;/code&gt; or &lt;code&gt;dev.shopify.card&lt;/code&gt;. See the &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers directory&lt;/a&gt; for examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. signing_keys (required, at root level)
&lt;/h3&gt;

&lt;p&gt;An array of JWK objects at the &lt;strong&gt;document root&lt;/strong&gt; (not nested inside &lt;code&gt;ucp&lt;/code&gt;). An empty array &lt;code&gt;[]&lt;/code&gt; is valid if you're not signing payloads yet, but the key must be present.&lt;/p&gt;

&lt;p&gt;This field moved from &lt;code&gt;ucp.signing_keys&lt;/code&gt; to the root in v2026-04-08 — the most &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common validation warning&lt;/a&gt; we see is stores that still nest it.&lt;/p&gt;
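
&lt;p&gt;Putting the five requirements together, a minimum viable manifest might look roughly like this. Treat it as a sketch: the field names &lt;code&gt;ucp.version&lt;/code&gt;, &lt;code&gt;ucp.services&lt;/code&gt;, &lt;code&gt;ucp.payment_handlers&lt;/code&gt;, and root-level &lt;code&gt;signing_keys&lt;/code&gt; come from the list above, but the layout of a service entry (an array of objects with &lt;code&gt;transport&lt;/code&gt; and &lt;code&gt;endpoint&lt;/code&gt; keys) and the endpoint URL are placeholders. Check the spec for the exact shape.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "ucp": {
    "version": "2026-04-08",
    "services": [
      { "transport": "mcp", "endpoint": "https://yourdomain.com/mcp" }
    ],
    "payment_handlers": {}
  },
  "signing_keys": []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;signing_keys&lt;/code&gt; sits beside &lt;code&gt;ucp&lt;/code&gt;, not inside it, and that both the empty handler map and the empty key array are present rather than omitted.&lt;/p&gt;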

&lt;h2&gt;
  
  
  Recommended setup (agent-ready)
&lt;/h2&gt;

&lt;p&gt;Passing validation gets you into the directory. The requirements below determine whether agents can actually &lt;em&gt;shop&lt;/em&gt; your store — the difference between a &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;B+ grade and an A grade&lt;/a&gt; in our benchmarks.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Capabilities declaration
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ucp.capabilities&lt;/code&gt; field is optional per spec but strongly recommended. Without it, agents know your store exists but not what it can do.&lt;/p&gt;

&lt;p&gt;Declare every capability you support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/checkout" rel="noopener noreferrer"&gt;checkout&lt;/a&gt;&lt;/strong&gt; — 99.5% adoption across verified stores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/cart" rel="noopener noreferrer"&gt;cart&lt;/a&gt;&lt;/strong&gt; — 99.1% adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/catalog-search" rel="noopener noreferrer"&gt;catalog-search&lt;/a&gt;&lt;/strong&gt; — required for &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/identity-linking" rel="noopener noreferrer"&gt;identity-linking&lt;/a&gt;&lt;/strong&gt; — 3 stores, massive first-mover opportunity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/capabilities/payment" rel="noopener noreferrer"&gt;payment&lt;/a&gt;&lt;/strong&gt; — 0 stores, the frontier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full list: &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capability registry&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Clean variant data
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;Variant mismatches are the #1 failure mode&lt;/a&gt; in agent shopping sessions. Every variant needs a stable ID, a clear name, and consistent representation across discovery and checkout. This is the single highest-impact fix you can make.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Responsive MCP endpoint
&lt;/h3&gt;

&lt;p&gt;Latency matters. The average &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify store&lt;/a&gt; responds in ~130ms. &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce stores&lt;/a&gt; average ~890ms. Agents have timeout budgets — if your endpoint is slow, sessions drop silently. Target under 500ms for tool responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. robots.txt allowing AI crawlers
&lt;/h3&gt;

&lt;p&gt;Make sure &lt;code&gt;/.well-known/ucp&lt;/code&gt; is explicitly allowed in your &lt;code&gt;robots.txt&lt;/code&gt;. Separately, some WAF and CDN configurations block well-known paths by default, so confirm the manifest is reachable from outside your own network. Check the &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common errors guide&lt;/a&gt; for the fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. supported_versions for backward compatibility
&lt;/h3&gt;

&lt;p&gt;Declare &lt;code&gt;supported_versions&lt;/code&gt; in your manifest, listing both the current and the previous spec version. This lets agents that haven't migrated yet still find a valid endpoint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"supported_versions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2026-04-08"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://yourstore.com/.well-known/ucp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"2026-01-23"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://yourstore.com/.well-known/ucp/2026-01-23"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The UCP readiness checklist
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Required?&lt;/th&gt;
&lt;th&gt;% of stores that have it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Manifest at /.well-known/ucp&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100% (by definition)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.version&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.services with transport + endpoint&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.payment_handlers&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;signing_keys at root&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;~97% (rest have it nested)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ucp.capabilities&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~99% (Shopify default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clean variant data&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;Unknown (runtime issue)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency &amp;lt; 500ms&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~95% (Shopify), ~30% (others)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;robots.txt allows /.well-known/ucp&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;supported_versions&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Validate your setup
&lt;/h2&gt;

&lt;p&gt;Not sure if you pass? Start with &lt;a href="https://ucpchecker.com/blog/is-my-store-ucp-ready" rel="noopener noreferrer"&gt;Is My Store UCP Ready?&lt;/a&gt; — it walks through the full diagnostic in 60 seconds. Or jump straight to the tool:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Run a live check&lt;/a&gt; on your domain — it tests every requirement above in seconds. For runtime issues (variant mismatches, checkout failures), &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;test with real agents in Playground&lt;/a&gt;. For ongoing monitoring, &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;set up alerts&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Once you're verified, make sure your listing on &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; is accurate — that's what agents see when deciding which stores to route customers to. And if you're a developer building agents rather than stores, the &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;Build an Agent quickstart&lt;/a&gt; covers the other side of the equation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check your store now at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com&lt;/a&gt;. See how you compare: &lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;side-by-side store comparison&lt;/a&gt;. Platform guides: &lt;a href="https://ucpchecker.com/blog/shopify-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
    <item>
      <title>AI Commerce Needs MLPerf — and Here's an Early Attempt</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Fri, 01 May 2026 12:07:45 +0000</pubDate>
      <link>https://dev.to/benjifisher/ai-commerce-needs-mlperf-and-heres-an-early-attempt-2lg1</link>
      <guid>https://dev.to/benjifisher/ai-commerce-needs-mlperf-and-heres-an-early-attempt-2lg1</guid>
      <description>&lt;p&gt;Validating a UCP manifest takes a second. &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;Scoring it for agent-readiness&lt;/a&gt; takes another. Neither of those answers the harder question: when a real frontier agent — &lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude&lt;/a&gt; or &lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT&lt;/a&gt; or &lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, picked by a user three weeks from now — walks up to your store with an ordinary shopping prompt, does it actually complete a checkout? Compared to the next implementation? Across the models people are actually using?&lt;/p&gt;

&lt;p&gt;Today there's no shared way to find out. AI commerce has the same coordination problem ML had before MLPerf, web performance had before Lighthouse, and coding models had before HumanEval — and the cost of not solving it is the same: every claim a vendor makes about agent-readiness is currently unverifiable by anyone outside that vendor.&lt;/p&gt;

&lt;p&gt;This post is about what we've been building to close that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pre-benchmark moment
&lt;/h2&gt;

&lt;p&gt;Every category that grew up around AI has gone through a pre-benchmark moment.&lt;/p&gt;

&lt;p&gt;Machine learning before MLPerf was a pile of vendor-flavoured numbers. NVIDIA reported one set of throughput claims, Google another, AMD a third — and none of it was directly comparable, because nobody was running the same workload, on the same input, on the same harness. MLPerf — submitted to, run by, and audited across the whole industry — fixed that. Buyers could finally compare. The category matured.&lt;/p&gt;

&lt;p&gt;Web performance before Lighthouse was the same. "Fast website" was vibes. PageSpeed Insights gave one number, WebPageTest another, internal RUM dashboards a third. Lighthouse — graded, reproducible, open — fixed it. Today nobody ships a serious site without checking their score.&lt;/p&gt;

&lt;p&gt;Coding models before HumanEval were even worse. Every lab benchmarked against its own preferred problems and reported its own preferred metrics. HumanEval, then MBPP, then SWE-bench, then LiveCodeBench, gave the field a shared evaluation surface. Comparisons stopped being marketing.&lt;/p&gt;

&lt;p&gt;Agentic commerce is in exactly the place those categories were before their benchmarks landed. The standard has converged — UCP is the open spec the industry is building against, and the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;public directory&lt;/a&gt; tracks 4,500+ verified stores. Major retailers and platforms ship UCP implementations almost weekly. The recent &lt;a href="https://ucpchecker.com/blog/ucp-tech-council-expands-amazon-meta-microsoft-salesforce-stripe" rel="noopener noreferrer"&gt;tech council expansion&lt;/a&gt; brings in most of the rest. &lt;strong&gt;But there is still no neutral, reproducible way to evaluate how well any of those implementations actually work when a real frontier agent tries to shop them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't get this from inside a vendor. Shopify cannot credibly benchmark Shopify stores. OpenAI cannot credibly benchmark OpenAI agents. Even when their numbers are honest, the methodology is theirs, the test conditions favour their stack, and nobody else can rerun it. The coordination problem is the one ML had before MLPerf, and the fix is the same: a shared evaluation layer, run by a third party, that anyone can audit and reproduce.&lt;/p&gt;

&lt;p&gt;Agentic commerce can't mature without that layer. We've built a first credible attempt at one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What UCP Playground Evals does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;UCP Playground Evals&lt;/a&gt; is a benchmark framework for agentic commerce. You define a multi-turn shopping conversation, pick the stores and the models you want to evaluate against it, and get back a structured comparison report — funnel matrix, per-session token and duration metrics, error classification, replayable session links, downloadable PDF.&lt;/p&gt;

&lt;p&gt;The point isn't the report format. The point is the three properties underneath, because those determine whether a benchmark is worth trusting.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Standardised, multi-turn sequences
&lt;/h3&gt;

&lt;p&gt;Agentic commerce is conversational, not single-prompt. A real shopping session looks like &lt;em&gt;"Show me products under $60"&lt;/em&gt; → &lt;em&gt;"Add both to my cart"&lt;/em&gt; → &lt;em&gt;"Proceed to checkout"&lt;/em&gt;, with full context carried across turns. That's the unit an eval has to operate on.&lt;/p&gt;

&lt;p&gt;Each eval is a scripted sequence of turns. Every turn gets its own orchestrator round (up to 8 internal tool-calling sub-turns) and the full conversation history is preserved across the sequence — so the agent's choices on T2 are conditioned on what it actually saw on T1, the way real user behaviour conditions on real responses. Four collections ship today: &lt;strong&gt;Browse &amp;amp; Buy&lt;/strong&gt; (4 turns, generic shopping journey), &lt;strong&gt;Multi-Item&lt;/strong&gt; (3 turns, multi-product cart composition and checkout), &lt;strong&gt;Price Constrained&lt;/strong&gt; (3 turns, budget-anchored reasoning across a single purchase), and &lt;strong&gt;Custom&lt;/strong&gt; for user-defined sequences.&lt;/p&gt;
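
&lt;p&gt;To make the unit concrete, a three-turn sequence built from the example prompts above could be written down like this. The field names are hypothetical and are not the Evals file format; only the turn text and the cap of 8 sub-turns per turn come from the description above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "collection": "Custom",
  "max_sub_turns_per_turn": 8,
  "turns": [
    "Show me products under $60",
    "Add both to my cart",
    "Proceed to checkout"
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second turn only makes sense in light of what the first one returned, which is why the conversation history has to carry across the whole sequence.&lt;/p&gt;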

&lt;h3&gt;
  
  
  2. Cross-store comparability
&lt;/h3&gt;

&lt;p&gt;The sequences are intentionally generic. Not &lt;em&gt;"Find Nike Air Max 90 in size 10"&lt;/em&gt; but &lt;em&gt;"Show me products under $60"&lt;/em&gt;. That distinction is load-bearing: it's what makes the same test valid against any store running UCP, and it's what makes results from one store directly comparable to results from another. Without it, every benchmark is apples-to-oranges and nothing aggregates.&lt;/p&gt;

&lt;p&gt;The eval runner discovers MCP endpoints automatically from each store's &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;&lt;code&gt;/.well-known/ucp&lt;/code&gt;&lt;/a&gt; manifest, so any UCP-conformant store works without per-store wiring — &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/prestashop" rel="noopener noreferrer"&gt;PrestaShop&lt;/a&gt;, and &lt;a href="https://ucpchecker.com/platforms/custom" rel="noopener noreferrer"&gt;Custom &amp;amp; Headless&lt;/a&gt; stacks all work the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Multi-model coverage
&lt;/h3&gt;

&lt;p&gt;The same sequence runs against any of &lt;a href="https://ucpplayground.com/models" rel="noopener noreferrer"&gt;15 frontier models&lt;/a&gt; currently wired up — every major lab, plus a reasoning-tuned subset:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-opus-4-6" rel="noopener noreferrer"&gt;Claude Opus 4.6&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/claude-sonnet-4-5" rel="noopener noreferrer"&gt;Claude Sonnet 4.5&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-5-2" rel="noopener noreferrer"&gt;GPT-5.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gpt-4o" rel="noopener noreferrer"&gt;GPT-4o&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-1-pro" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-3-flash" rel="noopener noreferrer"&gt;Gemini 3 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-pro" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/gemini-2-5-flash" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-4" rel="noopener noreferrer"&gt;Grok 4&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-v3-2" rel="noopener noreferrer"&gt;DeepSeek V3.2&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/llama-3-3-70b" rel="noopener noreferrer"&gt;Llama 3.3 70B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/deepseek-r1" rel="noopener noreferrer"&gt;DeepSeek R1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/qwq-32b" rel="noopener noreferrer"&gt;QwQ 32B&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Alibaba&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/grok-3-mini" rel="noopener noreferrer"&gt;Grok 3 Mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://ucpplayground.com/models/o4-mini" rel="noopener noreferrer"&gt;o4-mini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model is part of the test matrix. Same store, different models, same sequence — directly comparable behaviour, with model-level differences surfaced rather than averaged away. Any two can also be &lt;a href="https://ucpplayground.com/models/compare?models=gemini-3-1-pro%2Cclaude-sonnet-4-5" rel="noopener noreferrer"&gt;compared side-by-side&lt;/a&gt; outside the eval framework, on the same workload.&lt;/p&gt;

&lt;h3&gt;
  
  
  The math is straightforward
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;stores × models × sequences = sessions&lt;/code&gt;. Two stores × two models × one sequence = four sessions. Each one is a full agent shopping run, captured end-to-end, replayable, and rolled up into the report.&lt;/p&gt;
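
&lt;p&gt;As a minimal sketch of that arithmetic, the run matrix can be enumerated as a Cartesian product. The function name &lt;code&gt;build_sessions&lt;/code&gt; and the dictionary keys are illustrative assumptions, not part of the Playground API; only the store, model, and collection names come from this post.&lt;/p&gt;

```python
from itertools import product


def build_sessions(stores, models, sequences):
    """Enumerate one session per (store, model, sequence) combination."""
    return [
        {"store": store, "model": model, "sequence": sequence}
        for store, model, sequence in product(stores, models, sequences)
    ]


sessions = build_sessions(
    stores=["oakywood.shop", "ugmonk.com"],
    models=["Gemini 3 Flash", "Gemini 3.1 Pro"],
    sequences=["Multi-Item"],
)
print(len(sessions))  # 2 stores x 2 models x 1 sequence
```

&lt;p&gt;Adding a third model to the list grows the matrix to six sessions without touching anything else, which is the property that keeps cross-model comparisons cheap to extend.&lt;/p&gt;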

&lt;p&gt;&lt;strong&gt;Standardised, reproducible, vendor-neutral. The three properties that make a benchmark worth trusting.&lt;/strong&gt; Everything else in the framework is built to defend those three.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the framework actually surfaces
&lt;/h2&gt;

&lt;p&gt;The clearest way to show what evals do is to walk through one. Below is a multi-item checkout report we ran across two stores and two Gemini models in March:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpplayground.com/examples/eval-report-sample.pdf" rel="noopener noreferrer"&gt;Download the full multi-item checkout report (PDF) →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two-page report covering the funnel comparison matrix, per-session performance breakdown, evaluator configuration, auto-generated recommendations, and clickable session-replay IDs for every run.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two stores (&lt;a href="https://ucpchecker.com/status/oakywood.shop" rel="noopener noreferrer"&gt;oakywood.shop&lt;/a&gt;, &lt;a href="https://ucpchecker.com/status/ugmonk.com" rel="noopener noreferrer"&gt;ugmonk.com&lt;/a&gt;). Two models (Gemini 3 Flash, Gemini 3.1 Pro). One sequence (multi-item checkout: search → add → checkout). Four sessions total. The headline numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% checkout rate&lt;/strong&gt; across all four sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;95,513 average tokens&lt;/strong&gt; per session&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;48.3s average duration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 errors&lt;/strong&gt; across the matrix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the boring summary. The interesting parts are in the per-session table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Store&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Turns&lt;/th&gt;
&lt;th&gt;Cart value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;oakywood.shop&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;85,614&lt;/td&gt;
&lt;td&gt;93.4s&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;EUR 82.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;oakywood.shop&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;154,294&lt;/td&gt;
&lt;td&gt;34.7s&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ugmonk.com&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;46,084&lt;/td&gt;
&lt;td&gt;35.1s&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;USD 77.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ugmonk.com&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;96,058&lt;/td&gt;
&lt;td&gt;29.9s&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same sequence, same stores, two models. Gemini 3.1 Pro completes the run in fewer turns and roughly half the tokens of Flash on the same store (85,614 vs 154,294 on oakywood.shop; 46,084 vs 96,058 on ugmonk.com), but its latency is meaningfully higher when the store itself is slower to respond (93.4s vs 34.7s on oakywood.shop, against 35.1s vs 29.9s on ugmonk.com). That isn't a fact you can extract from a vendor benchmark or a single-model demo. It only shows up when the same scripted run hits multiple models head-to-head, with both sets of numbers landing in the same table.&lt;/p&gt;
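
&lt;p&gt;The headline averages and the "roughly half the tokens" observation can be re-derived from the per-session table. The sketch below only re-computes figures already printed above; the tuple layout and the helper name &lt;code&gt;pro_to_flash_token_ratio&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

```python
from statistics import mean

# Per-session rows copied from the table above: (store, model, tokens, seconds, turns).
SESSIONS = [
    ("oakywood.shop", "Gemini 3.1 Pro", 85_614, 93.4, 7),
    ("oakywood.shop", "Gemini 3 Flash", 154_294, 34.7, 12),
    ("ugmonk.com", "Gemini 3.1 Pro", 46_084, 35.1, 6),
    ("ugmonk.com", "Gemini 3 Flash", 96_058, 29.9, 11),
]

avg_tokens = mean(row[2] for row in SESSIONS)
avg_seconds = mean(row[3] for row in SESSIONS)


def pro_to_flash_token_ratio(store):
    """Tokens used by Gemini 3.1 Pro divided by tokens used by Gemini 3 Flash."""
    tokens = {model: t for s, model, t, _, _ in SESSIONS if s == store}
    return tokens["Gemini 3.1 Pro"] / tokens["Gemini 3 Flash"]


print(f"average tokens: {avg_tokens:,.1f}")
print(f"average duration: {avg_seconds:.3f}s")
for store in ("oakywood.shop", "ugmonk.com"):
    print(f"{store}: Pro/Flash token ratio = {pro_to_flash_token_ratio(store):.2f}")
```

&lt;p&gt;The averages agree with the report's rounded headline figures (95,513 tokens and 48.3s), and both token ratios land near one half.&lt;/p&gt;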

&lt;p&gt;The auto-generated recommendations point at where the real engineering work is, and they're grounded in the actual run data:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Average token usage is 95,513 — above the 40K baseline. Product descriptions may be inflating context. Consider truncating descriptions in MCP responses.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Average session duration is 48.3s — above the 15s target. Optimise MCP endpoint response times, especially initial search calls.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those are concrete engineering actions. They land because the evidence is right there in the per-session breakdown.&lt;/p&gt;

&lt;p&gt;The deeper signal shows up across runs against richer stores. In a separate eval against a single shop, two models picked &lt;em&gt;different variant IDs for "Medium"&lt;/em&gt; — one mapped Medium to one variant ID, the other to a different one, and neither is provably correct because the store doesn't expose a human-readable size axis in its variant data. That isn't a bug in either model. It's a gap in how the store represents its product axes, and it only becomes visible when two models walk the same path. &lt;strong&gt;This is the kind of behavioural divergence between frontier models that evals surface — and that vendor-internal benchmarks can't credibly report.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same run logged 6/6 prompt-injection resistance across every session, against benchmark prompts seeded in product descriptions and review fields. Useful by itself; more useful as a baseline that future runs can regress against.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's on the evals roadmap
&lt;/h2&gt;

&lt;p&gt;This is v1. A few things on the roadmap, in priority order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More eval collections.&lt;/strong&gt; The collections that ship today cover the core shopping flow. The next batch is more diagnostic: single-item flow (the simplest path), variant selection accuracy (the size-label gap above, formalised), prompt-injection resistance (already running, becoming its own collection), escalation handling (&lt;code&gt;requires_escalation&lt;/code&gt; compliance), attribution accuracy (UTM and referrer handling at checkout hand-off), and return policy surfacing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Public benchmark leaderboards.&lt;/strong&gt; Same pattern as the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;UCP Score leaderboard&lt;/a&gt; — by-store and by-model rankings against the standard sequences, refreshed on schedule, indexed and shareable. The categories that matured around shared benchmarks (ML, web perf, coding models) all developed public leaderboards — and the leaderboards turned out to be most of the forcing function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Headless API and CI/CD integration.&lt;/strong&gt; Already shipped. The full automation surface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;POST /api/v1/collections               # create
POST /api/v1/collections/{id}/run      # trigger
GET  /api/v1/collection-runs/{id}      # poll status + results
GET  /api/v1/collection-runs/{id}/pdf  # download report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first integration we expect anyone to ship is a deploy-time check: trigger an eval after every UCP manifest deploy, assert &lt;code&gt;checkout_rate &amp;gt;= 80&lt;/code&gt;, &lt;code&gt;errors.total == 0&lt;/code&gt;, &lt;code&gt;avg_duration_ms &amp;lt; 30000&lt;/code&gt;, fail the build otherwise. Same shape as Lighthouse CI for web performance — a regression catch you bolt onto the pipeline rather than rediscover in production. Full developer documentation — authentication, rate limits, and a worked GitHub Actions example — lives at &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;ucpchecker.com/developer-tools&lt;/a&gt;, alongside the rest of the public API surface.&lt;/p&gt;
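
&lt;p&gt;A deploy-time gate along those lines might look like the sketch below. The three thresholds come from the paragraph above; the shape of the results payload (a dictionary with a nested &lt;code&gt;errors.total&lt;/code&gt;) is an assumption inferred from the assertion names, and &lt;code&gt;check_eval_gate&lt;/code&gt; is a hypothetical helper rather than part of the published API.&lt;/p&gt;

```python
def check_eval_gate(results):
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    if 80 > results["checkout_rate"]:
        failures.append(f"checkout_rate {results['checkout_rate']} is below 80")
    if results["errors"]["total"] != 0:
        failures.append(f"errors.total is {results['errors']['total']}, expected 0")
    if results["avg_duration_ms"] >= 30000:
        failures.append(f"avg_duration_ms {results['avg_duration_ms']} is not under 30000")
    return failures


# Hypothetical payload built from the sample report's headline numbers:
# 100% checkout rate, 0 errors, 48.3s average duration.
sample = {"checkout_rate": 100, "errors": {"total": 0}, "avg_duration_ms": 48300}

for problem in check_eval_gate(sample):
    print(f"FAIL: {problem}")
```

&lt;p&gt;Note that the sample report would trip the duration assertion: a 48.3s average is well over the 30-second limit. A CI script would typically exit non-zero whenever the returned list is non-empty.&lt;/p&gt;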

&lt;p&gt;&lt;strong&gt;Scheduled runs and version tracking.&lt;/strong&gt; Also shipped. Collections auto-increment versions when their config changes, runs snapshot the config they used, and a cron field on each collection lets you run the same eval on a regular cadence: the same Monday-9am sequence every week, with before-and-after comparisons whenever the underlying UCP implementation changes. This is how a benchmark becomes a track record instead of a one-shot demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloning and team scoping.&lt;/strong&gt; Public collections can be cloned into any team workspace; quotas are scoped per team. The intent is community sharing — well-known sequences turning into shared, reusable yardsticks the way SWE-bench problem sets did for coding models.&lt;/p&gt;

&lt;h2&gt;
  
  
  How evals fit the broader development cycle
&lt;/h2&gt;

&lt;p&gt;Evals don't sit alone. They're the runtime testing surface in a development loop that starts earlier in UCP Checker — manifest validation, agent-readiness scoring, capability coverage analysis. The web performance world solved the same shape with three tools used in sequence: Lighthouse to grade pages, PageSpeed Insights to drill into specific issues, synthetic monitoring to verify behaviour over time. UCP implementations follow the same arc: validate the manifest at &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;&lt;code&gt;/check&lt;/code&gt;&lt;/a&gt;, score it against agent-readiness criteria with the &lt;a href="https://ucpchecker.com/blog/introducing-ucp-score-agent-readiness-grade" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, then run evals against it to see how it actually behaves when a real frontier agent shops it.&lt;/p&gt;

&lt;p&gt;Each tool surfaces something different. Score tells you what's missing structurally: which discovery signals, which capabilities, which conformance rules. Check confirms the manifest validates after fixes land. Evals confirm the agent actually behaves correctly when it tries to complete a real flow. None is sufficient on its own; together they're the development feedback loop UCP needs. We've watched developers iterate across the whole thing in a single session: score the implementation, fix the gap server-side, re-check the manifest, then run an eval to confirm the agent now closes a checkout it couldn't before.&lt;/p&gt;

&lt;p&gt;If you're starting from zero on a UCP implementation, the natural sequence is: get a Score first to see what's missing, fix the highest-impact issues, run a Check to confirm the manifest validates cleanly, then run Evals to confirm real agents complete the flows you care about. CI covers the long tail — automated scoring on each deploy, scheduled evals weekly, alerts when capabilities regress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methodology and verification
&lt;/h2&gt;

&lt;p&gt;Three properties separate a credible benchmark from a marketing claim. UCP Playground Evals are designed around all three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every result links to a replayable session.&lt;/strong&gt; Each eval session generates the same &lt;code&gt;agent_sessions&lt;/code&gt; data the public Playground UI produces — full tool-call timeline, model responses, token-by-token event stream, every retrieved page. The session IDs in any report are clickable. Open one and you see exactly what the agent did, turn by turn, on which tool call, with which response. The sample report above lists four such IDs (e.g. &lt;code&gt;01KMJZM5MG2CA4QN5M983H19E1&lt;/code&gt;) and each resolves to a full replay at &lt;code&gt;ucpplayground.com/sessions/{id}&lt;/code&gt;. &lt;strong&gt;This isn't a marketing claim; it's a verifiable test you can audit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every collection is versioned.&lt;/strong&gt; When the configuration of a collection changes — turns added, models swapped, store list updated — the version increments and every run snapshots the config it ran against. Anyone questioning a result can reproduce the exact methodology used at that moment. The PDF report itself prints the collection version at the bottom of every page; the sample above is &lt;code&gt;Collection v3&lt;/code&gt;. Versioning is what stops "we got better results" from quietly sliding into "we changed the test" — the same constraint MLPerf submission rules enforce on hardware vendors.&lt;/p&gt;
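
&lt;p&gt;The version-and-snapshot behaviour described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Playground's actual implementation: the &lt;code&gt;Collection&lt;/code&gt; class, its field names, and the JSON-based comparison are all hypothetical.&lt;/p&gt;

```python
import copy
import json
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory model of a versioned eval collection."""
    config: dict
    version: int = 1
    runs: list = field(default_factory=list)

    def update_config(self, new_config):
        # Compare canonical JSON so key order alone never bumps the version.
        old = json.dumps(self.config, sort_keys=True)
        new = json.dumps(new_config, sort_keys=True)
        if old != new:
            self.config = copy.deepcopy(new_config)
            self.version += 1

    def record_run(self):
        # Each run keeps its own frozen copy of the config it ran against.
        run = {"version": self.version, "config": copy.deepcopy(self.config)}
        self.runs.append(run)
        return run


collection = Collection(config={"stores": ["oakywood.shop"], "models": ["Gemini 3 Flash"]})
first = collection.record_run()
collection.update_config({"stores": ["oakywood.shop", "ugmonk.com"], "models": ["Gemini 3 Flash"]})
second = collection.record_run()
print(first["version"], second["version"])
```

&lt;p&gt;Because each run stores a deep copy, the first run still reports the single-store config after the collection has moved on, which is what makes an old result reproducible.&lt;/p&gt;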

&lt;p&gt;&lt;strong&gt;The methodology is open.&lt;/strong&gt; The framework configuration shape is documented — the turns, the orchestrator loop, the stop conditions, the success metrics, the PDF schema. Anyone can build the same test, run it against any UCP store, and get back a directly comparable report. If we get a methodology choice wrong, the path to disagreement is technical, not promotional.&lt;/p&gt;

&lt;p&gt;That's the credibility floor. Everything else in the product builds on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  About UCP Checker and UCP Playground
&lt;/h2&gt;

&lt;p&gt;UCP Checker is the independent validation and monitoring layer for the &lt;a href="https://ucp.dev" rel="noopener noreferrer"&gt;Universal Commerce Protocol&lt;/a&gt;. We crawl, validate, and grade every public UCP manifest on the open web, run the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;merchant directory&lt;/a&gt; and the &lt;a href="https://ucpchecker.com/score" rel="noopener noreferrer"&gt;UCP Score&lt;/a&gt;, publish the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; and &lt;a href="https://ucpchecker.com/stats" rel="noopener noreferrer"&gt;adoption stats&lt;/a&gt;, and ship developer tools: the &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;validator&lt;/a&gt;, &lt;a href="https://ucpchecker.com/bulk-check" rel="noopener noreferrer"&gt;bulk checker&lt;/a&gt;, &lt;a href="https://ucpchecker.com/extension" rel="noopener noreferrer"&gt;browser extension&lt;/a&gt;, &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;public dataset&lt;/a&gt;, and a public REST API. The whole dataset is open, indexed, and ungated.&lt;/p&gt;

&lt;p&gt;UCP Playground is the agent shopping layer that sits next to it — same data model, same &lt;code&gt;/.well-known/ucp&lt;/code&gt; discovery, same replayable session format. UCP Playground Evals is the benchmark surface on top of that. Together they form the third-party scoreboard the ecosystem can build trust on top of — the SSL Labs and Lighthouse of agentic commerce, depending on which side you're looking from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The interesting eval gaps are the ones nobody's tested yet.&lt;/strong&gt; If a result surprises you — your own store, a competitor's, a model you assumed was a clear winner that turns out not to be — &lt;a href="https://ucpchecker.com/contact" rel="noopener noreferrer"&gt;let us know&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Three concrete next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run an eval against your own UCP store.&lt;/strong&gt; Create a collection at &lt;a href="https://ucpplayground.com/evals" rel="noopener noreferrer"&gt;ucpplayground.com/evals&lt;/a&gt;, pick a sequence, pick two models, run it. The four-session example above is the shape most first runs take.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read a public eval report.&lt;/strong&gt; Sample reports are linked from the framework page. Each has clickable session IDs you can replay end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire it into CI.&lt;/strong&gt; The &lt;a href="https://ucpchecker.com/developer-tools" rel="noopener noreferrer"&gt;developer tools page&lt;/a&gt; covers authentication, rate limits, and a GitHub Actions worked example. The assertion shape is the same one Lighthouse CI uses for web performance — &lt;code&gt;checkout_rate&lt;/code&gt;, &lt;code&gt;errors.total&lt;/code&gt;, &lt;code&gt;avg_duration_ms&lt;/code&gt; instead of LCP and TBT.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>product</category>
      <category>ucp</category>
    </item>
    <item>
      <title>Is My Store UCP Ready? How to Check in 60 Seconds</title>
      <dc:creator>Benji Fisher</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:25:51 +0000</pubDate>
      <link>https://dev.to/benjifisher/is-my-store-ucp-ready-how-to-check-in-60-seconds-4fco</link>
      <guid>https://dev.to/benjifisher/is-my-store-ucp-ready-how-to-check-in-60-seconds-4fco</guid>
      <description>&lt;p&gt;The short answer: &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;enter your domain here&lt;/a&gt; and you'll know in under 60 seconds. This UCP ready check runs the same validation that AI agents use to decide whether your store is worth shopping.&lt;/p&gt;

&lt;p&gt;The longer answer — what "UCP ready" actually means, why it matters, and what to do about the result — is what this post covers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What UCP readiness means
&lt;/h2&gt;

&lt;p&gt;A store is "UCP ready" when it publishes a valid manifest at &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;&lt;code&gt;/.well-known/ucp&lt;/code&gt;&lt;/a&gt; that AI shopping agents can discover, parse, and act on. That's the technical definition.&lt;/p&gt;

&lt;p&gt;In practice, there are three levels:&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 1: Verified
&lt;/h3&gt;

&lt;p&gt;Your manifest exists, returns valid JSON, and passes &lt;a href="https://ucpchecker.com/ucp-validator" rel="noopener noreferrer"&gt;schema validation&lt;/a&gt; against the current &lt;a href="https://ucpchecker.com/specs/2026-04-08" rel="noopener noreferrer"&gt;v2026-04-08 spec&lt;/a&gt;. You appear in the &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;UCP directory&lt;/a&gt;. Agents can find you.&lt;/p&gt;

&lt;p&gt;As of this month, &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;4,024 stores&lt;/a&gt; are at this level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 2: Agent-functional
&lt;/h3&gt;

&lt;p&gt;Agents can actually &lt;em&gt;shop&lt;/em&gt; your store — not just discover it. Your MCP endpoint responds, your product data is clean, your checkout flow completes without errors. You score B+ or higher on the &lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;Playground leaderboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;422 stores are at this level. The gap between "verified" and "agent-functional" is where most &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;common errors&lt;/a&gt; live.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 3: Optimized
&lt;/h3&gt;

&lt;p&gt;Agents complete purchases reliably across multiple models. Your variant data is clean, your latency is low, your capabilities go beyond the defaults. You score A. Only 9 stores are here today.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/blog/ucp-requirements" rel="noopener noreferrer"&gt;UCP requirements checklist&lt;/a&gt; breaks down exactly what each level requires.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to check your store
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Run the checker
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;UCPChecker.com/check&lt;/a&gt; and enter your domain. When you check your UCP status, the checker will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetch &lt;code&gt;/.well-known/ucp&lt;/code&gt; from your domain&lt;/li&gt;
&lt;li&gt;Validate the JSON against the current spec&lt;/li&gt;
&lt;li&gt;Check your robots.txt for AI bot policies&lt;/li&gt;
&lt;li&gt;Inventory your declared capabilities, transports, and payment handlers&lt;/li&gt;
&lt;li&gt;Verify your UCP compliance and report every error and warning with specific error codes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole process takes about 1 second. You'll get a full diagnostic report on your &lt;a href="https://ucpchecker.com/directory" rel="noopener noreferrer"&gt;status page&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Read the result
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Verified&lt;/strong&gt; (green) — your manifest is valid. You're in the directory. Agents can find you. Check the warnings section for things to improve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invalid&lt;/strong&gt; (amber) — your manifest exists but fails validation. The diagnostic panel shows exactly which fields are wrong or missing. Most invalid manifests are one fix away from passing — usually a &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;missing required field or a misplaced signing_keys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not Detected&lt;/strong&gt; (grey) — no manifest found at &lt;code&gt;/.well-known/ucp&lt;/code&gt;. Your store isn't UCP ready yet. See the &lt;a href="https://ucpchecker.com/blog/ucp-requirements" rel="noopener noreferrer"&gt;requirements post&lt;/a&gt; for what to publish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blocked&lt;/strong&gt; (orange) — your robots.txt or firewall is preventing access to the manifest. The diagnostic will tell you whether it's a robots.txt rule or an HTTP-level block.&lt;/p&gt;
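
&lt;p&gt;The four outcomes above can be pictured as a small decision function. This is only a sketch: the real checker's rules are not published in this post, so the mapping of HTTP 401/403 to Blocked, of other non-200 responses to Not Detected, and the caller-supplied &lt;code&gt;required_fields&lt;/code&gt; list (a stand-in for full schema validation) are all assumptions.&lt;/p&gt;

```python
import json


def classify_manifest(http_status, body, robots_disallows, required_fields):
    """Map one probe of /.well-known/ucp onto the four checker statuses."""
    # Assumption: a robots.txt rule or an auth-style HTTP status counts as a block.
    if robots_disallows or http_status in (401, 403):
        return "Blocked"
    # Assumption: any other non-200 response means no manifest was found.
    if http_status != 200:
        return "Not Detected"
    try:
        manifest = json.loads(body)
    except (TypeError, ValueError):
        return "Invalid"
    if not isinstance(manifest, dict):
        return "Invalid"
    # Stand-in for schema validation: only checks that placeholder keys exist.
    missing = [name for name in required_fields if name not in manifest]
    return "Invalid" if missing else "Verified"


# Placeholder field name; the real required fields are defined by the UCP spec.
REQUIRED_FIELDS = ("example_required_field",)

print(classify_manifest(200, '{"example_required_field": 1}', False, REQUIRED_FIELDS))
print(classify_manifest(404, "", False, REQUIRED_FIELDS))
```

&lt;p&gt;Keeping the network fetch outside the function makes the classification easy to exercise with canned responses before pointing it at a live store.&lt;/p&gt;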

&lt;h3&gt;
  
  
  Step 3: Fix what's broken
&lt;/h3&gt;

&lt;p&gt;The checker tells you &lt;em&gt;what&lt;/em&gt; is wrong. Here's where to go for &lt;em&gt;how&lt;/em&gt; to fix it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Platform-specific guides:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/shopify-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/woocommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/bigcommerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt; · &lt;a href="https://ucpchecker.com/blog/magento-adobe-commerce-ucp-guide-ai-agent-commerce" rel="noopener noreferrer"&gt;Magento&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manifest reference:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/well-known-ucp" rel="noopener noreferrer"&gt;/.well-known/ucp developer guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-by-error fixes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;Common UCP errors&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spec changes:&lt;/strong&gt; &lt;a href="https://ucpchecker.com/blog/ucp-v2026-04-08-spec-update" rel="noopener noreferrer"&gt;v2026-04-08 update&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Test with real agents
&lt;/h3&gt;

&lt;p&gt;Schema validation tells you if your manifest is syntactically correct. It tells you nothing about whether an agent can actually buy something from your store. For that, you need &lt;a href="https://ucpplayground.com" rel="noopener noreferrer"&gt;UCP Playground&lt;/a&gt; — it runs real AI agent sessions against your store and shows you exactly where the flow breaks.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;agent testing data&lt;/a&gt; shows that the most common runtime failure is &lt;a href="https://ucpchecker.com/blog/common-ucp-errors" rel="noopener noreferrer"&gt;variant mismatches&lt;/a&gt; — clean product data matters more than perfect schema.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Monitor
&lt;/h3&gt;

&lt;p&gt;Your UCP endpoint is a live API. Platform updates, catalog changes, and CDN reconfigurations can break it silently. Set up &lt;a href="https://ucpchecker.com/alerts" rel="noopener noreferrer"&gt;UCP Alerts&lt;/a&gt; to get emailed the moment your status changes — before agents notice.&lt;/p&gt;

&lt;h2&gt;
  
  
  How you compare
&lt;/h2&gt;

&lt;p&gt;Once you're verified, see how your store stacks up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/compare" rel="noopener noreferrer"&gt;Compare side-by-side&lt;/a&gt;&lt;/strong&gt; with a competitor or partner store — capabilities, transports, payment handlers, latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/platforms" rel="noopener noreferrer"&gt;Browse your platform&lt;/a&gt;&lt;/strong&gt; — see all verified &lt;a href="https://ucpchecker.com/platforms/shopify" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/woocommerce" rel="noopener noreferrer"&gt;WooCommerce&lt;/a&gt;, &lt;a href="https://ucpchecker.com/platforms/bigcommerce" rel="noopener noreferrer"&gt;BigCommerce&lt;/a&gt;, or &lt;a href="https://ucpchecker.com/platforms/magento" rel="noopener noreferrer"&gt;Magento&lt;/a&gt; stores ranked by capability depth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ucpchecker.com/leaderboard" rel="noopener noreferrer"&gt;Check the leaderboard&lt;/a&gt;&lt;/strong&gt; — stores graded A through F on real agent shopping performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters now
&lt;/h2&gt;

&lt;p&gt;UCP adoption is accelerating. &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;1,400+ new merchants&lt;/a&gt; were discovered in April alone. Shopify &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;migrated its entire fleet&lt;/a&gt; to the latest spec in four days. BigCommerce, WooCommerce, and Magento stores are appearing every week.&lt;/p&gt;

&lt;p&gt;The question isn't whether your store will need UCP. It's whether you'll be ready when agents start shopping, and &lt;a href="https://ucpchecker.com/blog/agentic-commerce-optimization-ucp-readiness-data" rel="noopener noreferrer"&gt;they already are&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before you check, it helps to understand the building blocks: &lt;a href="https://ucpchecker.com/capabilities" rel="noopener noreferrer"&gt;capabilities&lt;/a&gt; define what your store can do for agents, &lt;a href="https://ucpchecker.com/payment-handlers" rel="noopener noreferrer"&gt;payment handlers&lt;/a&gt; define how agents pay, &lt;a href="https://ucpchecker.com/transports" rel="noopener noreferrer"&gt;transports&lt;/a&gt; define how agents connect, and &lt;a href="https://ucpchecker.com/product-discovery" rel="noopener noreferrer"&gt;product discovery&lt;/a&gt; is the flow agents actually run when they shop.&lt;/p&gt;

&lt;p&gt;Make sure your listing on &lt;a href="https://ucpregistry.com" rel="noopener noreferrer"&gt;UCP Registry&lt;/a&gt; is accurate once you're verified — that's how agents find you in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ucpchecker.com/check" rel="noopener noreferrer"&gt;Check your store now →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Build your own agent: &lt;a href="https://ucpchecker.com/agents" rel="noopener noreferrer"&gt;developer quickstart&lt;/a&gt;. Understand the protocol stack: &lt;a href="https://ucpchecker.com/blog/mcp-vs-ucp-vs-ap2-whats-the-difference" rel="noopener noreferrer"&gt;MCP vs UCP vs AP2&lt;/a&gt;. Monthly ecosystem data: &lt;a href="https://ucpchecker.com/blog/state-of-agentic-commerce-april-2026" rel="noopener noreferrer"&gt;State of Agentic Commerce&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ecommerce</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>ucp</category>
    </item>
  </channel>
</rss>
