DEV Community

Benji Fisher

Posted on • Originally published at ucpchecker.com

3,000 UCP Stores, Open Data: Why We're Publishing Our Dataset on Hugging Face

We've been crawling UCP manifests since January. For the first few months, the data lived in our own database — feeding the directory, powering the grades, tracking adoption week by week. We published summaries in our monthly state-of-the-ecosystem posts, but the raw dataset stayed internal. There wasn't much to share when the corpus was a few hundred stores.
That changed. We crossed 3,000 verified UCP merchants this week. And when you're sitting on a dataset that didn't exist six months ago, that no one else has, and that the people building agentic commerce tools would genuinely benefit from — it's time to share it.
Today we're publishing the UCPChecker merchant dataset on Hugging Face. Monthly snapshots, CC-BY 4.0 licensed, free to download, free to use.

What's in the dataset

Every row is a verified UCP merchant — a domain where we've confirmed a valid UCP manifest exists at /.well-known/ucp, the manifest passes spec validation, and the store has at minimum a working search capability.
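As a rough illustration of the first of those checks, the sketch below builds the well-known URL for a domain and reports the HTTP status and parsed body. It is a minimal sketch rather than the crawler itself: the function names are hypothetical, it assumes the manifest is served as JSON over HTTPS, and it does not attempt spec validation or the search-capability check.

```python
import json
import urllib.error
import urllib.request

WELL_KNOWN_PATH = "/.well-known/ucp"  # path named in this post


def manifest_url(domain: str) -> str:
    """Build the manifest URL for a bare domain (HTTPS is an assumption)."""
    host = domain.strip().lower().rstrip("/")
    # Drop a scheme if the caller passed a full origin.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return f"https://{host}{WELL_KNOWN_PATH}"


def fetch_manifest(domain: str, timeout: float = 10.0):
    """Return (http_status, parsed_body_or_None). Performs a network request."""
    request = urllib.request.Request(
        manifest_url(domain), headers={"Accept": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            raw = response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, None
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None, None
    try:
        return status, json.loads(raw)
    except ValueError:
        # Reachable, but the body is not JSON (the assumed format).
        return status, None
```

Calling `fetch_manifest("your-store.example")` would return something like `(200, {...})` for a live manifest and `(404, None)` or `(None, None)` otherwise.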
For each merchant you get the domain, verification status, UCP endpoint URL, HTTP status, the spec version the store is advertising, five boolean capability flags (checkout, cart management, identity linking, order, payment token), a capability count, the AI bot policies the store declares, the transports it supports (MCP, REST, embedded), and two timestamps: when we last checked it and when we last got a successful response.
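One way to picture a row is as a typed record. The sketch below is illustrative only: the column names are guesses based on the description above, not the published CSV headers, and it assumes booleans are serialized as true/false strings and multi-value fields are comma-separated. Check the dataset card for the real names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical names for the five boolean capability columns.
CAPABILITY_FLAGS = (
    "checkout", "cart_management", "identity_linking", "order", "payment_token"
)


@dataclass
class MerchantRow:
    domain: str
    spec_version: str
    capabilities: Dict[str, bool]
    transports: List[str]

    @property
    def capability_count(self) -> int:
        # Number of capability flags that are set for this merchant.
        return sum(self.capabilities.values())


def parse_bool(value: str) -> bool:
    return value.strip().lower() in ("true", "1", "yes")


def parse_row(raw: Dict[str, str]) -> MerchantRow:
    """Convert one CSV row (as produced by csv.DictReader) into a MerchantRow."""
    return MerchantRow(
        domain=raw["domain"],
        spec_version=raw.get("spec_version", ""),
        capabilities={f: parse_bool(raw.get(f, "")) for f in CAPABILITY_FLAGS},
        transports=[t.strip() for t in raw.get("transports", "").split(",") if t.strip()],
    )


# Invented sample row, only to exercise the parser.
sample = {"domain": "shop.example", "spec_version": "2026-01-23",
          "checkout": "true", "order": "true", "transports": "mcp, rest"}
row = parse_row(sample)
print(row.domain, row.capability_count, row.transports)
```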

UCPChecker Dataset Schema — 15 columns grouped by category: Identity, Protocol, Capabilities, Ecosystem, and Timestamps


That last column matters more than it looks. "First seen" is when this store became agent-ready. It's a timestamp on a real industry transition.
The snapshot releasing alongside this post covers the full corpus — 3,000+ stores across every platform we've indexed, from the Shopify majority to the handful of independent WooCommerce and Magento implementations that have been painstakingly hand-configured.

This is the directory layer — the foundation. If you need more — deeper analytics, operational data, or enterprise-level insight — get in touch. We work with teams building on top of the ecosystem.

If you're a merchant and your store isn't in the dataset, check whether your /.well-known/ucp manifest is live and valid at ucpchecker.com. Once you're verified, you'll appear in the next monthly snapshot automatically.

What you can build with it

The obvious use case is research. If you're writing about agentic commerce — for a conference talk, an analyst report, a blog post — you now have a citable, versioned dataset instead of a hand-waved "thousands of stores." Download the CSV, run your own analysis, publish your own findings. We'll keep releasing monthly snapshots so your comparisons have a time axis.
The less obvious use case is tooling. If you're building a commerce agent, an MCP client, or anything that needs to discover agent-ready stores, this dataset is your starting index. You don't need to crawl from scratch. Every domain in the file had a working UCP manifest as of its last successful check, so you can point your agent at any of them and expect to find something to buy.
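A starting index could be built along these lines. This is a sketch under assumptions: the column names (`domain`, `transports`, `last_success_at`) are hypothetical, the timestamp is assumed to be ISO 8601, and the 30-day freshness window is an arbitrary choice. Because snapshots are monthly, filtering on the last successful response is a cheap way to skip stores that may have gone stale since the crawl.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def seed_domains(csv_text, transport, as_of, max_age_days=30):
    """Return domains that list `transport` and responded within the window."""
    cutoff = as_of - timedelta(days=max_age_days)
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        transports = {t.strip().lower() for t in row["transports"].split(",")}
        last_ok = datetime.fromisoformat(row["last_success_at"])
        if transport.lower() in transports and last_ok >= cutoff:
            selected.append(row["domain"])
    return selected


# Tiny made-up sample standing in for a downloaded snapshot.
SAMPLE = """domain,transports,last_success_at
a.example,"mcp,rest",2026-03-10T12:00:00+00:00
b.example,rest,2026-03-11T08:30:00+00:00
c.example,mcp,2026-01-02T00:00:00+00:00
"""

as_of = datetime(2026, 3, 15, tzinfo=timezone.utc)
print(seed_domains(SAMPLE, "mcp", as_of))
```

With the sample above, only `a.example` passes the MCP filter: `b.example` does not list MCP, and `c.example` last responded outside the window.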
There's also benchmark utility. We've built our own benchmark tooling on top of this corpus — the leaderboard compares how AI models perform across real stores — but the underlying merchant list is the same one we're publishing. If you want to run your own evals against a representative cross-section of real UCP commerce, this is the store list to use.

What's in the open dataset

UCPChecker Open UCP Merchant Dataset — fields included in the free CC-BY 4.0 dataset: domain, status, capabilities, version, transports, ai_bot_policies, and timestamps


Why publish it

The ecosystem needs a shared baseline. Right now the people building agentic commerce tools — developers, agent frameworks, platforms — are all working from anecdotal evidence about which stores support what. That slows everyone down.
A public, versioned dataset fixes that. Researchers can cite real numbers instead of guessing. Developers can seed their agents with verified stores instead of crawling from scratch. Platforms can benchmark themselves against the field. The whole ecosystem moves faster when there's a common reference point, and we're in the best position to provide one.

What the data shows

Since we're talking about a dataset, it's worth saying what we actually see in it. The corpus grew fast — from a standing start in January to over 3,000 verified merchants by mid-March.

UCPChecker Dataset Growth Timeline — January 2026 crawling begins (445 domains), February 1000+ stores, March 3000+ stores and first Hugging Face release, April+ monthly snapshots


Version convergence is essentially complete. 99.8% of stores in the dataset are advertising spec version 2026-01-23. The ecosystem standardized on this version fast — faster than most protocol adoptions we've observed. That's partly Shopify's influence (when Shopify ships a version, 898 stores update in lockstep) but it also reflects that UCP adopters are, by selection, developers who care about spec compliance.
The capability gap is stark. Checkout is nearly universal: 99.96% of verified merchants declare it, and order is close behind at 99.7%. Past those two, the numbers collapse. Cart management drops to 0.07%. Identity linking is at 0.07%. Payment token support is at 0%. The protocol has the capability definitions. The stores mostly haven't implemented them yet.
This is the part of the dataset we expect to move the most over the coming months. As Playground and other agent testing tools give developers concrete evidence that capability depth improves conversion, those numbers will shift. The baseline we're publishing now is a before picture.
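The capability percentages above are simply the share of rows with each flag set. A minimal sketch of that calculation, using invented rows and hypothetical column names rather than the real snapshot:

```python
CAPABILITY_FLAGS = (
    "checkout", "order", "cart_management", "identity_linking", "payment_token"
)


def adoption_rates(rows):
    """Percentage of rows declaring each capability (rows are dicts of booleans)."""
    total = len(rows)
    if total == 0:
        return {flag: 0.0 for flag in CAPABILITY_FLAGS}
    return {
        flag: 100.0 * sum(1 for row in rows if row.get(flag)) / total
        for flag in CAPABILITY_FLAGS
    }


# Four invented merchants, only to exercise the function.
rows = [
    {"checkout": True, "order": True},
    {"checkout": True, "order": True},
    {"checkout": True, "order": False, "cart_management": True},
    {"checkout": True, "order": True},
]
for flag, pct in adoption_rates(rows).items():
    print(f"{flag:18s} {pct:6.2f}%")
```

Running the same function over successive monthly snapshots would show whether the numbers past checkout and order are starting to move.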

UCPChecker Capability Adoption Cliff — bar chart showing Checkout at 99.96% and Order at 99.7% versus Cart Management at 0.07%, Identity Linking at 0.07%, and Payment Token at 0%


The platform breakdown tells a familiar story. Shopify accounts for 898 of the identified stores. Generic (unidentified platform) accounts for 285. The long tail of WooCommerce, Magento, BigCommerce, and custom implementations is real but small compared to the Shopify bloc. This is consistent with what we've written about before — Shopify's platform-level default made UCP table-stakes for their merchants overnight. Everyone else is still climbing.

Go use it

The dataset is at huggingface.co/datasets/UCPChecker/ucp-merchants. Download the CSV, run a notebook, build a tool, write a paper. The license is CC-BY 4.0 — use it for anything, just say where it came from.
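If you would rather load it from code than download the file by hand, something like the following should work with the Hugging Face `datasets` library. Treat it as a sketch: it assumes the repository loads with the library's default CSV handling, and because the split names are not stated here, it simply takes the first split it finds. It also needs `pandas` installed for the conversion.

```python
DATASET_ID = "UCPChecker/ucp-merchants"  # from the URL in this post


def load_snapshot():
    """Download the dataset and return the first split as a pandas DataFrame."""
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset  # pip install datasets pandas

    splits = load_dataset(DATASET_ID)          # DatasetDict keyed by split name
    first_split = next(iter(splits.values()))  # split names are an unknown here
    return first_split.to_pandas()
```

From there, `load_snapshot().columns` lists the actual column names, which is the first thing to check before adapting any example code.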
We'll cut a new snapshot on the first of each month. If you're tracking adoption over time, watch the Hugging Face page for updates.
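Tracking adoption across snapshots can be as simple as a set comparison on the domain column. The sketch below assumes each snapshot has already been reduced to a list of domains; the function name and sample domains are invented.

```python
def diff_snapshots(previous, current):
    """Return (added, removed) domains between two snapshots, each sorted."""
    # Normalize case and whitespace so the same store is not counted twice.
    prev_set = {d.strip().lower() for d in previous}
    curr_set = {d.strip().lower() for d in current}
    return sorted(curr_set - prev_set), sorted(prev_set - curr_set)


march = ["a.example", "b.example", "c.example"]
april = ["b.example", "c.example", "d.example", "E.example"]

added, removed = diff_snapshots(march, april)
print("new this month:", added)
print("dropped:", removed)
```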
If you build something with it — an agent, an analysis, a visualization, a benchmark — we'd genuinely like to know. If you find gaps in the data or coverage you'd expect that isn't there, tell us. And if the base layer isn't enough and you need deeper data for your team, reach out. That conversation is one we want to have.
