In our last post, we ran 180 AI agent shopping sessions and showed what happens when models actually try to buy things. That data told us which models reach checkout and which fall off the funnel.
But it left a bigger question unanswered: what about the stores themselves?
A session failing because Claude guessed a variant ID wrong is a model problem. A session failing because the store's manifest disappeared overnight is an infrastructure problem. And you can't tell the difference from a single scan.
So ten days after UCP launched on January 11th, we started monitoring. Not a one-time crawl — a continuous, automated check of every domain in our pool, every 24 hours. A month later, our crawler has run over 24,000 checks across 2,008 domains.
Here's what the data shows.
The monitoring pool
Our crawler tracks 2,008 domains, discovered through four channels:
- Browser extension: 1,480 domains — our Chrome extension probes /.well-known/ucp on every storefront visited and feeds new domains into the pool automatically. 98.4% verified rate — the stores it finds are overwhelmingly real merchants with live manifests.
- Crawler: 407 domains — a proprietary engine that continuously discovers new UCP-enabled stores.
- Web: 120 domains — manual checks submitted through ucpchecker.com by developers and store owners testing their implementations.
- Bulk check: 1 unique domain — batch submissions, mostly domains we're already tracking.
No single source covers the full picture. The crawler finds known stores systematically. The extension picks up storefronts that don't appear in any directory — niche brands, regional retailers, development endpoints. Together they give broader coverage than any one approach alone.
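To make the check concrete, here's a minimal sketch of what a single crawl probe could look like: fetch `/.well-known/ucp` and map the response to the status labels used in this post. The `REQUIRED_FIELDS` set is a placeholder assumption; the real UCP spec defines its own required structure.

```python
import json
import urllib.error
import urllib.request

# Placeholder: the actual UCP spec defines the required manifest fields.
REQUIRED_FIELDS = {"version", "transports"}

def classify_manifest(status_code, body):
    """Map an HTTP response to a coarse status label."""
    if status_code == 403:
        return "blocked"        # WAF or bot rule rejected the request
    if status_code == 404:
        return "not_detected"   # no manifest at the well-known path
    if status_code != 200:
        return "unreachable"    # timeouts, 5xx, DNS-level failures
    try:
        manifest = json.loads(body)
    except json.JSONDecodeError:
        return "invalid"        # manifest exists but isn't valid JSON
    if not REQUIRED_FIELDS.issubset(manifest):
        return "invalid"        # JSON parses but required fields are missing
    return "verified"

def check_domain(domain, timeout=10):
    """Probe one domain's well-known UCP endpoint."""
    url = f"https://{domain}/.well-known/ucp"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_manifest(resp.status, resp.read())
    except urllib.error.HTTPError as e:
        return classify_manifest(e.code, b"")
    except OSError:
        return "unreachable"
```

This is a sketch of the general shape, not our production crawler, which also records response time, validates against the full spec, and evaluates AI bot policies.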
As of February 27th:
83% of domains have a working UCP manifest. That's a strong baseline — but the interesting story is in the other 17%, and in what happens to manifests that were verified yesterday.
Manifests break. More often than you'd think.
Over the monitoring period, we detected 457 status changes across the pool. 95 of those were breakages: a domain that was verified on one check came back invalid, unreachable, or blocked on the next.
The breakdown:
- 68 verified → invalid — manifest still exists but fails validation. Most common failure mode. A deployment pushes bad JSON, a field goes missing, or a version string gets malformed.
- 14 verified → unreachable — endpoint times out entirely. Infrastructure issue, DNS change, or CDN misconfiguration.
- 7 verified → blocked — domain starts rejecting the crawler. Usually a WAF rule change.
- 6 verified → not detected — manifest disappears. Endpoint returns 404 or redirects.
88 unique domains experienced at least one breakage during the month. That's roughly 5% of verified stores going down at some point — and then, in many cases, coming back.
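Breakages and recoveries like these fall straight out of a domain's status history. A sketch, assuming one status label per 24-hour check:

```python
def transitions(history):
    """All (prev, curr) pairs where consecutive daily checks differ."""
    return [(a, b) for a, b in zip(history, history[1:]) if a != b]

def breakages(history):
    """Breakage = any transition out of 'verified'."""
    return [t for t in transitions(history) if t[0] == "verified"]

def recoveries(history):
    """Recovery = any transition back into 'verified'."""
    return [t for t in transitions(history) if t[1] == "verified"]
```

Running this over a month of daily statuses per domain is all the aggregation the breakage and recovery counts above require.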
The recovery cycle
The good news: most breakages are temporary.
We observed 96 recoveries — domains that were broken or missing and then came back:
- 69 invalid → verified (bad deployment rolled back)
- 13 unreachable → verified (infrastructure recovers)
- 10 not detected → verified (new manifest published)
- 4 blocked → verified (bot rules relaxed)
This creates a pattern we're calling the "manifest recovery cycle." A store's UCP endpoint breaks — usually through an invalid manifest — and typically recovers within 24–48 hours.
A caveat: we're running a 24-hour crawl cycle, so actual recovery time could be shorter. A store that breaks and fixes within a few hours between checks wouldn't show up as a transition at all — meaning the true breakage rate is likely higher than what we're reporting.
The implication for agent developers: just because a store worked yesterday doesn't mean it works today. And just because it's broken now doesn't mean it's gone.
The flappers
32 domains showed signs of persistent instability — oscillating between working and broken states multiple times over the month. We're calling these "flappers."
The most unstable endpoint flipped status 22 times in a month. Others in the top 10 include major retailers and well-known tech companies — names you'd expect to have stable infrastructure. Some are running custom UCP implementations outside Shopify, which may explain the instability. Others appear to be testing or iterating in production.
Flapping is a signal. It tells you the endpoint exists and someone is actively working on it — but it's not yet reliable enough for an agent to depend on.
For agent developers building production flows against specific stores, flapping domains need a different strategy: retry logic, fallback handling, or simply waiting until the implementation stabilises.
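Detecting a flapper from the same daily status history is just counting transitions in a window. The threshold below is our illustrative cutoff, not a spec value:

```python
def flip_count(history):
    """Number of status changes across consecutive daily checks."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)

def is_flapper(history, threshold=4):
    """Treat >= threshold flips in the window as persistent instability.

    The threshold is an arbitrary cutoff chosen for illustration; pick
    one that matches your tolerance for endpoint churn.
    """
    return flip_count(history) >= threshold
```

An agent framework could use this to route flapping domains into a retry-with-backoff path instead of failing the session outright.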
Who's blocking agents — and the paradox
95 domains are actively blocking our crawler:
- Firewall blocks (1,179 check instances) — the domain's WAF rejects the request before it reaches the UCP endpoint. This includes major retailers like Kohl's, Macy's, Sears, REI, Neiman Marcus, and Tiffany.
- Robots.txt blocks (108 instances) — explicit crawler disallowance.
A paradox we flagged in our first audit persists: some of these blocked domains have fully deployed UCP manifests. They've built the infrastructure for agentic commerce, then locked the front door.
This is almost certainly an operational gap — the security team updating firewall rules without coordinating with the product team that shipped UCP. But for agents, the result is a hard bounce.
Robots.txt: the access picture
Among verified domains, we check robots.txt for six major AI bot user agents. The picture is overwhelmingly permissive:
Only 18 verified domains block at least one AI bot while maintaining a live UCP manifest. The pattern splits into two groups: stores that block everything and stores that selectively block one or two.
The selective blockers are the more interesting case — a store that blocks GPTBot but allows ClaudeBot is making a deliberate choice about which agents can discover them. Some stores appear to be conflating AI training crawlers with AI shopping agents in their robots.txt rules.
The broader signal: stores that have committed to UCP have also committed to being discoverable by agents. The 1% that block specific bots are edge cases.
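Agent developers can run this per-bot check themselves with the standard library's `urllib.robotparser`. A sketch; the user-agent list below is our assumption about which AI bots matter, not an official registry:

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI bot user agents for illustration.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]

def bot_access(robots_txt, path="/.well-known/ucp", bots=AI_BOTS):
    """Return {bot_name: allowed} for a robots.txt body and a target path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, path) for bot in bots}
```

Given a store that disallows GPTBot but leaves the wildcard rule open, this returns `False` for GPTBot and `True` for the rest, which is exactly the selective-blocking pattern described above.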
Response times: fast, but with a long tail
Across 19,035 verified checks:
130ms median is fast — well within what real-time agent interactions need. But that tail matters. 1.5% of verified checks returned in over 500ms.
For agents making multiple tool calls per session — search, details, cart, checkout — a slow manifest compounds. A 130ms response adds barely any latency. A 750ms response hit three or four times adds 2–3 seconds of dead time the user feels.
If you read our Playground data, you know the difference between a 5-second session and an 11-minute one can come down to infrastructure like this.
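The compounding effect is simple arithmetic. A sketch, under the pessimistic assumption that every tool call pays the full manifest latency once:

```python
def session_overhead_seconds(manifest_ms, tool_calls=4):
    """Total dead time across a session if each tool call pays the
    manifest latency. A simplification: real agents may cache the
    manifest across calls, which would reduce this."""
    return manifest_ms * tool_calls / 1000
```

For the median store this is about half a second over four calls; for a 750ms tail store it's three full seconds of dead time.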
Transports: MCP dominates, REST is where it gets interesting
UCP is transport-agnostic by design. Across 1,669 verified domains:
- MCP: 99.9% (1,668 domains) — the default Shopify transport. JSON-RPC, tool discovery via schema introspection, real-time tool calls.
- Embedded: 99.8% (1,665 domains) — declared alongside MCP on nearly every Shopify store. Designed to solve the payment wall: agent builds the cart, merchant's checkout UI handles payment in a secure iframe.
- REST: 0.5% (8 domains) — found exclusively on non-Shopify implementations: WooCommerce via UCPReady, custom builds, development endpoints.
- A2A: 1 domain — Google's Agent-to-Agent protocol. The first A2A declaration we've seen in a UCP manifest in the wild.
The dominant combo is ["mcp", "embedded"] — what Shopify ships by default. But those 8 REST-declaring stores are where the transport diversity lives. These include WooCommerce implementations that expose REST alongside MCP and Embedded — giving agents three paths to the same store.
In our 180-session deep dive, we showed the schema fragmentation across these stacks — same "add to cart" intent, three completely different tool signatures. The transport data here shows why: 99.9% of stores speak one dialect (Shopify MCP), and the small non-Shopify remainder is where all the interoperability challenges live.
For agent developers: an agent that only speaks MCP reaches 99.9% of the current ecosystem. But as non-Shopify platforms ship UCP with REST-first architectures, that percentage will shift. The agents that handle multiple transports will have the widest reach.
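A multi-transport agent can express this as a simple preference walk over what the manifest declares. The preference order below is our assumption, not anything the spec mandates:

```python
# Assumed preference order: structured tool calls first, then plain
# REST, with the embedded checkout UI as the last resort.
PREFERENCE = ["mcp", "rest", "embedded"]

def pick_transport(declared, supported, preference=PREFERENCE):
    """Choose the first transport both the store declares and the
    agent supports, in preference order. Returns None if there is
    no overlap."""
    for transport in preference:
        if transport in declared and transport in supported:
            return transport
    return None
```

An MCP-only agent gets `"mcp"` back from nearly every store today; the `None` case is the reach gap that multi-transport support closes.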
The benchmark picture
We've benchmarked 1,183 domains on our scoring system:
The A-grade stores — Allbirds, Emma Bridgewater, Bodybuilding.com, and six others — represent the current ceiling. 188ms average TTFB, total scores of 90+, full capability coverage. In our Playground testing, Allbirds was also a standout: one store, five models, 100% checkout rate, zero errors.
The F-grades are almost entirely non-Shopify domains (285 of 289). Most don't have a manifest at all — domains submitted for checking that haven't deployed UCP.
The capability gap that matters most
Among 741 stores with verified manifests and benchmarked capabilities, coverage is near-universal: search, cart, product details, policies, shipping, discounts — all at 100%.
But OAuth: effectively 0%.
Without OAuth, every agent interaction is anonymous. No saved addresses, no order history, no loyalty discounts. The spec acknowledges it. The ecosystem hasn't addressed it yet.
The interesting divergence comes from non-Shopify implementations — WooCommerce stores exposing dev.ucp.shopping.checkout and dev.ucp.shopping.fulfillment capabilities that Shopify stores don't declare. As the ecosystem diversifies beyond Shopify, capability coverage becomes a real differentiator.
What this means if you're building on UCP
If you're building agents: manifests are not static. They break, recover, and flap. Your agent framework needs retry logic, health checking, and fallback handling. Don't cache manifest state for more than a few hours.
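"Don't cache manifest state for more than a few hours" translates directly into a TTL cache. A minimal sketch, with the 3-hour default as our reading of "a few hours":

```python
import time

class ManifestCache:
    """Cache manifests with a short TTL so agents re-validate stale
    entries instead of trusting yesterday's state."""

    def __init__(self, ttl_seconds=3 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._store = {}

    def put(self, domain, manifest):
        self._store[domain] = (manifest, self.clock())

    def get(self, domain):
        """Return the cached manifest, or None if missing or stale."""
        entry = self._store.get(domain)
        if entry is None:
            return None
        manifest, stamp = entry
        if self.clock() - stamp > self.ttl:
            del self._store[domain]  # stale: force a fresh fetch
            return None
        return manifest
```

A `None` from `get` is the agent's cue to re-probe the endpoint, which is also the natural place to hook in retry and health-check logic.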
If you're running an MCP server: the reliability gap between Shopify and everyone else is significant. Shopify stores are consistently stable. Non-Shopify implementations are where the instability lives — and where the most interesting development is happening. If you're in the second group, monitoring your own endpoint is table stakes.
If you're a store owner: check whether your security team knows about UCP. The stores blocking AI crawlers aren't rejecting agentic commerce — they're running standard bot protection that wasn't updated when UCP went live. A WAF rule change could be the difference between your store being agent-shoppable and invisible.
If you care about speed: manifest response time is infrastructure-level optimisation. The best stores respond in under 200ms. The worst take over 2 seconds. This is fixable independently of your UCP implementation.
Why we turned this into alerts
Continuous monitoring generates signals. But signals are only useful if they reach the right person at the right time.
That's why we shipped UCP Alerts — track any domain and get emailed the moment its UCP status changes. A store goes live with a manifest, you know immediately. A verified endpoint breaks, you know before your agents hit the error. A blocked domain recovers, you know when to retry.
It runs on the same crawl cycle powering all the data in this post. Sign in, add domains, and you're covered.
Methodology: Our crawler checks each domain every 24 hours. A check hits /.well-known/ucp, validates against the UCP spec, and evaluates HTTP status, response time, manifest structure, AI bot policies, and errors. All data is from real automated checks recorded between January 21 and February 27, 2026.
Tools:
UCP Checker — Check any store's UCP manifest and agent-readiness
UCP Playground — Watch an AI agent shop any UCP-ready store in real time
If you're building on UCP — MCP server, Shopify app, WooCommerce plugin, agent framework — I'd love to hear what you're seeing. What's working? What's broken? What should we monitor next?