
katyalai


I built a UCP conformance checker where every check has to prove it can catch its own bug

A conformance checker that says "yes" when the real answer is "no" is worse than having no checker at all. That one worry shaped a small open-source side project I've been building for UCP (the Universal Commerce Protocol) — the open, agentic-commerce standard for letting AI agents discover products and run checkouts with merchants.

This is an unofficial, independent project. It's early, it doesn't cover everything yet, and it never claims a server is "certified." I'm sharing it mostly because the idea behind it — making each check prove it can fail — turned out to be more useful than I expected, and I'd genuinely like feedback (including "you got this wrong").

The worry: checks that can't fail

Most quick conformance checks boil down to "got a 200, looks fine." A check that never fails when the server is actually broken isn't a check — it's decoration, and it's dangerous because it hands you false confidence.

So I tried to hold the tool to one rule:

No check ships until I've proven it fails when the server is wrong.

How each check earns trust

Every check is anchored to something I didn't write myself:

  1. Kill-rate testing. For each check, I inject the specific defect it's meant to catch — drop a required field, flip a status code, corrupt the body. If the check still passes, it's a false-pass hazard and it's blocked from release. A check only ships if it catches its own injected bug and passes cleanly on a known-good server.
  2. The official schema validator as the oracle. Rather than hand-rolling JSON-Schema logic (a classic source of subtle divergence), the tool shells out to the official ucp-schema validator, so payloads are judged against the spec's own schemas, not my interpretation of them.
  3. Spec citations. Each check points at a specific normative clause in the pinned spec, so a result is traceable rather than "trust me."
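
The kill-rate rule from item 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual harness: the `Response` type, the `check_required_fields` check, the field names, and the three defect injectors are all hypothetical names made up for this example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for a captured server response."""
    status: int
    body: object = field(default_factory=dict)


# Hypothetical required fields; a real check would take these from the spec.
REQUIRED_FIELDS = ("id", "status", "currency")


def check_required_fields(resp):
    """Pass only on a 200 whose body is an object carrying every required field."""
    if resp.status != 200 or not isinstance(resp.body, dict):
        return False
    return all(name in resp.body for name in REQUIRED_FIELDS)


def drop_field(resp):
    """Defect: remove one required field."""
    broken = copy.deepcopy(resp)
    broken.body.pop("currency", None)
    return broken


def flip_status(resp):
    """Defect: turn the success status into a server error."""
    broken = copy.deepcopy(resp)
    broken.status = 500
    return broken


def corrupt_body(resp):
    """Defect: replace the JSON object with an unparsed string."""
    broken = copy.deepcopy(resp)
    broken.body = "not-json"
    return broken


def may_ship(check, known_good, injectors):
    """A check ships only if it passes on known-good input and fails on every injected defect."""
    if not check(known_good):
        return False
    return all(not check(inject(known_good)) for inject in injectors)


def status_only(resp):
    """The "got a 200, looks fine" style of check, for contrast."""
    return resp.status == 200


KNOWN_GOOD = Response(200, {"id": "chk_1", "status": "open", "currency": "USD"})
INJECTORS = [drop_field, flip_status, corrupt_body]
```

With these definitions, `may_ship(check_required_fields, KNOWN_GOOD, INJECTORS)` holds, while `may_ship(status_only, KNOWN_GOOD, INJECTORS)` does not: the dropped field and the corrupted body both slip past a status-only check, which is exactly the false-pass hazard the rule is meant to block.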

The whole suite also tests itself in CI — it goes red if any check loses its ability to catch the defect it's for.
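
One way to picture that CI gate, as a rough sketch rather than the repository's real self-test: walk a registry of checks, replay each check's defects, and turn any survivor into a non-zero exit code. The registry layout, the check names, and the payload fields below are assumptions invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Payload = dict
Check = Callable[[Payload], bool]
Injector = Callable[[Payload], Payload]


def self_test(registry: Dict[str, Tuple[Check, Dict[str, Injector]]],
              known_good: Payload) -> List[str]:
    """Return one message per problem; an empty list means the suite is green."""
    problems = []
    for name, (check, defects) in registry.items():
        if not check(known_good):
            problems.append(f"{name}: fails on the known-good payload")
        for defect_name, inject in defects.items():
            # Work on a copy so one injector cannot contaminate the next.
            if check(inject(dict(known_good))):
                problems.append(f"{name}: missed defect '{defect_name}'")
    return problems


def exit_code(problems: List[str]) -> int:
    """CI goes red (1) as soon as any check has lost its ability to fail."""
    return 1 if problems else 0


# Hypothetical payload and checks, only to exercise the gate.
KNOWN_GOOD = {"status": 200, "version": "example-version"}


def has_version(payload):
    return "version" in payload


def got_200(payload):
    return payload.get("status") == 200


def drop_version(payload):
    payload.pop("version", None)
    return payload


REGISTRY = {
    "has_version": (has_version, {"drop_version": drop_version}),
    "got_200": (got_200, {"drop_version": drop_version}),
}
```

Here `self_test(REGISTRY, KNOWN_GOOD)` reports that `got_200` missed `drop_version`, so `exit_code` returns 1. A CI script could pass that value to `sys.exit` to fail the build.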

What it turned up (with the caveat that I might be missing context)

Pointed at real implementations, a few things stood out. I'm framing these as "here's what I observed," not gotchas:

  • The official Node.js reference sample appears to serve capabilities as a JSON array and services.<name> as an object, where the pinned 2026 profile schema seems to require a keyed object and an array, respectively. The Python reference server and a live production Shopify store both use the schema-shaped forms, which is what made me think it's a real deviation rather than spec ambiguity — but I filed it upstream with a repro in case I've misread something.
  • It also flags a few gaps in the reference implementations rather than silently passing them, e.g. error bodies that use {detail, code} instead of the spec's fuller envelope, and a version-negotiation status code that differs between the spec and the official test suite.
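
To make the first observation concrete, here is a small sketch of the shape comparison. It is not the project's check (which defers to the official validator) and it encodes only my reading above: `capabilities` as a keyed object and each `services.<name>` as an array. It also assumes the function is handed the object that directly contains those two keys; the capability and service names are placeholders.

```python
from typing import List


def json_type(value) -> str:
    """Name the JSON type of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    return "number"


def shape_deviations(profile: dict) -> List[str]:
    """List places where the profile's container shapes differ from the assumed ones."""
    deviations = []

    capabilities = profile.get("capabilities")
    if not isinstance(capabilities, dict):
        deviations.append(
            f"capabilities: expected object, got {json_type(capabilities)}"
        )

    services = profile.get("services")
    if not isinstance(services, dict):
        deviations.append(f"services: expected object, got {json_type(services)}")
    else:
        for name, entry in services.items():
            if not isinstance(entry, list):
                deviations.append(
                    f"services.{name}: expected array, got {json_type(entry)}"
                )
    return deviations


# Placeholder profiles: one in the schema-shaped form, one in the swapped form.
schema_shaped = {
    "capabilities": {"example.checkout": [{"version": "example-version"}]},
    "services": {"example.shopping": [{"transport": "rest"}]},
}
swapped = {
    "capabilities": [{"name": "example.checkout"}],
    "services": {"example.shopping": {"transport": "rest"}},
}
```

`shape_deviations(schema_shaped)` returns an empty list, while `shape_deviations(swapped)` reports both the array-valued `capabilities` and the object-valued `services.example.shopping`, each with the expected and observed type side by side.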

None of this is a knock on the UCP project — the spec is genuinely good and the samples are useful. Surfacing drift like this is exactly what a conformance tool is for.

Trying it

pip install spck-conformance
spck-conformance --server https://your-store.example.com --init merchant.json
spck-conformance --server https://your-store.example.com --config merchant.json

Or paste a store URL at spck.dev/check for an instant discovery + profile check (nothing to install). Or wire it into CI:

- uses: vishkaty/ucp-conformance@main
  with: { server: https://your-store.example.com }

It's capability-adaptive (only runs checks for what your server actually declares), reports not-tested honestly instead of silently passing, and shows expected requirement vs your actual response for anything that deviates.
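
Those three behaviours fit in a short sketch. This is a simplified model for illustration, not the tool's internals: the `Check` and `Result` types, the status strings, and the capability names are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Check:
    name: str
    capability: str             # capability the check applies to (hypothetical names)
    expected: str               # what the requirement calls for
    probe: Callable[[], str]    # returns what the server actually did


@dataclass
class Result:
    check: str
    status: str                 # "pass", "fail", or "not-tested"
    expected: Optional[str] = None
    actual: Optional[str] = None


def run(checks: List[Check], declared: Set[str]) -> List[Result]:
    """Run only checks whose capability is declared; report the rest as not-tested."""
    results = []
    for check in checks:
        if check.capability not in declared:
            # Skipped checks are reported explicitly, never counted as passes.
            results.append(Result(check.name, "not-tested"))
            continue
        actual = check.probe()
        if actual == check.expected:
            results.append(Result(check.name, "pass"))
        else:
            # Keep both sides so the report can show the deviation.
            results.append(Result(check.name, "fail", check.expected, actual))
    return results


# Hypothetical checks with canned probes standing in for real HTTP calls.
CHECKS = [
    Check("create-returns-201", "checkout", "201", lambda: "200"),
    Check("get-returns-200", "checkout", "200", lambda: "200"),
    Check("order-has-id", "orders", "id present", lambda: "id present"),
]

results = run(CHECKS, declared={"checkout"})
```

With only `checkout` declared, the first check fails with expected `201` against actual `200`, the second passes, and the `orders` check is marked `not-tested` without its probe ever being called.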

Source, methodology, and the self-test harness are all in the open: github.com/vishkaty/ucp-conformance.

If you're working with UCP and something here looks wrong — especially the reference-sample findings — I'd really like to hear it.
