I build tools that help developers ship agentic commerce. Creator of UCP Checker and UCP Playground — independent, third-party tooling for testing and validating UCP implementations across stores/AI.
Thanks Mahima — schema drift is exactly what we're seeing. The same "add to cart" intent looks completely different across stacks: product_variant_id (Shopify), item.id (UCPReady), productId (custom). And the type enforcement varies too — one endpoint expects a string ID, another expects an integer, another wants an array of line item objects. We had a -32602 error in the dataset where the model passed line_items: "22" instead of an array — that's a schema description problem, not a model problem.
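To make the line_items: "22" case concrete, here is a minimal sketch of the kind of type check that produces a -32602 (the JSON-RPC 2.0 "Invalid params" code). The schema, field names, and the validate_args helper are hypothetical, made up for illustration, and not taken from any store's real tool definition.

```python
from typing import Any, Optional

INVALID_PARAMS = -32602  # JSON-RPC 2.0 "Invalid params"

# Hypothetical input schema for an add-to-cart tool (illustrative only).
# The description spells out the expected shape so a model is less likely
# to pass a bare ID where an array is required.
ADD_TO_CART_SCHEMA = {
    "line_items": {
        "type": list,
        "description": 'Array of objects, e.g. [{"id": "22", "quantity": 1}]. Not a bare ID.',
    },
}


def validate_args(args: dict[str, Any], schema: dict[str, dict]) -> Optional[dict]:
    """Return a JSON-RPC style error object, or None if the arguments pass."""
    for name, spec in schema.items():
        if name not in args:
            return {"code": INVALID_PARAMS, "message": f"missing required field '{name}'"}
        if not isinstance(args[name], spec["type"]):
            expected = spec["type"].__name__
            actual = type(args[name]).__name__
            return {
                "code": INVALID_PARAMS,
                "message": f"'{name}' must be {expected}, got {actual}",
            }
    return None


# The failing call from the dataset, reduced to its argument shape.
error = validate_args({"line_items": "22"}, ADD_TO_CART_SCHEMA)
```

The check itself is trivial. The part that matters is the description string: if it only says "line items" without showing the array shape, the model has to guess.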
This is actually something we're tracking on the UCP Checker side too. We monitor domains continuously and get alerts when things drift — both at the static manifest level (the /.well-known/ucp declaration) and at runtime (tool schemas, response shapes, error handling). It happens more often than you'd expect, and it's rarely intentional.
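As a rough illustration of runtime drift detection (a sketch, not the Checker's actual code), you can diff two snapshots of a tool's input schema. The snapshot shape here, a flat mapping of parameter name to type name, is an assumption; real tool schemas are nested and would need a recursive comparison.

```python
def diff_schema(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two flat {parameter: type} snapshots and report what changed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Parameters present in both snapshots whose declared type differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of the same tool taken at two points in time.
yesterday = {"product_variant_id": "string", "quantity": "integer"}
today = {"product_variant_id": "integer", "quantity": "integer", "line_items": "array"}

drift = diff_schema(yesterday, today)
has_drift = any(drift.values())
```

A non-empty "removed" or "changed" list is the case worth alerting on, since an agent built against the old snapshot will start sending arguments the endpoint rejects.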
"Strict schema adherence vs goal completion" is a great framing — we're already tracking tool call error rates per store in the Playground, so normalizing that into a conformance score per MCP endpoint is a natural next step. The session replay data is all there to build it from.
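One simple way that normalization could work, purely as a sketch: take the per-endpoint tool call log and define the score as the share of calls that did not error. The formula, the log shape, and the endpoint URLs below are all assumptions for illustration, not something the Playground computes today.

```python
from collections import defaultdict
from typing import Iterable


def conformance_scores(calls: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Map each endpoint to 1 - error_rate, given (endpoint, succeeded) records."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for endpoint, succeeded in calls:
        totals[endpoint] += 1
        if not succeeded:
            errors[endpoint] += 1
    return {ep: 1 - errors[ep] / totals[ep] for ep in totals}


# Hypothetical tool call log: (MCP endpoint, did the call succeed?)
log = [
    ("https://store-a.example/mcp", True),
    ("https://store-a.example/mcp", False),
    ("https://store-b.example/mcp", True),
]
scores = conformance_scores(log)
```

A real score would probably weight error types differently (a -32602 on a badly described schema is not the same as a timeout), but a plain success ratio is enough to rank endpoints against each other.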
Appreciate you reading it properly — the replayable trace is the core of the whole thing.