Walmart's AI Checkout Converted 3x Worse. The Interface Is Why.

Walmart put 200,000 products on ChatGPT's Instant Checkout. Users could browse and buy without leaving the chat window. The ultimate frictionless experience.

The result: in-chat checkout converted at one-third the rate of purchases made after clicking out to Walmart's website.

Walmart's EVP Daniel Danker called the experience "unsatisfying." OpenAI killed Instant Checkout entirely.

This isn't a Walmart problem. It's a pattern — and if you're building AI-powered tools, you're probably making the same mistake.

The Perception Gap Is the Real Story

In 2025, METR ran a randomized controlled trial with 16 experienced open-source developers. When they used AI coding tools, tasks took them 19% longer. Yet they reported feeling 20% faster.

That's a 39-percentage-point gap between perception (+20%) and reality (-19%).
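The arithmetic behind that number, as a minimal sketch. It assumes both figures can be placed on one signed "percent change in speed" axis, which is how the comparison above treats them; the function name `perception_gap` is made up for illustration and does not come from the METR study.

```python
def perception_gap(perceived_change_pct: float, measured_change_pct: float) -> float:
    """Return the gap, in percentage points, between perceived and measured change.

    Positive inputs mean "faster", negative inputs mean "slower".
    """
    return perceived_change_pct - measured_change_pct


perceived = 20.0   # developers reported feeling 20% faster
measured = -19.0   # they were measured as 19% slower

gap = perception_gap(perceived, measured)
print(f"Perception gap: {gap:.0f} percentage points")  # Perception gap: 39 percentage points
```

The sign matters more than the size: a gap of 39 points where both values are positive would mean users overestimate a real gain, while here the perceived gain sits on top of a measured loss.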

(A 2026 follow-up with more participants narrowed the speed difference, but the perception gap persisted. Developers consistently overestimated how much AI helped them.)

80% Follow Rate on Wrong Answers

Shaw and Nave at Wharton (2026) studied 1,372 participants across 9,593 cognitive task trials. Their findings:

A 4:1 ratio of "cognitive surrender" (blindly accepting AI output) to "offloading" (using AI as input for their own thinking)
An 80% follow rate on demonstrably wrong AI suggestions
Confidence that went up even as error rates climbed

The AI didn't boost confidence because it was helping. It boosted confidence because the interface felt authoritative.

Three Studies, One Pattern

Study	What happened	What users felt
Walmart (2026)	One-third the conversion rate	Seamless, convenient
METR (2025-26)	19% slower	20% faster
Wharton (2026)	80% followed wrong answers	More confident

In every case: the interface performed worse while feeling better.

The feeling isn't a side effect. It's the mechanism.

Why Simpler Interfaces Can Make Things Worse

Walmart's website is cluttered: product grids, trust badges, shopping carts, breadcrumbs, account menus. ChatGPT's checkout was clean, just a conversation.

But all that "clutter" is cognitive scaffolding:

Visual comparison: a product grid lets you scan 20 items in parallel. Chat shows them one at a time.
Trust signals: familiar layouts, security badges, persistent cart state.
Decision space: on a site you can browse, go back, and reconsider. Chat is linear.
Identity context: purchase history, wishlists, personalized recommendations.

Strip the scaffolding, and the decision collapses — even when the product catalog is identical.

The same pattern explains METR. Developers spent more time debugging and integrating AI-generated code, and those costs stay invisible while you watch code appear on screen instantly. The generation felt fast. The work was slower.

And it explains Wharton's "surrender route": the chatbot interface makes System 1 → AI → Response the path of least resistance, bypassing the user's own reasoning entirely.

Load-Bearing Friction

Each of these interfaces optimized for the same thing: removing friction.

But not all friction is waste. Some of it is structural:

The friction of comparing products side-by-side supports purchase confidence
The friction of writing code yourself supports understanding (what Peter Naur called "theory building" in 1985)
The friction of checking an AI's answer supports accuracy

I call this load-bearing friction — friction that holds up the cognitive structure needed for the outcome you want. Remove it and the structure collapses silently, because the experience still feels smooth.

This is what makes it dangerous. A rough interface that underperforms is obvious. A smooth interface that underperforms goes undetected — until the numbers come in.

What Walmart Did Next

Walmart didn't abandon ChatGPT. They embedded their own chatbot (Sparky) inside it — preserving the discovery channel while restoring the structured purchase experience.

This is exactly right: don't optimize for fewer layers. Optimize for the right cognitive scaffolding at each layer.

Three Questions Before You Ship

If you're building AI-powered experiences:

1. What cognitive work does this interface take away?
Walmart's site supports comparison, trust, and purchase history. ChatGPT's checkout removed all three. Know what you're removing.

2. Where is your perception gap?
If users report high satisfaction but outcome metrics are flat, you may have a smooth interface hiding poor results. Measure the outcome, not the experience.

3. Is the friction you're removing load-bearing?
Test this by measuring what happens after the interaction — did the user make a better decision, write better code, learn more? Not: did the interaction feel good?
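One way to put questions 2 and 3 into practice is to compare two variants on the outcome and on the reported experience at the same time. The sketch below is a minimal illustration under stated assumptions, not a method taken from any of the studies above: the `Variant` schema, the 1-5 satisfaction scale, the 1.96 threshold, and every number in the usage example are invented for the demo. It uses a standard pooled two-proportion z-test on the outcome rate.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Aggregated results for one interface variant (hypothetical schema)."""
    name: str
    sessions: int             # users exposed to this variant
    conversions: int          # users who reached the desired outcome
    mean_satisfaction: float  # self-reported score, assumed 1-5 scale

    @property
    def outcome_rate(self) -> float:
        return self.conversions / self.sessions


def two_proportion_z(a: Variant, b: Variant) -> float:
    """z statistic for the outcome-rate difference (b minus a), pooled standard error."""
    pooled = (a.conversions + b.conversions) / (a.sessions + b.sessions)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.sessions + 1 / b.sessions))
    return (b.outcome_rate - a.outcome_rate) / se


def friction_looks_load_bearing(
    with_friction: Variant,
    without_friction: Variant,
    z_threshold: float = 1.96,  # assumed cutoff, roughly a 5% two-sided test
) -> bool:
    """True when the smoother variant feels no worse but its outcome rate drops significantly."""
    z = two_proportion_z(with_friction, without_friction)
    feels_no_worse = (
        without_friction.mean_satisfaction >= with_friction.mean_satisfaction
    )
    return feels_no_worse and z <= -z_threshold


# Made-up numbers for illustration only; they are not Walmart's data.
structured = Variant("structured site", sessions=10_000, conversions=300, mean_satisfaction=3.9)
chat = Variant("chat checkout", sessions=10_000, conversions=100, mean_satisfaction=4.3)

print(f"{structured.name}: {structured.outcome_rate:.1%} outcome rate")
print(f"{chat.name}: {chat.outcome_rate:.1%} outcome rate")
print("Load-bearing friction suspected:", friction_looks_load_bearing(structured, chat))
```

The point of returning a single flag is that neither metric is enough alone: satisfaction without the outcome hides the perception gap, and the outcome without satisfaction doesn't tell you whether the failure would have been noticed.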

The Uncomfortable Truth

We've been trained to believe simpler interfaces are better interfaces. That removing steps removes friction. That friction is the enemy.

Three independent lines of evidence (retail, software engineering, cognitive science) say otherwise. Sometimes the interface with more structure, more steps, and more cognitive demand is the one that actually works.

The most dangerous interface isn't the one that frustrates you. It's the one that feels right while getting it wrong.

Sources: Walmart/ChatGPT — Search Engine Land, 2026-03 · METR AI developer study, 2025-26 · Shaw & Nave, "Thinking Fast, Slow, and Artificial," Wharton/SSRN 6097646