There is a specific kind of confidence that comes from looking at AI-generated HTML in a chat window.
It looks clean. The structure makes sense. The copy is right. You've been staring at it for five minutes and you can't find anything wrong.
So you deploy it.
Then someone opens it on their phone and sends you a screenshot.
The heading is overflowing the screen. The CTA button is half-hidden behind the bottom navigation bar. The form fields are so small that tapping them is basically a game of precision. The layout that looked perfect on your 1440px monitor is completely broken on a 390px screen.
This has a name in software testing: plausible wrongness.
The output is valid HTML. It passes a surface inspection. But it behaves wrongly under real conditions.
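Some of these failures leave traces you can spot without rendering anything. Here is a minimal sketch, not a real mobile test: it scans an HTML string for two common warning signs, a missing viewport meta tag and inline styles with a fixed pixel width wider than a small phone. The 375px threshold and the names `find_mobile_red_flags` and `MobileRedFlags` are my own assumptions for illustration. It only reads inline `style` attributes, so anything in a stylesheet goes unchecked.

```python
import re
from html.parser import HTMLParser

# Assumption: 375 CSS px stands in for a small phone viewport.
PHONE_WIDTH_PX = 375

# Matches "width: 600px" and "min-width: 600px", but not "max-width: 600px".
FIXED_WIDTH = re.compile(r"(?<![\w-])(?:min-)?width\s*:\s*(\d+(?:\.\d+)?)px", re.I)


class MobileRedFlags(HTMLParser):
    """Collects a few static warning signs. It does not render the page."""

    def __init__(self):
        super().__init__()
        self.has_viewport_meta = False
        self.wide_elements = []  # (tag, width_px) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.has_viewport_meta = True
        # Only inline styles are inspected here, not <style> blocks or CSS files.
        for match in FIXED_WIDTH.finditer(attrs.get("style") or ""):
            width = float(match.group(1))
            if width > PHONE_WIDTH_PX:
                self.wide_elements.append((tag, width))


def find_mobile_red_flags(html):
    """Return a list of human-readable warnings (empty if none were found)."""
    parser = MobileRedFlags()
    parser.feed(html)
    parser.close()
    warnings = []
    if not parser.has_viewport_meta:
        warnings.append('no <meta name="viewport"> tag')
    for tag, width in parser.wide_elements:
        warnings.append(
            f"<{tag}> has a fixed width of {width:g}px, wider than {PHONE_WIDTH_PX}px"
        )
    return warnings


if __name__ == "__main__":
    page = '<html><body><div style="width: 1200px"><h1>Launch day</h1></div></body></html>'
    for warning in find_mobile_red_flags(page):
        print("warning:", warning)
```

An empty list does not mean the page works on a phone. It only means these two obvious problems are absent, which is exactly why a real preview still matters.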
Why AI-generated HTML fails on mobile
AI writes for the happy path.
When you prompt Claude or ChatGPT to build a landing page, it optimizes for what you asked for. Clean HTML. Good copy. Reasonable structure. It has no way of knowing what device your audience uses, what screen size your client will open the link on, or whether that hero section collapses gracefully at 375px.
It also has no skin in the game. It hands you the output and moves on. The consequences of plausible wrongness land on you.
Developers catch this early because the review loop is short. Generate, open in browser, resize window, check mobile, fix, repeat. The feedback is immediate.
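Resizing a desktop window only gets you so far, so one way to tighten that loop is to put the generated page on a real phone before it goes anywhere public. The sketch below uses only Python's standard library. The helper names `serve_preview` and `lan_ip` are hypothetical, and it assumes your computer and phone share a network with no firewall in the way.

```python
import functools
import http.server
import socket
import threading
from pathlib import Path


def lan_ip():
    """Best-effort guess at this machine's LAN address; falls back to localhost."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        probe.connect(("8.8.8.8", 80))
        return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        probe.close()


def serve_preview(html, directory, port=0):
    """Write html to <directory>/index.html and serve that directory over HTTP.

    Returns (server, url). port=0 asks the OS for any free port.
    """
    Path(directory).mkdir(parents=True, exist_ok=True)
    Path(directory, "index.html").write_text(html, encoding="utf-8")

    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Binding to 0.0.0.0 exposes the page to other devices on the local network,
    # so use this only on a network you trust.
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://{lan_ip()}:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = serve_preview("<h1>Paste the generated HTML here</h1>", "preview")
    print("Open this on your phone:", url)
    input("Press Enter to stop the server.")
    server.shutdown()
    server.server_close()
```

Run it, open the printed URL on a phone on the same Wi-Fi, and you see the page at its real size, with the real browser chrome and a real thumb.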
For non-technical users, the loop doesn't exist at all. They see the HTML in the chat. It looks right. They find a way to get it live. Someone opens it on a phone.
That's when they find out.
The review step that nobody builds in
Every conversation about AI-generated code focuses on two moments: generation and debugging.
Nobody talks about the moment between them.
Preview.
Not preview in the chat window. Not preview in a desktop browser. Preview that shows y
Top comments (1)
"Plausible wrongness" is exactly the term I've been missing — so thanks for that.
We've been running AI-generated mobile app pages through our QA pipeline, and the defect rate is honestly brutal. The visuals come out stunning every time, but the moment you start testing real interactions — scroll behavior, tap target accuracy, state transitions — it falls apart fast.
The worst part is how convincing it looks at first glance. It actually makes testing harder because you can't trust your eyes anymore. Every "looks fine" needs a second look.
Curious — have you noticed any specific interaction patterns where AI consistently gets mobile UX wrong across different models?