6-part series on how UI is changing with GenAI. Day 0: The Chat Box Era and Its Limits

#ai #llm #ui #ux

This is Day 0 of my 6-part series on how LLMs rewrote the user interface over the past year — from plain chat boxes to agents that render their own UI.

Where it all started

Every LLM product launched the same way: a text box, a send button, and a stream of bubbles. It made sense — chat is the lowest-friction wrapper you can put around a language model. Type words in, get words out. No onboarding, no manual, no learning curve. Your grandmother and your CTO use the exact same interface.

That simplicity is a large part of why ChatGPT became one of the fastest-growing consumer products in history. And for simple Q&A, chat still works. But somewhere along the way we started asking these models to do real work (analyze data, build dashboards, refactor codebases, manage multi-step workflows across tools), and we kept forcing all of that through the same lossy format: a chat bubble.

Think about what we actually ask agents to do today. "Compare these three vendors and book a demo with the best one." "Find the latency regression in last week's deploys." "Refund this customer and update the ticket." None of these are conversations. They're tasks — with state, intermediate steps, structured inputs, and results that beg for visual representation. We were running task software through a messaging metaphor.

The cracks appear

The first signs that pure chat wasn't enough came from the model vendors themselves. Watch the feature releases from mid-2024 onward and you can see them quietly admitting the limitation, one patch at a time.

Thinking mode showed up when reasoning models arrived — suddenly the UI had to display how the model reasoned, not just its answer. A transcript of bubbles had no good place for that, so we got collapsible reasoning panels bolted onto chat. It was the first time the interface had to represent process, not just output.

Artifacts (Claude, June 2024) was the bigger break. Instead of code scrolling past in a bubble, output became a standalone object in a split-screen panel — rendered live, iterable, and independent of the conversation. Ask for a calculator and you get a working calculator, not a code block to copy-paste somewhere else. The artifact persists while the conversation moves on, which sounds minor but is a fundamental change: output stopped being part of the transcript and became a thing you work on.

ChatGPT Canvas followed in October 2024 with a collaborative document editor alongside the chat. Its killer feature was targeted editing: highlight a paragraph or a function, say "make this shorter" or "fix the bug here," and only that selection changes. That's not chat anymore — that's a document editor with a language model inside it.

Then came Projects, Memory, and Skills — all attempts to give a stateless chat transcript the things real applications have had forever: persistence across sessions, user context, and reusable behavior. By late 2025, the "chat app" had quietly accumulated a sidebar, a file system, a settings panel, split panes, and background tasks.

Each feature is an admission of the same fact: the chat box was never the destination. It was scaffolding.

Why chat structurally fails for agents

Once agents started doing things instead of just answering, three structural problems became impossible to ignore.

1. Chat is append-only; agent work needs random access. A conversation is a sequential log. That's fine for dialogue, terrible for operations. When an agent runs for 40 minutes across 30 tool calls, "what happened between minute 30 and 35?" shouldn't require scrolling through everything that came before. You want to jump to any event, see memory state alongside tool calls, and operate on steps in batches. A transcript can't do any of that; a timeline view can. We solved this problem for application logs a decade ago, then reintroduced it by putting agents in chat windows.
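To make the random-access point concrete, here's a minimal Python sketch of a timeline index. The `Event` fields, the `Timeline` class, and the 80-second spacing of the tool calls are all made up for illustration; this is not any agent framework's API, just sorted timestamps plus a binary search.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float       # seconds since the run started
    kind: str      # hypothetical label, e.g. "tool_call" or "memory_write"
    summary: str


class Timeline:
    """Events kept sorted by time, so any window is a binary search away."""

    def __init__(self, events):
        self._events = sorted(events, key=lambda e: e.t)
        self._times = [e.t for e in self._events]

    def between(self, start_min, end_min):
        """Return events with start_min <= t < end_min, both in minutes."""
        lo = bisect_left(self._times, start_min * 60)
        hi = bisect_left(self._times, end_min * 60)
        return self._events[lo:hi]


# A made-up run: 30 tool calls, one every 80 seconds (just under 40 minutes).
run = Timeline(
    Event(t=i * 80, kind="tool_call", summary=f"call #{i}") for i in range(30)
)

# Jump straight to a five-minute window instead of scrolling the transcript.
window = run.between(30, 35)
print([e.summary for e in window])
```

The same index would let a UI scrub to any point in the run; a flat transcript only offers "scroll up".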

2. Text is a lossy output format. If a human engineer spots a latency spike, they don't write you a paragraph about it — they show you a line chart. Forcing structured data through prose is a bottleneck in both directions. On output: describing a table row by row instead of rendering the table. On input: asking five questions one at a time instead of presenting a form with five fields. Every round trip through natural language adds latency, ambiguity, and the chance of misunderstanding. The data was structured the whole time; we serialized it to English and asked the user to parse it back.
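Here's a rough sketch of the "form with five fields" idea in Python. The `Field` type, the field names, and the `validate` helper are hypothetical; the point is only that one structured submission can be checked in a single pass, where chat would need five question-and-answer turns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str             # "text", "number", or "choice"
    choices: tuple = ()   # only used when kind == "choice"


def validate(form, submission):
    """Check one submission against the whole form; return {field: problem}."""
    errors = {}
    for field in form:
        value = submission.get(field.name)
        if value is None or value == "":
            errors[field.name] = "missing"
        elif field.kind == "number" and not isinstance(value, (int, float)):
            errors[field.name] = "expected a number"
        elif field.kind == "choice" and value not in field.choices:
            errors[field.name] = f"expected one of {list(field.choices)}"
    return errors


# Five fields the agent needs, declared once (names are illustrative).
demo_form = [
    Field("company", "text"),
    Field("contact_email", "text"),
    Field("seats", "number"),
    Field("plan", "choice", ("starter", "team", "enterprise")),
    Field("start_date", "text"),
]

submission = {
    "company": "Acme",
    "contact_email": "ops@acme.example",
    "seats": "ten",      # wrong type, caught without another chat turn
    "plan": "team",
}
problems = validate(demo_form, submission)
print(problems)
```

Every problem comes back at once, attached to the field that caused it, which is exactly what a prose exchange cannot do.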

3. Review becomes practically impossible. Tool arguments, the actual instructions an agent sends to your systems, scroll past in a format that actively discourages reading. There's a now-infamous report of a coding agent deleting a production database along with 2.5 years of snapshots. Everyone blamed the model. I'd argue the interface failed first: the user had no efficient way to see what was about to execute, laid out so it could be skimmed and approved with confidence. "I pasted a wall of text and the user said OK" is not informed consent; it's a UX failure with a blast radius.
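As a sketch of what a skimmable approval step could look like, here's a small Python example. The tool names in `DESTRUCTIVE_TOOLS` and the shape of a pending call are assumptions I invented for illustration; a real system would classify risk from its own tool metadata.

```python
# Hypothetical tool names; not taken from any real agent or incident.
DESTRUCTIVE_TOOLS = {"delete_database", "drop_table", "delete_snapshots"}


def approval_summary(calls):
    """Render pending tool calls as one line each, destructive ones first."""
    lines = []
    # sorted() is stable, so calls keep their original order within each group.
    for call in sorted(calls, key=lambda c: c["tool"] not in DESTRUCTIVE_TOOLS):
        tag = "DESTRUCTIVE" if call["tool"] in DESTRUCTIVE_TOOLS else "ok"
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(call["args"].items()))
        lines.append(f"[{tag}] {call['tool']}({args})")
    return lines


def needs_explicit_confirm(calls):
    """True when at least one pending call is on the destructive list."""
    return any(c["tool"] in DESTRUCTIVE_TOOLS for c in calls)


pending = [
    {"tool": "update_ticket", "args": {"id": 4821, "status": "refunded"}},
    {"tool": "delete_database", "args": {"name": "prod"}},
]

for line in approval_summary(pending):
    print(line)
```

The risky call surfaces at the top with its arguments in plain view, instead of sitting in the middle of a wall of JSON.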

There's a fourth, softer problem worth naming: navigation. Long chats become archaeology. Users describe digging through old conversations for that one answer from three weeks ago. Conversations are where information goes to die — there's no structure, no hierarchy, no map. That pain is what's driving the canvas-and-branching interfaces I'll cover in Day 6.

The shift that defines this series

Here's the one-line summary of the past year: agents stopped describing and started showing.

Ask for a table, get a table — not a paragraph describing one. Ask for a booking flow, get a form with a date picker — not a numbered list of questions. Ask for a comparison, get an interactive chart you can filter. The interface itself is increasingly drawn by the agent, in real time, shaped by what you actually asked for.

To make it concrete, here's the difference in practice. The chat-era interaction:

Agent: I found 3 flights. Option 1: AA 1042, departs 6:15 AM, $284. Option 2: UA 338, departs 9:40 AM, $311. Option 3: DL 90, departs 1:05 PM, $267. Which would you like? Please also confirm seat preference and whether you want to add bags.

The generative UI interaction: the agent renders a card list with the three flights, sortable by price or time, with seat and baggage selectors inline and a single confirm button. Same model, same data, same task. One of these is software; the other is a transcript pretending to be software.
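Underneath that card list, the data might look something like this Python sketch. The flights are the three from the chat message above; the `FlightCard` type, the `sort_cards` and `confirm` helpers, and the seat and bag values are my own assumptions, not a real booking API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightCard:
    flight: str
    departs: str     # 24-hour "HH:MM", so string order matches time order
    price_usd: int


cards = [
    FlightCard("AA 1042", "06:15", 284),
    FlightCard("UA 338", "09:40", 311),
    FlightCard("DL 90", "13:05", 267),
]


def sort_cards(cards, by):
    """Order the cards by 'price' or 'time' locally, with no extra agent turn."""
    keys = {"price": lambda c: c.price_usd, "time": lambda c: c.departs}
    return sorted(cards, key=keys[by])


def confirm(card, seat, bags):
    """One structured payload replaces the follow-up questions about seat and bags."""
    return {"flight": card.flight, "seat": seat, "bags": bags}


cheapest = sort_cards(cards, "price")[0]
print(confirm(cheapest, seat="aisle", bags=1))
```

Sorting happens on the client because the data never stopped being structured; in the chat version, re-sorting means asking the model to rewrite the paragraph.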

That shift didn't happen all at once. It arrived in three architectural generations, and the differences between them matter more than most teams realize:

Static generative UI: the agent picks from your predefined components and fills them with data. The frontend keeps high control; the agent has low freedom. This is the AG-UI pattern.
Declarative generative UI: the agent returns a structured UI spec (cards, lists, forms) that your frontend renders with its own styling and constraints. Control is shared. This is the territory of Google's A2UI and of Open-JSON-UI.
Open-ended generative UI: the agent ships an entire interactive surface and your frontend mostly hosts it, sandboxed. Low control, high freedom. This is what MCP Apps enable.
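One way to see the control-versus-freedom trade-off is to look at what the frontend is able to check in each case. The Python sketch below uses message shapes I invented for illustration; they are not the actual AG-UI, A2UI, Open-JSON-UI, or MCP Apps wire formats, and the component and node names are hypothetical.

```python
# Static: the frontend owns a fixed registry; the agent only names a
# component and fills in its props. Component names are hypothetical.
REGISTRY = {"FlightList": {"flights"}, "LineChart": {"title", "series"}}


def check_static(message):
    allowed_props = REGISTRY.get(message["component"])
    if allowed_props is None:
        raise ValueError(f"unknown component: {message['component']}")
    extra = set(message["props"]) - allowed_props
    if extra:
        raise ValueError(f"unexpected props: {sorted(extra)}")
    return message


# Declarative: the agent sends a tree of generic nodes; the frontend checks
# node types, then renders the tree with its own styling.
ALLOWED_NODES = {"card", "list", "form", "text", "button"}


def check_declarative(node):
    if node["type"] not in ALLOWED_NODES:
        raise ValueError(f"node type not allowed: {node['type']}")
    for child in node.get("children", []):
        check_declarative(child)
    return node


# Open-ended: the agent ships a whole surface. The host does not inspect
# the content; all it can do is isolate it.
def host_open_ended(html):
    return {"sandboxed": True, "content": html}


check_static({"component": "FlightList", "props": {"flights": []}})
check_declarative({"type": "card", "children": [{"type": "text"}, {"type": "button"}]})
surface = host_open_ended("<button>Book</button>")
```

Reading top to bottom, the frontend validates less at each step: exact components and props, then only node types, then nothing but the sandbox boundary.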

Control versus freedom is the axis the whole field now organizes around, and picking the wrong point on it is the new architectural mistake.

Alongside all of this, an entire "vibe coding" toolchain — v0, Lovable, Bolt, Figma Make — changed who gets to build frontends at all. Karpathy coined the term in early 2025; by year's end it was Collins Dictionary's Word of the Year, Lovable had hit $20M ARR in two months, and designers were shipping production React. That story, and what it means for those of us who write frontend code for a living, is where I'll start.