DEV Community: Victor

Agent UX is not chatbot UX, and most teams in 2026 ship them as if they were

Victor — Fri, 15 May 2026 10:02:03 +0000

Chatbots respond. Agents act. 2026 is the year agent products went from research demos to mainstream releases. Cursor 3, v0, Manus, Devin, Claude's Managed Agents, ChatGPT's agent mode, and GitHub Copilot's agent mode all reached general availability within the same twelve months, and most of them launched with the chat-thread UX that worked when the AI only produced text. The pattern shows up in every adoption review I sit in: the chat interface that worked for Q&A breaks the moment the AI starts taking actions on the user's behalf.

The reason it breaks is straightforward. A user can read a streaming chat response, decide it's wrong, and move on; nothing happened that they need to undo. An agent operating on the user's behalf doesn't grant that luxury. Once the email reaches the inbox or the deploy hits production, no amount of disagreement scrolls it back. Every design problem unique to agent UX is downstream of that single asymmetry between text and action, and a chat thread is the wrong shape for managing it.

What "agent" actually means in interface terms

A chatbot's output is text. An agent's output is a state change in the world: a sent email, a CI pipeline run, a modified file in production. That distinction reshapes the entire surface the user is interacting with. The user has to see what's about to happen and be able to stop it before the action fires, and has to be able to come back hours or days later and reconstruct what already happened without scrolling through a conversation to do it.

I made the broader case a few weeks ago that most AI features should not be chatbots. Chatbot UX is built around understanding the response. What agent UX demands is something different: a fast, legible way for the user to grant consent before action is taken, and a record of what happened after.

Three design problems that chatbot UX can't solve

Pre-action previews before the agent acts

The pattern that works for irreversible actions is a pre-action preview: the agent describes the steps it's about to take, the user confirms or edits, then the agent executes. This is the oldest pattern in AI agent UI and the one with the most shipping examples.

Cursor's apply-edits flow is the cleanest version in market. The diff appears, the user accepts or rejects, the file changes. v0's deploy confirmation does the same thing for production deployments. Vercel formalized the same pattern at the platform level with claim deployments, which let an agent deploy a project and then explicitly hand ownership over to a human reviewer. Each of these compresses the consent step to a few seconds of friction in exchange for catching the irreversible mistake before it happens.
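The loop is simple to state and easy to get subtly wrong: the side effect has to be unreachable until consent is recorded. Here is a minimal TypeScript sketch, assuming a hypothetical `ProposedAction` with a human-readable summary for the preview and an `execute` callback for the action itself. None of these names come from Cursor, v0, or Vercel.

```typescript
// Hypothetical shape for an action the agent wants to take: the summary is what
// the preview shows, and execute performs the irreversible side effect.
interface ProposedAction {
  summary: string;
  execute: () => string;
}

type PreviewState =
  | { kind: "pending"; action: ProposedAction }
  | { kind: "rejected"; action: ProposedAction }
  | { kind: "done"; action: ProposedAction; result: string };

// The agent can only propose; proposing never runs the side effect.
function propose(action: ProposedAction): PreviewState {
  return { kind: "pending", action };
}

// The user edits the pending action (for example, trims the recipient list) before deciding.
function editPending(state: PreviewState, action: ProposedAction): PreviewState {
  if (state.kind !== "pending") throw new Error("Only a pending action can be edited");
  return { kind: "pending", action };
}

// Execution is reachable only through an explicit user decision.
function decide(state: PreviewState, approved: boolean): PreviewState {
  if (state.kind !== "pending") throw new Error("This action was already decided");
  if (!approved) return { kind: "rejected", action: state.action };
  return { kind: "done", action: state.action, result: state.action.execute() };
}
```

Because `decide` is the only function that calls `execute`, the preview step cannot be skipped by accident. An edit simply replaces the pending action before anything fires.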

The harder design problem, once you have committed to previews, is making them feel like fluid interaction rather than a friction gate. Cursor's diff display is fast enough that approval feels native, not interrupting. Most teams ship slower previews that read as friction even when the friction is necessary. The difference is measured in milliseconds.

Multi-step plan editing

Agents usually need three or more actions to finish a task. The design problem is showing the full plan as a structured object the user can inspect, edit, and approve as a unit, rather than as a stream of messages buried inside a thread. These AI agent patterns are already established: a plan panel separate from the conversation, steps rendered as a checklist, each step editable, one approval button to execute the plan. This is what Devin and Manus both do, and what Cursor's Plan Mode added in early 2026.

The reason it works is unsurprising. Plans buried inside chat messages force users to scroll up and reconstruct the agent's intent, and most users don't bother, which means most users approve plans they haven't actually read. In agent rollouts I have watched at Fuselab over the past year, surfacing the plan as a discrete, editable object is consistently the change that moves approval from a rubber stamp to a real review.
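One way to make the plan a discrete object rather than a message is to tie approval to the exact revision the user reviewed. The sketch below reflects my own assumption, not how Devin, Manus, or Cursor's Plan Mode are implemented. Every edit bumps a hypothetical `revision` counter and clears the approval, so the agent can only execute a plan whose current version the user actually approved.

```typescript
interface PlanStep {
  id: number;
  description: string;
  enabled: boolean; // unchecked steps stay visible in the checklist but are skipped
}

interface Plan {
  steps: PlanStep[];
  revision: number;
  approvedRevision: number | null;
}

function editStep(plan: Plan, id: number, description: string): Plan {
  return {
    steps: plan.steps.map((s) => (s.id === id ? { ...s, description } : s)),
    revision: plan.revision + 1,
    approvedRevision: null, // any edit invalidates the earlier approval
  };
}

function toggleStep(plan: Plan, id: number): Plan {
  return {
    steps: plan.steps.map((s) => (s.id === id ? { ...s, enabled: !s.enabled } : s)),
    revision: plan.revision + 1,
    approvedRevision: null,
  };
}

// One approval covers the whole plan at its current revision.
function approve(plan: Plan): Plan {
  return { ...plan, approvedRevision: plan.revision };
}

// Returns the steps to run, or throws if the current revision was never approved.
function stepsToExecute(plan: Plan): PlanStep[] {
  if (plan.approvedRevision !== plan.revision) {
    throw new Error("Plan changed since approval; ask the user to review it again");
  }
  return plan.steps.filter((s) => s.enabled);
}
```
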

Long-running tasks the user walked away from

A chatbot finishes responding in seconds. An agent can run for minutes or hours: a multi-document research task, a long refactor, a CI pipeline, an overnight deployment. The user closes the tab. When they come back, where does the result live? This is the problem most agent products in 2026 have not solved well, and the biggest gap between what users expect and what teams ship.

The wrong pattern is the one most teams ship: the result appears in the chat thread, scrolled away from view, with no notification surface and no separate task list. The user has to remember they started a task, find the right thread, and scroll to see the result. Three days later they have forgotten about it entirely, and the agent's work is invisible to the person who asked for it.

The pattern that works is a separate task surface. Not the chat thread. A dedicated view where running, completed, and failed agent tasks appear as cards with their own status indicators and audit trails. Notifications surface the moment a task completes. The user can return to any task without remembering which chat thread it lived in.
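The task surface is mostly a data-modeling decision: agent work needs a record that lives outside any conversation. Here is a minimal sketch, assuming a hypothetical in-memory `TaskBoard` and a `notify` callback. The callback stands in for whatever push or in-app notification channel the product already has.

```typescript
type TaskStatus = "running" | "blocked" | "completed" | "failed";

interface AuditEntry {
  at: Date;
  event: string;
}

interface AgentTask {
  id: string;
  title: string;
  status: TaskStatus;
  audit: AuditEntry[]; // append-only trail for reconstructing what happened
}

// Hypothetical in-memory store; a real product would persist this server-side.
class TaskBoard {
  private readonly tasks = new Map<string, AgentTask>();
  private readonly notify: (task: AgentTask) => void;

  constructor(notify: (task: AgentTask) => void) {
    this.notify = notify;
  }

  start(id: string, title: string): AgentTask {
    const task: AgentTask = { id, title, status: "running", audit: [{ at: new Date(), event: "started" }] };
    this.tasks.set(id, task);
    return task;
  }

  update(id: string, status: TaskStatus, event: string): void {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`Unknown task ${id}`);
    task.status = status;
    task.audit.push({ at: new Date(), event });
    // Blocked, completed, and failed tasks all need the user, so notify instead of waiting to be found.
    if (status !== "running") this.notify(task);
  }

  // The dashboard view: tasks grouped by status, independent of any chat thread.
  byStatus(status: TaskStatus): AgentTask[] {
    return Array.from(this.tasks.values()).filter((t) => t.status === status);
  }
}
```
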

Two products shipped serious versions of this surface in April and May 2026. Cursor 3's Agents Window, released April 2, gives a single dashboard for parallel agents running locally, on remote machines, in the cloud, and from Slack. Claude Code's Agent View, released May 11, lists every running, blocked, and completed session in one screen. Both replace "the agent's work lives inside a chat thread you have to find" with "the agent's work lives on a surface designed to be returned to." Anthropic also formalized the underlying async pattern with Managed Agents, which runs agent tasks on Anthropic's infrastructure regardless of whether the user's machine is on.

The chat thread is for conversation. The task surface is for state. Teams still treating long-running agents as just another chat message are shipping the version of agent UX that quietly loses users in the second week.

Where the magic actually shows up in agent UX

The best agent products in 2026 do not feel boring. Cursor, v0, and Claude's Computer Use demos all feel magical, and the thing that makes them feel that way is not the absence of approval gates and audit history. It's the opposite. The magic is in the speed and precision of the approval gates themselves.

What this looks like in practice is the user granting consent and seeing the result in close to the same gesture. Compressing a necessary friction step down to a few hundred milliseconds is harder design work than removing the friction altogether, and it's the part of agent UX we work through with clients in our agent UI design practice at Fuselab.

Teams that skip the approval gates entirely don't ship magic. They ship products that get unplugged the first time the agent does something the user didn't expect.

Closing

The teams shipping the best agent UX in 2026 are the ones who stopped trying to make the agent feel like a person and started making it feel like a tool that's accountable to the user. The chat box is fine for asking. It is not fine for acting. What's the agent product you've used that genuinely felt like a tool rather than a chatbot in disguise? I'm curious which ones got the approval flow right.

I'm a designer at Fuselab Creative, working on dashboards and AI interfaces for healthcare and enterprise clients. More writing at fuselabcreative.com.

Five chatbot UI mistakes that quietly break user trust

Victor — Thu, 07 May 2026 08:10:03 +0000

Last week I argued that most AI features should not be chatbots. But once you have decided chat is genuinely the right shape for the problem, a second category of failure is waiting. Five chatbot UI design mistakes show up in almost every product I audit, and each one quietly erodes user trust before the team realizes anything is wrong. None get caught in usability testing. All of them show up in the second-week analytics.

What ties the five together is the same root cause: each one hides something the user needs in order to assess what the model is doing. Trust in a chatbot is built or broken in the small affordances that make model state legible.

Mistake 1: The model picks the wrong reference and the user has no way to tell

The user asks "and what about the second one?" The model picks the wrong "second one" from three turns ago. The user has no idea whether the model misread them or whether their question was ambiguous, and the two interpretations have completely different recovery paths. One says "rephrase your question." The other says "the model is unreliable on this kind of reference." Without a visual indicator of what the model is looking at, users default to the second interpretation every time, even when the first is correct.

The fix is a subtle anchor showing what the model is operating on. A connecting line back to the referenced turn, a quote-style highlight, or a small chip that says "responding about: [the earlier message]." Claude does a version of this with artifact references, where edits to a specific artifact get visually anchored to that artifact in the response. Most enterprise chatbots ship without anything equivalent, and the lack shows up as a confusing third-week bug report that engineering cannot reproduce because the model is technically responding correctly to its own interpretation of the user.
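Rendering the chip is the easy part once each response carries a pointer to the turn it resolved the reference to. Here is a minimal sketch, assuming the backend returns a hypothetical `anchorTurnId` with each response. How that ID gets produced, whether from model output or a separate resolution step, is out of scope here.

```typescript
interface Turn {
  id: string;
  role: "user" | "assistant";
  text: string;
}

interface AnchoredResponse {
  text: string;
  anchorTurnId: string | null; // which earlier turn the model resolved the reference to
}

// Builds the "responding about: ..." chip text, truncated so it stays subtle.
function anchorChip(response: AnchoredResponse, turns: Turn[], maxLength = 40): string | null {
  if (response.anchorTurnId === null) return null;
  const target = turns.find((t) => t.id === response.anchorTurnId);
  if (!target) return null;
  const preview = target.text.length > maxLength ? `${target.text.slice(0, maxLength - 1)}…` : target.text;
  return `responding about: "${preview}"`;
}
```

When the chip points at the wrong turn, the user can see it immediately and rephrase, instead of concluding the model is unreliable.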

Mistake 2: One generic error toast for three completely different failures

Three things can fail when a chat sends a message: the API can time out, the model can refuse for content policy reasons, or the network can drop. By 2026 the first and third are mostly handled at the platform level with retry queues and offline drafts. Model refusals are the genuinely hard case because they aren't really errors. The user has to be coached toward a rephrasing the model will accept without being told they did something wrong, and without being handed a roadmap for bypassing the policy. Most chatbots collapse all three into one generic toast, which trains users to read every error as "the AI is broken" when the right read is often "I asked a question the model can't answer this way."

The fix is differentiated error states with recovery affordances that match the actual failure. The same modal shape is fine, but the messaging and the next action need to be specific to which kind of failure just happened. This is one of the cheapest improvements you can make to an AI chat interface, and it pays back in support tickets that never get filed and trust that doesn't get spent on the wrong failure mode.
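A discriminated union keeps the three failures from collapsing back into one toast, because every branch has to be handled explicitly. This sketch assumes a hypothetical `ChatFailure` type produced by your client layer. The copy strings are placeholders rather than recommended wording.

```typescript
type ChatFailure =
  | { kind: "timeout" }
  | { kind: "network"; draftSaved: boolean }
  | { kind: "refusal"; suggestion: string | null };

interface ErrorView {
  message: string;
  action: "retry" | "resend_when_online" | "rephrase";
}

function errorView(failure: ChatFailure): ErrorView {
  switch (failure.kind) {
    case "timeout":
      return { message: "The response took too long. Your message is still here.", action: "retry" };
    case "network":
      return {
        message: failure.draftSaved
          ? "You're offline. We'll send this when you reconnect."
          : "Connection lost. Check your network and resend.",
        action: "resend_when_online",
      };
    case "refusal":
      // Framed as coaching, not an error, and without explaining how to get around the policy.
      return {
        message: failure.suggestion ?? "I can't help with that as asked. Try describing what you're trying to get done.",
        action: "rephrase",
      };
  }
}
```

Adding a fourth failure kind later makes the `switch` fail to type-check until that kind gets its own message and recovery action.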

Mistake 3: No way to fork the conversation from an earlier turn

Real thinking is non-linear. Every chatbot UI assumes it is linear.

A user is five turns into a useful conversation. They want to try a different angle on turn three's reply without losing what came after. Most chatbots embedded inside products force a binary choice: edit the earlier message and lose everything downstream, or start a new conversation and lose the prior context entirely. Neither matches how anyone actually thinks through a hard problem.

The pattern that works treats the conversation as a tree rather than a thread. The user can branch from any point, explore an alternative path in parallel, and come back to the original branch when they want to. ChatGPT shipped this in late 2025, Claude has had edit-and-branch on user turns for longer, and Google AI Studio supports it in its developer interface. The frontier labs have caught up. The chat embedded inside your bank app, your CRM, your CMS, and your support portal has not.

Most teams skip it because the data model is harder. A linear chat log is one table; a branchable conversation is a tree with parent pointers and a way to traverse it. Most product teams do not budget for the second one because no one in the room is asking for it on day one.
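Written down, the tree is not much bigger than the log. Here is a minimal in-memory sketch, assuming each message stores a hypothetical `parentId` and the UI renders whichever root-to-leaf path the user has selected.

```typescript
interface ChatMessage {
  id: number;
  parentId: number | null; // null for the first message in the conversation
  role: "user" | "assistant";
  text: string;
}

class ConversationTree {
  private readonly messages = new Map<number, ChatMessage>();
  private nextId = 1;

  add(parentId: number | null, role: ChatMessage["role"], text: string): ChatMessage {
    const msg: ChatMessage = { id: this.nextId++, parentId, role, text };
    this.messages.set(msg.id, msg);
    return msg;
  }

  // Forking is just a second child on an earlier turn; nothing downstream is lost.
  children(id: number): ChatMessage[] {
    return Array.from(this.messages.values()).filter((m) => m.parentId === id);
  }

  // The visible "thread" is the path from the root to the selected leaf.
  path(leafId: number): ChatMessage[] {
    const out: ChatMessage[] = [];
    let current = this.messages.get(leafId);
    while (current) {
      out.unshift(current);
      current = current.parentId === null ? undefined : this.messages.get(current.parentId);
    }
    return out;
  }
}
```

Editing an earlier user turn becomes a call to `add` with that turn's `parentId`. The edit is a new sibling, and the original branch stays reachable through `children`.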

The reason users want it once they have it: they were already trying to think, and thinking branches. The first time someone forks a chat to explore three alternatives in parallel without losing the original, they stop tolerating single-thread chats in tools where the conversation actually matters. It is the affordance that quietly raises the floor for what counts as competent chatbot UI design.

Mistake 4: Citations rendered in ways that hide or bury the sources

When a model claims something factual, the source matters, and the way the source is rendered determines whether anyone actually reads it. Inline footnotes hurt readability and break the conversational rhythm. Separate citation panels create a worse problem: the claim and its source are visually disconnected, so users mentally treat them as different things and stop cross-referencing. Tooltip-only citations are the worst of the three: on mobile they often don't trigger at all, and on desktop they make side-by-side comparison impossible because the source vanishes the moment the cursor moves. A 2025 study from the University of Tennessee and the University of Oklahoma ran 394 participants through four citation interface conditions and found that high-visibility designs dramatically increased source-hovering, but click-through to the underlying material stayed low across every condition. Visibility matters, but visibility alone does not make people verify.

The pattern that works in production is a persistent inline indicator, a small numbered marker placed at the relevant claim, that expands on hover or tap into a source preview, with one click through to the full document. Perplexity does the strongest version. Granola does an interesting variant that quotes the specific transcript segment behind each summary point, which makes verification almost free. Most enterprise chatbots either over-cite (every sentence has a footnote, and the response becomes unreadable) or under-cite (one block of links at the bottom that nobody reads). The middle ground requires deciding which claims actually carry citation weight, and that is a writing decision more than a chatbot UI design decision.
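Deciding which claims carry citation weight stays a writing call, but once that call is made, the rendering is mechanical. This sketch assumes the response arrives as hypothetical `Claim` segments where only some carry a source. Each distinct source gets one stable number, and the marker sits at the claim it supports. It illustrates the persistent-marker pattern in general, not Perplexity's implementation.

```typescript
interface Source {
  url: string;
  title: string;
  excerpt: string; // shown in the hover/tap preview
}

interface Claim {
  text: string;
  source: Source | null; // null for connective prose that does not need a citation
}

interface RenderedAnswer {
  text: string;
  sources: Source[]; // sources[i] is marker [i + 1]
}

function renderWithMarkers(claims: Claim[]): RenderedAnswer {
  const sources: Source[] = [];
  const numberByUrl = new Map<string, number>();
  const parts = claims.map((claim) => {
    if (!claim.source) return claim.text;
    let n = numberByUrl.get(claim.source.url);
    if (n === undefined) {
      // First time this source appears: assign the next number.
      sources.push(claim.source);
      n = sources.length;
      numberByUrl.set(claim.source.url, n);
    }
    return `${claim.text} [${n}]`;
  });
  return { text: parts.join(" "), sources };
}
```
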

Mistake 5: No way to regenerate with different context

The user gets a response that is 90% right and missed one important constraint. The only options most chatbots offer are to retype the entire question with the constraint added, or live with the wrong answer. Both are bad. Retyping is friction; living with the wrong answer compounds, because the user remembers it the next time the chatbot asks them to verify anything.

The fix is a "regenerate with..." control that lets the user adjust the context of an answer without retyping the whole question. The most common case is adding a constraint the user forgot to mention. The pattern looks like this:

// Common pattern: only "regenerate" with no context change
[Regenerate]

// Better pattern: regenerate with structured context add
[Regenerate with...]
  └ Make it shorter
  └ Use only sources from 2025+
  └ Edit original prompt
  └ Custom instruction...

This is the single most consequential affordance most chatbots are still missing. Users who encounter it in one product carry the expectation to every other AI tool they touch, and chatbots without it start to feel under-built. Most teams skip it because "regenerate" is a thirty-minute feature and "regenerate with..." is a week of structured-input design with new state to manage. Users notice immediately because they were already going to retype the question; the control just saves them the effort.
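Most of that week goes into the new state. The original prompt has to be stored separately from the modifiers layered on top of it, so the user can add a constraint without retyping and remove it later. This sketch assumes a hypothetical `Modifier` union that mirrors the menu above. The phrasing of the final prompt is an assumption, not a recommendation for any particular model.

```typescript
type Modifier =
  | { kind: "shorter" }
  | { kind: "sourcesSince"; year: number }
  | { kind: "custom"; instruction: string };

interface RegenerateRequest {
  originalPrompt: string; // kept verbatim; "Edit original prompt" replaces only this field
  modifiers: Modifier[];
}

function modifierText(m: Modifier): string {
  switch (m.kind) {
    case "shorter":
      return "Keep the answer shorter than the previous one.";
    case "sourcesSince":
      return `Use only sources published in ${m.year} or later.`;
    case "custom":
      return m.instruction;
  }
}

// Builds the prompt sent to the model without the user retyping their question.
function buildPrompt(req: RegenerateRequest): string {
  if (req.modifiers.length === 0) return req.originalPrompt;
  const constraints = req.modifiers.map((m) => `- ${modifierText(m)}`).join("\n");
  return `${req.originalPrompt}\n\nAdditional constraints:\n${constraints}`;
}
```
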

What the five mistakes share

Every one of them hides something the user needs to assess what the model is doing: the context it is referencing, the failure mode it just hit, the branches available from any given turn, the sources behind a factual claim, the path back from a near-miss answer. Trust in a chatbot is built or broken in the small affordances of chat UX that make model state legible, not in the model itself.

What is the chat UI you have used that handles these well? Drop it in the comments. I am keeping a list.

Most AI features should not be chatbots

Victor — Thu, 30 Apr 2026 11:18:16 +0000

I keep watching teams ship AI features inside a chat interface for tasks that already had perfectly good forms, then spend the next quarter trying to figure out why nobody used the AI feature. For any feature where the input shape or the output shape is known in advance, chat is the wrong choice, and the cost shows up as low usage three weeks after launch.

Why chat became the default

Chat became the default because it's the path of least resistance for any team wrapping a language model, and I get why. The model takes text in and gives text out. A chat window is the thinnest UI you can build on top of that primitive, and it doesn't fight the shape of the underlying API. Most chatbot UI design starts there: a window, a text input, a thread of message bubbles. Every starter template ships with one. Every demo your CEO saw uses chat. Of course chat is what shipped.

Chat genuinely helps when the user does not yet know what they want, or when refining unstructured work where the freedom to wander matters more than precision. Stanford researchers recently ran a head-to-head comparison of generative interfaces against conversational ones, and chat actually held its own in casual how-to queries and basic prompts. There are real tasks chat fits well.

Designers have been making this case for years. Amelia Wattenberger's 2023 essay is the canonical version, and the argument has hardened into consensus among practitioners. Chat is the right interface for some AI features. The problem is treating it as the default for all of them. What I want to add is the developer-side rule.

Three scenarios where chat is the wrong choice

Scenario A: A complex form filled by AI

The first time I saw this fail in production was on an insurance application. The team had built a chatbot that asked for each field by name, interpreted the user's reply, and confirmed turn by turn. The flow looked clean in the demo. Once it shipped, completion rates dropped against the old web form. Geoffrey Litt has been arguing for years that LLMs should generate the UI a task actually needs, not force the user to describe their task in prose. We had done the opposite: taken a form that already worked, replaced it with chat, and asked the model to extract from sentences what it could have prefilled into fields.

We swapped it out for a generative form. The AI prefilled fields from the user's prior submissions and uploaded documents, the user edited inline. No conversation. Completion rates recovered. The output shape was known the entire time. It was a form. The fields had names. The conversation existed only because someone decided "AI feature" had to mean "talks to you."

// Wrong shape: chat
Bot:  "Let's start your application. What's your full legal name?"
User: "Jane Smith"
Bot:  "Got it. And your date of birth?"
User: "March 14, 1985"
Bot:  "Thanks. Now your current address..."

// Right shape: generative form
Name:     Jane Smith            from account
DOB:      March 14, 1985        from account
Address:  221B Baker St         from last claim, edit?
[Continue]
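The provenance labels in the mock are the detail worth copying: every prefilled value keeps a record of where it came from, and an inline edit clears that record. This sketch assumes hypothetical candidate values, which the AI extraction step would produce in practice, plus a priority order over their sources.

```typescript
interface Candidate {
  field: string;
  value: string;
  source: string; // shown next to the field, e.g. "from account"
}

interface PrefilledField {
  value: string;
  source: string | null; // null means the user typed it or it is still empty
}

// Sources earlier in `priority` win when several candidates fill the same field.
function prefill(fields: string[], candidates: Candidate[], priority: string[]): Record<string, PrefilledField> {
  const rank = (source: string): number => {
    const i = priority.indexOf(source);
    return i === -1 ? priority.length : i;
  };
  const form: Record<string, PrefilledField> = {};
  for (const field of fields) {
    const best = candidates
      .filter((c) => c.field === field)
      .sort((a, b) => rank(a.source) - rank(b.source))[0];
    form[field] = best ? { value: best.value, source: best.source } : { value: "", source: null };
  }
  return form;
}

// Inline edits clear the provenance label: the value now belongs to the user.
function editField(form: Record<string, PrefilledField>, field: string, value: string): Record<string, PrefilledField> {
  return { ...form, [field]: { value, source: null } };
}
```
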

Scenario B: Querying a database

Within a sprint of launch, the chat interface we had built sat untouched. The eleven analysts on the team had defaulted back to the dashboard they already knew, and the non-technical operators we had hoped would use chat to query the billing dataset never showed up at all. Chat forced the analysts to type natural-language sentences to do what their existing dashboard filters did in two clicks. The right interface was a natural-language search bar above the existing dashboard, with results appearing as charts directly below it, not paragraphs in a chat thread.

The same Stanford team I cited earlier put a hard number on this. In data analysis tasks specifically, 93.8% of users preferred a generative interface that produced charts and tables over a chat interface that produced text. Most chatbot UI design hides this problem in the demo because demos are linear and clean. Real analyst work jumps between tabs and lives inside charts.

// Wrong shape: chat
Analyst: "Show me revenue by region last quarter"
Bot:     "Q3 revenue by region: NA $2.3M, EMEA $1.8M, APAC $1.1M.
          Want me to break this down further?"
Analyst: "Yes, by product line"
Bot:     "Sure, here is the breakdown..."

// Right shape: NL search above the dashboard
[Search bar above existing dashboard]
"Revenue by region last quarter, by product"
  charts render below, filters update, columns persist
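The search bar works because the model's output is a structured query the dashboard already knows how to render, not a paragraph. Here is a sketch of the validation side, assuming the model is prompted to return JSON with hypothetical `metric`, `groupBy`, and `period` fields. Anything outside the dashboard's known dimensions is rejected instead of narrated.

```typescript
const DIMENSIONS = ["region", "productLine", "channel"] as const;
type Dimension = (typeof DIMENSIONS)[number];

interface DashboardQuery {
  metric: "revenue";
  groupBy: Dimension[];
  period: string; // e.g. "2025-Q3"; resolving "last quarter" is left to the model
}

function isDimension(value: unknown): value is Dimension {
  return typeof value === "string" && (DIMENSIONS as readonly string[]).indexOf(value) !== -1;
}

// Turns the model's raw output into filters the existing charts can apply,
// or null so the UI can show a hint instead of a paragraph.
function parseModelQuery(raw: string): DashboardQuery | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  if (obj.metric !== "revenue" || typeof obj.period !== "string") return null;
  if (!Array.isArray(obj.groupBy) || !obj.groupBy.every(isDimension)) return null;
  return { metric: "revenue", groupBy: obj.groupBy, period: obj.period };
}
```
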

Scenario C: Kicking off a workflow

A platform team I consulted with built a chatbot that let users start a multi-step approval process. Type the trigger. Confirm the parameters. Watch the bot narrate progress. The whole interaction was the bot reading back what the user had just said and then asking for confirmation in different words. The right interface was an action panel: one button to start, prefilled fields the user could edit, and a status timeline that showed progress without anyone having to "send a message" to find out.

The input shape was known, the output shape was known, and the trigger was already a button. Wrapping all of that in a conversation added latency, ambiguity, and a UX surface where the user could send free text the bot then had to parse and reject.

// Wrong shape: chat
User: "Start the Q3 budget review for marketing"
Bot:  "Got it. Who's the approver?"
User: "Sarah Chen"
Bot:  "Amount?"
User: "$45,000"
Bot:  "Confirming Q3 budget review, $45,000, Sarah Chen. Submit?"

// Right shape: action panel
[Start budget review]
  Period:    Q3 2026           (auto)
  Amount:    $45,000           (last quarter)
  Approver:  Sarah Chen        (team config)
  [Submit]   [Edit fields]
[Status timeline: Submitted -> In review -> Approved]
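The status timeline is a small state machine, which is why it needs no conversation: every reachable state is visible up front. This sketch assumes the three states from the mock plus a hypothetical `Rejected` state. The transition table is the single place where the workflow rules live.

```typescript
type ReviewState = "Submitted" | "In review" | "Approved" | "Rejected";

// Allowed next states for each state; terminal states have none.
const TRANSITIONS: Record<ReviewState, ReviewState[]> = {
  Submitted: ["In review"],
  "In review": ["Approved", "Rejected"],
  Approved: [],
  Rejected: [],
};

interface TimelineEntry {
  state: ReviewState;
  at: Date;
}

function advance(timeline: TimelineEntry[], next: ReviewState, at: Date = new Date()): TimelineEntry[] {
  const current = timeline[timeline.length - 1];
  if (!current) throw new Error("Timeline must start with a Submitted entry");
  if (TRANSITIONS[current.state].indexOf(next) === -1) {
    throw new Error(`Cannot move from ${current.state} to ${next}`);
  }
  return [...timeline, { state: next, at }];
}

// Rendered as the status line under the action panel.
function timelineLabel(timeline: TimelineEntry[]): string {
  return timeline.map((e) => e.state).join(" -> ");
}
```
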

The decision rule

Every AI feature spec I write starts with two questions: does the input have a known shape? Does the output? If either answer is yes, the chat window is friction the user did not ask for.

When the input has a shape, use a command palette, a structured form, or a generative form the AI prefills. When the output has a shape, use the component that fits the shape: charts for data, action panels for workflows, code editors for code, calendars for scheduling. Chat earns its place when neither the input nor the output has a known shape, which is a real category. It is just a smaller one than the field assumes. The test takes ten seconds: sketch the feature on paper, then ask whether you could draw the input fields and the output components without the model in the picture. If you could, build that interface and put the model behind it. Most AI features have at least one known shape. Most AI features should not be chatbots.
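The two-question rule is small enough to write as a function, which is also a handy way to force the question during spec review. This sketch of the rule uses hypothetical shape labels, and the shape-to-component mapping follows the examples in the text without being exhaustive. Treating an unknown input with a known output as free-text intent over a structured surface is my reading of the rule, not something stated outright.

```typescript
type InputShape = "unknown" | "fields" | "command";
type OutputShape = "unknown" | "data" | "workflow" | "code" | "schedule";

interface FeatureSpec {
  input: InputShape;
  output: OutputShape;
}

const INPUT_SURFACE: Record<Exclude<InputShape, "unknown">, string> = {
  fields: "generative form the AI prefills",
  command: "command palette",
};

const OUTPUT_SURFACE: Record<Exclude<OutputShape, "unknown">, string> = {
  data: "charts",
  workflow: "action panel",
  code: "code editor",
  schedule: "calendar",
};

function chooseSurfaces(spec: FeatureSpec): string[] {
  // Chat earns its place only when neither side has a known shape.
  if (spec.input === "unknown" && spec.output === "unknown") return ["chat"];
  const surfaces: string[] = [];
  // Unknown input with a known output: free text expresses intent, a structured surface shows the result.
  surfaces.push(spec.input === "unknown" ? "free-text intent input" : INPUT_SURFACE[spec.input]);
  if (spec.output !== "unknown") surfaces.push(OUTPUT_SURFACE[spec.output]);
  return surfaces;
}
```
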

The obvious counter is Cursor, which uses chat and works. But Cursor is not pure chat. Its chat handles intent expression while the file tree, diff view, and Apply button handle consumption. Chat as a thin intent layer on top of a structured surface works. Pure chat for both is the failure mode the three scenarios above describe.

Chatbot UI design is the most expensive interface to build well, because the model has to handle every edge case the UI would otherwise constrain. It is also the cheapest to ship badly. If a structured input would have worked, ship the structured input and let the AI fill it.

What's the worst chatbot you've used recently for something that should have been a button? Drop it in the comments. I'm always looking for new examples.
