AI-Native UI Components: Architecture and Grammar

#webdev #ai #design #ux

Your design system has a hidden assumption: there is exactly one actor at the keyboard. That assumption is about to break.

Two actors are about to share every interface you ship. The user, and the agent acting on their behalf. They will read the same screens, mutate the same fields, and sometimes try to act on the same component at the same instant. Almost no component library in production today was built for this.

Most attempts to fix it start at the wrong layer.

I started at the wrong layer too

The first time I sat down to design an AI-native dropdown, I reached for a gradient on the border. A small pop animation when the agent opened it. I told myself this was what would make the experience feel AI-native. It wasn’t. The gradient had no job. It was decoration looking for a problem.

The question I should have been asking was not “what does AI-native look like?” It was “what is AI-native?” The visual could follow once the structure was clear.

Two actors need one interface, not two

The first real question is who owns the component’s state.

If the user and the agent each had their own way of acting on a dropdown, the dropdown would need to know who was driving and respond differently to each. Two APIs. Two code paths. Two sources of truth. Everything modern frameworks give you for free would have to be duplicated or bypassed. Accessibility. Focus management. Reactivity.

The answer is one interface. Both actors mutate the same state through the same path. The component renders the truth and does not ask who caused it.

That single move resolves more than it looks like. Frameworks stay happy. Screen readers stay accurate. Devtools can still trace every change. The agent stops being a special case in the codebase.
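Here is a minimal sketch of that idea in TypeScript. It assumes a plain reducer-style store; every name in it (`Actor`, `DropdownState`, `DropdownAction`, `reduce`, `createDropdownStore`) is hypothetical and not taken from any real library. The point it illustrates is narrow: the actor is recorded for tracing, but the reducer never receives it, so the component cannot behave differently depending on who is driving.

```typescript
// Hypothetical types for illustration only.
type Actor = "user" | "agent";

interface DropdownState {
  open: boolean;
  selected: string | null;
}

type DropdownAction =
  | { type: "open" }
  | { type: "close" }
  | { type: "select"; value: string };

// The reducer never sees the actor, so it cannot branch on who is driving.
function reduce(state: DropdownState, action: DropdownAction): DropdownState {
  switch (action.type) {
    case "open":
      return { ...state, open: true };
    case "close":
      return { ...state, open: false };
    case "select":
      // Assumption: selecting an option also closes the dropdown.
      return { open: false, selected: action.value };
  }
}

function createDropdownStore(initial: DropdownState) {
  let state = initial;
  const listeners = new Set<(s: DropdownState) => void>();
  const log: { actor: Actor; action: DropdownAction }[] = [];

  return {
    getState: () => state,
    getLog: () => log,
    subscribe(fn: (s: DropdownState) => void) {
      listeners.add(fn);
      return () => {
        listeners.delete(fn);
      };
    },
    // One path for both actors. The actor is kept for tracing only.
    dispatch(actor: Actor, action: DropdownAction) {
      log.push({ actor, action });
      state = reduce(state, action);
      listeners.forEach((fn) => fn(state));
    },
  };
}

// Usage: the agent opens the dropdown, the user picks an option.
const store = createDropdownStore({ open: false, selected: null });
store.dispatch("agent", { type: "open" });
store.dispatch("user", { type: "select", value: "Berlin" });
```

Whatever renders the dropdown subscribes once and draws `getState()`. The log is there so devtools can still answer “who did this” without the component having to care.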

The user always wins

Two actors sharing one state surface raises an obvious worry. What happens when both act at the same moment?

The first instinct is to delay the agent. Give the user a window of priority. That instinct dissolves the moment you ask what the delay is actually solving. A delay does not resolve a conflict. It defers it. The user can still act inside the delay window, and the agent’s action lands at an unpredictable moment after the user has moved on.

The right model is asymmetric. The agent reads state before acting. If the user has begun something, the agent stands down. The user always wins. There is no symmetric lock that protects the agent from the user. The user can interrupt anything the agent is doing, on any component, at any moment.

This asymmetry is the design system saying: the user is the principal actor. The agent is a guest in their interface.
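One way this asymmetry could be expressed in code, as a sketch under stated assumptions. `SharedField`, `userEngaged`, `agentTask`, and the method names are all hypothetical. The sketch assumes the component can tell when the user has begun something (focus, typing, an open menu) and that agent work is identified by a task label.

```typescript
// Hypothetical shape for illustration only.
interface FieldState {
  value: string;
  userEngaged: boolean;     // the user has begun something on this component
  agentTask: string | null; // label of the agent action in progress, if any
}

type AgentResult = "applied" | "stood-down";

class SharedField {
  private state: FieldState = { value: "", userEngaged: false, agentTask: null };

  getState(): Readonly<FieldState> {
    return this.state;
  }

  // User actions are never gated, and they cancel whatever the agent was doing.
  userBegin(): void {
    this.state = { ...this.state, userEngaged: true, agentTask: null };
  }

  userWrite(value: string): void {
    this.state = { value, userEngaged: true, agentTask: null };
  }

  userEnd(): void {
    this.state = { ...this.state, userEngaged: false };
  }

  // Agent actions read state first and stand down if the user is engaged.
  agentBegin(task: string): AgentResult {
    if (this.state.userEngaged) return "stood-down";
    this.state = { ...this.state, agentTask: task };
    return "applied";
  }

  agentWrite(task: string, value: string): AgentResult {
    // A user interrupt clears agentTask, so a stale agent write cannot land later.
    if (this.state.userEngaged || this.state.agentTask !== task) return "stood-down";
    this.state = { ...this.state, value, agentTask: null };
    return "applied";
  }
}

// Usage: the agent starts filling a field, the user interrupts, the agent loses.
const field = new SharedField();
field.agentBegin("fill-city");
field.userWrite("Paris");
const outcome = field.agentWrite("fill-city", "Berlin"); // stands down
```

Note what is missing: there is no check anywhere on the user methods. Nothing the agent does can block them. There is also no timer, because the agent is not delayed; it simply declines to act.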

Architecture done. Visual layer still empty.

At this point the component is structurally correct. Two actors can share it. Simultaneity resolves toward the user. Frameworks and accessibility stay intact.

There is still a problem. When the agent opens a dropdown, the user has no way of knowing it happened. The dropdown just changes. This is the moment a visual treatment is supposed to do real work, and at that point I still could not say why.

The unlock came from thinking about voice assistants. Siri on iOS, Google Assistant on Android. Both show a visual cue while they are active. I will use Siri as the example.

Siri could just talk. It has a voice, you have ears. And yet when Siri activates, the edges of the screen glow. Why?

It is proof of presence. An invisible actor anchored to a visible place.

It is state, not event. The glow persists as long as Siri is active. You can look up at any moment and orient.

It is channel separation. The glow frames the screen without taking it over. Siri does not need to interrupt your content to remind you it is there.

Each one maps directly to the dropdown. When the agent acts, the user’s attention may be elsewhere. The component needs a way of saying an invisible actor is here, right now, doing something to this specific surface.

The gradient finally has a job. It is not decoration. It is the component’s voice.

When feedback becomes language

This is where the thinking turned.

Picture five dropdowns open on a screen. The user is hunting through them for an option they cannot find. They give up looking and ask the agent for help. The right dropdown glows. It scrolls to the matching option. The user’s eye is pulled to it.

Notice what just happened. The interface did not display the answer. It became the answer. The user asked a question. The interface itself replied. Not text in a chat panel, not a tooltip, not a toast. A surface in the room pointed at itself.
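The mechanics behind that moment can be small. A rough sketch, assuming the agent can read each dropdown’s options; `Dropdown`, `LocateResult`, and `locate` are hypothetical names, and the case-insensitive substring match is a placeholder for whatever matching the agent really does.

```typescript
// Hypothetical shapes for illustration only.
interface Dropdown {
  id: string;
  options: string[];
}

interface LocateResult {
  dropdownId: string;  // which surface should glow
  optionIndex: number; // which option it should scroll to
}

// Returns the first dropdown and option matching the query, or null if none match.
function locate(dropdowns: Dropdown[], query: string): LocateResult | null {
  const needle = query.trim().toLowerCase();
  if (needle === "") return null;

  for (const dropdown of dropdowns) {
    const optionIndex = dropdown.options.findIndex((option) =>
      option.toLowerCase().includes(needle)
    );
    if (optionIndex !== -1) {
      return { dropdownId: dropdown.id, optionIndex };
    }
  }
  return null;
}

// Usage: the result is a pointer into the interface, not a sentence for a chat panel.
const found = locate(
  [
    { id: "country", options: ["France", "Germany"] },
    { id: "currency", options: ["Euro", "Swiss franc"] },
  ],
  "swiss"
);
```

The return value carries no prose at all. It names a surface and a position, and the surface does the talking.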

The visual treatment has stopped being feedback. It has become language. AI-native is when the interface becomes a language the agent speaks through.

This is the actual reframe. Everything else is grammar.

A language needs a grammar

Once you see the interface as a language, you start asking the questions a language asks.

What is the noun? The component the agent is acting on. The glow identifies which surface is speaking.

What is the verb? The kind of action the agent is performing. There are four. Locate finds something for the user. Suggest proposes content for an editable field. Execute acts on the user’s behalf. Observe reads state. Each verb that acts carries its own visual treatment, so the user reads not just that something is happening, but what.

What about silence? This is the sharpest decision in the vocabulary. The agent only speaks visually when it acts on the user’s reality. Reading reality is silent. Observation does not glow.

If every agent action produced a visual, the visual would mean nothing. It would just be the agent’s permanent ambient presence on the screen. By making observation silent, the system has a real rule. One that disciplines every future decision. The moment someone wants to add a glow because the agent is thinking, the rule answers. Thinking is observing. Observing is silent.
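The grammar is small enough to write down as a table. The sketch below is one possible encoding, not a finished API: `Verb`, `Treatment`, `TREATMENTS`, `AgentActivity`, `treatmentFor`, and the class names are all hypothetical. It encodes two rules from above. The treatment is derived from current state rather than fired as a one-off event, so it persists for as long as the agent is acting. And observe maps to nothing.

```typescript
// Hypothetical names for illustration only.
type Verb = "locate" | "suggest" | "execute" | "observe";

interface Treatment {
  className: string; // assumed CSS hook for the visual treatment
  label: string;     // assumed text a screen reader could announce
}

// The silence rule lives in the type: observe is the one verb with no treatment.
const TREATMENTS: Record<Verb, Treatment | null> = {
  locate: { className: "agent-locate", label: "Agent is pointing here" },
  suggest: { className: "agent-suggest", label: "Agent is suggesting content" },
  execute: { className: "agent-execute", label: "Agent is acting here" },
  observe: null,
};

// What the agent is doing right now, and to which component (the noun).
interface AgentActivity {
  componentId: string;
  verb: Verb;
}

// Derived from state on every render, so the treatment lasts exactly as long as the activity.
function treatmentFor(
  componentId: string,
  activity: AgentActivity | null
): Treatment | null {
  if (activity === null || activity.componentId !== componentId) return null;
  return TREATMENTS[activity.verb];
}

// Usage: only the component being acted on speaks, and observing says nothing.
const locating = treatmentFor("country", { componentId: "country", verb: "locate" });
const observing = treatmentFor("country", { componentId: "country", verb: "observe" });
```

Adding a “thinking” glow would mean changing `observe: null` to something else, which is exactly the line the rule is there to protect.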

AI-themed vs AI-native

The difference between AI-themed and AI-native is not a difference in polish. It is a difference in layer.

AI-themed bolts aesthetics onto traditional components. A border treatment. A sparkle icon. The components are still single-actor components dressed up to feel modern. When the agent and user collide, the component breaks. When the agent needs to communicate intent, it has no vocabulary to do so.

AI-native rebuilds the foundation. One state, one interface. Asymmetric locks favoring the user. A visual grammar with nouns, verbs, and a silence rule. The aesthetic emerges from the structure rather than getting bolted onto it.

Most of what I am building right now is downstream of this distinction. Not single-actor components with AI features. Components designed for two actors from the foundation up.

The interface stops being a thing the agent uses. It becomes a thing the agent speaks through.

Onu
onurag.me