Weili Gao for Innovation Process Technology AG (ipt)

Posted on May 28

Generative UI - The Future of Human-AI Interaction

#ui #frontend #ai #agents

This article was co-authored with my colleague Benjamin Bürgisser, a software engineer and IT architect based near Zurich, Switzerland (https://b-dimension.com/).

TL;DR

Generative UI is the next evolution of AI interaction, moving beyond text to create dynamic, agent-driven user interfaces. We explored three key patterns - Static (AG-UI), Declarative (A2UI), and Open-ended UI - and used the CopilotKit framework to build two demos: a Tic-Tac-Toe assistant and a 2D RPG with a LangGraph-powered NPC.

Why Generative UI?

Today’s AI agents primarily communicate through text. While revolutionary, this often forces humans to interact with a conversational interface, even for tasks that would be more efficient or intuitive within a traditional graphical user interface.

Generative UI is the paradigm shift that enables AI agents to dynamically generate the UI elements required to complete a task. Instead of the user describing what they see, the agent reads the UI state directly and acts through it - clicking buttons, moving characters, proposing decisions.

This shift blends the power of LLMs with familiar interactive components and is critical for implementing human-in-the-loop workflows, where the agent can propose an action but requires human confirmation via an interactive UI before execution. This ensures control and a better user experience.

The Three Core Patterns of Generative UI

Three core patterns exist for building Generative UI:

Static Generative UI (AG-UI): the agent generates a command (like a function call) for a predefined frontend tool. The UI for that tool is already built and the agent simply triggers it.
Declarative Generative UI (A2UI, Open-JSON-UI): the agent generates a structured data format (like JSON) that describes the UI it wants to render (e.g., a form, a chart, or a list). The frontend then interprets this data and renders the elements.
Open-ended Generative UI (MCP Apps): the agent generates raw, open-ended content, such as a block of custom HTML/CSS, which is then rendered directly in the UI. This is the most flexible approach but requires careful security measures.

The different patterns are not mutually exclusive and can be complementary, e.g. both AG-UI and A2UI are used by the CopilotKit framework for implementing Generative UI.
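To make the distinction concrete, the three patterns can be modelled as three kinds of messages an agent might send to a frontend. This is only a sketch: the type names and fields below are our own illustration and are not taken from the AG-UI, A2UI, or MCP Apps specifications.

```typescript
// Hypothetical message shapes, one per pattern (not the real protocol schemas).
type StaticUi = { kind: "static"; tool: string; args: Record<string, unknown> };
type DeclarativeUi = {
  kind: "declarative";
  spec: { component: "form" | "chart" | "list"; props: Record<string, unknown> };
};
type OpenEndedUi = { kind: "open-ended"; html: string };
type AgentUiMessage = StaticUi | DeclarativeUi | OpenEndedUi;

// Describes what a frontend would roughly do with each kind of message.
function describeRendering(msg: AgentUiMessage): string {
  switch (msg.kind) {
    case "static":
      // The UI already exists; the agent only triggers it.
      return `call prebuilt tool "${msg.tool}"`;
    case "declarative":
      // The frontend interprets structured data and picks its own components.
      return `render a ${msg.spec.component} from the JSON spec`;
    case "open-ended":
      // Raw markup from the agent: must be sanitized before rendering.
      return `sanitize and render ${msg.html.length} characters of HTML`;
  }
}
```

The further down the list a message sits, the more freedom the agent has and the more validation the frontend needs to do.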

Key Generative UI Frameworks

The Generative UI approach is seeing rapid development from several frameworks:

CopilotKit: a comprehensive framework for building agent-driven interfaces.
Pillar: a simpler, headless framework focused on the core logic of agent tool routing.
Vercel AI SDK: provides foundational support for streaming and model integrations, and is used as the runtime for CopilotKit's built-in agent.

Introducing CopilotKit

We picked CopilotKit for building our two demos. It is an open-source (MIT) framework with 30k+ stars on GitHub that connects AI agents to your frontend. The team behind CopilotKit is also behind the AG-UI protocol.

Its core concepts are:

Readables: expose app state to the agent (board positions, inventories, distances).
Actions: let the agent do things in your UI (make a move, take an item, spawn a reward).
Chat UI: a themed sidebar or popup, with streaming and tool-call rendering, out of the box.

It supports React/Next.js and Angular, and works with agent backends like LangGraph or Microsoft Agent Framework. CopilotKit partners with Microsoft and Google on the development of the AG-UI and A2UI protocols.
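The readable/action idea can be illustrated without any framework. The sketch below is not CopilotKit's API: AgentBridge and its methods are hypothetical names we made up to show the two directions of data flow, with state going out to the agent and named calls coming back in.

```typescript
// Framework-free sketch; AgentBridge is a hypothetical name, not a CopilotKit class.
type ActionHandler = (args: Record<string, unknown>) => string;

class AgentBridge {
  private readables = new Map<string, () => unknown>();
  private actions = new Map<string, ActionHandler>();

  // Register a piece of app state the agent may read.
  addReadable(description: string, getValue: () => unknown): void {
    this.readables.set(description, getValue);
  }

  // Register a named function the agent may call.
  addAction(name: string, handler: ActionHandler): void {
    this.actions.set(name, handler);
  }

  // Snapshot of all readables, e.g. to be serialized into the model context.
  snapshot(): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    this.readables.forEach((getValue, description) => {
      out[description] = getValue();
    });
    return out;
  }

  // Dispatch a tool call coming from the agent; the string goes back as the tool result.
  call(name: string, args: Record<string, unknown>): string {
    const handler = this.actions.get(name);
    if (!handler) return `unknown action: ${name}`;
    return handler(args);
  }
}

// Usage: a tiny tic-tac-toe board exposed to the agent.
const board: string[] = Array(9).fill("");
const bridge = new AgentBridge();
bridge.addReadable("board", () => board.slice());
bridge.addAction("makeMove", (args) => {
  const index = Number(args.index);
  if (!Number.isInteger(index) || index < 0 || index > 8) return "invalid square";
  if (board[index] !== "") return "square taken";
  board[index] = "X";
  return `placed X at ${index}`;
});
```

The agent never touches the board array itself. It only sees what snapshot() returns and can only change state through the handlers that were registered.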

Our Demos

To gain hands-on experience with Generative UI, we have developed two demos with CopilotKit: a Tic-Tac-Toe Assistant and a 2D RPG. They demonstrate how a front-end application can expose its state and functionality (tools) to an agent, allowing the agent to perform actions that are visible and interactive for the user.

Demo 1: Tic-Tac-Toe Assistant

In our first demo we built a tic-tac-toe assistant that coaches the player at the immensely sophisticated game of tic-tac-toe. The agent can suggest moves (with an accept/reject dialog for human-in-the-loop), annotate the board with threats and opportunities, and render a custom "Coach Card" via generated HTML. The demo illustrates key Generative UI features by applying the patterns discussed above:

AI Move Suggestion (AG-UI + Human-in-the-Loop): The agent calls a predefined proposeNextMove tool, and the result is rendered as an "Accept / Reject" dialog (renderAndWaitForResponse), requiring user confirmation before the move is made.

Board Annotations (AG-UI): The agent calls an analyzeBoard tool, which returns data used to render overlay symbols (threats, opportunities) directly onto the existing UI components (Board.tsx/Square.tsx).

Coach Card (Open-ended Generative UI): The agent calls a renderCoachCard tool, which generates fully custom HTML for the "Coach Card" component. The HTML is sanitized with DOMPurify before it is injected via dangerouslySetInnerHTML.

Shared Game State (Agent Context / Readable State): The useAgentContext hook is used to publish the current board state, player, and winner status, giving the agent a "readable state" of the application.
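The human-in-the-loop step behind the move suggestion boils down to one rule: the agent's proposal changes nothing until the user decides. A minimal sketch of that rule follows; resolveProposal and its types are hypothetical names for illustration and are not part of our demo code or of CopilotKit.

```typescript
type Cell = "X" | "O" | null;
type Proposal = { index: number; reason: string };
type Decision = "accept" | "reject";

// Applies the proposed move only if the human accepted it and the square is free.
// The returned message is what would be reported back to the agent as the tool result.
function resolveProposal(
  board: Cell[],
  player: "X" | "O",
  proposal: Proposal,
  decision: Decision
): { board: Cell[]; message: string } {
  if (decision === "reject") {
    return { board, message: "User rejected the move." };
  }
  if (proposal.index < 0 || proposal.index >= board.length) {
    return { board, message: "Proposed square does not exist." };
  }
  if (board[proposal.index] !== null) {
    return { board, message: "Square is already taken." };
  }
  // Copy instead of mutating, so the previous board stays valid for React-style state updates.
  const next = board.slice();
  next[proposal.index] = player;
  return { board: next, message: `Move ${proposal.index} applied.` };
}
```

In the demo, the "Accept / Reject" dialog is what produces the decision. Reporting the outcome back as a message lets the agent react to a rejection instead of assuming its move went through.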

The application is structured as a Next.js app. All frontend logic (tools, UI) runs in the browser and communicates with the backend API route (/api/copilotkit), which hosts CopilotKit's built-in agent and handles the model interaction (Azure OpenAI gpt-5-mini via @ai-sdk/azure).

This demo validated that exposing React state as readables and defining actions as tools is a clean, working pattern. The built-in agent made setup fast - readables and the instructions prop on the chat sidebar just worked.
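As an example of the kind of logic that sits behind such a tool, here is a rough sketch of what a board analysis could compute. It is an illustration under our own assumptions, not the demo's actual analyzeBoard implementation: a square is flagged when it would complete a line that already holds two identical marks.

```typescript
type Mark = "X" | "O" | null;
type Annotation = { index: number; kind: "threat" | "opportunity" };

// All eight winning lines of a 3x3 board, as indices 0..8 in row-major order.
const LINES: number[][] = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6],
];

// For each line with two identical marks and one empty square, flag the empty square:
// "opportunity" if the marks belong to `player`, "threat" if they belong to the opponent.
function analyzeBoard(board: Mark[], player: "X" | "O"): Annotation[] {
  const annotations: Annotation[] = [];
  for (const line of LINES) {
    const empty = line.filter((i) => board[i] === null);
    if (empty.length !== 1) continue;
    const marks = line.filter((i) => board[i] !== null).map((i) => board[i]);
    if (marks[0] !== marks[1]) continue;
    annotations.push({
      index: empty[0],
      kind: marks[0] === player ? "opportunity" : "threat",
    });
  }
  return annotations;
}
```

The same square can appear more than once if it completes several lines. A frontend would typically merge those entries before drawing overlay symbols.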

Demo 2: 2D RPG with LangGraph NPC

For the second demo we wanted to go deeper: custom system prompts, richer agent logic, and a more complex scenario. We built a 2D RPG where the player walks around a medieval world collecting bananas, berries, and crystals. An NPC - powered by a LangGraph agent - acts as a quest giver. The agent can:

See the game state: positions, inventories, objects on the map, player–NPC distance.
Move the NPC through the world using A* pathfinding around obstacles.
Give quests dynamically based on what's available on the map.
Take items from the player and reward coins for completed quests.

All of this runs through CopilotKit's readables and actions - the agent never touches the DOM directly; it just reads structured state and calls named functions.
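For readers unfamiliar with A*, the NPC movement mentioned above can be sketched on a small grid. This is a generic textbook version under simplifying assumptions (4-directional movement, unit step cost, Manhattan heuristic, linear scan instead of a priority queue), not the pathfinding code from our demo. The name findPath and the grid encoding are made up for the example.

```typescript
type Point = { x: number; y: number };

// grid[y][x] === 1 marks an obstacle (e.g. a wall); 0 is walkable.
// Returns the list of cells from start to goal, or null if no route exists.
function findPath(grid: number[][], start: Point, goal: Point): Point[] | null {
  const height = grid.length;
  const width = grid[0].length;
  const key = (p: Point) => p.y * width + p.x;
  // Manhattan distance never overestimates with 4-directional unit steps.
  const h = (p: Point) => Math.abs(p.x - goal.x) + Math.abs(p.y - goal.y);

  const open: Point[] = [start];
  const cameFrom = new Map<number, Point>();
  const gScore = new Map<number, number>([[key(start), 0]]);
  const closed = new Set<number>();

  while (open.length > 0) {
    // Pick the open node with the lowest f = g + h (linear scan; fine for small maps).
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      const fi = gScore.get(key(open[i]))! + h(open[i]);
      const fb = gScore.get(key(open[best]))! + h(open[best]);
      if (fi < fb) best = i;
    }
    const current = open.splice(best, 1)[0];

    if (current.x === goal.x && current.y === goal.y) {
      // Walk the parent links back to the start.
      const path: Point[] = [current];
      let k = key(current);
      while (cameFrom.has(k)) {
        const prev = cameFrom.get(k)!;
        path.unshift(prev);
        k = key(prev);
      }
      return path;
    }
    closed.add(key(current));

    const neighbors: Point[] = [
      { x: current.x + 1, y: current.y },
      { x: current.x - 1, y: current.y },
      { x: current.x, y: current.y + 1 },
      { x: current.x, y: current.y - 1 },
    ];
    for (const n of neighbors) {
      if (n.x < 0 || n.y < 0 || n.x >= width || n.y >= height) continue;
      if (grid[n.y][n.x] === 1 || closed.has(key(n))) continue;
      const tentative = gScore.get(key(current))! + 1;
      const known = gScore.get(key(n));
      if (known === undefined || tentative < known) {
        cameFrom.set(key(n), current);
        gScore.set(key(n), tentative);
        if (!open.some((p) => p.x === n.x && p.y === n.y)) open.push(n);
      }
    }
  }
  return null;
}
```

In a setup like ours, an action such as "move the NPC to the player" would call a function of this kind and then animate the NPC along the returned cells, so the agent only names the destination and never computes the route itself.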

A Quest in Three Acts

Nothing is scripted. The NPC reads the world state, decides what to ask for, and reacts to whatever the player actually does.

Act 1 - The Quest: We ask the NPC for a quest. It inspects the map and decides: "Bring me 3 crystals." No quest table - just the agent reasoning over game state.

Act 2 - The Pivot: Instead of crystals, we grab a banana and call the NPC over. It pathfinds around the brick walls to reach us. In a scripted game, offering the wrong item would be a dead end. Here, the agent goes off-script and reluctantly accepts the banana anyway.

Act 3 - The Reward: We ask for a reward. The agent decides a single coin is fair, spawns it next to the NPC, and we pick it up.

The NPC feels more real because we can freely talk to it and it makes decisions on the fly - exactly the kind of dynamic interaction that's hard to achieve with traditional game scripting.

Lessons Learned

Here is a summary of what we learned from the demos.

What Worked Well

The readable/action model is clean. You declare what the agent can see (useCopilotReadable) and do (useCopilotAction) right next to the React components that own the state. CopilotKit handles serialization, transport, and tool binding. It feels like a natural extension of React's component model - state flows down, actions flow up, and the agent plugs into that loop.
The chat UI saved real time. A themed sidebar with message history, streaming, and tool-call rendering - for free. We styled it to match our medieval parchment theme with a few CSS custom properties.
LangGraph integration is straightforward. Define a graph, export it, point the Next.js route at the deployment URL. CopilotKit's state annotation passes readables and actions through cleanly.

What Tripped Us Up

Readables don't auto-inject with custom agents. This was the biggest surprise. With the built-in agent (as in the Tic-Tac-Toe demo), useCopilotReadable values appear in the model's context automatically. With a custom LangGraph agent, they arrive in state.copilotkit.context - but you have to manually inject them into your system message. Actions worked immediately; readables silently did nothing until we traced the issue.
The instructions prop is ignored with custom agents. We set system instructions via CopilotSidebar's instructions field and they were silently dropped. This is a known open CopilotKit issue. The workaround: define instructions in the agent's system message directly.
React's async batching collides with rapid tool calls. When the agent fired two calls in quick succession (e.g. "take 1 berry" then "take 3 crystals"), the second call read stale state because React had not re-rendered yet. We had to move from useEffect-based ref syncing to immediate ref updates - a useful pattern when bridging async agent actions with React state.
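The workaround for the first two pitfalls, injecting readables and instructions into the agent's system message by hand, can be sketched as a small helper. We assume here that each entry in state.copilotkit.context carries a description and a value. That shape, the ContextEntry type, and the buildSystemMessage name are assumptions for illustration, so check them against the CopilotKit version you use.

```typescript
// Assumed shape of one entry in state.copilotkit.context (verify for your version).
type ContextEntry = { description: string; value: unknown };

// Appends the readables to the base instructions so the model actually sees them.
function buildSystemMessage(baseInstructions: string, context: ContextEntry[]): string {
  if (context.length === 0) return baseInstructions;
  const lines = context.map((entry) => {
    // Strings are passed through as-is; everything else is serialized as JSON.
    const value =
      typeof entry.value === "string" ? entry.value : JSON.stringify(entry.value);
    return `- ${entry.description}: ${value}`;
  });
  return `${baseInstructions}\n\nCurrent application state:\n${lines.join("\n")}`;
}
```

A LangGraph node would call a helper like this on every turn and place the result in the system message, which keeps the instructions and the latest readable state in one place that the custom agent controls.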

Is Using CopilotKit Worth It?

Couldn't we just append state to the user's message and define function-calling tools on our own? Yes - but CopilotKit gives you:

A structured protocol - AG-UI standardizes the agent–frontend connection so you don't reinvent the wheel.
A full chat framework - sidebar, streaming, theming. Non-trivial to build and maintain yourself.
A component-level API - readables and actions co-located with the components that own the state.

For a quick prototype, rolling your own is fine. For anything that will grow - multiple agents, generative UI patterns, shared state - a framework pays off.

Conclusion

Generative UI represents an exciting step forward in AI-powered applications, moving beyond simple text generation to dynamic, context-aware user interaction. Our experience with CopilotKit in developing the two demos validated the effectiveness of the AG-UI and open-ended patterns for creating engaging and powerful Human-in-the-Loop experiences. As frameworks mature, Generative UI is poised to become a core part of how developers integrate AI into modern web applications.