Day 3: Generative UI Gen 2 — Declarative Specs with A2UI

#ai #frontend #llm #ui

This is Day 3 of my 6-part series on how LLMs rewrote the user interface over the past year. Day 2 covered static generative UI with AG-UI — and ended on its core limitation: the agent can only show what you've pre-built.

Moving up the freedom axis

Gen 1's contract was "pick from my components." Gen 2's contract is more interesting: "compose a UI from my primitives."

The agent doesn't return code, and it doesn't just pick a component — it returns a declarative description of an interface: a tree of abstract components (cards, rows, text, inputs) plus the data to fill them. Your client renders that description using its own native widgets. The agent decides structure; you still decide implementation, styling, and what's allowed.

This is the pattern behind A2UI (Agent-to-User Interface), the open protocol Google released in late 2025. It came out of a real constraint: Google needed agents to send rich UI across trust boundaries — a third-party agent rendering inside Gemini, say — where executing agent-generated code is a non-starter.

The core idea: UI as a JSONL stream

A2UI transmits UI as a stream of JSON Lines. Each line is one message; the client builds the interface incrementally as lines arrive — so the UI paints progressively, like streamed text, instead of popping in at the end.

Four server-to-client message types do all the work: surfaceUpdate (add/update components), dataModelUpdate (set data), beginRendering (signal first paint with the root id), and deleteSurface (remove a UI region).
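To make the incremental part concrete, here's a minimal sketch of a client-side reader in Python. It assumes each message is an object with a single top-level key naming its type, as in the examples in this post. `SurfaceState` and `consume` are names I made up for illustration, not part of any A2UI SDK.

```python
import json


class SurfaceState:
    """Client-side state built up from an A2UI-style JSONL stream (sketch)."""

    def __init__(self):
        self.components = {}  # component id -> component definition
        self.data = {}        # data model contents
        self.root = None      # set once beginRendering arrives
        self.deleted = False

    def apply(self, message):
        # Assumption: one top-level key per message, naming the message type.
        (kind, payload), = message.items()
        if kind == "surfaceUpdate":
            for comp in payload["components"]:
                self.components[comp["id"]] = comp["component"]
        elif kind == "dataModelUpdate":
            # Shallow merge: a simplification, not the spec's exact semantics.
            self.data.update(payload["contents"])
        elif kind == "beginRendering":
            self.root = payload["root"]
        elif kind == "deleteSurface":
            self.components.clear()
            self.data.clear()
            self.root = None
            self.deleted = True
        else:
            raise ValueError(f"unknown message type: {kind}")


def consume(lines, state=None):
    """Apply a JSONL stream one line at a time; blank lines are skipped."""
    if state is None:
        state = SurfaceState()
    for line in lines:
        if line.strip():
            state.apply(json.loads(line))
    return state


stream = [
    '{"surfaceUpdate": {"components": [{"id": "root", "component": {"Text": {"text": {"literalString": "Hi"}}}}]}}',
    '{"dataModelUpdate": {"contents": {}}}',
    '{"beginRendering": {"root": "root"}}',
]
state = consume(stream)
print(state.root, sorted(state.components))  # root ['root']
```

Because every line is applied on its own, a real client could repaint after each one instead of waiting for the stream to end.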

Here's a minimal stream that renders a profile card. It's long but easy to follow, and each message is pretty-printed for readability (in the actual JSONL stream, each message occupies a single line):

{
    "surfaceUpdate": {
        "components": [
            {
                "id": "root",
                "component": {
                    "Column": {
                        "children": {
                            "explicitList": [
                                "profile_card"
                            ]
                        }
                    }
                }
            }
        ]
    }
}
{
    "surfaceUpdate": {
        "components": [
            {
                "id": "profile_card",
                "component": {
                    "Card": {
                        "child": "card_content"
                    }
                }
            }
        ]
    }
}
{
    "surfaceUpdate": {
        "components": [
            {
                "id": "card_content",
                "component": {
                    "Column": {
                        "children": {
                            "explicitList": [
                                "name_text",
                                "bio_text"
                            ]
                        }
                    }
                }
            }
        ]
    }
}
{
    "surfaceUpdate": {
        "components": [
            {
                "id": "name_text",
                "component": {
                    "Text": {
                        "usageHint": "h3",
                        "text": {
                            "literalString": "A2A Fan"
                        }
                    }
                }
            }
        ]
    }
}
{
    "surfaceUpdate": {
        "components": [
            {
                "id": "bio_text",
                "component": {
                    "Text": {
                        "text": {
                            "literalString": "Building beautiful apps from a single codebase."
                        }
                    }
                }
            }
        ]
    }
}
{
    "dataModelUpdate": {
        "contents": {}
    }
}
{
    "beginRendering": {
        "root": "root"
    }
}

Two design choices worth noticing:

It's an adjacency list, not a nested tree. Components reference children by id ("children": {"explicitList": [...]}) instead of nesting. That's deliberate: flat structures are easier for an LLM to generate correctly token-by-token, and easier to stream — the spec's first design requirement is literally "must be easily generated by a Transformer LLM."

Structure and state are decoupled. Components describe shape; the data model holds values. Update a price? Send a tiny dataModelUpdate — no need to resend the component tree. Bindings connect the two:

{
    "surfaceUpdate": {
        "components": [
            {
                "id": "price_text",
                "component": {
                    "Text": {
                        "text": {
                            "path": "/flight/price"
                        }
                    }
                }
            }
        ]
    }
}
{
    "dataModelUpdate": {
        "contents": {
            "flight": {
                "price": "$267"
            }
        }
    }
}
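Both design choices show up clearly on the client side. The sketch below walks the flat id-to-component map from the root and resolves each text value either from its literal or from the data model. It renders to indented text rather than native widgets, and it treats a binding path as slash-separated dictionary keys, which is my assumption based on "/flight/price". The function names are hypothetical.

```python
def resolve_path(data, path):
    """Follow a slash-separated path such as "/flight/price" into nested dicts."""
    node = data
    for key in (k for k in path.split("/") if k):
        node = node[key]
    return node


def resolve_text(value, data):
    """A text value is either a literal or a binding into the data model."""
    if "literalString" in value:
        return value["literalString"]
    return resolve_path(data, value["path"])


def child_ids(props):
    """Collect child ids from the two shapes used in this post's examples."""
    if "children" in props:
        return props["children"]["explicitList"]
    if "child" in props:
        return [props["child"]]
    return []


def render(components, data, comp_id, depth=0):
    """Walk the adjacency list from comp_id and return indented text lines."""
    (kind, props), = components[comp_id].items()
    label = kind
    if kind == "Text":
        label += ": " + str(resolve_text(props["text"], data))
    lines = ["  " * depth + label]
    for cid in child_ids(props):
        lines.extend(render(components, data, cid, depth + 1))
    return lines


components = {
    "root": {"Column": {"children": {"explicitList": ["price_text"]}}},
    "price_text": {"Text": {"text": {"path": "/flight/price"}}},
}
print("\n".join(render(components, {"flight": {"price": "$267"}}, "root")))
# Column
#   Text: $267
```

Rendering again with a different data model changes the text while the `components` map stays untouched, which is the decoupling in action.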

Interactions flow back as events

When the user taps a button, the client doesn't run agent code — it sends a userAction message back to the agent (error is the only other client-to-server type). The agent responds with new stream messages. The loop is: agent streams UI → user acts → event goes back → agent streams an update. The primary data stream stays unidirectional, which keeps the security story clean.
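A rough sketch of one turn of that loop, in Python. The fields inside `userAction` here (`name`, `sourceComponentId`, `context`) and the `refresh_price` action are illustrative assumptions on my part, so check the spec for the real schema. `toy_agent` is a stand-in for an actual agent.

```python
import json


def make_user_action(name, source_component_id, context=None):
    """Build a client-to-server event as one JSON line (field names are assumed)."""
    message = {
        "userAction": {
            "name": name,
            "sourceComponentId": source_component_id,
            "context": context or {},
        }
    }
    return json.dumps(message)


def toy_agent(event_line):
    """Stand-in for the agent: reads one event, answers with new stream lines."""
    action = json.loads(event_line)["userAction"]
    if action["name"] == "refresh_price":
        update = {"dataModelUpdate": {"contents": {"flight": {"price": "$301"}}}}
        return [json.dumps(update)]
    return []  # nothing to update for actions this toy agent doesn't know


# One turn: the user acts, the event goes back, the agent streams an update.
event = make_user_action("refresh_price", "refresh_button")
for line in toy_agent(event):
    print(line)
```

The client never executes anything the agent sent. It only reports what happened and applies whatever data comes back.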

Catalogs: the trust boundary made explicit

The piece that makes A2UI deployable across platforms is the catalog — a client-defined contract listing which component types it can render (Card, Row, Text, DateTimeInput, …) and their properties. The agent can only reference types in the catalog. Anything else is rejected.
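Here's roughly what that rejection step could look like. This is a sketch under a big simplification: the catalog is a plain mapping from component type to allowed property names, whereas real catalogs are richer schemas. `CATALOG` and `validate_surface_update` are hypothetical names.

```python
# Hypothetical, heavily simplified catalog: type name -> allowed property names.
CATALOG = {
    "Column": {"children"},
    "Card": {"child"},
    "Text": {"text", "usageHint"},
}


def validate_surface_update(message, catalog=CATALOG):
    """Return a list of problems; an empty list means the message is acceptable."""
    errors = []
    for comp in message["surfaceUpdate"]["components"]:
        (kind, props), = comp["component"].items()
        if kind not in catalog:
            errors.append(f"{comp['id']}: unknown component type {kind!r}")
            continue
        extra = sorted(set(props) - catalog[kind])
        if extra:
            errors.append(f"{comp['id']}: unsupported properties {extra}")
    return errors


bad = {
    "surfaceUpdate": {
        "components": [
            {"id": "evil", "component": {"Iframe": {"src": "https://example.com"}}}
        ]
    }
}
print(validate_surface_update(bad))
# ["evil: unknown component type 'Iframe'"]
```

A client would run a check like this before touching its widget tree, so an unknown type never reaches the renderer.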

This is also the cross-platform answer Gen 1 lacked. The same JSONL stream renders as React components on web, Flutter widgets on mobile, SwiftUI views on iOS — each client maps abstract types to its own native widgets via its registry. Write the agent once; every surface renders it natively. There's a standard catalog per protocol version, and you can negotiate custom ones (the server advertises supported catalogs in its A2A Agent Card; the client declares what it accepts).

Security falls out of the design: A2UI is data, not code. No script execution, no UI injection beyond what your catalog allows, no iframe of mystery origin. For regulated industries, that sentence is the whole pitch.

Where Gen 2 sits in practice

Compared to Gen 1, you trade some layout predictability for a lot of expressiveness:

	Gen 1 (AG-UI static)	Gen 2 (A2UI declarative)
Agent controls	Which component + props	Full UI structure from primitives
You control	Everything else	Catalog, rendering, styling
Novel layouts	No — prebuilt only	Yes — composed at runtime
Cross-platform	Re-implement per client	Same stream, native render everywhere
Failure mode	Weird data in a good component	Weird layout from good components

That last row is the honest caveat: when the agent composes structure, it can compose bad structure — a form with the submit button above the fields, a 14-component card where 4 would do. Your catalog constrains vocabulary, not taste. Teams shipping Gen 2 typically add layout linting or few-shot examples of good composition in the agent prompt.
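Layout linting can be as simple as a few structural checks over the component map before rendering. The sketch below encodes the two examples above as rules. The `Button` and `TextField` type names, the input-type set, and the size threshold are all assumptions I picked for illustration. It also assumes every referenced id exists and that the graph has no cycles.

```python
INPUT_TYPES = {"TextField", "DateTimeInput"}  # assumed input component types
MAX_CARD_DESCENDANTS = 8                      # arbitrary threshold for this sketch


def child_ids(props):
    """Collect child ids from the two shapes used in this post's examples."""
    if "children" in props:
        return props["children"]["explicitList"]
    if "child" in props:
        return [props["child"]]
    return []


def descendants(components, comp_id):
    """All component ids reachable below comp_id (assumes no cycles)."""
    (_, props), = components[comp_id].items()
    found = []
    for cid in child_ids(props):
        found.append(cid)
        found.extend(descendants(components, cid))
    return found


def lint(components):
    """Return human-readable warnings about questionable composition."""
    warnings = []
    for comp_id, comp in components.items():
        (kind, props), = comp.items()
        if kind == "Card":
            count = len(descendants(components, comp_id))
            if count > MAX_CARD_DESCENDANTS:
                warnings.append(f"{comp_id}: card contains {count} components")
        if kind == "Column":
            # A Column stacks children top to bottom, so list order is visual order.
            kinds = [next(iter(components[c])) for c in child_ids(props)]
            if "Button" in kinds:
                after_button = kinds[kinds.index("Button") + 1:]
                if any(k in INPUT_TYPES for k in after_button):
                    warnings.append(f"{comp_id}: a Button sits above an input")
    return warnings


form = {
    "form": {"Column": {"children": {"explicitList": ["submit", "name_field"]}}},
    "submit": {"Button": {}},
    "name_field": {"TextField": {}},
}
print(lint(form))  # ['form: a Button sits above an input']
```

Warnings like these could block rendering, or be fed back to the agent as a request to recompose.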

The deeper limit: the agent is still confined to your primitives. It can't ship an interactive chart type you never built, or a custom visualization for a domain it just learned about. For that you need to hand the agent a real, sandboxed UI surface — which is Gen 3, MCP Apps, and tomorrow's post.

What's next

Day 4: Generative UI Gen 3 — MCP Apps and open-ended surfaces
Day 5: Beyond chat — canvas interfaces, adaptive UX, and the security bill coming due

If you want to explore A2UI today: the spec, quickstart, and a live composer are at a2ui.org, and the project is open source on GitHub. See you tomorrow.