DEV Community

TechLogStack

Originally published at techlogstack.com

Google Built a Free Design Tool That Generates Production Code From a Sentence — Then Added Multiplayer

  • 30 seconds — plain English sentence to complete mobile UI, live on stage at Google I/O 2025
  • $0 vs $15 — Google Stitch multiplayer vs Figma professional plan per editor per month
  • 3 input types — text prompt, reference image, annotated screenshot — processed simultaneously
  • 5 screens — simultaneous canvas rendering introduced in Stitch 2.0, March 2026
  • 350 free generations/month — standard tier; $20/month Pro for unlimited
  • 1M+ waitlist signups overnight after the I/O 2025 live demo

At Google I/O 2025, Sundar Pichai typed a one-sentence description of a mobile app and watched Google Stitch render a complete, multi-component UI in under 30 seconds. One click exported it as React code; another exported it as an editable Figma file. A year later, Google added real-time multiplayer, a streaming design agent, and voice input. Figma charges $15 per editor per month for collaborative design; Stitch's multiplayer is free. The design industry started paying attention.


The Story

Google Stitch did not emerge from Google's internal R&D labs. It began with the early-2025 acquisition of Galileo AI — a startup that had built one of the first credible text-to-UI generators, capable of interpreting product descriptions and producing coherent interface layouts. Google acquired Galileo, rebranded it as Stitch, integrated it with Gemini 2.5 Pro (Google's multimodal model able to process text, images, audio, and video simultaneously and generate structured outputs across all of them), and launched it as a Google Labs experiment at I/O 2025. The Labs framing was deliberate — testing the market before committing to a full product. Over 1 million waitlist signups appeared overnight.


What 'Vibe Design' Actually Means

With Stitch, "vibe design" entered the vocabulary alongside "vibe coding": describing intent to an AI and refining the output iteratively rather than building from first principles. The skill shifts from pixel manipulation to intent specification. A founder who cannot use Figma can produce a working prototype in minutes. A product manager can test five layout variations in the time it previously took to brief a designer on one.

The path from launch to Stitch 2.0 compressed ten months of user feedback into a clear product trajectory. The May 2025 version was single-screen only: one prompt, one screen, export. July–August 2025 added theme customisation and RTL language support. December 2025 brought multi-screen Prototypes alongside Gemini 3 integration. March 19, 2026 was Stitch 2.0: infinite canvas, 5-screen simultaneous generation, voice input, and app-flow generation. A demo had become a workspace.

Problem

Design-to-Dev Handoff: The Productivity Black Hole

The traditional pipeline required designers to build components in Figma, annotate specs manually, and hand off to developers who re-implemented everything in code. Even with design tokens and component libraries, the gap between "designed" and "built" consumed weeks. For small teams and solo founders this gap was existential — they lacked either the design skill or the engineering skill to close it alone.


Cause

Multimodal Models Reached UI-Generation Quality

By early 2025, Gemini's multimodal capabilities had reached a threshold where they could reliably interpret both text descriptions and uploaded images of existing UIs, generating coherent layouts with appropriate component choices, spacing, and visual hierarchy. The Galileo acquisition gave Google a product layer that had already solved the prompt engineering, training data, and output format problems on top of that capability.


Solution

Stitch: Three Inputs, Gemini Core, Production-Grade Exports

Stitch accepted three input types simultaneously: natural language descriptions, uploaded reference images or screenshots, and annotated screenshots with modification notes. Gemini 2.5 Pro processed all three in a single context window. Export paths targeted real developer workflows: Figma files with editable layers and auto-layout, production-ready HTML/CSS, React components, and Vue code.
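To make "a single context window" concrete, the sketch below packs all three input types into one request body, using the field names of the public Gemini REST API (`contents`, `parts`, `inlineData`, `systemInstruction`, `generationConfig.responseMimeType`). It is an assumption about how a tool like Stitch could package its inputs, not Stitch's code: `buildDesignRequest`, `imagePart`, the labelling text, and the system instruction are hypothetical, and nothing is sent over the network.

```javascript
// Hypothetical sketch: package the three input types into ONE request so the
// model sees prompt, reference image, and annotated screenshot together.
// Field names follow the public Gemini REST request shape; nothing is sent here.
function imagePart(base64Data, mimeType = 'image/png') {
  return { inlineData: { mimeType, data: base64Data } };
}

function buildDesignRequest({ prompt, referenceImage, annotatedScreenshot, annotationNotes = '' }) {
  const parts = [{ text: prompt }];

  if (referenceImage) {
    parts.push({ text: 'Reference UI: match its visual style.' });
    parts.push(imagePart(referenceImage));
  }
  if (annotatedScreenshot) {
    // The notes sit directly before the screenshot they describe
    parts.push({ text: `Annotated screenshot. Apply these changes: ${annotationNotes}` });
    parts.push(imagePart(annotatedScreenshot));
  }

  return {
    // Assumed instruction; Stitch's real prompt engineering is not public
    systemInstruction: { parts: [{ text: 'You are a UI designer. Reply with a JSON component tree.' }] },
    contents: [{ role: 'user', parts }], // one user turn: all inputs share one context window
    generationConfig: { responseMimeType: 'application/json' },
  };
}

const request = buildDesignRequest({
  prompt: 'A habit tracker with a weekly streak view',
  referenceImage: 'iVBORw0KGgo=', // placeholder base64, not a real image
  annotatedScreenshot: 'iVBORw0KGgo=',
  annotationNotes: 'make the header darker',
});
console.log(request.contents[0].parts.length); // 5
```

Placing each annotation note as a text part immediately before its screenshot keeps the pairing explicit inside one turn, instead of relying on the model to connect inputs sent in separate calls.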


Result

I/O 2026: Streaming Agent + Multiplayer — Both Free

At I/O 2026, Google launched a streaming design agent that renders UI components onto the canvas in real time as a designer types or speaks, so the designer can correct course while generation is still running. Google also added simultaneous multi-user editing, directly matching Figma's flagship collaboration feature. Both are free; Figma's professional plan charges $15 per editor per month.


The Fix

The Technical Architecture: Gemini as the UI Design Engine

Stitch's core is not a purpose-built design model; it is Gemini 2.5 Pro with a specialised prompt engineering and output parsing layer on top. This explains both Stitch's strengths and its limitations. Stitch understands concepts like "glassmorphism," "material design," and "iOS Human Interface Guidelines" because Gemini was trained on documentation and examples of all of them. It generates usable React for the same reason: Gemini's training data includes large amounts of code, React among it.

  • 30s — sentence to complete mobile UI including navigation, components, and colour palette
  • 3 inputs — text prompt, reference image, annotated screenshot — single Gemini context window
  • 5-screen — simultaneous canvas rendering in Stitch 2.0, March 2026
  • $0 vs $15 — Stitch multiplayer vs Figma professional plan per editor per month

The I/O 2026 streaming agent is an architectural change, not just a speed improvement. Previous versions were turn-based: submit a prompt, wait for completion, review, resubmit. The streaming model replaces this with continuous render — components appear on canvas as they are generated, layouts reflow before generation finishes. The practical difference is the ability to steer mid-generation: if a layout is heading in the wrong direction, a designer can interrupt and redirect before it finishes. Voice input, integrated since March 2026, works within this same loop.

```javascript
// Turn-based vs streaming: the architectural difference in Stitch's I/O 2026 upgrade.
// Illustrative only: `stitch`, `canvas`, and `voiceInput` are hypothetical objects,
// not a published Stitch API.

// BEFORE (turn-based): the designer sees nothing until generation is fully done
async function generateUIOld(stitch, prompt) {
  const result = await stitch.generate(prompt); // blocking: full wait
  return result.screens; // e.g. [{ html, css, figmaLayers }]
}

// AFTER (streaming agent): real-time render plus mid-generation steering
async function generateUIStreaming(stitch, canvas, voiceInput, prompt) {
  const stream = stitch.generateStream(prompt);

  // Components render onto the canvas as they are generated
  stream.on('component', (component) => {
    canvas.renderPartial(component); // visible immediately, no waiting
  });

  // At each layout decision, apply any correction the designer has queued
  stream.on('layoutDecision', () => {
    const userFeedback = canvas.checkInterrupt();
    if (userFeedback) {
      stream.steer(userFeedback); // mid-generation course correction
    }
  });

  // Spoken commands feed the same steering loop while generation runs
  const onVoiceCommand = (cmd) => stream.steer(cmd);
  voiceInput.on('command', onVoiceCommand);

  try {
    await stream.complete();
  } finally {
    voiceInput.off('command', onVoiceCommand); // stop steering a finished stream
  }
  return canvas.getCurrentState();
}
```
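The client code above calls `stitch`, `canvas`, and `voiceInput` objects that are never defined, so it only illustrates the shape of the loop. The self-contained sketch below shows the same steering idea with a plain JavaScript generator standing in for the network stream: each `yield` hands one component to the canvas, and the value passed back through `next()` acts as a steering command for everything not yet generated. The component names, the `density` field, and the `'compact'` command are invented for illustration.

```javascript
// Hypothetical sketch: a generator stands in for a streaming design agent.
// Each yield emits one component; whatever the caller passes to next()
// is treated as a steering command applied to components not yet emitted.
function* designStream(plan) {
  let density = 'comfortable';
  for (const name of plan) {
    const command = yield { name, density }; // emit, then receive optional feedback
    if (command === 'compact') density = 'compact'; // steer the remainder
  }
}

// Consumer: render each component on arrival, steer once a chosen component appears
function renderWithSteering(plan, steerAfter, command) {
  const canvas = [];
  const stream = designStream(plan);
  let feedback; // undefined until the designer reacts to something on the canvas
  for (let step = stream.next(); !step.done; step = stream.next(feedback)) {
    canvas.push(step.value); // partial render: visible before generation ends
    feedback = step.value.name === steerAfter ? command : undefined;
  }
  return canvas;
}

const result = renderWithSteering(['header', 'streakChart', 'habitList', 'tabBar'], 'streakChart', 'compact');
console.log(result.map((c) => `${c.name}:${c.density}`).join(' '));
// header:comfortable streakChart:comfortable habitList:compact tabBar:compact
```

Steering only affects components not yet emitted: `header` and `streakChart` keep their original density because they were already on the canvas when the command arrived.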

The Galileo Acquisition Rationale

Google could have built Stitch from scratch using Gemini. It acquired Galileo instead because Galileo had already solved the hardest non-model problems: the prompt engineering approach that reliably produces coherent UIs, the output parser that converts model outputs into valid design tokens and component trees, and the UX model for iterative refinement. Rebuilding these would have taken months. The acquisition compressed that to days. Galileo's technology became the product layer; Gemini became the intelligence underneath it.
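The "output parser" is the piece that turns free-form model text into something a canvas can trust. The sketch below guesses at its general shape; it is not Galileo's parser. It pulls the outermost JSON object out of a model reply, checks each node against an allow-list of component types, and flags spacing values that are not design tokens. `ALLOWED_TYPES`, `SPACING_TOKENS`, and the node schema (`type`, `padding`, `children`) are hypothetical.

```javascript
// Hypothetical sketch of an output-parsing layer: model text in,
// validated component tree out. Allow-lists below are illustrative.
const ALLOWED_TYPES = new Set(['Screen', 'AppBar', 'Card', 'List', 'Button', 'Text', 'Image']);
const SPACING_TOKENS = new Set(['space-0', 'space-1', 'space-2', 'space-4', 'space-8']);

function extractJson(modelText) {
  // Models often wrap JSON in prose; keep the outermost {...} span
  const start = modelText.indexOf('{');
  const end = modelText.lastIndexOf('}');
  if (start === -1 || end < start) throw new Error('no JSON object in model output');
  return JSON.parse(modelText.slice(start, end + 1));
}

function validateNode(node, path, errors) {
  if (!ALLOWED_TYPES.has(node.type)) errors.push(`${path}: unknown type "${node.type}"`);
  if (node.padding !== undefined && !SPACING_TOKENS.has(node.padding)) {
    errors.push(`${path}: padding "${node.padding}" is not a spacing token`);
  }
  (node.children || []).forEach((child, i) => validateNode(child, `${path}.children[${i}]`, errors));
}

function parseComponentTree(modelText) {
  const tree = extractJson(modelText);
  const errors = [];
  validateNode(tree, 'root', errors);
  return { tree, errors }; // caller can re-prompt the model with `errors`
}

const reply = 'Here is the layout: {"type":"Screen","children":[{"type":"AppBar"},{"type":"Card","padding":"13px"}]}';
console.log(parseComponentTree(reply).errors[0]);
// root.children[1]: padding "13px" is not a spacing token
```

Collecting errors instead of throwing on the first violation lets the caller send them back to the model as a repair prompt, one plausible way to support iterative refinement.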

RLHF for UI quality: how Stitch reached 95% component rendering accuracy
Stitch's code export reached 95% component rendering fidelity in the March 2025 closed beta, up from roughly 70% in early estimates. The improvement came from RLHF (Reinforcement Learning from Human Feedback) applied specifically to UI generation quality. The beta involved 500+ partner users, including Vercel developers, who gave direct feedback on generated code quality and design accuracy. This domain-specific signal tuned Gemini's output toward the criteria professional designers and developers actually cared about: component naming, layout accuracy, code cleanliness, and design system compatibility.

Feature timeline, launch to I/O 2026:

| Date | Update | Key features added |
|---|---|---|
| May 20, 2025 | Google I/O launch | Single-screen generation, Figma export, HTML/CSS/React export |
| Jul–Aug 2025 | Public beta | Theme customisation, RTL language support |
| Dec 2025 | Stitch 2.0 preview | Prototypes (multi-screen flows), Gemini 3 integration |
| Mar 19, 2026 | Stitch 2.0 GA | Infinite canvas, 5-screen canvas, voice input, app-flow generation |
| May 20, 2026 | I/O 2026 | Streaming agent (real-time canvas render), multiplayer (both free) |

Architecture

Stitch's internal architecture has three distinct layers. The input layer processes multimodal inputs through Gemini 2.5 Pro — text prompts, reference images, and annotated screenshots are unified into a single context window. The generation layer produces an intermediate representation (an abstract, format-agnostic description of design intent — component hierarchy, spacing tokens, visual relationships — that can be translated into multiple output formats without losing design semantics) rather than raw HTML or Figma JSON directly. The export layer translates that IR into Figma-compatible JSON with proper component structure and auto-layout, production-grade React/HTML/CSS, and AI Studio integration configs.
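A toy version of the IR idea: one format-agnostic tree and two independent exporters. The IR schema, the `TOKENS` table, and both exporters are hypothetical and far simpler than real output; the Figma-style exporter only borrows real Figma node property names (`layoutMode`, `itemSpacing`, `characters`). Neither exporter knows the other exists, so supporting a third format means writing one more function over the same IR.

```javascript
// Hypothetical IR: design intent only (structure + tokens), no output format.
const TOKENS = { 'space-2': 8, 'space-4': 16 }; // spacing token -> pixels (illustrative)

const ir = {
  kind: 'stack',
  direction: 'vertical',
  gap: 'space-4',
  children: [
    { kind: 'text', role: 'title', content: 'Today' },
    { kind: 'button', content: 'Add habit' },
  ],
};

// Exporter 1: React JSX source as a string
function toJsx(node, indent = '') {
  if (node.kind === 'text') {
    const tag = node.role === 'title' ? 'h1' : 'p';
    return `${indent}<${tag}>${node.content}</${tag}>`;
  }
  if (node.kind === 'button') return `${indent}<button>${node.content}</button>`;
  const flexDirection = node.direction === 'vertical' ? 'column' : 'row';
  const kids = node.children.map((child) => toJsx(child, indent + '  ')).join('\n');
  const style = `{{ display: 'flex', flexDirection: '${flexDirection}', gap: ${TOKENS[node.gap]} }}`;
  return `${indent}<div style=${style}>\n${kids}\n${indent}</div>`;
}

// Exporter 2: Figma-style node JSON using auto-layout property names
function toFigmaNode(node) {
  if (node.kind === 'text') return { type: 'TEXT', characters: node.content };
  if (node.kind === 'button') {
    return { type: 'FRAME', name: 'Button', children: [{ type: 'TEXT', characters: node.content }] };
  }
  return {
    type: 'FRAME',
    layoutMode: node.direction === 'vertical' ? 'VERTICAL' : 'HORIZONTAL',
    itemSpacing: TOKENS[node.gap], // token resolved to pixels at export time
    children: node.children.map(toFigmaNode),
  };
}

console.log(toJsx(ir));
console.log(JSON.stringify(toFigmaNode(ir)));
```

Tokens stay symbolic in the IR and are resolved only inside each exporter, so a brand change to `TOKENS` flows into every output format at once.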

[Diagram] Before Stitch: the traditional design-to-development pipeline (interactive version on TechLogStack)

[Diagram] Google Stitch architecture: multimodal input to production output (interactive version on TechLogStack)


The Multiplayer Technical Challenge

Adding simultaneous multi-user editing to an AI-native canvas is harder than adding it to a traditional design tool. In Figma, multiplayer synchronises deterministic object operations using well-understood, CRDT-inspired semantics (a CRDT, or Conflict-free Replicated Data Type, is a data structure that lets multiple users edit concurrently and merges their changes automatically without conflicts). In Stitch, two users can simultaneously prompt the AI to modify the same canvas, producing non-deterministic outputs that may conflict visually. Google's implementation queues concurrent AI generation requests per canvas object and applies last-write-wins to AI-generated changes, while standard CRDT semantics apply to manual edits.
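The rule described above, per-object queuing plus last-write-wins for AI output, fits in a few lines. This is a minimal illustration under stated assumptions, not Google's implementation: `AiEditCoordinator` is hypothetical, each `generate` function stands in for a model call, timestamps come from an injectable clock (defaulting to `Date.now`), and ties are broken by client id so every replica resolves the same winner. Manual edits and their CRDT handling are out of scope here.

```javascript
// Hypothetical sketch: AI edits are serialized per canvas object and
// committed last-write-wins, ordered by (timestamp, clientId).
class AiEditCoordinator {
  constructor(clock = Date.now) {
    this.clock = clock;
    this.queues = new Map(); // objectId -> tail of that object's job chain
    this.state = new Map(); // objectId -> { value, ts, clientId }
  }

  // Run `generate` only after earlier AI requests for the same object settle
  enqueue(objectId, clientId, generate) {
    const tail = this.queues.get(objectId) || Promise.resolve();
    const job = tail.then(async () => {
      const value = await generate();
      this.commit(objectId, { value, ts: this.clock(), clientId });
      return value;
    });
    // A failed generation must not block later requests for this object
    this.queues.set(objectId, job.catch(() => {}));
    return job;
  }

  // Last-write-wins: newer timestamp wins; equal timestamps fall back to
  // clientId so every replica picks the same winner. Also usable for remote writes.
  commit(objectId, write) {
    const current = this.state.get(objectId);
    const wins =
      !current ||
      write.ts > current.ts ||
      (write.ts === current.ts && write.clientId > current.clientId);
    if (wins) this.state.set(objectId, write);
    return wins;
  }
}

// Usage: two users prompt changes to the same "hero" section at once
let tick = 0;
const coordinator = new AiEditCoordinator(() => ++tick);
Promise.all([
  coordinator.enqueue('hero', 'alice', async () => 'hero-v1'),
  coordinator.enqueue('hero', 'bob', async () => 'hero-v2'),
]).then(() => console.log(coordinator.state.get('hero').value)); // hero-v2
```

Serializing per object rather than per canvas lets two users prompt changes to different components in parallel while still preventing two generations from racing on the same one.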

The design quality ceiling: what Stitch still can't do
Stitch's core limitation remains consistent across all reviews: generated designs are starting points, not finished products. The AI produces layouts with appropriate components and reasonable visual hierarchy, but professional polish — precise spacing, custom illustration integration, brand-specific typography choices, edge-case state design (empty states, error states, loading states) — still requires human design expertise. Stitch is strongest for exploration and prototyping; weakest for production-ready UI that needs to meet professional brand standards.


Lessons

  1. Acquiring a specialised AI startup accelerates a product category by months, not weeks. Google had the models (Gemini) but not the product layer (Galileo). Galileo had the product layer but not the model quality or distribution. The acquisition combined both instantly. Teams building in AI-adjacent product categories should evaluate whether acquiring specialised AI startups is faster than building the application layer from scratch on top of foundation models.

  2. Intermediate representation between AI generation and format-specific output is the architecture that makes multi-format export viable. Generating React directly loses Figma compatibility. Generating Figma directly loses code usability. An IR exports to both, and to future formats not yet defined.

  3. Free with generous limits is a viable disruption strategy when the underlying AI cost is subsidised. Google can offer Stitch free because Gemini API calls are already budgeted across Google's infrastructure at marginal cost. Figma cannot match free without destroying its revenue model. This asymmetry is the structural moat Stitch is building: not feature parity, but a price of zero that a seat-based competitor cannot follow.

  4. Build the complement-not-replace narrative from day one. Sarah Drasner's explicit framing of Stitch as a Figma complement — not replacement — reduced designer resistance and encouraged adoption among professional users. Fighting the dominant tool's ecosystem directly creates adversarial resistance. Complementing it creates adoption.

  5. Streaming generation (delivering AI outputs progressively as they are computed) changes the product experience more profoundly than speed improvements do. A 30-second generation showing nothing for 28 seconds feels slow. A 30-second generation showing components appearing in real time and allowing mid-stream steering feels like collaboration. Same underlying model, fundamentally different user experience.


Engineering Glossary

CRDT (Conflict-free Replicated Data Type) — a data structure designed for distributed systems that allows multiple users to edit the same data concurrently without conflicts, automatically merging changes. In Stitch's multiplayer, CRDT semantics govern deterministic manual edits, while non-deterministic AI-generated changes are queued per object and resolved last-write-wins.

Gemini 2.5 Pro — Google's multimodal frontier model capable of processing text, images, audio, video, and code. Stitch uses it as the core reasoning engine for interpreting design intent and generating UI outputs.

Intermediate representation (IR) — an abstract, format-agnostic description of design intent — component hierarchy, spacing tokens, visual relationships — that can be translated into multiple output formats (Figma JSON, React, HTML/CSS) without losing design semantics.

RLHF (Reinforcement Learning from Human Feedback) — a training technique where human evaluators rate model outputs, and those ratings are used to fine-tune the model toward preferred outputs. Used by Stitch to improve component rendering fidelity from ~70% to 95% accuracy.

Streaming generation — delivering AI outputs progressively as they are computed, rather than waiting for the full generation to complete before showing any output. Enables mid-generation steering and real-time canvas rendering.

Vibe design — the practice of describing interface intent to an AI and refining the output iteratively, rather than building pixel by pixel. The AI design equivalent of vibe coding.


This case is a plain-English retelling of publicly available engineering material.



