Manikant Kella

Posted on May 23

Google I/O 2026 quietly ended a 20-year-old web problem — meet the HTML-in-Canvas API

#devchallenge #googleiochallenge #webdev

This is a submission for the Google I/O Writing Challenge

TL;DR

Among the Gemini 3.5, Antigravity 2.0, and Android XR headlines from Google I/O 2026, Chrome's engineering team quietly shipped an origin trial for an API that resolves a tradeoff every web developer has lived with for two decades: DOM for rich, semantic, accessible UI, or Canvas for fast 3D and pixel-level graphics — pick one.

The new HTML-in-Canvas API lets you draw live DOM elements directly into a 2D canvas, a WebGL texture, or a WebGPU texture, while preserving accessibility, find-in-page, translation, dark mode, autofill, browser extensions, and DevTools inspection. The DOM elements remain real DOM elements — they just render inside the canvas.

That sentence sounds simple. It isn't. This is one of the most consequential changes to the web platform in years, and it got a single bullet point in the dev keynote recap.

This post is the deep-dive that bullet point deserved.

The choice nobody wanted to make

If you've ever built anything ambitious in the browser, you've hit this wall.

Want a beautiful 3D product configurator with real text labels users can copy-paste? Either you live in the DOM and give up on 3D, or you live in WebGL and re-implement font layout, text selection, and accessibility from scratch.

Want a Figma-like canvas with rich form controls inside? You either go full DOM (slow once you have thousands of nodes) or you go canvas and accept that screen readers see a black hole where your UI used to be.

Want a WebXR scene that includes a tooltip the browser can actually translate to the user's language? Tough luck. Canvas is a pixel grid. The browser's translation engine doesn't speak in pixels.

This is the choice the Chrome team's official I/O 2026 blog post names directly:

"For years, web developers have had to make a tough architectural choice when building complex, highly-interactive visual applications on the web: do you lean on the DOM for its rich semantic features, or do you render directly to the <canvas> element for low-level graphics performance?"

The honest answer until I/O 2026 was: both choices were wrong, you picked the one that was less wrong for your app, and you spent the rest of the project apologizing for it.

Look at how the industry coped:

html2canvas / dom-to-image: thousands of GitHub stars, built around taking screenshots of the DOM and dumping them as static images. CORS-restricted. Video and SVG render blank. Slow on complex pages. The output is dead — no interactivity, no accessibility, no text selection.
Figma, Google Docs, Miro: all built bespoke text-layout engines in JavaScript on top of canvas because the DOM couldn't keep up at scale. Beautiful products. Multi-megabyte bundle sizes. Accessibility implementations that had to mirror everything into hidden DOM trees for screen readers.
WebGL frameworks like Three.js and PlayCanvas: an entire cottage industry of HUD libraries, in-scene UI systems, billboard text renderers — none of which inherited any of the DOM's accessibility, find-in-page, translation, or autofill behavior.

Every "solution" was a workaround for the same architectural sin: the browser had two rendering pipelines that didn't speak to each other.

What HTML-in-Canvas actually does

The pitch in one sentence: the <canvas> element can now have HTML children, and the canvas can paint them anywhere it wants, while the browser keeps treating them as real DOM elements.

You don't snapshot the DOM. You don't proxy events. The element is the element — selectable, focusable, accessible, find-in-page-able, extension-friendly, autofilled-by-Chrome — and your canvas is just drawing it at the coordinates you want, with the transform you want, in the texture you want.

Three rendering paths ship at once:

2D Canvas — ctx.drawElementImage(element, x, y) returns a CSS transform you apply back to the DOM element so events line up.
WebGL — gl.texElementImage2D(target, level, internalFormat, format, type, element) works like texImage2D but accepts a DOM element as the source. The element ends up as a real WebGL texture you can map onto any geometry.
WebGPU — device.queue.copyElementImageToTexture(element, { texture }) does the equivalent for WebGPU.

All three keep the DOM element alive underneath the rendered output. Click in the right spot and the real <button> gets the click. Press Cmd+F (Ctrl+F on Windows and Linux) and your text gets highlighted. Ask Chrome to translate the page and the labels in your 3D scene change language. Enable a dark-mode extension and your in-canvas form follows along.
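Since all three entry points are experimental, it seems sensible to feature-detect them before wiring anything up. Here's a minimal sketch, assuming the method names listed above are what the browser exposes. detectHtmlInCanvas is a name I made up for illustration, not part of the API.

```javascript
// Feature-detect the three HTML-in-Canvas entry points named above.
// Each argument is optional; pass whichever contexts your app already has.
// detectHtmlInCanvas is a hypothetical helper, not part of the API.
function detectHtmlInCanvas({ ctx2d = null, gl = null, gpuQueue = null } = {}) {
  const has = (obj, method) => obj != null && typeof obj[method] === "function";
  return {
    canvas2d: has(ctx2d, "drawElementImage"),
    webgl: has(gl, "texElementImage2D"),
    webgpu: has(gpuQueue, "copyElementImageToTexture"),
  };
}
```

In a page you'd call it with something like detectHtmlInCanvas({ ctx2d: canvas.getContext("2d") }) and branch on the result, so browsers without the flag keep whatever fallback UI you already have.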

The smallest possible demo

Set the layoutsubtree attribute on a <canvas> and put real HTML inside it:

<canvas id="canvas" style="width: 200px; height: 200px;" layoutsubtree>
  <div id="form_element">
    <label for="name">Name:</label>
    <input id="name" type="text" />
  </div>
</canvas>

Then on every paint, draw the element and re-sync its transform so click/focus targets stay correct:

const canvas = document.getElementById("canvas");
const ctx = canvas.getContext("2d");
const form_element = document.getElementById("form_element");

canvas.onpaint = () => {
  ctx.reset();

  // Draw the form element at (0, 0) inside the canvas
  const transform = ctx.drawElementImage(form_element, 0, 0);

  // Keep the underlying DOM hit-test region aligned with the painted output
  form_element.style.transform = transform.toString();
};

That's the entire API surface for 2D mode. A real <input> lives inside a real <canvas>. Autofill works. Tab order works. Screen readers find it. Find-in-page finds it. None of those sentences were true on the web a month ago.

The layoutsubtree attribute is the key signal — it tells the browser "this canvas has DOM children that are alive; lay them out, expose them to the a11y tree, but let me decide where their pixels go."
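Real UIs usually have more than one child to place. Here's a rough sketch of how the same draw-then-sync step could be looped over several elements. It assumes, as the snippet above does, that ctx.drawElementImage(element, x, y) returns a transform whose toString() is a valid CSS transform. createPaintHandler and the placements array are my own illustrative names.

```javascript
// createPaintHandler is a hypothetical helper. It assumes, as in the snippet
// above, that ctx.drawElementImage(element, x, y) returns a transform whose
// toString() yields a CSS transform value.
function createPaintHandler(ctx, placements) {
  return function onPaint() {
    if (typeof ctx.drawElementImage !== "function") {
      return; // API unavailable: leave the children untouched
    }
    ctx.reset();
    for (const { element, x, y } of placements) {
      const transform = ctx.drawElementImage(element, x, y);
      // Keep hit-testing aligned with where the pixels were painted
      element.style.transform = transform.toString();
    }
  };
}
```

Usage would look like canvas.onpaint = createPaintHandler(ctx, [{ element: form_element, x: 0, y: 0 }]). The handler deliberately returns nothing, because returning false from an event handler property can cancel the event.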

The WebGL path — where it gets fun

The really interesting use case isn't 2D — it's mapping live DOM onto 3D meshes. Here's the WebGL primitive:

canvas.onpaint = () => {
  // Like texImage2D, this uploads into whichever texture is bound to gl.TEXTURE_2D
  if (gl.texElementImage2D) {
    gl.texElementImage2D(
      gl.TEXTURE_2D,
      0,
      gl.RGBA,
      gl.RGBA,
      gl.UNSIGNED_BYTE,
      form_element  // <-- this is a DOM element, not an HTMLImageElement
    );
  }
};

If you've written WebGL before, look at that line again. The texture source is a <div>. You can put it on a cube, a sphere, a 3D book page, a curved billboard. The text stays selectable. Find-in-page still highlights it on the 3D mesh. The browser's translation engine can localize it in place.
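The primitive on its own leaves out the texture setup around it. Here's a sketch of what a full upload step might look like, assuming the texElementImage2D signature quoted earlier and a texture you created beforehand with gl.createTexture(). uploadElementTexture is a hypothetical helper name. Everything other than the texElementImage2D call is standard WebGL.

```javascript
// uploadElementTexture is a hypothetical helper. It assumes the
// gl.texElementImage2D signature quoted above and that `texture` was
// created earlier with gl.createTexture().
function uploadElementTexture(gl, texture, element) {
  if (typeof gl.texElementImage2D !== "function") {
    return false; // API unavailable; caller can fall back to a static texture
  }
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texElementImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, element);
  // Element sizes are rarely powers of two, so avoid mipmaps and repeat wrapping
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  return true;
}
```

The boolean return lets the caller decide what to draw when the flag is off, instead of silently sampling an empty texture.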

There's a price: getting events to line up in 3D space is harder than in 2D. The on-screen position of a textured element depends on your shader's model-view-projection (MVP) matrix, and the browser can't deduce it from your draw call. So when you need hit-testing to follow a 3D-transformed element, you build a DOMMatrix that carries the element's CSS pixel box through your MVP matrix and back out to canvas pixels, then hand it to canvas.getElementTransform(element, screenSpaceTransform):

if (canvas.getElementTransform) {
  // 1. WebGL MVP matrix → DOM matrix
  const mvpDOM = new DOMMatrix(Array.from(htmlElementMVP));

  // 2. Normalize the HTML element (pixel size → 1x1 unit square)
  const width = targetHTMLElement.offsetWidth;
  const height = targetHTMLElement.offsetHeight;
  const cssToUnitSpace = new DOMMatrix()
    .scale(1 / width, -1 / height, 1)        // shrink + flip Y
    .translate(-width / 2, -height / 2);     // center

  // 3. Map clip space back to the actual canvas viewport in pixels
  const clipToCanvasViewport = new DOMMatrix()
    .translate(canvas.width / 2, canvas.height / 2)
    .scale(canvas.width / 2, -canvas.height / 2, 1);

  // 4. Compose: viewport · MVP · normalize
  const screenSpaceTransform =
    clipToCanvasViewport.multiply(mvpDOM).multiply(cssToUnitSpace);

  // 5. Tell the browser where the element actually sits on screen
  const computedTransform = canvas.getElementTransform(
    targetHTMLElement,
    screenSpaceTransform
  );
  if (computedTransform) {
    targetHTMLElement.style.transform = computedTransform.toString();
  }
}

The first time I read that snippet I thought "that's a lot of matrix math for one element." Then I realized — this is the math you were doing implicitly anyway to position your fake HUD over your WebGL scene. The difference is now it's blessed by the browser, so the DOM element actually lives at that 3D location and hit-testing works.
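To convince myself the chain does what the comments say, it helps to trace a single point through it by hand. This sketch redoes the same three stages with plain arrays instead of DOMMatrix, so it runs anywhere. elementPixelToCanvasPixel is my own illustrative function, and it assumes a column-major 16-number MVP, which is how WebGL stores matrices.

```javascript
// Hypothetical helper: traces one element pixel through the same three
// stages as the snippet above, using plain arrays instead of DOMMatrix.
// `mvp` is a 16-number column-major matrix, as WebGL stores it.
function elementPixelToCanvasPixel(mvp, elementSize, canvasSize, px, py) {
  // Stage 1: element pixels -> centered unit square, Y flipped to point up
  const ux = (px - elementSize.width / 2) / elementSize.width;
  const uy = -(py - elementSize.height / 2) / elementSize.height;

  // Stage 2: apply the MVP matrix (column-major) to the point (ux, uy, 0, 1)
  const cx = mvp[0] * ux + mvp[4] * uy + mvp[12];
  const cy = mvp[1] * ux + mvp[5] * uy + mvp[13];
  const cw = mvp[3] * ux + mvp[7] * uy + mvp[15];

  // Perspective divide: clip space -> normalized device coordinates
  const ndcX = cx / cw;
  const ndcY = cy / cw;

  // Stage 3: NDC (-1..1, Y up) -> canvas pixels (Y down)
  return {
    x: canvasSize.width / 2 + ndcX * (canvasSize.width / 2),
    y: canvasSize.height / 2 - ndcY * (canvasSize.height / 2),
  };
}
```

With an identity MVP, a 200x100 element, and an 800x600 canvas, the element's center (100, 50) lands on the canvas center (400, 300) and its top-left corner (0, 0) lands at (200, 150). In other words, the unit quad fills the middle half of the canvas, and the two Y flips cancel so top stays top.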

If you don't want to write the math yourself, Three.js and PlayCanvas already offer wrappers, still experimental on the Three.js side.

Three.js in one line

The Three.js team merged experimental support with a new HTMLTexture class. Mapping a DOM element onto a cube becomes this:

import * as THREE from "three";

const material = new THREE.MeshBasicMaterial();
material.map = new THREE.HTMLTexture(uiElement);  // pass any DOM element

const geometry = new THREE.BoxGeometry(1, 1, 1);
const mesh = new THREE.Mesh(geometry, material);
scene.add(mesh);

If you've used Three.js, you know how striking that is. A texture that's a live, accessible, find-in-page-able, browser-translatable DOM element. PlayCanvas has the equivalent. The framework boilerplate is gone.
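Because the support is experimental, the class may not exist in whichever Three.js build you have installed. Here's a small guarded sketch, assuming only what's stated above: that THREE.HTMLTexture takes a DOM element. createElementMaterial is a hypothetical helper, and the gray placeholder color is an arbitrary choice of mine.

```javascript
// createElementMaterial is a hypothetical helper. `THREE` is the imported
// three.js namespace; HTMLTexture is the experimental class named above and
// may be missing from the build you have installed.
function createElementMaterial(THREE, element) {
  if (typeof THREE.HTMLTexture === "function") {
    return new THREE.MeshBasicMaterial({ map: new THREE.HTMLTexture(element) });
  }
  // No HTML texture support: show a flat placeholder color instead
  return new THREE.MeshBasicMaterial({ color: 0x808080 });
}
```

Passing the namespace in as an argument keeps the helper easy to test and avoids a hard failure at construction time on older builds.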

What this unlocks that wasn't possible

Let me try to be concrete instead of hand-wavy. Here are five things you genuinely could not build before this API, broken down by what specifically changes:

1. WebXR / 3D product configurators with real, accessible text.
The 3D Pottery Barn sofa demo on a phone could finally have a label that screen readers can read, that Chrome's auto-translate localizes for German users, and that find-in-page highlights when someone searches "leather." Today, all three of those features silently fail inside WebGL textures.

2. Figma-class apps that don't ship a 4MB text engine.
Anything in the canvas-app category — Figma, Miro, Whimsical, Lucid, Photopea — built its own text layout, IME handling, font fallback, copy-paste, and accessibility shadow tree on top of canvas. With HTML-in-Canvas, the layout engine is the browser's. Bundle size goes down. CJK and bidirectional text become free. Accessibility stops being a parallel mirror you have to maintain.

3. WebGL games where the in-world terminal is a real <textarea>.
Every time I've seen a "computer terminal" inside a 3D game on the web, it's been faked with canvas text drawing and a hidden DOM element absorbing keypresses. Now the terminal can be a <textarea>, mapped onto the in-game CRT, with autofill, undo/redo, IME, and clipboard all just working.

4. AI-agent-ready 3D scenes.
This is the part most coverage misses. The Chrome team explicitly calls out indexability and AI agents as a use case: web crawlers and AI agents can now read the text rendered into 2D and 3D scenes because it's still in the DOM. When you combine HTML-in-Canvas with the other underrated I/O 2026 announcement (WebMCP), suddenly canvas-driven web apps become first-class agent surfaces. They were second-class for 20 years.

5. Translatable, dark-mode-aware, extension-friendly 3D experiences.
A user with a Chrome dictionary extension installed gets word definitions when they highlight text inside your 3D scene. A user with prefers-color-scheme: dark gets a dark UI inside your WebXR app. Nobody had to do anything to enable these. They just inherit them now, because the DOM was always responsible for them and the DOM never left.

What I'd build first

Two ideas I can't stop thinking about:

The translatable 3D book. Chrome has a demo where a WebGL-rendered book has pages that are real DOM. Users can change the font with CSS. The browser's translation feature works on the actual page content. This isn't a tech demo — this is the future of edtech, immersive journalism, and museum web experiences. Build one for a specific use case (kids' anatomy book, historical document, recipe book) and you have a portfolio piece that didn't exist a week ago.

A "real form" inside a WebGPU jelly slider. Chrome's WebGPU jelly slider demo shows an <input type="range"> refracting through a 3D jelly material while still responding to step and keyboard arrows. That's the killer pattern: take a piece of HTML you'd find on any boring form and put it inside a WebGPU effect that previously would have required you to give up form semantics entirely. Replace "jelly" with "frosted glass," "paper," "liquid metal" — every brand-marketing site shipping with WebGL right now becomes a candidate.

What I'm skeptical about

This is genuinely exciting. I still have three concerns I'd want to see addressed before I ship anything load-bearing on this API:

1. Chrome-only, behind a flag in Canary, and the origin trial spans only M148–M151.
The Intent to Experiment trail confirms a four-version window. At Chrome's four-week release cadence, that's roughly three to four months. After that the API either becomes stable, gets pushed for another OT, or, as has happened with earlier trials, gets reshaped enough that early adopters have rewrites coming. There's no Firefox or Safari signal yet. The WICG explainer is healthy, but "WICG" doesn't mean "standardized." Plan for change.

2. Main-thread scrolling inside the canvas isn't a free win.
This is buried in the docs, but it matters: HTML-in-Canvas content is drawn from JavaScript, so scrolling and animations inside the canvas can't be off-the-main-thread the way ordinary DOM scrolling can. If you put a long scrollable list inside a canvas, every scroll event walks through your paint handler. Sometimes that's fine. Sometimes it's a 60fps → 24fps cliff you didn't see coming. Profile early.

3. Cross-origin content is blocked.
For security reasons, the API doesn't work with cross-origin iframes. That's the right call — letting one origin paint another origin's content into its own canvas would be an obvious info-leak vector. But it does mean every "embed a third-party widget into my WebXR scene" idea is dead on arrival. Plan to ship same-origin or proxy through your own server.
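On the second concern, "profile early" can be as cheap as wrapping your paint handler in a timer. Here's a minimal sketch. withPaintBudget is a hypothetical helper, the 16.7 ms default is just one frame at 60fps, and the now option exists only so the clock can be swapped out in tests.

```javascript
// withPaintBudget is a hypothetical helper that wraps any paint handler and
// reports paints that exceed a time budget. `now` defaults to
// performance.now() and is injectable for testing.
function withPaintBudget(
  handler,
  { budgetMs = 16.7, onSlowPaint, now = () => performance.now() } = {}
) {
  return function timedPaint(...args) {
    const start = now();
    handler.apply(this, args);
    const elapsed = now() - start;
    if (elapsed > budgetMs && typeof onSlowPaint === "function") {
      onSlowPaint(elapsed);
    }
  };
}
```

You'd wrap an existing handler with something like canvas.onpaint = withPaintBudget(myPaint, { onSlowPaint: (ms) => console.warn("slow paint", ms) }) and watch for warnings while scrolling a long list inside the canvas.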

How to actually try it this weekend

If you want to play with this today:

1. Install Chrome Canary (you want at least 149).
2. Navigate to chrome://flags/#canvas-draw-element and enable it. Relaunch.
3. Clone the WICG html-in-canvas examples and open one of them.
4. Open the Chrome demos page to see what's possible end-to-end: the 3D book, the animated billboard, the refractive fluid prism text.
5. Sign up for the origin trial if you want to expose it to real users on your origin during M148–M151.
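Once you've registered for the origin trial, Chrome needs to see your token on the page. One common way to supply it is a meta tag with http-equiv="origin-trial", which can also be injected from script. A minimal sketch follows. addOriginTrialToken is a hypothetical helper, and the token is a placeholder for whatever the registration page issues for your origin.

```javascript
// addOriginTrialToken is a hypothetical helper. The token string is a
// placeholder for the one issued when you register your origin.
function addOriginTrialToken(doc, token) {
  const meta = doc.createElement("meta");
  meta.httpEquiv = "origin-trial";
  meta.content = token;
  doc.head.append(meta);
  return meta;
}
```

In a page this would be called as addOriginTrialToken(document, "YOUR_TOKEN_HERE") before any code that touches the new canvas methods.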

If you're a Three.js or PlayCanvas dev: jump straight to THREE.HTMLTexture or PlayCanvas's HTML texture support. The framework wrappers turn a 50-line matrix exercise into one line of code.

The big picture

Almost every I/O 2026 recap I've read leads with Gemini 3.5 and Antigravity 2.0. Those are the right headlines for an AI conference. But the announcements that quietly change what the web is capable of tend to come from the platform team, not the model team. WebMCP got coverage because it has "MCP" in the name. HTML-in-Canvas got one bullet because "DOM in canvas, but better" doesn't fit on a slide.

Here's what's actually happening, in one sentence: the browser is no longer forcing you to choose between semantic web and graphical web.

That's a 20-year-old constraint dissolving in real time. The first generation of apps built on this stack — the WebGL stores that read like text, the 3D book your kid can actually use a screen reader on, the canvas dashboards that finally have native accessibility — won't ship this month. They'll ship over the next 18 months as the origin trial matures and Three.js / PlayCanvas integrations stabilize.

But the work to be one of the people who shipped them starts now. Today. With a Chrome Canary flag and one weird layoutsubtree attribute on a <canvas>.

That's the announcement I think actually matters from Google I/O 2026.

If you build anything with HTML-in-Canvas — especially in WebGL or WebGPU — I'd love to see it. Drop a link in the comments.

Find me on GitHub or LinkedIn.
