CSS 3D Transforms: How Someone Built DOOM in a Browser Without WebGL [2026]
Somewhere around 2013, a web developer named Keith Clark published a demo that broke frontend engineers' brains. A fully navigable 3D world — corridors, walls, lighting, shadows — rendered entirely with CSS 3D transforms and HTML div elements. No <canvas>. No WebGL. No game engine. Just the browser's own rendering pipeline doing something nobody designed it to do.
I remember opening that demo and immediately popping open DevTools, convinced there had to be a hidden canvas somewhere. There wasn't. Every wall, floor, and ceiling was a transformed div. And it ran smoothly. That moment changed how I thought about what CSS could do. More specifically, it changed what I thought "rendering engine" actually meant.
The techniques behind this are more clever than most developers give them credit for. Here's how all of it works.
How Does CSS Create a 3D World From Flat Elements?
CSS 3D transforms rely on one core mathematical concept: perspective projection. When you set the perspective property on a container element, you're telling the browser how far the viewer is from the 2D plane of the screen. This single value creates a vanishing point and the illusion of depth. Objects closer to the "camera" appear larger. Objects farther away shrink. Same principle that makes railroad tracks appear to converge in the distance.
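To make the "closer is larger" rule concrete, here is a minimal sketch of the scaling that perspective projection produces. For a perspective distance d and an element translated by z along the Z axis, the apparent scale is d / (d - z). The helper name `perspectiveScale` and the 800px value are illustrative assumptions, not part of any browser API.

```typescript
// Apparent scale of an element moved along the Z axis under a CSS
// `perspective` distance d: scale = d / (d - z).
// `perspectiveScale` is a hypothetical helper, not a browser API.
function perspectiveScale(perspectivePx: number, translateZPx: number): number {
  if (translateZPx >= perspectivePx) {
    // The element sits at or behind the viewer, so it is not drawn.
    return NaN;
  }
  return perspectivePx / (perspectivePx - translateZPx);
}

const d = 800; // assumed perspective value in px
const nearScale = perspectiveScale(d, 400);  // halfway to the viewer: drawn at 2x
const farScale = perspectiveScale(d, -800);  // pushed away: drawn at 0.5x
```

An element left at z = 0 keeps its normal size, which is why perspective has no visible effect until something is moved or rotated out of the screen plane.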
But perspective alone only gets you halfway. The critical piece is transform-style: preserve-3d. As Keith Clark documented extensively in his writeups, without this property the children of a transformed element are flattened into their parent's plane. With it, child elements share a common 3D space. They can be rotated, translated, and positioned relative to each other in three dimensions.
What that means practically: you take a div, rotate it 90 degrees on the Y-axis, translate it 200 pixels forward on the Z-axis, and now you have a wall. Take four of those walls, add a floor (a div rotated 90 degrees on the X-axis), and you have a room. String rooms together into corridors, and you have a navigable 3D environment.
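The room recipe above can be sketched as a function that returns one transform string per surface. This is an illustration under stated assumptions, not Clark's code: the room is centred on the origin, every div is a square of `size` pixels that is itself centred on the origin before transforming, and the names `Surface` and `roomSurfaces` are hypothetical.

```typescript
// One entry per div: a label for readability and the CSS transform to apply.
type Surface = { label: string; transform: string };

// Assumes a square room centred on the origin, built from divs that are
// `size` px wide and tall and centred on the origin before transforming.
function roomSurfaces(size: number): Surface[] {
  const half = size / 2;
  // Each surface is rotated to face into the room, then pushed away from
  // the centre along its own (rotated) Z axis.
  return [
    { label: "back wall", transform: `translateZ(${-half}px)` },
    { label: "front wall", transform: `rotateY(180deg) translateZ(${-half}px)` },
    { label: "left wall", transform: `rotateY(90deg) translateZ(${-half}px)` },
    { label: "right wall", transform: `rotateY(-90deg) translateZ(${-half}px)` },
    { label: "floor", transform: `rotateX(90deg) translateZ(${-half}px)` },
  ];
}

const room = roomSurfaces(200);
```

Because CSS applies the functions in a transform list from left to right as nested coordinate systems, putting the rotation first means the translateZ that follows moves the div along its new facing direction, which is what pushes each wall outward.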
Every object in Clark's demo — barrels, walls, pillars — is built from rectangular div elements. Even seemingly rounded objects like barrels are just collections of narrow rectangles rotated around an axis, like the slats of a wooden barrel. Brute force geometry with HTML. And it works better than it has any right to.
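The barrel idea reduces to a little trigonometry. The sketch below spaces a given number of slats evenly around the Y axis and computes how wide each one must be for its edges to meet its neighbours. The function name, slat count, and radius are all illustrative assumptions, not values taken from Clark's demo.

```typescript
type Slat = { widthPx: number; transform: string };

// Approximates a cylinder with `count` flat slats around the Y axis.
// `radiusPx` is the distance from the axis to the centre of each slat.
function barrelSlats(count: number, radiusPx: number): Slat[] {
  const stepDeg = 360 / count;
  // Side length of a regular polygon with inner radius r: 2 * r * tan(pi / n).
  const widthPx = 2 * radiusPx * Math.tan(Math.PI / count);
  const slats: Slat[] = [];
  for (let i = 0; i < count; i++) {
    slats.push({
      widthPx,
      // Turn to face outward at this angle, then move out to the rim.
      transform: `rotateY(${i * stepDeg}deg) translateZ(${radiusPx}px)`,
    });
  }
  return slats;
}

const barrel = barrelSlats(8, 50);
```

More slats give a rounder silhouette at the cost of more elements, which is the trade-off behind the "brute force geometry" description.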
The Camera Trick That Makes It All Performant
Video: Creating 3D Worlds with HTML and CSS (YouTube, T84i4Jp3v3g)
This is the part that impressed me most as an engineer. In a naive implementation, you'd move the player through the world by updating the position of every single object in the scene. With hundreds of wall segments, floor tiles, and decorative objects, that's a massive number of DOM style recalculations per frame. Your browser would choke.
Clark's approach inverts this entirely: instead of moving the world's objects, you move a single "camera" element. The camera is just a wrapper div that contains the entire scene. When the player moves forward, the camera's transform property gets a new translate3d value. When the player turns, you update the rotateY value. One element changes. The browser's compositor handles the rest.
This mirrors how real game engines work: the world's geometry stays put, and a single view transform, the inverse of the camera's position and rotation, is applied to all of it. In the DOM context, it's also a critical performance win. One changed transform means one style recalculation instead of hundreds, no layout work at all, and a scene update the browser can fold into a single composite operation.
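Here is a minimal sketch of the wrapper's transform, assuming the player's state is a position plus a heading in degrees. The `Player` type, the `cameraTransform` name, the sign convention for turning, and the leading translateZ (which moves the pivot from the screen plane to the viewer's eye) are assumptions made for illustration, not details confirmed by Clark's writeup.

```typescript
type Player = { x: number; y: number; z: number; headingDeg: number };

// Builds the transform for the single wrapper element that holds the scene.
// The wrapper receives the inverse of the player's pose: the scene is shifted
// by the negated position, rotated by the negated heading, and finally moved
// forward by the perspective distance so the player sits at the viewer's eye.
function cameraTransform(p: Player, perspectivePx: number): string {
  return [
    `translateZ(${perspectivePx}px)`,
    `rotateY(${-p.headingDeg}deg)`,
    `translate3d(${-p.x}px, ${-p.y}px, ${-p.z}px)`,
  ].join(" ");
}

const wrapperTransform = cameraTransform(
  { x: 10, y: 0, z: -50, headingDeg: 90 },
  800,
);
```

In a page, the returned string would be assigned to the wrapper's style.transform once per frame, and no other element in the scene would need to change.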
I've shipped enough frontend features to know that DOM performance is all about minimizing what the browser needs to recalculate. If you've ever dealt with JavaScript bloat killing your web performance, you know how much a single unnecessary reflow can cost. The camera trick sidesteps that entire problem.
Why CSS 3D Transforms Are GPU-Accelerated (And Why That Matters)
The fact that CSS 3D transforms run on the GPU is not a nice-to-have. It's the entire reason this works at all.
When you apply a 3D transform to an element, browsers typically promote that element to its own compositing layer. The GPU then handles the matrix math (rotation, translation, perspective projection) in hardware. This is a completely different path from layout-driven animations, where the CPU has to recalculate element positions, check for reflows, and repaint pixels.
As documented by MDN contributors at Mozilla, the transformations behind rotate3d(), translate3d(), and perspective() are all 4x4 matrix operations. GPUs are purpose-built for exactly this kind of parallel floating-point math. A modern GPU can process thousands of these matrices simultaneously. That's why a scene with hundreds of transformed div elements can still hit 60fps.
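To show what those 4x4 matrices look like, the sketch below builds the matrices for translate3d() and rotateY() by hand, multiplies them, and prints the result in the column-major order that matrix3d() expects. The helper names are hypothetical. Browsers do this work internally, so this is for understanding rather than something a scene needs to ship.

```typescript
type Mat4 = number[]; // 16 values in column-major order, as matrix3d() expects

function translate(tx: number, ty: number, tz: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1];
}

function rotateY(deg: number): Mat4 {
  const a = (deg * Math.PI) / 180;
  const c = Math.cos(a);
  const s = Math.sin(a);
  return [c, 0, -s, 0, 0, 1, 0, 0, s, 0, c, 0, 0, 0, 0, 1];
}

// Matrix product a * b, the same composition as writing `a b` in a CSS
// transform list.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out: Mat4 = [];
  for (let i = 0; i < 16; i++) out.push(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

// Where the element's own origin ends up after the transform.
function transformedOrigin(m: Mat4): [number, number, number] {
  return [m[12], m[13], m[14]];
}

function toMatrix3d(m: Mat4): string {
  // Round away floating-point noise before serialising.
  return `matrix3d(${m.map((v) => Number(v.toFixed(6))).join(", ")})`;
}

// Equivalent to `rotateY(90deg) translateZ(-100px)`.
const wallMatrix = multiply(rotateY(90), translate(0, 0, -100));
```

Every function in a transform list collapses into one matrix like `wallMatrix`, so a long chain of rotations and translations costs the GPU the same single multiplication per element.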
This is also why CSS 3D transforms feel nothing like animating top and left properties with JavaScript. When only the transform of an already-composited element changes, the browser can skip layout and paint and go straight to composite. You can feel the difference.
Having worked on projects where we benchmarked bundler performance down to milliseconds, I know how much this GPU acceleration matters. The difference between 16ms and 32ms per frame, roughly 60fps versus 30fps, is the difference between smooth and nauseating in a first-person 3D scene.
The Biggest Limitation: No Z-Buffer, No Problem (Mostly)
Traditional 3D engines use a Z-buffer — a per-pixel depth map that determines which surface is visible when objects overlap. If two walls intersect, the Z-buffer ensures the closer surface is drawn on top, pixel by pixel.
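CSS gives you no per-pixel Z-buffer to rely on. Elements that share a preserve-3d context are ordered by the browser as whole surfaces rather than pixel by pixel, and anything that drops out of that context (because an ancestor flattens it, for example) falls back to standard stacking rules. This means overlapping 3D elements can produce visual glitches: a far wall might suddenly render on top of a near wall if the ordering isn't carefully managed.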
Clark and other CSS 3D experimenters work around this by controlling the DOM order of elements and using z-index strategically. For corridor-based environments like DOOM's, this is actually manageable. The level geometry is mostly convex spaces connected by portals (doorways), so you can sort the rendering order from back to front and get correct results most of the time.
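The back-to-front idea is the classic painter's algorithm, which can be sketched as a sort over surfaces by distance from the camera. This is an assumption about how such sorting could be done, not a description of Clark's implementation. Measuring from each surface's centre is a simplification that breaks down for large or intersecting surfaces, and all names here are hypothetical.

```typescript
type Point3 = { x: number; y: number; z: number };
type Wall = { id: string; centre: Point3 };

function squaredDistance(a: Point3, b: Point3): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Painter's algorithm: furthest surfaces first, so nearer ones paint later.
function backToFront(walls: Wall[], camera: Point3): Wall[] {
  return walls
    .slice() // leave the caller's array untouched
    .sort(
      (a, b) =>
        squaredDistance(b.centre, camera) - squaredDistance(a.centre, camera),
    );
}

// Maps each wall id to a z-index that follows the sorted paint order.
function zIndexById(walls: Wall[], camera: Point3): Record<string, number> {
  const result: Record<string, number> = {};
  backToFront(walls, camera).forEach((wall, index) => {
    result[wall.id] = index;
  });
  return result;
}

const order = zIndexById(
  [
    { id: "near", centre: { x: 0, y: 0, z: -100 } },
    { id: "far", centre: { x: 0, y: 0, z: -900 } },
    { id: "mid", centre: { x: 0, y: 0, z: -400 } },
  ],
  { x: 0, y: 0, z: 0 },
);
```

Re-sorting only when the player crosses into a new room, rather than every frame, is one way this could stay cheap in a corridor-style level.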
But this is also why you'll never see a full open-world game rendered in CSS. The stacking context approach falls apart with complex overlapping geometry. That's a fundamental architectural constraint, not a bug to be fixed.
What's the Difference Between CSS 3D Transforms and WebGL?
I get this question every time I show people these demos. The short answer: CSS 3D transforms manipulate DOM elements in 3D space using the browser's compositor. WebGL gives you raw access to the GPU's rendering pipeline through the <canvas> element, with programmable vertex and fragment shaders.
WebGL can render arbitrary triangulated meshes, apply custom lighting algorithms, handle transparency with depth sorting, and process millions of polygons per frame. CSS 3D transforms can position, rotate, and scale rectangular HTML elements in 3D space. That's it.
The CSS approach does have one genuine advantage: the elements remain interactive DOM nodes. You can attach event listeners, apply hover states, embed text, and use standard CSS for styling. In a WebGL scene, everything is painted pixels — you'd need to implement your own hit testing. For UI-heavy use cases like 3D product configurators, interactive data visualizations, or architectural walkthroughs, CSS 3D transforms are sometimes the more pragmatic choice.
Chris Coyier, co-founder of CodePen, explored this distinction in his Smashing Magazine piece on CSS 3D transforms, noting the importance of creating a "stage" or "scene" container where all 3D objects coexist. It's a mental model that sits somewhere between CSS layout and game development. Most frontend developers have never had to think in that space, which is part of why Clark's demo hit so hard.
Can You Actually Build a Real Game With CSS 3D Transforms?
Sort of. You can build a surprisingly convincing demo. Clark's work proved that corridor-based level geometry, lighting, shadows, and even collision detection are all achievable. His demo included proper directional lighting calculated by computing surface normals from the element orientations — the math, as he admitted, "nearly broke" him.
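The lighting step can be illustrated with simple Lambertian shading: derive a surface normal from the element's rotation, then take its dot product with the direction to the light. This sketch handles only walls rotated around the Y axis. The function names and the ambient term are assumptions for illustration, and it is not Clark's actual code.

```typescript
type Vec3 = [number, number, number];

// Normal of a surface that starts facing +Z (toward the viewer) and is then
// rotated with rotateY(deg).
function normalAfterRotateY(deg: number): Vec3 {
  const a = (deg * Math.PI) / 180;
  return [Math.sin(a), 0, Math.cos(a)];
}

function normalize(v: Vec3): Vec3 {
  const len = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  if (len === 0) return [0, 0, 0];
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Lambertian brightness in the range [ambient, 1].
// `toLight` points from the surface toward the light source.
function brightness(normal: Vec3, toLight: Vec3, ambient: number = 0.2): number {
  const n = normalize(normal);
  const l = normalize(toLight);
  const dot = n[0] * l[0] + n[1] * l[1] + n[2] * l[2];
  const diffuse = Math.max(0, dot); // surfaces facing away get no direct light
  return Math.min(1, ambient + (1 - ambient) * diffuse);
}

const facingLight = brightness(normalAfterRotateY(0), [0, 0, 1]);
const facingAway = brightness(normalAfterRotateY(180), [0, 0, 1]);
```

The resulting value could drive something like a dark overlay's opacity on each div, which is one plausible way to turn the number into visible shading.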
But "demo" and "game" are different things. A real game needs reliable depth sorting for complex scenes, efficient culling of off-screen geometry, texture mapping, particle effects, and audio synchronization. CSS gives you none of that natively. You'd be rebuilding a game engine from scratch using DOM manipulation, fighting the browser's layout engine every step of the way.
Here's the honest answer: CSS 3D transforms are brilliant for what they are, a way to create spatial interfaces and mind-bending demos that push the platform to its absolute limits. If you want to ship an actual game in the browser, reach for WebGL, Three.js, or one of the WebGPU-based engines that are maturing rapidly.
But if you want to deeply understand how 3D rendering works — perspective projection, camera transforms, surface normals, depth sorting — building a scene with CSS is one of the best educational exercises I know of. You're forced to think about each concept individually because there's no engine abstracting it away. When I mentored a junior engineer through a similar exercise using reverse engineering as a learning tool, the depth of understanding they gained was way beyond what any tutorial could have provided.
The Boring Answer Is the Right One
This is one of those things where the boring answer is actually the right one. CSS 3D transforms won't replace WebGL or game engines. They're not meant to. But they reveal something about the web platform that most developers miss: the rendering primitives we use every day for layout and animation are far more powerful than we give them credit for.
The perspective property isn't just for card-flip animations. preserve-3d isn't just for parallax scroll effects. Put them together with intent and you can simulate a first-person 3D environment that runs at 60fps on hardware-accelerated browsers.
If you've never cracked open DevTools on one of these demos and inspected the elements, do it. Watch how the transforms update as you move through the scene. See how every surface is just a div with a rotation and a translation. It'll make you reconsider what "just CSS" really means.
Next time someone tells you CSS is "just for styling," send them a DOOM corridor built entirely from div elements. Then watch their face when they open the inspector.
Originally published on kunalganglani.com