From Skia to Lume: writing my own 2D rendering engine for Vel

#vel #gpu #rendering #webgpu

Vel is about a week old. I started it as a DSL plus framework experiment, and from day one the rendering substrate was Skia. That wasn't an accident. Skia is the most complete 2D rendering API you can drop into a C++ project today. Clean canvas surface, built-in text with system font fallback, image decode, a GPU backend that already works on every desktop OS. If you want a UI framework drawing pixels by the end of the week, Skia is what you reach for.

But the plan was always to replace it. Skia is a brilliant CPU-rasterization library bolted to a GPU backend, and as soon as you push it hard, the bolts show. Flutter publicly battled this same class of problems for years before they shipped Impeller and finally got rid of the runtime shader-compilation jank that made early Flutter apps stutter. I'd rather not repeat their story. So once the DSL was working and the framework was responding to my changes the way I wanted, I started writing the renderer I actually needed.

The new engine is called Lume. It lives in engine/ of the Vel repo. This post is about why I started with Skia, why I'm replacing it now, and what Lume does differently.

Why Skia first

Skia gave Vel three things I needed in the first few days of having a framework at all:

A clean SkCanvas API the widget pipeline could draw into without me writing any GPU code.
A working text renderer (CoreText on macOS, FreeType elsewhere, all behind the same API) with system font fallback.
Image decode plus GPU upload as table-stakes, so Image widgets just worked.

That let me focus on the actual hard problem of the framework: the DSL and the reactive substrate. Layout, signals, hot-reload, event dispatch, the widget registry. The rendering substrate didn't need to be mine yet. Skia was a load-bearing dependency for exactly the amount of time it took the rest of the system to stop being the bottleneck.

Why I'm replacing it now

Once the DSL and framework were in shape, I had a clear view of what the renderer was actually doing for me, and what it was going to cost as the surface area grew. Three things, all well-known to anyone who's tried to ship a Skia-based UI runtime at scale.

1. Shader compilation jank. Skia compiles a shader the first time it encounters a new primitive, inside the very frame that wants to draw it. The first time you open a dialog with a blur, you pay 40 to 120 ms while Skia builds the right shader for the GPU. Flutter spent years trying to predict and pre-warm these (the infamous SkSL shader warm-up) and never fully won. Flutter's own write-ups on Impeller name shader-compilation jank as a primary reason the new engine exists.

2. Tessellation on the CPU. Skia turns rounded rectangles, strokes, and curves into triangle meshes on the CPU, then ships them to the GPU. For one card it's free. For a table of 200 rows with rounded corners and hover highlights, the CPU is doing a lot of work that a fragment shader could do once and for all with an analytic SDF.

3. The framework didn't own its render path. This was the real one. Every cross-cutting question I expected to hit later (popovers clipping inside scroll views, text positioning in tight cells, atlas eviction policy, draw order across overlays, HiDPI handling) was eventually going to bottom out in Skia's behavior, and the answer was always going to be "work around it." When you don't control the rendering substrate, every one of those concerns is something you negotiate with a library that doesn't know what your widgets are.

I'm not the first person to land here. Flutter, Servo, Bevy's UI work, Slint: every team building a rendering-heavy UI runtime has eventually concluded that owning the engine is the only way to make the rest of the system answer to one design. The cheapest time to do it is before you have a year of code depending on someone else's render path.

What I borrowed from Flutter

Impeller's defining decision is ahead-of-time shader compilation. Every shader the engine could ever need is compiled at build time (to Metal shader libraries or SPIR-V for Vulkan) and bundled with the binary. The "first render is slow" problem goes away because no shader is ever compiled for the first time at runtime. Every shader has already been seen.

That insight was the foundation. The other thing I borrowed: keep the pipeline list small. Impeller has on the order of a dozen pipelines, not hundreds. The way you do that is by reducing every primitive you draw to a small set of canonical shapes (rounded rects with optional ring strokes, textured quads, line segments) and varying their behavior through uniforms, not new shaders.

Lume's pipeline count today is four:

Shape: analytic SDF rounded-rect. Fills, strokes, circles, lines, soft shadows all collapse to this.
Line: per-segment rotated quad with butt caps for polylines.
Text: textured quad sampling an R8 glyph atlas.
Image: textured quad sampling RGBA8 with corner-radius mask.

Every shape in the Vel showcase is one of those four primitives. A roundedFill is the shape pipeline with strokeWidth=0, radius=R. A shadowRect is the same pipeline with blur>0, which switches the fragment shader to a smoothstep falloff instead of the AA clamp. A circleStroke is a shape with radius=w/2. The instance attributes do the heavy lifting; the GPU just rasterizes.
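To make the "one pipeline, many shapes" idea concrete, here is a CPU-side C++17 sketch of what a rounded-rect SDF fragment shader computes per pixel. It is not Lume's WGSL: the ShapeInstance layout, the function names, and the exact stroke and falloff formulas are assumptions chosen for illustration. Only the strokeWidth / radius / blur switches come from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical per-instance attributes (names are assumptions, not Lume's).
struct ShapeInstance {
    float halfW, halfH;  // half extents of the rect, in pixels
    float radius;        // corner radius; radius == halfW == halfH gives a circle
    float strokeWidth;   // 0 = fill, > 0 = ring stroke drawn inside the edge
    float blur;          // 0 = crisp AA edge, > 0 = soft shadow falloff
};

// Signed distance from (px, py), measured from the rect centre, to a rounded
// rectangle. Negative inside, zero on the edge, positive outside.
inline float sdRoundedRect(float px, float py, float halfW, float halfH, float r) {
    assert(r >= 0.0f && r <= std::min(halfW, halfH));
    float qx = std::fabs(px) - (halfW - r);
    float qy = std::fabs(py) - (halfH - r);
    float ox = std::max(qx, 0.0f);
    float oy = std::max(qy, 0.0f);
    return std::sqrt(ox * ox + oy * oy) + std::min(std::max(qx, qy), 0.0f) - r;
}

// Same curve as the shader builtin smoothstep(e0, e1, x).
inline float smoothstep01(float e0, float e1, float x) {
    float t = std::clamp((x - e0) / (e1 - e0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Coverage in [0, 1] for one pixel. The instance attributes pick the behavior;
// there is no second code path for circles, strokes, or shadows.
inline float coverage(const ShapeInstance& s, float px, float py) {
    float d = sdRoundedRect(px, py, s.halfW, s.halfH, s.radius);
    if (s.strokeWidth > 0.0f) {
        // Ring stroke: distance to a band of width strokeWidth hugging the edge.
        d = std::fabs(d + 0.5f * s.strokeWidth) - 0.5f * s.strokeWidth;
    }
    if (s.blur > 0.0f) {
        // Soft shadow: full strength at the edge, fading to 0 at `blur` px out.
        return 1.0f - smoothstep01(0.0f, s.blur, d);
    }
    // AA clamp: a one-pixel linear ramp centred on the edge.
    return std::clamp(0.5f - d, 0.0f, 1.0f);
}
```

With this shape of code, a fill is `{50, 20, 10, 0, 0}`, the matching shadow is the same instance with `blur = 20`, and a circle is any instance whose radius equals both half extents. The point of the design is that these are data changes, not pipeline changes.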

What Lume actually is

The architecture is four layers:

L1  platform/   → CAMetalLayer attach (macOS). Future: ANativeWindow, HWND, canvas.
L2  gpu::Device → Dawn instance + adapter + device + queue (singleton).
L2  gpu::Surface → wgpu::Surface bound to the window's native layer.
L3  paint/      → DawnPainterImpl: four WGSL pipelines, glyph atlas,
                  per-instance state for shape/line/text/image, submission-
                  order draw segments.
L4  Painter API → public surface: fill, roundedFill, stroke, polyline,
                  arc, image, text, pushClip, pushTransform, and so on.

The whole stack is engine/include/vel/ (public headers) plus engine/src/ (about 3,000 lines of implementation). The framework calls into the Painter API and never sees a WebGPU type.

Three details that took real effort:

The glyph atlas is keyed on physical pixel size. When you ask for 14 px text on a 2× DPR display, FreeType rasterizes at 28 px. Lume's atlas cache key includes that physical size, so a window dragged to a 1× external monitor doesn't reuse the 28 px bitmap squeezed down to half size (and the reverse move doesn't render upsampled, blurry text). It just rasterizes a second entry at 14 physical px and uses that. The dst rect stays in logical pixels; the GPU samples the physical atlas 1:1.
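A minimal C++17 sketch of that cache key, under stated assumptions: GlyphKey, GlyphCache, AtlasSlot, and physicalSize are hypothetical names, the hash is a simple illustrative mix, and the FreeType rasterization and atlas packing that would run on a miss are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical cache key: the physical size is part of the glyph's identity.
struct GlyphKey {
    std::uint32_t fontId;
    std::uint32_t glyphIndex;
    std::uint32_t physicalPx;  // logical size * DPR, rounded to whole pixels
    bool operator==(const GlyphKey& o) const {
        return fontId == o.fontId && glyphIndex == o.glyphIndex &&
               physicalPx == o.physicalPx;
    }
};

struct GlyphKeyHash {
    std::size_t operator()(const GlyphKey& k) const {
        std::size_t h = std::hash<std::uint32_t>{}(k.fontId);
        h = h * 31 + std::hash<std::uint32_t>{}(k.glyphIndex);
        h = h * 31 + std::hash<std::uint32_t>{}(k.physicalPx);
        return h;
    }
};

struct AtlasSlot { int x = 0, y = 0, w = 0, h = 0; };  // placeholder atlas rect

inline std::uint32_t physicalSize(float logicalPx, float dpr) {
    assert(logicalPx > 0.0f && dpr > 0.0f);
    return static_cast<std::uint32_t>(std::lround(logicalPx * dpr));
}

class GlyphCache {
public:
    // Returns true when the glyph was missing at this physical size, which is
    // where a real engine would rasterize it and pack it into the atlas.
    bool ensure(std::uint32_t fontId, std::uint32_t glyph, float logicalPx, float dpr) {
        GlyphKey key{fontId, glyph, physicalSize(logicalPx, dpr)};
        return slots_.try_emplace(key, AtlasSlot{}).second;
    }
    std::size_t size() const { return slots_.size(); }

private:
    std::unordered_map<GlyphKey, AtlasSlot, GlyphKeyHash> slots_;
};
```

Asking for the same glyph at 14 px on a 2× display and then on a 1× display produces two entries (28 px and 14 px), which is exactly the behavior described above; asking twice at the same DPR is a hit.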

Submission-order draw segments. Originally Lume batched all-shapes, then all-lines, then all-text per frame. This broke the Table widget's sticky header: the header background was drawn before the row text, so row text overdrew the header bg, and rows became visible through the header during scroll. The fix was to track a small DrawCmd list ({kind, firstInstance, count}) in submission order and emit one Draw call per segment. Adjacent same-kind cmds fuse, so batching survives wherever order allows it. The Table works, and any widget that depends on draw order ("this card needs to be on top of those cards") works for the same reason.
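The segment list can be sketched in a few lines of C++. The {kind, firstInstance, count} triple comes from the description above; the DrawKind enum, the DrawList class, and the record method are hypothetical names, and the sketch assumes each kind has its own instance buffer so that fusing requires contiguous instance ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class DrawKind : std::uint8_t { Shape, Line, Text, Image };

struct DrawCmd {
    DrawKind kind;
    std::uint32_t firstInstance;  // offset into that kind's instance buffer
    std::uint32_t count;          // number of instances in this segment
};

class DrawList {
public:
    // Record instances in submission order. A cmd fuses with the previous one
    // only if it has the same kind and continues the same instance range, so
    // fusing never reorders draws of different kinds.
    void record(DrawKind kind, std::uint32_t firstInstance, std::uint32_t count) {
        assert(count > 0);
        if (!cmds_.empty()) {
            DrawCmd& last = cmds_.back();
            if (last.kind == kind && last.firstInstance + last.count == firstInstance) {
                last.count += count;
                return;
            }
        }
        cmds_.push_back({kind, firstInstance, count});
    }

    // One GPU draw call per entry, issued in this order.
    const std::vector<DrawCmd>& cmds() const { return cmds_; }

private:
    std::vector<DrawCmd> cmds_;
};
```

For the Table case, recording row backgrounds, row text, header background, header text yields four segments (Shape, Text, Shape, Text), so the header background lands on top of the row text. A run of fifty consecutive shapes still collapses into a single draw.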

Drag capture survives reactive rebuilds. Vel is signal-driven. When the user drags a Slider, the slider writes to a signal, which triggers a re-render, which replaces the Slider widget instance. The new instance has dragging_=false. The drag dies after one mouse-move event. The fix wasn't in Lume; it was in the framework's EventDispatcher. captureDrag(handler) registers a callable that closes over the slider's geometry plus its onChange (whose own closure captures the long-lived owning component's this). Mouse-move and mouse-up route to the captured handler directly, bypassing the widget tree. Drag continues across any number of rebuilds.
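A stripped-down C++ sketch of the capture mechanism follows. Only the captureDrag(handler) name comes from the text above; MouseEvent, the mouseMove / mouseUp methods, their boolean return values, and the makeSliderDrag helper are assumptions made for illustration, and Vel's real EventDispatcher will differ.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>

struct MouseEvent { float x, y; };

class EventDispatcher {
public:
    using DragHandler = std::function<void(const MouseEvent&)>;

    // Called by a widget on mouse-down. The handler must own everything it
    // needs by value, because the widget that registered it may be destroyed.
    void captureDrag(DragHandler handler) {
        assert(handler != nullptr);
        captured_ = std::move(handler);
    }

    // Both return true when the event went to the captured handler, meaning
    // the normal widget-tree hit test should be skipped.
    bool mouseMove(const MouseEvent& e) {
        if (!captured_) return false;
        captured_(e);
        return true;
    }
    bool mouseUp(const MouseEvent& e) {
        if (!captured_) return false;
        captured_(e);
        captured_ = nullptr;  // release the capture at the end of the drag
        return true;
    }

private:
    DragHandler captured_;
};

// Hypothetical slider helper: closes over track geometry and onChange by
// value, never over the (short-lived) slider widget itself.
inline EventDispatcher::DragHandler makeSliderDrag(float trackX, float trackW,
                                                   std::function<void(float)> onChange) {
    return [trackX, trackW, onChange](const MouseEvent& e) {
        onChange(std::clamp((e.x - trackX) / trackW, 0.0f, 1.0f));
    };
}
```

Because the handler holds no pointer to the widget instance, a rebuild between two mouse-move events is invisible to it. The trade-off is that geometry captured at mouse-down goes stale if layout changes mid-drag, which is usually acceptable for a slider.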

The Skia / Impeller / Lume comparison

The dimensions that matter for a 2D UI runtime:

Dimension	Skia (Vel v1)	Impeller (Flutter)	Lume (Vel today)
Shader compilation	JIT, at first-draw time	AOT, build-time	WGSL precompiled by Dawn at device init
Shape rendering	CPU tessellation → GPU triangles	Compute + tessellation hybrid	Analytic SDF in the fragment shader
Pipeline count	hundreds (one per primitive + state combo)	~12	4
Text	CoreText / FreeType per platform	Manual rasterizer → MTLTexture atlas	FreeType → R8 atlas, OS/2 typo metrics
Idle frame cost	Always paints	Always paints	~0 (frame-dirty flag short-circuits the whole pipeline)
HiDPI	Surface scaled in canvas	Per-pass DPR awareness	Atlas keyed on physical px; dst rect in logical px
Cross-platform reach	GL/Vulkan/Metal/D3D12	Metal + Vulkan (+ work-in-progress)	Dawn handles Metal/Vulkan/D3D12/WebGPU from one WGSL source
Library code in libvel	Skia + image codecs (~25 MB linked)	n/a	0
`libvel.dylib` size (macOS arm64)	~30 MB	n/a	11 MB
Hot-reload safety	Crashes if plugin link drops Skia symbols	n/a	Plugin links the same `libvel.dylib`; nothing else to share

The single most useful row in that table is the bottom one. With Skia gone, the hot-reload plugin no longer needs to think about which graphics symbols it shares with the host. libvel.dylib is the sole boundary. A hot reload re-emits a .vel.cpp, recompiles 200 lines, and dlopens the new dylib in under a second.

What Lume doesn't do yet (the honest section)

This is the first usable version of the engine, and I'd be lying if I said it was at parity with Skia for every workload. Three real gaps:

Compute-shader Gaussian blur. Lume's shadow is currently a smoothstep outer falloff applied to the rounded-rect SDF. For small blur radii (4 to 16 px, which covers most UI shadows) it's perceptually identical to a Gaussian. For larger radii it reads as "the rect got bigger and softer at the edges" rather than a true Gaussian. A two-pass separable Gaussian in a compute pipeline is next; for now the cheap approximation is honest about what it is.
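For reference, this is what "two-pass separable Gaussian" means, written as a CPU-side C++17 sketch rather than a compute shader. Nothing here is Lume code: the function names, the 3-sigma kernel cutoff, the clamp-to-edge sampling, and the single-channel float image are all assumptions, and how a UI "blur radius" should map to sigma is left open.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian weights for taps -r..r, with r = ceil(3 * sigma).
inline std::vector<float> gaussianKernel(float sigma) {
    assert(sigma > 0.0f);
    int r = static_cast<int>(std::ceil(3.0f * sigma));
    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += k[i + r];
    }
    for (float& w : k) w /= sum;  // weights sum to 1, so brightness is preserved
    return k;
}

// One 1D pass over a w*h single-channel image. (dx, dy) selects the axis:
// (1, 0) is horizontal, (0, 1) is vertical. Out-of-range taps clamp to the edge.
inline std::vector<float> blurPass(const std::vector<float>& src, int w, int h,
                                   const std::vector<float>& k, int dx, int dy) {
    int r = static_cast<int>(k.size() / 2);
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i) {
                int sx = std::clamp(x + i * dx, 0, w - 1);
                int sy = std::clamp(y + i * dy, 0, h - 1);
                acc += k[i + r] * src[sy * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
    return dst;
}

// A 2D Gaussian is separable: a horizontal pass followed by a vertical pass
// gives the same result as the full 2D kernel at 2*(2r+1) taps per pixel
// instead of (2r+1)^2.
inline std::vector<float> gaussianBlur(const std::vector<float>& src, int w, int h,
                                       float sigma) {
    assert(static_cast<int>(src.size()) == w * h);
    std::vector<float> k = gaussianKernel(sigma);
    return blurPass(blurPass(src, w, h, k, 1, 0), w, h, k, 0, 1);
}
```

The tap count is the reason to bother with two passes: it grows linearly with the radius instead of quadratically, which is what makes large-radius blurs affordable where the smoothstep approximation stops looking right.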

Complex-script text shaping. I link HarfBuzz; I don't drive it yet. Latin, Cyrillic, and Greek render correctly. Arabic ligatures, Devanagari conjuncts, vertical text: those are next. The FreeType path is in; the HarfBuzz shaping pass on top of it isn't.

The platform surface is macOS-only. Dawn supports Vulkan and D3D12, so the underlying portability is real. The part missing is the window-to-surface glue. Lume has a SurfaceMac.mm that attaches a CAMetalLayer to a GLFW window's NSView; the Windows and Linux equivalents are file-shaped holes today. CI builds compile against the abstraction, but the surface code is the actual port.

The roadmap continues from here: native arcs and dashed strokes via additional pipelines, then HarfBuzz, then compute blur, then a Web target via Dawn plus Emscripten, then Windows and Linux surface layers, then partial-repaint damage rects. Owning the engine means the work is real, but at least it's bounded.

The journey is in the git log

The proof that Lume isn't a paper exercise is the diff. The Skia removal was commit 0d7a8f4. The reorganized four-tier repo (Lume in engine/, the framework in framework/, the component registry in registry/, and the .vel compiler in velc/) is f5b86af. grep -rE 'Sk[A-Z]|sk_sp' engine framework registry velc returns zero hits. The dependency list in vcpkg.json is six lines now, none of them Skia.

dawn          : GPU abstraction (Metal/Vulkan/D3D12/WebGPU)
freetype      : glyph rasterization
harfbuzz      : complex-script shaping (next)
glfw3         : windowing
spdlog        : logging
nlohmann-json : JSON for the framework

If you want to read it, the code is at github.com/chan27-2/Vel. The README's Lume section walks the engine specifically; the engine/ tree on main is the smallest version of "a 2D GPU rendering engine you can actually run" I know how to write.

The lesson I'd take from this, and I'm saying this because I want to remember it later, is that the rendering substrate is not a library decision. It's an architecture decision. The moment your framework needs to answer cross-cutting questions about hit-testing, atlas eviction, draw order, and HiDPI all at once, you can either keep negotiating with someone else's library or you can write your own. Flutter eventually came to the same conclusion. So did I, just earlier. The work is bigger than it looks. The result is that everything downstream of the renderer stops feeling like it's fighting the renderer.