WebGPU Engine from Scratch Part 11 - 2: True Scene Graph

Improvements to the markup

Resolution

At this point we can experiment a little more freely without worrying about breaking things, because we can write visual tests for them. One of the first features that comes to mind is making the rendering smaller, since the tests don't need large screenshots taking up so much space. The wc-geo component had height and width attributes, but we never actually wired those up. Let's fix that.

//wc-geo.js
set width(value) {
    this.#width = parseInt(value, 10);
    if (this.dom?.canvas) {
        this.dom.canvas.width = this.#width;
    }
}
set height(value) {
    this.#height = parseInt(value, 10);
    if (this.dom?.canvas) {
        this.dom.canvas.height = this.#height;
    }
}

This updates the props and the canvas whenever the height or width changes. It also causes some internal problems in the engine (which will render a black rectangle) because the GPU resources now need to resize, so let's fix those too.

Really the only thing that needs to change at this point is the depth texture resolution. We'll split it off from the initializeTextures method.

//gpu-engine.js
async initializeTextures(textures) {
    for (const [key, texture] of Object.entries(textures)) {
        if (texture.image ?? texture.images) {
            this.#textures.set(key, await uploadTexture(this.#device, texture.image ?? texture.images, { label: `${key}-texture` }));
        } else if (texture.color) {
            this.#textures.set(key, createColorTexture(this.#device, { color: texture.color, label: `${key}-texture` }));
        }
    }
    //default textures
    this.initDepthTexture();
    this.#textures.set(PLACEHOLDER_TEXTURE, createColorTexture(this.#device, { label: "placeholder-texture" }));
}
initDepthTexture() {
    this.#textures.set(DEPTH_TEXTURE, this.#device.createTexture({
        label: "depth-texture",
        size: {
            width: this.#canvas.width,
            height: this.#canvas.height,
            depthOrArrayLayers: 1
        },
        format: "depth32float",
        usage: GPUTextureUsage.RENDER_ATTACHMENT
    }));
}

Then we add a method to recreate the depth texture when the canvas size changes.

updateCanvasSize(){
    this.initDepthTexture();
}

Since the depth texture is tied to the canvas itself, we don't need to pass in the height and width; those are already updated. There's an argument that, from an API standpoint, we should manage resolution apart from the canvas, but I'm not going to bother with that.

We'll also track some state so we know whether the engine was initialized. This way we can tell the difference between setting the size for the first time and actually changing it while the engine is running.

//gpu-engine.js
#isInitialized = false;

async initializeScene(scene) {
    this.initializeCameras(scene.cameras);
    await this.initializeTextures(scene.textures);
    this.initializeMaterials(scene.materials);
    this.initializeSamplers();
    await this.initializeMeshes(scene.meshes);
    this.initializeGroups(scene.groups);
    this.initializeLights(scene.lights);
    await this.initializePipelines();
    this.initializePipelineMeshes(scene.pipelineMeshes);
    this.#isInitialized = true;
}

get isInitialized(){
    return this.#isInitialized;
}

This could be combined with isRunning into a state machine, since the two aren't entirely orthogonal, but I'm not going to bother with that either.

Finally we can finish our setters.

//wc-geo.js
set width(value) {
    this.#width = parseInt(value, 10);
    if (this.dom?.canvas) {
        this.dom.canvas.width = this.#width;
    }
    if(this.engine?.isInitialized){
        this.engine.updateCanvasSize();
    }
}
set height(value) {
    this.#height = parseInt(value, 10);
    if (this.dom?.canvas) {
        this.dom.canvas.height = this.#height;
    }
    if(this.engine?.isInitialized){
        this.engine.updateCanvasSize();
    }
}

We do need to be defensive in the setters because attributes change before connectedCallback runs, so those objects won't even exist the first time around.
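
For context, this is because of how attribute reflection is typically wired up in custom elements: for attributes already present in the markup, attributeChangedCallback fires during element upgrade, before connectedCallback. Here's a minimal sketch of that pattern (the wiring is illustrative, not necessarily the component's exact code):

//wc-geo.js (hypothetical reflection wiring)
static observedAttributes = ["height", "width"];

attributeChangedCallback(name, oldValue, newValue) {
    //for attributes set in markup this runs before connectedCallback,
    //so this.dom and this.engine may not exist yet, hence the guards in the setters
    this[name] = newValue;
}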

A true scene graph

The previous version was hacked together using keys as a crutch to pick out nested meshes and groups. What we really want is a full tree from the top down. It became too hard to explain step-by-step, so we'll do a big migration and then explain what happened:

//geo-markup-parser.js
import { Camera } from "../entities/camera.js";
import { Material } from "../entities/material.js";
import { Mesh } from "../entities/mesh.js";
import { Group } from "../entities/group.js";
import { Light } from "../entities/light.js";
import { loadImage } from "../utilities/image-utils.js";
import { fetchObjMesh } from "./data-utils.js";
import { surfaceGrid, quad, cube } from "./mesh-generator.js";

function parseVector(text, length = 4, defaultValue = null) {
    return text
        ? text.split(",").map(x => parseFloat(x.trim())).slice(0, length)
        : defaultValue
}

function parseIntOrDefault(text, defaultValue = null) {
    return text
        ? parseInt(text, 10)
        : defaultValue;
}

function parseFloatOrDefault(text, defaultValue = null) {
    return text
        ? parseFloat(text)
        : defaultValue;
}

function parseListOrDefault(text, defaultValue = null) {
    return text
        ? text.split(",").map(x => x.trim())
        : defaultValue
}

function updateMeshAttributes(meshEl, mesh) {
    const normalize = meshEl.hasAttribute("normalize");
    if (normalize) {
        mesh.normalizePositions();
    }

    const resizeUvs = parseIntOrDefault(meshEl.getAttribute("resize-uvs"));
    if (resizeUvs) {
        mesh.resizeUvs(resizeUvs)
    }

    const material = meshEl.getAttribute("material");
    mesh.setMaterial(material);

    const attributes = parseListOrDefault(meshEl.getAttribute("attributes"));
    if (attributes) {
        mesh.useAttributes(attributes);
    }

    const translate = parseVector(meshEl.getAttribute("translate"), 3);
    if (translate) {
        mesh.translate({ x: translate[0], y: translate[1], z: translate[2] });
    }

    const rotate = parseVector(meshEl.getAttribute("rotate"), 3);
    if (rotate) {
        mesh.rotate({ x: rotate[0], y: rotate[1], z: rotate[2] });
    }

    const scale = parseVector(meshEl.getAttribute("scale"), 3);
    if (scale) {
        mesh.scale({ x: scale[0], y: scale[1], z: scale[2] });
    }

    //must come last because it updates mesh
    const bakeTransforms = meshEl.hasAttribute("bake-transforms");
    if (bakeTransforms) {
        mesh.bakeTransforms();
    }
}

function parseCamera(cameraEl, options = {}) {
    return new Camera({
        name: cameraEl.getAttribute("name"),
        position: parseVector(cameraEl.getAttribute("position"), 3),
        screenHeight: cameraEl.getAttribute("height") ?? options.defaultHeight,
        screenWidth: cameraEl.getAttribute("width") ?? options.defaultWidth,
        fieldOfView: cameraEl.getAttribute("fov") ?? 90,
        near: cameraEl.getAttribute("near") ?? 0.01,
        far: cameraEl.getAttribute("far") ?? 5,
        isPerspective: !cameraEl.hasAttribute("is-orthographic")
    });
}

async function parseTexture(textureEl){
    const name = textureEl.getAttribute("name");
    const src = textureEl.getAttribute("src");
    const srcs = parseListOrDefault(textureEl.getAttribute("srcs"));
    const color = textureEl.getAttribute("color");
    let value;
    if (src) {
        value = { entity: "texture", image: await loadImage(src), name  };
    } else if(srcs){
        value = { entity: "texture", images: await Promise.all(srcs.map(s => loadImage(s))), name } 
    } else if (color) {
        value = { entity: "texture", color: parseVector(color, 4), name };
    }

    return value;
}

function parseMaterial(materialEl) {
    const roughnessMap = materialEl.getAttribute("roughness-map");
    const albedoMap = materialEl.getAttribute("albedo-map");

    return new Material({
        name: materialEl.getAttribute("name"),
        albedoMap: albedoMap,
        useRoughnessMap: !!roughnessMap,
        roughness: parseFloatOrDefault(materialEl.getAttribute("roughness")),
        metalness: parseFloatOrDefault(materialEl.getAttribute("metalness")),
        baseReflectance: parseVector(materialEl.getAttribute("base-reflectance"), 3)
    });
}

async function parseMesh(meshEl) {
    const reverseWinding = meshEl.hasAttribute("reverse-winding");
    const src = meshEl.getAttribute("src");
    const mesh = await fetchObjMesh(src, { reverseWinding });

    updateMeshAttributes(meshEl, mesh);

    return mesh;
}

function parseSurfaceGrid(meshEl) {
    const rowCount = parseInt(meshEl.getAttribute("row-count"), 10);
    const colCount = parseInt(meshEl.getAttribute("col-count"), 10);
    const mesh = new Mesh(surfaceGrid(rowCount, colCount));

    updateMeshAttributes(meshEl, mesh);

    return mesh;
}

function parseQuad(meshEl){
    const mesh = new Mesh(quad());
    updateMeshAttributes(meshEl, mesh);
    return mesh;
}

function parseCube(meshEl){
    const mesh = new Mesh(cube());
    updateMeshAttributes(meshEl, mesh);
    return mesh;
}

/**
 * 
 * @param {HTMLElement} groupEl 
 */
async function parseGroup(groupEl, options){
    const children = await Promise.all(Array.from(groupEl.children).map(async c => {
        switch(c.tagName){
            case "GEO-BACKGROUND": {
                return parseBackground(c);
            }
            case "GEO-MESH": {
                return (await parseMesh(c));
            }
            case "GEO-SURFACE-GRID": {
                return parseSurfaceGrid(c);
            }
            case "GEO-QUAD": {
                return parseQuad(c);
            }
            case "GEO-CAMERA": {
                return parseCamera(c, options);
            }
            case "GEO-CUBE": {
                return parseCube(c)
            }
            case "GEO-GROUP": {
                return (await parseGroup(c, options))
            }
            case "GEO-LIGHT": {
                return parseLight(c)
            }
            case "GEO-TEXTURE": {
                return parseTexture(c);
            }
            case "GEO-MATERIAL": {
                return parseMaterial(c);
            }
            default: {
                throw new Error(`Group doesn't support ${c.tagName} children`)
            }
        }
    }));

    const group = new Group({
        children
    });

    const translate = parseVector(groupEl.getAttribute("translate"), 3);
    if (translate) {
        group.translate({ x: translate[0], y: translate[1], z: translate[2] });
    }

    const rotate = parseVector(groupEl.getAttribute("rotate"), 3);
    if (rotate) {
        group.rotate({ x: rotate[0], y: rotate[1], z: rotate[2] });
    }

    const scale = parseVector(groupEl.getAttribute("scale"), 3);
    if (scale) {
        group.scale({ x: scale[0], y: scale[1], z: scale[2] });
    }

    return group;
}

function parseLight(lightEl) {
    const light = new Light({
        type: lightEl.getAttribute("type") ?? "point",
        color: parseVector(lightEl.getAttribute("color"), 4, [1, 1, 1, 1]),
        direction: parseVector(lightEl.getAttribute("direction"), 3, [0, 0, 0]),
        castsShadow: lightEl.hasAttribute("casts-shadow")
    });

    const translate = parseVector(lightEl.getAttribute("translate"), 3);
    if (translate) {
        light.translate({ x: translate[0], y: translate[1], z: translate[2] });
    }

    const rotate = parseVector(lightEl.getAttribute("rotate"), 3);
    if (rotate) {
        light.rotate({ x: rotate[0], y: rotate[1], z: rotate[2] });
    }

    return light;
}

function parseBackground(backgroundEl){
    if(!backgroundEl) return null;
    return {
        entity: "background",
        environmentMap: backgroundEl.getAttribute("environment-map"),
        sampler: backgroundEl.getAttribute("sampler")
    };
}

export async function parseScene(element) {
    const sceneRoot = await parseGroup(element, {
        defaultWidth: element.width,
        defaultHeight: element.height
    });

    return {
        sceneRoot,
    };
}

Here we've eliminated everything from the parseScene output except the topmost group, making it fully recursive. Each element is simply stored as a child of a group, which is much simpler. We check each nested element to see whether it's something we can parse, and if so, parse it. I also changed key to name for most elements since that made more sense overall. The other benefit of the tree structure is that things like lights no longer need keys; we can generate those automatically.
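
To make the new shape concrete, here's a hypothetical scene written in this markup. The element and attribute names come straight from the parser above; the specific values and file paths are made up for illustration:

<wc-geo width="320" height="240">
    <geo-camera name="main" position="0,1,-2"></geo-camera>
    <geo-texture name="env" srcs="px.png,nx.png,py.png,ny.png,pz.png,nz.png"></geo-texture>
    <geo-background environment-map="env"></geo-background>
    <geo-light type="point" color="1,1,1,1" translate="0,2,0" casts-shadow></geo-light>
    <geo-material name="matte" roughness="0.8" metalness="0"></geo-material>
    <geo-group rotate="0,0.5,0">
        <geo-cube material="matte" translate="-1,0,0"></geo-cube>
        <geo-mesh src="./objs/teapot.obj" normalize material="matte"></geo-mesh>
    </geo-group>
</wc-geo>

Every child is parsed into an entity and attached to the root group, and geo-group elements nest recursively.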

//gpu-engine.js
    async initializeScene(scene) {
        this.initializeSamplers();

        this.initializeGroup(scene.sceneRoot, 0);
        //default textures
        this.initDepthTexture();
        this.#textures.set(PLACEHOLDER_TEXTURE, createColorTexture(this.#device, { label: "placeholder-texture" }));

        this.#shadowMaps.set("placeholder", this.#device.createTexture({
            label: "placeholder-depth-texture",
            size: { width: 1, height: 1, depthOrArrayLayers: 1 },
            format: "depth32float",
            usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING
        }));

        await this.initializePipelines();
        this.#isInitialized = true;
    }
    initializeCameras(camera) {
        this.#cameras.set(camera.name, camera);
    }
    initializeTexture(texture) {
        if (texture.image ?? texture.images) {
            this.#textures.set(texture.name, uploadTexture(this.#device, texture.image ?? texture.images, { label: `${texture.name}-texture` }));
        } else if (texture.color) {
            this.#textures.set(texture.name, createColorTexture(this.#device, { color: texture.color, label: `${texture.name}-texture` }));
        }
    }
    initDepthTexture() {
        this.#textures.set(DEPTH_TEXTURE, this.#device.createTexture({
            label: "depth-texture",
            size: {
                width: this.#canvas.width,
                height: this.#canvas.height,
                depthOrArrayLayers: 1
            },
            format: "depth32float",
            usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING
        }));
    }
    initializeMaterial(material) {
        this.#materials.set(material.name, material);
    }
    initializeSamplers() {
        this.#samplers.set(DEFAULT_SAMPLER, this.#device.createSampler({
            addressModeU: "repeat",
            addressModeV: "repeat",
            magFilter: "linear",
            minFilter: "linear"
        }));
        this.#samplers.set(DEFAULT_NEAREST_SAMPLER, this.#device.createSampler({
            addressModeU: "repeat",
            addressModeV: "repeat",
            magFilter: "nearest",
            minFilter: "nearest"
        }));
        this.#samplers.set(DEFAULT_SHADOW_SAMPLER, this.#device.createSampler({
            label: "shadow-map-default-sampler",
            compare: "less",
            magFilter: "linear",
            minFilter: "linear"
        }));
        this.#samplers.set("shadow-map-debug", this.#device.createSampler({
            label: "shadow-map-debug-sampler",
            compare: undefined,
        }));
    }
    initializeMesh(mesh, key) {
        const { vertexBuffer, indexBuffer } = uploadMesh(this.#device, mesh, { label: `${key}-mesh` });
        this.#meshContainers.set(mesh, { mesh, vertexBuffer, indexBuffer });
    }
    initializeGroup(group, key) {
        for (let i = 0; i < group.children.length; i++) {
            const child = group.children[i];
            if (child instanceof Camera){
                this.initializeCameras(child);
            } else if(child instanceof Mesh) {
                this.initializeMesh(child, `${key}-${i}`);
            } else if(child instanceof Light){
                this.initializeLight(child, `${key}-${i}`);
            } else if (child instanceof Group) {
                this.initializeGroup(child, `${key}-${i}`);
            } else if (child instanceof Material){
                this.initializeMaterial(child);
            } else if (child.entity === "texture"){
                this.initializeTexture(child);
            } else if (child.entity === "background"){
                this.initializeBackground(child);
            } else {
                throw new Error(`Don't know what this entity is ${JSON.stringify(child)}`)
            }
        }
        this.#sceneRoot = group;
    }
    initializeLight(light, key) {
        this.#lights.set(key, light)

        if(light.castsShadow){
            this.#shadowMaps.set(key, this.#device.createTexture({
                label: `shadow-map-${key}`,
                size: {
                    width: 2048,
                    height: 2048,
                    depthOrArrayLayers: 1
                },
                format: "depth32float",
                usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.TEXTURE_BINDING
            }));
        }
    }
    async initializePipelines() {
        {
            const pipeline = await getMainPipeline(this.#device);

            this.#pipelines.set("main", {
                pipeline,
                bindGroupLayouts: new Map([
                    ["scene", pipeline.getBindGroupLayout(0)],
                    ["materials", pipeline.getBindGroupLayout(1)],
                    ["lights", pipeline.getBindGroupLayout(2)]
                ]),
                bindMethod: this.setMainBindGroups.bind(this)
            });
        }
        {
            const pipeline = await getShadowMapPipeline(this.#device);

            this.#pipelines.set("shadow-map", {
                pipeline,
                bindGroupLayouts: new Map([
                    ["scene", pipeline.getBindGroupLayout(0)],
                ]),
                bindMethod: this.setShadowMapBindGroups.bind(this)
            });
        }
        {
            const pipeline = await getBackgroundPipeline(this.#device);

            this.#pipelines.set("background", {
                pipeline: pipeline,
                bindGroupLayouts: new Map([
                    ["scene", pipeline.getBindGroupLayout(0)],
                    ["materials", pipeline.getBindGroupLayout(1)]
                ]),
                bindMethod: this.setBackgroundBindGroups.bind(this)
            });
        }
    }
    initializeBackground(background) {
        if (background) {
            const mesh = new Mesh(screenTri()).useAttributes(["positions"]);
            const { vertexBuffer, indexBuffer } = uploadMesh(this.#device, mesh, { label: `background-tri-mesh` });
            this.#meshContainers.set(mesh, { mesh, vertexBuffer, indexBuffer });
            this.#background = {
                mesh,
                environmentMap: background.environmentMap,
                sampler: background.sampler === "nearest" ? DEFAULT_NEAREST_SAMPLER : DEFAULT_SAMPLER
            };
        }
    }
    //...etc

In gpu-engine.js we traverse that tree and add the entities to their maps as we encounter them. For things like the background (a new entity, see below) we just overwrite, so only the last one is used. There could be more validation here. With more complex parsing we could also restrict certain entities to the top level, since it doesn't actually make sense to nest backgrounds and textures, but we allow it anyway.

We can rip out the concept of pipelineMesh too. Anything in the scene root will go through the main pipeline. We'll hardcode the rest: at present there are only three pipelines (main, shadow-map, and background) and I don't see that changing much, at least not in a way that can be configured dynamically.

Adding a background

Speaking of backgrounds, I want the ability to set one, meaning the cubemap that the scene sits inside. We already have the background pipeline, so we should be able to leverage that. As you can see above, we parse it into the scene object and set it in the engine. Texture parsing was also modified to handle multi-layer textures via a srcs attribute instead of src (the markup sketch earlier shows one).

Looking at initializeBackground, we need the fullscreen triangle for the background to sit on. Keep in mind we only need the positions; the other attributes are unused. It's a little weird to hold onto the mesh reference to use as a key, but that's how we'll deal with it. We could do something more complex with samplers (like taking the actual min/mag filters) but I don't see the need right now. To render it, renderScene has its pipeline loop unrolled into two phases: main geometry and background.

//gpu-engine.js
renderScene() {
    const commandEncoder = this.#device.createCommandEncoder({
        label: "main-command-encoder"
    });
    const camera = this.#cameras.get("main");
    const depthView = this.#textures.get(DEPTH_TEXTURE).createView({ label: "depth-texture-view"});
    const colorView = this.#context.getCurrentTexture().createView({ label: "color-texture-view"});
    {
        const passEncoder = commandEncoder.beginRenderPass({
            label: `main-render-pass`,
            colorAttachments: [
                {
                    storeOp: "store",
                    loadOp: "clear",
                    clearValue: { r: 0.1, g: 0.3, b: 0.8, a: 1.0 },
                    view: colorView
                }
            ],
            depthStencilAttachment: {
                view: depthView,
                depthClearValue: 1.0,
                depthStoreOp: "store",
                depthLoadOp: "clear"
            }
        });
        const pipelineContainer = this.#pipelines.get("main");
        passEncoder.setPipeline(pipelineContainer.pipeline);
        const renderRecursive = (meshOrGroup) => {
            if (meshOrGroup instanceof Group) {
                for (const child of meshOrGroup.children) {
                    renderRecursive(child)
                }
            } else if(meshOrGroup instanceof Mesh) {
                const meshContainer = this.#meshContainers.get(meshOrGroup);
                pipelineContainer.bindMethod(passEncoder, pipelineContainer.bindGroupLayouts, camera, meshContainer.mesh, this.#lights, this.#shadowMaps);
                passEncoder.setVertexBuffer(0, meshContainer.vertexBuffer);
                passEncoder.setIndexBuffer(meshContainer.indexBuffer, "uint16");
                passEncoder.drawIndexed(meshContainer.mesh.indices.length);
            }
        }
        renderRecursive(this.#sceneRoot);
        passEncoder.end();
    }
    if(this.#background){
        const passEncoder = commandEncoder.beginRenderPass({
            label: `background-render-pass`,
            colorAttachments: [
                {
                    storeOp: "store",
                    loadOp: "load",
                    view: colorView
                }
            ],
            depthStencilAttachment: {
                view: depthView,
                depthStoreOp: "store",
                depthLoadOp: "load"
            }
        });
        const pipelineContainer = this.#pipelines.get("background");
        passEncoder.setPipeline(pipelineContainer.pipeline);

        const meshContainer = this.#meshContainers.get(this.#background.mesh);
        pipelineContainer.bindMethod(passEncoder, pipelineContainer.bindGroupLayouts, camera, meshContainer.mesh, this.#lights, this.#shadowMaps);
        passEncoder.setVertexBuffer(0, meshContainer.vertexBuffer);
        passEncoder.setIndexBuffer(meshContainer.indexBuffer, "uint16");
        passEncoder.drawIndexed(meshContainer.mesh.indices.length);
        passEncoder.end();
    }
    this.#device.queue.submit([commandEncoder.finish()]);
}

For completeness, here's the change to setBackgroundTextureBindGroup:

setBackgroundTextureBindGroup(passEncoder, bindGroupLayouts) {
    const textureBindGroup = this.#device.createBindGroup({
        layout: bindGroupLayouts.get("materials"),
        entries: [
            { binding: 0, resource: this.#samplers.get(this.#background.sampler) },
            { binding: 1, resource: this.#textures.get(this.#background.environmentMap).createView({ dimension: "cube" }) },
        ]
    });
    passEncoder.setBindGroup(1, textureBindGroup);
}

For visual tests I added a new cube map with primary colors on each side so it's immediately apparent which way the camera is facing.

tumbling around in a cubemap with primary color faces

Note: If you caught it in the video above, there's a bug here that cost me like an hour of debugging later on.

Adding lights to groups

To start, it would be nice to manipulate cameras and lights the same way as meshes. To do so, let's create a base class Transformable that provides the common transforms, which the light and camera can then inherit. These are cut/pasted from Mesh, so you can update that class to inherit accordingly:

//transformable.js
import { getTranslationMatrix, getRotationXMatrix, getRotationYMatrix, getRotationZMatrix, getScaleMatrix, multiplyMatrix, getIdentityMatrix } from "../utilities/vector.js";

export class Transformable {
    #transforms = [];
    #worldMatrix = getIdentityMatrix();

    get modelMatrix() {
        return this.#transforms.reduce((mm, tm) => multiplyMatrix(tm, [4,4], mm, [4,4]), getIdentityMatrix());
    }
    get worldMatrix() {
        return this.#worldMatrix;
    }
    /**
     * @param {Float32Array} value 
     */
    set worldMatrix(value){
        this.#worldMatrix = value;
    }

    translate({ x = 0, y = 0, z = 0 }) {
        this.#transforms.push(getTranslationMatrix(x, y, z));
        return this;
    }
    scale({ x = 1, y = 1, z = 1 }) {
        this.#transforms.push(getScaleMatrix(x, y, z));
        return this;
    }
    rotate({ x, y, z }) {
        //there's an order dependency here... something something quaternions...
        if (x) {
            this.#transforms.push(getRotationXMatrix(x));
        }
        if (y) {
            this.#transforms.push(getRotationYMatrix(y));
        }
        if (z) {
            this.#transforms.push(getRotationZMatrix(z));
        }
        return this;
    }
    resetTransforms() {
        this.#transforms = [];
    }
}

Be aware that I also changed the getX methods into actual getters because it's a bit cleaner and I don't expect them to need parameters.

For the light we extend it:

//light.js
import { Transformable } from "./transformable.js";

export class Light extends Transformable { ... }

I noticed a bug where we were directly setting the private fields in the constructor of Light; those should have gone through the setters (e.g. this.position = options.position instead of this.#position = options.position). We'll also need to fix up the direction property:

set direction(val) {
    if(val.length === 3){
        this.#direction = new Float32Array([...val, 0]);
    } else {
        this.#direction = new Float32Array(val);
    }
}

We're using homogeneous (4-value) coordinates, which is necessary for the matrix multiplies to work. If only 3 values are passed in, we assume the last one is 0 (remember, because it's a direction, not a position, it gets a 0 instead of a 1). I'm going to try to use these coordinates consistently throughout the engine to make things more explicit.
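
To see what the w component buys us, here's a tiny standalone sketch (plain arrays, not the engine's helpers) showing that a translation matrix moves a position (w = 1) but leaves a direction (w = 0) alone:

//row-vector convention, matching the matrices below: v' = v * M, translation in the last row
const translate = [
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    5, 0, 0, 1 //translate x by 5
];

function transform(v, m) {
    return v.map((_, col) => v[0] * m[col] + v[1] * m[4 + col] + v[2] * m[8 + col] + v[3] * m[12 + col]);
}

transform([1, 2, 3, 1], translate); //position: [6, 2, 3, 1], it moved
transform([1, 2, 3, 0], translate); //direction: [1, 2, 3, 0], unchanged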

In the parsing we've already taken care of the transforms.

We'll use the recursive group traversal to add them to a collection. Unlike geometry, the semantic tree doesn't matter as much for lights: even if they're in different subtrees they can still affect the whole scene, so we still iterate through them as a flat list (potentially we could be smarter in the future with heuristics). Since they're still present in the scene graph along with other non-renderable entities, we need to make sure we don't try to render them.

//gpu-engine.js - renderShadowMaps and renderScene
const renderRecursive = (meshOrGroup) => {
    if (meshOrGroup instanceof Group) {
        for (const child of meshOrGroup.children) {
            renderRecursive(child)
        }
    } else if(meshOrGroup instanceof Mesh) {
        const meshContainer = this.#meshContainers.get(meshOrGroup);
        pipelineContainer.bindMethod(passEncoder, pipelineContainer.bindGroupLayouts, camera, meshContainer.mesh, this.#lights, this.#shadowMaps)
        passEncoder.setVertexBuffer(0, meshContainer.vertexBuffer);
        passEncoder.setIndexBuffer(meshContainer.indexBuffer, "uint16");
        passEncoder.drawIndexed(meshContainer.mesh.indices.length);
    }
}

When we render a scene we apply the transform to the light. Unlike meshes, it's easier to do this on the CPU: there's only one vector to transform, so we gain nothing from doing it on the GPU.

//gpu-engine.js - setMainLightBindGroup
//...stuff
const shadowMappedLights = lights
            .entries()
            .map(([key, value]) => {
                const shadowMap = shadowMaps.get(key);
                const shadowMapAspectRatio = shadowMap.width / shadowMap.height;

                return {
                    typeInt: value.typeInt,
                    position: multiplyMatrixVector(value.position, multiplyMatrix(getTranspose(value.worldMatrix, [4,4]), [4,4], value.modelMatrix, [4,4]), 4),
                    direction: multiplyMatrixVector(value.direction, multiplyMatrix(getTranspose(value.worldMatrix, [4,4]), [4,4], value.modelMatrix, [4,4]), 4),
                    color: value.color,
                    shadowMap,
                    projectionMatrix: shadowMap ? getLightProjectionMatrix(shadowMapAspectRatio) : getEmptyMatrix([4, 4]),
                    viewMatrix: shadowMap ? getLightViewMatrix(value.direction) : getEmptyMatrix([4, 4]),
                    castsShadow: value.castsShadow ? 1 : 0,
                    shadowMapIndex: (value.castsShadow && shadowMap) ? shadowMapIndex++ : -1
                };
            }).toArray();
//stuff...

The transform needs to be transposed because the convention in WGSL is column-major while our matrices are row-major. This is mostly apparent because one order gets the correct answer and the other rotates in the opposite direction, so if you see that symptom, this might be the cause. With the transpose we get a correct result.
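
To make that symptom concrete, here's a small standalone illustration (plain arrays, not the engine's helpers). The transpose of a rotation matrix is its inverse, so reading row-major data as column-major rotates everything the other way:

//a 90 degree rotation about Z, applied with column-vector convention
const c = Math.cos(Math.PI / 2), s = Math.sin(Math.PI / 2);

function apply(m, v) { //out[row] = dot(row of m, v)
    return [0, 1, 2].map(r => m[r * 3] * v[0] + m[r * 3 + 1] * v[1] + m[r * 3 + 2] * v[2]);
}

apply([c, -s, 0, s, c, 0, 0, 0, 1], [1, 0, 0]); //about [0, 1, 0], +90 degrees
apply([c, s, 0, -s, c, 0, 0, 0, 1], [1, 0, 0]); //transposed: about [0, -1, 0], -90 degrees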

Adding cameras to groups

Cameras can also be added to groups and transformed with them, but this requires a big refactor of the current camera system.

//camera.js
import { getOrthoMatrix, getProjectionMatrix, getCameraToWorldMatrixFromDirection, UP, subtractVector, normalizeVector, getWorldToCameraMatrixFromDirection, multiplyMatrixVector, addVector } from "../utilities/vector.js";
import { sphericalToCartesian, cartesianToSpherical } from "../utilities/math-helpers.js";
import { Transformable } from "./transformable.js";

export class Camera extends Transformable {
    #name;
    #position = new Float32Array([0,0,-1,1]);
    #direction;
    #screenWidth;
    #screenHeight;
    #near;
    #far;
    #left;
    #right;
    #top;
    #bottom;
    #fieldOfView;
    #isPerspective;

    /**
     * 
     * @param {{
     *   position: ArrayLike,
     *   screenWidth: number,
     *   screenHeight: number,
     *   left: number,
     *   right: number,
     *   top: number,
     *   bottom: number,
     *   near: number,
     *   far: number,
     *   fieldOfView: number,
     *   isPerspective: boolean
     * }} camera 
     */
    constructor(camera){
        super();
        if(!camera.name){
            throw new Error("Camera must have a name");
        }
        this.name = camera.name;
        this.position = camera.position;

        if(camera.direction){
            this.direction = normalizeVector(camera.direction);
        } else {
            this.lookAt(camera.target ?? new Float32Array([0,0,0,1]));
        }

        this.#screenWidth = camera.screenWidth;
        this.#screenHeight = camera.screenHeight;
        this.#left = camera.left;
        this.#right = camera.right;
        this.#top = camera.top;
        this.#bottom = camera.bottom;
        this.#near = camera.near;
        this.#far = camera.far;
        this.#fieldOfView = camera.fieldOfView;
        this.#isPerspective = camera.isPerspective;

        if (this.#isPerspective && (this.#screenWidth === undefined || this.#screenHeight === undefined || this.#near === undefined || this.#far === undefined || this.#fieldOfView === undefined)){
            throw new Error(`Missing required value for perspective projection`);
        }
        if (!this.#isPerspective && (this.#left === undefined || this.#right === undefined || this.#near === undefined || this.#far === undefined || this.#top === undefined || this.#bottom === undefined)) {
            throw new Error(`Missing required value for ortho projection`);
        }
    }
    lookAt(target){
        const normalizedTarget = normalizeVector(target.length === 3
            ? new Float32Array([...target, 1])
            : new Float32Array(target));
        this.direction = subtractVector(normalizedTarget, this.position); 
    }

    moveTo(x, y, z){
        this.position = [x,y,z,1];
    }

    moveBy({ x = 0, y = 0, z = 0 }){
        this.position = addVector(this.position, new Float32Array([x,y,z,1]));
    }

    panBy({ right = 0, up = 0, forward = 0 }){
        const cameraToWorld = getCameraToWorldMatrixFromDirection(this.position, this.direction);

        const delta = multiplyMatrixVector(new Float32Array([right, up, forward, 0]), cameraToWorld, 4);
        this.position = addVector(this.position, delta); 
    }

    orbitBy({ radius = 0, lat = 0, long = 0 }, target){
        const [currentLat, currentLng, r] = this.getOrbit(target); 
        const newLat = currentLat + lat;
        const newLong = currentLng - long;
        const newRadius = Math.max(0.1, r + radius);
        this.position = sphericalToCartesian([newLat, newLong, newRadius]);
    }

    getOrbit(target){
        const homogeneousTarget = target.length === 3 ? new Float32Array([...target, 1]) : new Float32Array(target);
        const targetDelta = subtractVector(this.position, homogeneousTarget);
        return cartesianToSpherical(targetDelta);
    }

    get viewMatrix(){
        return getWorldToCameraMatrixFromDirection(this.#position, this.#direction, UP);
    }

    get projectionMatrix(){
        return this.#isPerspective 
            ? getProjectionMatrix(this.#screenHeight, this.#screenWidth, this.#fieldOfView, this.#near, this.#far)
            : getOrthoMatrix(this.#left, this.#right, this.#bottom, this.#top, this.#near, this.#far);
    }

    get fieldOfView() {
        return this.#fieldOfView;
    }

    /**
     * 
     * @param {ArrayLike<number>} position 
     */
    set position(val){
        if(val.length === 3){
            this.#position = new Float32Array([...val, 1]);
        } else {
            this.#position = new Float32Array(val);
        }
    }
    get position(){
        return this.#position;
    }

    set direction(val){
        if(val.length === 3){
            this.#direction = new Float32Array([...val, 0]);
        } else {
            this.#direction = new Float32Array(val);
        }
    }
    get direction(){
        return this.#direction;
    }

    set name(val){
        this.#name = val;
    }
    get name(){
        return this.#name;
    }
}

We need to add the name to the Camera class so that we can keep track of it. Unlike Light, which could also have used an explicit key, cameras are more important to identify, and carrying the name around as floating data would just get messier. The big change is that the Camera no longer has a target; it's based entirely on a direction, with a new lookAt method for aiming at a position. This is because a direction is more intuitive to transform.
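
Here's a quick usage sketch of the refactored camera (the constructor options mirror the class above; the specific values are illustrative):

//hypothetical usage
const camera = new Camera({
    name: "main",
    position: [0, 1, -2],
    screenWidth: 320,
    screenHeight: 240,
    fieldOfView: 90,
    near: 0.01,
    far: 5,
    isPerspective: true
});

camera.lookAt([0, 0, 0]); //aim at the origin, sets direction
camera.panBy({ right: 0.5 }); //move in camera space
camera.orbitBy({ long: 0.1 }, [0, 0, 0]); //orbit around a target point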

I've changed the methods that generate view matrices to be more explicitly named and to take homogeneous coordinates.

//vector.js

export function crossVector(a, b, isHomogeneous = false) {
    if(isHomogeneous){
        return new Float32Array([
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
            0
        ]);
    }
    return new Float32Array([
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]);
}

/**
 * Creates the matrix to convert world space coordinates into camera space looking a particular direction
 * @param {Float32Array} position position of camera, homogeneous (4-value)
 * @param {Float32Array} direction direction of camera, homogeneous (4-value)
 * @param {Float32Array?} up direction of up, homogeneous (4-value)
 * @returns 
 */
export function getWorldToCameraMatrixFromDirection(position, direction, up = UP) {
    const forward = normalizeVector(direction);

    if(Math.abs(dotVector(forward, up)) > 0.999){
        up = Math.abs(forward[1]) < 0.999 ? UP : FORWARD;
    }

    const right = normalizeVector(crossVector(up, forward, true));
    const newUp = crossVector(forward, right, true);

    return new Float32Array([
        right[0], newUp[0], forward[0], 0,
        right[1], newUp[1], forward[1], 0,
        right[2], newUp[2], forward[2], 0,
        -dotVector(position, right), -dotVector(position, newUp), -dotVector(position, forward), 1
    ]);
}

/**
 * Creates the matrix to convert camera space coordinates into world space looking a particular direction
 * @param {Float32Array} position position of camera, homogeneous (4-value)
 * @param {Float32Array} direction direction of camera, homogeneous (4-value)
 * @param {Float32Array?} up direction of up, homogeneous (4-value)
 */
export function getCameraToWorldMatrixFromDirection(position, direction, up = UP) {
    const forward = normalizeVector(direction);

    // choose a stable up if forward is (nearly) parallel to provided up
    if (Math.abs(dotVector(forward, up)) > 0.999) {
        up = Math.abs(forward[1]) < 0.999 ? UP : FORWARD;
    }

    const right = normalizeVector(crossVector(up, forward, true));
    const newUp = crossVector(forward, right, true);

    return new Float32Array([
        right[0], newUp[0], forward[0], position[0],
        right[1], newUp[1], forward[1], position[1],
        right[2], newUp[2], forward[2], position[2],
        0,        0,        0,         1
    ]);
}

/**
 * Creates the matrix to convert world space coordinates into camera space looking at a particular target
 * @param {Float32Array} position position of camera, homogeneous (4-value)
 * @param {Float32Array} target target position to look at, homogeneous (4-value)
 * @param {Float32Array?} up direction of up, homogeneous (4-value)
 * @returns 
 */
export function getWorldToCameraMatrixFromTarget(position, target, up = UP){
    return getWorldToCameraMatrixFromDirection(position, subtractVector(target, position), up);
}


There were a number of bugs I noticed when moving to a different environment map. The biggest: when creating the inverse of the view matrix, I needed to remove the translation components, otherwise the movement of the cubemap gets wonky. This makes sense because the cubemap is at infinite distance, so moving shouldn't affect it, only rotating.

setBackgroundSceneBindGroup(passEncoder, bindGroupLayouts, camera) {
    const viewRotationOnly = camera.viewMatrix.slice();
    viewRotationOnly[12] = 0;
    viewRotationOnly[13] = 0;
    viewRotationOnly[14] = 0;
    const inverseViewMatrix = getInverse(viewRotationOnly, [4, 4])
    const sceneBuffer = this.#device.createBuffer({
        size: inverseViewMatrix.byteLength,
        usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
        label: "background-scene-buffer"
    });
    this.#device.queue.writeBuffer(sceneBuffer, 0, inverseViewMatrix);
    const sceneBindGroup = this.#device.createBindGroup({
        label: "background-scene-bind-group",
        layout: bindGroupLayouts.get("scene"),
        entries: [
            {
                binding: 0,
                resource: {
                    buffer: sceneBuffer,
                    offset: 0,
                    size: inverseViewMatrix.byteLength
                }
            }
        ]
    });
    passEncoder.setBindGroup(0, sceneBindGroup);
}

Hopefully now everything comes together and still works as expected:

Tumbling around a teapot with a debug cubemap background

There are probably some spots from my constant refactoring that I missed calling out. Please see the code changes for full details.

Convention Bugs

One issue I had was that with some transforms the light/normals/camera would rotate in the opposite direction. There were several areas where I had to clean up conventions because I wasn't careful and hadn't paid much attention to which direction the transforms happened. Most of these stemmed from the difference in WGSL's matrix ordering: in WGSL matrices are column-major, but in JavaScript ours are row-major (because that's how we define them). This means they need to be transposed when passed to the GPU to be in the right order. I wasn't doing this in all cases, which causes reversed rotations. I'm still not doing it for the view and projection matrices, mostly because I haven't fully investigated what's happening there, but I suspect it's wrong. Perhaps in the future this should be solved in the struct packer. To be clear, the convention is right-handed rotations: if you point your right thumb along the rotation axis in the positive direction, your fingers curl in the direction of the rotation. If you see weird rotation bugs, double-check these conventions.
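
As a concrete example of the fix, this is roughly what it looks like at the point where a matrix gets uploaded (a sketch using the getTranspose helper referenced earlier; the buffer name is illustrative):

//sketch: convert row-major JS data to WGSL's column-major order before upload
const gpuMatrix = getTranspose(mesh.modelMatrix, [4, 4]);
this.#device.queue.writeBuffer(modelMatrixBuffer, 0, gpuMatrix);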

Conclusion

We cleaned up the code and made things a little bit more consistent, which should make adding future features easier. With the scene graph we could do interesting things like scoping too. These refactors would have been a lot harder to dig into without tests, so I'm glad we have them, even if they're a bit messy.

Code

https://github.com/ndesmic/geo/pull/5
