DEV Community

ndesmic

WebGPU Engine from Scratch Part 2: Geometry

For the next part I wanted to improve how I generate meshes. In the WebGL version we generated objects with lists of positions, colors, centroids, triangles, UVs, and normals. That made sense in the WebGL world because those attributes are bound separately. In WebGPU they typically all get packed into a single buffer. The question then is how to pass these around in an agnostic way.

If we return the (CPU-side) buffer then we might carry more data than we intend. Maybe we use colors, maybe not, but if not we shouldn't waste space on them. Unless that's just parameterized into the generation function? Or maybe it comes with a description similar to the vertex buffer descriptor? If we give back a structure with lists of attributes then we have to zip and pack them up before use. One other thing to consider is parameterized curves like NURBS, which can be rendered directly with ray tracing but need to be tessellated for a raster 3D engine. I'd like to support these too.

I think in this case it makes sense to split the job: generation produces the attributes (or curves), and if we have curves or more attributes than needed, a separate process packs them into the form the renderer wants.

Geometry generation

For now we're still just generating shapes. I've added one function to generate a quad:

/**
 * Generates a quad facing negative Z, like a wall
 * @returns {Mesh}
 */
export function quad(){
    return {
        positions: new Float32Array([
            -1.0, -1.0, 0.0,
            1.0, -1.0, 0.0,
            1.0, 1.0, 0.0,

            -1.0, -1.0, 0.0,
            1.0, 1.0, 0.0,
            -1.0, 1.0, 0.0,
        ]),
        uvs: new Float32Array([
            0.0, 1.0,
            1.0, 1.0,
            1.0, 0.0,

            0.0, 1.0,
            1.0, 0.0,
            0.0, 0.0
        ]),
        length: 6
    }
}

Each attribute is its own array, and a final length property indicates how many vertices there are.

To get that into something the GPU can use, we need to pack it based on how big each attribute is and which ones we're actually using.

/**
 * 
 * @param {Float32Array} buffer 
 * @param {Array} attributes 
 * @param {number} index 
 * @param {number} attributeOffset offset in terms of indices (not bytes, assume f32)
 * @param {number} attributeLength number of values per vertex 
 * @param {number} elementStride number of indices per element
 */
function packAttribute(buffer, attributes, index, attributeOffset, attributeLength, elementStride){
    if(!attributeLength) return;
    for (let j = 0; j < attributeLength; j++) {
        buffer[index * elementStride + attributeOffset + j] = attributes[index * attributeLength + j]
    }
}
/**
 * 
 * @param {{ positions: Float32Array, colors?: Float32Array, uvs?: Float32Array, normals?: Float32Array, indices?: Uint16Array, length: number }} meshAttributes 
 * @param {{ positions?: number, colors?: number, indices?: boolean, uvs?: number, normals?: number }} options number of 32-bit elements per vertex
 */
export function packMesh(meshAttributes, options){
    const stride = (options.positions ?? 0) + (options.colors ?? 0) + (options.uvs ?? 0) + (options.normals ?? 0); //stride in terms of indices (not bytes, assume F32s)
    const buffer = new Float32Array(stride * meshAttributes.length);

    const positionOffset = 0;
    const colorOffset = options.positions ?? 0;
    const uvOffset = colorOffset + (options.colors ?? 0);
    const normalOffset = uvOffset + (options.uvs ?? 0);

    for(let i = 0; i < meshAttributes.length; i++){
        packAttribute(buffer, meshAttributes.positions, i, positionOffset, options.positions, stride);
        packAttribute(buffer, meshAttributes.colors, i, colorOffset, options.colors, stride);
        packAttribute(buffer, meshAttributes.uvs, i, uvOffset, options.uvs, stride);
        packAttribute(buffer, meshAttributes.normals, i, normalOffset, options.normals, stride);
    }

    return buffer;
}

This is a tad more hard-coded than I'd like but it'll do for now. We can use it like this:

const mesh = new Mesh(quad());
const vertices = packMesh(mesh, { positions: 3, uvs: 2 });

This packs everything into one buffer. We'll be using the same Mesh class as before for now (with a few small naming changes and more use of Float32Array). This also means we'll be using vector.js from the last project, again modified to use Float32Arrays where possible. I won't show this part for brevity as it's not interesting, so view the source if necessary.

Why Float32Arrays?

I have previous posts about this, but in short: because these are non-dynamic and allocate exactly enough space, they are more efficient than JS arrays. JavaScript numbers are 64-bit, so we also save half the memory. In the past I used nested arrays to maintain the shape of matrices, which was a bad choice that needlessly complicated things and made them less efficient. Instead, we'll pass a shape parameter that tells us the shape of the matrix (another idea gleaned from the machine learning series; it can easily be converted to tensors if needed). Finally, less conversion is necessary if we stick to one representation. Conversions get expensive, so the fewer we do internally between array types the better, and all WebGPU APIs take TypedArrays. 32-bit is just the general standard; we probably won't need more precision than that.
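For example, a 4x4 matrix can live in a flat Float32Array of 16 values with a [4, 4] shape passed alongside it. The multiplyMatrix below is just a minimal sketch of the idea, not the engine's actual vector.js implementation:

```javascript
// Flat Float32Array + shape parameter instead of nested arrays.
// This is a hypothetical sketch; names and signatures are my own.
function multiplyMatrix(a, shapeA, b, shapeB) {
    const [rowsA, colsA] = shapeA;
    const [rowsB, colsB] = shapeB;
    if (colsA !== rowsB) throw new Error("Shape mismatch");
    const out = new Float32Array(rowsA * colsB);
    for (let r = 0; r < rowsA; r++) {
        for (let c = 0; c < colsB; c++) {
            let sum = 0;
            for (let k = 0; k < colsA; k++) {
                // row-major indexing: element (r, k) lives at r * colsA + k
                sum += a[r * colsA + k] * b[k * colsB + c];
            }
            out[r * colsB + c] = sum;
        }
    }
    return out;
}

// 2x2 identity times a 2x2 matrix leaves it unchanged
const identity = new Float32Array([1, 0, 0, 1]);
const m = new Float32Array([1, 2, 3, 4]);
const result = multiplyMatrix(identity, [2, 2], m, [2, 2]); // [1, 2, 3, 4]
```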


We can also update the vertexBufferDescriptor to take into account 3D positions and UVs.

initializePipelines(){
    const vertexBufferDescriptor = [{
        attributes: [
            {
                shaderLocation: 0,
                offset: 0,
                format: "float32x3"
            },
            {
                shaderLocation: 1,
                offset: 12,
                format: "float32x2"
            }
        ],
        arrayStride: 20,
        stepMode: "vertex"
    }];
    //more stuff...
}
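Since the offsets and stride above are derived directly from the same per-attribute sizes we pass to packMesh, a small helper could compute the descriptor instead of hand-writing it. This is a sketch of my own (buildVertexBufferDescriptor is not part of the engine), assuming all attributes are 32-bit floats:

```javascript
// Derives a vertexBufferDescriptor from the same options object used by
// packMesh, so the two can't drift out of sync. Hypothetical helper.
const F32_SIZE = 4;

function buildVertexBufferDescriptor(options) {
    const order = ["positions", "colors", "uvs", "normals"]; // same order packMesh packs
    const attributes = [];
    let offset = 0; // running offset in f32 elements
    let shaderLocation = 0;
    for (const name of order) {
        const count = options[name] ?? 0;
        if (count === 0) continue; // attribute not used, skip it
        attributes.push({
            shaderLocation: shaderLocation++,
            offset: offset * F32_SIZE,          // byte offset within the vertex
            format: `float32x${count}`
        });
        offset += count;
    }
    return [{
        attributes,
        arrayStride: offset * F32_SIZE, // total bytes per vertex
        stepMode: "vertex"
    }];
}

const descriptor = buildVertexBufferDescriptor({ positions: 3, uvs: 2 });
// matches the hand-written descriptor: offsets 0 and 12, arrayStride 20
```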

Camera

We'll be using the same camera class as before with small changes mostly to use more Float32Arrays.

async initialize(){
    //stuff...
    this.initializeCameras();
}

initializeCameras(){
    this.#cameras.set("main", new Camera({
        position: [0, 0, -2],
        screenHeight: this.#canvas.height,
        screenWidth: this.#canvas.width,
        fieldOfView: 90,
        near: 0.01,
        far: 5,
        isPerspective: true
    }))
}

I'm using the same principle as before of keeping track of the objects in maps (in case we have more than one camera).

BindGroups

When we go to do perspective rendering, we need to pass in uniforms as we did in WebGL. WebGPU makes this very manual but a lot more flexible.

setBindGroups(passEncoder, bindGroupLayouts, camera, mesh){
    const viewMatrix = camera.getViewMatrix();
    const projectionMatrix = camera.getProjectionMatrix();
    const modelMatrix = mesh.getModelMatrix();
    const normalMatrix = getTranspose(getInverse(trimMatrix(multiplyMatrix(modelMatrix, [4,4], viewMatrix, [4,4]), [4,4], [3,3]), [3,3]), [3,3]);
    const cameraPosition = camera.getPosition();
    const alignment = getAlignments([
        "mat4x4f32",
        "mat4x4f32",
        "mat4x4f32",
        "mat3x3f32",
        "vec3f32"
    ]);
    const bufferSize = alignment.totalSize;
    const environmentUniformBuffer = this.#device.createBuffer({
        size: bufferSize,
        usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
        label: "environment-uniform-buffer"
    });
    const environmentUniformData = new Float32Array(bufferSize / 4);
    environmentUniformData.set(viewMatrix, alignment.offsets[0] / 4);
    environmentUniformData.set(projectionMatrix, alignment.offsets[1] / 4);
    environmentUniformData.set(modelMatrix, alignment.offsets[2] / 4);
    environmentUniformData.set(normalMatrix, alignment.offsets[3] / 4);
    environmentUniformData.set(cameraPosition, alignment.offsets[4] / 4);
    this.#device.queue.writeBuffer(environmentUniformBuffer, 0, environmentUniformData);
    const environmentBindGroup = this.#device.createBindGroup({
        label: "environment-bind-group",
        layout: bindGroupLayouts[0],
        entries: [
            {
                binding: 0,
                resource: {
                    buffer: environmentUniformBuffer,
                    offset: 0,
                    size: bufferSize
                }
            }
        ]
    });
    passEncoder.setBindGroup(0, environmentBindGroup);
}

Again, things have been modified slightly to use Float32Arrays. In this code we stuff all the matrices into a single buffer, upload it to the GPU, and associate it with the bind group that will be referenced in the shader.

Alignment

One serious note here is getAlignments. You'll probably get bitten by this: buffers can't be just any size, they need to be multiples of something (for architectural reasons). In general the buffer needs to be a multiple of 16, but not just any multiple of 16, because the data inside the buffer has to be padded too. For instance, the vec3<f32> for the camera position cannot start at any multiple of 8; it must start at a multiple of 16 because it has an alignment of 16 (see: https://www.w3.org/TR/WGSL/#alignment-and-size). Some types aren't even the size you might expect: a mat4x3<f32> is 64 bytes, for example, not 48. So we need to use the table in the spec to see how each value fits into the buffer.

const gpuTypeAlignSize = {
    bool: [4,4],
    i32: [4,4],
    u32: [4,4],
    f32: [4,4],
    f16: [2,2],
    atomic: [4,4],
    vec2bool: [8,8],
    vec2i32: [8,8],
    vec2u32: [8,8],
    vec2f32: [8,8],
    vec2f16: [4,4],
    vec3bool: [16,12],
    vec3i32: [16,12],
    vec3u32: [16,12],
    vec3f32: [16,12],
    vec3f16: [8,6],
    vec4bool: [16,16],
    vec4i32: [16,16],
    vec4u32: [16,16],
    vec4f32: [16,16],
    vec4f16: [8,8],
    mat2x2f32: [8,16],
    mat2x2f16: [4,8],
    mat3x2f32: [8,24],
    mat3x2f16: [4,12],
    mat4x2f32: [8,32],
    mat4x2f16: [4,16],
    mat2x3f32: [16,32],
    mat2x3f16: [8,16],
    mat3x3f32: [16,48],
    mat3x3f16: [8,24],
    mat4x3f32: [16,64],
    mat4x3f16: [8,32],
    mat2x4f32: [16,32],
    mat2x4f16: [8,16],
    mat3x4f32: [16,48],
    mat3x4f16: [8,24],
    mat4x4f32: [16,64],
    mat4x4f16: [8,32]
}

/**
 * 
 * @param {number} size 
 * @param {number} smallestUnitSize 
 * @returns {number} size rounded up to the next multiple of smallestUnitSize
 */
export function getPaddedSize(size, smallestUnitSize){
    const remainder = size % smallestUnitSize;
    if(remainder === 0){
        return size;
    }
    return size + smallestUnitSize - remainder;
}

/**
 * @typedef {keyof gpuTypeAlignSize} GpuType
 * @param {GpuType[]} typesToPack 
 */
export function getAlignments(typesToPack){
    let offset = 0;
    let maxAlign = 0;
    const offsets = new Array(typesToPack.length);
    for(let i = 0; i < typesToPack.length; i++){
        const alignmentSize = gpuTypeAlignSize[typesToPack[i]];
        if(maxAlign < alignmentSize[0]){
            maxAlign = alignmentSize[0];
        }
        offset = getPaddedSize(offset, alignmentSize[0])
        offsets[i] = offset;
        offset += alignmentSize[1];
    }
    return {
        offsets,
        totalSize: getPaddedSize(offset, maxAlign)
    };
}

This function uses that table to fit each value and returns the offsets and total size. This should help alleviate those annoying padding issues. You can do it manually, but you will probably run into problems eventually.

One issue I'm not solving is optimal packing. Due to the padding, you can save space by ordering properties the right way. For example,

  • f16
  • mat4x4<f32>
  • f16

would need 2 bytes, then 14 wasted bytes of padding, then 64 bytes, and then 2 more bytes padded out to 16 (because 16 is the max alignment in the struct). So 16 + 64 + 16 = 96. If instead we had done

  • mat4x4<f32>
  • f16
  • f16

then we'd need 64 bytes, and then 2 bytes plus 2 bytes padded out to 16. So 64 + 16 = 80, saving 16 bytes. Something to consider.
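We can sanity-check that arithmetic with the same padding rule getAlignments uses. This is a standalone sketch (structSize is my own name), using [align, size] pairs from the WGSL table:

```javascript
// Same rounding rule as getPaddedSize in the engine code above.
function getPaddedSize(size, smallestUnitSize) {
    const remainder = size % smallestUnitSize;
    return remainder === 0 ? size : size + smallestUnitSize - remainder;
}

// Computes the total struct size for a list of [align, size] members,
// padding each member to its alignment and the whole struct to the max alignment.
function structSize(members) {
    let offset = 0;
    let maxAlign = 0;
    for (const [align, size] of members) {
        maxAlign = Math.max(maxAlign, align);
        offset = getPaddedSize(offset, align) + size;
    }
    return getPaddedSize(offset, maxAlign);
}

const f16 = [2, 2];          // align 2, size 2
const mat4x4f32 = [16, 64];  // align 16, size 64

const wasteful = structSize([f16, mat4x4f32, f16]); // 96
const compact = structSize([mat4x4f32, f16, f16]);  // 80
```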

BindGroup layouts

In order for the shader to understand how our data was packed into the buffer it needs the bind group layout. This can be automatic (generated from the shader code) or manual.

Auto

const pipelineDescriptor = {
    label: "pipeline",
    vertex: {
        module: shaderModule,
        entryPoint: "vertex_main",
        buffers: vertexBufferDescriptor
    },
    fragment: {
        module: shaderModule,
        entryPoint: "fragment_main",
        targets: [
            { format: "rgba8unorm" }
        ]
    },
    primitive: {
        topology: "triangle-list"
    },
    layout: "auto"
};


// construct pipeline and other stuff ...

const bindGroup = this.#device.createBindGroup({
    label: "bind-group",
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 0,
            resource: {
                buffer: bindGroupUniformBuffer,
                offset: 0,
                size: bufferSize
            }
        }
    ]
});

By setting layout: "auto" in the pipeline descriptor you have WebGPU do this for you: it looks at the bind groups defined in your shader code and generates the layout. This saves some code. However, an auto layout cannot be reused with other pipelines, and you need to watch out: if you don't use a value in the shader, the shader compiler will tree-shake it and the auto-generated bind group layout won't include it anymore.

Manual

const environmentBindGroupLayout = this.#device.createBindGroupLayout({
    label: "bind-group-layout",
    entries: [
        {
            binding: 0,
            visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
            buffer: {
                type: "uniform"
            }
        }
    ]
});
const pipelineLayout = this.#device.createPipelineLayout({
    label: "pipeline-layout",
    bindGroupLayouts: [
        environmentBindGroupLayout
    ]
});
const pipelineDescriptor = {
    label: "pipeline",
    vertex: {
        module: shaderModule,
        entryPoint: "vertex_main",
        buffers: vertexBufferDescriptor
    },
    fragment: {
        module: shaderModule,
        entryPoint: "fragment_main",
        targets: [
            { format: "rgba8unorm" }
        ]
    },
    primitive: {
        topology: "triangle-list"
    },
    layout: pipelineLayout
};

// construct pipeline and other stuff ...

const environmentBindGroup = this.#device.createBindGroup({
    label: "environment-bind-group",
    layout: bindGroupLayouts[0],
    entries: [
        {
            binding: 0,
            resource: {
                buffer: environmentUniformBuffer,
                offset: 0,
                size: bufferSize
            }
        }
    ]
});

This is the same thing, just manual. It can be reused across pipelines but is more verbose.

Organizing BindGroups

Another note: use WGSL structs to group uniforms. For example:

struct Environment {
    view_matrix: mat4x4<f32>,
    projection_matrix: mat4x4<f32>,
    model_matrix: mat4x4<f32>,
    camera_position: vec3<f32>
}

@group(0) @binding(0) var<uniform> environment : Environment;

vs

@group(0) @binding(0) var<uniform> view_matrix : mat4x4<f32>;
@group(0) @binding(1) var<uniform> projection_matrix : mat4x4<f32>;
@group(0) @binding(2) var<uniform> model_matrix : mat4x4<f32>;
@group(0) @binding(3) var<uniform> camera_position : vec3<f32>;

The latter requires at least 256 bytes per binding (the minimum uniform buffer offset alignment), so even though a mat4x4<f32> is only 64 bytes, the minimum space you'd need to allocate for the four bindings in one buffer is 256 * 4 bytes. So to not waste space, a struct is both better and cleaner.
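As a rough sketch of that arithmetic (assuming the common 256-byte minimum uniform buffer offset alignment), here's what the two layouts cost for three mat4x4<f32>s and a vec3<f32>:

```javascript
// Compare buffer space: four separate bindings vs one struct.
// 256 is the typical minUniformBufferOffsetAlignment; check
// device.limits.minUniformBufferOffsetAlignment on real hardware.
const MIN_BINDING_ALIGNMENT = 256;

function alignTo(size, alignment) {
    const r = size % alignment;
    return r === 0 ? size : size + alignment - r;
}

// Separate bindings: each value rounds up to 256 bytes.
// Three mat4x4<f32> (64 bytes each) + one vec3<f32> (12 bytes).
const separate = 3 * alignTo(64, MIN_BINDING_ALIGNMENT)
    + alignTo(12, MIN_BINDING_ALIGNMENT); // 1024 bytes

// One struct: members packed with WGSL alignment rules,
// 64 + 64 + 64 for the matrices + 16 for the 16-aligned vec3.
const packedStruct = alignTo(64 + 64 + 64 + 16, 16); // 208 bytes
```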

Pass Encoding

render() {
    const commandEncoder = this.#device.createCommandEncoder({
        label: "main-command-encoder"
    });
    const pipelineContainer = this.#pipelines.get("main");
    const camera = this.#cameras.get("main");
    const meshContainer = this.#meshes.get("background");
    const passEncoder = commandEncoder.beginRenderPass({
        label: "main-render-pass",
        colorAttachments: [
            {
                clearValue: { r: 0, g: 0, b: 0, a: 1 },
                storeOp: "store",
                loadOp: "clear",
                view: this.#context.getCurrentTexture().createView()
            }
        ]
    });
    passEncoder.setPipeline(pipelineContainer.pipeline);
    this.setBindGroups(passEncoder, pipelineContainer.bindGroupLayouts, camera, meshContainer.mesh);
    passEncoder.setVertexBuffer(0, meshContainer.vertices);
    passEncoder.draw(6); //TODO need index buffer
    passEncoder.end();
    this.#device.queue.submit([commandEncoder.finish()]);
}

For the bind groups we need to bookkeep some state and pass it into setBindGroups, which was shown above and builds out the data for the model-view-projection matrices. Beyond that, not much has changed here.

Using indices

Like before, we can be more efficient by using indices to reuse vertices, so we'll do that too. First we need to set up the indices with the geometry. Luckily this was already done: the property on the mesh object was previously called triangles; I changed it to indices to make the naming more consistent. Then we need to upload it to the GPU (probably immediately after we upload the vertices).

const indexBuffer = this.#device.createBuffer({
    size: mesh.indices.byteLength,
    usage: GPUBufferUsage.INDEX | GPUBufferUsage.COPY_DST,
});
this.#device.queue.writeBuffer(indexBuffer, 0, mesh.indices);

The only difference between vertices and indices in terms of uploading to the GPU is that the indices are a Uint16Array and the buffer has usage GPUBufferUsage.INDEX instead of GPUBufferUsage.VERTEX. Then in the passEncoder part:

passEncoder.setIndexBuffer(meshContainer.indexBuffer, "uint16");
//passEncoder.draw(...)
passEncoder.drawIndexed(meshContainer.mesh.indices.length);

Easy. Now a quad only needs to define 4 vertices instead of 6 because the index buffer can reuse them.
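For reference, an indexed quad might look like this: 4 unique vertices with an index buffer that reuses two of them across the triangles. This sketch mirrors the earlier quad function but isn't the exact code from the repo:

```javascript
/**
 * Generates an indexed quad facing negative Z, like a wall.
 * Hypothetical sketch: 4 unique vertices, 6 indices (two triangles
 * sharing vertices 0 and 2).
 */
function quad() {
    return {
        positions: new Float32Array([
            -1.0, -1.0, 0.0, // 0: bottom-left
             1.0, -1.0, 0.0, // 1: bottom-right
             1.0,  1.0, 0.0, // 2: top-right
            -1.0,  1.0, 0.0  // 3: top-left
        ]),
        uvs: new Float32Array([
            0.0, 1.0,
            1.0, 1.0,
            1.0, 0.0,
            0.0, 0.0
        ]),
        indices: new Uint16Array([
            0, 1, 2, // first triangle
            0, 2, 3  // second triangle
        ]),
        length: 4 // unique vertex count
    };
}
```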

Now we should have a red quad on the screen.

Code

https://github.com/ndesmic/geo/releases/tag/v0.2
