<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Carmen Cincotti</title>
    <description>The latest articles on DEV Community by Carmen Cincotti (@carmencincotti).</description>
    <link>https://dev.to/carmencincotti</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F851845%2F3a72083c-6520-469e-9eeb-9476aec4ca4b.jpeg</url>
      <title>DEV Community: Carmen Cincotti</title>
      <link>https://dev.to/carmencincotti</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/carmencincotti"/>
    <language>en</language>
    <item>
      <title>Lets Look At Magic LookAt Matrices</title>
      <dc:creator>Carmen Cincotti</dc:creator>
      <pubDate>Mon, 25 Apr 2022 13:12:03 +0000</pubDate>
      <link>https://dev.to/carmencincotti/lets-look-at-magic-lookat-matrices-1c7o</link>
      <guid>https://dev.to/carmencincotti/lets-look-at-magic-lookat-matrices-1c7o</guid>
      <description>&lt;h2&gt;
  
  
  The Magic of the LookAt Matrix
&lt;/h2&gt;

&lt;p&gt;I find math to be sometimes hard, sometimes fun, sometimes magical, and sometimes hard-fun-magical. &lt;strong&gt;Linear Algebra&lt;/strong&gt;¹ is the mathematics behind a lot of fun tech: VR, AR, computer graphics, machine learning, and other data-science buzzwords.&lt;/p&gt;

&lt;p&gt;I’ve been on a WebGPU 3D computer graphics kick lately (&lt;a href="https://dev.to/carmencincotti/drawing-a-triangle-with-webgpu-1mk3"&gt;last week I wrote about making a triangle in WebGPU&lt;/a&gt;). When it came time to implement a camera, I figured I could just instantiate some sort of camera object and move on.&lt;/p&gt;

&lt;p&gt;I quickly learned that &lt;strong&gt;the camera in 3D graphics does not exist.&lt;/strong&gt; &lt;em&gt;It’s all smoke and mirrors&lt;/em&gt;. We give the illusion that it does exist through the magic of linear algebra. Let’s see what I mean by taking &lt;em&gt;a look at&lt;/em&gt; (ha!) &lt;strong&gt;the LookAt matrix.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The&lt;/strong&gt; &lt;strong&gt;LookAt Matrix&lt;/strong&gt; is a great exercise in linear algebra. It encompasses the usage of the &lt;strong&gt;dot product&lt;/strong&gt; and &lt;strong&gt;cross product&lt;/strong&gt;. It involves &lt;strong&gt;vectors&lt;/strong&gt;. It involves &lt;strong&gt;matrices.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anyway, it turned out to be a fun learning opportunity to really drive some key math concepts down. I’d like to share this knowledge with you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Theory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The LookAt matrix&lt;/strong&gt; is a matrix that transforms something to look at a point in space. Let’s keep our discussion limited to the application of &lt;strong&gt;the LookAt Matrix&lt;/strong&gt; to &lt;em&gt;cameras&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Namely, we can use &lt;strong&gt;the LookAt matrix&lt;/strong&gt; to transform the positions of the objects within the 3D scene to &lt;em&gt;give the illusion&lt;/em&gt; that they are being viewed from the lens of the camera.&lt;/p&gt;

&lt;p&gt;Let’s take as an example a 3D scene containing a camera and a 3D ball. If we apply a LookAt matrix to the camera &lt;strong&gt;that transforms it&lt;/strong&gt; to view the red ball from a certain position in 3D space, we might expect to see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XABq3ple--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AO_19UJVrvcz4lLoFzZNv2Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XABq3ple--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AO_19UJVrvcz4lLoFzZNv2Q.png" alt="A camera looking at a red ball" width="596" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A camera looking at a red ball after applying &lt;strong&gt;the LookAt matrix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Additionally, we might &lt;em&gt;instead&lt;/em&gt; want to view the world  &lt;strong&gt;from the lens of our camera&lt;/strong&gt; &lt;em&gt;(which is a much more common application)&lt;/em&gt;… so we’ll transform the ball / plane instead &lt;em&gt;to give the illusion&lt;/em&gt; that we’re viewing it from the perspective of the camera:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dnaAzLhT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A-UUltrOesmv8XMPZyBGEYQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dnaAzLhT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A-UUltrOesmv8XMPZyBGEYQ.png" alt="A red ball from the perspective of a camera" width="661" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we might see if we looked through the camera at our scene.&lt;/p&gt;

&lt;p&gt;Over the next few sections, we’ll see how to calculate both of these views: one where we move the camera (and the world remains constant), and one where the camera remains static (at the origin, looking down the negative z-axis) and the world moves instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Code
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Poof!&lt;/em&gt; Here’s the magic trick in full (I prototyped it for a WebGPU app, so it’s in JavaScript).&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
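
&lt;p&gt;&lt;em&gt;(The embedded gist does not render in this feed, so here is a minimal sketch of the same trick in plain JavaScript: vectors as [x, y, z] arrays, helper names of my own choosing, output in column-major order. As the warning below notes, it builds the matrix that moves scene objects, so the translation terms come out negated.)&lt;/em&gt;&lt;/p&gt;

```javascript
// A sketch, not the original gist: vector helpers on plain [x, y, z] arrays.
const subtract = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Builds the scene-object ("view") variant of the LookAt matrix, column-major.
function lookAt(cameraPosition, targetPosition) {
  const tempUpVector = [0, 1, 0];

  // Step One: the forward axis points from the target back toward the camera.
  const forwardVector = normalize(subtract(cameraPosition, targetPosition));
  // Step Two: the right axis is perpendicular to (temporary) up and forward.
  const rightVector = normalize(cross(tempUpVector, forwardVector));
  // Step Three: the true up axis is perpendicular to forward and right.
  const upVector = normalize(cross(forwardVector, rightVector));
  // Step Four: the translation, expressed in the rotated basis and negated,
  // because we move the scene objects rather than the camera.
  const tx = -dot(cameraPosition, rightVector);
  const ty = -dot(cameraPosition, upVector);
  const tz = -dot(cameraPosition, forwardVector);

  // Step Five: assemble (each group of four is one column).
  return [
    rightVector[0], upVector[0], forwardVector[0], 0,
    rightVector[1], upVector[1], forwardVector[1], 0,
    rightVector[2], upVector[2], forwardVector[2], 0,
    tx,             ty,          tz,              1,
  ];
}

// The example from the figures: camera at (0, 4, 4) looking at the origin.
const viewMatrix = lookAt([0, 4, 4], [0, 0, 0]);
```

&lt;p&gt;&lt;em&gt;(Applying&lt;/em&gt; &lt;code&gt;viewMatrix&lt;/code&gt; &lt;em&gt;to the camera position itself lands it at the origin, which is exactly where the later sections want the camera to live.)&lt;/em&gt;&lt;/p&gt;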


&lt;p&gt;⚠️ &lt;em&gt;Take note that I am actually calculating the LookAt Matrix that we would use to move scene objects in relation to the camera (explanation at the end of the article).&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Calculations
&lt;/h2&gt;

&lt;p&gt;Solving for this matrix really means modeling the camera through the careful calculation of its coordinate system in relation to world space. Or, more simply put, we need to find the vectors &lt;code&gt;forwardVector&lt;/code&gt;, &lt;code&gt;upVector&lt;/code&gt;, and &lt;code&gt;rightVector&lt;/code&gt; of the camera in relation to the ball’s coordinate system.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This feels a bit like sorcery.&lt;/em&gt; We can take &lt;em&gt;very&lt;/em&gt; minimal information and, in the end, come up with an entire matrix representing an orthonormal coordinate system, as shown in the image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jvQCcOZG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A4EY6XWVxm1-Kia0lSVygAQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jvQCcOZG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2A4EY6XWVxm1-Kia0lSVygAQ.png" alt="A 3D scene showing a camera looking at a red ball. There is a 3D gizmo illustrating the up, right, and forward vectors." width="800" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A representation of a 3D scene where a camera is looking at a red ball. Notice how the forward vector points towards the camera.&lt;/p&gt;

&lt;p&gt;We’ll start our journey through this &lt;em&gt;Magical Math forest,&lt;/em&gt; knowing that our camera here has a certain position in space, as does our red ball. &lt;em&gt;We’ll also assume the conventions of a right-handed coordinate system,&lt;/em&gt; so we’re all on the same page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step One: Calculate the Forward Axis direction
&lt;/h3&gt;

&lt;p&gt;This is actually very doable. Given our camera position and red ball position, we can calculate the direction of the &lt;code&gt;forwardVector&lt;/code&gt; through vector subtraction:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;forwardVector = normalize(cameraPosition - redBallPosition)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;⚠️ &lt;em&gt;Remember to&lt;/em&gt; &lt;strong&gt;&lt;em&gt;normalize&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;the result of the vector subtraction, since we want the directional vector, which is a unit vector.&lt;/em&gt;&lt;/p&gt;
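
&lt;p&gt;&lt;em&gt;(A quick sketch of step one in plain JavaScript, with vectors as [x, y, z] arrays, helper names of my own choosing, and the example positions from the figures.)&lt;/em&gt;&lt;/p&gt;

```javascript
// A sketch of step one (names are illustrative, vectors are plain arrays).
const subtract = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

const cameraPosition = [0, 4, 4];
const redBallPosition = [0, 0, 0];
const forwardVector = normalize(subtract(cameraPosition, redBallPosition));
// forwardVector is roughly [0, 0.707, 0.707]: unit length, pointing from
// the ball back toward the camera.
```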

&lt;p&gt;Step one, done. Let’s hop on our broomsticks to Step Two.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step Two: Calculate the Right Axis direction
&lt;/h3&gt;

&lt;p&gt;This step involves &lt;em&gt;quite a bit of&lt;/em&gt; &lt;em&gt;witchcraft&lt;/em&gt;, and I’m not a huge fan of the handwavy-ness of the steps that follow but, hey, &lt;em&gt;math is magic.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s list out some things we know to be true first before arriving at our next step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  We are re-constructing an orthonormal 3D coordinate system based on the position and rotation of our camera&lt;/li&gt;
&lt;li&gt;  The right axis that we are looking for is orthogonal to the forward axis and the up axis (&lt;em&gt;which is still unknown, but disregard that for now&lt;/em&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was a lot to say, but given all that, we can conclude that we need to find the &lt;strong&gt;cross product&lt;/strong&gt; of the forward and up axes. Wikipedia describes the &lt;strong&gt;cross product&lt;/strong&gt; as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Given two linearly independent vectors &lt;strong&gt;a&lt;/strong&gt; and &lt;strong&gt;b&lt;/strong&gt;, the cross product, &lt;strong&gt;a&lt;/strong&gt; × &lt;strong&gt;b&lt;/strong&gt; (read “a cross b”), is a vector that is perpendicular to both &lt;strong&gt;a&lt;/strong&gt; and &lt;strong&gt;b&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YeQxLLJO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AUom4SbyxSqVxhsDsN_9YNg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YeQxLLJO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AUom4SbyxSqVxhsDsN_9YNg.png" alt="A visual representation of a cross product" width="784" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An example of the cross product — using the up and forward axes, we can find a vector perpendicular to both of them — the right axis.&lt;/p&gt;

&lt;p&gt;OK, well &lt;em&gt;we know the forward vector&lt;/em&gt; (since we solved for it in Step One).&lt;/p&gt;

&lt;p&gt;BUT, we still lack the up vector. Are we doomed? Nope… &lt;strong&gt;we just need magic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here’s the trick&lt;/em&gt;: We just need any old vector that lies in the plane formed by the forward and up vectors… &lt;em&gt;and not necessarily&lt;/em&gt; the actual up vector of the camera itself. A common convention in this case is to use (0, 1, 0) as the up vector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LU0d6M8r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AhOcKK0nIeIAEYqm8sp2wUw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LU0d6M8r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AhOcKK0nIeIAEYqm8sp2wUw.png" alt="A visual representation of the cross product, with the temporary up vector" width="714" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The trick — we can define a tempUpVector in place of the actual up axis, since it will still lie within the plane formed by the up and forward axes.&lt;/p&gt;

&lt;p&gt;Finally, we can solve for the &lt;code&gt;rightVector&lt;/code&gt; by doing the following calculation:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;tempUpVector = (0, 1, 0)&lt;br&gt;&lt;br&gt;
rightVector = normalize(cross(tempUpVector, forwardVector))&lt;/em&gt;&lt;/p&gt;
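
&lt;p&gt;&lt;em&gt;(A quick numerical check of this step, sketched in plain JavaScript with illustrative helper names: the temporary up vector really does yield a unit right axis perpendicular to both inputs.)&lt;/em&gt;&lt;/p&gt;

```javascript
// Sketch of step two: the cross product of the temporary up axis and the
// forward axis gives the right axis.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

const tempUpVector = [0, 1, 0];
// Forward axis from the earlier example: camera at (0, 4, 4), ball at origin.
const forwardVector = normalize([0, 4, 4]);
const rightVector = normalize(cross(tempUpVector, forwardVector));
// rightVector comes out as [1, 0, 0]: perpendicular to both inputs, even
// though tempUpVector was not the camera's true up axis.
```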

&lt;h3&gt;
  
  
  Step Three: Calculate the Up Axis direction
&lt;/h3&gt;

&lt;p&gt;It’s all smooth flying from here. We now know the &lt;code&gt;rightVector&lt;/code&gt; and the &lt;code&gt;forwardVector&lt;/code&gt;… so given our earlier assumptions (these three vectors are orthonormal), we know that to find the &lt;code&gt;upVector&lt;/code&gt;&lt;em&gt;,&lt;/em&gt; we need to find the &lt;strong&gt;cross product&lt;/strong&gt; of the &lt;code&gt;forwardVector&lt;/code&gt; and &lt;code&gt;rightVector&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;upVector = normalize(cross(forwardVector, rightVector))&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And that’s that. We have successfully calculated our 3 directional vectors.&lt;/p&gt;

&lt;p&gt;Now we could start building our &lt;strong&gt;LookAt matrix&lt;/strong&gt;, but we’re still missing a key component: &lt;em&gt;the translation component&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step Four: Calculate the Camera Translation Vector
&lt;/h3&gt;

&lt;p&gt;Now we need to put on our &lt;em&gt;Thinking Witch Hats&lt;/em&gt; to figure out how we will translate our camera so it gets to the position that we want it to be in our coordinate system.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why not just translate it by the camera position we used to calculate the &lt;code&gt;forwardVector&lt;/code&gt;?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a great question. Let’s look at an example.&lt;/p&gt;

&lt;p&gt;Let’s assume that our camera was originally at the 3D position in world coordinates &lt;strong&gt;(0,4,4)&lt;/strong&gt; and our red ball is at origin &lt;strong&gt;(0,0,0).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--67z0h5uf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AEpEBqR1eQCK3IGnYzFIgEw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--67z0h5uf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AEpEBqR1eQCK3IGnYzFIgEw.png" alt="Camera looking at red ball" width="747" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A naive approach where we assume that the camera is translating along the axes defined on the bottom left (the blue/green arrows).&lt;/p&gt;

&lt;p&gt;On the face of it, it looks like we should be able to just translate the camera by &lt;strong&gt;(0,4,4)&lt;/strong&gt;… but that misses a key point about the order of matrix operations, one I neglected to clarify earlier but will do now to drive home a point:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We are rotating our camera first, then translating.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;⚠️ &lt;em&gt;This is mainly convention, and it’s up to you to decide the order in which you want to perform matrix operations… but if you don’t follow the way I present it to you here, you will not arrive at the same result.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Basically, we need to think about how to translate the camera &lt;strong&gt;after having rotated it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pCzath2A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ARLQu6lGE4Pe0jtqrfVe5UQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pCzath2A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2ARLQu6lGE4Pe0jtqrfVe5UQ.png" alt="Visual of how to calculate the camera translation vector" width="606" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The correct approach where we rotate first, then translate along our new axes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real translation vector of the camera would be (0, 0, 5.66)&lt;/strong&gt;, since we are changing the coordinate basis (while maintaining its orthonormal properties). The z component is simply the camera’s distance from the ball: √(4² + 4²) ≈ 5.66.&lt;/p&gt;

&lt;p&gt;The operation that we need to use to figure this out can be simplified by using &lt;strong&gt;the dot product&lt;/strong&gt;. Read more about it &lt;a href="https://mathinsight.org/dot_product"&gt;here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short: the dot product returns the magnitude of one vector’s projection onto another. It therefore returns zero for orthogonal vectors, since they do not overlap each other at all. Not even a little bit. Here’s what the calculation looks like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;translationX = dot(positionOfCamera, rightVector);&lt;br&gt;&lt;br&gt;
translationY = dot(positionOfCamera, upVector);&lt;br&gt;&lt;br&gt;
translationZ = dot(positionOfCamera, forwardVector);&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;⚠️ &lt;em&gt;Remember to define&lt;/em&gt; &lt;code&gt;rightVector&lt;/code&gt;&lt;em&gt;,&lt;/em&gt; &lt;code&gt;upVector&lt;/code&gt;&lt;em&gt;, and&lt;/em&gt; &lt;code&gt;forwardVector&lt;/code&gt; &lt;em&gt;as unit vectors! The correctness of these calculations relies on it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;💡&lt;strong&gt;&lt;em&gt;Thought experiment:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;Starting with an object at (0,0,0), can you imagine what translating first and then rotating (around the origin) will do? Can you imagine what rotating (around the origin) and then translating will do?&lt;/em&gt;&lt;/p&gt;
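
&lt;p&gt;&lt;em&gt;(A sketch of the translation calculation, using the camera at (0, 4, 4) from the earlier example and the rotated axes it produces; names are illustrative.)&lt;/em&gt;&lt;/p&gt;

```javascript
// Checking the translation numbers from the example above: camera at
// (0, 4, 4), rotated to look at the ball at the origin.
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const positionOfCamera = [0, 4, 4];
// The rotated, orthonormal camera axes produced by the earlier steps:
const rightVector = [1, 0, 0];
const upVector = [0, Math.SQRT1_2, -Math.SQRT1_2];
const forwardVector = [0, Math.SQRT1_2, Math.SQRT1_2];

const translationX = dot(positionOfCamera, rightVector);   // 0
const translationY = dot(positionOfCamera, upVector);      // 0
const translationZ = dot(positionOfCamera, forwardVector); // about 5.66
```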

&lt;h3&gt;
  
  
  Step Five: Build the Matrix!
&lt;/h3&gt;

&lt;p&gt;Now just fill in the data into this matrix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yCZ9dc_x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AT-SAObQaoHUTBlw5jcrVFg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yCZ9dc_x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AT-SAObQaoHUTBlw5jcrVFg.png" alt="Lookat matrix calculation" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Magic LookAt Matrix&lt;/p&gt;

&lt;p&gt;That’s it, we’re done… but if you’re looking for how you can use this for a 3D application that you’re developing… well keep on reading!&lt;/p&gt;

&lt;p&gt;Recall that I said at the very beginning that we would like to view this world from the lens of the camera. The typical name for a transformation matrix that does this operation is a &lt;strong&gt;ViewMatrix.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key thing to understand is that the camera &lt;em&gt;stays at origin&lt;/em&gt; and looks down the negative z-axis. &lt;strong&gt;It does not move&lt;/strong&gt;. To simulate camera movement, &lt;em&gt;we need to move the objects in the scene instead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Therefore, we need to apply the &lt;em&gt;inverse&lt;/em&gt; of the LookAt matrix to the objects in the scene rather than to the camera.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the code, we calculate just that… the translation is inverted (negated), as is the rotation component. However, &lt;strong&gt;take note of a very important point&lt;/strong&gt; that made me bang my head at work while trying to debug a rotation issue… &lt;strong&gt;the inverse of a rotation matrix is its transpose.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, our final inverse LookAt Matrix… or View Matrix, would be:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3OGqIgLu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AGMLVIbbWXGE291FTG10toQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3OGqIgLu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AGMLVIbbWXGE291FTG10toQ.png" alt="View matrix calculation" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The inverse LookAt matrix is the ViewMatrix&lt;/p&gt;
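
&lt;p&gt;&lt;em&gt;(A numerical sanity check of that rule, sketched in plain JavaScript: for a rigid rotation-plus-translation matrix, transposing the 3×3 rotation block and negating the rotated translation undoes the original transform.)&lt;/em&gt;&lt;/p&gt;

```javascript
// Invert a rigid (rotation + translation) 4x4 stored column-major:
// transpose the 3x3 rotation block and negate the rotated translation.
function invertRigid(m) {
  const out = new Array(16).fill(0);
  // Transpose the rotation block.
  out[0] = m[0]; out[1] = m[4]; out[2] = m[8];
  out[4] = m[1]; out[5] = m[5]; out[6] = m[9];
  out[8] = m[2]; out[9] = m[6]; out[10] = m[10];
  // New translation: minus the transposed rotation applied to the old one.
  out[12] = -(m[0] * m[12] + m[1] * m[13] + m[2] * m[14]);
  out[13] = -(m[4] * m[12] + m[5] * m[13] + m[6] * m[14]);
  out[14] = -(m[8] * m[12] + m[9] * m[13] + m[10] * m[14]);
  out[15] = 1;
  return out;
}

// Transform a point by a column-major rigid 4x4.
const apply = (m, p) => [
  m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
  m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
  m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
];

// A 90-degree rotation about z plus a translation of (1, 2, 3).
const m = [0, 1, 0, 0,  -1, 0, 0, 0,  0, 0, 1, 0,  1, 2, 3, 1];
const inv = invertRigid(m);
const roundTrip = apply(inv, apply(m, [1, 0, 0])); // back to [1, 0, 0]
```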

&lt;p&gt;You’re a wizard, reader, be proud of yourself for getting through all of this material. Until next time!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;¹ for the uninitiated — it’s math for arrays and matrices. For the initiated — don’t crucify me for that gross vulgarization.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://carmencincotti.com/2022-04-25/cameras-theory-webgpu/"&gt;WebGPU — Cameras (theory) | Carmen Cincotti&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="http://learnwebgl.brown37.net/07_cameras/camera_introduction.html"&gt;Brown37 — Introduction to Cameras&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="http://cs.wellesley.edu/~cs307/readings/07-viewports.shtml"&gt;Wellesley — Viewports, Aspect Ratio, Depth, Unprojection&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.w3.org/TR/webgpu/#coordinate-systems"&gt;w3 — WebGPU — Coordinate Systems&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>graphics</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Drawing a Triangle with WebGPU</title>
      <dc:creator>Carmen Cincotti</dc:creator>
      <pubDate>Fri, 22 Apr 2022 14:12:19 +0000</pubDate>
      <link>https://dev.to/carmencincotti/drawing-a-triangle-with-webgpu-1mk3</link>
      <guid>https://dev.to/carmencincotti/drawing-a-triangle-with-webgpu-1mk3</guid>
      <description>&lt;p&gt;&lt;a href="https://web.dev/gpu"&gt;WebGPU&lt;/a&gt;  is part of the new generation of rendering, created for the web.&lt;/p&gt;

&lt;p&gt;If you’re experienced in other graphics APIs such as Vulkan, you’ll probably find that WebGPU isn’t &lt;em&gt;so so&lt;/em&gt;  different. However, if you come from a WebGL background, you’ll need to wrap your head around the non-state-machine-esque behavior of WebGPU… and if you’re completely new to 3D rendering inside the browser, welcome!&lt;/p&gt;

&lt;p&gt;I propose that we create a triangle to introduce ourselves to the WebGPU API.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Goal
&lt;/h2&gt;

&lt;p&gt;We want to draw this bad boy in a browser that supports WebGPU rendering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KjfsjV8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1304/1%2AarTN0qYh4kfu8-cA3JtS0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KjfsjV8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1304/1%2AarTN0qYh4kfu8-cA3JtS0w.png" alt="A triangle rendered in WebGPU" width="652" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The awesome triangle that we would like to draw.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Here’s the code for those who just want to copy and paste; I’ll walk through it and describe each section later in this post.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
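
&lt;p&gt;&lt;em&gt;(The embedded gist does not render in this feed. Here is a minimal sketch along the lines described below; it assumes a single canvas element on the page and uses current API names such as&lt;/em&gt; &lt;code&gt;getPreferredCanvasFormat()&lt;/code&gt;&lt;em&gt;, which may differ slightly from the original 2022 code.)&lt;/em&gt;&lt;/p&gt;

```javascript
// Hypothetical reconstruction, not the original gist: draw one colored triangle.

// Each vertex packs eight floats: position (x, y, z, w) then color (r, g, b, a).
const vertices = new Float32Array([
  -1.0, -1.0, 0.0, 1.0,   1.0, 0.0, 0.0, 1.0,
   0.0,  1.0, 0.0, 1.0,   0.0, 1.0, 0.0, 1.0,
   1.0, -1.0, 0.0, 1.0,   0.0, 0.0, 1.0, 1.0,
]);

// How the GPU should decode the buffer: 32-byte stride, position at byte
// offset 0, color at byte offset 16.
const vertexBuffersDescriptors = [{
  arrayStride: 32,
  attributes: [
    { shaderLocation: 0, offset: 0, format: "float32x4" },
    { shaderLocation: 1, offset: 16, format: "float32x4" },
  ],
}];

// WGSL shaders (vec4f is shorthand for a 4-component float vector): pass the
// position through and interpolate the color.
const shaderCode = `
struct VertexOut {
  @builtin(position) position : vec4f,
  @location(0) color : vec4f,
};

@vertex
fn vertex_main(@location(0) position : vec4f,
               @location(1) color : vec4f) -> VertexOut {
  var output : VertexOut;
  output.position = position;
  output.color = color;
  return output;
}

@fragment
fn fragment_main(fragData : VertexOut) -> @location(0) vec4f {
  return fragData.color;
}
`;

async function init() {
  // Adapter and device.
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();

  // Canvas context configuration (this also sets up the swap chain).
  const canvas = document.querySelector("canvas");
  const context = canvas.getContext("webgpu");
  const format = navigator.gpu.getPreferredCanvasFormat();
  context.configure({ device, format, alphaMode: "opaque" });

  // Create the vertex buffer mapped, fill it from the CPU, then unmap it
  // so the GPU can read it.
  const vertexBuffer = device.createBuffer({
    size: vertices.byteLength,
    usage: GPUBufferUsage.VERTEX,
    mappedAtCreation: true,
  });
  new Float32Array(vertexBuffer.getMappedRange()).set(vertices);
  vertexBuffer.unmap();

  // The rendering pipeline: shaders, vertex layout, and primitive type.
  const shaderModule = device.createShaderModule({ code: shaderCode });
  const pipeline = device.createRenderPipeline({
    layout: "auto",
    vertex: {
      module: shaderModule,
      entryPoint: "vertex_main",
      buffers: vertexBuffersDescriptors,
    },
    fragment: {
      module: shaderModule,
      entryPoint: "fragment_main",
      targets: [{ format }],
    },
    primitive: { topology: "triangle-list" },
  });

  // Record a command buffer each frame and submit it to the device queue.
  function frame() {
    const commandEncoder = device.createCommandEncoder();
    const passEncoder = commandEncoder.beginRenderPass({
      colorAttachments: [{
        view: context.getCurrentTexture().createView(),
        clearValue: { r: 0, g: 0, b: 0, a: 1 },
        loadOp: "clear",
        storeOp: "store",
      }],
    });
    passEncoder.setPipeline(pipeline);
    passEncoder.setVertexBuffer(0, vertexBuffer);
    passEncoder.draw(3);
    passEncoder.end();
    device.queue.submit([commandEncoder.finish()]);
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}

// Only run in a WebGPU-capable browser; elsewhere the data can still be inspected.
if (typeof navigator !== "undefined") {
  if (navigator.gpu) { init(); }
}
```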


&lt;p&gt;⚠️  &lt;em&gt;To run the code, you must use a browser with the WebGPU flag enabled as it is still not enabled by default (it’s that brand-spanking new!).&lt;/em&gt; &lt;a href="https://web.dev/gpu/#enabling-via-about:flags"&gt;&lt;em&gt;See this resource&lt;/em&gt;&lt;/a&gt; &lt;em&gt;for more information.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://code.carmencincotti.com/2022-04-18/index.html"&gt;Click here to view a Demo.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Adapter and Device&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We first start by ensuring the availability of WebGPU in our browser in the first few lines of the script. If it is available, we then initialize WebGPU’s  &lt;strong&gt;adapter&lt;/strong&gt; and  &lt;strong&gt;device&lt;/strong&gt;, and the HTML Canvas.&lt;/p&gt;

&lt;p&gt;What is an  &lt;strong&gt;adapter&lt;/strong&gt;  and a  &lt;strong&gt;device&lt;/strong&gt;?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Adapter&lt;/strong&gt; is like  &lt;em&gt;VkPhysicalDevice&lt;/em&gt;  (if you are experienced in Vulkan). It represents one of the GPUs in our computer. You can also pass an argument when requesting it to prefer a GPU of a certain type: for example, a low-power GPU for battery-powered use, or a high-performance GPU for plugged-in use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Device&lt;/strong&gt; is like  &lt;em&gt;VkDevice&lt;/em&gt;. This is the GPU driver on the hardware graphics card and our method of communicating with it. I think of it as the API of the graphics card.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Swap Chain&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;After creating the Canvas context, we configure it.&lt;/p&gt;

&lt;p&gt;These lines may look weird and abstract… we are setting up the  &lt;strong&gt;swap chain&lt;/strong&gt;  and the Canvas context at the same time.&lt;/p&gt;

&lt;p&gt;⚠️  &lt;em&gt;If you read a WebGPU tutorial from the past, you might see the use of a deprecated method where you explicitly set up the&lt;/em&gt; &lt;strong&gt;swap chain&lt;/strong&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What is a  &lt;strong&gt;swap chain&lt;/strong&gt;? The main role of the swap chain is to synchronize the presentation of our generated images with our screen refresh rate. It is a queue that contains images waiting to be displayed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Vertices&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We have to pack our array of vertices into a single giant array buffer. Each vertex consists of eight floats in this case (four to represent position: x, y, z, w; and four to represent color: r, g, b, a). If we want to include other attributes, we must add more floats to each vertex.&lt;/p&gt;

&lt;p&gt;After defining our vertices, we create the  &lt;code&gt;vertexBuffer&lt;/code&gt;  which is the buffer that will live in the GPU. We are responsible for filling it at this point. The act of “mapping” a buffer is important to its operation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A  &lt;strong&gt;mapped&lt;/strong&gt; buffer means that the CPU can write to it and the GPU cannot.&lt;/li&gt;
&lt;li&gt;  Conversely, if the buffer is  &lt;strong&gt;unmapped&lt;/strong&gt;, the GPU will be able to read it, and the CPU will be prohibited.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why we designate  &lt;code&gt;mappedAtCreation&lt;/code&gt;  as  &lt;code&gt;true&lt;/code&gt;  during the creation stage. We can then invoke &lt;code&gt;.set()&lt;/code&gt; to copy our vertices into the buffer. Finally, we remove the CPU’s write access, and grant GPU read access, by calling  &lt;code&gt;vertexBuffer.unmap()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The  &lt;code&gt;vertexBuffersDescriptors&lt;/code&gt;  are instructions telling the GPU how to decode the buffer. In our case, we use 32 bytes to describe all attributes of a vertex. In our shaders, the GPU will be able to find the position vector at offset 0, and the color vector at offset 16.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Vertex and Fragment Shader&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;These shaders are simple. We define them using  &lt;a href="https://www.w3.org/TR/WGSL/"&gt;WGSL&lt;/a&gt;, whose syntax resembles Rust’s. There are no surprises in this code, and I invite you to review shader tutorials if you need help following this bit of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Rendering Pipeline&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Finally, we define the rendering pipeline, which is just simple, somewhat boilerplate-y configuration. We combine our shaders and vertex attributes while defining the type of primitive that will be generated.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Animation Frame and Command Buffers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We start our animation! Another difference from WebGL is this idea of a  &lt;a href="https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Command_buffers"&gt;command buffer&lt;/a&gt;. We use a  &lt;strong&gt;command buffer&lt;/strong&gt;  to pre-record all drawing operations so that WebGPU can process them more efficiently. The advantage is that it will reduce the bandwidth between CPU and GPU (and therefore performance will improve) and we can fill this buffer in parallel using multiple threads if we choose to do so.&lt;/p&gt;

&lt;p&gt;The  &lt;code&gt;commandEncoder&lt;/code&gt;  is responsible for receiving our render commands. To create a command buffer that can be submitted to the GPU, we call  &lt;code&gt;.finish()&lt;/code&gt;  on the encoder. The resulting command buffer is passed to the  &lt;code&gt;device&lt;/code&gt;’s queue to be executed, and then our triangle will be rendered!&lt;/p&gt;

&lt;p&gt;Finally, the image of our triangle will be written to the swap chain, then displayed on the canvas!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KjfsjV8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1304/1%2AarTN0qYh4kfu8-cA3JtS0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KjfsjV8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/max/1304/1%2AarTN0qYh4kfu8-cA3JtS0w.png" alt="A triangle rendered in WebGPU" width="652" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What you should see!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources that Definitely Helped Me
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://carmencincotti.com/2022-04-18/drawing-a-webgpu-triangle/"&gt;Drawing a WebGPU Triangle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://web.dev/gpu/"&gt;web.dev WebGPU&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://vulkan-tutorial.com/"&gt;Vulkan Tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.w3.org/TR/WGSL/"&gt;WGSL — W3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://gpuweb.github.io/gpuweb/explainer/#initialization"&gt;WebGPU Explainer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.khronos.org/assets/uploads/developers/presentations/Intro-to-WebGPU_May21.pdf"&gt;Intro to WebGPU — Khronos&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://sotrh.github.io/learn-wgpu/beginner/tutorial3-pipeline/#writing-the-shaders"&gt;The Pipeline | Learn Wgpu (sotrh.github.io)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>javascript</category>
      <category>tutorial</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
