The Magic of the LookAt Matrix
I find math to sometimes be hard, sometimes fun, sometimes magical, and sometimes hard-fun-magical. Linear algebra¹ is the mathematics behind a lot of fun tech: VR, AR, computer graphics, machine learning, and other data-science buzzwords.
I’ve been on a WebGPU 3D computer graphics kick lately (last week I wrote about making a triangle in WebGPU). When it came time to implement a camera, I figured I could just instantiate some sort of camera object and move on.
I quickly learned that the camera in 3D graphics does not exist. It’s all smoke and mirrors. We give the illusion that it does exist through the magic of linear algebra. Let’s see what I mean by taking a look at (ha!) the LookAt matrix.
The LookAt Matrix is a great exercise in linear algebra. It encompasses the usage of the dot product and cross product. It involves vectors. It involves matrices.
Anyway, it turned out to be a fun learning opportunity to really drive home some key math concepts. I’d like to share this knowledge with you.
Some Theory
The LookAt matrix is a matrix that transforms something to look at a point in space. Let’s keep our discussion limited to the application of the LookAt Matrix to cameras.
Namely, we can use the LookAt matrix to transform the positions of the objects within the 3D scene to give the illusion that they are being viewed from the lens of the camera.
Let’s take as an example a 3D scene containing a camera and a red 3D ball. If we apply a LookAt matrix to the camera that transforms it to view the ball from a certain position in 3D, we might expect to see something like this:
A camera looking at a red ball after applying the LookAt matrix
More commonly, though, we’ll want to view the world from the lens of our camera… so we’ll transform the ball and plane instead, to give the illusion that we’re viewing them from the camera’s perspective:
What we might see if we looked through the camera at our scene.
Over the next few sections, we’ll see how to calculate both of these views: one where we move the camera (and the world stays put), and one where the camera remains static (at the origin, looking down the negative z-axis) and the world moves instead.
Some Code
Poof! Here’s the magic trick in full (I prototyped it for a WebGPU app, so it’s in JavaScript).
⚠️ Take note that I am actually calculating the LookAt Matrix that we would use to move scene objects in relation to the camera (explanation at the end of the article).
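Here’s a minimal sketch of what that calculation can look like in plain JavaScript. The helper names and the row-major layout are my own choices, not gospel; a real app might lean on a library like gl-matrix instead:

```javascript
// Minimal vector helpers -- stand-ins, not a particular library's API.
const subtract = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

// Builds the matrix we'd apply to scene objects (the inverse LookAt):
// the rotation is transposed (basis vectors as rows) and the
// translation is negated.
function lookAt(cameraPosition, targetPosition, tempUp = [0, 1, 0]) {
  const forward = normalize(subtract(cameraPosition, targetPosition));
  const right = normalize(cross(tempUp, forward));
  const up = cross(forward, right); // already unit length

  // Row-major 4x4.
  return [
    right[0],   right[1],   right[2],   -dot(cameraPosition, right),
    up[0],      up[1],      up[2],      -dot(cameraPosition, up),
    forward[0], forward[1], forward[2], -dot(cameraPosition, forward),
    0, 0, 0, 1,
  ];
}
```

Each piece of this function is unpacked step by step in the calculations below.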
Some Calculations
Solving for this matrix really means modeling the camera through the careful calculation of its coordinate system in relation to world space. Or, more simply put, we need to find the camera’s `forwardVector`, `upVector`, and `rightVector` in relation to the ball’s coordinate system.
This feels a bit like sorcery. We can take very minimal information and, in the end, come up with an entire matrix representing an orthonormal coordinate system like shown in the image:
A representation of a 3D scene where a camera is looking at a red ball. Notice how the forward vector points towards the camera.
We’ll start our journey through this Magical Math forest, knowing that our camera here has a certain position in space, as does our red ball. We’ll also assume the conventions of a right-handed coordinate system, so we’re all on the same page.
Step One: Calculate the Forward Axis direction
This is actually very doable. Given our camera position and red ball position, we can calculate the direction of the `forwardVector` through vector subtraction:
forwardVector = normalize(cameraPosition - redBallPosition)
⚠️ Remember to normalize the result of the vector subtraction, since we want the directional vector, which is a unit vector.
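As a concrete sanity check, using the example positions that appear later in the article (the `subtract` and `normalize` helpers are my own stand-ins):

```javascript
// Stand-in helpers, not from any particular library.
const subtract = (a, b) => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

const cameraPosition = [0, 4, 4];
const redBallPosition = [0, 0, 0];
const forwardVector = normalize(subtract(cameraPosition, redBallPosition));
// forwardVector is roughly [0, 0.707, 0.707]: it points from the ball
// toward the camera, and thanks to normalize() its length is 1.
```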
Step one, done. Let’s hop on our broomsticks to Step Two.
Step Two: Calculate the Right Axis direction
This step involves quite a bit of witchcraft, and I’m not a huge fan of the handwavy-ness of the steps that follow but, hey, math is magic.
Let’s list out some things we know to be true first before arriving at our next step:
- We are re-constructing an orthonormal 3D coordinate system based on the position and rotation of our camera
- The right axis that we are looking for is orthogonal to the forward axis and the up axis (which is still unknown, but disregard that for now).
That was a lot to say, but given all that, we can conclude that we need to find the cross product of the forward and up axes. Wikipedia describes the cross product as:
Given two linearly independent vectors a and b, the cross product, a × b (read “a cross b”), is a vector that is perpendicular to both a and b
An example of the cross product — using the up and forward axes, we can find a vector perpendicular to both of them — the right axis.
OK, well we know the forward vector (since we solved for it in Step One).
BUT, we still lack the up vector. Are we doomed? Nope… we just need magic.
Here’s the trick: we just need any old vector that lies in the plane formed by the forward and up vectors… not necessarily the actual up vector of the camera itself. A common convention is to use (0, 1, 0) as this temporary up vector.
The trick — we can define a tempUpVector in place of the actual up axis, since it will still lie within the plane formed by the up and forward axes.
Finally, we can solve for the `rightVector` with the following calculation:
tempUpVector = (0, 1, 0)
rightVector = normalize(cross(tempUpVector, forwardVector))
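With the example positions from later in the article (and stand-in helpers of my own), the calculation looks like this:

```javascript
// Stand-in helpers, not from the article's own snippet.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

const tempUpVector = [0, 1, 0];
const forwardVector = [0, Math.SQRT1_2, Math.SQRT1_2]; // from Step One
const rightVector = normalize(cross(tempUpVector, forwardVector));
// rightVector comes out as [1, 0, 0]: perpendicular to both inputs.
```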
Step Three: Calculate the Up Axis direction
It’s all smooth flying from here. We now know the `rightVector` and the `forwardVector`… so given our earlier assumptions (these three vectors are orthonormal), we know that to find the `upVector`, we need to take the cross product of the `forwardVector` and the `rightVector`!
upVector = normalize(cross(forwardVector, rightVector))
And that’s that. We have successfully calculated our 3 directional vectors.
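Continuing the running example (helpers again my own stand-ins), we can compute the up vector and sanity-check that the three axes really are mutually orthogonal:

```javascript
// Stand-in helpers, not from the article's snippet.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const forwardVector = [0, Math.SQRT1_2, Math.SQRT1_2]; // from Step One
const rightVector = [1, 0, 0];                         // from Step Two
// The cross product of two orthogonal unit vectors is already unit
// length, so normalize() is optional here.
const upVector = cross(forwardVector, rightVector);
// upVector is roughly [0, 0.707, -0.707]: the camera sits above the
// ball, so its up axis tilts toward the scene.
// Orthogonality check: both dot products are (numerically) zero.
const checks = [dot(upVector, forwardVector), dot(upVector, rightVector)];
```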
Now we could start building our LookAt matrix, but we’re still missing a key component: the translation component.
Step Four: Calculate the Camera Translation Vector
Now we need to put on our Thinking Witch Hats to figure out how we will translate our camera so it gets to the position that we want it to be in our coordinate system.
Why not just translate it by the camera position we used to calculate the `forwardVector`?
That’s a great question. Let’s look at an example.
Let’s assume that our camera was originally at the 3D position in world coordinates (0,4,4) and our red ball is at origin (0,0,0).
A naive approach where we assume that the camera is translating along the axes defined on the bottom left (the blue/green arrows).
On the face of it, it looks like we should be able to just translate the camera by (0,4,4)… but that misses a key point about the order of matrix operations, one I neglected to clarify earlier but will now, to drive home a point:
We are rotating our camera first, then translating.
⚠️ This is mainly convention, and it’s up to you to decide the order in which you want to perform matrix operations… but if you don’t follow the way I present it to you here, you will not arrive at the same result.
Basically, we need to think about how to translate the camera after having rotated it.
The correct approach where we rotate first, then translate along our new axes.
The real translation vector of the camera would be (0, 0, 5.65), since we are changing the coordinate basis (while maintaining its orthonormal properties).
The operation that we need to use to figure this out can be simplified by using the dot product. Read more about it here.
In short: the dot product returns the magnitude of one vector’s projection onto another. It therefore returns zero for orthogonal vectors, since they are perfectly not-overlapping each other. Not even a little bit. Here’s what the calculation might look like:
translationX = dot(positionOfCamera, rightVector);
translationY = dot(positionOfCamera, upVector);
translationZ = dot(positionOfCamera, forwardVector);
⚠️ Remember to define `rightVector`, `upVector`, and `forwardVector` as unit vectors! The correctness of these calculations relies on it.
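Plugging our example numbers into those three lines (camera at (0,4,4) and the axes from the earlier steps; the `dot` helper is my own stand-in):

```javascript
// `dot` is a stand-in helper, not from the article's own snippet.
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const positionOfCamera = [0, 4, 4];
const rightVector = [1, 0, 0];                         // from Step Two
const upVector = [0, Math.SQRT1_2, -Math.SQRT1_2];     // from Step Three
const forwardVector = [0, Math.SQRT1_2, Math.SQRT1_2]; // from Step One

const translationX = dot(positionOfCamera, rightVector);   // 0
const translationY = dot(positionOfCamera, upVector);      // 0
const translationZ = dot(positionOfCamera, forwardVector); // ~5.65
```

This matches the (0, 0, 5.65) translation from the picture above: in the camera’s own basis, the ball sits straight ahead.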
💡 Thought experiment: starting with an object at (0,0,0), can you imagine what translating first and then rotating (around the origin) will do? What about rotating (around the origin) first and then translating?
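Here’s that thought experiment made concrete, with stand-in helpers of my own and a translation of (0, 0, 5):

```javascript
// Stand-in helpers for the thought experiment.
const translate = (p, t) => [p[0] + t[0], p[1] + t[1], p[2] + t[2]];
// Rotate 90 degrees around the y-axis: (x, y, z) -> (z, y, -x).
const rotateY90 = (p) => [p[2], p[1], -p[0]];

const start = [0, 0, 0];
// Translate first, then rotate: the point swings around the origin.
const a = rotateY90(translate(start, [0, 0, 5])); // [5, 0, 0]
// Rotate first, then translate: rotating (0,0,0) does nothing, so the
// point just moves straight out.
const b = translate(rotateY90(start), [0, 0, 5]); // [0, 0, 5]
```

Same two operations, very different final positions: order matters.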
Step Five: Build the Matrix!
Now just fill in the data into this matrix:
The Magic LookAt Matrix
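In code, filling in the data amounts to laying out the three axes and the translation in a 4×4 array. The function below is my own sketch, written row-major; conventions differ (many libraries, gl-matrix included, store matrices column-major), so treat the layout as illustrative rather than definitive:

```javascript
// Assemble a 4x4 matrix (row-major) from the three axis vectors and the
// translation computed in Steps One through Four. My own sketch -- sign
// and storage conventions vary between libraries.
function buildLookAt(right, up, forward, translation) {
  return [
    right[0],   right[1],   right[2],   translation[0],
    up[0],      up[1],      up[2],      translation[1],
    forward[0], forward[1], forward[2], translation[2],
    0, 0, 0, 1,
  ];
}
```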
That’s it, we’re done… but if you’re looking for how you can use this for a 3D application that you’re developing… well keep on reading!
Recall that I said at the very beginning that we would like to view this world from the lens of the camera. The typical name for the transformation matrix that performs this operation is a ViewMatrix.
The key thing to understand is that the camera stays at origin and looks down the negative z-axis. It does not move. To simulate camera movement, we need to move the objects in the scene instead.
Therefore, we need to apply the inverse of the LookAt matrix to the objects in the scene rather than to the camera.
In the code, we calculate just that… the translation is inverted (negated), as is the rotation component. However, take note of a very important point that made me bang my head at work while debugging a rotation issue… the inverse of a rotation matrix is its transpose.
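That transpose fact is easy to verify numerically: for any rotation matrix R (orthonormal rows), R times its transpose is the identity. A small sketch with a stand-in helper:

```javascript
// Multiply a 3x3 matrix (array of rows) by its own transpose. Entry
// (i, j) of R * R-transpose is just the dot product of rows i and j.
// Stand-in helper for illustration only.
const multiplyByTranspose = (R) =>
  R.map((row) => R.map((other) => row.reduce((s, v, i) => s + v * other[i], 0)));

// Rotation rows from our running example (right, up, forward).
const R = [
  [1, 0, 0],
  [0, Math.SQRT1_2, -Math.SQRT1_2],
  [0, Math.SQRT1_2, Math.SQRT1_2],
];

const I = multiplyByTranspose(R);
// I is (numerically) the 3x3 identity, so the transpose undoes R.
```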
So, our final inverse LookAt Matrix… or View Matrix, would be:
The inverse LookAt matrix is the ViewMatrix
You’re a wizard, reader, be proud of yourself for getting through all of this material. Until next time!
¹ for the uninitiated — it’s math for arrays and matrices. For the initiated — don’t crucify me for that gross vulgarization.
Top comments (2)
Hi,
I think there is a small error in your inverse matrix. If you invert a matrix with no perspective component (i.e. (0, 0, 0, x) for the last row), you end up with a matrix that also has no perspective component (i.e. (0, 0, 0, x) for the last row):
wolframalpha.com/input?i=%5B%5Ba%2...
If you use your last viewMatrix in a 3D rendering, you will end up with very strange results 😉
I think you are merging two concepts:
Actually, you have to transpose only the rotation part (the 3x3 upper-left matrix) and negate only the translation part (the right column) to get your view matrix.
As a rule of thumb, unless you really know what you are doing and know you are doing something unusual, Model and View matrices should always have (0, 0, 0, 1) in their last row. Leave the perspective part to the Projection matrix.
Thank you for this comment and thanks for reading!
For your second point, I think what was confusing me was the gl-matrix library implementation, where the results are returned like that, so I assumed it was a transpose of the entire matrix... but upon some reflection, I think it may just be a difference between row-major and column-major implementations.