I have played 3D video games for as long as I can remember. Seeing a fully three-dimensional world rendered onto a two-dimensional monitor always felt like magic to me. That changed when I encountered Tsoding’s video on YouTube, “One Formula that Demystifies 3D Graphics”. In the video, Tsoding programs a very simple 3D renderer from scratch in JavaScript which is capable of rendering, rotating, and translating a cube on the screen.
I was so fascinated by the whole process that I decided to build my own 3D renderer and extend it with more features such as near plane line clipping. This paper describes my minimal implementation of a 3D graphics pipeline and the discoveries I made while building it.
You can find the source code here.
3D Rendering Process
Rendering a 3D object onto a 2D display requires a sequence of transformations to be applied to each point (x, y, z) in a scene. This sequence of operations is known as the 3D graphics pipeline. The following describes the full pipeline that I implemented.
- Local Space Rotation
- World Space Translation
- Camera Transformation
- Near Plane Clipping
- Perspective Projection/Perspective Divide
- Viewport Transformation
To start, consider a 3D cube centered at the origin, (0, 0, 0), and imagine that you are inside the cube, observing its inner walls.
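As a concrete starting point, a wireframe cube like this one can be represented as 8 corner vertices plus 12 edges given as pairs of vertex indices. The names below (`cubeVertices`, `cubeEdges`) are illustrative, not taken from my actual source:

```javascript
// A unit cube centered at the origin: 8 vertices, 12 edges.
// Each vertex is {x, y, z}; each edge is a pair of vertex indices.
const cubeVertices = [
  { x: -0.5, y: -0.5, z: -0.5 }, { x:  0.5, y: -0.5, z: -0.5 },
  { x:  0.5, y:  0.5, z: -0.5 }, { x: -0.5, y:  0.5, z: -0.5 },
  { x: -0.5, y: -0.5, z:  0.5 }, { x:  0.5, y: -0.5, z:  0.5 },
  { x:  0.5, y:  0.5, z:  0.5 }, { x: -0.5, y:  0.5, z:  0.5 },
];

const cubeEdges = [
  [0, 1], [1, 2], [2, 3], [3, 0], // one face
  [4, 5], [5, 6], [6, 7], [7, 4], // opposite face
  [0, 4], [1, 5], [2, 6], [3, 7], // edges connecting the two faces
];
```

Every step of the pipeline below then amounts to transforming these vertices; the edge list never changes.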

Local Space Rotation
Rotation matrices only rotate points around the origin. Because of this, rotation must happen before translation. Otherwise, the object would orbit the origin instead of spinning in place.
I like to think of it this way:
- Pre-translation rotation is like the Earth spinning on its axis
- Post-translation rotation is like the Earth orbiting the sun
In other words, the order of object rotation matters because matrix multiplication is not commutative.
Here is the rotation formula for rotating a vector (x, y, z) in the x-y plane (i.e., about the z-axis) by an angle θ:

x′ = x cos θ − y sin θ
y′ = x sin θ + y cos θ

The z component is unchanged.
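The formula above can be sketched as a small function (the name `rotateZ` is my own for this example, not from the original source):

```javascript
// Rotate a point in the x-y plane (about the z-axis) by angle theta.
function rotateZ(p, theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  return {
    x: p.x * c - p.y * s,
    y: p.x * s + p.y * c,
    z: p.z, // z is unchanged by an x-y plane rotation
  };
}

// A quarter turn sends (1, 0, 0) to approximately (0, 1, 0).
const r = rotateZ({ x: 1, y: 0, z: 0 }, Math.PI / 2);
```

Rotations about the x- and y-axes follow the same pattern, each leaving its own axis component untouched.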
Below is the cube after it has been rotated in local space while remaining at the origin:

World Space Translation
After rotation, the object is translated away from the origin into world space. This step positions the object within the scene. Since rotation occurred first, the object maintains its orientation relative to its own center.
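The rotate-then-translate ordering can be demonstrated numerically. This sketch reuses an illustrative `rotateZ` helper and shows that swapping the two steps produces a different result (the "spin" versus "orbit" distinction from earlier):

```javascript
// Rotation about the z-axis, as in the previous step.
function rotateZ(p, theta) {
  const c = Math.cos(theta), s = Math.sin(theta);
  return { x: p.x * c - p.y * s, y: p.x * s + p.y * c, z: p.z };
}

// Translation into world space.
function translate(p, o) {
  return { x: p.x + o.x, y: p.y + o.y, z: p.z + o.z };
}

const p = { x: 1, y: 0, z: 0 };
const offset = { x: 2, y: 0, z: 0 };
const quarterTurn = Math.PI / 2;

// Rotate first, then translate: the point spins in place, then moves.
const spin = translate(rotateZ(p, quarterTurn), offset); // ≈ (2, 1, 0)

// Translate first, then rotate: the point orbits the origin instead.
const orbit = rotateZ(translate(p, offset), quarterTurn); // ≈ (0, 3, 0)
```

The two results differ because matrix multiplication is not commutative, exactly as noted above.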
Below is the cube after it has been translated away from the origin while maintaining its previously described local space rotation:

Camera Transformation
Rather than moving the camera itself, the scene is transformed in the opposite direction of the camera’s movement, so that the camera always sits at the origin looking down the positive z-axis. Since the rotation matrix rotates points around the origin, this rotation is applied after the world space translation: the object then rotates around the world origin rather than its local origin, which simulates 3D camera movement.
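A minimal sketch of this idea, assuming a camera with only a position and a yaw angle (rotation about the y-axis); the function name and signature are my own for illustration:

```javascript
// Transform a scene point into camera space: translate by the negative
// of the camera position, then rotate by the negative of the camera yaw,
// so the camera effectively sits at the origin looking down +z.
function applyCamera(p, camPos, camYaw) {
  // Inverse translation: move the world opposite to the camera.
  const t = { x: p.x - camPos.x, y: p.y - camPos.y, z: p.z - camPos.z };
  // Inverse rotation about the y-axis (yaw).
  const c = Math.cos(-camYaw), s = Math.sin(-camYaw);
  return {
    x: t.x * c + t.z * s,
    y: t.y,
    z: -t.x * s + t.z * c,
  };
}
```

With the camera at the origin and zero yaw, this is the identity transform, which is a handy sanity check.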
Below is the cube after the camera transformation (a rotation around the world origin) has been applied on top of its previous transformations:

Near Plane Clipping
One problem that I observed with the renderer was that when an object lies partially in front of and partially behind the camera, its rendering breaks down: the perspective divide by a near-zero or negative z sends projected points flying across the screen.
Below is what the perspective looks like when the camera is inside of the cube without the near plane clipping implementation:

Now, here is what the perspective looks like when the camera is inside of the cube with the near plane clipping implementation:

As displayed above, segments that cross the near plane are clipped to include only the visible portion, while segments that lie entirely behind the near plane are omitted.
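The clipping step can be sketched as a segment/plane intersection. This is a hedged sketch of the standard technique, not my exact implementation; `clipNear` and `zNear` are names chosen for this example:

```javascript
// Clip a line segment [a, b] against the near plane z = zNear.
// Returns null if the segment is fully behind the plane, otherwise
// the (possibly shortened) visible segment.
function clipNear(a, b, zNear) {
  if (a.z < zNear && b.z < zNear) return null;     // fully behind: omit
  if (a.z >= zNear && b.z >= zNear) return [a, b]; // fully in front: keep

  // One endpoint is behind: find where the segment crosses z = zNear
  // by linear interpolation.
  const t = (zNear - a.z) / (b.z - a.z);
  const hit = {
    x: a.x + t * (b.x - a.x),
    y: a.y + t * (b.y - a.y),
    z: zNear,
  };
  return a.z < zNear ? [hit, b] : [a, hit];
}
```

Clipping must happen before the perspective divide, since dividing by a z-value at or behind the camera is exactly what produces the artifacts shown above.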
Perspective Projection (Perspective Divide)
I consider the following process to be the most elegant part of the entire 3D graphics pipeline.
For a 3D point, (x, y, z):
We compute its projected 2D coordinates as:
(x’ = x / z, y’ = y / z)
Following this principle, consider two Vectors:
A = (x: -0.5, y: 0.5, z: 2) and B = (x: 0.5, y: 0.5, z: 5).
Applying the perspective projection to both yields:
A’ = (x: -0.25, y: 0.25) and B’ = (x: 0.1, y: 0.1)
The points mentioned are graphed below.

Perspective projection moved both points closer to the 2D origin, (0, 0). Furthermore, vector B, with a larger z-value of 5 compared to vector A’s z-value of 2, shifts farther towards the origin post-projection. This demonstrates that objects with larger z-values (objects farther from the viewer) appear smaller under perspective projection.
A simple test I like to run to physically observe this behavior goes as follows: Hold your hand over your eye and watch as you slowly extend your arm out in front of you. As your hand extends away from you, or its z-value increases, it appears closer to the center of your field of view relative to other objects.
This process creates the illusion of depth on a 2D plane.
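The entire projection step, including the worked example above, fits in a few lines (`project` is an illustrative name):

```javascript
// Project a 3D point onto the 2D image plane via the perspective divide.
function project(p) {
  return { x: p.x / p.z, y: p.y / p.z };
}

const A = { x: -0.5, y: 0.5, z: 2 };
const B = { x:  0.5, y: 0.5, z: 5 };

const Ap = project(A); // { x: -0.25, y: 0.25 }
const Bp = project(B); // { x: 0.1, y: 0.1 }
```

A fuller projection would also scale x and y by a focal-length factor derived from the field of view, but the divide by z is the part that creates the depth illusion.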
Viewport Transformation
Up until this point, all algorithms in the pipeline assume coordinates are normalized to x, y ∈ [−1, 1], known as Normalized Device Coordinates. These coordinates must be remapped to the screen space intervals x ∈ [0, SCREEN_WIDTH] and y ∈ [0, SCREEN_HEIGHT], known as Screen Coordinates.
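The remapping is a simple linear rescale. One detail to note in this sketch (names are my own): screen y typically grows downward, so the y-axis is flipped, which is a common convention rather than a requirement:

```javascript
// Map Normalized Device Coordinates (x, y in [-1, 1]) to screen
// coordinates (x in [0, width], y in [0, height]).
function toScreen(p, width, height) {
  return {
    x: (p.x + 1) / 2 * width,
    // Flip y: NDC y grows upward, screen y grows downward.
    y: (1 - p.y) / 2 * height,
  };
}

// The NDC origin (0, 0) lands at the center of the screen.
const center = toScreen({ x: 0, y: 0 }, 800, 600); // { x: 400, y: 300 }
```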
The graphic below displays the difference between Normalized Device Coordinates (NDC) and Screen Coordinates.

After this transformation has been completed, the point can finally be drawn to the screen.
Additional Observations
While implementing my renderer, some important insights became clear.
First, transformation order matters because matrix multiplication is not commutative. Swapping the order of processes in the pipeline will result in a different outcome. For example, a local space rotation behaves just like a camera transformation if it occurs after world space translation.
Second, 3D graphics is mostly linear algebra: the entire pipeline I implemented is a sequence of vectors being transformed and remapped.
Finally, perspective in 3D graphics is surprisingly simple since the whole illusion of 3D depth comes from a single division by z on each point.
Abstractions hide the beauty of 3D graphics from us. Because modern graphics APIs handle most low-level operations automatically, we rarely see how elegant the math behind 3D graphics is. While these APIs are invaluable tools in a game development environment, they also remove the need to ever learn how graphics work at a low level.
Future Improvements
Although this renderer already covers the fundamentals of the graphics pipeline, there are still many procedures that I could add for a more complete engine. Some possible additions are listed below.
- Triangle rasterization instead of my current wireframes - Almost all 3D models are made of triangles, so adding this would be a step closer to modern graphics APIs.
- Back-face culling for performance - Naturally, many triangles will face away from the camera. Rendering only those that face the camera improves performance.
- Depth (Z) buffer - A depth buffer is a technique in 3D graphics that determines which objects are visible and which are hidden. If I wanted to render multiple objects in my renderer I would need to add a depth buffer since it solves the problem of which object should be visible when multiple objects overlap each other in space.
- Lighting - Another upgrade that can further the illusion of 3D depth on a 2D display is lighting. I can implement Lambertian diffuse lighting which is a simple lighting technique that makes polygons brighter when they face a light source and darker when they face away.
- Frustum clipping - Although I have already implemented near plane clipping, another important algorithm to implement is frustum clipping since it is an all-encompassing clipping procedure. While near plane clipping only excludes objects with z-values less than the near plane, frustum clipping also excludes objects that exist past the far plane, horizontal field of view, and vertical field of view.
Conclusion
Building my 3D renderer fundamentally changed how I look at computer graphics. I realized that what once seemed like magic was simply a sequence of mathematical transformations applied to vectors. That same realization made computer graphics even more fascinating to me and is the whole reason I decided to write this paper. It is amazing how something that used to appear so complex to me is built from surprisingly elegant and minimal mathematical ideas.
References
Tsoding. “One Formula That Demystifies 3D Graphics.” YouTube