Recently I found myself in need of low-res sprites of some low-poly 3D models, rendered from different angles, for an upcoming project. Something like this:
Possible ways to obtain them included:
- learn a little bit of Blender
- make it in WebGL
- write my own renderer
My short adventure with Blender had already traumatized me, and WebGL has a somewhat confusing API. And what better opportunity to brush up on linear algebra than doing some 3D transformations in my own 3D engine?
Here is a link to the GitHub repo.
How 3D objects are represented
Super short intro to 3D graphics: all 3D objects are represented by triangles in 3D space. Each triangle is made up of 3 points, and each point holds 3 values: x, y and z.
Our goal is to transform arrays of those points, colors, etc. into a nice .png image that we can use as a sprite.
Most 3D graphics you see use perspective projection, where objects further from the camera appear smaller. We will focus only on orthographic projection, where that's not the case: no matter how far an object is from the camera, its size doesn't change.
More reading on that.
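As a one-line sketch of what orthographic projection means in code (the example points are made up, and the variable names are mine, not the renderer's):

```python
import numpy as np

# Orthographic projection in a nutshell: after rotating into camera space,
# a point's screen position is just its (x, y) -- z is dropped entirely,
# so distance from the camera never changes apparent size or position.
points = np.array([
    [1.0, 2.0, 5.0],    # close to the camera
    [1.0, 2.0, 500.0],  # far from the camera
])
projected = points[:, :2]
print(projected)  # both rows project to the same screen position (1, 2)
```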
So what we will build here is:
✔️ a functioning 3D renderer with adjustable camera direction
✔️ Z-buffering to accurately represent overlapping triangles
✔️ basic directional shading
What we won't cover (at least for now):
❌ shadows cast by objects onto other objects
❌ texturing
❌ perspective projection
Also, it won't run on the GPU. It won't be super efficient. It won't be enough to render complex scenes (or possibly even simple ones) in real time. But it will be fun and satisfying; at least it was for me.
The first image
Install the dependencies: we will use numpy for matrix operations and Pillow to turn arrays into images.
pip install Pillow numpy
Create an `Object3D` class in a renderer.py file for storing individual models.
renderer.py
```python
import numpy as np
from PIL import Image


class Object3D:
    def __init__(self, points, triangles, colors):
        self.points = points
        self.triangles = triangles
        self.colors = colors
```
- `points`: array containing the coordinates of the points in the 3D object
- `triangles`: array containing the indices of the points making up each triangle
- `colors`: array of RGB values for each triangle (e.g. white = `[255, 255, 255]`)
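As a quick illustration of how these three arrays fit together (the single-triangle data here is made up for the example):

```python
import numpy as np


class Object3D:
    def __init__(self, points, triangles, colors):
        self.points = points
        self.triangles = triangles
        self.colors = colors


# one white triangle lying in the z = 0 plane
points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
triangles = np.array([[0, 1, 2]])     # indices into `points`
colors = np.array([[255, 255, 255]])  # one RGB row per triangle

obj = Object3D(points, triangles, colors)
# numpy fancy indexing resolves index triples into actual coordinates:
print(obj.points[obj.triangles].shape)  # (1, 3, 3): 1 triangle, 3 vertices, 3 coords
```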
And a `Renderer` class.
renderer.py
```python
class Renderer:
    def __init__(
        self,
        objects,
        viewport,
        resolution,
        camera_angle=0.0,
        camera_pitch=0.0,
    ):
        self._objects = objects
        self._viewport = viewport
        self._resolution = resolution
        self._camera_angle = camera_angle
        self._camera_pitch = camera_pitch

        resolution_x, resolution_y = self._resolution
        self._screen = np.ones((resolution_y, resolution_x, 3), "uint8") * 120

        x_min, y_min, x_max, y_max = self._viewport
        self._range_x = np.linspace(x_min, x_max, resolution_x)
        self._range_y = np.linspace(y_max, y_min, resolution_y)

    def render_scene(self, output_path):
        for object_3d in self._objects:
            self._render_object(object_3d)
        im = Image.fromarray(self._screen)
        im.save(output_path)
```
- `_objects`: list of 3D objects in the scene
- `_viewport`: visible region in the form `(min_x, min_y, max_x, max_y)`
- `_resolution`: output image resolution; `(80, 48)` means the image will be 80px wide and 48px high
- `_camera_angle` and `_camera_pitch`: define the camera's position and direction as on the diagram below
- `_screen`: array representing the RGB bitmap of the output image
- `_range_x`, `_range_y`: arrays holding the x and y world coordinates of each column/row of pixels
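To get a feel for `_range_x` and `_range_y`, here is what they would contain for a hypothetical tiny 5×5 render of the (-2, -2, 2, 2) viewport:

```python
import numpy as np

# Each pixel column/row maps to one world coordinate. Note that range_y
# runs from y_max DOWN to y_min, because row 0 of the bitmap is the top
# of the image while larger world y values are "up".
x_min, y_min, x_max, y_max = (-2, -2, 2, 2)
range_x = np.linspace(x_min, x_max, 5)
range_y = np.linspace(y_max, y_min, 5)
print(range_x)  # [-2. -1.  0.  1.  2.]
print(range_y)  # [ 2.  1.  0. -1. -2.]
```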
So far nothing too fancy. Now we will implement `_render_object`.
renderer.py
```python
class Renderer:
    ...

    def _render_object(self, object_3d):
        projected_points = self._get_object_projected_points(object_3d)
        projected_triangles = projected_points[object_3d.triangles]
        for triangle_points, color in zip(projected_triangles, object_3d.colors):
            self._render_triangle(triangle_points, color)
```
First we need to convert world coordinates to camera coordinates using `_get_object_projected_points`.
renderer.py
```python
class Renderer:
    ...

    def _get_object_projected_points(self, object_3d):
        return (
            object_3d.points
            @ _get_z_rotation_matrix(self._camera_angle)
            @ _get_x_rotation_matrix(self._camera_pitch)
        )


def _get_x_rotation_matrix(angle):
    return np.array(
        [
            [1, 0, 0],
            [0, np.cos(angle), -np.sin(angle)],
            [0, np.sin(angle), np.cos(angle)],
        ]
    )


def _get_z_rotation_matrix(angle):
    return np.array(
        [
            [np.cos(angle), -np.sin(angle), 0],
            [np.sin(angle), np.cos(angle), 0],
            [0, 0, 1],
        ]
    )
```
We do so by first rotating the object's coordinates by `_camera_angle` around the Z axis, then by `_camera_pitch` around the X axis. You can find the rotation matrices for each axis on Wikipedia.
Now let's write a method for rendering an individual triangle.
renderer.py
```python
class Renderer:
    ...

    def _render_triangle(self, points, color):
        bounding_box = _get_bounding_box(points)
        for screen_x, scene_x in enumerate(self._range_x):
            if scene_x < bounding_box[0, 0] or scene_x > bounding_box[1, 0]:
                continue
            for screen_y, scene_y in enumerate(self._range_y):
                if scene_y < bounding_box[0, 1] or scene_y > bounding_box[1, 1]:
                    continue
                if not _point_in_triangle(np.array([scene_x, scene_y, 0]), points):
                    continue
                self._screen[screen_y, screen_x, :] = color


def _get_bounding_box(points):
    return np.array(
        [
            [np.min(points[:, 0]), np.min(points[:, 1])],
            [np.max(points[:, 0]), np.max(points[:, 1])],
        ]
    )
```
- `screen_x`: integer in range `[0, resolution[0])`
- `scene_x`: float in range `[viewport[0], viewport[2]]`
We iterate over each pixel and skip it if the current point doesn't lie within the triangle's bounding box. Otherwise we check whether the point is inside the triangle, and if so, we color the current pixel.
Checking whether a point is inside a triangle is actually not a trivial problem. I copy-pasted the solution from this Stack Overflow post.
renderer.py
```python
def _sign(p1, p2, p3):
    return (p1[0] - p3[0]) * (p2[1] - p3[1]) - (p2[0] - p3[0]) * (p1[1] - p3[1])


def _point_in_triangle(p, triangle):
    a, b, c = triangle
    d1 = _sign(p, a, b)
    d2 = _sign(p, b, c)
    d3 = _sign(p, c, a)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (has_neg and has_pos)
```
Without going into too much detail: we check whether the point lies on the same side of each of the three half-planes defined by the triangle's edges. If it does, the point is inside the triangle. This method works for both clockwise and counterclockwise triangles.
Let's now create a simple scene and render it in main.py
main.py
```python
import numpy as np

from renderer import Renderer, Object3D

WHITE = [255, 255, 255]
YELLOW = [200, 200, 0]
GREEN = [0, 200, 0]
BLUE = [0, 0, 200]
RED = [200, 0, 0]
ORANGE = [200, 100, 0]

cube_points = np.array(
    [
        [-1, -1, -1],
        [-1, 1, -1],
        [1, 1, -1],
        [1, -1, -1],
        [-1, -1, 1],
        [-1, 1, 1],
        [1, 1, 1],
        [1, -1, 1],
    ]
)

cube_triangles = np.array(
    [
        # bottom
        [0, 2, 1],
        [0, 3, 2],
        # top
        [4, 5, 6],
        [4, 6, 7],
        # back
        [1, 2, 6],
        [1, 6, 5],
        # front
        [0, 4, 7],
        [0, 7, 3],
        # left
        [0, 1, 5],
        [0, 5, 4],
        # right
        [3, 7, 6],
        [3, 6, 2],
    ]
)

cube_colors = np.array(
    [
        YELLOW,
        YELLOW,
        WHITE,
        WHITE,
        ORANGE,
        ORANGE,
        RED,
        RED,
        GREEN,
        GREEN,
        BLUE,
        BLUE,
    ]
)

cube = Object3D(cube_points, cube_triangles, cube_colors)

renderer = Renderer(
    [cube],
    (-2, -2, 2, 2),
    (300, 300),
    camera_angle=np.deg2rad(45),
    camera_pitch=np.deg2rad(60),
)
renderer.render_scene("scene.png")
```
Let's run the script.
It's not looking like a cube yet because, as we can see, the faces have been drawn in a somewhat random order and overlap each other in unpredictable ways.
We can also see that the orange face (back) and the green face (left) are drawn, even though they shouldn't be visible from the camera's direction. Triangles in 3D engines have only a single visible side, to avoid unnecessary computation. We will address this first.
Removing triangles that do not face the camera
Let's add another property to Renderer
renderer.py
```python
class Renderer:
    def __init__(
        ...
    ):
        ...
        self._camera_dir = np.array([0, 0, -1])
```
And modify `_render_object`.
renderer.py
```python
class Renderer:
    ...

    def _render_object(self, object_3d):
        projected_points = self._get_object_projected_points(object_3d)
        projected_triangles = projected_points[object_3d.triangles]
        visible_mask = self._get_screen_dot_products(projected_triangles) < 0
        if not any(visible_mask):
            return
        for triangle_points, color in zip(
            projected_triangles[visible_mask], object_3d.colors[visible_mask]
        ):
            self._render_triangle(triangle_points, color)

    def _get_screen_dot_products(self, triangles):
        normals = np.cross(
            (triangles[:, 0] - triangles[:, 1]),
            (triangles[:, 2] - triangles[:, 1]),
        )
        return (self._camera_dir * normals).sum(axis=1)
```
How do we know if a triangle is visible?
We first need to find the triangle's surface normal, which is the cross product of two of its edge vectors.
The dot product is a measure of how closely two vectors align, in terms of the directions they point in. So let's calculate the dot product of the camera direction and each triangle's surface normal. A value less than 0 means the triangle is visible, a value greater than 0 means it's invisible, and 0 means the two vectors are perpendicular (in which case the triangle is edge-on to the camera and also not visible).
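A small standalone sketch of this culling test (the triangles here are example data of my own, not from the scene):

```python
import numpy as np

camera_dir = np.array([0, 0, -1])  # the camera looks down the -z axis


def screen_dot(triangle):
    # same normal computation as in _get_screen_dot_products, for one triangle
    normal = np.cross(triangle[0] - triangle[1], triangle[2] - triangle[1])
    return camera_dir @ normal


# An example triangle in the z = 0 plane whose normal (as computed above)
# points toward the camera, and the same triangle with two vertices swapped,
# flipping its normal away from the camera:
facing = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0]])
flipped = facing[[0, 2, 1]]

print(screen_dot(facing))   # negative -> visible
print(screen_dot(flipped))  # positive -> culled (back-facing)
```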
Running the script now, we can see something resembling a Rubik's cube.
But the renderer is still lacking a very basic capability. If we try to render triangles stacked on top of each other, the result will depend on the ordering of the triangles defined in `Object3D.triangles`.
Although the green triangle is stacked on top of the red triangle in camera and world coordinates, the green triangle was drawn before the red one.
We could find a way to sort the triangles and render them in order, from the ones farthest from the camera to the ones closest. But sorting the triangles isn't actually an easy task. We would also run into cases where two or more triangles intersect, or where we can't accurately sort the triangles without splitting them into smaller ones.
✨Z-buffering✨
This is where Z-buffering comes in. Instead of operating on whole triangles, we will calculate and store a Z value for every pixel painted onto our bitmap, in a separate array. Each cell is initialized to -infinity; then, before coloring a pixel in `_screen`, we compare the depth of the current point on the triangle with the depth stored for that pixel in the `_z_buffer` array.
How do we calculate the Z coordinate of a point with known X and Y coordinates on the triangle's plane? We need to solve a simple linear system to get the rate of change of Z along the X and Y axes.
For a triangle ABC, let's first calculate vectors AB and AC:

v1 = B - A
v2 = C - A

Then solve the following system for dz = (dz_x, dz_y):

dz_x * v1.x + dz_y * v1.y = v1.z
dz_x * v2.x + dz_y * v2.y = v2.z
renderer.py
```python
def _calculate_delta_z(points):
    v_ab = points[1] - points[0]
    v_ac = points[2] - points[0]
    slope = np.array([v_ab[:2], v_ac[:2]])
    zs = np.array([v_ab[2], v_ac[2]])
    return np.linalg.solve(slope, zs)
```
Once we have our delta Z (dz) and pick an arbitrary vertex of the triangle (a), we can calculate the depth of any point (p):

v_ap = p - a
depth = a.z + dot(v_ap, dz)
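As a sanity check, here is `_calculate_delta_z` applied to a made-up triangle lying on the plane z = 2x + 3y + 1:

```python
import numpy as np


def _calculate_delta_z(points):
    v_ab = points[1] - points[0]
    v_ac = points[2] - points[0]
    slope = np.array([v_ab[:2], v_ac[:2]])
    zs = np.array([v_ab[2], v_ac[2]])
    return np.linalg.solve(slope, zs)


# Triangle lying on the plane z = 2x + 3y + 1
points = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 3.0],
    [0.0, 1.0, 4.0],
])
dz = _calculate_delta_z(points)
print(dz)  # [2. 3.] -- the plane's slopes along x and y

# Depth of an arbitrary point on that plane, e.g. (0.2, 0.3):
p = np.array([0.2, 0.3])
depth = points[0][2] + (p - points[0][:2]) @ dz
print(depth)  # ~2.3, i.e. 2*0.2 + 3*0.3 + 1
```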
Let's add the z-buffer array and modify the `_render_triangle` method:
renderer.py
```python
class Renderer:
    def __init__(
        ...
    ):
        ...
        self._z_buffer = np.ones((resolution_y, resolution_x)) * -np.inf

    ...

    def _render_triangle(self, points, color):
        bounding_box = _get_bounding_box(points)
        # np.linalg.solve can throw an error so we need to handle it
        try:
            delta_z = _calculate_delta_z(points)
        except np.linalg.LinAlgError:
            return
        for screen_x, scene_x in enumerate(self._range_x):
            if scene_x < bounding_box[0, 0] or scene_x > bounding_box[1, 0]:
                continue
            for screen_y, scene_y in enumerate(self._range_y):
                if scene_y < bounding_box[0, 1] or scene_y > bounding_box[1, 1]:
                    continue
                if not _point_in_triangle(np.array([scene_x, scene_y, 0]), points):
                    continue
                depth = (
                    points[0][2]
                    + (np.array([scene_x, scene_y]) - points[0][:2]) @ delta_z
                )
                if depth <= self._z_buffer[screen_y, screen_x]:
                    continue
                self._screen[screen_y, screen_x, :] = color
                self._z_buffer[screen_y, screen_x] = depth
```
The stacked triangles are now rendered correctly, regardless of their ordering in `Object3D.triangles`.
The renderer is still lacking one basic and aesthetically pleasing feature: shading. For now, each pixel is colored with the color defined for its triangle. So we will add directional lighting to make the rendered image more pleasing.
Basic directional lighting
We will determine how much light a triangle receives in a similar way to how we determined whether it is visible. But this time we don't just need to know whether the dot product is greater or less than 0; we need its exact value. So we will operate on unit vectors, i.e. vectors whose length is exactly 1. To convert a vector to a unit vector, calculate its length and divide the vector by it.

For a vector A:

length = sqrt(A.x^2 + A.y^2 + A.z^2)
unit_A = A / length
renderer.py
```python
def _normalize_vector(p):
    return 1 / np.sqrt((p**2).sum()) * p
```
The dot product of two unit vectors yields a value between -1 (pointing in opposite directions) and 1 (pointing in exactly the same direction).
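A quick check of `_normalize_vector` and the dot-product range (the example vector is chosen arbitrarily):

```python
import numpy as np


def _normalize_vector(p):
    # scale the vector by the reciprocal of its Euclidean length
    return 1 / np.sqrt((p**2).sum()) * p


a = _normalize_vector(np.array([1.0, -1.0, -1.0]))
print(np.linalg.norm(a))  # ~1.0: unit length
print(np.dot(a, a))       # ~1.0: same direction
print(np.dot(a, -a))      # ~-1.0: opposite directions
```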
Add light direction as a property to Renderer object:
renderer.py
```python
class Renderer:
    def __init__(
        ...
    ):
        ...
        self._light_dir = _normalize_vector(np.array([1, -1, -1]))
```
renderer.py
```python
class Renderer:
    ...

    def _calculate_color(self, points, color):
        v_ba = points[0] - points[1]
        v_bc = points[2] - points[1]
        surface_unit_vector = _normalize_vector(np.cross(v_ba, v_bc))
        factor = 1 - (np.dot(self._light_dir, surface_unit_vector) + 1) / 2
        r, g, b = color
        r = int(factor * r)
        g = int(factor * g)
        b = int(factor * b)
        return np.array([r, g, b], "uint8")
```
I normalize the dot product to the [0, 1] range and multiply the RGB values by 1 minus it, but you can use a fancier function for that.
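To see what this factor does at the extremes, here is the same formula as a standalone function (`shade_factor` is my name for it, not the renderer's):

```python
import numpy as np


def shade_factor(light_dir, surface_unit_vector):
    # map the dot product from [-1, 1] to a brightness factor in [0, 1]:
    # 1 = surface faces the light head-on, 0 = surface faces directly away
    return 1 - (np.dot(light_dir, surface_unit_vector) + 1) / 2


light = np.array([0.0, 0.0, -1.0])  # light shining straight down
print(shade_factor(light, np.array([0.0, 0.0, 1.0])))   # 1.0: fully lit
print(shade_factor(light, np.array([0.0, 0.0, -1.0])))  # 0.0: fully dark
```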
📢 DISCLAIMER 📢
I chose to tie the direction of the light to the position/direction of the camera. To make it independent, you would need to perform the same steps, but use the triangle's untransformed world coordinates instead of its projected points.
Now let's modify `_render_triangle`:
renderer.py
class Renderer:
...
def _render_triangle(self, points, color):
shaded_color = self._calculate_color(points, color)
...
for screen_x, scene_x in enumerate(self._range_x):
...
for screen_y, scene_y in enumerate(self._range_y):
...
self._screen[screen_y, screen_x, :] = shaded_color
self._z_buffer[screen_y, screen_x] = depth
Change the cube's colors to white to see the difference better:
Something more complex than a cube.
If you want to see it in action with something more interesting than a cube, copy the contents of sherman.py on GitHub. I created this model from scratch by adding primitives and methods for manipulating and merging objects.
I also used interpolation to create this animated sequence of rendered images:
So I can say mission accomplished: it's exactly the thing I wanted for the task of creating sprites.
If there is some interest in this topic, I'm not ruling out further tutorials extending the renderer. Possible topics include shadows, animation and texturing.