Why this project exists
My background is in web development. TypeScript, Node.js, React — that's been my world for years. I've spent a lot of time thinking about software architecture: SOLID principles, onion architecture, dependency injection, separation of concerns. All of that in the context of the web.
But the reason I got into technology in the first place has nothing to do with web forms or REST APIs. It was Super Mario 64. Playing it as a kid was my first time experiencing 3D graphics, and it left a permanent mark. The name "Ultra" is a tribute to the Ultra 64 — the original name of the Nintendo 64 before it shipped.
I've always wanted to understand how 3D engines work at every level. Not just "call a function and a triangle appears", but the actual machinery: how does the GPU know what to draw? How do pixels end up on the screen? What happens between your code and the display?
This project is my attempt to answer those questions by building a 3D engine from scratch with C++ and Vulkan. And since I'm learning as I go, I'm documenting everything so it can help others who are on the same path — especially those coming from a web development background like me.
The source code is hosted on Codeberg — a non-profit alternative to GitHub, hosted in the EU. You can find the repository at codeberg.org/remojansen/ultra.
I'm not sure how far this project will go. I'm doing it for fun and education, so I'll keep at it for as long as I find it engaging and exciting. There's no roadmap, no deadline, no promise of a "complete" engine at the end — just curiosity and momentum.
Prerequisites
This series assumes you're an experienced web developer. You should be comfortable with:
- JavaScript and TypeScript — you write JS/TS daily and understand its type system, module system, and async patterns.
- Node.js — you've built backend services, used npm, and understand how the Node runtime works.
- Browser APIs — you have a working knowledge of the DOM, <canvas>, and how the browser renders pages.
- Software architecture — concepts like separation of concerns, dependency injection, and layered architectures aren't new to you.
No prior C++ or graphics programming experience is required — that's what this series teaches. But you should be the kind of developer who's comfortable reading documentation and figuring out new tools.
The toolchain
Before we write any engine code, we need to understand the tools. If you're coming from the JavaScript ecosystem, the C++ toolchain will feel different — there's no npm install and no node index.js. But the concepts map surprisingly well.
Clang (the compiler)
Install: Getting Started with Clang
In JavaScript, your code runs directly in V8 or SpiderMonkey. These engines use JIT (Just-In-Time) compilation — they compile your code to machine code while the program is running, optimizing hot paths on the fly based on actual usage patterns.
C++ works the opposite way. Your code must be compiled into a binary before it can run. This is called AOT (Ahead-Of-Time) compilation. Clang is the compiler we use — it reads your .cpp files and produces machine code that runs directly on your CPU, with no runtime or interpreter involved.
The tradeoff is straightforward: JIT compilation gives you fast startup and runtime adaptability (the engine can optimize code paths it sees running frequently), but the compiler runs alongside your program and costs memory and CPU. AOT compilation is slow upfront (you have to compile before you can test), but the output is fully optimized native code with no compiler in the loop at runtime. For a real-time graphics engine where every microsecond matters, that's the tradeoff we want.
CMake (the build system generator)
Install: CMake Download
In Node.js, you have package.json to describe your project. In C++, you have CMakeLists.txt. CMake reads this file and generates the actual build instructions for your platform.
It doesn't build your code directly — it generates build files for another tool (in our case, Ninja). This might feel like an unnecessary layer of indirection, but it's what allows the same project to build on macOS, Linux, and Windows without changes.
Our CMakeLists.txt defines two targets:
# Ultra engine library
file(GLOB_RECURSE ULTRA_SOURCES src/engine/*.cpp)
add_library(ultra_engine STATIC ${ULTRA_SOURCES})
target_include_directories(ultra_engine PUBLIC ${CMAKE_SOURCE_DIR}/src)
target_link_libraries(ultra_engine PUBLIC glfw Vulkan::Vulkan)
# Demo application
file(GLOB_RECURSE DEMO_SOURCES src/demo/*.cpp)
add_executable(demo ${DEMO_SOURCES})
target_link_libraries(demo PRIVATE ultra_engine)
The engine is built as a static library (ultra_engine), and the demo game is built as an executable (demo) that links against it. Let's unpack what that means.
An executable is a binary that your OS can run directly — it has an entry point (main()), and double-clicking it (or running it from a terminal) starts a process. A static library is not something you can run. It's a bundle of compiled code that sits there waiting to be included in an executable. Think of it as a .jar file in Java or a compiled npm package — it contains useful code, but it needs a host program to actually execute.
Linking is the process that connects them. When the compiler builds demo, it sees calls to functions like Application() and Run(). Those functions aren't defined in demo's own source code — they live in ultra_engine. The linker's job is to resolve these references: it looks through the static library, finds the compiled code for each function, and copies it directly into the final executable. The result is a single binary (demo) that contains everything it needs to run — both its own code and all the engine code baked in.
This is fundamentally different from how JavaScript modules work. When you import a package in Node.js, that package is loaded at runtime from node_modules. With static linking, there is no runtime lookup — the library's code is physically embedded in the executable at build time. The static library file itself isn't needed after compilation; the executable is entirely self-contained.
Ninja (the build executor)
Install: Ninja Getting Started
Ninja is the tool that actually runs the compiler. CMake generates the instructions, Ninja executes them. It's fast and minimal. You'll rarely interact with it directly — you just run cmake --build build and CMake calls Ninja for you.
vcpkg (the package manager)
Install: vcpkg Getting Started
This one will feel familiar. vcpkg is the closest thing C++ has to npm. We use it to install third-party libraries like GLFW. Dependencies are declared in vcpkg.json, and vcpkg resolves, downloads, and builds them:
{
"dependencies": [
"glfw3"
]
}
clang-format and clang-tidy (code quality)
Install: Both ship with the LLVM toolchain. If you installed Clang, you likely already have them.
- clang-format is Prettier for C++. It enforces consistent code style automatically.
- clang-tidy is ESLint for C++. It performs static analysis, catching common bugs and anti-patterns before your code ever runs.
.h and .cpp files
In web development, you write everything in a single .ts or .js file. In C++, code is split into two file types:
- Header files (.h) — These are the declarations. They describe what exists: the struct names, the method signatures, the types. Think of them as TypeScript interface files or .d.ts declarations. Other files include headers to know what's available.
- Source files (.cpp) — These are the implementations. They contain the actual code that runs. Think of them as the concrete classes that implement a TypeScript interface.
This separation exists because the C++ compiler processes files independently. When application.cpp needs to use the Window struct, it doesn't read window.cpp — it reads window.h to learn the shape of Window, and the linker connects everything at the end.
The directory architecture
Separation of concerns matters just as much in a 3D engine as it does in a web application. We organize the source code into layers with clear responsibilities:
src/
├── engine/ ← the engine (static library)
│ ├── core/
│ │ └── application ← owns the game loop, orchestrates everything
│ ├── platform/
│ │ ├── window ← OS window management (GLFW)
│ │ └── surface ← bridge between the window and Vulkan
│ └── renderer/
│ ├── instance ← Vulkan runtime initialization
│ ├── device ← GPU selection and logical device
│ └── swapchain ← image buffers for presenting frames
├── demo/
│ └── main.cpp ← demo application that uses the engine
The dependency flow is one-directional:
main.cpp → Application → Renderer
→ Platform (Window, Surface)
The platform layer deals with anything OS-specific (creating a window, connecting it to Vulkan). The renderer layer is pure Vulkan. The core layer ties them together. The demo application only knows about Application — it has no idea that GLFW or Vulkan exist.
If you've worked with the onion architecture in Node.js, this will feel familiar. The inner layers don't know about the outer layers. The renderer doesn't know about the window. The application sits at the boundary and wires everything together, just like a composition root in a dependency injection setup.
The initialization flow
Vulkan is an explicit API. Unlike OpenGL (or WebGL, if you've used it), Vulkan doesn't do anything for you behind the scenes. You have to set up every piece of the pipeline yourself. The initialization flow looks like this:
Instance → Window → Surface → Device → Swapchain
Each step depends on the one before it. But before we dive in, let's make sure we understand the two main technologies we're working with.
Step 1: What are Vulkan and GLFW?
If you're coming from web development, you've never had to think about how pixels get on the screen. The browser handles all of that. You write HTML and CSS, or draw to a <canvas>, and the browser's rendering engine figures out how to talk to the GPU.
In native development, there is no browser. Your application talks to the GPU directly through a graphics API. That's what Vulkan is — a low-level API that lets you send commands to the GPU: "create an image buffer", "run this shader program", "draw these triangles", "present this frame to the screen." It's the equivalent of the WebGL API you might have seen in the browser, but much more explicit and verbose. Where WebGL hides most of the complexity, Vulkan exposes it all. You control memory allocation, synchronization, command recording — everything. That's what makes it powerful for high-performance engines, and also what makes it hard to learn.
Vulkan is cross-platform — it runs on Windows, Linux, and Android natively. On macOS, it runs through MoltenVK, a translation layer that converts Vulkan calls into Apple's Metal API under the hood. This means we write Vulkan code and it works on macOS, but there are a few extra setup steps (portability extensions) that we'll see shortly.
One thing Vulkan does not do is create windows. It's a graphics API, not a windowing API. It can render pixels, but it has no idea how to open a window on your operating system, handle keyboard input, or respond to a close button. For that, we need a separate library.
That's where GLFW comes in. GLFW is a small C library that handles the OS-level stuff: creating a window, processing input events (keyboard, mouse, gamepad), and providing the bridge between the OS window and whatever graphics API you're using. Think of it as the window object in a browser — it gives you the container that your rendering will appear in, and it fires events when the user interacts with it.
GLFW also knows about Vulkan specifically. It can tell you which Vulkan extensions are needed on your platform to display rendered output in a window. Extensions in Vulkan are optional features — the core API is minimal, and everything platform-specific (like "how do I display pixels in a macOS window?") is provided as an extension. GLFW queries the system and returns the list of extensions you need to enable. We'll see this in action when we create the Vulkan instance.
With that context, let's walk through each initialization step.
Step 2: VkInstance
The VkInstance is the entry point to the Vulkan API. Creating it initializes the Vulkan runtime on your machine. Nothing Vulkan-related can happen without it.
If you're coming from web development, think of it as opening a browser. The browser itself doesn't show any web page yet, but you now have access to the rendering engine. That's what VkInstance is — you're telling the system "I want to use Vulkan."
Let's look at the header:
#pragma once
#include <vulkan/vulkan.h>
struct Instance
{
VkInstance instance;
Instance();
void Destroy();
};
A few things here that deserve explanation if you're new to C++:
struct — if you're coming from TypeScript, you'd expect to see class here. C++ has both struct and class, and they're almost identical. The only difference is default visibility: in a struct, everything is public by default; in a class, everything is private by default. We use struct throughout the engine because all our members are public — there's no reason to write class and then immediately add public: to undo the default. You'll see both conventions in C++ codebases; it's a style choice, not a functional one.
#pragma once is a preprocessor directive that tells the compiler "only include this file once, even if multiple files try to include it." Without it, you'd get duplicate definition errors. It's the C++ equivalent of making sure you don't import the same module twice — except in C++ the compiler doesn't handle it automatically.
#include <vulkan/vulkan.h> pulls in the Vulkan API declarations. This is conceptually the same as import vulkan from 'vulkan' in JavaScript — it tells the compiler that types like VkInstance exist and what they look like.
When we create the instance, three things happen:
- We describe our application with VkApplicationInfo — metadata like the app name and Vulkan API version. Think of it as a User-Agent header in HTTP.
- We ask GLFW for the required extensions — as we discussed in Step 1, GLFW knows which Vulkan extensions are needed to present to a window on your platform. On macOS, this includes the MoltenVK portability extensions we mentioned earlier.
- We enable validation layers (in debug builds only) — these are a runtime linter. They watch every Vulkan API call and warn you if you're doing something wrong. Incredibly useful for learning.
Step 3: Window
The window is your connection to the operating system. It's the rectangle on your screen where pixels will appear. We use GLFW to create and manage it — as we covered in Step 1, GLFW handles window creation, input, and OS events across platforms.
#pragma once
#include <GLFW/glfw3.h>
#include <cstdio>
#include <cstdlib>
struct Window
{
GLFWwindow* window;
Window();
Window(int width, int height, const char* title);
bool ShouldClose() const;
void PollEvents() const;
void Destroy();
};
Notice the const keyword after the method declarations — bool ShouldClose() const. This has no equivalent in JavaScript or TypeScript. It's a promise to the compiler that this method will not modify the object. It only reads data, never writes it. If you accidentally try to change a member variable inside a const method, the compiler will reject it. Think of it as a read-only contract — like Readonly<T> in TypeScript, but enforced at the method level rather than on the type. Destroy() is not const because it modifies state (it tears down the window).
This is the first time we see a pointer, so let's talk about what GLFWwindow* window means.
In JavaScript, when you write const element = document.getElementById('app'), you get a reference to a DOM element. You don't get the element itself — you get something that points to where the element lives in memory, and the runtime keeps that reference valid for you.
In C++, a pointer is the explicit version of this. GLFWwindow* means "a memory address where a GLFWwindow lives." The * is what makes it a pointer. You don't own the GLFW window data directly — GLFW allocates it internally and gives you a pointer to it. When you pass this pointer to other GLFW functions, they follow the address to find the actual window data.
The important thing to know right now is: a pointer is an address. GLFWwindow* window means "window is a variable that holds the address of a GLFWwindow somewhere in memory."
The window has a simple job. It creates an OS window via GLFW and exposes three operations:
- ShouldClose() — has the user clicked the close button?
- PollEvents() — check for OS events (mouse, keyboard, resize, close). Without this, the OS thinks your app is frozen.
- Destroy() — tear it all down.
Note the GLFW_NO_API hint in the implementation:
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
This tells GLFW "don't set up OpenGL — we're going to use Vulkan instead." By default GLFW creates an OpenGL context, which we don't want.
Step 4: VkSurfaceKHR
The surface is the bridge between your OS window and Vulkan. It answers the question: "where should Vulkan draw its pixels?"
In web terms, imagine you have a <canvas> element and a WebGL context. The canvas is the window. The WebGL context is Vulkan. The surface is the binding between them — it's what allows the rendering API to output to that specific rectangle on the screen.
#pragma once
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>
struct Surface
{
VkSurfaceKHR surface;
Surface();
Surface(VkInstance instance, GLFWwindow* window);
void Destroy(VkInstance instance);
};
#define GLFW_INCLUDE_VULKAN is a preprocessor macro. JavaScript has nothing like this. Before the compiler even sees your code, a separate step called the preprocessor runs through it and performs text substitution. #define GLFW_INCLUDE_VULKAN creates a flag — it doesn't produce any code itself. When GLFW's header file is included on the next line, it checks whether this flag exists, and if so, it also pulls in the Vulkan type declarations (like VkInstance and VkSurfaceKHR). Without it, GLFW wouldn't know about Vulkan types. Think of it as a compile-time feature flag — like an environment variable, but resolved before compilation rather than at runtime.
The surface needs both the VkInstance and the GLFWwindow* because it sits between the two worlds. GLFW provides a helper function (glfwCreateWindowSurface) that handles the platform-specific details — on macOS it creates a Metal surface, on Linux it creates an X11 or Wayland surface, on Windows it creates a Win32 surface.
This is why the surface is created after both the instance and the window exist, and why it lives in the platform/ directory — it fundamentally wraps an OS-level concept, even though it uses a Vulkan type.
Step 5: VkDevice
The device step is actually two things: picking a physical GPU and creating a logical device.
Picking a physical device
Your machine might have multiple GPUs (an integrated one and a discrete one, for example). We need to pick one that can do what we need. The selection process works like this:
- Enumerate all GPUs on the system
- For each GPU, check its queue families — groups of "workers" that can do different things (graphics, compute, transfer, presentation)
- Find a GPU that has both a graphics queue family (can draw things) and a present queue family (can display things on our surface)
- Check that the GPU supports the swapchain extension (needed to present rendered frames)
- Pick the first GPU that passes all checks
If you're coming from web development, think of queue families as different thread pools. One pool handles graphics work, another handles displaying results on screen. Often they're the same pool, but the API makes you check.
Creating the logical device
Once we've picked a GPU, we create a logical device — our application's handle to the GPU. The distinction matters: VkPhysicalDevice represents the actual hardware, VkDevice represents our connection to it. Multiple applications can each have their own VkDevice pointing at the same VkPhysicalDevice.
When creating the logical device, we request:
- Queues from the families we identified — these are the mailboxes where we'll send draw commands
- Extensions — specifically VK_KHR_swapchain, so we can present frames
After creation, we retrieve the queue handles. These are what we'll use later to submit rendering commands and present images to the screen.
What is a shader? A shader is a small program that runs on the GPU. We'll write these in Part II. For now, just know that the GPU executes shader programs to determine where vertices appear and what color each pixel should be.
Step 6: Swapchain
Source: swapchain.h · swapchain.cpp
What is a frame? A frame is a single complete image displayed on your screen. Movies run at 24 frames per second — 24 still images flashing by so fast they look like motion. Games work the same way, typically at 60 frames per second. Each frame is drawn from scratch by the GPU, displayed briefly, then replaced by the next one.
The swapchain is a queue of images that take turns being displayed on screen. Think of it like double or triple buffering in a <canvas> game.
Imagine you have 2–3 offscreen canvases. You draw to one while the browser displays another. When you finish drawing, you swap them — the freshly drawn one goes to the screen, and the previously displayed one is freed up for the next frame. That's exactly what a swapchain does.
Image A: [being displayed on screen]
Image B: [GPU is drawing the next frame here]
Image C: [waiting, ready for the GPU to use next]
↓
swap → Image B goes to screen, Image A is now free
Without it, the user would see half-finished frames — a visual glitch called tearing.
What is tearing? Tearing is when the top half of the screen shows one frame and the bottom half shows the next, because the display refreshed mid-draw. It happens when the GPU writes to the same image the monitor is currently reading from. The swapchain prevents this by keeping displayed and in-progress images separate.
Creating the swapchain involves querying the surface for what it supports and choosing the best options:
- Format — the pixel format and color space. We prefer B8G8R8A8_SRGB (BGRA 8-bit with sRGB). This is the standard format for monitors — sRGB ensures colors look correct.
- Present mode — how swapping works. MAILBOX is triple buffering (low latency, the GPU replaces queued frames). FIFO is vsync (like requestAnimationFrame — guaranteed to be available, smooth but higher latency). We prefer mailbox and fall back to FIFO.
What is vsync? Vsync (vertical sync) locks your frame rate to the monitor's refresh rate (usually 60Hz). It prevents tearing by waiting for the monitor to finish displaying one frame before swapping in the next. requestAnimationFrame in browsers is essentially vsync — the browser calls your callback once per display refresh.
- Extent — the resolution, usually matching the window size.
After creating the swapchain, we get the image handles (the driver created them, we just get references) and create image views for each one. An image view is a "lens" that tells Vulkan how to interpret an image — its format, that it's 2D, that we care about the color channels.
Putting it all together: Application
The Application struct sits at the top and orchestrates everything. It's the composition root — the one place that knows about all the pieces and wires them together.
Application::Application(int width, int height, const char* title)
: window(width, height, title), renderer(),
surface(renderer.instance.instance, window.window)
{
renderer.InitDevice(surface.surface);
renderer.InitSwapchain(surface.surface,
static_cast<uint32_t>(width),
static_cast<uint32_t>(height));
}
Two pieces of syntax here that have no JavaScript equivalent:
Member initializer list — the : window(width, height, title), renderer(), surface(...) part after the constructor signature. In JavaScript, you'd write this.window = new Window(width, height, title) inside the constructor body. C++ has a separate syntax for this because of how memory works: when an Application is created, all its member variables need to be constructed before the constructor body runs. The initializer list is where you tell the compiler how to construct each member. If you skipped it and assigned inside the body instead, each member would first be default-constructed (possibly doing wasted work) and then reassigned — the initializer list avoids that double initialization.
static_cast<uint32_t>(width) — explicit type conversion. In JavaScript, numbers are just numbers — there's one number type. In C++, int and uint32_t (unsigned 32-bit integer) are different types, and the compiler will warn you about implicit conversions between them because going from signed to unsigned can lose information (negative values wrap around). static_cast says "I know these types are different, and I'm intentionally converting." It's the C++ equivalent of writing width as number in TypeScript — an explicit annotation that makes the intent clear.
Creation order matters because each step depends on the previous one:
| Order | Component | Depends on |
|---|---|---|
| 1 | Window | nothing |
| 2 | Instance | nothing |
| 3 | Surface | Instance + Window |
| 4 | Device | Instance + Surface |
| 5 | Swapchain | Device + Surface |
Destruction is the reverse — you always tear down in the opposite order of creation, so nothing tries to use something that's already gone:
void Application::Shutdown()
{
surface.Destroy(renderer.instance.instance);
renderer.Destroy(); // swapchain → device → instance
window.Destroy();
}
If you're coming from JavaScript, you might wonder: why do we need to call Destroy() at all? In JS, you just stop referencing an object and the garbage collector eventually frees the memory. C++ has no garbage collector. When you allocate a resource — a Vulkan device, a window, a block of GPU memory — it stays allocated until you explicitly release it. If you forget, it leaks: the resource is gone from your program's perspective but still held by the OS or GPU driver until the process exits. Every Destroy() method you see in this engine is doing what the garbage collector would do for you in JavaScript — but manually and in a specific order, because these resources depend on each other.
The game loop is simple for now:
void Application::Run()
{
while (!window.ShouldClose())
{
window.PollEvents();
}
Shutdown();
}
The application owns the loop. The window just reports events. The renderer doesn't know about the loop at all. As the engine grows, the loop will expand to include scene updates and render calls, but the structure stays the same.
The demo application is intentionally minimal — two lines of real code:
int main() {
Application app(800, 600, "Ultra 3D Engine");
app.Run();
return EXIT_SUCCESS;
}
This is the whole point of layered architecture. The complexity of Vulkan initialization, GPU selection, swapchain creation — none of that leaks into the application code. It's all behind Application, exactly where it should be.
What's next
At this point we have a window, a Vulkan instance, a surface, a device, and a swapchain. The infrastructure is in place. We have images to render to and a GPU ready to receive commands.
In the next part, we'll create the render pass and graphics pipeline — the point where we actually tell the GPU what to draw.