Where we left off
In Part I, we set up the Vulkan infrastructure: instance, window, surface, device, and swapchain. We have a GPU ready to receive commands, and a queue of images waiting to be drawn to and displayed on screen.
But at that point we didn't draw anything. The game loop just polled for window events and looped. Now we're going to make pixels appear — specifically, a colored triangle. This is the "Hello, World!" of graphics programming. Every GPU tutorial starts here because a triangle is the simplest shape the GPU can draw, and getting one on screen proves the entire pipeline is working.
Here's what we need to build:
Swapchain (Part I) → Render Pass → Shaders → Pipeline → Command Pool → Framebuffers → Sync → Draw
Each of these builds on the previous one. By the end, we'll have a working render loop that clears the screen to black, draws a rainbow triangle, and presents the result every frame.
The updated architecture
The renderer has grown. Here's the directory structure now:
src/engine/renderer/
├── instance ← Vulkan runtime (Part I)
├── device ← GPU selection (Part I)
├── swapchain ← image buffers for presenting frames (Part I)
├── render_pass ← describes what we render to and how ← NEW
├── shader ← loads compiled GPU programs ← NEW
├── pipeline ← the full GPU configuration for drawing ← NEW
├── command_pool ← allocates command buffers ← NEW
└── renderer ← orchestrates everything ← UPDATED
And the shader source files live in a new location:
src/engine/assets/shaders/
├── triangle.vert ← vertex shader (positions + colors)
└── triangle.frag ← fragment shader (pixel colors)
Step 1: Render Pass
A render pass describes where you're rendering to and how the output should be treated. It doesn't draw anything itself — it's a blueprint that tells Vulkan: "when I render a frame, I'll be writing to this kind of image, I want you to clear it first, and when I'm done, the image should be ready to display on screen."
If you're coming from web development, think of it as configuring a <canvas> context. When you call canvas.getContext('2d') or canvas.getContext('webgl'), you're telling the browser what kind of rendering you'll do and what the output format looks like. A render pass is the Vulkan version of that — except far more explicit.
struct RenderPass
{
    VkRenderPass renderPass;

    RenderPass();
    RenderPass(VkDevice device, VkFormat swapchainFormat);
    void Destroy(VkDevice device);
};
Creating a render pass involves three pieces:
Attachments
An attachment is the image you're rendering into — in our case, the swapchain image. We describe it with VkAttachmentDescription:
VkAttachmentDescription colorAttachment{};
colorAttachment.format = swapchainFormat;
colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
The key fields:
- loadOp = CLEAR — when the render pass begins, clear the image (to black, or whatever clear color we set). This is like calling ctx.clearRect(0, 0, width, height) at the start of each frame in a canvas game.
- storeOp = STORE — when the render pass ends, keep the rendered result. (The alternative, DONT_CARE, would let the driver discard it — useful for temporary buffers, not for something we want to display.)
- initialLayout = UNDEFINED — we don't care what state the image is in before we start. We're going to clear it anyway.
- finalLayout = PRESENT_SRC_KHR — when the render pass finishes, the image should be in a layout ready for presentation. Vulkan images can be in different "layouts" optimized for different operations (rendering, reading, presenting). This tells the driver to transition the image to a presentation-ready layout when we're done.
Subpasses
A render pass can contain multiple subpasses — stages that render to different combinations of attachments. Think of it like a multi-pass compositor: one pass renders the scene, another applies post-processing, another adds UI. Each subpass can read the output of the previous one.
For our triangle, we only need one subpass. It writes to a single color attachment:
VkAttachmentReference colorRef{};
colorRef.attachment = 0;
colorRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorRef;
Before we explain the fields, it helps to understand how VkAttachmentDescription and VkAttachmentReference relate. The description defines what an attachment is — its format, how it's loaded and stored, and what layouts it transitions through. The reference is a pointer to one of those descriptions by index. Think of the render pass as having an array of attachment descriptions; subpasses refer to them by index rather than duplicating the definition.
- attachment = 0 — index into the render pass's attachment descriptions array. We only have one attachment (the color attachment we described above), so it's index 0.
- layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL — the image layout to use during this subpass. This tells the driver to keep the image in a layout optimized for writing color data. This is different from initialLayout/finalLayout on the description, which control the layouts before and after the entire render pass.
- pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS — this subpass uses the graphics pipeline (as opposed to VK_PIPELINE_BIND_POINT_COMPUTE for compute shaders). It determines which type of pipeline can be bound during this subpass.
- colorAttachmentCount = 1 — how many color attachments this subpass writes to. A subpass can write to multiple render targets simultaneously (called MRT — multiple render targets), but we only need one.
- pColorAttachments = &colorRef — pointer to the array of attachment references. The index in this array matters: layout(location = 0) out vec4 outColor in the fragment shader writes to pColorAttachments[0]. If you had two color attachments, location = 1 would write to pColorAttachments[1].
Dependencies
The subpass dependency is synchronization. It says: "don't start writing color until the swapchain image is actually available." Without this, the GPU might try to render into an image that's still being displayed on screen.
VkSubpassDependency dependency{};
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
VK_SUBPASS_EXTERNAL means "everything that happened before this render pass" — in our case, acquiring the swapchain image. The dependency ensures that our color writing waits until that acquisition is complete. If you've ever dealt with race conditions in async JavaScript, this is the same concept — but at the GPU hardware level.
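With all three pieces defined, creating the render pass itself is a single call. A minimal sketch, reusing the variable names from the snippets above:

VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = 1;
renderPassInfo.pAttachments = &colorAttachment;
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

if (vkCreateRenderPass(device, &renderPassInfo, nullptr, &renderPass) != VK_SUCCESS)
    throw std::runtime_error("failed to create render pass");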
Step 2: Shaders
Shaders are programs that run on the GPU. If C++ code runs on the CPU, shaders are the code that runs on the GPU's thousands of tiny cores — in parallel, for every vertex and every pixel, every frame.
In web development, the closest equivalent is WebGL shaders. If you've ever written GLSL inside a <script type="x-shader/x-vertex"> tag or passed shader strings to gl.shaderSource(), that's the same language we're using here. The key difference is the compilation model.
The two shaders
Every graphics program needs at least two shaders:
Vertex shader — runs once per vertex. Its job is to determine where each vertex appears on screen. Think of it as a function that transforms 3D world coordinates into 2D screen coordinates.
What is a vertex? A vertex is a point in space — a corner of a shape. A triangle has 3 vertices, a cube has 8. Each vertex has a position (x, y, z coordinates), and can carry extra data like color or texture coordinates. Every 3D model you've ever seen in a game is made of thousands of triangles, and every triangle is defined by exactly 3 vertices.
Fragment shader — runs once per pixel (technically, per fragment — a candidate pixel). Its job is to determine what color each pixel should be. Think of it as a function that takes a pixel's position and produces an RGBA color.
Here's our vertex shader:
#version 450

vec2 positions[3] = vec2[](
    vec2( 0.0, -0.5),
    vec2( 0.5,  0.5),
    vec2(-0.5,  0.5)
);

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0), // red
    vec3(0.0, 1.0, 0.0), // green
    vec3(0.0, 0.0, 1.0)  // blue
);

layout(location = 0) out vec3 fragColor;

void main()
{
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}
A few things to unpack:
The positions are hardcoded in the shader. Normally you'd send vertex data from the CPU via a vertex buffer (we'll do that in a later part). For now, the triangle's three corners are baked directly into the shader code. The coordinates are in clip space: X goes from -1 (left) to 1 (right), Y goes from -1 (top) to 1 (bottom), and the center is (0, 0). Our triangle spans from the top-center to the bottom-left and bottom-right.
What is clip space? Clip space is the coordinate system the vertex shader outputs to. It's a normalized box where X and Y range from -1 to 1, regardless of your window's actual pixel size. The GPU later maps these coordinates to real pixels based on the viewport dimensions. Think of it like using percentages in CSS instead of pixels — you describe positions relative to the available space, not in absolute units.
gl_VertexIndex is a built-in variable that tells the shader which vertex it's currently processing (0, 1, or 2). It's like the index parameter in array.map((item, index) => ...) — it lets you look up per-vertex data.
gl_Position is the output — where this vertex appears on screen. It's a vec4(x, y, z, w) with four components:
- x, y — the vertex position in clip space (provided by positions[gl_VertexIndex])
- z (the 0.0) — the depth value. This determines how "near" or "far" the vertex is. It matters when triangles overlap — the GPU uses the depth buffer to decide which one is in front. For a flat 2D triangle, 0.0 works fine (in Vulkan, clip-space depth runs from 0 to 1, so 0.0 is the near end of the range).
- w (the 1.0) — the homogeneous coordinate, used for perspective division. After the vertex shader runs, the GPU divides x, y, and z by w to get the final position. With w = 1.0, the coordinates pass through unchanged. When we add a perspective camera later, the projection matrix will produce w values other than 1.0, which is what makes distant objects appear smaller.
layout(location = 0) out vec3 fragColor declares a variable that passes data from the vertex shader to the fragment shader. Let's break the syntax down:
- layout(location = 0) — assigns this variable to output slot 0. The number is an index that connects outputs to inputs between shader stages. The fragment shader must declare a matching layout(location = 0) in vec3 ... to receive this data. If you had a second output, you'd use location = 1, and so on.
- out — marks this as an output variable (data flowing out of the vertex shader to the next stage).
- vec3 fragColor — the type and name. A vec3 holds 3 floats, which we use for RGB color.
The GPU automatically interpolates this value across the triangle's surface. So even though we set red, green, and blue at the three corners, every pixel in between gets a smooth gradient. This is how we get the rainbow effect without any extra work.
What is interpolation? Interpolation means calculating values between known points. If one corner of the triangle is red and another is green, what color should a pixel halfway between them be? The GPU answers this automatically: 50% red + 50% green = yellow. It does this for every pixel across the triangle's surface, producing smooth gradients. This is a hardware feature — the GPU does it for free, with no extra code needed.
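If you want to see the arithmetic, here's a CPU-side sketch of the idea (a hypothetical helper, not engine code — and the 1D version only; the GPU actually blends all three corners using barycentric weights):

// Hypothetical helper: linear interpolation between two vertex colors,
// the 1D version of what the rasterizer does in hardware.
struct Color { float r, g, b; };

Color lerp(const Color& a, const Color& b, float t)
{
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

// Halfway between the red and green corners:
//   lerp({1, 0, 0}, {0, 1, 0}, 0.5f) == {0.5f, 0.5f, 0.0f}, a yellow tone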
The fragment shader is simpler:
#version 450

layout(location = 0) in vec3 fragColor;
layout(location = 0) out vec4 outColor;

void main()
{
    outColor = vec4(fragColor, 1.0);
}
Notice the symmetry with the vertex shader:
- layout(location = 0) in vec3 fragColor — receives the interpolated color from the vertex shader. The location = 0 matches the location = 0 on the vertex shader's out declaration — that's how the GPU knows to wire them together. The in keyword (instead of out) marks this as an input. The variable names don't technically need to match — only the location number and type (vec3) matter — but using the same name keeps it readable.
- layout(location = 0) out vec4 outColor — the fragment shader's own output. This writes to pColorAttachments[0] in the render pass — the swapchain image. The 1.0 appended in vec4(fragColor, 1.0) is the alpha channel (fully opaque).
That's it — the GPU's interpolation hardware did the hard work between the two shaders.
GLSL → SPIR-V compilation
In WebGL, you pass shader source code as strings and the browser compiles them at runtime. Vulkan doesn't work that way. Shaders must be compiled ahead of time into SPIR-V — a binary intermediate format, like bytecode.
Think of it as the difference between JavaScript and WebAssembly. JavaScript is text that the engine parses and compiles at runtime. WebAssembly is pre-compiled bytecode that the engine can load and execute directly. SPIR-V is the WebAssembly of GPU shaders.
The compiler is glslc (part of the Vulkan SDK). Our CMake build handles this automatically:
find_program(GLSLC glslc)
file(GLOB_RECURSE SHADER_SOURCES
    src/engine/assets/shaders/*.vert
    src/engine/assets/shaders/*.frag
)
foreach(SHADER ${SHADER_SOURCES})
    get_filename_component(SHADER_NAME ${SHADER} NAME)
    set(SPV "${SHADER_OUTPUT_DIR}/${SHADER_NAME}.spv")
    add_custom_command(
        OUTPUT ${SPV}
        COMMAND ${GLSLC} ${SHADER} -o ${SPV}
        DEPENDS ${SHADER}
    )
    list(APPEND SHADER_BINARIES ${SPV})
endforeach()
# add_custom_command only registers a recipe; a target must depend on the
# outputs or they never get built (target name illustrative)
add_custom_target(shaders ALL DEPENDS ${SHADER_BINARIES})
Each .vert and .frag file gets compiled to a .spv binary in the build directory. When the engine runs, it loads these .spv files from disk.
Loading shader modules
The Shader struct loads the compiled SPIR-V files and creates Vulkan shader modules:
struct Shader
{
    VkShaderModule vertModule;
    VkShaderModule fragModule;

    Shader();
    Shader(VkDevice device, const std::vector<char>& vertCode,
           const std::vector<char>& fragCode);
    void Destroy(VkDevice device);

    static std::vector<char> ReadFile(const char* path);
};
std::vector<char> — if you're coming from JavaScript, a vector is the C++ equivalent of an Array. It's a dynamically-sized, contiguous block of memory. std::vector<char> is an array of bytes — essentially a Buffer or Uint8Array in Node.js. We use it to hold the raw binary content of the SPIR-V files.
static on the ReadFile method means it belongs to the struct itself, not to any instance. It's like a static method in a JavaScript class — you call it as Shader::ReadFile(path) rather than on an instance. It doesn't need access to vertModule or fragModule, so it doesn't need an instance.
The ReadFile method does something specific to native development: it resolves the file path relative to the executable's location, not the current working directory. In Node.js, fs.readFileSync('./shaders/triangle.vert.spv') reads relative to where you ran node. In our case, the shader files are placed next to the executable by CMake, so we use _NSGetExecutablePath (macOS-specific) to find where the binary lives and build the path from there. This ensures the shaders are found regardless of which directory you run the demo from.
reinterpret_cast<const uint32_t*>(code.data()) — this appears in the shader module creation code and deserves explanation. The cast is needed because of a type mismatch between two APIs: C++'s file I/O reads data as char* (bytes), but Vulkan expects SPIR-V data as uint32_t* (pointer to 32-bit integers). Even though the .spv file contains 32-bit SPIR-V words, std::ifstream::read() has no concept of that — it treats all files as streams of bytes. So the data lands in a std::vector<char>, and we need reinterpret_cast to tell the compiler: "the bytes at this address — treat them as 32-bit integers instead." This doesn't change the data at all — it changes how the compiler interprets the pointer type. It's like using a DataView in JavaScript to read the same ArrayBuffer as either bytes or 32-bit values. The cast is zero-cost at runtime — it generates no instructions and performs no conversion. It's purely a compile-time mechanism. This is unavoidable: as long as you read files through C++'s char-based I/O and pass the data to an API that expects a different type, a cast is needed at one boundary or the other.
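Putting those two details together, the loading path looks roughly like this — a simplified sketch (names illustrative; the executable-relative path resolution is omitted):

#include <vulkan/vulkan.h>
#include <fstream>
#include <stdexcept>
#include <vector>

std::vector<char> ReadFile(const char* path)
{
    // Open at the end (ate) so tellg() immediately gives the file size.
    std::ifstream file(path, std::ios::ate | std::ios::binary);
    if (!file.is_open())
        throw std::runtime_error("failed to open shader file");

    size_t size = static_cast<size_t>(file.tellg());
    std::vector<char> buffer(size);
    file.seekg(0);
    file.read(buffer.data(), static_cast<std::streamsize>(size));
    return buffer;
}

VkShaderModule CreateShaderModule(VkDevice device, const std::vector<char>& code)
{
    VkShaderModuleCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    createInfo.codeSize = code.size(); // size in bytes, despite the uint32_t* below
    createInfo.pCode = reinterpret_cast<const uint32_t*>(code.data());

    VkShaderModule module;
    if (vkCreateShaderModule(device, &createInfo, nullptr, &module) != VK_SUCCESS)
        throw std::runtime_error("failed to create shader module");
    return module;
}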
A key detail: shader modules are only needed during pipeline creation. Once the pipeline is built, the GPU has the shader programs baked in, and the modules can be destroyed. That's why in the renderer, we call shader.Destroy() right after creating the pipeline — the modules have served their purpose.
Step 3: Graphics Pipeline
The graphics pipeline is the biggest, most detailed object in Vulkan. It describes everything the GPU needs to know to turn vertices into pixels on screen. Where WebGL lets you change individual state bits on the fly (gl.enable(gl.DEPTH_TEST), gl.blendFunc(...)), Vulkan makes you declare all of that upfront in a single immutable object.
Think of it like this. In Express.js, you might configure middleware one piece at a time — add CORS here, add compression there, add auth somewhere else. Each piece of middleware can be swapped at runtime. A Vulkan pipeline is the opposite approach: imagine if you had to declare your entire Express middleware stack as one frozen object at startup and couldn't change it. That's the tradeoff Vulkan makes — less flexibility at runtime in exchange for zero overhead, because the GPU driver can optimize the entire pipeline as a single unit.
struct Pipeline
{
    VkPipelineLayout layout;
    VkPipeline pipeline;

    Pipeline();
    Pipeline(VkDevice device, VkRenderPass renderPass, VkExtent2D extent,
             VkShaderModule vertModule, VkShaderModule fragModule);
    void Destroy(VkDevice device);
};
Creating a pipeline means configuring every stage of the GPU's rendering process. Let's walk through each one.
Shader stages
First, we tell the pipeline which shaders to use:
VkPipelineShaderStageCreateInfo vertStage{};
vertStage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
vertStage.stage = VK_SHADER_STAGE_VERTEX_BIT;
vertStage.module = vertModule;
vertStage.pName = "main";

VkPipelineShaderStageCreateInfo fragStage{};
fragStage.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
fragStage.stage = VK_SHADER_STAGE_FRAGMENT_BIT;
fragStage.module = fragModule;
fragStage.pName = "main";
pName is the entry point function name in the shader — just like how a C++ program starts at main(), the shader starts at whatever function you specify here. You could technically have multiple entry points in one shader module, but "main" is the convention.
Vertex input
VkPipelineVertexInputStateCreateInfo vertexInput{};
vertexInput.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
This describes the format of vertex data coming from the CPU. Since our triangle positions are hardcoded in the shader, we leave this empty — no vertex buffers, no attributes. In a real application, this is where you'd describe the layout of your mesh data: "each vertex has a 3-float position, a 3-float normal, a 2-float UV coordinate, at these byte offsets."
What is a mesh? A mesh is a collection of vertices, edges, and faces that defines the shape of a 3D object. Think of it as a wireframe model — a bunch of triangles stitched together to approximate a surface. A cube is 8 vertices and 12 triangles. A character model might be thousands of vertices and tens of thousands of triangles. The mesh data (positions, normals, colors, texture coordinates for each vertex) is what gets uploaded from the CPU to the GPU via vertex buffers.
Input assembly
VkPipelineInputAssemblyStateCreateInfo inputAssembly{};
inputAssembly.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssembly.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
inputAssembly.primitiveRestartEnable = VK_FALSE;
This tells the GPU how to interpret the vertices. TRIANGLE_LIST means "every three consecutive vertices form one triangle." Other options include LINE_LIST (draw lines), POINT_LIST (draw dots), and TRIANGLE_STRIP (each new vertex forms a triangle with the previous two). For most 3D rendering, you'll use TRIANGLE_LIST.
Viewport and scissor
VkViewport viewport{};
viewport.width = static_cast<float>(extent.width);
viewport.height = static_cast<float>(extent.height);
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
VkRect2D scissor{};
scissor.extent = extent;
The viewport defines the transformation from clip space (the -1 to 1 range in the shader) to pixel coordinates on the framebuffer. If your window is 800×600, the viewport maps the normalized coordinates to those 800×600 pixels.
The scissor is a clipping rectangle — pixels outside this rectangle are discarded. Usually it matches the viewport. Think of it as overflow: hidden in CSS. The viewport says "scale the content to fit here," and the scissor says "cut off anything outside this box."
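For intuition, here's the mapping the viewport performs (a conceptual sketch only; the hardware applies this transform for you):

// Maps clip space ([-1, 1] on both axes) to pixel coordinates. In Vulkan,
// clip y = -1 is already the top of the screen, so no flip is needed.
void clipToPixels(float clipX, float clipY, float width, float height,
                  float& pixelX, float& pixelY)
{
    pixelX = (clipX * 0.5f + 0.5f) * width;  // -1 maps to 0, +1 to width
    pixelY = (clipY * 0.5f + 0.5f) * height; // -1 maps to 0 (top), +1 to height
}

// Example: the triangle's top vertex (0.0, -0.5) in an 800×600 window
// lands at pixel (400, 150).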
Rasterizer
VkPipelineRasterizationStateCreateInfo rasterizer{};
rasterizer.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizer.polygonMode = VK_POLYGON_MODE_FILL;
rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_CLOCKWISE;
rasterizer.lineWidth = 1.0f;
The rasterizer converts the triangle (defined by three vertex positions) into a set of pixel-sized fragments. This is the step where geometry becomes pixels.
What is rasterization? Rasterization is the process of converting vector shapes (triangles defined by vertex coordinates) into a grid of pixels. It's the same concept as when your browser renders an SVG — the browser takes the mathematical description of a shape and figures out which pixels on screen fall inside it. The GPU does this in hardware, for millions of triangles per frame.
What is a fragment? A fragment is a candidate pixel — a pixel-sized sample generated by the rasterizer that might end up on screen. It becomes an actual pixel after passing through the fragment shader and any blending/depth tests. The distinction matters because multiple fragments can compete for the same pixel position (e.g., overlapping triangles), and only one wins.
- polygonMode = FILL — fill the triangle with color. Alternatives are LINE (wireframe) and POINT (just the corners).
- cullMode = BACK_BIT — don't render triangles facing away from the camera. If a triangle's vertices appear in clockwise order on screen, it's front-facing; counter-clockwise means it's facing away. This optimization skips roughly half the triangles in a 3D scene (the back sides of objects you can't see).
- frontFace = CLOCKWISE — defines which vertex winding order means "front-facing."
Multisampling
VkPipelineMultisampleStateCreateInfo multisampling{};
multisampling.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
multisampling.sampleShadingEnable = VK_FALSE;
multisampling.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
Multisampling is anti-aliasing at the hardware level — rendering each pixel at multiple sub-pixel positions and averaging the results. We disable it for now (1_BIT = one sample per pixel = no multisampling). Think of it as the image-rendering CSS property — it controls smoothing quality at the edges of shapes.
What is anti-aliasing? Anti-aliasing smooths out the jagged, staircase-like edges ("jaggies") that appear when diagonal or curved lines are drawn on a pixel grid. Without it, the edges of your triangle would look like tiny stairsteps. With it, the GPU blends edge pixels with the background to create the illusion of a smoother line. You've seen this in browsers — text and SVGs look smooth because the browser applies anti-aliasing automatically.
Color blending
VkPipelineColorBlendAttachmentState colorBlendAttachment{};
colorBlendAttachment.colorWriteMask =
VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |
VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
colorBlendAttachment.blendEnable = VK_FALSE;
Color blending controls what happens when a new pixel overlaps an existing one. With blending disabled, the new pixel simply replaces the old one. With blending enabled, you can do transparency effects (like mix-blend-mode in CSS or globalCompositeOperation in canvas). The colorWriteMask says "write all four channels" — red, green, blue, and alpha. The | operator combines the bit flags, a common C/C++ pattern for expressing combinations of options.
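For reference, this is roughly what enabling classic alpha transparency would look like later — we keep blendEnable = VK_FALSE in this part:

// "Source over" alpha blending: new color weighted by its alpha,
// old color weighted by the remainder. Not enabled in this part.
colorBlendAttachment.blendEnable = VK_TRUE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD;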
Pipeline layout
VkPipelineLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
The layout describes what external data the pipeline receives — push constants (small chunks of data pushed per draw call) and descriptor sets (references to buffers, textures, etc.). We don't use either yet, so the layout is empty. Later, when we add textures and camera matrices, this is where we'll declare them.
Putting it together
All these pieces feed into a single VkGraphicsPipelineCreateInfo struct that creates the final pipeline object. This is one of the most expensive calls in Vulkan — the driver compiles and optimizes the entire pipeline state into GPU-specific instructions. That's why pipelines are created once and reused, never modified.
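To make that concrete, here's a sketch of the final assembly, reusing the structs from this section. The viewportState and colorBlending container structs (which wrap the viewport/scissor and blend-attachment state shown earlier) were elided above, so they're filled in here as assumptions:

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.pViewports = &viewport;
viewportState.scissorCount = 1;
viewportState.pScissors = &scissor;

VkPipelineColorBlendStateCreateInfo colorBlending{};
colorBlending.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
colorBlending.attachmentCount = 1;
colorBlending.pAttachments = &colorBlendAttachment;

vkCreatePipelineLayout(device, &layoutInfo, nullptr, &layout);

VkPipelineShaderStageCreateInfo stages[] = { vertStage, fragStage };

VkGraphicsPipelineCreateInfo pipelineInfo{};
pipelineInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineInfo.stageCount = 2;
pipelineInfo.pStages = stages;
pipelineInfo.pVertexInputState = &vertexInput;
pipelineInfo.pInputAssemblyState = &inputAssembly;
pipelineInfo.pViewportState = &viewportState;
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.layout = layout;
pipelineInfo.renderPass = renderPass;
pipelineInfo.subpass = 0; // index of the subpass this pipeline is used in

vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo,
                          nullptr, &pipeline);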
Step 4: Command Pool and Command Buffers
In Vulkan, you don't call the GPU directly. You record commands into a command buffer, and then submit the entire buffer to the GPU at once. Think of it like writing a batch script instead of typing commands one by one — you prepare all the instructions ahead of time, and the GPU executes them as a unit.
If you've used Web Workers, the model is similar. You don't call functions on the worker directly — you post a message containing all the work, and the worker processes it asynchronously. Command buffers are the messages, and the GPU is the worker.
struct CommandPool
{
    VkCommandPool pool;
    std::vector<VkCommandBuffer> buffers;

    CommandPool();
    CommandPool(VkDevice device, uint32_t queueFamily, uint32_t bufferCount);
    void Destroy(VkDevice device);
};
A command pool is a memory allocator for command buffers. You create one pool per queue family, and it hands out command buffers from a pre-allocated memory region. This is more efficient than allocating each buffer individually — the same reason Node.js uses a Buffer pool internally.
VkCommandPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
poolInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
poolInfo.queueFamilyIndex = queueFamily;
The RESET_COMMAND_BUFFER_BIT flag means we can reset and re-record individual command buffers. Without it, you'd have to reset the entire pool at once. We need per-buffer reset because we re-record commands every frame (the swapchain image index changes).
Command buffers are allocated from the pool. We allocate one per swapchain image — while the GPU is executing commands for one image, we can record commands for the next.
VkCommandBufferAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.commandPool = pool;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandBufferCount = bufferCount;
PRIMARY command buffers are submitted directly to the GPU queue. (There are also SECONDARY buffers that are called from primary buffers — like sub-functions, useful for multi-threaded recording. We don't need them yet.)
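The allocation call itself is short. A minimal sketch using the allocInfo above:

// One vkAllocateCommandBuffers call fills the whole buffers vector.
buffers.resize(bufferCount);
if (vkAllocateCommandBuffers(device, &allocInfo, buffers.data()) != VK_SUCCESS)
    throw std::runtime_error("failed to allocate command buffers");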
Step 5: Framebuffers
Framebuffers don't have their own file — they're created in the renderer because they're essentially just glue between the render pass and the swapchain images.
A framebuffer binds specific images to the attachment slots defined in the render pass. Remember, the render pass said "I'll write to one color attachment." The framebuffer says "and that attachment is this specific swapchain image view."
If the render pass is a form template with blank fields, the framebuffer is the filled-in form — it provides the actual images.
VkFramebufferCreateInfo fbInfo{};
fbInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
fbInfo.renderPass = renderPass.renderPass;
fbInfo.attachmentCount = 1;
fbInfo.pAttachments = attachments; // the swapchain image view
fbInfo.width = swapchain.extent.width;
fbInfo.height = swapchain.extent.height;
fbInfo.layers = 1;
We create one framebuffer per swapchain image. If the swapchain has 3 images, we have 3 framebuffers. Each one points to its respective image view so that when we start a render pass, Vulkan knows exactly which image to render into.
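In code, that loop looks roughly like this — a sketch, assuming the Swapchain struct from Part I exposes its image views as a vector named imageViews:

framebuffers.resize(swapchain.imageViews.size());
for (size_t i = 0; i < swapchain.imageViews.size(); ++i)
{
    // Point the create info at this image's view, then create.
    VkImageView attachments[] = { swapchain.imageViews[i] };
    fbInfo.pAttachments = attachments;
    if (vkCreateFramebuffer(device.device, &fbInfo, nullptr,
                            &framebuffers[i]) != VK_SUCCESS)
        throw std::runtime_error("failed to create framebuffer");
}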
Step 6: Synchronization
This is the part of Vulkan that has no equivalent in web development. In JavaScript, the event loop handles all timing for you — you call requestAnimationFrame and the browser tells you when to draw. In Vulkan, you manage all the timing yourself, and getting it wrong means frames tear, the GPU reads stale data, or your program crashes.
We need three types of synchronization primitives:
Semaphores (GPU ↔ GPU)
Semaphores synchronize operations within the GPU. They're signals that one GPU operation raises when it's done, and another waits for before starting. Think of them as Promise objects — one operation resolves the promise, and another operation awaits it.
We create two per frame slot:
- imageAvailableSemaphore — signaled when the swapchain image has been acquired and is ready to be rendered to
- renderFinishedSemaphore — signaled when rendering is complete and the image is ready to be presented on screen
Fences (GPU → CPU)
Fences synchronize the GPU with the CPU. They let the CPU wait until the GPU finishes a specific operation. If semaphores are GPU-to-GPU Promises, fences are await on the CPU side.
We create one fence per frame slot:
- inFlightFence — the CPU waits on this before recording new commands, ensuring the GPU is done with the previous frame's commands for this slot
VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
The SIGNALED_BIT flag starts the fence in the "already done" state. Without this, the very first frame would wait forever — there's no previous frame to finish, so the fence would never be signaled. Starting it as signaled lets the first frame pass through.
Frames in flight
We use two frames in flight (MAX_FRAMES_IN_FLIGHT = 2). This means the CPU can be recording commands for frame N+1 while the GPU is still executing frame N. It's double buffering for command submission — the same concept as the swapchain's image buffering, but applied to the CPU/GPU pipeline.
Each frame slot has its own semaphores and fence, so they don't interfere with each other.
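Creation is a small loop. A sketch, with array names assumed to match the semaphore/fence names above (each sized MAX_FRAMES_IN_FLIGHT), and fenceInfo being the SIGNALED one from the previous snippet:

VkSemaphoreCreateInfo semInfo{};
semInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

for (uint32_t i = 0; i < MAX_FRAMES_IN_FLIGHT; ++i)
{
    vkCreateSemaphore(device.device, &semInfo, nullptr, &imageAvailableSemaphores[i]);
    vkCreateSemaphore(device.device, &semInfo, nullptr, &renderFinishedSemaphores[i]);
    vkCreateFence(device.device, &fenceInfo, nullptr, &inFlightFences[i]);
}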
Step 7: Drawing a frame
This is where everything comes together. Every single frame, the DrawFrame method executes a sequence that touches every component we've built. Let's walk through it.
1. Wait for the previous frame
vkWaitForFences(device.device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);
vkResetFences(device.device, 1, &inFlightFences[currentFrame]);
Before we do anything, we wait for the GPU to finish processing the last frame that used this slot. UINT64_MAX means "wait forever" — we have no timeout, we just block until it's done. Then we reset the fence so it can be used again.
This is the await in our rendering loop. Without it, we'd pile up commands faster than the GPU can execute them.
2. Acquire a swapchain image
uint32_t imageIndex;
vkAcquireNextImageKHR(device.device, swapchain.swapchain, UINT64_MAX,
imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);
Ask the swapchain for the next available image to render to. The imageAvailableSemaphore will be signaled when the image is ready. We get back an imageIndex — which of the swapchain images we'll use this frame.
3. Record commands
VkCommandBuffer cmd = commandPool.buffers[imageIndex];
vkResetCommandBuffer(cmd, 0);
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
vkBeginCommandBuffer(cmd, &beginInfo);
We grab the command buffer for this image, reset it (clear the previous frame's commands), and start recording. Everything between vkBeginCommandBuffer and vkEndCommandBuffer is a batch of GPU instructions.
The actual draw commands are recorded inside a render pass:
VkClearValue clearColor = {{{0.0f, 0.0f, 0.0f, 1.0f}}};

VkRenderPassBeginInfo rpBegin{};
rpBegin.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
rpBegin.renderPass = renderPass.renderPass;
rpBegin.framebuffer = framebuffers[imageIndex];
rpBegin.renderArea.extent = swapchain.extent;
rpBegin.clearValueCount = 1;
rpBegin.pClearValues = &clearColor;

vkCmdBeginRenderPass(cmd, &rpBegin, VK_SUBPASS_CONTENTS_INLINE);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline.pipeline);
vkCmdDraw(cmd, 3, 1, 0, 0);
vkCmdEndRenderPass(cmd);
vkEndCommandBuffer(cmd);
The triple-brace {{{0.0f, 0.0f, 0.0f, 1.0f}}} for the clear color looks unusual. This is because VkClearValue is a union — a C/C++ type that can hold different data in the same memory (a color, a depth value, or a stencil value). The outer braces initialize the union, the middle braces initialize its color member, and the inner braces initialize the float32[4] array inside the color. It's like a discriminated union in TypeScript (type ClearValue = { color: [number, number, number, number] } | { depth: number }) except without any tag field — you just write to whichever variant you need.
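The real definition in the Vulkan headers is roughly this (simplified from vulkan_core.h):

// The same memory is read as a color for color attachments, or as
// depth/stencil for depth attachments.
typedef union VkClearValue {
    VkClearColorValue        color;        // itself a union: float32[4], int32[4], uint32[4]
    VkClearDepthStencilValue depthStencil; // { float depth; uint32_t stencil; }
} VkClearValue;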
Three commands do all the work:
- vkCmdBeginRenderPass — start the render pass, targeting the framebuffer for this swapchain image. The clear color (black, fully opaque) is applied here because we set loadOp = CLEAR.
- vkCmdBindPipeline — activate our graphics pipeline (which shaders to use, how to rasterize, etc.).
- vkCmdDraw(cmd, 3, 1, 0, 0) — draw 3 vertices, 1 instance. This invokes the vertex shader 3 times (once per vertex), the rasterizer generates fragments for the triangle, and the fragment shader colors each pixel.
4. Submit to the GPU
VkSemaphore waitSemaphores[] = { imageAvailableSemaphores[currentFrame] };
VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
VkSemaphore signalSemaphores[] = { renderFinishedSemaphores[currentFrame] };

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = waitSemaphores; // wait for image to be available
submitInfo.pWaitDstStageMask = waitStages; // wait at the color output stage
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &cmd;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = signalSemaphores; // signal when rendering is done
vkQueueSubmit(device.graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]);
We submit the recorded command buffer to the GPU's graphics queue. The submit info ties together the synchronization:
- Wait on imageAvailableSemaphore before writing color output
- Signal renderFinishedSemaphore when rendering completes
- Signal inFlightFence so the CPU knows this frame's commands are done
5. Present
VkSwapchainKHR swapchains[] = { swapchain.swapchain };

VkPresentInfoKHR presentInfo{};
presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = signalSemaphores; // wait for rendering to finish
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = swapchains;
presentInfo.pImageIndices = &imageIndex;
vkQueuePresentKHR(device.presentQueue, &presentInfo);
Tell the presentation engine: "once the renderFinishedSemaphore is signaled, display image imageIndex on screen." The presentation engine handles the actual display timing (vsync, etc.).
6. Advance the frame counter
currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
Cycle to the next frame slot. With MAX_FRAMES_IN_FLIGHT = 2, this alternates between 0 and 1.
The full frame in one picture
CPU GPU
│ │
├─ Wait for fence[N] ─────────────────►│ (GPU finishes previous frame N)
├─ Acquire image ─────────────────────►│ → signals imageAvailable[N]
├─ Record commands │
├─ Submit commands ───────────────────►│ waits imageAvailable[N]
│ ├─ Execute render pass
│ ├─ → signals renderFinished[N]
│ ├─ → signals fence[N]
├─ Present ───────────────────────────►│ waits renderFinished[N]
│ ├─ Display image
├─ N = (N + 1) % 2 │
└─ repeat │
The updated application
The initialization sequence has grown to include all the new components:
Application::Application(int width, int height, const char* title)
    : window(width, height, title), renderer(),
      surface(renderer.instance.instance, window.window)
{
    renderer.InitDevice(surface.surface);
    renderer.InitSwapchain(surface.surface,
                           static_cast<uint32_t>(width),
                           static_cast<uint32_t>(height));
    renderer.InitPipeline();
    renderer.InitCommands();
    renderer.InitFramebuffers();
    renderer.InitSync();
}
The initialization order matters. Each step depends on what came before:
| Order | Component | Depends on |
|---|---|---|
| 1 | Window | nothing |
| 2 | Instance | nothing |
| 3 | Surface | Instance + Window |
| 4 | Device | Instance + Surface |
| 5 | Swapchain | Device + Surface |
| 6 | Render Pass | Device + Swapchain format |
| 7 | Shaders | Device |
| 8 | Pipeline | Device + Render Pass + Shaders + Swapchain extent |
| 9 | Commands | Device + Swapchain image count |
| 10 | Framebuffers | Device + Render Pass + Swapchain image views |
| 11 | Sync | Device |
And the game loop now draws:
void Application::Run()
{
    while (!window.ShouldClose())
    {
        window.PollEvents();
        renderer.DrawFrame();
    }
    Shutdown();
}
One line — renderer.DrawFrame() — is all it took to go from an empty window to a rendered triangle. That's the value of the layered architecture we've been building. The complexity of synchronization, command recording, pipeline binding — none of it leaks into the application layer.
Destruction now includes everything, in reverse order:
void Renderer::Destroy()
{
    vkDeviceWaitIdle(device.device);

    // semaphores, fences, framebuffers, command pool,
    // pipeline, render pass, swapchain, device, instance
}
The vkDeviceWaitIdle call at the top is critical — it blocks until the GPU has finished all submitted work. Without it, we'd start destroying resources while the GPU is still using them, which would crash (or worse, corrupt memory silently).
The final result
A rainbow triangle on a black background — proof that every piece of the pipeline is working. The vertex shader placed three vertices, the rasterizer filled the triangle, the fragment shader colored each pixel using the GPU's hardware interpolation, and the presentation engine displayed the result on screen.
What's next
We now have a working renderer. A triangle appears on screen, the colors are interpolated by the GPU, and the frame loop runs with proper synchronization. But the triangle is hardcoded in the shader — no vertex data comes from the CPU, no camera exists, and nothing moves.
In the next part, we'll render a full-screen quad with a procedural checker texture — generated entirely in the fragment shader, no image files needed. This will introduce UV coordinates, shader math, and the basics of pattern generation on the GPU.
