Charles Fonseca

Originally published at charlesfonseca.substack.com

Building a Redis Clone in Zig—Part 3

In this series we are going through the hard parts; if you want to see the whole implementation, refer to the GitHub repository.

Why is there a software craftsmanship movement? What motivated it? What drives it now? One thing; and one thing only.
We are tired of writing crap.
— Robert C. Martin

In Part 2, we squeezed performance from our in-memory store through string interning and hash map optimizations. But what happens when the power goes out? In this article, we’ll make our Redis clone durable by implementing RDB snapshots—and learn some hard lessons about Zig’s IO interface along the way.

Implementing the RDB Writer

The RDB file format is a compact binary format used by Redis to persist its in-memory data to disk. It’s meant to be used as a snapshot of the database at a specific time, not for real-time replication. The format consists of a series of opcodes that represent different data types and commands. The specification can be found in the RDB File Format documentation.
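To make that concrete, every RDB file opens with a 9-byte preamble: the magic string REDIS followed by a four-digit ASCII version. A minimal Python sketch of just that preamble (the helper name and the version number 12 are illustrative, not part of our Zig implementation):

```python
def write_rdb_header(buf: bytearray, version: int = 12) -> None:
    """Append the RDB preamble: 5-byte magic string plus 4 ASCII version digits."""
    buf += b"REDIS"           # magic string identifying the file type
    buf += b"%04d" % version  # version zero-padded to 4 digits, e.g. b"0012"

buf = bytearray()
write_rdb_header(buf)
assert bytes(buf) == b"REDIS0012"
```

A reader can reject a file immediately if these nine bytes don't match, before parsing any opcodes.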

To implement the RDB writer, we will create a new module called rdb.zig. This module will contain functions to serialize our in-memory data structures into the RDB format. We will start by defining the basic structure of our RDB writer.

pub const Writer = struct {
    allocator: std.mem.Allocator,
    buffer: *[1024]u8,
    file: std.fs.File,
    store: *Store,
    writer: std.fs.File.Writer,

    pub fn init(allocator: std.mem.Allocator, store: *Store, fileName: []const u8) !Writer {
        std.fs.cwd().deleteFile(fileName) catch |err| switch (err) {
            error.FileNotFound => {},
            else => return err,
        };
        const file = try std.fs.cwd().createFile(fileName, .{ .truncate = true });
        const buffer = try allocator.create([1024]u8);
        const writer = file.writer(buffer);

        return .{
            .allocator = allocator,
            .buffer = buffer,
            .file = file,
            .store = store,
            .writer = writer,
        };
    }
};

What about the so-called new IO interface?

For those following along with the latest Zig developments, you might have noticed that the IO interface has been evolving quite a bit. It introduces the std.Io.Writer and std.Io.Reader interfaces, which provide a more flexible way to handle IO operations. In our implementation we keep the std.fs.File.Writer, so why not store it as a *std.Io.Writer instead?

pub const Writer = struct {
    allocator: std.mem.Allocator,
    buffer: *[1024]u8,
    file: std.fs.File,
    store: *Store,
    writer: *std.Io.Writer,

    pub fn init(allocator: std.mem.Allocator, store: *Store, fileName: []const u8) !Writer {
        std.fs.cwd().deleteFile(fileName) catch |err| switch (err) {
            error.FileNotFound => {},
            else => return err,
        };
        const file = try std.fs.cwd().createFile(fileName, .{ .truncate = true });
        const buffer = try allocator.create([1024]u8);
        var file_writer = file.writer(buffer);

        return .{
            .allocator = allocator,
            .buffer = buffer,
            .file = file,
            .store = store,
            .writer = &file_writer.interface,
        };
    }
};

This code will compile just fine, but at runtime you may see corrupted output or crashes. The new IO interface uses @fieldParentPtr internally to recover the parent struct from an interface pointer. Since file_writer is a local variable, its stack memory becomes invalid the moment init returns, so the stored &file_writer.interface is a dangling pointer, and any later use of writer is undefined behavior. You can read more about this in Karl Seguin’s excellent post on the topic: Zig’s new Writer.

Continuing the Writer Implementation

While the RDB format specification is extensive, this article focuses on the implementation patterns for any binary file writer in Zig. For complete RDB format details, see the official spec and Zedis's RDB implementation. Let’s continue with our implementation by adding functions to write the RDB header and auxiliary fields.

fn writeAuxFields(self: *Writer) !void {
    // Magic String
    _ = try self.writer.interface.write("REDIS");
    // RDB Version
    _ = try self.writer.interface.write("0012");

    // Auxiliary fields
    // Redis version
    try self.writeMetadata("redis-ver", .{ .string = "255.255.255" });

    const bits = if (@sizeOf(usize) == 8) 64 else 32;
    try self.writeMetadata("redis-bits", .{ .int = bits });

    const now_timestamp = std.time.timestamp();
    try self.writeMetadata("ctime", .{ .int = now_timestamp });

    try self.writeMetadata("used-mem", .{ .int = 0 });

    try self.writeMetadata("aof-base", .{ .int = 0 });
}

fn writeHeader(self: *Writer) !void {
    const OPCODE_SELECT_DB = 0xFE;
    const OPCODE_RESIZE_DB = 0xFB;

    try self.writeAuxFields();

    try self.writer.interface.writeByte(OPCODE_SELECT_DB);
    try self.writeLength(0x00);

    try self.writer.interface.writeByte(OPCODE_RESIZE_DB);
    try self.writeLength(self.store.size());
    // TODO
    try self.writeLength(0);
}

pub fn writeFile(self: *Writer) !void {
    try self.writeHeader();
    try self.writeCache();
    try self.writeEndOfFile();

    try self.writer.interface.flush();
}

It’s a bit of a pain to have to use self.writer.interface everywhere, but it’s necessary to get things working correctly.

Writing the CRC64 Checksum

To make sure our RDB files are valid, we need to compute and write a CRC64 checksum at the end of the file. Zig ships the Redis CRC variant in the standard library as std.hash.crc.Crc64Redis, so we don't need to implement it ourselves (though I did, for learning purposes, before realizing Zig already had it).
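For reference, the variant in question is CRC-64/Jones: polynomial 0xad93d23594c935a9, reflected input and output, initial value 0, no final XOR. Here is a bit-at-a-time sketch in Python, slow but handy for cross-checking any implementation:

```python
# Bit-reversed form of the Jones polynomial 0xAD93D23594C935A9,
# used because this CRC processes bits least-significant first.
POLY_REFLECTED = 0x95AC9329AC4BC9B5

def crc64_redis(data: bytes, crc: int = 0) -> int:
    """CRC-64/Jones as used in RDB files: init 0, reflected, no final XOR."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc

# Published check value for this CRC variant over b"123456789"
assert crc64_redis(b"123456789") == 0xE9C6D914C4B8D9CA

# With no final XOR, the running value can be chained across writes
assert crc64_redis(b"456789", crc64_redis(b"123")) == crc64_redis(b"123456789")
```

That chaining property is exactly what makes the running-checksum approach in the next section possible: each drain just feeds its bytes into the accumulated value.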

The easiest way to produce the checksum, and also the most wasteful, is to read the entire file back at the end and hash it. We can do better than that!

Computing the Checksum on the Fly

To compute the checksum on the fly, we can maintain a running CRC64 value as we write the file. This way, we don’t need to read the file back to compute the checksum.

Here’s how we can implement this:

  1. Initialize the CRC64 value to the initial value.

  2. Update the CRC64 value with each write operation.

  3. Write the final CRC64 value at the end of the file.
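Before looking at the Zig version, here is the same three-step pattern sketched in Python, with zlib.crc32 standing in for CRC-64 (the wrapper structure is the point, not the specific checksum; ChecksumWriter is a hypothetical name):

```python
import io
import struct
import zlib

class ChecksumWriter:
    """Forwards writes to an underlying stream while keeping a running checksum."""

    def __init__(self, underlying):
        self.underlying = underlying
        self.crc = 0  # step 1: start from the initial CRC value

    def write(self, data: bytes) -> int:
        self.crc = zlib.crc32(data, self.crc)  # step 2: update on every write
        return self.underlying.write(data)

    def write_trailer(self) -> None:
        # step 3: append the final checksum (little-endian, as RDB does)
        self.underlying.write(struct.pack("<I", self.crc))

out = io.BytesIO()
w = ChecksumWriter(out)
w.write(b"REDIS0012")
w.write_trailer()
body, trailer = out.getvalue()[:-4], out.getvalue()[-4:]
assert trailer == struct.pack("<I", zlib.crc32(body))
```

The caller never touches the checksum directly; it just writes through the wrapper and asks for the trailer at the end.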

The new IO interface shines brightly here, as we can create a custom writer, keeping the std.Io.Writer interface and updating the CRC64 value every write. This is how we can do it:

// WriterCrc.zig
const std = @import("std");

underlying_writer: *std.Io.Writer,
checksum: std.hash.crc.Crc64Redis,
interface: std.Io.Writer,

const WriterCrc = @This();

pub fn init(underlying: *std.Io.Writer, buffer: []u8) WriterCrc {
    return .{
        .underlying_writer = underlying,
        .checksum = .init(),
        .interface = .{
            .vtable = &.{
                .drain = drain,
                .flush = flush,
                .sendFile = std.Io.Writer.unimplementedSendFile,
                .rebase = std.Io.Writer.defaultRebase,
            },
            .buffer = buffer,
        },
    };
}

fn drain(w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
    _ = splat;
    const self: *WriterCrc = @fieldParentPtr("interface", w);

    // Capture the bytes and update checksum
    const bytes_to_write = data[0];
    self.checksum.update(bytes_to_write);

    // Write to underlying writer (don’t flush here - that’s done by the flush vtable function)
    try self.underlying_writer.writeAll(bytes_to_write);

    return bytes_to_write.len;
}

fn flush(w: *std.Io.Writer) std.Io.Writer.Error!void {
    const self: *WriterCrc = @fieldParentPtr("interface", w);

    // First, drain any buffered data in WriterCrc’s own buffer
    while (w.end > 0) {
        _ = try drain(w, &.{w.buffer[0..w.end]}, 0);
        w.end = 0;
    }

    // Then flush the underlying writer
    try self.underlying_writer.flush();
}

pub fn writer(self: *WriterCrc) *std.Io.Writer {
    return &self.interface;
}

pub fn getChecksum(self: *WriterCrc) u64 {
    return self.checksum.final();
}

Therefore, we can modify our RDB Writer struct to use this WriterCrc:

pub const Writer = struct {
    allocator: std.mem.Allocator,
    buffer: []u8,
    crc_buffer: []u8,
    file: std.fs.File,
    file_writer: *std.fs.File.Writer,
    store: *Store,
    writer_crc: WriterCrc,

    pub fn init(allocator: std.mem.Allocator, store: *Store, fileName: []const u8) !Writer {
        std.fs.cwd().deleteFile(fileName) catch |err| switch (err) {
            error.FileNotFound => {},
            else => return err,
        };
        const file = try std.fs.cwd().createFile(fileName, .{ .truncate = true });

        const buffer_size = 256 * 1024; // 256KB

        const buffer = try allocator.alloc(u8, buffer_size);
        errdefer allocator.free(buffer);

        const crc_buffer = try allocator.alloc(u8, buffer_size);
        errdefer allocator.free(crc_buffer);

        // Allocate file_writer on heap so its address doesn’t change
        const file_writer = try allocator.create(std.fs.File.Writer);
        errdefer allocator.destroy(file_writer);

        file_writer.* = file.writer(buffer);

        // Now initialize WriterCrc with a pointer to the heap-allocated file_writer
        const writer_crc = WriterCrc.init(&file_writer.interface, crc_buffer);

        return .{
            .allocator = allocator,
            .buffer = buffer,
            .crc_buffer = crc_buffer,
            .file = file,
            .file_writer = file_writer,
            .store = store,
            .writer_crc = writer_crc,
        };
    }
};

With this setup, every time we write data using self.writer_crc.writer(), the CRC64 checksum will be updated automatically. Why do we have two buffers, though? One for the file writer and another for the CRC writer?

The reason is that each writer maintains its own buffer for efficiency. The WriterCrc needs a buffer to accumulate writes before updating the checksum, while the underlying file writer buffers to batch disk writes. This two-layer buffering keeps both the checksum calculation and the file writing efficient.

Why This Matters for Performance

Without the CRC layer’s buffer, we’d update the checksum on every single byte write, even if those bytes haven’t been flushed yet. With buffering:

  • Small writes accumulate in crc_buffer before checksum computation.

  • Checksum updates happen in larger chunks (more cache-friendly).

  • File writes are also batched (fewer syscalls to the kernel).

This means 1,000 individual one-byte writes collapse into a handful of checksum updates and syscalls instead of 1,000 separate operations. For large RDB files, the difference is significant.

Here’s a diagram to illustrate the flow of data:

[Diagram: two-layer buffering. Writes pass through the WriterCrc buffer, then the file writer's buffer, then to disk.]

When we’re done writing, we can get the final checksum and add it to the file.

fn writeEndOfFile(self: *Writer) !void {
    const OPCODE_EOF = 0xFF;
    try self.writer_crc.writer().writeByte(OPCODE_EOF);

    // Flush first: the checksum only sees bytes that have been drained,
    // so the buffered EOF opcode must reach it before we read the value
    try self.writer_crc.writer().flush();

    // Get the accumulated checksum from all writes
    const checksum = self.writer_crc.getChecksum();

    // Write the checksum itself; it is not covered by its own value
    try self.writer_crc.writer().writeInt(u64, checksum, .little);
}

When to Use RDB vs. AOF

Now that we’ve implemented RDB snapshots, you might wonder, when should you use them? Understanding the trade-offs helps you choose the right persistence strategy.

RDB snapshots are great for:

  • Backup and restore scenarios: the compact binary format transfers quickly.

  • Faster restarts with large datasets: loading an RDB file is faster than replaying millions of commands.

  • Point-in-time snapshots: great for disaster recovery with periodic backups.

  • Lower disk I/O: snapshots happen periodically, not on every write.

But they’re not ideal for:

  • Minimal data loss: you lose every write since the last snapshot on a crash.

  • Real-time durability: there is always a window of data loss between snapshots.

  • Frequently changing data: if data changes faster than the snapshot interval, you are always behind.

AOF (Append-Only File) complements RDB by:

  • Logging every write command as it happens (minimal data loss).

  • Providing point-in-time recovery (replay up to any command).

  • Trading disk I/O for durability (every write hits disk).
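To see the contrast concretely: an append-only file simply records each write command as it arrives (Redis stores AOF entries in its RESP wire format), and recovery replays them in order. A Python sketch with an in-memory buffer standing in for the log file, and append_command as a hypothetical helper:

```python
import io

def append_command(log, *parts: bytes) -> None:
    """Append one command in RESP form: *N, then $len and value per part."""
    log.write(b"*%d\r\n" % len(parts))   # array header: number of parts
    for p in parts:
        log.write(b"$%d\r\n%s\r\n" % (len(p), p))  # bulk string per part

log = io.BytesIO()
append_command(log, b"SET", b"key", b"value")
assert log.getvalue() == b"*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"
```

Every write costs disk I/O up front, which is exactly the durability trade the bullets above describe.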

That’s why Redis supports both strategies, and you can configure it for RDB-only, AOF-only, or both. Most production deployments enable both: RDB for fast restarts and backups, AOF for durability between snapshots.

In our implementation, we’ve built the foundation for RDB. In future parts, we’ll add AOF logging to achieve Redis-level durability guarantees.

Wrapping Up

We’ve built a working RDB writer from scratch, learning:

  • How to safely work with Zig’s new IO interface.

  • Why pointer lifetimes matter with @fieldParentPtr.

  • How to compute checksums on the fly without a performance penalty.

  • When to choose RDB vs. AOF for persistence.

In Part 4, we’ll implement the reader side and discover how Redis handles backward compatibility with older RDB versions.

Learn More

Enjoy this post? Get new ones delivered straight to your inbox. Subscribe for free and support my work.
