I’ve been writing web applications for years, mostly high-level stuff with frameworks that abstract away the messy details. You know the type: ORMs, HTTP servers, JSON serialization—all the conveniences of modern backend development. But I’ve always been curious about what happens beneath those abstractions. How does a database actually work? What does efficient memory management look like? So a few months ago, I decided to learn Zig by rebuilding Redis from scratch.
Why Redis?
Redis is deceptively simple from the outside: it’s an in-memory key-value store with a straightforward text protocol. But underneath, it’s a masterclass in systems programming—memory management, data structures, network I/O, and concurrency all wrapped up in one project. Perfect for someone trying to bridge the gap between web development and systems programming. Also, almost every company I’ve worked for used it in some way.
The whole promise of Zig is explicit control—memory, error handling, and resources. There’s no garbage collector to save you, no exceptions to bubble up unhandled. If you want to allocate memory, you have to say exactly how, and more importantly, you have to free it yourself. This is precisely what Redis needs: tight control over memory for performance and predictability.
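For readers new to Zig, the shape of that contract looks like this. This is a minimal sketch, not code from the Zedis codebase:

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit(); // reports leaks in debug builds

    const allocator = gpa.allocator();

    // Every allocation names its allocator explicitly...
    const buf = try allocator.alloc(u8, 64);
    // ...and pairs with an explicit free.
    defer allocator.free(buf);

    @memset(buf, 0);
}
```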
The Protocol
Redis uses RESP (REdis Serialization Protocol), a simple text-based format. A command like `SET mykey myvalue` arrives as:

```
*3\r\n
$3\r\n
SET\r\n
$5\r\n
mykey\r\n
$7\r\n
myvalue\r\n
```

The `*3` means "array of 3 elements." Each `$N` is a bulk string of N bytes. It's verbose but easy to parse.
Here’s a simplified parser:
```zig
fn parseCommand(allocator: std.mem.Allocator, reader: anytype) ![][]const u8 {
    var args = std.ArrayList([]const u8).init(allocator);
    errdefer args.deinit();

    // Read first byte to determine type
    const type_byte = try reader.readByte();
    if (type_byte != '*') return error.InvalidProtocol;

    // Read number of arguments
    const count = try readInteger(reader);
    for (0..count) |_| {
        // Each arg starts with $
        const dollar = try reader.readByte();
        if (dollar != '$') return error.InvalidProtocol;

        // Length of this arg
        const len = try readInteger(reader);

        // Read the actual bytes, failing on a short read
        const arg = try allocator.alloc(u8, len);
        try reader.readNoEof(arg);
        try args.append(arg);

        // Consume \r\n
        _ = try reader.readByte();
        _ = try reader.readByte();
    }
    return args.toOwnedSlice();
}
```
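The parser leans on a `readInteger` helper that isn't shown. Here's a minimal sketch of what it might look like, assuming its contract is "a decimal integer terminated by `\r\n`":

```zig
// Hypothetical helper: reads a decimal integer followed by \r\n,
// e.g. the "3" in "*3\r\n" or the "5" in "$5\r\n".
fn readInteger(reader: anytype) !usize {
    var value: usize = 0;
    while (true) {
        const byte = try reader.readByte();
        switch (byte) {
            '0'...'9' => value = value * 10 + (byte - '0'),
            '\r' => {
                // Consume the trailing \n
                if (try reader.readByte() != '\n') return error.InvalidProtocol;
                return value;
            },
            else => return error.InvalidProtocol,
        }
    }
}
```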
Memory Management: The Hard Part
Coming from garbage-collected languages, the hardest adjustment was thinking about ownership. In Zig, when you allocate memory, you own it. You’re responsible for freeing it. There’s no runtime to clean up after you.
Here’s a simple key-value store:
```zig
const Store = struct {
    map: std.StringHashMap([]const u8),
    allocator: std.mem.Allocator,

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .map = std.StringHashMap([]const u8).init(allocator),
            .allocator = allocator,
        };
    }

    pub fn set(self: *Store, key: []const u8, value: []const u8) !void {
        // Duplicate the value so we own it
        const owned_value = try self.allocator.dupe(u8, value);

        const gop = try self.map.getOrPut(key);
        if (gop.found_existing) {
            // The map already owns a copy of the key: free only the old value
            self.allocator.free(gop.value_ptr.*);
        } else {
            // New entry: duplicate the key so we own it too
            gop.key_ptr.* = try self.allocator.dupe(u8, key);
        }
        gop.value_ptr.* = owned_value;
    }

    pub fn deinit(self: *Store) void {
        var iter = self.map.iterator();
        while (iter.next()) |entry| {
            self.allocator.free(entry.key_ptr.*);
            self.allocator.free(entry.value_ptr.*);
        }
        self.map.deinit();
    }
};
```
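Putting the store to work looks roughly like this. It's a sketch against the `Store` shown above; the real Zedis API may differ:

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var store = Store.init(gpa.allocator());
    defer store.deinit(); // frees every owned key and value

    try store.set("user:1", "alice");
    try store.set("user:1", "bob"); // overwrite: the old value is freed

    if (store.map.get("user:1")) |value| {
        std.debug.assert(std.mem.eql(u8, value, "bob"));
    }
}
```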
This is a great start, but there are some improvements to be made. Most of the keys and values in a typical caching system are small, so SSO (Small String Optimization) can help reduce allocations.
```zig
pub const ShortString = struct {
    data: [23]u8, // Inline storage
    len: u8, // Actual length

    pub fn fromSlice(str: []const u8) ShortString {
        // Caller must check the length first; longer strings go on the heap
        std.debug.assert(str.len <= 23);
        var ss: ShortString = .{ .data = undefined, .len = @intCast(str.len) };
        @memcpy(ss.data[0..str.len], str);
        // Zero remaining bytes for consistent hashing
        @memset(ss.data[str.len..], 0);
        return ss;
    }

    pub fn asSlice(self: *const ShortString) []const u8 {
        return self.data[0..self.len];
    }
};
```
This `ShortString` struct can hold strings up to 23 bytes inline, avoiding heap allocations for small keys and values. A CPU cache line is typically 64 bytes, so two 24-byte instances fit in a single line with room to spare. I still need to evaluate the performance difference between 23 and 15 bytes of inline capacity, but this is a good starting point.
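One sanity check worth encoding as a Zig test: the struct really is 24 bytes, so the cache-line math above holds. This assumes the `ShortString` definition shown earlier is in scope:

```zig
const std = @import("std");

test "ShortString fits the expected footprint" {
    // 23 inline bytes + 1 length byte = 24 bytes, no padding
    try std.testing.expectEqual(@as(usize, 24), @sizeOf(ShortString));

    const ss = ShortString.fromSlice("mykey");
    try std.testing.expectEqualStrings("mykey", ss.asSlice());
}
```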
Hash Tables
Zig's standard library provides a `StringHashMap`, which is a good starting point. However, we need to ensure that our keys and values are properly managed in terms of memory ownership. For that, we can use `StringHashMapUnmanaged`, which doesn't store an allocator internally; instead, we pass an allocator at each call site, keeping memory management fully explicit.
To go further, we can create an optimized hash map tailored for our use case. We must be able to store integers, short strings, larger strings, and later, more complex types like lists and hashes. Let’s take a look at how we can define this:
```zig
pub const ValueType = enum(u8) {
    string = 0,
    int,
    list,
    short_string,
};

pub const ZedisValue = union(ValueType) {
    string: []const u8,
    int: i64,
    list: ZedisList, // List type, defined elsewhere
    short_string: ShortString, // Shown earlier
};

pub const ZedisObject = struct {
    value: ZedisValue,
    expiration_time: i64,
    // More fields planned for the future
};

// StringContext supplies hash/eql for []const u8 keys;
// the final `true` stores each entry's hash for cheaper rehashing.
const OptimizedHashMap = std.ArrayHashMapUnmanaged([]const u8, ZedisObject, StringContext, true);
```
This will improve performance by reducing allocations for small strings and integers, which are common in Redis workloads. Additionally, we can process KEYS and SCAN commands more efficiently by iterating over the hash map directly, since the underlying structure is an array.
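The dispatch between representations might look like this. The helper and its name are hypothetical, illustrating how a `SET` path could pick the cheapest variant:

```zig
// Hypothetical constructor: pick the cheapest representation for a payload.
// Integers and short strings avoid heap allocation entirely.
fn makeValue(allocator: std.mem.Allocator, raw: []const u8) !ZedisValue {
    // Numeric payloads (e.g. counters) are stored as i64
    if (std.fmt.parseInt(i64, raw, 10)) |n| {
        return .{ .int = n };
    } else |_| {}

    // Small strings go inline, no allocation
    if (raw.len <= 23) {
        return .{ .short_string = ShortString.fromSlice(raw) };
    }

    // Everything else is heap-allocated and owned by the store
    return .{ .string = try allocator.dupe(u8, raw) };
}
```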
Memory Pool
Karl Seguin has already written a fantastic article about memory pools; see that for a more in-depth treatment. In summary, a pool keeps track of previously destroyed objects so the same memory can be handed out again, making it an excellent fit for objects that are allocated and released frequently.
```zig
pub const Store = struct {
    base_allocator: std.mem.Allocator,
    allocator: std.mem.Allocator,

    // Cache hash map
    map: OptimizedHashMap,

    // Hybrid memory pools for different string sizes
    small_pool: std.heap.MemoryPool([32]u8), // 32 bytes - for keys
    medium_pool: std.heap.MemoryPool([128]u8), // 128 bytes - for small values
    large_pool: std.heap.MemoryPool([512]u8), // 512 bytes - for medium values

    pub fn init(allocator: std.mem.Allocator) Store {
        return .{
            .base_allocator = allocator,
            .allocator = allocator,
            .map = .{},
            .small_pool = std.heap.MemoryPool([32]u8).init(allocator),
            .medium_pool = std.heap.MemoryPool([128]u8).init(allocator),
            .large_pool = std.heap.MemoryPool([512]u8).init(allocator),
        };
    }
};
```
It performs especially well with caching workloads like rate limiters, session stores, and user metrics. With this optimization, string allocation on the SET path is an O(1) free-list pop instead of a trip through the general-purpose allocator; just beware that when a pool has no free slot available, it falls back to its backing allocator, which may in turn make a syscall such as mmap.
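Routing an allocation to the right pool is just a size check. The dispatch below is a sketch, not the actual Zedis code; a real implementation would also need to remember which pool each slice came from so it can be returned correctly:

```zig
// Hypothetical dispatch: pick a fixed-size pool by payload length,
// falling back to the base allocator for large values.
fn allocString(self: *Store, len: usize) ![]u8 {
    if (len <= 32) {
        const slot = try self.small_pool.create(); // O(1): pops a free slot
        return slot[0..len];
    } else if (len <= 128) {
        const slot = try self.medium_pool.create();
        return slot[0..len];
    } else if (len <= 512) {
        const slot = try self.large_pool.create();
        return slot[0..len];
    }
    return self.base_allocator.alloc(u8, len);
}
```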
Learn more
Zedis on GitHub.