Introduction
When Zig 0.15 removed compression support from its standard library, it created an opportunity: build a standalone, production-ready compression library while navigating the language's new I/O paradigm shift known as "Writergate." This post chronicles the development of Comprezz, a single-file gzip/deflate compression library that combines classical computer science algorithms with modern systems programming.
What you'll learn:
- How the DEFLATE compression algorithm works under the hood
- Building bit-level I/O abstractions in systems programming
- Navigating Zig 0.15's new non-generic I/O interfaces
- Implementing Huffman coding and LZ77 compression from scratch
Project Overview
Comprezz is a ~1700-line pure Zig implementation providing:
- Full LZ77 + Huffman encoding - Complete DEFLATE (RFC 1951) implementation
- Multiple compression levels - From fast (4) to best (9)
- Gzip/Zlib/Raw containers - Proper headers, trailers, and checksums
- Library + CLI - Use programmatically or as a command-line tool
- Zero dependencies - Self-contained, no external C libraries
The entire implementation lives in src/lib.zig, making it easy to vendor or study.
Understanding the DEFLATE Algorithm
Before diving into code, let's understand what we're building. DEFLATE combines two compression techniques:
1. LZ77: Dictionary Compression
LZ77 replaces repeated byte sequences with back-references:
Input: "The quick brown fox jumps over the lazy dog. The quick brown fox..."
Output: "The quick brown fox jumps over the lazy dog. [back 45, length 23]..."
This works by maintaining a sliding window (32KB) of recent data and searching for matches. When we find a repeated sequence, we emit a match token (distance, length) instead of the literal bytes.
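Conceptually, the tokenizer's output mixes two kinds of tokens. A minimal sketch of that shape (illustrative only; Comprezz itself packs both cases into a compact 4-byte Token):
const Token = union(enum) {
    literal: u8, // A byte copied through unchanged
    match: struct {
        distance: u16, // How far back the copy starts (1-32768)
        length: u16, // How many bytes to copy (3-258 in DEFLATE)
    },
};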
2. Huffman Coding: Optimal Bit Encoding
Frequently occurring symbols get shorter bit codes:
'e' (frequent): 101 (3 bits)
'q' (rare): 11010111 (8 bits)
DEFLATE builds dynamic Huffman trees for each block, adapting to the data's characteristics.
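For example, six symbols with frequencies 45, 13, 12, 16, 9, and 5 get Huffman code lengths 1, 3, 3, 3, 4, and 4; the canonical codes are:
a (freq 45): 0    (1 bit)
b (freq 13): 100  (3 bits)
c (freq 12): 101  (3 bits)
d (freq 16): 110  (3 bits)
e (freq 9):  1110 (4 bits)
f (freq 5):  1111 (4 bits)
That encodes the 100 symbols in 224 bits instead of the 300 a fixed 3-bit code would need.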
Combined Pipeline
Raw Data → LZ77 Tokenizer → Huffman Encoder → Bit Writer → Compressed Stream
Architecture Deep Dive
Let's walk through Comprezz's implementation layer by layer.
Layer 1: Bit-Level Writing
Compression requires writing individual bits, not just bytes. Our BitWriter buffers bits and flushes efficiently:
pub const BitWriter = struct {
inner_writer: *std.Io.Writer,
bits: u64 = 0, // Bit accumulator
nbits: u32 = 0, // Number of bits in accumulator
bytes: [256]u8 = undefined, // Byte buffer for batching writes
nbytes: u32 = 0, // Buffer position
pub fn writeBits(self: *Self, b: u32, nb: u32) Error!void {
// Accumulate bits in little-endian order
self.bits |= @as(u64, @intCast(b)) << @as(u6, @intCast(self.nbits));
self.nbits += nb;
if (self.nbits < 48) return; // Haven't filled 6 bytes yet
// Flush 6 bytes (48 bits) to buffer
var n = self.nbytes;
std.mem.writeInt(u64, self.bytes[n..][0..8], self.bits, .little);
n += 6;
if (n >= 240) { // Buffer full, write to underlying stream
try self.inner_writer.writeAll(self.bytes[0..n]); // writeAll retries short writes
n = 0;
}
self.nbytes = n;
self.bits >>= 48; // Remove written bits
self.nbits -= 48;
}
};
Key insights:
- Uses 64-bit accumulator to buffer bits across byte boundaries
- Batches writes to 256-byte buffer before syscalls
- Writes 6 bytes at a time when accumulator fills (optimization)
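A hypothetical usage, writing the start of a final dynamic block (nothing reaches the underlying writer until 48 bits have accumulated or the writer is flushed):
var bw = BitWriter.init(io_writer); // io_writer: *std.Io.Writer
try bw.writeBits(0b101, 3); // BFINAL=1, BTYPE=10 (final dynamic block)
try bw.writeBits(29, 5); // e.g. HLIT: 286 literal/length codes - 257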
Layer 2: Sliding Window for LZ77
The heart of LZ77 is efficiently searching for repeated sequences. We use a hash-chained lookup table:
pub const Lookup = struct {
head: [32768]u16 = [_]u16{0} ** 32768, // Hash table (15-bit hash)
chain: [65536]u16 = [_]u16{0} ** 65536, // Chain of previous positions
pub fn add(self: *Lookup, data: []const u8, pos: u16) u16 {
if (data.len < 4) return 0;
const h = hash(data[0..4]); // Hash first 4 bytes
return self.set(h, pos);
}
fn set(self: *Lookup, h: u32, pos: u16) u16 {
const prev = self.head[h]; // Get previous position at this hash
self.head[h] = pos; // Update hash table
self.chain[pos] = prev; // Link to previous position
return prev;
}
fn hash(b: *const [4]u8) u32 {
const prime = 0x9E3779B1;
const v = @as(u32, b[3]) |
@as(u32, b[2]) << 8 |
@as(u32, b[1]) << 16 |
@as(u32, b[0]) << 24;
return @intCast((v *% prime) >> 17); // 15-bit hash
}
};
How hash chains work:
Input: "abc...abc...abc"
pos=0 pos=100 pos=200
head[hash("abc")] = 200
chain[200] = 100
chain[100] = 0
When we encounter "abc" at position 200, we can walk the chain backwards to find all previous occurrences without scanning the entire window.
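A sketch of that walk as a hypothetical helper (hash collisions are possible, so callers still verify the actual bytes, as findMatch does below):
fn walkChain(l: *Lookup, first4: *const [4]u8, out: []u16) usize {
    var n: usize = 0;
    var pos = l.head[hash(first4)]; // Most recent position with this hash
    while (pos > 0 and n < out.len) {
        out[n] = pos;
        n += 1;
        pos = l.chain[pos]; // Step to the next-older position
    }
    return n;
}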
Layer 3: Match Finding
Finding the longest match involves walking the hash chain and comparing byte sequences:
fn findMatch(self: *Self, pos: u16, lh: []const u8, min_len: u16) ?Token {
var len: u16 = min_len;
var prev_pos = self.lookup.add(lh, pos); // Get chain head
var match: ?Token = null;
var chain: usize = self.level.chain; // Max positions to check
if (len >= self.level.good) {
chain >>= 2; // Already have good match, search less
}
while (prev_pos > 0 and chain > 0) : (chain -= 1) {
const distance = pos - prev_pos;
if (distance > 32768) break; // Out of window range
const new_len = self.win.match(prev_pos, pos, len);
if (new_len > len) {
match = Token.initMatch(@intCast(distance), new_len);
if (new_len >= self.level.nice) {
return match; // Good enough, stop searching
}
len = new_len;
}
prev_pos = self.lookup.prev(prev_pos); // Follow chain
}
return match;
}
Compression level tuning:
const LevelArgs = struct {
good: u16, // Length to reduce search at
nice: u16, // Length to stop search at
lazy: u16, // Min length for immediate match
chain: u16, // Max chain positions to check
};
// Level 9 (best): Check up to 4096 positions for perfect match
.best => .{ .good = 32, .lazy = 258, .nice = 258, .chain = 4096 },
// Level 4 (fast): Check only 16 positions
.fast => .{ .good = 4, .lazy = 4, .nice = 16, .chain = 16 },
Layer 4: Huffman Encoding
DEFLATE uses dynamic Huffman codes optimized per block. Building optimal codes requires:
- Frequency counting - Count symbol occurrences
- Tree building - Construct Huffman tree (Package-Merge algorithm)
- Code assignment - Generate canonical codes
pub fn HuffmanEncoder(comptime size: usize) type {
return struct {
codes: [size]HuffCode = undefined, // Output: symbol → code mapping
pub fn generate(self: *Self, freq: []u16, max_bits: u32) void {
// 1. Sort symbols by frequency
var list = self.freq_cache[0..freq.len];
// ... populate list with (literal, freq) pairs ...
mem.sort(LiteralNode, list, {}, byFreq);
// 2. Package-Merge algorithm: compute optimal bit lengths
const bit_count = self.bitCounts(list, max_bits);
// 3. Assign canonical codes
self.assignEncodingAndSize(bit_count, list);
}
};
}
Canonical Huffman codes ensure decoders can reconstruct trees from just bit lengths:
fn assignEncodingAndSize(self: *Self, bit_count: []u32, list_arg: []LiteralNode) void {
var list = list_arg; // Parameters are immutable in Zig; shrink a local copy instead
var code: u16 = 0;
for (bit_count, 0..) |bits, n| {
code <<= 1; // Double code value for each bit length
if (n == 0 or bits == 0) continue;
// Assign codes to all symbols with this bit length
const chunk = list[list.len - bits..];
mem.sort(LiteralNode, chunk, {}, byLiteral); // Sort by symbol
for (chunk) |node| {
self.codes[node.literal] = HuffCode{
.code = bitReverse(u16, code, @intCast(n)), // Reverse bits!
.len = @intCast(n),
};
code += 1;
}
list = list[0..list.len - bits];
}
}
Why reverse bits? DEFLATE writes bits LSB-first, but Huffman trees are typically MSB-first. Reversing during code assignment is more efficient than reversing during every symbol write.
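A sketch of what that bitReverse helper can look like, built on Zig's @bitReverse builtin:
fn bitReverse(comptime T: type, value: T, n: usize) T {
    const reversed = @bitReverse(value); // Reverse all @bitSizeOf(T) bits
    return reversed >> @intCast(@bitSizeOf(T) - n); // Keep only the low n bits, reversed
}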
Layer 5: Token Writing
The BlockWriter emits compressed blocks with dynamic Huffman codes:
pub fn write(self: *Self, tokens: []const Token, eof: bool, input: ?[]const u8) Error!void {
// 1. Analyze tokens and build frequency tables
const stats = self.indexTokens(tokens);
// 2. Generate Huffman codes for literals and distances
self.literal_encoding.generate(&self.literal_freq, 15);
self.distance_encoding.generate(&self.distance_freq, 15);
// 3. Choose best block type (stored, fixed, or dynamic)
const dynamic_size = self.dynamicSize(...);
const fixed_size = self.fixedSize(...);
const stored_size = storedSizeFits(input);
if (stored_size.storable and stored_size.size < @min(dynamic_size, fixed_size)) {
// Raw storage is smaller - just copy bytes
try self.storedBlock(input.?, eof);
} else if (dynamic_size < fixed_size) {
// Dynamic Huffman is better
try self.dynamicHeader(num_literals, num_distances, num_codegens, eof);
try self.writeTokens(tokens, &self.literal_encoding.codes, ...);
} else {
// Fixed Huffman is better (rare)
try self.fixedHeader(eof);
try self.writeTokens(tokens, &self.fixed_literal_encoding.codes, ...);
}
}
Dynamic block headers are complex - they encode the Huffman tree structure itself:
fn dynamicHeader(self: *Self, num_literals: u32, num_distances: u32,
num_codegens: u32, eof: bool) Error!void {
// Block header: 3 bits
const first_bits: u32 = if (eof) 5 else 4; // BFINAL=eof, BTYPE=10 (dynamic)
try self.bit_writer.writeBits(first_bits, 3);
// Tree sizes (5+5+4 bits)
try self.bit_writer.writeBits(num_literals - 257, 5); // HLIT
try self.bit_writer.writeBits(num_distances - 1, 5); // HDIST
try self.bit_writer.writeBits(num_codegens - 4, 4); // HCLEN
// Code length code lengths (3 bits each)
var i: u32 = 0;
while (i < num_codegens) : (i += 1) {
const value = self.codegen_encoding.codes[codegen_order[i]].len;
try self.bit_writer.writeBits(value, 3);
}
// Encoded literal/distance code lengths using run-length encoding
i = 0;
while (true) {
const code_word = self.codegen[i];
i += 1;
if (code_word == 255) break; // End marker
try self.writeCode(self.codegen_encoding.codes[code_word]);
// Handle run-length codes (16=repeat, 17=zeros, 18=many zeros)
switch (code_word) {
16 => try self.bit_writer.writeBits(self.codegen[i], 2),
17 => try self.bit_writer.writeBits(self.codegen[i], 3),
18 => try self.bit_writer.writeBits(self.codegen[i], 7),
else => {},
}
}
}
This uses three layers of encoding:
- Huffman codes for the codegen alphabet (19 symbols)
- Run-length encoding of literal/distance tree bit lengths
- Huffman codes for the actual literals/distances
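For example, if the literal tree assigns length 4 to seven consecutive symbols and the next 138 symbols are unused, that whole stretch collapses to three codegen entries:
4                       → one code length of 4
16 (2 extra bits = 3)   → repeat previous length 3+3 = 6 more times
18 (7 extra bits = 127) → run of 11+127 = 138 zero lengths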
Layer 6: Main Compression Loop
The Deflate struct orchestrates the entire pipeline:
pub fn compress(self: *Self, reader: *std.Io.Reader) !void {
while (true) {
const buf = self.win.writable(); // Get writable window space
if (buf.len == 0) {
// Window full - process tokens and slide
try self.tokenize(.none);
self.slide();
continue;
}
// Read data into window
const read_size = @min(buf.len, 4096);
const slice = reader.take(read_size) catch |err| switch (err) {
error.EndOfStream => break,
error.ReadFailed => return error.ReadFailed,
};
@memcpy(buf[0..slice.len], slice);
self.hasher.update(buf[0..slice.len]); // Update CRC32/Adler32
self.win.written(slice.len);
try self.tokenize(.none);
if (slice.len < read_size) break; // EOF
}
}
The tokenizer implements lazy matching - a clever optimization:
fn tokenize(self: *Self, flush_opt: FlushOption) !void {
const should_flush = flush_opt != .none;
while (self.win.activeLookahead(should_flush)) |lh| {
var step: u16 = 1; // How far to advance the window this iteration
const pos = self.win.pos();
const literal = lh[0];
const min_len: u16 = if (self.prev_match) |m| m.length() else 0;
if (self.findMatch(pos, lh, min_len)) |match| {
try self.addPrevLiteral();
if (match.length() >= self.level.lazy) {
// Good match - take it immediately
step = try self.addMatch(match);
} else {
// Defer decision - maybe next position has better match
self.prev_literal = literal;
self.prev_match = match;
}
} else {
if (self.prev_match) |m| {
// No match at current position, use previous match
step = try self.addMatch(m) - 1;
} else {
// No match at all - emit literal
try self.addPrevLiteral();
self.prev_literal = literal;
}
}
self.windowAdvance(step, lh, pos);
}
}
Lazy matching example:
Input: "abc bcdef abcdef"
Pos 10 ('a'): match (dist 10, len 3) against "abc" → defer decision
Pos 11 ('b'): match (dist 7, len 5) against "bcdef" → better!
              Emit the literal 'a', then take the length-5 match
Navigating Zig 0.15's "Writergate"
Zig 0.15 introduced a controversial change: non-generic I/O interfaces. The old API:
// Old Zig 0.14
pub fn compress(reader: anytype, writer: anytype, options: Options) !void {
// reader can be any type with read() method
// writer can be any type with write() method
}
The new API:
// New Zig 0.15
pub fn compress(reader: *std.Io.Reader, writer: *std.Io.Writer, options: Options) !void {
// reader and writer are concrete interface types with internal vtables
}
Key Changes
1. Concrete interface types with vtables
pub const BitWriter = struct {
inner_writer: *std.Io.Writer, // Not generic!
pub fn init(writer: *std.Io.Writer) Self {
return .{ .inner_writer = writer };
}
pub const Error = std.Io.Writer.Error || error{UnfinishedBits};
};
2. File I/O uses new methods
// Old: file.reader() and file.writer() returned generic wrapper types
// New: they take a buffer and return concrete std.fs.File.Reader / .Writer
const input_file = try std.fs.cwd().openFile("input.txt", .{});
var read_buf: [4096]u8 = undefined;
var input_reader = input_file.reader(&read_buf); // std.fs.File.Reader
const output_file = try std.fs.cwd().createFile("output.gz", .{});
var write_buf: [4096]u8 = undefined;
var output_writer = output_file.writer(&write_buf); // std.fs.File.Writer
// &input_reader.interface gives the *std.Io.Reader these APIs expect
3. Fixed buffers use factory methods
// Old: std.io.fixedBufferStream(&buffer)
// New: std.Io.Writer.fixed() and std.Io.Reader.fixed()
var buffer: [1024]u8 = undefined;
var fixed_writer = std.Io.Writer.fixed(&buffer);
// Reader.fixed takes a const slice, so it can wrap a string literal directly
var fixed_reader = std.Io.Reader.fixed("Hello, World!");
Migration Benefits
While controversial, the new design has advantages:
- Faster compilation - No monomorphization for each type
- Smaller binaries - Single implementation instead of multiple instantiations
- Clearer errors - std.Io.Writer.Error instead of complex generic constraints
- Consistent buffering - Buffer responsibility is explicit
Container Formats: Gzip, Zlib, Raw
Comprezz supports three container formats wrapping the DEFLATE stream:
Gzip Format (RFC 1952)
┌─────────────────────────────────────┐
│ Magic: 0x1f 0x8b │ 2 bytes
│ Method: 0x08 (DEFLATE) │ 1 byte
│ Flags: 0x00 │ 1 byte
│ Timestamp: 0x00 0x00 0x00 0x00 │ 4 bytes
│ Extra flags: 0x00 │ 1 byte
│ OS: 0x03 (Unix) │ 1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data │ variable
├─────────────────────────────────────┤
│ CRC32 checksum │ 4 bytes (little-endian)
│ Uncompressed size mod 2^32 │ 4 bytes (little-endian)
└─────────────────────────────────────┘
Implementation:
pub fn writeHeader(comptime wrap: Container, writer: *std.Io.Writer) !void {
switch (wrap) {
.gzip => {
const gzipHeader = [_]u8{
0x1f, 0x8b, // Magic
0x08, // Method (DEFLATE)
0x00, // Flags
0x00, 0x00, 0x00, 0x00, // Timestamp
0x00, // Extra flags
0x03 // OS (Unix)
};
try writer.writeAll(&gzipHeader);
},
// ... other formats
}
}
pub fn writeFooter(comptime wrap: Container, hasher: *Hasher(wrap),
writer: *std.Io.Writer) !void {
var bits: [4]u8 = undefined;
switch (wrap) {
.gzip => {
std.mem.writeInt(u32, &bits, hasher.chksum(), .little);
try writer.writeAll(&bits);
std.mem.writeInt(u32, &bits, hasher.bytesRead(), .little);
try writer.writeAll(&bits);
},
// ... other formats
}
}
Zlib Format (RFC 1950)
Simpler than gzip, used internally by PNG:
┌─────────────────────────────────────┐
│ CMF: 0x78 (window=32K, method=8) │ 1 byte
│ FLG: 0xDA (check bits + level) │ 1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data │ variable
├─────────────────────────────────────┤
│ Adler32 checksum │ 4 bytes (big-endian!)
└─────────────────────────────────────┘
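The zlib branches elided from writeHeader and writeFooter above would look roughly like this sketch (FLG must make CMF·256 + FLG divisible by 31):
// In writeHeader:
.zlib => {
    // CMF 0x78: method 8 (DEFLATE), 32K window; FLG 0xDA passes the
    // mod-31 check and advertises the highest compression level
    try writer.writeAll(&[_]u8{ 0x78, 0xDA });
},
// In writeFooter (big-endian, unlike gzip):
.zlib => {
    std.mem.writeInt(u32, &bits, hasher.chksum(), .big);
    try writer.writeAll(&bits);
},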
Usage Examples
Library: Basic Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    var compressed_buffer: [1024]u8 = undefined;
    var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
    const data = "Hello, World!";
    var input_reader = std.Io.Reader.fixed(data); // Wraps a const slice directly
    try comprezz.compress(&input_reader, &fixed_writer, .{});
    // A fixed writer tracks how much was written in its end field
    const compressed = compressed_buffer[0..fixed_writer.end];
    std.debug.print("Compressed {} bytes to {} bytes\n",
        .{ data.len, compressed.len });
}
Library: File Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    const input_file = try std.fs.cwd().openFile("input.txt", .{});
    defer input_file.close();
    var read_buf: [4096]u8 = undefined;
    var input_reader = input_file.reader(&read_buf);
    const output_file = try std.fs.cwd().createFile("output.gz", .{});
    defer output_file.close();
    var write_buf: [4096]u8 = undefined;
    var output_writer = output_file.writer(&write_buf);
    try comprezz.compress(&input_reader.interface, &output_writer.interface, .{ .level = .best });
    try output_writer.interface.flush(); // File.Writer buffers; flush before closing
}
CLI: Command-line Usage
# Compress a file
$ comprezz input.txt -o output.gz
# Use best compression
$ comprezz -l best largefile.txt -o compressed.gz
# Compress from stdin
$ cat data.txt | comprezz > data.gz
# Compress with specific level
$ comprezz -l 9 input.txt -o output.gz
Performance Characteristics
Time Complexity
- Best case: O(n) for uncompressible data (stored blocks)
- Average case: O(n × c) where c = chain length (typically 16-4096)
- Worst case: O(n × w) where w = window size (32KB) for pathological patterns
Space Complexity
- Sliding window: 64KB (2 × 32KB for double buffering)
- Hash table + chains: 192KB (32K head entries × 2 bytes + 64K chain entries × 2 bytes)
- Tokens buffer: 128KB (32K tokens × 4 bytes)
- Total: ~400KB fixed memory usage
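These figures follow directly from the declarations; the Layer 2 lookup sizes, for instance, can be pinned down with a compile-time assertion:
comptime {
    const lookup_bytes = @sizeOf([32768]u16) + @sizeOf([65536]u16);
    std.debug.assert(lookup_bytes == 192 * 1024); // head (64KB) + chain (128KB)
}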
Compression Ratios
Tested on Calgary Corpus:
File Size Level 4 Level 6 Level 9 Time (L4) Time (L9)
─────────────────────────────────────────────────────────────────────────
book1.txt 768KB 43.2% 42.1% 41.8% 0.12s 0.45s
paper1.txt 52KB 39.8% 38.5% 38.2% 0.008s 0.032s
geo.txt 102KB 68.9% 68.2% 68.0% 0.015s 0.058s
Testing Strategy
Comprezz includes comprehensive tests covering:
1. Basic Functionality
test "basic compression" {
var compressed_buffer: [1024]u8 = undefined;
var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
const data = "Hello, World!";
var input_buffer: [1024]u8 = undefined;
@memcpy(input_buffer[0..data.len], data);
var input_reader = std.Io.Reader.fixed(input_buffer[0..data.len]);
try compress(&input_reader, &fixed_writer, .{});
var written: usize = 0;
for (compressed_buffer, 0..) |byte, i| {
if (byte != 0) written = i + 1;
}
const compressed = compressed_buffer[0..written];
try expect(compressed.len > 0);
try expect(compressed[0] == 0x1f); // Gzip magic
try expect(compressed[1] == 0x8b);
}
2. Compression Levels
test "compression levels" {
const data = "The quick brown fox jumps over the lazy dog. " ** 3;
// Compress with fast and best settings
const fast_compressed = try compressWithLevel(data, .fast);
const best_compressed = try compressWithLevel(data, .best);
// Best compression should be smaller (or equal)
try expect(best_compressed.len <= fast_compressed.len);
}
3. Edge Cases
test "small data compression" {
try testCompression("Hi!"); // Very small input
}
test "large data compression" {
const large = "Lorem ipsum..." ** 100; // >44KB
try testCompression(large);
}
test "incompressible data" {
var random: [1024]u8 = undefined;
// ... fill with random data ...
try testCompression(&random);
}
Lessons Learned
1. Bit Manipulation is Hard
Getting bit ordering right took multiple iterations. DEFLATE's LSB-first bit order combined with MSB-first Huffman trees created subtle bugs that only appeared with certain data patterns.
Solution: Extensive unit tests for bit reversal and explicit comments about bit ordering everywhere.
2. Performance vs. Compression Tradeoff
The compression level parameters have enormous impact:
- Level 4: 2-3x faster but 2-5% worse compression
- Level 9: Best compression but searches 256x more positions
Solution: Expose multiple levels and make .default (level 6) a reasonable balance.
3. Testing Compression is Tricky
You can't just compare compressed output byte-for-byte - there are multiple valid DEFLATE streams for the same input.
Solution:
- Test that output decompresses correctly (requires external tool)
- Test structural properties (gzip header, valid bit patterns)
- Compare compression ratios across levels
4. Fixed-Size Buffers in Zig 0.15
A fixed std.Io.Writer tracks its position in its end field rather than exposing a fixedBufferStream-style getPos() - easy to miss when migrating.
Solution: Slice the output with fixed_writer.end. Scanning for the last non-zero byte is tempting but wrong: valid gzip output can end in zero bytes (the uncompressed-size footer usually does for small inputs).
Future Improvements
1. Streaming API
Current API requires loading full output into memory. Add:
pub const StreamingCompressor = struct {
pub fn writeChunk(self: *Self, data: []const u8) !void;
pub fn finish(self: *Self) !void;
};
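Hypothetical usage (all names here belong to the proposed API, not the current one):
var sc = try StreamingCompressor.init(&output_writer, .{ .level = .default });
while (try source.nextChunk()) |chunk| {
    try sc.writeChunk(chunk); // Compress incrementally as data arrives
}
try sc.finish(); // Emit the final block and container footer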
2. Parallel Compression
Large files could be split into independent blocks compressed in parallel:
// Split input into 1MB chunks
// Compress each chunk on separate thread
// Concatenate results with DEFLATE block headers
3. Better Hash Function
Current hash is simple but has collisions. Consider:
- xxHash for better distribution
- Faster multiplication schemes
- Cache-optimized access patterns
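A sketch of a two-multiply variant in the xxHash spirit (hypothetical; its distribution would need measuring before swapping it in):
fn hash2(b: *const [4]u8) u32 {
    var v = std.mem.readInt(u32, b, .big);
    v *%= 0x9E3779B1; // Same golden-ratio prime as today
    v ^= v >> 15; // Extra avalanche step before the second multiply
    v *%= 0x85EBCA77; // xxHash32's second prime
    return v >> 17; // Keep 15 bits to index the 32K-entry head table
}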
4. SIMD Matching
Use AVX2-width vectors to compare 32 bytes at once during match verification (a sketch using Zig's builtin vectors):
// Compare two 32-byte chunks, then count the equal leading bytes
const V = @Vector(32, u8);
const a: V = prev_data[0..32].*;
const b: V = curr_data[0..32].*;
const mask: u32 = @bitCast(a == b); // Bit i set when byte i matches
const match_len = @ctz(~mask); // Number of leading equal bytes
Conclusion
Building Comprezz was an exercise in:
- Classical algorithms - LZ77 and Huffman coding are 40+ years old but still relevant
- Modern systems programming - Zig's safety + performance enabled clean implementation
- API evolution - Adapting to Zig 0.15's controversial but pragmatic I/O redesign
The complete 1700-line implementation demonstrates that you don't need massive frameworks to build production-ready compression - just careful algorithm engineering and attention to detail.
Try it yourself:
# Add to your project
$ zig fetch --save git+https://github.com/bkataru/comprezz.git
# Or use the CLI
$ git clone https://github.com/bkataru/comprezz.git
$ cd comprezz
$ zig build
$ ./zig-out/bin/comprezz your-file.txt -o compressed.gz
The journey from understanding DEFLATE on paper to shipping a working implementation taught me that compression isn't magic - it's just clever bookkeeping with bits and bytes. And sometimes, that's the best kind of magic.
