Introduction
When Zig 0.15 removed compression support from its standard library, it created an opportunity: build a standalone, production-ready compression library while navigating the language's new I/O paradigm shift known as "Writergate." This post chronicles the development of Comprezz, a single-file gzip/deflate compression library that combines classical computer science algorithms with modern systems programming.
What you'll learn:
- How the DEFLATE compression algorithm works under the hood
- Building bit-level I/O abstractions in systems programming
- Navigating Zig 0.15's new non-generic I/O interfaces
- Implementing Huffman coding and LZ77 compression from scratch
Project Overview
Comprezz is a ~1700-line pure Zig implementation providing:
- Full LZ77 + Huffman encoding - Complete DEFLATE (RFC 1951) implementation
- Multiple compression levels - From fast (4) to best (9)
- Gzip/Zlib/Raw containers - Proper headers, trailers, and checksums
- Library + CLI - Use programmatically or as a command-line tool
- Zero dependencies - Self-contained, no external C libraries
The entire implementation lives in src/lib.zig, making it easy to vendor or study.
Understanding the DEFLATE Algorithm
Before diving into code, let's understand what we're building. DEFLATE combines two compression techniques:
1. LZ77: Dictionary Compression
LZ77 replaces repeated byte sequences with back-references:
Input:  "The quick brown fox jumps over the lazy dog. The quick brown fox..."
Output: "The quick brown fox jumps over the lazy dog. [back 45, length 23]..."
This works by maintaining a sliding window (32KB) of recent data and searching for matches. When we find a repeated sequence, we emit a match token (distance, length) instead of the literal bytes.
2. Huffman Coding: Optimal Bit Encoding
Frequently occurring symbols get shorter bit codes:
'e' (frequent): 101      (3 bits)
'q' (rare):     11010111 (8 bits)
DEFLATE builds dynamic Huffman trees for each block, adapting to the data's characteristics.
Combined Pipeline
Raw Data → LZ77 Tokenizer → Huffman Encoder → Bit Writer → Compressed Stream
Architecture Deep Dive
Let's walk through Comprezz's implementation layer by layer.
Layer 1: Bit-Level Writing
Compression requires writing individual bits, not just bytes. Our BitWriter buffers bits and flushes efficiently:
pub const BitWriter = struct {
    inner_writer: *std.Io.Writer,
    bits: u64 = 0,           // Bit accumulator
    nbits: u32 = 0,          // Number of bits in accumulator
    bytes: [256]u8 = undefined,  // Byte buffer for batching writes
    nbytes: u32 = 0,         // Buffer position
    pub fn writeBits(self: *Self, b: u32, nb: u32) Error!void {
        // Accumulate bits in little-endian order
        self.bits |= @as(u64, @intCast(b)) << @as(u6, @intCast(self.nbits));
        self.nbits += nb;
        if (self.nbits < 48) return;  // Haven't filled 6 bytes yet
        // Flush 6 bytes (48 bits) to buffer
        var n = self.nbytes;
        std.mem.writeInt(u64, self.bytes[n..][0..8], self.bits, .little);
        n += 6;
        if (n >= 240) {  // Buffer full, write to underlying stream
            _ = try self.inner_writer.write(self.bytes[0..n]);
            n = 0;
        }
        self.nbytes = n;
        self.bits >>= 48;     // Remove written bits
        self.nbits -= 48;
    }
};
Key insights:
- Uses 64-bit accumulator to buffer bits across byte boundaries
- Batches writes to 256-byte buffer before syscalls
- Writes 6 bytes at a time when accumulator fills (optimization)
Layer 2: Sliding Window for LZ77
The heart of LZ77 is efficiently searching for repeated sequences. We use a hash-chained lookup table:
pub const Lookup = struct {
    head: [32768]u16 = [_]u16{0} ** 32768,  // Hash table (15-bit hash)
    chain: [65536]u16 = [_]u16{0} ** 65536, // Chain of previous positions
    pub fn add(self: *Lookup, data: []const u8, pos: u16) u16 {
        if (data.len < 4) return 0;
        const h = hash(data[0..4]);  // Hash first 4 bytes
        return self.set(h, pos);
    }
    fn set(self: *Lookup, h: u32, pos: u16) u16 {
        const prev = self.head[h];   // Get previous position at this hash
        self.head[h] = pos;          // Update hash table
        self.chain[pos] = prev;      // Link to previous position
        return prev;
    }
    fn hash(b: *const [4]u8) u32 {
        const prime = 0x9E3779B1;
        const v = @as(u32, b[3]) | 
                  @as(u32, b[2]) << 8 | 
                  @as(u32, b[1]) << 16 | 
                  @as(u32, b[0]) << 24;
        return @intCast((v *% prime) >> 17);  // 15-bit hash
    }
};
How hash chains work:
Input: "abc...abc...abc"
       pos=0  pos=100 pos=200
head[hash("abc")] = 200
chain[200] = 100
chain[100] = 0
When we encounter "abc" at position 200, we can walk the chain backwards to find all previous occurrences without scanning the entire window.
Layer 3: Match Finding
Finding the longest match involves walking the hash chain and comparing byte sequences:
fn findMatch(self: *Self, pos: u16, lh: []const u8, min_len: u16) ?Token {
    var len: u16 = min_len;
    var prev_pos = self.lookup.add(lh, pos);  // Get chain head
    var match: ?Token = null;
    var chain: usize = self.level.chain;  // Max positions to check
    if (len >= self.level.good) {
        chain >>= 2;  // Already have good match, search less
    }
    while (prev_pos > 0 and chain > 0) : (chain -= 1) {
        const distance = pos - prev_pos;
        if (distance > 32768) break;  // Out of window range
        const new_len = self.win.match(prev_pos, pos, len);
        if (new_len > len) {
            match = Token.initMatch(@intCast(distance), new_len);
            if (new_len >= self.level.nice) {
                return match;  // Good enough, stop searching
            }
            len = new_len;
        }
        prev_pos = self.lookup.prev(prev_pos);  // Follow chain
    }
    return match;
}
Compression level tuning:
const LevelArgs = struct {
    good: u16,  // Length to reduce search at
    nice: u16,  // Length to stop search at
    lazy: u16,  // Min length for immediate match
    chain: u16, // Max chain positions to check
};
// Level 9 (best): Check up to 4096 positions for perfect match
.best => .{ good = 32, lazy = 258, nice = 258, chain = 4096 }
// Level 4 (fast): Check only 16 positions
.fast => .{ good = 4, lazy = 4, nice = 16, chain = 16 }
Layer 4: Huffman Encoding
DEFLATE uses dynamic Huffman codes optimized per block. Building optimal codes requires:
- Frequency counting - Count symbol occurrences
- Tree building - Construct Huffman tree (Package-Merge algorithm)
- Code assignment - Generate canonical codes
pub fn HuffmanEncoder(comptime size: usize) type {
    return struct {
        codes: [size]HuffCode = undefined,  // Output: symbol → code mapping
        pub fn generate(self: *Self, freq: []u16, max_bits: u32) void {
            // 1. Sort symbols by frequency
            var list = self.freq_cache[0..freq.len];
            // ... populate list with (literal, freq) pairs ...
            mem.sort(LiteralNode, list, {}, byFreq);
            // 2. Package-Merge algorithm: compute optimal bit lengths
            const bit_count = self.bitCounts(list, max_bits);
            // 3. Assign canonical codes
            self.assignEncodingAndSize(bit_count, list);
        }
    };
}
Canonical Huffman codes ensure decoders can reconstruct trees from just bit lengths:
fn assignEncodingAndSize(self: *Self, bit_count: []u32, list: []LiteralNode) void {
    var code: u16 = 0;
    for (bit_count, 0..) |bits, n| {
        code <<= 1;  // Double code value for each bit length
        if (n == 0 or bits == 0) continue;
        // Assign codes to all symbols with this bit length
        const chunk = list[list.len - bits..];
        mem.sort(LiteralNode, chunk, {}, byLiteral);  // Sort by symbol
        for (chunk) |node| {
            self.codes[node.literal] = HuffCode{
                .code = bitReverse(u16, code, @intCast(n)),  // Reverse bits!
                .len = @intCast(n),
            };
            code += 1;
        }
        list = list[0..list.len - bits];
    }
}
Why reverse bits? DEFLATE writes bits LSB-first, but Huffman trees are typically MSB-first. Reversing during code assignment is more efficient than reversing during every symbol write.
Layer 5: Token Writing
The BlockWriter emits compressed blocks with dynamic Huffman codes:
pub fn write(self: *Self, tokens: []const Token, eof: bool, input: ?[]const u8) Error!void {
    // 1. Analyze tokens and build frequency tables
    const stats = self.indexTokens(tokens);
    // 2. Generate Huffman codes for literals and distances
    self.literal_encoding.generate(&self.literal_freq, 15);
    self.distance_encoding.generate(&self.distance_freq, 15);
    // 3. Choose best block type (stored, fixed, or dynamic)
    const dynamic_size = self.dynamicSize(...);
    const fixed_size = self.fixedSize(...);
    const stored_size = storedSizeFits(input);
    if (stored_size.storable and stored_size.size < min(dynamic_size, fixed_size)) {
        // Raw storage is smaller - just copy bytes
        try self.storedBlock(input.?, eof);
    } else if (dynamic_size < fixed_size) {
        // Dynamic Huffman is better
        try self.dynamicHeader(num_literals, num_distances, num_codegens, eof);
        try self.writeTokens(tokens, &self.literal_encoding.codes, ...);
    } else {
        // Fixed Huffman is better (rare)
        try self.fixedHeader(eof);
        try self.writeTokens(tokens, &self.fixed_literal_encoding.codes, ...);
    }
}
Dynamic block headers are complex - they encode the Huffman tree structure itself:
fn dynamicHeader(self: *Self, num_literals: u32, num_distances: u32, 
                 num_codegens: u32, eof: bool) Error!void {
    // Block header: 3 bits
    const first_bits: u32 = if (eof) 5 else 4;  // BFINAL=1, BTYPE=10
    try self.bit_writer.writeBits(first_bits, 3);
    // Tree sizes (5+5+4 bits)
    try self.bit_writer.writeBits(num_literals - 257, 5);   // HLIT
    try self.bit_writer.writeBits(num_distances - 1, 5);    // HDIST
    try self.bit_writer.writeBits(num_codegens - 4, 4);     // HCLEN
    // Code length code lengths (3 bits each)
    var i: u32 = 0;
    while (i < num_codegens) : (i += 1) {
        const value = self.codegen_encoding.codes[codegen_order[i]].len;
        try self.bit_writer.writeBits(value, 3);
    }
    // Encoded literal/distance code lengths using run-length encoding
    i = 0;
    while (true) {
        const code_word = self.codegen[i];
        i += 1;
        if (code_word == 255) break;  // End marker
        try self.writeCode(self.codegen_encoding.codes[code_word]);
        // Handle run-length codes (16=repeat, 17=zeros, 18=many zeros)
        switch (code_word) {
            16 => try self.bit_writer.writeBits(self.codegen[i], 2),
            17 => try self.bit_writer.writeBits(self.codegen[i], 3),
            18 => try self.bit_writer.writeBits(self.codegen[i], 7),
            else => {},
        }
    }
}
This uses three layers of encoding:
- Huffman codes for codegen alphabet (16 symbols)
- Run-length encoding of literal/distance tree bit lengths
- Huffman codes for the actual literals/distances
Layer 6: Main Compression Loop
The Deflate struct orchestrates the entire pipeline:
pub fn compress(self: *Self, reader: *std.Io.Reader) !void {
    while (true) {
        const buf = self.win.writable();  // Get writable window space
        if (buf.len == 0) {
            // Window full - process tokens and slide
            try self.tokenize(.none);
            self.slide();
            continue;
        }
        // Read data into window
        const read_size = @min(buf.len, 4096);
        const slice = reader.take(read_size) catch |err| switch (err) {
            error.EndOfStream => break,
            error.ReadFailed => return error.ReadFailed,
        };
        @memcpy(buf[0..slice.len], slice);
        self.hasher.update(buf[0..slice.len]);  // Update CRC32/Adler32
        self.win.written(slice.len);
        try self.tokenize(.none);
        if (slice.len < read_size) break;  // EOF
    }
}
The tokenizer implements lazy matching - a clever optimization:
fn tokenize(self: *Self, flush_opt: FlushOption) !void {
    while (self.win.activeLookahead(should_flush)) |lh| {
        const pos = self.win.pos();
        const literal = lh[0];
        const min_len = if (self.prev_match) |m| m.length() else 0;
        if (self.findMatch(pos, lh, min_len)) |match| {
            try self.addPrevLiteral();
            if (match.length() >= self.level.lazy) {
                // Good match - take it immediately
                step = try self.addMatch(match);
            } else {
                // Defer decision - maybe next position has better match
                self.prev_literal = literal;
                self.prev_match = match;
            }
        } else {
            if (self.prev_match) |m| {
                // No match at current position, use previous match
                step = try self.addMatch(m) - 1;
            } else {
                // No match at all - emit literal
                try self.addPrevLiteral();
                self.prev_literal = literal;
            }
        }
        self.windowAdvance(step, lh, pos);
    }
}
Lazy matching example:
Input: "abcabcabc"
Pos 0: Found match(0,3) "abc" → defer
Pos 1: Found match(0,4) "bcab" → better! Use this instead
Navigating Zig 0.15's "Writergate"
Zig 0.15 introduced a controversial change: non-generic I/O interfaces. The old API:
// Old Zig 0.14
pub fn compress(reader: anytype, writer: anytype, options: Options) !void {
    // reader can be any type with read() method
    // writer can be any type with write() method
}
The new API:
// New Zig 0.15
pub fn compress(reader: *std.Io.Reader, writer: *std.Io.Writer, options: Options) !void {
    // reader and writer are concrete interface types with internal vtables
}
Key Changes
1. Concrete interface types with vtables
pub const BitWriter = struct {
    inner_writer: *std.Io.Writer,  // Not generic!
    pub fn init(writer: *std.Io.Writer) Self {
        return .{ .inner_writer = writer };
    }
    pub const Error = std.Io.Writer.Error || error{UnfinishedBits};
};
2. File I/O uses new methods
// Old: file.reader() and file.writer() returned generic types
// New: Returns concrete std.fs.File.Reader and std.fs.File.Writer
const input_file = try std.fs.cwd().openFile("input.txt", .{});
var input_reader = input_file.reader();  // Returns std.fs.File.Reader
const output_file = try std.fs.cwd().createFile("output.gz", .{});
var output_writer = output_file.writer();  // Returns std.fs.File.Writer
3. Fixed buffers use factory methods
// Old: std.io.fixedBufferStream(&buffer)
// New: std.Io.Writer.fixed() and std.Io.Reader.fixed()
var buffer: [1024]u8 = undefined;
var fixed_writer = std.Io.Writer.fixed(&buffer);
const data = "Hello, World!";
var input: [1024]u8 = undefined;
@memcpy(input[0..data.len], data);
var fixed_reader = std.Io.Reader.fixed(input[0..data.len]);
Migration Benefits
While controversial, the new design has advantages:
- Faster compilation - No monomorphization for each type
- Smaller binaries - Single implementation instead of multiple instantiations
- 
Clearer errors - std.Io.Writer.Errorinstead of complex generic constraints
- Consistent buffering - Buffer responsibility is explicit
Container Formats: Gzip, Zlib, Raw
Comprezz supports three container formats wrapping the DEFLATE stream:
Gzip Format (RFC 1952)
┌─────────────────────────────────────┐
│ Magic: 0x1f 0x8b                    │  2 bytes
│ Method: 0x08 (DEFLATE)              │  1 byte
│ Flags: 0x00                         │  1 byte
│ Timestamp: 0x00 0x00 0x00 0x00      │  4 bytes
│ Extra flags: 0x00                   │  1 byte
│ OS: 0x03 (Unix)                     │  1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data             │  variable
├─────────────────────────────────────┤
│ CRC32 checksum                      │  4 bytes (little-endian)
│ Uncompressed size mod 2^32          │  4 bytes (little-endian)
└─────────────────────────────────────┘
Implementation:
pub fn writeHeader(comptime wrap: Container, writer: *std.Io.Writer) !void {
    switch (wrap) {
        .gzip => {
            const gzipHeader = [_]u8{ 
                0x1f, 0x8b,  // Magic
                0x08,        // Method (DEFLATE)
                0x00,        // Flags
                0x00, 0x00, 0x00, 0x00,  // Timestamp
                0x00,        // Extra flags
                0x03         // OS (Unix)
            };
            try writer.writeAll(&gzipHeader);
        },
        // ... other formats
    }
}
pub fn writeFooter(comptime wrap: Container, hasher: *Hasher(wrap), 
                   writer: *std.Io.Writer) !void {
    var bits: [4]u8 = undefined;
    switch (wrap) {
        .gzip => {
            std.mem.writeInt(u32, &bits, hasher.chksum(), .little);
            try writer.writeAll(&bits);
            std.mem.writeInt(u32, &bits, hasher.bytesRead(), .little);
            try writer.writeAll(&bits);
        },
        // ... other formats
    }
}
Zlib Format (RFC 1950)
Simpler than gzip, used internally by PNG:
┌─────────────────────────────────────┐
│ CMF: 0x78 (window=32K, method=8)    │  1 byte
│ FLG: 0xDC (check bits + level)      │  1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data             │  variable
├─────────────────────────────────────┤
│ Adler32 checksum                    │  4 bytes (big-endian!)
└─────────────────────────────────────┘
Usage Examples
Library: Basic Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    var compressed_buffer: [1024]u8 = undefined;
    var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
    const data = "Hello, World!";
    var input_buffer: [1024]u8 = undefined;
    @memcpy(input_buffer[0..data.len], data);
    var input_reader = std.Io.Reader.fixed(input_buffer[0..data.len]);
    try comprezz.compress(&input_reader, &fixed_writer, .{});
    // Find actual compressed size
    var written: usize = 0;
    for (compressed_buffer, 0..) |byte, i| {
        if (byte != 0) written = i + 1;
    }
    const compressed = compressed_buffer[0..written];
    std.debug.print("Compressed {} bytes to {} bytes\n", 
                    .{ data.len, compressed.len });
}
Library: File Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    const input_file = try std.fs.cwd().openFile("input.txt", .{});
    defer input_file.close();
    var input_reader = input_file.reader();
    const output_file = try std.fs.cwd().createFile("output.gz", .{});
    defer output_file.close();
    var output_writer = output_file.writer();
    try comprezz.compress(&input_reader, &output_writer, .{ .level = .best });
}
CLI: Command-line Usage
# Compress a file
$ comprezz input.txt -o output.gz
# Use best compression
$ comprezz -l best largefile.txt -o compressed.gz
# Compress from stdin
$ cat data.txt | comprezz > data.gz
# Compress with specific level
$ comprezz -l 9 input.txt -o output.gz
Performance Characteristics
Time Complexity
- Best case: O(n) for uncompressible data (stored blocks)
- Average case: O(n × c) where c = chain length (typically 16-4096)
- Worst case: O(n × w) where w = window size (32KB) for pathological patterns
Space Complexity
- Sliding window: 64KB (2 × 32KB for double buffering)
- Hash table: 128KB (32K entries × 2 bytes + 64K chain entries × 2 bytes)
- Tokens buffer: 128KB (32K tokens × 4 bytes)
- Total: ~400KB fixed memory usage
Compression Ratios
Tested on Calgary Corpus:
File        Size     Level 4   Level 6   Level 9   Time (L4)  Time (L9)
─────────────────────────────────────────────────────────────────────────
book1.txt   768KB    43.2%     42.1%     41.8%     0.12s      0.45s
paper1.txt  52KB     39.8%     38.5%     38.2%     0.008s     0.032s
geo.txt     102KB    68.9%     68.2%     68.0%     0.015s     0.058s
Testing Strategy
Comprezz includes comprehensive tests covering:
1. Basic Functionality
test "basic compression" {
    var compressed_buffer: [1024]u8 = undefined;
    var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
    const data = "Hello, World!";
    var input_buffer: [1024]u8 = undefined;
    @memcpy(input_buffer[0..data.len], data);
    var input_reader = std.Io.Reader.fixed(input_buffer[0..data.len]);
    try compress(&input_reader, &fixed_writer, .{});
    var written: usize = 0;
    for (compressed_buffer, 0..) |byte, i| {
        if (byte != 0) written = i + 1;
    }
    const compressed = compressed_buffer[0..written];
    try expect(compressed.len > 0);
    try expect(compressed[0] == 0x1f);  // Gzip magic
    try expect(compressed[1] == 0x8b);
}
2. Compression Levels
test "compression levels" {
    const data = "The quick brown fox jumps over the lazy dog. " ** 3;
    // Compress with fast and best settings
    var fast_compressed = try compressWithLevel(data, .fast);
    var best_compressed = try compressWithLevel(data, .best);
    // Best compression should be smaller (or equal)
    try expect(best_compressed.len <= fast_compressed.len);
}
3. Edge Cases
test "small data compression" {
    try testCompression("Hi!");  // Very small input
}
test "large data compression" {
    const large = "Lorem ipsum..." ** 100;  // >44KB
    try testCompression(large);
}
test "incompressible data" {
    var random: [1024]u8 = undefined;
    // ... fill with random data ...
    try testCompression(&random);
}
Lessons Learned
1. Bit Manipulation is Hard
Getting bit ordering right took multiple iterations. DEFLATE's LSB-first bit order combined with MSB-first Huffman trees created subtle bugs that only appeared with certain data patterns.
Solution: Extensive unit tests for bit reversal and explicit comments about bit ordering everywhere.
2. Performance vs. Compression Tradeoff
The compression level parameters have enormous impact:
- Level 4: 2-3x faster but 2-5% worse compression
- Level 9: Best compression but searches 256x more positions
Solution: Expose multiple levels and make .default (level 6) a reasonable balance.
3. Testing Compression is Tricky
You can't just compare compressed output byte-for-byte - there are multiple valid DEFLATE streams for the same input.
Solution:
- Test that output decompresses correctly (requires external tool)
- Test structural properties (gzip header, valid bit patterns)
- Compare compression ratios across levels
4. Fixed-Size Buffers in Zig 0.15
The new I/O system doesn't track position automatically in fixed buffers.
Solution: Manually track written bytes or scan for non-zero data after compression.
Future Improvements
1. Streaming API
Current API requires loading full output into memory. Add:
pub const StreamingCompressor = struct {
    pub fn writeChunk(self: *Self, data: []const u8) !void;
    pub fn finish(self: *Self) !void;
};
2. Parallel Compression
Large files could be split into independent blocks compressed in parallel:
// Split input into 1MB chunks
// Compress each chunk on separate thread
// Concatenate results with DEFLATE block headers
3. Better Hash Function
Current hash is simple but has collisions. Consider:
- xxHash for better distribution
- Faster multiplication schemes
- Cache-optimized access patterns
4. SIMD Matching
Use AVX2 to compare 32 bytes at once during match verification:
const match_len = @import("std").simd.countLeadingEqualElements(
    u8, 32, prev_data, curr_data
);
Conclusion
Building Comprezz was an exercise in:
- Classical algorithms - LZ77 and Huffman coding are 40+ years old but still relevant
- Modern systems programming - Zig's safety + performance enabled clean implementation
- API evolution - Adapting to Zig 0.15's controversial but pragmatic I/O redesign
The complete 1700-line implementation demonstrates that you don't need massive frameworks to build production-ready compression - just careful algorithm engineering and attention to detail.
Try it yourself:
# Add to your project
$ zig fetch --save git+https://github.com/bkataru/comprezz.git
# Or use the CLI
$ git clone https://github.com/bkataru/comprezz.git
$ cd comprezz
$ zig build
$ ./zig-out/bin/comprezz your-file.txt -o compressed.gz
Resources:
The journey from understanding DEFLATE on paper to shipping a working implementation taught me that compression isn't magic - it's just clever bookkeeping with bits and bytes. And sometimes, that's the best kind of magic.
 
 
              

 
    
Top comments (0)