Introduction
When Zig 0.15 removed compression support from its standard library, it created an opportunity: build a standalone, production-ready compression library while navigating the language's new I/O paradigm shift known as "Writergate." This post chronicles the development of Comprezz, a single-file gzip/deflate compression library that combines classical computer science algorithms with modern systems programming.
What you'll learn:
- How the DEFLATE compression algorithm works under the hood
- Building bit-level I/O abstractions in systems programming
- Navigating Zig 0.15's new non-generic I/O interfaces
- Implementing Huffman coding and LZ77 compression from scratch
Project Overview
Comprezz is a ~1700-line pure Zig implementation providing:
- Full LZ77 + Huffman encoding - Complete DEFLATE (RFC 1951) implementation
- Multiple compression levels - From fast (4) to best (9)
- Gzip/Zlib/Raw containers - Proper headers, trailers, and checksums
- Library + CLI - Use programmatically or as a command-line tool
- Zero dependencies - Self-contained, no external C libraries
The entire implementation lives in src/lib.zig, making it easy to vendor or study.
Understanding the DEFLATE Algorithm
Before diving into code, let's understand what we're building. DEFLATE combines two compression techniques:
1. LZ77: Dictionary Compression
LZ77 replaces repeated byte sequences with back-references:
Input: "The quick brown fox jumps over the lazy dog. The quick brown fox..."
Output: "The quick brown fox jumps over the lazy dog. [back 45, length 23]..."
This works by maintaining a sliding window (32KB) of recent data and searching for matches. When we find a repeated sequence, we emit a match token (distance, length) instead of the literal bytes.
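Conceptually, the tokenizer's output mixes two kinds of tokens. A minimal sketch of that shape (illustrative only; Comprezz itself packs both cases into a compact 4-byte Token):
const Token = union(enum) {
    literal: u8, // A byte copied through unchanged
    match: struct {
        distance: u16, // How far back the copy starts (1-32768)
        length: u16, // How many bytes to copy (3-258 in DEFLATE)
    },
};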
2. Huffman Coding: Optimal Bit Encoding
Frequently occurring symbols get shorter bit codes:
'e' (frequent): 101 (3 bits)
'q' (rare): 11010111 (8 bits)
DEFLATE builds dynamic Huffman trees for each block, adapting to the data's characteristics.
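For example, six symbols with frequencies 45, 13, 12, 16, 9, and 5 get Huffman code lengths 1, 3, 3, 3, 4, and 4; the canonical codes are:
a (freq 45): 0    (1 bit)
b (freq 13): 100  (3 bits)
c (freq 12): 101  (3 bits)
d (freq 16): 110  (3 bits)
e (freq 9):  1110 (4 bits)
f (freq 5):  1111 (4 bits)
That encodes the 100 symbols in 224 bits instead of the 300 a fixed 3-bit code would need.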
Combined Pipeline
Raw Data → LZ77 Tokenizer → Huffman Encoder → Bit Writer → Compressed Stream
Architecture Deep Dive
Let's walk through Comprezz's implementation layer by layer.
Layer 1: Bit-Level Writing
Compression requires writing individual bits, not just bytes. Our BitWriter buffers bits and flushes efficiently:
pub const BitWriter = struct {
inner_writer: *std.Io.Writer,
bits: u64 = 0, // Bit accumulator
nbits: u32 = 0, // Number of bits in accumulator
bytes: [256]u8 = undefined, // Byte buffer for batching writes
nbytes: u32 = 0, // Buffer position
pub fn writeBits(self: *Self, b: u32, nb: u32) Error!void {
// Accumulate bits in little-endian order
self.bits |= @as(u64, @intCast(b)) << @as(u6, @intCast(self.nbits));
self.nbits += nb;
if (self.nbits < 48) return; // Haven't filled 6 bytes yet
// Flush 6 bytes (48 bits) to buffer
var n = self.nbytes;
std.mem.writeInt(u64, self.bytes[n..][0..8], self.bits, .little);
n += 6;
if (n >= 240) { // Buffer full, write to underlying stream
try self.inner_writer.writeAll(self.bytes[0..n]); // writeAll retries short writes
n = 0;
}
self.nbytes = n;
self.bits >>= 48; // Remove written bits
self.nbits -= 48;
}
};
Key insights:
- Uses 64-bit accumulator to buffer bits across byte boundaries
- Batches writes to 256-byte buffer before syscalls
- Writes 6 bytes at a time when accumulator fills (optimization)
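A hypothetical usage, writing the start of a final dynamic block (nothing reaches the underlying writer until 48 bits have accumulated or the writer is flushed):
var bw = BitWriter.init(io_writer); // io_writer: *std.Io.Writer
try bw.writeBits(0b101, 3); // BFINAL=1, BTYPE=10 (final dynamic block)
try bw.writeBits(29, 5); // e.g. HLIT: 286 literal/length codes - 257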
Layer 2: Sliding Window for LZ77
The heart of LZ77 is efficiently searching for repeated sequences. We use a hash-chained lookup table:
pub const Lookup = struct {
head: [32768]u16 = [_]u16{0} ** 32768, // Hash table (15-bit hash)
chain: [65536]u16 = [_]u16{0} ** 65536, // Chain of previous positions
pub fn add(self: *Lookup, data: []const u8, pos: u16) u16 {
if (data.len < 4) return 0;
const h = hash(data[0..4]); // Hash first 4 bytes
return self.set(h, pos);
}
fn set(self: *Lookup, h: u32, pos: u16) u16 {
const prev = self.head[h]; // Get previous position at this hash
self.head[h] = pos; // Update hash table
self.chain[pos] = prev; // Link to previous position
return prev;
}
fn hash(b: *const [4]u8) u32 {
const prime = 0x9E3779B1;
const v = @as(u32, b[3]) |
@as(u32, b[2]) << 8 |
@as(u32, b[1]) << 16 |
@as(u32, b[0]) << 24;
return @intCast((v *% prime) >> 17); // 15-bit hash
}
};
How hash chains work:
Input: "abc...abc...abc"
pos=0 pos=100 pos=200
head[hash("abc")] = 200
chain[200] = 100
chain[100] = 0
When we encounter "abc" at position 200, we can walk the chain backwards to find all previous occurrences without scanning the entire window.
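A sketch of that walk as a hypothetical helper (hash collisions are possible, so callers still verify the actual bytes, as findMatch does below):
fn walkChain(l: *Lookup, first4: *const [4]u8, out: []u16) usize {
    var n: usize = 0;
    var pos = l.head[hash(first4)]; // Most recent position with this hash
    while (pos > 0 and n < out.len) {
        out[n] = pos;
        n += 1;
        pos = l.chain[pos]; // Step to the next-older position
    }
    return n;
}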
Layer 3: Match Finding
Finding the longest match involves walking the hash chain and comparing byte sequences:
fn findMatch(self: *Self, pos: u16, lh: []const u8, min_len: u16) ?Token {
var len: u16 = min_len;
var prev_pos = self.lookup.add(lh, pos); // Get chain head
var match: ?Token = null;
var chain: usize = self.level.chain; // Max positions to check
if (len >= self.level.good) {
chain >>= 2; // Already have good match, search less
}
while (prev_pos > 0 and chain > 0) : (chain -= 1) {
const distance = pos - prev_pos;
if (distance > 32768) break; // Out of window range
const new_len = self.win.match(prev_pos, pos, len);
if (new_len > len) {
match = Token.initMatch(@intCast(distance), new_len);
if (new_len >= self.level.nice) {
return match; // Good enough, stop searching
}
len = new_len;
}
prev_pos = self.lookup.prev(prev_pos); // Follow chain
}
return match;
}
Compression level tuning:
const LevelArgs = struct {
good: u16, // Length to reduce search at
nice: u16, // Length to stop search at
lazy: u16, // Min length for immediate match
chain: u16, // Max chain positions to check
};
// Level 9 (best): Check up to 4096 positions for perfect match
.best => .{ .good = 32, .lazy = 258, .nice = 258, .chain = 4096 },
// Level 4 (fast): Check only 16 positions
.fast => .{ .good = 4, .lazy = 4, .nice = 16, .chain = 16 },
Layer 4: Huffman Encoding
DEFLATE uses dynamic Huffman codes optimized per block. Building optimal codes requires:
- Frequency counting - Count symbol occurrences
- Tree building - Construct Huffman tree (Package-Merge algorithm)
- Code assignment - Generate canonical codes
pub fn HuffmanEncoder(comptime size: usize) type {
return struct {
codes: [size]HuffCode = undefined, // Output: symbol → code mapping
pub fn generate(self: *Self, freq: []u16, max_bits: u32) void {
// 1. Sort symbols by frequency
var list = self.freq_cache[0..freq.len];
// ... populate list with (literal, freq) pairs ...
mem.sort(LiteralNode, list, {}, byFreq);
// 2. Package-Merge algorithm: compute optimal bit lengths
const bit_count = self.bitCounts(list, max_bits);
// 3. Assign canonical codes
self.assignEncodingAndSize(bit_count, list);
}
};
}
Canonical Huffman codes ensure decoders can reconstruct trees from just bit lengths:
fn assignEncodingAndSize(self: *Self, bit_count: []u32, list_arg: []LiteralNode) void {
var list = list_arg; // Parameters are immutable in Zig; shrink a local copy instead
var code: u16 = 0;
for (bit_count, 0..) |bits, n| {
code <<= 1; // Double code value for each bit length
if (n == 0 or bits == 0) continue;
// Assign codes to all symbols with this bit length
const chunk = list[list.len - bits..];
mem.sort(LiteralNode, chunk, {}, byLiteral); // Sort by symbol
for (chunk) |node| {
self.codes[node.literal] = HuffCode{
.code = bitReverse(u16, code, @intCast(n)), // Reverse bits!
.len = @intCast(n),
};
code += 1;
}
list = list[0..list.len - bits];
}
}
Why reverse bits? DEFLATE writes bits LSB-first, but Huffman trees are typically MSB-first. Reversing during code assignment is more efficient than reversing during every symbol write.
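A sketch of what that bitReverse helper can look like, built on Zig's @bitReverse builtin:
fn bitReverse(comptime T: type, value: T, n: usize) T {
    const reversed = @bitReverse(value); // Reverse all @bitSizeOf(T) bits
    return reversed >> @intCast(@bitSizeOf(T) - n); // Keep only the low n bits, reversed
}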
Layer 5: Token Writing
The BlockWriter emits compressed blocks with dynamic Huffman codes:
pub fn write(self: *Self, tokens: []const Token, eof: bool, input: ?[]const u8) Error!void {
// 1. Analyze tokens and build frequency tables
const stats = self.indexTokens(tokens);
// 2. Generate Huffman codes for literals and distances
self.literal_encoding.generate(&self.literal_freq, 15);
self.distance_encoding.generate(&self.distance_freq, 15);
// 3. Choose best block type (stored, fixed, or dynamic)
const dynamic_size = self.dynamicSize(...);
const fixed_size = self.fixedSize(...);
const stored_size = storedSizeFits(input);
if (stored_size.storable and stored_size.size < @min(dynamic_size, fixed_size)) {
// Raw storage is smaller - just copy bytes
try self.storedBlock(input.?, eof);
} else if (dynamic_size < fixed_size) {
// Dynamic Huffman is better
try self.dynamicHeader(num_literals, num_distances, num_codegens, eof);
try self.writeTokens(tokens, &self.literal_encoding.codes, ...);
} else {
// Fixed Huffman is better (rare)
try self.fixedHeader(eof);
try self.writeTokens(tokens, &self.fixed_literal_encoding.codes, ...);
}
}
Dynamic block headers are complex - they encode the Huffman tree structure itself:
fn dynamicHeader(self: *Self, num_literals: u32, num_distances: u32,
num_codegens: u32, eof: bool) Error!void {
// Block header: 3 bits
const first_bits: u32 = if (eof) 5 else 4; // BFINAL=eof, BTYPE=10 (dynamic)
try self.bit_writer.writeBits(first_bits, 3);
// Tree sizes (5+5+4 bits)
try self.bit_writer.writeBits(num_literals - 257, 5); // HLIT
try self.bit_writer.writeBits(num_distances - 1, 5); // HDIST
try self.bit_writer.writeBits(num_codegens - 4, 4); // HCLEN
// Code length code lengths (3 bits each)
var i: u32 = 0;
while (i < num_codegens) : (i += 1) {
const value = self.codegen_encoding.codes[codegen_order[i]].len;
try self.bit_writer.writeBits(value, 3);
}
// Encoded literal/distance code lengths using run-length encoding
i = 0;
while (true) {
const code_word = self.codegen[i];
i += 1;
if (code_word == 255) break; // End marker
try self.writeCode(self.codegen_encoding.codes[code_word]);
// Handle run-length codes (16=repeat, 17=zeros, 18=many zeros)
switch (code_word) {
16 => try self.bit_writer.writeBits(self.codegen[i], 2),
17 => try self.bit_writer.writeBits(self.codegen[i], 3),
18 => try self.bit_writer.writeBits(self.codegen[i], 7),
else => {},
}
}
}
This uses three layers of encoding:
- Huffman codes for the codegen alphabet (19 symbols)
- Run-length encoding of literal/distance tree bit lengths
- Huffman codes for the actual literals/distances
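For example, if the literal tree assigns length 4 to seven consecutive symbols and the next 138 symbols are unused, that whole stretch collapses to three codegen entries:
4                       → one code length of 4
16 (2 extra bits = 3)   → repeat previous length 3+3 = 6 more times
18 (7 extra bits = 127) → run of 11+127 = 138 zero lengths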
Layer 6: Main Compression Loop
The Deflate struct orchestrates the entire pipeline:
pub fn compress(self: *Self, reader: *std.Io.Reader) !void {
while (true) {
const buf = self.win.writable(); // Get writable window space
if (buf.len == 0) {
// Window full - process tokens and slide
try self.tokenize(.none);
self.slide();
continue;
}
// Read data into window
const read_size = @min(buf.len, 4096);
const slice = reader.take(read_size) catch |err| switch (err) {
error.EndOfStream => break,
error.ReadFailed => return error.ReadFailed,
};
@memcpy(buf[0..slice.len], slice);
self.hasher.update(buf[0..slice.len]); // Update CRC32/Adler32
self.win.written(slice.len);
try self.tokenize(.none);
if (slice.len < read_size) break; // EOF
}
}
The tokenizer implements lazy matching - a clever optimization:
fn tokenize(self: *Self, flush_opt: FlushOption) !void {
const should_flush = flush_opt != .none;
while (self.win.activeLookahead(should_flush)) |lh| {
var step: u16 = 1; // How far to advance the window this iteration
const pos = self.win.pos();
const literal = lh[0];
const min_len: u16 = if (self.prev_match) |m| m.length() else 0;
if (self.findMatch(pos, lh, min_len)) |match| {
try self.addPrevLiteral();
if (match.length() >= self.level.lazy) {
// Good match - take it immediately
step = try self.addMatch(match);
} else {
// Defer decision - maybe next position has better match
self.prev_literal = literal;
self.prev_match = match;
}
} else {
if (self.prev_match) |m| {
// No match at current position, use previous match
step = try self.addMatch(m) - 1;
} else {
// No match at all - emit literal
try self.addPrevLiteral();
self.prev_literal = literal;
}
}
self.windowAdvance(step, lh, pos);
}
}
Lazy matching example:
Input: "abc bcdef abcdef"
Pos 10 ('a'): match (dist 10, len 3) against "abc" → defer decision
Pos 11 ('b'): match (dist 7, len 5) against "bcdef" → better!
              Emit the literal 'a', then take the length-5 match
Navigating Zig 0.15's "Writergate"
Zig 0.15 introduced a controversial change: non-generic I/O interfaces. The old API:
// Old Zig 0.14
pub fn compress(reader: anytype, writer: anytype, options: Options) !void {
// reader can be any type with read() method
// writer can be any type with write() method
}
The new API:
// New Zig 0.15
pub fn compress(reader: *std.Io.Reader, writer: *std.Io.Writer, options: Options) !void {
// reader and writer are concrete interface types with internal vtables
}
Key Changes
1. Concrete interface types with vtables
pub const BitWriter = struct {
inner_writer: *std.Io.Writer, // Not generic!
pub fn init(writer: *std.Io.Writer) Self {
return .{ .inner_writer = writer };
}
pub const Error = std.Io.Writer.Error || error{UnfinishedBits};
};
2. File I/O uses new methods
// Old: file.reader() and file.writer() returned generic wrapper types
// New: they take a buffer and return concrete std.fs.File.Reader / .Writer
const input_file = try std.fs.cwd().openFile("input.txt", .{});
var read_buf: [4096]u8 = undefined;
var input_reader = input_file.reader(&read_buf); // std.fs.File.Reader
const output_file = try std.fs.cwd().createFile("output.gz", .{});
var write_buf: [4096]u8 = undefined;
var output_writer = output_file.writer(&write_buf); // std.fs.File.Writer
// &input_reader.interface gives the *std.Io.Reader these APIs expect
3. Fixed buffers use factory methods
// Old: std.io.fixedBufferStream(&buffer)
// New: std.Io.Writer.fixed() and std.Io.Reader.fixed()
var buffer: [1024]u8 = undefined;
var fixed_writer = std.Io.Writer.fixed(&buffer);
// Reader.fixed takes a const slice, so it can wrap a string literal directly
var fixed_reader = std.Io.Reader.fixed("Hello, World!");
Migration Benefits
While controversial, the new design has advantages:
- Faster compilation - No monomorphization for each type
- Smaller binaries - Single implementation instead of multiple instantiations
- Clearer errors - std.Io.Writer.Error instead of complex generic constraints
- Consistent buffering - Buffer responsibility is explicit
Container Formats: Gzip, Zlib, Raw
Comprezz supports three container formats wrapping the DEFLATE stream:
Gzip Format (RFC 1952)
┌─────────────────────────────────────┐
│ Magic: 0x1f 0x8b │ 2 bytes
│ Method: 0x08 (DEFLATE) │ 1 byte
│ Flags: 0x00 │ 1 byte
│ Timestamp: 0x00 0x00 0x00 0x00 │ 4 bytes
│ Extra flags: 0x00 │ 1 byte
│ OS: 0x03 (Unix) │ 1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data │ variable
├─────────────────────────────────────┤
│ CRC32 checksum │ 4 bytes (little-endian)
│ Uncompressed size mod 2^32 │ 4 bytes (little-endian)
└─────────────────────────────────────┘
Implementation:
pub fn writeHeader(comptime wrap: Container, writer: *std.Io.Writer) !void {
switch (wrap) {
.gzip => {
const gzipHeader = [_]u8{
0x1f, 0x8b, // Magic
0x08, // Method (DEFLATE)
0x00, // Flags
0x00, 0x00, 0x00, 0x00, // Timestamp
0x00, // Extra flags
0x03 // OS (Unix)
};
try writer.writeAll(&gzipHeader);
},
// ... other formats
}
}
pub fn writeFooter(comptime wrap: Container, hasher: *Hasher(wrap),
writer: *std.Io.Writer) !void {
var bits: [4]u8 = undefined;
switch (wrap) {
.gzip => {
std.mem.writeInt(u32, &bits, hasher.chksum(), .little);
try writer.writeAll(&bits);
std.mem.writeInt(u32, &bits, hasher.bytesRead(), .little);
try writer.writeAll(&bits);
},
// ... other formats
}
}
Zlib Format (RFC 1950)
Simpler than gzip, used internally by PNG:
┌─────────────────────────────────────┐
│ CMF: 0x78 (window=32K, method=8) │ 1 byte
│ FLG: 0xDA (check bits + level) │ 1 byte
├─────────────────────────────────────┤
│ DEFLATE compressed data │ variable
├─────────────────────────────────────┤
│ Adler32 checksum │ 4 bytes (big-endian!)
└─────────────────────────────────────┘
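The zlib branches elided from writeHeader and writeFooter above would look roughly like this sketch (FLG must make CMF·256 + FLG divisible by 31):
// In writeHeader:
.zlib => {
    // CMF 0x78: method 8 (DEFLATE), 32K window; FLG 0xDA passes the
    // mod-31 check and advertises the highest compression level
    try writer.writeAll(&[_]u8{ 0x78, 0xDA });
},
// In writeFooter (big-endian, unlike gzip):
.zlib => {
    std.mem.writeInt(u32, &bits, hasher.chksum(), .big);
    try writer.writeAll(&bits);
},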
Usage Examples
Library: Basic Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    var compressed_buffer: [1024]u8 = undefined;
    var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
    const data = "Hello, World!";
    var input_reader = std.Io.Reader.fixed(data); // Wraps a const slice directly
    try comprezz.compress(&input_reader, &fixed_writer, .{});
    // A fixed writer tracks how much was written in its end field
    const compressed = compressed_buffer[0..fixed_writer.end];
    std.debug.print("Compressed {} bytes to {} bytes\n",
        .{ data.len, compressed.len });
}
Library: File Compression
const std = @import("std");
const comprezz = @import("comprezz");
pub fn main() !void {
    const input_file = try std.fs.cwd().openFile("input.txt", .{});
    defer input_file.close();
    var read_buf: [4096]u8 = undefined;
    var input_reader = input_file.reader(&read_buf);
    const output_file = try std.fs.cwd().createFile("output.gz", .{});
    defer output_file.close();
    var write_buf: [4096]u8 = undefined;
    var output_writer = output_file.writer(&write_buf);
    try comprezz.compress(&input_reader.interface, &output_writer.interface, .{ .level = .best });
    try output_writer.interface.flush(); // File.Writer buffers; flush before closing
}
CLI: Command-line Usage
# Compress a file
$ comprezz input.txt -o output.gz
# Use best compression
$ comprezz -l best largefile.txt -o compressed.gz
# Compress from stdin
$ cat data.txt | comprezz > data.gz
# Compress with specific level
$ comprezz -l 9 input.txt -o output.gz
Performance Characteristics
Time Complexity
- Best case: O(n) for uncompressible data (stored blocks)
- Average case: O(n × c) where c = chain length (typically 16-4096)
- Worst case: O(n × w) where w = window size (32KB) for pathological patterns
Space Complexity
- Sliding window: 64KB (2 × 32KB for double buffering)
- Hash table + chains: 192KB (32K head entries × 2 bytes + 64K chain entries × 2 bytes)
- Tokens buffer: 128KB (32K tokens × 4 bytes)
- Total: ~400KB fixed memory usage
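These figures follow directly from the declarations; the Layer 2 lookup sizes, for instance, can be pinned down with a compile-time assertion:
comptime {
    const lookup_bytes = @sizeOf([32768]u16) + @sizeOf([65536]u16);
    std.debug.assert(lookup_bytes == 192 * 1024); // head (64KB) + chain (128KB)
}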
Compression Ratios
Tested on Calgary Corpus:
File Size Level 4 Level 6 Level 9 Time (L4) Time (L9)
─────────────────────────────────────────────────────────────────────────
book1.txt 768KB 43.2% 42.1% 41.8% 0.12s 0.45s
paper1.txt 52KB 39.8% 38.5% 38.2% 0.008s 0.032s
geo.txt 102KB 68.9% 68.2% 68.0% 0.015s 0.058s
Testing Strategy
Comprezz includes comprehensive tests covering:
1. Basic Functionality
test "basic compression" {
var compressed_buffer: [1024]u8 = undefined;
var fixed_writer = std.Io.Writer.fixed(&compressed_buffer);
const data = "Hello, World!";
var input_buffer: [1024]u8 = undefined;
@memcpy(input_buffer[0..data.len], data);
var input_reader = std.Io.Reader.fixed(input_buffer[0..data.len]);
try compress(&input_reader, &fixed_writer, .{});
var written: usize = 0;
for (compressed_buffer, 0..) |byte, i| {
if (byte != 0) written = i + 1;
}
const compressed = compressed_buffer[0..written];
try expect(compressed.len > 0);
try expect(compressed[0] == 0x1f); // Gzip magic
try expect(compressed[1] == 0x8b);
}
2. Compression Levels
test "compression levels" {
const data = "The quick brown fox jumps over the lazy dog. " ** 3;
// Compress with fast and best settings
const fast_compressed = try compressWithLevel(data, .fast);
const best_compressed = try compressWithLevel(data, .best);
// Best compression should be smaller (or equal)
try expect(best_compressed.len <= fast_compressed.len);
}
3. Edge Cases
test "small data compression" {
try testCompression("Hi!"); // Very small input
}
test "large data compression" {
const large = "Lorem ipsum..." ** 100; // >44KB
try testCompression(large);
}
test "incompressible data" {
var random: [1024]u8 = undefined;
// ... fill with random data ...
try testCompression(&random);
}
Lessons Learned
1. Bit Manipulation is Hard
Getting bit ordering right took multiple iterations. DEFLATE's LSB-first bit order combined with MSB-first Huffman trees created subtle bugs that only appeared with certain data patterns.
Solution: Extensive unit tests for bit reversal and explicit comments about bit ordering everywhere.
2. Performance vs. Compression Tradeoff
The compression level parameters have enormous impact:
- Level 4: 2-3x faster but 2-5% worse compression
- Level 9: Best compression but searches 256x more positions
Solution: Expose multiple levels and make .default (level 6) a reasonable balance.
3. Testing Compression is Tricky
You can't just compare compressed output byte-for-byte - there are multiple valid DEFLATE streams for the same input.
Solution:
- Test that output decompresses correctly (requires external tool)
- Test structural properties (gzip header, valid bit patterns)
- Compare compression ratios across levels
4. Fixed-Size Buffers in Zig 0.15
A fixed std.Io.Writer tracks its position in its end field rather than exposing a fixedBufferStream-style getPos() - easy to miss when migrating.
Solution: Slice the output with fixed_writer.end. Scanning for the last non-zero byte is tempting but wrong: valid gzip output can end in zero bytes (the uncompressed-size footer usually does for small inputs).
Future Improvements
1. Streaming API
Current API requires loading full output into memory. Add:
pub const StreamingCompressor = struct {
pub fn writeChunk(self: *Self, data: []const u8) !void;
pub fn finish(self: *Self) !void;
};
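Hypothetical usage (all names here belong to the proposed API, not the current one):
var sc = try StreamingCompressor.init(&output_writer, .{ .level = .default });
while (try source.nextChunk()) |chunk| {
    try sc.writeChunk(chunk); // Compress incrementally as data arrives
}
try sc.finish(); // Emit the final block and container footer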
2. Parallel Compression
Large files could be split into independent blocks compressed in parallel:
// Split input into 1MB chunks
// Compress each chunk on separate thread
// Concatenate results with DEFLATE block headers
3. Better Hash Function
Current hash is simple but has collisions. Consider:
- xxHash for better distribution
- Faster multiplication schemes
- Cache-optimized access patterns
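A sketch of a two-multiply variant in the xxHash spirit (hypothetical; its distribution would need measuring before swapping it in):
fn hash2(b: *const [4]u8) u32 {
    var v = std.mem.readInt(u32, b, .big);
    v *%= 0x9E3779B1; // Same golden-ratio prime as today
    v ^= v >> 15; // Extra avalanche step before the second multiply
    v *%= 0x85EBCA77; // xxHash32's second prime
    return v >> 17; // Keep 15 bits to index the 32K-entry head table
}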
4. SIMD Matching
Use AVX2-width vectors to compare 32 bytes at once during match verification (a sketch using Zig's builtin vectors):
// Compare two 32-byte chunks, then count the equal leading bytes
const V = @Vector(32, u8);
const a: V = prev_data[0..32].*;
const b: V = curr_data[0..32].*;
const mask: u32 = @bitCast(a == b); // Bit i set when byte i matches
const match_len = @ctz(~mask); // Number of leading equal bytes
Conclusion
Building Comprezz was an exercise in:
- Classical algorithms - LZ77 and Huffman coding are 40+ years old but still relevant
- Modern systems programming - Zig's safety + performance enabled clean implementation
- API evolution - Adapting to Zig 0.15's controversial but pragmatic I/O redesign
The complete 1700-line implementation demonstrates that you don't need massive frameworks to build production-ready compression - just careful algorithm engineering and attention to detail.
Try it yourself:
# Add to your project
$ zig fetch --save git+https://github.com/bkataru/comprezz.git
# Or use the CLI
$ git clone https://github.com/bkataru/comprezz.git
$ cd comprezz
$ zig build
$ ./zig-out/bin/comprezz your-file.txt -o compressed.gz
The journey from understanding DEFLATE on paper to shipping a working implementation taught me that compression isn't magic - it's just clever bookkeeping with bits and bytes. And sometimes, that's the best kind of magic.
