Most developers reach for gzip or zstd and move on. I wanted to understand why they work — and whether I could do better. Two years later, MaxCompression compresses English text better than PAQ8l, one of the legendary compressors in the field.
## The numbers
On alice29.txt (152 KB of English text from the Canterbury corpus):
| Compressor | Compressed size | Ratio |
|---|---|---|
| gzip -9 | 54,179 bytes | 2.81× |
| bzip2 -9 | 43,202 bytes | 3.52× |
| xz -9 | 48,492 bytes | 3.14× |
| MaxCompression L28 | 35,497 bytes | 4.28× |
| PAQ8l | ~35,500 bytes | ~4.28× |
On the full Silesia corpus (202 MB of mixed real-world data), MaxCompression's automatic mode achieves 4.35× overall — beating bzip2 on all 12 files and xz on 9 out of 12.
## How it works
MaxCompression isn't one algorithm. It's five compression engines under a unified API:
- LZ77 (Levels 1–9) — fast, for general use
- BWT + multi-table rANS (L10–L14) — Burrows-Wheeler with arithmetic coding
- Smart Mode (L20) — automatically picks the best strategy per block
- LZRC (L24–L26) — LZ with range coder for high-ratio binary compression
- Context Mixing (L28) — the crown jewel
## Context mixing: the deep end
Context mixing is the technique used by the world's best compressors (cmix, PAQ8px, ZPAQ). The idea is simple: instead of using one model to predict the next byte, you use dozens of models and combine their predictions.
MaxCompression's CM engine uses:
- 58 context models — order-0 through order-14, word contexts, sparse match, indirect contexts, bigram frequency, character class N-grams, sentence position, and more
- 8 neural mixers — logit-space weighted averages that learn which models to trust
- 3-stage APM cascade — Adaptive Probability Maps that refine the final prediction
- Cross-term feature — a novel nonlinear combination of mixer outputs that gave a surprising -57 byte improvement
Each bit of the file is predicted using all 58 models. Their predictions are combined by the mixers, refined by the APM cascade, and fed to an arithmetic coder. The whole thing adapts as it reads the file.
## What worked
- StateMap for match prediction — replacing a hardcoded log-confidence formula with a learned 64K StateMap gave -45 bytes instantly. Lesson: let the data learn what you think you can hardcode.
- Sequential split K-means for Huffman table initialization — this single change was the breakthrough that let BWT mode finally beat bzip2.
- Mixer weight distribution 7:1:1:2:1:4:2:8 — the first mixer (4096-entry, high resolution) gets the most weight, but the coarser mixers still help on transitions.
## What didn't work
- Neural network mixer (4 hidden neurons): +1,285 bytes worse. Not enough data in 152 KB for the network to converge.
- LSTM cell: +84 bytes. Without proper backpropagation through time, it's just adding noise.
- 9th mixer: +574 bytes. Too sparse, too many parameters for the data to fill.
- Bracket depth model, dialog model, trigram frequency: all marginal or negative. The existing 58 models already capture these patterns.
The architecture has reached a plateau. Closing the gap to ZPAQ (31,200 bytes) or cmix (27,370 bytes) would require LSTM/transformer-based mixing — a fundamentally different approach, not parameter tuning.
## Engineering for production
MaxCompression isn't a research toy. It's built for real use:
- Portable C99 — no dependencies, compiles everywhere
- 21 test suites — unit, roundtrip, fuzz, stress, regression, streaming, edge cases
- CI on every push — Linux (GCC + Clang), macOS, Windows, Valgrind, WASM
- Memory-safe — Valgrind memcheck with zero leaks in CI
- Prebuilt binaries — download and run on Linux, macOS, or Windows
- Python and Rust bindings — `mcx_compress()` from any language
- 30+ CLI commands — compress, decompress, bench, stat, diff, verify, hash, pipe...
```c
#include <stdlib.h>
#include <maxcomp/maxcomp.h>

// Compress
size_t bound = mcx_compress_bound(src_size);
uint8_t* dst = malloc(bound);
size_t compressed = mcx_compress(dst, bound, src, src_size, 20);

// Decompress
mcx_frame_info info;
mcx_get_frame_info(&info, dst, compressed);
uint8_t* out = malloc(info.original_size);
size_t decompressed = mcx_decompress(out, info.original_size, dst, compressed);

free(out);
free(dst);
```
## The ranking
On alice29.txt, MaxCompression ranks approximately #4 worldwide among single-file compressors:
1. cmix — ~27,370 bytes (5.56×)
2. PAQ8px — ~29,370 bytes (5.18×)
3. ZPAQ — ~31,200 bytes (4.87×)
4. MaxCompression — 35,497 bytes (4.28×)
5. PAQ8l — ~35,500 bytes (~4.28×)
The top 3 use LSTM/transformer-based approaches. MaxCompression achieves its result with classical context mixing only — no neural networks beyond the logit-space mixers.
## Try it
```bash
# Build from source
git clone https://github.com/SamDreamsMaker/Max-Compression.git
cd Max-Compression
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Compress a file
./build/bin/mcx compress -l 20 myfile.txt

# Benchmark against system compressors
./build/bin/mcx bench --compare myfile.txt
```
Or download prebuilt binaries from the latest release.
GitHub: https://github.com/SamDreamsMaker/Max-Compression
MaxCompression is GPL-3.0 licensed and developed by Dreams-Makers Studio. Feedback, issues, and contributions welcome.
I'm also building TaleForge, a free creative writing platform. Check it out if you write fiction, manga, or screenplays.