François Gauthier

Beyond LZ4 Limits: Logging at High Speed with On-the-Fly Compression

This is an introduction; the full article is linked at the bottom.

TL;DR
We built Loggr: a tiny (170 KB, no external dependencies) native C logging library that preprocesses, batches, and compresses logs at line rate. On a Lenovo P14s developer laptop (Ryzen 5 Pro, NVMe) it processed 250,000,000 synthetic web-style logs in 11.52 seconds (21.71 million logs/second), achieving roughly 5× end-to-end (to-disk) compression (preprocessing + LZ4) while keeping RAM usage low and losing zero logs. This article explains the architecture, test methodology, exact parameters, benchmark data, limitations, and how to reproduce the tests.

How We Achieved On-the-Fly 5× Log Compression Where LZ4 Alone Fails

The preprocessing trick that lets fast compression algorithms achieve heavy compression ratios

The Problem

Most logging systems assume cloud-era resources: unlimited CPU, RAM, and cheap storage. But what if you're running edge computing, IoT devices, or just want to keep cloud bills under control?

We started with a simple question: how many logs can you realistically process on consumer hardware before hitting a wall?

The Breakthrough

Instead of throwing raw logs at LZ4, we preprocess them first - transforming them into a low-entropy format that compressors love.

Key innovations:

  • Smart preprocessing reduces entropy before compression (a sketch follows this list)
  • Lock-free queues handle 21M+ logs/sec without contention
  • Batch compression finds longer patterns for better ratios
  • Temporal caching leverages natural log patterns
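
To make the entropy-reduction idea concrete, here is a minimal sketch in C. It is not Loggr's actual transform (that isn't published in this introduction); it assumes a fixed "epoch_ms METHOD /path STATUS" web-log layout, collapses the repetitive method field to a dictionary token, delta-encodes timestamps as varints, and leaves paths as raw bytes for LZ4's match-finder to deduplicate within a batch:

```c
/* Illustrative preprocessing sketch: NOT Loggr's actual transform.
 * Assumes a fixed "epoch_ms METHOD /path STATUS" synthetic web-log
 * layout; the caller guarantees `out` is large enough. */
#include <stdint.h>
#include <string.h>

static const char *METHODS[] = { "GET", "POST", "PUT", "DELETE" };

/* A repetitive field collapses to a 1-byte dictionary token. */
static uint8_t method_token(const char *m) {
    for (uint8_t i = 0; i < 4; i++)
        if (strcmp(m, METHODS[i]) == 0) return i;
    return 0xFF;                        /* unknown: caller falls back to raw text */
}

/* Timestamps arrive nearly sorted, so a varint delta replaces a
 * 13-digit epoch string, and consecutive records become near-identical
 * byte runs. */
static size_t encode_record(uint8_t *out, uint64_t ts_ms, uint64_t prev_ts_ms,
                            const char *method, const char *path,
                            uint16_t status) {
    size_t n = 0;
    uint64_t delta = ts_ms - prev_ts_ms;
    do {                                /* LEB128-style varint */
        uint8_t b = delta & 0x7F;
        delta >>= 7;
        out[n++] = (uint8_t)(b | (delta ? 0x80 : 0));
    } while (delta);
    out[n++] = method_token(method);
    out[n++] = (uint8_t)(status >> 8);  /* status as 2 raw bytes */
    out[n++] = (uint8_t)(status & 0xFF);
    size_t plen = strlen(path);         /* paths repeat across a batch;
                                           LZ4 finds those matches */
    memcpy(out + n, path, plen + 1);    /* keep the NUL as a separator */
    return n + plen + 1;
}
```

The point is only that the transformed records share long, repeated byte runs - exactly what a fast LZ77-family compressor like LZ4 needs to reach heavy ratios.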

The Numbers Don't Lie

Tested on a stock Lenovo P14s (Ryzen 5 Pro, NVMe SSD, 96GB RAM)

250 Million Logs - Multiple Configurations
6 threads, 2MB batches:
✅ 250M logs in 11.52 seconds
✅ 21.71 million logs/second (end-to-end, to disk)
✅ 5:1 compression ratio
✅ 105MB RAM footprint (stable)
✅ 0 lost logs

1 thread, 500KB batch (economy mode):
✅ 250M logs in 29.52 seconds
✅ 8.47 million logs/second (end-to-end, to disk)
✅ 4.6:1 compression ratio
✅ 16MB RAM footprint (stable)
✅ 0 lost logs
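
As a sanity check, the rates follow directly from the counts: 250,000,000 / 11.52 s ≈ 21.70 million logs/second and 250,000,000 / 29.52 s ≈ 8.47 million logs/second, matching the figures above.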

How It Works

The architecture centers on ring buffers and lock-free queues.
Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Storage
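
Loggr's own queue implementation isn't shown in this introduction, but a minimal single-producer/single-consumer ring buffer in C11 atomics illustrates the lock-free hand-off the diagram describes (QCAP and all names here are assumptions for the sketch):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 4096u                     /* power of two: indexes wrap by masking */

typedef struct {
    void *slots[QCAP];
    _Atomic size_t head;               /* consumer position (only consumer writes) */
    _Atomic size_t tail;               /* producer position (only producer writes) */
} spsc_queue;

/* Producer side: returns false when the ring is full, so the caller
 * retries or applies backpressure instead of dropping the log. */
static bool spsc_push(spsc_queue *q, void *record) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QCAP) return false;          /* full */
    q->slots[tail & (QCAP - 1)] = record;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL when empty. */
static void *spsc_pop(spsc_queue *q) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return NULL;                  /* empty */
    void *record = q->slots[head & (QCAP - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return record;
}
```

A multi-producer variant needs a CAS on the tail, but the SPSC case shows why there is no contention: producer and consumer each write only their own index.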

Core architecture:

  • 170KB C DLL - zero dependencies
  • AVX2-optimized code paths
  • Highly configurable memory footprint (20MB to GB+)
  • Live telemetry and atomic sequencing
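
The batch-then-compress stage can be sketched against the public LZ4 API. LZ4_compressBound and LZ4_compress_default are real lz4.h functions; the batch struct, the length-prefixed framing, and the 2 MB constant (mirroring the 6-thread configuration above) are assumptions for illustration:

```c
/* Sketch of a batch builder feeding LZ4: the framing and struct are
 * illustrative, not Loggr's on-disk format. The caller allocates
 * b->buf with BATCH_BYTES capacity and flushes the tail batch on
 * shutdown. */
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_BYTES (2 * 1024 * 1024)  /* 2 MB, as in the 6-thread run */

typedef struct {
    char  *buf;                        /* accumulates preprocessed records */
    size_t used;
} batch;

static void flush_batch(batch *b, FILE *out) {
    if (b->used == 0) return;
    int   bound = LZ4_compressBound((int)b->used);
    char *dst   = malloc((size_t)bound);
    if (!dst) return;
    int n = LZ4_compress_default(b->buf, dst, (int)b->used, bound);
    if (n > 0) {                       /* length-prefixed compressed frame */
        fwrite(&n, sizeof n, 1, out);
        fwrite(dst, 1, (size_t)n, out);
    }
    free(dst);
    b->used = 0;
}

/* Compressing whole batches instead of single lines gives LZ4 long
 * runs of repetitive records to match against, which is where the
 * longer patterns and higher ratios come from. */
static void batch_append(batch *b, FILE *out, const void *rec, size_t len) {
    if (b->used + len > BATCH_BYTES)
        flush_batch(b, out);
    memcpy(b->buf + b->used, rec, len);
    b->used += len;
}
```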

Real-World Impact

  • Edge computing: Full logging on resource-constrained devices
  • Cost reduction: 80%+ savings on storage and egress fees
  • High-throughput systems: Maintain detailed logs without I/O bottlenecks
  • Security: Complete audit trails with minimal resources

The Bottom Line

This isn't just about faster compression - it's about rethinking logging as a data optimization problem. By moving intelligence upstream, we can handle orders of magnitude more data on the same hardware.

For the complete technical deep-dive with full benchmark methodology and API documentation, check out the full article on Medium:

Loggr: Processing 250M Logs in 11.5s on a Laptop with On-the-Fly 5× Compression


What logging challenges are you facing with your high-throughput applications?
