The Problem That Shouldn't Exist
I was working on an IoT project: temperature sensors reporting a value every second. Simple stuff:
22.5, 22.5, 22.6, 22.5, 22.5, 22.6, 22.6, 22.5, ...
Each value takes 8 bytes as a float64. At one reading per second, that's 8 × 86,400 = 691,200 bytes, or about 691 KB per sensor per day.
"No problem," I thought. "Let's compress it with gzip."
Result: ~400 KB. That's only a 42% size reduction, or about 1.7x.
Wait, what? The data is obviously redundant. A human can see the pattern instantly. Why can't gzip?
Shannon Was Right (Of Course)
Claude Shannon proved in 1948 that lossless compression can't, on average, go below the entropy of the source. Period. No exceptions.
But here's the key insight Shannon himself noted: entropy depends on your model of the data.
If gzip treats 22.5, 22.5, 22.6 as arbitrary bytes, it sees one entropy. But if we model it as "temperature sensor with 0.1°C precision, typically stable", the entropy is much lower.
The trick isn't violating Shannon's theorem. It's building a better model.
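To make the "entropy depends on your model" point concrete, here is a minimal sketch that measures the same eight readings under two models. It only computes order-0 empirical entropy (each symbol treated independently), which is much cruder than what gzip or ALEC actually do, and the function name `entropy_bits` is my own for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Order-0 empirical Shannon entropy, in bits per symbol.
fn entropy_bits<T: Eq + Hash>(symbols: &[T]) -> f64 {
    let mut counts: HashMap<&T, usize> = HashMap::new();
    for s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    let n = symbols.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    let readings = [22.5_f64, 22.5, 22.6, 22.5, 22.5, 22.6, 22.6, 22.5];

    // Model 1: the stream is just bytes, as a generic compressor sees it.
    let bytes: Vec<u8> = readings.iter().flat_map(|v| v.to_le_bytes()).collect();

    // Model 2: readings are multiples of 0.1, so each one is a single integer symbol.
    let tenths: Vec<i32> = readings.iter().map(|v| (v * 10.0).round() as i32).collect();

    let byte_model = entropy_bits(&bytes) * bytes.len() as f64;
    let value_model = entropy_bits(&tenths) * tenths.len() as f64;
    println!("byte model:  {byte_model:.1} bits for the whole sequence");
    println!("value model: {value_model:.1} bits for the whole sequence");
}
```

Same data, two very different bit counts. Nothing about Shannon's bound changed; only the alphabet we chose to describe the data with.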
The IoT Data Model
Real IoT sensor data has properties that generic compressors ignore:
- Temporal stability - Values change slowly
- Predictability - Next value ≈ current value
- Bounded range - Temperature won't jump from 22°C to 500°C
- Quantization - Sensors have finite precision (0.1°C)
An intelligent compressor that exploits these properties can dramatically outperform generic alternatives.
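If you want to check whether your own stream has these properties, a rough sketch like the following is enough. It assumes a 0.1°C sensor resolution, and the helper names (`quantize_tenths`, `unchanged_fraction`, `max_step`) are hypothetical, not part of any library.

```rust
/// Convert readings to integer tenths of a degree (assumes 0.1 °C resolution).
fn quantize_tenths(readings: &[f64]) -> Vec<i32> {
    readings.iter().map(|v| (v * 10.0).round() as i32).collect()
}

/// Fraction of readings that are identical to their predecessor.
fn unchanged_fraction(q: &[i32]) -> f64 {
    if q.len() < 2 {
        return 0.0;
    }
    let same = q.windows(2).filter(|w| w[0] == w[1]).count();
    same as f64 / (q.len() - 1) as f64
}

/// Largest absolute step between consecutive readings, in quantization units.
fn max_step(q: &[i32]) -> i32 {
    q.windows(2).map(|w| (w[1] - w[0]).abs()).max().unwrap_or(0)
}

fn main() {
    let q = quantize_tenths(&[22.5, 22.5, 22.6, 22.5, 22.5, 22.6, 22.6, 22.5]);
    println!("unchanged: {:.0}%", unchanged_fraction(&q) * 100.0);
    println!("largest step: {} tenths of a degree", max_step(&q));
}
```

A high unchanged fraction and a small maximum step are the signals that a domain-specific model has something to exploit.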
Enter ALEC
I built ALEC (Adaptive Lazy Evolving Compression) specifically for IoT data. The core ideas:
1. Delta Encoding
Instead of transmitting 22.5, transmit +0.0 (delta from last value).
Raw: 22.5 → 22.5 → 22.6 → 22.5
Delta: 22.5 → 0.0 → +0.1 → -0.1
Zero delta? That's 2 bits instead of 64.
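Here is a minimal sketch of plain delta encoding on readings already quantized to integer tenths of a degree (so 22.5 becomes 225 and a +0.1 step becomes 1). It shows the general technique only; the bit-level packing that turns a zero delta into a couple of bits is left out, and this is not ALEC's actual code.

```rust
/// The first value is sent as-is; every later value is sent as the
/// difference from its predecessor.
fn delta_encode(values: &[i32]) -> Vec<i32> {
    let mut prev = 0;
    values
        .iter()
        .map(|&v| {
            let d = v - prev;
            prev = v;
            d
        })
        .collect()
}

/// Inverse of `delta_encode`: a running sum restores the original values.
fn delta_decode(deltas: &[i32]) -> Vec<i32> {
    let mut acc = 0;
    deltas
        .iter()
        .map(|&d| {
            acc += d;
            acc
        })
        .collect()
}

fn main() {
    let raw = [225, 225, 226, 225]; // 22.5, 22.5, 22.6, 22.5 in tenths
    let deltas = delta_encode(&raw);
    println!("{deltas:?}");
    assert_eq!(delta_decode(&deltas), raw);
}
```

Working in integers matters here: subtracting floats directly would produce deltas like 0.10000000000000142 instead of a clean +0.1.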
2. Pattern Dictionary
Frequent values get short codes. After observing 22.5 appear 1000 times, it gets a 4-bit code instead of 64 bits.
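One simple way to build such a dictionary is to rank values by frequency and hand the 16 most common ones a 4-bit code. The sketch below does that; the names `build_dictionary`, `Symbol`, and `encode_value` are hypothetical, and ALEC's real dictionary may be built quite differently.

```rust
use std::collections::HashMap;

/// Map the most frequent values to small codes (0 = most frequent).
/// Capped at 16 entries so every code fits in 4 bits.
fn build_dictionary(values: &[i32], max_entries: usize) -> HashMap<i32, u8> {
    let mut counts: HashMap<i32, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let mut ranked: Vec<(i32, usize)> = counts.into_iter().collect();
    // Most frequent first; ties broken by value so the order is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked
        .into_iter()
        .take(max_entries.min(16))
        .enumerate()
        .map(|(code, (v, _))| (v, code as u8))
        .collect()
}

#[derive(Debug, PartialEq)]
enum Symbol {
    Code(u8),     // value found in the dictionary: send the short code
    Literal(i32), // value not in the dictionary: send it in full
}

fn encode_value(dict: &HashMap<i32, u8>, v: i32) -> Symbol {
    match dict.get(&v) {
        Some(&code) => Symbol::Code(code),
        None => Symbol::Literal(v),
    }
}

fn main() {
    let seen = [225, 225, 226, 225, 225, 226, 226, 225, 230];
    let dict = build_dictionary(&seen, 2);
    println!("{:?}", encode_value(&dict, 225));
    println!("{:?}", encode_value(&dict, 230));
}
```

The deterministic tie-break is the important design detail: encoder and decoder must derive exactly the same code table from the same observations.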
3. Evolving Context
Encoder and decoder maintain synchronized context that improves over time:
Week 1: "temperature=22.3°C" → 20 bytes
Week 4: [code_7][+0.3] → 3 bytes
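The synchronization idea can be sketched with a toy context that only remembers the last value. Everything here (`SketchContext`, `Message`, `encode`, `decode`) is a hypothetical stand-in I made up for illustration; it is not the alec crate's `Context` and is far simpler than a context that evolves over weeks.

```rust
/// Toy shared state: both ends remember the last value they agreed on.
#[derive(Clone, Debug, PartialEq, Default)]
struct SketchContext {
    last: Option<i32>,
}

impl SketchContext {
    fn observe(&mut self, value: i32) {
        self.last = Some(value);
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Message {
    Full(i32),  // no shared history yet: send the whole value
    Delta(i32), // shared history exists: send only the difference
}

fn encode(value: i32, ctx: &SketchContext) -> Message {
    match ctx.last {
        Some(prev) => Message::Delta(value - prev),
        None => Message::Full(value),
    }
}

fn decode(msg: Message, ctx: &SketchContext) -> Option<i32> {
    match (msg, ctx.last) {
        (Message::Full(v), _) => Some(v),
        (Message::Delta(d), Some(prev)) => Some(prev + d),
        (Message::Delta(_), None) => None, // contexts are out of sync
    }
}

fn main() {
    let mut enc_ctx = SketchContext::default();
    let mut dec_ctx = SketchContext::default();
    for value in [223, 223, 226, 225] {
        let msg = encode(value, &enc_ctx);
        enc_ctx.observe(value);
        let decoded = decode(msg, &dec_ctx).expect("contexts in sync");
        dec_ctx.observe(decoded);
        // Both sides applied the same update, so their state stays identical.
        assert_eq!(enc_ctx, dec_ctx);
        println!("{msg:?} -> {decoded}");
    }
}
```

The invariant to protect is that both sides apply the same update after every message. If one message is lost, the contexts diverge, which is why the decoder returns `None` rather than guessing.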
The Benchmark Results
I ran ALEC against gzip on real IoT data patterns. Here's what happened:
On Variable Data (SmartGrid current sensor, only 8.7% unchanged readings)
| Condition | ALEC | gzip | ALEC Advantage |
|---|---|---|---|
| Cold start | 10.9x | 5.1x | +113% |
| With preload | 22.1x | 8.0x | +177% |
The Warmup Curve
Here's where it gets interesting. ALEC dominates at every sample count:
| Samples | gzip | ALEC | Note |
|---|---|---|---|
| 10 | 0.9x | 8.0x | ALEC wins immediately |
| 100 | 2.4x | 9.2x | |
| 1000 | 6.3x | 11.4x | |
| 5000 | 7.4x | 18.2x | |
| 8640 | 8.0x | 22.0x | |
At 10 samples, gzip can't compress at all (0.9x = expansion!). ALEC achieves 8x because it understands the data model from the start.
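For readers unsure how to read these ratios: they are raw size divided by compressed size, so anything under 1.0x means the output grew. The sketch below works backwards from the ratios in the table, assuming 8-byte float64 samples as in the temperature example (the benchmark's actual sample size isn't stated here, so treat the byte counts as illustrative). The 18-byte figure is the fixed gzip framing, a 10-byte header plus an 8-byte trailer, which a 10-sample input has little room to amortize.

```rust
/// Ratio as used in the tables: raw size divided by compressed size.
fn compression_ratio(raw_bytes: usize, compressed_bytes: usize) -> f64 {
    raw_bytes as f64 / compressed_bytes as f64
}

/// Output size implied by a given raw size and ratio.
fn implied_compressed_bytes(raw_bytes: usize, ratio: f64) -> f64 {
    raw_bytes as f64 / ratio
}

/// Fixed gzip framing: 10-byte header plus 8-byte trailer (CRC32 and length).
const GZIP_FRAMING_BYTES: usize = 18;

fn main() {
    let raw = 10 * 8; // 10 samples, ASSUMING 8 bytes per sample
    let gzip_out = implied_compressed_bytes(raw, 0.9);
    let alec_out = implied_compressed_bytes(raw, 8.0);
    println!("raw: {raw} bytes");
    println!("gzip at 0.9x: about {gzip_out:.0} bytes, {GZIP_FRAMING_BYTES} of them framing");
    println!("ALEC at 8.0x: about {alec_out:.0} bytes");
    println!("check: {:.2}x", compression_ratio(raw, gzip_out.round() as usize));
}
```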
The Preload Secret
The key insight: preload eliminates warmup cost.
In production, you:
- Generate a preload file from historical data
- Ship identical preload to encoder and decoder
- Achieve near-optimal compression from byte one
Without preload, ALEC still beats gzip. With preload, it's not even close.
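The workflow above can be sketched in a few lines: derive a table from historical data, serialize it, and load the identical bytes on both ends. This is only my illustration of the idea, with a made-up format (little-endian i32 values ranked by frequency) and hypothetical function names; the real ALEC preload file format is not shown in this post.

```rust
use std::collections::HashMap;

/// Hypothetical preload: historical values ranked from most to least frequent.
fn build_preload(history: &[i32], max_entries: usize) -> Vec<i32> {
    let mut counts: HashMap<i32, usize> = HashMap::new();
    for &v in history {
        *counts.entry(v).or_insert(0) += 1;
    }
    let mut ranked: Vec<(i32, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.into_iter().take(max_entries).map(|(v, _)| v).collect()
}

/// Serialize the preload as little-endian i32 values (made-up format).
fn preload_to_bytes(preload: &[i32]) -> Vec<u8> {
    preload.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Parse a preload file; returns None if the length is not a multiple of 4.
fn preload_from_bytes(bytes: &[u8]) -> Option<Vec<i32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

fn main() {
    let history = [225, 225, 226, 225, 224, 225, 226, 225];
    let file = preload_to_bytes(&build_preload(&history, 16));

    // Encoder and decoder each load the same file before the first message.
    let encoder_side = preload_from_bytes(&file).expect("valid preload");
    let decoder_side = preload_from_bytes(&file).expect("valid preload");
    assert_eq!(encoder_side, decoder_side);
    println!("{encoder_side:?}");
}
```

In a real deployment you would probably also ship a version number or checksum with the file, so a device with a stale preload fails loudly instead of decoding garbage.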
When NOT to Use ALEC
ALEC isn't magic. It won't help with:
- Random data - No patterns to learn
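- Very short transmissions - Under 100 samples the context has barely warmed up (8.0x at 10 samples versus 22.0x at 8640 in the curve above), so most of the gain is out of reach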
- Constant data - Trivially compressed by any codec
The sweet spot: long-running IoT streams with predictable patterns.
Try It Yourself
ALEC is open source (AGPL-3.0) and available on crates.io:
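// Runs inside a function that returns a Result (note the `?` at the end);
// `data` is the reading to send, defined elsewhere.
use alec::{Encoder, Decoder, Context};
let mut encoder = Encoder::new();
let mut decoder = Decoder::new();
let mut context = Context::new();
// Encode, then let the context learn from the value just sent
let message = encoder.encode(&data, &context);
context.observe(&data);
// Decode using the same context
let decoded = decoder.decode(&message, &context)?;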
📦 Crates.io: alec
🔗 GitHub: zeekmartin/alec-codec
🌐 Website: alec-codec.com
The Bottom Line
Shannon's theorem is inviolable. But Shannon also taught us that entropy depends on our model.
Generic compressors use generic models. Domain-specific compressors use domain-specific models.
On the SmartGrid benchmark above, with preload, that difference is 22x vs 8x compression.
"Every byte counts. Everywhere."
Discussion
Have you hit bandwidth limits with IoT data? What compression approaches have you tried? Let me know in the comments!
ALEC is dual-licensed: AGPL-3.0 for open source, commercial licenses available for proprietary use.
