DEV Community

Philip McClarence

TimescaleDB Compression: A Complete Guide to 95%+ Storage Reduction

If you run time-series workloads on PostgreSQL, storage costs are probably your biggest operational headache. Raw heap tables grow relentlessly, and traditional approaches like partitioning or archiving only delay the inevitable. TimescaleDB's native columnstore compression offers a fundamentally different answer: keep all your data online, queryable, and compressed at ratios of 10-20x.

This guide breaks down exactly how it works, how to configure it, and what real-world results look like.

How Columnstore Compression Works Under the Hood

TimescaleDB compression transforms PostgreSQL's row-oriented heap storage into a column-oriented format. Rather than storing entire rows sequentially, it groups values by column and applies encoding strategies optimized for each data type:

  • Delta-of-delta encoding for timestamps. When your sensor reports every 10 seconds, the intervals are predictable. Only deviations from the expected pattern need storage, which means regular timestamps compress to almost nothing.
  • Gorilla encoding for floating-point values. Consecutive readings that change gradually -- temperature drifting from 20.1 to 20.3 degrees -- encode in just a few bits per value rather than the full 8 bytes.
  • Dictionary encoding for low-cardinality columns. Repeated values like status codes, device types, or region identifiers are replaced with compact lookup references.
  • LZ compression as a final pass over whatever the type-specific encoders produce.

The combination is devastatingly effective for typical time-series data: regular timestamps, slowly-changing measurements, and repeated categorical values.
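As a running example, here is a minimal schema that these encoders handle well. The table and column names are assumptions chosen to match the snippets later in this guide, not a required layout:

```sql
-- Hypothetical IoT schema; each column maps to one of the encoders above.
CREATE TABLE sensor_readings (
    reading_timestamp TIMESTAMPTZ NOT NULL,  -- regular intervals: delta-of-delta
    device_id         TEXT        NOT NULL,  -- repeated values: dictionary encoding
    temperature       DOUBLE PRECISION,      -- slow-changing float: Gorilla encoding
    humidity          DOUBLE PRECISION,
    battery_voltage   DOUBLE PRECISION
);

-- Convert to a hypertable with 1-day chunks (compression operates per chunk).
SELECT create_hypertable('sensor_readings', 'reading_timestamp',
                         chunk_time_interval => INTERVAL '1 day');
```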

The Two Configuration Decisions: segmentby and orderby

Getting compression right requires understanding two settings that control how data is organized before encoding.

segmentby -- Grouping Rows into Segments

The segmentby column determines how rows are grouped within each compressed chunk. Each unique value creates a separate segment, and queries that filter on this column decompress only the matching segments.

```sql
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'reading_timestamp DESC'
);
```

The ideal segmentby column has moderate cardinality -- roughly 100 to 10,000 unique values per chunk. Too few unique values means segments are too large (poor query selectivity). Too many means segments are too small (insufficient data for encoders to find patterns).

For an IoT table with 1,000 devices and 1-day chunks containing 8.6 million rows, each segment holds roughly 8,600 rows -- enough for the column encoders to work effectively.
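To sanity-check whether a candidate segmentby column falls in that cardinality range, a quick count over one chunk's worth of data is enough. A sketch, assuming the schema above and 1-day chunks (adjust the interval to your chunk size):

```sql
-- Distinct segmentby values and average rows per segment for one day of data.
SELECT count(DISTINCT device_id)            AS segments,
       count(*) / count(DISTINCT device_id) AS avg_rows_per_segment
FROM sensor_readings
WHERE reading_timestamp >= now() - INTERVAL '1 day';
```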

orderby -- Sorting Within Segments

The orderby column controls the physical sort order inside each segment. Almost always, this should be your time column in descending order. This enables:

  • Efficient range scans on recent data, the most common query pattern
  • Improved delta encoding on the timestamp column itself
  • Implicit ordering that can eliminate explicit ORDER BY in queries
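For example, a latest-N query on a single device lines up with both settings: the segmentby filter selects one segment, and the descending orderby means rows come back already in the requested order. The device identifier here is hypothetical:

```sql
-- Most recent 100 readings for one device. Matches segmentby + orderby,
-- so only one segment is decompressed and no extra sort is needed.
SELECT reading_timestamp, temperature
FROM sensor_readings
WHERE device_id = 'device-42'
ORDER BY reading_timestamp DESC
LIMIT 100;
```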

Adding an Automated Compression Policy

Configuring compression settings only declares how data should be compressed. To automate when it happens, you need a policy:

```sql
SELECT add_compression_policy('sensor_readings',
    compress_after => INTERVAL '1 hour');
```

The compress_after interval controls how long chunks stay in uncompressed row format before the background worker compresses them. Shorter intervals save more space; longer intervals keep recent data in row format for faster point writes and updates.

For most workloads, one hour is a solid starting point. If your workload leans on low-latency point reads, updates, or upserts against very recent data, consider extending to 4 hours or even 1 day so those operations hit uncompressed row-format chunks.
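A policy only compresses chunks on its schedule going forward; chunks that are already past the threshold can be compressed immediately by hand. A sketch using TimescaleDB's `show_chunks` and `compress_chunk` functions:

```sql
-- Compress every chunk older than 1 hour that the policy hasn't reached yet.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('sensor_readings', older_than => INTERVAL '1 hour') AS c;
```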

What Real-World Compression Looks Like

Against a test dataset of 1 million IoT sensor readings (temperature, humidity, pressure, battery voltage, signal strength) across 100 devices with 1-day chunks:

| Metric | Value |
| --- | --- |
| Before compression | ~120 MB |
| After compression | ~8 MB |
| Compression ratio | 15x |
| Space reduction | 93% |

Float columns compress best because Gorilla encoding exploits their slow-changing nature. The timestamp column compresses to near-zero due to its regular 1-minute intervals.
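You can verify ratios like these on your own tables with `hypertable_compression_stats` (available in TimescaleDB 2.x; exact column names may vary slightly across versions):

```sql
-- Before/after sizes and the overall compression ratio for the hypertable.
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after,
       round(before_compression_total_bytes::numeric
             / nullif(after_compression_total_bytes, 0), 1) AS ratio
FROM hypertable_compression_stats('sensor_readings');
```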

Query Performance on Compressed Data

A common worry is that compression trades query speed for storage savings. In practice, analytical queries on compressed data often run faster than on uncompressed data.

```sql
SELECT device_id, AVG(temperature), COUNT(*)
FROM sensor_readings
WHERE reading_timestamp >= now() - INTERVAL '7 days'
GROUP BY device_id;
```

TimescaleDB's ColumnarScan operator reads only the columns referenced by the query. For a SELECT AVG(temperature) on a 7-column table, the engine touches roughly 14% of the data compared to a full heap scan. Point queries filtered on the segmentby column decompress only the matching segment -- one segment of 8,600 rows instead of scanning millions.
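To confirm the planner is actually using a columnar access path, run the query under EXPLAIN. On compressed chunks you should see a custom scan node in the plan (named DecompressChunk in most 2.x releases, ColumnarScan in newer hypercore-enabled builds):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT device_id, AVG(temperature), COUNT(*)
FROM sensor_readings
WHERE reading_timestamp >= now() - INTERVAL '7 days'
GROUP BY device_id;
```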

Common Pitfalls to Avoid

Inserting into compressed chunks. TimescaleDB supports DML on compressed data, but each operation triggers a decompress-modify-recompress cycle on the affected segment. Keep your active chunk uncompressed by setting compress_after to at least one chunk interval.
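If you do need to backfill a batch into an already-compressed range, decompressing the affected chunks first avoids per-row recompression churn. A sketch using `decompress_chunk`; the time window is illustrative:

```sql
-- Decompress the chunks covering the backfill window, run the inserts,
-- then let the policy (or a manual compress_chunk) recompress them.
SELECT decompress_chunk(c)
FROM show_chunks('sensor_readings',
                 older_than => INTERVAL '2 days',
                 newer_than => INTERVAL '3 days') AS c;
```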

Schema changes on compressed tables. Adding or removing columns requires decompressing all chunks first. Plan schema migrations during maintenance windows and budget time for the decompression step.

Missing compression policy. This is the most common mistake. Running ALTER TABLE ... SET (timescaledb.compress) configures compression but does not schedule it. Without add_compression_policy(), nothing is ever compressed automatically.
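A quick way to confirm a policy is actually scheduled is to check the jobs views (these informational views exist in TimescaleDB 2.x):

```sql
-- Lists the compression policy job, its schedule, and last run status.
SELECT j.job_id, j.schedule_interval, s.last_run_status
FROM timescaledb_information.jobs j
LEFT JOIN timescaledb_information.job_stats s USING (job_id)
WHERE j.proc_name = 'policy_compression'
  AND j.hypertable_name = 'sensor_readings';
```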

Storage Management as a Pipeline

The most effective TimescaleDB storage strategy combines three automated policies:

  1. Compression (add_compression_policy) to shrink data 10-20x after it ages past the active write window
  2. Retention (add_retention_policy) to drop chunks beyond your retention horizon -- compressed chunk drops are instant with no decompression required
  3. Monitoring to catch compression ratio degradation, policy failures, or uncompressed chunk backlogs before they become emergencies
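Wired together, the three policies above are only a few statements. The intervals here are illustrative starting points, not recommendations for every workload:

```sql
-- 1. Compress chunks once they age past the active write window.
SELECT add_compression_policy('sensor_readings',
    compress_after => INTERVAL '1 hour', if_not_exists => true);

-- 2. Drop chunks entirely once they pass the retention horizon.
SELECT add_retention_policy('sensor_readings',
    drop_after => INTERVAL '90 days', if_not_exists => true);

-- 3. Monitor: alert if the overall compression ratio drops unexpectedly.
SELECT round(before_compression_total_bytes::numeric
             / nullif(after_compression_total_bytes, 0), 1) AS ratio
FROM hypertable_compression_stats('sensor_readings');
```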

When all three are working together, storage management becomes a hands-off pipeline rather than a manual chore. Monitor your compression ratios continuously -- a sudden drop often signals a schema change, a new high-cardinality column, or a misconfigured segmentby that deserves attention.
