You turned on TimescaleDB compression expecting dramatic storage savings. The documentation promises 10-20x ratios. But your hypertable shows 2x -- or, in the worst case, the compressed data is actually larger than the original. What went wrong?
In nearly every case, the answer is one configuration decision: the segmentby column.
## How segmentby Controls Compression Quality
TimescaleDB compression works by grouping rows into segments based on the value of the segmentby column. Within each segment, column-specific encoders (gorilla for floats, delta-of-delta for timestamps, dictionary for low-cardinality strings) compress the data by exploiting patterns.
The key principle: more rows per segment means better compression. The encoders need a meaningful number of data points to find patterns and achieve high ratios.
Consider a chunk with 1 million rows:
| segmentby cardinality | Rows per segment | Expected compression ratio |
|---|---|---|
| 1,000 unique values | 1,000 rows/segment | 10-20x |
| 50 unique values | 20,000 rows/segment | 8-15x |
| 1,000,000 unique values | 1 row/segment | 0.2x (data grows!) |
When every row has a unique segmentby value, each segment contains a single row. There is nothing for the encoder to compress, and the per-segment metadata overhead actually increases total storage. A ratio below 1.0 means compression is making your data bigger.
## Detecting Bad Compression Ratios
Check your current compression effectiveness:
```sql
SELECT
    pg_size_pretty(before_compression_total_bytes) AS before_size,
    pg_size_pretty(after_compression_total_bytes) AS after_size,
    round(
        before_compression_total_bytes::numeric
        / NULLIF(after_compression_total_bytes, 0), 1
    ) AS compression_ratio,
    CASE
        WHEN before_compression_total_bytes::numeric
             / NULLIF(after_compression_total_bytes, 0) < 2
        THEN 'POOR -- review segmentby'
        ELSE 'OK'
    END AS verdict
FROM hypertable_compression_stats('your_hypertable');
```
Any ratio below 3x warrants investigation. Below 2x is actively wasteful.
Next, identify your current segmentby column and check its cardinality:
```sql
-- What is the segmentby column?
SELECT attname, segmentby_column_index
FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'your_table'
  AND segmentby_column_index IS NOT NULL;

-- How many unique values does it have per chunk?
SELECT COUNT(DISTINCT your_segmentby_column) AS unique_values
FROM your_table
WHERE time_column >= now() - INTERVAL '1 day';
```
If unique values per chunk exceed 10,000, the segmentby cardinality is too high for effective compression.
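To see how cardinality trends over time rather than for a single day, you can approximate per-chunk figures by bucketing at the chunk interval. This is a sketch assuming daily chunks; `your_table`, `time_column`, and `your_segmentby_column` are the same placeholders as above:

```sql
-- Approximate per-chunk stats, assuming a 1-day chunk interval
SELECT
    time_bucket('1 day', time_column) AS bucket,
    COUNT(DISTINCT your_segmentby_column) AS unique_values,
    COUNT(*) AS total_rows,
    COUNT(*) / GREATEST(COUNT(DISTINCT your_segmentby_column), 1)
        AS avg_rows_per_segment
FROM your_table
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 7;
```

If `avg_rows_per_segment` is in the single digits, the segmentby column is the likely culprit.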
## The Three Compression Killers
### 1. Unique identifiers as segmentby
This is the worst possible choice. Using a UUID, auto-increment primary key, or any per-row-unique column creates one segment per row.
```sql
-- This produces terrible compression
ALTER TABLE readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'reading_id'  -- unique per row!
);
```
TimescaleDB 2.24+ will actually warn you during compression:
```text
WARNING: poor compression ratio detected for chunk "_hyper_173_14755_chunk"
DETAIL: compression ratio: 0.18. Size before: 133 MB. After: 723 MB
HINT: Changing compression settings can improve compression rate
```
A ratio of 0.18 means the compressed data is 5.4x larger than the original.
### 2. High-cardinality natural keys
Less obviously bad: columns that seem like reasonable partition keys but have too many unique values relative to your chunk size.
```sql
-- Seems reasonable, but 100K+ users with 1M rows/chunk
-- means only ~10 rows per segment
ALTER TABLE events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'user_id'
);
```
Ten rows per segment is not enough data for the encoders to find meaningful patterns.
### 3. No segmentby at all
Omitting segmentby creates one giant segment per chunk. The compression ratio may be acceptable, but every query that filters on any column must decompress the entire chunk to find matching rows. This defeats one of compression's most valuable features: selective decompression, where only segments matching your filter criteria are decompressed.
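As a sketch of what selective decompression buys you (assuming a `readings` hypertable segmented by `device_id`):

```sql
-- With segmentby = 'device_id', only the segments whose
-- device_id matches are decompressed:
SELECT time, value
FROM readings
WHERE device_id = 42
  AND time >= now() - INTERVAL '1 hour';

-- Without a segmentby column, the same query must decompress
-- every row in each qualifying chunk before filtering.
```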
## Fixing Bad Compression
### Step 1: Choose a better segmentby
Target a column with 100-10,000 unique values per chunk. Good candidates include:
- `device_id` or `sensor_id` (IoT workloads)
- `host_id` or `server_id` (infrastructure monitoring)
- `tenant_id` (multi-tenant applications)
- `region` or `datacenter` (geographic partitioning)
### Step 2: Decompress, reconfigure, recompress
```sql
-- Decompress all chunks
SELECT decompress_chunk(chunk_schema || '.' || chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_table' AND is_compressed;

-- Change compression settings
ALTER TABLE your_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'timestamp DESC'
);

-- Recompress
SELECT compress_chunk(chunk_schema || '.' || chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_table' AND NOT is_compressed;
```
Important: Decompression requires enough free disk space to hold the uncompressed data temporarily. For very large tables, process one chunk at a time.
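One way to process chunks individually is a PL/pgSQL loop that decompresses and immediately recompresses each chunk, so only one chunk's worth of uncompressed data exists at any moment. This is a sketch: it assumes the new compression settings have already been applied, which recent TimescaleDB versions allow while chunks are still compressed; older versions may require decompressing everything before the settings change.

```sql
DO $$
DECLARE
    chunk regclass;
BEGIN
    FOR chunk IN
        SELECT format('%I.%I', chunk_schema, chunk_name)::regclass
        FROM timescaledb_information.chunks
        WHERE hypertable_name = 'your_table' AND is_compressed
    LOOP
        PERFORM decompress_chunk(chunk);
        -- Recompress right away with the new settings, freeing
        -- the uncompressed space before moving to the next chunk
        PERFORM compress_chunk(chunk);
        RAISE NOTICE 'recompressed %', chunk;
    END LOOP;
END $$;
```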
### Step 3: Verify the improvement
The difference can be dramatic. In testing with identical data:
| Configuration | Ratio | Result |
|---|---|---|
| segmentby = reading_id (unique per row) | 0.18x | 5.4x LARGER |
| segmentby = signal_strength (~50 values) | 3-5x | Mediocre |
| segmentby = device_id (1,000 values) | 10-20x | Excellent |
## Preventing the Problem on New Tables
Before enabling compression on any hypertable, always check the cardinality of your candidate segmentby column against one chunk's worth of data:
```sql
SELECT COUNT(DISTINCT candidate_column) AS cardinality
FROM your_table
WHERE time_column >= now() - INTERVAL '1 day';
```
Target the 100-10,000 range. If your natural partition key has too many unique values, consider a derived column. For example, user_id % 1000 creates 1,000 buckets from millions of users, giving the compression encoders enough rows per segment to work effectively.
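One way to materialize such a bucket is a generated column. This is a sketch assuming a numeric `user_id` on an `events` hypertable; pick a modulus that lands in the 100-10,000 range for your chunk size, and if your TimescaleDB version does not accept a generated column as a segmentby target, populate a plain column at insert time instead:

```sql
-- Derived bucket: 1,000 segments regardless of user count
ALTER TABLE events
    ADD COLUMN user_bucket int
    GENERATED ALWAYS AS (user_id % 1000) STORED;

ALTER TABLE events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'user_bucket',
    -- including user_id in orderby keeps each user's rows
    -- adjacent within a segment, which helps the encoders
    timescaledb.compress_orderby = 'time DESC, user_id'
);
```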
Also be aware that compression quality can degrade over time. A table that compressed well at launch may deteriorate as data patterns change, new columns are added, or the cardinality of the segmentby column shifts. Periodic monitoring of compression ratios -- even a monthly check of hypertable_compression_stats() -- catches degradation before storage costs spiral.