When you're storing time-series data at scale, storage explodes fast. We hit 150GB in raw log data within months.
Then we enabled TimescaleDB compression.
Storage dropped to 15GB. Same data. Same query performance. 90% reduction.
Here's exactly how it works, with real production numbers from Logtide - an open-source log management platform handling 500,000 logs per day.
The Storage Problem
Time-series data grows relentlessly. When you're ingesting events at scale, storage becomes a major concern.
Our numbers (Logtide production):
- 500,000 logs/day
- Average log size: ~300 bytes
- Daily data: 150MB raw
- Monthly data: 4.5GB
- Yearly projection: 54GB
Sounds manageable? Add indexes:
Reality with indexes:
- Primary key index: 2x data size
- Search indexes (GIN): 3x data size
- Timestamp B-tree: 1x data size
- Total storage multiplier: 6x
Actual monthly storage: 27GB
Yearly projection: 324GB
For self-hosted deployments, this means expensive NVMe storage or slow HDDs. For managed databases, storage costs add up quickly with backups, snapshots, and IOPS charges.
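Before reaching for compression, it's worth confirming where the space actually goes. Standard Postgres catalog functions split the footprint into heap and index bytes (shown here against our logs table; adjust the name for your schema):
-- Heap vs. index footprint for a plain Postgres table
SELECT
pg_size_pretty(pg_table_size('logs')) AS table_size,
pg_size_pretty(pg_indexes_size('logs')) AS index_size,
pg_size_pretty(pg_total_relation_size('logs')) AS total_size;
Once the table becomes a hypertable (next section), use hypertable_detailed_size('logs') instead, since the data lives in separate chunk tables.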
Enter TimescaleDB Compression
TimescaleDB is a PostgreSQL extension that adds time-series superpowers to Postgres. One killer feature: native columnar compression via the Hypercore engine.
It works differently from general-purpose compression:
Traditional row-based storage:
Row 1: [timestamp, level, message, metadata]
Row 2: [timestamp, level, message, metadata]
Row 3: [timestamp, level, message, metadata]
TimescaleDB compressed columnstore:
Compressed batch:
timestamp: [t1, t2, t3, ...] (1000 values)
level: [info, error, info, ...] (compressed)
message: [msg1, msg2, msg3, ...] (compressed)
metadata: [m1, m2, m3, ...] (compressed)
Key insight: Similar data compresses better when stored together.
- Timestamps are sequential → excellent compression
- Log levels repeat (info, error, warn) → extreme compression
- Metadata fields often duplicate → good compression
How We Implemented It (2026 Best Practices)
Step 1: Convert Table to Hypertable
First, we converted our regular PostgreSQL table into a TimescaleDB hypertable using the modern syntax:
-- Our existing table
CREATE TABLE logs (
id BIGSERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ NOT NULL,
organization_id UUID NOT NULL,
level TEXT NOT NULL,
message TEXT NOT NULL,
metadata JSONB,
source TEXT,
trace_id TEXT
);
-- Modern TimescaleDB 2.23+ syntax with columnstore enabled
CREATE TABLE logs (
timestamp TIMESTAMPTZ NOT NULL,
organization_id UUID NOT NULL,
level TEXT NOT NULL,
message TEXT NOT NULL,
metadata JSONB,
source TEXT,
trace_id TEXT
) WITH (
tsdb.hypertable
);
What this does:
- Automatically creates a hypertable
- Enables columnstore by default (new in 2.18+)
- Partitions data into 1-day chunks automatically
- Each day = new chunk
- Old chunks get compressed, new data stays in rowstore for fast writes
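If you are starting from an existing, populated table rather than creating a fresh one, the long-standing conversion path is create_hypertable(). A minimal sketch, assuming 1-day chunks and accepting the one-time cost of migrating existing rows:
-- Convert an existing table in place (migrate_data can be slow on large tables)
SELECT create_hypertable(
'logs',
'timestamp',
chunk_time_interval => INTERVAL '1 day',
migrate_data => true
);
Note that a primary key that doesn't include the partition column (like the id column in our original table) has to be dropped or extended first, which is why the recreated table above omits it.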
Step 2: Configure Columnstore Settings
With TimescaleDB 2.22+, compression settings are automatically optimized:
-- TimescaleDB 2.22+ automatically recommends best settings
-- based on table configuration and statistics
SELECT alter_columnstore(
'logs',
segmentby => ARRAY['organization_id', 'level'],
orderby => 'timestamp DESC'
);
Breaking it down:
segmentby = 'organization_id,level':
- Group data by these columns
- Each unique combination gets its own compressed segment
- We query by organization frequently → huge performance boost
- Level is low cardinality (info/error/warn/debug) → excellent compression
orderby = 'timestamp DESC':
- Within each segment, order by timestamp descending
- Matches our query pattern (newest logs first)
- Sequential timestamps compress extremely well
New in 2.22/2.23: TimescaleDB now dynamically re-evaluates these settings as data patterns change, ensuring optimal compression over time.
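If you are on an earlier 2.x release that still uses the original compression naming, the equivalent configuration is expressed through table storage parameters rather than a function call:
-- Classic compression API (pre-columnstore naming)
ALTER TABLE logs SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'organization_id, level',
timescaledb.compress_orderby = 'timestamp DESC'
);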
Step 3: Enable Compression Policy
Automation is key:
-- Compress chunks older than 7 days automatically
SELECT add_columnstore_policy(
'logs',
compress_after => INTERVAL '7 days'
);
What this does:
- Automatically compresses chunks older than 7 days
- Runs in the background via TimescaleDB's job scheduler
- Recent data (last 7 days) stays in rowstore for fast writes
- Old data moves to columnstore for space savings
Why 7 days?
- Most queries are on recent data (last 24-48 hours)
- After 7 days, logs are mostly for historical analysis
- Balances write performance vs storage savings
You can adjust based on your access patterns (a sketch for changing the policy follows the list):
- Heavy recent writes → compress after 14-30 days
- Mostly historical queries → compress after 1-3 days
- Read-heavy workload → compress after 1 day
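Changing your mind later is cheap: drop the policy and add a new one with a different interval. A minimal sketch using the classic policy-function names (newer releases expose the same thing through the columnstore policy functions used above):
-- Swap the compression policy for a longer delay
SELECT remove_compression_policy('logs');
SELECT add_compression_policy('logs', compress_after => INTERVAL '14 days');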
Our Real Production Results (2026)
We enabled compression in December 2025. Here are the actual numbers:
Storage Reduction
Before compression (November 2025):
- Raw data: 135GB
- Indexes: 85GB
- Total: 220GB
After compression (January 2026):
- Compressed columnstore: 18GB
- Sparse indexes: 7GB
- Total: 25GB
Compression ratio: 88.6% (we round to 90%)
Storage Over Time
Here's how compression progressed:
Day 0: 220GB (no compression)
Day 7: 215GB (first chunk compressed: -5GB)
Day 14: 180GB (2 weeks compressed: -40GB)
Day 30: 90GB (1 month compressed: -130GB)
Day 60: 25GB (all historical data compressed)
Storage keeps falling as more and more chunks cross the 7-day threshold and get compressed.
Query Performance
Critical question: Does compression slow down queries?
Our testing (1M log search across 30 days):
Before compression:
- Full-text search: 145ms
- Time range filter: 12ms
- Aggregation (count by level): 89ms
After compression (with SIMD vectorization):
- Full-text search: 163ms (+12%)
- Time range filter: 8ms (-33%)
- Aggregation (count by level): 52ms (-41%)
Surprising result: Aggregations got FASTER.
Why?
- Less data to scan (90% reduction)
- Columnar format is perfect for aggregations
- SIMD vectorization (new in 2.24) accelerates SUM, COUNT, AVG, MIN, MAX
- Sparse indexes skip irrelevant compressed chunks
Trade-off: Text search slightly slower (needs decompression), but still under 200ms.
Decompression Overhead
Compressed chunks need decompression during queries. CPU cost?
Benchmark: retrieving 100,000 logs from compressed chunks
SELECT * FROM logs
WHERE timestamp > NOW() - INTERVAL '30 days'
AND organization_id = 'test-org'
LIMIT 100000;
Results:
- Rowstore: 1,240ms, CPU: 15%
- Columnstore: 1,680ms, CPU: 28%
- Overhead: +440ms, +13% CPU
In practice: Negligible. Modern CPUs with SIMD decompress data faster than disks can read uncompressed data.
TimescaleDB 2024-2026: What's New
The past two years brought massive improvements to compression and analytics:
1. Chunk-Skipping Indexes (2024)
Problem: Querying compressed data by non-time columns was slow.
Solution: Min/max sparse indexes on compressed chunks.
-- Create index on level column
CREATE INDEX idx_level ON logs(level);
-- TimescaleDB 2.x automatically creates sparse index for compressed chunks
-- Query benefits from chunk exclusion:
SELECT * FROM logs WHERE level = 'error' AND timestamp > NOW() - INTERVAL '30 days';
-- Only scans chunks containing 'error' level logs
Performance gain: Up to 7x faster scans on compressed data.
2. SIMD Vectorization (2024-2025)
Problem: Aggregate queries on columnar data were CPU-intensive.
Solution: Single Instruction, Multiple Data (SIMD) processing.
When you run queries with aggregations (SUM, COUNT, AVG, MIN, MAX) grouped by segment_by columns, modern CPUs process multiple values simultaneously.
Performance gain: 2-10x faster analytical queries on compressed data.
-- This query benefits from SIMD vectorization
SELECT
organization_id,
level,
COUNT(*) as count,
AVG(LENGTH(message)) as avg_length
FROM logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY organization_id, level;
-- Runs 5-7x faster with SIMD on compressed chunks
3. UUIDv7 Compression & Partitioning (2025)
Problem: UUIDs are random and don't compress well. UUIDv4 partitioning was impossible.
Solution: TimescaleDB 2.22+ supports UUIDv7 (timestamp-based UUIDs).
-- Partition hypertable by UUIDv7 column
CREATE TABLE events (
event_id UUID DEFAULT gen_uuidv7() NOT NULL,
timestamp TIMESTAMPTZ DEFAULT NOW(),
data JSONB
) WITH (
tsdb.hypertable = '{
"partition_column": "event_id",
"partition_type": "uuidv7"
}'
);
Benefits:
- 30% better compression on UUIDv7 columns
- 2x faster queries with UUIDv7 in WHERE clauses
- Time-based partitioning without separate timestamp column
- Utility functions: uuid_timestamp(), to_uuidv7_boundary()
4. Continuous Aggregates Performance (2024)
Real-time continuous aggregates got 50,000x faster for certain workloads.
What are continuous aggregates?
Think of them as materialized views that automatically update incrementally:
-- Pre-compute hourly log statistics
CREATE MATERIALIZED VIEW logs_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', timestamp) AS hour,
organization_id,
level,
COUNT(*) as count,
SUM(LENGTH(message)) as total_bytes
FROM logs
GROUP BY hour, organization_id, level;
-- Refresh policy (runs automatically)
SELECT add_continuous_aggregate_policy(
'logs_hourly',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
Performance improvements (2024):
- 10-20% faster DML (inserts/updates) - removed trigger-based invalidation
- Real-time queries - compute results on new data before materialization
- Compression support - continuous aggregates can now be compressed too
Example use case:
- Raw data: 500k logs/day (150GB/month)
- Hourly aggregate: 24 rows/day (1MB/month)
- Dashboard queries: 1000x faster (query aggregates, not raw logs)
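The dashboard then queries the hourly rollup instead of the raw hypertable; error counts for the last week, for example, come from a few thousand aggregate rows instead of millions of logs:
-- Query the continuous aggregate, not the raw logs
SELECT hour, level, SUM(count) AS total
FROM logs_hourly
WHERE hour > NOW() - INTERVAL '7 days'
GROUP BY hour, level
ORDER BY hour DESC;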
5. SkipScan for Multi-Column Indexes (2025)
TimescaleDB 2.22 added SkipScan for multi-column indexes.
Problem: DISTINCT queries on large datasets are slow.
Solution: SkipScan leverages B-tree indexes to skip duplicate values.
-- Get unique organizations (across 100M rows)
SELECT DISTINCT organization_id FROM logs;
-- Before SkipScan: Full scan (45 seconds)
-- With SkipScan: Index-only scan (0.5 seconds)
-- Performance: 90x faster
Works with multi-column indexes too:
CREATE INDEX idx_org_level ON logs(organization_id, level);
-- Both columns benefit from SkipScan
SELECT DISTINCT organization_id, level FROM logs;
6. Direct-to-Columnstore Inserts (2023-2025)
Problem: Bulk imports write to rowstore, then wait for compression policy.
Solution: Direct-to-columnstore feature (tech preview in 2.23+).
-- Enable direct columnstore writes
SET timescaledb.enable_direct_compress_insert = on;
-- Bulk insert directly to columnstore
COPY logs FROM '/data/historical_logs.csv' CSV;
-- Writes directly in columnar format (faster + compressed immediately)
Use case: Historical data backfills, migrations, large imports.
Performance gain: Up to 5x faster imports for cold data.
7. Foreign Keys on Hypertables (2024)
Finally! After years of requests, TimescaleDB 2.24 added foreign key support.
CREATE TABLE organizations (
org_id UUID PRIMARY KEY,
name TEXT
);
CREATE TABLE logs (
timestamp TIMESTAMPTZ NOT NULL,
organization_id UUID NOT NULL REFERENCES organizations(org_id) ON DELETE CASCADE,
message TEXT
) WITH (tsdb.hypertable);
-- Foreign keys now work with CASCADE
DELETE FROM organizations WHERE org_id = '...';
-- Automatically deletes related logs
Note: Some performance overhead on writes (foreign key validation). Use judiciously.
Advanced Compression Techniques
Segmentby Optimization
Choosing the right segmentby columns is crucial.
Bad choice:
segmentby => 'trace_id'
Why bad? High cardinality. Each trace_id is unique → no compression benefit.
Good choice:
segmentby => 'organization_id,level'
Why good?
- organization_id: Medium cardinality (~1000 orgs)
- level: Low cardinality (4 values: debug/info/warn/error)
- We query by organization constantly
- Filtering by level is common
Rule of thumb:
- Segment by columns you frequently filter on
- Low to medium cardinality (<10,000 unique values)
- Columns that appear in WHERE clauses
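A quick sanity check before committing to segmentby columns is to measure their cardinality directly (on very large tables, the estimates in pg_stats are a cheaper approximation):
-- Rough cardinality check for candidate segmentby columns
SELECT
COUNT(DISTINCT organization_id) AS organizations,
COUNT(DISTINCT level) AS levels,
COUNT(DISTINCT trace_id) AS trace_ids -- expect this one to be huge
FROM logs;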
Orderby Optimization
orderby => 'timestamp DESC, trace_id ASC'
Why this order?
- timestamp DESC: Most queries want recent logs first
- trace_id ASC: For distributed tracing, keep related logs together
TimescaleDB stores min/max values for orderby columns. This lets it skip entire compressed chunks without decompression:
-- Query: Get logs from last 2 hours
SELECT * FROM logs
WHERE timestamp > NOW() - INTERVAL '2 hours';
-- TimescaleDB checks compressed chunk metadata:
-- Chunk 1: timestamp range [Jan 1, Jan 2] → SKIP
-- Chunk 2: timestamp range [Jan 2, Jan 3] → SKIP
-- Chunk 3: timestamp range [Jan 3, NOW] → SCAN
Zero decompression for irrelevant chunks.
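You can confirm the exclusion with a plain EXPLAIN; only chunks whose metadata overlaps the requested time range should appear in the plan:
-- Confirm chunk exclusion: the plan should list only the newest chunk(s)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM logs
WHERE timestamp > NOW() - INTERVAL '2 hours';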
Compression Settings by Use Case
High write throughput (>100k inserts/sec):
segmentby => 'device_id',
orderby => 'timestamp DESC',
compress_after => INTERVAL '30 days' -- Keep writes fast
Low cardinality data (sensors, metrics):
segmentby => 'sensor_id,location',
orderby => 'timestamp DESC'
-- Extreme compression ratios (95%+)
High cardinality events (unique IDs):
segmentby => 'user_region', -- Don't use user_id!
orderby => 'timestamp DESC,user_id ASC'
-- Modest compression (70-80%)
Mixed workload (Logtide):
segmentby => 'organization_id,level',
orderby => 'timestamp DESC'
-- 90% compression, fast queries
Real-World Compression Ratios
Industry data from TimescaleDB users and case studies:
| Use Case | Compression Ratio | Notes |
|---|---|---|
| IoT sensor data | 95%+ | Highly repetitive |
| Application logs | 85-92% | Our case: 90% |
| Financial tick data | 93-97% | Sequential numbers |
| Server metrics | 88-94% | Prometheus-style |
| E-commerce events | 75-85% | High cardinality |
| Cloudflare analytics | 84% | 100k events/sec |
Key insight: Time-series data compresses exceptionally well. 90% reduction is normal, not exceptional.
Gotchas and Limitations (2026 Edition)
1. Compressed Chunks Are Read-Only
You cannot INSERT/UPDATE/DELETE on compressed chunks:
-- This will fail:
INSERT INTO logs VALUES (...); -- timestamp in compressed chunk
ERROR: cannot insert into compressed chunk
Solution: Compression policy with appropriate delay (7-30 days).
Recent data stays in rowstore. Once in columnstore, it's historical (no writes needed).
If you MUST modify compressed data:
-- Decompress chunk
SELECT decompress_chunk('_timescaledb_internal._hyper_1_234_chunk');
-- Modify data
UPDATE logs SET level = 'error' WHERE ...;
-- Recompress
SELECT compress_chunk('_timescaledb_internal._hyper_1_234_chunk');
But this defeats the purpose. Design your system to avoid modifying old data.
Exception: Direct-to-columnstore backfills (see section 6 above).
2. Schema Changes Require Care
Adding columns to compressed hypertables:
-- Safe:
ALTER TABLE logs ADD COLUMN new_field TEXT DEFAULT NULL;
-- Dangerous:
ALTER TABLE logs ADD COLUMN new_field TEXT NOT NULL;
-- Requires decompressing all chunks!
Best practice: Add new columns as nullable. Backfill later if needed.
New in 2.23+: Better support for schema changes on columnstore without full decompression.
3. Backup Size Doesn't Shrink
pg_dump outputs uncompressed data:
# Compressed table: 25GB on disk
pg_dump logtide > backup.sql
# backup.sql size: 220GB (uncompressed!)
Solution: Use filesystem-level compression:
pg_dump logtide | gzip > backup.sql.gz
# Now: 30GB (similar to on-disk size)
Or use pg_basebackup for binary backups (preserves compression).
4. Monitoring Compression
Check compression status:
-- Join the chunk metadata (time ranges) with the compression stats (sizes)
SELECT
c.chunk_name,
c.range_start,
c.range_end,
s.compression_status,
s.before_compression_total_bytes,
s.after_compression_total_bytes,
ROUND(
100.0 * (1 - s.after_compression_total_bytes::numeric / s.before_compression_total_bytes::numeric),
2
) AS compression_ratio
FROM timescaledb_information.chunks c
JOIN chunk_compression_stats('logs') s ON s.chunk_name = c.chunk_name
WHERE c.hypertable_name = 'logs'
ORDER BY c.range_start DESC
LIMIT 10;
Output:
chunk_name | range_start | range_end | compression_status | compression_ratio
---------------------------|-------------|------------|-------------------|------------------
_hyper_1_234_chunk | 2026-01-25 | 2026-01-26 | Compressed | 91.2%
_hyper_1_233_chunk | 2026-01-24 | 2026-01-25 | Compressed | 89.8%
_hyper_1_232_chunk | 2026-01-23 | 2026-01-24 | Uncompressed | NULL
5. CPU vs Storage Trade-off
Decompression uses CPU. If your queries are CPU-bound (not I/O-bound), compression might hurt.
Our case: I/O-bound (reading from disk is slow). CPU overhead is negligible with SIMD.
Test before deploying: Run your actual query workload on compressed data.
Migration Strategy (2026 Edition)
Don't compress everything at once. Roll out gradually:
Phase 1: Test on Non-Critical Table
CREATE TABLE test_logs AS SELECT * FROM logs LIMIT 100000;
-- Modern syntax (2.18+)
CREATE TABLE test_logs_hyper (
LIKE test_logs
) WITH (tsdb.hypertable);
-- Copy the sample rows into the hypertable so there is data to compress
INSERT INTO test_logs_hyper SELECT * FROM test_logs;
-- Configure columnstore
SELECT alter_columnstore(
'test_logs_hyper',
segmentby => ARRAY['organization_id'],
orderby => 'timestamp DESC'
);
-- Add compression policy
SELECT add_columnstore_policy('test_logs_hyper', compress_after => INTERVAL '1 day');
Run queries. Check performance. Verify results.
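To verify the numbers, the built-in stats functions report before/after sizes per hypertable. A sketch (run it after at least one chunk has been compressed):
-- Overall compression stats for the test hypertable
SELECT
total_chunks,
number_compressed_chunks,
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('test_logs_hyper');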
Phase 2: Enable on Production (Manual)
-- Configure columnstore on existing hypertable
SELECT alter_columnstore(
'logs',
segmentby => ARRAY['organization_id', 'level'],
orderby => 'timestamp DESC'
);
-- Manually compress one chunk
SELECT compress_chunk('_timescaledb_internal._hyper_1_100_chunk');
Monitor queries. Watch for issues. Validate compression ratio.
Phase 3: Automate Compression
SELECT add_columnstore_policy('logs', compress_after => INTERVAL '7 days');
Let it run. Compression happens in background. No downtime.
Phase 4: Monitor and Tune
Watch metrics:
- Compression ratio per chunk
- Query latency (before/after)
- CPU usage during queries
- Disk I/O reduction
Adjust segmentby and orderby if needed (2.22+ automatically re-evaluates).
When NOT to Use Compression
1. Frequently updated data
Compressed chunks are read-only. If you UPDATE/DELETE often, don't compress.
2. Random access patterns
If queries access random chunks (not time-based), decompression overhead hurts.
3. CPU-constrained workloads
If CPU is your bottleneck (not I/O), compression might make things worse.
4. Low data volume
<10GB total data? Compression overhead isn't worth it. Just use more storage.
5. High write throughput with low retention
Writing 1M logs/sec, deleting after 1 day? Compression won't help (data is deleted before compression policy runs).
Try It Yourself: Logtide Setup
Logtide uses TimescaleDB compression in production. See it in action:
git clone https://github.com/logtide-dev/logtide.git
cd logtide
docker compose up -d
Our compression configuration:
-- From logtide/migrations/003_compression.sql (2026 syntax)
SELECT alter_columnstore(
'logs',
segmentby => ARRAY['organization_id', 'level'],
orderby => 'timestamp DESC'
);
SELECT add_columnstore_policy('logs', compress_after => INTERVAL '7 days');
-- Continuous aggregates for analytics
CREATE MATERIALIZED VIEW logs_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', timestamp) AS bucket,
organization_id,
level,
COUNT(*) as count
FROM logs
GROUP BY bucket, organization_id, level;
-- Compress the aggregate too
SELECT alter_columnstore(
'logs_hourly',
segmentby => ARRAY['organization_id', 'level'],
orderby => 'bucket DESC'
);
SELECT add_columnstore_policy('logs_hourly', compress_after => INTERVAL '30 days');
Result:
- 500k logs/day → 15GB/month compressed
- Fast queries (<200ms)
- Self-hosted on commodity hardware
- Zero managed database fees
Other TimescaleDB Superpowers
Beyond compression, TimescaleDB offers features that make it perfect for time-series workloads:
1. Automatic Data Retention
Delete old data without expensive DELETE queries:
-- Drop chunks older than 90 days automatically
SELECT add_retention_policy('logs', INTERVAL '90 days');
PostgreSQL equivalent:
-- This is slow, creates table bloat, and hammers the WAL:
DELETE FROM logs WHERE timestamp < NOW() - INTERVAL '90 days';
TimescaleDB: Just drops entire chunks (instant, no locks).
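The same mechanism is available on demand if you want to reclaim space immediately instead of waiting for the policy's next run:
-- One-off manual cleanup: drop every chunk older than 90 days
SELECT drop_chunks('logs', older_than => INTERVAL '90 days');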
2. Time Bucket Analytics
Powerful time-based aggregation function:
-- Get hourly log counts
SELECT
time_bucket('1 hour', timestamp) AS hour,
level,
COUNT(*) as count
FROM logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY hour, level
ORDER BY hour DESC;
Works with any interval: 1 second, 5 minutes, 1 day, 1 month, etc.
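A related helper, time_bucket_gapfill(), also emits empty buckets, which keeps chart axes continuous. A sketch over the last 6 hours at 5-minute resolution:
-- Gap-filled 5-minute buckets (missing buckets show up with count 0)
SELECT
time_bucket_gapfill('5 minutes', timestamp, NOW() - INTERVAL '6 hours', NOW()) AS bucket,
COALESCE(COUNT(*), 0) AS count
FROM logs
WHERE timestamp > NOW() - INTERVAL '6 hours'
AND timestamp <= NOW()
GROUP BY bucket
ORDER BY bucket;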
3. Hyperfunctions (Timescale Toolkit)
Advanced analytics functions written in Rust:
-- Time-weighted averages
SELECT
time_bucket('1 hour', timestamp) AS hour,
average(time_weight('LOCF', timestamp, temperature)) as avg_temp
FROM sensor_data
GROUP BY hour;
-- Percentile approximation (fast)
SELECT
time_bucket('1 hour', timestamp) AS hour,
approx_percentile(0.95, percentile_agg(response_time)) as p95_latency
FROM api_logs
GROUP BY hour;
Install: CREATE EXTENSION timescaledb_toolkit;
4. Compression Tuple Filtering (2024)
Problem: UPDATEs/DELETEs on compressed data were slow (decompress everything).
Solution: Compression tuple filtering - skip irrelevant batches.
Performance gain: Up to 500x faster updates/deletes on compressed data.
-- Update specific logs in compressed chunks
UPDATE logs SET level = 'critical'
WHERE organization_id = 'acme'
AND timestamp > NOW() - INTERVAL '30 days'
AND message LIKE '%database timeout%';
-- Only decompresses batches for 'acme' org
5. Job Scheduler
Built-in job scheduler (no pg_cron needed):
-- Custom job: Generate daily summary report
-- Define the procedure first; job procedures receive (job_id, config)
CREATE OR REPLACE PROCEDURE generate_daily_summary(job_id INT, config JSONB)
LANGUAGE SQL AS $$
INSERT INTO daily_summaries
SELECT
DATE(timestamp) as day,
organization_id,
COUNT(*) as total_logs,
COUNT(*) FILTER (WHERE level = 'error') as errors
FROM logs
WHERE DATE(timestamp) = CURRENT_DATE - 1
GROUP BY day, organization_id;
$$;
-- Then schedule it to run once a day
SELECT add_job(
'generate_daily_summary',
INTERVAL '1 day',
config => '{"retention": "30 days"}'
);
Key Takeaways
1. TimescaleDB compression is insanely effective
- 85-95% reduction typical for time-series data
- Our case: 90% reduction in production
- Cloudflare case study: 84% reduction at 100k events/sec
2. Query performance often improves
- Less data to scan
- Columnar format perfect for analytics
- SIMD vectorization makes aggregations blazing fast
- Sparse indexes skip irrelevant chunks
3. Compression is automatic
- Set policy once
- Runs in background
- Zero operational overhead
- Dynamic re-optimization (2.22+)
4. Storage costs drop dramatically
- Scales linearly with data volume
- Especially impactful for high-volume workloads
- Self-hosting becomes viable at scale
5. It's just PostgreSQL
- No new query language
- Standard SQL
- Works with existing tools
- ORMs compatible
- Postgres 18 support (2026)
6. 2024-2026 improvements are massive
- SIMD vectorization (2-10x faster analytics)
- UUIDv7 support (30% better compression)
- Chunk-skipping indexes (7x faster scans)
- Continuous aggregates (50,000x speedup)
- SkipScan (90x faster DISTINCT)
- Foreign keys finally supported
Bottom Line
If you're storing time-series data in PostgreSQL, use TimescaleDB compression.
Setup time: 30 minutes
Storage savings: 90%
Query performance: Often better
Operational overhead: Zero
It's a no-brainer.
220GB → 25GB. Same queries. Way faster analytics.
Sometimes the best optimizations are the ones someone else already built.
Logtide: Open-source log management with TimescaleDB compression built-in.
GitHub: https://github.com/logtide-dev/logtide
Website: https://logtide.dev
Docs: https://logtide.dev/docs
TimescaleDB Docs: https://docs.timescale.com