TimescaleDB Compression: From 150GB to 15GB (90% Reduction, Real Production Data)

When you're storing time-series data at scale, storage explodes fast. We hit 150GB in raw log data within months.

Then we enabled TimescaleDB compression.

Storage dropped to 15GB. Same data. Same query performance. 90% reduction.

Here's exactly how it works, with real production numbers from Logtide - an open-source log management platform handling 500,000 logs per day.

The Storage Problem

Time-series data grows relentlessly. When you're ingesting events at scale, storage becomes a major concern.

Our numbers (Logtide production):

  • 500,000 logs/day
  • Average log size: ~300 bytes
  • Daily data: 150MB raw
  • Monthly data: 4.5GB
  • Yearly projection: 54GB

Sounds manageable? Add indexes:

Reality with indexes:

  • Primary key index: 2x data size
  • Search indexes (GIN): 3x data size
  • Timestamp B-tree: 1x data size
  • Total storage multiplier: 6x

Actual monthly storage: 27GB
Yearly projection: 324GB

For self-hosted deployments, this means expensive NVMe storage or slow HDDs. For managed databases, storage costs add up quickly with backups, snapshots, and IOPS charges.
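
You can check how your own footprint splits between data and indexes before trusting any multiplier. A minimal sketch, assuming the table is simply named logs:

-- How much of the footprint is table data vs. indexes?
SELECT
  pg_size_pretty(pg_table_size('logs'))          AS table_size,
  pg_size_pretty(pg_indexes_size('logs'))        AS index_size,
  pg_size_pretty(pg_total_relation_size('logs')) AS total_size;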

Enter TimescaleDB Compression

TimescaleDB is a PostgreSQL extension that adds time-series superpowers to Postgres. One killer feature: native columnar compression via the Hypercore engine.

It works differently from general-purpose compression:

Traditional row-based storage:

Row 1: [timestamp, level, message, metadata]
Row 2: [timestamp, level, message, metadata]
Row 3: [timestamp, level, message, metadata]

TimescaleDB compressed columnstore:

Compressed batch:
  timestamp: [t1, t2, t3, ...] (1000 values)
  level:     [info, error, info, ...] (compressed)
  message:   [msg1, msg2, msg3, ...] (compressed)
  metadata:  [m1, m2, m3, ...] (compressed)

Key insight: Similar data compresses better when stored together.

  • Timestamps are sequential → excellent compression
  • Log levels repeat (info, error, warn) → extreme compression
  • Metadata fields often duplicate → good compression
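
Everything below assumes the TimescaleDB extension is installed and preloaded (shared_preload_libraries). A quick sanity check before following along:

-- Confirm the extension is available and see which version you're on
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';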

How We Implemented It (2026 Best Practices)

Step 1: Convert Table to Hypertable

First, we converted our regular PostgreSQL table into a TimescaleDB hypertable using the modern syntax:

-- Our existing table
CREATE TABLE logs (
  id BIGSERIAL PRIMARY KEY,
  timestamp TIMESTAMPTZ NOT NULL,
  organization_id UUID NOT NULL,
  level TEXT NOT NULL,
  message TEXT NOT NULL,
  metadata JSONB,
  source TEXT,
  trace_id TEXT
);

-- Modern TimescaleDB 2.23+ syntax with columnstore enabled
CREATE TABLE logs (
  timestamp TIMESTAMPTZ NOT NULL,
  organization_id UUID NOT NULL,
  level TEXT NOT NULL,
  message TEXT NOT NULL,
  metadata JSONB,
  source TEXT,
  trace_id TEXT
) WITH (
  tsdb.hypertable
);

What this does:

  • Automatically creates a hypertable
  • Enables columnstore by default (new in 2.18+)
  • Partitions data into 1-day chunks automatically
  • Each day = new chunk
  • Old chunks get compressed, new data stays in rowstore for fast writes
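
If you're converting an existing table, or running a release that predates the CREATE TABLE ... WITH (tsdb.hypertable) syntax, the long-standing create_hypertable() call achieves the same thing. A sketch using the table and column names above:

-- Older-style conversion of an existing table into a hypertable
SELECT create_hypertable(
  'logs',
  'timestamp',
  chunk_time_interval => INTERVAL '1 day',
  migrate_data => TRUE
);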

Step 2: Configure Columnstore Settings

With TimescaleDB 2.22+, compression settings are automatically optimized:

-- TimescaleDB 2.22+ automatically recommends best settings
-- based on table configuration and statistics
SELECT alter_columnstore(
  'logs',
  segmentby => ARRAY['organization_id', 'level'],
  orderby => 'timestamp DESC'
);

Breaking it down:

segmentby = 'organization_id,level':

  • Group data by these columns
  • Each unique combination gets its own compressed segment
  • We query by organization frequently → huge performance boost
  • Level is low cardinality (info/error/warn/debug) → excellent compression

orderby = 'timestamp DESC':

  • Within each segment, order by timestamp descending
  • Matches our query pattern (newest logs first)
  • Sequential timestamps compress extremely well

New in 2.22/2.23: TimescaleDB now dynamically re-evaluates these settings as data patterns change, ensuring optimal compression over time.
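
On versions that still use the original compression naming, the same segmentby/orderby choices are expressed as table storage parameters. The equivalent for our schema looks like this:

-- Classic compression settings (pre-columnstore naming)
ALTER TABLE logs SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'organization_id, level',
  timescaledb.compress_orderby = 'timestamp DESC'
);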

Step 3: Enable Compression Policy

Automation is key:

-- Compress chunks older than 7 days automatically
SELECT add_columnstore_policy(
  'logs',
  compress_after => INTERVAL '7 days'
);

What this does:

  • Automatically compresses chunks older than 7 days
  • Runs in the background via TimescaleDB's job scheduler
  • Recent data (last 7 days) stays in rowstore for fast writes
  • Old data moves to columnstore for space savings

Why 7 days?

  • Most queries are on recent data (last 24-48 hours)
  • After 7 days, logs are mostly for historical analysis
  • Balances write performance vs storage savings

You can adjust based on your access patterns:

  • Heavy recent writes → compress after 14-30 days
  • Mostly historical queries → compress after 1-3 days
  • Read-heavy workload → compress after 1 day
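
If your version exposes the older policy function instead, the same 7-day rule is a single call:

-- Pre-columnstore spelling of the same policy
SELECT add_compression_policy('logs', INTERVAL '7 days');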

Our Real Production Results (2026)

We enabled compression in December 2025. Here are the actual numbers:

Storage Reduction

Before compression (November 2025):

  • Raw data: 135GB
  • Indexes: 85GB
  • Total: 220GB

After compression (January 2026):

  • Compressed columnstore: 18GB
  • Sparse indexes: 7GB
  • Total: 25GB

Compression ratio: 88.6% (we round to 90%)

Storage Over Time

Here's how compression progressed:

Day 0:  220GB (no compression)
Day 7:  215GB (first chunk compressed: -5GB)
Day 14: 180GB (2 weeks compressed: -40GB)
Day 30: 90GB  (1 month compressed: -130GB)
Day 60: 25GB  (all historical data compressed)

The savings accumulate as more chunks cross the 7-day threshold.

Query Performance

Critical question: Does compression slow down queries?

Our testing (1M log search across 30 days):

Before compression:

  • Full-text search: 145ms
  • Time range filter: 12ms
  • Aggregation (count by level): 89ms

After compression (with SIMD vectorization):

  • Full-text search: 163ms (+12%)
  • Time range filter: 8ms (-33%)
  • Aggregation (count by level): 52ms (-41%)

Surprising result: Aggregations got FASTER.

Why?

  • Less data to scan (90% reduction)
  • Columnar format is perfect for aggregations
  • SIMD vectorization (new in 2.24) accelerates SUM, COUNT, AVG, MIN, MAX
  • Sparse indexes skip irrelevant compressed chunks

Trade-off: Text search slightly slower (needs decompression), but still under 200ms.

Decompression Overhead

Compressed chunks need decompression during queries. CPU cost?

Benchmark: retrieving 100,000 logs from a compressed chunk

SELECT * FROM logs
WHERE timestamp > NOW() - INTERVAL '30 days'
  AND organization_id = 'test-org'
LIMIT 100000;

Results:

  • Rowstore: 1,240ms, CPU: 15%
  • Columnstore: 1,680ms, CPU: 28%
  • Overhead: +440ms, +13% CPU

In practice: Negligible. Modern CPUs with SIMD decompress data faster than disks can read uncompressed data.

TimescaleDB 2024-2026: What's New

The past two years brought massive improvements to compression and analytics:

1. Chunk-Skipping Indexes (2024)

Problem: Querying compressed data by non-time columns was slow.

Solution: Min/max sparse indexes on compressed chunks.

-- Create index on level column
CREATE INDEX idx_level ON logs(level);

-- TimescaleDB 2.x automatically creates sparse index for compressed chunks
-- Query benefits from chunk exclusion:
SELECT * FROM logs WHERE level = 'error' AND timestamp > NOW() - INTERVAL '30 days';
-- Only scans chunks containing 'error' level logs

Performance gain: Up to 7x faster scans on compressed data.

2. SIMD Vectorization (2024-2025)

Problem: Aggregate queries on columnar data were CPU-intensive.

Solution: Single Instruction, Multiple Data (SIMD) processing.

When you run queries with aggregations (SUM, COUNT, AVG, MIN, MAX) grouped by segment_by columns, modern CPUs process multiple values simultaneously.

Performance gain: 2-10x faster analytical queries on compressed data.

-- This query benefits from SIMD vectorization
SELECT 
  organization_id,
  level,
  COUNT(*) as count,
  AVG(LENGTH(message)) as avg_length
FROM logs
WHERE timestamp > NOW() - INTERVAL '30 days'
GROUP BY organization_id, level;
-- Runs 5-7x faster with SIMD on compressed chunks

3. UUIDv7 Compression & Partitioning (2025)

Problem: UUIDs are random and don't compress well. UUIDv4 partitioning was impossible.

Solution: TimescaleDB 2.22+ supports UUIDv7 (timestamp-based UUIDs).

-- Partition hypertable by UUIDv7 column
CREATE TABLE events (
  event_id UUID DEFAULT gen_uuidv7() NOT NULL,
  timestamp TIMESTAMPTZ DEFAULT NOW(),
  data JSONB
) WITH (
  tsdb.hypertable = '{
    "partition_column": "event_id",
    "partition_type": "uuidv7"
  }'
);

Benefits:

  • 30% better compression on UUIDv7 columns
  • 2x faster queries with UUIDv7 in WHERE clauses
  • Time-based partitioning without separate timestamp column
  • Utility functions: uuid_timestamp(), to_uuidv7_boundary()

4. Continuous Aggregates Performance (2024)

Real-time continuous aggregates got 50,000x faster for certain workloads.

What are continuous aggregates?

Think of them as materialized views that automatically update incrementally:

-- Pre-compute hourly log statistics
CREATE MATERIALIZED VIEW logs_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', timestamp) AS hour,
  organization_id,
  level,
  COUNT(*) as count,
  SUM(LENGTH(message)) as total_bytes
FROM logs
GROUP BY hour, organization_id, level;

-- Refresh policy (runs automatically)
SELECT add_continuous_aggregate_policy(
  'logs_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);

Performance improvements (2024):

  • 10-20% faster DML (inserts/updates) - removed trigger-based invalidation
  • Real-time queries - compute results on new data before materialization
  • Compression support - continuous aggregates can now be compressed too

Example use case:

  • Raw data: 500k logs/day (~4.5GB/month raw)
  • Hourly aggregate: 24 rows/day (1MB/month)
  • Dashboard queries: 1000x faster (query aggregates, not raw logs)
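
The practical payoff is that dashboards query the aggregate rather than the raw hypertable. Using the logs_hourly view defined above, an error-rate panel boils down to something like:

-- Dashboard query hits the small aggregate, not millions of raw rows
SELECT
  hour,
  SUM(count) FILTER (WHERE level = 'error') AS errors,
  SUM(count) AS total
FROM logs_hourly
WHERE hour > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour DESC;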

5. SkipScan for Multi-Column Indexes (2025)

TimescaleDB 2.22 added SkipScan for multi-column indexes.

Problem: DISTINCT queries on large datasets are slow.

Solution: SkipScan leverages B-tree indexes to skip duplicate values.

-- Get unique organizations (across 100M rows)
SELECT DISTINCT organization_id FROM logs;

-- Before SkipScan: Full scan (45 seconds)
-- With SkipScan: Index-only scan (0.5 seconds)
-- Performance: 90x faster

Works with multi-column indexes too:

CREATE INDEX idx_org_level ON logs(organization_id, level);

-- Both columns benefit from SkipScan
SELECT DISTINCT organization_id, level FROM logs;

6. Direct-to-Columnstore Inserts (2023-2025)

Problem: Bulk imports write to rowstore, then wait for compression policy.

Solution: Direct-to-columnstore feature (tech preview in 2.23+).

-- Enable direct columnstore writes
SET timescaledb.enable_direct_compress_insert = on;

-- Bulk insert directly to columnstore
COPY logs FROM '/data/historical_logs.csv' CSV;
-- Writes directly in columnar format (faster + compressed immediately)

Use case: Historical data backfills, migrations, large imports.

Performance gain: Up to 5x faster imports for cold data.

7. Foreign Keys on Hypertables (2024)

Finally! After years of requests, TimescaleDB 2.24 added foreign key support.

CREATE TABLE organizations (
  org_id UUID PRIMARY KEY,
  name TEXT
);

CREATE TABLE logs (
  timestamp TIMESTAMPTZ NOT NULL,
  organization_id UUID NOT NULL REFERENCES organizations(org_id) ON DELETE CASCADE,
  message TEXT
) WITH (tsdb.hypertable);

-- Foreign keys now work with CASCADE
DELETE FROM organizations WHERE org_id = '...';
-- Automatically deletes related logs

Note: Some performance overhead on writes (foreign key validation). Use judiciously.

Advanced Compression Techniques

Segmentby Optimization

Choosing the right segmentby columns is crucial.

Bad choice:

segmentby => 'trace_id'

Why bad? High cardinality. Each trace_id is unique → no compression benefit.

Good choice:

segmentby => 'organization_id,level'

Why good?

  • organization_id: Medium cardinality (~1000 orgs)
  • level: Low cardinality (4 values: debug/info/warn/error)
  • We query by organization constantly
  • Filtering by level is common

Rule of thumb:

  • Segment by columns you frequently filter on
  • Low to medium cardinality (<10,000 unique values)
  • Columns that appear in WHERE clauses
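
An easy way to vet a candidate segmentby column is to measure its cardinality directly on a recent slice of data (column names below match our schema):

-- Rough cardinality check for candidate segmentby columns
SELECT
  COUNT(DISTINCT organization_id) AS orgs,
  COUNT(DISTINCT level)           AS levels,
  COUNT(DISTINCT trace_id)        AS trace_ids
FROM logs
WHERE timestamp > NOW() - INTERVAL '1 day';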

Orderby Optimization

orderby => 'timestamp DESC, trace_id ASC'

Why this order?

  • timestamp DESC: Most queries want recent logs first
  • trace_id ASC: For distributed tracing, keep related logs together

TimescaleDB stores min/max values for orderby columns. This lets it skip entire compressed chunks without decompression:

-- Query: Get logs from last 2 hours
SELECT * FROM logs
WHERE timestamp > NOW() - INTERVAL '2 hours';

-- TimescaleDB checks compressed chunk metadata:
-- Chunk 1: timestamp range [Jan 1, Jan 2] → SKIP
-- Chunk 2: timestamp range [Jan 2, Jan 3] → SKIP
-- Chunk 3: timestamp range [Jan 3, NOW] → SCAN

Zero decompression for irrelevant chunks.

Compression Settings by Use Case

High write throughput (>100k inserts/sec):

segmentby => 'device_id',
orderby => 'timestamp DESC',
compress_after => INTERVAL '30 days'  -- Keep writes fast

Low cardinality data (sensors, metrics):

segmentby => 'sensor_id,location',
orderby => 'timestamp DESC'
-- Extreme compression ratios (95%+)

High cardinality events (unique IDs):

segmentby => 'user_region',  -- Don't use user_id!
orderby => 'timestamp DESC,user_id ASC'
-- Modest compression (70-80%)

Mixed workload (Logtide):

segmentby => 'organization_id,level',
orderby => 'timestamp DESC'
-- 90% compression, fast queries

Real-World Compression Ratios

Industry data from TimescaleDB users and case studies:

Use Case             | Compression Ratio | Notes
---------------------|-------------------|--------------------
IoT sensor data      | 95%+              | Highly repetitive
Application logs     | 85-92%            | Our case: 90%
Financial tick data  | 93-97%            | Sequential numbers
Server metrics       | 88-94%            | Prometheus-style
E-commerce events    | 75-85%            | High cardinality
Cloudflare analytics | 84%               | 100k events/sec

Key insight: Time-series data compresses exceptionally well. 90% reduction is normal, not exceptional.

Gotchas and Limitations (2026 Edition)

1. Compressed Chunks Are Read-Only

You cannot INSERT/UPDATE/DELETE on compressed chunks:

-- This will fail:
INSERT INTO logs VALUES (...);  -- timestamp in compressed chunk
ERROR: cannot insert into compressed chunk

Solution: Compression policy with appropriate delay (7-30 days).

Recent data stays in rowstore. Once in columnstore, it's historical (no writes needed).

If you MUST modify compressed data:

-- Decompress chunk
SELECT decompress_chunk('_timescaledb_internal._hyper_1_234_chunk');

-- Modify data
UPDATE logs SET level = 'error' WHERE ...;

-- Recompress
SELECT compress_chunk('_timescaledb_internal._hyper_1_234_chunk');

But this defeats the purpose. Design your system to avoid modifying old data.

Exception: Direct-to-columnstore backfills (see section 6 above).

2. Schema Changes Require Care

Adding columns to compressed hypertables:

-- Safe:
ALTER TABLE logs ADD COLUMN new_field TEXT DEFAULT NULL;

-- Dangerous:
ALTER TABLE logs ADD COLUMN new_field TEXT NOT NULL;
-- Requires decompressing all chunks!

Best practice: Add new columns as nullable. Backfill later if needed.

New in 2.23+: Better support for schema changes on columnstore without full decompression.
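
If the new column needs values for recent data, one approach is to backfill only the rowstore window and leave compressed history as NULL. A sketch (the value is a placeholder):

-- Backfill only the still-uncompressed window
UPDATE logs
SET new_field = 'backfilled'  -- placeholder value
WHERE timestamp > NOW() - INTERVAL '7 days'
  AND new_field IS NULL;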

3. Backup Size Doesn't Shrink

pg_dump outputs uncompressed data:

# Compressed table: 25GB on disk
pg_dump logtide > backup.sql

# backup.sql size: 220GB (uncompressed!)

Solution: Use filesystem-level compression:

pg_dump logtide | gzip > backup.sql.gz
# Now: 30GB (similar to on-disk size)

Or use pg_basebackup for binary backups (preserves compression).

4. Monitoring Compression

Check compression status:

-- Per-chunk stats: sizes come from chunk_compression_stats(),
-- time ranges from the chunks view
SELECT
  c.chunk_name,
  c.range_start,
  c.range_end,
  s.compression_status,
  s.before_compression_total_bytes,
  s.after_compression_total_bytes,
  ROUND(
    100.0 * (1 - s.after_compression_total_bytes::numeric / s.before_compression_total_bytes::numeric),
    2
  ) AS compression_ratio
FROM timescaledb_information.chunks c
LEFT JOIN chunk_compression_stats('logs') s ON s.chunk_name = c.chunk_name
WHERE c.hypertable_name = 'logs'
ORDER BY c.range_start DESC
LIMIT 10;

Output:

chunk_name                 | range_start | range_end  | compression_status | compression_ratio
---------------------------|-------------|------------|-------------------|------------------
_hyper_1_234_chunk         | 2026-01-25  | 2026-01-26 | Compressed        | 91.2%
_hyper_1_233_chunk         | 2026-01-24  | 2026-01-25 | Compressed        | 89.8%
_hyper_1_232_chunk         | 2026-01-23  | 2026-01-24 | Uncompressed      | NULL

5. CPU vs Storage Trade-off

Decompression uses CPU. If your queries are CPU-bound (not I/O-bound), compression might hurt.

Our case: I/O-bound (reading from disk is slow). CPU overhead is negligible with SIMD.

Test before deploying: Run your actual query workload on compressed data.

Migration Strategy (2026 Edition)

Don't compress everything at once. Roll out gradually:

Phase 1: Test on Non-Critical Table

CREATE TABLE test_logs AS SELECT * FROM logs LIMIT 100000;

-- Modern syntax (2.18+)
CREATE TABLE test_logs_hyper (
  LIKE test_logs
) WITH (tsdb.hypertable);

-- Configure columnstore
SELECT alter_columnstore(
  'test_logs_hyper',
  segmentby => ARRAY['organization_id'],
  orderby => 'timestamp DESC'
);

-- Add compression policy
SELECT add_columnstore_policy('test_logs_hyper', compress_after => INTERVAL '1 day');

Run queries. Check performance. Verify results.

Phase 2: Enable on Production (Manual)

-- Configure columnstore on existing hypertable
SELECT alter_columnstore(
  'logs',
  segmentby => ARRAY['organization_id', 'level'],
  orderby => 'timestamp DESC'
);

-- Manually compress one chunk
SELECT compress_chunk('_timescaledb_internal._hyper_1_100_chunk');

Monitor queries. Watch for issues. Validate compression ratio.

Phase 3: Automate Compression

SELECT add_columnstore_policy('logs', compress_after => INTERVAL '7 days');

Let it run. Compression happens in background. No downtime.

Phase 4: Monitor and Tune

Watch metrics:

  • Compression ratio per chunk
  • Query latency (before/after)
  • CPU usage during queries
  • Disk I/O reduction

Adjust segmentby and orderby if needed (2.22+ automatically re-evaluates).
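
For a single at-a-glance number while tuning, the hypertable-level stats function sums everything up (exact column names can vary slightly between versions):

-- Overall before/after sizes for the whole hypertable
SELECT
  total_chunks,
  number_compressed_chunks,
  pg_size_pretty(before_compression_total_bytes) AS before,
  pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('logs');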

When NOT to Use Compression

1. Frequently updated data

Compressed chunks are read-only. If you UPDATE/DELETE often, don't compress.

2. Random access patterns

If queries access random chunks (not time-based), decompression overhead hurts.

3. CPU-constrained workloads

If CPU is your bottleneck (not I/O), compression might make things worse.

4. Low data volume

<10GB total data? Compression overhead isn't worth it. Just use more storage.

5. High write throughput with low retention

Writing 1M logs/sec, deleting after 1 day? Compression won't help (data is deleted before compression policy runs).

Try It Yourself: Logtide Setup

Logtide uses TimescaleDB compression in production. See it in action:

git clone https://github.com/logtide-dev/logtide.git
cd logtide
docker compose up -d

Our compression configuration:

-- From logtide/migrations/003_compression.sql (2026 syntax)
SELECT alter_columnstore(
  'logs',
  segmentby => ARRAY['organization_id', 'level'],
  orderby => 'timestamp DESC'
);

SELECT add_columnstore_policy('logs', compress_after => INTERVAL '7 days');

-- Continuous aggregates for analytics
CREATE MATERIALIZED VIEW logs_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', timestamp) AS bucket,
  organization_id,
  level,
  COUNT(*) as count
FROM logs
GROUP BY bucket, organization_id, level;

-- Compress the aggregate too
SELECT alter_columnstore(
  'logs_hourly',
  segmentby => ARRAY['organization_id', 'level'],
  orderby => 'bucket DESC'
);

SELECT add_columnstore_policy('logs_hourly', compress_after => INTERVAL '30 days');

Result:

  • 500k logs/day → 15GB/month compressed
  • Fast queries (<200ms)
  • Self-hosted on commodity hardware
  • Zero managed database fees

Other TimescaleDB Superpowers

Beyond compression, TimescaleDB offers features that make it perfect for time-series workloads:

1. Automatic Data Retention

Delete old data without expensive DELETE queries:

-- Drop chunks older than 90 days automatically
SELECT add_retention_policy('logs', INTERVAL '90 days');

PostgreSQL equivalent:

-- This is slow and locks the table:
DELETE FROM logs WHERE timestamp < NOW() - INTERVAL '90 days';

TimescaleDB: Just drops entire chunks (instant, no locks).
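
The same chunk-dropping machinery is available on demand, for example from a maintenance script:

-- One-off manual cleanup, instant because it drops whole chunks
SELECT drop_chunks('logs', older_than => INTERVAL '90 days');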

2. Time Bucket Analytics

Powerful time-based aggregation function:

-- Get hourly log counts
SELECT
  time_bucket('1 hour', timestamp) AS hour,
  level,
  COUNT(*) as count
FROM logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY hour, level
ORDER BY hour DESC;

Works with any interval: 1 second, 5 minutes, 1 day, 1 month, etc.

3. Hyperfunctions (Timescale Toolkit)

Advanced analytics functions written in Rust:

-- Time-weighted averages (time_weight builds a summary; average() reads it)
SELECT
  time_bucket('1 hour', timestamp) AS hour,
  average(time_weight('LOCF', timestamp, temperature)) as avg_temp
FROM sensor_data
GROUP BY hour;

-- Percentile approximation (fast): aggregate first, then read the percentile
SELECT
  time_bucket('1 hour', timestamp) AS hour,
  approx_percentile(0.95, percentile_agg(response_time)) as p95_latency
FROM api_logs
GROUP BY hour;

Install: CREATE EXTENSION timescaledb_toolkit;

4. Compression Tuple Filtering (2024)

Problem: UPDATEs/DELETEs on compressed data were slow (decompress everything).

Solution: Compression tuple filtering - skip irrelevant batches.

Performance gain: Up to 500x faster updates/deletes on compressed data.

-- Update specific logs in compressed chunks
UPDATE logs SET level = 'critical'
WHERE organization_id = 'acme'
  AND timestamp > NOW() - INTERVAL '30 days'
  AND message LIKE '%database timeout%';
-- Only decompresses batches for 'acme' org

5. Job Scheduler

Built-in job scheduler (no pg_cron needed):

-- Custom job: Generate daily summary report
-- Job procedures receive (job_id, config)
CREATE OR REPLACE PROCEDURE generate_daily_summary(job_id INT, config JSONB)
LANGUAGE SQL AS $$
  INSERT INTO daily_summaries
  SELECT
    DATE(timestamp) as day,
    organization_id,
    COUNT(*) as total_logs,
    COUNT(*) FILTER (WHERE level = 'error') as errors
  FROM logs
  WHERE DATE(timestamp) = CURRENT_DATE - 1
  GROUP BY day, organization_id;
$$;

-- Schedule it to run daily
SELECT add_job(
  'generate_daily_summary',
  '1 day',
  config => '{"retention": "30 days"}'
);

Key Takeaways

1. TimescaleDB compression is insanely effective

  • 85-95% reduction typical for time-series data
  • Our case: 90% reduction in production
  • Cloudflare case study: 84% reduction at 100k events/sec

2. Query performance often improves

  • Less data to scan
  • Columnar format perfect for analytics
  • SIMD vectorization makes aggregations blazing fast
  • Sparse indexes skip irrelevant chunks

3. Compression is automatic

  • Set policy once
  • Runs in background
  • Zero operational overhead
  • Dynamic re-optimization (2.22+)

4. Storage costs drop dramatically

  • Scales linearly with data volume
  • Especially impactful for high-volume workloads
  • Self-hosting becomes viable at scale

5. It's just PostgreSQL

  • No new query language
  • Standard SQL
  • Works with existing tools
  • ORMs compatible
  • Postgres 18 support (2026)

6. 2024-2026 improvements are massive

  • SIMD vectorization (2-10x faster analytics)
  • UUIDv7 support (30% better compression)
  • Chunk-skipping indexes (7x faster scans)
  • Continuous aggregates (50,000x speedup)
  • SkipScan (90x faster DISTINCT)
  • Foreign keys finally supported

Bottom Line

If you're storing time-series data in PostgreSQL, use TimescaleDB compression.

Setup time: 30 minutes
Storage savings: 90%
Query performance: Often better
Operational overhead: Zero

It's a no-brainer.


220GB → 25GB. Same queries. Way faster analytics.

Sometimes the best optimizations are the ones someone else already built.


Logtide: Open-source log management with TimescaleDB compression built-in.

GitHub: https://github.com/logtide-dev/logtide
Website: https://logtide.dev
Docs: https://logtide.dev/docs
TimescaleDB Docs: https://docs.timescale.com
