Philip McClarence

How TimescaleDB Chunks Actually Work (And Why Size Matters)

When you call create_hypertable(), TimescaleDB does not store your data in a single table. It automatically partitions incoming rows across many physical tables called chunks, each covering a specific time interval. These chunks are real PostgreSQL tables in the _timescaledb_internal schema, with real indexes, real CHECK constraints, and real performance implications.

Understanding chunks is foundational to working effectively with TimescaleDB. Most performance issues -- slow queries on well-indexed tables, dashboard pages that take seconds to load, compression that seems slow for no reason -- trace back to chunk configuration.

Chunks Are PostgreSQL Tables

When you create a hypertable with a 1-day chunk interval, every day of data gets its own table:

SELECT create_hypertable('sensor_readings', 'recorded_at',
    chunk_time_interval => INTERVAL '1 day');

Each chunk receives a CHECK constraint that bounds its time range:

-- Auto-generated constraint (not something you write)
-- CHECK (recorded_at >= '2026-02-01' AND recorded_at < '2026-02-02')

When you insert a row, TimescaleDB routes it to the correct chunk based on the timestamp. When you query with a time filter, PostgreSQL uses these constraints to exclude irrelevant chunks before execution begins. This is called constraint exclusion, and it is the core mechanism that makes TimescaleDB performant for time-range queries.
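You can inspect these physical tables directly. A small sketch using the built-in show_chunks() function (the chunk name in the comment is illustrative; actual names vary):

```sql
-- List the physical chunk tables backing the hypertable
SELECT show_chunks('sensor_readings');
-- Each row is a regular table, e.g. _timescaledb_internal._hyper_1_42_chunk

-- Restrict the listing to chunks covering recent data
SELECT show_chunks('sensor_readings', newer_than => now() - INTERVAL '7 days');
```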

Constraint Exclusion in Practice

EXPLAIN ANALYZE
SELECT * FROM sensor_readings
WHERE recorded_at >= '2026-02-20' AND recorded_at < '2026-02-21';

In the EXPLAIN output, you will see:

Chunks excluded during startup: 347

This means 347 chunks were evaluated and skipped because their time constraints did not overlap with the query's time range. Only the matching chunk (February 20th) was actually scanned. No I/O, no buffer reads, no index lookups on the excluded chunks.

The critical detail that trips people up: without a time predicate in your WHERE clause, PostgreSQL cannot exclude any chunks.

A query like this:

SELECT COUNT(*) FROM sensor_readings WHERE device_id = 42;

This query has no timestamp filter, so the planner generates an append plan across every chunk in the hypertable. Even if device_id is indexed, the planner creates one index scan node per chunk. With thousands of chunks, this is extremely expensive.

Adding a time bound transforms the query:

SELECT COUNT(*) FROM sensor_readings
WHERE device_id = 42
  AND recorded_at >= now() - INTERVAL '7 days';

Now the planner excludes all but the last 7 days of chunks, and the query touches a handful of tables instead of thousands.

The Chunk Count Problem

Here is the part that surprises most people: even though excluded chunks are never scanned, the planner still evaluates every chunk's CHECK constraint during the planning phase. This evaluation has a cost that grows linearly with total chunk count.

Measured on the same dataset with the same query:

Chunk Count               Planning Time   Execution Time   Total
4,322 (1-hour interval)   443 ms          2 ms             445 ms
26 (7-day interval)       ~5 ms           2 ms             ~7 ms

-- Identical query, vastly different total times
EXPLAIN ANALYZE
SELECT AVG(temperature)
FROM sensor_readings
WHERE recorded_at >= '2026-02-20' AND recorded_at < '2026-02-21';

-- 4322 chunks: Planning Time: 443ms  Execution Time: 2ms
-- 26 chunks:   Planning Time: 5ms    Execution Time: 2ms

The execution time is identical because the same rows are scanned. But with 4,322 chunks, the planner spends 443ms just evaluating constraints and building the query plan. This overhead applies to every single query, including trivial ones.

In a dashboard that issues 10-20 parallel queries, this turns a responsive page into a multi-second loading experience. It also affects prepared statements: each PREPARE re-evaluates constraints, so connection pools that re-prepare on checkout pay the planning cost repeatedly.

Choosing the Right Interval

The goal is to keep total chunk count manageable while maintaining chunks large enough for efficient compression. The recommended target is approximately 25 million rows per chunk.

chunk_interval_seconds = 25,000,000 / rows_per_second

For a sensor fleet ingesting 100 rows per second:

25,000,000 / 100 = 250,000 seconds = ~2.9 days

A 3-day interval keeps each chunk around 25M rows. At 1,000 rows/sec, a 7-hour interval produces similar sizes.
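Rather than guessing the ingest rate, you can measure it from recent data and feed it into the same formula. A sketch, assuming the table has at least an hour of data (the one-hour sampling window is an arbitrary choice, and an empty window would divide by zero):

```sql
-- Measure rows/second over the last hour, then derive a chunk interval
-- targeting ~25M rows per chunk
SELECT make_interval(
           secs => (25000000 / (count(*) / 3600.0))::double precision
       ) AS suggested_chunk_interval
FROM sensor_readings
WHERE recorded_at >= now() - INTERVAL '1 hour';
```

Run this against production traffic, not a staging environment, since the whole point is to size chunks for your real ingest rate.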

To change the interval:

-- Only affects chunks created after this point
SELECT set_chunk_time_interval('sensor_readings', INTERVAL '3 days');

Existing chunks retain their original interval and persist until they age out through retention.
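To confirm which interval a hypertable is currently configured with, the timescaledb_information.dimensions view exposes it (the hypertable name here is from the running example):

```sql
-- Current chunk interval for the time dimension
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'sensor_readings';
```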

Verify your current state:

-- Chunk count per hypertable
SELECT hypertable_name, count(*) AS total_chunk_count
FROM timescaledb_information.chunks
GROUP BY hypertable_name
ORDER BY total_chunk_count DESC;

-- Average chunk size for a specific hypertable
-- (sizes come from chunks_detailed_size(), not the chunks view)
SELECT pg_size_pretty(avg(total_bytes)::bigint) AS average_chunk_size
FROM chunks_detailed_size('sensor_readings');

Retention: Keeping Chunk Count Bounded

Without retention, even a well-configured interval produces an ever-growing chunk count. The add_retention_policy() function automatically drops chunks older than your retention window:

SELECT add_retention_policy('sensor_readings', drop_after => INTERVAL '30 days');

Dropping a chunk is an instant metadata operation. PostgreSQL removes the underlying table file rather than deleting rows individually. There are no dead tuples, no vacuum overhead, and no transaction ID consumption. This is one of the key advantages of chunk-based architecture over manual DELETE statements on partitioned tables.

With a 1-day interval and 30-day retention, you maintain approximately 30 chunks at all times -- well within the range where planning time is negligible.
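Retention policies run on a background schedule. If you need to reclaim space immediately, the same drop can be issued manually. A sketch; drop_chunks() removes data permanently, so preview the time bound with show_chunks() first:

```sql
-- Preview which chunks the bound would remove
SELECT show_chunks('sensor_readings', older_than => now() - INTERVAL '30 days');

-- Drop them now instead of waiting for the policy job
SELECT drop_chunks('sensor_readings', older_than => now() - INTERVAL '30 days');
```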

Prevention Checklist

  1. Set the interval based on ingest rate. Target 25M rows per chunk. Measure actual rows_per_second on production data.
  2. Always include time predicates in queries. Without them, constraint exclusion cannot work and every chunk is scanned.
  3. Configure retention policies. Cap total chunk count to prevent unbounded growth.
  4. Monitor chunk count over time. A rising count with stable ingest means the interval is too small or retention is missing.
  5. Check planning time periodically. EXPLAIN ANALYZE planning time over 50ms signals too many chunks.

Chunks are the fundamental unit of TimescaleDB's performance model. The interval controls how many exist, retention controls how long they persist, and constraint exclusion controls which ones are actually read. Get these three things right and most TimescaleDB performance problems never materialize.
