DEV Community

Cover image for Day 28 of 100 Days of ClickHouse® – Understanding TTL (Time To Live) for Automated Data Lifecycle Management
Kanishga Subramani
Kanishga Subramani

Posted on

Day 28 of 100 Days of ClickHouse® – Understanding TTL (Time To Live) for Automated Data Lifecycle Management

Introduction

As data grows over time, storing every row forever becomes increasingly expensive and often unnecessary. Application logs, IoT events, monitoring metrics, audit records, and clickstream data typically have different retention requirements depending on their age.

ClickHouse® provides TTL (Time To Live), a powerful built-in feature that automates data lifecycle management. Instead of relying on scheduled cleanup scripts or manual DELETE operations, TTL rules can be defined directly within a table schema.

With TTL, ClickHouse can automatically:

  • Delete expired data
  • Move older data to lower-cost storage
  • Recompress aging data using stronger codecs
  • Aggregate historical records into summarized data

These capabilities make TTL one of the most effective storage optimization features available in ClickHouse.


What is TTL?

TTL (Time To Live) defines what should happen to data after a specified period of time.

A TTL expression is usually based on a Date or DateTime column.

TTL event_time + INTERVAL 30 DAY
Enter fullscreen mode Exit fullscreen mode

After the specified interval expires, ClickHouse performs the configured action during background merge operations.

Unlike scheduled jobs, TTL is completely integrated into the MergeTree engine, making lifecycle management automatic and maintenance-free.


How TTL Works Internally

When new data is inserted into a MergeTree table, the following process occurs:

  1. Data is written into immutable parts.
  2. Background merge operations combine smaller parts into larger ones.
  3. During each merge, ClickHouse evaluates TTL expressions.
  4. Expired rows are deleted, moved, recompressed, or aggregated depending on the configured rule.

Because TTL works during merges:

  • Expired data may remain visible for a short period.
  • Cleanup is asynchronous.
  • Execution depends on the merge schedule.

For testing purposes, you can force a merge using:

OPTIMIZE TABLE table_name FINAL;
Enter fullscreen mode Exit fullscreen mode

1. Automatically Deleting Expired Data

The most common TTL use case is automatic data deletion.

CREATE TABLE logs
(
    timestamp DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY timestamp
TTL timestamp + INTERVAL 30 DAY;
Enter fullscreen mode Exit fullscreen mode

What happens?

  • Newly inserted data remains available.
  • After 30 days, rows become eligible for removal.
  • During future background merges, ClickHouse automatically deletes expired rows.

This eliminates the need for cron jobs or periodic cleanup scripts while reducing storage usage automatically.


2. Column-Level TTL

TTL isn't limited to entire rows. It can also be applied to individual columns.

CREATE TABLE user_events
(
    event_time DateTime,
    user_id UInt64,
    session_data String TTL event_time + INTERVAL 7 DAY
)
ENGINE = MergeTree
ORDER BY event_time;
Enter fullscreen mode Exit fullscreen mode

Result

After seven days:

  • session_data is removed.
  • The remaining columns stay intact.

This is particularly useful when temporary or sensitive information only needs to be retained for a limited time.


3. Moving Data to Cold Storage

Many organizations use multiple storage tiers.

Storage Tier Purpose
SSD Frequently accessed recent data
HDD / Object Storage Historical or infrequently accessed data

TTL can automatically move older data between storage volumes.

CREATE TABLE logs
(
    timestamp DateTime,
    message String
)
ENGINE = MergeTree
ORDER BY timestamp
TTL timestamp + INTERVAL 90 DAY
TO VOLUME 'cold_storage';
Enter fullscreen mode Exit fullscreen mode

Result

  • Recent data remains on fast storage.
  • Older data is moved to lower-cost storage.

This helps reduce infrastructure costs while keeping historical data available when needed.


4. Recompressing Older Data

Recent data is queried frequently and benefits from fast compression codecs such as LZ4.

Historical data is accessed less often, making stronger compression more efficient.

TTL supports automatic recompression.

CREATE TABLE events
(
    event_time DateTime,
    user_id UInt64,
    value Float64
)
ENGINE = MergeTree
ORDER BY event_time
TTL event_time + INTERVAL 30 DAY
RECOMPRESS CODEC(ZSTD(17));
Enter fullscreen mode Exit fullscreen mode

Compression lifecycle

Data Age Compression
0–30 Days Default (LZ4)
After 30 Days ZSTD(17)

Benefits

  • Lower storage consumption
  • Better disk utilization
  • Fully automated optimization
  • No manual recompression required

5. Aggregating Historical Data with GROUP BY TTL

One of the most powerful TTL capabilities is automatic rollup aggregation.

Instead of deleting old data, ClickHouse can summarize historical records into aggregated values.

This is particularly useful for:

  • Monitoring systems
  • IoT platforms
  • Clickstream analytics
  • Time-series databases

Example

CREATE TABLE hits
(
    timestamp DateTime,
    id String,
    hits Int32,
    max_hits Int32 DEFAULT hits,
    sum_hits Int64 DEFAULT hits
)
ENGINE = MergeTree
PRIMARY KEY (id, toStartOfDay(timestamp), timestamp)
TTL timestamp + INTERVAL 1 DAY
GROUP BY
    id,
    toStartOfDay(timestamp)
SET
    max_hits = max(max_hits),
    sum_hits = sum(sum_hits);
Enter fullscreen mode Exit fullscreen mode

Before Rollup

Timestamp ID Hits
10:00 A 5
11:00 A 8
12:00 A 12

After one day, the TTL rule executes:

TTL timestamp + INTERVAL 1 DAY
GROUP BY
    id,
    toStartOfDay(timestamp)
SET
    max_hits = max(max_hits),
    sum_hits = sum(sum_hits);
Enter fullscreen mode Exit fullscreen mode

After Rollup

Day ID Max Hits Sum Hits
2026-06-20 A 12 25

Instead of storing three separate rows, ClickHouse stores one summarized row.

This significantly reduces storage while preserving meaningful historical insights.


Best Practices for Using TTL

  • Use TTL for predictable retention policies instead of scheduled DELETE statements.
  • Remember that TTL actions occur during background merges, not immediately after expiration.
  • Apply column-level TTL when only specific fields need limited retention.
  • Use storage tiering to reduce infrastructure costs without losing historical data.
  • Combine TTL with recompression to maximize disk efficiency.
  • Consider GROUP BY TTL for time-series workloads where summarized historical data is sufficient.
  • Use OPTIMIZE TABLE ... FINAL only for testing, not as a regular maintenance task.

Conclusion

TTL is one of ClickHouse's most valuable automation features for managing data throughout its lifecycle. With a single declarative rule, you can automatically delete expired data, move historical records to lower-cost storage, apply stronger compression to older data, or aggregate detailed records into summarized insights.

By leveraging TTL effectively, organizations can reduce storage costs, simplify maintenance, improve long-term performance, and build scalable analytical systems without relying on external cleanup jobs or manual intervention.

Whether you're managing log data, monitoring metrics, IoT telemetry, or large-scale analytical workloads, TTL provides a clean and efficient approach to implementing automated data retention policies in ClickHouse.

Top comments (0)