Why I chose ClickHouse over PostgreSQL for a billion-row analytics platform

#clickhouse #postgressql #backenddevelopment #database

Last year I was handed a problem: an enterprise IoT monitoring dashboard that took 18–22 seconds to load. The client — a large industrial operator — had engineers staring at a spinner every time they opened it. The underlying data was billions of sensor readings, stored in PostgreSQL.

My job was to fix it. The solution ended up being ClickHouse. Here's exactly why, including the benchmarks and the tradeoffs I accepted to get there.

The Query That Broke Postgres

The dashboard had a "fleet overview" view that aggregated data across all sensor racks simultaneously. In simplified terms:

SELECT
    rack_id,
    quantile(0.95)(value) AS p95_reading,
    quantile(0.50)(value) AS median_reading,
    count()               AS sample_count
FROM sensor_readings
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY rack_id
ORDER BY p95_reading DESC

On PostgreSQL 16 with BRIN indexes on ts: 42,000ms. Query was being killed at 60 seconds in production. Users saw a broken dashboard and assumed the system was down.

On ClickHouse with MergeTree partitioned by month: 380ms.

That's not a configuration problem. That's a fundamentally different storage model.

Why Columnar Storage Wins for This Workload

PostgreSQL stores data row by row. To aggregate a column across millions of rows, it has to read every row — even the columns you don't need.

ClickHouse stores data column by column. An aggregation query reads only the columns it touches. For a table with 12 columns, that's roughly a 12x reduction in I/O before any other optimization kicks in.

On top of that, columnar storage compresses repetitively structured data extremely well. IoT sensor data is highly repetitive — the same sensor IDs, the same rack IDs, the same status codes, repeated billions of times. Our 1-billion-row dataset that occupied 847GB in PostgreSQL occupied 94GB in ClickHouse. A 9x compression ratio, on the same hardware.

Less data on disk means faster reads. Faster reads means faster queries. The math is straightforward.

The Benchmark Results

I ran three representative queries against the same 1-billion-row dataset on the same hardware (8-core CPU, 32GB RAM, NVMe SSD). Here's what came back:

Query 1: Count rows in a 24-hour window

Engine	Latency (p50)
PostgreSQL 16 (BRIN index)	1,240ms
TimescaleDB	310ms
ClickHouse	18ms

Query 2: 1-hour candle aggregation over 90 days

Engine	Latency (p50)
PostgreSQL 16	8,200ms
TimescaleDB (continuous aggregate)	95ms
ClickHouse (AggregatingMergeTree view)	8ms

Query 3: Cross-rack percentile across 7 days (the killer query)

Engine	Latency (p50)
PostgreSQL 16	42,000ms
TimescaleDB	18,000ms
ClickHouse	380ms

The third query is the one that mattered most. TimescaleDB at 18 seconds is still too slow for an interactive dashboard. ClickHouse at 380ms feels instant.

What Made the Difference: Materialised Views at Insert Time

The 8ms result on Query 2 wasn't just columnar storage. It was ClickHouse's AggregatingMergeTree materialised views — which compute aggregations at insert time, not at query time.

CREATE MATERIALIZED VIEW candles_1h
ENGINE = AggregatingMergeTree()
ORDER BY (asset, ts_bucket)
AS
SELECT
    asset,
    toStartOfHour(ts)         AS ts_bucket,
    argMinState(value, ts)    AS open,
    maxState(value)           AS high,
    minState(value)           AS low,
    argMaxState(value, ts)    AS close
FROM sensor_readings
GROUP BY asset, ts_bucket;

Every time data is inserted, ClickHouse updates the materialised view in the background. By the time a dashboard query arrives, the 1-hour candles are pre-computed. The query scans 2,160 rows (90 days × 24 hours) instead of 200 million raw readings.

TimescaleDB has something similar — continuous aggregates. The difference: TimescaleDB continuous aggregates break when the query uses dynamic intervals (users choosing arbitrary time windows). ClickHouse materialised views handle this correctly.

The Tradeoffs I Accepted

ClickHouse is not PostgreSQL. There are real things it can't do:

No row-level updates. UPDATE sensor_readings SET value = X WHERE id = Y is an async mutation in ClickHouse — expensive and non-atomic. For IoT data that's append-only, this doesn't matter. For data that needs corrections, you need a strategy (we used a separate corrections log in Postgres).

Joins are slow. ClickHouse joins are hash-based and don't benefit from indexes the way PostgreSQL does. We kept all relational data — users, device registry, permissions — in PostgreSQL and only put time-series data in ClickHouse.

Local development is heavy. ClickHouse in Docker takes 2–3 minutes to start and consumes significant RAM. We eventually moved to a shared dev instance rather than having every engineer run it locally.

No transactions. If you need BEGIN / COMMIT / ROLLBACK, ClickHouse can't help you.

The decision was straightforward for our workload: append-only time-series, aggregation-heavy queries, no relational joins on the hot path. If any of those conditions were different, the answer might have been different too.

The Result

Dashboard load time went from 18–22 seconds to under 1 second. Not through clever caching alone (though we added multi-tier Redis caching on top), but because the underlying queries went from multi-second to sub-100ms.

The client's engineers went from avoiding the fleet overview to opening it as their first action every morning.

When ClickHouse is the Wrong Answer

Don't use ClickHouse if:

Your queries are primarily point lookups by primary key
You need multi-table relational joins on the hot path
Your dataset is under ~100M rows (Postgres handles it fine)
You need ACID transactions
Your team has no bandwidth to learn a new query engine

ClickHouse solves one problem extremely well: aggregating over large amounts of time-ordered data. If that's your problem, it's the right tool. If it isn't, PostgreSQL is probably fine and you're adding operational complexity for no gain.

If you want to see the full schema design, materialised view definitions, and benchmark methodology: I wrote it up in detail in my IoT Architecture Playbook — specifically the storage strategy doc and the ClickHouse vs TimescaleDB ADR.