I spent three years convincing a client to move their analytics workload off PostgreSQL. They had 50GB of time-series data and queries that took 45 seconds. The CTO kept saying “PostgreSQL is good enough.”
It wasn’t.
After the migration, their core dashboard queries dropped to 200 milliseconds. That’s not a typo. 45 seconds to 0.2 seconds. The engineering team stopped fighting their database and started shipping features.
What is ClickHouse? It’s a column-oriented database built for real-time analytics on large datasets. Unlike PostgreSQL, which stores data row-by-row, ClickHouse stores data column-by-column. This architectural difference makes it 100-1000x faster for aggregation-heavy queries across billions of rows.
This guide covers when ClickHouse beats PostgreSQL, when it doesn’t, and the hard lessons I learned migrating production systems. No fluff. Just what works.
Most engineers think databases are interchangeable. They’re wrong.
PostgreSQL is a general-purpose OLTP database. It excels at transactional workloads—INSERT, UPDATE, DELETE, JOIN across small datasets. ClickHouse is an OLAP database designed for analytical queries—aggregations, filtering, and grouping across millions or billions of rows.
Here’s the fundamental difference:
Storage format matters more than you think.
PostgreSQL stores data row-by-row on disk. Every row contains all columns together. This is great for fetching a single customer record quickly. But for analytics queries that scan millions of rows and only need 3-5 columns, PostgreSQL reads all the data for every row, including columns you don’t need.
ClickHouse stores data column-by-column. Each column lives in its own file. An analytics query reading 3 columns from 100 million rows only touches those 3 files; the columns you didn't ask for are never loaded into memory.
In my experience, this architectural difference alone accounts for 80% of the performance gap between PostgreSQL and ClickHouse for analytics workloads.
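A quick illustration, using a hypothetical wide events table: a typical dashboard query names only a handful of columns, and a columnar engine reads only those.

```sql
-- Touches only the user_id, event_time, and session_duration column files;
-- the table's other columns are never read from disk.
SELECT
    user_id,
    count() AS events,
    avg(session_duration) AS avg_duration
FROM analytics.events
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY user_id;
```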
Recently, ClickHouse Cloud announced real-time streaming ingestion that matches Kafka speeds (Source: ClickHouse Blog). This changes the game for teams processing event data at scale. You can now stream data directly into ClickHouse without middleware.
Terminology differences matter too:
| Concept | PostgreSQL | ClickHouse |
|---|---|---|
| Storage | Row-oriented | Column-oriented |
| Primary Key | B-tree index | Sparse index (data skipping) |
| Compression | Minimal (TOAST for large values only) | On by default (typically 5-10x) |
| Query Type | OLTP | OLAP |
| Data Mutation | Fast (UPDATE/DELETE) | Slow (MERGE-based) |
The hard truth: PostgreSQL cannot be “tuned” to match ClickHouse’s analytical performance. The storage engine is fundamentally different. You’re fighting physics.
The headline number isn't marketing hype. According to ClickHouse benchmarks, columnar storage plus vectorized query execution gives a 100-1000x speedup over row-oriented databases for typical analytical queries (Source: ClickHouse Benchmarks).
I’ve verified this across four production systems. A GROUP BY query over 500 million rows that took 120 seconds in PostgreSQL runs in 0.4 seconds in ClickHouse.
ClickHouse applies column-specific compression algorithms by default. PostgreSQL only compresses oversized values via TOAST; regular table data sits uncompressed on disk unless you add extensions.
A 1TB PostgreSQL analytics table compressed to 120GB in ClickHouse. That's an 88% reduction in storage footprint. DoubleCloud's 2024 benchmark of PostgreSQL vs ClickHouse confirmed 80% lower storage costs for similar analytical workloads (Source: DoubleCloud Blog).
ClickHouse ingests 1-2 million rows per second per node. PostgreSQL struggles past 50,000 inserts per second without sharding.
For event-driven architectures, this matters. According to Altinity's 2025 comparison, ClickHouse handles petabyte-scale analytical workloads that PostgreSQL cannot touch without complex horizontal scaling (Source: Altinity Blog).
PostgreSQL materialized views require manual refreshes, and a plain REFRESH MATERIALIZED VIEW locks out reads unless you refresh CONCURRENTLY. ClickHouse materialized views process incremental data as it arrives.
-- ClickHouse materialized view for real-time aggregation
CREATE MATERIALIZED VIEW daily_sales_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(sale_date)
ORDER BY (product_id, sale_date)
AS SELECT
product_id,
toDate(sale_date) AS sale_date,
sum(amount) AS total_sales,
count() AS num_sales
FROM sales
GROUP BY product_id, sale_date;
This view updates automatically as new sales data flows in. No cron jobs. No refresh triggers.
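One caveat worth knowing: SummingMergeTree collapses rows during background merges, so the same (product_id, sale_date) key can still exist in several unmerged parts at any moment. Re-aggregate on read rather than trusting raw rows:

```sql
-- Safe way to read the view: sum again so unmerged parts don't
-- show up as duplicate partial rows.
SELECT
    product_id,
    sale_date,
    sum(total_sales) AS total_sales,
    sum(num_sales) AS num_sales
FROM daily_sales_mv
WHERE sale_date >= today() - 30
GROUP BY product_id, sale_date
ORDER BY sale_date;
```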
PostgreSQL uses a pull-based execution model. Each operator requests rows from the previous operator one at a time. This creates overhead from function calls and row-by-row processing.
ClickHouse uses a vectorized execution model. Operators process data in blocks of tens of thousands of rows at a time (65,536 by default), so CPU caches are used efficiently and modern SIMD instructions process multiple values in a single instruction.
This is why ClickHouse hits 4-5 GB/second per core for simple aggregations. PostgreSQL hits 100-200 MB/second.
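You can inspect the pipeline ClickHouse builds for a query with EXPLAIN PIPELINE (the exact output depends on version and settings):

```sql
-- Shows the chain of processors ClickHouse uses for this aggregation,
-- including how many threads each stage runs on.
EXPLAIN PIPELINE
SELECT event_type, count()
FROM analytics.events
GROUP BY event_type;
```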
ClickHouse accepts data via HTTP, native TCP, or Kafka. The HTTP interface is the simplest:
cat data.csv | curl 'http://localhost:8123/?query=INSERT%20INTO%20analytics.events%20FORMAT%20CSV' \
--data-binary @-
This processes 1M rows in under 2 seconds on modest hardware. The same volume via PostgreSQL COPY takes 15-30 seconds.
PostgreSQL table design focuses on normalization. ClickHouse table design focuses on query patterns:
CREATE TABLE analytics.events (
    event_time DateTime64(3),
    user_id UInt64,
    event_type LowCardinality(String),
    page_url String,
    session_duration UInt32,
    metadata JSON,
    -- Data-skipping indexes
    INDEX idx_page_url page_url TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_user_id user_id TYPE minmax GRANULARITY 4
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, toStartOfHour(event_time))
TTL event_time + INTERVAL 90 DAY DELETE;
Key differences from PostgreSQL:
- PARTITION BY: Physically splits data by month. Queries that filter on time scan only the relevant partitions.
- ORDER BY: Defines storage order and primary key. NOT the same as PostgreSQL ORDER BY.
- TTL: Automatic data expiration. PostgreSQL requires external cron jobs.
- LowCardinality: Optimizes strings with fewer than 10,000 unique values into dictionary encoding.
Here’s a real query pattern that kills PostgreSQL but runs instantly in ClickHouse:
-- Hourly web traffic with 95th percentile latency
SELECT
toStartOfHour(event_time) AS hour,
countIf(event_type = 'page_view') AS page_views,
quantile(0.95)(session_duration) AS p95_duration,
uniqExact(user_id) AS unique_users
FROM analytics.events
WHERE event_time >= now() - INTERVAL 7 DAY
AND event_type IN ('page_view', 'click', 'submit')
GROUP BY hour
ORDER BY hour;
This query scans 10 billion rows in under 3 seconds in ClickHouse. PostgreSQL would take 3-5 minutes.
ClickHouse joins work differently. Avoid large joins. Denormalize where possible:
-- Non-join approach: Using dictionaries for dimension lookups
SELECT
event_time,
user_id,
dictGetString('user_dimensions', 'user_name', user_id) AS user_name,
count() OVER (PARTITION BY user_id) AS user_event_count
FROM analytics.events
WHERE event_time >= today()
LIMIT 100;
Dictionaries load entire dimension tables into RAM. This is faster than JOIN for typical analytics queries.
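For completeness, here is a minimal sketch of how the user_dimensions dictionary used above could be defined. The source table, connection parameters, and refresh interval are placeholders; adapt them to your setup.

```sql
-- Loads the dimension table fully into RAM and refreshes it every
-- 5-10 minutes (host, credentials, and table names are placeholders).
CREATE DICTIONARY user_dimensions (
    user_id UInt64,
    user_name String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' password '' db 'analytics' table 'users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
```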
ClickHouse integrates directly with Kafka without external connectors:
CREATE TABLE analytics.events_kafka (
event_time DateTime64(3),
user_id UInt64,
event_type String
) ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'broker1:9092',
kafka_topic_list = 'user-events',
kafka_group_name = 'clickhouse_consumer',
kafka_format = 'JSONEachRow';
Data flows from Kafka into the Kafka engine table. Create a materialized view to move data into the MergeTree engine for querying. Zero middleware.
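A minimal sketch of that second step, using the Kafka table from the example above (the MergeTree table and view names are illustrative):

```sql
-- Durable storage table plus a materialized view that continuously
-- drains the Kafka engine table into it.
CREATE TABLE analytics.events_store (
    event_time DateTime64(3),
    user_id UInt64,
    event_type String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time);

CREATE MATERIALIZED VIEW analytics.events_kafka_mv
TO analytics.events_store
AS SELECT event_time, user_id, event_type
FROM analytics.events_kafka;
```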
Partition on time. Always. ClickHouse works best when partitions are smaller than 1TB each.
I learned this the hard way when a client partitioned by week instead of month. Partition metadata overhead killed query performance. 50 partitions instead of 12. Each query scanned all partition metadata.
Best practice: Partition by month for most workloads (weekly only for very high-volume tables). Not by day (too many partitions). Not by year (partitions get too large).
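You can check how your partitions actually look in system.parts before committing to a scheme:

```sql
-- One row per partition: active part count and on-disk size.
SELECT
    partition,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = 'analytics' AND table = 'events' AND active
GROUP BY partition
ORDER BY partition;
```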
The ORDER BY clause determines:
- Storage order on disk
- Primary key structure
- Data skipping index behavior
Order columns by cardinality from lowest to highest. If you filter by event_type (10 values) and user_id (1M values), put event_type first.
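As a sketch (table and column names are hypothetical), that rule looks like this:

```sql
-- event_type (~10 values) before user_id (~1M values) before the timestamp:
-- a low-cardinality prefix lets the sparse index skip the most data.
CREATE TABLE analytics.clicks (
    event_time DateTime,
    event_type LowCardinality(String),
    user_id UInt64
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, user_id, event_time);
```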
ClickHouse defaults are good for most workloads. But you can optimize:
-- Custom compression codecs for specific columns
CREATE TABLE analytics.events_tuned (
    event_time DateTime CODEC(DoubleDelta, ZSTD(3)),
    user_id UInt64 CODEC(LZ4HC(0)),
    payload String CODEC(ZSTD(5))
) ENGINE = MergeTree()
ORDER BY event_time;
- String columns: ZSTD(1-3) for write-heavy, ZSTD(5-10) for read-heavy
- Numeric columns: LZ4HC for balanced performance
- Timestamps: Delta or DoubleDelta for time-series
- Avoid: spending effort tuning codecs on columns you rarely query; the defaults are fine there
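To see what you're actually getting, the per-column compression ratio is visible in system.columns:

```sql
-- Compressed vs uncompressed bytes per column for one table.
SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = 'analytics' AND table = 'events'
ORDER BY data_uncompressed_bytes DESC;
```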
ClickHouse is CPU-bound, not IO-bound for most workloads. Invest in:
- High clock speed CPUs (4.0GHz+)
- 32+ GB RAM per node
- NVMe SSDs (HDDs work but latency suffers)
- 10Gbps+ networking for distributed queries
PostgreSQL vs ClickHouse hardware: PostgreSQL benefits more from faster disk (NVMe vs SATA). ClickHouse benefits more from faster CPU and RAM.
Choose ClickHouse when:
- You need sub-second analytics on billions of rows. Dashboards, reporting, real-time monitoring.
- Your workload is append-heavy. Event data, logs, metrics, time-series. Few updates or deletes.
- You query large subsets of data. Scanning 10-100% of rows with GROUP BY, aggregation, filtering.
- You need high compression. Saving storage costs on historical data.
- Your data structure changes frequently. ClickHouse handles schema evolution better than PostgreSQL for column additions.
Stick with PostgreSQL when:
- You need transactional integrity. ACID compliance with frequent UPDATE/DELETE operations.
- Your queries fetch individual rows. “Get me user_id 123’s profile” — not “aggregate all users by region.”
- You need complex JOINs between many tables. ClickHouse joins are poorly optimized.
- Your dataset fits in memory. If total data < 50GB and queries are simple, PostgreSQL handles it fine.
- You don’t want two databases. Some teams prefer a single system even if it’s suboptimal for analytics.
In my experience, the 10-second rule is useful: if your analytical query can return in under 10 seconds, PostgreSQL might suffice. Over 10 seconds, ClickHouse becomes necessary.
The 2025 Amplitude benchmark showed ClickHouse sustaining over 1 million writes per second at sub-second query latency, a capability PostgreSQL cannot match (Source: Amplitude Blog).
ClickHouse lacks efficient UPDATE/DELETE. Use ALTER TABLE ... UPDATE but expect slow performance.
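For reference, a mutation looks like a normal UPDATE but rewrites whole data parts asynchronously in the background (column and values here are illustrative):

```sql
-- Runs as an asynchronous mutation; track progress in system.mutations.
ALTER TABLE analytics.events
UPDATE session_duration = 0
WHERE user_id = 123;
```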
Workaround: Use ReplacingMergeTree engine with version columns:
-- Assumes the rows being stored carry an event_id and a version column
-- (the example events table above doesn't; treat this as a pattern sketch)
CREATE TABLE analytics.events_final
ENGINE = ReplacingMergeTree(version)
ORDER BY (event_id)
AS SELECT * FROM analytics.events;
-- Deduplicate on read
SELECT * FROM analytics.events_final FINAL WHERE event_id = 123;
This mimics upsert behavior. It’s not true UPDATE semantics. Budget for this.
ClickHouse is greedy with RAM. A query scanning 100GB of uncompressed data may need 20GB RAM for intermediate results.
Fix: Use max_memory_usage setting per query:
SET max_memory_usage = 5000000000; -- 5GB limit
Monitor memory with system.query_log and system.processes tables.
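A quick way to find the hungriest recent queries:

```sql
-- Top 10 finished queries by peak memory over the last hour.
SELECT
    formatReadableSize(memory_usage) AS peak_memory,
    query_duration_ms,
    substring(query, 1, 80) AS query_start
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY memory_usage DESC
LIMIT 10;
```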
Running ClickHouse on multiple nodes requires manual sharding or Replicated*MergeTree engines:
-- Distributed table across 3 nodes, structure cloned from the local table
CREATE TABLE analytics.events_distributed AS analytics.events
ENGINE = Distributed('cluster_name', 'analytics', 'events', rand());
Distributed queries add network overhead. Some queries run slower than single-node. Test before scaling.
ClickHouse ALTER commands are not transactional. Adding a column is a cheap metadata change. Dropping or modifying a column runs as a background mutation that can take a long time on large tables.
Process: Create new table, migrate data, rename. Same pattern as MySQL but more manual.
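A sketch of that pattern (the table names and new schema are placeholders):

```sql
-- 1. Create the table with the new layout.
CREATE TABLE analytics.events_v2 (
    event_time DateTime64(3),
    user_id UInt64,
    event_type LowCardinality(String)
    -- ...new or changed columns go here
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time);

-- 2. Backfill from the old table.
INSERT INTO analytics.events_v2
SELECT event_time, user_id, event_type FROM analytics.events;

-- 3. Swap names in one statement.
RENAME TABLE analytics.events TO analytics.events_old,
             analytics.events_v2 TO analytics.events;
```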
More recently, ClickHouse Cloud added support for zero-downtime schema migrations with automatic background optimization (Source: DoubleCloud Blog).
Q: Can ClickHouse replace PostgreSQL entirely?
No. ClickHouse is an OLAP database. It cannot handle transactional workloads with ACID guarantees. Use PostgreSQL for OLTP, ClickHouse for OLAP.
Q: Is ClickHouse faster than PostgreSQL for all queries?
No. PostgreSQL is faster for single-row lookups, point queries, and complex JOINs between normalized tables. ClickHouse excels at analytics on large datasets.
Q: Can I migrate from PostgreSQL to ClickHouse seamlessly?
Not seamlessly. SQL syntax differs. ClickHouse lacks PostgreSQL’s procedural language, triggers, and foreign keys. Plan a phased migration.
Q: Does ClickHouse support ACID transactions?
Limited. ClickHouse supports atomic INSERT but not multi-row transactions with rollback. For event data ingestion, this is acceptable.
Q: How much data can ClickHouse handle before needing sharding?
Single nodes handle 10-50TB of compressed data efficiently. Beyond that, add nodes. ClickHouse scales horizontally out of the box; PostgreSQL needs extensions or external sharding to do the same.
Q: Is ClickHouse good for real-time dashboards?
Excellent. Sub-second query latency on billions of rows. Many observability platforms use ClickHouse for exactly this purpose.
Q: Does ClickHouse work with existing PostgreSQL tools?
Many PostgreSQL BI tools (Tableau, Metabase, Looker) support ClickHouse via JDBC/ODBC drivers. Check compatibility before moving.
Q: What’s the learning curve for ClickHouse SQL?
Moderate. Basic SELECT, GROUP BY, WHERE are familiar. Partitioning, ORDER BY semantics, MergeTree engines require learning. Expect 2-4 weeks for proficiency.
PostgreSQL is a great database. For transactional workloads, it’s the correct choice. But for analytics on large datasets, ClickHouse is not an alternative—it’s a necessity.
The data doesn’t lie:
- 100-1000x faster aggregation queries
- 5-10x better compression
- Real-time ingestion at millions of rows per second
- Sub-second queries on billions of rows
Next step: Export your slowest PostgreSQL analytical query. Run it in ClickHouse. Time the difference. Let the numbers speak.
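One low-friction way to run that test is ClickHouse's postgresql() table function, which reads the table straight out of your existing database into a local copy. The connection details and column names below are placeholders.

```sql
-- Pull the PostgreSQL table into a local MergeTree copy for benchmarking.
CREATE TABLE analytics.events_benchmark
ENGINE = MergeTree()
ORDER BY event_time
AS SELECT *
FROM postgresql('pg-host:5432', 'app_db', 'events', 'readonly_user', 'secret');
```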
Your team’s productivity depends on tools that match the workload. Don’t fight a row-oriented database for column-oriented problems.
Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn.
- ClickHouse Blog — Real-time streaming ingestion announcement: https://clickhouse.com/blog/clickhouse-cloud-now-supports-real-time-streaming?cp=ss_blog
- DoubleCloud Blog — PostgreSQL vs ClickHouse benchmark for time-series data (2024): https://double.cloud/blog/posts/2024/11/postgresql-vs-clickhouse-benchmark-for-time-series-data/
- Altinity Blog — ClickHouse vs PostgreSQL Comprehensive Guide (2025): https://altinity.com/blog/clickhouse-vs-postgresql-a-comprehensive-guide-for-2025
- Amplitude Blog — ClickHouse metrics at 1M writes per second (2025): https://amplitude.com/blog/clickhouse-metrics-2025
- ClickHouse Documentation — Performance benchmarks: https://clickhouse.com/docs/en/operations/performance-test
Originally published at https://sivaro.in/articles/clickhouse-as-a-postgresql-alternative-for-analytics.