
Nishaant Dixit


ClickHouse vs PostgreSQL: The Real Trade-Offs Nobody Talks About

I built a real-time analytics system for a client back in 2019. PostgreSQL was the obvious choice. Reliable. Battle-tested. We knew it inside out.

Three months later, queries that had taken 2 seconds were taking 47 seconds. The joins across 12 tables were killing us. Every new dimension we added made things worse.

We tried everything. Index optimization. Query rewriting. Connection pooling. Nothing fixed the core problem.

That’s when I discovered ClickHouse. Migrated our analytical queries. Query times dropped from 47 seconds to 180 milliseconds.

What is ClickHouse vs PostgreSQL? It’s a comparison between two fundamentally different database architectures. PostgreSQL is an OLTP (Online Transaction Processing) database designed for row-level operations and ACID compliance. ClickHouse is an OLAP (Online Analytical Processing) columnar database optimized for read-heavy analytical queries over massive datasets.

Here’s what I learned the hard way about choosing between them.


Most people think PostgreSQL can handle everything. They’re wrong, because transactional and analytical workloads make fundamentally different demands on a database.

PostgreSQL stores data row-by-row. Perfect for inserting one user record, updating their email, or fetching a single order. Its MVCC design makes transactional integrity seamless. According to PostHog’s deep analysis, PostgreSQL excels at “CRUD operations where you need to ensure data consistency,” but it struggles badly when analyzing millions of rows across dozens of columns.

ClickHouse stores data column-by-column. Each column is compressed independently, so runs of similar values pack down tightly. This means queries that aggregate across billions of rows only read the columns they need, not entire rows.

Real example from my work: We had 500 million product events. A query computing average revenue per user over 90 days.

-- PostgreSQL: Full row scan, 47 seconds
SELECT user_id, AVG(revenue) FROM events 
WHERE event_date >= '2024-01-01' AND event_date < '2024-04-01' 
GROUP BY user_id;
-- ClickHouse: Columnar scan, 180 milliseconds
SELECT user_id, avg(revenue) FROM events 
WHERE event_date >= '2024-01-01' AND event_date < '2024-04-01' 
GROUP BY user_id;

The difference isn’t magic. ClickHouse only read the revenue, user_id, and event_date columns. PostgreSQL read every row fully.
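
For context, the ClickHouse query above assumes a MergeTree table sorted by date. The original schema isn’t shown here, but a minimal sketch might look like this (column types and the partition key are my assumptions):

CREATE TABLE events (
  user_id UInt64,
  event_date Date,
  revenue Float64
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

Because the data is physically sorted by event_date, the WHERE clause prunes whole partitions and index granules before ClickHouse ever touches the revenue column.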

In my experience, teams underestimate how dramatically workload patterns affect database choice. Transactional work loves PostgreSQL. Analytical work loves ClickHouse.


PostgreSQL handles complex transactions with surgical precision. Multi-table inserts, foreign key constraints, and rollbacks work exactly as expected. ClickHouse’s own documentation admits that “PostgreSQL is better suited for transactional workloads, while ClickHouse excels at analytical ones.”

The real benefit: Your application logic stays simple. You write standard SQL. Your ORM works. Your developers don’t need training.

I’ve found that PostgreSQL fits perfectly for:

  • User authentication systems
  • Inventory management
  • Financial records requiring ACID compliance
  • Real-time order processing

ClickHouse destroys PostgreSQL on analytical workloads. Period. According to Instaclustr’s comparison, ClickHouse compresses data 5-10x better than row-based systems because identical values in columns compress beautifully.

Hard truth: A team at a startup I advised stored 4TB of analytics data in PostgreSQL. Queries took 3 minutes. They migrated the analytical tables to ClickHouse. Dataset compressed to 400GB. Queries dropped to 5 seconds. Monthly hosting costs fell 60%.

But the trade-off? They lost transactional integrity for those tables. They had to build reconciliation jobs.
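
If you want to verify compression ratios like that on your own tables, ClickHouse exposes per-part sizes in the system.parts table. A quick check might look like this (the table name events is just a placeholder):

SELECT
  table,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE active AND table = 'events'
GROUP BY table;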


Let me show you the real implementation differences.

PostgreSQL expects row-by-row inserts. ClickHouse batches everything.

-- PostgreSQL transactional insert
BEGIN;
INSERT INTO orders (order_id, user_id, amount, created_at) 
VALUES ('ORD-001', 42, 99.99, NOW());
COMMIT;
-- This works perfectly for 100 inserts/second
-- Fails at 10,000 inserts/second
-- ClickHouse batch insert
INSERT INTO orders (order_id, user_id, amount, created_at) 
VALUES 
  ('ORD-001', 42, 99.99, now()),
  ('ORD-002', 43, 149.99, now()),
  ('ORD-003', 44, 29.99, now());
-- Optimal at 10,000+ rows per batch

Critical insight: a large batch insert can partially succeed, and ClickHouse won’t roll the whole thing back the way PostgreSQL would. You must implement idempotency. The migration guide from ClickHouse emphasizes that “you must design your ingestion pipeline for batch processing.”
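
One hedged way to get that idempotency is to make retries safe by re-sending the exact same batch. On replicated tables, ClickHouse deduplicates identical insert blocks, so a batch that actually landed before the client saw an error won’t be counted twice. This assumes a ReplicatedMergeTree table with default deduplication settings, and that the batch is byte-identical on every retry (timestamps generated client-side rather than with now()):

-- Retry-safe insert: if the client times out but the server already committed this
-- block, re-running the identical INSERT against a ReplicatedMergeTree table is a no-op.
INSERT INTO orders (order_id, user_id, amount, created_at)
VALUES
  ('ORD-001', 42, 99.99, '2024-03-01 10:00:00'),
  ('ORD-002', 43, 149.99, '2024-03-01 10:00:01');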

Here’s where the architectural divide hits hardest.

-- PostgreSQL: Sub-optimal for analytical queries
SELECT 
  date_trunc('day', created_at) as day,
  COUNT(DISTINCT user_id) as unique_users,
  SUM(amount) as revenue
FROM purchases
WHERE created_at >= '2024-01-01'
GROUP BY day;
-- Sequential scan on 100M rows: ~120 seconds
-- ClickHouse: Optimized for analytical queries
SELECT 
  toStartOfDay(created_at) as day,
  uniqExact(user_id) as unique_users,
  sum(amount) as revenue
FROM purchases
WHERE created_at >= '2024-01-01'
GROUP BY day;
-- Columnar scan on 100M rows: ~800 milliseconds

The ClickHouse version uses uniqExact instead of COUNT(DISTINCT). It uses toStartOfDay instead of date_trunc. These are ClickHouse-specific functions optimized for columnar execution.

In my experience, teams waste weeks trying to write PostgreSQL-idiomatic SQL for ClickHouse. Don’t. Learn ClickHouse’s functions. They’re different for a reason.
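
One example of learning the ClickHouse-native functions: uniqExact gives exact counts, but when an approximate answer is acceptable, uniq (an approximate distinct-count estimator) is usually far cheaper on large tables. A hedged variant of the query above:

SELECT
  toStartOfDay(created_at) AS day,
  uniq(user_id) AS approx_unique_users,  -- approximate, but much less memory than uniqExact
  sum(amount) AS revenue
FROM purchases
WHERE created_at >= '2024-01-01'
GROUP BY day;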

Everyone says PostgreSQL handles updates better. Here’s why that’s both true and misleading.

PostgreSQL update:

UPDATE users SET last_login = NOW() WHERE user_id = 42;
-- In-place update: ~2ms
-- Lock: microseconds

ClickHouse update (the wrong way):

ALTER TABLE users UPDATE last_login = now() WHERE user_id = 42;
-- Background mutation: ~500ms
-- Not recommended for real-time updates

ClickHouse update (the right way):

-- Insert a new version, query uses latest
CREATE TABLE user_events (
  user_id Int64,
  last_login DateTime,
  version DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

INSERT INTO user_events (user_id, last_login) VALUES (42, now());
-- Query must handle deduplication
SELECT 
  user_id, 
  argMax(last_login, version) as latest_login
FROM user_events
GROUP BY user_id;

According to a HN discussion on ClickHouse updates, many developers abandon ClickHouse because they try to use ALTER TABLE UPDATE for frequent row mutations. The right pattern is insert-only with deduplication.

I’ve seen teams burn 2-3 months on this misconception. Understand the update model before you commit.
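
For smaller tables, a hedged shortcut to the argMax pattern is FINAL, which applies ReplacingMergeTree deduplication at query time. It is slower on large tables, but it keeps the query simple:

-- Deduplicates by the ORDER BY key at read time, keeping the row with the highest version
SELECT user_id, last_login
FROM user_events FINAL;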


Stop trying to make one database do everything. According to Tinybird’s 2026 analysis, the best practice is a split architecture:

  1. PostgreSQL for transactional state: users, orders, inventory
  2. ClickHouse for analytical events: page views, clicks, revenue reports
  3. CDC pipeline (Change Data Capture) streaming data from PostgreSQL to ClickHouse in near real-time

Implementation pattern for CDC:

# Abridged sketch: a full setup also needs a Kafka broker, plus Debezium's group ID
# and offset/status storage topics.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: production
    command: ["postgres", "-c", "wal_level=logical"]

  clickhouse:
    image: clickhouse/clickhouse-server:24.3

  debezium:
    image: debezium/connect:2.7
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      CONFIG_STORAGE_TOPIC: connect-configs

This pattern processes 200K events/second at SIVARO. PostgreSQL handles writes. ClickHouse handles reads. Both do what they’re built for.
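
The compose file above wires up the PostgreSQL-to-Kafka half. On the ClickHouse side, one common pattern is a Kafka table engine plus a materialized view that copies rows into a MergeTree table. The sketch below is hypothetical: the topic name and columns are made up, and it assumes Debezium’s messages have been flattened to plain JSON rows (for example with the ExtractNewRecordState transform):

-- Kafka consumer table: rows appear here as they arrive on the topic
CREATE TABLE orders_queue (
  order_id String,
  user_id Int64,
  amount Float64,
  created_at DateTime
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'production.public.orders',
         kafka_group_name = 'clickhouse-orders',
         kafka_format = 'JSONEachRow';

-- Destination table for analytical queries
CREATE TABLE orders_analytics (
  order_id String,
  user_id Int64,
  amount Float64,
  created_at DateTime
) ENGINE = MergeTree
ORDER BY (created_at, user_id);

-- Materialized view that moves data from the queue into the MergeTree table
CREATE MATERIALIZED VIEW orders_mv TO orders_analytics AS
SELECT order_id, user_id, amount, created_at FROM orders_queue;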

  • PostgreSQL: Normalize aggressively. Use foreign keys. Let the query planner optimize joins.
  • ClickHouse: Denormalize. Pre-join. Flatten your data into wide tables.

One team’s write-up on PostgreSQL vs ClickHouse describes cutting query times from 30 seconds to 1.2 seconds by flattening a 7-table star schema into a single ClickHouse table.

The hard truth: ClickHouse’s join performance is mediocre. Avoid joins entirely.
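
In practice, “flatten into wide tables” just means copying dimension attributes onto every event at ingest time. A hypothetical sketch (the column names are made up):

CREATE TABLE events_wide (
  event_time DateTime,
  user_id UInt64,
  user_country String,      -- copied from the users table at ingest
  product_id UInt64,
  product_category String,  -- copied from the products table at ingest
  revenue Float64
) ENGINE = MergeTree
ORDER BY (event_time, user_id);

Storage stays cheap because those repeated dimension values compress extremely well in a columnar store; the join cost is paid once at ingest instead of on every query.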


Choose PostgreSQL when:

  • You need ACID guarantees
  • Your workload is 80%+ writes/updates
  • You have complex transactional logic
  • Your data fits under 100GB
  • Your team knows SQL/ORMs well

Choose ClickHouse when:

  • You analyze billions of rows
  • Your queries aggregate across columns
  • You batch insert data
  • You need sub-second queries on massive datasets
  • Compression matters for your storage costs

According to Yandex Cloud’s 2025 comparison, many teams start with PostgreSQL and hit performance walls around 50M rows. The migration to ClickHouse typically happens when analytical queries exceed 10 seconds.

My rule of thumb: If you have separate “transaction tables” and “analytics tables” in your schema, you’re already a candidate for ClickHouse.


Problem: ClickHouse expects batches. Your application generates row-by-row events.

Solution: Buffer writes.

import time
from clickhouse_driver import Client

class BufferedWriter:
    """Buffers rows in memory and writes them to ClickHouse in large batches."""

    def __init__(self, client, max_size=10000, flush_interval=5):
        self.client = client
        self.max_size = max_size              # flush once the buffer holds this many rows
        self.flush_interval = flush_interval  # ...or once this many seconds have passed
        self.buffer = []
        self.last_flush = time.time()

    def insert(self, row):
        self.buffer.append(row)
        # The time check only runs when insert() is called, so call flush()
        # explicitly on shutdown to avoid losing the tail of the buffer.
        if (len(self.buffer) >= self.max_size or
            time.time() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.client.execute('INSERT INTO events VALUES', self.buffer)
            self.buffer = []
            self.last_flush = time.time()

This pattern handles 50K inserts/second with a single connection. PostgreSQL would need connection pooling for that throughput.

Problem: You need to update user profiles in ClickHouse.

Solution: Use CollapsingMergeTree or ReplacingMergeTree.

-- Create a table that handles updates via deduplication
CREATE TABLE user_profiles (
  user_id Int64,
  name String,
  email String,
  sign Int8 DEFAULT 1
) ENGINE = CollapsingMergeTree(sign)
ORDER BY user_id;

-- Insert new state
INSERT INTO user_profiles VALUES (42, 'Jane Doe', 'jane@example.com', 1);

-- When updating, first cancel old row
INSERT INTO user_profiles VALUES (42, 'Jane Doe', 'jane@example.com', -1);
-- Then insert new row
INSERT INTO user_profiles VALUES (42, 'Jane Smith', 'jane@new-company.com', 1);

The comparison from dbpro.app emphasizes that teams must accept eventual consistency for ClickHouse updates. Your application needs to handle this.


Can ClickHouse replace PostgreSQL entirely?

No. ClickHouse lacks transaction support, foreign keys, and row-level locking. Use it for analytical workloads alongside PostgreSQL, not as a complete replacement.

Why is ClickHouse faster for analytics than PostgreSQL?

ClickHouse stores data column-by-column with aggressive compression. Analytical queries only read relevant columns, skipping 90-99% of data. PostgreSQL reads entire rows for every query.

What happens to UPDATE operations in ClickHouse?

ClickHouse treats updates as asynchronous mutations or insert-with-delete patterns. Real-time row updates are 10-100x slower than PostgreSQL. Design accordingly.

Can I use PostgreSQL extensions like TimescaleDB to match ClickHouse?

According to a Reddit discussion on TimescaleDB vs ClickHouse, TimescaleDB helps but ClickHouse still dominates at petabyte-scale analytics and sub-second queries on billion-row datasets.

What’s the ideal hardware for ClickHouse vs PostgreSQL?

PostgreSQL needs fast single-threaded CPU and random I/O. ClickHouse benefits from many CPU cores, large RAM for data caching, and sequential I/O performance. Different hardware profiles entirely.

How do I migrate data from PostgreSQL to ClickHouse?

Use ClickHouse’s PostgreSQL table engine or CDC tools like Debezium. Direct bulk exports work for one-time migrations. For ongoing sync, implement change data capture streaming.
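
For a one-time bulk copy, ClickHouse’s postgresql() table function can read straight from a running PostgreSQL instance. The connection details and table names below are placeholders:

-- Bulk copy into an existing ClickHouse table directly from PostgreSQL
INSERT INTO orders_analytics
SELECT order_id, user_id, amount, created_at
FROM postgresql('postgres:5432', 'production', 'orders', 'readonly_user', 'secret');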

Does ClickHouse work with standard SQL?

Not fully. ClickHouse uses SQL-like syntax with custom functions for aggregation, dates, and array operations. Team training is required.


ClickHouse and PostgreSQL solve different problems. PostgreSQL handles transactions. ClickHouse handles analytics.

My three rules:

  1. Start with PostgreSQL for your core application data
  2. Add ClickHouse when analytical queries on transactional data become slow
  3. Build a pipeline to synchronize relevant data from PostgreSQL to ClickHouse

I’ve built systems processing 200K events/second using this hybrid approach. PostgreSQL stores user state. ClickHouse stores event history. Both are happy.

The wrong choice? Trying to force one database to do everything.

If you’re building data infrastructure for production AI systems, shoot me a message on LinkedIn. Happy to discuss your architecture challenges.


Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec across PostgreSQL, ClickHouse, and Kafka pipelines. Connect on LinkedIn.


  1. PostHog: In-depth ClickHouse vs PostgreSQL
  2. ClickHouse: ClickHouse and PostgreSQL Comparison
  3. ClickHouse: Migration Guide from PostgreSQL
  4. Instaclustr: ClickHouse vs PostgreSQL – 5 Key Differences
  5. Tinybird: ClickHouse vs PostgreSQL in 2026 (with extensions)
  6. Hacker News: ClickHouse vs PostgreSQL UPDATE performance
  7. FiveOneFour: PostgreSQL vs ClickHouse – What I Learned
  8. dbpro.app: ClickHouse vs PostgreSQL – OLAP vs OLTP
  9. Yandex Cloud: ClickHouse vs PostgreSQL – Choosing the Right Database (2025)
  10. Reddit: Newbie – TimescaleDB vs ClickHouse vs DuckDB

Originally published at https://sivaro.in/articles/clickhouse-vs-postgresql-the-real-trade-offs-nobody-talks.
