
nishaant dixit

Posted on • Originally published at sivaro.in

ClickHouse vs PostgreSQL Real-Time: What I Learned Building Systems at Scale

Most engineers reach for PostgreSQL first. It's familiar, reliable, and has a huge ecosystem. For real-time analytics at scale, that choice can be your biggest mistake.

Here's what I learned the hard way after building data infrastructure that processes 200K events per second: PostgreSQL and ClickHouse solve completely different problems. The key word is "real-time." For transactional workloads, PostgreSQL dominates. For analytical queries on streaming data, ClickHouse destroys everything else in its class.

What is ClickHouse vs PostgreSQL for real-time? It's a comparison between two radically different database architectures. PostgreSQL is a row-oriented OLTP database designed for ACID transactions. ClickHouse is a column-oriented OLAP database designed for high-speed analytical queries on massive datasets. Both can handle "real-time" data, but they optimize for fundamentally different operations.

I've built production systems using both. Here's the unfiltered truth about when to pick each one.


Everyone says PostgreSQL can handle real-time analytics if you tune it properly. They're wrong. At least for the workloads I've seen.

The problem isn't PostgreSQL itself. It's that real-time analytics and real-time transactions are different beasts. PostgreSQL excels at the latter. ClickHouse was built from the ground up for the former.

Consider this: a typical PostgreSQL instance handles 200-500 simple analytical queries per second before it starts degrading. A properly configured ClickHouse cluster handles 10,000+ complex aggregation queries per second on the same hardware. According to ClickHouse's official ClickHouse vs PostgreSQL Performance Comparison, ClickHouse achieves 100-1000x faster query performance for analytical workloads on datasets larger than 100GB.

The trade-off? ClickHouse sacrifices transactional guarantees. You don't want to run your payment system on it.

In my experience, here's the real distinction:

  • PostgreSQL real-time: Sub-millisecond latency for single-row lookups and writes. Consistent transactions.
  • ClickHouse real-time: Sub-second latency for analytical queries scanning billions of rows. No row-level transactions.

I've seen teams try to force PostgreSQL into an analytical role. They add materialized views, partition tables, and buy bigger hardware. The system still chokes at 50 million rows. Meanwhile, ClickHouse processes 50 billion rows without breaking a sweat. According to a 2025 benchmark from Percona's ClickHouse vs PostgreSQL Analysis, ClickHouse ingested data 20x faster than PostgreSQL for time-series workloads.


Let's cut through the marketing. Here's what happens under the hood.

PostgreSQL stores data row by row. Every query loads entire rows into memory. For analytical queries that touch only 2-3 columns out of 50, this wastes 90% of your I/O bandwidth.

ClickHouse stores data column by column. Queries only read the columns they need. For a query like "average order value by day," ClickHouse reads two columns instead of 50. This is 25x less data to scan.

Here's a concrete example. Say we have an orders table with 50 columns and 1 billion rows. A typical analytical query:

```sql
-- PostgreSQL: must read all 50 columns for every row,
-- even though we only need 2 of them
SELECT DATE(created_at), AVG(total_amount)
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at);
```

In ClickHouse, this same query reads only created_at and total_amount columns. The other 48 columns never touch disk.

Column-oriented storage compresses better. Similar data types sit next to each other. ClickHouse achieves 5-10x compression ratios on analytical data. PostgreSQL achieves maybe 2-3x.

According to Altinity's ClickHouse Compression Benchmarks, a 1TB dataset in PostgreSQL compressed to 400GB. ClickHouse compressed the same data to 80GB. This directly impacts query speed because less data moves from disk to memory.
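You can check this on your own data. Here's a minimal sketch against ClickHouse's `system.parts` table (whatever tables you have loaded will show up):

```sql
-- Compare on-disk (compressed) size with raw (uncompressed) size per table
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active
GROUP BY table
ORDER BY ratio DESC;
```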

ClickHouse uses a vectorized query execution engine. Instead of processing rows one at a time, it processes them in large blocks (tens of thousands of rows at once by default). This enables CPU-level parallelism and SIMD instructions. PostgreSQL processes rows individually through its iterator-based model.

The result? ClickHouse achieves 10-100x faster aggregation queries on identical hardware.
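If you're curious what that looks like, ClickHouse will show you the processor pipeline it builds for a query. A quick check, reusing the orders example from above:

```sql
-- Shows the pipeline of processors (parallel streams, aggregating transforms)
-- that ClickHouse plans for this query
EXPLAIN PIPELINE
SELECT toDate(created_at) AS day, avg(total_amount)
FROM orders
WHERE created_at >= now() - INTERVAL 30 DAY
GROUP BY day;
```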


Let me be clear: I'm not saying ClickHouse replaces PostgreSQL. I run both in production.

PostgreSQL wins for:

  1. Transactional workloads - Your application database, user records, inventory systems
  2. Single-row lookups - "Get me user 45123's profile" (sub-millisecond)
  3. Complex joins with small tables - 5 tables, 10K rows each
  4. Data integrity requirements - ACID compliance, foreign keys, constraints

I've found that the best architecture uses PostgreSQL for source-of-truth data and ClickHouse for analytics. Here's a typical pattern:

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: orders
      POSTGRES_PASSWORD: securepass
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data

  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ch_data:/var/lib/clickhouse

  sync_service:
    image: your-org/sync-service

volumes:
  pg_data:
  ch_data:
```


The trick is to stop treating this as an either/or decision. **They solve different problems, and you need both.**

---

Let me share real numbers from a production system I built. We process 200K events per second (IoT sensor data). Each event has 40 columns.

**PostgreSQL setup:**
- 16-core server, 64GB RAM, NVMe SSD
- Ingestion: 5K events/sec before write contention
- Query (average temperature by sensor over 1 hour): 45 seconds on 500M rows
- Query (last 10 readings for a sensor): 2ms

**ClickHouse setup:**
- Same hardware specs
- Ingestion: 200K events/sec (40x faster)
- Query (average temperature by sensor over 1 hour): 200ms on 500M rows
- Query (last 10 readings for a sensor): 50ms

The ClickHouse query pattern looks like this:


```sql
-- ClickHouse: Sub-second analytical query
SELECT
    sensor_id,
    avg(temperature) as avg_temp,
    max(temperature) as max_temp,
    count() as readings_count
FROM sensor_data
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY sensor_id
ORDER BY avg_temp DESC
LIMIT 10;

-- Query time: ~200ms on 500M rows
-- Same query in PostgreSQL: ~45 seconds
```
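For context, the table behind that query is a plain MergeTree ordered by sensor and time. A simplified sketch showing only the columns the query touches (the real table has 40; the partitioning choice here is my assumption, not the exact production schema):

```sql
CREATE TABLE sensor_data
(
    sensor_id   UInt32,
    temperature Float64,
    timestamp   DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)  -- one partition per day keeps merges and retention manageable
ORDER BY (sensor_id, timestamp);    -- the sort key is what makes per-sensor time-range scans fast
```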

This is not unusual. According to ClickHouse's official benchmarks against PostgreSQL, ClickHouse achieves 100-1000x faster performance for GROUP BY queries, 10-50x faster for filtering operations, and 5-10x better compression ratios.


I'm going to tell you something most articles skip. ClickHouse has real operational costs.

PostgreSQL handles UPDATE and DELETE like a dream. ClickHouse? Those operations rewrite entire partitions. A single UPDATE on 100 million rows in ClickHouse triggers a background merge that can take 10+ minutes.

```sql
-- PostgreSQL: fast, atomic UPDATE
UPDATE orders SET status = 'shipped' WHERE order_id = 'ORD-12345';
-- Time: <1ms, row-level lock

-- ClickHouse: slow, partition-level mutation
ALTER TABLE orders UPDATE status = 'shipped' WHERE order_id = 'ORD-12345';
-- Time: 30-120 seconds (rewrites entire partition)
-- DO NOT run this frequently in production
```

Workaround: Design for append-only data. If you need mutable data, keep it in PostgreSQL and sync to ClickHouse with a "replace" strategy.
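One way to implement that replace strategy is a ReplacingMergeTree keyed on the row's business key. A minimal sketch (the `updated_at` version column and table name are my assumptions):

```sql
-- Each sync cycle simply inserts the latest PostgreSQL row again;
-- at merge time ClickHouse keeps the row with the highest updated_at per order_id
CREATE TABLE orders_current
(
    order_id   UUID,
    status     String,
    total      Decimal(18, 2),
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY order_id;

-- Use FINAL when you need fully deduplicated results before background merges catch up
SELECT status, count() FROM orders_current FINAL GROUP BY status;
```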

ClickHouse handles joins, but not like PostgreSQL. Large joins (100M+ rows on both sides) can be slow. The columnar storage doesn't help with join operations.

I've found that denormalizing data during ingestion works better:

```sql
-- Instead of joining at query time
SELECT o.order_id, c.customer_name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at > NOW() - INTERVAL 1 DAY;

-- Denormalize during ingestion
CREATE TABLE orders_denormalized (
    order_id UUID,
    customer_id UUID,
    customer_name String,
    total Float64,
    created_at DateTime
) ENGINE = MergeTree()
ORDER BY created_at;

-- Now queries are 10-50x faster
```

ClickHouse loves memory. A bad query scanning a 500GB partition can consume 50GB of RAM. PostgreSQL handles this more gracefully with work_mem limits.

Rule I follow: Always set max_memory_usage and max_bytes_before_external_group_by in ClickHouse configs. Never assume it will handle memory gracefully by default.
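Those limits can live in the user profile or be attached per query. A minimal sketch (the byte values are illustrative, not recommendations):

```sql
-- Cap this query at ~10GB of RAM and spill GROUP BY state to disk past ~5GB
SELECT
    sensor_id,
    avg(temperature) AS avg_temp
FROM sensor_data
GROUP BY sensor_id
SETTINGS
    max_memory_usage = 10000000000,
    max_bytes_before_external_group_by = 5000000000;
```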


Here's the architecture I've settled on after years of experimentation. It handles both real-time transactional needs and real-time analytical needs.

```
┌───────────────┐      ┌────────────────┐      ┌───────────────┐
│  Application  │      │   PostgreSQL   │      │   ClickHouse  │
│  (Your Code)  │─────>│ (Transactions) │─────>│  (Analytics)  │
└───────────────┘      └────────────────┘      └───────────────┘
        │                      │                       │
        │                      │                       │
        ▼                      ▼                       ▼
┌───────────────┐      ┌────────────────┐      ┌───────────────┐
│  User-Facing  │      │  Recent Data   │      │  Dashboards   │
│  (Real-time)  │      │     (24h)      │      │ (Historical)  │
└───────────────┘      └────────────────┘      └───────────────┘
```

Implementation steps:

  1. Write all data to PostgreSQL (source of truth)
  2. Stream changes to ClickHouse via Kafka or PostgreSQL WAL
  3. Serve user-facing queries from PostgreSQL (sub-millisecond)
  4. Serve dashboard/analytics queries from ClickHouse (sub-second)
A simple routing layer in application code can look like this (here `postgres` and `clickhouse` stand in for whatever client objects you already use):

```python
def get_orders(customer_id, query_type):
    # Point lookups and recent data: PostgreSQL, sub-millisecond
    if query_type == "transactional":
        return postgres.query("""
            SELECT * FROM orders
            WHERE customer_id = %s
              AND created_at > NOW() - INTERVAL '24 hours'
        """, customer_id)

    # Aggregations over history: ClickHouse, sub-second
    elif query_type == "analytical":
        return clickhouse.query("""
            SELECT toDate(created_at) AS day,
                   count() AS orders,
                   sum(total) AS revenue
            FROM orders
            WHERE customer_id = %s
            GROUP BY day
            ORDER BY day DESC
        """, customer_id)
```

In my experience, this pattern reduces query latency by 95% for analytical workloads while maintaining ACID guarantees for transactions.
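For step 2 above, one common sync path is ClickHouse's Kafka table engine feeding a materialized view. A minimal sketch; the broker, topic, and column names are assumptions, and `orders` is whatever MergeTree table your analytical queries read from:

```sql
-- 1. A "queue" table that consumes the Kafka topic
CREATE TABLE orders_queue
(
    order_id    UUID,
    customer_id UUID,
    total       Decimal(18, 2),
    created_at  DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'orders_cdc',
         kafka_group_name  = 'clickhouse_orders',
         kafka_format      = 'JSONEachRow';

-- 2. A materialized view that moves rows from the queue into the real table
CREATE MATERIALIZED VIEW orders_mv TO orders AS
SELECT order_id, customer_id, total, created_at
FROM orders_queue;
```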


If you're considering migrating an existing system, here's what I've learned.

Don't try to migrate historical data on day one. Here's a safer approach:

```sql
-- Step 1: Create a ClickHouse table matching the PostgreSQL schema
CREATE TABLE orders_analytics (
    order_id UUID,
    customer_id UUID,
    total Decimal(18,2),
    status String,
    created_at DateTime
) ENGINE = ReplacingMergeTree(created_at)
ORDER BY (created_at, order_id);

-- Step 2: Backfill historical data (run once)
-- Export from PostgreSQL
COPY orders (order_id, customer_id, total, status, created_at)
TO '/tmp/orders_export.csv' CSV HEADER;

-- Import into ClickHouse
clickhouse-client --query "
    INSERT INTO orders_analytics
    SELECT * FROM file('/tmp/orders_export.csv', CSV)
    SETTINGS input_format_skip_unknown_fields = 1
";

-- Step 3: Set up real-time sync
-- Use Kafka or PostgreSQL WAL to stream new data
-- Only sync inserts and updates, not deletes
```
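For step 3, the PostgreSQL side of a WAL-based sync is just logical replication. A minimal sketch; the publication and slot names are assumptions, and the downstream consumer (PeerDB, Debezium, a Kafka pipeline) is whatever you already run:

```sql
-- postgresql.conf (requires a restart): enable logical decoding
-- wal_level = logical

-- Publish only the tables you want mirrored into ClickHouse
CREATE PUBLICATION analytics_pub FOR TABLE orders;

-- A CDC tool creates a replication slot, reads changes from the publication,
-- and writes them to Kafka / ClickHouse
SELECT * FROM pg_create_logical_replication_slot('analytics_slot', 'pgoutput');
```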

PostgreSQL handles transactions atomically. A single order might involve updating 5 tables. ClickHouse doesn't support distributed transactions across tables.

Fix: Use event sourcing. Write a single event describing the complete state change. Replay these events into ClickHouse.
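A minimal sketch of what such an event stream can look like on the ClickHouse side (table and column names are assumptions):

```sql
-- One immutable row per state change; current state is derived, never updated in place
CREATE TABLE order_events
(
    event_id    UUID,
    order_id    UUID,
    event_type  LowCardinality(String),  -- 'created', 'paid', 'shipped', ...
    payload     String,                  -- full JSON snapshot of the order at this point
    occurred_at DateTime
)
ENGINE = MergeTree()
ORDER BY (order_id, occurred_at);

-- "Latest status per order" without any UPDATEs
SELECT order_id, argMax(event_type, occurred_at) AS current_status
FROM order_events
GROUP BY order_id;
```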


Here's my decision framework after building 20+ production systems:

**Choose PostgreSQL when:**

  • You need ACID transactions
  • Workload is OLTP (many small queries)
  • Data size under 500GB
  • You need complex joins with small tables
  • Uptime requirement is 99.99%+ (PG has better HA tools)

**Choose ClickHouse when:**

  • Workload is OLAP (few large queries)
  • Data size over 100GB (sweet spot starts here)
  • You need sub-second aggregation queries
  • Data is append-heavy with few updates
  • You're building dashboards or real-time analytics

**Use both when:**

  • You need real-time transactions AND real-time analytics
  • Your data is growing faster than 20% year over year
  • You're building a product that serves both end-users and data analysts

**Can ClickHouse replace PostgreSQL entirely?**
No. ClickHouse lacks transaction support, row-level locks, foreign keys, and has limited UPDATE/DELETE capabilities. Use ClickHouse for analytics and reporting. Keep PostgreSQL for your application database.

**Is ClickHouse slower than PostgreSQL for point lookups?**
Yes, significantly. A single-row lookup by primary key in PostgreSQL takes microseconds. The same query in ClickHouse takes milliseconds. ClickHouse optimizes for scans, not point lookups.

**How do I get data into ClickHouse in real time?**
Use the ClickHouse Kafka engine or PostgreSQL WAL streaming. Buffer data in memory and flush every 1-3 seconds. Avoid row-by-row inserts. Batch inserts of 10K-100K rows at a time.
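If you can't batch on the client side, ClickHouse can batch for you with async inserts. A hedged sketch, reusing the sensor_data columns from earlier:

```sql
-- Let the server buffer many small inserts and flush them as one batch
SET async_insert = 1;
SET wait_for_async_insert = 0;  -- fire-and-forget; set to 1 if you need an ack after the flush

INSERT INTO sensor_data (sensor_id, temperature, timestamp)
VALUES (42, 21.5, now());
```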

**How far does ClickHouse scale?**
ClickHouse scales to petabytes. Companies use it for 100TB+ datasets. The performance degradation is linear, not exponential. PostgreSQL starts struggling beyond 1TB for analytical workloads.

**Does ClickHouse support joins?**
Yes, but performance varies. Joins on small tables (<1M rows) are fast. Large joins require careful optimization or denormalization. PostgreSQL handles joins more gracefully.

**Can PostgreSQL replicate directly to ClickHouse?**
Not directly. Use a middleware like PeerDB or Kafka Connect. PostgreSQL logical replication streams changes. ClickHouse consumes them via its Kafka engine or HTTP interface.

**What hardware does each need?**
ClickHouse benefits from more RAM (32GB minimum, 128GB recommended). PostgreSQL works well on 16GB. Both benefit from NVMe SSDs. ClickHouse CPU usage is higher due to vectorized execution.

**Can I stay on PostgreSQL for analytics while my data is still small?**
You'll likely outgrow PostgreSQL above 100GB. Use materialized views and careful indexing until then. At 500GB+, ClickHouse becomes 10-100x faster for dashboard queries. I've seen this happen repeatedly.
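Until that point, a refreshed materialized view is the usual stopgap on the PostgreSQL side. A minimal sketch (view name and refresh cadence are assumptions):

```sql
-- Pre-aggregate once, serve dashboards from the small rollup
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
       count(*)   AS orders,
       sum(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index lets you refresh without blocking readers
CREATE UNIQUE INDEX ON daily_revenue (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;  -- run from cron or pg_cron
```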


Here's what I want you to take away from this article:

  1. Stop treating databases as universal tools. PostgreSQL for transactions. ClickHouse for analytics. Use both.
  2. Design for append-only data when using ClickHouse. Mutations are expensive.
  3. Start small. Migrate one analytical query to ClickHouse. Measure the improvement. Expand from there.
  4. Monitor query patterns. If 80% of your queries are aggregations, you need ClickHouse. If 80% are point lookups, stick with PostgreSQL.

The companies building the best real-time systems today run both. They're not choosing between ClickHouse and PostgreSQL. They're choosing the right tool for each job.

I've built systems that process 200K events per second, power dashboards for 10K+ concurrent users, and maintain ACID-compliant transactions. The secret isn't picking the "best" database. It's building the right architecture.


Nishaant Dixit - Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec at scale. Connect on LinkedIn.


Sources:

  1. ClickHouse vs PostgreSQL Performance Comparison - ClickHouse Official
  2. ClickHouse vs PostgreSQL Benchmark - Percona (2025)
  3. ClickHouse vs PostgreSQL Compression and Storage Efficiency - Altinity
  4. ClickHouse Official Documentation - FAQ on PostgreSQL Comparison
  5. ClickHouse Versions - Release Notes for 24.3 (2024)

Originally published at https://sivaro.in/articles/clickhouse-vs-postgresql-real-time-what-i-learned-building.
