I've spent years building data infrastructure. Here's a truth I learned the hard way: PostgreSQL is fantastic for transactions, but it breaks under analytical loads. When your reporting queries start taking minutes instead of milliseconds, you need a different tool.
What is ClickHouse migration from PostgreSQL? It's the process of moving analytical data from PostgreSQL's row-oriented storage into ClickHouse's column-oriented architecture. This shift transforms query performance—we're talking 100x improvements on aggregate queries. But the migration itself? That's where most teams stumble.
In this guide, I'll share everything I've learned from real migrations. The tools that work. The patterns that fail. The trade-offs nobody talks about.
Most people think PostgreSQL can handle everything. They're wrong.
PostgreSQL excels at OLTP—creating users, processing orders, updating records. But analytical workloads? Different beast entirely. According to Comparing ClickHouse and PostgreSQL, ClickHouse compresses data 5-10x better than Postgres because of its columnar storage. That means less disk, faster scans, and queries that finish before your coffee gets cold.
Here's the simple breakdown:
- Row-oriented (PostgreSQL): Reads entire rows even when you need one column. Wastes I/O on analytical queries.
- Column-oriented (ClickHouse): Reads only the columns you query. Perfect for aggregations, time-series, and reporting.
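To make that concrete, here's the kind of query where the difference shows up. The events table and its columns are purely illustrative:

```sql
-- Hypothetical wide events table with dozens of columns.
-- A row store reads every column of every matching row from disk;
-- a column store scans only event_date and revenue for this query.
SELECT event_date, sum(revenue) AS daily_revenue
FROM events
WHERE event_date >= '2024-01-01'
GROUP BY event_date
ORDER BY event_date;
```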
In my experience, the moment your SELECT COUNT(*) GROUP BY queries exceed 5 seconds on Postgres, you're already losing. Your analytics team waits. Your dashboards lag. Your users complain.
The hard truth? PostgreSQL isn't bad. It's just the wrong tool for analytical workloads. Keeping both systems and moving data between them is the right answer.
I've overseen migrations for teams processing 200K events per second. Here's what ClickHouse migration from PostgreSQL delivers:
1. Query performance jumps 50-200x on analytical queries
Aggregations that took 30 seconds in Postgres complete in milliseconds. The columnar format means SUM(sales) only reads the sales column—not every row column.
2. Storage costs drop 5-10x
The Migrating data - PostgreSQL documentation shows compression ratios that routinely hit 6:1. Your 500GB Postgres analytical schema becomes 80GB in ClickHouse.
3. Real-time ingestion at scale
PostgreSQL's row-locking kills insert performance at high volumes. ClickHouse handles millions of inserts per second because it merges data in the background.
4. Complex analytical queries become trivial
Window functions, retention analysis, funnel queries—these are painful in Postgres. ClickHouse's SQL dialect makes them natural.
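As a sketch of what "natural" looks like, ClickHouse ships a windowFunnel aggregate for exactly this kind of analysis. The events table and event names below are hypothetical:

```sql
-- Hypothetical three-step funnel: view -> add_to_cart -> purchase.
-- windowFunnel counts how far each user got within a 1-hour window.
SELECT level, count() AS users
FROM
(
    SELECT
        user_id,
        windowFunnel(3600)(
            event_time,
            event_type = 'view',
            event_type = 'add_to_cart',
            event_type = 'purchase'
        ) AS level
    FROM events
    GROUP BY user_id
)
GROUP BY level
ORDER BY level;
```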
But I'll be direct: ClickHouse isn't a Postgres replacement. It's a complement. You still need Postgres for transactional workloads. The migration is about moving analytical data only.
Let me walk through the actual technical approach. I'll include code examples because theory without practice is useless.
The official migration approach uses PeerDB, a PostgreSQL-to-ClickHouse sync tool. According to Making large Postgres migrations practical: 1TB in 2 hours, PeerDB moves data via logical replication. It captures changes from PostgreSQL's WAL and streams them to ClickHouse.
Here's the initial setup for PeerDB:
```bash
curl -fsSL https://releases.peerdb.io/install.sh | sh

peerdb server start

peerdb mirror create \
  --source "postgresql://user:pass@localhost:5432/mydb" \
  --target "clickhouse://user:pass@localhost:9000/mydb" \
  --tables "orders,users,products" \
  --replication-slot "peerdb_slot"
```
But wait—there's a catch. Logical replication requires PostgreSQL 10+ with wal_level = logical. Many production Postgres instances don't have this enabled. You'll need to restart the database.
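Checking and enabling it on the PostgreSQL side looks like this (the slot and sender limits are example values; tune them for your setup):

```sql
-- Check the current setting
SHOW wal_level;

-- Enable logical replication; takes effect only after a server restart
ALTER SYSTEM SET wal_level = 'logical';

-- Logical replication also needs slots and WAL senders available
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
```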
For the schema migration, you need to convert PostgreSQL types to ClickHouse equivalents:
```sql
-- PostgreSQL table
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount NUMERIC(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    status VARCHAR(20)
);
```
```sql
-- ClickHouse equivalent (optimized for analytics)
-- ReplacingMergeTree deduplicates rows sharing the same ORDER BY key
-- during background merges, using created_at as the version column.
CREATE TABLE orders (
    id Int64,
    user_id Int32,
    amount Decimal(10,2),
    created_at DateTime,
    status String
) ENGINE = ReplacingMergeTree(created_at)
ORDER BY (created_at, id);
```
Notice the differences:
- `SERIAL` becomes `Int64`, with auto-increment handled separately
- `TIMESTAMP` becomes `DateTime` (no timezone; handle that in the application)
- `VARCHAR(20)` becomes `String` (length constraints are a Postgres concern)
- I added a `ReplacingMergeTree` engine for upsert support
For bulk data migration, use the clickhouse-client with CSV:
```bash
psql -d mydb -c "\COPY (SELECT * FROM orders WHERE created_at > '2024-01-01') TO '/tmp/orders.csv' WITH CSV HEADER"

clickhouse-client --query "INSERT INTO orders FORMAT CSVWithNames" < /tmp/orders.csv
```
My biggest lesson: Don't migrate all historical data at once. Start with recent data. Migrate older data in batches. The Postgres - Clickhouse Migration - Questions subreddit is full of horror stories from teams who tried migrating everything in one shot and hit timeouts.
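A minimal sketch of that batched approach, reusing the orders table and placeholder month ranges:

```bash
# Export and load one month at a time instead of one giant COPY.
# Months, paths, and connection details are placeholders.
for month in 2024-01 2024-02 2024-03; do
  psql -d mydb -c "COPY (
      SELECT * FROM orders
      WHERE created_at >= '${month}-01'::date
        AND created_at <  '${month}-01'::date + interval '1 month'
    ) TO STDOUT WITH CSV HEADER" > "/tmp/orders_${month}.csv"

  clickhouse-client --query "INSERT INTO orders FORMAT CSVWithNames" \
    < "/tmp/orders_${month}.csv"
done
```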
After several migrations, I've settled on patterns that work consistently.
1. Schema-first approach
Migrate your table structures before data. Test queries against empty tables. Fix type mismatches early. According to AI-powered migrations from Postgres to ClickHouse, some teams now use AI assistants to auto-generate ClickHouse schemas from PostgreSQL DDL. The tooling is improving fast.
2. Incremental migration strategy
Never migrate all data at once. Use this phased approach:
- Phase 1: Migrate last 30 days of data
- Phase 2: Migrate last 6 months
- Phase 3: Migrate all remaining historical data
- Phase 4: Cut over application queries
The Replicating data from PostgreSQL to ClickHouse with the Altinity Sink Connector shows continuous replication patterns that keep both systems in sync during migration.
3. Validate every row
After migration, run count comparisons:
```sql
-- PostgreSQL count
SELECT COUNT(*), DATE(created_at) FROM orders GROUP BY DATE(created_at);

-- ClickHouse count (should match)
SELECT count(), toDate(created_at) FROM orders GROUP BY toDate(created_at);
```
I've caught data corruption this way. Always validate.
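To avoid eyeballing two result sets, script the comparison. A rough sketch with placeholder paths and connection details:

```bash
# Dump daily counts from both sides as TSV, then diff them.
psql -d mydb -At -F $'\t' -c "
  SELECT created_at::date AS d, count(*)
  FROM orders GROUP BY d ORDER BY d" > /tmp/pg_counts.tsv

clickhouse-client --format TSV --query "
  SELECT toDate(created_at) AS d, count()
  FROM orders GROUP BY d ORDER BY d" > /tmp/ch_counts.tsv

diff /tmp/pg_counts.tsv /tmp/ch_counts.tsv && echo "Daily counts match"
```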
4. Plan for query rewrites
ClickHouse SQL looks like PostgreSQL SQL but behaves differently. Common issues:
- `NOW()` works differently (ClickHouse conventionally uses lowercase `now()`)
- `LIMIT` in subqueries needs an explicit `ORDER BY` for deterministic results
- No `SERIAL` or `BIGSERIAL`; use UUIDs or generate IDs in the application
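A small before/after makes the rewrite effort concrete, reusing the orders table from the earlier DDL:

```sql
-- PostgreSQL: top customers over the last 7 days
SELECT user_id, SUM(amount) AS total
FROM orders
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY user_id
ORDER BY total DESC
LIMIT 10;

-- ClickHouse: same intent, dialect-native spelling
SELECT user_id, sum(amount) AS total
FROM orders
WHERE created_at > now() - INTERVAL 7 DAY
GROUP BY user_id
ORDER BY total DESC
LIMIT 10;
```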
Not every workload needs ClickHouse. Here's my decision framework:
Migrate when:
- Queries scan millions of rows for aggregations
- Dashboards need sub-second response times
- Storage costs for analytical data are spiraling
- You're hitting PostgreSQL's 32TB storage limit per table
Don't migrate when:
- Your workload is purely transactional (CRUD operations)
- Queries touch only a few rows with indexed lookups
- You can't afford operational complexity of two systems
- Your team has zero ClickHouse experience
The PostgreSQL Migration and Synchronization to ClickHouse guide from 9DataCloud outlines tiered migration strategies. They recommend starting with a single analytical workload—usually reporting or dashboards—before migrating everything.
Trade-off I've found: ClickHouse lacks proper UPDATE/DELETE semantics. If your analytical data needs frequent row-level mutations, you'll fight ClickHouse's append-only nature. Use ReplacingMergeTree and handle duplicates in queries.
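Two common ways to hide those duplicates at query time, using the orders table from earlier (a sketch, not a tuned recommendation):

```sql
-- Option 1: FINAL forces deduplication at read time (simple, but slower)
SELECT status, sum(amount) AS total
FROM orders FINAL
GROUP BY status;

-- Option 2: keep the latest version per id with argMax (usually cheaper)
SELECT id, argMax(status, created_at) AS latest_status
FROM orders
GROUP BY id;
```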
Every migration hits problems. Here's what I've seen most often:
Challenge 1: Type mapping mismatches
PostgreSQL's NUMERIC(38,0) becomes ClickHouse's Decimal(38,0). But ClickHouse's Decimal has precision limits. JSONB? Doesn't exist in ClickHouse—use String with JSON functions.
Solution: Pre-process your data. Convert JSONB to columns. Cast numeric types explicitly.
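If you do land JSONB payloads as String, ClickHouse's JSON functions can pull fields out at query time. The payload column and key below are hypothetical:

```sql
-- payload is a String column holding the original JSONB text
SELECT
    JSONExtractString(payload, 'country') AS country,
    count() AS events
FROM events
GROUP BY country
ORDER BY events DESC;
```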
Challenge 2: Replication lag during cutover
The Making large Postgres migrations practical: 1TB in 2 hours case study shows how one team achieved a 1TB migration in 2 hours using parallel table copying. Key insight: they migrated tables concurrently, not sequentially, which keeps the initial copy short and the replication backlog small at cutover.
Challenge 3: Query semantic differences
ClickHouse does not guarantee row order by default. That SELECT * FROM orders LIMIT 10 might return different rows each time without an explicit ORDER BY. This broke dashboards that assumed deterministic ordering.
Solution: Always add ORDER BY in ClickHouse queries. Always.
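In practice that's a one-line change per query:

```sql
-- Non-deterministic: rows can come back in any order
SELECT * FROM orders LIMIT 10;

-- Deterministic: pin the order explicitly
SELECT * FROM orders ORDER BY created_at DESC, id LIMIT 10;
```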
Q: Can I run PostgreSQL and ClickHouse in parallel during migration?
Yes. This is the recommended approach. The A Planned Migration from PostgreSQL to ClickHouse article from Partoo shows a successful parallel run strategy that lasted 6 weeks.
Q: How long does a ClickHouse migration from PostgreSQL take?
For 1TB of data, expect 2-4 hours with PeerDB according to the Making large Postgres migrations practical: 1TB in 2 hours report. Smaller datasets take under 30 minutes.
Q: Will my PostgreSQL queries work in ClickHouse without changes?
No. ClickHouse has its own SQL dialect with significant differences from PostgreSQL. Subquery behavior, join semantics, and data types all differ. Plan for 20-30% of queries needing rewrites.
Q: What's the best tool for continuous replication from PostgreSQL to ClickHouse?
PeerDB (open source) or the Altinity Sink Connector (also open source, with commercial support from Altinity). The ClickHouse/clickhouse.build GitHub repository has community tools too.
Q: Does ClickHouse support foreign keys?
No. ClickHouse has no foreign key constraints. You must enforce referential integrity in your application layer.
Q: Can I migrate indexed PostgreSQL tables directly?
ClickHouse uses different indexing (primary key and skip indexes). PostgreSQL B-tree indexes don't translate. Build ClickHouse-specific indexing strategies.
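As a sketch of what ClickHouse-native indexing looks like on the orders table (illustrative, not a tuned recommendation):

```sql
-- The ORDER BY key from the CREATE TABLE acts as the sparse primary index.
-- Skip indexes help filter on columns outside that key, e.g. status:
ALTER TABLE orders
    ADD INDEX status_idx status TYPE set(100) GRANULARITY 4;

-- Build the index for data that was written before the index existed
ALTER TABLE orders MATERIALIZE INDEX status_idx;
```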
Q: How much downtime should I expect?
Zero if you use logical replication with PeerDB. The target table remains queryable during the initial sync. For the final cutover, expect seconds of read-only mode.
Q: Is ClickHouse cheaper than PostgreSQL for analytics?
Yes. Storage compression alone reduces costs 5-10x. Query performance means fewer CPU cycles per query. The How to Migrate from PostgreSQL to ClickHouse guide from 2026 confirms this cost advantage.
ClickHouse migration from PostgreSQL is straightforward when you follow the right patterns. Start small. Validate everything. Keep both systems running during the transition.
The real win isn't just faster queries—it's the ability to ask questions you couldn't ask before. Suddenly, that real-time dashboard with 50 million rows updates in milliseconds. Your analytics team stops waiting and starts exploring.
Your next move: Install PeerDB, sync one table, run comparison queries. That's 30 minutes of work for months of insight.
Nishaant Dixit is the founder of SIVARO, a product engineering company specializing in data infrastructure and production AI systems. Since 2018, he has built systems processing 200K events per second across multiple industries. He writes about the hard lessons of building systems that actually scale. Connect on LinkedIn: https://www.linkedin.com/in/nishaant-veer-dixit
- Migrating data - PostgreSQL
- Comparing ClickHouse and PostgreSQL
- Postgres - Clickhouse Migration - Questions
- Making large Postgres migrations practical: 1TB in 2 hours
- How to Migrate from PostgreSQL to ClickHouse (2026)
- AI-powered migrations from Postgres to ClickHouse
- ClickHouse/clickhouse.build GitHub Repository
- Replicating data from PostgreSQL to ClickHouse with the Altinity Sink Connector
- PostgreSQL Migration and Synchronization to ClickHouse
- A Planned Migration from PostgreSQL to ClickHouse
Originally published at https://sivaro.in/articles/clickhouse-migration-from-postgresql-what-i-learned-moving.