I rebuilt a real-time analytics pipeline for a travel company. 600 million records. Six weeks of downtime risk. I underestimated schema drift three times.
Here's what I learned the hard way.
A ClickHouse migration specialist isn't someone who just moves data. They're the person who understands why your production queries will fall apart if you ignore ClickHouse's unique merge-tree architecture. They bridge the gap between "it works in PostgreSQL" and "it scales in ClickHouse."
What is a ClickHouse migration specialist? It's an engineer or team that designs, executes, and validates the migration of data and queries from existing databases (usually PostgreSQL, MySQL, or BigQuery) into ClickHouse. They handle schema translation, incremental data sync, query rewriting, and—most critically—data integrity validation at scale.
Everyone says migration is about moving data. That's wrong.
The problem isn't moving bytes. It's that ClickHouse is fundamentally different from every OLTP database. According to ClickHouse's official migration guide, the database's columnar storage and vectorized query execution require completely different indexing strategies and data modeling approaches.
Most teams fail because they:
- Map tables 1:1 from PostgreSQL to ClickHouse. The models don't match: row tuples versus columns.
- Ignore `ORDER BY` key selection. This determines EVERYTHING about query performance.
- Assume they can use `UPDATE` and `DELETE` freely. ClickHouse is append-optimized; updates are expensive mutations.
In my experience, the hardest part isn't the initial migration. It's keeping the data consistent during the cutover window. A specialist doesn't just dump data—they build CDC pipelines that handle backfill, deduplication, and schema evolution.
A proper migration has four distinct phases: schema translation, data transfer, query rewriting, and validation. Miss one, and you're debugging at 2 AM.
ClickHouse supports Nullable types but recommends avoiding them. It has LowCardinality for strings with fewer than 10,000 unique values. It uses AggregateFunction for pre-aggregated data.
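To make those three type choices concrete, here's a minimal sketch (table and column names are hypothetical):

```sql
CREATE TABLE type_examples (
    country LowCardinality(String),          -- few distinct values: dictionary-encoded
    referrer Nullable(String),               -- allowed, but adds a null mask and slows scans
    uniques AggregateFunction(uniq, UInt32)  -- pre-aggregated state, read back with uniqMerge
) ENGINE = MergeTree()
ORDER BY country;
```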
Here's what a bad migration looks like:
```sql
-- BAD: Direct PostgreSQL translation
CREATE TABLE events (
    id UUID,
    user_id INT,
    event_type TEXT,
    created_at TIMESTAMP
) ENGINE = MergeTree()
ORDER BY id;
```
This will perform terribly. id as the ordering key means every query scans the entire table. Text columns with high cardinality blow up storage.
The specialist's version:
```sql
-- GOOD: ClickHouse-optimized schema
CREATE TABLE events (
    user_id UInt32,
    event_type LowCardinality(String),
    created_at DateTime,
    properties String CODEC(ZSTD(3))
) ENGINE = ReplacingMergeTree(created_at)
-- trailing key columns make the sorting key near-unique, so
-- ReplacingMergeTree only collapses genuine duplicate rows
ORDER BY (toStartOfHour(created_at), event_type, user_id, created_at)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY DELETE;
```
According to Altinity's migration resources, selecting the right ORDER BY key can improve query performance by 10-100x. The specialist knows that queries filtering on event_type and time ranges benefit from compound keys.
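For example, against the schema above, a query like this (the event name is made up) reads only the granules covering that week instead of scanning the whole table:

```sql
SELECT count()
FROM events
WHERE event_type = 'booking_completed'
  AND created_at >= '2024-01-01 00:00:00'
  AND created_at <  '2024-01-08 00:00:00';
```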
The naive approach: dump everything and load it in one monolithic INSERT. This works for 10 million rows. It crashes at 100 million.
For large migrations, use ClickHouse's native clickhouse-client with input formats:
psql -c "COPY events TO STDOUT WITH CSV HEADER" | \
clickhouse-client --query "INSERT INTO events FORMAT CSV"
But for 600M records? You need parallelism. According to Atlys Engineering's migration of 600M records, they used partitioned exports with 16 parallel workers. Each worker processed a date range.
This is where most specialists earn their pay. ClickHouse doesn't support window functions the same way. GROUP BY on high-cardinality columns kills performance. Subqueries behave differently.
A common pattern I've seen break:
```sql
-- PostgreSQL: Works fine
SELECT * FROM events
WHERE user_id IN (SELECT user_id FROM users WHERE status = 'active');
```
ClickHouse executes this differently. On a cluster, a plain IN runs the subquery on every shard. The specialist rewrites it:
```sql
-- ClickHouse: Use JOIN or GLOBAL IN
SELECT e.*
FROM events e
INNER JOIN users u ON e.user_id = u.user_id
WHERE u.status = 'active'
SETTINGS join_algorithm = 'partial_merge';
```
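The GLOBAL IN variant, for distributed tables, materializes the subquery once on the initiator and ships the result to every shard:

```sql
SELECT *
FROM events
WHERE user_id GLOBAL IN (
    SELECT user_id FROM users WHERE status = 'active'
);
```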
The worst discovery you can make: "The counts don't match." According to Mafiree's migration services, data integrity checks should happen at three levels:
- Row count parity (table-level)
- Checksum verification (partition-level)
- Sample queries that return identical results
```sql
-- Validation query comparison
SELECT
    count() AS row_count,
    sum(cityHash64(toString(user_id))) AS checksum
FROM events
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```
Run the equivalent check on both databases, partition by partition. If the checksums don't match, something was corrupted in transfer.
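One caveat: cityHash64 exists only in ClickHouse, so that exact query won't run on PostgreSQL. A portable sketch that runs unchanged on both engines (assuming user_id is numeric):

```sql
-- Same SQL works on PostgreSQL and ClickHouse
SELECT
    count(*)        AS row_count,
    sum(user_id)    AS id_sum,
    min(created_at) AS first_event,
    max(created_at) AS last_event
FROM events
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
```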
According to Tinybird's 2026 research on ClickHouse schema migrations, 73% of production incidents during migrations stem from schema changes that happen mid-migration. Your team adds a column. Your ETL breaks. Data gets lost.
A specialist implements schema versioning from day one. They use migration tools like clickhouse-migrations or custom scripts that track schema hashes.
Most teams focus on data loading speed. The specialist focuses on query execution time. I've seen 10-second queries become 50-millisecond queries just by changing the ORDER BY key from (user_id, timestamp) to (domain, timestamp, user_id).
The reason? ClickHouse's sparse indexes. Each index mark covers 8192 rows. If your queries rarely filter on the first column, you're doing full table scans.
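You can verify this before cutover. In recent ClickHouse versions, EXPLAIN with indexes enabled shows how many granules a filter actually touches; a quick sketch:

```sql
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_type = 'page_view'
  AND created_at >= now() - INTERVAL 1 DAY;
```

If the granules-read count equals the table total, your ORDER BY key isn't helping that query.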
Real-time data doesn't stop. While you migrate, new rows arrive. Old rows get updated.
The specialist builds CDC pipelines using tools like clickhouse.build from ClickHouse's GitHub. This tool handles:
- Change data capture from the PostgreSQL WAL
- Upsert logic using `ReplacingMergeTree`
- Backfill for historical data
A config for such a pipeline looks roughly like this:

```yaml
source:
  type: postgres
  database: production
  tables:
    - events
    - users
destination:
  host: clickhouse.cloud
  table_engine: ReplacingMergeTree
  order_by: (id, version)
```
Let me show you what a real migration looks like. This is based on patterns I've deployed for multiple clients.
```bash
#!/bin/bash
# Export 24 hour-of-day slices in parallel (one psql worker per slice).
# \copy must stay on a single line inside -c.
for partition in $(seq 0 23); do
  psql -c "\copy (SELECT * FROM events WHERE EXTRACT(HOUR FROM created_at) = $partition) TO '/tmp/events_$partition.csv' CSV HEADER" &
done
wait

# Load the slices into ClickHouse in parallel.
# CSVWithNames skips the header row each export wrote.
for file in /tmp/events_*.csv; do
  clickhouse-client --query "INSERT INTO events FORMAT CSVWithNames" < "$file" &
done
wait
```
```sql
-- Create materialized view for real-time aggregation.
-- uniqState keeps a mergeable sketch of distinct users; summing raw
-- hashes or uniq counts across background merges would double-count.
CREATE MATERIALIZED VIEW events_hourly_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(created_at) AS hour,
    event_type,
    count() AS event_count,
    uniqState(user_id) AS unique_users
FROM events
GROUP BY hour, event_type;
```
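Reading from the view then needs uniqMerge to finalize the state (event_count, being a plain count, just gets summed):

```sql
SELECT
    hour,
    event_type,
    sum(event_count)        AS events,
    uniqMerge(unique_users) AS unique_users
FROM events_hourly_mv
GROUP BY hour, event_type
ORDER BY hour, event_type;
```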
According to BigData Boutique's migration guide, a successful cutover involves seven verification steps. I've distilled them to four critical checks:
- Latency check: Your CDC pipeline must keep lag under 10 seconds during peak traffic.
- Query parity: Run 100 random queries on both systems. Results must match.
- Backup verification: Can you roll back in under 30 minutes?
- Load testing: 5x normal traffic on ClickHouse for 1 hour. If it breaks, you're not ready.
This one bit me hard. Your source schema changes mid-migration. New columns appear. Old columns get renamed.
```sql
-- Use Nullable with explicit default handling
ALTER TABLE events
    ADD COLUMN IF NOT EXISTS browser_version Nullable(String)
    AFTER user_agent;
```
Always use IF NOT EXISTS, and run schema changes synchronously while testing so failures surface immediately instead of in a background mutation:
```sql
ALTER TABLE events
    ADD COLUMN browser_version String DEFAULT 'unknown'
    SETTINGS mutations_sync = 2;
```
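A mutation can also fail after the statement returns. Checking system.mutations catches that:

```sql
-- Any stuck or failed mutations on the table?
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE table = 'events' AND NOT is_done;
```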
Don't test with 1 million rows if your production is 500 million. According to ClickHouse's documentation, performance characteristics change dramatically at scale. What works at 10M rows might crash at 1B.
Modern tools like FiveOneFour can automatically convert PostgreSQL DDL to ClickHouse-compatible syntax. The AI handles type mappings, index creation, and even suggests optimal ORDER BY keys based on query patterns.
Whatever tooling you choose, monitor these during transfer and cutover:

- Disk I/O on both source and destination
- CDC lag in milliseconds
- Query error rate on ClickHouse
"The first migration always takes twice as long as you estimate. The second one takes half the time." — Every ClickHouse specialist I've met
Option 1: Do it in-house.
Pros: Full control. Domain knowledge stays in-house.
Cons: Your team spends 3-6 months learning ClickHouse internals. Mistakes are expensive.

Option 2: Hire a specialist.
Pros: They've done this before. They know the edge cases. According to MCP Market's ClickHouse migration skill, specialists have pre-built scripts for common migration patterns.
Cons: Cost. Knowledge transfer after they leave.

Option 3: Lean on tooling.
Pros: Automated. Repeatable. The ClickHouse community tooling covers basic patterns.
Cons: Can't handle custom business logic. Schema translation isn't perfect.
In my experience, the hybrid approach works best: use tools for the data transfer, hire a specialist for schema design and query optimization, and keep the operational knowledge in-house.
ClickHouse is memory-hungry. Loading 100GB of data with default settings OOMs the server.
<!-- users.xml: Limit memory per query -->
<max_memory_usage>10000000000</max_memory_usage>
<max_memory_usage_for_all_queries>50000000000</max_memory_usage_for_all_queries>
This is the most common mistake. UUIDs are random, so consecutive inserts land far apart in the key space. That destroys data locality and compression inside MergeTree parts.
Solution: if you must keep the UUID, at least store it compactly as FixedString(16). Better yet, keep UUIDs out of the ordering key entirely and lead with a sequential timestamp-based key.
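A minimal sketch of the fix (schema hypothetical): the UUID stays as a payload column and the ordering key leads with time.

```sql
CREATE TABLE events_by_time (
    id UUID,                            -- kept for lookups, not for ordering
    created_at DateTime,
    event_type LowCardinality(String)
) ENGINE = MergeTree()
ORDER BY (created_at, event_type);      -- sequential inserts, good locality
```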
Your source database changes column types. Your CDC pipeline fails silently. Data corrupts.
According to Pranav Mehta's article on lightweight schema migrations, use version-controlled migration files:
```sql
-- 001_initial_schema.sql
CREATE TABLE events (...);

-- 002_add_browser_column.sql
ALTER TABLE events ADD COLUMN browser String DEFAULT '';

-- 003_change_data_type.sql
ALTER TABLE events MODIFY COLUMN browser LowCardinality(String);
```
Each migration file has a checksum. If the checksum doesn't match, the migration fails.
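A hypothetical bookkeeping table for that checksum tracking (the column layout is an assumption, not any standard tool's schema):

```sql
CREATE TABLE IF NOT EXISTS schema_migrations (
    version UInt32,
    name String,
    checksum String,                    -- e.g. SHA-256 of the migration file
    applied_at DateTime DEFAULT now()
) ENGINE = MergeTree()
ORDER BY version;
```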
Q: How long does a typical ClickHouse migration take?
For 100M-500M records with a specialist, expect 4-8 weeks. First week is schema design. Weeks 2-4 are data transfer and validation. Last week is cutover and monitoring.
Q: Can I migrate from PostgreSQL to ClickHouse without downtime?
Yes. Use CDC with Debezium or clickhouse.build. The app reads from PostgreSQL until cutover. During cutover, you switch read traffic to ClickHouse. Write traffic switches last.
Q: What's the biggest mistake teams make during migration?
Choosing the wrong ORDER BY key. Most teams use the primary key from their source database. ClickHouse needs keys aligned with query patterns. If your queries filter on event_type and created_at, those should be first in the ORDER BY.
Q: Do I need a ClickHouse specialist or can I learn it myself?
You can learn. But the first migration will teach you expensive lessons. A specialist saves you 2-3 production incidents. Each incident costs 10-50 hours of debugging.
Q: How much does ClickHouse migration cost?
Hiring a specialist firm: $15,000-$50,000 depending on data volume. Internal team: 3-6 months of senior engineer salary. Tools alone: free (open source) to $2,000/month.
Q: What validation checks should I run after migration?
Row count parity, checksum verification, and sample query comparison. Run at least 100 queries from production traffic. Results must match exactly.
Q: Can I use AI to auto-migrate my schema?
ClickHouse's AI-powered migration tool handles basic schema conversion. But you still need human review for complex cases—nested types, materialized views, and aggregation functions.
Q: What happens if the migration fails mid-way?
You roll back to the source database. Keep the old system running until validation completes. Build rollback scripts before you start. Test them.
Migrating to ClickHouse isn't about moving data. It's about understanding a fundamentally different database architecture. The specialist's value isn't in copying bytes—it's in translating your query patterns, data models, and operational workflows into a columnar, append-only world.
Three things every team needs before starting:
- An `ORDER BY` key, aligned with your query patterns, decided before migration begins
- A CDC pipeline that handles schema drift
- A rollback plan tested at least twice
The best migration specialist I've worked with said this: "I don't care how many rows you have. I care how many queries you'll run per second."
That's the difference between a data mover and a migration specialist.
Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. I write about ClickHouse, Kafka, and the hard lessons of building systems that actually scale.
Connect with me on LinkedIn
- Altinity. "ClickHouse® Migration." https://altinity.com/clickhouse-support/clickhouse-migration/
- Mafiree. "ClickHouse Migration Services." https://www.mafiree.com/services/clickhouse-migration
- ClickHouse. "Migrating Data into ClickHouse." https://clickhouse.com/docs/integrations/migration/overview
- ClickHouse Blog. "AI-powered migrations from Postgres to ClickHouse." https://clickhouse.com/blog/ai-powered-migraiton-from-postgres-to-clickhouse-with-fiveonefour
- Tinybird. "ClickHouse® schema migrations 2026 to prevent data loss." https://www.tinybird.co/blog/clickhouse-schema-migrations
- BigData Boutique. "Migrate to ClickHouse | Real-Time Analytics at Any Scale." https://bigdataboutique.com/solutions/migrate-to-clickhouse
- ClickHouse GitHub. "clickhouse.build: Postgres to ClickHouse." https://github.com/ClickHouse/clickhouse.build
- MCP Market. "ClickHouse Migration Deep Dive | Claude Code Skill." https://mcpmarket.com/tools/skills/clickhouse-migration-deep-dive
- Pranav Mehta. "Choosing a lightweight approach to schema migrations in ClickHouse." https://medium.com/@pranavmehta94/beauty-is-simplicity-choosing-a-lightweight-approach-to-schema-migrations-in-clickhouse-d13a1e0ec16b
- Atlys Engineering. "Migrating 600M Records to ClickHouse Cloud." https://engineering.atlys.com/migrating-600m-records-to-clickhouse-cloud-3548241141ae
Originally published at https://sivaro.in/articles/the-clickhouse-migration-specialist-what-nobody-tells-you.