I’ve seen it happen more times than I care to count. A team spends months building a Postgres cluster. They tune indexes. They buy bigger servers. Then the data doubles. Queries that took 50ms now take 50 seconds. Panic sets in.
Everyone says "Just migrate to ClickHouse." They're wrong about how to do it.
The hard truth: ClickHouse migration isn't a weekend project. It's a surgical process that requires deep understanding of both your source system and ClickHouse's internals. The difference between success and failure often comes down to one factor: the migration partner you choose.
According to ClickHouse's official migration overview, successful migrations require careful planning around data formats, schema compatibility, and incremental sync strategies. In this guide, I'll show you exactly what to look for in a ClickHouse migration partner, based on real migrations I've led and the lessons I've learned the hard way.
Most people think migrating to ClickHouse is just about dumping data and loading it. They're wrong because the real challenge is threefold: schema differences, data types, and partitioning strategy.
Here's what I learned the hard way: ClickHouse isn't MySQL. It isn't Postgres. It's an OLAP database built for columnar storage and vectorized execution. A relational schema from Postgres will break in ClickHouse if you don't redesign it.
I've found that the most common migration pitfalls include:
- Date/time handling – ClickHouse treats timestamps differently than most relational databases
- NULL semantics – ClickHouse handles NULLs differently in aggregation
- Indexing models – ClickHouse uses sparse primary key indices, not B-tree indexes
- Data types – no `UUID` type in older versions, no `ARRAY` in standard SQL
The real problem isn't the data volume. It's that your team likely doesn't know what they don't know about ClickHouse's architecture.
A good ClickHouse migration partner brings two things to the table: battle-tested tooling and deep knowledge of ClickHouse internals. According to Atlas's ClickHouse migration guide, automatic schema migration tools can handle the repetitive work, but only if someone understands the mapping rules.
I've seen teams bring over a Postgres schema exactly as-is. They hit a wall when ClickHouse silently drops the length constraint on `VARCHAR(255)` or mangles `TIMESTAMP` data.
A proper migration partner understands the ClickHouse data type system. They'll transform your source schema into something that actually performs.
In my experience, the biggest win comes from proper partitioning. A partner who knows ClickHouse will:
- Convert `TIMESTAMP WITH TIME ZONE` to `DateTime64`
- Map `SERIAL` primary keys to `UUID` or synthetic keys
- Flatten normalized schemas into denormalized wide tables
- Choose `ORDER BY` keys that match your query patterns
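The schema-translation step above boils down to a lookup pass over your source columns. Here's a minimal sketch in Python; the `map_column` helper and the (deliberately incomplete) mapping table are illustrative, not a complete or official rule set:

```python
# Hypothetical Postgres -> ClickHouse type-mapping pass.
# The table below is illustrative and intentionally incomplete.
PG_TO_CH = {
    "integer": "Int32",
    "bigint": "Int64",
    "serial": "UInt64",  # often replaced by UUID or a synthetic key
    "text": "String",
    "uuid": "UUID",
    "timestamp with time zone": "DateTime64(3, 'UTC')",
}

def map_column(pg_type: str) -> str:
    """Translate a Postgres column type, failing loudly on unknown types."""
    ch_type = PG_TO_CH.get(pg_type.lower())
    if ch_type is None:
        raise ValueError(f"no mapping for Postgres type: {pg_type}")
    return ch_type

print(map_column("BIGINT"))                    # Int64
print(map_column("timestamp with time zone"))  # DateTime64(3, 'UTC')
```

The important design choice is failing loudly on unmapped types: a migration that guesses silently is exactly how `TIMESTAMP` data gets corrupted.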
Speed matters. According to PeerDB's case study on large Postgres migrations, they migrated 1TB of data from Postgres to ClickHouse in just 2 hours. That's not magic. That's smart engineering.
The secret? Incremental sync and parallel processing. A good partner implements:
- Snapshot-based initial loads – consistent, fast, and restartable
- Change data capture (CDC) – real-time replication without downtime
- Batch optimization – tuning chunk sizes for ClickHouse's merge tree engine
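The batch-optimization point is easiest to see in code. A minimal, library-agnostic sketch of chunking rows before insert, so each `INSERT` produces a reasonably sized MergeTree part instead of thousands of tiny ones (`chunk_rows` and the batch size are illustrative):

```python
from itertools import islice
from typing import Iterable, Iterator

def chunk_rows(rows: Iterable[tuple], batch_size: int = 100_000) -> Iterator[list]:
    """Yield lists of at most batch_size rows from any row iterator."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch

# 250 fake rows in batches of 100 -> two full batches and one partial
batches = list(chunk_rows(((i,) for i in range(250)), batch_size=100))
print([len(b) for b in batches])  # [100, 100, 50]
```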
This is where most DIY efforts fail. They take the application offline for a "migration weekend." With a proper partner, your users won't notice a thing.
Here's what a real zero-downtime migration looks like:
- Set up CDC replication from source to ClickHouse
- Run a shadow read layer pointing to ClickHouse
- Validate query results against source for 48-72 hours
- Cut over traffic gradually
- Tear down old system
No downtime. No rollback panic. Just clean data flowing to the new system.
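The shadow-read step is worth spelling out, since it's where teams get nervous. A toy sketch of the idea: every query runs against both backends, users always get the source's answer, and disagreements are logged for review. `run_source` and `run_clickhouse` are hypothetical stand-ins for your real query clients:

```python
def shadow_read(query, run_source, run_clickhouse, mismatches):
    """Serve from the source; compare against ClickHouse on the side."""
    source_result = run_source(query)
    try:
        ch_result = run_clickhouse(query)
        if ch_result != source_result:
            mismatches.append((query, source_result, ch_result))
    except Exception as exc:  # a ClickHouse error must never break user reads
        mismatches.append((query, source_result, repr(exc)))
    return source_result      # users always get the source's answer

mismatches = []
result = shadow_read("q1", lambda q: 42, lambda q: 41, mismatches)
print(result, len(mismatches))  # 42 1
```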
Let me show you three concrete migration patterns I've used in production.
The clickhouse-local tool is surprisingly powerful for small-to-medium migrations. According to ClickHouse's documentation on clickhouse-local, you can pipe data directly without a running server.
```shell
# 1. Export from Postgres (CSV with a header row)
psql -h localhost -U user -d mydb \
  -c "\COPY (SELECT * FROM events) TO '/tmp/events.csv' WITH CSV HEADER"

# 2. Sanity-check the file locally with clickhouse-local -- no server needed
clickhouse-local --query "
  SELECT count() FROM file('/tmp/events.csv', 'CSVWithNames')
  SETTINGS input_format_skip_unknown_fields = 1
"

# 3. Load into ClickHouse; CSVWithNames tells the server to skip the header row
clickhouse-client --host your-instance.clickhouse.cloud --port 9440 --secure \
  --query "INSERT INTO events FORMAT CSVWithNames" < /tmp/events.csv
```
Trade-off: This works for datasets under 100GB. For larger migrations, you need parallelized approaches.
For real-time, zero-downtime migrations, Kafka Connect with Debezium is the gold standard.
```json
{
  "name": "postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres-primary",
    "database.port": "5432",
    "database.user": "replicator",
    "database.password": "secret",
    "database.dbname": "mydb",
    "database.server.name": "my-pg-server",
    "table.include.list": "public.events,public.users",
    "plugin.name": "pgoutput",
    "slot.name": "clickhouse_slot",
    "publication.name": "clickhouse_pub"
  }
}
```
And the sink side, using the official ClickHouse Kafka Connect sink (connector class per the `clickhouse-kafka-connect` plugin; host and credentials are placeholders):

```json
{
  "name": "clickhouse-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "hostname": "clickhouse-server",
    "port": "8123",
    "database": "mydb",
    "username": "default",
    "password": "secret",
    "topics": "my-pg-server.public.events",
    "errors.tolerance": "all",
    "tasks.max": "4"
  }
}
```
In my experience, CDC migrations are the only way to handle multi-terabyte datasets without downtime. The initial sync catches up the backlog, then CDC keeps things in sync.
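The "catch up, then keep in sync" property rests on checkpointed replay: if the sync dies mid-stream, it restarts from the last recorded offset rather than from zero. A minimal sketch of that logic, with a hypothetical `apply_event` callback and `(offset, payload)` event shape:

```python
def replay(events, checkpoint, apply_event):
    """Apply events with offset > checkpoint; return the new checkpoint."""
    for offset, payload in events:
        if offset <= checkpoint:
            continue              # already applied before the crash
        apply_event(payload)
        checkpoint = offset       # in production, persist this durably
    return checkpoint

applied = []
events = [(1, "a"), (2, "b"), (3, "c")]
cp = replay(events, checkpoint=1, apply_event=applied.append)
print(cp, applied)  # 3 ['b', 'c']
```

Re-running `replay` with the same events and the returned checkpoint applies nothing, which is exactly the idempotence you need for safe restarts.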
Schema changes in ClickHouse are notoriously tricky. ClickHouse doesn't support transactional DDL like Postgres.
```sql
-- Atlas migration for ClickHouse
-- version: 001_add_partitioning.up.sql

-- First, create the new table with proper schema
CREATE TABLE events_v2 (
    event_id UUID,
    timestamp DateTime64(3),
    user_id UInt32,
    event_type String,
    payload String
) ENGINE = MergeTree()
ORDER BY (timestamp, event_id)
PARTITION BY toYYYYMM(timestamp)
TTL timestamp + INTERVAL 90 DAY DELETE;

-- Use ALTER TABLE ... MODIFY COLUMN for type changes
ALTER TABLE events_v2
    MODIFY COLUMN user_id Nullable(UInt32);

-- Apply TTL changes online
ALTER TABLE events_v2
    MODIFY TTL timestamp + INTERVAL 180 DAY DELETE;
```
According to the golang-migrate ClickHouse driver documentation, ClickHouse migrations require explicit locking strategies because of the engine's eventual consistency model.
For truly large datasets, skip the wire. Use object storage as an intermediary.
```shell
# 1. Export Postgres rows to Parquet in object storage
pg2parquet --host postgres --db mydb \
  --query "SELECT * FROM events WHERE date >= '2025-01-01'" \
  --output s3://my-bucket/clickhouse-migration/events/

# 2. Bulk-ingest straight from S3 with the s3() table function
clickhouse-client --query "
INSERT INTO events
SELECT * FROM s3(
    'https://my-bucket.s3.amazonaws.com/clickhouse-migration/events/*.parquet',
    'Parquet',
    'event_id UUID, timestamp DateTime64(3), user_id UInt32, event_type String'
)
SETTINGS input_format_parquet_allow_missing_columns = 1
"
```
The hard truth about bulk ingests: They're fast but fragile. Always validate row counts. I've caught mismatches of 0.03% that took weeks to debug because of trailing whitespace in strings.
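That trailing-whitespace lesson generalizes: normalize values before hashing them, so a formatting difference surfaces as a normalization decision instead of a mystery checksum mismatch weeks later. A stdlib-only sketch (the row format and `row_checksum` helper are illustrative):

```python
import hashlib

def row_checksum(row, normalize=True):
    """Hash a row tuple, optionally stripping trailing whitespace from strings."""
    parts = [v.rstrip() if normalize and isinstance(v, str) else v for v in row]
    return hashlib.sha256(repr(parts).encode()).hexdigest()

source = ("evt-1", "click ")   # trailing space from the source export
target = ("evt-1", "click")    # the value as ClickHouse stored it

print(row_checksum(source) == row_checksum(target))  # True
print(row_checksum(source, normalize=False) ==
      row_checksum(target, normalize=False))         # False
```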
Don't migrate everything at once. Pick your highest-query-volume table. Migrate it. Measure query performance against the original. According to Tinybird's blog on when NOT to migrate, some workloads genuinely don't benefit from ClickHouse – especially high-frequency row-level updates.
In my experience, a good PoC covers:
- Query latency comparison (p50, p95, p99)
- Storage compression ratio
- Write throughput (rows/second)
- Concurrency under load (100+ simultaneous queries)
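For the latency comparison, you don't need a benchmarking framework to get p50/p95/p99; the stdlib is enough. A small sketch using `statistics.quantiles` with the inclusive method, so percentiles land inside the observed range (the sample data is fake):

```python
from statistics import quantiles

def latency_report(samples_ms):
    """Return p50/p95/p99 from a list of query latencies in milliseconds."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

samples = list(range(1, 101))  # pretend latencies: 1..100 ms
print(latency_report(samples))  # {'p50': 50.5, 'p95': 95.05, 'p99': 99.01}
```

Run the same report against Postgres and ClickHouse on identical query streams and compare the tails, not just the averages.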
ClickHouse schema changes are more disruptive than in Postgres. You can't just run `ALTER TABLE ... ADD COLUMN ... DEFAULT` and call it a day.
Best practices I've learned:
- Use `Nullable()` sparingly – it degrades compression
- Choose `ORDER BY` keys that match your most common `WHERE` clauses
- Use `TTL` clauses for automatic data retention
- Pre-aggregate materialized views for common queries
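The `ORDER BY` advice can even be approximated mechanically: rank candidate key columns by how often they appear in your workload's `WHERE` clauses. A toy sketch (the workload log format is hypothetical; a real pass would mine `system.query_log`):

```python
from collections import Counter

def suggest_order_by(where_columns_per_query, max_keys=3):
    """Rank columns by WHERE-clause frequency across a query workload."""
    counts = Counter(col for cols in where_columns_per_query for col in cols)
    return [col for col, _ in counts.most_common(max_keys)]

workload = [
    ["timestamp", "user_id"],
    ["timestamp", "event_type"],
    ["timestamp"],
    ["user_id"],
]
print(suggest_order_by(workload))  # ['timestamp', 'user_id', 'event_type']
```

Frequency is only a first cut; cardinality ordering (low-cardinality columns first) matters too, so treat the output as a starting point for benchmarking, not an answer.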
Assume something will break after cutover, because it does. Set up monitoring before you cut over:
```sql
-- Check merge queue depth
SELECT database, table,
       count() AS pending_merges
FROM system.merges
WHERE database != 'system'
GROUP BY database, table;

-- Monitor query performance over time
SELECT query,
       round(avg(query_duration_ms), 2) AS avg_ms,
       count() AS executions
FROM system.query_log
WHERE event_date = today()
  AND type = 'QueryFinish'
GROUP BY query
ORDER BY avg_ms DESC
LIMIT 20;

-- Check data replication lag
SELECT database, table,
       absolute_delay,
       is_leader
FROM system.replicas
WHERE is_readonly = 1;
```
According to OneUptime's 2026 guide on ClickHouse migration between cloud providers, monitoring replication lag is critical because ClickHouse's asynchronous replication means some nodes can fall behind without obvious errors.
DIY makes sense when:
- Your dataset is under 500GB
- You have a ClickHouse expert on staff
- Your schema is simple (under 20 tables)
- You can tolerate 4-8 hours of downtime
- Your team is comfortable with Go, SQL, and data pipelines
Bring in a migration partner when you have:
- Over 1TB of data
- Real-time requirements (under 1 minute lag)
- Complex schemas with hundreds of tables
- Strict SLAs for uptime (99.99%+)
- No in-house ClickHouse experience
The real cost of DIY isn't tools. It's time. I've seen teams spend 6 months on a migration that a competent partner could complete in 6 weeks. The hidden cost? Missed revenue from delayed analytics capabilities.
Questions to ask any prospective partner:
- "How many Postgres-to-ClickHouse migrations have you completed?" – Look for 10+ real projects
- "What's your maximum throughput on CDC?" – You need at least 50K events/sec
- "How do you handle schema conflicts?" – They should have automated reconciliation
- "What's your rollback plan?" – If they say "we never rollback," run
- "Do you use open-source tooling?" – Proprietary lock-in is a red flag
Postgres and ClickHouse integer types don't map one-to-one. Postgres `BIGINT` maps to ClickHouse `Int64`, but `INTEGER` maps to `Int32`. Map a `BIGINT` column to `Int32` by mistake, and you get silent truncation.
Solution: Use a mapping table like this:
| Source Type | ClickHouse Type | Risk Level |
|---|---|---|
| `INTEGER` (4 bytes) | `Int32` | Safe |
| `BIGINT` (8 bytes) | `Int64` | Safe |
| `NUMERIC(10,2)` | `Decimal(10,2)` | Safe |
| `TIMESTAMP WITH TZ` | `DateTime64(3, 'UTC')` | Requires timezone conversion |
| `JSONB` | `JSON` (ClickHouse 24.3+) | Type mapping needed |
| `UUID` | `UUID` | Direct |
| `TEXT[]` | `Array(String)` | Direct |
ClickHouse uses lightweight locks by default. But large ALTER TABLE operations can block. According to ClickHouse's knowledge base on schema migration tools, you should always:
- Test schema changes on a replica first
- Use `ON CLUSTER` with caution
- Have a rollback migration ready
- Monitor merge queue depth
The hardest lesson I learned: Always verify row counts and checksums after migration.
```sql
-- Post-migration verification: compare row counts and content checksums.
-- postgres_db is a database attached via the PostgreSQL engine (name illustrative).
SELECT
    'events' AS table_name,
    count() AS row_count,
    sum(cityHash64(toString(tuple(*)))) AS checksum
FROM postgres_db.events
UNION ALL
SELECT
    'events',
    count(),
    sum(cityHash64(toString(tuple(*))))
FROM clickhouse.events
FINAL;
```
Run this every hour during the first 48 hours post-migration. I've caught partition-boundary mismatches this way that would have poisoned dashboards for months.
What is a ClickHouse migration partner?
A specialist company or consultant that helps move data from existing databases (Postgres, MySQL, MongoDB, etc.) to ClickHouse, handling schema translation, data transfer, and production cutover.
How long does a ClickHouse migration typically take?
Small datasets (under 100GB): 1-2 weeks. Medium (100GB-1TB): 3-6 weeks. Large (1TB-10TB): 6-12 weeks. Multi-terabyte migrations with CDC can take 8-16 weeks.
Can I migrate from Postgres to ClickHouse without downtime?
Yes, if you use change data capture (CDC). Tools like Debezium + Kafka can keep your source and target in sync while you cut over traffic gradually.
What tools do ClickHouse migration partners use?
Common tooling includes clickhouse-local, Apache Kafka Connect, Debezium, PeerDB, and custom Go/Java pipelines. Top partners also use schema migration tools like Atlas or golang-migrate.
How much does a ClickHouse migration cost?
Partner fees range from $15K-$50K for small migrations, $50K-$200K for medium, and $200K-$1M+ for large-scale enterprise migrations with CDC and zero-downtime requirements.
What happens if the migration fails mid-process?
A good partner has rollback plans with point-in-time recovery. CDC-based migrations can simply replay events from the last checkpoint without data loss.
Do I need to redesign my schema for ClickHouse?
Almost always. ClickHouse is an OLAP database optimized for columnar storage. Star schemas, denormalized wide tables, and materialized views often perform better than normalized relational schemas.
Is AI used in ClickHouse migrations?
Increasingly, yes. According to FiveOneFour's AI-powered migration from Postgres to ClickHouse, modern tools use machine learning to recommend optimal schema mappings, partition keys, and sort orders based on query patterns.
Here's what matters:
- Choose your migration partner based on schema complexity and data volume, not price
- CDC-based migrations are the only way to achieve zero downtime above 500GB
- Always validate row counts and checksums – trust nothing
- Plan for schema evolution – ClickHouse DDL is more restrictive than Postgres
Your next move: Run a small proof-of-concept with your highest-impact table. Measure query performance before and after. If you see 10x speed improvements (which I've seen consistently), scale from there.
If you're serious about ClickHouse migration, reach out to SIVARO. We've handled petabytes of production data migrations. We know the pitfalls because we've fallen into them.
Nishaant Dixit - Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Led migrations processing over 200K events/sec for clients including fintech and ad-tech platforms. Connect with me on LinkedIn.
Sources
- Migrating Data into ClickHouse
- AI-powered migrations from Postgres to ClickHouse with FiveOneFour
- Migrating to ClickHouse using clickhouse-local
- How to Migrate ClickHouse Between Cloud Providers (OneUptime, 2026)
- Migrate to ClickHouse | Real-Time Analytics at Any Scale (BigData Boutique)
- How we automatically handle ClickHouse schema migrations (Tinybird)
- Automatic ClickHouse Schema Migrations with Atlas
- golang-migrate ClickHouse driver documentation
- Making large Postgres migrations practical: 1TB in 2 hours (PeerDB/ClickHouse)
- Schema migration tools for ClickHouse (ClickHouse Knowledge Base)
Originally published at https://sivaro.in/articles/why-most-clickhouse-migrations-fail-and-how-to-choose-the.