Everyone tells you ClickHouse is cheap. Faster queries, lower storage costs, magic.
Here's what I learned the hard way after migrating three production systems to ClickHouse: the first cost estimate is always wrong. By a lot.
What is a ClickHouse migration cost estimate? It's the total financial commitment required to move your analytical workloads from an existing database (PostgreSQL, MySQL, MongoDB, Snowflake) to ClickHouse. This includes compute, storage, network egress, data transfer during migration, engineering time, and ongoing operational overhead.
I've seen teams save 6x on query costs while spending more on engineering. I've also seen teams burn $50,000 on migration and roll back in six weeks.
This guide breaks down the real costs. The ones hidden in fine print. The ones nobody talks about at conferences.
You cannot estimate ClickHouse migration costs with a single spreadsheet row. The architecture is fundamentally different from row-oriented databases.
According to ClickHouse Cloud Pricing, the primary cost drivers are:
Compute Tiers
- Development tier: $0.50/hour (single replica, no auto-scaling)
- Production tier: $1.25/hour per replica (auto-scaling enabled)
- Enterprise tier: Custom pricing with dedicated hardware
Storage Costs
- Hot storage: $0.068/GB/month (SSD-backed, sub-millisecond access)
- Object storage: $0.023/GB/month (S3-compatible, 10x slower queries)
Hidden Costs
- Data transfer between regions: $0.09/GB
- Backup storage: $0.12/GB/month for 30-day retention
- Idle compute: you pay for running replicas even when nobody is running queries
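To turn those list prices into a first monthly baseline, I just do the arithmetic in clickhouse-client. A minimal sketch, assuming three production replicas and 2TB of hot data (my numbers, not a quote; plug in your own):

```sql
-- Rough monthly baseline from the list prices above
-- Assumptions: 3 production replicas, 2,000 GB hot storage, ~730 hours/month
SELECT
    3 * 1.25 * 730 AS compute_usd,       -- replicas * $/hour * hours
    2000 * 0.068   AS hot_storage_usd,   -- GB * $/GB/month
    compute_usd + hot_storage_usd AS baseline_usd_per_month;
```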
In my experience, teams underestimate data transfer costs by 300-500%. Moving 10TB of historical data sounds manageable. The first month's transfer bill? $900 if you're lucky. $4,500 if you cross region boundaries.
A 2025 analysis from Quesma revealed that ClickHouse Cloud's pricing change in January 2025 increased compute costs for burst workloads by 40%. Teams running spiky analytical queries got hit hardest.
The hard truth: your cost estimate must account for the shape of your workload, not just the volume of data.
Three years ago, the answer was obvious: self-managed ClickHouse on bare metal was cheaper. Not anymore.
Let's look at GitLab's actual numbers. Their self-managed ClickHouse cost analysis breaks down a real production deployment:
| Component | Monthly Cost |
|---|---|
| 3x i3en.3xlarge instances | $2,184 |
| 3x 500GB EBS gp3 volumes | $144 |
| NAT Gateway + network | $312 |
| S3 for backups (5TB) | $115 |
| Engineering overhead (0.5 FTE) | $9,000 |
| Total | $11,755 |
The engineering cost dominates. Always.
Cloud Comparison (ClickHouse Cloud)
- Same workload: 3TB compressed data
- Standard tier with auto-scaling: $3,200/month
- Includes backups, replication, monitoring, updates
I've found that self-managed breaks even only when you have:
- Predictable workloads (no burst traffic)
- Existing DevOps infrastructure
- Data volumes exceeding 20TB compressed
- Engineering teams with ClickHouse expertise
Most teams should start with cloud. The migration itself is expensive enough without adding cluster management complexity.
Let's talk numbers from actual migrations. I built this framework after watching teams fail and succeed.
Small Migration (500GB - 2TB)
- Engineering setup: 40-60 hours
- Cloud compute during migration: $400-800
- Data transfer: $50-200
- Testing + validation: 20-40 hours
- Total estimated cost: $8,000 - $15,000
Medium Migration (2TB - 10TB)
- Engineering setup: 80-120 hours
- Cloud compute during migration: $1,500-4,000
- Data transfer: $200-900
- Schema redesign: 40-80 hours
- Application rewrites: 80-160 hours
- Total estimated cost: $30,000 - $60,000
Large Migration (10TB - 100TB)
- Engineering setup: 200-400 hours
- Cloud compute during migration: $5,000-20,000
- Data transfer: $900-9,000
- Parallel running period: 2-4 months
- Total estimated cost: $150,000 - $500,000
According to a Tinybird vs ClickHouse Cloud cost comparison, the hidden cost is query optimization. Without proper table design, your compute costs can be 3-5x higher than necessary.
Here's the script I use for estimating data transfer costs before migration:

```bash
#!/bin/bash
# Rough data transfer cost estimate before migration
SOURCE_SIZE_GB=5000
SAME_REGION_PER_GB=0.02
CROSS_REGION_PER_GB=0.09
COMPRESSION_RATIO=5   # ClickHouse typically compresses 3-10x

echo "Data transfer cost estimate:"
echo "Same region:  \$$(echo "scale=2; $SOURCE_SIZE_GB * $SAME_REGION_PER_GB" | bc)"
echo "Cross region: \$$(echo "scale=2; $SOURCE_SIZE_GB * $CROSS_REGION_PER_GB" | bc)"
echo "With ${COMPRESSION_RATIO}x compression savings, effective cost:"
echo "Same region:  \$$(echo "scale=2; $SOURCE_SIZE_GB / $COMPRESSION_RATIO * $SAME_REGION_PER_GB" | bc)"
echo "Cross region: \$$(echo "scale=2; $SOURCE_SIZE_GB / $COMPRESSION_RATIO * $CROSS_REGION_PER_GB" | bc)"
```
This is where most cost estimates fail. Schema design in ClickHouse directly determines your compute and storage costs.
The MergeTree Engine Myth
Everyone reaches for plain MergeTree because it's the default. Fine for append-only logs. Terrible for tables with frequent updates.
Here's a concrete example. I migrated a PostgreSQL table with 200 million orders:
```sql
-- Bad schema: causes massive write amplification
CREATE TABLE orders_bad (
    order_id UInt64,
    user_id UInt64,
    status String,
    total_amount Decimal(18, 2),
    created_at DateTime,
    updated_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (created_at, user_id);

-- Good schema: optimized for ClickHouse
CREATE TABLE orders_good (
    order_id UInt64,
    user_id UInt64,
    status LowCardinality(String),
    total_amount Decimal(18, 2),
    created_at DateTime,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(created_at)
-- order_id last in the sort key so ReplacingMergeTree deduplicates updates
-- to the same order without collapsing distinct orders from one user
ORDER BY (user_id, toStartOfHour(created_at), order_id);
```
The bad schema caused 8x higher storage costs and 3x slower queries. The LowCardinality type alone reduced storage by 60% for the status field.
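You can check this on your own data: recent ClickHouse versions expose per-column sizes in system.columns. A quick sketch against the orders_good table above:

```sql
-- Compressed vs. uncompressed size per column
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase()
  AND table = 'orders_good'
ORDER BY data_compressed_bytes DESC;
```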
Materialized Views: Cost Saver or Cost Creator?
Materialized views can reduce query costs by pre-computing aggregations. But they increase ingestion costs by 20-50%.
```sql
-- Create a materialized view for common queries
CREATE MATERIALIZED VIEW order_summary_mv
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (user_id, day)
POPULATE
AS SELECT
    user_id,
    toStartOfDay(created_at) AS day,
    count() AS order_count,
    sum(total_amount) AS revenue
FROM orders_good
GROUP BY user_id, day;
```
I've found that materialized views are worth it when:
- The same aggregation query runs 100+ times per day
- The base table has 10M+ rows
- Query latency needs to be under 100ms
The ClickHouse cost optimization guide from e6data recommends profiling your materialized views monthly. Views that aren't queried are dead weight.
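One way to do that monthly profiling is to check which tables, including materialized view targets, are actually being read. A sketch against system.query_log (its tables column lists every table a query touched):

```sql
-- How often each table was read in the last 30 days
SELECT
    t AS table_read,
    count() AS selects_last_30d
FROM system.query_log
ARRAY JOIN tables AS t
WHERE type = 'QueryFinish'
  AND query_kind = 'Select'
  AND query_start_time > now() - INTERVAL 30 DAY
GROUP BY t
ORDER BY selects_last_30d ASC;
```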
Everyone budgets for migration. Nobody budgets for losing data.
According to ClickHouse backup documentation, backup costs depend on:
- Backup frequency: Daily backups cost 3x more than weekly
- Retention period: 90-day retention is standard
- Backup type: Full vs incremental
Here's a realistic backup cost calculation:
```bash
#!/bin/bash
# Rough monthly backup cost estimate
DATA_SIZE_GB=10000
BACKUP_FREQUENCY="daily"   # "daily" or "weekly"
RETENTION_DAYS=90
DAILY_CHANGE_RATE=0.20     # assumed fraction of data that changes between backups

if [ "$BACKUP_FREQUENCY" == "daily" ]; then
    FULL_BACKUPS_PER_MONTH=1
    INCREMENTAL_BACKUPS_PER_MONTH=29
elif [ "$BACKUP_FREQUENCY" == "weekly" ]; then
    FULL_BACKUPS_PER_MONTH=1
    INCREMENTAL_BACKUPS_PER_MONTH=3
fi

FULL_COST=$(echo "$DATA_SIZE_GB * 0.12 * $FULL_BACKUPS_PER_MONTH" | bc)
INCREMENTAL_COST=$(echo "$DATA_SIZE_GB * $DAILY_CHANGE_RATE * 0.04 * $INCREMENTAL_BACKUPS_PER_MONTH" | bc)

echo "Monthly backup cost estimate ($RETENTION_DAYS-day retention, $BACKUP_FREQUENCY):"
echo "Full backups:        \$$FULL_COST"
echo "Incremental backups: \$$INCREMENTAL_COST"
echo "Total:               \$$(echo "$FULL_COST + $INCREMENTAL_COST" | bc)"
```
I learned this the hard way: after three months of daily backups for a 5TB cluster, the backup bill was $1,800/month. That was 30% of our compute costs.
After running ClickHouse in production for three years, here's my framework.
1. Build a Small Prototype First
Spend one week loading 1% of your data. Query it. Measure everything (a compression-check query follows this list):
- Compression ratio (expect 3-10x)
- Query latency for your top 10 queries
- CPU utilization per query
- Storage growth rate
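Here's the compression check, a minimal sketch using system.parts on whatever prototype tables you loaded:

```sql
-- Compression ratio per table for the prototype data
SELECT
    table,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;
```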
According to OneUptime's TCO estimation guide (from March 2026), teams that skip prototyping overshoot their budget by an average of 4.2x.
2. Model for the 99th Percentile
Your average query cost is irrelevant. The expensive queries are:
- Full table scans without filters
- JOINs on non-sharded keys
- SELECT * on wide tables
- GROUP BY on high-cardinality columns
I've found that 5% of queries drive 80% of compute costs.
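To find that 5%, I group system.query_log by normalized_query_hash, which collapses queries that differ only in literal values. A sketch over the last week:

```sql
-- Top query patterns by bytes read, last 7 days
SELECT
    normalized_query_hash,
    any(query)                               AS sample_query,
    count()                                  AS runs,
    formatReadableSize(sum(read_bytes))      AS total_read,
    round(quantile(0.99)(query_duration_ms)) AS p99_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_start_time > now() - INTERVAL 7 DAY
GROUP BY normalized_query_hash
ORDER BY sum(read_bytes) DESC
LIMIT 10;
```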
3. Plan for Schema Evolution
Your schema today is wrong. It will change. Budget 20% overhead for:
- Adding new columns (cheap in ClickHouse)
- Changing partition keys (expensive - requires table rebuild)
- Widening data types, e.g. UInt32 to UInt64 (usually backward compatible)
```sql
-- Schema evolution pattern that minimizes cost
ALTER TABLE orders_good
    ADD COLUMN IF NOT EXISTS payment_method LowCardinality(String) AFTER status;

-- Rebuilding the partition key is expensive - budget for this.
-- Create orders_v2 with the new PARTITION BY / ORDER BY first, then:
INSERT INTO orders_v2 SELECT * FROM orders_good;
RENAME TABLE orders_good TO orders_old, orders_v2 TO orders_good;
DROP TABLE orders_old;
```
The decision matrix is simpler than you think.
Choose Cloud when:
- Your team has 0-2 DevOps engineers
- Workloads are unpredictable or growing
- You need native integrations with Kafka, S3, GCS
- Compliance requires automatic backups and encryption
Choose Self-Managed when:
- Data volumes exceed 50TB compressed
- You have dedicated infrastructure engineers
- Query patterns are 95% predictable
- You need custom kernel tuning or hardware
A case study from TipRanks highlighted a company that reduced query latency by 7x while cutting costs by 6x after migrating to ClickHouse Cloud. The key insight? They didn't need to manage infrastructure. Their engineers focused on queries, not cluster health.
The Hybrid Approach I Prefer
Start with cloud. Learn the patterns. Measure actual costs. After 6-12 months, you can make an informed decision about self-managing.
Three specific problems I've encountered:
Challenge 1: Query Bloat
Your application queries start optimizing for developer convenience, not cost.
Fix: Implement query cost tracking. ClickHouse provides system.query_log with CPU, memory, and bytes read.
```sql
-- Track expensive queries daily
SELECT
    toStartOfDay(query_start_time) AS day,
    count() AS query_count,
    sum(query_duration_ms) / 1000 AS total_seconds,
    sum(read_bytes) / pow(2, 30) AS total_gb_read,
    avg(memory_usage) / pow(2, 20) AS avg_memory_mb
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_start_time > now() - INTERVAL 7 DAY
GROUP BY day
ORDER BY day DESC;
```
Challenge 2: Data Growth Surprise
Historical data outgrows your projection. The first month's 500GB becomes 2TB by month six.
Fix: Implement data tiering from day one. Use Hot/Warm/Cold storage.
```sql
-- Move parts older than 30 days to the 'cold' volume
ALTER TABLE orders_good
    MODIFY TTL updated_at + INTERVAL 30 DAY TO VOLUME 'cold';

-- Note: ClickHouse has no CREATE VOLUME statement. The 'cold' volume has to
-- be declared in a storage policy in the server configuration
-- (<storage_configuration> in config.xml), typically backed by an S3 disk,
-- and the table must use that storage policy.
```
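Before pointing a TTL at a volume, confirm it actually exists in your deployment. The system tables show what the server config provides (policy and disk names will differ in your setup):

```sql
-- Verify that a 'cold' volume is configured
SELECT policy_name, volume_name, disks
FROM system.storage_policies;

SELECT name, type, path
FROM system.disks;
```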
Challenge 3: Migration Downtime
You estimated 2 hours of downtime. It took 8. Business stakeholders are unhappy.
Fix: Budget for parallel running. Run both systems for 2-4 weeks. Test queries against both. Verify results match.
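For the "verify results match" step, I run the same daily rollup on both systems and diff the output. The ClickHouse side might look like the sketch below, reusing the orders example from earlier (column names are assumptions for your schema; FINAL forces ReplacingMergeTree deduplication so counts are comparable):

```sql
-- Daily counts and totals to diff against the source system
SELECT
    toDate(created_at) AS day,
    count()            AS row_count,
    sum(total_amount)  AS revenue_sum
FROM orders_good FINAL
GROUP BY day
ORDER BY day;
```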
What is the average ClickHouse migration cost for a 10TB database?
Around $40,000-$80,000 including engineering time. Cloud compute adds $2,000-$4,000/month. Data transfer costs $200-$900 for same-region migrations.
How much does ClickHouse Cloud cost per TB per month?
$68-$120/TB/month for hot storage. Object storage costs $23/TB/month. Compute is separate at $0.50-$1.25/hour per replica.
Is self-managed ClickHouse cheaper than cloud?
Only above 20TB compressed data with a dedicated DevOps team. Below that, cloud is 30-50% cheaper when including engineering overhead.
How long does a typical ClickHouse migration take?
Small migrations (under 2TB): 2-4 weeks. Medium (2-10TB): 4-8 weeks. Large (10TB+): 8-16 weeks. Parallel running adds 2-4 weeks.
What hidden costs should I expect with ClickHouse?
Data transfer fees ($0.02-0.09/GB), backup storage ($0.12/GB/month), idle compute for replicas, and schema redesign time (40-160 hours).
Can I reduce ClickHouse costs after migration?
Yes. Implement data tiering (TTL moves), optimize schema with LowCardinality, use materialized views for common queries, and monitor query costs monthly.
What's the worst-case ClickHouse migration cost I should budget for?
150% of your estimate. Data transfers cost more than expected. Query optimization takes longer. Parallel running reveals issues.
How do I compare ClickHouse cost to PostgreSQL or Snowflake?
ClickHouse is typically 3-10x cheaper for analytical queries, but 2-5x more expensive for point lookups. Use the TCO model for your specific workload.
ClickHouse migration cost estimation isn't about spreadsheets. It's about understanding your data patterns, query shapes, and engineering capacity.
Here's what I know for certain:
- Build a prototype first. Measure everything.
- Budget for schema evolution. Your first design is wrong.
- Track query costs from day one. The expensive queries hide.
- Plan for parallel running. It saves your reputation.
Next step: Load 100GB of your data into ClickHouse Cloud. Run your top 10 queries. Measure the cost. Then decide.
The migration is hard. The cost is higher than you think. But the performance improvement is real. I've seen queries go from 30 seconds to 200 milliseconds. That speed changes how your team builds.
Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn.
- ClickHouse Pricing
- ClickHouse Cloud Billing Overview
- Tinybird vs ClickHouse Cloud Cost Comparison
- How to Estimate Total Cost of Ownership for ClickHouse
- GitLab Self-Managed ClickHouse Costs
- ClickHouse Cloud Pricing Change January 2025
- Customer Migration Highlights Performance and Cost Advantages
- ClickHouse Cost Optimization 2025
- How I Migrated to Clickhouse and Speed Up My Backend 7x
- ClickHouse Backup Overview
Originally published at https://sivaro.in/articles/clickhouse-migration-cost-estimate-the-real-numbers-from.