A lot of people initially think ClickHouse performance problems come from:
- large queries
- bad joins
- massive datasets
- missing indexes
And honestly, those things can matter.
But one of the most common operational problems in ClickHouse starts much earlier:
too many tiny parts.
This is one of those issues that usually stays invisible at first.
Then suddenly:
- merges fall behind
- queries slow down
- memory usage increases
- inserts become unstable
And the cluster starts behaving strangely.
Every Insert Creates Parts
This is the first thing that’s important to understand.
In MergeTree-based engines, ClickHouse stores data as immutable parts.
Something as simple as:
INSERT INTO events VALUES (...);
creates at least one new part on disk: one for every partition the inserted rows fall into.
And this is completely normal.
ClickHouse is designed around this storage model.
So:
parts themselves are not the problem.
The real issue starts when parts begin accumulating faster than merges can stabilize them.
Why Tiny Inserts Become Dangerous
At smaller scale, tiny inserts may seem harmless.
For example:
- inserting row-by-row
- extremely frequent micro-batches
- tiny streaming flush intervals
Initially:
everything still works.
But over time, the number of parts starts growing aggressively.
Now ClickHouse has to manage:
- more metadata
- more merges
- more scheduling
- more file operations
This creates operational overhead.
Meaning:
the system spends a growing share of its resources managing fragmentation instead of ingesting data and serving queries.
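To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes each insert produces exactly one part per partition it touches, which is a simplification, and the function name and workload figures are made up for illustration.

```python
def parts_per_hour(rows_per_second, rows_per_insert, partitions_per_insert=1):
    """Estimate how many new parts one hour of ingestion creates.

    Simplifying assumption: every insert writes exactly one part
    per partition it touches, and no merges run in the meantime.
    """
    inserts_per_hour = rows_per_second * 3600 / rows_per_insert
    return inserts_per_hour * partitions_per_insert


# Hypothetical workload: 500 rows/second arriving at the ingestion layer.
row_by_row = parts_per_hour(rows_per_second=500, rows_per_insert=1)
batched = parts_per_hour(rows_per_second=500, rows_per_insert=50_000)

print(f"row-by-row: {row_by_row:,.0f} parts/hour")
print(f"batched:    {batched:,.0f} parts/hour")
```

Same data volume, but the row-by-row pattern hands the merge system tens of thousands of times more parts to clean up.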
Why Merges Matter So Much
ClickHouse relies heavily on background merges.
These merges:
- combine smaller parts
- reduce fragmentation
- improve compression
- optimize query performance
Under healthy ingestion patterns, merges naturally keep the system stable over time.
That is the ideal state.
But problems start when:
parts created per second > parts merged per second
Now fragmented parts begin accumulating faster than ClickHouse can compact them.
And this is usually where instability slowly starts building.
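The imbalance can be illustrated with a deliberately simple toy model. This is not how ClickHouse schedules merges; it only assumes that each merge replaces a fixed number of parts with one, and every rate below is a hypothetical number chosen for the example.

```python
def simulate_active_parts(created_per_min, merges_per_min, parts_per_merge,
                          minutes, start=1):
    """Toy model of active part count over time.

    Assumption: each merge replaces `parts_per_merge` parts with one,
    so it removes `parts_per_merge - 1` parts. Real merge scheduling
    is far more nuanced; this only shows the direction of travel.
    """
    removed_per_min = merges_per_min * (parts_per_merge - 1)
    active = start
    history = []
    for _ in range(minutes):
        # A partition that holds data never drops below one part.
        active = max(1, active + created_per_min - removed_per_min)
        history.append(active)
    return history


# Same merge capacity, two hypothetical ingestion patterns.
healthy = simulate_active_parts(created_per_min=6, merges_per_min=6,
                                parts_per_merge=10, minutes=10)
overloaded = simulate_active_parts(created_per_min=120, merges_per_min=6,
                                   parts_per_merge=10, minutes=10)

print("healthy after 10 min:   ", healthy[-1])
print("overloaded after 10 min:", overloaded[-1])
```

In the healthy case the part count stays flat; in the overloaded case it climbs every single minute, with nothing in the model that would ever bring it back down.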
The Dangerous Part Is That It Builds Slowly
This is what makes the issue tricky operationally.
You usually do not notice the problem immediately.
The cluster may look perfectly healthy initially.
Then gradually:
- insert latency increases
- merges lag behind
- CPU usage becomes unstable
- queries become heavier
- replication slows down
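And eventually ClickHouse may start rejecting inserts with errors like:
Too many parts (...). Merges are processing significantly slower than inserts
ClickHouse raises this once the number of active parts in a single partition exceeds the parts_to_throw_insert setting; before that, parts_to_delay_insert already slows inserts down artificially.
At that point, the merge system is already under serious pressure.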
Queries Also Become More Expensive
A lot of people think parts only affect inserts.
But queries suffer too.
Because queries now need to:
- open more parts
- read a separate primary index and set of marks for each part
- coordinate more files
Even when the actual dataset itself is not massive.
So sometimes:
performance degradation comes more from fragmentation than raw data volume.
That is a very important operational insight.
FINAL Does Not Really Solve This
One thing that’s important to understand:
FINAL is not really a solution for too many parts.
For example:
SELECT *
FROM events FINAL;
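FINAL applies merge logic during query execution, for example collapsing duplicate rows in a ReplacingMergeTree table.
But it does not write merged parts back to disk, so the fragmented parts still physically exist underneath.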
So if the system already has excessive fragmentation:
- queries still scan many parts
- merge pressure still exists
- query execution can become heavier
Which means:
FINAL can actually become more expensive when fragmentation becomes unhealthy.
The real fix is usually improving ingestion and merge behavior itself.
Over-Partitioning Can Quietly Make This Worse
Another thing that often accelerates part explosion is overly granular partitioning.
For example:
PARTITION BY toStartOfHour(timestamp)
instead of something broader like:
PARTITION BY toYYYYMM(timestamp)
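Now even small inserts may create parts across many partitions simultaneously.
Which means:
a single insert can end up creating multiple fragmented parts underneath.
And because ClickHouse only merges parts within the same partition, those fragments can never be combined with each other.
So over time, merge pressure increases much faster than expected.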
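A small sketch makes the difference between an hourly and a monthly partition key concrete. It only counts how many distinct partitions one batch would touch; the batch itself is invented (a late-arriving backfill spread over three days), and the two key functions are stand-ins for the partition expressions, not ClickHouse code.

```python
from datetime import datetime, timedelta


def partitions_touched(timestamps, partition_key):
    """Count the distinct partitions a batch of rows would land in."""
    return len({partition_key(ts) for ts in timestamps})


def hourly_key(ts):
    # Stand-in for an hourly partition expression.
    return ts.strftime("%Y%m%d%H")


def monthly_key(ts):
    # Stand-in for a monthly partition expression.
    return ts.strftime("%Y%m")


# Hypothetical batch: one event every 10 minutes across three days.
start = datetime(2024, 5, 10, 0, 0)
batch = [start + timedelta(minutes=10 * i) for i in range(3 * 24 * 6)]

print("rows in batch:     ", len(batch))
print("hourly partitions: ", partitions_touched(batch, hourly_key))
print("monthly partitions:", partitions_touched(batch, monthly_key))
```

With the hourly key, this single 432-row insert is split across 72 partitions, averaging six rows per part. With the monthly key, it lands in one.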
ClickHouse Also Has Ways to Help
Modern ClickHouse versions also support asynchronous inserts (enabled with the async_insert setting) to help reduce excessive tiny-part creation.
Instead of immediately flushing every small insert into separate parts, ClickHouse can buffer inserts internally before writing larger parts to disk.
This helps reduce fragmentation and merge pressure in workloads that naturally produce smaller inserts.
But async inserts are not a replacement for healthy ingestion patterns themselves.
Stable batching still matters a lot.
Why Batch Size Matters So Much
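ClickHouse generally performs much better with:
- larger batches
- fewer, less frequent inserts
- merges that can keep up with ingestion
As a rough yardstick, the ClickHouse documentation recommends batches of at least 1,000 rows (ideally 10,000 to 100,000) and around one insert query per second.
The reason is simple. Fewer parts means: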
- fewer merges
- lower metadata overhead
- better compression
- more efficient scans
This is one of the reasons ClickHouse ingestion patterns often look very different from traditional OLTP systems.
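On the client side, batching can be as simple as a buffer that flushes when it gets big enough or old enough. The class below is a minimal sketch, not a production client: InsertBatcher, its parameters, and the flush callback are all hypothetical names, and the callback stands in for whatever actually sends one INSERT to ClickHouse.

```python
import time


class InsertBatcher:
    """Collects rows and hands them to `flush` as one batch.

    A batch is flushed when it reaches `max_rows`, or when its oldest
    row is at least `max_age_seconds` old. The age check only runs
    inside add(), so a real implementation would also need a timer
    to flush a buffer that has gone quiet.
    """

    def __init__(self, flush, max_rows=10_000, max_age_seconds=1.0,
                 clock=time.monotonic):
        self._flush = flush          # hypothetical: sends one INSERT
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock
        self._rows = []
        self._oldest = None

    def add(self, row):
        if not self._rows:
            self._oldest = self._clock()
        self._rows.append(row)
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Send whatever is buffered; safe to call when empty."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._oldest = None
            self._flush(batch)


# Usage: 2,500 single-row events become three inserts instead of 2,500.
sent = []
batcher = InsertBatcher(flush=sent.append, max_rows=1000, max_age_seconds=60)
for i in range(2500):
    batcher.add({"id": i})
batcher.flush()  # drain the remainder on shutdown

print([len(batch) for batch in sent])
```

The size limit protects against oversized batches during bursts, while the age limit bounds how long a row can sit in memory during quiet periods. Both thresholds here are placeholders to tune against the actual workload.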
Too Many Parts Also Affects Startup and Recovery
Another thing people often discover late:
Large numbers of parts also affect:
- startup time after a restart
- replication recovery
- metadata loading
Because ClickHouse now has to:
- scan part metadata
- validate parts
- rebuild internal state
before the server becomes fully operational again.
So the issue is not just query performance.
It becomes an overall operational stability problem.
The Important Lesson
One thing I’ve noticed with ClickHouse is that many performance problems are actually merge-management problems underneath.
And too many parts is one of the clearest examples of that.
Because the issue usually is not:
“ClickHouse cannot handle large data.”
The issue is more often:
fragmentation and merge pressure slowly became unhealthy.
That is a very different operational problem.
Final Thought
ClickHouse is extremely good at handling massive analytical workloads.
But it performs best when the storage engine is allowed to merge parts efficiently.
And sometimes the biggest performance problem is not the query itself.
It is the thousands of tiny fragmented parts quietly building underneath the system over time.