Mohamed Hussain S

Posted on May 27

Why ClickHouse Loves Append-Heavy Workloads

#clickhouse #olap #databasearchitecture #dataengineering

One thing that makes ClickHouse feel very different from traditional OLTP databases is how much it prefers append-heavy workloads.

And once you understand why, many ClickHouse behaviors suddenly start making sense:

immutable parts
background merges
ingestion batching
merge pressure
even why FINAL exists

At first, this can feel strange if you are coming from databases like PostgreSQL or MySQL where updates and row modifications are extremely normal.

But analytical databases think very differently internally.

Traditional OLTP Systems Think in Terms of Updates

In most transactional databases, modifying rows constantly is completely normal.

For example:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;

or:

UPDATE users
SET last_login = now()
WHERE user_id = 42;

These systems are heavily optimized for:

transactional correctness
row-level updates
operational consistency
frequent modifications

Because that is exactly what OLTP workloads need.

And honestly, PostgreSQL is incredibly good at this.

ClickHouse Thinks Very Differently

ClickHouse is not primarily designed around transactional row updates.

It is designed around analytical workloads:

metrics
logs
observability
event streams
historical analytics
large aggregations

And these workloads are naturally append-heavy.

For example:

INSERT INTO events VALUES (...);

or:

INSERT INTO metrics VALUES (...);

New events continuously arrive.

Old data is rarely modified frequently.

That changes the entire storage philosophy underneath.

ClickHouse Stores Data as Immutable Parts

This is one of the most important concepts to understand.

In MergeTree engines, ClickHouse stores inserts as immutable parts on disk.

Meaning:

inserts create new parts instead of constantly rewriting existing rows directly.

And honestly, this is one of the biggest reasons ClickHouse scales analytical ingestion so well.

Because append-heavy writes are operationally much cheaper than constantly rewriting data in place.

Why Immutable Storage Works So Well

Immutable storage gives ClickHouse several advantages:

efficient sequential writes
better compression
reduced locking pressure
faster analytical scans
simpler background merging

Instead of constantly modifying rows directly, ClickHouse can:

append data quickly
merge parts later
optimize storage asynchronously

This model fits analytical systems extremely well.

Especially when ingesting:

logs
metrics
telemetry
clickstream data
observability events

at very large scale.

Columnar Storage Makes This Even More Powerful

Another reason append-heavy storage works so well in ClickHouse is because data is stored by column instead of by row.

This matters a lot for analytical workloads.

Because queries often need only a few columns from massive datasets.

For example:

SELECT avg(response_time_ms)
FROM metrics;

does not need to read:

user_agent
request_headers
payload columns

at all.

And because parts are immutable, ClickHouse can compress these columns extremely efficiently using specialized compression algorithms.

This is one of the reasons analytical scans in ClickHouse can remain surprisingly fast even at very large scale.

This Is Why Background Merges Exist

One thing that confused me initially was:

why ClickHouse relies so heavily on merges.

But once you understand immutable parts, merges make perfect sense.

Because inserts continuously create smaller parts.

And background merges later:

combine them
reduce fragmentation
improve compression
optimize query performance

This is also why:

tiny inserts become dangerous
too many parts create pressure
unhealthy fragmentation slows systems down

Many ClickHouse operational behaviors trace back to this append-heavy storage philosophy underneath.

Why Updates Feel Different in ClickHouse

This does not mean ClickHouse cannot handle updates.

It absolutely can.

But updates behave differently because the storage engine is optimized differently.

In many cases, updates are internally handled through:

mutations
part rewrites
asynchronous merge operations

instead of lightweight in-place row modifications like traditional OLTP systems.

And this is why large-scale frequent updates can feel operationally heavier in ClickHouse.

Because the system is optimizing for analytical scale first.

Not transactional mutation-heavy workloads.

Many Systems Handle Updates as New Inserts Instead

One thing I found interesting is that many ClickHouse workloads avoid frequent in-place updates entirely.

Instead, systems often:

insert newer versions of rows
append updated events
track timestamps or versions

and later use things like:

ReplacingMergeTree
argMax()
merge logic

to retrieve the latest state.

For example:

SELECT
    user_id,
    argMax(status, updated_at)
FROM user_status
GROUP BY user_id;

This fits naturally into ClickHouse’s append-heavy design philosophy.

Instead of constantly rewriting rows directly, systems continuously append newer versions while merges and analytical queries reconcile state later.

Why Event Streams Fit ClickHouse So Naturally

This is honestly where ClickHouse feels extremely powerful.

Modern systems continuously generate:

logs
metrics
traces
user events
telemetry streams

And these workloads naturally behave like:

append-heavy event streams.

New records continuously arrive.

Historical records mostly remain unchanged.

That is exactly the kind of workload ClickHouse loves.

Which is why architectures like:

Applications
      ↓
Kafka / Streaming
      ↓
ClickHouse

feel so natural operationally.

The storage model aligns perfectly with the workload behavior.

This Also Explains Why FINAL Exists

A lot of ClickHouse behavior becomes easier to understand once you think in terms of append-heavy storage.

For example:
ReplacingMergeTree may temporarily contain multiple versions of rows until merges eventually reconcile them.

That is why queries sometimes use:

SELECT * FROM events FINAL;

to apply merge logic during query execution.

Again:

immutable parts
append-heavy ingestion
asynchronous merging

all connect back together underneath.

The Important Lesson

One thing I’ve started realizing with ClickHouse is that many operational behaviors make much more sense once you stop thinking in terms of:

"traditional transactional databases."

ClickHouse is optimizing for:

analytical ingestion
historical querying
append-heavy workloads
large-scale scans

And once you understand that design philosophy, many of its storage behaviors stop feeling strange.

Final Thought

ClickHouse is not trying to behave like a traditional OLTP database.

It is optimizing for analytical scale.

And append-heavy design is one of the biggest reasons it performs so well for:

observability
metrics
event streams
analytical systems
real-time analytics workloads

DEV Community

Why ClickHouse Loves Append-Heavy Workloads

Traditional OLTP Systems Think in Terms of Updates

ClickHouse Thinks Very Differently

ClickHouse Stores Data as Immutable Parts

Why Immutable Storage Works So Well

Columnar Storage Makes This Even More Powerful

This Is Why Background Merges Exist

Why Updates Feel Different in ClickHouse

Many Systems Handle Updates as New Inserts Instead

Why Event Streams Fit ClickHouse So Naturally

This Also Explains Why FINAL Exists

The Important Lesson

Final Thought

Top comments (0)