DEV Community

Cover image for Why ClickHouse Loves Append-Heavy Workloads
Mohamed Hussain S
Mohamed Hussain S

Posted on

Why ClickHouse Loves Append-Heavy Workloads

One thing that makes ClickHouse feel very different from traditional OLTP databases is how much it prefers append-heavy workloads.

And once you understand why, many ClickHouse behaviors suddenly start making sense:

  • immutable parts
  • background merges
  • ingestion batching
  • merge pressure
  • even why FINAL exists

At first, this can feel strange if you are coming from databases like PostgreSQL or MySQL where updates and row modifications are extremely normal.

But analytical databases think very differently internally.


Traditional OLTP Systems Think in Terms of Updates

In most transactional databases, modifying rows constantly is completely normal.

For example:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;
Enter fullscreen mode Exit fullscreen mode

or:

UPDATE users
SET last_login = now()
WHERE user_id = 42;
Enter fullscreen mode Exit fullscreen mode

These systems are heavily optimized for:

  • transactional correctness
  • row-level updates
  • operational consistency
  • frequent modifications

Because that is exactly what OLTP workloads need.

And honestly, PostgreSQL is incredibly good at this.


ClickHouse Thinks Very Differently

ClickHouse is not primarily designed around transactional row updates.

It is designed around analytical workloads:

  • metrics
  • logs
  • observability
  • event streams
  • historical analytics
  • large aggregations

And these workloads are naturally append-heavy.

For example:

INSERT INTO events VALUES (...);
Enter fullscreen mode Exit fullscreen mode

or:

INSERT INTO metrics VALUES (...);
Enter fullscreen mode Exit fullscreen mode

New events continuously arrive.

Old data is rarely modified frequently.

That changes the entire storage philosophy underneath.


ClickHouse Stores Data as Immutable Parts

This is one of the most important concepts to understand.

In MergeTree engines, ClickHouse stores inserts as immutable parts on disk.

Meaning:

inserts create new parts instead of constantly rewriting existing rows directly.

And honestly, this is one of the biggest reasons ClickHouse scales analytical ingestion so well.

Because append-heavy writes are operationally much cheaper than constantly rewriting data in place.


Why Immutable Storage Works So Well

Immutable storage gives ClickHouse several advantages:

  • efficient sequential writes
  • better compression
  • reduced locking pressure
  • faster analytical scans
  • simpler background merging

Instead of constantly modifying rows directly, ClickHouse can:

  • append data quickly
  • merge parts later
  • optimize storage asynchronously

This model fits analytical systems extremely well.

Especially when ingesting:

  • logs
  • metrics
  • telemetry
  • clickstream data
  • observability events

at very large scale.


Columnar Storage Makes This Even More Powerful

Another reason append-heavy storage works so well in ClickHouse is because data is stored by column instead of by row.

This matters a lot for analytical workloads.

Because queries often need only a few columns from massive datasets.

For example:

SELECT avg(response_time_ms)
FROM metrics;
Enter fullscreen mode Exit fullscreen mode

does not need to read:

  • user_agent
  • request_headers
  • payload columns

at all.

And because parts are immutable, ClickHouse can compress these columns extremely efficiently using specialized compression algorithms.

This is one of the reasons analytical scans in ClickHouse can remain surprisingly fast even at very large scale.


This Is Why Background Merges Exist

One thing that confused me initially was:

why ClickHouse relies so heavily on merges.

But once you understand immutable parts, merges make perfect sense.

Because inserts continuously create smaller parts.

And background merges later:

  • combine them
  • reduce fragmentation
  • improve compression
  • optimize query performance

This is also why:

  • tiny inserts become dangerous
  • too many parts create pressure
  • unhealthy fragmentation slows systems down

Many ClickHouse operational behaviors trace back to this append-heavy storage philosophy underneath.


Why Updates Feel Different in ClickHouse

This does not mean ClickHouse cannot handle updates.

It absolutely can.

But updates behave differently because the storage engine is optimized differently.

In many cases, updates are internally handled through:

  • mutations
  • part rewrites
  • asynchronous merge operations

instead of lightweight in-place row modifications like traditional OLTP systems.

And this is why large-scale frequent updates can feel operationally heavier in ClickHouse.

Because the system is optimizing for analytical scale first.

Not transactional mutation-heavy workloads.


Many Systems Handle Updates as New Inserts Instead

One thing I found interesting is that many ClickHouse workloads avoid frequent in-place updates entirely.

Instead, systems often:

  • insert newer versions of rows
  • append updated events
  • track timestamps or versions

and later use things like:

  • ReplacingMergeTree
  • argMax()
  • merge logic

to retrieve the latest state.

For example:

SELECT
    user_id,
    argMax(status, updated_at)
FROM user_status
GROUP BY user_id;
Enter fullscreen mode Exit fullscreen mode

This fits naturally into ClickHouse’s append-heavy design philosophy.

Instead of constantly rewriting rows directly, systems continuously append newer versions while merges and analytical queries reconcile state later.


Why Event Streams Fit ClickHouse So Naturally

This is honestly where ClickHouse feels extremely powerful.

Modern systems continuously generate:

  • logs
  • metrics
  • traces
  • user events
  • telemetry streams

And these workloads naturally behave like:

append-heavy event streams.

New records continuously arrive.

Historical records mostly remain unchanged.

That is exactly the kind of workload ClickHouse loves.

Which is why architectures like:

Applications
      ↓
Kafka / Streaming
      ↓
ClickHouse
Enter fullscreen mode Exit fullscreen mode

feel so natural operationally.

The storage model aligns perfectly with the workload behavior.


This Also Explains Why FINAL Exists

A lot of ClickHouse behavior becomes easier to understand once you think in terms of append-heavy storage.

For example:
ReplacingMergeTree may temporarily contain multiple versions of rows until merges eventually reconcile them.

That is why queries sometimes use:

SELECT * FROM events FINAL;
Enter fullscreen mode Exit fullscreen mode

to apply merge logic during query execution.

Again:

  • immutable parts
  • append-heavy ingestion
  • asynchronous merging

all connect back together underneath.


The Important Lesson

One thing I’ve started realizing with ClickHouse is that many operational behaviors make much more sense once you stop thinking in terms of:

"traditional transactional databases."

ClickHouse is optimizing for:

  • analytical ingestion
  • historical querying
  • append-heavy workloads
  • large-scale scans

And once you understand that design philosophy, many of its storage behaviors stop feeling strange.


Final Thought

ClickHouse is not trying to behave like a traditional OLTP database.

It is optimizing for analytical scale.

And append-heavy design is one of the biggest reasons it performs so well for:

  • observability
  • metrics
  • event streams
  • analytical systems
  • real-time analytics workloads

Top comments (0)