DEV Community

Cover image for FINAL in ClickHouse Isn’t as Expensive as It Used to Be
Mohamed Hussain S
Mohamed Hussain S

Posted on

FINAL in ClickHouse Isn’t as Expensive as It Used to Be

For a long time, the advice around FINAL in ClickHouse was pretty straightforward:

Avoid it whenever possible.

And honestly, that advice existed for good reasons.

Older versions of ClickHouse could make FINAL extremely expensive depending on:

  • table size
  • partitioning
  • number of parts
  • merge state
  • query patterns

So people started treating FINAL almost like a red flag.

But modern ClickHouse has changed a lot.

And I think the conversation around FINAL deserves a bit more nuance now.


Why FINAL Existed in the First Place

To understand why FINAL was historically considered expensive, you first need to understand what it actually does.

In engines like:

  • ReplacingMergeTree
  • CollapsingMergeTree
  • VersionedCollapsingMergeTree

ClickHouse does not immediately rewrite rows in place.

Instead:

  • inserts create new parts
  • background merges reconcile rows later
  • deduplication happens asynchronously

That means queries can temporarily see:

  • duplicate versions
  • old versions
  • intermediate states

Example:

SELECT *
FROM users
FINAL;
Enter fullscreen mode Exit fullscreen mode

FINAL forces ClickHouse to apply merge logic during query execution itself.

That means the query may:

  • read more data
  • perform additional deduplication work
  • consume more CPU and memory

This is why older advice strongly discouraged using it everywhere.


The Old FINAL Problem

Historically, FINAL could become painful on large datasets.

Especially when:

  • partitions were large
  • too many parts existed
  • merges lagged behind
  • queries scanned massive ranges

People would add:

FINAL
Enter fullscreen mode Exit fullscreen mode

to "fix" duplicate rows without understanding why duplicates existed in the first place.

The result was often:

  • slower queries
  • higher memory usage
  • unnecessary query overhead

So the community advice became:

Design your schema properly and avoid FINAL whenever possible.

And honestly?

That advice still matters.

But the implementation of FINAL itself has improved significantly over time.


Modern ClickHouse Has Improved FINAL a Lot

Recent ClickHouse versions introduced multiple improvements around FINAL.

Things like:

  • parallel execution
  • partition-aware optimizations
  • improved memory behavior
  • smarter merge execution
  • reduced unnecessary reads

Which means:

FINAL is no longer the monster it used to be.

And this is important because newer ClickHouse guidance has also become more practical about using it when necessary.

Even in some recent discussions and office hours from the ClickHouse ecosystem, using FINAL for latest-state queries is no longer treated as automatically wrong.

That would have sounded controversial a few years ago.


FINAL vs argMax Isn’t Always a Simple Comparison

For a long time, many ClickHouse users avoided FINAL by using patterns like:

SELECT
    id,
    argMax(status, version)
FROM users
GROUP BY id;
Enter fullscreen mode Exit fullscreen mode

And honestly, for older ClickHouse versions and large workloads, that often made sense.

But modern ClickHouse has improved FINAL significantly enough that the tradeoff is no longer as one-sided as it used to be.

In some latest-state query scenarios, using FINAL can now be:

  • simpler
  • easier to maintain
  • and completely reasonable

depending on:

  • table size
  • partitioning
  • query filters
  • merge behavior

The important part is understanding the workload instead of blindly following older rules.


So… Is FINAL Safe to Use Now?

This is where nuance matters.

The answer is not:

  • "FINAL bad"

and also not:

  • "FINAL free now"

The real answer is:

FINAL is much more practical in modern ClickHouse, but workload design still matters.

That distinction is important.


Where FINAL Makes Sense

There are legitimate cases where FINAL is completely reasonable now.

For example:

  • latest-state queries
  • smaller partitions
  • low-latency analytical workloads
  • deduplicated views over mutable datasets
  • operational analytics

Especially when using:

  • proper partitioning
  • controlled part counts
  • optimized schemas

In these cases, modern ClickHouse handles FINAL much better than older versions did.


Where FINAL Can Still Hurt

Even with improvements, FINAL is not magically free.

It can still become expensive when:

  • scanning huge datasets
  • querying many partitions
  • merges are heavily delayed
  • part counts explode
  • schema design is poor

For example:

SELECT *
FROM massive_events_table
FINAL
WHERE timestamp >= now() - INTERVAL 30 DAY;
Enter fullscreen mode Exit fullscreen mode

On very large analytical datasets, this can still force substantial extra work.

So blindly adding FINAL everywhere is still not a great idea.


SELECT ... FINAL vs OPTIMIZE TABLE ... FINAL

One important distinction:

SELECT * FROM users FINAL;
Enter fullscreen mode Exit fullscreen mode

and

OPTIMIZE TABLE users FINAL;
Enter fullscreen mode Exit fullscreen mode

are completely different operations.

SELECT ... FINAL applies merge logic during query execution.

OPTIMIZE TABLE ... FINAL forces a heavy merge operation on storage parts themselves.

The first is a query-time behavior.

The second is a storage-level operation that can become extremely expensive on large datasets.

People often mix these two together when discussing FINAL performance, but they solve very different problems.


The Bigger Lesson Is Understanding Why You Need FINAL

This is honestly the most important part.

A lot of people use FINAL reactively.

They see:

  • duplicate rows
  • outdated versions
  • inconsistent query results

and immediately add:

FINAL
Enter fullscreen mode Exit fullscreen mode

without understanding:

  • merge behavior
  • part lifecycle
  • asynchronous deduplication
  • storage engine behavior

That usually creates larger problems later.

The better approach is:

Understand why the table requires FINAL in the first place.

Because sometimes:

  • the schema can improve
  • partitioning can improve
  • merges can stabilize naturally
  • query design can change

And sometimes:

  • using FINAL is actually perfectly acceptable.

ClickHouse Advice Evolves Too

One thing I find interesting about ClickHouse is how quickly operational advice evolves as the engine improves.

Advice that was absolutely correct for older versions can become incomplete later.

And I think FINAL is one of the best examples of that.

Older guidance:

avoid FINAL aggressively

Modern reality:

understand FINAL properly before deciding whether to avoid it

That is a much more useful mental model now.


Final Thought

I still would not recommend blindly adding FINAL everywhere.

But I also do not think modern ClickHouse users should automatically treat it like a disaster anymore.

The real question is not:

"Is FINAL bad?"

The real question is:

"Why does this query need FINAL, and is that tradeoff acceptable for this workload?"

That mindset leads to much better ClickHouse designs than simply following old rules blindly.


References

ClickHouse Docs - FINAL Modifier

Altinity KB - FINAL Clause Speed

Top comments (0)