For a long time, the advice around FINAL in ClickHouse was pretty straightforward:
Avoid it whenever possible.
And honestly, that advice existed for good reasons.
Older versions of ClickHouse could make FINAL extremely expensive depending on:
- table size
- partitioning
- number of parts
- merge state
- query patterns
So people started treating FINAL almost like a red flag.
But modern ClickHouse has changed a lot.
And I think the conversation around FINAL deserves a bit more nuance now.
Why FINAL Existed in the First Place
To understand why FINAL was historically considered expensive, you first need to understand what it actually does.
In engines like:
ReplacingMergeTreeCollapsingMergeTreeVersionedCollapsingMergeTree
ClickHouse does not immediately rewrite rows in place.
Instead:
- inserts create new parts
- background merges reconcile rows later
- deduplication happens asynchronously
That means queries can temporarily see:
- duplicate versions
- old versions
- intermediate states
Example:
SELECT *
FROM users
FINAL;
FINAL forces ClickHouse to apply merge logic during query execution itself.
That means the query may:
- read more data
- perform additional deduplication work
- consume more CPU and memory
This is why older advice strongly discouraged using it everywhere.
The Old FINAL Problem
Historically, FINAL could become painful on large datasets.
Especially when:
- partitions were large
- too many parts existed
- merges lagged behind
- queries scanned massive ranges
People would add:
FINAL
to "fix" duplicate rows without understanding why duplicates existed in the first place.
The result was often:
- slower queries
- higher memory usage
- unnecessary query overhead
So the community advice became:
Design your schema properly and avoid FINAL whenever possible.
And honestly?
That advice still matters.
But the implementation of FINAL itself has improved significantly over time.
Modern ClickHouse Has Improved FINAL a Lot
Recent ClickHouse versions introduced multiple improvements around FINAL.
Things like:
- parallel execution
- partition-aware optimizations
- improved memory behavior
- smarter merge execution
- reduced unnecessary reads
Which means:
FINALis no longer the monster it used to be.
And this is important because newer ClickHouse guidance has also become more practical about using it when necessary.
Even in some recent discussions and office hours from the ClickHouse ecosystem, using FINAL for latest-state queries is no longer treated as automatically wrong.
That would have sounded controversial a few years ago.
FINAL vs argMax Isn’t Always a Simple Comparison
For a long time, many ClickHouse users avoided FINAL by using patterns like:
SELECT
id,
argMax(status, version)
FROM users
GROUP BY id;
And honestly, for older ClickHouse versions and large workloads, that often made sense.
But modern ClickHouse has improved FINAL significantly enough that the tradeoff is no longer as one-sided as it used to be.
In some latest-state query scenarios, using FINAL can now be:
- simpler
- easier to maintain
- and completely reasonable
depending on:
- table size
- partitioning
- query filters
- merge behavior
The important part is understanding the workload instead of blindly following older rules.
So… Is FINAL Safe to Use Now?
This is where nuance matters.
The answer is not:
- "FINAL bad"
and also not:
- "FINAL free now"
The real answer is:
FINAL is much more practical in modern ClickHouse, but workload design still matters.
That distinction is important.
Where FINAL Makes Sense
There are legitimate cases where FINAL is completely reasonable now.
For example:
- latest-state queries
- smaller partitions
- low-latency analytical workloads
- deduplicated views over mutable datasets
- operational analytics
Especially when using:
- proper partitioning
- controlled part counts
- optimized schemas
In these cases, modern ClickHouse handles FINAL much better than older versions did.
Where FINAL Can Still Hurt
Even with improvements, FINAL is not magically free.
It can still become expensive when:
- scanning huge datasets
- querying many partitions
- merges are heavily delayed
- part counts explode
- schema design is poor
For example:
SELECT *
FROM massive_events_table
FINAL
WHERE timestamp >= now() - INTERVAL 30 DAY;
On very large analytical datasets, this can still force substantial extra work.
So blindly adding FINAL everywhere is still not a great idea.
SELECT ... FINAL vs OPTIMIZE TABLE ... FINAL
One important distinction:
SELECT * FROM users FINAL;
and
OPTIMIZE TABLE users FINAL;
are completely different operations.
SELECT ... FINAL applies merge logic during query execution.
OPTIMIZE TABLE ... FINAL forces a heavy merge operation on storage parts themselves.
The first is a query-time behavior.
The second is a storage-level operation that can become extremely expensive on large datasets.
People often mix these two together when discussing FINAL performance, but they solve very different problems.
The Bigger Lesson Is Understanding Why You Need FINAL
This is honestly the most important part.
A lot of people use FINAL reactively.
They see:
- duplicate rows
- outdated versions
- inconsistent query results
and immediately add:
FINAL
without understanding:
- merge behavior
- part lifecycle
- asynchronous deduplication
- storage engine behavior
That usually creates larger problems later.
The better approach is:
Understand why the table requires FINAL in the first place.
Because sometimes:
- the schema can improve
- partitioning can improve
- merges can stabilize naturally
- query design can change
And sometimes:
- using FINAL is actually perfectly acceptable.
ClickHouse Advice Evolves Too
One thing I find interesting about ClickHouse is how quickly operational advice evolves as the engine improves.
Advice that was absolutely correct for older versions can become incomplete later.
And I think FINAL is one of the best examples of that.
Older guidance:
avoid FINAL aggressively
Modern reality:
understand FINAL properly before deciding whether to avoid it
That is a much more useful mental model now.
Final Thought
I still would not recommend blindly adding FINAL everywhere.
But I also do not think modern ClickHouse users should automatically treat it like a disaster anymore.
The real question is not:
"Is FINAL bad?"
The real question is:
"Why does this query need FINAL, and is that tradeoff acceptable for this workload?"
That mindset leads to much better ClickHouse designs than simply following old rules blindly.
Top comments (0)