One thing that confused me when I first started learning ClickHouse was the word FINAL.
Because eventually you'll come across both:
SELECT *
FROM events FINAL;
and:
OPTIMIZE TABLE events FINAL;
At first glance, they sound like they should do roughly the same thing.
After all, both contain the word FINAL.
But they actually solve two completely different problems.
One affects query results.
The other affects how data is physically stored.
Understanding this distinction can save a lot of confusion when working with MergeTree tables.
Why This Confusion Happens
Most people encounter FINAL while working with engines like:
- ReplacingMergeTree
- SummingMergeTree
- AggregatingMergeTree
Sooner or later they notice something like:
SELECT *
FROM users;
returns duplicate versions of rows.
Then they discover:
SELECT *
FROM users FINAL;
and suddenly the results look correct.
Naturally, many people assume:
FINAL merges the table.
But that's not exactly what is happening.
What SELECT FINAL Actually Does
When you run:
SELECT *
FROM users FINAL;
ClickHouse applies merge logic during query execution.
Think of it as:
"Show me what the table would look like if all relevant merges had already happened."
The important part:
It only affects the query result.
After the query finishes:
- parts remain unchanged
- storage remains unchanged
- nothing is rewritten on disk
The merge logic happens temporarily while the query is running.
Once the query completes, the table is exactly as it was before.
What OPTIMIZE FINAL Actually Does
Now let's look at:
OPTIMIZE TABLE users FINAL;
This is a completely different operation.
Instead of modifying query results, ClickHouse physically merges parts on disk.
The operation:
- rewrites data
- merges eligible parts
- removes obsolete versions
- creates larger merged parts
Unlike SELECT FINAL, the effects remain after the command completes.
This is a storage operation, not a query operation.
The Simplest Way to Remember It
Whenever I think about these commands, I use a very simple mental model:
| Command | Purpose |
|---|---|
| SELECT FINAL | Clean the result |
| OPTIMIZE FINAL | Clean the storage |
That's really the core difference.
One affects what you see.
The other affects how the data is stored.
Does OPTIMIZE FINAL Create One Giant Part?
This is another common misconception.
Suppose your table is partitioned like this:
PARTITION BY toYYYYMM(event_date)
and contains:
2025-01
2025-02
2025-03
Many people assume:
OPTIMIZE TABLE events FINAL;
will merge the entire table into one huge part.
It won't.
Merge operations do not cross partition boundaries.
What you are more likely to end up with is:
2025-01 -> one large part
2025-02 -> one large part
2025-03 -> one large part
Each partition is optimized independently.
This distinction becomes important when working with large datasets.
Should You Use SELECT FINAL Everywhere?
Not really.
FINAL is incredibly useful when correctness matters.
For example:
SELECT *
FROM users FINAL;
may be exactly what you need when querying a ReplacingMergeTree table and you want the latest state immediately.
But it still introduces additional work during query execution.
So while modern ClickHouse versions have significantly improved FINAL performance, it shouldn't automatically become your default query pattern.
Use it when you need the merge logic.
Not because it's available.
Should You Run OPTIMIZE FINAL Regularly?
Also no.
This is another mistake people sometimes make.
OPTIMIZE FINAL is a heavy operation.
It forces merges that ClickHouse would normally schedule on its own.
In many cases, background merges already do a good job of maintaining healthy storage.
Running:
OPTIMIZE TABLE events FINAL;
every time you insert data is usually unnecessary.
Think of it as an operational tool.
Not a routine query optimization technique.
When Would You Use Each?
SELECT FINAL
Useful when:
- querying ReplacingMergeTree tables
- validating latest state
- merge results are needed immediately
OPTIMIZE FINAL
Useful when:
- forcing merges intentionally
- maintenance operations
- testing storage behavior
- special operational situations
Both have valid use cases.
They simply solve different problems.
Final Thoughts
The word FINAL appears in both commands, which makes them easy to confuse.
But once you understand the difference, many ClickHouse behaviors start making a lot more sense.
SELECT FINAL does not physically merge your table.
It only applies merge logic while reading data.
OPTIMIZE FINAL actually rewrites and merges parts on disk.
Or put another way:
SELECT FINAL cleans what you see.
OPTIMIZE FINAL cleans how the data is stored.
And that's a distinction every ClickHouse engineer should understand.
Top comments (0)