Ankit Malik

ClickHouse ReplacingMergeTree - Why Your Row Count Does Not Match

Why ClickHouse Silently Drops Your Rows

I was running a batch test and noticed something strange. I inserted 3 rows and got back 2 when I ran SELECT count(). No error, no warning. ClickHouse just quietly dropped one.

Turns out this is not a bug. It is the table engine doing exactly what it is designed to do.


The Engine — ReplacingMergeTree

When you create a table in ClickHouse, you pick an engine. The engine decides how data is stored and whether duplicates are kept or removed.

ReplacingMergeTree is the engine that removes duplicates. You define an ORDER BY key and a version column. If two rows share the same ORDER BY key, only one survives — the one with the highest version value.

Here is the table we will use for all the examples below:

CREATE TABLE test_events
(
    client_id   UInt32,
    username    String,
    event_id    String,
    event_name  String,
    created_at  DateTime64(3, 'UTC')
)
ENGINE = ReplacingMergeTree(created_at)
ORDER BY (client_id, username, event_id);

The dedup key is (client_id, username, event_id). The version column is created_at. When two rows have the same key, the one with the latest created_at wins.
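
A side note on that version argument: it is optional. Here is a minimal sketch of the variant without it (the table name test_events_nover is just for illustration):

CREATE TABLE test_events_nover
(
    client_id   UInt32,
    username    String,
    event_id    String,
    event_name  String,
    created_at  DateTime64(3, 'UTC')
)
ENGINE = ReplacingMergeTree  -- no version column
ORDER BY (client_id, username, event_id);

Without a version column, merges keep whichever duplicate comes last in the selection, which is less predictable than an explicit timestamp. The examples below all use the versioned table.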


Behaviour 1 — Duplicate in the Same INSERT

Insert 3 rows where row 1 and row 2 have the same key:

INSERT INTO test_events VALUES
(1, 'alice', 'evt_001', 'login',    '2024-01-01 10:00:00.000'),
(1, 'alice', 'evt_001', 'login',    '2024-01-01 10:00:00.000'),
(1, 'alice', 'evt_002', 'purchase', '2024-01-01 10:05:00.000');

Now check what landed in the table:

SELECT * FROM test_events ORDER BY event_id;

You will see 2 rows. The duplicate evt_001 was dropped immediately. When two rows with the same key arrive in the same INSERT block, ClickHouse resolves them right there.


Verify by checking the data parts:

SELECT name, rows, active
FROM system.parts
WHERE database = 'default'
  AND table = 'test_events'
  AND active = 1;

One active part, 2 rows.



Behaviour 2 — Duplicate Across Two Separate INSERTs

Now insert evt_002 again in a new query:

INSERT INTO test_events VALUES
(1, 'alice', 'evt_002', 'purchase', '2024-01-01 10:05:00.000');

Before any merge runs, all 3 rows are visible in query results, including the duplicate evt_002.
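
You can check with a plain count at this point:

SELECT count() FROM test_events;

It returns 3 until a merge collapses the duplicate.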

Check the parts:

SELECT name, rows, active
FROM system.parts
WHERE database = 'default'
  AND table = 'test_events'
  AND active = 1;


This time you see 2 active parts. The first has 2 rows. The second has 1 row. The duplicate evt_002 is sitting there in its own part.

This is the important thing to understand. ClickHouse does not deduplicate at insert time across separate inserts. It only deduplicates when it merges data parts together. Merges happen in the background automatically, but you have no control over when.
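
If you are curious whether a merge is in flight, you can peek at system.merges. It is usually empty for a tiny test table like this one, since small merges finish almost instantly:

SELECT database, table, elapsed, progress, num_parts
FROM system.merges
WHERE table = 'test_events';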

To force it right now:

OPTIMIZE TABLE test_events FINAL;

Check parts again:

SELECT name, rows, active
FROM system.parts
WHERE database = 'default'
  AND table = 'test_events'
  AND active = 1;

Back to 1 active part with 2 rows. The duplicate is gone.



Behaviour 3 — Updating a Row the ClickHouse Way

You cannot UPDATE a row in ClickHouse the way you would in Postgres. (ClickHouse does have ALTER TABLE ... UPDATE mutations, but they rewrite whole data parts asynchronously and are meant for occasional fixes, not a write path.) Instead you insert a new version of the row with a later timestamp and let ReplacingMergeTree handle it.

Insert evt_001 again with corrected data and a newer created_at:

INSERT INTO test_events VALUES
(1, 'alice', 'evt_001', 'login_success', '2024-01-01 12:00:00.000');

Force merge and check:

OPTIMIZE TABLE test_events FINAL;

SELECT * FROM test_events ORDER BY event_id;

| event_id | event_name    | created_at          |
| -------- | ------------- | ------------------- |
| evt_001  | login_success | 2024-01-01 12:00:00 |
| evt_002  | purchase      | 2024-01-01 10:05:00 |

The original login row is gone. login_success won because its created_at is higher. This is how updates work in ClickHouse.



Reading Clean Data Without Running OPTIMIZE

If you do not want to wait for a background merge, add FINAL to your SELECT. It applies the dedup logic at read time.

SELECT * FROM test_events FINAL ORDER BY event_id;

This always returns the correct deduplicated result, whether or not a merge has happened yet. Use it anywhere accuracy matters, but be careful with it on large production tables: FINAL does the merge work at read time, which can get expensive.
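
If FINAL is too heavy for your hot path, a common alternative is to collapse versions yourself at query time with GROUP BY and argMax. A sketch against our table:

SELECT
    client_id,
    username,
    event_id,
    argMax(event_name, created_at) AS event_name,
    max(created_at)                AS created_at
FROM test_events
GROUP BY client_id, username, event_id
ORDER BY event_id;

This picks the event_name belonging to the latest created_at for each key, the same result FINAL would give.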


Bonus — The Second Dedup Layer

While testing this, you will notice something else. Run the same INSERT query twice:

INSERT INTO test_events VALUES
(1, 'alice', 'evt_003', 'checkout', '2024-01-01 13:00:00.000');

SELECT count() FROM test_events;

The first run adds 1 row. The second run changes nothing. No error, just silence.

This is not ReplacingMergeTree. This is INSERT block deduplication, a completely separate layer. ClickHouse tracks a checksum for every INSERT block it receives; if it sees the exact same block again, it skips the write entirely. (On Replicated tables this is on by default; on plain MergeTree tables it depends on the non_replicated_deduplication_window setting.)

Change even one value and it goes through again:

INSERT INTO test_events VALUES
(1, 'alice', 'evt_003', 'checkout', '2024-01-01 13:00:00.001');

Different checksum, different block — new row is added.
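
If you ever want a repeated block written anyway, the check can be turned off per query with the insert_deduplicate setting:

INSERT INTO test_events
SETTINGS insert_deduplicate = 0
VALUES (1, 'alice', 'evt_003', 'checkout', '2024-01-01 13:00:00.000');

Note that ReplacingMergeTree will still collapse this row with the existing evt_003 at the next merge, since they share the same key.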

So there are two separate reasons your row count might not be what you expect:

  • INSERT block dedup — exact same INSERT repeated, ClickHouse ignores it
  • ReplacingMergeTree — same logical key across different inserts, ClickHouse keeps only the latest version after a merge
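
A quick way to tell which layer you are looking at on any table is to compare the raw and deduplicated counts:

SELECT
    (SELECT count() FROM test_events)       AS raw_rows,
    (SELECT count() FROM test_events FINAL) AS deduped_rows;

If raw_rows is higher, ReplacingMergeTree duplicates are waiting for a merge. If an insert changed neither number, block dedup swallowed it.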

Summary

| Situation | What happens |
| --------- | ------------ |
| Duplicate in the same INSERT | Dropped immediately |
| Same key inserted again separately | Both parts exist until a merge |
| Exact same INSERT run again | Blocked by INSERT block dedup |
| Background merge runs | Latest version survives, duplicates dropped |
| SELECT ... FINAL | Always reads deduplicated data |
| OPTIMIZE TABLE FINAL | Forces a merge right now |

Cleanup

DROP TABLE test_events;
