DEV Community

Cover image for ๐Ÿง  ClickHouse LEFT JOINs: Why join_use_nulls Matters
Shahab Ranjbary
Shahab Ranjbary

Posted on

๐Ÿง  ClickHouse LEFT JOINs: Why join_use_nulls Matters

๐Ÿง  Understanding join_use_nulls in ClickHouse

ClickHouse is famous for being blazing fast โ€” but sometimes its SQL semantics can surprise you, especially around JOINs.

Hereโ€™s a simple example that shows how the join_use_nulls setting can completely change your results.

๐Ÿงฉ The Setup

Letโ€™s create two tiny tables:

CREATE TABLE test.id_val(`id` UInt32, `val` UInt32) ENGINE = TinyLog;
INSERT INTO test.id_val VALUES (1,11),(2,12),(3,13);

CREATE TABLE test.id_val2(`id` UInt32, `val` UInt32) ENGINE = TinyLog;
INSERT INTO test.id_val2 VALUES (1,21),(1,22),(3,23);
Enter fullscreen mode Exit fullscreen mode

Weโ€™ve got:

id_val: three rows with IDs 1, 2, and 3
id_val2: three rows, but ID 2 is missing and ID 1 appears twice

๐Ÿงฎ The Default JOIN

Letโ€™s run a LEFT JOIN:

SELECT *
FROM test.id_val
LEFT JOIN test.id_val2 USING (id);
Enter fullscreen mode Exit fullscreen mode

Output:

โ”Œโ”€idโ”€โ”ฌโ”€valโ”€โ”ฌโ”€val_1โ”€โ”
โ”‚  1 โ”‚  11 โ”‚   21  โ”‚
โ”‚  1 โ”‚  11 โ”‚   22  โ”‚
โ”‚  2 โ”‚  12 โ”‚    0  โ”‚
โ”‚  3 โ”‚  13 โ”‚   23  โ”‚
โ””โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Enter fullscreen mode Exit fullscreen mode

Notice something interesting?
When thereโ€™s no matching row for id=2, ClickHouse fills it with a default value โ€” in this case, 0.

For numeric columns, the default is 0.
For strings, it would be an empty string ''.

๐Ÿงฉ Enter join_use_nulls

If you want standard SQL behavior, where missing values become NULL, use:

SELECT *
FROM test.id_val
LEFT JOIN test.id_val2 USING (id)
SETTINGS join_use_nulls = 1;
Enter fullscreen mode Exit fullscreen mode

Now the output looks like this:

โ”Œโ”€idโ”€โ”ฌโ”€valโ”€โ”ฌโ”€val_1โ”€โ”
โ”‚  1 โ”‚  11 โ”‚   21  โ”‚
โ”‚  1 โ”‚  11 โ”‚   22  โ”‚
โ”‚  2 โ”‚  12 โ”‚  NULL โ”‚
โ”‚  3 โ”‚  13 โ”‚   23  โ”‚
โ””โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Enter fullscreen mode Exit fullscreen mode

Thatโ€™s closer to what most SQL engines (like PostgreSQL or MySQL) would do.

โš™๏ธ Why This Matters

This setting affects how ClickHouse represents non-matching rows in joins โ€” which can have downstream effects on:

  • Aggregations
    SELECT count(val_1) FROM ...
    will count 0s differently than NULLs.

  • Exports & BI tools
    Some tools treat 0 as a valid number, but NULL as missing data.

  • Data semantics
    A missing match isnโ€™t always โ€œzeroโ€ โ€” it might mean โ€œunknownโ€.

๐Ÿš€ Performance Note

The default behavior (without join_use_nulls) is slightly faster and more memory-efficient because ClickHouse doesnโ€™t have to track NULL bitmaps.

If youโ€™re doing high-volume analytical joins and donโ€™t care about SQL-standard nulls, keep it off.
If you need correctness and consistency with other systems, turn it on.

๐Ÿ” Check Your Settings

You can check the current value of this setting in your ClickHouse session:

SELECT name, value
FROM system.settings
WHERE name LIKE '%join_use_nulls%';
Enter fullscreen mode Exit fullscreen mode

Or set it globally in your configuration file or session:

SET join_use_nulls = 1;
Enter fullscreen mode Exit fullscreen mode

๐Ÿ“š Reference

For more detail on ClickHouseโ€™s Join engine and its settings (including join_use_nulls), see: ClickHouse docs โ€” Join table engine

๐Ÿ—จ๏ธ Discussion

Have you ever been surprised by ClickHouse JOIN behavior?
Do you leave join_use_nulls off for performance, or always turn it on for clarity?

Top comments (0)