DEV Community

Cover image for arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)
Mohamed Hussain S
Mohamed Hussain S

Posted on

arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

When working with arrays in ClickHouse, arrayJoin feels straightforward.

Until your query suddenly returns far more rows than expected.


The Use Case

Let’s say you have a table like this:

CREATE TABLE events (
    user_id UInt32,
    actions Array(String)
) ENGINE = MergeTree
ORDER BY user_id;
Enter fullscreen mode Exit fullscreen mode

Example row:

user_id: 1
actions: ['click', 'scroll', 'purchase']
Enter fullscreen mode Exit fullscreen mode

Now you want each action as a separate row.


The Tool: arrayJoin

SELECT user_id, arrayJoin(actions) AS action
FROM events;
Enter fullscreen mode Exit fullscreen mode

Output:

1   click
1   scroll
1   purchase
Enter fullscreen mode Exit fullscreen mode

So far, everything looks correct.


Where Things Go Wrong

Now let’s say you write:

SELECT user_id,
       arrayJoin(actions) AS action,
       arrayJoin(actions) AS action2
FROM events;
Enter fullscreen mode Exit fullscreen mode

You might expect:

  • 3 rows

But you actually get:

  • 9 rows

Why This Happens

arrayJoin doesn’t just flatten arrays.

It expands rows.

Each element in the array creates a new row.

So when you use it multiple times:

  • First arrayJoin → expands rows
  • Second arrayJoin → expands again

Result:

3 elements → 3 × 3 = 9 rows

This is effectively a cartesian multiplication of rows.


The Hidden Impact

This becomes a real problem when:

  • Arrays are large
  • Multiple arrayJoins are used
  • You don’t expect row multiplication

Result:

  • Incorrect output
  • Sudden increase in row count
  • Slower queries

The Better Approach

1. Use a single arrayJoin when possible

SELECT user_id,
       arrayJoin(actions) AS action
FROM events;
Enter fullscreen mode Exit fullscreen mode

2. Use ARRAY JOIN syntax (cleaner and explicit)

SELECT user_id, action
FROM events
ARRAY JOIN actions AS action;
Enter fullscreen mode Exit fullscreen mode

3. Use arrayZip to avoid unintended multiplication

If you’re working with multiple arrays:

SELECT user_id,
       arrayJoin(arrayZip(actions, actions)) AS zipped
FROM events;
Enter fullscreen mode Exit fullscreen mode

This ensures elements are paired instead of multiplied.


Why This Matters

arrayJoin is powerful-but easy to misuse.

If used without understanding:

  • Row count can explode
  • Queries become expensive
  • Results can be misleading

Real-World Use Cases

  • Event tracking pipelines
  • Flattening nested JSON
  • Working with semi-structured logs
  • Exploding arrays into rows for analysis

One Important Gotcha

Every arrayJoin multiplies rows.

If your result size looks unexpectedly large, this is one of the first things to check.


Final Thoughts

arrayJoin is one of the most useful tools in ClickHouse.

But its behavior is not always intuitive.

In many cases, the issue is not the data itself-but how the query expands it.

Understanding this early can save a lot of debugging time.

Top comments (0)