When working with arrays in ClickHouse, arrayJoin feels straightforward.
Until your query suddenly returns far more rows than expected.
The Use Case
Let’s say you have a table like this:
CREATE TABLE events (
user_id UInt32,
actions Array(String)
) ENGINE = MergeTree
ORDER BY user_id;
Example row:
user_id: 1
actions: ['click', 'scroll', 'purchase']
Now you want each action as a separate row.
The Tool: arrayJoin
SELECT user_id, arrayJoin(actions) AS action
FROM events;
Output:
1 click
1 scroll
1 purchase
So far, everything looks correct.
Where Things Go Wrong
Now let’s say you write:
SELECT user_id,
arrayJoin(actions) AS action,
arrayJoin(actions) AS action2
FROM events;
You might expect:
- 3 rows
But you actually get:
- 9 rows
Why This Happens
arrayJoin doesn’t just flatten arrays.
It expands rows.
Each element in the array creates a new row.
So when you use it multiple times:
- First
arrayJoin→ expands rows - Second
arrayJoin→ expands again
Result:
3 elements → 3 × 3 = 9 rows
This is effectively a cartesian multiplication of rows.
The Hidden Impact
This becomes a real problem when:
- Arrays are large
- Multiple
arrayJoins are used - You don’t expect row multiplication
Result:
- Incorrect output
- Sudden increase in row count
- Slower queries
The Better Approach
1. Use a single arrayJoin when possible
SELECT user_id,
arrayJoin(actions) AS action
FROM events;
2. Use ARRAY JOIN syntax (cleaner and explicit)
SELECT user_id, action
FROM events
ARRAY JOIN actions AS action;
3. Use arrayZip to avoid unintended multiplication
If you’re working with multiple arrays:
SELECT user_id,
arrayJoin(arrayZip(actions, actions)) AS zipped
FROM events;
This ensures elements are paired instead of multiplied.
Why This Matters
arrayJoin is powerful-but easy to misuse.
If used without understanding:
- Row count can explode
- Queries become expensive
- Results can be misleading
Real-World Use Cases
- Event tracking pipelines
- Flattening nested JSON
- Working with semi-structured logs
- Exploding arrays into rows for analysis
One Important Gotcha
Every arrayJoin multiplies rows.
If your result size looks unexpectedly large, this is one of the first things to check.
Final Thoughts
arrayJoin is one of the most useful tools in ClickHouse.
But its behavior is not always intuitive.
In many cases, the issue is not the data itself-but how the query expands it.
Understanding this early can save a lot of debugging time.
Top comments (0)