Introduction
Modern analytical workloads often involve working with multi-valued and semi-structured data. Traditional relational database designs typically represent these relationships using separate tables connected through joins. While this approach works well, it can increase schema complexity and slow down analytical queries as datasets grow.
ClickHouse® provides a powerful Array data type that allows multiple values to be stored within a single column. Combined with a rich collection of built-in array functions, arrays make it possible to efficiently store, query, and transform complex datasets while maintaining the high-performance analytics that ClickHouse® is known for.
In this article, you'll learn how to create tables with array columns, insert and query array data, explore commonly used array functions, and understand real-world scenarios where arrays simplify data modeling and improve performance.
Why Use Arrays in ClickHouse®?
Arrays allow you to store multiple related values inside a single column instead of spreading them across multiple rows or tables.
This approach offers several advantages:
- Reduces the need for joins
- Simplifies database schema design
- Improves query performance
- Efficiently stores lists, tags, events, and user activities
- Supports a wide range of optimized built-in array functions
Arrays are widely used in applications such as:
- Website analytics
- User behavior tracking
- IoT monitoring
- Recommendation systems
- Product catalogs
- Event logging
Understanding the Array Data Type
An Array in ClickHouse® is a collection of values of the same data type stored together in a single column.
The general syntax is:
Array(DataType)
Examples include:
Array(String)
Array(Int32)
Array(Float64)
Arrays are commonly used for storing:
- Product lists
- User interests
- Website page visits
- Sensor readings
- Event sequences
- Tags and categories
Instead of storing product names across multiple rows, they can be stored together in a single array:
['Laptop', 'Mouse', 'Keyboard']
This keeps related information together and often makes analytical queries much simpler.
Creating a Table with Array Columns
Let's create a table that stores each user's purchase history.
CREATE TABLE user_purchases
(
user_id UInt32,
purchased_items Array(String)
)
ENGINE = MergeTree
ORDER BY user_id;
Table Structure
| Column | Description |
|---|---|
| user_id | Unique user identifier |
| purchased_items | Array containing purchased products |
This design works well because all purchased items naturally belong to the same user.
Inserting Array Data
Insert a single record:
INSERT INTO user_purchases VALUES
(
1,
['Laptop', 'Mouse', 'Keyboard']
);
Insert multiple records:
INSERT INTO user_purchases VALUES
(2, ['Monitor', 'Webcam']),
(3, ['SSD', 'RAM', 'CPU']);
Querying Array Data
Retrieve all records:
SELECT *
FROM user_purchases;
Example output:
| user_id | purchased_items |
|---|---|
| 1 | ['Laptop','Mouse','Keyboard'] |
| 2 | ['Monitor','Webcam'] |
| 3 | ['SSD','RAM','CPU'] |
Accessing Individual Array Elements
ClickHouse® arrays use 1-based indexing, meaning the first element starts at index 1.
Retrieve the first purchased item:
SELECT
purchased_items[1] AS first_item
FROM user_purchases;
Output:
Laptop
You can access any position using the corresponding index.
Common Array Functions
ClickHouse® includes many optimized functions for working with arrays.
1. arrayJoin()
The arrayJoin() function expands every array element into a separate row.
Example:
SELECT
user_id,
arrayJoin(purchased_items) AS item
FROM user_purchases;
Output:
| user_id | item |
|---|---|
| 1 | Laptop |
| 1 | Mouse |
| 1 | Keyboard |
Common Use Cases
- Product analytics
- Event tracking
- Aggregations
- Reporting
- Visualization
2. arrayMap()
arrayMap() applies a transformation to every element in an array.
Example:
SELECT
arrayMap(x -> upperUTF8(x), purchased_items)
FROM user_purchases;
Output:
['LAPTOP','MOUSE','KEYBOARD']
Common Use Cases
- Data transformation
- String formatting
- Mathematical operations
- Data standardization
3. arrayFilter()
arrayFilter() returns only the elements that satisfy a condition.
Example:
SELECT
arrayFilter(x -> x LIKE '%o%', purchased_items)
FROM user_purchases;
Output:
['Laptop','Mouse']
Common Use Cases
- Product filtering
- Event selection
- Preference analysis
- Data cleansing
4. arrayExists()
arrayExists() checks whether at least one element satisfies a condition.
Example:
SELECT
arrayExists(x -> x = 'Mouse', purchased_items)
FROM user_purchases;
Output:
1
Where:
- 1 = True
- 0 = False
Common Use Cases
- Membership testing
- Product existence checks
- User activity validation
5. arrayDistinct()
arrayDistinct() removes duplicate values.
Example:
SELECT
arrayDistinct(
['Laptop','Mouse','Laptop','Keyboard']
);
Output:
['Laptop','Mouse','Keyboard']
Common Use Cases
- Removing duplicates
- Data cleaning
- Extracting unique values
6. arrayCount()
arrayCount() counts elements matching a condition.
Example:
SELECT
arrayCount(x -> x > 10, [5,15,20,8]);
Output:
2
Only 15 and 20 satisfy the condition.
Common Use Cases
- Threshold monitoring
- User activity analysis
- Counting qualifying values
- Analytical calculations
Working with Related Arrays
Arrays become even more useful when multiple arrays represent related information.
Consider website analytics:
CREATE TABLE website_sessions
(
session_id UInt64,
pages Array(String),
durations Array(UInt32)
)
ENGINE = MergeTree
ORDER BY session_id;
Insert sample data:
INSERT INTO website_sessions VALUES
(
1,
['/home','/products','/checkout'],
[10,25,15]
);
This represents:
| Page | Duration (Seconds) |
|---|---|
| /home | 10 |
| /products | 25 |
| /checkout | 15 |
Keeping these arrays together preserves both sequence and context, making session analysis much easier.
Common applications include:
- Website navigation tracking
- User journey analysis
- IoT sensor monitoring
- Event stream processing
- Application monitoring
Performance Considerations
To get the best performance when using arrays:
- Keep arrays reasonably sized.
- Avoid storing thousands of elements in a single row unless necessary.
- Use optimized functions such as
arrayMap(),arrayFilter(), andarrayExists(). - Minimize the use of
arrayJoin()since it expands rows and can significantly increase query processing. - Store related values together when they are frequently queried.
- Benchmark queries on production-like datasets before deployment.
Following these recommendations helps maintain ClickHouse®'s excellent analytical performance.
Real-World Use Cases
E-Commerce Analytics
Store purchased products in a transaction.
['Laptop','Mouse','Keyboard']
Benefits:
- Simplified schema
- Fewer joins
- Faster analysis
User Activity Tracking
Track user actions during a session.
['Login','Search','AddToCart','Checkout']
Useful for:
- Customer journey analysis
- Behavioral analytics
- Funnel analysis
Website Navigation Analysis
Store page visit sequences.
['Home','Products','Pricing','Contact']
Useful for:
- Clickstream analysis
- Conversion optimization
- Navigation insights
IoT Sensor Monitoring
Store multiple sensor readings.
[23.1, 23.4, 22.9, 24.0]
Useful for:
- Time-series analytics
- Monitoring systems
- Device telemetry
Recommendation Systems
Store user interests.
['Technology','Programming','AI']
Useful for:
- Personalized recommendations
- Content suggestions
- Product recommendations
Best Practices
When working with arrays:
- Use arrays only for naturally grouped data.
- Avoid extremely large arrays.
- Prefer built-in array functions over custom processing.
- Avoid excessive nesting of arrays.
- Monitor query performance using ClickHouse® system tables and query logs.
-
Document array structures, including:
- Element ordering
- Data types
- Business meaning
- Expected size limits
Good documentation helps maintain consistency as applications evolve.
Conclusion
Arrays are one of the most powerful data types available in ClickHouse®. They provide an efficient way to store and analyze multi-valued data without relying heavily on additional tables or complex joins.
Functions such as arrayJoin(), arrayMap(), arrayFilter(), arrayExists(), arrayDistinct(), and arrayCount() make it easy to manipulate and analyze array data while taking advantage of ClickHouse®'s optimized execution engine.
Whether you're building analytics for e-commerce platforms, monitoring IoT devices, tracking website sessions, or powering recommendation systems, arrays help create cleaner schemas, simplify queries, and deliver fast analytical performance.
Understanding when and how to use arrays effectively will allow you to model complex datasets more naturally while getting the most out of ClickHouse®'s high-performance architecture.
Top comments (0)