DEV Community

Cover image for Day 30 of 100 Days of ClickHouse® - Using ClickHouse® Arrays for Complex Data Structures
Kanishga Subramani
Kanishga Subramani

Posted on

Day 30 of 100 Days of ClickHouse® - Using ClickHouse® Arrays for Complex Data Structures

Introduction

Modern analytical workloads often involve working with multi-valued and semi-structured data. Traditional relational database designs typically represent these relationships using separate tables connected through joins. While this approach works well, it can increase schema complexity and slow down analytical queries as datasets grow.

ClickHouse® provides a powerful Array data type that allows multiple values to be stored within a single column. Combined with a rich collection of built-in array functions, arrays make it possible to efficiently store, query, and transform complex datasets while maintaining the high-performance analytics that ClickHouse® is known for.

In this article, you'll learn how to create tables with array columns, insert and query array data, explore commonly used array functions, and understand real-world scenarios where arrays simplify data modeling and improve performance.


Why Use Arrays in ClickHouse®?

Arrays allow you to store multiple related values inside a single column instead of spreading them across multiple rows or tables.

This approach offers several advantages:

  • Reduces the need for joins
  • Simplifies database schema design
  • Improves query performance
  • Efficiently stores lists, tags, events, and user activities
  • Supports a wide range of optimized built-in array functions

Arrays are widely used in applications such as:

  • Website analytics
  • User behavior tracking
  • IoT monitoring
  • Recommendation systems
  • Product catalogs
  • Event logging

Understanding the Array Data Type

An Array in ClickHouse® is a collection of values of the same data type stored together in a single column.

The general syntax is:

Array(DataType)
Enter fullscreen mode Exit fullscreen mode

Examples include:

Array(String)

Array(Int32)

Array(Float64)
Enter fullscreen mode Exit fullscreen mode

Arrays are commonly used for storing:

  • Product lists
  • User interests
  • Website page visits
  • Sensor readings
  • Event sequences
  • Tags and categories

Instead of storing product names across multiple rows, they can be stored together in a single array:

['Laptop', 'Mouse', 'Keyboard']
Enter fullscreen mode Exit fullscreen mode

This keeps related information together and often makes analytical queries much simpler.


Creating a Table with Array Columns

Let's create a table that stores each user's purchase history.

CREATE TABLE user_purchases
(
    user_id UInt32,
    purchased_items Array(String)
)
ENGINE = MergeTree
ORDER BY user_id;
Enter fullscreen mode Exit fullscreen mode

Table Structure

Column Description
user_id Unique user identifier
purchased_items Array containing purchased products

This design works well because all purchased items naturally belong to the same user.


Inserting Array Data

Insert a single record:

INSERT INTO user_purchases VALUES
(
    1,
    ['Laptop', 'Mouse', 'Keyboard']
);
Enter fullscreen mode Exit fullscreen mode

Insert multiple records:

INSERT INTO user_purchases VALUES
(2, ['Monitor', 'Webcam']),
(3, ['SSD', 'RAM', 'CPU']);
Enter fullscreen mode Exit fullscreen mode

Querying Array Data

Retrieve all records:

SELECT *
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Example output:

user_id purchased_items
1 ['Laptop','Mouse','Keyboard']
2 ['Monitor','Webcam']
3 ['SSD','RAM','CPU']

Accessing Individual Array Elements

ClickHouse® arrays use 1-based indexing, meaning the first element starts at index 1.

Retrieve the first purchased item:

SELECT
    purchased_items[1] AS first_item
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Output:

Laptop
Enter fullscreen mode Exit fullscreen mode

You can access any position using the corresponding index.


Common Array Functions

ClickHouse® includes many optimized functions for working with arrays.

1. arrayJoin()

The arrayJoin() function expands every array element into a separate row.

Example:

SELECT
    user_id,
    arrayJoin(purchased_items) AS item
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Output:

user_id item
1 Laptop
1 Mouse
1 Keyboard

Common Use Cases

  • Product analytics
  • Event tracking
  • Aggregations
  • Reporting
  • Visualization

2. arrayMap()

arrayMap() applies a transformation to every element in an array.

Example:

SELECT
    arrayMap(x -> upperUTF8(x), purchased_items)
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Output:

['LAPTOP','MOUSE','KEYBOARD']
Enter fullscreen mode Exit fullscreen mode

Common Use Cases

  • Data transformation
  • String formatting
  • Mathematical operations
  • Data standardization

3. arrayFilter()

arrayFilter() returns only the elements that satisfy a condition.

Example:

SELECT
    arrayFilter(x -> x LIKE '%o%', purchased_items)
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Output:

['Laptop','Mouse']
Enter fullscreen mode Exit fullscreen mode

Common Use Cases

  • Product filtering
  • Event selection
  • Preference analysis
  • Data cleansing

4. arrayExists()

arrayExists() checks whether at least one element satisfies a condition.

Example:

SELECT
    arrayExists(x -> x = 'Mouse', purchased_items)
FROM user_purchases;
Enter fullscreen mode Exit fullscreen mode

Output:

1
Enter fullscreen mode Exit fullscreen mode

Where:

  • 1 = True
  • 0 = False

Common Use Cases

  • Membership testing
  • Product existence checks
  • User activity validation

5. arrayDistinct()

arrayDistinct() removes duplicate values.

Example:

SELECT
    arrayDistinct(
        ['Laptop','Mouse','Laptop','Keyboard']
    );
Enter fullscreen mode Exit fullscreen mode

Output:

['Laptop','Mouse','Keyboard']
Enter fullscreen mode Exit fullscreen mode

Common Use Cases

  • Removing duplicates
  • Data cleaning
  • Extracting unique values

6. arrayCount()

arrayCount() counts elements matching a condition.

Example:

SELECT
    arrayCount(x -> x > 10, [5,15,20,8]);
Enter fullscreen mode Exit fullscreen mode

Output:

2
Enter fullscreen mode Exit fullscreen mode

Only 15 and 20 satisfy the condition.

Common Use Cases

  • Threshold monitoring
  • User activity analysis
  • Counting qualifying values
  • Analytical calculations

Working with Related Arrays

Arrays become even more useful when multiple arrays represent related information.

Consider website analytics:

CREATE TABLE website_sessions
(
    session_id UInt64,
    pages Array(String),
    durations Array(UInt32)
)
ENGINE = MergeTree
ORDER BY session_id;
Enter fullscreen mode Exit fullscreen mode

Insert sample data:

INSERT INTO website_sessions VALUES
(
    1,
    ['/home','/products','/checkout'],
    [10,25,15]
);
Enter fullscreen mode Exit fullscreen mode

This represents:

Page Duration (Seconds)
/home 10
/products 25
/checkout 15

Keeping these arrays together preserves both sequence and context, making session analysis much easier.

Common applications include:

  • Website navigation tracking
  • User journey analysis
  • IoT sensor monitoring
  • Event stream processing
  • Application monitoring

Performance Considerations

To get the best performance when using arrays:

  • Keep arrays reasonably sized.
  • Avoid storing thousands of elements in a single row unless necessary.
  • Use optimized functions such as arrayMap(), arrayFilter(), and arrayExists().
  • Minimize the use of arrayJoin() since it expands rows and can significantly increase query processing.
  • Store related values together when they are frequently queried.
  • Benchmark queries on production-like datasets before deployment.

Following these recommendations helps maintain ClickHouse®'s excellent analytical performance.


Real-World Use Cases

E-Commerce Analytics

Store purchased products in a transaction.

['Laptop','Mouse','Keyboard']
Enter fullscreen mode Exit fullscreen mode

Benefits:

  • Simplified schema
  • Fewer joins
  • Faster analysis

User Activity Tracking

Track user actions during a session.

['Login','Search','AddToCart','Checkout']
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Customer journey analysis
  • Behavioral analytics
  • Funnel analysis

Website Navigation Analysis

Store page visit sequences.

['Home','Products','Pricing','Contact']
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Clickstream analysis
  • Conversion optimization
  • Navigation insights

IoT Sensor Monitoring

Store multiple sensor readings.

[23.1, 23.4, 22.9, 24.0]
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Time-series analytics
  • Monitoring systems
  • Device telemetry

Recommendation Systems

Store user interests.

['Technology','Programming','AI']
Enter fullscreen mode Exit fullscreen mode

Useful for:

  • Personalized recommendations
  • Content suggestions
  • Product recommendations

Best Practices

When working with arrays:

  • Use arrays only for naturally grouped data.
  • Avoid extremely large arrays.
  • Prefer built-in array functions over custom processing.
  • Avoid excessive nesting of arrays.
  • Monitor query performance using ClickHouse® system tables and query logs.
  • Document array structures, including:

    • Element ordering
    • Data types
    • Business meaning
    • Expected size limits

Good documentation helps maintain consistency as applications evolve.


Conclusion

Arrays are one of the most powerful data types available in ClickHouse®. They provide an efficient way to store and analyze multi-valued data without relying heavily on additional tables or complex joins.

Functions such as arrayJoin(), arrayMap(), arrayFilter(), arrayExists(), arrayDistinct(), and arrayCount() make it easy to manipulate and analyze array data while taking advantage of ClickHouse®'s optimized execution engine.

Whether you're building analytics for e-commerce platforms, monitoring IoT devices, tracking website sessions, or powering recommendation systems, arrays help create cleaner schemas, simplify queries, and deliver fast analytical performance.

Understanding when and how to use arrays effectively will allow you to model complex datasets more naturally while getting the most out of ClickHouse®'s high-performance architecture.

Top comments (0)