Introduction
As data volumes continue to grow, running aggregation queries directly on raw datasets becomes increasingly expensive. Business dashboards, analytics platforms, and reporting systems often execute the same calculations repeatedly—such as total sales, daily active users, page views, or revenue trends. While ClickHouse® is designed to process analytical workloads at remarkable speed, repeatedly scanning billions of records still consumes valuable CPU, memory, and storage resources.
This is where AggregatingMergeTree proves its value.
Rather than calculating aggregates every time a query is executed, AggregatingMergeTree stores intermediate aggregation states that are merged automatically in the background. This approach allows analytical queries to read compact, pre-aggregated datasets, resulting in dramatically faster response times and reduced infrastructure costs.
In this guide, you'll learn how AggregatingMergeTree works, why aggregate states matter, how to build an automated aggregation pipeline using Materialized Views, and when this engine is the right choice for your ClickHouse® workloads.
What is AggregatingMergeTree?
AggregatingMergeTree is a specialized ClickHouse® table engine designed to store aggregate function states instead of raw records.
Unlike the standard MergeTree engine, which stores every inserted row, AggregatingMergeTree keeps partially aggregated values that ClickHouse combines during background merge operations. This significantly reduces the amount of data that must be processed when generating analytical reports.
Because much of the computational work happens during data ingestion, dashboards and reporting applications can retrieve summarized information much more efficiently.
Typical scenarios include:
- Sales reporting
- Website traffic analytics
- Financial summaries
- IoT sensor monitoring
- Business KPI dashboards
- Application observability metrics
Why Use AggregatingMergeTree?
Imagine an online marketplace processing millions of transactions every day.
A dashboard needs to display total revenue generated by each product.
A typical query might look like this:
SELECT
product,
sum(amount) AS total_revenue
FROM sales
GROUP BY product;
Although ClickHouse executes aggregation queries efficiently, scanning billions of records every time a dashboard refreshes eventually becomes costly.
As datasets grow larger, repeated aggregations increase:
- CPU utilization
- Memory consumption
- Disk reads
- Query latency
AggregatingMergeTree addresses this challenge by storing aggregation states as data arrives. Instead of recalculating totals from the beginning, queries simply merge pre-computed states to produce final results.
This significantly reduces execution time for frequently accessed reports.
How AggregatingMergeTree Works
A typical aggregation pipeline consists of five stages:
- Incoming events are written into a raw MergeTree table.
- A Materialized View processes newly inserted records.
- Aggregate state functions generate intermediate values.
- AggregatingMergeTree stores those aggregate states.
- Background merges combine compatible states automatically.
Application
│
▼
Raw MergeTree Table
│
▼
Materialized View
│
▼
Aggregate States
(sumState(), avgState())
│
▼
AggregatingMergeTree
│
▼
Background Merge
│
▼
sumMerge()
│
▼
Analytics Dashboard
This architecture shifts expensive aggregation work from query execution time to data ingestion, making analytical queries much faster.
Aggregate Functions vs. AggregatingMergeTree
Although their names are similar, aggregate functions and AggregatingMergeTree serve different purposes.
| Aggregate Functions | AggregatingMergeTree |
|---|---|
| SQL functions | Table engine |
| Execute calculations during queries | Stores aggregate states |
| Used in SELECT statements | Used for summary tables |
| Examples: sum(), avg(), count() | Stores AggregateFunction columns |
A simple way to think about it is:
- Aggregate functions perform calculations.
- AggregatingMergeTree stores those calculations in an intermediate format that can be merged later.
Understanding Aggregate States
Aggregate states are the foundation of AggregatingMergeTree.
Consider this standard query:
SELECT sum(amount)
FROM sales;
The query immediately returns a final numeric result.
With AggregatingMergeTree, ClickHouse instead stores an intermediate aggregation state using functions such as:
sumState(amount)
This state is not the final answer.
Later, when querying the summary table, ClickHouse combines multiple stored states using:
sumMerge(total_sales)
Think of an aggregate state as a partially completed calculation that can continue growing as new data arrives.
Because ClickHouse only merges these compact states instead of scanning every raw row, reporting becomes significantly more efficient.
Creating the Raw Events Table
Let's build a simple example that tracks website page views.
First, create a table that stores every page visit.
CREATE TABLE page_views
(
event_date Date,
page String,
views UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, page);
Example data:
| Date | Page | Views |
|---|---|---|
| 2026-07-01 | Home | 1 |
| 2026-07-01 | Home | 1 |
| 2026-07-01 | Products | 1 |
| 2026-07-01 | Contact | 1 |
| 2026-07-01 | Home | 1 |
Every page visit is stored individually inside the MergeTree table.
Creating the Summary Table
Instead of saving completed totals, the summary table stores aggregation states.
CREATE TABLE page_views_summary
(
event_date Date,
page String,
total_views AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (event_date, page);
Notice that the total_views column uses:
AggregateFunction(sum, UInt64)
rather than a standard numeric type.
This enables ClickHouse to merge partial aggregations automatically.
Automatically Building Aggregates with Materialized Views
To avoid manually updating summary tables, create a Materialized View that processes every new insert.
CREATE MATERIALIZED VIEW mv_page_views
TO page_views_summary
AS
SELECT
event_date,
page,
sumState(views) AS total_views
FROM page_views
GROUP BY
event_date,
page;
Whenever fresh data is inserted into the raw events table, the Materialized View calculates aggregation states and stores them in the AggregatingMergeTree table automatically.
This eliminates repeated aggregation work during query execution.
Querying Aggregated Data
Since the summary table contains aggregate states rather than completed values, use merge functions to retrieve final results.
SELECT
event_date,
page,
sumMerge(total_views) AS total_views
FROM page_views_summary
GROUP BY
event_date,
page
ORDER BY
event_date,
page;
Example output:
| Date | Page | Total Views |
|---|---|---|
| 2026-07-01 | Home | 35,842 |
| 2026-07-01 | Products | 12,614 |
| 2026-07-01 | Contact | 4,238 |
Instead of scanning millions of individual page-view events, ClickHouse reads a compact summary table, making dashboards load much faster.
When Should You Use AggregatingMergeTree?
AggregatingMergeTree is best suited for analytical workloads where identical aggregation queries are executed repeatedly.
Ideal use cases include:
- Interactive BI dashboards
- Daily and monthly reporting
- Executive KPI dashboards
- Website analytics
- Application monitoring
- Event and log analysis
- Financial reporting
- IoT telemetry aggregation
By storing pre-aggregated information, organizations can reduce query execution times while minimizing resource consumption.
When It May Not Be the Right Choice
Although powerful, AggregatingMergeTree isn't appropriate for every workload.
Consider alternative engines when your application requires:
- Transactional (OLTP) processing
- Frequent row-level updates
- Point lookups
- Individual record modifications
- Highly mutable datasets
For these scenarios, standard MergeTree variants are generally a better fit.
Common Mistakes to Avoid
When working with AggregatingMergeTree, developers often encounter a few common pitfalls.
Avoid these mistakes:
- Using
sum()instead ofsumState()while inserting aggregate data. - Querying aggregate states without merge functions like
sumMerge(). - Defining aggregation columns as numeric types instead of
AggregateFunction. - Expecting background merges to happen instantly after every insert.
- Using AggregatingMergeTree for transactional workloads rather than analytical reporting.
Understanding these concepts helps ensure accurate query results and better long-term performance.
Best Practices
To get the most out of AggregatingMergeTree:
- Keep raw data in a MergeTree table.
- Automate aggregation with Materialized Views.
- Store only aggregation states inside summary tables.
- Query summary tables using merge functions.
- Choose appropriate sorting keys for efficient merges.
- Monitor background merge activity for optimal performance.
These practices help build scalable analytics pipelines capable of handling billions of records efficiently.
Conclusion
AggregatingMergeTree is one of the most powerful storage engines available in ClickHouse® for accelerating analytical workloads. By storing intermediate aggregate states instead of recalculating values from raw data, it dramatically improves reporting performance while reducing CPU, memory, and disk usage.
When combined with Materialized Views and aggregate state functions such as sumState() and sumMerge(), it provides an elegant solution for building high-performance dashboards, business intelligence systems, monitoring platforms, and real-time analytics applications.
If your organization regularly executes the same aggregation queries over massive datasets, adopting AggregatingMergeTree can significantly reduce query latency, improve scalability, and simplify the architecture of your analytics platform.
Read more ... https://www.quantrail-data.com/mastering-clickhouse-aggregatingmergetree
Top comments (0)