Introduction
Analyzing data often requires comparing rows, calculating rankings, finding running totals, or computing moving averages without changing the number of rows returned. Traditionally, these tasks required complex self-joins or nested subqueries, making SQL queries harder to write and maintain.
ClickHouse® Window Functions simplify these analytical operations by allowing calculations across a group of related rows while preserving every row in the result set. This enables cleaner SQL, better readability, and more powerful analytics.
In this guide, you'll learn what window functions are, how the OVER() clause works, and how to use the most common window functions in ClickHouse® with practical examples.
What Are Window Functions?
A window function performs calculations over a set of rows called a window.
Unlike GROUP BY, which combines multiple rows into a single result, window functions calculate values while keeping every original row.
Consider the following employee salary data:
| Employee | Department | Salary |
|---|---|---|
| Alice | HR | 50,000 |
| Bob | HR | 65,000 |
| Charlie | HR | 55,000 |
Using GROUP BY, you can calculate the average salary for HR, but the result contains only one row.
With a window function, every employee remains visible while also displaying the department's average salary.
| Employee | Salary | Department Average |
|---|---|---|
| Alice | 50,000 | 56,666 |
| Bob | 65,000 | 56,666 |
| Charlie | 55,000 | 56,666 |
This ability to retain individual rows while performing analytical calculations is what makes window functions so powerful.
Basic Syntax
Every window function uses the OVER() clause.
SELECT
column_name,
window_function() OVER (
PARTITION BY column
ORDER BY column
)
FROM table_name;
The OVER() clause defines how the window is created.
It can specify:
- How rows are grouped (
PARTITION BY) - How rows are ordered (
ORDER BY) - Which rows belong to the window (
ROWS BETWEEN)
Sample Table
We'll use the following table throughout the examples.
CREATE TABLE employee_sales
(
Employee String,
Department String,
SaleDate Date,
Sales UInt32
)
ENGINE = MergeTree
ORDER BY SaleDate;
Sample data:
| Employee | Department | SaleDate | Sales |
|---|---|---|---|
| Alice | Electronics | 2025-01-01 | 120 |
| Bob | Electronics | 2025-01-02 | 150 |
| Charlie | Electronics | 2025-01-03 | 180 |
| David | Furniture | 2025-01-01 | 90 |
| Emma | Furniture | 2025-01-01 | 110 |
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential number to every row.
SELECT
Employee,
Sales,
ROW_NUMBER() OVER (ORDER BY Sales DESC) AS RowNum
FROM employee_sales;
Output:
| Employee | Sales | RowNum |
|---|---|---|
| Charlie | 180 | 1 |
| Bob | 150 | 2 |
| Alice | 120 | 3 |
| Emma | 110 | 4 |
Common Use Cases
- Ranking reports
- Pagination
- Top-N queries
RANK()
Assigns the same rank to equal values.
SELECT
Employee,
Sales,
RANK() OVER (ORDER BY Sales DESC) AS Rank
FROM employee_sales;
Output:
| Employee | Sales | Rank |
|---|---|---|
| Alice | 200 | 1 |
| Bob | 180 | 2 |
| Charlie | 180 | 2 |
| David | 150 | 4 |
Notice that Rank 3 is skipped because two employees share Rank 2.
DENSE_RANK()
Works like RANK(), but does not leave gaps.
SELECT
Employee,
Sales,
DENSE_RANK() OVER (ORDER BY Sales DESC) AS DenseRank
FROM employee_sales;
Output:
| Employee | Sales | DenseRank |
|---|---|---|
| Alice | 200 | 1 |
| Bob | 180 | 2 |
| Charlie | 180 | 2 |
| David | 150 | 3 |
NTILE()
Splits ordered rows into approximately equal-sized buckets.
SELECT
Employee,
Sales,
NTILE(2) OVER (ORDER BY Sales DESC) AS SalesGroup
FROM employee_sales;
Output:
| Employee | Sales | SalesGroup |
|---|---|---|
| Charlie | 180 | 1 |
| Bob | 150 | 1 |
| Alice | 120 | 2 |
| David | 90 | 2 |
Common Use Cases
- Customer segmentation
- Sales quartiles
- Performance grading
- Ranking analysis
Aggregate Window Functions
Traditional aggregate functions such as SUM(), AVG(), MIN(), MAX(), and COUNT() can also operate as window functions.
SUM() OVER()
Calculates running totals.
SELECT
SaleDate,
Sales,
SUM(Sales)
OVER (ORDER BY SaleDate) AS RunningTotal
FROM employee_sales;
Output:
| SaleDate | Sales | RunningTotal |
|---|---|---|
| Jan 1 | 120 | 120 |
| Jan 2 | 150 | 270 |
| Jan 3 | 180 | 450 |
Ideal for cumulative sales reports.
AVG() OVER()
Calculates averages without collapsing rows.
SELECT
Employee,
Department,
Sales,
AVG(Sales)
OVER (PARTITION BY Department) AS DepartmentAverage
FROM employee_sales;
Output:
| Employee | Department | Sales | DepartmentAverage |
|---|---|---|---|
| Alice | Electronics | 120 | 150 |
| Bob | Electronics | 150 | 150 |
| Charlie | Electronics | 180 | 150 |
Moving Average
Moving averages become very simple with window functions.
SELECT
SaleDate,
Sales,
AVG(Sales)
OVER (
ORDER BY SaleDate
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS MovingAverage
FROM employee_sales;
Output:
| SaleDate | Sales | MovingAverage |
|---|---|---|
| Jan 1 | 120 | 120 |
| Jan 2 | 150 | 135 |
| Jan 3 | 180 | 150 |
The calculation includes:
- Current row
- Previous row
- Two rows before the current row
Understanding Window Frames
Window frames determine which rows participate in the calculation.
Example:
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
This means:
- 2 PRECEDING → Include the previous two rows.
- CURRENT ROW → Include the current row.
Window frames are commonly used for:
- Moving averages
- Rolling sums
- Sliding calculations
- Time-series analytics
Navigation Functions
LAG()
Returns the value from the previous row.
SELECT
SaleDate,
Sales,
LAG(Sales)
OVER (ORDER BY SaleDate) AS PreviousSales
FROM employee_sales;
Output:
| SaleDate | Sales | PreviousSales |
|---|---|---|
| Jan 1 | 120 | NULL |
| Jan 2 | 150 | 120 |
| Jan 3 | 180 | 150 |
Useful for day-over-day comparisons.
LEAD()
Returns the value from the next row.
SELECT
SaleDate,
Sales,
LEAD(Sales)
OVER (ORDER BY SaleDate) AS NextSales
FROM employee_sales;
Output:
| SaleDate | Sales | NextSales |
|---|---|---|
| Jan 1 | 120 | 150 |
| Jan 2 | 150 | 180 |
| Jan 3 | 180 | NULL |
Useful for forecasting and sequential analysis.
FIRST_VALUE() and LAST_VALUE()
Retrieve the first or last value from the window.
SELECT
Employee,
Sales,
FIRST_VALUE(Sales)
OVER (
ORDER BY Sales DESC
) AS HighestSale,
LAST_VALUE(Sales)
OVER (
ORDER BY Sales
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
) AS LastSale
FROM employee_sales;
Output:
| Employee | Sales | HighestSale | LastSale |
|---|---|---|---|
| Charlie | 180 | 180 | 180 |
| Bob | 150 | 180 | 180 |
| Alice | 120 | 180 | 180 |
Here:
-
FIRST_VALUE()returns the highest sales value after sorting in descending order. -
LAST_VALUE()returns the final value across the complete window.
Window Functions vs GROUP BY
| GROUP BY | Window Functions |
|---|---|
| Combines rows | Keeps all rows |
| One result per group | One result per row |
| Used for summaries | Used for row-level analytics |
| Cannot easily compare neighboring rows | Easily compares previous, next, ranked, and cumulative values |
Performance Best Practices
For better performance when using window functions in ClickHouse®:
- Choose an efficient
ORDER BYkey. - Use
PARTITION BYonly when necessary. - Filter rows before applying window functions whenever possible.
- Keep window frames as small as practical.
- Use
MergeTreetables for analytical workloads. - Avoid unnecessary sorting on very large datasets.
Conclusion
Window Functions are one of the most powerful SQL features available in ClickHouse® for analytical workloads. They allow you to perform rankings, running totals, moving averages, cumulative calculations, and row-to-row comparisons while preserving every row in the result set.
By understanding the OVER() clause and mastering functions such as ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), SUM(), AVG(), LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE(), you can write cleaner SQL and solve complex analytical problems without relying on complicated joins or subqueries.
For anyone working with ClickHouse®, mastering window functions is an essential skill for building efficient, readable, and high-performance analytical queries.
Top comments (0)