Vivek Kumar

Posted on Apr 27

SQL Window Functions Explained: Stop Collapsing Your Data with GROUP BY

#database #postgres #sql #tutorial

You're writing a query to find each employee's salary alongside the average salary for their department. Your first instinct might be to reach for GROUP BY — but then you'd lose the individual employee rows. So you write a subquery, or a join against an aggregated CTE, and suddenly a simple question becomes a 20-line monster.

Window functions exist precisely to solve this. They let you perform calculations across a set of related rows without collapsing your result set. The row stays intact; the calculation rides alongside it. Once you understand them, you'll wonder how you ever lived without them.

This guide covers the fundamentals: what a window is, how OVER, PARTITION BY, and ORDER BY work inside it, and the most useful window functions you'll reach for every day.

What Is a Window Function?

A window function performs a calculation over a "window" — a defined set of rows related to the current row. Unlike GROUP BY, which aggregates many rows into one, a window function adds a new computed column to each row.

The syntax always follows this pattern:

function_name() OVER (
  [PARTITION BY column(s)]
  [ORDER BY column(s)]
  [ROWS/RANGE frame_clause]
)

The OVER clause is what tells SQL "this is a window function." Without it, SUM() is a regular aggregate; with it, SUM() OVER (...) is a window function.

The Setup: A Sample Schema

We'll use two tables throughout:

-- employees table
CREATE TABLE employees (
  id         INT,
  name       VARCHAR(100),
  department VARCHAR(50),
  salary     NUMERIC(10,2),
  hire_date  DATE
);

INSERT INTO employees VALUES
  (1,  'Alice',   'Engineering', 95000, '2019-03-15'),
  (2,  'Bob',     'Engineering', 88000, '2020-07-01'),
  (3,  'Carol',   'Engineering', 102000,'2018-11-20'),
  (4,  'Dave',    'Marketing',   72000, '2021-01-10'),
  (5,  'Eve',     'Marketing',   78000, '2020-05-22'),
  (6,  'Frank',   'Marketing',   68000, '2022-03-30'),
  (7,  'Grace',   'HR',          61000, '2021-09-14'),
  (8,  'Heidi',   'HR',          65000, '2019-06-01');

PARTITION BY: The Heart of Windowing

PARTITION BY divides your rows into groups — like GROUP BY, but the groups are only used to scope the calculation, not reduce the rows.

Example 1: Department average salary alongside each employee

SELECT
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees
ORDER BY department, salary DESC;

Result:

name	department	salary	dept_avg_salary
Carol	Engineering	102000	95000.00
Alice	Engineering	95000	95000.00
Bob	Engineering	88000	95000.00
Eve	Marketing	78000	72666.67
Dave	Marketing	72000	72666.67
Frank	Marketing	68000	72666.67
Heidi	HR	65000	63000.00
Grace	HR	61000	63000.00

Every row is preserved. Each employee now has their department's average salary right next to their own. With a traditional GROUP BY, you'd need a join or subquery to get this result.

ORDER BY Inside OVER: Running Calculations

When you add ORDER BY inside OVER, SQL calculates the function cumulatively as it processes rows in order. This is perfect for running totals or running averages.

Example 2: Running total of salary by hire date within each department

SELECT
  name,
  department,
  hire_date,
  salary,
  SUM(salary) OVER (
    PARTITION BY department
    ORDER BY hire_date
  ) AS running_total
FROM employees
ORDER BY department, hire_date;

Result for Engineering:

name	department	hire_date	salary	running_total
Carol	Engineering	2018-11-20	102000	102000
Alice	Engineering	2019-03-15	95000	197000
Bob	Engineering	2020-07-01	88000	285000

The running_total grows as we move through hire dates — that's the ORDER BY hire_date doing its work.

Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK

Three of the most useful window functions are ranking functions. They all assign a number to each row, but they handle ties differently.

Example 3: Ranking employees by salary within their department

SELECT
  name,
  department,
  salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
  RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees
ORDER BY department, salary DESC;

To see the difference between RANK and DENSE_RANK, let's imagine two Marketing employees with the same salary. Here's what happens:

name	salary	ROW_NUMBER	RANK	DENSE_RANK
Eve	78000	1	1	1
Dave	72000	2	2	2
Frank	68000	3	3	3

The differences become clear with ties:

ROW_NUMBER() — Always assigns a unique number, even to ties. Ties get arbitrary ordering (non-deterministic).
RANK() — Ties share the same rank, and the next rank skips numbers. Two employees tied at rank 2 means the next rank is 4.
DENSE_RANK() — Ties share the same rank, but the next rank is always consecutive. Two tied at rank 2 means the next is rank 3.

Use ROW_NUMBER() when you need exactly one row per partition (e.g., "get the single most recent order per customer"). Use DENSE_RANK() when the gap in RANK() would be confusing to end users.

A Practical Use Case: Top-N Per Group

One of the most common interview questions and real-world problems: "Give me the top 2 earners in each department."

WITH ranked_employees AS (
  SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS salary_rank
  FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE salary_rank <= 2
ORDER BY department, salary DESC;

Result:

name	department	salary
Carol	Engineering	102000
Alice	Engineering	95000
Eve	Marketing	78000
Dave	Marketing	72000
Heidi	HR	65000
Grace	HR	61000

This pattern — window function in a CTE, filter in the outer query — is a workhorse of SQL analytics.

Common Mistakes and Gotchas

1. Filtering on a window function directly in WHERE

This doesn't work:

-- ❌ This will error
SELECT name, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
WHERE rn <= 2;

Window functions are evaluated after WHERE, so you can't filter on them in the same query level. Always wrap them in a CTE or subquery first, then filter — as shown in the top-N example above.

2. Forgetting that ORDER BY inside OVER changes the frame

Adding ORDER BY to your OVER clause implicitly changes the window frame from "all rows in the partition" to "all rows up to and including the current row." This is usually what you want for running totals, but it can surprise you when you use SUM() or AVG():

-- These two behave differently!
AVG(salary) OVER (PARTITION BY department)                    -- whole partition
AVG(salary) OVER (PARTITION BY department ORDER BY hire_date) -- cumulative avg

If you want the whole-partition average but with a specific sort for row numbering, you may need to use separate window function expressions.

3. Mixing window functions and GROUP BY

Window functions and GROUP BY can coexist in the same query, but the window function operates on the already-grouped result set. This can lead to unexpected results if you're not careful about what rows exist after grouping.

4. Performance on large tables without proper indexes

Window functions with PARTITION BY and ORDER BY can be expensive. Make sure the columns you're partitioning and ordering by are indexed, especially on large tables. Use EXPLAIN ANALYZE to check the query plan.

Key Takeaways

Window functions are one of the most powerful tools in SQL for analytics and reporting:

The OVER() clause is what makes a function a window function — it defines the rows in scope for each calculation.
PARTITION BY divides rows into independent groups, like GROUP BY but without collapsing rows.
ORDER BY inside OVER enables running/cumulative calculations.
ROW_NUMBER(), RANK(), and DENSE_RANK() solve ranking and top-N problems elegantly.
You cannot filter on window function results in WHERE — use a CTE or subquery instead.

The next step after mastering these basics is exploring LAG() and LEAD() for comparing a row to previous/next rows, and frame clauses (ROWS BETWEEN) for precise control over the window range. Those are the power tools on top of this foundation.

Over to You

Window functions clicked for me the moment I stopped thinking of them as "fancy aggregates" and started thinking of them as "a calculation that knows its neighbors." What was the moment it clicked for you?

Drop your favorite window function use case in the comments — I'd love to see the creative ways people apply these in production. And if you're just getting started, try rewriting one of your correlated subqueries as a window function. You might never go back.

DEV Community