DEV Community

Vivek Kumar
Vivek Kumar

Posted on

SQL Window Functions Explained: Stop Collapsing Your Data with GROUP BY

You're writing a query to find each employee's salary alongside the average salary for their department. Your first instinct might be to reach for GROUP BY — but then you'd lose the individual employee rows. So you write a subquery, or a join against an aggregated CTE, and suddenly a simple question becomes a 20-line monster.

Window functions exist precisely to solve this. They let you perform calculations across a set of related rows without collapsing your result set. The row stays intact; the calculation rides alongside it. Once you understand them, you'll wonder how you ever lived without them.

This guide covers the fundamentals: what a window is, how OVER, PARTITION BY, and ORDER BY work inside it, and the most useful window functions you'll reach for every day.


What Is a Window Function?

A window function performs a calculation over a "window" — a defined set of rows related to the current row. Unlike GROUP BY, which aggregates many rows into one, a window function adds a new computed column to each row.

The syntax always follows this pattern:

function_name() OVER (
  [PARTITION BY column(s)]
  [ORDER BY column(s)]
  [ROWS/RANGE frame_clause]
)
Enter fullscreen mode Exit fullscreen mode

The OVER clause is what tells SQL "this is a window function." Without it, SUM() is a regular aggregate; with it, SUM() OVER (...) is a window function.


The Setup: A Sample Schema

We'll use two tables throughout:

-- employees table
CREATE TABLE employees (
  id         INT,
  name       VARCHAR(100),
  department VARCHAR(50),
  salary     NUMERIC(10,2),
  hire_date  DATE
);

INSERT INTO employees VALUES
  (1,  'Alice',   'Engineering', 95000, '2019-03-15'),
  (2,  'Bob',     'Engineering', 88000, '2020-07-01'),
  (3,  'Carol',   'Engineering', 102000,'2018-11-20'),
  (4,  'Dave',    'Marketing',   72000, '2021-01-10'),
  (5,  'Eve',     'Marketing',   78000, '2020-05-22'),
  (6,  'Frank',   'Marketing',   68000, '2022-03-30'),
  (7,  'Grace',   'HR',          61000, '2021-09-14'),
  (8,  'Heidi',   'HR',          65000, '2019-06-01');
Enter fullscreen mode Exit fullscreen mode

PARTITION BY: The Heart of Windowing

PARTITION BY divides your rows into groups — like GROUP BY, but the groups are only used to scope the calculation, not reduce the rows.

Example 1: Department average salary alongside each employee

SELECT
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees
ORDER BY department, salary DESC;
Enter fullscreen mode Exit fullscreen mode

Result:

name department salary dept_avg_salary
Carol Engineering 102000 95000.00
Alice Engineering 95000 95000.00
Bob Engineering 88000 95000.00
Eve Marketing 78000 72666.67
Dave Marketing 72000 72666.67
Frank Marketing 68000 72666.67
Heidi HR 65000 63000.00
Grace HR 61000 63000.00

Every row is preserved. Each employee now has their department's average salary right next to their own. With a traditional GROUP BY, you'd need a join or subquery to get this result.


ORDER BY Inside OVER: Running Calculations

When you add ORDER BY inside OVER, SQL calculates the function cumulatively as it processes rows in order. This is perfect for running totals or running averages.

Example 2: Running total of salary by hire date within each department

SELECT
  name,
  department,
  hire_date,
  salary,
  SUM(salary) OVER (
    PARTITION BY department
    ORDER BY hire_date
  ) AS running_total
FROM employees
ORDER BY department, hire_date;
Enter fullscreen mode Exit fullscreen mode

Result for Engineering:

name department hire_date salary running_total
Carol Engineering 2018-11-20 102000 102000
Alice Engineering 2019-03-15 95000 197000
Bob Engineering 2020-07-01 88000 285000

The running_total grows as we move through hire dates — that's the ORDER BY hire_date doing its work.


Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK

Three of the most useful window functions are ranking functions. They all assign a number to each row, but they handle ties differently.

Example 3: Ranking employees by salary within their department

SELECT
  name,
  department,
  salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
  RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees
ORDER BY department, salary DESC;
Enter fullscreen mode Exit fullscreen mode

To see the difference between RANK and DENSE_RANK, let's imagine two Marketing employees with the same salary. Here's what happens:

name salary ROW_NUMBER RANK DENSE_RANK
Eve 78000 1 1 1
Dave 72000 2 2 2
Frank 68000 3 3 3

The differences become clear with ties:

  • ROW_NUMBER() — Always assigns a unique number, even to ties. Ties get arbitrary ordering (non-deterministic).
  • RANK() — Ties share the same rank, and the next rank skips numbers. Two employees tied at rank 2 means the next rank is 4.
  • DENSE_RANK() — Ties share the same rank, but the next rank is always consecutive. Two tied at rank 2 means the next is rank 3.

Use ROW_NUMBER() when you need exactly one row per partition (e.g., "get the single most recent order per customer"). Use DENSE_RANK() when the gap in RANK() would be confusing to end users.


A Practical Use Case: Top-N Per Group

One of the most common interview questions and real-world problems: "Give me the top 2 earners in each department."

WITH ranked_employees AS (
  SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS salary_rank
  FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE salary_rank <= 2
ORDER BY department, salary DESC;
Enter fullscreen mode Exit fullscreen mode

Result:

name department salary
Carol Engineering 102000
Alice Engineering 95000
Eve Marketing 78000
Dave Marketing 72000
Heidi HR 65000
Grace HR 61000

This pattern — window function in a CTE, filter in the outer query — is a workhorse of SQL analytics.


Common Mistakes and Gotchas

1. Filtering on a window function directly in WHERE

This doesn't work:

-- ❌ This will error
SELECT name, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
WHERE rn <= 2;
Enter fullscreen mode Exit fullscreen mode

Window functions are evaluated after WHERE, so you can't filter on them in the same query level. Always wrap them in a CTE or subquery first, then filter — as shown in the top-N example above.

2. Forgetting that ORDER BY inside OVER changes the frame

Adding ORDER BY to your OVER clause implicitly changes the window frame from "all rows in the partition" to "all rows up to and including the current row." This is usually what you want for running totals, but it can surprise you when you use SUM() or AVG():

-- These two behave differently!
AVG(salary) OVER (PARTITION BY department)                    -- whole partition
AVG(salary) OVER (PARTITION BY department ORDER BY hire_date) -- cumulative avg
Enter fullscreen mode Exit fullscreen mode

If you want the whole-partition average but with a specific sort for row numbering, you may need to use separate window function expressions.

3. Mixing window functions and GROUP BY

Window functions and GROUP BY can coexist in the same query, but the window function operates on the already-grouped result set. This can lead to unexpected results if you're not careful about what rows exist after grouping.

4. Performance on large tables without proper indexes

Window functions with PARTITION BY and ORDER BY can be expensive. Make sure the columns you're partitioning and ordering by are indexed, especially on large tables. Use EXPLAIN ANALYZE to check the query plan.


Key Takeaways

Window functions are one of the most powerful tools in SQL for analytics and reporting:

  • The OVER() clause is what makes a function a window function — it defines the rows in scope for each calculation.
  • PARTITION BY divides rows into independent groups, like GROUP BY but without collapsing rows.
  • ORDER BY inside OVER enables running/cumulative calculations.
  • ROW_NUMBER(), RANK(), and DENSE_RANK() solve ranking and top-N problems elegantly.
  • You cannot filter on window function results in WHERE — use a CTE or subquery instead.

The next step after mastering these basics is exploring LAG() and LEAD() for comparing a row to previous/next rows, and frame clauses (ROWS BETWEEN) for precise control over the window range. Those are the power tools on top of this foundation.


Over to You

Window functions clicked for me the moment I stopped thinking of them as "fancy aggregates" and started thinking of them as "a calculation that knows its neighbors." What was the moment it clicked for you?

Drop your favorite window function use case in the comments — I'd love to see the creative ways people apply these in production. And if you're just getting started, try rewriting one of your correlated subqueries as a window function. You might never go back.

Top comments (0)