Snappy Tools

Posted on May 30

SQL Window Functions: A Practical Guide to ROW_NUMBER, RANK, LAG, and LEAD

#database #sql #beginners #webdev

Window functions are one of SQL's most powerful features, but many developers avoid them because the syntax looks unfamiliar. This guide explains when to use them and how to write them correctly.

What is a window function?

A window function performs a calculation across a set of rows related to the current row — without collapsing the rows into a single output row (which is what GROUP BY does).

The key difference:

-- GROUP BY: one row per group (collapses rows)
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

-- Window function: keeps all rows, adds AVG per department
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;

With the window function, you can see each employee and the department average on the same row. This is impossible with GROUP BY alone.

The OVER clause

Every window function uses OVER(...) to define the window:

function_name() OVER (
  PARTITION BY column1, column2   -- Optional: groups rows (like GROUP BY)
  ORDER BY column3 [ASC|DESC]     -- Optional: orders rows within each partition
  ROWS/RANGE BETWEEN ...          -- Optional: limits the window to a frame
)

PARTITION BY: Divides rows into groups. The function resets per group.
ORDER BY: Defines row ordering within the window. Required for ranking and lag/lead functions.
Frame clause: Limits which rows the window sees (used with aggregate functions).

ROW_NUMBER, RANK, and DENSE_RANK

These three functions assign sequence numbers to rows — useful for pagination, top-N queries, and deduplication.

ROW_NUMBER(): Assigns a unique sequential number. No ties.

SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num
FROM employees;

RANK(): Same as ROW_NUMBER but ties get the same rank, and the next rank skips (1, 2, 2, 4 — not 3).

DENSE_RANK(): Ties get the same rank, but no gaps (1, 2, 2, 3 — not 4).

Classic use case — top N per group:

-- Top 3 earners per department
SELECT * FROM (
  SELECT name, department, salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
  FROM employees
) ranked
WHERE rn <= 3;

LAG and LEAD

LAG looks at the previous row; LEAD looks at the next row. Both require ORDER BY.

SELECT
  date,
  revenue,
  LAG(revenue, 1) OVER (ORDER BY date) as prev_revenue,
  LEAD(revenue, 1) OVER (ORDER BY date) as next_revenue,
  revenue - LAG(revenue, 1) OVER (ORDER BY date) as revenue_change
FROM daily_revenue
ORDER BY date;

LAG(column, offset, default):

column: the value to look back at
offset: how many rows back (default 1)
default: value if no previous row exists (default NULL)

Use cases:

Calculate day-over-day or month-over-month changes
Compare each value to the previous period
Find gaps in sequential data

FIRST_VALUE and LAST_VALUE

Returns the first or last value in a window frame.

SELECT name, department, salary,
       FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) as top_earner_in_dept
FROM employees;

Note: LAST_VALUE requires a frame clause to work as expected (the default frame stops at the current row):

LAST_VALUE(name) OVER (
  PARTITION BY department
  ORDER BY salary DESC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_earner_in_dept

SUM and AVG as window functions

Any aggregate function can be used as a window function with OVER():

-- Running total
SELECT date, revenue,
       SUM(revenue) OVER (ORDER BY date) as running_total
FROM daily_revenue;

-- Running total reset per month
SELECT date, revenue,
       SUM(revenue) OVER (
         PARTITION BY DATE_TRUNC('month', date)
         ORDER BY date
       ) as monthly_running_total
FROM daily_revenue;

-- Moving average (7-day)
SELECT date, revenue,
       AVG(revenue) OVER (
         ORDER BY date
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) as moving_avg_7d
FROM daily_revenue;

NTILE

Divides rows into N approximately equal groups (like percentiles):

SELECT name, salary,
       NTILE(4) OVER (ORDER BY salary) as salary_quartile
FROM employees;
-- salary_quartile = 1 means lowest 25%, 4 means highest 25%

PERCENT_RANK and CUME_DIST

PERCENT_RANK: Returns the relative rank as a value between 0 and 1.

(rank - 1) / (total rows in partition - 1)

CUME_DIST: Returns the cumulative distribution — the fraction of rows at or below the current row's value.

SELECT name, salary,
       PERCENT_RANK() OVER (ORDER BY salary) as pct_rank,
       CUME_DIST() OVER (ORDER BY salary) as cum_dist
FROM employees;

Deduplication with ROW_NUMBER

Window functions are the cleanest way to deduplicate — keep the most recent record per user, per product, etc.:

-- Keep only the most recent order per customer
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY created_at DESC
    ) as rn
  FROM orders
)
SELECT * FROM ranked WHERE rn = 1;

This is more reliable than DISTINCT ON (PostgreSQL-specific) or correlated subqueries.

Performance considerations

Window functions don't change the number of output rows, but they do sort and scan data:

PARTITION BY and ORDER BY cause sorting — index on the ORDER BY column helps
Running totals over millions of rows are expensive without an index
Some databases (PostgreSQL, SQL Server) compute window functions after WHERE but before ORDER BY — optimise the WHERE clause first

SQL Formatter tip

Window functions can get verbose. A well-formatted query is much easier to audit:

-- Hard to read
SELECT name,salary,ROW_NUMBER()OVER(PARTITION BY department ORDER BY salary DESC)rn,SUM(salary)OVER(PARTITION BY department)dept_total FROM employees WHERE active=true

-- Properly formatted
SELECT
  name,
  salary,
  ROW_NUMBER() OVER (
    PARTITION BY department
    ORDER BY salary DESC
  ) AS rn,
  SUM(salary) OVER (
    PARTITION BY department
  ) AS dept_total
FROM employees
WHERE active = true;

A SQL formatter that handles window function syntax keeps complex queries readable. The OVER clause with nested PARTITION BY and ORDER BY is a common place where ad-hoc formatting breaks down.

Window functions unlock patterns that are otherwise impossible in SQL: top-N per group, running totals, period-over-period comparisons, and clean deduplication. Once the OVER clause syntax clicks, they become an indispensable part of analytical SQL.

DEV Community