Lucas Braun

Posted on Jun 26

SQL window functions: what separates a mid-level from a senior developer

#backend #database #postgres #sql

Most developers know SELECT, JOIN, and GROUP BY. Window functions are where the gap between mid and senior starts to show.

What makes window functions different

GROUP BY collapses rows. You lose the individual record — it becomes part of an aggregate.

Window functions don't collapse anything. They compute an aggregate over a set of rows while keeping every row intact. That's the core idea.

-- GROUP BY: one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Window function: every employee row, plus their department average
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Same average calculation. Completely different result shape. Window functions let you ask "how does this row relate to the group?" — something GROUP BY simply can't do.

The anatomy of a window function

function_name() OVER (
    PARTITION BY col1, col2   -- defines the "group" (optional)
    ORDER BY col3             -- defines row order within the group
    ROWS BETWEEN ...          -- defines the frame (optional)
)

PARTITION BY — splits rows into independent windows. Omit it and the entire result set is one window.
ORDER BY — required for ranking and frame-sensitive functions. Changes the meaning of aggregates.
frame clause — defines which rows are "visible" to the function. Default varies by function.

Ranking functions

`ROW_NUMBER()`

Assigns a unique sequential number to each row within the partition. No ties — if two rows are equal, the order is arbitrary but distinct.

SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;

Classic use case: get the top N rows per group without a subquery per group.

-- Top earner per department
SELECT * FROM (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn = 1;

`RANK()` vs `DENSE_RANK()`

Both handle ties, but differently:

RANK() — tied rows get the same rank, next rank skips. (1, 2, 2, 4)
DENSE_RANK() — tied rows get the same rank, next rank does not skip. (1, 2, 2, 3)

SELECT
    name,
    salary,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

Use DENSE_RANK() when you want "2nd place" to actually mean second, not "tied for second, third place vacant."

Offset functions: LAG and LEAD

These let you access another row's value from the current row — without a self-join.

LAG(column, offset, default)  OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)

-- Month-over-month revenue change
SELECT
    month,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS delta
FROM monthly_revenue;

-- Time between consecutive orders per user
SELECT
    user_id,
    order_date,
    LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS next_order,
    LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) - order_date AS days_until_next
FROM orders;

LAG/LEAD replace a category of self-joins that are painful to read and slow to execute. If you see a query joining a table to itself on id - 1 or date arithmetic, it's almost always a LAG in disguise.

Aggregate window functions

Any aggregate (SUM, AVG, COUNT, MIN, MAX) can be used as a window function. This unlocks running totals and moving averages without cursors or subqueries.

Running total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Running total reset per partition

SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS user_running_total
FROM orders;

Moving average (last 7 days)

This is where the frame clause matters:

SELECT
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_revenue;

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW — the window includes the current row and the 6 rows before it. Classic 7-day rolling average.

ROWS vs RANGE

ROWS — physical rows. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means exactly 6 rows back.
RANGE — logical range based on the ORDER BY value. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW means all rows within 6 days, regardless of row count.

For time series with gaps, RANGE is usually what you actually want.

-- ROWS: always exactly 6 rows back (may span more than 6 days if data is sparse)
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- RANGE: all rows within the last 6 days (correct for sparse data)
AVG(revenue) OVER (ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)

`NTILE()`: bucketing rows

Divides rows into N equal-sized buckets. Useful for percentiles, quartiles, and cohort analysis.

-- Divide customers into 4 quartiles by total spend
SELECT
    customer_id,
    total_spend,
    NTILE(4) OVER (ORDER BY total_spend DESC) AS quartile
FROM customer_spend;

Quartile 1 = top spenders. Combine with a CTE to filter or label each group.

`FIRST_VALUE()` and `LAST_VALUE()`

Return the first or last value in the window frame.

SELECT
    name,
    department,
    salary,
    FIRST_VALUE(name) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS highest_paid_in_dept
FROM employees;

LAST_VALUE() has a trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so "last" means the current row, not the end of the partition. Fix it explicitly:

LAST_VALUE(name) OVER (
    PARTITION BY department
    ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Reading execution plans

Window functions aren't free. Know what to look for in EXPLAIN ANALYZE.

EXPLAIN ANALYZE
SELECT
    user_id,
    order_date,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date)
FROM orders;

Look for:

WindowAgg — the window function node. Normal.
Sort — a sort before WindowAgg. May be expensive on large tables.
Seq Scan vs Index Scan — if your PARTITION BY or ORDER BY columns aren't indexed, Postgres will sort the full table.

When to add an index

If the same PARTITION BY / ORDER BY combination appears in multiple queries:

CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);

This lets Postgres avoid a full sort and use the index order directly.

When NOT to use window functions

Small datasets where a simple subquery is more readable
When you need to filter on the window result in the same query level — you can't WHERE rank = 1 directly; you need a CTE or subquery
When the frame logic gets complex enough that a CTE makes the intent clearer

Practical patterns worth bookmarking

Deduplicate keeping latest record

WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM events
)
SELECT * FROM ranked WHERE rn = 1;

Percent of total

SELECT
    department,
    SUM(salary) AS dept_total,
    SUM(salary) * 100.0 / SUM(SUM(salary)) OVER () AS pct_of_company
FROM employees
GROUP BY department;

Note: SUM(SUM(salary)) OVER () — a window function over an aggregate. This is valid SQL and very useful.

Gap detection between dates

SELECT
    user_id,
    order_date,
    LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order,
    order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS gap_days
FROM orders
WHERE gap_days > 30;  -- wrap in CTE to filter

Summary

Function	Use case
`ROW_NUMBER()`	Unique rank, top-N per group, deduplication
`RANK()` / `DENSE_RANK()`	Ranking with ties
`LAG()` / `LEAD()`	Compare current row to previous/next
`SUM() OVER`	Running totals
`AVG() OVER` + frame	Moving averages
`NTILE()`	Percentiles, quartiles, bucketing
`FIRST_VALUE()` / `LAST_VALUE()`	Boundary values within a partition

Window functions don't replace every query pattern — but they replace the ugly ones. Self-joins, correlated subqueries, and cursor-based loops are often a window function with cleaner syntax and a better execution plan.

Learn the frame clause. It's the thing most tutorials skip, and it's where the real power is.

What's your most-used window function pattern in production? Drop it in the comments.

Tags: sql postgres backend database

DEV Community

SQL window functions: what separates a mid-level from a senior developer

What makes window functions different

The anatomy of a window function

Ranking functions

`ROW_NUMBER()`

`RANK()` vs `DENSE_RANK()`

Offset functions: LAG and LEAD

Aggregate window functions

Running total

Running total reset per partition

Moving average (last 7 days)

ROWS vs RANGE

`NTILE()`: bucketing rows

`FIRST_VALUE()` and `LAST_VALUE()`

Reading execution plans

When to add an index

When NOT to use window functions

Practical patterns worth bookmarking

Deduplicate keeping latest record

Percent of total

Gap detection between dates

Summary

Top comments (0)

What makes window functions different

The anatomy of a window function

Ranking functions

ROW_NUMBER()

RANK() vs DENSE_RANK()

Offset functions: LAG and LEAD

Aggregate window functions

Running total

Running total reset per partition

Moving average (last 7 days)

ROWS vs RANGE

NTILE(): bucketing rows

FIRST_VALUE() and LAST_VALUE()

Reading execution plans

When to add an index

When NOT to use window functions

Practical patterns worth bookmarking

Deduplicate keeping latest record

Percent of total

Gap detection between dates

Summary

`ROW_NUMBER()`

`RANK()` vs `DENSE_RANK()`

`NTILE()`: bucketing rows

`FIRST_VALUE()` and `LAST_VALUE()`