DEV Community

Lucas Braun
Lucas Braun

Posted on

SQL window functions: what separates a mid-level from a senior developer

Most developers know SELECT, JOIN, and GROUP BY. Window functions are where the gap between mid and senior starts to show.


What makes window functions different

GROUP BY collapses rows. You lose the individual record — it becomes part of an aggregate.

Window functions don't collapse anything. They compute an aggregate over a set of rows while keeping every row intact. That's the core idea.

-- GROUP BY: one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Window function: every employee row, plus their department average
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Same average calculation. Completely different result shape. Window functions let you ask "how does this row relate to the group?" — something GROUP BY simply can't do.


The anatomy of a window function

function_name() OVER (
    PARTITION BY col1, col2   -- defines the "group" (optional)
    ORDER BY col3             -- defines row order within the group
    ROWS BETWEEN ...          -- defines the frame (optional)
)
Enter fullscreen mode Exit fullscreen mode
  • PARTITION BY — splits rows into independent windows. Omit it and the entire result set is one window.
  • ORDER BY — required for ranking and frame-sensitive functions. Changes the meaning of aggregates.
  • frame clause — defines which rows are "visible" to the function. Default varies by function.

Ranking functions

ROW_NUMBER()

Assigns a unique sequential number to each row within the partition. No ties — if two rows are equal, the order is arbitrary but distinct.

SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Classic use case: get the top N rows per group without a subquery per group.

-- Top earner per department
SELECT * FROM (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

RANK() vs DENSE_RANK()

Both handle ties, but differently:

  • RANK() — tied rows get the same rank, next rank skips. (1, 2, 2, 4)
  • DENSE_RANK() — tied rows get the same rank, next rank does not skip. (1, 2, 2, 3)
SELECT
    name,
    salary,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Use DENSE_RANK() when you want "2nd place" to actually mean second, not "tied for second, third place vacant."


Offset functions: LAG and LEAD

These let you access another row's value from the current row — without a self-join.

LAG(column, offset, default)  OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
Enter fullscreen mode Exit fullscreen mode
-- Month-over-month revenue change
SELECT
    month,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS delta
FROM monthly_revenue;
Enter fullscreen mode Exit fullscreen mode
-- Time between consecutive orders per user
SELECT
    user_id,
    order_date,
    LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS next_order,
    LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) - order_date AS days_until_next
FROM orders;
Enter fullscreen mode Exit fullscreen mode

LAG/LEAD replace a category of self-joins that are painful to read and slow to execute. If you see a query joining a table to itself on id - 1 or date arithmetic, it's almost always a LAG in disguise.


Aggregate window functions

Any aggregate (SUM, AVG, COUNT, MIN, MAX) can be used as a window function. This unlocks running totals and moving averages without cursors or subqueries.

Running total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Running total reset per partition

SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS user_running_total
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Moving average (last 7 days)

This is where the frame clause matters:

SELECT
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_revenue;
Enter fullscreen mode Exit fullscreen mode

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW — the window includes the current row and the 6 rows before it. Classic 7-day rolling average.

ROWS vs RANGE

  • ROWS — physical rows. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means exactly 6 rows back.
  • RANGE — logical range based on the ORDER BY value. RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW means all rows within 6 days, regardless of row count.

For time series with gaps, RANGE is usually what you actually want.

-- ROWS: always exactly 6 rows back (may span more than 6 days if data is sparse)
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- RANGE: all rows within the last 6 days (correct for sparse data)
AVG(revenue) OVER (ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)
Enter fullscreen mode Exit fullscreen mode

NTILE(): bucketing rows

Divides rows into N equal-sized buckets. Useful for percentiles, quartiles, and cohort analysis.

-- Divide customers into 4 quartiles by total spend
SELECT
    customer_id,
    total_spend,
    NTILE(4) OVER (ORDER BY total_spend DESC) AS quartile
FROM customer_spend;
Enter fullscreen mode Exit fullscreen mode

Quartile 1 = top spenders. Combine with a CTE to filter or label each group.


FIRST_VALUE() and LAST_VALUE()

Return the first or last value in the window frame.

SELECT
    name,
    department,
    salary,
    FIRST_VALUE(name) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS highest_paid_in_dept
FROM employees;
Enter fullscreen mode Exit fullscreen mode

LAST_VALUE() has a trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so "last" means the current row, not the end of the partition. Fix it explicitly:

LAST_VALUE(name) OVER (
    PARTITION BY department
    ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
Enter fullscreen mode Exit fullscreen mode

Reading execution plans

Window functions aren't free. Know what to look for in EXPLAIN ANALYZE.

EXPLAIN ANALYZE
SELECT
    user_id,
    order_date,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date)
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Look for:

  • WindowAgg — the window function node. Normal.
  • Sort — a sort before WindowAgg. May be expensive on large tables.
  • Seq Scan vs Index Scan — if your PARTITION BY or ORDER BY columns aren't indexed, Postgres will sort the full table.

When to add an index

If the same PARTITION BY / ORDER BY combination appears in multiple queries:

CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);
Enter fullscreen mode Exit fullscreen mode

This lets Postgres avoid a full sort and use the index order directly.

When NOT to use window functions

  • Small datasets where a simple subquery is more readable
  • When you need to filter on the window result in the same query level — you can't WHERE rank = 1 directly; you need a CTE or subquery
  • When the frame logic gets complex enough that a CTE makes the intent clearer

Practical patterns worth bookmarking

Deduplicate keeping latest record

WITH ranked AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM events
)
SELECT * FROM ranked WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

Percent of total

SELECT
    department,
    SUM(salary) AS dept_total,
    SUM(salary) * 100.0 / SUM(SUM(salary)) OVER () AS pct_of_company
FROM employees
GROUP BY department;
Enter fullscreen mode Exit fullscreen mode

Note: SUM(SUM(salary)) OVER () — a window function over an aggregate. This is valid SQL and very useful.

Gap detection between dates

SELECT
    user_id,
    order_date,
    LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order,
    order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS gap_days
FROM orders
WHERE gap_days > 30;  -- wrap in CTE to filter
Enter fullscreen mode Exit fullscreen mode

Summary

Function Use case
ROW_NUMBER() Unique rank, top-N per group, deduplication
RANK() / DENSE_RANK() Ranking with ties
LAG() / LEAD() Compare current row to previous/next
SUM() OVER Running totals
AVG() OVER + frame Moving averages
NTILE() Percentiles, quartiles, bucketing
FIRST_VALUE() / LAST_VALUE() Boundary values within a partition

Window functions don't replace every query pattern — but they replace the ugly ones. Self-joins, correlated subqueries, and cursor-based loops are often a window function with cleaner syntax and a better execution plan.

Learn the frame clause. It's the thing most tutorials skip, and it's where the real power is.


What's your most-used window function pattern in production? Drop it in the comments.


Tags: sql postgres backend database

Top comments (0)