Most developers know SELECT, JOIN, and GROUP BY. Window functions are where the gap between mid and senior starts to show.
What makes window functions different
GROUP BY collapses rows. You lose the individual record — it becomes part of an aggregate.
Window functions don't collapse anything. They compute an aggregate over a set of rows while keeping every row intact. That's the core idea.
-- GROUP BY: one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- Window function: every employee row, plus their department average
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
Same average calculation. Completely different result shape. Window functions let you ask "how does this row relate to the group?" — something GROUP BY simply can't do.
The anatomy of a window function
function_name() OVER (
PARTITION BY col1, col2 -- defines the "group" (optional)
ORDER BY col3 -- defines row order within the group
ROWS BETWEEN ... -- defines the frame (optional)
)
- PARTITION BY — splits rows into independent windows. Omit it and the entire result set is one window.
- ORDER BY — required for ranking and frame-sensitive functions. Changes the meaning of aggregates.
- frame clause — defines which rows are "visible" to the function. Default varies by function.
Ranking functions
ROW_NUMBER()
Assigns a unique sequential number to each row within the partition. No ties — if two rows are equal, the order is arbitrary but distinct.
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees;
Classic use case: get the top N rows per group without a subquery per group.
-- Top earner per department
SELECT * FROM (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn = 1;
RANK() vs DENSE_RANK()
Both handle ties, but differently:
-
RANK()— tied rows get the same rank, next rank skips. (1, 2, 2, 4) -
DENSE_RANK()— tied rows get the same rank, next rank does not skip. (1, 2, 2, 3)
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
Use DENSE_RANK() when you want "2nd place" to actually mean second, not "tied for second, third place vacant."
Offset functions: LAG and LEAD
These let you access another row's value from the current row — without a self-join.
LAG(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
LEAD(column, offset, default) OVER (PARTITION BY ... ORDER BY ...)
-- Month-over-month revenue change
SELECT
month,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1, 0) OVER (ORDER BY month) AS delta
FROM monthly_revenue;
-- Time between consecutive orders per user
SELECT
user_id,
order_date,
LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS next_order,
LEAD(order_date) OVER (PARTITION BY user_id ORDER BY order_date) - order_date AS days_until_next
FROM orders;
LAG/LEAD replace a category of self-joins that are painful to read and slow to execute. If you see a query joining a table to itself on id - 1 or date arithmetic, it's almost always a LAG in disguise.
Aggregate window functions
Any aggregate (SUM, AVG, COUNT, MIN, MAX) can be used as a window function. This unlocks running totals and moving averages without cursors or subqueries.
Running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
Running total reset per partition
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS user_running_total
FROM orders;
Moving average (last 7 days)
This is where the frame clause matters:
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_revenue;
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW — the window includes the current row and the 6 rows before it. Classic 7-day rolling average.
ROWS vs RANGE
-
ROWS— physical rows.ROWS BETWEEN 6 PRECEDING AND CURRENT ROWmeans exactly 6 rows back. -
RANGE— logical range based on the ORDER BY value.RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROWmeans all rows within 6 days, regardless of row count.
For time series with gaps, RANGE is usually what you actually want.
-- ROWS: always exactly 6 rows back (may span more than 6 days if data is sparse)
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- RANGE: all rows within the last 6 days (correct for sparse data)
AVG(revenue) OVER (ORDER BY date RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW)
NTILE(): bucketing rows
Divides rows into N equal-sized buckets. Useful for percentiles, quartiles, and cohort analysis.
-- Divide customers into 4 quartiles by total spend
SELECT
customer_id,
total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS quartile
FROM customer_spend;
Quartile 1 = top spenders. Combine with a CTE to filter or label each group.
FIRST_VALUE() and LAST_VALUE()
Return the first or last value in the window frame.
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
) AS highest_paid_in_dept
FROM employees;
LAST_VALUE() has a trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so "last" means the current row, not the end of the partition. Fix it explicitly:
LAST_VALUE(name) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
Reading execution plans
Window functions aren't free. Know what to look for in EXPLAIN ANALYZE.
EXPLAIN ANALYZE
SELECT
user_id,
order_date,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date)
FROM orders;
Look for:
- WindowAgg — the window function node. Normal.
-
Sort — a sort before
WindowAgg. May be expensive on large tables. -
Seq Scan vs Index Scan — if your
PARTITION BYorORDER BYcolumns aren't indexed, Postgres will sort the full table.
When to add an index
If the same PARTITION BY / ORDER BY combination appears in multiple queries:
CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);
This lets Postgres avoid a full sort and use the index order directly.
When NOT to use window functions
- Small datasets where a simple subquery is more readable
- When you need to filter on the window result in the same query level — you can't
WHERE rank = 1directly; you need a CTE or subquery - When the frame logic gets complex enough that a CTE makes the intent clearer
Practical patterns worth bookmarking
Deduplicate keeping latest record
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM events
)
SELECT * FROM ranked WHERE rn = 1;
Percent of total
SELECT
department,
SUM(salary) AS dept_total,
SUM(salary) * 100.0 / SUM(SUM(salary)) OVER () AS pct_of_company
FROM employees
GROUP BY department;
Note: SUM(SUM(salary)) OVER () — a window function over an aggregate. This is valid SQL and very useful.
Gap detection between dates
SELECT
user_id,
order_date,
LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order,
order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS gap_days
FROM orders
WHERE gap_days > 30; -- wrap in CTE to filter
Summary
| Function | Use case |
|---|---|
ROW_NUMBER() |
Unique rank, top-N per group, deduplication |
RANK() / DENSE_RANK()
|
Ranking with ties |
LAG() / LEAD()
|
Compare current row to previous/next |
SUM() OVER |
Running totals |
AVG() OVER + frame |
Moving averages |
NTILE() |
Percentiles, quartiles, bucketing |
FIRST_VALUE() / LAST_VALUE()
|
Boundary values within a partition |
Window functions don't replace every query pattern — but they replace the ugly ones. Self-joins, correlated subqueries, and cursor-based loops are often a window function with cleaner syntax and a better execution plan.
Learn the frame clause. It's the thing most tutorials skip, and it's where the real power is.
What's your most-used window function pattern in production? Drop it in the comments.
Tags: sql postgres backend database
Top comments (0)