SQL window functions are how candidates jump from "I can write a GROUP BY" to "I can decorate every row with its cohort context without losing the row" — and that single move powers most of the harder sql interview questions in data engineering loops. A window function computes a value for each row by looking at a related set of rows (the window), yet — unlike aggregates with GROUP BY — it never collapses the result set. The original row stays; you simply gain an extra column reflecting partition-level or ordered-neighbor context.
In a typical Postgres-first data engineering interview, you will be graded on five distinct window-function muscles: choosing the right ranking function (ROW_NUMBER vs RANK vs DENSE_RANK), reaching backward or forward across rows with LAG / LEAD, computing running totals and moving averages with SUM / AVG OVER (…), controlling the frame clause (ROWS vs RANGE), and stacking a CTE in front of a window so you can filter on the rank you just produced — the canonical Top-N per group pattern. Every section that follows ends as sql interview questions with answers: a runnable Postgres-flavoured query, a traced execution, an output table, and a concept-by-concept why this works breakdown.
When you want hands-on reps immediately after reading, browse Window functions practice →, drill the window-functions SQL deep dive →, sharpen ranking SQL →, rehearse aggregation SQL →, or widen coverage on the full SQL practice library →.
On this page
- Why SQL window functions matter in data engineering interviews
- Anatomy of
OVER()— PARTITION BY, ORDER BY, frame clause - ROW_NUMBER vs RANK vs DENSE_RANK — pick a ranking on purpose
- Top‑N per group — the CTE + ROW_NUMBER interview pattern
- LAG and LEAD — peek backward and forward across rows
- Running totals and moving averages with SUM / AVG OVER
- Frame clause deep dive — ROWS vs RANGE, UNBOUNDED, default trap
- Choosing a window function (cheat sheet)
- Frequently asked questions
- Practice on PipeCode
1. Why SQL window functions matter in data engineering interviews
Decorate every row with cohort context — without collapsing the result set
What a window function is — at a glance.
- Definition — a function that computes a value for each row by looking at a related set of rows (the window), without collapsing the result.
- Row count guarantee — input row count equals output row count; the window only adds columns, never removes rows.
-
Two "scopes" stacked inside one window —
-
Partition — the rivalry group the row belongs to (defined by
PARTITION BY). - Frame — the subset of the partition the function actually consumes when evaluating the current row (defined by the frame clause).
-
Partition — the rivalry group the row belongs to (defined by
-
Required suffix — every window-function call is followed by an
OVER (...)clause. WithoutOVER, an aggregate likeSUMcollapses rows theGROUP BYway instead of decorating them.
How it differs from GROUP BY (the headline interview talking point).
-
GROUP BYcollapses — one row per group; you cannot ask "show me each employee and their department's average" in the same query without re-joining. - Window function decorates — one row per input; cohort metrics ride alongside, no re-join needed.
-
Same aggregate, different role —
SUM(salary) GROUP BY dept_idreturns one row per department;SUM(salary) OVER (PARTITION BY dept_id)returns one row per employee with the dept total broadcast onto every row. -
Mixed use is legal — a query can have
GROUP BYand a window function; the window evaluates after the grouping collapse, so the window sees the already-grouped rows as its input.
Logical processing order (memorise this — almost every interview asks).
- The clauses execute in this order, regardless of the order you write them:
-
FROM/JOIN— assemble the row source. -
WHERE— pre-filter raw rows. -
GROUP BY— collapse to one row per group. -
HAVING— post-filter the grouped rows. - window functions — decorate the rows that survived above.
-
SELECT— project the final columns. -
ORDER BY— sort the final output.
-
-
Consequence #1 — you cannot reference a window-function alias in the same query's
WHERE,GROUP BY, orHAVING. The alias does not exist yet at those phases. - Consequence #2 — wrap the window in a CTE (or subquery) and filter the alias in the outer query. That is the canonical Top-N per group pattern (covered in §4).
-
Consequence #3 — outer
ORDER BYcannot influence the window's internal ordering; onlyORDER BYinsideOVER (...)does that.
What interviewers listen for when you mention windows.
- Do you say "decorate, don't collapse"? — signals you grok the row-preservation invariant.
- Do you name the logical processing order when explaining why
WHERE rn <= 3does not work? — signals senior fluency. - Do you distinguish partition from frame? — signals you have seen the frame-clause trap (§7).
- Do you reach for a CTE (not a nested subquery) to filter on the rank? — signals readability bias that reviewers reward.
Worked example — average salary attached to every employee
Question. Show every employee with the average salary of their department alongside, so a reviewer can spot under- and over-performers without re-joining an aggregate.
Input. employees(emp_id, name, dept_id, salary).
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Ava | 10 | 92,500 |
| 2 | Omar | 10 | 88,000 |
| 3 | Mei | 20 | 130,400 |
| 4 | Zoe | 20 | 110,200 |
Code.
SELECT emp_id,
name,
dept_id,
salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;
Step-by-step explanation.
- The planner scans
employeesonce. - For each row, it computes
AVG(salary)inside the partition defined bydept_id— every employee in dept 10 sees the same broadcast value. - The original
emp_id,name, andsalaryare preserved; one new columndept_avgis appended. - No
GROUP BYappears in the outer query, so the row count never drops below the input.
Output.
| emp_id | name | dept_id | salary | dept_avg |
|---|---|---|---|---|
| 1 | Ava | 10 | 92,500 | 90,250 |
| 2 | Omar | 10 | 88,000 | 90,250 |
| 3 | Mei | 20 | 130,400 | 120,300 |
| 4 | Zoe | 20 | 110,200 | 120,300 |
Rule of thumb: if the question is "show me each row and its cohort metric," reach for OVER(); if the question is "give me one row per cohort," reach for GROUP BY.
Common beginner mistakes
- Reaching for
GROUP BYwhen the prompt clearly wants every input row preserved — and then re-joining the aggregate back, doubling work. - Trying to put a window function inside a
WHEREclause (WHERE ROW_NUMBER() OVER (...) <= 3) — illegal because windows are evaluated afterWHERE. - Forgetting that
PARTITION BYis optional — without it, the whole result set is one partition and every row sees the global aggregate. - Conflating window with frame — partition is the rivalry group; frame is the moving subset within it.
SQL
Topic — window functions
Window-function SQL problems
SQL
Topic — aggregation
Aggregation drills (the GROUP BY contrast)
2. Anatomy of OVER() — PARTITION BY, ORDER BY, frame clause
Three slots inside OVER(), each doing one well-defined job
Every window function call has the same silhouette:
window_function(args) OVER (
[PARTITION BY expr1, expr2, ...]
[ORDER BY sort_expr [ASC|DESC] [NULLS FIRST|LAST], ...]
[frame_clause]
)
Warehouses disagree on edge-case trivia; they agree on this grammar.
Slot 1 — PARTITION BY (the rivalry group).
- What it does — divides the result set into independent buckets; the window function restarts within each bucket.
- Optional — omit it entirely and the whole result set becomes one partition.
-
Accepts multiple expressions —
PARTITION BY region, channelcreates one bucket per(region, channel)combination. -
Expressions, not just columns —
PARTITION BY DATE_TRUNC('month', ts)partitions by month derived on the fly. -
NULL is a partition value — rows with
NULLin the partition column form their own bucket together. -
Restart semantics — running aggregates, ranks, and
LAG/LEADall reset at every partition boundary.
Slot 2 — ORDER BY (the row sequence inside each partition).
- What it does — defines the order in which rows are processed inside each partition.
-
Required for — every ranking function (
ROW_NUMBER,RANK,DENSE_RANK,NTILE), every neighbour function (LAG,LEAD,FIRST_VALUE,LAST_VALUE,NTH_VALUE), and any running aggregate. -
Optional for partition-wide aggregates —
SUM(x) OVER (PARTITION BY g)withoutORDER BYbroadcasts the total onto every row. -
ASCvsDESC— control the direction; matters for "top first" vs "bottom first" rankings. -
NULLS FIRST/NULLS LAST— explicit NULL placement; defaults vary by dialect (Postgres default isNULLS LASTforASC,NULLS FIRSTforDESC). -
Tiebreaker columns — append a unique-ish column (e.g. primary key) to make
ROW_NUMBERdeterministic across runs. -
Independent of outer
ORDER BY— the outerORDER BYsorts the final output; onlyORDER BYinsideOVER (...)influences the window's internals.
Slot 3 — frame_clause (which rows the function sees).
- What it does — within the ordered partition, restricts the function to a moving subset (the frame) around the current row.
-
Three frame types:
-
ROWS— physical row offsets; deterministic; recommended for moving averages. -
RANGE— logical value offsets against theORDER BYcolumn; peer rows share the frame. -
GROUPS(Postgres 11+) — frame defined by peer groups of equalORDER BYkeys.
-
-
Boundary keywords —
UNBOUNDED PRECEDING,N PRECEDING,CURRENT ROW,N FOLLOWING,UNBOUNDED FOLLOWING. -
Default frame trap (this catches almost every junior candidate):
- With
ORDER BYinsideOVERand no explicit frame → default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. - Without
ORDER BY→ default frame is the entire partition. - That default is why
LAST_VALUEdoes not return the partition's last row unless you override the frame toROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
- With
- We dissect frames in depth in §7 — pay particular attention to ROWS vs RANGE on uneven time series.
Worked example — partition-wide AVG vs running SUM
Question. From a sales(region, sale_date, revenue) table, show every row with both (a) the region's overall average revenue (a partition-wide broadcast, no ORDER BY inside OVER) and (b) the running total of revenue through the current date (a partition-wide aggregate with ORDER BY inside OVER).
Input. sales(region, sale_date, revenue).
| region | sale_date | revenue |
|---|---|---|
| US | 2026-01-01 | 1,000 |
| US | 2026-01-02 | 1,500 |
| US | 2026-01-03 | 1,200 |
| EU | 2026-01-01 | 800 |
| EU | 2026-01-02 | 900 |
Code.
SELECT region,
sale_date,
revenue,
AVG(revenue) OVER (PARTITION BY region) AS region_avg,
SUM(revenue) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
FROM sales;
Step-by-step explanation.
- The planner sorts the rows by
region(and again bysale_datewithin each region) once and reuses the sort for both window expressions. -
region_avguses noORDER BYinsideOVER, so the default frame for that call is the entire partition — every row inUSreads the same broadcast number. -
running_totalincludesORDER BY sale_date, so the default frame becomesRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW— partition rows up to and including the current date. - Each output row keeps its original
region,sale_date, andrevenue; two new columns are appended.
Output.
| region | sale_date | revenue | region_avg | running_total |
|---|---|---|---|---|
| EU | 2026-01-01 | 800 | 850 | 800 |
| EU | 2026-01-02 | 900 | 850 | 1,700 |
| US | 2026-01-01 | 1,000 | 1,233.33 | 1,000 |
| US | 2026-01-02 | 1,500 | 1,233.33 | 2,500 |
| US | 2026-01-03 | 1,200 | 1,233.33 | 3,700 |
Named WINDOW clause — share one definition across multiple functions
When two or more window calls share the same partition and order, name the window once and reuse it:
SELECT region,
sale_date,
revenue,
AVG(revenue) OVER w AS region_avg,
SUM(revenue) OVER w AS running_total
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY sale_date);
This is purely sugar — the plan is identical — but interviewers reward the cleaner read and notice that you know the syntax exists.
Common beginner mistakes
- Leaving
ORDER BYout of theOVERclause when computing a running total — without ordering, the "running" semantics evaporate and you get a partition-wide aggregate broadcast onto every row. - Putting
ORDER BYin the outer query and expecting it to control the window — onlyORDER BYinsideOVERaffects the window; the outerORDER BYonly sorts the final output. - Reusing the same
OVER (…)clause verbatim three or four times — pull it into aWINDOW w AS (…)named window. - Assuming the default frame is "the whole partition" — it is only when
ORDER BYis absent; otherwise it isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Rule of thumb: read every OVER (…) clause as three slots — partition, order, frame — and you will never be surprised by a result again.
SQL
Topic — window functions
Window-function topic lane
SQL
Topic — aggregation
Aggregation SQL drills
3. ROW_NUMBER vs RANK vs DENSE_RANK — pick a ranking on purpose
The three rankers differ only in how they handle ties — that's the whole interview
The ranking trio answers one question with three different contracts: what should ties do? Stating the difference correctly is the most-asked sql window function question on screening calls, and the correct answer takes three sentences.
At-a-glance comparison table.
| Function | Behaviour at equal ORDER BY keys |
Skips rank values after a tie? | Output on (1, 1, 1, 4)
|
|---|---|---|---|
ROW_NUMBER() |
broken arbitrarily by trailing keys (or by the engine if no tiebreaker) | n/a — always emits 1..N | 1, 2, 3, 4 |
RANK() |
tied rows share the same rank | yes — rank values are skipped | 1, 1, 1, 4 |
DENSE_RANK() |
tied rows share the same rank | no — next rank is contiguous | 1, 1, 1, 2 |
ROW_NUMBER() — strict 1..N (no ties allowed).
-
What it does — assigns a unique integer 1..N per partition, in
ORDER BYorder. -
Ties — broken arbitrarily by the engine (or deterministically by your trailing
ORDER BYcolumns). -
Determinism — only guaranteed if
ORDER BYincludes a unique tiebreaker (e.g. the primary key); without one, the same query can return differentrnvalues across runs. -
Use it when —
- You want exactly N rows per partition (deterministic Top-N).
- You need a stable surrogate key inside a partition.
- You want to deduplicate "first occurrence per group" (
WHERE rn = 1).
-
Avoid when — the business rule is "all rows tied at the top should qualify" — that needs
RANKorDENSE_RANK.
RANK() — competition ranking (ties share, next rank skips).
- What it does — tied rows share a rank value; subsequent rank values skip by the number of tied rows.
-
Output shape — three rows tied at rank 1 produce
1, 1, 1, 4(rank 2 and 3 are skipped). - Mental model — track meet: two silver medals, then bronze is "medal 4," not "medal 3."
-
Use it when —
- "Top N including ties" is the business rule (Top 3 sales reps; if 4 are tied at #1, all 4 qualify).
- Reports show medal-style awards and the gap should be visible.
-
Watch out for —
WHERE rk <= 3can return more than 3 rows if there are ties at the top — sometimes that is exactly what you want, sometimes a surprise.
DENSE_RANK() — contiguous ranking (ties share, no gaps).
- What it does — tied rows share a rank value; subsequent rank values are contiguous with no gaps.
-
Output shape — three rows tied at rank 1 produce
1, 1, 1, 2. - Mental model — "rank by distinct value": first distinct value is 1, next distinct value is 2, regardless of how many rows tie.
-
Use it when —
- The business cares about distinct value bands, not row positions ("the 2nd-highest distinct salary").
- You want to slice rows by tier without gaps in tier numbers.
-
Trap —
DENSE_RANK() = 2returns "rows at the second-highest distinct value," not "rows ranked 2 by row position." Phrase the requirement carefully.
Cousins to know (even if not the star of this section).
-
NTILE(n)— bucket the partition intonroughly-equal groups (quartiles, deciles, A/B cohorts). -
PERCENT_RANK()—(rank - 1) / (rows - 1); produces a percentile rank from 0 to 1. -
CUME_DIST()— cumulative distribution;(rows <= current) / total_rows. -
FIRST_VALUE,LAST_VALUE,NTH_VALUE— read a specific position within the frame (see §7 for theLAST_VALUEtrap).
Picking the right ranker — quick decision tree.
-
"Exactly N rows per group" →
ROW_NUMBER()with a tiebreaker. -
"Top N including ties" →
RANK(). -
"Top N distinct values" →
DENSE_RANK(). -
"Quartile / decile" →
NTILE(n). -
"Percentile" →
PERCENT_RANK()orCUME_DIST().
ROW_NUMBER() — deterministic 1..N (needs a tiebreaker)
Use when the consumer wants exactly N distinct integers per partition with no duplicates. To make the result deterministic across runs, always add a unique trailing column to ORDER BY (typically the primary key):
SELECT emp_id,
name,
dept_id,
salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC, emp_id ASC
) AS rn
FROM employees;
Without the emp_id tiebreaker, two employees with the same salary could swap rn values between runs and the same SQL would emit different "top 3" sets.
RANK() vs DENSE_RANK() — gaps vs no gaps
RANK() mirrors competition rankings ("two silver medals, then the next award is bronze 4"). DENSE_RANK() mirrors "ranking by distinct values" ("two rows share rank 1, the next distinct value is rank 2"). Both share rank within ties; only RANK() skips integers.
Question. Rank employees by salary inside each department; show the difference between RANK and DENSE_RANK when two employees tie at the top.
Input. employees(emp_id, name, dept_id, salary).
| emp_id | name | dept_id | salary |
|---|---|---|---|
| 1 | Ava | 10 | 100,000 |
| 2 | Omar | 10 | 100,000 |
| 3 | Lin | 10 | 90,000 |
| 4 | Sam | 10 | 85,000 |
Code.
SELECT emp_id,
name,
salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC, emp_id) AS rn,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rk,
DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dr
FROM employees;
Step-by-step explanation.
- The engine partitions by
dept_id(one partition here,dept_id = 10). - Inside the partition it sorts rows by
salary DESC, then byemp_idfor theROW_NUMBERtiebreaker. - Ava and Omar tie on salary.
ROW_NUMBERbreaks the tie deterministically viaemp_id.RANKandDENSE_RANKboth award them rank 1. - After the tie,
RANKjumps to 3 (skipping 2);DENSE_RANKcontinues at 2.
Output.
| emp_id | name | salary | rn | rk | dr |
|---|---|---|---|---|---|
| 1 | Ava | 100,000 | 1 | 1 | 1 |
| 2 | Omar | 100,000 | 2 | 1 | 1 |
| 3 | Lin | 90,000 | 3 | 3 | 2 |
| 4 | Sam | 85,000 | 4 | 4 | 3 |
Common beginner mistakes
- Using
ROW_NUMBER()without a tiebreaker column, then being surprised that the same query returns different "top 3" rows across runs. - Treating
RANKandDENSE_RANKas interchangeable — they only agree when there are zero ties. - Picking
RANKwhen the business definition is "rank by distinct value" — that'sDENSE_RANK. - Trying to express "top 3" with
RANK() <= 3and forgetting that ties at rank 1 could push you past 3 rows.
Rule of thumb: if the question is "top-N rows" (deterministic count), reach for ROW_NUMBER with a tiebreaker. If the question is "top-N values" (with ties), reach for DENSE_RANK. Use RANK only when business explicitly wants gaps.
SQL
Topic — ranking
Ranking SQL problems
SQL
Topic — window functions
Window functions SQL drills
4. Top‑N per group — the CTE + ROW_NUMBER interview pattern
Materialise ranks in a CTE, then filter in the outer query
The classic Top-N pattern shows up in onsite SQL rounds at every FAANG-tier data engineering loop. The shape is always the same: (1) decorate every row with a rank inside its group, (2) filter to keep only ranks ≤ N, (3) return the survivors. Because window functions evaluate after WHERE in the same SELECT, you cannot fold both steps into one query — you decorate inside a CTE, then filter in the outer query.
Why two beats are mandatory (the logical processing order recap).
-
WHEREruns before window functions — at the momentWHEREevaluates, the window alias (rn,rk) does not exist yet. -
HAVINGandGROUP BYalso run before windows — you cannot reference the window alias there either. -
Window aliases only become referenceable in —
- The
SELECTprojection of the same query. - The outer
ORDER BYof the same query. - Any query that consumes the current one as a CTE / subquery / view.
- The
-
Therefore Top-N requires two query layers:
- Inner layer (CTE / subquery) — compute the rank.
-
Outer layer — filter on the rank with a plain
WHERE rk <= N.
Choice of ranker — three different "Top-N" contracts.
-
ROW_NUMBER()— exactly N rows per group, no exceptions.- Requires a deterministic tiebreaker in
ORDER BY. - Best when downstream systems assume "exactly N per category."
- Requires a deterministic tiebreaker in
-
RANK()— top N positions including ties; gaps after ties may push past N rows.- Best when the business rule is "everyone tied for second still qualifies."
-
DENSE_RANK()— top N distinct value bands; ties share bands, no gaps in band numbers.- Best when ranking by "highest 2 distinct salaries," not "2 highest-paid people."
Why a CTE is preferred over a nested subquery.
-
Readable for review — each named CTE has a documented purpose (
ranked,top_two, …) which reviewers can scan in seconds. - Refactor-friendly — comment out one CTE while debugging without re-wiring parentheses.
- Equivalent plan — modern optimisers (Postgres 12+) treat ordinary CTEs as inlined subqueries by default, so readability comes free of cost.
-
MATERIALIZEDhint when you need it — Postgres lets you force CTE materialisation withWITH cte AS MATERIALIZED (...)if the planner is making a bad inlining choice.
Performance considerations.
-
Indexed
(partition, order)columns — an index on(category, total_spend DESC)lets the planner avoid an explicit sort for the window. - Pre-filter inside the CTE — push date filters and tenancy filters as early as possible to shrink the input the window sorts.
-
Avoid full table scans on huge tables — for very wide tables, a covering index on the window's
PARTITION BY+ORDER BYcolumns is usually the biggest win.
Alternatives worth knowing (interview bonus points).
-
DISTINCT ON(Postgres-specific) —SELECT DISTINCT ON (category) … ORDER BY category, total_spend DESCreturns one row per category by the ordering — a Postgres shortcut for "Top-1 per group" without a window. -
LATERALjoin —… JOIN LATERAL (SELECT … FROM t2 WHERE t2.cat = t1.cat ORDER BY … LIMIT N) sub ON truereturns Top-N per group when the outer table is small and indexed lookups are cheap. -
QUALIFYclause (Snowflake / BigQuery) —SELECT … FROM t QUALIFY ROW_NUMBER() OVER (...) <= Ncollapses the CTE+filter into one query; mention it when interviewing for a Snowflake / BigQuery shop.
Worked example — top 2 products by revenue per category
Question. Show the top 2 highest-grossing products within each category in 2022, including the total spend for each. Ties on spend should share a rank slot; we use RANK() so the business rule "everyone tied for second still qualifies" is honoured.
Input. product_spend(category, product, user_id, spend, transaction_date).
| category | product | spend | transaction_date |
|---|---|---|---|
| appliance | refrigerator | 246.00 | 2021-12-26 |
| appliance | refrigerator | 299.99 | 2022-03-02 |
| appliance | washing machine | 219.80 | 2022-03-02 |
| electronics | vacuum | 152.00 | 2022-04-05 |
| electronics | wireless headset | 249.90 | 2022-07-08 |
| electronics | vacuum | 189.00 | 2022-07-15 |
Code.
WITH ranked AS (
SELECT category,
product,
SUM(spend) AS total_spend,
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS rk
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY category, product
)
SELECT category,
product,
total_spend
FROM ranked
WHERE rk <= 2
ORDER BY category, rk;
Step-by-step explanation.
- The
WHERE EXTRACT(YEAR FROM transaction_date) = 2022filter applies before the window — the planner shrinks the input to 2022 rows. -
GROUP BY category, productcollapses to one row per(category, product)withSUM(spend)precomputed. - The window function
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC)decorates each group-row with its in-category rank. Crucially, the window runs inside the CTE'sSELECT, after theGROUP BY. - The outer query filters
rk <= 2— legal here becauserkis a column of the materialised CTE relation, not a same-level window alias. - Final
ORDER BY category, rkcontrols only the output ordering, not the window.
SQL interview question — top 2 highest-grossing products per category in 2022
Assume product_spend(category, product, user_id, spend, transaction_date) is populated as above. Return the top 2 products per category in 2022 by SUM(spend), breaking nothing about ties — if two products tie for second, both should appear.
Solution Using a RANK-then-filter CTE
Code solution.
WITH ranked AS (
SELECT category,
product,
SUM(spend) AS total_spend,
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS rk
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY category, product
)
SELECT category,
product,
total_spend
FROM ranked
WHERE rk <= 2
ORDER BY category, rk;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan product_spend
|
filter to 2022 rows only |
| 2 | GROUP BY category, product |
one row per (category, product) with total_spend
|
| 3 | RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) |
each group-row now carries an in-category rank; ties share a slot, next rank skips |
| 4 |
WHERE rk <= 2 (outer) |
predicate on the materialised CTE column — legal here, illegal next to the window itself |
| 5 | Outer ORDER BY category, rk
|
output ordering only |
Output:
| category | product | total_spend |
|---|---|---|
| appliance | refrigerator | 299.99 |
| appliance | washing machine | 219.80 |
| electronics | vacuum | 341.00 |
| electronics | wireless headset | 249.90 |
Why this works — concept by concept:
-
CTE materialisation — the
rankedCTE names the decorated relation so the outerWHEREcan reference the window aliasrk; this is not optimisation theatre but a hard rule of the logical processing order. -
RANK semantics — ties at the same
SUM(spend)would share a rank value and still be admitted byrk <= 2; that's the business rule "both runners-up qualify." Swap toROW_NUMBER()(with a tiebreaker) only when "exactly two rows per category, always" is the contract. -
PARTITION BY rivalry — each category is its own ranking universe;
electronicsrows never compete withappliancerows. -
Filter after decoration —
WHERE rk <= 2is a post-rank predicate; trying to push it down into the CTE'sWHEREwould be a syntax error. -
Cost — typically Θ(n log n) for the partitioned sort underneath the window (n = rows after the year filter); a covering index on
(category, transaction_date)can shrink the input cost but the per-partition sort is the dominant term.
SQL
Topic — window functions
Top-N window SQL set
SQL
Topic — ranking
Ranking SQL problems
SQL
Topic — CTE/SQL
CTE-with-window combos
5. LAG and LEAD — peek backward and forward across rows
Compare each row to an ordered neighbor without self-joining
The LAG and LEAD functions answer "what was the value one (or N) rows back / forward inside this ordered partition?" — period-over-period deltas, gap detection, sessionization, repeated-event flagging, and "did the user do X immediately after Y" questions all collapse to a single LAG or LEAD call.
Syntax and arguments.
-
LAG(expr [, offset [, default]]) OVER (PARTITION BY … ORDER BY …)— reads backward. -
LEAD(expr [, offset [, default]]) OVER (PARTITION BY … ORDER BY …)— reads forward. -
expr— the column or expression you want from the neighbouring row (e.g.revenue,event_ts). -
offset— how many rows back (or forward); optional, defaults to 1. -
default— the fallback value returned when the offset walks off the edge of the partition; optional, defaults toNULL.
Direction and partition behaviour.
-
Ordered by
OVER'sORDER BY— withoutORDER BYinsideOVER, the result is undefined; engines may emit a warning or pick an implementation-dependent order. -
LAGdirection — pulls the value from the row that isoffsetpositions earlier in the partition's ordered sequence. -
LEADdirection — pulls from the row that isoffsetpositions later. -
Partition awareness — the offset never crosses a
PARTITION BYboundary; the first row of each partition reads thedefault(orNULL) forLAG, the last row reads the same forLEAD. -
NULLShandling — PostgreSQL always uses the standard'sRESPECT NULLSmode;IGNORE NULLS(skip past NULLs to find the next non-NULL neighbour) is not implemented in Postgres.
Interview use case — period-over-period deltas.
-
Pattern —
metric - LAG(metric) OVER (PARTITION BY entity ORDER BY period). -
Examples —
- Month-over-month revenue change per region.
- Daily active users delta vs yesterday per app.
- Stock price change vs previous trading day per ticker.
-
Why it beats self-join — one window pass vs a join on
t.period = t2.period + INTERVAL '1 month'; faster and clearer.
Interview use case — gap and idle-time detection.
-
Pattern —
event_ts - LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts). -
Examples —
- Flag user sessions that crossed a 30-minute idle threshold.
- Detect SLA breaches when the interval between heartbeats exceeds N seconds.
- Find missing days in a daily timestream (where the gap > 1 day).
-
Sessionization extension — combine
LAGwith a cumulativeSUMover aCASE WHEN gap > 30 minutes THEN 1 ELSE 0 ENDto label session IDs without procedural code.
Interview use case — duplicate / repeated event flagging.
-
Pattern — partition by the duplicate signature, order by timestamp, compare
LAG(ts). -
Examples —
- "Repeated payments within 10 minutes" (Stripe-style; see this section's solution).
- "Refresh storms" — multiple identical reads inside N seconds.
- "Form re-submits" — same form payload submitted twice within a minute.
Interview use case — sequential pattern matching.
-
Pattern —
LAG(prev_event)then filter onprev_event = 'X' AND current_event = 'Y'. -
Examples —
- "Did the user buy AirPods immediately after iPhone?" (Apple-style).
- "Did the visitor view the pricing page right before signup?"
- "Did the order get refunded immediately after delivery?"
Bonus power moves (mention these in onsites).
-
LAG(x, n, default)— replace the leadingNULLwith0,'', or a sentinel; saves aCOALESCElater. -
Multiple
LAGs —LAG(x, 1),LAG(x, 2),LAG(x, 7)to read 1 day ago, 2 days ago, and a week ago in one pass. -
LAG+CASE— combine to encode state transitions like "row changed from approved to denied." -
LEADfor future-aware features — useful for "predicted churn" labels where the label depends on the next 30-day behaviour.
Worked example — month-over-month revenue delta per region
Question. For each (region, month) row, compute the prior month's revenue and the month-over-month delta. The first month per region should show NULL for the prior value and the delta.
Input. monthly_revenue(region, month_start, revenue).
| region | month_start | revenue |
|---|---|---|
| US | 2026-01-01 | 10,000 |
| US | 2026-02-01 | 12,000 |
| US | 2026-03-01 | 11,500 |
| EU | 2026-01-01 | 8,000 |
| EU | 2026-02-01 | 9,500 |
Code.
SELECT region,
month_start,
revenue,
LAG(revenue) OVER (PARTITION BY region ORDER BY month_start) AS prev_revenue,
revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month_start) AS mom_delta
FROM monthly_revenue;
Step-by-step explanation.
- Rows are partitioned by
region; inside each partition they are ordered bymonth_startascending. -
LAG(revenue)returns the previous row'srevenueinside the same partition; the very first row of each partition readsNULL. - The delta is computed as
revenue - LAG(revenue) OVER (…); the first row's delta isNULLbecause subtractingNULLpropagatesNULLin standard SQL. - No self-join; one sequential scan plus a window sort per partition.
Output.
| region | month_start | revenue | prev_revenue | mom_delta |
|---|---|---|---|---|
| EU | 2026-01-01 | 8,000 | NULL | NULL |
| EU | 2026-02-01 | 9,500 | 8,000 | 1,500 |
| US | 2026-01-01 | 10,000 | NULL | NULL |
| US | 2026-02-01 | 12,000 | 10,000 | 2,000 |
| US | 2026-03-01 | 11,500 | 12,000 | -500 |
Common beginner mistakes
- Forgetting
PARTITION BYand producing deltas that cross partition boundaries (e.g. EU February reading US December as its "previous" row). - Using
LAGwithoutORDER BYinsideOVER— the offset is undefined and the engine may emit a warning or implementation-dependent results. - Trying to do period-over-period with a self-join on
month - 1and then asking the optimiser to clean it up —LAGis faster and clearer. - Ignoring the
defaultargument — replacing the first row'sNULLwith 0 (or any sentinel) is a one-character change:LAG(revenue, 1, 0).
SQL interview question — flag repeated payments within 10 minutes
Assume transactions(transaction_id, merchant_id, credit_card_id, amount, transaction_timestamp). Identify any payment made at the same merchant on the same credit card for the same amount within 10 minutes of a prior payment, and count such repeated payments.
Solution Using LAG on transaction_timestamp
Code solution.
WITH payments AS (
SELECT merchant_id,
credit_card_id,
amount,
transaction_timestamp,
EXTRACT(EPOCH FROM transaction_timestamp -
LAG(transaction_timestamp) OVER (
PARTITION BY merchant_id, credit_card_id, amount
ORDER BY transaction_timestamp
)
) / 60.0 AS minutes_since_prev
FROM transactions
)
SELECT COUNT(*) AS repeated_payment_count
FROM payments
WHERE minutes_since_prev <= 10;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan transactions
|
input multiset of payments |
| 2 | PARTITION BY merchant_id, credit_card_id, amount |
partitions group payments that share the duplicate-signature |
| 3 |
ORDER BY transaction_timestamp (inside OVER) |
establishes chronological ordering inside each partition |
| 4 | LAG(transaction_timestamp) OVER (…) |
each row reads the previous matching payment's timestamp; first row per partition reads NULL
|
| 5 | EXTRACT(EPOCH FROM diff) / 60.0 |
converts the interval to minutes; NULL propagates for the first-of-partition rows and they fail the predicate harmlessly |
| 6 | Outer WHERE minutes_since_prev <= 10 + COUNT(*)
|
counts every row that landed within 10 minutes of its predecessor signature |
Output:
| repeated_payment_count |
|---|
| 1 |
(Demonstrative count based on the DataLemur Stripe example schema; the shape generalises to any deduplication-by-time problem.)
Why this works — concept by concept:
-
PARTITION BY duplicate-signature — by partitioning on
(merchant_id, credit_card_id, amount), the window restricts neighbor-comparison to payments that could plausibly be duplicates; cross-signature comparisons are noise. -
ORDER BY transaction_timestamp — the offset semantics need a chronological direction; without it
LAGis undefined. -
LAG temporal-neighbor — replaces a self-join on
t1.ts - t2.ts <= INTERVAL '10 minutes'with a single window pass; the planner avoids the quadratic join cost. -
EXTRACT(EPOCH FROM …) / 60.0 — converts the timestamp difference (an
INTERVAL) into minutes as a numeric so the outer predicate compares apples to apples. -
Predicate after decoration — same lesson as §4; the
WHERElives outside the CTE because the window alias does not yet exist inside it. - Cost — Θ(n log n) per partitioned sort; far cheaper than the Θ(n²) worst-case of a naive temporal self-join when duplicates are sparse.
SQL
Topic — window functions
LAG / LEAD SQL practice
SQL
Topic — aggregation
Aggregation drills (period roll-ups)
6. Running totals and moving averages with SUM / AVG OVER
Aggregates become window functions the moment OVER() follows them
Any ordinary aggregate (SUM, AVG, COUNT, MIN, MAX) behaves as a window function when the call is followed by an OVER(...) clause; the same aggregate without OVER collapses rows the GROUP BY way. The behaviour depends on three knobs: the partition, the order, and the frame.
Which aggregates work as windows.
-
All standard aggregates —
SUM,AVG,COUNT,COUNT(DISTINCT …)(where the dialect allows),MIN,MAX. -
Statistical aggregates —
STDDEV,VARIANCE,PERCENTILE_CONT,PERCENTILE_DISC(dialect-dependent). -
String aggregates —
STRING_AGG(Postgres),LISTAGG(Snowflake / Oracle),GROUP_CONCAT(MySQL) — most support windowing. - User-defined aggregates — any UDAF that the engine has marked window-safe.
-
Distinct nuance — Postgres allows
COUNT(DISTINCT x) OVER (...)only in some forms; if a dialect rejects it, wrap the distinct logic in a CTE first.
Three behaviour modes, one syntax — picked by the OVER clause shape.
-
Mode A — partition-wide broadcast (no
ORDER BYinsideOVER):- Frame defaults to the entire partition.
-
SUM(revenue) OVER (PARTITION BY region)→ every row gets the region's total. - Use case: attach group totals to detail rows for percentage-of-total calculations.
-
Mode B — running aggregate (
ORDER BYinsideOVER, no explicit frame):- Default frame becomes
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. -
SUM(revenue) OVER (PARTITION BY region ORDER BY sale_date)→ cumulative through current date. - Use case: running totals, running counts, running max / min ("watermark" tracking).
- Default frame becomes
-
Mode C — sliding-window aggregate (explicit
ROWS BETWEEN …frame):- Trailing N-row moving average:
ROWS BETWEEN (N-1) PRECEDING AND CURRENT ROW. - Centred N-row moving average:
ROWS BETWEEN (N/2) PRECEDING AND (N/2) FOLLOWING. - Leading window:
ROWS BETWEEN CURRENT ROW AND N FOLLOWING(useful for forward-looking features).
- Trailing N-row moving average:
Running totals — the canonical recipe.
-
Shape —
SUM(metric) OVER (PARTITION BY entity ORDER BY ordering_col). - What you get — cumulative metric per entity, restarting at every partition boundary.
-
No
PARTITION BY— produces a single global running total across the whole result set. -
Variants worth knowing:
-
COUNT(*) OVER (PARTITION BY user_id ORDER BY event_ts)→ "this is your Nth event." -
MAX(price) OVER (PARTITION BY ticker ORDER BY trade_ts)→ running high-water mark per ticker. -
MIN(salary) OVER (PARTITION BY dept_id ORDER BY hire_date)→ running floor per department.
-
Moving averages — pick ROWS or RANGE deliberately.
-
ROWS BETWEEN (N-1) PRECEDING AND CURRENT ROW— physical row offsets:- Counts exactly N rows regardless of gaps in dates.
- Right for "last 7 trading days" when weekends are missing.
-
RANGE BETWEEN '6 days' PRECEDING AND CURRENT ROW— logical date offsets (Postgres):- Covers every row whose
ORDER BYvalue lies within6 daysof the current row. - Right for "last 7 calendar days" semantics on uneven dates.
- Covers every row whose
- Leading edges are partial — the first N-1 rows of each partition have fewer than N rows in the frame; mention that in interviews ("the first 6 days of a 7-day MA aren't fully populated").
Partition reset and broadcast tricks.
-
PARTITION BYresets the aggregate — switching regions zeroes the running total again. -
No
PARTITION BY= single global window — useful for "rank of this row against the whole dataset." -
Combine running and broadcast in one query — name two
WINDOWclauses; the planner can share the sort if partition and order align.
Cost and indexing notes.
-
One sort per unique window spec — two functions with the same
PARTITION BY+ORDER BYshare a sort; differing specs need separate sorts. -
Index that supports the window — a B-tree on
(partition_cols, order_col)lets the engine skip the sort entirely. -
Memory bound — sliding-window frames are streaming; partition-wide aggregates buffer the whole partition; size your
work_memaccordingly on big partitions. -
Order is
Θ(n log n)worst-case — dominated by the sort underneath the window unless the index removes it.
Worked example — daily running total and 7-day trailing moving average per region
Question. From daily_sales(region, sale_date, revenue), produce the daily revenue, the running total of revenue per region, and the 7-day trailing moving average of revenue per region (counting the current day).
Input. A small daily_sales slice (one region shown for brevity).
| region | sale_date | revenue |
|---|---|---|
| US | 2026-01-01 | 100 |
| US | 2026-01-02 | 150 |
| US | 2026-01-03 | 200 |
| US | 2026-01-04 | 180 |
| US | 2026-01-05 | 220 |
Code.
SELECT region,
sale_date,
revenue,
SUM(revenue) OVER (
PARTITION BY region
ORDER BY sale_date
) AS running_total,
AVG(revenue) OVER (
PARTITION BY region
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma7
FROM daily_sales;
Step-by-step explanation.
- Partitioning by
regionresets both aggregates at each region boundary. -
running_totaluses the default frame (everything up to and including the current row in date order). -
ma7uses an explicitROWS BETWEEN 6 PRECEDING AND CURRENT ROWframe — exactly 7 physical rows (or fewer at the leading edge of a region). - Both aggregates share the same partition and order, so an optimiser can compute them in a single sort pass.
Output.
| region | sale_date | revenue | running_total | ma7 |
|---|---|---|---|---|
| US | 2026-01-01 | 100 | 100 | 100 |
| US | 2026-01-02 | 150 | 250 | 125 |
| US | 2026-01-03 | 200 | 450 | 150 |
| US | 2026-01-04 | 180 | 630 | 157.50 |
| US | 2026-01-05 | 220 | 850 | 170 |
SQL interview question — daily revenue and a 7-day trailing moving average per region
Assume daily_sales(region, sale_date, revenue). Return every (region, sale_date) row with that day's revenue, the region's running total through that date, and the 7-day trailing moving average — counting the current day as the seventh day of the window.
Solution Using SUM OVER and AVG OVER with an explicit ROWS frame
Code solution.
SELECT region,
sale_date,
revenue,
SUM(revenue) OVER w_running AS running_total,
AVG(revenue) OVER w_ma7 AS ma7
FROM daily_sales
WINDOW
w_running AS (PARTITION BY region ORDER BY sale_date),
w_ma7 AS (PARTITION BY region ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
ORDER BY region, sale_date;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan daily_sales
|
input rows |
| 2 | Sort by (region, sale_date) once |
shared sort used by both named windows |
| 3 | Evaluate w_running
|
default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; cumulative SUM per region |
| 4 | Evaluate w_ma7
|
explicit ROWS BETWEEN 6 PRECEDING AND CURRENT ROW; trailing 7-row physical window |
| 5 | Final ORDER BY region, sale_date
|
output ordering only |
Output:
| region | sale_date | revenue | running_total | ma7 |
|---|---|---|---|---|
| US | 2026-01-01 | 100 | 100 | 100 |
| US | 2026-01-02 | 150 | 250 | 125 |
| US | 2026-01-03 | 200 | 450 | 150 |
| US | 2026-01-04 | 180 | 630 | 157.50 |
| US | 2026-01-05 | 220 | 850 | 170 |
Why this works — concept by concept:
-
Aggregate-with-OVER — the same
SUM/AVGyou know fromGROUP BYbecomes a window function the moment you appendOVER (…); row identity survives, the cohort context attaches. -
Default frame for running total —
ORDER BY sale_dateplus no explicit frame meansRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW— a textbook running total without writing the frame at all. -
ROWS vs RANGE for moving average —
ROWS BETWEEN 6 PRECEDING AND CURRENT ROWis physical; for a daily series it is unambiguous.RANGEwould interpret the same numbers asINTERVAL '6 days'-style peer ranges and behave differently when dates are missing. -
Named WINDOW clause — two functions share
(PARTITION BY region ORDER BY sale_date); pulling that intoWINDOW w_running AS (…)removes copy-paste and signals that you know the syntax. -
PARTITION BY scope reset — each region's running total restarts from zero; without
PARTITION BYthe cumulative sum would bleed across regions. - Cost — Θ(n log n) for one partitioned sort, Θ(n) for the two streaming aggregates over the sorted output; one sort feeds both windows.
SQL
Topic — aggregation
Aggregation SQL drills
SQL
Topic — window functions
Running totals and moving averages
7. Frame clause deep dive — ROWS vs RANGE, UNBOUNDED, default trap
The frame controls what each row actually "sees" inside its partition
The frame clause is the third slot in OVER (…) and the slot most candidates skip — which is why interviewers love asking about it. PARTITION BY defines the rivalry group; ORDER BY (inside OVER) defines the row sequence; the frame defines the subset of the ordered partition the window function actually consumes when evaluating the current row.
ROWS — physical row offsets (deterministic, peer-blind).
- What it does — counts physical rows in the ordered partition; the current row is row 0.
-
Deterministic — counts exactly the requested number of rows even when several rows tie on the
ORDER BYvalue. -
ROWS BETWEEN N PRECEDING AND M FOLLOWING— frame coversNrows before plus the current row plusMrows after (N + M + 1rows total when the frame is fully inside the partition). -
Use it when —
- You want exactly N rows in the frame (moving averages, trailing sums).
- The
ORDER BYcolumn has ties and you do not want peer-sharing semantics. - The series is uneven (missing dates) and you want "last 7 actual rows," not "last 7 calendar days."
- Edge effect — at the leading edge of a partition there may be fewer than N rows; the frame contracts silently.
RANGE — logical value offsets (peer-sharing).
-
What it does — frame is defined by a logical value range around the current row's
ORDER BYvalue. -
Peer behaviour — rows with equal
ORDER BYvalues are peers and share a frame slot, so symmetric aggregates produce identical values for tied rows. -
RANGE BETWEEN '6 days' PRECEDING AND CURRENT ROW— covers every row whoseORDER BYvalue is within 6 days of the current row. -
Default frame — when
ORDER BYis present and you specify nothing, the default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. -
Use it when —
- You want "last 7 calendar days" semantics, even when some days are missing rows.
- You want tied-row consistency (all rows tied at
score = 100get the same running rank).
- Avoid for — moving averages on uneven time series; ROWS is usually safer.
GROUPS — peer-group offsets (Postgres 11+, less common).
-
What it does — frame is defined by peer groups of equal
ORDER BYkeys; offsets count peer groups, not rows. -
GROUPS BETWEEN 2 PRECEDING AND CURRENT ROW— covers the current peer group plus the two previous peer groups. -
Use it when —
- You want "last 3 distinct values worth of rows" semantics.
- Reporting on a tiered dataset where each distinct
ORDER BYvalue is a tier.
-
Niche — most interview questions stick with
ROWSandRANGE.
Boundary keywords — five anchors you compose into frames.
-
UNBOUNDED PRECEDING— first row of the partition (or the lowestORDER BYvalue forRANGE). -
N PRECEDING—Nrows before the current row (physical forROWS, value-based forRANGE). -
CURRENT ROW— the current row (or the current peer group forRANGE). -
N FOLLOWING—Nrows after the current row. -
UNBOUNDED FOLLOWING— last row of the partition.
Default-frame trap — LAST_VALUE and friends.
-
The trap — with
ORDER BYpresent and no explicit frame, the default isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. -
Consequence —
LAST_VALUE(price) OVER (PARTITION BY g ORDER BY price)returns the current row's price, not the partition's last row, because the default frame ends at the current row. -
The fix —
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGopens the frame to the whole partition. -
Same trap affects —
LAST_VALUE,NTH_VALUE, partition-wideMAX/MINcalls that includeORDER BYfor some other reason. -
One-liner check — whenever you write
ORDER BYinsideOVER, ask: "is the default frame what I want?" If no, override it explicitly.
Frame recipe cheat-bullets (memorise these).
-
Running total —
ORDER BY t(default frame isUNBOUNDED PRECEDING → CURRENT ROW; fine). -
Trailing 7-row moving average —
ORDER BY t ROWS BETWEEN 6 PRECEDING AND CURRENT ROW. -
Centred 7-row moving average —
ORDER BY t ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING. -
Leading 7-day forecast feature —
ORDER BY t ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING. -
Partition max broadcast onto every row — no
ORDER BYinsideOVER, ORROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. -
Calendar-day window on uneven dates —
RANGE BETWEEN '6 days' PRECEDING AND CURRENT ROW(Postgres date / timestampORDER BY).
Worked example — LAST_VALUE and the default-frame trap
Question. From products(product_id, group_id, price) return each product with the highest price inside its group attached. A naive LAST_VALUE will silently return the wrong value because of the default frame.
Input. products(product_id, group_id, price).
| product_id | group_id | price |
|---|---|---|
| 1 | Tablet | 150 |
| 2 | Tablet | 200 |
| 3 | Tablet | 700 |
Code (the trap).
SELECT product_id,
group_id,
price,
LAST_VALUE(price) OVER (
PARTITION BY group_id
ORDER BY price
) AS highest_price_per_group_wrong
FROM products;
The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE reads only rows up to the current row — it returns the current price for each row, not the partition max.
Code (the fix).
SELECT product_id,
group_id,
price,
LAST_VALUE(price) OVER (
PARTITION BY group_id
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS highest_price_per_group
FROM products;
Step-by-step explanation.
- Partitioning by
group_idisolates each product group. - The explicit
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGframe opens the full partition toLAST_VALUE. - With
ORDER BY priceascending and the full-partition frame,LAST_VALUE(price)correctly returns the highest price per group.
Output.
| product_id | group_id | price | highest_price_per_group |
|---|---|---|---|
| 1 | Tablet | 150 | 700 |
| 2 | Tablet | 200 | 700 |
| 3 | Tablet | 700 | 700 |
Common frame recipes
| Intent | Frame to use |
|---|---|
| Running total / cumulative count |
ORDER BY t (default frame is fine) |
| Trailing N-row moving average | ROWS BETWEEN (N-1) PRECEDING AND CURRENT ROW |
| Centred N-row moving average | ROWS BETWEEN (N/2) PRECEDING AND (N/2) FOLLOWING |
| Partition-wide aggregate broadcast | no ORDER BY inside OVER, or ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
LAST_VALUE returning the partition's last row |
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
Common beginner mistakes
- Reading the documentation's "default frame" and assuming it covers the whole partition — it does not when
ORDER BYis present. - Using
RANGEfor a moving average — when theORDER BYcolumn has ties,RANGEshares the frame across peers and you get an over-counted average for the tied rows. - Forgetting that
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGis the same as no order, full partition — they produce identical results for symmetric aggregates.
Rule of thumb: if you write ORDER BY inside OVER, always consider whether the default frame is what you want — and override it the moment the answer is "no".
SQL
Topic — window functions
Frame-aware window practice
SQL
Topic — ranking
Ranking + frame combos
Choosing a window function (cheat sheet)
| You want … | Reach for … | Frame to consider |
|---|---|---|
| Exactly N rows per partition with a deterministic order |
ROW_NUMBER() with a tiebreaker |
n/a — ROW_NUMBER ignores frames |
| Ranked rows where ties share a slot and the next rank skips | RANK() |
n/a |
| Ranked rows where ties share a slot and the next rank is contiguous | DENSE_RANK() |
n/a |
| Previous / next neighbour value in an ordered partition |
LAG(expr, n, default) / LEAD(expr, n, default)
|
n/a |
| Running total or cumulative count |
SUM() / COUNT() OVER (PARTITION BY … ORDER BY …)
|
default (RANGE … UNBOUNDED PRECEDING TO CURRENT ROW) |
| Trailing N-row moving average |
AVG() OVER (… ORDER BY … ROWS BETWEEN (N-1) PRECEDING AND CURRENT ROW)
|
physical ROWS
|
| Partition max / min broadcast onto every row |
MAX() / MIN() OVER (PARTITION BY …)
|
omit ORDER BY or use UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
| First / last value in a partition |
FIRST_VALUE() / LAST_VALUE()
|
for LAST_VALUE use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
| Bucket rows into N percentile bands |
NTILE(N) OVER (ORDER BY …)
|
n/a |
Frequently asked questions
What's the difference between ROW_NUMBER, RANK, and DENSE_RANK in SQL?
ROW_NUMBER() assigns a unique sequential integer 1..N inside each partition; ties are broken by trailing ORDER BY columns or, if you provide none, by the engine. RANK() shares the same rank value across tied rows and then skips rank values — three rows tied at rank 1 produce a 1, 1, 1, 4 sequence. DENSE_RANK() also shares ranks across ties but does not skip — the same input produces 1, 1, 1, 2. Use ROW_NUMBER for deterministic Top-N slicing with a tiebreaker, RANK for competition-style rankings, and DENSE_RANK for "rank by distinct value" semantics.
How do LAG and LEAD work in SQL?
LAG(expr, offset, default) returns the value of expr from the row that is offset rows before the current row inside the same partition, ordered by the OVER clause's ORDER BY. LEAD(expr, offset, default) does the same but reads forward. Both default offset to 1 and default to NULL. They never cross partition boundaries — the first row of each partition reads the default (or NULL) for LAG, and the last row reads the same for LEAD. Reach for them whenever a question requires comparing the current row to its previous or next ordered neighbour without writing a self-join.
How do I write a running total with a SQL window function?
Use SUM(metric) OVER (PARTITION BY partition_cols ORDER BY ordering_col) and rely on the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The result is a cumulative sum from the first row of each partition through the current row, with the partition restarting the count at every boundary. If you want the cumulative total to count rows physically (rather than by ORDER BY value peers), add ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly. Drop PARTITION BY to get a global running total across the entire result set.
What's the frame clause and what's the default frame in SQL window functions?
The frame clause is the optional third slot inside OVER (…) — ROWS, RANGE, or GROUPS BETWEEN <lower> AND <upper> — that defines which subset of the ordered partition the function actually sees when evaluating each row. The default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW whenever ORDER BY is present inside OVER and you do not specify a frame; if ORDER BY is also absent, the frame defaults to the entire partition. The default is what makes LAST_VALUE famously "broken" — you almost always have to override it to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
Why can't I use a window function alias inside WHERE?
Because window functions are evaluated after FROM/JOIN/WHERE/GROUP BY/HAVING and before SELECT's final projection and the outer ORDER BY. That means the window alias (rn, rk, ma7, …) does not exist yet when WHERE runs — referencing it would be a logical-ordering violation. The standard workaround is to push the window into a CTE (or subquery) and filter the alias in the outer query that consumes the CTE. The same logical order also explains why you cannot reference a window alias in GROUP BY or HAVING from the same SELECT.
Practice on PipeCode
PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to window functions, ranking, aggregation, CTE + window combinations, and frame-aware running totals.
Kick off via Explore practice →; drill the dedicated window functions SQL lane →; fan out into ranking SQL →; reinforce aggregation SQL → whenever grouped metrics underpin your windows; rehearse CTE + window patterns → for the Top-N per group classic; widen coverage on the full SQL practice library →.





Top comments (0)