A PostgreSQL SQL cheat sheet is only useful when every row in it maps to something you can drop straight into a query — not a wall of syntax with no operational explanation. This guide condenses real PostgreSQL fluency to four primitives: the logical clause order (FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT), the six join shapes and the grain trap they create, GROUP BY with HAVING plus conditional aggregates for one-pass metrics, and window functions like ROW_NUMBER, RANK, DENSE_RANK, LAG, and LEAD for ranking and lookback. These four cover the bulk of analytical SQL — and the cheat-sheet style below is built so you can scan, copy a snippet, and tweak it for your own schema.
Every section walks through a detailed topic explanation, sub-topics with worked examples and runnable solutions, common beginner mistakes, and a worked interview-style scenario with a full traced answer. PostgreSQL syntax throughout — the dialect that drives DataLemur, CoderPad, most product-analytics live screens, and the bulk of modern data-engineering SQL corpora.
Top PostgreSQL SQL cheat sheet topics
The four numbered sections below follow this topic map — one row per H2, every row expanded into a full section with sub-topics, worked examples, a worked interview question, and a step-by-step traced solution:
| # | Topic | Why it shows up in a PostgreSQL cheat sheet |
|---|---|---|
| 1 | Logical clause order — FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT | The single most useful PostgreSQL mental model: the order you write clauses is not the order the engine evaluates them; knowing the evaluation order eliminates 80% of parse errors and explains why WHERE cannot reference aggregates or column aliases. |
| 2 | Joins and grain — INNER, LEFT, RIGHT, FULL, SELF, CROSS | Joins combine rows but they also change grain; a careless 1:N join inflates row counts silently, and the LEFT JOIN ... IS NULL anti-join is the canonical "find rows in A with no match in B" pattern (orphan customers, churned users). |
| 3 | GROUP BY, HAVING, and conditional aggregates | WHERE filters rows before grouping; HAVING filters groups after; COUNT(*) FILTER (WHERE …) and SUM(CASE WHEN …) express many metrics in one query — the universal duplicate finder HAVING COUNT(*) > 1 lives here. |
| 4 | Window functions — ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD | Per-partition ranking without collapsing rows, top-N-per-group, second-highest salary, running totals with SUM(...) OVER (PARTITION BY ... ORDER BY ...), and month-over-month deltas via LAG; the most-graded primitive in modern SQL screens. |
Beginner-friendly framing: every analytical SQL question reduces to four steps — filter rows, join tables without changing grain by accident, aggregate or rank, then present the result. Holding the clause-order diagram in your head (Section 1) lets you write SQL outside-in: pick the grain, then the joins, then the filters, then the projection. The cheat sheet below is organized in the same order you would write a real query.
1. PostgreSQL Logical Clause Order — FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
The seven-stage evaluation order every PostgreSQL query follows
"Why does WHERE customer_count > 5 give me a parse error when I'm clearly counting customers?" is the signature beginner question — and the answer is logical clause order. The mental model: PostgreSQL evaluates clauses in a fixed order that is different from the order you write them; FROM/JOIN builds the row set, WHERE filters rows, GROUP BY collapses rows into groups, HAVING filters groups, SELECT projects columns, ORDER BY sorts, LIMIT/OFFSET trims. WHERE cannot reference aggregate functions because aggregates do not exist until after GROUP BY; column aliases declared in SELECT cannot be referenced in WHERE for the same reason.
Pro tip: Memorize one sentence — "From-Where-Group-Having-Select-Order-Limit" — and you can decode any PostgreSQL parse error in under five seconds. The error `column "customer_count" does not exist` almost always means the column is a `SELECT`-level alias being referenced in `WHERE`, which runs three stages earlier; lift the predicate into `HAVING` (if it references an aggregate) or repeat the expression inline in `WHERE`.
FROM and JOIN — build the working row set
The FROM/JOIN invariant: the first stage assembles a candidate row set by listing the tables (and how they join); every subsequent stage operates on this row set. Subqueries in FROM are also evaluated here, and LATERAL joins let later subqueries reference earlier rows.
- Single table — `FROM orders` produces one row per `orders` row.
- Joined tables — `FROM orders o JOIN customers c ON c.id = o.customer_id` produces one row per matching pair.
- Subquery in `FROM` — `FROM (SELECT ...) t` materializes the inner result, then treats it as a table.
- `LATERAL` subquery — `FROM orders o, LATERAL (SELECT ... WHERE x = o.id) s` re-evaluates the inner subquery per outer row.
Worked example. A FROM with a LEFT JOIN that produces the right row set before any filter runs.
| step | output cardinality |
|---|---|
| `FROM customers` alone | 3 rows |
| `LEFT JOIN orders` | 4 rows (Alice has 2 orders, Bob 1, Carol 0 padded with NULLs) |
| ready for `WHERE` filtering | 4 rows |
Step-by-step explanation.
- The engine reads `customers` first, producing three rows (Alice, Bob, Carol).
- For each customer, it scans `orders` for matching `customer_id` rows; Alice matches 2 orders, Bob matches 1, Carol matches 0.
- Because the join is `LEFT`, Carol's row is preserved with the right-side columns filled with `NULL`s — total 4 rows.
- This 4-row stream is what `WHERE` will see; no filtering has happened yet.
- Without understanding that `FROM` runs first, you can't reason about why a `WHERE` predicate on the right side of a `LEFT JOIN` silently converts the join into an `INNER JOIN`.
Worked-example solution.
```sql
SELECT c.name, o.order_id, o.amount
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id;
```
Rule of thumb: if your LEFT JOIN is producing fewer rows than expected, check whether you have a WHERE predicate that references the right-side table — that predicate runs after the join and discards the NULL-padded rows.
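The fix for that silent conversion is to move the right-side predicate into the `ON` clause. A minimal sketch of both placements, using SQLite via Python's `sqlite3` module (its `LEFT JOIN` semantics match PostgreSQL here); the table and column names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (101, 1, 50), (102, 1, 20), (103, 2, 75);
""")

# Predicate in WHERE: runs after the join, so Carol's NULL-padded row
# fails NULL > 30 and the LEFT JOIN silently behaves like an INNER JOIN.
in_where = con.execute("""
    SELECT DISTINCT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.amount > 30
    ORDER BY c.name
""").fetchall()

# Predicate in ON: filters the right side before NULL-padding, so Carol survives.
in_on = con.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.amount > 30
    ORDER BY c.name
""").fetchall()

print(in_where)  # [('Alice',), ('Bob',)] — Carol dropped
print(in_on)     # [('Alice', 101), ('Bob', 103), ('Carol', None)]
```

The design choice: `ON` predicates decide which right rows are eligible to match; `WHERE` predicates filter the already-padded result.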
WHERE — row-level predicates before grouping
The WHERE invariant: WHERE filters individual rows from the FROM/JOIN output before GROUP BY runs; it can reference any column from the joined row set, but cannot reference aggregate functions or SELECT-level aliases. This is the cheapest place to drop rows — push predicates here whenever possible.
- Row predicates — `WHERE amount > 30`, `WHERE order_date >= '2026-01-01'`.
- `IN` / `EXISTS` — `WHERE customer_id IN (SELECT id FROM premium)`, `WHERE EXISTS (...)`.
- `BETWEEN` — inclusive on both ends; `WHERE x BETWEEN 1 AND 10` is `x >= 1 AND x <= 10`.
- `IS NULL` / `IS NOT NULL` — the only way to check for `NULL`; never `= NULL`.
Worked example. Filter to one day of orders before grouping.
| filter | rows surviving |
|---|---|
| no filter | 12,847 (the sample table holds one full day of orders) |
| `WHERE order_date = '2026-05-10'` | 12,847 (every row is already from that day) |
| `WHERE order_date = '2026-05-10' AND amount > 100` | 4,290 (high-value only) |
Step-by-step explanation.
- `FROM orders` returns the full row stream.
- `WHERE order_date = '2026-05-10'` is evaluated per row; rows with other dates are dropped.
- `AND amount > 100` is evaluated next; this is a row predicate (not an aggregate), so it lives in `WHERE` correctly.
- The surviving row set (4,290 rows) flows into `GROUP BY` if one is present, otherwise into `SELECT`.
- Pushing the date filter into `WHERE` rather than `HAVING` is critical for index usage: a B-tree index on `order_date` can prune 95% of the table before any grouping happens.
Worked-example solution.
```sql
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date = '2026-05-10'
  AND amount > 100
GROUP BY customer_id;
```
Rule of thumb: if the predicate uses only raw row columns, it belongs in WHERE; if it uses SUM, COUNT, AVG, MIN, MAX, it belongs in HAVING.
GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
The downstream invariant: after WHERE, the engine evaluates GROUP BY (collapsing rows into one row per distinct key combination), then HAVING (filtering groups), then SELECT (projecting columns and computing expressions), then ORDER BY (sorting the final result), then LIMIT/OFFSET (trimming for pagination). SELECT-level aliases become referenceable only in ORDER BY and the outer query (in a subquery context).
- `GROUP BY col1, col2` — one output row per distinct `(col1, col2)` combination.
- `HAVING agg_pred` — filter groups; can reference `COUNT(*)`, `SUM(col)`, etc.
- `SELECT col, agg(col2) AS x` — project columns; aggregates and aliases are computed here.
- `ORDER BY x DESC, col` — can reference `SELECT` aliases; deterministic with a tiebreaker.
- `LIMIT N OFFSET M` — page slicing; always pair with `ORDER BY` for determinism.
Worked example. Group by customer, filter to high-spend customers, sort descending, top 5.
| stage | rows |
|---|---|
| `FROM orders WHERE order_date = '2026-05-10'` | 4,290 |
| `GROUP BY customer_id` | 1,720 (one row per customer) |
| `HAVING SUM(amount) > 500` | 312 (high-spend) |
| `SELECT customer_id, SUM(amount) AS spend` | 312 (projected) |
| `ORDER BY spend DESC, customer_id` | 312 (sorted) |
| `LIMIT 5` | 5 |
Step-by-step explanation.
- `WHERE` produces 4,290 rows for one day with `amount > 100`.
- `GROUP BY customer_id` collapses them into 1,720 buckets, one per customer.
- `HAVING SUM(amount) > 500` keeps only the 312 buckets whose total spend exceeds $500.
- `SELECT` computes the alias `spend = SUM(amount)` and projects two columns.
- `ORDER BY spend DESC, customer_id` sorts the 312 surviving rows by descending spend with a deterministic tiebreaker; `LIMIT 5` returns just the top five.
Worked-example solution.
```sql
SELECT customer_id, SUM(amount) AS spend
FROM orders
WHERE order_date = '2026-05-10'
  AND amount > 100
GROUP BY customer_id
HAVING SUM(amount) > 500
ORDER BY spend DESC, customer_id
LIMIT 5;
```
Rule of thumb: every clause has a fixed slot; if you find yourself wanting WHERE to reference an aggregate, the predicate belongs in HAVING instead — and if you want ORDER BY to use a long expression, alias it in SELECT and reference the alias.
Common beginner mistakes
- `WHERE COUNT(*) > 1` — parse error; aggregates do not exist until after `GROUP BY`. Use `HAVING`.
- Referencing a `SELECT` alias in `WHERE` — `WHERE spend > 100` after `SELECT SUM(amount) AS spend` fails; either repeat the expression or move it to `HAVING`.
- Selecting a non-aggregated, non-`GROUP BY` column — strict PostgreSQL errors out with "must appear in GROUP BY"; some other dialects pick an arbitrary row silently.
- `LIMIT 5` without `ORDER BY` — non-deterministic; two runs of the same query can return different rows.
- Putting `HAVING` before `GROUP BY` — syntax error; the clause order is mandatory.
PostgreSQL Interview Question on Clause Order
Given orders(order_id, customer_id, amount, order_date), find every customer who placed more than 3 orders today with total spend above $500. Return customer_id and total_spend, sorted by total_spend descending.
Solution Using WHERE + GROUP BY + HAVING in the Right Slots
```sql
SELECT customer_id,
       SUM(amount) AS total_spend
FROM orders
WHERE order_date = CURRENT_DATE
GROUP BY customer_id
HAVING COUNT(*) > 3
   AND SUM(amount) > 500
ORDER BY total_spend DESC, customer_id;
```
Why this works: WHERE order_date = CURRENT_DATE filters to today's row set first (cheap, index-friendly); GROUP BY customer_id collapses to one row per customer; HAVING evaluates the two aggregate predicates together (more than 3 orders AND total > $500); SELECT projects the alias total_spend; ORDER BY total_spend DESC, customer_id produces a deterministic ordering. Single pass over today's rows with hash aggregation.
Step-by-step trace for sample data on 2026-05-10:
| customer_id | orders today | sum(amount) | passes HAVING? |
|---|---|---|---|
| 101 | 5 | 720 | ✓ |
| 102 | 4 | 410 | ✗ (sum ≤ 500) |
| 103 | 6 | 1,250 | ✓ |
| 104 | 2 | 800 | ✗ (count ≤ 3) |
| 105 | 4 | 520 | ✓ |
Three customers survive both predicates.
Output:
| customer_id | total_spend |
|---|---|
| 103 | 1250 |
| 101 | 720 |
| 105 | 520 |
Why this works — concept by concept:
- `WHERE` first — `order_date = CURRENT_DATE` is a row predicate using a non-aggregated column; pushing it into `WHERE` shrinks the row set before grouping and lets the planner use a B-tree index on `order_date`.
- `GROUP BY customer_id` — collapses today's rows into one bucket per customer; every subsequent aggregate is computed inside this bucket.
- `HAVING` with a two-predicate `AND` — `COUNT(*) > 3` and `SUM(amount) > 500` are both aggregate predicates; combining them with `AND` in a single `HAVING` is the canonical multi-condition group filter.
- `SELECT` projection + alias — `SUM(amount) AS total_spend` is computed here; the alias becomes available to `ORDER BY` (but not to `WHERE`/`HAVING`).
- `ORDER BY total_spend DESC, customer_id` — descending sort on the metric with a deterministic tiebreaker via `customer_id`; reviewers depend on stable ordering.
- `O(|today's orders| + G log G)` time — a single hash aggregation produces `G` groups; the final sort is `G log G`. With an index on `(order_date, customer_id)` the planner can stream rather than hash.
Drill the SQL filtering practice page for `WHERE` patterns and the SQL aggregation practice page for `GROUP BY` + `HAVING` shapes.
2. PostgreSQL Joins and Grain — INNER, LEFT, RIGHT, FULL, SELF, CROSS
Joins, anti-joins, and the grain-inflation trap in PostgreSQL
"Why is SUM(amount) returning double what I expect after I add a JOIN?" is the signature grain-inflation question — and the answer is that joins do not just combine columns; they change the row cardinality of the result. The mental model: INNER JOIN keeps only matching pairs, LEFT JOIN keeps every left row and pads the right side with NULLs, RIGHT JOIN is the mirror, FULL OUTER JOIN keeps both sides' unmatched rows, SELF JOIN joins a table to itself (for hierarchies and pair queries), CROSS JOIN produces a Cartesian product (one row per (left, right) pair). The cardinality of any join is bounded by |left| × |right|, and a 1:N relationship inflates left rows by N — the silent source of doubled metrics.
Pro tip: Before writing any join, ask "what is the grain of the result?" — orders, order lines, customer-day, or `(customer, product)` pair. A `1:N` join (e.g., `customers` to `orders`) inflates customer rows by the number of orders; `SUM(customer.lifetime_value)` after that join returns lifetime value × order count, not lifetime value. Always state the grain out loud.
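The inflation is easy to reproduce. A minimal sketch using SQLite via Python's `sqlite3` module (the join semantics match PostgreSQL); the `lifetime_value` column and all names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT, lifetime_value REAL);
    CREATE TABLE orders(order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice', 100.0);
    INSERT INTO orders VALUES (101, 1, 10.0), (102, 1, 20.0), (103, 1, 30.0);
""")

# Customer grain: one row, lifetime_value = 100.
before = con.execute("SELECT SUM(lifetime_value) FROM customers").fetchone()[0]

# After a 1:3 join the customer row repeats once per order,
# so the same SUM triples — the grain-inflation trap.
after = con.execute("""
    SELECT SUM(c.lifetime_value)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchone()[0]

print(before, after)  # 100.0 300.0
```

The fix is to aggregate at the correct grain first (e.g., sum orders per customer in a subquery) and only then join one row per customer.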
INNER JOIN — keep only matching pairs (no padding)
The INNER JOIN invariant: a left row is paired with a right row iff the join predicate is TRUE; unmatched rows on either side are discarded; the result cardinality is the count of matching pairs. This is the most common join and the fastest because the planner can short-circuit on no-match.
- `ON l.key = r.key` — single-column equi-join; the planner hashes the right table.
- Multi-column — `ON l.a = r.a AND l.b = r.b` for composite keys.
- Non-equi — `ON l.range_start <= r.point AND l.range_end >= r.point` (range join).
- `USING (col)` — shorthand when both sides share the column name; merges the column.
Worked example. Two tables, three customers, two orders; one customer has no order.
| customer | order_id |
|---|---|
| Alice | 101 |
| Bob | 102 |
Carol (no orders) does not appear — INNER JOIN dropped her.
Step-by-step explanation.
- The engine reads `customers` (Alice, Bob, Carol) and `orders` (101 for Alice, 102 for Bob).
- For each `customers` row, it scans `orders` for a matching `customer_id`.
- Alice matches `order_id = 101`; Bob matches `order_id = 102`; Carol has no match.
- Carol's row is silently discarded because the join is `INNER` — no `NULL`-padded row is produced.
- The output has two rows because there were two matching pairs; in general the result cardinality is bounded by `0 ≤ N ≤ |customers| × |orders|`.
Worked-example solution.
```sql
SELECT c.name AS customer, o.order_id
FROM customers c
INNER JOIN orders o
  ON o.customer_id = c.id;
```
Rule of thumb: reach for INNER JOIN whenever the question is "rows where both sides exist"; it is the smallest, fastest, most common join.
LEFT JOIN — keep every left row, pad the right with NULLs (anti-join trick)
The LEFT JOIN invariant: every row from the left table appears in the output; if no right row matches, the right columns are NULL; LEFT JOIN ... WHERE right.key IS NULL keeps exactly the left rows that had no match — the anti-join idiom. RIGHT JOIN is the mirror; flip the table order and use LEFT for consistency.
- `LEFT JOIN` — preserves every left row.
- Right columns `NULL` when no match — the key signal for anti-joins.
- `LEFT JOIN ... IS NULL` anti-join — "find rows in A with no match in B".
- `RIGHT JOIN` — mirror image; rarely needed (just flip the table order and use `LEFT`).
Worked example. Same customers + orders; Carol is preserved with NULL right-side columns.
| customer | order_id |
|---|---|
| Alice | 101 |
| Bob | 102 |
| Carol | NULL |
Step-by-step explanation.
- For each `customers` row, scan `orders` for a matching `customer_id`.
- Alice matches → row `(Alice, 101)`; Bob matches → row `(Bob, 102)`.
- Carol does not match → row `(Carol, NULL)` is produced because the join is `LEFT`.
- To find Carol via the anti-join: add `WHERE o.order_id IS NULL` after the `LEFT JOIN`; only Carol's row passes the filter.
- Equivalent to `WHERE NOT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)` and (under `NOT NULL` constraints) `WHERE c.id NOT IN (SELECT customer_id FROM orders)` — but the anti-join is immune to the `NOT IN` `NULL`-swallowing bug.
Worked-example solution.
```sql
SELECT c.name AS customer
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id
WHERE o.order_id IS NULL;
```
Rule of thumb: "find X with no Y" → LEFT JOIN ... WHERE Y.id IS NULL. Memorize this; it is the most-asked join shape in SQL interviews and the cleanest fix for the NOT IN ... NULL trap.
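The `NOT IN ... NULL` trap the rule mentions is worth seeing fail once. A minimal sketch using SQLite via Python's `sqlite3` module (three-valued `NOT IN` logic matches PostgreSQL); the guest-checkout row with a `NULL` `customer_id` is an illustrative assumption:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders(order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dan');
    -- One order has a NULL customer_id (e.g., a guest checkout).
    INSERT INTO orders VALUES (101, 1), (102, 2), (103, NULL);
""")

# NOT IN against a set containing NULL: x NOT IN (..., NULL) is never TRUE,
# so the query silently returns zero rows.
not_in = con.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

# The LEFT JOIN anti-join is immune: Carol and Dan are found.
anti = con.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.order_id IS NULL
    ORDER BY c.name
""").fetchall()

print(not_in)  # []
print(anti)    # [('Carol',), ('Dan',)]
```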
FULL OUTER, SELF, and CROSS joins — the rarer shapes
The rarer-joins invariant: FULL OUTER JOIN keeps every left row AND every right row (with NULL padding on the unmatched side); SELF JOIN joins a table to itself by aliasing it twice (employees-and-managers, parent-child, pair queries); CROSS JOIN produces every (left, right) combination — the Cartesian product. Each has a narrow but important use.
- `FULL OUTER JOIN` — reconcile two sources; rows from either side without a match get padded.
- `SELF JOIN` — employee/manager lookups, one-level parent-child relationships, pair queries.
- `CROSS JOIN` — generate every combination (small tables only), or paired with `LATERAL` for top-N per row.
- Implicit cross join — comma-separated tables (`FROM a, b`) with no join predicate is a `CROSS JOIN` — usually a bug.
Worked example. Self-join employees to itself to surface each person's manager.
| name | manager_name |
|---|---|
| Alice | Carol |
| Bob | Carol |
| Carol | NULL (CEO) |
Step-by-step explanation.
- Alias the same `employees` table twice: `e` (for employees) and `m` (for managers).
- `LEFT JOIN` on `e.manager_id = m.emp_id` looks each employee up against the manager rows.
- Alice's `manager_id` points to Carol → row `(Alice, Carol)`.
- Bob's `manager_id` points to Carol → row `(Bob, Carol)`.
- Carol is the CEO, so her `manager_id IS NULL` → no match → row `(Carol, NULL)` because the join is `LEFT`.
Worked-example solution.
```sql
SELECT e.name,
       m.name AS manager_name
FROM employees e
LEFT JOIN employees m
  ON m.emp_id = e.manager_id;
```
Rule of thumb: SELF JOIN is one-level hierarchy; for arbitrary-depth recursion (org chart traversal, BOM tree), reach for WITH RECURSIVE instead.
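For the arbitrary-depth case, the shape of a recursive CTE can be sketched as follows, using SQLite via Python's `sqlite3` module (`WITH RECURSIVE` syntax matches PostgreSQL); the four-person org chart is illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees(emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Carol', NULL),  -- CEO
        (2, 'Alice', 1),
        (3, 'Bob', 1),
        (4, 'Dave', 2);
""")

# Anchor: the CEO (no manager). Recursive step: everyone whose manager
# is already in the chain, carrying a depth counter.
rows = con.execute("""
    WITH RECURSIVE chain AS (
        SELECT emp_id, name, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.emp_id, e.name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.emp_id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(rows)  # [('Carol', 0), ('Alice', 1), ('Bob', 1), ('Dave', 2)]
```

The self-join of the worked example is exactly one unrolled step of this recursion.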
Common beginner mistakes
- Forgetting that a `1:N` `JOIN` inflates the left side — `SUM(left.col)` returns `left.col × N`.
- Filtering the right table inside `WHERE` after a `LEFT JOIN` (e.g., `WHERE o.amount > 0`) — silently turns the `LEFT JOIN` into an `INNER JOIN` because `NULL > 0` is `NULL`.
- Using `NOT IN (subquery)` when the subquery can contain `NULL` — returns zero rows because `x NOT IN (..., NULL, ...)` is `NULL`, which fails the predicate.
- Comma-separated `FROM a, b` with no join predicate — produces a Cartesian product (`CROSS JOIN`); usually a bug.
- Joining on the wrong column (`o.id = c.id` instead of `o.customer_id = c.id`) — produces nonsense rows.
PostgreSQL Interview Question on Customers With No Orders
Given customers(id, name) and orders(order_id, customer_id, amount), return the names of customers who have never placed an order.
Solution Using LEFT JOIN ... WHERE orders.order_id IS NULL
```sql
SELECT c.name
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.id
WHERE o.order_id IS NULL
ORDER BY c.name;
```
Why this works: the LEFT JOIN preserves every customer row regardless of whether a matching order exists; for matched customers, o.order_id carries a real value; for unmatched customers, the right-side columns are NULL and the WHERE o.order_id IS NULL predicate is TRUE; the filter keeps only the unmatched customers — the anti-join. Single pass over customers; one keyed lookup into orders per customer; no subquery materialization needed.
Step-by-step trace for the sample input:
| customers.id | customers.name | LEFT JOIN orders.order_id | IS NULL? | survives? |
|---|---|---|---|---|
| 1 | Alice | 101 | ✗ | ✗ |
| 2 | Bob | 102 | ✗ | ✗ |
| 3 | Carol | NULL | ✓ | ✓ |
| 4 | Dan | NULL | ✓ | ✓ |
Carol and Dan survive the filter.
Output:
| name |
|---|
| Carol |
| Dan |
Why this works — concept by concept:
- `LEFT JOIN` semantics — keeps every left row; the right side is `NULL` when there is no match. This `NULL` is the entire signal we filter on.
- `WHERE o.order_id IS NULL` — `o.order_id` is the right-side primary key; it is `NULL` only when the join produced a synthetic unmatched row. A real `NULL` order-id from the source table never happens because primary keys are `NOT NULL`.
- Anti-join semantics — equivalent to `NOT EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)`; PostgreSQL plans `NOT EXISTS` as a hash anti-join and it usually performs at least as well, so pick whichever form reads more clearly.
- No `NULL`-swallowing — unlike `NOT IN`, the predicate is `IS NULL`, which is well-defined for `NULL` values. There is no silent zero-row failure.
- `ORDER BY c.name` — deterministic ordering for reviewer stability.
- `O(|customers| + |orders|)` time — hash-join build on `orders.customer_id`, single probe per customer. With an index on `orders.customer_id` this is near-linear.
Drill the SQL joins practice page for `INNER`, `LEFT`, and anti-join shapes, and the SQL null-handling practice page for `NULL`-aware predicates.
3. PostgreSQL GROUP BY, HAVING, and Conditional Aggregates
GROUP BY with HAVING, FILTER, and CASE for one-pass metrics in PostgreSQL
"Compute total revenue, refunded revenue, and the percentage refunded — in a single query" is the signature conditional-aggregate prompt — and the cleanest PostgreSQL answer is SUM(... ) FILTER (WHERE …) clauses inside a single SELECT. The mental model: GROUP BY col collapses rows into buckets; COUNT(*), SUM(...), AVG(...), MIN(...), MAX(...) summarize each bucket; WHERE filters individual rows before grouping; HAVING filters groups after grouping; FILTER (WHERE …) and CASE WHEN … express conditional aggregates that count or sum only certain rows per group. The duplicate-finder pattern GROUP BY key HAVING COUNT(*) > 1 lives here too.
Pro tip: PostgreSQL supports the SQL-standard `FILTER (WHERE …)` clause on every aggregate — `COUNT(*) FILTER (WHERE status = 'refunded')`. It produces clearer queries than `SUM(CASE WHEN … THEN 1 ELSE 0 END)` and is exactly what interviewers like to see. The `CASE` variant still works for portability across dialects.
COUNT, SUM, AVG, MIN, MAX — NULL-aware aggregates
The aggregate-NULL invariant: COUNT(*) counts every row including ones with NULL columns; COUNT(col) counts only rows where col is not NULL; SUM, AVG, MIN, MAX skip NULL values entirely; if every value in a group is NULL, the result is NULL (not 0). The distinction between COUNT(*) and COUNT(col) is the #1 source of "my counts are off by 10%" bugs.
- `COUNT(*)` — every row in the bucket, regardless of `NULL`s.
- `COUNT(col)` — non-`NULL` values of `col` only.
- `COUNT(DISTINCT col)` — unique non-`NULL` values; essential after a `JOIN` that may have inflated rows.
- `SUM` / `AVG` — numeric only; `AVG` is sum-of-non-null divided by count-of-non-null, so `NULL` does not count as `0`.
Worked example. Three rows in one customer's bucket: amount = 10, NULL, 30.
| aggregate | result |
|---|---|
| `COUNT(*)` | 3 |
| `COUNT(amount)` | 2 |
| `SUM(amount)` | 40 |
| `AVG(amount)` | 20 |
| `MIN(amount)` | 10 |
| `MAX(amount)` | 30 |
Step-by-step explanation.
- `COUNT(*)` = 3 because every row in the bucket counts, regardless of `amount`'s value.
- `COUNT(amount)` = 2 because the `NULL` row is skipped; only `10` and `30` contribute.
- `SUM(amount)` = 10 + 30 = 40; the `NULL` is treated as missing, not as `0`.
- `AVG(amount)` = (10 + 30) / 2 = 20; the denominator is `COUNT(amount) = 2`, not `COUNT(*) = 3`.
- `MIN` and `MAX` skip the `NULL` and return the smallest/largest non-`NULL` value.
Worked-example solution.
```sql
SELECT customer_id,
       COUNT(*)      AS n_rows,
       COUNT(amount) AS n_known,
       SUM(amount)   AS total,
       AVG(amount)   AS mean,
       MIN(amount)   AS lo,
       MAX(amount)   AS hi
FROM orders
GROUP BY customer_id;
```
Rule of thumb: if the metric is "people who clicked" use COUNT(DISTINCT user_id); if it is "click events" use COUNT(*); if it is "rows with a known value" use COUNT(col).
WHERE vs HAVING — row filter vs group filter
The two-clause invariant: WHERE runs before GROUP BY and references raw row columns; HAVING runs after grouping and can reference aggregate functions; trying to use WHERE COUNT(*) > 1 is a parse error because aggregates do not exist until after grouping. Both can appear in the same query.
- `WHERE` — filters rows; references raw columns such as `col`, `col2`.
- `HAVING` — filters groups; references `COUNT(*)`, `SUM(col)`, etc.
- Order of evaluation — `FROM` → `WHERE` → `GROUP BY` → `HAVING` → `SELECT`.
- Performance — push predicates into `WHERE` whenever possible; `WHERE` filters before the (often expensive) sort/hash step.
Worked example. Six employees across eng and sales; find departments whose average salary exceeds 50,000 across employees earning more than 30,000.
| department | salary |
|---|---|
| eng | 40,000 |
| eng | 70,000 |
| eng | 25,000 |
| sales | 60,000 |
| sales | 60,000 |
| sales | 20,000 |
Step-by-step explanation.
- `WHERE salary > 30000` drops the two rows below the threshold (eng 25,000 and sales 20,000) — 4 rows remain.
- `GROUP BY department` collapses to two buckets: eng (40,000 + 70,000) and sales (60,000 + 60,000).
- The planner computes `AVG(salary)` per bucket: eng = 55,000; sales = 60,000.
- `HAVING AVG(salary) > 50000` keeps both buckets (both averages exceed 50,000).
- The `SELECT` projects the department name and its average; the final result is two rows.
Worked-example solution.
```sql
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 30000
GROUP BY department
HAVING AVG(salary) > 50000;
```
Rule of thumb: aggregate predicate → HAVING; row predicate → WHERE. If the predicate uses SUM / COUNT / AVG / MIN / MAX, it must live in HAVING.
FILTER (WHERE …) and CASE — conditional aggregates
The conditional-aggregate invariant: SUM(col) FILTER (WHERE pred) and COUNT(*) FILTER (WHERE pred) apply the aggregate only to rows where the predicate is TRUE; the portable alternative is SUM(CASE WHEN pred THEN col ELSE 0 END) and COUNT(CASE WHEN pred THEN 1 END). PostgreSQL supports both; pick FILTER for clarity in PostgreSQL-only code, CASE for cross-dialect portability.
- `FILTER (WHERE …)` — PostgreSQL/SQL-standard syntax; applies per aggregate.
- `SUM(CASE WHEN … THEN col ELSE 0 END)` — portable across dialects.
- `COUNT(CASE WHEN … THEN 1 END)` — counts only matching rows; `NULL`s from the missing `ELSE` branch are skipped.
- Multiple aggregates, one query — combine many `FILTER` clauses to compute several metrics in one pass.
Worked example. One pass over orders to compute total revenue, refunded revenue, and the refund rate.
| customer_id | total_revenue | refunded_revenue | refund_pct |
|---|---|---|---|
| 101 | 500 | 50 | 10.0 |
| 102 | 1,000 | 0 | 0.0 |
| 103 | 800 | 200 | 25.0 |
Step-by-step explanation.
- `SUM(amount)` aggregates every row in the bucket → `total_revenue`.
- `SUM(amount) FILTER (WHERE status = 'refunded')` aggregates only refunded rows → `refunded_revenue`.
- The refund percentage is `refunded_revenue / total_revenue * 100`; cast one side to `NUMERIC` to avoid integer division.
- PostgreSQL evaluates every `FILTER` independently per input row; one scan computes all metrics.
- The portable variant uses `SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0 END)` — same result, slightly more verbose.
Worked-example solution.
```sql
SELECT customer_id,
       SUM(amount) AS total_revenue,
       -- COALESCE: a group with no refunded rows yields NULL from the filtered SUM
       COALESCE(SUM(amount) FILTER (WHERE status = 'refunded'), 0) AS refunded_revenue,
       ROUND(
         COALESCE(SUM(amount) FILTER (WHERE status = 'refunded'), 0)::NUMERIC
         / NULLIF(SUM(amount), 0) * 100, 1
       ) AS refund_pct
FROM orders
GROUP BY customer_id;
```
Rule of thumb: whenever you find yourself running two queries with different WHERE clauses against the same table and joining the results, refactor to a single query with two FILTER clauses — same answer, half the cost.
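The one-pass refactor can be sketched end to end. The demo below uses SQLite via Python's `sqlite3` module (SQLite ≥ 3.30 supports the same `FILTER` clause; `COALESCE` handles groups with no refunded rows); all table data is illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(customer_id INTEGER, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (101, 450, 'shipped'), (101, 50, 'refunded'),
        (102, 1000, 'shipped'),
        (103, 600, 'shipped'), (103, 200, 'refunded');
""")

# Two metrics, two different "WHERE clauses", one scan: each FILTER
# restricts only its own aggregate, not the query's row set.
rows = con.execute("""
    SELECT customer_id,
           SUM(amount) AS total_revenue,
           COALESCE(SUM(amount) FILTER (WHERE status = 'refunded'), 0)
               AS refunded_revenue
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(rows)  # [(101, 500.0, 50.0), (102, 1000.0, 0), (103, 800.0, 200.0)]
```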
Common beginner mistakes
- `WHERE COUNT(*) > 1` — parse error; aggregates do not exist until after `GROUP BY`. Use `HAVING`.
- Assuming `AVG(col)` counts `NULL` rows as `0` — they are excluded from both numerator and denominator. Use `AVG(COALESCE(col, 0))` only if "missing means 0" is the business rule.
- Forgetting `COUNT(DISTINCT col)` after a `JOIN` that inflates rows — reports inflated counts.
- Integer division — `5 / 100 = 0` in PostgreSQL. Cast one operand to `NUMERIC` or `FLOAT`.
- Division by zero — `NULLIF(denom, 0)` converts `0` to `NULL`, so the division returns `NULL` instead of erroring.
PostgreSQL Interview Question on Duplicate Emails
Given users(id, email), return every email that appears more than once, along with the number of copies.
Solution Using GROUP BY email HAVING COUNT(*) > 1
```sql
SELECT email,
       COUNT(*) AS n_copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY n_copies DESC, email;
```
Why this works: GROUP BY email collapses every row with the same email into a single bucket; COUNT(*) counts how many rows fell into each bucket; HAVING COUNT(*) > 1 keeps only buckets with at least two rows; ORDER BY n_copies DESC, email produces a deterministic, reviewer-friendly output. Single pass over users; sort cost dominates only when email cardinality is huge.
Step-by-step trace for the sample input:
| id | email |
|---|---|
| 1 | alice@example.com |
| 2 | bob@example.com |
| 3 | alice@example.com |
| 4 | carol@example.com |
| 5 | bob@example.com |
| 6 | alice@example.com |
- `FROM users` — read all six rows.
- No `WHERE` — every row passes.
- `GROUP BY email` — three buckets: alice (3 rows), bob (2 rows), carol (1 row).
- `COUNT(*)` — 3, 2, 1 respectively.
- `HAVING COUNT(*) > 1` — drops the carol bucket.
- `ORDER BY n_copies DESC, email` — alice (3), then bob (2).
Output:
| email | n_copies |
|---|---|
| alice@example.com | 3 |
| bob@example.com | 2 |
Why this works — concept by concept:
- `GROUP BY email` — collapses to one bucket per distinct email; the bucket is the unit of all subsequent aggregates and group-level filters.
- `COUNT(*)` — counts every row in the bucket, perfect for "how many copies".
- `HAVING COUNT(*) > 1` — group-level filter; the aggregate predicate must live here, not in `WHERE`. This is the precise interview signal for duplicate detection.
- `ORDER BY n_copies DESC, email` — deterministic ordering; tie-broken by `email` so the output is stable across runs.
- `O(|users| + G log G)` time — a single hash aggregation produces `G` group rows; the final sort is `G log G`. With an index on `email`, the planner may use stream aggregation and skip the hash step.
Drill the SQL aggregation practice page for `GROUP BY` and `HAVING` shapes, and the SQL filtering practice page for `WHERE` vs `HAVING` distinctions.
4. PostgreSQL Window Functions — ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
Ranking, top-N-per-group, running totals, and lookback in PostgreSQL window functions
"Find the second-highest distinct salary" and "compute a running total of daily revenue" are the two signature window-function prompts — and both reduce to a window function with OVER (PARTITION BY … ORDER BY …). The mental model: a window function computes a value across a window of rows related to the current row without collapsing the rows like GROUP BY does; OVER (PARTITION BY col) defines the window boundary; OVER (ORDER BY col) defines the order within the window. ROW_NUMBER assigns unique integers; RANK skips after ties (1, 2, 2, 4); DENSE_RANK does not skip (1, 2, 2, 3); LAG looks back; LEAD looks forward; SUM/AVG/COUNT(...) OVER (...) compute running totals and moving averages.
Pro tip: Window functions cannot be referenced in the `WHERE` of the same `SELECT` because they execute after `WHERE`. Wrap the window in a CTE or subquery, then filter on the alias. The error `column "rn" does not exist` after writing `WHERE rn = 1` almost always means you forgot this rule.
ROW_NUMBER — unique sequential numbering per partition
The ROW_NUMBER invariant: ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) assigns a unique integer 1, 2, 3, … to every row inside each partition p, ordered by o; ties in o are broken arbitrarily by the planner. Use it when you need a unique sequence per group regardless of tie semantics — most often for deduplication (keep rn = 1 per business key).
- `OVER (PARTITION BY …)` — buckets the rows; without this, the window is the whole result set.
- `OVER (ORDER BY …)` — orders within the bucket; required for `ROW_NUMBER`/`RANK`/`LAG`/`LEAD`.
- Ties broken arbitrarily — add a tiebreaker column to `ORDER BY` for determinism.
- Top-N-per-group — `WHERE rn <= N` after `ROW_NUMBER`; works only when ties at rank N can be ignored.
Worked example. employees with three engineers; rank by salary descending.
| department | name | salary | row_number |
|---|---|---|---|
| eng | Alice | 90,000 | 1 |
| eng | Bob | 80,000 | 2 |
| eng | Carol | 80,000 | 3 |
Bob and Carol tie on salary; ROW_NUMBER still gives them unique ranks (planner-chosen unless you add a tiebreaker).
Step-by-step explanation.
- `PARTITION BY department` defines the boundary — only `eng` rows are compared with each other; if there were a `sales` partition it would have its own 1, 2, 3 sequence.
- `ORDER BY salary DESC, name` orders rows within the partition: Alice (90,000) first, then Bob and Carol (tied at 80,000) broken by name.
- `ROW_NUMBER()` assigns 1, 2, 3 sequentially regardless of ties; Bob gets 2 and Carol gets 3 because `name` breaks the tie.
- Without the `, name` tiebreaker, the Bob/Carol order is undefined — two query runs could swap them.
- To deduplicate a table that has multiple rows per `(business_key, source_ts)`, use `ROW_NUMBER() OVER (PARTITION BY business_key ORDER BY source_ts DESC) = 1` to keep the latest.
Worked-example solution.
```sql
SELECT department, name, salary,
       ROW_NUMBER() OVER (
           PARTITION BY department
           ORDER BY salary DESC, name
       ) AS rn
FROM employees;
```
Rule of thumb: ROW_NUMBER is the right tool for deduplication (WHERE rn = 1) and for ordered streams; reach for RANK or DENSE_RANK when ties must be honored.
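The deduplication pattern mentioned above can be sketched end to end — `business_key` and `source_ts` come from the bullet list, while the table name `events` and the `payload` column are illustrative stand-ins for your own schema:

```sql
-- Keep only the latest row per business key.
-- Assumes a table events(business_key, source_ts, payload); names are illustrative.
WITH numbered AS (
    SELECT business_key,
           source_ts,
           payload,
           ROW_NUMBER() OVER (
               PARTITION BY business_key
               ORDER BY source_ts DESC   -- latest row gets rn = 1
           ) AS rn
    FROM events
)
SELECT business_key, source_ts, payload
FROM numbered
WHERE rn = 1;   -- filter the alias in the outer query, never in the same SELECT
```

The CTE wrapper is mandatory here: filtering `rn` inside the same `SELECT` would raise `column "rn" does not exist`.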
RANK vs DENSE_RANK — tie semantics
The rank-vs-dense-rank invariant: both assign the same rank to tied rows; RANK then skips the next k-1 ranks (gap), while DENSE_RANK continues without a gap. For "find the Nth distinct value" questions, DENSE_RANK = N is the correct filter; for "find the Nth row in skip-aware ranking order", RANK = N is correct.
- `RANK` — 1, 2, 2, 4 — skips after ties.
- `DENSE_RANK` — 1, 2, 2, 3 — no skip.
- `ROW_NUMBER` — 1, 2, 3, 4 — never ties.
- Pick by semantics — "Nth highest distinct salary" → `DENSE_RANK = N`; "Nth-ranked row in skip ordering" → `RANK = N`.
Worked example. Four employees; Bob and Carol tied at second-highest salary.
| name | salary | rank | dense_rank | row_number |
|---|---|---|---|---|
| Alice | 90,000 | 1 | 1 | 1 |
| Bob | 80,000 | 2 | 2 | 2 |
| Carol | 80,000 | 2 | 2 | 3 |
| Dan | 70,000 | 4 | 3 | 4 |
RANK jumps 2 → 4 (skipping 3); DENSE_RANK continues 2 → 3 (no skip).
Step-by-step explanation.
- All three window functions agree on Alice (rank 1) because she is alone at the top.
- Bob and Carol both get `rank = 2` and `dense_rank = 2` because they tie on salary; `row_number` gives them distinct values 2 and 3.
- Dan has the next-lowest salary; `RANK` skips ahead by the number of tied rows (2 tied → next rank is 2 + 2 = 4); `DENSE_RANK` continues with no gap (3).
- For "second highest distinct salary", `DENSE_RANK = 2` correctly returns 80,000; `RANK = 2` would also work here, but `RANK` would not return 80,000 if three people tied for first (it would skip to 4).
- For "top 3 distinct salaries", use `DENSE_RANK <= 3` — it returns Alice, Bob, Carol, Dan (four rows because Bob/Carol both have `dr = 2`).
Worked-example solution.
```sql
SELECT name, salary,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dr,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees;
```
Rule of thumb: "second highest salary" → DENSE_RANK = 2; "top 3 distinct salaries" → DENSE_RANK <= 3; never use RANK for these unless the spec explicitly says ties should consume rank slots.
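The "second-highest distinct salary" prompt from the section intro then reduces to one CTE plus one filter — a sketch against the `employees(name, salary)` shape of the worked example:

```sql
-- Second-highest distinct salary, with everyone who earns it.
-- Assumes employees(name, salary) as in the worked example above.
WITH ranked AS (
    SELECT name,
           salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
    FROM employees
)
SELECT name, salary
FROM ranked
WHERE dr = 2;   -- for the sample data: Bob and Carol, both at 80,000
```

Swapping `DENSE_RANK` for `RANK` here would silently return zero rows whenever two or more people tie for first.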
LAG, LEAD, and running totals — lookback, lookahead, and SUM(...) OVER (...)
The lookback-and-running invariant: LAG(col, n) returns the value of col n rows back within the partition (default n=1); LEAD(col, n) is the symmetric forward; SUM(col) OVER (PARTITION BY p ORDER BY o) produces a running total within each partition. These three primitives drive month-over-month deltas, sessionization, running balances, and moving averages.
- `LAG(amount) OVER (ORDER BY date)` — previous day's amount.
- `LEAD(amount) OVER (ORDER BY date)` — next day's amount.
- `amount - LAG(amount) OVER (ORDER BY date)` — day-over-day delta.
- `SUM(amount) OVER (ORDER BY date)` — running total from the start of the partition through the current row.
Worked example. Three days of sales; compute previous-day amount, day-over-day delta, and running total.
| sales_date | amount | prev_amount | dod_delta | running_total |
|---|---|---|---|---|
| 2026-05-09 | 100 | NULL | NULL | 100 |
| 2026-05-10 | 130 | 100 | 30 | 230 |
| 2026-05-11 | 120 | 130 | -10 | 350 |
The first day has LAG = NULL because no prior row exists; consumers usually COALESCE(delta, 0) for display.
Step-by-step explanation.
- `LAG(amount) OVER (ORDER BY sales_date)` returns the previous row's amount, ordered by date.
- Day 1 (May 9): no previous row, so `LAG = NULL`; `amount - LAG = NULL`.
- Day 2 (May 10): `LAG = 100`; `delta = 130 - 100 = 30`.
- Day 3 (May 11): `LAG = 130`; `delta = 120 - 130 = -10`.
- `SUM(amount) OVER (ORDER BY sales_date)` accumulates from the start of the partition through the current row: 100, 230, 350.
Worked-example solution.
```sql
SELECT sales_date,
       amount,
       LAG(amount) OVER (ORDER BY sales_date)          AS prev_amount,
       amount - LAG(amount) OVER (ORDER BY sales_date) AS dod_delta,
       SUM(amount) OVER (ORDER BY sales_date)          AS running_total
FROM sales
ORDER BY sales_date;
```
Rule of thumb: LAG for "compare this row to its predecessor" (delta, retention, gap); LEAD for "what happens next" (sessionization, churn-from-here); SUM(...) OVER (...) for running totals — always PARTITION BY the entity if the table holds multiple series.
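When the table holds one series per entity, the same query just gains a `PARTITION BY` — a sketch assuming a hypothetical `sales(customer_id, sales_date, amount)` shape:

```sql
-- Per-customer day-over-day delta and running total.
-- Assumes sales(customer_id, sales_date, amount); names are illustrative.
SELECT customer_id,
       sales_date,
       amount,
       COALESCE(amount - LAG(amount) OVER (
           PARTITION BY customer_id
           ORDER BY sales_date
       ), 0) AS dod_delta,          -- COALESCE covers the first row of each series
       SUM(amount) OVER (
           PARTITION BY customer_id
           ORDER BY sales_date
       ) AS running_total
FROM sales
ORDER BY customer_id, sales_date;
```

Without `PARTITION BY customer_id`, both windows would mix customers into one global series — the quietest possible wrong answer.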
Common beginner mistakes
- Using `RANK` when the question wants the Nth distinct value — `RANK = 2` returns zero rows if two rows tie for first.
- Forgetting `PARTITION BY` for a per-group ranking — produces a global ranking instead of per-department.
- Referencing the window-function alias in the `WHERE` of the same `SELECT` — window functions execute after `WHERE`; wrap in a CTE or subquery first.
- Confusing `LAG` (previous) with `LEAD` (next) — quietly produces inverted deltas.
- Forgetting `ORDER BY` inside `OVER` for `ROW_NUMBER`/`RANK`/`LAG`/`LEAD` — the result is non-deterministic without it.
PostgreSQL Interview Question on Top 3 Salaries Per Department
Given employees(emp_id, name, department, salary), return the top 3 distinct salaries per department, with ties at rank 3 included. Output department, name, salary, and the rank.
Solution Using DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) in a CTE
```sql
WITH ranked AS (
    SELECT department,
           name,
           salary,
           DENSE_RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS dr
    FROM employees
)
SELECT department, name, salary, dr
FROM ranked
WHERE dr <= 3
ORDER BY department, dr, name;
```
Why this works: the CTE ranked materializes a per-department DENSE_RANK keyed by salary descending — dr = 1 is the highest distinct salary in that department, dr = 2 is the second-highest, and so on; the outer WHERE dr <= 3 keeps every row whose salary is in the top three distinct salaries of its department, including all ties at rank 3; the ORDER BY produces a deterministic, reviewer-friendly output. DENSE_RANK over RANK because the spec wants the top three distinct salaries; DENSE_RANK over ROW_NUMBER because ties at rank 3 must be retained.
Step-by-step trace for the sample input:
| emp_id | name | department | salary |
|---|---|---|---|
| 1 | Alice | eng | 90,000 |
| 2 | Bob | eng | 80,000 |
| 3 | Carol | eng | 80,000 |
| 4 | Dan | eng | 70,000 |
| 5 | Eve | eng | 60,000 |
| 6 | Frank | sales | 100,000 |
| 7 | Grace | sales | 90,000 |
| 8 | Heidi | sales | 80,000 |
- CTE `ranked` — partition by `department`; order by `salary DESC`.
- `DENSE_RANK` per partition — eng: Alice → 1, Bob → 2, Carol → 2, Dan → 3, Eve → 4; sales: Frank → 1, Grace → 2, Heidi → 3.
- Outer `WHERE dr <= 3` — drops Eve (dr = 4); keeps both Bob and Carol (tied at 2) and Dan (3).
- `ORDER BY department, dr, name` — eng rows first, then sales; within a department by `dr`, then `name` for the tiebreak.
Output:
| department | name | salary | dr |
|---|---|---|---|
| eng | Alice | 90000 | 1 |
| eng | Bob | 80000 | 2 |
| eng | Carol | 80000 | 2 |
| eng | Dan | 70000 | 3 |
| sales | Frank | 100000 | 1 |
| sales | Grace | 90000 | 2 |
| sales | Heidi | 80000 | 3 |
Why this works — concept by concept:
- CTE `ranked` — names the intermediate ranked result; the outer query then filters it like a regular table. Far cleaner than a nested subquery.
- `PARTITION BY department` — restarts the rank at each department boundary; without this, the rank is global and the answer is wrong.
- `ORDER BY salary DESC` — defines "highest first" inside each partition; required for any deterministic ranking.
- `DENSE_RANK` over `RANK` — the spec wants the top three distinct salaries; `RANK` would skip after ties and miss the third distinct salary if there is a two-way tie above it.
- `WHERE dr <= 3` in the outer query — window functions cannot be referenced in the `WHERE` of the same `SELECT`; the CTE provides the materialized column the outer query can filter on.
- O(N log N) time — the sort within each partition dominates; with an index on `(department, salary DESC)` the planner can stream rather than sort.
More SQL window-function practice problems and SQL CTE practice problems are on PipeCode.
Tips to use this PostgreSQL cheat sheet effectively
Hold the clause-order diagram in your head
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Memorize this sentence and 80% of "weird" PostgreSQL parse errors decode themselves in five seconds. The error column "x" does not exist almost always means you referenced a SELECT alias in WHERE; the error aggregate functions are not allowed in WHERE means you wanted HAVING instead.
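The alias-in-`WHERE` decode can be shown in one snippet — the `orders(amount)` shape is a hypothetical stand-in:

```sql
-- SELECT aliases are computed in stage 5, so WHERE (stage 2) cannot see them.
-- Broken:  SELECT amount * 0.1 AS fee FROM orders WHERE fee > 5;
--          → ERROR: column "fee" does not exist
-- Working: repeat the expression in WHERE (or wrap in a CTE/subquery).
SELECT amount * 0.1 AS fee
FROM orders
WHERE amount * 0.1 > 5    -- the raw expression, not the alias
ORDER BY fee DESC;        -- ORDER BY runs after SELECT, so the alias IS visible here
```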
State the grain before any JOIN
Before writing the JOIN, name the grain you're producing: "this is order-line grain", "this is customer-day grain", "this is (customer, product) grain". The single most common bug in analytical SQL is SUM(left.col) after a 1:N join — the metric is silently multiplied by N. If grain doubles, you'll spot it immediately.
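One way to sketch the fix for the `SUM(left.col)` trap — `orders` and `order_items` are illustrative names, not from this guide's worked examples:

```sql
-- Assumes orders(order_id, total) and order_items(item_id, order_id).
-- Wrong: after the 1:N join, o.total repeats once per item row,
-- so SUM(o.total) is silently multiplied by the item count:
--   SELECT SUM(o.total) FROM orders o JOIN order_items i USING (order_id);

-- Right: pre-aggregate the N side to order grain, then join 1:1.
SELECT SUM(o.total)   AS order_revenue,
       SUM(i.n_items) AS items
FROM orders o
JOIN (
    SELECT order_id, COUNT(*) AS n_items
    FROM order_items
    GROUP BY order_id                 -- collapses item rows to order grain
) i ON i.order_id = o.order_id;       -- now one row per order on both sides
```

Stating the grain of each side before the join ("orders is order grain, the subquery is order grain") is what makes the 1:1 shape obvious.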
Use LEFT JOIN ... IS NULL over NOT IN for anti-joins
NOT IN (subquery) returns zero rows when the subquery contains a single NULL because x NOT IN (..., NULL, ...) is NULL, which fails the WHERE predicate. LEFT JOIN ... WHERE right.id IS NULL and NOT EXISTS (...) are immune. Production engineers who have been burned once never write NOT IN again.
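Both NULL-safe shapes side by side — `customers`/`orders` are illustrative table names:

```sql
-- Customers with no orders; assumes customers(customer_id) and
-- orders(order_id, customer_id). Both forms handle NULLs safely.
SELECT c.*
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;   -- no right-side match survived the LEFT JOIN

-- Equivalent, and often the clearest statement of intent:
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
```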
Pick DENSE_RANK for "Nth distinct"; pick ROW_NUMBER for deduplication
The single most-graded ranking distinction: DENSE_RANK = N is the Nth distinct value; RANK = N is the Nth row in skip-aware ranking order; ROW_NUMBER = N is the Nth row in arbitrary order. For "second-highest distinct salary" → DENSE_RANK = 2. For "remove duplicate rows keeping the canonical one" → ROW_NUMBER() OVER (PARTITION BY key ORDER BY tiebreaker) = 1.
Use FILTER (WHERE …) for one-pass conditional metrics
SUM(amount) FILTER (WHERE status = 'refunded') is cleaner than SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0 END) — PostgreSQL supports both. Use FILTER in PostgreSQL-only code, CASE for cross-dialect portability. One scan, many metrics, half the cost of two queries.
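A sketch of the one-pass pattern — the `orders(order_id, status, amount)` shape is assumed for illustration:

```sql
-- Several conditional metrics from one table scan.
-- Assumes orders(order_id, status, amount); names are illustrative.
SELECT COUNT(*)                                       AS n_orders,
       SUM(amount)                                    AS gross,
       SUM(amount) FILTER (WHERE status = 'refunded') AS refunded,
       -- portable cross-dialect equivalent of the FILTER line above:
       SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0 END) AS refunded_case
FROM orders;
```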
Always ORDER BY + tiebreaker; pair LIMIT with ORDER BY
Window functions, LIMIT N, and "top result" queries all require an ORDER BY with a deterministic tiebreaker (e.g., ORDER BY salary DESC, name). Without one, two runs of the same query can return different rows in the tie band — silently wrong in production and visibly wrong in an interview if the reviewer's reference answer locks an ordering.
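The deterministic top-N shape, against the `employees` table used throughout this section:

```sql
-- Top 5 earners, stable across runs.
SELECT name, salary
FROM employees
ORDER BY salary DESC, name   -- without the name tiebreaker, rows tied on
LIMIT 5;                     -- salary at the cutoff can swap between runs
```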
Use PostgreSQL-specific helpers — EXTRACT, DATE_TRUNC, INTERVAL, ::DATE cast
EXTRACT(MONTH FROM ts), DATE_TRUNC('month', ts), ts - INTERVAL '1 month', ts::DATE. These four cover 95% of date arithmetic. Reach for DATE_TRUNC whenever the spec says "by month" or "by week" — it groups timestamps to the bucket boundary deterministically.
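A typical "revenue by month" shape combining three of the four helpers — `orders(order_ts, amount)` is an assumed schema:

```sql
-- Monthly revenue buckets for the last six complete-plus-current months.
-- Assumes orders(order_ts TIMESTAMP, amount NUMERIC); names are illustrative.
SELECT DATE_TRUNC('month', order_ts)::DATE AS month_start,  -- bucket boundary as a date
       SUM(amount)                         AS revenue
FROM orders
WHERE order_ts >= DATE_TRUNC('month', NOW()) - INTERVAL '6 months'
GROUP BY month_start            -- PostgreSQL allows grouping by the output alias
ORDER BY month_start;
```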
Where to practice on PipeCode
Start with the SQL practice surface for the all-language SQL corpus. Drill the four-primitive pages: SQL filtering for WHERE patterns, SQL joins for join shapes, SQL aggregation for GROUP BY + HAVING, SQL window functions for ranking and lookback. Add adjacent topics: SQL CTE, SQL subqueries, SQL null-handling, SQL date functions. The interview courses page bundles structured curricula — start with SQL for Data Engineering Interviews — From Zero to FAANG. For broader coverage, browse by topic or read the related SQL interview questions for data engineering and data lake architecture for data engineering interviews blogs.
Frequently Asked Questions
What is the logical clause order in a PostgreSQL query?
PostgreSQL evaluates clauses in the order FROM / JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT / OFFSET, regardless of the order you write them. This is why WHERE cannot reference aggregate functions (they don't exist until after GROUP BY) and why SELECT-level aliases cannot be referenced in WHERE (they're computed in stage 5). Aliases become available only in ORDER BY and the outer query in a nested context.
What is the difference between WHERE and HAVING in PostgreSQL?
WHERE filters individual rows before the GROUP BY step and can reference only raw row columns. HAVING filters whole groups after the GROUP BY step and can reference aggregate functions like COUNT(*), SUM(col), AVG(col). Trying to use an aggregate in WHERE (e.g., WHERE COUNT(*) > 1) is a parse error because the aggregate does not yet exist. Both clauses can appear in the same query.
How do I find rows in table A that have no match in table B?
The canonical PostgreSQL pattern is SELECT a.* FROM a LEFT JOIN b ON b.fk = a.pk WHERE b.pk IS NULL — the LEFT JOIN preserves every left row, and the WHERE b.pk IS NULL filter keeps only the ones where no right-side match was found. This is the anti-join pattern. Equivalent to WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.fk = a.pk). Both are safer than NOT IN (subquery), which returns zero rows if the subquery contains a single NULL.
What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?
All three assign integers within a window. ROW_NUMBER gives every row a unique sequential integer (1, 2, 3, 4), even on ties. RANK gives tied rows the same rank but skips after them (1, 2, 2, 4). DENSE_RANK gives tied rows the same rank with no skip (1, 2, 2, 3). For "Nth distinct value" use DENSE_RANK = N; for "Nth row in skip-aware ranking order" use RANK = N; for "Nth row in arbitrary order" or "deduplicate keeping one canonical row" use ROW_NUMBER = 1.
What does FILTER (WHERE …) do in PostgreSQL aggregates?
SUM(col) FILTER (WHERE pred) and COUNT(*) FILTER (WHERE pred) apply the aggregate only to rows where the predicate is TRUE; rows where the predicate is FALSE or NULL are skipped for that aggregate, while other aggregates in the same SELECT still see them. The portable cross-dialect equivalent is SUM(CASE WHEN pred THEN col ELSE 0 END) and COUNT(CASE WHEN pred THEN 1 END). Use FILTER for clarity in PostgreSQL-only code.
How do I compute a running total in PostgreSQL?
Use SUM(col) OVER (PARTITION BY p ORDER BY o) — the window aggregate accumulates from the start of each partition through the current row in the order defined by ORDER BY. Example: SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) gives a per-customer running total of order amounts ordered by date. Drop PARTITION BY for a single global running total.
Why is LIMIT 5 returning different rows on different runs?
LIMIT without ORDER BY is non-deterministic — PostgreSQL returns whatever rows it sees first, which depends on the query plan, parallelism, and table physical layout. Always pair LIMIT with ORDER BY <col> DESC, <tiebreaker> so two runs return the same rows. Reviewers depend on stable ordering, and dashboards break silently when row order drifts.