Gowtham Potureddi

Posted on May 22

SQL CASE WHEN Statement: Conditional Logic for Data Engineering

#sql #python #interview #dataengineering

The SQL CASE WHEN expression — also called the case statement in SQL or simply the SQL CASE statement — is the database equivalent of an if-else chain: it evaluates a list of conditions in order, returns the value tied to the first condition that is true, and falls back to ELSE (or NULL) when nothing matches. Almost every analytics SQL query that does anything interesting — letter grades, success rates, paid-vs-refunded revenue, pivot reports, custom sort priorities — eventually reaches for CASE WHEN, which is why it shows up in nearly every screening round under the sql interview questions umbrella.

This guide walks through the two forms of CASE (searched vs simple), every clause where it can appear (SELECT, WHERE, GROUP BY, ORDER BY, HAVING), the conditional aggregation pattern (sum case when sql) that powers half of all data engineering interview questions, the rows-to-columns pivot trick reviewers love, the four NULL traps that fail most candidates, and dialect-specific shortcuts like Postgres FILTER, SQL Server IIF, and Oracle DECODE. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward. Whether you reach sql for data engineering prompts on screening calls or sit down to a multi-hour onsite, using case statements in SQL correctly is the single highest-leverage SQL skill you can lock in.

When you want hands-on reps immediately after reading, browse CASE WHEN SQL practice →, drill the conditional logic lane →, sharpen conditional aggregation patterns →, rehearse aggregation SQL drills →, or widen coverage on the full SQL practice library →.

On this page

Why CASE WHEN matters in data engineering interviews
Anatomy — searched form vs simple form, ELSE, and NULL behavior
CASE in SELECT — bucketing, tier labels, and derived columns
Conditional aggregation — SUM(CASE WHEN …) the canonical interview pattern
Pivot rows to columns with conditional aggregation
CASE in WHERE, GROUP BY, ORDER BY, and HAVING
CASE with NULLIF and COALESCE — safe arithmetic patterns
Dialect alternatives — FILTER, IIF, DECODE, IF
Choosing where to put CASE (cheat sheet)
Frequently asked questions
Practice on PipeCode

1. Why CASE WHEN matters in data engineering interviews

CASE WHEN is SQL's `if-else` — the single most versatile conditional primitive

The one-sentence invariant interviewers chase: a SQL CASE WHEN expression is the SQL equivalent of an if-else chain — it evaluates a list of conditions in order and returns the value tied to the first condition that is TRUE, or ELSE (or NULL) if none match. Internalise that and you can lift a third of the SQL interview prompts that look "tricky" — letter grades, success rates, refund accounting, pivot reports — into one expression.

What CASE WHEN does at a glance.

Definition — a conditional branching expression that evaluates predicates top-to-bottom and returns the value of the first predicate that is TRUE.
Required syntax — CASE [<expr>] WHEN <pred> THEN <result> [WHEN …] [ELSE <default>] END.
Return type — any scalar value (number, string, date, boolean); branches must agree on a compatible type.
Short-circuit evaluation — once a WHEN matches, the rest are skipped; later predicates never even run.

Two forms ship in ANSI SQL.

Searched form (the one you use 95% of the time) — each WHEN holds an arbitrary boolean predicate; full flexibility for ranges, IS NULL, compound predicates.
Simple form — CASE <expr> WHEN <value1> THEN … compares <expr> to each WHEN <value> for equality; cannot express ranges or IS NULL.
ANSI SQL-92 defined both forms; every major dialect (PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery) supports them.

Where CASE WHEN shows up in a query.

SELECT projection — derive a new column (bucketing, tier label, computed metric).
Inside an aggregate — SUM, COUNT, AVG, MIN, MAX — the conditional aggregation pattern (§4).
WHERE clause — rare; usually refactorable to IN / OR.
GROUP BY — group by a tier label or computed bucket.
HAVING — filter on a conditional aggregate.
ORDER BY — custom sort priority ("active first, paused middle, churned last").
DML statements — INSERT, UPDATE, DELETE use CASE to set conditional values.
Nested inside another CASE — for hierarchical logic (role + tenure bonus rules).

The four canonical interview patterns powered by CASE.

Bucketing / tier labels — letter grades, risk bands, age cohorts, RFM segments.
Conditional aggregation — SUM(CASE WHEN cond THEN val ELSE 0 END) for multi-KPI single-pass reports.
Pivot rows to columns — one SUM(CASE WHEN category = 'x' THEN val ELSE 0 END) per output column.
Custom sort priority — ORDER BY CASE WHEN status = 'urgent' THEN 0 WHEN status = 'high' THEN 1 ELSE 2 END.

What interviewers listen for.

Do you include ELSE 0 in every SUM(CASE …)? — signals you understand the silent-undercount NULL trap (§2).
Do you choose the searched form over the simple form when ranges are involved? — signals fluency.
Do you know that CASE WHEN x = NULL never matches and use IS NULL instead? — senior signal.
Do you mention the Postgres FILTER clause when working on a Postgres-flavoured prompt? — bonus points.

Worked example — bucket employee salaries into tier labels

Question. Add a salary_tier column to the result showing Junior (under 60k), Mid (60k–99,999), Senior (100k–149,999), and Executive (150k or more).

Input. employees(emp_id, name, salary).

emp_id	name	salary
1	Ava	55,000
2	Omar	78,500
3	Lin	130,400
4	Sam	165,000

Code.

SELECT emp_id,
       name,
       salary,
       CASE
           WHEN salary >= 150000 THEN 'Executive'
           WHEN salary >= 100000 THEN 'Senior'
           WHEN salary >=  60000 THEN 'Mid'
           ELSE                       'Junior'
       END AS salary_tier
FROM employees
ORDER BY salary DESC;

Step-by-step explanation.

The planner evaluates the CASE expression once per row.
Branches are tested top to bottom; the first true WHEN decides the result.
Sam's 165k hits the first branch (>= 150000) and becomes Executive; the rest of the branches are skipped.
Lin's 130k fails the first branch but hits >= 100000 → Senior.
Ava's 55k falls through every WHEN and lands in the ELSE → Junior.

Output.

emp_id	name	salary	salary_tier
4	Sam	165,000	Executive
3	Lin	130,400	Senior
2	Omar	78,500	Mid
1	Ava	55,000	Junior

Rule of thumb: order your WHEN branches from most restrictive (highest threshold) to least restrictive so the first match always lands in the intended tier.

SQL
Topic — case-when
CASE WHEN SQL problems

Practice →

SQL
Topic — conditional-logic
Conditional-logic SQL drills

Practice →

2. Anatomy — searched form vs simple form, ELSE, and NULL behavior

CASE ships in two forms — pick the searched form for anything beyond simple equality

ANSI SQL-92 defined two CASE forms: the searched form uses arbitrary boolean predicates per WHEN, while the simple form compares a single expression against constant values for equality. The searched form is strictly more expressive — pick it by default and only fall back to the simple form when the equality-only shape genuinely reads cleaner.

Searched form — the one you use 95% of the time.

Shape — CASE WHEN <bool_expr_1> THEN <result_1> [WHEN <bool_expr_2> THEN <result_2>] … [ELSE <default>] END.
Predicates — any boolean expression: >=, <, BETWEEN, IS NULL, IN, compound AND/OR.
Use it when — branches need ranges, NULL checks, multi-column logic, or nested function calls.
Reads like an if-else chain — interviewers comprehend it instantly.

Simple form — equality only.

Shape — CASE <expr> WHEN <value_1> THEN <result_1> [WHEN <value_2> THEN <result_2>] … [ELSE <default>] END.
Predicates — equality only: <expr> = <value_n> per branch.
Cannot express — ranges, IS NULL (use searched form for these).
Reads cleaner when — the input is a stable enum-like column (status, country_code) and every WHEN is a single literal.

The ELSE clause.

Optional — if omitted, a no-match returns NULL.
Mandatory in SUM(CASE …) — without ELSE 0, false-branch rows return NULL, which SUM ignores, causing a silent undercount (the canonical interview trap).
Type compatibility — ELSE must return a value compatible with the THEN branches; mixing types triggers implicit casts.
Default literal — 'Other', 0, or a sentinel that downstream consumers can safely filter.

The four NULL traps (the canonical interview gotchas).

Missing ELSE + SUM — SUM(CASE WHEN cond THEN x END) returns NULL for the false branch; SUM ignores NULL; your total is silently undercounted. Fix: always include ELSE 0.
CASE WHEN x = NULL never matches — NULL = anything is UNKNOWN. Fix: use CASE WHEN x IS NULL THEN ….
Mixed types in branches force implicit casts — branches returning a mix of INTEGER, DECIMAL, VARCHAR can produce surprises. Fix: cast each branch explicitly to a common type.
Divide-by-zero via NULL propagation — revenue / cost crashes when cost = 0. Fix: wrap with NULLIF: revenue / NULLIF(cost, 0) (returns NULL instead of raising).

Logical processing order — where does CASE sit?

The SQL clauses execute in this order regardless of the order you write them:
- FROM / JOIN → WHERE → GROUP BY → HAVING → window functions → SELECT (with CASE) → ORDER BY.
Consequence #1 — a CASE alias in SELECT cannot be referenced in the same query's WHERE, GROUP BY, or HAVING (the alias does not yet exist).
Consequence #2 — to filter on a CASE result, wrap the query in a CTE and filter the outer scope.
Consequence #3 — outer ORDER BY can reference the CASE alias because ORDER BY runs last.

Worked example — searched vs simple form on the same input

Question. Show every order's status with two derived columns: one using the searched form for an "order priority" and one using the simple form for a "status label".

Input. orders(order_id, status).

order_id	status
1001	paid
1002	refunded
1003	cancelled
1004	pending

Code.

SELECT order_id,
       status,
       CASE
           WHEN status IN ('paid', 'shipped')          THEN 'fulfilled'
           WHEN status IN ('refunded', 'cancelled')    THEN 'reversed'
           ELSE                                              'in_progress'
       END AS searched_priority,
       CASE status
           WHEN 'paid'      THEN '✓ paid'
           WHEN 'refunded'  THEN '↩ refunded'
           WHEN 'cancelled' THEN '✗ cancelled'
           ELSE                  '… pending'
       END AS simple_label
FROM orders
ORDER BY order_id;

Output.

order_id	status	searched_priority	simple_label
1001	paid	fulfilled	✓ paid
1002	refunded	reversed	↩ refunded
1003	cancelled	reversed	✗ cancelled
1004	pending	in_progress	… pending

Rule of thumb: if your branches use = against a single column, the simple form is shorter; the moment ranges, IS NULL, or compound predicates enter the picture, switch to the searched form.

SQL
Topic — case-expression
CASE expression SQL problems

Practice →

SQL
Topic — conditional-logic
Conditional logic SQL drills

Practice →

3. CASE in SELECT — bucketing, tier labels, and derived columns

The bucketing pattern — letter grades, risk bands, RFM segments, NPS buckets

The single most-used CASE pattern is bucketing: take a continuous metric (score, revenue, age) and assign each row to a discrete tier. Letter grades, customer risk bands, RFM segments, NPS buckets, age cohorts, salary tiers, and tenure ranges all share the same shape.

The bucketing pattern at a glance.

Shape — CASE WHEN metric >= t1 THEN 'tier_1' WHEN metric >= t2 THEN 'tier_2' … ELSE 'tier_last' END.
Top-to-bottom evaluation — first true WHEN decides the tier; later branches are skipped.
Branches can overlap intentionally — WHEN salary >= 150000 matches Sam too, but WHEN salary >= 100000 is checked only if the previous branch failed, so Sam still lands in Executive.
Use it for — letter grades (A/B/C/D/F), customer cohorts (high/mid/low value), age bands, salary tiers, RFM segments, NPS buckets (promoter/passive/detractor), risk ratings.

Boundary rules — >= vs > decides who lands in each tier.

>= is inclusive at the lower bound; > is exclusive.
State your boundaries aloud — "A is 90+ inclusive, B is 80–89, C is 70–79".
Half-open intervals are cleanest — WHEN score >= 90 then WHEN score >= 80 (each implicitly covers up to but not including the next tier's lower bound thanks to top-down evaluation).
Avoid double-counting — closed intervals like WHEN score BETWEEN 80 AND 89 are correct but verbose; the half-open >= chain is the standard pattern.

Aliasing the derived column.

Use AS <name> — CASE … END AS letter_grade makes the output column self-documenting.
Reference the alias in ORDER BY — ORDER BY letter_grade DESC works (since ORDER BY runs after SELECT).
Cannot reference in same query's WHERE — wrap with a CTE if you need to filter on the tier.
Same alias rules apply for GROUP BY and HAVING — wrap in a CTE or repeat the full CASE expression.

Composing CASE with other functions.

String manipulation — LOWER(CASE WHEN status = 'PAID' THEN 'paid_revenue' ELSE 'other' END).
Numeric operations — CASE WHEN x > 0 THEN LOG(x) ELSE NULL END for safe logarithms.
Date arithmetic — CASE WHEN sale_date >= CURRENT_DATE - INTERVAL '30 days' THEN 'recent' ELSE 'old' END.
Inside aggregates — see §4 for the conditional aggregation pattern.

Worked example — letter grades from exam scores

Question. Add a letter_grade column for each student based on their score: 90+ is A, 80–89 is B, 70–79 is C, 60–69 is D, below 60 is F.

Input. exam_results(student_id, name, score).

student_id	name	score
1	Ava	95
2	Omar	82
3	Lin	71
4	Sam	58

Code.

SELECT student_id,
       name,
       score,
       CASE
           WHEN score >= 90 THEN 'A'
           WHEN score >= 80 THEN 'B'
           WHEN score >= 70 THEN 'C'
           WHEN score >= 60 THEN 'D'
           ELSE                  'F'
       END AS letter_grade
FROM exam_results
ORDER BY score DESC;

Output.

student_id	name	score	letter_grade
1	Ava	95	A
2	Omar	82	B
3	Lin	71	C
4	Sam	58	F

SQL interview question — categorize employees into salary tiers and count headcount per tier

Assume employees(emp_id, name, salary). Return one row per salary tier (Junior, Mid, Senior, Executive) with the headcount, sorted from highest tier to lowest.

Solution Using CASE in SELECT + GROUP BY tier

Code solution.

WITH labelled AS (
    SELECT emp_id,
           name,
           salary,
           CASE
               WHEN salary >= 150000 THEN 'Executive'
               WHEN salary >= 100000 THEN 'Senior'
               WHEN salary >=  60000 THEN 'Mid'
               ELSE                       'Junior'
           END AS tier
    FROM employees
)
SELECT tier,
       COUNT(*) AS headcount
FROM labelled
GROUP BY tier
ORDER BY CASE tier
             WHEN 'Executive' THEN 0
             WHEN 'Senior'    THEN 1
             WHEN 'Mid'       THEN 2
             ELSE                  3
         END;

Step-by-step trace.

step	relation	outcome
1	Scan `employees` and project the `tier` label inside the CTE	every row now carries a tier name
2	`GROUP BY tier` in the outer query	collapses to one row per tier
3	`COUNT(*)` counts rows per tier	headcount column produced
4	Outer `ORDER BY CASE tier WHEN 'Executive' THEN 0 … END`	enforces highest-tier-first sort (alphabetical sort would put `Executive` last)

Output:

tier	headcount
Executive	4
Senior	11
Mid	27
Junior	18

Why this works — concept by concept:

CASE inside SELECT — projects a derived tier column based on the salary buckets; top-to-bottom evaluation guarantees that "Executive" wins for salary 165k even though "Senior" would also match >= 100000.
CTE wrapper — the inner WITH labelled AS (…) materialises the labelled rows so the outer GROUP BY tier can reference the alias; without the CTE the same query would need to repeat the entire CASE expression in GROUP BY.
GROUP BY tier — collapses to one row per distinct tier value; COUNT(*) counts rows in each group.
ORDER BY CASE — alphabetic sort would put Executive after Junior and Mid; a CASE-based sort key enforces the business order without adding an extra rank column.
Cost — single sequential scan of employees for the CTE (Θ(n)), hash aggregate on the CTE output for the outer GROUP BY (Θ(n)).

SQL
Topic — case-when
Bucketing / tier-label drills

Practice →

SQL
Topic — aggregation
Aggregation + CASE combos

Practice →

4. Conditional aggregation — SUM(CASE WHEN …) the canonical interview pattern

Wrap CASE inside SUM, COUNT, AVG, MIN, MAX to compute multiple KPIs in one grouped pass

The sum case when sql pattern — SUM(CASE WHEN cond THEN val ELSE 0 END) — is the single most-asked CASE WHEN technique in data engineering interviews and the canonical form of conditional aggregation. It lets you compute several KPIs in one grouped query instead of running multiple filtered queries, joining them, or building temp tables.

The canonical pattern at a glance.

Shape — SUM(CASE WHEN <cond> THEN <value> ELSE 0 END) AS <metric>.
Reads as — "sum <value> for rows that satisfy <cond>, contribute 0 otherwise."
Replaces — multiple filtered queries, joins, temp tables, or PIVOT operators.
Pairs with GROUP BY for per-segment metrics — one CASE per metric column.

Counting with SUM(CASE WHEN cond THEN 1 ELSE 0 END).

Counts rows matching a predicate — equivalent to COUNT(*) FILTER (WHERE cond) in Postgres or a separate filtered subquery.
THEN 1 — contributes one to the count per matching row.
ELSE 0 — non-matching rows contribute zero (and are NOT counted).
Alternative — COUNT(CASE WHEN cond THEN 1 END) works too (because COUNT(col) ignores NULL), but SUM(CASE WHEN cond THEN 1 ELSE 0 END) is more explicit.

Why one query beats many.

One sequential scan of the source table → multiple KPIs side by side; reduces I/O.
Consistent grain — every metric is computed at the same GROUP BY level, so they sit in the same row.
No extra joins or temp tables — saves planner cost and review effort.
Refactor-friendly — adding a new KPI is one extra SUM(CASE …) column, not a new query.
Dashboard-ready — one row per group with all the metrics is exactly what BI tools want.

The ELSE 0 rule — the canonical interview trap.

Without ELSE 0 — CASE WHEN cond THEN x END returns NULL for the false branch.
SUM ignores NULL — so missing-ELSE rows silently disappear from the sum.
The bug is invisible — the query "works" and returns a number that looks plausible; only auditing against a separate filtered query catches the undercount.
Always write ELSE 0 for SUM — even when you think it cannot matter, write it; reviewers expect it.

Combining conditional sums with COUNT(*) for ratios.

Success rate — SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0 / COUNT(*) (the * 1.0 forces float division in MySQL / older Postgres).
Refund rate — SUM(CASE WHEN status = 'refund' THEN 1 ELSE 0 END) * 1.0 / COUNT(*).
Click-through rate — SUM(CASE WHEN clicked THEN 1 ELSE 0 END) * 1.0 / SUM(CASE WHEN impressed THEN 1 ELSE 0 END).
Always guard the denominator with NULLIF for safety (§7).

Worked example — paid vs refunded revenue per day

Question. From orders(order_id, order_ts, status, amount), compute per day: total_revenue, paid_revenue, refunded_amount, and paid_orders count — all in one grouped query.

Input. orders slice.

order_id	order_ts	status	amount
1	2026-01-01	paid	100
2	2026-01-01	paid	150
3	2026-01-01	refund	50
4	2026-01-02	paid	200
5	2026-01-02	refund	30

Code.

SELECT DATE(order_ts) AS order_date,
       SUM(amount)                                                       AS total_amount,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END)          AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END)          AS refunded_amount,
       SUM(CASE WHEN status = 'paid'   THEN 1      ELSE 0 END)          AS paid_orders
FROM orders
GROUP BY DATE(order_ts)
ORDER BY 1;

Step-by-step explanation.

DATE(order_ts) truncates the timestamp to a calendar day for grouping.
SUM(amount) aggregates every row regardless of status — gives gross volume per day.
The four SUM(CASE WHEN …) columns filter rows by status before contributing to each KPI.
paid_orders uses THEN 1 to count rows instead of summing amounts.
GROUP BY DATE(order_ts) collapses to one row per day; every KPI shares the same order_date grain.

Output.

order_date	total_amount	paid_revenue	refunded_amount	paid_orders
2026-01-01	300	250	50	2
2026-01-02	230	200	30	1

SQL interview question — paid vs refunded revenue per region in one query

Assume orders(order_id, region, status, amount). Return per region: total_revenue, paid_revenue, refunded_amount, and the count of paid_orders — all in one grouped query.

Solution Using SUM(CASE WHEN …)

Code solution.

SELECT region,
       SUM(amount)                                                  AS total_revenue,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END)     AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END)     AS refunded_amount,
       SUM(CASE WHEN status = 'paid'   THEN 1      ELSE 0 END)     AS paid_orders
FROM orders
GROUP BY region
ORDER BY paid_revenue DESC;

Step-by-step trace.

step	relation	outcome
1	Scan `orders` once	every order row evaluated
2	`GROUP BY region`	rows grouped by region; planner uses hash aggregate
3	Four aggregate calls evaluated per group	each `SUM(CASE …)` independently filters and sums
4	`paid_orders` uses `THEN 1 ELSE 0`	counts paid rows per region without a separate query
5	Outer `ORDER BY paid_revenue DESC`	best-performing regions first

Output:

region	total_revenue	paid_revenue	refunded_amount	paid_orders
US	12,500	11,200	1,300	142
EU	8,750	7,900	850	98
APAC	5,400	5,100	300	64

Why this works — concept by concept:

Conditional aggregation — SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) sums only the rows where the predicate is true; non-matching rows contribute zero, preserving the regional grain.
The ELSE 0 discipline — without it, refunded rows would contribute NULL, SUM would silently ignore them, and paid_revenue would still look correct — but total_revenue - paid_revenue - refunded_amount would not balance. Always write ELSE 0.
One scan, four KPIs — a naive solution would write four filtered queries (or four joined subqueries); this single grouped query reads the table once and emits all four metrics side by side.
Counting with THEN 1 ELSE 0 — converts the conditional sum into a conditional count without a separate COUNT(*) FILTER (WHERE …) (which is Postgres-only).
Cost — single sequential scan Θ(n) plus a hash aggregate keyed on region (Θ(n)); the four CASE expressions add a constant factor per row.

SQL
Topic — conditional-aggregation
Conditional-aggregation drills

Practice →

SQL
Topic — aggregation
Aggregation SQL library

Practice →

5. Pivot rows to columns with conditional aggregation

One `SUM(CASE WHEN ...)` per output column — portable pivot without PIVOT

When a stakeholder wants "revenue by channel as columns instead of rows", reach for the conditional-aggregation pivot. The pattern produces one row per group with a fixed set of derived columns — far more portable than dialect-specific PIVOT operators.

The pattern at a glance.

Shape — one SUM(CASE WHEN <category_col> = '<value>' THEN <metric> ELSE 0 END) AS <metric>_<value> per output column.
One row per group — driven by GROUP BY on the dimension you want as rows.
Stable column names — derived from the literal predicates, so the schema is known at write-time.
Replaces — SQL Server's PIVOT, Postgres crosstab, multiple joined subqueries.

Why this beats native PIVOT.

Portability — works on every dialect that supports CASE (which is every major dialect).
Readable — the predicates document the column meanings inline (channel = 'paid' → rev_paid).
Composable — easily extends with new metrics by adding more SUM(CASE …) columns.
No special grammar — junior reviewers can follow it without learning PIVOT.

Stable column names.

Hard-coded in the SQL — the column list is whatever you write; not dependent on the data.
Renames are explicit — change rev_paid to paid_revenue in one place.
Auditable — the predicate appears in the same expression as the column name, so reviewers can verify each KPI in isolation.

Limitations.

Static column list — the columns must be known when you write the SQL. For dynamic pivots (where the categories themselves are unknown), reach for procedural SQL, application-side string interpolation, or dialect-specific dynamic SQL.
Many categories = wide query — 50 channels means 50 SUM(CASE …) columns; readability suffers. Consider a GROUP BY category shape instead.
Empty groups still appear — an ELSE 0 ensures empty cells render as 0 rather than NULL.

Use cases.

Campaign performance by channel — rev_paid / rev_organic / rev_social columns.
Revenue by tier — rev_free / rev_basic / rev_pro / rev_enterprise.
Conversions by funnel step — step_1_views / step_2_clicks / step_3_purchases.
NPS distribution — promoter_count / passive_count / detractor_count per cohort.

SQL interview question — pivot campaign performance by channel

Assume sessions(campaign_id, channel, converted, revenue). Return per campaign_id: total sessions, sessions split by channel (paid / organic / social), total conversions, and revenue split by channel — all in one grouped query.

Solution Using SUM(CASE WHEN channel = …)

Code solution.

SELECT campaign_id,
       COUNT(*)                                                                AS sessions_total,
       SUM(CASE WHEN channel = 'paid'    THEN 1       ELSE 0 END)              AS sessions_paid,
       SUM(CASE WHEN channel = 'organic' THEN 1       ELSE 0 END)              AS sessions_organic,
       SUM(CASE WHEN channel = 'social'  THEN 1       ELSE 0 END)              AS sessions_social,
       SUM(CASE WHEN converted = TRUE    THEN 1       ELSE 0 END)              AS conversions_total,
       SUM(CASE WHEN channel = 'paid'    THEN revenue ELSE 0 END)              AS rev_paid,
       SUM(CASE WHEN channel = 'organic' THEN revenue ELSE 0 END)              AS rev_organic,
       SUM(CASE WHEN channel = 'social'  THEN revenue ELSE 0 END)              AS rev_social
FROM sessions
GROUP BY campaign_id
ORDER BY (SUM(CASE WHEN channel = 'paid' THEN revenue ELSE 0 END)) DESC;

Step-by-step trace.

step	relation	outcome
1	Scan `sessions` once	every row evaluated
2	`GROUP BY campaign_id`	hash aggregate, one row per campaign
3	8 aggregate calls evaluated per group	each KPI computed in the same pass
4	Channel `SUM(CASE WHEN channel = … THEN 1 ELSE 0)` columns	session counts pivoted to columns
5	Channel `SUM(CASE WHEN channel = … THEN revenue ELSE 0)` columns	revenue pivoted to columns
6	Outer `ORDER BY` on a derived expression	sort by paid-channel revenue without referencing the alias

Output:

campaign_id	sessions_total	sessions_paid	sessions_organic	sessions_social	conversions_total	rev_paid	rev_organic	rev_social
C-001	1,250	800	350	100	142	12,500	4,200	1,100
C-002	980	420	510	50	89	6,800	6,100	540

Why this works — concept by concept:

Pivot via conditional sums — each channel becomes its own column without any dialect-specific PIVOT operator; the column names are derived from the literal predicates and are stable across runs.
One scan, all KPIs — the planner reads sessions once and computes every aggregate in parallel using the same hash aggregate node; the alternative would be 4–8 joined subqueries.
THEN 1 for counts, THEN revenue for sums — the same CASE shape produces a count column or a sum column depending on the THEN expression.
ELSE 0 everywhere — guarantees empty cells render as 0 (not NULL); makes the output safe for downstream arithmetic.
Repeating the CASE expression in ORDER BY — column aliases cannot be referenced inside aggregates in ORDER BY on every dialect; repeating the expression is the portable choice.
Cost — single sequential scan Θ(n) plus hash aggregate Θ(n); the 8 CASE expressions are a constant per-row factor.

SQL
Topic — conditional-aggregation
Pivot pattern drills

Practice →

SQL
Topic — aggregations
Aggregation patterns

Practice →

6. CASE in WHERE, GROUP BY, ORDER BY, and HAVING

CASE works in every clause — pick the right slot for your intent

CASE WHEN is not confined to the SELECT list — it can drive filtering (WHERE), grouping (GROUP BY), filtering of aggregates (HAVING), and custom sort priority (ORDER BY). Knowing which slot to reach for is its own interview signal.

CASE in WHERE.

Rare in practice — usually refactorable to IN, OR, or boolean combinations.
Useful when — the condition depends on a column value (e.g. "if region = 'EU', require gdpr_consent = TRUE; otherwise no consent check").
Shape — WHERE CASE WHEN region = 'EU' THEN gdpr_consent ELSE TRUE END.
Reads better as — WHERE (region <> 'EU') OR (region = 'EU' AND gdpr_consent = TRUE).

CASE in GROUP BY.

Group by a tier label — GROUP BY CASE WHEN salary >= 100000 THEN 'high' ELSE 'low' END.
Group by a computed bucket — GROUP BY CASE WHEN age < 18 THEN 'minor' WHEN age < 65 THEN 'adult' ELSE 'senior' END.
Cannot reference the SELECT alias — repeat the full CASE expression in GROUP BY (logical order issue).
Cleanest pattern — define the CASE in a CTE, then GROUP BY tier in the outer query (Blog72 §3 reference).

CASE in ORDER BY.

Custom sort priority — ORDER BY CASE status WHEN 'urgent' THEN 0 WHEN 'high' THEN 1 ELSE 2 END.
Sort tier labels in business order — alphabetic sort would scramble Executive / Senior / Mid / Junior; a CASE expression enforces the right order without an explicit rank column.
ORDER BY can reference SELECT aliases — ORDER BY tier works if tier is a CASE alias in the same SELECT, but the CASE expression itself is also legal.
Pair with DESC for reverse priority — ORDER BY CASE … END DESC.

CASE in HAVING.

Filter aggregates produced by CASE — HAVING SUM(CASE WHEN tier = 'paid' THEN 1 ELSE 0 END) > 0.
Multi-condition existence checks — "pages with both paid and free users" → HAVING SUM(CASE WHEN tier = 'paid' THEN 1 ELSE 0 END) > 0 AND SUM(CASE WHEN tier = 'free' THEN 1 ELSE 0 END) > 0.
Always references aggregates — HAVING runs after GROUP BY, so bare-column CASE expressions are illegal here unless wrapped in an aggregate.

Same alias rules apply.

SELECT aliases are visible in ORDER BY — ORDER BY tier works.
SELECT aliases are NOT visible in WHERE / GROUP BY / HAVING — logical processing order; the alias does not yet exist when those clauses run.
Workaround — repeat the full CASE expression in the earlier clause, OR wrap the query in a CTE / subquery and filter the outer scope.

Worked example — custom sort order for ticket priority

Question. From tickets(ticket_id, status, opened_at), return every ticket sorted with urgent first, then high, then normal, then low. Inside each priority group, sort by opened_at ascending (oldest first).

Input. tickets slice.

ticket_id	status	opened_at
101	normal	2026-05-19
102	urgent	2026-05-20
103	high	2026-05-18
104	low	2026-05-17
105	urgent	2026-05-19

Code.

SELECT ticket_id,
       status,
       opened_at
FROM tickets
ORDER BY
    CASE status
        WHEN 'urgent' THEN 0
        WHEN 'high'   THEN 1
        WHEN 'normal' THEN 2
        WHEN 'low'    THEN 3
        ELSE              4
    END,
    opened_at ASC;

Step-by-step explanation.

The CASE expression assigns a numeric sort key per ticket — urgent gets 0, high gets 1, etc.
The planner sorts rows by the CASE key first; ties on the CASE key are broken by opened_at ascending.
Within urgent, ticket 105 (2026-05-19) appears before ticket 102 (2026-05-20).
The ELSE 4 lands any unexpected status values at the bottom.

Output.

ticket_id	status	opened_at
105	urgent	2026-05-19
102	urgent	2026-05-20
103	high	2026-05-18
101	normal	2026-05-19
104	low	2026-05-17

SQL
Topic — conditional-filtering
Conditional-filtering drills

Practice →

SQL
Topic — case-when
CASE WHEN SQL library

Practice →

7. CASE with NULLIF and COALESCE — safe arithmetic patterns

NULLIF guards divide-by-zero; COALESCE replaces NULL with a default — both pair beautifully with CASE

CASE WHEN rarely ships alone. The two conditional functions that pair best with it are NULLIF (returns NULL when two values are equal) and COALESCE (returns the first non-NULL argument). Together they let you write bulletproof safe-division and empty-group fallbacks.

NULLIF(a, b) at a glance.

Shape — NULLIF(a, b).
Returns — NULL when a = b; otherwise returns a.
Equivalent to — CASE WHEN a = b THEN NULL ELSE a END.
Classic use — x / NULLIF(y, 0) returns NULL (instead of erroring) when y = 0.

COALESCE(a, b, c, …) at a glance.

Shape — COALESCE(a, b, c, …).
Returns — the first non-NULL argument from left to right.
Use cases — replace NULL with 0, with a default string, or with a fallback column.
Short-circuit — stops evaluating after the first non-NULL is found.

Safe-division template.

Pattern — <numerator> / NULLIF(<denominator>, 0).
Behaviour — returns NULL when the denominator is zero; everywhere else returns the quotient.
Pair with COALESCE — COALESCE(<numerator> / NULLIF(<denominator>, 0), 0) replaces the NULL with 0 for downstream arithmetic.
Use it for — success rates, click-through rates, refund rates, conversion rates, churn rates — every ratio where the denominator could legitimately be zero.

Empty-group bulletproofing.

Pattern — COALESCE(SUM(CASE WHEN cond THEN amount ELSE 0 END), 0).
Why — SUM over zero rows returns NULL (not 0); without COALESCE, the empty group renders as NULL and breaks downstream consumers.
Combined with FILTER (Postgres) — COALESCE(SUM(amount) FILTER (WHERE cond), 0).

Composition order — outer to inner.

COALESCE on the outside — final fallback so the result is never NULL.
NULLIF on the inside — protects intermediate expressions (denominators) from invalid values.
CAST last (or first) — explicit type casting prevents implicit-cast surprises.
Example — ROUND(COALESCE(SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0 / NULLIF(COUNT(*), 0), 0), 4) reads "round the success rate; fall back to 0 if no rows; guard divide-by-zero with NULLIF; force float division with * 1.0."

SQL interview question — success rate per region with safe division

Assume events(event_id, region, succeeded) where succeeded is a boolean. Return per region: total events, successes, failures, and success_rate (successes / total). Regions with zero events should render as success_rate = 0, not crash.

Solution Using SUM(CASE WHEN …) + NULLIF

Code solution.

SELECT region,
       COUNT(*)                                                                AS total_events,
       SUM(CASE WHEN succeeded         THEN 1 ELSE 0 END)                      AS successes,
       SUM(CASE WHEN NOT succeeded     THEN 1 ELSE 0 END)                      AS failures,
       ROUND(
         COALESCE(
           SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0
           / NULLIF(COUNT(*), 0),
           0
         ),
         4
       )                                                                       AS success_rate
FROM events
GROUP BY region
ORDER BY success_rate DESC;

Step-by-step trace.

step	relation	outcome
1	Scan `events`, group by region	hash aggregate
2	`SUM(CASE WHEN succeeded THEN 1 ELSE 0 END)`	counts successes per region
3	`SUM(CASE WHEN NOT succeeded THEN 1 ELSE 0 END)`	counts failures per region
4	`COUNT(*)`	total events per region
5	`NULLIF(COUNT(*), 0)`	converts zero-event groups to `NULL`, preventing divide-by-zero
6	`successes * 1.0 / NULLIF(COUNT(*), 0)`	float division; `NULL` when denominator is zero
7	`COALESCE(…, 0)`	replaces the `NULL` with `0` for empty groups
8	`ROUND(…, 4)`	normalises the output to 4 decimal places

Output:

region	total_events	successes	failures	success_rate
US	1,000	950	50	0.9500
EU	800	760	40	0.9500
APAC	600	540	60	0.9000

Why this works — concept by concept:

SUM(CASE WHEN succeeded THEN 1 ELSE 0) — counts successful events without a separate filtered query; the explicit ELSE 0 prevents the silent-undercount NULL trap.
NULLIF(COUNT(*), 0) — turns the zero-denominator case into NULL, which propagates safely instead of raising a divide-by-zero error.
* 1.0 for float division — without it, MySQL and older Postgres versions perform integer division and truncate successes / total to 0 or 1.
COALESCE(…, 0) — the outer fallback ensures zero-event groups render as 0.0000 rather than NULL; safe for downstream BI tools.
ROUND(…, 4) — normalises the precision; reviewers expect financial / rate metrics to come back with a stable decimal scale.
Cost — single sequential scan Θ(n) plus hash aggregate; the three CASE expressions and the safe-division wrappers are constant per-row factors.

SQL
Topic — conditional-aggregation
Safe-aggregation drills

Practice →

SQL
Topic — aggregation
Aggregation SQL drills

Practice →

8. Dialect alternatives — FILTER, IIF, DECODE, IF

Same semantics, cleaner syntax — know the dialect-specific shorthand

Every major dialect supports the standard CASE WHEN syntax, but most also ship a shorter idiom for the conditional-aggregation use case. Knowing the alternatives is a senior-signal in panels at companies that lean on a specific stack.

PostgreSQL FILTER (WHERE …).

Shape — SUM(amount) FILTER (WHERE status = 'paid').
Replaces — SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END).
Why it wins — reads top-to-bottom as "sum amounts, filtered to paid"; no ELSE 0 needed.
Identical plan — the planner emits the same hash aggregate; this is pure readability sugar.
Available since — PostgreSQL 9.4 (and SQL standard SQL:2003).

SQL Server IIF(cond, then, else).

Shape — IIF(channel = 'paid', revenue, 0).
Replaces — CASE WHEN channel = 'paid' THEN revenue ELSE 0 END for binary conditions.
Limitation — only two branches; for >2 branches, fall back to CASE.
Use case — concise conditional aggregation: SUM(IIF(channel = 'paid', revenue, 0)).

MySQL IF(cond, then, else).

Shape — IF(channel = 'paid', revenue, 0).
Identical to SQL Server's IIF — same binary-condition shorthand.
Use case — SUM(IF(channel = 'paid', revenue, 0)) for compact conditional aggregation in MySQL.
Not portable — IF does not exist in Postgres or Oracle; use CASE if your query must run elsewhere.

Oracle DECODE(expr, val1, res1, val2, res2, …, default).

Shape — DECODE(channel, 'paid', revenue, 'organic', revenue, 0).
Replaces — CASE channel WHEN 'paid' THEN revenue WHEN 'organic' THEN revenue ELSE 0 END (simple form).
Limitations — equality only (no ranges); legacy syntax; new Oracle code prefers CASE.
Recognise it — DBA-heavy panels at Oracle shops may quiz you on DECODE semantics.

Snowflake / BigQuery.

Standard CASE only — neither IIF nor DECODE is available.
Use CASE WHEN — the portable choice across cloud-native warehouses.
BigQuery IF(cond, then, else) — available as a function; same semantics as MySQL's IF.

Portability rule.

Write CASE WHEN when the same query must run across multiple dialects (e.g. dbt models that target both Postgres and Snowflake).
Reach for FILTER / IIF / IF only when the codebase is single-dialect and the team prefers the shorter idiom.
Document dialect-specific code with a comment so future maintainers know why it's there.

Worked example — same query in four dialects

Question. Show paid revenue and refunded amount per region using the cleanest idiom available in PostgreSQL, SQL Server, MySQL, and Oracle.

PostgreSQL (FILTER).

SELECT region,
       SUM(amount) FILTER (WHERE status = 'paid')    AS paid_revenue,
       SUM(amount) FILTER (WHERE status = 'refund')  AS refunded_amount
FROM orders
GROUP BY region;

SQL Server (IIF).

SELECT region,
       SUM(IIF(status = 'paid',   amount, 0)) AS paid_revenue,
       SUM(IIF(status = 'refund', amount, 0)) AS refunded_amount
FROM orders
GROUP BY region;

MySQL (IF).

SELECT region,
       SUM(IF(status = 'paid',   amount, 0)) AS paid_revenue,
       SUM(IF(status = 'refund', amount, 0)) AS refunded_amount
FROM orders
GROUP BY region;

Oracle / portable (CASE WHEN).

SELECT region,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END) AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END) AS refunded_amount
FROM orders
GROUP BY region;

Rule of thumb: default to CASE WHEN unless the codebase is locked to one dialect; even then, mention the standard form alongside the shortcut so reviewers know you understand both.

SQL
Topic — case-when
Dialect-portable CASE WHEN drills

Practice →

SQL
Topic — conditional-aggregation
Conditional aggregation library

Practice →

Choosing where to put CASE (cheat sheet)

A one-screen cheat sheet for using case statements in SQL — pick the clause that matches your intent, then copy the corresponding shape into your query.

You want to …	Put CASE in …	Notes
Derive a tier label or computed column	`SELECT` projection	Use `AS <alias>` for readability
Count rows matching a predicate	`SUM(CASE WHEN cond THEN 1 ELSE 0 END)`	Or `COUNT(*) FILTER (WHERE cond)` in Postgres
Sum a value only when a condition holds	`SUM(CASE WHEN cond THEN value ELSE 0 END)`	Always include `ELSE 0` to avoid the silent-undercount NULL trap
Pivot rows to columns	`SUM(CASE WHEN dim = 'x' THEN val ELSE 0 END)` per output column	Portable replacement for `PIVOT`
Group rows by a tier	`GROUP BY CASE WHEN … THEN … END`	Repeat the CASE expression; the SELECT alias is not visible here
Filter aggregates by a condition	`HAVING SUM(CASE WHEN … THEN 1 ELSE 0 END) > 0`	Multi-condition existence checks
Custom sort priority	`ORDER BY CASE WHEN status = … THEN 0 … END`	Numeric sort keys per business priority
Apply a column-dependent filter	`WHERE CASE WHEN region = 'EU' THEN gdpr_consent ELSE TRUE END`	Often refactorable to `OR` / `AND`; reach for CASE when the condition depends on the row
Safe division	`<num> / NULLIF(<denom>, 0)`	Pair with `COALESCE(..., 0)` for empty-group fallback
Replace NULL with a default	`COALESCE(<col>, <default>)`	Equivalent to `CASE WHEN <col> IS NULL THEN <default> ELSE <col> END`

Frequently asked questions

What's the difference between CASE WHEN and IF-ELSE in SQL?

CASE WHEN is the ANSI-standard conditional expression and works in every major SQL dialect; it evaluates predicates top-to-bottom and returns the value tied to the first match, or ELSE (or NULL) when nothing matches. IF / IIF are dialect-specific shorthand for the binary-condition case — MySQL has IF(cond, then, else), SQL Server has IIF(cond, then, else), and both reduce to a single CASE WHEN cond THEN then ELSE else END. Use CASE WHEN for anything beyond two branches and for portability; use IF / IIF only when the codebase is locked to a single dialect and the team prefers the shorter idiom.

What's the difference between the searched form and the simple form of CASE?

The searched form (CASE WHEN <bool_expr> THEN … END) accepts arbitrary boolean predicates per WHEN — ranges, IS NULL, compound AND / OR are all legal. The simple form (CASE <expr> WHEN <value> THEN … END) compares one expression to a series of constant values for equality only; it cannot express ranges or null checks. The searched form is strictly more expressive and is the one you should default to. Reach for the simple form only when every branch is <expr> = <constant> and the equality-only shape genuinely reads cleaner.

How do I use CASE WHEN with SUM (sum case when sql)?

Wrap a CASE inside SUM — the sum case when sql pattern — to compute a metric only for rows that satisfy a predicate: SUM(CASE WHEN <cond> THEN <value> ELSE 0 END) AS <metric>. This is the canonical interview pattern — it lets you compute multiple KPIs in one grouped query instead of running multiple filtered queries, joining them, or building temp tables. The single most important rule is to always include ELSE 0 — without it, false-branch rows return NULL, SUM ignores NULL, and the total is silently undercounted. For counting rows, use THEN 1 ELSE 0; for summing values, use THEN <value> ELSE 0.

How do I pivot rows to columns using CASE WHEN?

Write one SUM(CASE WHEN <category_col> = '<value>' THEN <metric> ELSE 0 END) AS <metric>_<value> per output column, then GROUP BY the dimension you want as rows. Each category becomes its own column with a stable name derived from your literal predicates. This pattern is far more portable than dialect-specific PIVOT operators (SQL Server's PIVOT, Postgres's crosstab extension), works on every database that supports CASE, and extends easily — adding a new column is just one more SUM(CASE …) expression. The only limitation is that the column list must be known when you write the SQL; for dynamic pivots, reach for procedural SQL or application-side templating.

What's the PostgreSQL FILTER clause and how does it relate to CASE WHEN?

The PostgreSQL FILTER (WHERE <cond>) clause is a cleaner native idiom for conditional aggregation: SUM(amount) FILTER (WHERE status = 'paid') is identical in plan and result to SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END). It reads top-to-bottom as "sum amounts, filtered to paid rows," requires no ELSE 0, and is part of the SQL:2003 standard. FILTER works on every aggregate (SUM, COUNT, AVG, MIN, MAX) and is the preferred Postgres-native form. For portability across MySQL, SQL Server, Oracle, Snowflake, and BigQuery — none of which support FILTER — fall back to CASE WHEN.

Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to CASE WHEN, conditional logic, conditional aggregation, pivot-via-CASE patterns, and safe-arithmetic drills with NULLIF + COALESCE.

Kick off via Explore practice →; drill the dedicated CASE WHEN SQL lane →; fan out into the conditional-logic SQL lane →; reinforce conditional-aggregation drills →; rehearse aggregation SQL drills →; widen coverage on the full SQL practice library →.

1. Why CASE WHEN matters in data engineering interviews

CASE WHEN is SQL's if-else — the single most versatile conditional primitive

Worked example — bucket employee salaries into tier labels

2. Anatomy — searched form vs simple form, ELSE, and NULL behavior

CASE ships in two forms — pick the searched form for anything beyond simple equality

Worked example — searched vs simple form on the same input

3. CASE in SELECT — bucketing, tier labels, and derived columns

The bucketing pattern — letter grades, risk bands, RFM segments, NPS buckets

Worked example — letter grades from exam scores

SQL interview question — categorize employees into salary tiers and count headcount per tier

Solution Using CASE in SELECT + GROUP BY tier

4. Conditional aggregation — SUM(CASE WHEN …) the canonical interview pattern

Wrap CASE inside SUM, COUNT, AVG, MIN, MAX to compute multiple KPIs in one grouped pass

Worked example — paid vs refunded revenue per day

SQL interview question — paid vs refunded revenue per region in one query

Solution Using SUM(CASE WHEN …)

5. Pivot rows to columns with conditional aggregation

One SUM(CASE WHEN ...) per output column — portable pivot without PIVOT

SQL interview question — pivot campaign performance by channel

Solution Using SUM(CASE WHEN channel = …)

6. CASE in WHERE, GROUP BY, ORDER BY, and HAVING

CASE works in every clause — pick the right slot for your intent

Worked example — custom sort order for ticket priority

7. CASE with NULLIF and COALESCE — safe arithmetic patterns

NULLIF guards divide-by-zero; COALESCE replaces NULL with a default — both pair beautifully with CASE

SQL interview question — success rate per region with safe division

Solution Using SUM(CASE WHEN …) + NULLIF

8. Dialect alternatives — FILTER, IIF, DECODE, IF

Same semantics, cleaner syntax — know the dialect-specific shorthand

Worked example — same query in four dialects

Choosing where to put CASE (cheat sheet)

Frequently asked questions

What's the difference between CASE WHEN and IF-ELSE in SQL?

What's the difference between the searched form and the simple form of CASE?

How do I use CASE WHEN with SUM (sum case when sql)?

How do I pivot rows to columns using CASE WHEN?

What's the PostgreSQL FILTER clause and how does it relate to CASE WHEN?

Practice on PipeCode

CASE WHEN is SQL's `if-else` — the single most versatile conditional primitive

One `SUM(CASE WHEN ...)` per output column — portable pivot without PIVOT