DEV Community

Cover image for SQL CASE WHEN Statement: Conditional Logic for Data Engineering
Gowtham Potureddi
Gowtham Potureddi

Posted on

SQL CASE WHEN Statement: Conditional Logic for Data Engineering

The SQL CASE WHEN expression — also called the case statement in SQL or simply the SQL CASE statement — is the database equivalent of an if-else chain: it evaluates a list of conditions in order, returns the value tied to the first condition that is true, and falls back to ELSE (or NULL) when nothing matches. Almost every analytics SQL query that does anything interesting — letter grades, success rates, paid-vs-refunded revenue, pivot reports, custom sort priorities — eventually reaches for CASE WHEN, which is why it shows up in nearly every screening round under the sql interview questions umbrella.

This guide walks through the two forms of CASE (searched vs simple), every clause where it can appear (SELECT, WHERE, GROUP BY, ORDER BY, HAVING), the conditional aggregation pattern (sum case when sql) that powers half of all data engineering interview questions, the rows-to-columns pivot trick reviewers love, the four NULL traps that fail most candidates, and dialect-specific shortcuts like Postgres FILTER, SQL Server IIF, and Oracle DECODE. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward. Whether you reach sql for data engineering prompts on screening calls or sit down to a multi-hour onsite, using case statements in SQL correctly is the single highest-leverage SQL skill you can lock in.

PipeCode blog header for a SQL CASE WHEN tutorial — bold white headline 'SQL CASE WHEN' with subtitle 'conditional logic for data engineering' and a minimal code snippet on a dark gradient with purple, green, and orange accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, browse CASE WHEN SQL practice →, drill the conditional logic lane →, sharpen conditional aggregation patterns →, rehearse aggregation SQL drills →, or widen coverage on the full SQL practice library →.


On this page


1. Why CASE WHEN matters in data engineering interviews

CASE WHEN is SQL's if-else — the single most versatile conditional primitive

The one-sentence invariant interviewers chase: a SQL CASE WHEN expression is the SQL equivalent of an if-else chain — it evaluates a list of conditions in order and returns the value tied to the first condition that is TRUE, or ELSE (or NULL) if none match. Internalise that and you can lift a third of the SQL interview prompts that look "tricky" — letter grades, success rates, refund accounting, pivot reports — into one expression.

What CASE WHEN does at a glance.

  • Definition — a conditional branching expression that evaluates predicates top-to-bottom and returns the value of the first predicate that is TRUE.
  • Required syntaxCASE [<expr>] WHEN <pred> THEN <result> [WHEN …] [ELSE <default>] END.
  • Return type — any scalar value (number, string, date, boolean); branches must agree on a compatible type.
  • Short-circuit evaluation — once a WHEN matches, the rest are skipped; later predicates never even run.

Two forms ship in ANSI SQL.

  • Searched form (the one you use 95% of the time) — each WHEN holds an arbitrary boolean predicate; full flexibility for ranges, IS NULL, compound predicates.
  • Simple formCASE <expr> WHEN <value1> THEN … compares <expr> to each WHEN <value> for equality; cannot express ranges or IS NULL.
  • ANSI SQL-92 defined both forms; every major dialect (PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery) supports them.

Where CASE WHEN shows up in a query.

  • SELECT projection — derive a new column (bucketing, tier label, computed metric).
  • Inside an aggregateSUM, COUNT, AVG, MIN, MAX — the conditional aggregation pattern (§4).
  • WHERE clause — rare; usually refactorable to IN / OR.
  • GROUP BY — group by a tier label or computed bucket.
  • HAVING — filter on a conditional aggregate.
  • ORDER BY — custom sort priority ("active first, paused middle, churned last").
  • DML statementsINSERT, UPDATE, DELETE use CASE to set conditional values.
  • Nested inside another CASE — for hierarchical logic (role + tenure bonus rules).

The four canonical interview patterns powered by CASE.

  • Bucketing / tier labels — letter grades, risk bands, age cohorts, RFM segments.
  • Conditional aggregationSUM(CASE WHEN cond THEN val ELSE 0 END) for multi-KPI single-pass reports.
  • Pivot rows to columns — one SUM(CASE WHEN category = 'x' THEN val ELSE 0 END) per output column.
  • Custom sort priorityORDER BY CASE WHEN status = 'urgent' THEN 0 WHEN status = 'high' THEN 1 ELSE 2 END.

What interviewers listen for.

  • Do you include ELSE 0 in every SUM(CASE …)? — signals you understand the silent-undercount NULL trap (§2).
  • Do you choose the searched form over the simple form when ranges are involved? — signals fluency.
  • Do you know that CASE WHEN x = NULL never matches and use IS NULL instead? — senior signal.
  • Do you mention the Postgres FILTER clause when working on a Postgres-flavoured prompt? — bonus points.

Worked example — bucket employee salaries into tier labels

Question. Add a salary_tier column to the result showing Junior (under 60k), Mid (60k–99,999), Senior (100k–149,999), and Executive (150k or more).

Input. employees(emp_id, name, salary).

emp_id name salary
1 Ava 55,000
2 Omar 78,500
3 Lin 130,400
4 Sam 165,000

Code.

SELECT emp_id,
       name,
       salary,
       CASE
           WHEN salary >= 150000 THEN 'Executive'
           WHEN salary >= 100000 THEN 'Senior'
           WHEN salary >=  60000 THEN 'Mid'
           ELSE                       'Junior'
       END AS salary_tier
FROM employees
ORDER BY salary DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The planner evaluates the CASE expression once per row.
  2. Branches are tested top to bottom; the first true WHEN decides the result.
  3. Sam's 165k hits the first branch (>= 150000) and becomes Executive; the rest of the branches are skipped.
  4. Lin's 130k fails the first branch but hits >= 100000Senior.
  5. Ava's 55k falls through every WHEN and lands in the ELSEJunior.

Output.

emp_id name salary salary_tier
4 Sam 165,000 Executive
3 Lin 130,400 Senior
2 Omar 78,500 Mid
1 Ava 55,000 Junior

Rule of thumb: order your WHEN branches from most restrictive (highest threshold) to least restrictive so the first match always lands in the intended tier.

SQL
Topic — case-when
CASE WHEN SQL problems

Practice →

SQL
Topic — conditional-logic
Conditional-logic SQL drills

Practice →


2. Anatomy — searched form vs simple form, ELSE, and NULL behavior

Side-by-side diagram of the SQL CASE statement in its two forms — searched (boolean predicates per WHEN) on the left and simple (equality comparison against a single expression) on the right, both with a labelled ELSE branch, on a light PipeCode-branded card.

CASE ships in two forms — pick the searched form for anything beyond simple equality

ANSI SQL-92 defined two CASE forms: the searched form uses arbitrary boolean predicates per WHEN, while the simple form compares a single expression against constant values for equality. The searched form is strictly more expressive — pick it by default and only fall back to the simple form when the equality-only shape genuinely reads cleaner.

Searched form — the one you use 95% of the time.

  • ShapeCASE WHEN <bool_expr_1> THEN <result_1> [WHEN <bool_expr_2> THEN <result_2>] … [ELSE <default>] END.
  • Predicates — any boolean expression: >=, <, BETWEEN, IS NULL, IN, compound AND/OR.
  • Use it when — branches need ranges, NULL checks, multi-column logic, or nested function calls.
  • Reads like an if-else chain — interviewers comprehend it instantly.

Simple form — equality only.

  • ShapeCASE <expr> WHEN <value_1> THEN <result_1> [WHEN <value_2> THEN <result_2>] … [ELSE <default>] END.
  • Predicates — equality only: <expr> = <value_n> per branch.
  • Cannot express — ranges, IS NULL (use searched form for these).
  • Reads cleaner when — the input is a stable enum-like column (status, country_code) and every WHEN is a single literal.

The ELSE clause.

  • Optional — if omitted, a no-match returns NULL.
  • Mandatory in SUM(CASE …) — without ELSE 0, false-branch rows return NULL, which SUM ignores, causing a silent undercount (the canonical interview trap).
  • Type compatibilityELSE must return a value compatible with the THEN branches; mixing types triggers implicit casts.
  • Default literal'Other', 0, or a sentinel that downstream consumers can safely filter.

The four NULL traps (the canonical interview gotchas).

  • Missing ELSE + SUMSUM(CASE WHEN cond THEN x END) returns NULL for the false branch; SUM ignores NULL; your total is silently undercounted. Fix: always include ELSE 0.
  • CASE WHEN x = NULL never matchesNULL = anything is UNKNOWN. Fix: use CASE WHEN x IS NULL THEN ….
  • Mixed types in branches force implicit casts — branches returning a mix of INTEGER, DECIMAL, VARCHAR can produce surprises. Fix: cast each branch explicitly to a common type.
  • Divide-by-zero via NULL propagationrevenue / cost crashes when cost = 0. Fix: wrap with NULLIF: revenue / NULLIF(cost, 0) (returns NULL instead of raising).

Logical processing order — where does CASE sit?

  • The SQL clauses execute in this order regardless of the order you write them:
    • FROM / JOINWHEREGROUP BYHAVINGwindow functionsSELECT (with CASE)ORDER BY.
  • Consequence #1 — a CASE alias in SELECT cannot be referenced in the same query's WHERE, GROUP BY, or HAVING (the alias does not yet exist).
  • Consequence #2 — to filter on a CASE result, wrap the query in a CTE and filter the outer scope.
  • Consequence #3 — outer ORDER BY can reference the CASE alias because ORDER BY runs last.

Worked example — searched vs simple form on the same input

Question. Show every order's status with two derived columns: one using the searched form for an "order priority" and one using the simple form for a "status label".

Input. orders(order_id, status).

order_id status
1001 paid
1002 refunded
1003 cancelled
1004 pending

Code.

SELECT order_id,
       status,
       CASE
           WHEN status IN ('paid', 'shipped')          THEN 'fulfilled'
           WHEN status IN ('refunded', 'cancelled')    THEN 'reversed'
           ELSE                                              'in_progress'
       END AS searched_priority,
       CASE status
           WHEN 'paid'      THEN '✓ paid'
           WHEN 'refunded'  THEN '↩ refunded'
           WHEN 'cancelled' THEN '✗ cancelled'
           ELSE                  '… pending'
       END AS simple_label
FROM orders
ORDER BY order_id;
Enter fullscreen mode Exit fullscreen mode

Output.

order_id status searched_priority simple_label
1001 paid fulfilled ✓ paid
1002 refunded reversed ↩ refunded
1003 cancelled reversed ✗ cancelled
1004 pending in_progress … pending

Rule of thumb: if your branches use = against a single column, the simple form is shorter; the moment ranges, IS NULL, or compound predicates enter the picture, switch to the searched form.

SQL
Topic — case-expression
CASE expression SQL problems

Practice →

SQL
Topic — conditional-logic
Conditional logic SQL drills

Practice →


3. CASE in SELECT — bucketing, tier labels, and derived columns

The bucketing pattern — letter grades, risk bands, RFM segments, NPS buckets

The single most-used CASE pattern is bucketing: take a continuous metric (score, revenue, age) and assign each row to a discrete tier. Letter grades, customer risk bands, RFM segments, NPS buckets, age cohorts, salary tiers, and tenure ranges all share the same shape.

The bucketing pattern at a glance.

  • ShapeCASE WHEN metric >= t1 THEN 'tier_1' WHEN metric >= t2 THEN 'tier_2' … ELSE 'tier_last' END.
  • Top-to-bottom evaluation — first true WHEN decides the tier; later branches are skipped.
  • Branches can overlap intentionallyWHEN salary >= 150000 matches Sam too, but WHEN salary >= 100000 is checked only if the previous branch failed, so Sam still lands in Executive.
  • Use it for — letter grades (A/B/C/D/F), customer cohorts (high/mid/low value), age bands, salary tiers, RFM segments, NPS buckets (promoter/passive/detractor), risk ratings.

Boundary rules — >= vs > decides who lands in each tier.

  • >= is inclusive at the lower bound; > is exclusive.
  • State your boundaries aloud — "A is 90+ inclusive, B is 80–89, C is 70–79".
  • Half-open intervals are cleanestWHEN score >= 90 then WHEN score >= 80 (each implicitly covers up to but not including the next tier's lower bound thanks to top-down evaluation).
  • Avoid double-counting — closed intervals like WHEN score BETWEEN 80 AND 89 are correct but verbose; the half-open >= chain is the standard pattern.

Aliasing the derived column.

  • Use AS <name>CASE … END AS letter_grade makes the output column self-documenting.
  • Reference the alias in ORDER BYORDER BY letter_grade DESC works (since ORDER BY runs after SELECT).
  • Cannot reference in same query's WHERE — wrap with a CTE if you need to filter on the tier.
  • Same alias rules apply for GROUP BY and HAVING — wrap in a CTE or repeat the full CASE expression.

Composing CASE with other functions.

  • String manipulationLOWER(CASE WHEN status = 'PAID' THEN 'paid_revenue' ELSE 'other' END).
  • Numeric operationsCASE WHEN x > 0 THEN LOG(x) ELSE NULL END for safe logarithms.
  • Date arithmeticCASE WHEN sale_date >= CURRENT_DATE - INTERVAL '30 days' THEN 'recent' ELSE 'old' END.
  • Inside aggregates — see §4 for the conditional aggregation pattern.

Worked example — letter grades from exam scores

Question. Add a letter_grade column for each student based on their score: 90+ is A, 80–89 is B, 70–79 is C, 60–69 is D, below 60 is F.

Input. exam_results(student_id, name, score).

student_id name score
1 Ava 95
2 Omar 82
3 Lin 71
4 Sam 58

Code.

SELECT student_id,
       name,
       score,
       CASE
           WHEN score >= 90 THEN 'A'
           WHEN score >= 80 THEN 'B'
           WHEN score >= 70 THEN 'C'
           WHEN score >= 60 THEN 'D'
           ELSE                  'F'
       END AS letter_grade
FROM exam_results
ORDER BY score DESC;
Enter fullscreen mode Exit fullscreen mode

Output.

student_id name score letter_grade
1 Ava 95 A
2 Omar 82 B
3 Lin 71 C
4 Sam 58 F

SQL interview question — categorize employees into salary tiers and count headcount per tier

Assume employees(emp_id, name, salary). Return one row per salary tier (Junior, Mid, Senior, Executive) with the headcount, sorted from highest tier to lowest.

Solution Using CASE in SELECT + GROUP BY tier

Code solution.

WITH labelled AS (
    SELECT emp_id,
           name,
           salary,
           CASE
               WHEN salary >= 150000 THEN 'Executive'
               WHEN salary >= 100000 THEN 'Senior'
               WHEN salary >=  60000 THEN 'Mid'
               ELSE                       'Junior'
           END AS tier
    FROM employees
)
SELECT tier,
       COUNT(*) AS headcount
FROM labelled
GROUP BY tier
ORDER BY CASE tier
             WHEN 'Executive' THEN 0
             WHEN 'Senior'    THEN 1
             WHEN 'Mid'       THEN 2
             ELSE                  3
         END;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan employees and project the tier label inside the CTE every row now carries a tier name
2 GROUP BY tier in the outer query collapses to one row per tier
3 COUNT(*) counts rows per tier headcount column produced
4 Outer ORDER BY CASE tier WHEN 'Executive' THEN 0 … END enforces highest-tier-first sort (alphabetical sort would put Executive last)

Output:

tier headcount
Executive 4
Senior 11
Mid 27
Junior 18

Why this works — concept by concept:

  • CASE inside SELECT — projects a derived tier column based on the salary buckets; top-to-bottom evaluation guarantees that "Executive" wins for salary 165k even though "Senior" would also match >= 100000.
  • CTE wrapper — the inner WITH labelled AS (…) materialises the labelled rows so the outer GROUP BY tier can reference the alias; without the CTE the same query would need to repeat the entire CASE expression in GROUP BY.
  • GROUP BY tier — collapses to one row per distinct tier value; COUNT(*) counts rows in each group.
  • ORDER BY CASE — alphabetic sort would put Executive after Junior and Mid; a CASE-based sort key enforces the business order without adding an extra rank column.
  • Cost — single sequential scan of employees for the CTE (Θ(n)), hash aggregate on the CTE output for the outer GROUP BY (Θ(n)).

SQL
Topic — case-when
Bucketing / tier-label drills

Practice →

SQL
Topic — aggregation
Aggregation + CASE combos

Practice →


4. Conditional aggregation — SUM(CASE WHEN …) the canonical interview pattern

Diagram of the conditional aggregation pattern — SUM(CASE WHEN status='paid' THEN amount ELSE 0 END) — showing rows being filtered by the predicate and contributing 'amount' or 0 to the running total on a light PipeCode card.

Wrap CASE inside SUM, COUNT, AVG, MIN, MAX to compute multiple KPIs in one grouped pass

The sum case when sql pattern — SUM(CASE WHEN cond THEN val ELSE 0 END) — is the single most-asked CASE WHEN technique in data engineering interviews and the canonical form of conditional aggregation. It lets you compute several KPIs in one grouped query instead of running multiple filtered queries, joining them, or building temp tables.

The canonical pattern at a glance.

  • ShapeSUM(CASE WHEN <cond> THEN <value> ELSE 0 END) AS <metric>.
  • Reads as — "sum <value> for rows that satisfy <cond>, contribute 0 otherwise."
  • Replaces — multiple filtered queries, joins, temp tables, or PIVOT operators.
  • Pairs with GROUP BY for per-segment metrics — one CASE per metric column.

Counting with SUM(CASE WHEN cond THEN 1 ELSE 0 END).

  • Counts rows matching a predicate — equivalent to COUNT(*) FILTER (WHERE cond) in Postgres or a separate filtered subquery.
  • THEN 1 — contributes one to the count per matching row.
  • ELSE 0 — non-matching rows contribute zero (and are NOT counted).
  • AlternativeCOUNT(CASE WHEN cond THEN 1 END) works too (because COUNT(col) ignores NULL), but SUM(CASE WHEN cond THEN 1 ELSE 0 END) is more explicit.

Why one query beats many.

  • One sequential scan of the source table → multiple KPIs side by side; reduces I/O.
  • Consistent grain — every metric is computed at the same GROUP BY level, so they sit in the same row.
  • No extra joins or temp tables — saves planner cost and review effort.
  • Refactor-friendly — adding a new KPI is one extra SUM(CASE …) column, not a new query.
  • Dashboard-ready — one row per group with all the metrics is exactly what BI tools want.

The ELSE 0 rule — the canonical interview trap.

  • Without ELSE 0CASE WHEN cond THEN x END returns NULL for the false branch.
  • SUM ignores NULL — so missing-ELSE rows silently disappear from the sum.
  • The bug is invisible — the query "works" and returns a number that looks plausible; only auditing against a separate filtered query catches the undercount.
  • Always write ELSE 0 for SUM — even when you think it cannot matter, write it; reviewers expect it.

Combining conditional sums with COUNT(*) for ratios.

  • Success rateSUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0 / COUNT(*) (the * 1.0 forces float division in MySQL / older Postgres).
  • Refund rateSUM(CASE WHEN status = 'refund' THEN 1 ELSE 0 END) * 1.0 / COUNT(*).
  • Click-through rateSUM(CASE WHEN clicked THEN 1 ELSE 0 END) * 1.0 / SUM(CASE WHEN impressed THEN 1 ELSE 0 END).
  • Always guard the denominator with NULLIF for safety (§7).

Worked example — paid vs refunded revenue per day

Question. From orders(order_id, order_ts, status, amount), compute per day: total_revenue, paid_revenue, refunded_amount, and paid_orders count — all in one grouped query.

Input. orders slice.

order_id order_ts status amount
1 2026-01-01 paid 100
2 2026-01-01 paid 150
3 2026-01-01 refund 50
4 2026-01-02 paid 200
5 2026-01-02 refund 30

Code.

SELECT DATE(order_ts) AS order_date,
       SUM(amount)                                                       AS total_amount,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END)          AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END)          AS refunded_amount,
       SUM(CASE WHEN status = 'paid'   THEN 1      ELSE 0 END)          AS paid_orders
FROM orders
GROUP BY DATE(order_ts)
ORDER BY 1;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. DATE(order_ts) truncates the timestamp to a calendar day for grouping.
  2. SUM(amount) aggregates every row regardless of status — gives gross volume per day.
  3. The four SUM(CASE WHEN …) columns filter rows by status before contributing to each KPI.
  4. paid_orders uses THEN 1 to count rows instead of summing amounts.
  5. GROUP BY DATE(order_ts) collapses to one row per day; every KPI shares the same order_date grain.

Output.

order_date total_amount paid_revenue refunded_amount paid_orders
2026-01-01 300 250 50 2
2026-01-02 230 200 30 1

SQL interview question — paid vs refunded revenue per region in one query

Assume orders(order_id, region, status, amount). Return per region: total_revenue, paid_revenue, refunded_amount, and the count of paid_orders — all in one grouped query.

Solution Using SUM(CASE WHEN …)

Code solution.

SELECT region,
       SUM(amount)                                                  AS total_revenue,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END)     AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END)     AS refunded_amount,
       SUM(CASE WHEN status = 'paid'   THEN 1      ELSE 0 END)     AS paid_orders
FROM orders
GROUP BY region
ORDER BY paid_revenue DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan orders once every order row evaluated
2 GROUP BY region rows grouped by region; planner uses hash aggregate
3 Four aggregate calls evaluated per group each SUM(CASE …) independently filters and sums
4 paid_orders uses THEN 1 ELSE 0 counts paid rows per region without a separate query
5 Outer ORDER BY paid_revenue DESC best-performing regions first

Output:

region total_revenue paid_revenue refunded_amount paid_orders
US 12,500 11,200 1,300 142
EU 8,750 7,900 850 98
APAC 5,400 5,100 300 64

Why this works — concept by concept:

  • Conditional aggregationSUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) sums only the rows where the predicate is true; non-matching rows contribute zero, preserving the regional grain.
  • The ELSE 0 discipline — without it, refunded rows would contribute NULL, SUM would silently ignore them, and paid_revenue would still look correct — but total_revenue - paid_revenue - refunded_amount would not balance. Always write ELSE 0.
  • One scan, four KPIs — a naive solution would write four filtered queries (or four joined subqueries); this single grouped query reads the table once and emits all four metrics side by side.
  • Counting with THEN 1 ELSE 0 — converts the conditional sum into a conditional count without a separate COUNT(*) FILTER (WHERE …) (which is Postgres-only).
  • Cost — single sequential scan Θ(n) plus a hash aggregate keyed on region (Θ(n)); the four CASE expressions add a constant factor per row.

SQL
Topic — conditional-aggregation
Conditional-aggregation drills

Practice →

SQL
Topic — aggregation
Aggregation SQL library

Practice →


5. Pivot rows to columns with conditional aggregation

Diagram of the pivot-rows-to-columns pattern — three input rows of (campaign, channel, revenue) are reshaped into one output row per campaign with columns rev_paid / rev_organic / rev_social via SUM(CASE WHEN channel = ...) — on a light PipeCode-branded infographic.

One SUM(CASE WHEN ...) per output column — portable pivot without PIVOT

When a stakeholder wants "revenue by channel as columns instead of rows", reach for the conditional-aggregation pivot. The pattern produces one row per group with a fixed set of derived columns — far more portable than dialect-specific PIVOT operators.

The pattern at a glance.

  • Shape — one SUM(CASE WHEN <category_col> = '<value>' THEN <metric> ELSE 0 END) AS <metric>_<value> per output column.
  • One row per group — driven by GROUP BY on the dimension you want as rows.
  • Stable column names — derived from the literal predicates, so the schema is known at write-time.
  • Replaces — SQL Server's PIVOT, Postgres crosstab, multiple joined subqueries.

Why this beats native PIVOT.

  • Portability — works on every dialect that supports CASE (which is every major dialect).
  • Readable — the predicates document the column meanings inline (channel = 'paid' → rev_paid).
  • Composable — easily extends with new metrics by adding more SUM(CASE …) columns.
  • No special grammar — junior reviewers can follow it without learning PIVOT.

Stable column names.

  • Hard-coded in the SQL — the column list is whatever you write; not dependent on the data.
  • Renames are explicit — change rev_paid to paid_revenue in one place.
  • Auditable — the predicate appears in the same expression as the column name, so reviewers can verify each KPI in isolation.

Limitations.

  • Static column list — the columns must be known when you write the SQL. For dynamic pivots (where the categories themselves are unknown), reach for procedural SQL, application-side string interpolation, or dialect-specific dynamic SQL.
  • Many categories = wide query — 50 channels means 50 SUM(CASE …) columns; readability suffers. Consider a GROUP BY category shape instead.
  • Empty groups still appear — an ELSE 0 ensures empty cells render as 0 rather than NULL.

Use cases.

  • Campaign performance by channelrev_paid / rev_organic / rev_social columns.
  • Revenue by tierrev_free / rev_basic / rev_pro / rev_enterprise.
  • Conversions by funnel stepstep_1_views / step_2_clicks / step_3_purchases.
  • NPS distributionpromoter_count / passive_count / detractor_count per cohort.

SQL interview question — pivot campaign performance by channel

Assume sessions(campaign_id, channel, converted, revenue). Return per campaign_id: total sessions, sessions split by channel (paid / organic / social), total conversions, and revenue split by channel — all in one grouped query.

Solution Using SUM(CASE WHEN channel = …)

Code solution.

SELECT campaign_id,
       COUNT(*)                                                                AS sessions_total,
       SUM(CASE WHEN channel = 'paid'    THEN 1       ELSE 0 END)              AS sessions_paid,
       SUM(CASE WHEN channel = 'organic' THEN 1       ELSE 0 END)              AS sessions_organic,
       SUM(CASE WHEN channel = 'social'  THEN 1       ELSE 0 END)              AS sessions_social,
       SUM(CASE WHEN converted = TRUE    THEN 1       ELSE 0 END)              AS conversions_total,
       SUM(CASE WHEN channel = 'paid'    THEN revenue ELSE 0 END)              AS rev_paid,
       SUM(CASE WHEN channel = 'organic' THEN revenue ELSE 0 END)              AS rev_organic,
       SUM(CASE WHEN channel = 'social'  THEN revenue ELSE 0 END)              AS rev_social
FROM sessions
GROUP BY campaign_id
ORDER BY (SUM(CASE WHEN channel = 'paid' THEN revenue ELSE 0 END)) DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan sessions once every row evaluated
2 GROUP BY campaign_id hash aggregate, one row per campaign
3 8 aggregate calls evaluated per group each KPI computed in the same pass
4 Channel SUM(CASE WHEN channel = … THEN 1 ELSE 0) columns session counts pivoted to columns
5 Channel SUM(CASE WHEN channel = … THEN revenue ELSE 0) columns revenue pivoted to columns
6 Outer ORDER BY on a derived expression sort by paid-channel revenue without referencing the alias

Output:

campaign_id sessions_total sessions_paid sessions_organic sessions_social conversions_total rev_paid rev_organic rev_social
C-001 1,250 800 350 100 142 12,500 4,200 1,100
C-002 980 420 510 50 89 6,800 6,100 540

Why this works — concept by concept:

  • Pivot via conditional sums — each channel becomes its own column without any dialect-specific PIVOT operator; the column names are derived from the literal predicates and are stable across runs.
  • One scan, all KPIs — the planner reads sessions once and computes every aggregate in parallel using the same hash aggregate node; the alternative would be 4–8 joined subqueries.
  • THEN 1 for counts, THEN revenue for sums — the same CASE shape produces a count column or a sum column depending on the THEN expression.
  • ELSE 0 everywhere — guarantees empty cells render as 0 (not NULL); makes the output safe for downstream arithmetic.
  • Repeating the CASE expression in ORDER BY — column aliases cannot be referenced inside aggregates in ORDER BY on every dialect; repeating the expression is the portable choice.
  • Cost — single sequential scan Θ(n) plus hash aggregate Θ(n); the 8 CASE expressions are a constant per-row factor.

SQL
Topic — conditional-aggregation
Pivot pattern drills

Practice →

SQL
Topic — aggregations
Aggregation patterns

Practice →


6. CASE in WHERE, GROUP BY, ORDER BY, and HAVING

CASE works in every clause — pick the right slot for your intent

CASE WHEN is not confined to the SELECT list — it can drive filtering (WHERE), grouping (GROUP BY), filtering of aggregates (HAVING), and custom sort priority (ORDER BY). Knowing which slot to reach for is its own interview signal.

CASE in WHERE.

  • Rare in practice — usually refactorable to IN, OR, or boolean combinations.
  • Useful when — the condition depends on a column value (e.g. "if region = 'EU', require gdpr_consent = TRUE; otherwise no consent check").
  • ShapeWHERE CASE WHEN region = 'EU' THEN gdpr_consent ELSE TRUE END.
  • Reads better asWHERE (region <> 'EU') OR (region = 'EU' AND gdpr_consent = TRUE).

CASE in GROUP BY.

  • Group by a tier labelGROUP BY CASE WHEN salary >= 100000 THEN 'high' ELSE 'low' END.
  • Group by a computed bucketGROUP BY CASE WHEN age < 18 THEN 'minor' WHEN age < 65 THEN 'adult' ELSE 'senior' END.
  • Cannot reference the SELECT alias — repeat the full CASE expression in GROUP BY (logical order issue).
  • Cleanest pattern — define the CASE in a CTE, then GROUP BY tier in the outer query (Blog72 §3 reference).

CASE in ORDER BY.

  • Custom sort priorityORDER BY CASE status WHEN 'urgent' THEN 0 WHEN 'high' THEN 1 ELSE 2 END.
  • Sort tier labels in business order — alphabetic sort would scramble Executive / Senior / Mid / Junior; a CASE expression enforces the right order without an explicit rank column.
  • ORDER BY can reference SELECT aliasesORDER BY tier works if tier is a CASE alias in the same SELECT, but the CASE expression itself is also legal.
  • Pair with DESC for reverse priorityORDER BY CASE … END DESC.

CASE in HAVING.

  • Filter aggregates produced by CASEHAVING SUM(CASE WHEN tier = 'paid' THEN 1 ELSE 0 END) > 0.
  • Multi-condition existence checks — "pages with both paid and free users" → HAVING SUM(CASE WHEN tier = 'paid' THEN 1 ELSE 0 END) > 0 AND SUM(CASE WHEN tier = 'free' THEN 1 ELSE 0 END) > 0.
  • Always references aggregatesHAVING runs after GROUP BY, so bare-column CASE expressions are illegal here unless wrapped in an aggregate.

Same alias rules apply.

  • SELECT aliases are visible in ORDER BYORDER BY tier works.
  • SELECT aliases are NOT visible in WHERE / GROUP BY / HAVING — logical processing order; the alias does not yet exist when those clauses run.
  • Workaround — repeat the full CASE expression in the earlier clause, OR wrap the query in a CTE / subquery and filter the outer scope.

Worked example — custom sort order for ticket priority

Question. From tickets(ticket_id, status, opened_at), return every ticket sorted with urgent first, then high, then normal, then low. Inside each priority group, sort by opened_at ascending (oldest first).

Input. tickets slice.

ticket_id status opened_at
101 normal 2026-05-19
102 urgent 2026-05-20
103 high 2026-05-18
104 low 2026-05-17
105 urgent 2026-05-19

Code.

SELECT ticket_id,
       status,
       opened_at
FROM tickets
ORDER BY
    CASE status
        WHEN 'urgent' THEN 0
        WHEN 'high'   THEN 1
        WHEN 'normal' THEN 2
        WHEN 'low'    THEN 3
        ELSE              4
    END,
    opened_at ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The CASE expression assigns a numeric sort key per ticket — urgent gets 0, high gets 1, etc.
  2. The planner sorts rows by the CASE key first; ties on the CASE key are broken by opened_at ascending.
  3. Within urgent, ticket 105 (2026-05-19) appears before ticket 102 (2026-05-20).
  4. The ELSE 4 lands any unexpected status values at the bottom.

Output.

ticket_id status opened_at
105 urgent 2026-05-19
102 urgent 2026-05-20
103 high 2026-05-18
101 normal 2026-05-19
104 low 2026-05-17

SQL
Topic — conditional-filtering
Conditional-filtering drills

Practice →

SQL
Topic — case-when
CASE WHEN SQL library

Practice →


7. CASE with NULLIF and COALESCE — safe arithmetic patterns

Diagram of safe-division pattern — revenue / NULLIF(cost, 0) — showing how NULLIF turns a zero divisor into NULL instead of an error, on a light PipeCode-branded infographic.

NULLIF guards divide-by-zero; COALESCE replaces NULL with a default — both pair beautifully with CASE

CASE WHEN rarely ships alone. The two conditional functions that pair best with it are NULLIF (returns NULL when two values are equal) and COALESCE (returns the first non-NULL argument). Together they let you write bulletproof safe-division and empty-group fallbacks.

NULLIF(a, b) at a glance.

  • ShapeNULLIF(a, b).
  • ReturnsNULL when a = b; otherwise returns a.
  • Equivalent toCASE WHEN a = b THEN NULL ELSE a END.
  • Classic usex / NULLIF(y, 0) returns NULL (instead of erroring) when y = 0.

COALESCE(a, b, c, …) at a glance.

  • ShapeCOALESCE(a, b, c, …).
  • Returns — the first non-NULL argument from left to right.
  • Use cases — replace NULL with 0, with a default string, or with a fallback column.
  • Short-circuit — stops evaluating after the first non-NULL is found.

Safe-division template.

  • Pattern<numerator> / NULLIF(<denominator>, 0).
  • Behaviour — returns NULL when the denominator is zero; everywhere else returns the quotient.
  • Pair with COALESCECOALESCE(<numerator> / NULLIF(<denominator>, 0), 0) replaces the NULL with 0 for downstream arithmetic.
  • Use it for — success rates, click-through rates, refund rates, conversion rates, churn rates — every ratio where the denominator could legitimately be zero.

Empty-group bulletproofing.

  • PatternCOALESCE(SUM(CASE WHEN cond THEN amount ELSE 0 END), 0).
  • WhySUM over zero rows returns NULL (not 0); without COALESCE, the empty group renders as NULL and breaks downstream consumers.
  • Combined with FILTER (Postgres) — COALESCE(SUM(amount) FILTER (WHERE cond), 0).

Composition order — outer to inner.

  • COALESCE on the outside — final fallback so the result is never NULL.
  • NULLIF on the inside — protects intermediate expressions (denominators) from invalid values.
  • CAST last (or first) — explicit type casting prevents implicit-cast surprises.
  • ExampleROUND(COALESCE(SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0 / NULLIF(COUNT(*), 0), 0), 4) reads "round the success rate; fall back to 0 if no rows; guard divide-by-zero with NULLIF; force float division with * 1.0."

SQL interview question — success rate per region with safe division

Assume events(event_id, region, succeeded) where succeeded is a boolean. Return per region: total events, successes, failures, and success_rate (successes / total). Regions with zero events should render as success_rate = 0, not crash.

Solution Using SUM(CASE WHEN …) + NULLIF

Code solution.

SELECT region,
       COUNT(*)                                                                AS total_events,
       SUM(CASE WHEN succeeded         THEN 1 ELSE 0 END)                      AS successes,
       SUM(CASE WHEN NOT succeeded     THEN 1 ELSE 0 END)                      AS failures,
       ROUND(
         COALESCE(
           SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) * 1.0
           / NULLIF(COUNT(*), 0),
           0
         ),
         4
       )                                                                       AS success_rate
FROM events
GROUP BY region
ORDER BY success_rate DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan events, group by region hash aggregate
2 SUM(CASE WHEN succeeded THEN 1 ELSE 0 END) counts successes per region
3 SUM(CASE WHEN NOT succeeded THEN 1 ELSE 0 END) counts failures per region
4 COUNT(*) total events per region
5 NULLIF(COUNT(*), 0) converts zero-event groups to NULL, preventing divide-by-zero
6 successes * 1.0 / NULLIF(COUNT(*), 0) float division; NULL when denominator is zero
7 COALESCE(…, 0) replaces the NULL with 0 for empty groups
8 ROUND(…, 4) normalises the output to 4 decimal places

Output:

region total_events successes failures success_rate
US 1,000 950 50 0.9500
EU 800 760 40 0.9500
APAC 600 540 60 0.9000

Why this works — concept by concept:

  • SUM(CASE WHEN succeeded THEN 1 ELSE 0) — counts successful events without a separate filtered query; the explicit ELSE 0 prevents the silent-undercount NULL trap.
  • NULLIF(COUNT(*), 0) — turns the zero-denominator case into NULL, which propagates safely instead of raising a divide-by-zero error.
  • * 1.0 for float division — without it, MySQL and older Postgres versions perform integer division and truncate successes / total to 0 or 1.
  • COALESCE(…, 0) — the outer fallback ensures zero-event groups render as 0.0000 rather than NULL; safe for downstream BI tools.
  • ROUND(…, 4) — normalises the precision; reviewers expect financial / rate metrics to come back with a stable decimal scale.
  • Cost — single sequential scan Θ(n) plus hash aggregate; the three CASE expressions and the safe-division wrappers are constant per-row factors.

SQL
Topic — conditional-aggregation
Safe-aggregation drills

Practice →

SQL
Topic — aggregation
Aggregation SQL drills

Practice →


8. Dialect alternatives — FILTER, IIF, DECODE, IF

Same semantics, cleaner syntax — know the dialect-specific shorthand

Every major dialect supports the standard CASE WHEN syntax, but most also ship a shorter idiom for the conditional-aggregation use case. Knowing the alternatives is a senior-signal in panels at companies that lean on a specific stack.

PostgreSQL FILTER (WHERE …).

  • ShapeSUM(amount) FILTER (WHERE status = 'paid').
  • ReplacesSUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END).
  • Why it wins — reads top-to-bottom as "sum amounts, filtered to paid"; no ELSE 0 needed.
  • Identical plan — the planner emits the same hash aggregate; this is pure readability sugar.
  • Available since — PostgreSQL 9.4 (and SQL standard SQL:2003).

SQL Server IIF(cond, then, else).

  • ShapeIIF(channel = 'paid', revenue, 0).
  • ReplacesCASE WHEN channel = 'paid' THEN revenue ELSE 0 END for binary conditions.
  • Limitation — only two branches; for >2 branches, fall back to CASE.
  • Use case — concise conditional aggregation: SUM(IIF(channel = 'paid', revenue, 0)).

MySQL IF(cond, then, else).

  • ShapeIF(channel = 'paid', revenue, 0).
  • Identical to SQL Server's IIF — same binary-condition shorthand.
  • Use caseSUM(IF(channel = 'paid', revenue, 0)) for compact conditional aggregation in MySQL.
  • Not portableIF does not exist in Postgres or Oracle; use CASE if your query must run elsewhere.

Oracle DECODE(expr, val1, res1, val2, res2, …, default).

  • ShapeDECODE(channel, 'paid', revenue, 'organic', revenue, 0).
  • ReplacesCASE channel WHEN 'paid' THEN revenue WHEN 'organic' THEN revenue ELSE 0 END (simple form).
  • Limitations — equality only (no ranges); legacy syntax; new Oracle code prefers CASE.
  • Recognise it — DBA-heavy panels at Oracle shops may quiz you on DECODE semantics.

Snowflake / BigQuery.

  • Standard CASE only — neither IIF nor DECODE is available.
  • Use CASE WHEN — the portable choice across cloud-native warehouses.
  • BigQuery IF(cond, then, else) — available as a function; same semantics as MySQL's IF.

Portability rule.

  • Write CASE WHEN when the same query must run across multiple dialects (e.g. dbt models that target both Postgres and Snowflake).
  • Reach for FILTER / IIF / IF only when the codebase is single-dialect and the team prefers the shorter idiom.
  • Document dialect-specific code with a comment so future maintainers know why it's there.

Worked example — same query in four dialects

Question. Show paid revenue and refunded amount per region using the cleanest idiom available in PostgreSQL, SQL Server, MySQL, and Oracle.

PostgreSQL (FILTER).

SELECT region,
       SUM(amount) FILTER (WHERE status = 'paid')    AS paid_revenue,
       SUM(amount) FILTER (WHERE status = 'refund')  AS refunded_amount
FROM orders
GROUP BY region;
Enter fullscreen mode Exit fullscreen mode

SQL Server (IIF).

SELECT region,
       SUM(IIF(status = 'paid',   amount, 0)) AS paid_revenue,
       SUM(IIF(status = 'refund', amount, 0)) AS refunded_amount
FROM orders
GROUP BY region;
Enter fullscreen mode Exit fullscreen mode

MySQL (IF).

SELECT region,
       SUM(IF(status = 'paid',   amount, 0)) AS paid_revenue,
       SUM(IF(status = 'refund', amount, 0)) AS refunded_amount
FROM orders
GROUP BY region;
Enter fullscreen mode Exit fullscreen mode

Oracle / portable (CASE WHEN).

SELECT region,
       SUM(CASE WHEN status = 'paid'   THEN amount ELSE 0 END) AS paid_revenue,
       SUM(CASE WHEN status = 'refund' THEN amount ELSE 0 END) AS refunded_amount
FROM orders
GROUP BY region;
Enter fullscreen mode Exit fullscreen mode

Rule of thumb: default to CASE WHEN unless the codebase is locked to one dialect; even then, mention the standard form alongside the shortcut so reviewers know you understand both.

SQL
Topic — case-when
Dialect-portable CASE WHEN drills

Practice →

SQL
Topic — conditional-aggregation
Conditional aggregation library

Practice →


Choosing where to put CASE (cheat sheet)

A one-screen cheat sheet for using case statements in SQL — pick the clause that matches your intent, then copy the corresponding shape into your query.

You want to … Put CASE in … Notes
Derive a tier label or computed column SELECT projection Use AS <alias> for readability
Count rows matching a predicate SUM(CASE WHEN cond THEN 1 ELSE 0 END) Or COUNT(*) FILTER (WHERE cond) in Postgres
Sum a value only when a condition holds SUM(CASE WHEN cond THEN value ELSE 0 END) Always include ELSE 0 to avoid the silent-undercount NULL trap
Pivot rows to columns SUM(CASE WHEN dim = 'x' THEN val ELSE 0 END) per output column Portable replacement for PIVOT
Group rows by a tier GROUP BY CASE WHEN … THEN … END Repeat the CASE expression; the SELECT alias is not visible here
Filter aggregates by a condition HAVING SUM(CASE WHEN … THEN 1 ELSE 0 END) > 0 Multi-condition existence checks
Custom sort priority ORDER BY CASE WHEN status = … THEN 0 … END Numeric sort keys per business priority
Apply a column-dependent filter WHERE CASE WHEN region = 'EU' THEN gdpr_consent ELSE TRUE END Often refactorable to OR / AND; reach for CASE when the condition depends on the row
Safe division <num> / NULLIF(<denom>, 0) Pair with COALESCE(..., 0) for empty-group fallback
Replace NULL with a default COALESCE(<col>, <default>) Equivalent to CASE WHEN <col> IS NULL THEN <default> ELSE <col> END

Frequently asked questions

What's the difference between CASE WHEN and IF-ELSE in SQL?

CASE WHEN is the ANSI-standard conditional expression and works in every major SQL dialect; it evaluates predicates top-to-bottom and returns the value tied to the first match, or ELSE (or NULL) when nothing matches. IF / IIF are dialect-specific shorthand for the binary-condition case — MySQL has IF(cond, then, else), SQL Server has IIF(cond, then, else), and both reduce to a single CASE WHEN cond THEN then ELSE else END. Use CASE WHEN for anything beyond two branches and for portability; use IF / IIF only when the codebase is locked to a single dialect and the team prefers the shorter idiom.

What's the difference between the searched form and the simple form of CASE?

The searched form (CASE WHEN <bool_expr> THEN … END) accepts arbitrary boolean predicates per WHEN — ranges, IS NULL, compound AND / OR are all legal. The simple form (CASE <expr> WHEN <value> THEN … END) compares one expression to a series of constant values for equality only; it cannot express ranges or null checks. The searched form is strictly more expressive and is the one you should default to. Reach for the simple form only when every branch is <expr> = <constant> and the equality-only shape genuinely reads cleaner.

How do I use CASE WHEN with SUM (sum case when sql)?

Wrap a CASE inside SUM — the sum case when sql pattern — to compute a metric only for rows that satisfy a predicate: SUM(CASE WHEN <cond> THEN <value> ELSE 0 END) AS <metric>. This is the canonical interview pattern — it lets you compute multiple KPIs in one grouped query instead of running multiple filtered queries, joining them, or building temp tables. The single most important rule is to always include ELSE 0 — without it, false-branch rows return NULL, SUM ignores NULL, and the total is silently undercounted. For counting rows, use THEN 1 ELSE 0; for summing values, use THEN <value> ELSE 0.

How do I pivot rows to columns using CASE WHEN?

Write one SUM(CASE WHEN <category_col> = '<value>' THEN <metric> ELSE 0 END) AS <metric>_<value> per output column, then GROUP BY the dimension you want as rows. Each category becomes its own column with a stable name derived from your literal predicates. This pattern is far more portable than dialect-specific PIVOT operators (SQL Server's PIVOT, Postgres's crosstab extension), works on every database that supports CASE, and extends easily — adding a new column is just one more SUM(CASE …) expression. The only limitation is that the column list must be known when you write the SQL; for dynamic pivots, reach for procedural SQL or application-side templating.

What's the PostgreSQL FILTER clause and how does it relate to CASE WHEN?

The PostgreSQL FILTER (WHERE <cond>) clause is a cleaner native idiom for conditional aggregation: SUM(amount) FILTER (WHERE status = 'paid') is identical in plan and result to SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END). It reads top-to-bottom as "sum amounts, filtered to paid rows," requires no ELSE 0, and is part of the SQL:2003 standard. FILTER works on every aggregate (SUM, COUNT, AVG, MIN, MAX) and is the preferred Postgres-native form. For portability across MySQL, SQL Server, Oracle, Snowflake, and BigQuery — none of which support FILTER — fall back to CASE WHEN.


Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to CASE WHEN, conditional logic, conditional aggregation, pivot-via-CASE patterns, and safe-arithmetic drills with NULLIF + COALESCE.

Kick off via Explore practice →; drill the dedicated CASE WHEN SQL lane →; fan out into the conditional-logic SQL lane →; reinforce conditional-aggregation drills →; rehearse aggregation SQL drills →; widen coverage on the full SQL practice library →.

Top comments (0)