DEV Community

Cover image for SQL IF / IIF / NULLIF / NULL-Handling Cheat Sheet for Data Engineers
Gowtham Potureddi
Gowtham Potureddi

Posted on

SQL IF / IIF / NULLIF / NULL-Handling Cheat Sheet for Data Engineers

sql if looks like a single keyword to a junior — interviewers know it is actually five different keywords across five dialects layered on top of a 1986 ANSI design decision called three-valued logic. The result is the most expensive language-design pothole in analytics SQL: queries that look right, pass the local sniff test, and silently drop or duplicate rows in production because of how NULL collides with =, IN, NOT IN, JOIN, and GROUP BY.

This guide is the cheat sheet you wished existed the first time null not in sql quietly removed half your result set. It walks through the dialect matrix for IF / IIF / IFF / CASE WHEN, the two NULL-handling primitives (sql nullif and sql coalesce), the is null sql vs = NULL trap, the sql ifnull / NVL / ISNULL family of aliases, and the four-rule NULL contract that governs JOIN, GROUP BY, COUNT, and ORDER BY. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.

PipeCode blog header for a SQL IF / IIF / NULLIF / NULL-handling cheat sheet — bold white headline 'SQL NULL Handling · Cheat Sheet' with subtitle 'IF · IIF · NULLIF · COALESCE · three-valued logic' and a stylised NULL = NULL → NULL gotcha card on a dark gradient with purple, green, orange, and blue accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, drill the NULL handling practice library →, rehearse on CASE expression problems →, and stack the conditional muscles with conditional logic drills →.


On this page


1. Why NULL handling is the silent bug magnet

NULL is not a value — it is the absence of one, and that distinction breaks every assumption a programmer brings from Python or Java

The one-sentence invariant: in SQL, NULL means "unknown" — and any expression involving NULL with an arithmetic operator, comparison operator, or boolean operator evaluates to NULL, not to TRUE or FALSE. Once you internalise that "unknown propagates," the entire family of sql if and NULL interview questions becomes a sequence of three-valued logic deductions.

The three places NULL bites.

  • Aggregates. SUM(col), AVG(col), MAX(col) ignore NULL rows. COUNT(col) ignores NULLs; COUNT(*) counts them. A team that swaps COUNT(*) for COUNT(col) "for clarity" silently changes the metric.
  • WHERE. WHERE x = NULL always returns no rows because x = NULL evaluates to NULL (unknown), and WHERE treats NULL as FALSE. The only correct test is WHERE x IS NULL or WHERE x IS NOT NULL.
  • JOIN. INNER JOIN a ON a.k = b.k drops every row where either side is NULL — because NULL = NULL is NULL, not TRUE. This is the source of "my join lost 12% of the data" tickets.

Three-valued logic in one sentence.

SQL booleans are not two-valued (TRUE / FALSE) — they are three-valued (TRUE / FALSE / UNKNOWN). The third state is what NULL produces under every binary operator, and the WHERE / ON / HAVING clauses treat UNKNOWN the same as FALSE — but most language constructs propagate UNKNOWN instead of folding it.

What interviewers listen for.

  • Do you say "NULL means unknown, not empty" when asked what NULL is? — senior signal.
  • Do you reach for IS NULL the moment you compare against NULL? — required answer.
  • Do you mention NOT IN as a common interview trap when a subquery returns NULL? — senior signal.
  • Do you recognise that COUNT(*) and COUNT(col) differ on NULL rows? — required answer.

The 2026 reality.

  • CASE WHEN is still the only fully portable conditional, but every modern dialect now also ships an IF / IIF / IFF sugar keyword.
  • Snowflake and Postgres offer IS DISTINCT FROM and IS NOT DISTINCT FROM — proper NULL-safe inequality / equality operators that interviewers love to probe.
  • MySQL ships <=> as a NULL-safe equality operator and IFNULL(a, b) as a 2-arg COALESCE alias.
  • BigQuery ships IF(cond, then, else) and IFNULL(a, b) plus the modern SAFE. prefix for null-on-error arithmetic.

Worked example — the four NULL gotchas in one query

Detailed explanation. New analysts run the obvious filter WHERE region = 'EU' AND status != 'cancelled', then are stunned when the row count differs from the table's actual EU rows. The reason is NULL in the status column — 'EU' AND status != 'cancelled' is NULL for every NULL status, and WHERE drops every NULL-evaluating row.

Question. Given the orders table below, how many EU orders are not cancelled? Show how the naive WHERE drops NULL status rows and how to fix it with COALESCE or IS DISTINCT FROM.

Input.

order_id region status
1 EU placed
2 EU cancelled
3 EU NULL
4 US placed

Code.

-- Naive — silently drops order_id=3
SELECT COUNT(*) AS eu_not_cancelled
FROM orders
WHERE region = 'EU'
  AND status != 'cancelled';

-- Fix 1 — COALESCE the comparison
SELECT COUNT(*) AS eu_not_cancelled
FROM orders
WHERE region = 'EU'
  AND COALESCE(status, '') != 'cancelled';

-- Fix 2 — NULL-safe inequality (Postgres / Snowflake)
SELECT COUNT(*) AS eu_not_cancelled
FROM orders
WHERE region = 'EU'
  AND status IS DISTINCT FROM 'cancelled';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The naive query evaluates status != 'cancelled' for order_id=3. The status is NULL, so NULL != 'cancelled' is NULL (unknown).
  2. WHERE treats NULL as FALSE and drops the row. The count is 1 (only order_id=1), not 2.
  3. The COALESCE fix replaces NULL with the empty string before the comparison, so '' != 'cancelled' is TRUE and the row is kept.
  4. The IS DISTINCT FROM fix is the NULL-safe inequality operator: it returns TRUE when one side is NULL and the other is not, FALSE when both are NULL or both are the same value.

Output.

Query Returned
Naive != 1
COALESCE(status, '') != 'cancelled' 2
status IS DISTINCT FROM 'cancelled' 2

Rule of thumb. Every time you compare a column against a literal with = or !=, ask "can this column be NULL?" — if yes, wrap with COALESCE or switch to IS DISTINCT FROM.

Worked example — three-valued logic by hand

Detailed explanation. The fastest way to internalise three-valued logic is to evaluate a small predicate row by row, naming the intermediate boolean state at each step. Interviewers love this because it surfaces whether the candidate truly understands UNKNOWN propagation or merely memorised "use IS NULL."

Question. Given the row (region = 'EU', status = NULL), evaluate the predicate region = 'EU' AND status != 'cancelled' OR region IS NULL step by step. What does WHERE return?

Input (one row).

region status
EU NULL

Code.

SELECT
    region,
    status,
    -- decompose the predicate for tracing
    (region = 'EU')                AS p1,
    (status != 'cancelled')        AS p2,
    (region = 'EU' AND status != 'cancelled')   AS p1_and_p2,
    (region IS NULL)               AS p3,
    (region = 'EU' AND status != 'cancelled' OR region IS NULL) AS full_predicate
FROM orders
WHERE order_id = 99;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Evaluate p1 = (region = 'EU')'EU' = 'EU' → TRUE.
  2. Evaluate p2 = (status != 'cancelled')NULL != 'cancelled' → NULL (any comparison with NULL is NULL).
  3. Evaluate p1 AND p2TRUE AND NULL → NULL (look up the AND truth table: TRUE AND NULL = NULL because the unknown half could be TRUE or FALSE; AND only short-circuits on FALSE).
  4. Evaluate p3 = (region IS NULL)'EU' IS NULL → FALSE.
  5. Combine: (NULL) OR FALSE → NULL (OR truth table: NULL OR FALSE = NULL — neither side guarantees TRUE, so the result is still unknown).
  6. WHERE treats NULL as FALSE → the row is filtered out, even though the human reading of the predicate (region='EU' is true, so the row should be kept) suggests it should appear.

Output.

p1 p2 p1_and_p2 p3 full_predicate WHERE outcome
TRUE NULL NULL FALSE NULL row dropped

Rule of thumb. Always trace your predicate by hand on the boundary case where the most-suspicious column is NULL — if the answer is "NULL," the row will be silently dropped. Add an explicit IS NULL arm to the predicate or wrap with COALESCE.

Worked example — aggregates silently ignore NULL

Detailed explanation. SUM, AVG, MIN, MAX all skip NULL input rows entirely. That sounds harmless until you realise AVG(col) is SUM(col) / COUNT(col) — both numerator and denominator are NULL-skipped — so the average is computed over the non-NULL subset, not the whole table.

Question. Given the scores table where some rows have a NULL score, compute the average using AVG(score) and compare with the "every row counts as 0 for NULL" interpretation. Show why the numbers differ and which is usually correct.

Input.

student_id score
1 80
2 100
3 NULL
4 60

Code.

SELECT
    AVG(score)                                              AS avg_skip_nulls,
    AVG(COALESCE(score, 0))                                 AS avg_treat_null_as_zero,
    SUM(score)                                              AS sum_score,
    SUM(score) * 1.0 / COUNT(*)                             AS sum_over_all_rows
FROM scores;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. AVG(score) sums the non-NULL scores: 80 + 100 + 60 = 240. Denominator is COUNT(score) = 3 (skips NULL). Result = 240 / 3 = 80.0.
  2. AVG(COALESCE(score, 0)) rewrites NULL as 0 before averaging. Sum = 80 + 100 + 0 + 60 = 240. Denominator = COUNT(COALESCE(...)) = 4. Result = 240 / 4 = 60.0.
  3. SUM(score) skips NULL: 80 + 100 + 60 = 240. NOT 240 + NULL.
  4. SUM(score) / COUNT(*) deliberately divides the NULL-skipping sum by the all-rows count = 240 / 4 = 60.0. Same as the COALESCE form.
  5. The "skip NULL" semantics is the default — and is usually what statisticians want ("average of known scores"). The "treat NULL as 0" interpretation is sometimes desired ("average including absent submissions count as 0"). Be explicit about which you want.

Output.

avg_skip_nulls avg_treat_null_as_zero sum_score sum_over_all_rows
80.0 60.0 240 60.0

Rule of thumb. Before writing AVG(col), ask: "do I want the average over non-NULL inputs, or over every row (treating NULL as zero / sentinel)?" If the answer is the latter, wrap the column in COALESCE(col, 0) inside the AVG. The default semantic almost never matches a junior's intuition.

SQL interview question on NULL bug detection

A senior interviewer often opens with: "Walk me through every place NULL could change a row count in a SELECT — WHERE, JOIN, GROUP BY, COUNT, NOT IN — and show how you'd catch each one in code review." It blends three-valued logic, the COUNT family, and the null not in sql trap into a single audit.

Solution Using a NULL-aware audit pattern

-- 1) Audit NULL counts per column
SELECT
    COUNT(*)                                  AS total_rows,
    COUNT(*) - COUNT(region)                  AS null_region,
    COUNT(*) - COUNT(status)                  AS null_status,
    COUNT(*) FILTER (WHERE status IS NULL)    AS null_status_explicit
FROM orders;

-- 2) Rewrite the metric to be NULL-aware
SELECT
    COUNT(*)                                            AS all_orders,
    COUNT(*) FILTER (WHERE status = 'cancelled')        AS cancelled,
    COUNT(*) FILTER (WHERE status IS DISTINCT FROM 'cancelled') AS not_cancelled
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Row region status total_rows null_region null_status
1 EU placed counted no no
2 EU cancelled counted no no
3 EU NULL counted no yes
4 US placed counted no no

After the audit, the team rewrites the metric using FILTER (WHERE ...) so that the boolean expression is evaluated with three-valued logic but the COUNT only increments on TRUE — NULL and FALSE both skip.

Output:

Metric Value
total_rows 4
null_region 0
null_status 1
all_orders 4
cancelled 1
not_cancelled 3

Why this works — concept by concept:

  • COUNT(*) vs COUNT(col)COUNT(*) counts every row including NULLs; COUNT(col) skips NULLs. Subtracting one from the other gives the NULL count per column for free.
  • FILTER (WHERE ...) — ANSI-standard syntax (Postgres, Snowflake, BigQuery) that lets COUNT increment only on TRUE boolean results — NULL and FALSE are both skipped. It is more readable than SUM(CASE WHEN ... THEN 1 ELSE 0 END).
  • IS DISTINCT FROM — NULL-safe inequality. Returns TRUE when one side is NULL and the other is a real value, so the "not cancelled" count includes NULL status rows.
  • Audit before metric — running the NULL audit first surfaces the column as a NULL hotspot before any downstream report reads from it. Cheap insurance against a 3% silent drop in next quarter's dashboard.
  • Cost — one extra scan for the audit (O(n)); zero extra cost on the rewritten metric since FILTER and IS DISTINCT FROM both compile to the same plan as CASE WHEN.

SQL
Topic — null handling
NULL handling problems (SQL)

Practice →


2. IF / IIF / IFF / CASE — dialect by dialect

sql if is five keywords pretending to be one — only CASE WHEN works in every dialect

The mental model in one line: CASE WHEN cond THEN a ELSE b END is the ANSI-standard ternary operator; IF, IIF, IFF are dialect-specific sugar over the same idea. Once you say "CASE WHEN is the portable answer," you can defend a one-line IF in MySQL or IIF in SQL Server while still knowing what to write when the schema travels to Postgres.

Visual dialect-matrix of conditional expressions — six dialect columns (ANSI/Postgres, MySQL, SQL Server, Snowflake, BigQuery, Oracle); each cell shows the canonical syntax (CASE WHEN, IF, IIF, IFF, IF, CASE WHEN/DECODE) for the same 'active vs inactive' logic; a small annotation about CASE being the only portable answer; on a light PipeCode card.

The dialect matrix in one table.

Dialect Conditional sugar NULL-substitution sugar Notes
ANSI / Postgres CASE WHEN ... THEN ... ELSE ... END COALESCE(a, b, ...) no IF / IIF / IFF — CASE only
MySQL IF(cond, then, else) IFNULL(a, b) + COALESCE also has CASE WHEN
SQL Server IIF(cond, then, else) ISNULL(a, b) + COALESCE ISNULL is 2-arg; COALESCE is N-arg
Snowflake IFF(cond, then, else) IFNULL(a, b) + NVL + COALESCE most aliases of any dialect
BigQuery IF(cond, then, else) IFNULL(a, b) + COALESCE also has NULLIF and COALESCE
Oracle CASE WHEN ... END + DECODE(...) NVL(a, b) + NVL2(a, b, c) + COALESCE DECODE is Oracle-specific

Two CASE flavours.

  • Searched CASE (the one you should default to) — CASE WHEN x > 100 THEN 'high' WHEN x > 50 THEN 'mid' ELSE 'low' END. Each branch is a full boolean expression. Most flexible.
  • Simple CASECASE x WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END. Equivalent to CASE WHEN x = 1 ..., but broken on NULL because x = NULL is NULL (unknown), so the NULL branch never matches in a simple CASE. Stick to searched CASE.

Common interview probes on conditional sugar.

  • "Why does Postgres not have IF or IIF?" — because the ANSI committee picked CASE. Postgres takes the standards-first stance.
  • "Is IIF(cond, a, b) shorthand for anything?" — yes, SQL Server compiles it to CASE WHEN cond THEN a ELSE b END.
  • "What dialects support nesting IF?" — every dialect that has IF supports nesting it, but readability dies fast. Use searched CASE WHEN past two levels.
  • "What is the type coercion rule between then and else?" — every conditional infers a common type from both branches. Mixing INT and VARCHAR either fails or silently casts to VARCHAR depending on the dialect.

Worked example — the same active/inactive logic across four dialects

Detailed explanation. A reporting team migrates a flag column (status = 'active''A', else 'I') from MySQL to Postgres to Snowflake to BigQuery. The logic does not change, but the keyword does. Learning the dialect mapping once saves a week of "why does this break in prod?" tickets.

Question. Translate the expression IF(status = 'active', 'A', 'I') from MySQL into the canonical syntax of Postgres, SQL Server, Snowflake, and BigQuery. Highlight the one portable answer.

Input.

user_id status
1 active
2 suspended
3 active
4 NULL

Code.

-- MySQL / BigQuery
SELECT user_id, IF(status = 'active', 'A', 'I') AS flag FROM users;

-- SQL Server
SELECT user_id, IIF(status = 'active', 'A', 'I') AS flag FROM users;

-- Snowflake
SELECT user_id, IFF(status = 'active', 'A', 'I') AS flag FROM users;

-- ANSI / Postgres / EVERY dialect
SELECT user_id,
       CASE WHEN status = 'active' THEN 'A' ELSE 'I' END AS flag
FROM users;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Each dialect picked a different name for the same operator: IF (MySQL / BigQuery), IIF (SQL Server), IFF (Snowflake). The argument order is identical: (cond, then, else).
  2. Postgres deliberately omits all three and asks the user to write CASE WHEN. So does the ANSI standard.
  3. The status = 'active' predicate evaluates to NULL for user_id=4. Every dialect treats NULL as "not TRUE" in the then branch, so the else arm fires and flag becomes 'I'.
  4. If the team needed to distinguish "NULL status" from "non-active status," the predicate must be rewritten as CASE WHEN status = 'active' THEN 'A' WHEN status IS NULL THEN 'U' ELSE 'I' ENDIF / IIF / IFF only have one boolean slot.

Output.

user_id flag (MySQL IF) flag (Postgres CASE)
1 A A
2 I I
3 A A
4 I I

Rule of thumb. If you only need two branches, the dialect sugar (IF / IIF / IFF) is fine. The moment you need three or more branches, or you need to special-case NULL, switch to CASE WHEN — it is the only construct that scales without nesting.

Worked example — searched CASE vs simple CASE on a nullable column

Detailed explanation. Searched CASE evaluates each WHEN as a boolean expression. Simple CASE compares the input expression against each WHEN value with =. Because x = NULL is always NULL, a simple CASE with a NULL row falls into the ELSE arm — even if you wrote a WHEN NULL branch.

Question. Given the users.tier column where some rows are NULL, bucket every user into 'gold', 'silver', 'bronze', or 'unknown'. Show why a simple CASE tier WHEN NULL THEN 'unknown' ... fails and the searched-CASE fix.

Input.

user_id tier
1 gold
2 silver
3 NULL
4 bronze

Code.

-- BROKEN — simple CASE never matches the NULL row to "WHEN NULL"
SELECT
    user_id,
    CASE tier
        WHEN 'gold'   THEN 'gold-rank'
        WHEN 'silver' THEN 'silver-rank'
        WHEN 'bronze' THEN 'bronze-rank'
        WHEN NULL     THEN 'unknown'    -- this branch can never match
        ELSE 'unranked'
    END AS rank_label
FROM users;

-- CORRECT — searched CASE with explicit IS NULL branch
SELECT
    user_id,
    CASE
        WHEN tier IS NULL    THEN 'unknown'
        WHEN tier = 'gold'   THEN 'gold-rank'
        WHEN tier = 'silver' THEN 'silver-rank'
        WHEN tier = 'bronze' THEN 'bronze-rank'
        ELSE 'unranked'
    END AS rank_label
FROM users;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The simple CASE compiles to CASE WHEN tier = 'gold' ... WHEN tier = NULL ... END. For user_id=3, tier = NULL evaluates to NULL — never TRUE — so the WHEN NULL branch is skipped.
  2. The simple CASE falls through to ELSE for user_id=3, returning 'unranked' instead of the intended 'unknown'.
  3. The searched CASE uses tier IS NULL (which returns TRUE / FALSE, never NULL), so the NULL row correctly maps to 'unknown'.
  4. The branch order matters: put WHEN tier IS NULL THEN 'unknown' first. Otherwise the NULL row may already have failed earlier predicates (tier = 'gold' was NULL → not TRUE → skipped) and you'd still end up in the right place — but the rule "NULL arm first" is robust against future edits.

Output (correct query).

user_id rank_label
1 gold-rank
2 silver-rank
3 unknown
4 bronze-rank

Rule of thumb. Never use a simple CASE x WHEN y ... on a nullable column. Always use searched CASE WHEN cond ... END with an explicit IS NULL branch first. The extra characters are insurance against an entire family of silent bugs.

Worked example — nested conditionals: when CASE beats IF

Detailed explanation. Three or more branches inside a single conditional become hard to read with IF / IIF / IFF because each branch nests inside the previous one's else slot. CASE WHEN scales linearly: each branch is a sibling, not a child.

Question. Bucket amount into four bands: 'mega' (>1000), 'big' (>100), 'small' (>10), 'tiny' (else). Show the nested IF (MySQL) and the equivalent CASE WHEN, and explain which one is easier to maintain.

Input.

order_id amount
1 5000
2 200
3 50
4 5

Code.

-- MySQL nested IF — readable up to two levels, ugly past that
SELECT
    order_id,
    IF(amount > 1000, 'mega',
       IF(amount > 100, 'big',
          IF(amount > 10, 'small', 'tiny'))) AS bucket
FROM orders;

-- CASE WHEN — sibling branches, scales without nesting
SELECT
    order_id,
    CASE
        WHEN amount > 1000 THEN 'mega'
        WHEN amount > 100  THEN 'big'
        WHEN amount > 10   THEN 'small'
        ELSE                    'tiny'
    END AS bucket
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The nested IF reads as a chain of if-else from outside in. Each branch's "else" slot holds the next IF. Adding a fifth band requires inserting another nested IF inside the deepest else — a fragile edit that touches multiple lines.
  2. The CASE WHEN form lists every branch at the same indentation. Adding a fifth band is a single new line; the diff is minimal.
  3. Both queries return the same result. The choice is a code-review and maintenance argument, not a correctness one.
  4. Type-inference is the same on both forms — every branch must produce a compatible type.

Output.

order_id amount bucket
1 5000 mega
2 200 big
3 50 small
4 5 tiny

Rule of thumb. Use IF / IIF / IFF for two-branch ternary expressions. Reach for CASE WHEN the moment you have three or more branches — it scales, it diffs cleanly, and it survives the day the team migrates from MySQL to Postgres.

SQL interview question on portable conditional logic

A senior interviewer might frame this as: "Your team is consolidating MySQL, SQL Server, and Snowflake into a single dbt project. Rewrite this IF(x > 100, 'big', 'small') so it compiles on all three engines without engine-specific macros."

Solution Using CASE WHEN as the portable fallback

-- Portable conditional — compiles on MySQL, SQL Server, Snowflake,
-- Postgres, BigQuery, Oracle, Redshift, DuckDB, SQLite.
SELECT
    order_id,
    amount,
    CASE
        WHEN amount > 100 THEN 'big'
        WHEN amount > 10  THEN 'small'
        ELSE 'tiny'
    END AS bucket
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

order_id amount branch evaluated bucket
1 250 first WHEN matches (>100) big
2 50 first WHEN false, second matches (>10) small
3 5 both WHENs false, ELSE tiny
4 NULL both WHENs evaluate NULL → fall through to ELSE tiny

The trace highlights that NULL amount falls into the ELSE arm — not into a "NULL bucket." If a "NULL bucket" is required, add an explicit WHEN amount IS NULL THEN 'unknown' branch at the top.

Output:

order_id amount bucket
1 250 big
2 50 small
3 5 tiny
4 NULL tiny

Why this works — concept by concept:

  • Searched CASE evaluates top-down — the first WHEN whose predicate is TRUE wins; subsequent WHENs are not checked. Order matters when the buckets are not disjoint.
  • ELSE is the catch-all — without ELSE, unmatched rows return NULL. Always supply an ELSE unless NULL on miss is the contract.
  • NULL collapses to ELSE — because NULL > 10 is NULL (unknown), and WHEN treats unknown as false. Add WHEN x IS NULL THEN ... first if you need to special-case NULL.
  • Type unification — every THEN and the ELSE must produce a compatible type; CASE returns the common super-type (e.g. INT + BIGINTBIGINT; INT + VARCHAR → engine-dependent cast or error).
  • Cost — O(branches) per row; the optimiser typically short-circuits on the first match. Pure compute, no I/O.

SQL
Topic — case expression
CASE expression problems (SQL)

Practice →


3. NULLIF + COALESCE — the two NULL-handling primitives

sql nullif turns sentinels into NULL; sql coalesce turns NULL into a default — together they cover 90% of NULL hygiene

The mental model in one line: NULLIF(a, b) returns NULL when a = b and a otherwise — the inverse of COALESCE, which returns the first non-NULL value in its argument list. Pairing them lets you sanitise dirty data (NULLIF) and then resupply a default value (COALESCE) in two clean expressions.

Visual diagram of NULLIF + COALESCE — left a NULLIF(a, b) card showing the rule 'if a = b then NULL else a' with a worked example of safe division (revenue / NULLIF(impressions, 0)); right a COALESCE card showing 'first non-NULL value' with a worked example COALESCE(nickname, first_name, 'guest'); on a light PipeCode card.

NULLIF in detail.

  • Signature. NULLIF(a, b). Returns NULL when a = b, otherwise returns a. The second argument is only compared, never returned.
  • Three rules. NULLIF(5, 5) → NULL; NULLIF(5, 3) → 5; NULLIF(NULL, 5) → NULL.
  • Use case 1 — safe division. revenue / NULLIF(impressions, 0) — if impressions is 0, the denominator becomes NULL, and the whole expression becomes NULL instead of raising a "division by zero" error.
  • Use case 2 — sentinel cleanup. NULLIF(country, '') converts the empty string into NULL so that downstream aggregates and joins behave correctly.

COALESCE in detail.

  • Signature. COALESCE(a, b, c, ...). Returns the first non-NULL argument. Returns NULL only if every argument is NULL.
  • N-arg. Unlike IFNULL (2-arg) and ISNULL (2-arg, SQL Server), COALESCE accepts any number of arguments. Use it for fallback chains.
  • Use case 1 — provide a default. COALESCE(nickname, first_name, 'guest') — the first non-NULL name wins.
  • Use case 2 — collapse multiple nullable columns into one display value. Useful for unioned tables where the same logical field arrives from different source columns.

Dialect aliases.

  • COALESCE — ANSI standard, works everywhere. N-arg.
  • IFNULL(a, b) — MySQL, Snowflake, BigQuery, SQLite, DuckDB. 2-arg only.
  • NVL(a, b) — Oracle, Snowflake. 2-arg.
  • NVL2(a, b, c) — Oracle. Returns b if a is NOT NULL, else c.
  • ISNULL(a, b) — SQL Server (2-arg) and Snowflake (1-arg, returns boolean). Watch the overload — ISNULL(x) in Snowflake is "is x NULL?" but ISNULL(x, 0) in SQL Server is "replace NULL with 0." Same name, different semantics.

Both NULLIF and COALESCE are pure functions — safe to use in SELECT, WHERE, GROUP BY, ORDER BY, JOIN predicates, and inside window functions. The optimiser short-circuits COALESCE on the first non-NULL argument, so cheap expressions should come first.

Worked example — safe division using NULLIF

Detailed explanation. A team computes click-through rate as clicks / impressions. The pipeline blows up nightly because some campaigns have 0 impressions on quiet hours — clicks / 0 raises a division-by-zero error and the whole batch fails. The two-character fix is to wrap the denominator in NULLIF.

Question. Given the campaign_stats table, write a query that computes CTR as a percentage, returning NULL when impressions is 0 instead of raising an error.

Input.

campaign_id clicks impressions
1 12 240
2 0 0
3 7 50
4 3 NULL

Code.

SELECT
    campaign_id,
    clicks,
    impressions,
    ROUND(100.0 * clicks / NULLIF(impressions, 0), 2) AS ctr_pct
FROM campaign_stats;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. For campaign_id=1, NULLIF(240, 0) returns 240. The division proceeds: 100 * 12 / 240 = 5.00.
  2. For campaign_id=2, NULLIF(0, 0) returns NULL. Division by NULL yields NULL — no error. The row stays in the result with ctr_pct = NULL.
  3. For campaign_id=3, NULLIF(50, 0) returns 50. CTR = 100 * 7 / 50 = 14.00.
  4. For campaign_id=4, impressions is already NULL. NULLIF(NULL, 0) returns NULL (because NULL = 0 is NULL, so NULLIF returns the first arg unchanged). The division produces NULL — same outcome as zero impressions.

Output.

campaign_id clicks impressions ctr_pct
1 12 240 5.00
2 0 0 NULL
3 7 50 14.00
4 3 NULL NULL

Rule of thumb. Every division in production SQL should be of the form numerator / NULLIF(denominator, 0). This is a no-thinking-required idiom; the cost is one comparison per row, the benefit is "your batch never fails on a zero day."

Worked example — fallback chain with COALESCE

Detailed explanation. A users table has three optional name columns: nickname, first_name, email. The product wants to display the first non-NULL value, with 'guest' as the ultimate fallback.

Question. Write a single SELECT that returns one display_name per user using a fallback chain: nicknamefirst_nameemail → literal 'guest'.

Input.

user_id nickname first_name email
1 NULL NULL a@x.com
2 NULL Alice a@x.com
3 aly Alice a@x.com
4 NULL NULL NULL

Code.

SELECT
    user_id,
    COALESCE(nickname, first_name, email, 'guest') AS display_name
FROM users;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. For user_id=1, COALESCE checks nickname (NULL), then first_name (NULL), then email (a@x.com) — returns a@x.com.
  2. For user_id=2, COALESCE checks nickname (NULL), then first_name (Alice) — returns Alice and short-circuits.
  3. For user_id=3, COALESCE returns aly on the first argument.
  4. For user_id=4, every column is NULL — COALESCE returns the literal 'guest'.

Output.

user_id display_name
1 a@x.com
2 Alice
3 aly
4 guest

Rule of thumb. When you have a fallback chain longer than two, reach for COALESCE (N-arg) rather than nested IFNULL or ISNULL. Cheaper expressions should appear first because COALESCE short-circuits.

Worked example — sentinel cleanup with NULLIF before aggregation

Detailed explanation. A legacy ETL pipeline stores missing country codes as the literal string 'N/A' instead of NULL. Counting "distinct countries" returns the wrong number because 'N/A' is counted as a real country. NULLIF(country, 'N/A') converts the sentinel to NULL before COUNT DISTINCT, which then ignores it.

Question. Given the events table with country codes (some stored as 'N/A'), compute the distinct-country count two ways: naive and NULLIF-cleaned. Show the difference.

Input.

event_id country
1 US
2 FR
3 N/A
4 US
5 N/A
6 DE

Code.

SELECT
    COUNT(DISTINCT country)                       AS naive_distinct,
    COUNT(DISTINCT NULLIF(country, 'N/A'))        AS clean_distinct
FROM events;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The naive count distinct sees four distinct values: US, FR, N/A, DE. Result = 4.
  2. NULLIF(country, 'N/A') rewrites the sentinel as NULL: the column becomes US, FR, NULL, US, NULL, DE.
  3. COUNT(DISTINCT ...) collapses duplicates (US appears twice) and skips NULL by default. Distinct non-NULL set = {US, FR, DE}. Result = 3.
  4. The metric the business actually wants is the clean count — 'N/A' is not a country.

Output.

naive_distinct clean_distinct
4 3

Rule of thumb. Audit every text column for sentinel values during onboarding ('N/A', '-', '', 'unknown', '0'). Wrap each detected sentinel in NULLIF(col, 'sentinel') at the staging layer so downstream metrics behave correctly without per-query plumbing.

SQL interview question on combined NULLIF + COALESCE

The probe usually sounds like: "You have a prices table where missing prices are stored as 0 (sentinel) and others as NULL. Compute a sane average price per category that ignores both 0 and NULL."

Solution Using NULLIF to normalise sentinels and AVG to ignore NULL

SELECT
    category,
    AVG(NULLIF(price, 0)) AS avg_price
FROM prices
GROUP BY category;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

category price NULLIF(price, 0) included in AVG?
A 10 10 yes
A 0 NULL no
A 20 20 yes
B NULL NULL no
B 8 8 yes
B 0 NULL no

NULLIF(price, 0) rewrites the sentinel 0 as NULL. AVG then ignores both the original NULLs and the converted sentinels — exactly the "missing data" semantics the report needs.

Output:

category avg_price
A 15.00
B 8.00

Why this works — concept by concept:

  • NULLIF as a sentinel scrubber — converts a value that looks like data but means "missing" into NULL, which is the canonical "missing" representation in SQL.
  • AVG ignores NULL — every standard aggregate (SUM, AVG, MIN, MAX) skips NULL inputs. AVG of {10, NULL, 20} is 30/2 = 15, not 30/3 = 10. The denominator only counts non-NULL inputs.
  • No CASE WHEN requiredNULLIF(price, 0) is shorter and clearer than CASE WHEN price = 0 THEN NULL ELSE price END. Pick the primitive over the open-coded form.
  • Composable with COALESCE — if the report wants a fallback ("if every price is sentinel, show 0"), wrap the whole expression: COALESCE(AVG(NULLIF(price, 0)), 0).
  • Cost — O(rows) for the AVG; NULLIF is a constant-time scalar function. Same cost as a plain AVG.

SQL
Topic — aggregation
Aggregation problems with NULL (SQL)

Practice →


4. IS NULL vs = NULL — the three-valued logic trap

is null sql is the only correct NULL test — = NULL is always UNKNOWN, which WHERE silently filters out

The mental model in one line: x = NULL always returns NULL (unknown), and WHERE / ON / HAVING treat NULL as FALSE — so any predicate written with = NULL quietly removes every row instead of finding them. Once you say "use IS NULL, never = NULL," the entire null not in sql family of bugs becomes obvious.

Visual diagram of IS NULL vs = NULL — two side-by-side cards (= NULL always returns NULL/UNKNOWN, IS NULL returns TRUE/FALSE); a separate panel showing the NOT IN (subquery containing NULL) classic trap returning zero rows; a small chip about DISTINCT FROM / IS NOT DISTINCT FROM as NULL-safe alternatives; on a light PipeCode card.

The contrast in five bullets.

  • x = NULL — always returns NULL. NEVER TRUE, even when x is NULL. WHERE / ON / HAVING treat NULL as FALSE. Result: zero rows.
  • x IS NULL — returns TRUE when x is NULL, FALSE otherwise. Never returns UNKNOWN. The only correct NULL test.
  • x IS NOT NULL — returns TRUE when x is non-NULL, FALSE when x is NULL.
  • x IS NOT DISTINCT FROM y — NULL-safe equality: TRUE when both are equal OR both are NULL, FALSE otherwise. Postgres / Snowflake / DuckDB.
  • x IS DISTINCT FROM y — NULL-safe inequality: TRUE when values differ OR exactly one is NULL, FALSE otherwise. Postgres / Snowflake / DuckDB.

The IN / NOT IN trap.

  • x IN (1, 2, NULL) — evaluates to TRUE when x = 1 OR x = 2 OR x = NULL. The last term is NULL, but the OR can be TRUE if either of the first two is TRUE. So IN is usually fine — WHERE x IN (1, 2, NULL) returns rows where x is 1 or 2 (NULL row is not returned).
  • x NOT IN (1, 2, NULL) — evaluates to x != 1 AND x != 2 AND x != NULL. The third term is NULL. With AND, anything AND NULL is NULL unless one side is FALSE. So the whole expression collapses to NULL for every x that is not 1 or 2 — and WHERE filters out all of them. Result: zero rows. This is the most famous null not in sql interview gotcha.
  • The fix. Use NOT EXISTS (...) instead, which is NULL-safe, or filter NULLs out of the subquery first: WHERE y IS NOT NULL.

Dialect-specific NULL-safe operators.

  • MySQL ships <=> — read aloud as "spaceship" — as the NULL-safe equality operator. a <=> b is TRUE when both are NULL or both are the same value.
  • Postgres / Snowflake / DuckDB ship IS [NOT] DISTINCT FROM as the ANSI-aligned version of the same operator.
  • SQL Server has no dedicated NULL-safe operator; you write (a = b OR (a IS NULL AND b IS NULL)) or use EXCEPT / INTERSECT (which are set-semantic and NULL-safe by definition).

Common interview probes.

  • "Why does WHERE x = NULL return no rows?" — because x = NULL is NULL (unknown), and WHERE treats NULL as FALSE.
  • "Why does NOT IN with NULL return no rows?" — because the AND chain collapses to NULL, which WHERE treats as FALSE.
  • "What is the NULL-safe equality operator in Postgres?" — IS NOT DISTINCT FROM. In MySQL it is <=>.
  • "How do you find rows where two nullable columns differ, including the NULL-vs-not-NULL case?" — WHERE a IS DISTINCT FROM b.

Worked example — the WHERE x = NULL silent filter

Detailed explanation. A junior engineer writes WHERE deleted_at = NULL to find soft-deleted rows. The query returns zero rows even though the table has thousands. The bug: deleted_at = NULL is always NULL (unknown), and WHERE drops NULL-evaluating rows.

Question. Given the users table, find every soft-deleted user (rows where deleted_at IS NULL). Show the buggy query and the fix.

Input.

user_id deleted_at
1 2026-01-12
2 NULL
3 2026-02-03
4 NULL

Code.

-- BROKEN — returns zero rows
SELECT user_id
FROM users
WHERE deleted_at = NULL;

-- CORRECT — returns the soft-active rows (deleted_at IS NULL)
SELECT user_id
FROM users
WHERE deleted_at IS NULL;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The buggy query evaluates deleted_at = NULL for user_id=2. The result is NULL (NULL = NULL is NULL).
  2. WHERE treats NULL as FALSE → row dropped. Same for user_id=4. Both NULL rows are filtered out.
  3. For user_id=1 and user_id=3, deleted_at = NULL evaluates to NULL ('2026-01-12' = NULL is NULL). Same result — dropped.
  4. The query returns zero rows. Worst part: no error, no warning, no log line. Silent.
  5. The fix uses IS NULL, which is a unary predicate that returns TRUE / FALSE — never NULL.

Output.

Query Returned user_ids
WHERE deleted_at = NULL (none)
WHERE deleted_at IS NULL 2, 4

Rule of thumb. If you catch yourself writing = NULL or != NULL in code review, that is an automatic block. The only correct NULL tests are IS NULL, IS NOT NULL, and the NULL-safe operators IS DISTINCT FROM / IS NOT DISTINCT FROM / <=>.

Worked example — the NOT IN with NULL classic interview gotcha

Detailed explanation. Find every employee whose manager_id is NOT one of the managers in a separate table. The subquery includes a NULL manager (because the CEO has no manager). The naive NOT IN returns zero rows — the classic null not in sql trap.

Question. Given employees and managers, write a query to find every employee whose manager_id is NOT in the managers.manager_id list. Show the broken NOT IN and the two safe fixes.

Input — employees.

employee_id manager_id
1 7
2 8
3 99

Input — managers.

manager_id
7
8
NULL

Code.

-- BROKEN — returns zero rows because of NULL in subquery
SELECT employee_id
FROM employees
WHERE manager_id NOT IN (SELECT manager_id FROM managers);

-- Fix 1 — NOT EXISTS (NULL-safe)
SELECT e.employee_id
FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM managers m WHERE m.manager_id = e.manager_id
);

-- Fix 2 — filter NULL out of the subquery
SELECT employee_id
FROM employees
WHERE manager_id NOT IN (
    SELECT manager_id FROM managers WHERE manager_id IS NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The naive NOT IN expands to manager_id != 7 AND manager_id != 8 AND manager_id != NULL for each employee row.
  2. For employee_id=3 (manager_id=99), the first two terms are TRUE, the third is NULL. TRUE AND TRUE AND NULL is NULL.
  3. WHERE treats NULL as FALSE — row dropped. Same logic for employee_id=1 and 2 (they ARE in the list and the first or second equality is TRUE, but the negated form is FALSE).
  4. Net result: zero rows. Bug.
  5. The NOT EXISTS fix is semantically NULL-safe — it asks "does any row exist with this manager_id?" — NULL in the subquery never matches an = predicate, so the subquery finds nothing for employee_id=3, and NOT EXISTS returns TRUE.
  6. The "filter NULL out of subquery" fix is mechanically cheaper to spot in code review but requires you to remember to do it on every NOT IN.

Output.

Query Returned employee_ids
NOT IN (...) (none)
NOT EXISTS (...) 3
NOT IN with IS NOT NULL filter 3

Rule of thumb. Treat NOT IN (subquery) as a code smell. Default to NOT EXISTS (subquery) — it is NULL-safe, often as fast or faster, and survives the day someone adds a nullable column to the subquery table.

Worked example — IN with NULL silently misses the NULL row

Detailed explanation. WHERE x IN (1, 2, NULL) does not match rows where x is NULL — even though NULL is in the list. The reason is the same three-valued logic that breaks = NULL: x = NULL is NULL, OR-folding with other NULL-or-FALSE terms still yields NULL or FALSE for the row.

Question. Given the payments table, find rows whose status_code is 1, 2, or NULL. Show that IN (1, 2, NULL) misses the NULL row and how to fix it.

Input.

payment_id status_code
1 1
2 5
3 NULL
4 2

Code.

-- Misleading — does NOT match payment_id=3
SELECT payment_id
FROM payments
WHERE status_code IN (1, 2, NULL);

-- Fix — combine IN with explicit IS NULL
SELECT payment_id
FROM payments
WHERE status_code IN (1, 2) OR status_code IS NULL;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. WHERE status_code IN (1, 2, NULL) expands to status_code = 1 OR status_code = 2 OR status_code = NULL.
  2. For payment_id=3 (NULL), every term evaluates to NULL (any = with NULL is NULL). NULL OR NULL OR NULL is NULL. WHERE drops the row.
  3. For payment_id=1 (1), the first term is TRUE; OR short-circuits — row kept.
  4. For payment_id=4 (2), the second term is TRUE — row kept.
  5. For payment_id=2 (5), all three terms are FALSE/NULL — row dropped.
  6. The "NULL" in the IN list is a no-op; you must add OR status_code IS NULL explicitly to match NULL rows.

Output.

Query Returned payment_ids
IN (1, 2, NULL) 1, 4
IN (1, 2) OR IS NULL 1, 3, 4

Rule of thumb. Treat IN (literal_list) as a positive-only test against non-NULL values. If NULL rows must be matched, append OR col IS NULL — the IN list cannot do it for you, no matter what the list contains.

SQL interview question on NULL-safe comparisons

A senior interviewer often frames this as: "Two tables have an email column; both can be NULL. Find every pair of rows where the emails are 'the same' in a business sense — including the case where both are NULL." This is a NULL-safe equality probe.

Solution Using IS NOT DISTINCT FROM

-- Postgres / Snowflake / DuckDB syntax
SELECT a.id AS a_id, b.id AS b_id, a.email
FROM contacts_a a
JOIN contacts_b b
  ON a.email IS NOT DISTINCT FROM b.email;
Enter fullscreen mode Exit fullscreen mode

For dialects without IS NOT DISTINCT FROM, the portable equivalent is:

SELECT a.id AS a_id, b.id AS b_id, a.email
FROM contacts_a a
JOIN contacts_b b
  ON (a.email = b.email)
  OR (a.email IS NULL AND b.email IS NULL);
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

a.id a.email b.id b.email = result IS NOT DISTINCT FROM result
1 x@y 10 x@y TRUE TRUE
2 NULL 11 x@y NULL → drop FALSE
3 NULL 12 NULL NULL → drop TRUE
4 x@y 13 NULL NULL → drop FALSE

The trace highlights the only row where the two operators differ: row 3 (NULLNULL). Standard = evaluates to NULL — JOIN drops the pair. IS NOT DISTINCT FROM evaluates to TRUE — JOIN keeps the pair. That is the entire reason NULL-safe equality exists.

Output:

a_id b_id email
1 10 x@y
3 12 NULL

Why this works — concept by concept:

  • IS NOT DISTINCT FROM as NULL-safe equality — explicitly treats two NULLs as "the same" (returns TRUE) and exactly one NULL as "different" (returns FALSE). Never returns NULL.
  • JOIN ON treats NULL as FALSE — without the NULL-safe operator, every row where either side is NULL is dropped by the ON predicate, exactly like WHERE.
  • Portable fallback(a = b) OR (a IS NULL AND b IS NULL) is the engine-agnostic translation. MySQL ships <=> as a one-character version of the same idea.
  • Symmetry with IS DISTINCT FROM — the negated form is the NULL-safe inequality operator. Both come from the SQL:1999 standard and are increasingly available in modern engines.
  • Cost — same as a plain = join: O(left × right) without indexes, O(matched_pairs) with a hash join. The NULL-safety is purely a semantic upgrade, not a planner penalty.

SQL
Topic — conditional logic
Conditional logic problems (SQL)

Practice →


5. NULL in JOIN, GROUP BY, ORDER BY — the contract

NULL changes JOIN results, GROUP BY buckets, COUNT outputs, and ORDER BY position — the four-rule contract every senior DE knows cold

The mental model in one line: JOIN drops NULL on both sides; LEFT JOIN preserves NULL on the left; GROUP BY treats all NULLs as one group; COUNT(*) counts NULLs, COUNT(col) ignores them. Once you can quote those four rules cold, the entire NULL-in-set-operations interview surface collapses to a memorisation exercise.

Visual diagram of NULL behaviour in JOIN/GROUP BY/COUNT and a three-valued logic truth table for AND/OR — left a 3-row mini diagram showing INNER JOIN drops NULL rows, LEFT JOIN preserves them, GROUP BY treats NULL as one group; right a 3x3 AND/OR truth table including TRUE/FALSE/NULL; on a light PipeCode card.

The four rules.

  • Rule 1 — INNER JOIN drops NULL on join keys. a.k = b.k is NULL when either side is NULL → ON treats NULL as FALSE → row dropped.
  • Rule 2 — LEFT JOIN preserves NULL on the left side. The unmatched left rows still appear; right-side columns become NULL.
  • Rule 3 — GROUP BY treats every NULL in the group-by column as a single group. All NULL rows aggregate into one "NULL bucket."
  • Rule 4 — COUNT(*) counts every row including NULLs; COUNT(col) counts only non-NULL values of col. The difference is the count of NULLs in that column.

Three-valued logic truth table (AND / OR).

AND truth table:

AND TRUE FALSE NULL
TRUE TRUE FALSE NULL
FALSE FALSE FALSE FALSE
NULL NULL FALSE NULL

OR truth table:

OR TRUE FALSE NULL
TRUE TRUE TRUE TRUE
FALSE TRUE FALSE NULL
NULL TRUE NULL NULL

Key insight. FALSE AND anything is FALSE (short-circuit). TRUE OR anything is TRUE. The third state (NULL) only propagates when both operands could produce it — that asymmetry is what makes three-valued logic the silent bug magnet.

ORDER BY and NULL.

  • Default position is dialect-specific. Postgres / Oracle put NULL last on ASC; MySQL / SQL Server put NULL first on ASC. BigQuery puts NULL first on ASC, last on DESC.
  • Override with NULLS FIRST / NULLS LAST. Available in Postgres, Snowflake, BigQuery, Oracle, DuckDB. SQL Server requires the workaround ORDER BY CASE WHEN col IS NULL THEN 1 ELSE 0 END, col.
  • The portable answer: explicit NULLS LAST (or the workaround) is more predictable than relying on the engine default.

Common interview probes on the NULL contract.

  • "Why did my INNER JOIN drop 12% of the rows?" — because the join key has NULLs on one or both sides.
  • "Does COUNT(DISTINCT col) count NULL?" — no. DISTINCT collapses every value (including NULL) but COUNT then skips the NULL group, so NULL never contributes to the count.
  • "What does GROUP BY col return when half the rows have NULL col?" — one group with NULL key, one group per distinct non-NULL value. SUM / AVG inside each group still ignore NULLs.
  • "What does SELECT MAX(col) return when every row has NULL col?" — NULL. Every aggregate of an empty (or all-NULL) input is NULL, except for COUNT which is 0.

Worked example — INNER JOIN drops NULL rows, LEFT JOIN preserves them

Detailed explanation. A team joins orders to customers on customer_id. Some orders have a NULL customer_id (guest checkouts). The INNER JOIN drops those orders silently; the LEFT JOIN keeps them with NULL customer columns.

Question. Given the two tables, show the row count and result of an INNER JOIN vs a LEFT JOIN. Explain why the counts differ.

Input — orders.

order_id customer_id
1 100
2 200
3 NULL

Input — customers.

customer_id name
100 Alice
200 Bob

Code.

-- INNER JOIN — drops order_id=3 (NULL customer_id)
SELECT o.order_id, o.customer_id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

-- LEFT JOIN — preserves order_id=3 with name=NULL
SELECT o.order_id, o.customer_id, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The INNER JOIN evaluates o.customer_id = c.customer_id for every pair. For order_id=3, the predicate is NULL = 100 (and NULL = 200), both NULL. ON treats NULL as FALSE → the row contributes nothing to the result.
  2. The LEFT JOIN does the same predicate evaluation, but its semantics say: "for every row in the left table, emit either the matched right side or NULLs if no right row matched." Since the predicate is NULL for order_id=3, no right row matches, so the row is emitted with c.name = NULL.
  3. The row count diverges: INNER JOIN returns 2 rows; LEFT JOIN returns 3 rows. The difference (1) is exactly the count of unmatched left-side rows — orders without a matching customer.

Output (INNER JOIN).

order_id customer_id name
1 100 Alice
2 200 Bob

Output (LEFT JOIN).

order_id customer_id name
1 100 Alice
2 200 Bob
3 NULL NULL

Rule of thumb. If the "missing" side of a join carries business meaning (guest checkouts, unattributed events, anonymous sessions), use LEFT JOIN and explicitly handle the NULL columns downstream (with COALESCE, with a CASE bucket, or with an IS NULL filter).

Worked example — GROUP BY treats every NULL as one group

Detailed explanation. Aggregate orders by region. Some rows have NULL region (origin unknown). GROUP BY collapses every NULL row into a single "NULL bucket" — not one bucket per NULL, not dropped.

Question. Given the orders table, count the orders per region using GROUP BY. Show how the NULL region appears as a single group, not as silently dropped rows.

Input.

order_id region
1 EU
2 US
3 NULL
4 EU
5 NULL
6 US

Code.

SELECT
    region,
    COUNT(*) AS order_count
FROM orders
GROUP BY region
ORDER BY order_count DESC NULLS LAST;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. GROUP BY hashes each row into a bucket keyed by region. The non-NULL values produce two buckets: EU and US.
  2. Every NULL region row hashes to a single shared "NULL bucket" — not a separate bucket per NULL row.
  3. COUNT(*) counts every row in each bucket including those with NULL keys.
  4. The result has three groups, one of which has a NULL key. ORDER BY ranks them; the explicit NULLS LAST puts the NULL row at the bottom even if its count tied with another group.

Output.

region order_count
EU 2
US 2
NULL 2

Rule of thumb. Never assume NULL group keys are dropped — they are aggregated. If you want the metric without the NULL bucket, add WHERE region IS NOT NULL. If you want NULL relabelled as 'unknown', wrap with COALESCE(region, 'unknown') before the GROUP BY.

Worked example — COUNT(*) vs COUNT(col) vs COUNT(DISTINCT col)

Detailed explanation. Three COUNTs, three different numbers — interview-favourite triple. Each variant has a different NULL semantic.

Question. Given five orders where customer_id has one NULL and one duplicate, compute COUNT(*), COUNT(customer_id), and COUNT(DISTINCT customer_id). Explain the differences.

Input.

order_id customer_id
1 100
2 200
3 NULL
4 100
5 300

Code.

SELECT
    COUNT(*)                     AS total_rows,
    COUNT(customer_id)           AS non_null_customer_ids,
    COUNT(DISTINCT customer_id)  AS unique_customers
FROM orders;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. COUNT(*) counts every row, including the row with NULL customer_id. Result = 5.
  2. COUNT(customer_id) counts every row whose customer_id is not NULL. Result = 4 (rows 1, 2, 4, 5).
  3. COUNT(DISTINCT customer_id) first deduplicates non-NULL values: {100, 200, 300}. NULLs are not deduplicated into a "NULL group" by DISTINCT here — they are simply skipped. Result = 3.
  4. The three numbers differ by exactly the count of NULLs (1) and duplicates (1 — customer_id=100 appears twice).

Output.

total_rows non_null_customer_ids unique_customers
5 4 3

Rule of thumb. Pick the COUNT that matches your business question: "how many orders?" → COUNT(*); "how many orders have a known customer?" → COUNT(customer_id); "how many distinct customers ordered?" → COUNT(DISTINCT customer_id).

Worked example — ORDER BY with explicit NULL placement

Detailed explanation. ORDER BY col puts NULLs at the top or bottom depending on the dialect default and the ASC / DESC direction. Reports that travel between Postgres and SQL Server need explicit NULLS FIRST / NULLS LAST to ensure the NULL row lands in the same position on both engines.

Question. Given the customers table with a nullable last_purchase_date, sort by last_purchase_date DESC so that customers without any purchase appear at the bottom. Show the portable version.

Input.

customer_id last_purchase_date
1 2026-05-20
2 NULL
3 2026-04-10
4 NULL

Code.

-- Postgres / Snowflake / BigQuery / Oracle / DuckDB
SELECT customer_id, last_purchase_date
FROM customers
ORDER BY last_purchase_date DESC NULLS LAST;

-- SQL Server fallback — same result via CASE
SELECT customer_id, last_purchase_date
FROM customers
ORDER BY
    CASE WHEN last_purchase_date IS NULL THEN 1 ELSE 0 END,  -- nulls last
    last_purchase_date DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The Postgres-family query directly attaches NULLS LAST to the ORDER BY column. The engine sorts NULLs to the bottom regardless of direction.
  2. The SQL Server fallback computes a "is-null priority" column on the fly: 0 for non-NULL, 1 for NULL. ORDER BY on that column first means non-NULL rows lead, NULL rows trail.
  3. Inside each priority group, the secondary last_purchase_date DESC sorts the actual values from most recent to oldest.
  4. The "without NULLS LAST" version of the Postgres query would behave correctly on Postgres (default DESC puts NULLs first → most-recent-first list with NULLs at top), but not on MySQL or SQL Server. Being explicit is portable.

Output.

customer_id last_purchase_date
1 2026-05-20
3 2026-04-10
2 NULL
4 NULL

Rule of thumb. Always write explicit NULLS FIRST or NULLS LAST when the column is nullable. The engine default is a portability landmine — and if your downstream pagination relies on stable ordering, an implicit NULL position will look fine in dev and break in prod the day someone migrates the query.

SQL interview question on NULL in JOIN + GROUP BY

A senior interviewer often combines: "Aggregate every customer's total spend in the last 90 days. Customers with zero orders should appear with total_spend = 0, not be dropped. How would you write this?"

Solution Using LEFT JOIN + COALESCE to preserve zero-order customers

SELECT
    c.customer_id,
    c.name,
    COALESCE(SUM(o.amount), 0) AS total_spend
FROM customers c
LEFT JOIN orders o
       ON o.customer_id = c.customer_id
      AND o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY c.customer_id, c.name
ORDER BY total_spend DESC NULLS LAST;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

c.id c.name matched orders (90d) SUM(o.amount) COALESCE → total_spend
1 Alice 2 orders, 50 + 30 80 80
2 Bob 0 orders NULL 0
3 Cara 1 order, 200 200 200

The LEFT JOIN preserves customers with zero matched orders by emitting NULL for o.amount. SUM(NULL) is NULL; COALESCE(NULL, 0) is 0. The result is the "every customer appears" report the business asked for.

Output:

customer_id name total_spend
3 Cara 200
1 Alice 80
2 Bob 0

Why this works — concept by concept:

  • LEFT JOIN preserves the driving table — every row in customers appears in the output exactly once. Customers without recent orders get NULL o.amount, not dropped rows.
  • SUM ignores NULL inputsSUM(o.amount) over a group with zero matched orders sums no rows, which is NULL (not 0). This is why COALESCE is needed.
  • COALESCE supplies the business defaultCOALESCE(SUM(...), 0) turns "no orders" from NULL into 0, which is what every downstream metric expects.
  • Predicate on the JOIN, not WHERE — the date filter o.order_date >= ... is on the ON clause, not WHERE. Putting it on WHERE would silently turn the LEFT JOIN into an INNER JOIN by filtering NULL-right rows out post-join.
  • NULLS LAST — explicitly orders the NULL/zero rows last; otherwise the result depends on engine default and the "Bob" row could appear at the top in some dialects.
  • Cost — same as a plain LEFT JOIN + GROUP BY: O(left × right_matched) for hash join, O(rows) for the aggregate. COALESCE and ORDER BY are constant-per-row.

SQL
Topic — joins
JOIN problems with NULL handling (SQL)

Practice →


Worked example — FULL OUTER JOIN exposes both sides' unmatched rows

Detailed explanation. FULL OUTER JOIN combines the LEFT and RIGHT outer behaviours: every row from both sides appears at least once, with NULLs on whichever side did not match. It is the right tool when you need to find rows present on either side but not the other — the canonical "diff two tables" pattern.

Question. Given orders_today and orders_yesterday, find every order_id that exists in one snapshot but not the other (a row appeared or disappeared between the two snapshots).

Input — orders_today.

order_id status
1 placed
2 shipped
3 paid

Input — orders_yesterday.

order_id status
1 placed
2 placed
4 placed

Code.

SELECT
    COALESCE(t.order_id, y.order_id) AS order_id,
    t.status                         AS today_status,
    y.status                         AS yesterday_status,
    CASE
        WHEN t.order_id IS NULL THEN 'removed'
        WHEN y.order_id IS NULL THEN 'added'
        WHEN t.status IS DISTINCT FROM y.status THEN 'changed'
        ELSE                        'same'
    END AS delta
FROM orders_today t
FULL OUTER JOIN orders_yesterday y
              ON t.order_id = y.order_id
WHERE t.order_id IS NULL
   OR y.order_id IS NULL
   OR t.status IS DISTINCT FROM y.status;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The FULL OUTER JOIN emits every row from both sides. Matched pairs share an order_id; unmatched left rows have NULL on the right; unmatched right rows have NULL on the left.
  2. COALESCE(t.order_id, y.order_id) picks whichever side has a non-NULL order_id — gives a single column to identify the row.
  3. The CASE classifies each row: removed (today is NULL), added (yesterday is NULL), changed (status differs, NULL-safe via IS DISTINCT FROM), or same.
  4. The WHERE keeps only rows where something actually differs — drops the noise of identical rows in both snapshots.

Output.

order_id today_status yesterday_status delta
2 shipped placed changed
3 paid NULL added
4 NULL placed removed

Rule of thumb. FULL OUTER JOIN + COALESCE + IS DISTINCT FROM is the textbook table-diff pattern. Memorise the shape — it shows up in every "compare snapshot to baseline" interview question and in every dbt freshness / data-quality test.

Cheat sheet — NULL recipes

  • Default value for a nullable column. COALESCE(col, 'unknown') — works in every dialect. For 2-arg only: IFNULL(col, 'unknown') in MySQL / Snowflake / BigQuery, ISNULL(col, 'unknown') in SQL Server, NVL(col, 'unknown') in Oracle.
  • Safe division. numerator / NULLIF(denominator, 0) — turns the "division by zero" runtime error into a NULL output. Make it a code-review checklist item.
  • NULL-safe equality. a IS NOT DISTINCT FROM b (Postgres / Snowflake / DuckDB). a <=> b in MySQL. Manual fallback: (a = b) OR (a IS NULL AND b IS NULL).
  • NULL-safe inequality. a IS DISTINCT FROM b. Manual fallback: (a != b) OR (a IS NULL AND b IS NOT NULL) OR (a IS NOT NULL AND b IS NULL).
  • Convert a sentinel to NULL. NULLIF(col, '') for empty-string sentinels; NULLIF(col, 0) for zero sentinels; NULLIF(col, -1) for negative sentinels.
  • Count NULLs in a column. COUNT(*) - COUNT(col), or COUNT(*) FILTER (WHERE col IS NULL), or SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END).
  • NOT IN with NULL — the safe form. NOT EXISTS (SELECT 1 FROM t WHERE t.col = parent.col). Treat plain NOT IN (subquery) as a smell.
  • GROUP BY without the NULL bucket. WHERE col IS NOT NULL before GROUP BY col. Or relabel: GROUP BY COALESCE(col, 'unknown').
  • ORDER BY with explicit NULL placement. ORDER BY col DESC NULLS LAST. For SQL Server: ORDER BY CASE WHEN col IS NULL THEN 1 ELSE 0 END, col DESC.
  • Find rows with any NULL. WHERE col1 IS NULL OR col2 IS NULL OR .... Or in Postgres: WHERE (col1, col2, col3) IS NULL (composite IS NULL — every component must be NULL for the whole row test to be TRUE).
  • Aggregate ignores NULLs by default. AVG(col), SUM(col), MIN(col), MAX(col) all skip NULL. COUNT(*) counts NULLs; COUNT(col) skips them.
  • CASE on a nullable column. Put WHEN col IS NULL THEN ... first; otherwise the NULL row silently falls into the ELSE arm. Searched CASE only — simple CASE breaks on NULL.
  • IFNULL vs COALESCE. Prefer COALESCE — it is ANSI, N-arg, and works in every dialect. IFNULL is shorter only when you have exactly two arguments.

Frequently asked questions

Does NULL = NULL return TRUE in SQL?

No — NULL = NULL returns NULL (unknown), not TRUE. SQL booleans are three-valued (TRUE / FALSE / UNKNOWN), and any binary comparison involving NULL produces NULL. WHERE, ON, and HAVING treat that UNKNOWN as FALSE, so a predicate of x = NULL filters out every row. The only correct NULL test is IS NULL (unary, returns TRUE / FALSE) or the NULL-safe operator IS NOT DISTINCT FROM in Postgres / Snowflake, or <=> in MySQL.

Why does NOT IN with NULL return zero rows?

Because NOT IN (a, b, NULL) expands to != a AND != b AND != NULL. The last term is NULL (any comparison with NULL is NULL). With AND, TRUE AND TRUE AND NULL is NULL — not TRUE — so the WHERE clause treats it as FALSE and drops every row. The fix is NOT EXISTS (subquery), which is NULL-safe by construction, or filtering NULLs out of the subquery first with WHERE col IS NOT NULL. This is the most-asked null not in sql interview gotcha.

COALESCE vs IFNULL vs ISNULL vs NVL — which one should I use?

Use COALESCE — it is ANSI-standard, N-arg (accepts any number of arguments), and works in every dialect including MySQL, SQL Server, Postgres, Snowflake, BigQuery, Oracle, and DuckDB. IFNULL(a, b) is a 2-arg MySQL / Snowflake / BigQuery alias; ISNULL(a, b) is the SQL Server name (and behaves differently in Snowflake, where ISNULL(x) is a 1-arg "is x NULL" predicate); NVL(a, b) is Oracle (and also Snowflake). When you only have two arguments and you are inside a single-dialect project, the shorter name is fine — but COALESCE is the portable default.

How do I count NULL rows in a column?

Three idioms, all correct. COUNT(*) - COUNT(col) exploits the fact that COUNT(*) counts every row but COUNT(col) skips NULLs — the difference is the NULL count. COUNT(*) FILTER (WHERE col IS NULL) is the ANSI-standard syntax (Postgres / Snowflake / BigQuery / DuckDB). SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END) is the portable fallback for MySQL and SQL Server. Pick whichever your dialect supports; all three compile to the same plan.

Does GROUP BY combine NULLs into a single group?

Yes — every NULL value in the GROUP BY column collapses into a single shared "NULL group," and the aggregate functions inside (SUM, AVG, COUNT(*)) compute across that group exactly as they would for any other key. This is different from "NULL is dropped" (it is not) and from "every NULL is its own group" (it is not). If you do not want the NULL bucket in your output, add WHERE col IS NOT NULL. If you want to relabel it, group by COALESCE(col, 'unknown') and the group will appear with the literal 'unknown'.

Why does my INNER JOIN drop rows where the join key is NULL?

Because INNER JOIN evaluates the ON predicate a.k = b.k for every row pair, and NULL = NULL is NULL — which ON treats as FALSE. The unmatched left and right rows are dropped. Switch to LEFT JOIN (or RIGHT / FULL OUTER JOIN) to preserve the unmatched side; or use a NULL-safe join predicate a.k IS NOT DISTINCT FROM b.k (Postgres / Snowflake) or a.k <=> b.k (MySQL) to make NULL match NULL. This is the silent cause of "why did the join lose 12% of the data?" tickets — and a favourite interview probe for sql ifnull and three-valued logic together.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every NULL recipe above ships with hands-on practice rooms where you write the IS NULL test, the NULLIF wrapper, and the LEFT JOIN + COALESCE fallback against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your fix to `null not in sql` actually behaves the same on Postgres as on Snowflake.

Practice NULL handling now →
CASE expression drills →

Top comments (0)