Gowtham Potureddi

Posted on May 23

SQL PIVOT and UNPIVOT: Reshape Rows ↔ Columns for Analytics

#sql #python #interview #dataengineering

sql pivot turns long-format rows into wide-format columns: one row per category becomes one column per category, with the aggregate value populating each cell. sql unpivot is the inverse — one column per metric becomes one row per metric, restoring the long shape that downstream pipelines and BI tools prefer. These two reshape operators answer the bulk of the sql interview questions in the "make this data wide" / "make this data long" cluster, and getting pivot in sql right — especially the dialect divergence between SQL Server's native PIVOT operator and PostgreSQL's SUM(CASE WHEN …) idiom — separates candidates who pass screening from candidates who don't.

This guide walks through every clause in the sql pivot unpivot family that reviewers love to test in data engineering interview questions: the long-vs-wide mental model, the native PIVOT / UNPIVOT syntax in SQL Server / Oracle / Snowflake, the portable SUM(CASE WHEN …) idiom that works on every dialect (PostgreSQL, MySQL, every flavour without a PIVOT keyword), how to handle dynamic column lists when the pivot values are unknown at query time, the inverse UNPIVOT and its portable UNION ALL rewrite, and the seven gotchas (NULL cells, dynamic SQL, non-unique pivot keys) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when pivot table in sql comes up.

When you want hands-on reps immediately after reading, browse pivoting SQL practice →, drill the unpivoting lane →, sharpen conditional aggregation problems →, rehearse aggregation SQL drills →, or widen coverage on the full SQL practice library →.

On this page

Why PIVOT and UNPIVOT matter in data engineering interviews
Long format vs wide format — the reshape mental model
The portable SUM(CASE WHEN …) pivot — works on every dialect
Native PIVOT syntax — SQL Server, Oracle, Snowflake
UNPIVOT and the inverse reshape
Dynamic pivots — when the column list is not known at write time
PIVOT gotchas — NULL cells, non-unique keys, dynamic SQL hazards
Choosing the right reshape (cheat sheet)
Frequently asked questions
Practice on PipeCode

1. Why PIVOT and UNPIVOT matter in data engineering interviews

Reshape rows ↔ columns — that's the whole interview surface in two sentences

The one-sentence invariant: sql pivot turns long-format rows (one row per (entity, metric) pair) into wide-format columns (one row per entity, one column per metric), with an aggregate populating each cell; sql unpivot does the inverse — one wide column per metric becomes one long row per metric. Once you internalise that, every prompt in the pivot table sql / unpivot in sql family becomes "pick the reshape direction and the aggregate."

The two operators at a glance. Together they answer most pivot table sql and pivot table in sql interview prompts.

PIVOT — long → wide. One row per group key, one new column per distinct value of the "pivot column", each cell populated by an aggregate (typically SUM or COUNT).
UNPIVOT — wide → long. One wide table with N metric columns becomes a long table with N × row_count rows, each row carrying a metric name and its value. The sql unpivot operator is dialect-native on SQL Server, Oracle, Snowflake, BigQuery, and Databricks; the portable UNION ALL rewrite covers PostgreSQL and MySQL.

The two strategies for PIVOT you must know.

Native PIVOT operator — built into SQL Server, Oracle, and Snowflake; concise syntax with a FOR … IN (…) list of values. Not available in PostgreSQL, MySQL, BigQuery (BigQuery added it in 2022 but it differs from SQL Server's syntax).
Portable SUM(CASE WHEN col = 'X' THEN value ELSE 0 END) idiom — works on every dialect; ties directly back to the conditional-aggregation pattern from Blog72; this is the default for PostgreSQL / MySQL.

Why interviewers love PIVOT / UNPIVOT.

Every dashboard, BI export, and pivot table is a reshape — sales by region × month, headcount by team × quarter, conversions by source × week.
Tests two fluencies at once — GROUP BY grain awareness (which column is the row key?) and conditional logic (which column does each row contribute to?).
Dialect divergence is substantial — native PIVOT syntax differs across SQL Server / Oracle / Snowflake / BigQuery; the portable idiom works everywhere.
The UNPIVOT direction surfaces "long format is the data-engineer's friend" — most pipelines, BI tools, and warehouse models prefer long format because it scales gracefully when new categories appear.

What interviewers listen for.

Do you name the row key and the column key before writing the query? — "the row key is region; the column key is quarter; the cell value is SUM(amount)."
Do you reach for the portable SUM(CASE WHEN …) idiom when the dialect is PostgreSQL or MySQL? — the answer that travels.
Do you mention ELSE 0 discipline to avoid the silent-NULL undercount from Blog72? — bonus points.
Do you flag the dynamic-pivot trap when the column list is unknown at write time? — senior signal (the answer involves dynamic SQL or downstream reshape in pandas).

Worked example — pivot quarterly revenue per region

Detailed explanation. When a stakeholder asks for "quarterly revenue per region in a spreadsheet shape", the natural output is one row per region with four columns — Q1, Q2, Q3, Q4. The input is long format (one row per (region, quarter, amount)); the output is wide format (one row per region, four metric columns).

Question. From sales(region, quarter, amount), return one row per region with four columns (q1, q2, q3, q4) holding total revenue per quarter.

Input. sales slice.

region	quarter	amount
US	Q1	1200
US	Q2	1500
US	Q3	1700
US	Q4	2000
EU	Q1	800
EU	Q2	950
EU	Q3	1100
EU	Q4	1300

Code (portable, works on every dialect).

SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2,
       SUM(CASE WHEN quarter = 'Q3' THEN amount ELSE 0 END) AS q3,
       SUM(CASE WHEN quarter = 'Q4' THEN amount ELSE 0 END) AS q4
FROM sales
GROUP BY region
ORDER BY region;

Step-by-step explanation.

FROM sales reads every detail row.
GROUP BY region collapses to one row per region — the row key.
Each SUM(CASE WHEN quarter = 'Qx' THEN amount ELSE 0 END) adds amount only when the row matches that quarter, otherwise contributes 0.
ELSE 0 (not omitted) keeps SUM returning 0 instead of NULL for regions missing a quarter — the discipline from Blog72.
The output grain is one row per region with four metric columns — the wide shape stakeholders want.

Output.

region	q1	q2	q3	q4
EU	800	950	1100	1300
US	1200	1500	1700	2000

Rule of thumb: identify the row key first (region), then the column key (quarter) and its known values (Q1..Q4), then write one SUM(CASE WHEN …) per known value — that's the universal pivot template.

SQL
Topic — case-when
SUM(CASE WHEN …) pivot drills

Practice →

SQL
Topic — conditional-aggregation
Conditional aggregation problems

Practice →

2. Long format vs wide format — the reshape mental model

Two shapes for the same data — pick the one that matches your downstream

Before any pivot syntax, the mental model: the same dataset can be expressed as long format (narrow, many rows, fixed columns) or wide format (short, fewer rows, more columns). Knowing which shape you want — and which shape your downstream consumer wants — is half the battle in any reshape interview question.

Long format — one row per (entity, metric, value).

Shape — typically three or four columns: entity_id, metric_name, metric_value, optional timestamp.
Row count — entities × metrics; grows linearly when new metrics or entities arrive.
Use it for — warehouse storage, ETL staging, event logs, dashboards built on Looker / Tableau / Metabase (which expect tidy long data).
Why data engineers prefer it — adding a new metric does not require schema changes; the new metric just becomes new rows with a new metric_name value.
The default shape in OLTP and most warehouse fact tables.

Wide format — one row per entity, one column per metric.

Shape — entity_id plus N metric columns; each column holds one specific metric's value.
Row count — equal to the number of distinct entities; columns equal the number of metrics.
Use it for — spreadsheet exports, BI dashboards that expect a "cube" shape, reports formatted for non-technical readers.
Why analysts and stakeholders prefer it — visually compact; "looks like a pivot table"; trivially cross-readable left-to-right.
The output shape that PIVOT produces and that UNPIVOT consumes.

When to pivot (long → wide).

The downstream consumer (Excel, a sheet, a static report) expects a pivot-table shape.
The metric list is small, stable, and known in advance (Q1 / Q2 / Q3 / Q4; not "all sub-products since 2018").
The number of pivot columns is bounded — wide tables with thousands of columns are anti-patterns.

When to unpivot (wide → long).

Upstream system emits wide data (a CSV from finance with one column per month) and you need to load it into a long fact table.
A model expects tidy data and the source is wide.
Joining two wide tables on a (entity, metric) grain — unpivot to long, join, then pivot back if needed.

Why long is the canonical storage shape for data engineering.

Adding a new metric is INSERT … VALUES (id, 'new_metric', val) — no schema change.
Adding a new entity is INSERT — no schema change.
Schema stability — long tables typically have 3–6 columns and never change; wide tables grow unbounded.
Aggregations and joins are cleaner on long tables — GROUP BY metric_name partitions naturally.

The reshape arrows.

Long → Wide = PIVOT (native operator) OR SUM(CASE WHEN …) (portable idiom).
Wide → Long = UNPIVOT (native operator) OR UNION ALL of SELECT entity, 'm1', m1 FROM t UNION ALL SELECT entity, 'm2', m2 FROM t … (portable idiom).

SQL
Topic — unpivoting
Unpivoting SQL drills

Practice →

SQL
Topic — data-aggregation
Data aggregation library

Practice →

3. The portable SUM(CASE WHEN …) pivot — works on every dialect

`sql pivot without pivot` — the universal pivot idiom every data engineer should know

The portable pivot pattern works on every SQL dialect ever shipped, predates the native PIVOT keyword by decades, and is the only option in PostgreSQL, MySQL, SQLite, and most cloud warehouses without a dedicated operator. It is the sql pivot example every interviewer expects you to know.

The anatomy of the portable pivot.

GROUP BY <row_key> — defines the output grain; one row per distinct value of the row key.
One SUM(CASE WHEN <col_key> = '<value>' THEN <metric> ELSE 0 END) per known column value — each populates one wide-format column.
ELSE 0 keeps the aggregate at 0 instead of NULL for missing combinations.
COUNT(CASE WHEN cond THEN 1 END) for count-pivots (ELSE can be omitted because COUNT skips NULL).

The template.

SELECT <row_key>,
       SUM(CASE WHEN <col_key> = '<value_1>' THEN <metric> ELSE 0 END) AS <alias_1>,
       SUM(CASE WHEN <col_key> = '<value_2>' THEN <metric> ELSE 0 END) AS <alias_2>,
       …
FROM <table>
GROUP BY <row_key>
ORDER BY <row_key>;

Why the portable idiom is preferred even when PIVOT exists.

Portability — the same query works on Postgres, MySQL, SQL Server, Snowflake, BigQuery; native PIVOT is a translation step.
Multiple aggregates per pivot column — native PIVOT typically allows one aggregate per call; SUM(CASE WHEN …) lets you compute SUM and COUNT in the same query.
Custom predicates per cell — WHEN amount > 100 AND quarter = 'Q1' is trivial in CASE; awkward in native PIVOT.
No IN (…) list constraint — native PIVOT requires the value list inline; CASE WHEN accepts any expression.

ELSE 0 discipline — the Blog72 callback.

Without ELSE 0 — SUM returns NULL for groups where no row matches the WHEN condition; downstream arithmetic propagates the NULL and undercounts.
With ELSE 0 — the missing-data case is explicitly zero; pivoted dashboards render 0, not blank.
Senior signal — stating "I use ELSE 0 to avoid the silent-NULL undercount" is the pattern from Blog72 applied to pivot cells.

Multiple metrics in one pivot query.

Combine SUM and COUNT in the same row by adding more CASE WHEN columns.
Example — per region, both total revenue per quarter AND order count per quarter:

SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_rev,
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_rev,
       COUNT(CASE WHEN quarter = 'Q1' THEN 1 END)           AS q1_orders,
       COUNT(CASE WHEN quarter = 'Q2' THEN 1 END)           AS q2_orders
FROM sales
GROUP BY region;

Filtering inside the cell with FILTER (Postgres 9.4+).

Postgres-specific cleaner syntax — SUM(amount) FILTER (WHERE quarter = 'Q1') AS q1_rev.
Equivalent to SUM(CASE WHEN quarter = 'Q1' THEN amount END).
More readable for one-off pivots; not portable to MySQL / SQL Server.

SQL interview question — paid vs refunded revenue per region in a wide layout

Assume orders(region, status, amount) with status ∈ {'paid', 'refund', 'pending'}. Return one row per region with three columns — paid_rev, refund_amount, pending_amount — totalled across all orders.

Solution Using SUM(CASE WHEN …)

Detailed explanation. This is a textbook conditional-aggregation pivot from Blog72 applied to a region × status grain. The row key is region; the column key is status with three known values; the cell metric is SUM(amount). The ELSE 0 keeps regions with no refunds rendering 0, not NULL.

Code (PostgreSQL / works on every dialect).

SELECT region,
       SUM(CASE WHEN status = 'paid'    THEN amount ELSE 0 END) AS paid_rev,
       SUM(CASE WHEN status = 'refund'  THEN amount ELSE 0 END) AS refund_amount,
       SUM(CASE WHEN status = 'pending' THEN amount ELSE 0 END) AS pending_amount
FROM orders
GROUP BY region
ORDER BY paid_rev DESC;

Equivalent using Postgres FILTER clause.

SELECT region,
       SUM(amount) FILTER (WHERE status = 'paid')    AS paid_rev,
       SUM(amount) FILTER (WHERE status = 'refund')  AS refund_amount,
       SUM(amount) FILTER (WHERE status = 'pending') AS pending_amount
FROM orders
GROUP BY region
ORDER BY paid_rev DESC;

Step-by-step trace.

step	relation	outcome
1	Scan `orders`	every order row read
2	`GROUP BY region`	hash aggregate; one row per region
3	Three `SUM(CASE WHEN …)` per group	three cell totals per row
4	`ELSE 0` ensures missing-status cells become 0	no NULL holes
5	`ORDER BY paid_rev DESC`	top-paying regions first

Output:

region	paid_rev	refund_amount	pending_amount
US	11,200	1,300	0
EU	7,900	850	200
APAC	5,100	300	0

Why this works — concept by concept:

GROUP BY region defines the grain — one row per region in the output.
Three CASE WHEN columns map the three statuses — each populates one wide-format column.
ELSE 0 keeps cells zero, not NULL — the Blog72 silent-undercount defence applied to pivot cells.
FILTER syntax is the Postgres-clean equivalent — drops the CASE ceremony for the same semantics.
One pass through the data — the planner reads orders once and evaluates all three aggregates in the same hash node.
Cost — Θ(n) scan + Θ(n) hash aggregate keyed on region; the three aggregate calls are a constant per-row factor.

SQL
Topic — conditional-aggregation
Pivot via conditional aggregation

Practice →

SQL
Topic — case-when
CASE WHEN SQL library

Practice →

4. Native PIVOT syntax — SQL Server, Oracle, Snowflake

`pivot in sql` — three dialects with a dedicated keyword

SQL Server, Oracle, and Snowflake ship a native PIVOT operator. It is syntactic sugar over the same SUM(CASE WHEN …) semantics — the planner often produces identical execution plans — but the syntax is concise enough that interviewers like to test whether you can read it. PostgreSQL and MySQL do not have native PIVOT; the portable idiom from §3 is mandatory there.

SQL Server PIVOT — the canonical shape.

SELECT region, [Q1], [Q2], [Q3], [Q4]
FROM (
    SELECT region, quarter, amount FROM sales
) src
PIVOT (
    SUM(amount)
    FOR quarter IN ([Q1], [Q2], [Q3], [Q4])
) AS pvt
ORDER BY region;

PIVOT (aggregate FOR column IN (value_list)) — one aggregate, one column to pivot, an explicit list of values.
Bracketed identifiers — [Q1] is the SQL Server way to quote column names with special characters or reserved words.
The derived table src selects only the columns participating — region (row key), quarter (pivot column), amount (cell value). Any extra columns become extra row-key components.

Oracle PIVOT — same idea, slight syntax tweak.

SELECT * FROM (
    SELECT region, quarter, amount FROM sales
)
PIVOT (
    SUM(amount)
    FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4)
)
ORDER BY region;

AS alias clauses produce nicer column names than 'Q1'.
String literals for the value list (Oracle uses single quotes).
Same FOR … IN (…) shape; same one-aggregate-per-call rule.

Snowflake PIVOT — close to SQL Server.

SELECT * FROM (
    SELECT region, quarter, amount FROM sales
) src
PIVOT (
    SUM(amount)
    FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
) AS pvt
ORDER BY region;

String literals for the value list (no brackets).
PIVOT (... FOR ... IN (... ANY)) since 2023 — dynamic value list (computed at query time).
One aggregate per PIVOT — same constraint as SQL Server / Oracle.

BigQuery PIVOT — closest to the standard.

SELECT * FROM (
    SELECT region, quarter, amount FROM sales
)
PIVOT (
    SUM(amount)
    FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
);

No alias required after the PIVOT (...) clause.
String literals, same FOR / IN shape.

The constraints native PIVOT shares across dialects.

One aggregate per PIVOT call — to compute both SUM and COUNT, run two queries and join them, or use the portable SUM(CASE WHEN …) idiom.
Values must be enumerated at write time — dynamic value lists require dynamic SQL or warehouse-specific extensions (Snowflake's ANY).
Implicit GROUP BY — every column from the derived table that is not the pivot column or the aggregated metric becomes a row-key column.
The implicit-grouping trap — if your derived table accidentally projects an extra column (e.g. order_id), each row of order_id becomes its own row key and the pivot collapses to one row per (region, order_id) instead of per region.

Dialect support matrix.

Dialect	Native `PIVOT`	Native `UNPIVOT`	Portable `SUM(CASE WHEN …)`
PostgreSQL	No	No	Yes — `SUM(amount) FILTER (WHERE …)` is the clean form
MySQL	No	No	Yes
SQL Server	Yes (since 2005)	Yes	Yes
Oracle	Yes (since 11g)	Yes	Yes
Snowflake	Yes	Yes	Yes
BigQuery	Yes (since 2022)	Yes	Yes
Databricks	Yes	Yes	Yes

SQL
Topic — case-when
PIVOT syntax SQL drills

Practice →

SQL
Topic — aggregation
Aggregation + PIVOT patterns

Practice →

5. UNPIVOT and the inverse reshape

`unpivot in sql` — wide back to long for warehouse loading

sql unpivot is the inverse of sql pivot: a wide table with N metric columns becomes a long table with N × row_count rows. Every cell becomes a row, with the source column name as a new column and the cell value as another. The shape is exactly what warehouse fact tables and tidy-data dashboards consume — and unpivot in sql shows up in interviews as the "wide-to-long" reshape twin of the pivot prompt.

Native UNPIVOT — SQL Server, Oracle, Snowflake.

SELECT region, quarter, amount
FROM (
    SELECT region, q1, q2, q3, q4 FROM quarterly_sales
) src
UNPIVOT (
    amount FOR quarter IN (q1, q2, q3, q4)
) AS unpvt;

UNPIVOT (value_col FOR name_col IN (col_list)) — value_col holds the cell values; name_col holds the source column names.
One row per source cell — four source columns × two regions = eight rows.
NULL cells are dropped by default — if q3 IS NULL for some region, that row does not appear. UNPIVOT … INCLUDE NULLS (some dialects) keeps them.

Portable UNPIVOT via UNION ALL — every dialect.

SELECT region, 'Q1' AS quarter, q1 AS amount FROM quarterly_sales
UNION ALL
SELECT region, 'Q2',           q2          FROM quarterly_sales
UNION ALL
SELECT region, 'Q3',           q3          FROM quarterly_sales
UNION ALL
SELECT region, 'Q4',           q4          FROM quarterly_sales;

One SELECT per source column — each projects the same shape (region, quarter_literal, value_column).
UNION ALL stitches them vertically; never UNION (don't pay the dedup cost).
NULL handling is explicit — add WHERE q1 IS NOT NULL per leg if you want to drop NULL cells, or omit to keep them.

PostgreSQL LATERAL + VALUES — the elegant Postgres pattern.

SELECT region, quarter, amount
FROM quarterly_sales,
     LATERAL (VALUES
         ('Q1', q1),
         ('Q2', q2),
         ('Q3', q3),
         ('Q4', q4)
     ) AS t(quarter, amount);

LATERAL lets the inline VALUES reference outer columns (q1, q2, etc.).
More compact than four UNION ALL legs.
PostgreSQL-only — not portable to MySQL or SQL Server, but very clean when available.

jsonb_each + to_jsonb — when columns are dynamic in Postgres.

SELECT region, kv.key AS quarter, (kv.value)::numeric AS amount
FROM quarterly_sales,
     LATERAL jsonb_each(to_jsonb(quarterly_sales) - 'region') kv;

to_jsonb(row) - 'region' turns the row into a JSON object and drops the row key.
jsonb_each expands the object into key-value rows.
Use case — unpivoting tables with unknown or wildly varying wide columns; trades type safety for flexibility.

Common sql unpivot use cases — when unpivot in sql is the right reshape.

Loading wide CSVs into a long fact table — finance dumps a sheet with one column per month; the warehouse ETL unpivots it before insertion.
Reshaping for Looker / Tableau / Metabase — these tools strongly prefer tidy long data.
Computing aggregates across metrics — MAX(amount) across all quarters per region is trivial once the data is long.

SQL interview question — unpivot wide quarterly_sales into long

Assume quarterly_sales(region, q1, q2, q3, q4). Return a long-format result with columns (region, quarter, amount) — one row per (region, quarter) cell.

Solution Using UNION ALL (portable)

Detailed explanation. The portable unpivot pattern produces one SELECT per source column, then stacks them with UNION ALL. Use UNION ALL (not UNION) because every row is unique by construction — no dedup pass needed.

Code (PostgreSQL / works on every dialect).

SELECT region, 'Q1' AS quarter, q1 AS amount FROM quarterly_sales
UNION ALL
SELECT region, 'Q2',           q2          FROM quarterly_sales
UNION ALL
SELECT region, 'Q3',           q3          FROM quarterly_sales
UNION ALL
SELECT region, 'Q4',           q4          FROM quarterly_sales
ORDER BY region, quarter;

Equivalent in SQL Server (native UNPIVOT).

SELECT region, quarter, amount
FROM (
    SELECT region, q1, q2, q3, q4 FROM quarterly_sales
) src
UNPIVOT (
    amount FOR quarter IN (q1, q2, q3, q4)
) AS unpvt
ORDER BY region, quarter;

Equivalent in Postgres using LATERAL VALUES.

SELECT region, quarter, amount
FROM quarterly_sales,
     LATERAL (VALUES
         ('Q1', q1), ('Q2', q2), ('Q3', q3), ('Q4', q4)
     ) AS t(quarter, amount)
ORDER BY region, quarter;

Step-by-step trace.

step	relation	outcome
1	Four `SELECT`s each project `(region, 'Qx', qx)`	four streams of equal shape
2	`UNION ALL` stacks them vertically	one combined stream of N × 4 rows
3	Each row carries the source column name and the cell value	tidy long shape
4	`ORDER BY region, quarter`	sorted final output

Output:

region	quarter	amount
EU	Q1	800
EU	Q2	950
EU	Q3	1100
EU	Q4	1300
US	Q1	1200
US	Q2	1500
US	Q3	1700
US	Q4	2000

Why this works — concept by concept:

UNION ALL stacks the four projections — every row is unique by construction; no dedup needed.
String literal in each SELECT — 'Q1' AS quarter injects the source column name as data; this is the unpivot trick.
The cell value column comes from the corresponding wide column — q1, q2, q3, q4 each populate the same output column.
Native UNPIVOT is syntactic sugar — produces the same plan on every engine that supports it.
LATERAL VALUES is the Postgres-clean variant — one row per source row times the number of VALUES tuples.
Cost — four scans of quarterly_sales (the portable form) or one scan (native UNPIVOT / LATERAL); the native forms are cheaper at scale.

SQL
Topic — unpivoting
UNPIVOT SQL drills

Practice →

SQL
Topic — set-operations
UNION ALL + UNPIVOT patterns

Practice →

6. Dynamic pivots — when the column list is not known at write time

The dynamic-pivot problem — and why it's a senior interview signal

The portable and native pivot patterns both require the column list to be enumerated at write time: Q1 / Q2 / Q3 / Q4 is fine, but product_a / product_b / … / product_2000_new_skus is not. When the pivot values are unknown at write time, you need either dynamic SQL, a warehouse extension (Snowflake's PIVOT (… IN ANY)), or a downstream reshape in pandas / Spark.

The fundamental constraint.

Relational SQL is schema-bound — SELECT projects a fixed number of named columns; you cannot project N columns where N depends on data values.
Native PIVOT (… IN (?, ?, …)) still needs the value list at parse time.
SUM(CASE WHEN col = 'X' THEN val ELSE 0 END) still needs 'X' as a literal at write time.

Three approaches when the value list is dynamic.

Approach 1 — dynamic SQL (the classic workaround).

Step 1 — SELECT DISTINCT pivot_col FROM source to discover the value list.
Step 2 — concatenate the value list into a SQL string with one SUM(CASE WHEN …) per value.
Step 3 — EXECUTE (SQL Server's EXEC sp_executesql, Postgres EXECUTE, Oracle EXECUTE IMMEDIATE).
Cost — two database round-trips; SQL string concatenation is hand-rolled; injection risk if user-supplied data is mixed in unsafely.

-- SQL Server example
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
SELECT @cols = STRING_AGG(QUOTENAME(quarter), ',')
FROM (SELECT DISTINCT quarter FROM sales) q;

SET @sql = N'
SELECT region, ' + @cols + N'
FROM (SELECT region, quarter, amount FROM sales) src
PIVOT (SUM(amount) FOR quarter IN (' + @cols + N')) AS pvt;';

EXEC sp_executesql @sql;

Approach 2 — warehouse-native dynamic pivot (Snowflake's ANY).

Snowflake since 2023 — PIVOT (SUM(amount) FOR quarter IN (ANY)) discovers the value list automatically.
BigQuery — limited dynamic-pivot support via EXECUTE IMMEDIATE for the same DDL-string approach.
Cleanest option when available; no string concatenation.

Approach 3 — reshape downstream in pandas / Spark / dbt.

Pull long-format data from SQL.
Reshape in the application layer — pandas df.pivot_table(index='region', columns='quarter', values='amount', aggfunc='sum').
Often the right call for production pipelines — keeps SQL simple and pushes dynamic-schema work to a tool designed for it.

When dynamic pivot is the wrong question.

The value list is "ever-growing" — pivoting on product SKUs, user IDs, etc. is almost always wrong; you'd produce a table with thousands of columns.
Re-evaluate — is the question really "give me a wide layout" or is it "give me a long table I can render as a pivot in BI"? — usually the latter.
Senior signal — push back on the requirement when the long shape is what the consumer actually needs.

The interview answer.

State the constraint clearly — "SQL needs to know the column list at write time."
Offer three paths — dynamic SQL, warehouse-native dynamic pivot, or downstream reshape.
Pick the right one for the context — reach for downstream reshape unless the schema is tightly bounded.
Avoid the trap — never agree to a "thousand-column pivot"; flag the schema-design problem.

SQL
Topic — conditional-aggregation
Dynamic pivot SQL drills

Practice →

SQL
Topic — data-aggregation
Data aggregation library

Practice →

7. PIVOT gotchas — NULL cells, non-unique keys, dynamic SQL hazards

The seven bugs interviewers test most often on PIVOT and UNPIVOT

Reshape operators have a small surface area but a long tail of edge cases that fail candidates. These are the seven gotchas reviewers test most often when pivot table in sql comes up in an interview.

Gotcha 1 — SUM returns NULL for missing cells without ELSE 0.

The bug — SUM(CASE WHEN quarter = 'Q3' THEN amount END) (no ELSE) returns NULL when the region has no Q3 orders.
Symptom — dashboards render blank / — instead of 0; downstream arithmetic propagates NULL.
Fix — add ELSE 0 in every cell aggregate, or wrap with COALESCE(…, 0).

Gotcha 2 — Implicit GROUP BY in native PIVOT includes every projected column.

The bug — accidentally projecting order_id into the derived table makes every row its own group; the pivot collapses to one row per (region, order_id) instead of per region.
Symptom — wide table with millions of rows of one-row-per-order, each with mostly empty cells.
Fix — project only the row-key columns, the pivot column, and the metric in the derived table.

Gotcha 3 — Pivot value list must match exactly (case + whitespace).

The bug — FOR quarter IN ('q1', 'q2', 'q3', 'q4') when the data contains 'Q1', 'Q2', 'Q3', 'Q4'.
Symptom — every cell is NULL because no row matches.
Fix — normalise upstream (LOWER(quarter) or UPPER(quarter)) or list both casings.

Gotcha 4 — UNPIVOT drops NULL cells silently.

The behaviour — native UNPIVOT excludes rows where the cell is NULL; this is often desired but sometimes surprising.
Symptom — UNPIVOT output has fewer rows than expected when the wide table has missing values.
Fix — UNPIVOT … INCLUDE NULLS if your dialect supports it, or use the portable UNION ALL form with explicit WHERE control.

Gotcha 5 — Non-unique row keys produce unexpected aggregates.

The bug — pivoting on a row key that is not unique in the source data (e.g. region but the same region appears multiple times per quarter) silently aggregates.
Symptom — cell values are larger than expected because multiple source rows feed each cell.
Fix — confirm the source grain before pivoting; pre-aggregate to the row-key × pivot-column grain inside a CTE.

Gotcha 6 — Native PIVOT allows only one aggregate per call.

The constraint — PIVOT (SUM(amount) FOR …) and PIVOT (COUNT(*) FOR …) cannot coexist in a single PIVOT clause.
Workaround — run two PIVOTs and join on the row key, OR use the portable SUM(CASE WHEN …) + COUNT(CASE WHEN …) form in one query.
Performance — the portable form is often faster because it scans the source table once.

Gotcha 7 — Dynamic SQL pivots introduce injection risk.

The bug — concatenating user-supplied data into the value list string opens a SQL injection vector.
Symptom — security audit failure; potential data exfiltration.
Fix — use the dialect's safe quoting function (QUOTENAME in SQL Server, quote_literal in Postgres) to escape values; or restrict the discovery query to a known-safe source.

SQL
Topic — case-when
PIVOT gotcha drills

Practice →

SQL
Topic — unpivoting
UNPIVOT gotcha drills

Practice →

Choosing the right reshape (cheat sheet)

A one-screen cheat sheet for using SQL PIVOT and UNPIVOT — pick the direction, then the dialect-appropriate idiom.

You want to …	Direction	Portable idiom	Native (when available)
Long → Wide with one aggregate	PIVOT	`SUM(CASE WHEN col = 'X' THEN v ELSE 0 END)`	`PIVOT (SUM(v) FOR col IN ('X', …))`
Long → Wide with multiple aggregates	PIVOT	One `SUM(CASE WHEN …)` per metric per value	Two native PIVOTs joined on the row key
Long → Wide with Postgres clean syntax	PIVOT	`SUM(v) FILTER (WHERE col = 'X')`	n/a (Postgres has no native PIVOT)
Wide → Long	UNPIVOT	`SELECT … UNION ALL SELECT …` per source column	`UNPIVOT (val FOR name IN (col1, col2, …))`
Wide → Long in Postgres (clean)	UNPIVOT	`LATERAL (VALUES ('X', col1), ('Y', col2)) AS t(name, val)`	n/a
Dynamic column list	PIVOT	Dynamic SQL + string-concatenated `CASE WHEN`s	`PIVOT (… IN ANY)` (Snowflake 2023+)
Excel-style pivot table	PIVOT	`SUM(CASE WHEN …)` with a row key and column key	dialect-native PIVOT
Reshape for warehouse load	UNPIVOT	`UNION ALL` of per-column projections	dialect-native UNPIVOT
Reshape for Looker / Tableau	UNPIVOT	`UNION ALL` to long format	tidy data favoured by BI tools
Count of rows per cell	PIVOT	`COUNT(CASE WHEN col = 'X' THEN 1 END)`	`PIVOT (COUNT(*) FOR col IN (…))`
Both SUM and COUNT per cell	PIVOT	One query with two CASE-aggregate columns per value	Two PIVOTs joined

Frequently asked questions

What is PIVOT in SQL?

PIVOT is a reshape operator that turns long-format rows into wide-format columns. The input is typically a table with three logical columns — a row key (which becomes the rows in the output), a column key (whose distinct values become the new columns), and a metric (whose aggregate populates each cell). For example, sales(region, quarter, amount) can be pivoted on quarter to produce one row per region with four columns (Q1, Q2, Q3, Q4) holding SUM(amount). SQL Server, Oracle, Snowflake, BigQuery, and Databricks have native PIVOT operators with a FOR … IN (…) syntax; PostgreSQL and MySQL have no native PIVOT and rely on the portable SUM(CASE WHEN col = 'X' THEN val ELSE 0 END) idiom inside a GROUP BY query.

How do I pivot rows to columns in PostgreSQL or MySQL?

PostgreSQL and MySQL have no native PIVOT keyword — use the portable SUM(CASE WHEN …) idiom. The template is SELECT row_key, SUM(CASE WHEN col_key = 'value_1' THEN metric ELSE 0 END) AS alias_1, … FROM source GROUP BY row_key. Add one CASE WHEN column per distinct pivot value, with ELSE 0 to keep missing cells as 0 instead of NULL (Blog72 silent-undercount defence). PostgreSQL 9.4+ also supports the cleaner SUM(amount) FILTER (WHERE quarter = 'Q1') syntax, which produces the same result without the CASE ceremony. The idiom is the only portable shape, predates the PIVOT keyword by decades, and is the right answer for any cross-engine codebase.

What's the difference between PIVOT and UNPIVOT?

PIVOT goes long → wide: one row per (entity, metric, value) triple becomes one row per entity with one column per distinct metric. UNPIVOT goes wide → long: one row per entity with N metric columns becomes N × row_count long rows, each carrying a metric name and its value. The two are inverses: PIVOT is what BI tools and stakeholders want for visual reports; UNPIVOT is what warehouse ETL pipelines want for tidy storage. Native UNPIVOT exists on SQL Server, Oracle, Snowflake, BigQuery, and Databricks; the portable form is a UNION ALL of one SELECT entity, 'metric_name' AS m, metric_value FROM source per source column. PostgreSQL has a particularly clean LATERAL (VALUES …) shape that beats both the native and portable forms for readability.

Can I pivot on a column with unknown values at write time?

Not directly with relational SQL — the PIVOT operator and the SUM(CASE WHEN …) idiom both require the value list to be enumerated at parse time. Three workarounds: (1) dynamic SQL — discover the distinct values, concatenate them into a SQL string, and EXECUTE the resulting query; verbose and injection-risky; (2) warehouse-native dynamic pivot — Snowflake's PIVOT (… IN ANY) discovers the value list at query time; the cleanest option when available; (3) reshape downstream — keep SQL queries simple and long-format, then pivot in pandas / Spark / dbt; usually the right production answer. Be wary of "ever-growing" pivot column lists (product SKUs, user IDs) — they signal a schema-design problem; the long shape is almost always what the consumer actually needs, even if they asked for wide.

Why use SUM(CASE WHEN …) instead of native PIVOT when both are available?

Three reasons. Portability — the same query works on Postgres, MySQL, SQL Server, Snowflake, BigQuery; native PIVOT is a per-dialect translation step. Multiple aggregates per pivot column — native PIVOT typically allows one aggregate per call, so computing both SUM and COUNT per cell requires two PIVOTs and a join; SUM(CASE WHEN …) + COUNT(CASE WHEN …) does it in one query with one scan. Custom predicates per cell — WHEN amount > 100 AND quarter = 'Q1' is trivial in CASE; awkward in native PIVOT's IN (…) list. The portable idiom is preferred even on dialects with native PIVOT for any non-trivial pivot; native PIVOT is best reserved for read-only ad-hoc reports where the syntax brevity matters.

Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to PIVOT / UNPIVOT, conditional aggregation with SUM(CASE WHEN …), the FILTER clause, set-operations for UNPIVOT via UNION ALL, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same long-vs-wide mental model this guide teaches.

Kick off via Explore practice →; drill the CASE WHEN SQL lane →; fan out into the unpivoting lane →; rehearse conditional aggregation problems →; reinforce aggregation SQL drills →; widen coverage on the full SQL practice library →.

DEV Community

SQL PIVOT and UNPIVOT: Reshape Rows ↔ Columns for Analytics

1. Why PIVOT and UNPIVOT matter in data engineering interviews

Reshape rows ↔ columns — that's the whole interview surface in two sentences

Worked example — pivot quarterly revenue per region

2. Long format vs wide format — the reshape mental model

Two shapes for the same data — pick the one that matches your downstream

3. The portable SUM(CASE WHEN …) pivot — works on every dialect

`sql pivot without pivot` — the universal pivot idiom every data engineer should know

SQL interview question — paid vs refunded revenue per region in a wide layout

Solution Using SUM(CASE WHEN …)

4. Native PIVOT syntax — SQL Server, Oracle, Snowflake

`pivot in sql` — three dialects with a dedicated keyword

5. UNPIVOT and the inverse reshape

`unpivot in sql` — wide back to long for warehouse loading

SQL interview question — unpivot wide quarterly_sales into long

Solution Using UNION ALL (portable)

6. Dynamic pivots — when the column list is not known at write time

The dynamic-pivot problem — and why it's a senior interview signal

7. PIVOT gotchas — NULL cells, non-unique keys, dynamic SQL hazards

The seven bugs interviewers test most often on PIVOT and UNPIVOT

Choosing the right reshape (cheat sheet)

Frequently asked questions

What is PIVOT in SQL?

How do I pivot rows to columns in PostgreSQL or MySQL?

What's the difference between PIVOT and UNPIVOT?

Can I pivot on a column with unknown values at write time?

Why use SUM(CASE WHEN …) instead of native PIVOT when both are available?

Practice on PipeCode

Top comments (0)

1. Why PIVOT and UNPIVOT matter in data engineering interviews

Reshape rows ↔ columns — that's the whole interview surface in two sentences

Worked example — pivot quarterly revenue per region

2. Long format vs wide format — the reshape mental model

Two shapes for the same data — pick the one that matches your downstream

3. The portable SUM(CASE WHEN …) pivot — works on every dialect

sql pivot without pivot — the universal pivot idiom every data engineer should know

SQL interview question — paid vs refunded revenue per region in a wide layout

Solution Using SUM(CASE WHEN …)

4. Native PIVOT syntax — SQL Server, Oracle, Snowflake

pivot in sql — three dialects with a dedicated keyword

5. UNPIVOT and the inverse reshape

unpivot in sql — wide back to long for warehouse loading

SQL interview question — unpivot wide quarterly_sales into long

Solution Using UNION ALL (portable)

6. Dynamic pivots — when the column list is not known at write time

The dynamic-pivot problem — and why it's a senior interview signal

7. PIVOT gotchas — NULL cells, non-unique keys, dynamic SQL hazards

The seven bugs interviewers test most often on PIVOT and UNPIVOT

Choosing the right reshape (cheat sheet)

Frequently asked questions

What is PIVOT in SQL?

How do I pivot rows to columns in PostgreSQL or MySQL?

What's the difference between PIVOT and UNPIVOT?

Can I pivot on a column with unknown values at write time?

Why use SUM(CASE WHEN …) instead of native PIVOT when both are available?

Practice on PipeCode

`sql pivot without pivot` — the universal pivot idiom every data engineer should know

`pivot in sql` — three dialects with a dedicated keyword

`unpivot in sql` — wide back to long for warehouse loading