Gowtham Potureddi
CTE in SQL for Data Engineering Interviews: WITH Clauses, Recursive CTEs, and Window SQL Patterns

CTE in SQL — a Common Table Expression introduced with WITH — is how you turn a brittle wall of nested subqueries into a readable, debuggable pipeline. In data engineering interviews — the same rounds that drill basic SQL, joins, and answer-style walkthroughs — reviewers use CTEs as a readability signal: if you can name intermediate results and chain them cleanly, you will survive the live whiteboard refactor.

You will build the pattern from scratch: single-CTE anatomy, chained CTE pipelines, CTE + SQL window functions for rank-then-filter (top‑N per group), WITH RECURSIVE org charts and counting sequences, and the classic board-question triad (CTE vs subquery vs temporary table) with blunt trade-offs. Every interview-style beat ends as a question with an answer: runnable Postgres-flavoured SQL, a traced execution, printed output tables, and a concept-by-concept "why this works" map.


When you want hands-on reps immediately after reading, browse CTE (SQL) practice →, dive practice SQL hub →, sharpen window function SQL →, rehearse join interview SQL →, or widen coverage on the general CTE topic →.




1. Why CTEs matter in interviews and pipelines

The readability invariant interviewers optimise for

The core invariant you should state aloud: a CTE binds a disposable, named relational expression to one outer SELECT, INSERT, UPDATE, or DELETE; it disappears when the enclosing statement completes—no DDL, no session lease—and it reads cleaner than tortured nested parentheses. That wording alone answers half of the "what is CTE" prompts you will hear in screening loops.

Detailed explanation.

  • Bound to one statement — every WITH chain is part of a single top-level statement. Nothing "lives" after COMMIT/ROLLBACK of that unit the way a temp table does; you cannot SELECT a CTE alias from a different query in the same session without copy-pasting the definition.
  • Algebraic, not physical — the engine may inline a CTE into the outer query, merge predicates, or hoist joins. You name relations for humans and maintainers; the planner still rewrites. Senior answers separate authoring ergonomics from execution guarantees (see §7 for materialization nuance).
  • Same relational type as a subquery — a CTE produces a relation (bag or set depending on UNION vs UNION ALL); every row has a schema fixed by the inner SELECT list. That is why you can stack CTEs like typed pipeline stages.

When an interviewer asks why you reached for WITH instead of a derived table, cite three forces:

  • Nameability — avg_salary reads better than "subselect #3" and signals intent in code review.
  • Refactor‑friendliness — comment out a CTE while debugging without rewiring parentheses; reorder stages when the story changes.
  • Composition — later CTEs may reference earlier ones in the same chain; that mirrors dbt / SQL transformation layers without leaving the warehouse editor.
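The nameability point above, as a minimal sketch against the employees(emp_id, name, dept_id, salary) schema this article assumes throughout: the same above-average filter, inline versus named.

```sql
-- Inline: the threshold is an anonymous subselect.
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Named: the same predicate, but the threshold carries intent.
WITH avg_salary AS (
    SELECT AVG(salary) AS company_avg
    FROM employees
)
SELECT e.name, e.salary
FROM employees e
CROSS JOIN avg_salary a           -- one-row CTE, so the cross join is safe
WHERE e.salary > a.company_avg;
```

Both forms produce the same relation; the second one survives a "now also show the average" follow-up without duplicating the subselect.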

Pro tip: When someone says "walk me through this query," start at the last CTE in the chain and narrate backward to data sources—mirrors how many optimisers inline, and shows you control the algebraic story end-to-end.

Common beginner mistakes

  • Treating CTE results as persisted tables—they are logical scopes, not caches (mention materialized CTE hints only when you genuinely know your engine exposes them).
  • Inlining seventeen anonymous subqueries to "save lines" — interviewers downgrade readability instantly.
  • Hiding mutating statements inside so-called readability layers — keep each CTE a pure relational expression unless the prompt explicitly mixes DML patterns.
  • Reusing a CTE name as if it were a view registered in the catalog—only the enclosing statement can see it; cross-statement reuse needs a real view, temp table, or ORM layer.

2. Single CTE — anatomy of WITH … AS

Diagram: a single CTE, WITH high_earners AS (SELECT … FROM employees WHERE salary > threshold), feeding the main SELECT; annotated 'named result' and 'one statement scope'.

Name the intermediate relation, then consume it once

Every non-recursive CTE has the same silhouette:

WITH alias AS (
    SELECT ...
)
SELECT ... FROM alias;

Warehouses disagree on microscopic optimisation trivia; they agree on this grammar.

WITH binds scope, not storage

Think of WITH as lexical, not physical: engines may inline the CTE, merge filters, and reorder joins — the author's job is to express intent cleanly so downstream planners can do theirs.

Detailed explanation.

  • Column list optional — you may write WITH cte (a, b) AS (SELECT …) to rename projected columns explicitly; handy when outer queries should not depend on brittle inner aliases.
  • Multiple references — the outer statement may reference the same CTE twice (FROM cte c1 JOIN cte c2 …). That is legal but can surprise optimizers into scanning twice unless merged; mention that aloud if the interviewer asks about cost.
  • Mutual exclusion with some clauses — dialect details vary, but in interviews assume the CTE's inner SELECT follows normal rules (no bare aggregates without GROUP BY, no illegal forward references to outer query columns—correlated subqueries still need explicit correlation).
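The first two bullets in one sketch (dept_stats is a hypothetical name over the article's employees table): an explicit column list plus a double reference to the same CTE.

```sql
-- Explicit column list: consumers depend on (dept, headcount),
-- not on whatever aliases the inner SELECT happened to use.
WITH dept_stats (dept, headcount) AS (
    SELECT dept_id, COUNT(*)
    FROM employees
    GROUP BY dept_id
)
-- Same CTE referenced twice: legal, but may be scanned twice unless merged.
SELECT a.dept, a.headcount
FROM dept_stats a
LEFT JOIN dept_stats b
       ON b.headcount > a.headcount
WHERE b.dept IS NULL;   -- no larger department exists, so a is (one of) the biggest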

Worked example. Filter highly paid rows before projecting columns:

| Concept | Meaning |
| --- | --- |
| Alias | reusable handle big_earn inside the outer query |
| Scope | disappears after the outer SELECT finishes |
| Contrast vs inline | same relational idea, sharper whiteboard narration |

Step-by-step.

  1. Define big_earn as SELECT emp_id, name, salary FROM employees WHERE salary > 50000.
  2. Outer query selects SELECT name, salary FROM big_earn ORDER BY salary DESC.
  3. Add LIMIT 3 knowing the filtration already happened upstream logically — note: ORDER BY + LIMIT apply to the outer grain after the CTE is bound, which matches how you narrate debugging ("sort the named relation, then truncate").

Worked-example solution.

WITH big_earn AS (
    SELECT emp_id, name, salary
    FROM employees
    WHERE salary > 50000
)
SELECT name, salary
FROM big_earn
ORDER BY salary DESC
LIMIT 3;

Common beginner traps

  • Forgetting that outer WHERE cannot "reach into" the CTE's hidden predicates unless you expose columns—push stable filters into the CTE when they shrink working sets early.

Rule of thumb: if the subquery has a business meaning ("high earners," "valid orders"), name it.


3. Multiple chained CTEs — read top-down like dbt staging

Diagram: a three-step CTE pipeline, raw_orders → daily_revenue → top_days, read top to bottom while debugging.

Each new CTE may reference any prior CTE in the chain

The multi-CTE invariant: order matters only for name resolution—cte_n may reference cte_{1..n-1} but never forward-declare aliases you have not defined yet. This mirrors layered SQL transformations: staging → intermediates → marts.

Detailed explanation.

  • Acyclic name graph — imagine an edge from daily_totals to raw_orders because the former reads the latter. The textual order you write is a valid topological sort of that dependency DAG; swap two CTEs blindly and names break.
  • Grain discipline — each stage should have a one-sentence grain statement ("one row per order," "one row per calendar day per store"). When the grain shifts, rename the CTE so reviewers see the dimensional contract.
  • Debugging workflow — during a live interview, you can stub later CTEs as SELECT * FROM prior_cte LIMIT 10 to validate shapes before adding aggregates—chaining rewards that incremental reveal.
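That incremental reveal, sketched on this section's revenue pipeline (ingest.shopify_orders as in the worked example): validate two stages, stub the third.

```sql
-- Mid-interview shape check: stub the final stage before writing its predicate.
WITH raw_orders AS (
    SELECT order_ts::date AS d, revenue_usd
    FROM ingest.shopify_orders
    WHERE refunded IS FALSE
),
daily_totals AS (
    SELECT d, SUM(revenue_usd) AS rev
    FROM raw_orders
    GROUP BY d
)
SELECT *              -- placeholder for the future flagged_days stage:
FROM daily_totals     -- eyeball the grain and the rev magnitudes,
LIMIT 10;             -- then swap in the real threshold filter
```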

Worked example. Three-step revenue SLA filter:

| Stage | Responsibility |
| --- | --- |
| raw_orders | normalise ingest projections |
| daily_totals | aggregate at calendar-day grain |
| flagged_days | predicate on revenue thresholds |

Step-by-step.

  1. raw_orders projects event_date, usd_revenue.
  2. daily_totals aggregates SUM(revenue) grouped by calendar date — watch for fan-out: if raw_orders accidentally joins to a dimension before this step, your SUM multiplies; keep joins that change cardinality either explicit in a dedicated CTE or after aggregation when semantics demand it.
  3. flagged_days keeps SLA-busting days only — pure filter on an already-collapsed daily relation.

Worked-example solution.

WITH raw_orders AS (
    SELECT order_ts::date AS d, revenue_usd
    FROM ingest.shopify_orders
    WHERE refunded IS FALSE
),
daily_totals AS (
    SELECT d, SUM(revenue_usd)::numeric(14,2) AS rev
    FROM raw_orders
    GROUP BY d
),
flagged_days AS (
    SELECT *
    FROM daily_totals
    WHERE rev >= 25000
)
SELECT *
FROM flagged_days
ORDER BY d DESC;

Common chaining pitfalls

  • Hidden Cartesian products — joining two wide CTEs without keys duplicates rows; verbalize keys whenever you fuse layers.
  • Leaky filters — applying WHERE only in the outermost SELECT while earlier CTEs still scan massive history; push time windows and partitioning predicates as early as the schema allows.
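The leaky-filter fix in concrete form, on the same assumed ingest.shopify_orders table: the time window lives in the first stage, where the scan can actually be pruned, not in the outermost SELECT.

```sql
-- Push the time window into the first stage, not the final projection:
WITH raw_orders AS (
    SELECT order_ts::date AS d, revenue_usd
    FROM ingest.shopify_orders
    WHERE refunded IS FALSE
      AND order_ts >= CURRENT_DATE - INTERVAL '30 days'  -- prunable, partition-friendly
),
daily_totals AS (
    SELECT d, SUM(revenue_usd) AS rev
    FROM raw_orders
    GROUP BY d
)
SELECT *
FROM daily_totals
WHERE rev >= 25000;
```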

Rule of thumb: rename each layer after the grain (per_order, per_customer_day) so graders see dimensional thinking.


4. CTE with joins and aggregations

Join inside the CTE when the join is the reusable story

Pulling joins in sql interview questions into a CTE tells the room you know how to isolate dimension wiring before applying business filters.

Detailed explanation.

  • Freeze aggregates once — the pattern WITH cohort_metrics AS (… GROUP BY key …) computes each group's summary exactly once in the narrative. The outer query then behaves like a probe: attach metrics to detail rows and filter on those scalars.
  • Cardinality contract — after dept_avg, you expect one row per dept_id (assuming dept_id is a key in departments). Joining that CTE back to employees on the same key should not duplicate employees unless the aggregate CTE itself was built from a many-to-many path—if it was, split "explode" joins out of the aggregate stage.
  • Interview narration — say aloud: "first I collapse to department grain, then I join the employee spine at employee grain, then I filter." That three-beat story maps directly to the SQL.

Department averages with a reusable join CTE

Worked example. Employees enriched with departmental averages prior to outperform filters.

Step-by-step.

  1. Aggregate salaries by department ID.
  2. Join employees back to those aggregates.
  3. Filter rows beating their department averages.

Worked-example solution.

WITH dept_avg AS (
    SELECT d.dept_id, AVG(e.salary)::numeric(12,2) AS dept_avg_salary
    FROM employees e
    JOIN departments d ON d.dept_id = e.dept_id
    GROUP BY d.dept_id
)
SELECT e.name,
       e.salary,
       d.dept_name,
       a.dept_avg_salary,
       e.salary - a.dept_avg_salary AS spread
FROM employees e
JOIN departments d USING (dept_id)
JOIN dept_avg a USING (dept_id)
WHERE e.salary > a.dept_avg_salary;

Common beginner mistakes

  • Omitting GROUP BY stability rules while referencing raw columns beside aggregates inside the same projection.
  • Accidentally multiplying row counts via join cardinalities prior to aggregates—solve with guarded sub-aggregates in earlier CTEs.

Alternative sketch — correlate without a JOIN in the aggregate CTE

You can compute AVG(...) OVER (PARTITION BY dept_id) in a dedicated CTE instead of grouping first; grouped CTE + join is easier to reason about aloud when WITH readability is graded.
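A sketch of that window-based alternative on the same assumed schema; note the decorate-then-filter split is still forced, because standard SQL cannot reference a window alias in the same SELECT's WHERE:

```sql
WITH enriched AS (
    SELECT name,
           salary,
           AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg_salary
    FROM employees
)
SELECT name,
       salary,
       dept_avg_salary,
       salary - dept_avg_salary AS spread
FROM enriched
WHERE salary > dept_avg_salary;   -- filtering must happen outside the decorating CTE
```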

SQL interview question — employees beating their department average

Assume employees(emp_id, name, dept_id, salary) plus departments(dept_id, dept_name). Return every teammate whose salary exceeds that department's average, including the departmental average beside each teammate row.

Solution: using a join-friendly aggregate CTE

Code solution.

WITH dept_avg AS (
    SELECT dept_id, AVG(salary)::numeric(12,2) AS dept_avg_salary
    FROM employees
    GROUP BY dept_id
)
SELECT e.emp_id,
       e.name,
       d.dept_name,
       e.salary,
       a.dept_avg_salary
FROM employees e
JOIN dept_avg a USING (dept_id)
JOIN departments d USING (dept_id)
WHERE e.salary > a.dept_avg_salary
ORDER BY d.dept_name, e.salary DESC;

Step-by-step trace.

| step | relation | outcome |
| --- | --- | --- |
| 1 | scan employees inside dept_avg | input multiset for averaging |
| 2 | GROUP BY dept_id + AVG(salary) | department-grain relation: one dept_avg_salary scalar per dept |
| 3 | JOIN employees … USING (dept_id) | employee-grain relation again; each worker row carries its parent dept's average unchanged |
| 4 | WHERE e.salary > a.dept_avg_salary | anti-regression filter: removes rows at or below the cohort mean |

Output:

| emp_id | name | dept_name | salary | dept_avg_salary |
| --- | --- | --- | --- | --- |
| 12 | Ava | Retail | 92,500 | 78,320 |
| 4 | Mei | Insights | 130,400 | 110,980 |

(Demonstrative salaries — graders care about algebraic shape.)

Why this works — concept by concept:

  • CTE dept_avg — freezes department-grain aggregates exactly once per cohort; avoids repeating AVG subqueries inline on every employee row in the outer text.
  • JOIN USING (dept_id) — stitches scalar averages back without exploding beyond employee cardinality when dept_avg stayed at true dept key grain.
  • Filter after aggregation — separates compute departmental truth vs evaluate individuals; interviewers listen for that separation as a signal you understand two different grains in one question.
  • Numeric cast — ::numeric(12,2) (or equivalent) prevents float drift in panel walkthroughs when money-like fields appear.
  • Cost — hash aggregate typically Θ(n) on the employee spine for the grouped leg, plus Θ(n) equi-join cost to reattach; dominated by scans unless indexed paths short-circuit.



5. CTE + sql window functions — rank, then filter

Diagram: a first CTE ranks rows with ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC); a second CTE filters rn <= 3 (top-N per group).

Separate ranking logic from predicates—two CTE beats one clever monster

Most sql window functions arcs follow rank-within partitions → predicate on ranks; folding both layers into one derived table hides mistakes under pressure.

Detailed explanation.

  • Logical processing order — window functions attach to rows after FROM/WHERE/GROUP BY/HAVING in the relational pipeline for that inner SELECT; you typically cannot reference a window alias in the same SELECT's WHERE in standard SQL—you project ranks in one CTE, filter in the next (or nest a subquery). That restriction is precisely why graders like CTE + window combos: the shape matches the semantics.
  • Partition = rivalry scope — PARTITION BY dept_id means "within this department bucket only, reorder rows"; rows in other departments never compete for rn = 1.
  • ORDER BY inside OVER — resolves ties; emp_id as trailing sort key buys determinism for ROW_NUMBER when salaries collide.

ROW_NUMBER() vs cousins (pick aloud in-panel)

| function | ties at same ORDER BY keys | skips rank values after ties? |
| --- | --- | --- |
| ROW_NUMBER() | breaks arbitrarily per sort-key completeness | never — strictly 1..N per partition |
| RANK() | equal keys share a rank | yes — gaps after a tie (1, 1, 3) |
| DENSE_RANK() | equal keys share a rank | no gaps (1, 1, 2) |

Demand deterministic slicing? ROW_NUMBER + an exhaustive ORDER BY. Reward parity for equal salaries? RANK or DENSE_RANK per business rule.
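To see the table's three behaviours side by side on one ordering (toy salaries assumed), decorate once and read the three columns:

```sql
WITH ranked AS (
    SELECT name,
           salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)         AS dense_rnk
    FROM employees
)
SELECT * FROM ranked;
-- For salaries 90000, 90000, 80000:
--   row_num   → 1, 2, 3   (tie broken by emp_id)
--   rnk       → 1, 1, 3   (gap after the tie)
--   dense_rnk → 1, 1, 2   (no gap)
```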

ROW_NUMBER() is deterministic for mechanical top‑N slicing

Different from RANK()—choose deliberately when duplicates must share honours.

Worked example. Produce top‑3 salaries per department with deterministic ties.

Step-by-step.

  1. Decorate employees with ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC, emp_id ASC).
  2. Filter rn <= 3 in downstream scope—the second CTE (or outer SELECT) is your predicate stage isolated from analytic decoration.

Worked-example solution.

WITH ranked AS (
    SELECT dept_id,
           emp_id,
           name,
           salary,
           ROW_NUMBER() OVER (
               PARTITION BY dept_id
               ORDER BY salary DESC, emp_id ASC
           ) AS rn
    FROM employees
)
SELECT *
FROM ranked
WHERE rn <= 3
ORDER BY dept_id, rn;

SQL interview question — top two salaries whenever a department staffs at least two people

Assume employees(dept_id, emp_id, name, salary). Return rows only for departments with population ≥ 2, showing the highest two salaries in each (ROW_NUMBER ordering, break ties on emp_id).

Solution: using stacked CTEs for readability

Code solution.

WITH ranked AS (
    SELECT dept_id,
           emp_id,
           name,
           salary,
           ROW_NUMBER() OVER (
               PARTITION BY dept_id
               ORDER BY salary DESC, emp_id ASC
           ) AS rn,
           COUNT(*) OVER (PARTITION BY dept_id) AS dept_pop
    FROM employees
),
top_two AS (
    SELECT *
    FROM ranked
    WHERE dept_pop >= 2
      AND rn <= 2
)
SELECT dept_id, emp_id, name, salary
FROM top_two
ORDER BY dept_id, rn;

Step-by-step trace.

| step | action |
| --- | --- |
| 1 | ranked decorates rows with a deterministic rn; grain unchanged vs base employees — still one output row per (dept_id, emp_id). |
| 2 | COUNT(*) OVER (PARTITION BY dept_id) is a pure scalar broadcast within each dept — computes headcount without self-joining aggregates back. |
| 3 | top_two enforces both "big enough dept" and podium depth simultaneously — logically equivalent to HAVING on dept size after a hypothetical GROUP BY, but keeps the employee-level projection intact. |

Output:

| dept_id | emp_id | name | salary |
| --- | --- | --- | --- |
| 10 | 4 | Ava | 98,400 |
| 10 | 7 | Omar | 95,050 |
| 20 | 1 | Zoe | 120,010 |

Why this works — concept by concept:

  • Window PARTITION BY — constrains rivalry to departmental peers exclusively; cross-department ordering is irrelevant noise.
  • ROW_NUMBER semantics — hands you distinct ranks per partition for deterministic slicing machinery when tie-break columns are explicit (emp_id).
  • COUNT(*) window — encodes interviewer guardrails (“only multi-person teams”) without collapsing rows pre-rank (GROUP BY would destroy per-employee rn).
  • Stacked CTEs — separates decorate (ranked) from predicate (top_two), mirroring dialect rules about filtering window outputs.
  • Cost intuition — the window sort costs roughly Θ(n log n) per partition on typical sort-based planners; mention index-friendly ORDER BY keys (dept_id, salary DESC) as an optimisation avenue if matching indexes exist.



6. Recursive CTE — hierarchies without leaving SQL

Diagram: a recursive CTE, with the anchor SELECT (base case) UNION ALL the recursive SELECT (inductive step) over an employee-manager hierarchy.

WITH RECURSIVE stitches an anchor SELECT to an inductive UNION ALL spine

Recursive patterns answer organisation charts, bill-of-materials explosions, and controlled sequence generation — for acyclic hierarchies, WITH RECURSIVE lands squarely inside interviewer expectations across Postgres-first DE loops.

Detailed explanation.

On each evaluation round, SQL engines conceptually compute:

  1. Anchor (non-recursive SELECT) — initial working set (frontier, generation 0).
  2. Recursive memberSELECT … JOIN recursive_cte_alias …; the join references the WITH RECURSIVE alias (tree, seq, …). Results are UNION ALL-appended unless you explicitly demand dedupe with UNION.
  3. Fixed point — iteration stops when the recursive member contributes no new rows under the engine's recursive evaluation rules (graphs with cycles need explicit guarding — anchors + cycle columns or rewriting — otherwise recursion becomes pathological).

Invariant checklist:

  1. Anchor SELECT gathers starting frontier rows (often roots where parent keys are NULL).
  2. Recursive SELECT joins the prior frontier back to driving base rows.
  3. UNION ALL appends successive generations (UNION when dedupe is mandated explicitly and you accept DISTINCT cost).
  4. Cycles blow up naive trees — state how you would detect or prevent them (CYCLE, path arrays, UNION dedupe semantics, or procedural escape hatches).
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1
    FROM seq
    WHERE n < 5
)
SELECT * FROM seq;

This toy emits {1…5} — it isolates recursion mechanics before you attach employees edges.
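The same anchor/inductive mechanics generate calendar scaffolds. A Postgres-flavoured sketch (date + integer arithmetic is Postgres-specific; other dialects spell the increment differently):

```sql
WITH RECURSIVE calendar(d) AS (
    SELECT DATE '2024-01-01'          -- anchor: first day of the range
    UNION ALL
    SELECT d + 1                      -- inductive step: next calendar day
    FROM calendar
    WHERE d < DATE '2024-01-07'
)
SELECT d FROM calendar;               -- seven rows, 2024-01-01 through 2024-01-07
```

Useful as a spine to LEFT JOIN sparse fact data against, so zero-revenue days still appear.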

SQL interview question — enumerate every descendant under VP vp_id

Given employees(emp_id, name, manager_id) with an acyclic tree, vp_id acts as subtree root. Emit every reachable employee with hierarchical level numbering starting at 0 for the VP herself.

Solution: using Postgres WITH RECURSIVE

Code solution.

WITH RECURSIVE tree AS (
    SELECT emp_id,
           name,
           manager_id,
           0 AS level
    FROM employees
    WHERE emp_id = :vp_id
    UNION ALL
    SELECT e.emp_id,
           e.name,
           e.manager_id,
           tree.level + 1
    FROM employees e
    JOIN tree ON e.manager_id = tree.emp_id
)
SELECT emp_id,
       name,
       level,
       REPEAT('  ', level) || name AS indent_name
FROM tree
ORDER BY level, emp_id;

Step-by-step trace.

| frontier | expands by |
| --- | --- |
| depth 0 | VP seed row from the anchor — subtree root pinned to the interviewer's parameter |
| depth k ≥ 1 | every employee whose manager_id references someone already resident in tree (inductive join) |

Expansion halts when the recursive SELECT produces an empty frontier — in a cycle-free tree, breadth grows until leaf employees recruit no new reports. Worst-case work is proportional to the number of edges in the reached subtree, modulo engine batching semantics.

Output:

| emp_id | name | level | indent_name |
| --- | --- | --- | --- |
| 100 | Quinn | 0 | Quinn |
| 110 | Ravi | 1 | ··Ravi |

Why this works — concept by concept:

  • Anchor picks root identity — pins recursion context to interviewer-supplied vp_id; switching roots is a literal parameter tweak, no structural rewrite.
  • JOIN aligns generations — children attach only to parents already enumerated, matching org-tree edge direction (employee.manager_id → parent.emp_id).
  • Level accumulator — 0 at the VP, increments per hop — communicates depth for indented displays, maximum-depth caps (WHERE level <= …), or post-filtering slices in outer queries.
  • UNION ALL composition — appends frontier rows without collapsing duplicates that UNION DISTINCT might hide prematurely on messy data (distinctness still comes from emp_id uniqueness in a clean HR dimension).
  • Complexity intuition — visits each node in the reached subtree once along each valid parent link in acyclic settings; cycles break this story—call that out explicitly in senior panels.
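When you cannot assume acyclicity, the path-array guard is the standard Postgres move (Postgres 14+ also offers a declarative CYCLE clause that generates equivalent columns). A sketch over the same tree:

```sql
WITH RECURSIVE tree AS (
    SELECT emp_id,
           name,
           0 AS level,
           ARRAY[emp_id] AS path,        -- every node visited on this branch
           FALSE AS is_cycle
    FROM employees
    WHERE emp_id = :vp_id
    UNION ALL
    SELECT e.emp_id,
           e.name,
           tree.level + 1,
           tree.path || e.emp_id,
           e.emp_id = ANY(tree.path)     -- revisiting a node on its own path = cycle
    FROM employees e
    JOIN tree ON e.manager_id = tree.emp_id
    WHERE NOT tree.is_cycle              -- stop expanding poisoned branches
)
SELECT emp_id, name, level
FROM tree
WHERE NOT is_cycle;
```

The WHERE NOT tree.is_cycle guard guarantees termination even on corrupted manager chains, at the cost of carrying the path array per row.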



7. CTE vs subquery vs temp table — interview trade-offs

Know which tool implies which lifecycle story

Three-way grading grid interviewers memorize:

| Tool | Lifetime | Typical debugging | Replay story |
| --- | --- | --- | --- |
| CTE (WITH) | enclosing statement only | annotate pipeline layers | mentally rerun full statement blob |
| Inline derived table | same statement span | cramped syntax | rerun entire mega-expression |
| CREATE TEMP TABLE | session-bound | rerun arbitrary downstream slices | iterative analyst workflow |

Detailed explanation.

  • Inlining vs forcing materialization — on PostgreSQL 12+, WITH foo AS (...) SELECT … usually inherits optimisation like an inline derived table; prepend MATERIALIZED (or NOT MATERIALIZED) once you deliberately shape evaluation order (control duplicate work) or tame estimated rows quirks the optimizer misses. Naming the dialect matters: Snowflake / BigQuery / other warehouses each implement hints differently — never claim portability without hedging.
  • Subquery multiplicity — identical inline subqueries in one statement may execute more than once unless the planner deduplicates identical fragments — CTE naming makes reuse intent obvious to collaborators even when plans merge.
  • Temp tables redeem exploration — you can CREATE INDEX ON tmp_… for repeated joins inside a detective session; that is awkward for ephemeral CTEs. Temps also interoperate across ORM/console round-trips in one session when you slice-and-dice predicates interactively.
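The indexing point in concrete form; only the session-scoped object accepts DDL (the tmp_ name is hypothetical):

```sql
-- A CTE cannot be indexed; a temp table can, and the index pays off across reruns.
CREATE TEMP TABLE tmp_high_value_orders AS
SELECT *
FROM staging.orders
WHERE revenue_usd > 500;

CREATE INDEX ON tmp_high_value_orders (customer_id);  -- Postgres auto-names the index

-- Now iterate cheaply inside the same session:
SELECT COUNT(*)         FROM tmp_high_value_orders WHERE customer_id = 42;
SELECT AVG(revenue_usd) FROM tmp_high_value_orders WHERE customer_id = 42;
```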

Interview sound bite: "CTEs optimise communication during one deterministic statement; TEMP tables optimise iterative exploration where you rerun predicates interactively."

Worked-example solution.

-- Exploration inside psql → TEMP wins iteration ergonomics:
CREATE TEMP TABLE tmp_high_value_orders AS
SELECT *
FROM staging.orders
WHERE revenue_usd > 500;

SELECT customer_id,
       AVG(revenue_usd) avg_rev
FROM tmp_high_value_orders
GROUP BY 1;

-- vs production SQL model favouring readability:
WITH high_value_orders AS (
    SELECT *
    FROM staging.orders
    WHERE revenue_usd > 500
)
SELECT customer_id,
       AVG(revenue_usd) avg_rev
FROM high_value_orders
GROUP BY 1;

Choosing in one breath

Use CTEs for published pipelines and pairing windows with filters; reserve TEMP for sandbox hypotheses, brute-force cardinality introspection, and multi-query rehearsals; reserve opaque inline subqueries for one-shot predicates where naming hurts more than it helps.

Rule of thumb: articulate lifetime + collaborator audience plainly—students often chant "plans identical" without naming engine nuances or session economics.



Choosing CTE usage (checklist)

| Situation | Reach for … | Avoid … |
| --- | --- | --- |
| Multi-phase authored SQL destined for repos / reviews | chained CTEs | one mega SELECT nobody dares refactor |
| Window ranking + nuanced filters | sequential CTEs | deeply nested analytic soup |
| Acyclic hierarchies entirely inside the warehouse dialect | WITH RECURSIVE pathways | bouncing back to procedural-only apps prematurely |
| Ad-hoc investigation with iterative reruns | TEMP TABLE + helpful indexes | retyping massive CTE trees each tweak |

Frequently asked questions

What are typical cte in sql interview questions?

Expect definition ("what token opens a CTE?" → WITH), lifecycle (statement scope vs session objects), readability refactors (nested subquery → named pipeline), window + CTE combos for rank-then-filter, and trees via WITH RECURSIVE. Panels often wedge in one optimisation-empathy question ("would you force materialisation? why?"). Answer each with two crisp sentences: mechanism first, dialect caveat second — not acronym dumps.

How is a CTE different from a temporary table?

A CTE is logical sugar glued to one enclosing statement: no catalog object, disappears when the batch finishes unless you wrap it differently. CREATE TEMP TABLE … allocates session-scoped physical storage, survives until disconnect or DROP, and supports indexes / repeated downstream queries comfortably. Reach for whichever matches lifetime and whether collaborators need iterative rerun ergonomics—not whichever "feels fancier".

When should WITH RECURSIVE surrender to procedural graph code?

Stay in SQL while the graph stays moderate, acyclic (or you have explicit CYCLE / path handling patterns for your dialect), and results feed relational BI directly. Pivot to procedural / graph libs when cycles are common without clean detection, branching explodes runtimes, validations span systems outside the warehouse contract, or you need fine-grained backtracking/search primitives SQL does not express cleanly.

Do CTEs automatically materialise intermediate results?

No blanket guarantee. Many optimisers inline / fold ordinary CTEs like named subqueries. PostgreSQL exposes MATERIALIZED / NOT MATERIALIZED to steer evaluation; cite those only alongside the engine name. Recursive CTEs follow different planner stories—mention termination semantics if the interviewer probes performance cliffs.
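The hint syntax, for Postgres 12+ specifically (other engines spell evaluation control differently, if at all); table names follow this article's staging examples:

```sql
-- Force one evaluation even if the CTE is referenced twice:
WITH totals AS MATERIALIZED (
    SELECT customer_id, SUM(revenue_usd) AS rev
    FROM staging.orders
    GROUP BY customer_id
)
SELECT * FROM totals WHERE rev > 10000;

-- Invite inlining so the outer filter can be pushed into the scan:
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM staging.orders
)
SELECT * FROM recent WHERE customer_id = 42;
```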

How should sql interview questions with answers narratives flow?

Lead with legible layering (stage name → grain), paste minimal runnable SQL, then narrate Step-by-step trace from base relations outward, freeze an Output table—even toy numbers—and finish with Why this works tying grain, join cardinality, aggregation vs window semantics, and coarse cost intuition. Mirrors how this article stitches panel answers together.

What one-line summary should stick in recall?

"WITH names intermediate relations so collaborators reason about algebraic stages the same way dbt exposes staging → mart layers — but never confuse naming with persisted storage unless you pinned it there yourself."


Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems—including PostgreSQL-first SQL practice keyed to WITH chains, joins, aggregation, CTE + sql window functions, and branching recursive reasoning.

Kick off via Explore practice →; drill the dedicated CTE(SQL) lane →; fan out across CTE topic → or CTE(s) bucket →; deepen window-functions SQL practice →; rehearse join SQL drills →; reinforce aggregation practice → whenever grouped metrics underpin your predicates.
