When I started writing SQL against transaction data in fintech and securities brokerage environments, my queries were a mess. Three levels of nesting, the same aggregate computed twice, and a debugging process that involved commenting out chunks until something ran. The shift came when I stopped treating subqueries as the only way to compose logic and started picking the right structure for the job.
This article walks through:
- what a subquery actually is
- the four shapes a subquery can take
- when a subquery is the correct choice
- what a Common Table Expression (CTE) is
- non-recursive and recursive CTEs, with examples drawn from real reporting work
- a side-by-side comparison covering readability, debugging, and performance
- the rule of thumb I now use to decide between the two
Most of the examples below are loosely modelled on the kinds of tables I work with day to day — member records in a SACCO, KYC verification logs, trade tickets, and transaction ledgers. The schemas are simplified for readability, but the patterns are the ones that show up in production reporting.
What Is a Subquery?
A subquery is a SELECT statement embedded inside another SQL statement. The inner query is evaluated, its result is handed to the outer query, and the outer query uses that result as a value, a row, a list, or a derived table.
Here is the kind of query I run when reviewing trade activity — pulling client orders that exceed the average ticket size for the week:
SELECT client_id, instrument_code, order_value
FROM trade_orders
WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
AND order_value > (
SELECT AVG(order_value)
FROM trade_orders
WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
);
The inner query computes one number — the weekly average. The outer query uses that number as the comparison threshold. The subquery is doing exactly one job: producing a value for the WHERE clause.
That is the core idea. Where a subquery sits in the statement, and what it returns, determines what type it is.
The Four Types of Subqueries
1. Scalar Subquery
Returns a single value — one row, one column. You can drop it anywhere a single value is valid: a SELECT list, a WHERE comparison, a HAVING clause.
Example — showing each member's deposit balance alongside the SACCO-wide average:
SELECT
member_id,
deposit_balance,
(SELECT AVG(deposit_balance) FROM member_accounts) AS scheme_average
FROM member_accounts;
I use scalar subqueries most often when I need a benchmark figure on every row of a report without joining a second table for it.
2. Single-Row Subquery
Returns one row with one or more columns. Useful when the comparison needs more than one value but the inner query is guaranteed to produce a single tuple.
Example — finding the trade ticket that matches the highest-value order of the day:
SELECT *
FROM trade_orders
WHERE (order_value, trade_date) = (
SELECT MAX(order_value), CURRENT_DATE
FROM trade_orders
WHERE trade_date = CURRENT_DATE
);
The operators that work here are the standard comparison set: =, <>, >, <, >=, <=.
3. Multi-Row Subquery
Returns a list of values. Used with IN, NOT IN, ANY, ALL, or EXISTS.
Example — pulling all KYC records that belong to clients flagged in a sanctions screening run:
SELECT kyc_id, full_name, id_number, verification_status
FROM kyc_records
WHERE client_id IN (
SELECT client_id
FROM sanctions_screening_hits
WHERE hit_confidence >= 0.85
AND screening_run_date = '2026-04-25'
);
This is the workhorse subquery in compliance work. You almost always have one set of identifiers and need to filter another table by them.
4. Correlated Subquery
The inner query references a column from the outer query, so it cannot be evaluated independently. It runs once per row of the outer query.
Example — finding members whose latest deposit is larger than their own twelve-month average:
SELECT m.member_id, m.full_name, d.deposit_amount, d.deposit_date
FROM member_accounts m
JOIN deposits d ON d.member_id = m.member_id
WHERE d.deposit_amount > (
SELECT AVG(deposit_amount)
FROM deposits
WHERE member_id = m.member_id
AND deposit_date >= CURRENT_DATE - INTERVAL '12 months'
);
Correlated subqueries are powerful but expensive. On a deposits table with a few million rows, this pattern can take the query from sub-second to several minutes. I treat correlated subqueries as a last resort when I cannot express the logic with a window function or a CTE join.
When a Subquery Is the Right Choice
Subqueries earn their place when:
- the logic is used once and only once in the statement
- the inner result is a single scalar or a short list of identifiers
- the query is short enough that the nesting does not hurt the reader
- the subquery sits inside a
WHERE,HAVING, orSELECTand feels like part of the predicate
For a quick filter — "trades above the daily average", "clients matching a screening list", "the latest record per group via MAX()" — a subquery is direct and the optimiser handles it well. The moment I find myself pasting the same subquery into two different parts of the same statement, or nesting subqueries three levels deep, I stop and rewrite the thing as a CTE.
What Is a CTE (Common Table Expression)?
A Common Table Expression is a named, temporary result set declared with the WITH keyword and referenced like a table within the statement that follows. The CTE exists only for the duration of that statement.
Here is the same "trades above the weekly average" query rewritten as a CTE:
WITH weekly_average AS (
SELECT AVG(order_value) AS avg_order_value
FROM trade_orders
WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
)
SELECT t.client_id, t.instrument_code, t.order_value
FROM trade_orders t
CROSS JOIN weekly_average w
WHERE t.trade_date BETWEEN '2026-04-13' AND '2026-04-19'
AND t.order_value > w.avg_order_value;
Two things change. First, the average is named — weekly_average — so a reader can see what the intermediate step represents without parsing the inner query. Second, if I needed that same average a second time in the statement, I would not have to recompute it.
CTEs are a structural tool. They turn a query into a sequence of named steps.
The Two Types of CTE
1. Non-Recursive CTE
The standard form. Used for breaking a complex query into stages, deduplicating logic, or replacing nested subqueries with named blocks.
Example — a reconciliation report that lines up daily transaction totals from the core ledger against the totals reported by a payment processor:
WITH ledger_totals AS (
SELECT transaction_date,
SUM(amount) AS ledger_amount,
COUNT(*) AS ledger_count
FROM core_ledger
WHERE transaction_date = '2026-04-25'
GROUP BY transaction_date
),
processor_totals AS (
SELECT settlement_date AS transaction_date,
SUM(net_amount) AS processor_amount,
COUNT(*) AS processor_count
FROM processor_settlements
WHERE settlement_date = '2026-04-25'
GROUP BY settlement_date
)
SELECT
l.transaction_date,
l.ledger_count,
p.processor_count,
l.ledger_count - p.processor_count AS count_variance,
l.ledger_amount,
p.processor_amount,
l.ledger_amount - p.processor_amount AS amount_variance
FROM ledger_totals l
JOIN processor_totals p USING (transaction_date);
Each CTE handles one source. The final SELECT does the comparison. If the variance comes out wrong, I know exactly which block to inspect — I can run SELECT * FROM ledger_totals mentally and see what that step produced.
This is the structure I default to for any reporting query that touches more than two tables or involves more than one aggregation step.
2. Recursive CTE
A recursive CTE references itself. It is the standard tool for hierarchical data — anything that forms a tree or a chain.
Example — walking a SACCO's member referral chain. Each member has a referred_by column pointing at the member who introduced them. I want the full upline for a given member:
WITH RECURSIVE referral_chain AS (
-- anchor: start with the member of interest
SELECT member_id, full_name, referred_by, 0 AS depth
FROM member_accounts
WHERE member_id = 'M-104782'
UNION ALL
-- recursive step: walk up to each referrer
SELECT m.member_id, m.full_name, m.referred_by, rc.depth + 1
FROM member_accounts m
JOIN referral_chain rc ON m.member_id = rc.referred_by
)
SELECT depth, member_id, full_name
FROM referral_chain
ORDER BY depth;
Other places I have used recursive CTEs:
- expanding a chart of accounts where each account has a
parent_account_id - tracing a chain of related transactions in a fraud review (transaction A funded transaction B funded transaction C)
- reconstructing organisation hierarchies for branch-level reporting
Without recursion, this kind of traversal requires an unknown number of self-joins. With a recursive CTE, the depth is determined by the data.
Subqueries vs CTEs: The Honest Comparison
| Dimension | Subquery | CTE |
|---|---|---|
| Readability of complex logic | Drops off quickly past two levels | Stays linear regardless of depth |
| Reuse within the same statement | Must be rewritten each time | Defined once, referenced many times |
| Debugging | Must isolate by commenting out and rebuilding | Each block can be inspected independently |
| Recursion | Not supported | Supported with WITH RECURSIVE
|
| Verbosity for simple filters | Compact | Adds boilerplate |
| Optimiser treatment | Often inlined and merged with the outer query | Varies by engine — see below |
A Note on Performance
There is a popular claim that "CTEs are slower than subqueries". The truth is more specific than that, and it depends on the database engine.
-
PostgreSQL versions before 12 treated CTEs as an optimisation fence — the planner materialised the CTE and could not push predicates into it. That genuinely made some CTE queries slower than the equivalent subquery. From version 12 onward, non-recursive CTEs without side effects and referenced only once are inlined by default, behaving like subqueries. You can force the old behaviour with
WITH ... AS MATERIALIZEDor prevent it withAS NOT MATERIALIZED. - SQL Server has always inlined CTEs into the execution plan. A non-recursive CTE and the equivalent subquery typically produce identical plans.
- MySQL added CTE support in 8.0 and treats them similarly to derived tables.
- BigQuery, Snowflake, and Redshift all inline non-recursive CTEs.
The practical implication: on any modern engine, choosing a CTE over a subquery for readability does not cost you anything measurable in most cases. The exception is correlated subqueries against very large tables, which are slow regardless of how you wrap them — the fix there is usually a window function or a join to a pre-aggregated CTE, not a different syntactic form.
When I need to know for certain, I run EXPLAIN ANALYZE (PostgreSQL) or check the actual execution plan (SQL Server). Guesses about performance are worth less than the plan in front of you.
How I Decide Between the Two
My working rule, after a few years of writing this kind of code:
Reach for a subquery when the logic is one short step, used once, and reads naturally in place. A scalar in the SELECT list, a list of IDs in an IN clause, an EXISTS check.
Reach for a CTE when any of these are true:
- the same intermediate result is needed more than once
- the query has more than two logical stages (filter → aggregate → compare, for example)
- the logic is hierarchical and needs recursion
- a colleague will read this query in three months, and I want them to understand it without a debugging session
The subquery and the CTE solve overlapping problems. The choice is rarely about performance on modern engines — it is about whether the structure of your code matches the structure of the problem you are solving.
In compliance and reconciliation work, where queries get audited and revisited, I default to CTEs the moment a query crosses two stages. The five extra lines of WITH boilerplate pay for themselves the first time someone — often me — has to come back and explain what the query does.
Top comments (0)