Kahuthu Muriuki

Posted on Apr 26

SQL Subqueries vs CTEs: What I Reach For When Writing Compliance and Reconciliation Queries

#beginners #database #sql

When I started writing SQL against transaction data in fintech and securities brokerage environments, my queries were a mess. Three levels of nesting, the same aggregate computed twice, and a debugging process that involved commenting out chunks until something ran. The shift came when I stopped treating subqueries as the only way to compose logic and started picking the right structure for the job.

This article walks through:

what a subquery actually is
the four shapes a subquery can take
when a subquery is the correct choice
what a Common Table Expression (CTE) is
non-recursive and recursive CTEs, with examples drawn from real reporting work
a side-by-side comparison covering readability, debugging, and performance
the rule of thumb I now use to decide between the two

Most of the examples below are loosely modelled on the kinds of tables I work with day to day — member records in a SACCO, KYC verification logs, trade tickets, and transaction ledgers. The schemas are simplified for readability, but the patterns are the ones that show up in production reporting.

What Is a Subquery?

A subquery is a SELECT statement embedded inside another SQL statement. The inner query is evaluated, its result is handed to the outer query, and the outer query uses that result as a value, a row, a list, or a derived table.

Here is the kind of query I run when reviewing trade activity — pulling client orders that exceed the average ticket size for the week:

SELECT client_id, instrument_code, order_value
FROM trade_orders
WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
  AND order_value > (
      SELECT AVG(order_value)
      FROM trade_orders
      WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
  );

The inner query computes one number — the weekly average. The outer query uses that number as the comparison threshold. The subquery is doing exactly one job: producing a value for the WHERE clause.

That is the core idea. Where a subquery sits in the statement, and what it returns, determines what type it is.

The Four Types of Subqueries

1. Scalar Subquery

Returns a single value — one row, one column. You can drop it anywhere a single value is valid: a SELECT list, a WHERE comparison, a HAVING clause.

Example — showing each member's deposit balance alongside the SACCO-wide average:

SELECT
    member_id,
    deposit_balance,
    (SELECT AVG(deposit_balance) FROM member_accounts) AS scheme_average
FROM member_accounts;

I use scalar subqueries most often when I need a benchmark figure on every row of a report without joining a second table for it.

2. Single-Row Subquery

Returns one row with one or more columns. Useful when the comparison needs more than one value but the inner query is guaranteed to produce a single tuple.

Example — finding the trade ticket that matches the highest-value order of the day:

SELECT *
FROM trade_orders
WHERE (order_value, trade_date) = (
    SELECT MAX(order_value), CURRENT_DATE
    FROM trade_orders
    WHERE trade_date = CURRENT_DATE
);

The operators that work here are the standard comparison set: =, <>, >, <, >=, <=.

3. Multi-Row Subquery

Returns a list of values. Used with IN, NOT IN, ANY, ALL, or EXISTS.

Example — pulling all KYC records that belong to clients flagged in a sanctions screening run:

SELECT kyc_id, full_name, id_number, verification_status
FROM kyc_records
WHERE client_id IN (
    SELECT client_id
    FROM sanctions_screening_hits
    WHERE hit_confidence >= 0.85
      AND screening_run_date = '2026-04-25'
);

This is the workhorse subquery in compliance work. You almost always have one set of identifiers and need to filter another table by them.

4. Correlated Subquery

The inner query references a column from the outer query, so it cannot be evaluated independently. It runs once per row of the outer query.

Example — finding members whose latest deposit is larger than their own twelve-month average:

SELECT m.member_id, m.full_name, d.deposit_amount, d.deposit_date
FROM member_accounts m
JOIN deposits d ON d.member_id = m.member_id
WHERE d.deposit_amount > (
    SELECT AVG(deposit_amount)
    FROM deposits
    WHERE member_id = m.member_id
      AND deposit_date >= CURRENT_DATE - INTERVAL '12 months'
);

Correlated subqueries are powerful but expensive. On a deposits table with a few million rows, this pattern can take the query from sub-second to several minutes. I treat correlated subqueries as a last resort when I cannot express the logic with a window function or a CTE join.

When a Subquery Is the Right Choice

Subqueries earn their place when:

the logic is used once and only once in the statement
the inner result is a single scalar or a short list of identifiers
the query is short enough that the nesting does not hurt the reader
the subquery sits inside a WHERE, HAVING, or SELECT and feels like part of the predicate

For a quick filter — "trades above the daily average", "clients matching a screening list", "the latest record per group via MAX()" — a subquery is direct and the optimiser handles it well. The moment I find myself pasting the same subquery into two different parts of the same statement, or nesting subqueries three levels deep, I stop and rewrite the thing as a CTE.

What Is a CTE (Common Table Expression)?

A Common Table Expression is a named, temporary result set declared with the WITH keyword and referenced like a table within the statement that follows. The CTE exists only for the duration of that statement.

Here is the same "trades above the weekly average" query rewritten as a CTE:

WITH weekly_average AS (
    SELECT AVG(order_value) AS avg_order_value
    FROM trade_orders
    WHERE trade_date BETWEEN '2026-04-13' AND '2026-04-19'
)
SELECT t.client_id, t.instrument_code, t.order_value
FROM trade_orders t
CROSS JOIN weekly_average w
WHERE t.trade_date BETWEEN '2026-04-13' AND '2026-04-19'
  AND t.order_value > w.avg_order_value;

Two things change. First, the average is named — weekly_average — so a reader can see what the intermediate step represents without parsing the inner query. Second, if I needed that same average a second time in the statement, I would not have to recompute it.

CTEs are a structural tool. They turn a query into a sequence of named steps.

The Two Types of CTE

1. Non-Recursive CTE

The standard form. Used for breaking a complex query into stages, deduplicating logic, or replacing nested subqueries with named blocks.

Example — a reconciliation report that lines up daily transaction totals from the core ledger against the totals reported by a payment processor:

WITH ledger_totals AS (
    SELECT transaction_date,
           SUM(amount) AS ledger_amount,
           COUNT(*)    AS ledger_count
    FROM core_ledger
    WHERE transaction_date = '2026-04-25'
    GROUP BY transaction_date
),
processor_totals AS (
    SELECT settlement_date AS transaction_date,
           SUM(net_amount) AS processor_amount,
           COUNT(*)        AS processor_count
    FROM processor_settlements
    WHERE settlement_date = '2026-04-25'
    GROUP BY settlement_date
)
SELECT
    l.transaction_date,
    l.ledger_count,
    p.processor_count,
    l.ledger_count - p.processor_count           AS count_variance,
    l.ledger_amount,
    p.processor_amount,
    l.ledger_amount - p.processor_amount         AS amount_variance
FROM ledger_totals l
JOIN processor_totals p USING (transaction_date);

Each CTE handles one source. The final SELECT does the comparison. If the variance comes out wrong, I know exactly which block to inspect — I can run SELECT * FROM ledger_totals mentally and see what that step produced.

This is the structure I default to for any reporting query that touches more than two tables or involves more than one aggregation step.

2. Recursive CTE

A recursive CTE references itself. It is the standard tool for hierarchical data — anything that forms a tree or a chain.

Example — walking a SACCO's member referral chain. Each member has a referred_by column pointing at the member who introduced them. I want the full upline for a given member:

WITH RECURSIVE referral_chain AS (
    -- anchor: start with the member of interest
    SELECT member_id, full_name, referred_by, 0 AS depth
    FROM member_accounts
    WHERE member_id = 'M-104782'

    UNION ALL

    -- recursive step: walk up to each referrer
    SELECT m.member_id, m.full_name, m.referred_by, rc.depth + 1
    FROM member_accounts m
    JOIN referral_chain rc ON m.member_id = rc.referred_by
)
SELECT depth, member_id, full_name
FROM referral_chain
ORDER BY depth;

Other places I have used recursive CTEs:

expanding a chart of accounts where each account has a parent_account_id
tracing a chain of related transactions in a fraud review (transaction A funded transaction B funded transaction C)
reconstructing organisation hierarchies for branch-level reporting

Without recursion, this kind of traversal requires an unknown number of self-joins. With a recursive CTE, the depth is determined by the data.

Subqueries vs CTEs: The Honest Comparison

Dimension	Subquery	CTE
Readability of complex logic	Drops off quickly past two levels	Stays linear regardless of depth
Reuse within the same statement	Must be rewritten each time	Defined once, referenced many times
Debugging	Must isolate by commenting out and rebuilding	Each block can be inspected independently
Recursion	Not supported	Supported with `WITH RECURSIVE`
Verbosity for simple filters	Compact	Adds boilerplate
Optimiser treatment	Often inlined and merged with the outer query	Varies by engine — see below

A Note on Performance

There is a popular claim that "CTEs are slower than subqueries". The truth is more specific than that, and it depends on the database engine.

PostgreSQL versions before 12 treated CTEs as an optimisation fence — the planner materialised the CTE and could not push predicates into it. That genuinely made some CTE queries slower than the equivalent subquery. From version 12 onward, non-recursive CTEs without side effects and referenced only once are inlined by default, behaving like subqueries. You can force the old behaviour with WITH ... AS MATERIALIZED or prevent it with AS NOT MATERIALIZED.
SQL Server has always inlined CTEs into the execution plan. A non-recursive CTE and the equivalent subquery typically produce identical plans.
MySQL added CTE support in 8.0 and treats them similarly to derived tables.
BigQuery, Snowflake, and Redshift all inline non-recursive CTEs.

The practical implication: on any modern engine, choosing a CTE over a subquery for readability does not cost you anything measurable in most cases. The exception is correlated subqueries against very large tables, which are slow regardless of how you wrap them — the fix there is usually a window function or a join to a pre-aggregated CTE, not a different syntactic form.

When I need to know for certain, I run EXPLAIN ANALYZE (PostgreSQL) or check the actual execution plan (SQL Server). Guesses about performance are worth less than the plan in front of you.

How I Decide Between the Two

My working rule, after a few years of writing this kind of code:

Reach for a subquery when the logic is one short step, used once, and reads naturally in place. A scalar in the SELECT list, a list of IDs in an IN clause, an EXISTS check.

Reach for a CTE when any of these are true:

the same intermediate result is needed more than once
the query has more than two logical stages (filter → aggregate → compare, for example)
the logic is hierarchical and needs recursion
a colleague will read this query in three months, and I want them to understand it without a debugging session

The subquery and the CTE solve overlapping problems. The choice is rarely about performance on modern engines — it is about whether the structure of your code matches the structure of the problem you are solving.

In compliance and reconciliation work, where queries get audited and revisited, I default to CTEs the moment a query crosses two stages. The five extra lines of WITH boilerplate pay for themselves the first time someone — often me — has to come back and explain what the query does.

DEV Community