Jesse Ngugi

Posted on Apr 26

Subqueries vs CTEs in SQL: Master Nested Queries and Write Cleaner, Smarter Code

#sqlsubqueries #commontableexpressions #ctesinsql #sqlbestpractices

In the world of relational databases, writing efficient and readable SQL queries is both an art and a science. As data grows more complex, developers and analysts often need to nest logic and aggregate results conditionally. Two powerful tools for handling these scenarios are subqueries and Common Table Expressions (CTEs). While they sometimes achieve similar outcomes, they differ significantly in readability, performance characteristics and ideal use cases.

This article provides a deep dive into both concepts. You’ll learn exactly what subqueries and CTEs are, the various types of each, when to reach for them, and—most importantly—a clear head-to-head comparison. By the end, you’ll have the knowledge to choose the right approach for any query, whether you’re working in PostgreSQL, SQL Server, MySQL, Oracle or SQLite. All examples use standard American National Standards Institute (ANSI) SQL unless noted.

What Is a Subquery?

A subquery (also called a nested query or inner query) is simply a SELECT statement embedded inside another SQL statement. The inner query runs first, produces a result set (scalar value, single row, multiple rows, or even a table), and that result is then used by the outer query.

Think of it like a helper query that feeds data into the main query. Subqueries can appear in the SELECT list, FROM clause, WHERE clause, HAVING clause, or even in INSERT/UPDATE/DELETE statements.

Here’s a classic example:

SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The subquery (SELECT AVG(salary) FROM employees) calculates the average salary across the entire table and returns a single number. The outer query then filters employees earning above that average.

Subqueries make SQL extremely expressive. They allow you to perform calculations or lookups without needing to join entire tables or create temporary tables manually.

Different Types of Subqueries

Subqueries are commonly classified in two overlapping ways: by the shape of the data they return and by whether they are correlated with the outer query.

1. By data returned:

Scalar subqueries: Return exactly one value (one row, one column). Often used with comparison operators (=, >, <).
Single-row subqueries: Return one row with multiple columns. Used with row constructors or operators like IN.
Multi-row (column) subqueries: Return multiple rows but a single column. Typically paired with IN, ANY, ALL, or EXISTS.
Table subqueries (or derived tables): Return multiple rows and columns. These are placed in the FROM clause and treated like a virtual table.

2. By dependency:

Non-correlated subqueries: Independent of the outer query. They run once and their result is reused. Most efficient.
Correlated subqueries: Reference columns from the outer query. They execute once for every row of the outer query (similar to a loop). Common with EXISTS or when comparing row-by-row.

Example of a correlated subquery (find employees earning more than the average in their own department):

SELECT employee_name, department_id, salary
FROM employees e
WHERE salary > (SELECT AVG(salary)
                FROM employees
                WHERE department_id = e.department_id);

The inner query references e.department_id from the outer alias, so it re-runs for each employee.

When Should You Use Subqueries?

Use subqueries when:

You need a quick calculation (e.g., compare against an aggregate).
The logic is simple and doesn’t repeat.
You want to filter or join based on a dynamic value that’s not worth the time to compute otherwise.
Replacing a complex JOIN would make the query harder to understand.
Performing INSERT/UPDATE based on another table’s data.

Common real-world scenarios include:

Finding “top N” per group without window functions.
Existence checks (EXISTS subqueries are often optimized well).
Dynamic date ranges or thresholds.

However, deeply nested subqueries or heavy use of correlated subqueries can hurt performance and readability. That’s where CTEs often shine.

What Are Common Table Expressions (CTEs)?

A Common Table Expression (CTE) is a named temporary result set that exists only for the duration of a single query. You define it using the WITH clause at the very beginning of your statement.

Basic syntax:

WITH cte_name AS (
    SELECT ...  -- your subquery logic here
)
SELECT ... FROM cte_name ...;

Unlike subqueries, CTEs have a name you can reference multiple times in the main query (or in subsequent CTEs). They behave like readable, reusable temporary views.

CTEs were introduced in the SQL:1999 standard and are now supported by virtually every modern database system.

Different Types and Use Cases of CTEs

1. Non-recursive CTEs

These are the most common. They simply name and store a result set for later use. Great for breaking complex queries into logical steps.

Example – sales by department with multiple references:

WITH dept_sales AS (
    SELECT department_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY department_id
),
high_performers AS (
    SELECT d.department_name, ds.total_sales
    FROM departments d
    JOIN dept_sales ds ON d.id = ds.department_id
    WHERE ds.total_sales > 100000
)
SELECT * FROM high_performers
ORDER BY total_sales DESC;

You can reference dept_sales as many times as needed without repeating the logic.

2. Recursive CTEs

These use WITH RECURSIVE and include two parts separated by UNION ALL:

Anchor member: The base case (starting point).
Recursive member: References the CTE itself to build further rows.

Perfect for hierarchical or graph-like data: org charts, bill-of-materials, folder structures, or transitive closures.

Classic recursive example – employee hierarchy:

WITH RECURSIVE org_chart AS (
    -- Anchor: top-level managers
    SELECT employee_id, manager_id, employee_name, 0 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find direct reports
    SELECT e.employee_id, e.manager_id, e.employee_name, o.level + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.employee_id
)
SELECT * FROM org_chart
ORDER BY level, employee_name;

This query keeps joining until no more employees are found.

Key use cases for CTEs:

Improving readability in queries with 5+ joins or multiple aggregations.
Chaining calculations (e.g., first calculate totals, then percentages, then rankings).
Recursive traversal of trees or graphs.
Creating “step-by-step” logic that’s easier to debug.
Replacing temporary tables or views when you only need the result once.
Writing modular queries that non-technical stakeholders can more easily follow.

Subqueries vs. CTEs: Performance, Readability and When to Choose Each

Both tools solve nested-logic problems, but they have distinct strengths.

Readability

CTEs win decisively for anything beyond simple nesting. Naming a result set (high_value_customers) is far clearer than reading three levels of nested parentheses. CTEs read top-to-bottom like a story, while subqueries can feel like Russian dolls. For team codebases or long-term maintenance, CTEs reduce cognitive load dramatically.

Reusability

A CTE can be referenced multiple times in the same query without repeating code. A subquery must be duplicated if used more than once. This also helps the query optimizer in some databases.

Performance

In modern database engines, the query optimizer often rewrites both into similar execution plans. However:

Non-correlated subqueries and simple CTEs usually perform identically.
Correlated subqueries can cause row-by-row execution (N+1 problem) unless the optimizer rewrites them.
Recursive CTEs are the only clean way to handle recursion; subqueries cannot do true recursion.
Some databases (SQL Server, PostgreSQL) allow materializing CTEs with hints, which can help or hurt depending on data volume.
MySQL historically had limitations with subquery materialization; newer versions handle CTEs better.

Scope and Capabilities

Subqueries can be used inside other subqueries or even in the SELECT list.
CTEs can reference previous CTEs (chaining) and are visible to the entire query.
CTEs work nicely with window functions, recursive logic, and can be used in views or stored procedures more elegantly.

When to use each:

Scenario	Prefer Subquery	Prefer CTE
Simple one-time filter/calculation	Yes (cleaner)	Overkill
Deeply nested logic	Hard to read	Yes (step-by-step)
Need to reference result multiple times	Repeat subquery (ugly)	Yes
Recursive/hierarchical data	Not possible	Yes (only option)
Performance-critical simple query	Often faster (inlined)	Equivalent
Complex multi-step analytics	Messy	Yes (highly readable)
Quick ad-hoc query	Fast to write	Slightly more verbose

Rule of thumb:

If your query has 2–3 levels of nesting and is easy to understand → subquery is fine.
If you find yourself scrolling up and down to understand the logic, or repeating code, or dealing with hierarchies → rewrite as CTEs.
For maximum performance, always test with EXPLAIN/EXPLAIN ANALYZE on your specific data and database version.

Best Practices and Final Thoughts

Keep subqueries shallow (no more than two levels) unless the logic is trivial.
Always alias subqueries and CTEs clearly.
Use CTEs to name intermediate results the way you would name variables in code.
For recursive CTEs, always ensure a termination condition to prevent infinite loops.
Monitor execution plans—don’t assume one is always faster.
In large datasets, consider indexing columns used in WHERE clauses of subqueries/CTEs.
When possible, prefer window functions or JOINs over subqueries/CTEs for better performance, but don’t sacrifice clarity.

Both subqueries and CTEs are essential tools in every SQL developer’s toolkit. Subqueries offer concise power for straightforward nesting, while CTEs bring elegance, readability, and recursion to complex analytical workloads. Mastering both—and knowing exactly when to choose one over the other—will make your queries faster to write, easier to maintain, and more performant in production.

The next time you face a tangled query, pause and ask: “Could this be clearer as a CTE?” or “Is this simple enough for a subquery?” Your future self (and your teammates) will thank you.

Happy querying!