SQL Subqueries vs CTEs: A Complete Guide for Data Analysts
If you've been writing SQL for a while, you've probably run into situations where a single SELECT statement just doesn't cut it. You need to filter based on aggregated data, break a complex query into logical steps, or reuse a derived dataset multiple times in the same query.
That's where subqueries and CTEs (Common Table Expressions) come in.
In this article, I'll walk you through both concepts clearly what they are, the different types, when to use each, and how they compare in performance and readability. By the end, you'll know exactly which tool to reach for depending on the situation.
What Is a Subquery?
A subquery (also called an inner query or nested query) is a SELECT statement written inside another SQL statement. The outer query uses the result of the inner query as part of its logic.
Think of it like a function call: the inner query runs first, returns a result, and the outer query uses that result.
Basic Example
Suppose you have an orders table and you want to find all customers who placed orders above the average order value:
SELECT customer_id, order_value
FROM orders
WHERE order_value > (
SELECT AVG(order_value)
FROM orders
);
Here, SELECT AVG(order_value) FROM orders is the subquery. It computes the average, and the outer query filters rows against it.
Types of Subqueries
Subqueries come in several forms depending on where they appear and how they relate to the outer query.
1. Scalar Subquery
Returns a single value (one row, one column). Used in SELECT, WHERE, or HAVING clauses.
SELECT
product_name,
price,
(SELECT AVG(price) FROM products) AS avg_price
FROM products;
This adds the overall average price as a column alongside each product row.
2. Column Subquery (Multi-Row Subquery)
Returns one column with multiple rows. Typically used with IN, ANY, or ALL.
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE order_date >= '2024-01-01'
);
This returns the names of customers who placed at least one order in 2024.
3. Row Subquery
Returns one row with multiple columns. Useful for row-level comparisons.
SELECT *
FROM employees
WHERE (department_id, salary) = (
SELECT department_id, MAX(salary)
FROM employees
WHERE department_id = 3
GROUP BY department_id
);
4. Table Subquery (Derived Table)
Returns a full result set (multiple rows and columns) used in the FROM clause. Often called a derived table.
SELECT dept_summary.department, dept_summary.avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) AS dept_summary
WHERE dept_summary.avg_salary > 50000;
The inner query builds a temporary table, and the outer query filters it further.
5. Correlated Subquery
This is the most powerful and most expensive type. A correlated subquery references a column from the outer query, meaning it runs once per row of the outer query.
SELECT e.employee_name, e.salary, e.department
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department
);
Here, e.department is from the outer query. For every employee row, the subquery recalculates the department average. This is powerful but can be slow on large datasets.
When Should You Use Subqueries?
Subqueries are the right tool when:
- You need a quick, one-off filter based on an aggregated value (e.g., above average)
- The nested logic is simple and short keeping it inline is readable enough
- You're using
IN,EXISTS, orNOT EXISTSto filter against another table's result - The subquery is not reused elsewhere in the same query
- You're writing in an environment that doesn't support CTEs (older MySQL versions, for example)
Use EXISTS instead of IN for large datasets it short-circuits as soon as a match is found:
-- Faster than IN for large tables
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
What Are CTEs (Common Table Expressions)?
A CTE is a named temporary result set defined using the WITH keyword at the beginning of a query. You can think of it like creating a named "view" that only lives for the duration of that single query.
Basic Syntax
WITH cte_name AS (
-- Your inner query here
SELECT column1, column2
FROM some_table
WHERE condition
)
SELECT *
FROM cte_name;
Example
WITH high_value_orders AS (
SELECT customer_id, SUM(order_value) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(order_value) > 10000
)
SELECT c.customer_name, h.total_spent
FROM customers c
JOIN high_value_orders h ON c.customer_id = h.customer_id;
The CTE high_value_orders is defined once and then used in the main query like a regular table.
Types and Use Cases of CTEs
1. Simple (Non-Recursive) CTE
The standard form: it defines a named result set used once or multiple times in the main query.
Use case: Cleaning up complex FROM clause logic, improving readability.
WITH active_products AS (
SELECT product_id, product_name, price
FROM products
WHERE is_active = TRUE
)
SELECT * FROM active_products WHERE price < 500;
2. Multiple CTEs
You can chain multiple CTEs in a single WITH block, separated by commas.
Use case: Breaking a complex query into clear, logical steps.
WITH
sales_2024 AS (
SELECT product_id, SUM(quantity) AS total_units
FROM sales
WHERE YEAR(sale_date) = 2024
GROUP BY product_id
),
product_info AS (
SELECT product_id, product_name, category
FROM products
WHERE is_active = TRUE
)
SELECT p.product_name, p.category, s.total_units
FROM product_info p
JOIN sales_2024 s ON p.product_id = s.product_id
ORDER BY s.total_units DESC;
Each CTE builds on the previous ones, making the logic easy to follow step by step.
3. Recursive CTE
A recursive CTE is one that references itself. It's used to traverse hierarchical or tree-structured data like org charts, category trees, or file directories.
Structure:
WITH RECURSIVE cte_name AS (
-- Anchor member (base case)
SELECT ...
FROM table
WHERE condition
UNION ALL
-- Recursive member (references the CTE itself)
SELECT ...
FROM table
JOIN cte_name ON ...
)
SELECT * FROM cte_name;
Use case: Traversing an employee hierarchy
WITH RECURSIVE org_chart AS (
-- Start with the CEO (no manager)
SELECT employee_id, employee_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Add each employee's direct reports
SELECT e.employee_id, e.employee_name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT * FROM org_chart ORDER BY level;
This walks down the entire management tree level by level, something impossible to do cleanly with a regular subquery.
Subqueries vs CTEs: A Clear Comparison
| Factor | Subquery | CTE |
|---|---|---|
| Readability | Can get deeply nested and hard to follow | Highly readable; named blocks read like prose |
| Reusability | Must be duplicated if needed more than once | Defined once, referenced many times |
| Recursion | Not supported | Supported (Recursive CTEs) |
| Debugging | Hard; inner queries cannot be isolated | Easy; each CTE can be run independently |
| Performance | Varies; correlated subqueries can be slow | Generally similar; optimizer often treats them the same |
| Scope | Exists inline within the query | Scoped to the entire query block |
| Compatibility | Supported in virtually all SQL versions | Requires SQL:1999+ (most modern databases support it) |
Performance: Does It Actually Matter?
This is a common question and the honest answer is: it depends on your database engine.
In most modern databases (PostgreSQL, SQL Server, BigQuery, MySQL 8+), the query optimizer often treats CTEs and subqueries equivalently. It may inline a CTE and execute it the same way it would a derived table subquery.
However, there are important nuances:
- PostgreSQL pre-v12 treated CTEs as optimization fences they were always materialized (computed once and stored), which could either help or hurt performance depending on the query. From v12 onwards, the optimizer decides whether to inline or materialize.
- Correlated subqueries are almost always slower than their CTE equivalent because they execute repeatedly per row.
- Multiple references to the same subquery are inefficient. A CTE avoids this by computing the result once.
Rule of thumb: Don't optimize prematurely. Write for readability first using CTEs. If you have a performance problem, measure with EXPLAIN and optimize from there.
Readability: Where CTEs Win
Compare these two approaches to the same problem finding the top customers per region:
With a nested subquery:
SELECT region, customer_name, total_spent
FROM (
SELECT c.region, c.customer_name, SUM(o.order_value) AS total_spent,
RANK() OVER (PARTITION BY c.region ORDER BY SUM(o.order_value) DESC) AS rnk
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.region, c.customer_name
) ranked
WHERE rnk = 1;
With a CTE:
WITH customer_totals AS (
SELECT c.region, c.customer_name, SUM(o.order_value) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.region, c.customer_name
),
ranked_customers AS (
SELECT *, RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) AS rnk
FROM customer_totals
)
SELECT region, customer_name, total_spent
FROM ranked_customers
WHERE rnk = 1;
Both return the same result. But the CTE version is self-documenting each step has a name that explains what it does. Any teammate (or future you) can read it top to bottom like a story.
When to Use Each: A Decision Guide
Use a subquery when:
- The logic is simple and short (a single filter or lookup)
- It's only used once and doesn't need a name
- You're using
EXISTS,NOT EXISTS,IN, orNOT IN - You want maximum compatibility across older SQL environments
Use a CTE when:
- The logic is complex or involves multiple steps
- You need to reference the same derived dataset more than once
- You want your query to be easily readable and debuggable
- You're working with hierarchical data (recursive CTE)
- You're collaborating with a team and code clarity matters
Final Thoughts
Subqueries and CTEs both solve the same core problem: letting you build queries on top of queries. The difference lies in how you express that logic.
Subqueries are concise and widely supported, making them great for simple, inline filtering. CTEs are structured and readable, making them essential for anything complex, multi-step, or hierarchical.
As you grow in your SQL journey, you'll find yourself naturally reaching for CTEs more often, not just because they're more readable, but because they make your thinking explicit. A well-named CTE is documentation built right into your query.
The best SQL isn't just correct, it is clear.
If this helped you, drop a like or leave a comment below. I'd love to know how you use subqueries and CTEs in your own work, Thank you.
Top comments (0)