Sharon-nyabuto

Posted on Apr 21

SQL Subqueries vs. CTEs: A Guide to Writing Better Queries

#sql #learning

When you first learn SQL, queries are usually straightforward: you SELECT some columns from a table and apply a WHERE filter to narrow down the results.

As your data questions become more complex, however, you realize sometimes you need the answer to one question before you can answer another.
For example: "Which employees earn above the company average?", requires calculating the average salary first, then use the result to filter employees.

To solve this, SQL provides two incredibly powerful tools: Subqueries and Common Table Expressions (CTEs).

This guide will break down these tools and cover:

Part 1: Subqueries
- What subqueries are and how they work
- The different types of subqueries
- When to use subqueries
Part 2: CTEs (Common Table Expressions)
- What CTEs are and why they matter
- The different types of CTEs
- When to use CTEs
Part 3: A clear comparison of Subqueries vs. CTEs

PART 1: SUBQUERIES

Also known as an inner query or a nested query, a subquery is a query embedded within another SQL query.

Subqueries are typically found nested inside standard SQL clauses such as SELECT, FROM, WHERE, or HAVING.

SQL follows a strict order of execution:

The innermost subquery is always executed first. Its result is then passed on to be used by the outer query (the containing query).

Subqueries allow you to use a single query rather than multiple queries, making your code more flexible and dynamic. They are categorized either by their output or by execution.

Subqueries Categorized by Output

One way to understand subqueries is by looking at the result they return.
A subquery can return a single value, a list of values, or an entire table, depending on the query.

Scalar Subquery

Returns exactly one value (one row, one column).

Scalar subqueries are used with single value comparison operators; =, <, >, <= & >=

For example: "Which students scored above the class average on the math test?" requires a scalar subquery.

(Assuming the class average is 80.)

SELECT student_name, score
FROM students
WHERE score > (SELECT AVG(score)
               FROM students);

Result:

student_name	score
Awino Judy	95
Wanjiku Stacy	88
Kiprop Bryan	82

Note: The scalar subquery SELECT AVG(score) FROM students returns a single value (80), then the outer query uses this value to filter students who scored above the class average

Multi-row Subquery (List Subquery)

Returns a column of values, i.e. multiple rows but one column.
It is often used with IN and NOT IN operators to filter results against a list of values.

For example: "Which students registered for the math exam?" requires a multi-row subquery

SELECT student_id,student_name
FROM students
WHERE student_id 
IN (SELECT student_id 
  FROM exam_registrations 
  WHERE exam_subject = 'Math');

Result:

student_id	student_name
1	Wanjiku Stacy
2	Kiprop Bryan
3	Awino Judy
4	Mutua Henry

The list subquery first generates a list of student IDs from the exam_registrations table, then the outer query uses IN to filter the students table against that list.

Table Subquery (Derived Table)

Returns multiple rows and columns, acting like a temporary table.
This subquery is used in the FROM clause, and must always have an alias.

For example: "Which students scored more than 160 total points in all 3 math tests?" requires a table subquery.

SELECT student_name, total_score
FROM 
(SELECT student_name, 
SUM(score) AS total_score
  FROM exam_results
  GROUP BY student_name) AS student_totals
WHERE total_score > 160;

Result;

student_name	total_score
Awino Nancy	185
Wanjiku Stacy	170
Kiprop Bryan	165

The table subquery ran first to calculate everyone's total score, creating a temporary virtual table which we aliased (student_totals). The outer query used this table to filter the results to get the students with more than 160 marks.

Subqueries Categorized by Execution

Subqueries can also be categorized based on how they work with the rest of the query.

Correlated Subquery

A correlated subquery is dependent on data from the outer query.
It runs for every row in the outer query, using values from the outer query to filter its result.

Uncorrelated (Independent) Subquery

An uncorrelated subquery runs once, and is fully independent of the outer query.
This means that it evaluates its logic exactly once, gets its result, and simply hands that result over to the main query to complete the task.

If you're ever uncertain as to whether a subquery is correlated (dependent) or uncorrelated (independent), you can use the highlight test on your SQL editor.

Highlight just the subquery and run it:

If it returns a result: It is an uncorrelated subquery. It can stand on its own.

If it throws an error: It is a correlated subquery. It depends on information from the outer query to make sense and run correctly.

When to use Subqueries

Subqueries are useful when;

Calculating metrics like averages or totals on the fly, without hardcoding numbers.
Filtering results using a dynamically generated list of values i.e. (List subqueries).
Using row-specific logic
There's need to aggregate data before filtering.
Working with older legacy databases that don't support more modern features.

Subqueries are excellent for simple logic, but when they are deeply nested, they become difficult to read and more difficult to maintain. This is where Common Table Expressions (CTEs) come in.

PART 2: The CTE (COMMON TABLE EXPRESSION)

A CTE is a temporary, named result set defined at the top of a query using the WITH clause, that exists only for the duration of your query.

Once it is defined, you can query and reference the CTE as you would a normal table in your database.

Because a CTE is defined at the top of your script, it allows you to read the code logically from top-to-bottom

The Basic Syntax of a CTE

No matter how complex a query gets, almost every CTE follows this simple, two-part structure:

Define the CTE using the WITH clause at the top.
Use the CTE in your main query.

Syntax;

WITH cte_name AS (
    -- Step 1: Write your query here. 
    -- This creates your temporary, virtual table.
    SELECT column1, column2
    FROM table1
    WHERE condition
)
-- Step 2: Write your main query here, reading from the CTE!
SELECT column1
FROM cte_name;

Now that we know the basic syntax, we can dive into the two types of CTEs commonly encountered: Recursive and Non-Recursive (based on whether or not they reference themselves).

Non-Recursive CTEs
They are temporary result sets that run once and don’t reference themselves.
They’re commonly used for breaking complex queries into modular blocks.

The CTE syntax is;

WITH cte_name AS (
  -- Your SQL logic here (SELECT, JOIN, GROUP BY, etc.)
)
SELECT * FROM cte_name;

Use Case: Replacing nested subqueries for better readability and reusability.

Earlier when we wanted to find students who scored a total of more than 160 points across all their maths tests, We solved it using a Table Subquery in the FROM clause.

Let's try the exact same question, using a CTE.
"Which students scored more than 160 total points in all 3 math tests?"

WITH student_totals AS (
  SELECT student_name, SUM(score) AS total_score
  FROM exam_scores
  GROUP BY student_name
)
SELECT student_name, total_score
FROM student_totals
WHERE total_score > 160;

Result;

student_name	total_score
Awino Nancy	185
Wanjiku Stacy	170
Kiprop Bryan	165

This CTE achieves the same result as our Table Subquery Example, in a cleaner, more readable way.

Recursive CTEs

Recursive CTEs are temporary result sets that reference themselves to explore and map out hierarchical (tree-like) data (e.g., organizational charts, category trees).
Syntax;

WITH RECURSIVE cte_name AS (
  -- Base case: Start with the root of the hierarchy
  SELECT column1, column2 FROM table WHERE parent_id IS NULL
  UNION ALL
  -- Recursive case: Reference the CTE itself to traverse child rows
  SELECT t.column1, t.column2 FROM table t
  JOIN cte_name ON t.parent_id = cte_name.id
)
SELECT * FROM cte_name;

Use Case: Exploring hierarchical or tree structured data (e.g., reporting structures or category hierarchies).

For example: "Alice is a manager at a Nairobi tech startup. Who are the employees in her team hierarchy?" requires a recursive CTE.

WITH RECURSIVE team_hierarchy AS (
  SELECT employee_id, name, manager_id FROM employees WHERE name = 'Alice'
  UNION ALL
  SELECT e.employee_id, e.name, e.manager_id FROM employees e
  JOIN team_hierarchy th 
ON e.manager_id = th.employee_id
)
SELECT * FROM team_hierarchy;

Result;

employee_id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Charlie	1
4	Diana	2
5	Eve	2

This result shows Alice, her direct reports (Bob, Charlie), and Bob’s direct reports (Diana, Eve)—the full team hierarchy.
The line JOIN team_hierarchy th ON e.manager_id = th.employee_id is where the recursion happens.
team_hierarchy is the CTE itself, we are joining the query to its own result. It takes every employee already in the CTE (e.g., Alice) and finds all employees who report to them. This repeats in a loop until no more reports are found.

When to use CTEs

Subqueries are useful when;

A query has multiple logical steps
You need to use the same data multiple times
Querying hierarchical data
You temporarily want to replace a view.
When replacing nested subqueries.

PART 3: SUBQUERIES VS CTEs

Here is a table comparison of subqueries and CTEs, based on architecture, readability and performance.

Feature	Subqueries	CTEs
Readability & Flow	Reads inside-out. Prone to creating "spaghetti code" if nested multiple times.	Reads top-to-bottom. Keeps code clean and highly modular.
Reusability	Single-use only. If you need the same data twice, you have to type the whole subquery again.	Highly reusable. Can be referenced multiple times in the main query
Recursion	Cannot handle self-referencing or hierarchical data.	Perfect for hierarchical/tree-structured data using Recursive CTEs.
Performance	Fast for basic filters. Correlated Subqueries can be slow on large datasets because they loop row-by-row.	Generally the same performance (optimizers treat standard CTEs and subqueries equally). However, CTEs can be temporarily cached in memory in databases like PostgreSQL for speed boosts.
Best Use Case	Quick, single-step math (Scalar), checking lists (`IN` / `EXISTS`), or creating simple derived tables.	Multi-step data transformations, chaining queries, building org charts, or acting as temporary views.

SQL is not just about writing queries, it’s about writing queries that others can easily understand.

When in doubt, choose the option that is easiest for the next developer to understand.

If you only remember one thing from this guide:

Use Subqueries for quick, single-step tasks and simple filters.
Use CTEs for multi-step queries, reusability, and keeping your code readable from top to bottom.

SQL is all about practice. Don’t get discouraged if subqueries and CTEs take a few tries to master, with consistent practice, they get and feel more intuitive.

If you found this guide helpful, don't forget to save it for later.

Do you prefer using subqueries or CTEs in your work?
Let me know in the comments below. Happy querying!

DEV Community

SQL Subqueries vs. CTEs: A Guide to Writing Better Queries

PART 1: SUBQUERIES

Subqueries Categorized by Output

Scalar Subquery

Multi-row Subquery (List Subquery)

Table Subquery (Derived Table)

Subqueries Categorized by Execution

Correlated Subquery

Uncorrelated (Independent) Subquery

When to use Subqueries

PART 2: The CTE (COMMON TABLE EXPRESSION)

The Basic Syntax of a CTE

When to use CTEs

PART 3: SUBQUERIES VS CTEs

Top comments (0)