When I first started learning SQL, I kept hearing two terms thrown around: subqueries and CTEs. They seemed to do similar things, and I wasn't sure when to use one over the other. If you're in the same boat, this guide will clear everything up.
By the end of this article, you'll understand:
- What subqueries are and the different types
- What CTEs are and how to use them
- When to use each approach
- Performance and readability comparisons
Let's dive in!
What is a Subquery?
A subquery (also called an inner query or nested query) is a query placed inside another SQL query. The inner query runs first, and its result is used by the outer query.
Think of it like a Russian nesting doll - a query inside a query inside a query.
Basic Subquery Example
-- Find students who scored above the class average
SELECT first_name, last_name, marks
FROM exam_results
WHERE marks > (SELECT AVG(marks) FROM exam_results);
Here, (SELECT AVG(marks) FROM exam_results) is the subquery. It calculates the average mark first, then the outer query finds all students above that average.
Types of Subqueries
Subqueries can be categorized by where they appear and what they return.
1. By Location
| Type | Location | Example |
|---|---|---|
| Subquery in WHERE | Inside WHERE clause | WHERE marks > (SELECT AVG(marks)...) |
| Subquery in FROM | Inside FROM clause (derived table) | FROM (SELECT * FROM students WHERE city='Nairobi') AS nairobi_students |
| Subquery in SELECT | Inside SELECT clause | SELECT name, (SELECT AVG(marks) FROM results) AS class_avg |
| Subquery in HAVING | Inside HAVING clause | HAVING AVG(marks) > (SELECT AVG(marks) FROM results) |
2. By Return Value
| Type | Returns | Operators Used |
|---|---|---|
| Scalar Subquery | Single value (1 row, 1 column) |
=, >, <, >=, <=, <>
|
| Row Subquery | Single row (1 row, multiple columns) |
=, <> with row constructor |
| Column Subquery | Single column (multiple rows, 1 column) |
IN, NOT IN, ANY, ALL
|
| Table Subquery | Full table (multiple rows & columns) | Used in FROM clause |
Examples of Each Type
Scalar Subquery (Single Value)
SELECT full_name, age
FROM patients
WHERE age = (SELECT MAX(age) FROM patients);
-- Returns the oldest patient(s)
Column Subquery (Multiple Rows, One Column)
SELECT full_name, city
FROM patients
WHERE city IN (SELECT city FROM students WHERE class = 'Form 4');
-- Patients living in cities where Form 4 students live
Table Subquery (Derived Table)
SELECT dept_name, avg_salary
FROM (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_averages
JOIN departments ON departments.id = dept_averages.department_id;
Correlated vs Non-Correlated Subqueries
This is an important distinction many beginners miss.
| Type | Description | Performance |
|---|---|---|
| Non-Correlated | Inner query runs independently ONCE | ✅ Faster |
| Correlated | Inner query depends on outer query, runs ONCE PER ROW | ⚠️ Slower |
Non-Correlated Example:
SELECT name, marks
FROM exam_results
WHERE marks > (SELECT AVG(marks) FROM exam_results);
-- Inner query runs once
Correlated Example:
SELECT e1.student_id, e1.marks, e1.subject_id
FROM exam_results e1
WHERE e1.marks > (
SELECT AVG(e2.marks)
FROM exam_results e2
WHERE e2.student_id = e1.student_id -- References outer query!
);
-- Inner query runs for EACH student
When Should You Use Subqueries?
✅ Good Use Cases for Subqueries:
- Simple comparisons to aggregates
WHERE salary > (SELECT AVG(salary) FROM employees)
- Checking existence
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = customers.id)
- IN / NOT IN conditions
WHERE city IN (SELECT city FROM offices)
- Single-value lookups
SELECT name, (SELECT department_name FROM depts WHERE id = emp.dept_id) AS dept
FROM employees emp;
❌ When NOT to Use Subqueries:
- When the query becomes hard to read (nested 3+ levels deep)
- When you need to reference the same derived table multiple times
- When performance suffers from correlated subqueries on large datasets
What is a CTE (Common Table Expression)?
A CTE (Common Table Expression) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It's defined using the WITH keyword.
Think of a CTE as creating a temporary view that exists only for the duration of your query.
Basic CTE Syntax
WITH cte_name AS (
-- Your query here
SELECT column1, column2
FROM table_name
WHERE condition
)
-- Now use the CTE like a regular table
SELECT *
FROM cte_name
WHERE another_condition;
Simple CTE Example
WITH nairobi_students AS (
SELECT student_id, first_name, last_name, class
FROM students
WHERE city = 'Nairobi'
)
SELECT class, COUNT(*) AS student_count
FROM nairobi_students
GROUP BY class
ORDER BY student_count DESC;
Types of CTEs and Their Use Cases
1. Basic CTE (Single CTE)
Used to simplify a complex query by breaking it into logical steps.
WITH high_performers AS (
SELECT student_id, AVG(marks) AS avg_mark
FROM exam_results
GROUP BY student_id
HAVING AVG(marks) > 75
)
SELECT s.first_name, s.last_name, hp.avg_mark
FROM students s
JOIN high_performers hp ON s.student_id = hp.student_id
ORDER BY hp.avg_mark DESC;
2. Multiple CTEs
You can define several CTEs in one query, separated by commas.
WITH
-- CTE 1: Get student averages
student_averages AS (
SELECT student_id, AVG(marks) AS avg_mark
FROM exam_results
GROUP BY student_id
),
-- CTE 2: Classify performance
performance_categories AS (
SELECT
student_id,
avg_mark,
CASE
WHEN avg_mark >= 80 THEN 'Excellent'
WHEN avg_mark >= 60 THEN 'Good'
WHEN avg_mark >= 40 THEN 'Average'
ELSE 'Needs Improvement'
END AS performance_level
FROM student_averages
)
-- Main query using both CTEs
SELECT s.first_name, s.last_name, pc.avg_mark, pc.performance_level
FROM students s
JOIN performance_categories pc ON s.student_id = pc.student_id
ORDER BY pc.avg_mark DESC;
3. Recursive CTEs
Recursive CTEs call themselves - perfect for hierarchical data like org charts, category trees, or bill of materials.
WITH RECURSIVE employee_hierarchy AS (
-- Anchor: Start with the CEO (no supervisor)
SELECT employee_id, full_name, supervisor_id, 1 AS level
FROM employees
WHERE supervisor_id IS NULL
UNION ALL
-- Recursive: Join to find subordinates
SELECT e.employee_id, e.full_name, e.supervisor_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.supervisor_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, full_name;
4. CTE with DML Operations
You can use CTEs with INSERT, UPDATE, or DELETE.
WITH deleted_appointments AS (
DELETE FROM appointments
WHERE appt_date < '2024-01-01'
RETURNING *
)
INSERT INTO appointment_archive
SELECT * FROM deleted_appointments;
Common Use Cases for CTEs
| Use Case | Why CTE Works Well |
|---|---|
| Breaking down complex queries | Makes logic step-by-step and readable |
| Reusing derived tables | Define once, reference multiple times |
| Recursive queries | Only way to do recursion in standard SQL |
| Replacing views for one-time use | Temporary, no database object created |
| Window function preprocessing | Clean separation of filtering and ranking |
Subqueries vs CTEs: A Clear Comparison
Now for the big question: When should you use a subquery, and when should you use a CTE?
Comparison Table
| Criteria | Subquery | CTE |
|---|---|---|
| Readability (Simple) | ✅ Good | 🟡 Overkill |
| Readability (Complex) | ❌ Gets messy fast | ✅ Excellent |
| Reusability in query | ❌ Must rewrite | ✅ Define once, use many times |
| Recursion | ❌ Not supported | ✅ Supported |
| Debugging | ❌ Hard to test parts | ✅ Can SELECT from CTE alone |
| Performance | 🟡 Similar (optimizer treats both the same in most cases) | 🟡 Similar |
| Nesting depth | ❌ Can become unreadable | ✅ Linear, step-by-step |
Performance: Is One Faster Than the Other?
Short answer: In modern databases (PostgreSQL 12+, MySQL 8+, SQL Server 2019+), there is typically NO performance difference between equivalent subqueries and CTEs.
The query optimizer treats them the same way and generates identical execution plans.
Exception: Some databases (like older PostgreSQL versions) treat CTEs as "optimization fences," meaning the CTE is materialized (stored in memory) before being used. In these cases, CTEs might be slightly slower for large datasets.
Best Practice: Write for readability first, then optimize if needed.
Readability Comparison
Let's look at the same query written both ways.
Query Goal: Find students whose average mark is higher than the overall school average.
Using Subqueries (Nested):
SELECT s.first_name, s.last_name, student_avg.avg_mark
FROM students s
JOIN (
SELECT student_id, AVG(marks) AS avg_mark
FROM exam_results
GROUP BY student_id
) AS student_avg ON s.student_id = student_avg.student_id
WHERE student_avg.avg_mark > (
SELECT AVG(marks)
FROM exam_results
)
ORDER BY student_avg.avg_mark DESC;
Using CTEs:
WITH
school_average AS (
SELECT AVG(marks) AS overall_avg
FROM exam_results
),
student_averages AS (
SELECT student_id, AVG(marks) AS avg_mark
FROM exam_results
GROUP BY student_id
)
SELECT s.first_name, s.last_name, sa.avg_mark
FROM students s
JOIN student_averages sa ON s.student_id = sa.student_id
CROSS JOIN school_average
WHERE sa.avg_mark > school_average.overall_avg
ORDER BY sa.avg_mark DESC;
The CTE version reads like a story:
- First, get the school average
- Then, get each student's average
- Finally, find students above the school average
When to Use Each: Decision Framework
┌─────────────────────────────────────────────────┐
│ │
│ Do you need to reference the result │
│ MULTIPLE TIMES in your query? │
│ │
│ YES ──────────────────► Use CTE │
│ │ │
│ NO │
│ ▼ │
│ Is your query getting complicated │
│ (3+ levels of nesting)? │
│ │
│ YES ──────────────────► Use CTE │
│ │ │
│ NO │
│ ▼ │
│ Do you need recursion? │
│ │
│ YES ──────────────────► Use CTE │
│ │ │
│ NO │
│ ▼ │
│ Is it a simple filter or EXISTS check? │
│ │
│ YES ──────────────────► Use Subquery │
│ │
└─────────────────────────────────────────────────┘
Quick Reference Cheat Sheet
| Situation | Use This |
|---|---|
WHERE x > (SELECT AVG(x)...) |
Subquery |
WHERE EXISTS (SELECT 1...) |
Subquery |
WHERE column IN (SELECT...) |
Subquery |
| Need the same derived table 2+ times | CTE |
| Query has 3+ nested levels | CTE |
| Hierarchical/recursive data | CTE (Recursive) |
| Complex multi-step logic | CTE |
| Debugging intermediate results | CTE |
Real-World Example: Hospital Patient Analysis
Let's put it all together with a realistic example using the city_hospital database.
Task: Find doctors who have seen more patients than the average, along with their most common diagnosis.
WITH
-- Step 1: Count appointments per doctor
doctor_patient_counts AS (
SELECT
d.doctor_id,
d.full_name AS doctor_name,
d.specialisation,
COUNT(a.appointment_id) AS patient_count
FROM doctors d
LEFT JOIN appointments a ON d.doctor_id = a.doctor_id
GROUP BY d.doctor_id, d.full_name, d.specialisation
),
-- Step 2: Calculate average patients per doctor
avg_patients AS (
SELECT AVG(patient_count) AS avg_count
FROM doctor_patient_counts
),
-- Step 3: Find most common diagnosis per doctor
doctor_top_diagnosis AS (
SELECT DISTINCT ON (doctor_id)
doctor_id,
diagnosis,
COUNT(*) AS diagnosis_count
FROM appointments
WHERE diagnosis IS NOT NULL
GROUP BY doctor_id, diagnosis
ORDER BY doctor_id, COUNT(*) DESC
)
-- Step 4: Bring it all together
SELECT
dpc.doctor_name,
dpc.specialisation,
dpc.patient_count,
dtd.diagnosis AS most_common_diagnosis,
dtd.diagnosis_count AS diagnosis_occurrences
FROM doctor_patient_counts dpc
CROSS JOIN avg_patients ap
LEFT JOIN doctor_top_diagnosis dtd ON dpc.doctor_id = dtd.doctor_id
WHERE dpc.patient_count > ap.avg_count
ORDER BY dpc.patient_count DESC;
This query would be a nightmare to write and read as nested subqueries. With CTEs, each step is clear and logical.
Common Mistakes to Avoid
1. Using Correlated Subqueries When You Don't Need To
❌ Bad (Slow):
SELECT name, salary,
(SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id) AS dept_avg
FROM employees e1;
✅ Better (Use JOIN and GROUP BY):
SELECT e.name, e.salary, d.dept_avg
FROM employees e
JOIN (
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id
) d ON e.dept_id = d.dept_id;
2. Forgetting to Alias Subqueries in FROM
❌ Error:
SELECT * FROM (SELECT * FROM students WHERE city = 'Nairobi');
-- ERROR: subquery in FROM must have an alias
✅ Fixed:
SELECT * FROM (SELECT * FROM students WHERE city = 'Nairobi') AS nairobi_students;
3. Using CTEs When a Simple WHERE Clause Works
❌ Overkill:
WITH adults AS (
SELECT * FROM patients WHERE age >= 18
)
SELECT * FROM adults WHERE city = 'Nairobi';
✅ Simpler:
SELECT * FROM patients WHERE age >= 18 AND city = 'Nairobi';
Key Takeaways
Subqueries are queries nested inside other queries - great for simple filters and single-value comparisons.
CTEs (WITH clauses) create named temporary result sets - perfect for breaking down complex logic.
Performance is usually identical - modern optimizers treat both the same.
-
Choose based on readability:
- Simple filters → Subquery
- Multi-step logic → CTE
- Reusing results → CTE
- Recursion → CTE
Correlated subqueries run once per row and can be slow on large datasets.
Always alias subqueries used in the FROM clause.
CTEs make debugging easier - you can run each CTE independently.
Practice Exercises
Try these on your own database:
Exercise 1: Write a subquery to find patients who live in cities where no student lives.
Exercise 2: Rewrite the following nested subquery as a CTE:
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = 1)
AND dept_id IN (SELECT id FROM departments WHERE location = 'New York');
Exercise 3: Create a recursive CTE to display a company's organizational chart.
Further Reading
- PostgreSQL Documentation: WITH Queries
- Use The Index, Luke: Subqueries
- Modern SQL: Common Table Expressions
Conclusion
Both subqueries and CTEs are essential tools in your SQL toolkit. Neither is "better" - they serve different purposes.
Remember: Write SQL for humans first, computers second. If a subquery makes your query hard to read, refactor to a CTE. If a CTE feels like overkill for a simple filter, use a subquery.
The more you practice, the more intuitive this choice becomes!
If you found this article helpful, feel free to share it with other SQL learners. Have questions? Drop them in the comments below!
Top comments (0)