DEV Community

Kigen Tarus
Kigen Tarus

Posted on

Subqueries vs CTEs in SQL: A Complete Guide for Beginners

When I first started learning SQL, I kept hearing two terms thrown around: subqueries and CTEs. They seemed to do similar things, and I wasn't sure when to use one over the other. If you're in the same boat, this guide will clear everything up.

By the end of this article, you'll understand:

  • What subqueries are and the different types
  • What CTEs are and how to use them
  • When to use each approach
  • Performance and readability comparisons

Let's dive in!


What is a Subquery?

A subquery (also called an inner query or nested query) is a query placed inside another SQL query. The inner query runs first, and its result is used by the outer query.

Think of it like a Russian nesting doll - a query inside a query inside a query.

Basic Subquery Example

-- Find students who scored above the class average
SELECT first_name, last_name, marks
FROM exam_results
WHERE marks > (SELECT AVG(marks) FROM exam_results);
Enter fullscreen mode Exit fullscreen mode

Here, (SELECT AVG(marks) FROM exam_results) is the subquery. It calculates the average mark first, then the outer query finds all students above that average.


Types of Subqueries

Subqueries can be categorized by where they appear and what they return.

1. By Location

Type Location Example
Subquery in WHERE Inside WHERE clause WHERE marks > (SELECT AVG(marks)...)
Subquery in FROM Inside FROM clause (derived table) FROM (SELECT * FROM students WHERE city='Nairobi') AS nairobi_students
Subquery in SELECT Inside SELECT clause SELECT name, (SELECT AVG(marks) FROM results) AS class_avg
Subquery in HAVING Inside HAVING clause HAVING AVG(marks) > (SELECT AVG(marks) FROM results)

2. By Return Value

Type Returns Operators Used
Scalar Subquery Single value (1 row, 1 column) =, >, <, >=, <=, <>
Row Subquery Single row (1 row, multiple columns) =, <> with row constructor
Column Subquery Single column (multiple rows, 1 column) IN, NOT IN, ANY, ALL
Table Subquery Full table (multiple rows & columns) Used in FROM clause

Examples of Each Type

Scalar Subquery (Single Value)

SELECT full_name, age
FROM patients
WHERE age = (SELECT MAX(age) FROM patients);
-- Returns the oldest patient(s)
Enter fullscreen mode Exit fullscreen mode

Column Subquery (Multiple Rows, One Column)

SELECT full_name, city
FROM patients
WHERE city IN (SELECT city FROM students WHERE class = 'Form 4');
-- Patients living in cities where Form 4 students live
Enter fullscreen mode Exit fullscreen mode

Table Subquery (Derived Table)

SELECT dept_name, avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_averages
JOIN departments ON departments.id = dept_averages.department_id;
Enter fullscreen mode Exit fullscreen mode

Correlated vs Non-Correlated Subqueries

This is an important distinction many beginners miss.

Type Description Performance
Non-Correlated Inner query runs independently ONCE ✅ Faster
Correlated Inner query depends on outer query, runs ONCE PER ROW ⚠️ Slower

Non-Correlated Example:

SELECT name, marks
FROM exam_results
WHERE marks > (SELECT AVG(marks) FROM exam_results);
-- Inner query runs once
Enter fullscreen mode Exit fullscreen mode

Correlated Example:

SELECT e1.student_id, e1.marks, e1.subject_id
FROM exam_results e1
WHERE e1.marks > (
    SELECT AVG(e2.marks)
    FROM exam_results e2
    WHERE e2.student_id = e1.student_id  -- References outer query!
);
-- Inner query runs for EACH student
Enter fullscreen mode Exit fullscreen mode

When Should You Use Subqueries?

✅ Good Use Cases for Subqueries:

  1. Simple comparisons to aggregates
   WHERE salary > (SELECT AVG(salary) FROM employees)
Enter fullscreen mode Exit fullscreen mode
  1. Checking existence
   WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = customers.id)
Enter fullscreen mode Exit fullscreen mode
  1. IN / NOT IN conditions
   WHERE city IN (SELECT city FROM offices)
Enter fullscreen mode Exit fullscreen mode
  1. Single-value lookups
   SELECT name, (SELECT department_name FROM depts WHERE id = emp.dept_id) AS dept
   FROM employees emp;
Enter fullscreen mode Exit fullscreen mode

❌ When NOT to Use Subqueries:

  1. When the query becomes hard to read (nested 3+ levels deep)
  2. When you need to reference the same derived table multiple times
  3. When performance suffers from correlated subqueries on large datasets

What is a CTE (Common Table Expression)?

A CTE (Common Table Expression) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It's defined using the WITH keyword.

Think of a CTE as creating a temporary view that exists only for the duration of your query.

Basic CTE Syntax

WITH cte_name AS (
    -- Your query here
    SELECT column1, column2
    FROM table_name
    WHERE condition
)
-- Now use the CTE like a regular table
SELECT *
FROM cte_name
WHERE another_condition;
Enter fullscreen mode Exit fullscreen mode

Simple CTE Example

WITH nairobi_students AS (
    SELECT student_id, first_name, last_name, class
    FROM students
    WHERE city = 'Nairobi'
)
SELECT class, COUNT(*) AS student_count
FROM nairobi_students
GROUP BY class
ORDER BY student_count DESC;
Enter fullscreen mode Exit fullscreen mode

Types of CTEs and Their Use Cases

1. Basic CTE (Single CTE)

Used to simplify a complex query by breaking it into logical steps.

WITH high_performers AS (
    SELECT student_id, AVG(marks) AS avg_mark
    FROM exam_results
    GROUP BY student_id
    HAVING AVG(marks) > 75
)
SELECT s.first_name, s.last_name, hp.avg_mark
FROM students s
JOIN high_performers hp ON s.student_id = hp.student_id
ORDER BY hp.avg_mark DESC;
Enter fullscreen mode Exit fullscreen mode

2. Multiple CTEs

You can define several CTEs in one query, separated by commas.

WITH 
-- CTE 1: Get student averages
student_averages AS (
    SELECT student_id, AVG(marks) AS avg_mark
    FROM exam_results
    GROUP BY student_id
),
-- CTE 2: Classify performance
performance_categories AS (
    SELECT 
        student_id,
        avg_mark,
        CASE 
            WHEN avg_mark >= 80 THEN 'Excellent'
            WHEN avg_mark >= 60 THEN 'Good'
            WHEN avg_mark >= 40 THEN 'Average'
            ELSE 'Needs Improvement'
        END AS performance_level
    FROM student_averages
)
-- Main query using both CTEs
SELECT s.first_name, s.last_name, pc.avg_mark, pc.performance_level
FROM students s
JOIN performance_categories pc ON s.student_id = pc.student_id
ORDER BY pc.avg_mark DESC;
Enter fullscreen mode Exit fullscreen mode

3. Recursive CTEs

Recursive CTEs call themselves - perfect for hierarchical data like org charts, category trees, or bill of materials.

WITH RECURSIVE employee_hierarchy AS (
    -- Anchor: Start with the CEO (no supervisor)
    SELECT employee_id, full_name, supervisor_id, 1 AS level
    FROM employees
    WHERE supervisor_id IS NULL

    UNION ALL

    -- Recursive: Join to find subordinates
    SELECT e.employee_id, e.full_name, e.supervisor_id, eh.level + 1
    FROM employees e
    JOIN employee_hierarchy eh ON e.supervisor_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, full_name;
Enter fullscreen mode Exit fullscreen mode

4. CTE with DML Operations

You can use CTEs with INSERT, UPDATE, or DELETE.

WITH deleted_appointments AS (
    DELETE FROM appointments
    WHERE appt_date < '2024-01-01'
    RETURNING *
)
INSERT INTO appointment_archive
SELECT * FROM deleted_appointments;
Enter fullscreen mode Exit fullscreen mode

Common Use Cases for CTEs

Use Case Why CTE Works Well
Breaking down complex queries Makes logic step-by-step and readable
Reusing derived tables Define once, reference multiple times
Recursive queries Only way to do recursion in standard SQL
Replacing views for one-time use Temporary, no database object created
Window function preprocessing Clean separation of filtering and ranking

Subqueries vs CTEs: A Clear Comparison

Now for the big question: When should you use a subquery, and when should you use a CTE?

Comparison Table

Criteria Subquery CTE
Readability (Simple) ✅ Good 🟡 Overkill
Readability (Complex) ❌ Gets messy fast ✅ Excellent
Reusability in query ❌ Must rewrite ✅ Define once, use many times
Recursion ❌ Not supported ✅ Supported
Debugging ❌ Hard to test parts ✅ Can SELECT from CTE alone
Performance 🟡 Similar (optimizer treats both the same in most cases) 🟡 Similar
Nesting depth ❌ Can become unreadable ✅ Linear, step-by-step

Performance: Is One Faster Than the Other?

Short answer: In modern databases (PostgreSQL 12+, MySQL 8+, SQL Server 2019+), there is typically NO performance difference between equivalent subqueries and CTEs.

The query optimizer treats them the same way and generates identical execution plans.

Exception: Some databases (like older PostgreSQL versions) treat CTEs as "optimization fences," meaning the CTE is materialized (stored in memory) before being used. In these cases, CTEs might be slightly slower for large datasets.

Best Practice: Write for readability first, then optimize if needed.


Readability Comparison

Let's look at the same query written both ways.

Query Goal: Find students whose average mark is higher than the overall school average.

Using Subqueries (Nested):

SELECT s.first_name, s.last_name, student_avg.avg_mark
FROM students s
JOIN (
    SELECT student_id, AVG(marks) AS avg_mark
    FROM exam_results
    GROUP BY student_id
) AS student_avg ON s.student_id = student_avg.student_id
WHERE student_avg.avg_mark > (
    SELECT AVG(marks)
    FROM exam_results
)
ORDER BY student_avg.avg_mark DESC;
Enter fullscreen mode Exit fullscreen mode

Using CTEs:

WITH 
school_average AS (
    SELECT AVG(marks) AS overall_avg
    FROM exam_results
),
student_averages AS (
    SELECT student_id, AVG(marks) AS avg_mark
    FROM exam_results
    GROUP BY student_id
)
SELECT s.first_name, s.last_name, sa.avg_mark
FROM students s
JOIN student_averages sa ON s.student_id = sa.student_id
CROSS JOIN school_average
WHERE sa.avg_mark > school_average.overall_avg
ORDER BY sa.avg_mark DESC;
Enter fullscreen mode Exit fullscreen mode

The CTE version reads like a story:

  1. First, get the school average
  2. Then, get each student's average
  3. Finally, find students above the school average

When to Use Each: Decision Framework

┌─────────────────────────────────────────────────┐
│                                                 │
│   Do you need to reference the result           │
│   MULTIPLE TIMES in your query?                 │
│                                                 │
│         YES ──────────────────► Use CTE         │
│          │                                      │
│          NO                                     │
│          ▼                                      │
│   Is your query getting complicated             │
│   (3+ levels of nesting)?                       │
│                                                 │
│         YES ──────────────────► Use CTE         │
│          │                                      │
│          NO                                     │
│          ▼                                      │
│   Do you need recursion?                        │
│                                                 │
│         YES ──────────────────► Use CTE         │
│          │                                      │
│          NO                                     │
│          ▼                                      │
│   Is it a simple filter or EXISTS check?        │
│                                                 │
│         YES ──────────────────► Use Subquery    │
│                                                 │
└─────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Quick Reference Cheat Sheet

Situation Use This
WHERE x > (SELECT AVG(x)...) Subquery
WHERE EXISTS (SELECT 1...) Subquery
WHERE column IN (SELECT...) Subquery
Need the same derived table 2+ times CTE
Query has 3+ nested levels CTE
Hierarchical/recursive data CTE (Recursive)
Complex multi-step logic CTE
Debugging intermediate results CTE

Real-World Example: Hospital Patient Analysis

Let's put it all together with a realistic example using the city_hospital database.

Task: Find doctors who have seen more patients than the average, along with their most common diagnosis.

WITH 
-- Step 1: Count appointments per doctor
doctor_patient_counts AS (
    SELECT 
        d.doctor_id,
        d.full_name AS doctor_name,
        d.specialisation,
        COUNT(a.appointment_id) AS patient_count
    FROM doctors d
    LEFT JOIN appointments a ON d.doctor_id = a.doctor_id
    GROUP BY d.doctor_id, d.full_name, d.specialisation
),

-- Step 2: Calculate average patients per doctor
avg_patients AS (
    SELECT AVG(patient_count) AS avg_count
    FROM doctor_patient_counts
),

-- Step 3: Find most common diagnosis per doctor
doctor_top_diagnosis AS (
    SELECT DISTINCT ON (doctor_id)
        doctor_id,
        diagnosis,
        COUNT(*) AS diagnosis_count
    FROM appointments
    WHERE diagnosis IS NOT NULL
    GROUP BY doctor_id, diagnosis
    ORDER BY doctor_id, COUNT(*) DESC
)

-- Step 4: Bring it all together
SELECT 
    dpc.doctor_name,
    dpc.specialisation,
    dpc.patient_count,
    dtd.diagnosis AS most_common_diagnosis,
    dtd.diagnosis_count AS diagnosis_occurrences
FROM doctor_patient_counts dpc
CROSS JOIN avg_patients ap
LEFT JOIN doctor_top_diagnosis dtd ON dpc.doctor_id = dtd.doctor_id
WHERE dpc.patient_count > ap.avg_count
ORDER BY dpc.patient_count DESC;
Enter fullscreen mode Exit fullscreen mode

This query would be a nightmare to write and read as nested subqueries. With CTEs, each step is clear and logical.


Common Mistakes to Avoid

1. Using Correlated Subqueries When You Don't Need To

Bad (Slow):

SELECT name, salary,
    (SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id) AS dept_avg
FROM employees e1;
Enter fullscreen mode Exit fullscreen mode

Better (Use JOIN and GROUP BY):

SELECT e.name, e.salary, d.dept_avg
FROM employees e
JOIN (
    SELECT dept_id, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY dept_id
) d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

2. Forgetting to Alias Subqueries in FROM

Error:

SELECT * FROM (SELECT * FROM students WHERE city = 'Nairobi');
-- ERROR: subquery in FROM must have an alias
Enter fullscreen mode Exit fullscreen mode

Fixed:

SELECT * FROM (SELECT * FROM students WHERE city = 'Nairobi') AS nairobi_students;
Enter fullscreen mode Exit fullscreen mode

3. Using CTEs When a Simple WHERE Clause Works

Overkill:

WITH adults AS (
    SELECT * FROM patients WHERE age >= 18
)
SELECT * FROM adults WHERE city = 'Nairobi';
Enter fullscreen mode Exit fullscreen mode

Simpler:

SELECT * FROM patients WHERE age >= 18 AND city = 'Nairobi';
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

  1. Subqueries are queries nested inside other queries - great for simple filters and single-value comparisons.

  2. CTEs (WITH clauses) create named temporary result sets - perfect for breaking down complex logic.

  3. Performance is usually identical - modern optimizers treat both the same.

  4. Choose based on readability:

    • Simple filters → Subquery
    • Multi-step logic → CTE
    • Reusing results → CTE
    • Recursion → CTE
  5. Correlated subqueries run once per row and can be slow on large datasets.

  6. Always alias subqueries used in the FROM clause.

  7. CTEs make debugging easier - you can run each CTE independently.


Practice Exercises

Try these on your own database:

Exercise 1: Write a subquery to find patients who live in cities where no student lives.

Exercise 2: Rewrite the following nested subquery as a CTE:

SELECT name FROM employees 
WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = 1)
AND dept_id IN (SELECT id FROM departments WHERE location = 'New York');
Enter fullscreen mode Exit fullscreen mode

Exercise 3: Create a recursive CTE to display a company's organizational chart.


Further Reading


Conclusion

Both subqueries and CTEs are essential tools in your SQL toolkit. Neither is "better" - they serve different purposes.

Remember: Write SQL for humans first, computers second. If a subquery makes your query hard to read, refactor to a CTE. If a CTE feels like overkill for a simple filter, use a subquery.

The more you practice, the more intuitive this choice becomes!


If you found this article helpful, feel free to share it with other SQL learners. Have questions? Drop them in the comments below!

Top comments (0)