DEV Community

muriithilydia46-wq
muriithilydia46-wq

Posted on

The Query Inside the Query: Understanding Subqueries and CTEs in SQL

Have you ever written an SQL query, looked at it five minutes later, and had absolutely no idea what it was doing? You are not alone. Most SQL beginners hit a wall when their queries start growing.

In this article, we are going to look at two powerful SQL tools; subqueries and CTEs. We shall understand what they are, when to use them, and why CTEs might just change the way you write SQL forever.

Part 1: Subqueries

A subquery is simply a SELECT statement written inside another SELECT statement. The inner query runs first, and its results is handed to the outer query.

Let us look at an example from city hospital data;

"Question: Find all patients who paid more than the average appointment fee."

SELECT patient_id, diagnosis, fee
FROM appointments
WHERE fee > (SELECT AVG(fee) FROM appointments);

In this case, the inner query (SELECT AVG(fee) FROM appointments) runs first and returns a number. The outer query then uses that number to filter results.

Commonly used types:

  1. Scalar subquery — returns one single value. Used in WHERE or SELECT.

e.g., What is each student's mark compared to the class average?

SELECT student_id, marks,
(SELECT ROUND(AVG(marks),2) FROM exam_results) AS class_avg
FROM exam_results;

  1. List subquery — returns a column of values. Used with IN or NOT IN. e.g., Which patients have had at least one appointment?

SELECT full_name FROM patients
WHERE patient_id IN (SELECT DISTINCT patient_id FROM appointments);

  1. Table (Derived) — returns a full mini-table. Used with the FROM clause.
    e.g., Find departments where the average marks exceed 65
    SELECT department, avg_marks
    FROM (
    SELECT s.department, ROUND(AVG(er.marks), 2) AS avg_marks
    FROM exam_results er
    JOIN subjects s ON s.subject_id = er.subject_id
    GROUP BY s.department
    ) AS dept_summary
    WHERE avg_marks > 65;

  2. Correlated — references the outer query. It runs once per row.
    e.g., Find students who scored above their own class average

SELECT student_id, class, marks
FROM exam_results e1
WHERE marks > (
SELECT AVG(marks) FROM exam_results e2
WHERE e2.class = e1.class
);

When to Use a Subquery;

  • Quick one-off filters using IN, NOT IN, or EXISTS
  • When the logic is short, i.e., one or two lines
  • When you only need the result once

Part 2: CTEs —(Common Table Expression)

A CTE lets you write your inner logic once, give it a name, and then use that name in your main query.
Think of it as giving your subquery a label so everyone knows what it means. You define it at the top of your query.

Basic Syntax
WITH cte_name AS (
SELECT ...
FROM ...
WHERE ...
)
SELECT * FROM cte_name;

Types of CTEs

  1. Simple CTE -It defines a named result set used once or multiple times in the main query. It is perfect for cleaning up a single layer of complexity.
    e.g., Show only doctors with more than 8 years of experience
    WITH experienced_doctors AS (
    SELECT doctor_id, full_name, specialisation, years_exp
    FROM doctors
    WHERE years_exp > 8
    )
    SELECT * FROM experienced_doctors
    ORDER BY years_exp DESC;

  2. Multiple CTEs
    Defines several named blocks in one WITH clause, separated by commas. Each one can reference the previous, like a logical pipeline.
    e.g., Find doctors who charge above the hospital average fee

WITH doctor_fees AS (
SELECT doctor_id, ROUND(AVG(fee), 2) AS avg_fee
FROM appointments
GROUP BY doctor_id
),
hospital_avg AS (
SELECT ROUND(AVG(fee), 2) AS overall_avg
FROM appointments
)
SELECT d.full_name, df.avg_fee, ha.overall_avg
FROM doctor_fees df
JOIN doctors d ON d.doctor_id = df.doctor_id
JOIN hospital_avg ha ON df.avg_fee > ha.overall_avg
ORDER BY df.avg_fee DESC;

3.Recursive CTE
A recursive CTE references itself. It is the only way to handle hierarchical data like a reporting chain or org chart. This is something a regular subquery simply cannot do.

e.g., Build the full doctor supervision hierarchy
WITH RECURSIVE doctor_hierarchy AS (

-- Base: doctors with no supervisor (top of the chain)
SELECT doctor_id, full_name, supervisor_id, 1 AS level
FROM doctors
WHERE supervisor_id IS NULL

UNION ALL
-- Recursive: find doctors supervised by those already found
SELECT d.doctor_id, d.full_name, d.supervisor_id, dh.level + 1
FROM doctors d
JOIN doctor_hierarchy dh ON d.supervisor_id = dh.doctor_id
)

SELECT * FROM doctor_hierarchy
ORDER BY level, doctor_id;

The base case finds the top-level doctors (Dr. Amina Omondi, Dr. James Abdi). The recursive case keeps looking for who reports to them, until no more levels exist.

Here is a comparison between Sub-queries and CTEs

Conclusion;
Subqueries and CTEs both solve similar problems, but they feel very different to write and to read.

Use a subquery when your filter is short, simple, and only needed once, especially with IN, NOT IN, and EXISTS.
Use a CTE when your logic grows, when you need to reuse a result, or when someone else needs to read your code.

Start with subqueries to understand nesting. Proceed to CTEs as your queries grow. Your code will be cleaner, and you definitely will sleep earlier!

Top comments (0)