After getting comfortable with joins and window functions, I’m now diving into more advanced query structuring techniques.
Subqueries felt like a natural extension of filtering and aggregation. CTEs, on the other hand, introduced a cleaner way of breaking down complex logic into readable, step-by-step components.
This guide provides a practical comparison of both concepts using a simple student exam results dataset, highlighting when and why to choose one over the other.
What is a Subquery?
A subquery is a query nested inside another query. It runs first, and its result is used by the outer (main) query.
Subqueries can be placed in:
-
SELECTclause -
FROMclause -
WHEREorHAVINGclause
Why Use Subqueries?
Subqueries are useful when you need to:
- Perform intermediate calculations
- Apply dynamic filtering
- Break down complex logic without creating temporary tables
- Keep queries self-contained
1. Subqueries in the WHERE Clause
This is the most common use case.
Example: Students scoring above the overall average
SELECT
student_id,
marks
FROM exam_results
WHERE marks > (
SELECT AVG(marks)
FROM exam_results
);
What’s happening:
- The inner query calculates the average marks across all students.
- The outer query filters for students who scored above that average.
Subqueries in the WHERE clause with IN and EXISTS
Using IN (for value comparison)
SELECT student_id
FROM exam_results
WHERE student_id IN (
SELECT student_id
FROM exam_results
WHERE marks > 80
);
Using EXISTS (for row existence check)
SELECT DISTINCT e1.student_id
FROM exam_results e1
WHERE EXISTS (
SELECT 1
FROM exam_results e2
WHERE e2.student_id = e1.student_id
AND e2.marks > 80
);
Key Difference:
-
IN→ Compares values against a list -
EXISTS→ Checks whether at least one matching row exists (usually more efficient with large datasets)
2. Subqueries in the SELECT Clause
Used to compute additional values for each row.
Example: Show each student’s marks alongside their personal average
SELECT
e1.student_id,
e1.marks,
(
SELECT AVG(e2.marks)
FROM exam_results e2
WHERE e2.student_id = e1.student_id
) AS student_avg
FROM exam_results e1;
Note: This is a correlated subquery — it runs once for every row in the outer query, which can impact performance on large tables.
3. Subqueries in the FROM Clause (Derived Tables)
Creates a temporary result set that can be treated like a table.
Example: Average marks per student
SELECT *
FROM (
SELECT
student_id,
AVG(marks) AS avg_marks
FROM exam_results
GROUP BY student_id
) AS student_avg;
Important: Derived tables must have an alias.
Types of Subqueries
- Scalar Subquery: Returns a single value
- Column Subquery: Returns multiple rows (usually one column)
- Correlated Subquery: References columns from the outer query and runs for each row
What is a CTE (Common Table Expression)?
A Common Table Expression (CTE) is a temporary named result set defined using the WITH clause. It makes queries more readable by breaking them into logical steps.
Think of it as creating a named mini-table that you can reference later in the same query.
Why Use CTEs?
- Improved Readability — Break complex logic into named, logical steps instead of deeply nested subqueries
- Reusability — Define a result set once and reference it multiple times in the same query
-
Easier Debugging — You can
SELECT *from any CTE individually to test and verify intermediate results - Better Organization — Complex queries become a series of clear building blocks rather than one giant nested statement
- Logical Flow — You can build your query step by step, making it easier to understand the thought process
- Reduced Repetition — Avoid repeating the same subquery logic multiple times
CTEs are especially powerful when you need to perform multiple transformations or aggregations on the same dataset before joining everything together.
Basic CTE Example: Student averages
WITH student_avg AS (
SELECT
student_id,
AVG(marks) AS avg_marks
FROM exam_results
GROUP BY student_id
)
SELECT *
FROM student_avg;
Types and Use Cases of CTEs
1. Simple CTE
WITH high_scores AS (
SELECT *
FROM exam_results
WHERE marks > 80
)
SELECT * FROM high_scores;
2. Multiple CTEs
WITH student_avg AS (
SELECT
student_id,
AVG(marks) AS avg_marks
FROM exam_results
GROUP BY student_id
),
top_students AS (
SELECT *
FROM student_avg
WHERE avg_marks > 85
)
SELECT * FROM top_students;
3. Recursive CTE
Used for hierarchical or recursive data (e.g., organizational charts, bill of materials). Advanced topic — not ccovered in this article.
Subqueries vs CTEs
| Aspect | Subqueries | CTEs (Common Table Expressions) |
|---|---|---|
| Readability | Can become hard to read when deeply nested | Much cleaner and more structured |
| Reusability | Logic must be repeated if used multiple times | Defined once and can be reused multiple times |
| Performance | Simple subqueries are fast; correlated ones can be slow | Often better with modern optimizers; depends on usage |
| Debugging | More difficult | Easier (you can SELECT from each CTE separately) |
| Scope | Limited to the immediate query part | Available throughout the entire query |
When to Use Each
| Use Case | Recommended Choice | Reason |
|---|---|---|
| Simple filtering or single value | Subquery | Quick and concise |
| Complex, multi-step logic | CTE | Better organization and readability |
| Reusable intermediate results | CTE | Defined once, used multiple times |
| Row-by-row comparisons | Correlated Subquery | Runs for each row and can reference outer query values |
| Readability & maintainability | CTE | Easier to read, debug, and maintain |
Conclusion
Both subqueries and CTEs help break down complex SQL problems, but they do so in different ways:
- Subqueries are great for quick, inline logic that fits naturally inside another clause.
- CTEs shine when your query becomes more complex and you want maximum readability and maintainability.
As you progress, the goal shifts from “writing queries that work” to “writing queries that are clear, efficient, and easy to maintain.”
Happy querying!
Top comments (0)