Introduction
When working with databases, there comes a time when a simple SELECT statement is not enough to answer complex questions. Imagine you need to find customers who spent more than the average spent by all customers, or products that have never been sold, or employees who earn more than their department's average salary. These are the moments when subqueries and Common Table Expressions (CTEs) become essential tools in your SQL toolkit.
This article explores subqueries and CTEs in depth, explaining what they are, how they work, when to use each, and most importantly—a clear comparison to help you choose the right approach for your queries.
What is a Subquery?
A subquery is a query nested inside another SQL query. It acts as a "query within a query," where the inner query provides data or conditions that the outer query uses. Think of it as solving a problem in steps: first, you calculate something small, then you use that result in the main query.
Basic Syntax
SELECT column_name
FROM table_name
WHERE column_name OPERATOR (
SELECT column_name
FROM table_name
WHERE condition
);
Simple Example
Suppose you want to find products that cost more than the average price:
SELECT product_name, price
FROM Products
WHERE price > (
SELECT AVG(price)
FROM Products
);
In this example:
- The inner query calculates the average price
- The outer query retrieves products with prices above that average
Different Types of Subqueries
Subqueries can be classified in two ways: by their position in the query and by their behavior.
1. By Position
a. Subquery in WHERE Clause (Scalar Subquery)
Returns a single value (one row, one column):
SELECT product_name, price
FROM Products
WHERE price > (
SELECT AVG(price)
FROM Products
);
b. Subquery in FROM Clause (Derived Table)
Returns a temporary table used by the outer query:
SELECT customer_name, total_spent
FROM (
SELECT c.customer_name, SUM(s.quantity * p.price) AS total_spent
FROM Customers c
JOIN Sales s ON c.customer_id = s.customer_id
JOIN Products p ON s.product_id = p.product_id
GROUP BY c.customer_name
) AS customer_spending
WHERE total_spent > 1000;
c. Subquery in SELECT Clause (Scalar Again)
Adds a calculated column:
SELECT product_name, price,
(SELECT AVG(price) FROM Products) AS average_price
FROM Products;
2. By Behavior
a. Correlated Subquery
References the outer query's table. It runs once for each row in the outer query:
SELECT p.product_id, p.product_name, p.category, p.price
FROM Products p
WHERE p.price > (
SELECT AVG(p2.price)
FROM Products p2
WHERE p2.category = p.category
);
This finds products priced higher than their category's average.
b. Non-Correlated Subquery
Does not reference the outer query's table. Runs once and provides a static value:
SELECT product_name, price
FROM Products
WHERE price > (
SELECT AVG(price)
FROM Products
);
c. Nested Subquery
Multiple levels of subqueries:
SELECT customer_name
FROM Customers
WHERE customer_id IN (
SELECT customer_id
FROM Sales
WHERE product_id IN (
SELECT product_id
FROM Products
WHERE category = 'Electronics'
)
);
When to Use Subqueries
Subqueries are used in the following scenarios:
1. Filtering with Aggregated Data
Find customers who spent more than average:
SELECT customer_name, total_spent
FROM (
SELECT c.customer_name, SUM(p.price * s.quantity) AS total_spent
FROM Customers c
JOIN Sales s ON c.customer_id = s.customer_id
JOIN Products p ON s.product_id = p.product_id
GROUP BY c.customer_name
) AS spending
WHERE total_spent > (
SELECT AVG(total_spent)
FROM (
SELECT SUM(p.price * s.quantity) AS total_spent
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY s.customer_id
) AS all_spending
);
2. Finding Records That Do Not Exist
Find products never sold:
SELECT p.product_id, p.product_name
FROM Products p
WHERE p.product_id NOT IN (
SELECT product_id
FROM Sales
);
3. Comparison Within Groups
Find products priced higher than their category average:
SELECT p.product_name, p.price, p.category
FROM Products p
WHERE p.price > (
SELECT AVG(p2.price)
FROM Products p2
WHERE p2.category = p.category
);
4. Dynamic Filtering
Find customers who registered earlier than average:
SELECT customer_name, registration_date
FROM Customers
WHERE registration_date < (
SELECT AVG(registration_date)
FROM Customers
);
What is a CTE (Common Table Expression)?
A Common Table Expression (CTE) is a named temporary result set that exists only for the duration of a single SQL statement. It provides a way to write auxiliary statements that can be referenced within a larger query, making complex queries more readable and organized.
Basic Syntax
WITH cte_name AS (
-- CTE query
SELECT columns
FROM table_name
WHERE condition
)
-- Main query referencing CTE
SELECT *
FROM cte_name
WHERE condition;
Simple Example
Find products above average price using CTE:
WITH AveragePrice AS (
SELECT AVG(price) AS avg_price
FROM Products
)
SELECT product_name, price
FROM Products p
CROSS JOIN AveragePrice avg
WHERE p.price > avg.avg_price;
Different Types of CTEs
1. Non-Recursive CTE
The most common type—processes data without self-referencing:
WITH CustomerSpending AS (
SELECT c.customer_id,
c.customer_name,
SUM(s.quantity * p.price) AS total_spent
FROM Customers c
JOIN Sales s ON c.customer_id = s.customer_id
JOIN Products p ON s.product_id = p.product_id
GROUP BY c.customer_id, c.customer_name
)
SELECT customer_name, total_spent
FROM CustomerSpending
WHERE total_spent > 500;
2. Recursive CTE
References itself to process hierarchical data. Useful for organizational charts or tree structures:
WITH RECURSIVE org_chart AS (
-- Base case: top-level managers
SELECT employee_id, manager_id, employee_name, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees reporting to someone
SELECT e.employee_id, e.manager_id, e.employee_name, oc.level + 1
FROM employees e
JOIN org_chart ec ON e.manager_id = ec.employee_id
)
SELECT * FROM org_chart;
3. Multiple CTEs
Chain multiple CTEs in one statement:
WITH
CustomerSpending AS (
SELECT customer_id, SUM(quantity * price) AS total
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY customer_id
),
AvgSpending AS (
SELECT AVG(total) AS avg_spent
FROM CustomerSpending
)
SELECT cs.customer_id, cs.total
FROM CustomerSpending cs
CROSS JOIN AvgSpending av
WHERE cs.total > av.avg_spent;
Use Cases of CTEs
1. Simplifying Complex Queries
Instead of nesting multiple subqueries:
-- Using CTEs for readability
WITH
MonthlySales AS (
SELECT DATE_TRUNC('month', sale_date) AS month,
SUM(quantity * price) AS total_sales
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY DATE_TRUNC('month', sale_date)
),
AverageMonthlySales AS (
SELECT AVG(total_sales) AS avg_monthly
FROM MonthlySales
)
SELECT month, total_sales
FROM MonthlySales ms
CROSS JOIN AverageMonthlySales ams
WHERE ms.total_sales > ams.avg_monthly;
2. Reusing the Same Calculation
Calculate once, reference multiple times:
WITH
SalesSummary AS (
SELECT product_id,
SUM(quantity) AS total_qty,
SUM(quantity * price) AS total_revenue
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY product_id
)
-- Reference the CTE twice
SELECT 'Top Products' AS category, COUNT(*) AS count
FROM SalesSummary
WHERE total_revenue > 1000
UNION ALL
SELECT 'Low Performers', COUNT(*)
FROM SalesSummary
WHERE total_qty < 10;
3. Working with Hierarchical Data
Employee organizational structure:
WITH RECURSIVE OrgStructure AS (
SELECT employee_id, employee_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.employee_name, e.manager_id, os.level + 1
FROM employees e
JOIN OrgStructure os ON e.manager_id = os.employee_id
)
SELECT * FROM OrgStructure;
4. Running Totals and Window Functions
WITH DailySales AS (
SELECT sale_date,
SUM(quantity * price) AS daily_total
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY sale_date
)
SELECT sale_date,
daily_total,
SUM(daily_total) OVER (ORDER BY sale_date) AS running_total
FROM DailySales;
Subqueries vs CTEs: A Clear Comparison
| Aspect | Subquery | CTE |
|---|---|---|
| Definition | Nested query inside another query | Named temporary result set |
| Readability | Can become messy with nesting | More readable and organized |
| Performance | Similar execution time | Similar execution time |
| Reusability | Cannot be referenced multiple times | Can be referenced multiple times |
| Debugging | Harder to debug | Easier to debug step by step |
| Recursive | Not supported | Supported |
| Scope | Limited to single query | Available throughout the statement |
Performance Comparison
In terms of performance, both subqueries and CTEs typically execute similarly because the database optimizer converts both to the same execution plan. However:
| Factor | Subquery | CTE |
|---|---|---|
| Optimization | May run multiple times (correlated) | Runs once |
| Materialization | May be materialized by optimizer | May be materialized |
| Index Usage | Uses indexes effectively | Uses indexes effectively |
Key Takeaway: Performance difference is negligible in most cases. Choose based on readability and maintainability.
When to Use Subqueries
- Simple filtering with aggregate values
- Quick, one-off queries
- When the logic is straightforward
- When working with limited data
Example:
-- Simple filter—use subquery
SELECT product_name, price
FROM Products
WHERE price > (
SELECT AVG(price)
FROM Products
);
When to Use CTEs
- Complex, multi-step logic
- When you need to reference the same calculation multiple times
- Recursive operations (hierarchical data)
- Improved readability for you and your team
Example:
-- Complex logic—use CTE
WITH
CustomerSpending AS (
SELECT customer_id, SUM(quantity * price) AS total
FROM Sales s
JOIN Products p ON s.product_id = p.product_id
GROUP BY customer_id
),
AvgSpending AS (
SELECT AVG(total) AS avg_spent
FROM CustomerSpending
)
SELECT cs.*
FROM CustomerSpending cs
CROSS JOIN AvgSpending av
WHERE cs.total > av.avg_spent;
Summary
Subqueries
| Type | Use Case |
|---|---|
| Scalar in WHERE | Filtering with aggregate values |
| Derived Table (FROM) | Complex calculations as temporary table |
| Correlated | Row-by-row comparisons |
| Nested | Multi-level filtering |
CTEs
| Type | Use Case |
|---|---|
| Non-Recursive | Simplifying complex queries |
| Recursive | Hierarchical data |
| Multiple | Chaining calculations |
Key Takeaways
Subqueries are best for simple, one-time calculations where you filter or compare against a single aggregate value.
CTEs shine when your logic is complex, multi-step, or needs to be referenced more than once.
Choose readability: If your query is hard to read, use a CTE to break it into logical steps.
Performance is equal: In most databases, the optimizer treats both similarly.
CTEs are more maintainable: If someone else needs to understand your query, CTEs make it easier.
Final Recommendation
| Scenario | Recommended |
|---|---|
| Products above average price | Subquery |
| Customers who spent more than average | Subquery or CTE |
| Products never sold | Subquery |
| Hierarchical employee data | CTE (Recursive) |
| Multiple referenced calculations | CTE |
| Complex multi-step analysis | CTE |
Remember: there is no strict rule; choose the tool that makes your code clearer and easier to maintain.
Top comments (0)