DEV Community

Ibrahim0695
Ibrahim0695

Posted on

Up down or Down up. Understanding Subqueries and CTEs in SQL: A Comprehensive Guide

Introduction

When working with databases, there comes a time when a simple SELECT statement is not enough to answer complex questions. Imagine you need to find customers who spent more than the average spent by all customers, or products that have never been sold, or employees who earn more than their department's average salary. These are the moments when subqueries and Common Table Expressions (CTEs) become essential tools in your SQL toolkit.

This article explores subqueries and CTEs in depth, explaining what they are, how they work, when to use each, and most importantly—a clear comparison to help you choose the right approach for your queries.


What is a Subquery?

A subquery is a query nested inside another SQL query. It acts as a "query within a query," where the inner query provides data or conditions that the outer query uses. Think of it as solving a problem in steps: first, you calculate something small, then you use that result in the main query.

Basic Syntax

SELECT column_name
FROM table_name
WHERE column_name OPERATOR (
    SELECT column_name
    FROM table_name
    WHERE condition
);
Enter fullscreen mode Exit fullscreen mode

Simple Example

Suppose you want to find products that cost more than the average price:

SELECT product_name, price
FROM Products
WHERE price > (
    SELECT AVG(price)
    FROM Products
);
Enter fullscreen mode Exit fullscreen mode

In this example:

  • The inner query calculates the average price
  • The outer query retrieves products with prices above that average

Different Types of Subqueries

Subqueries can be classified in two ways: by their position in the query and by their behavior.

1. By Position

a. Subquery in WHERE Clause (Scalar Subquery)

Returns a single value (one row, one column):

SELECT product_name, price
FROM Products
WHERE price > (
    SELECT AVG(price)
    FROM Products
);
Enter fullscreen mode Exit fullscreen mode

b. Subquery in FROM Clause (Derived Table)

Returns a temporary table used by the outer query:

SELECT customer_name, total_spent
FROM (
    SELECT c.customer_name, SUM(s.quantity * p.price) AS total_spent
    FROM Customers c
    JOIN Sales s ON c.customer_id = s.customer_id
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY c.customer_name
) AS customer_spending
WHERE total_spent > 1000;
Enter fullscreen mode Exit fullscreen mode

c. Subquery in SELECT Clause (Scalar Again)

Adds a calculated column:

SELECT product_name, price,
       (SELECT AVG(price) FROM Products) AS average_price
FROM Products;
Enter fullscreen mode Exit fullscreen mode

2. By Behavior

a. Correlated Subquery

References the outer query's table. It runs once for each row in the outer query:

SELECT p.product_id, p.product_name, p.category, p.price
FROM Products p
WHERE p.price > (
    SELECT AVG(p2.price)
    FROM Products p2
    WHERE p2.category = p.category
);
Enter fullscreen mode Exit fullscreen mode

This finds products priced higher than their category's average.

b. Non-Correlated Subquery

Does not reference the outer query's table. Runs once and provides a static value:

SELECT product_name, price
FROM Products
WHERE price > (
    SELECT AVG(price)
    FROM Products
);
Enter fullscreen mode Exit fullscreen mode

c. Nested Subquery

Multiple levels of subqueries:

SELECT customer_name
FROM Customers
WHERE customer_id IN (
    SELECT customer_id
    FROM Sales
    WHERE product_id IN (
        SELECT product_id
        FROM Products
        WHERE category = 'Electronics'
    )
);
Enter fullscreen mode Exit fullscreen mode

When to Use Subqueries

Subqueries are used in the following scenarios:

1. Filtering with Aggregated Data

Find customers who spent more than average:

SELECT customer_name, total_spent
FROM (
    SELECT c.customer_name, SUM(p.price * s.quantity) AS total_spent
    FROM Customers c
    JOIN Sales s ON c.customer_id = s.customer_id
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY c.customer_name
) AS spending
WHERE total_spent > (
    SELECT AVG(total_spent)
    FROM (
        SELECT SUM(p.price * s.quantity) AS total_spent
        FROM Sales s
        JOIN Products p ON s.product_id = p.product_id
        GROUP BY s.customer_id
    ) AS all_spending
);
Enter fullscreen mode Exit fullscreen mode

2. Finding Records That Do Not Exist

Find products never sold:

SELECT p.product_id, p.product_name
FROM Products p
WHERE p.product_id NOT IN (
    SELECT product_id
    FROM Sales
);
Enter fullscreen mode Exit fullscreen mode

3. Comparison Within Groups

Find products priced higher than their category average:

SELECT p.product_name, p.price, p.category
FROM Products p
WHERE p.price > (
    SELECT AVG(p2.price)
    FROM Products p2
    WHERE p2.category = p.category
);
Enter fullscreen mode Exit fullscreen mode

4. Dynamic Filtering

Find customers who registered earlier than average:

SELECT customer_name, registration_date
FROM Customers
WHERE registration_date < (
    SELECT AVG(registration_date)
    FROM Customers
);
Enter fullscreen mode Exit fullscreen mode

What is a CTE (Common Table Expression)?

A Common Table Expression (CTE) is a named temporary result set that exists only for the duration of a single SQL statement. It provides a way to write auxiliary statements that can be referenced within a larger query, making complex queries more readable and organized.

Basic Syntax

WITH cte_name AS (
    -- CTE query
    SELECT columns
    FROM table_name
    WHERE condition
)
-- Main query referencing CTE
SELECT *
FROM cte_name
WHERE condition;
Enter fullscreen mode Exit fullscreen mode

Simple Example

Find products above average price using CTE:

WITH AveragePrice AS (
    SELECT AVG(price) AS avg_price
    FROM Products
)
SELECT product_name, price
FROM Products p
CROSS JOIN AveragePrice avg
WHERE p.price > avg.avg_price;
Enter fullscreen mode Exit fullscreen mode

Different Types of CTEs

1. Non-Recursive CTE

The most common type—processes data without self-referencing:

WITH CustomerSpending AS (
    SELECT c.customer_id, 
           c.customer_name, 
           SUM(s.quantity * p.price) AS total_spent
    FROM Customers c
    JOIN Sales s ON c.customer_id = s.customer_id
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY c.customer_id, c.customer_name
)
SELECT customer_name, total_spent
FROM CustomerSpending
WHERE total_spent > 500;
Enter fullscreen mode Exit fullscreen mode

2. Recursive CTE

References itself to process hierarchical data. Useful for organizational charts or tree structures:

WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers
    SELECT employee_id, manager_id, employee_name, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees reporting to someone
    SELECT e.employee_id, e.manager_id, e.employee_name, oc.level + 1
    FROM employees e
    JOIN org_chart ec ON e.manager_id = ec.employee_id
)
SELECT * FROM org_chart;
Enter fullscreen mode Exit fullscreen mode

3. Multiple CTEs

Chain multiple CTEs in one statement:

WITH 
CustomerSpending AS (
    SELECT customer_id, SUM(quantity * price) AS total
    FROM Sales s
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY customer_id
),
AvgSpending AS (
    SELECT AVG(total) AS avg_spent
    FROM CustomerSpending
)
SELECT cs.customer_id, cs.total
FROM CustomerSpending cs
CROSS JOIN AvgSpending av
WHERE cs.total > av.avg_spent;
Enter fullscreen mode Exit fullscreen mode

Use Cases of CTEs

1. Simplifying Complex Queries

Instead of nesting multiple subqueries:

-- Using CTEs for readability
WITH 
MonthlySales AS (
    SELECT DATE_TRUNC('month', sale_date) AS month,
           SUM(quantity * price) AS total_sales
    FROM Sales s
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY DATE_TRUNC('month', sale_date)
),
AverageMonthlySales AS (
    SELECT AVG(total_sales) AS avg_monthly
    FROM MonthlySales
)
SELECT month, total_sales
FROM MonthlySales ms
CROSS JOIN AverageMonthlySales ams
WHERE ms.total_sales > ams.avg_monthly;
Enter fullscreen mode Exit fullscreen mode

2. Reusing the Same Calculation

Calculate once, reference multiple times:

WITH 
SalesSummary AS (
    SELECT product_id, 
           SUM(quantity) AS total_qty,
           SUM(quantity * price) AS total_revenue
    FROM Sales s
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY product_id
)
-- Reference the CTE twice
SELECT 'Top Products' AS category, COUNT(*) AS count
FROM SalesSummary
WHERE total_revenue > 1000

UNION ALL

SELECT 'Low Performers', COUNT(*)
FROM SalesSummary
WHERE total_qty < 10;
Enter fullscreen mode Exit fullscreen mode

3. Working with Hierarchical Data

Employee organizational structure:

WITH RECURSIVE OrgStructure AS (
    SELECT employee_id, employee_name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.employee_name, e.manager_id, os.level + 1
    FROM employees e
    JOIN OrgStructure os ON e.manager_id = os.employee_id
)
SELECT * FROM OrgStructure;
Enter fullscreen mode Exit fullscreen mode

4. Running Totals and Window Functions

WITH DailySales AS (
    SELECT sale_date, 
           SUM(quantity * price) AS daily_total
    FROM Sales s
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY sale_date
)
SELECT sale_date, 
       daily_total,
       SUM(daily_total) OVER (ORDER BY sale_date) AS running_total
FROM DailySales;
Enter fullscreen mode Exit fullscreen mode

Subqueries vs CTEs: A Clear Comparison

Aspect Subquery CTE
Definition Nested query inside another query Named temporary result set
Readability Can become messy with nesting More readable and organized
Performance Similar execution time Similar execution time
Reusability Cannot be referenced multiple times Can be referenced multiple times
Debugging Harder to debug Easier to debug step by step
Recursive Not supported Supported
Scope Limited to single query Available throughout the statement

Performance Comparison

In terms of performance, both subqueries and CTEs typically execute similarly because the database optimizer converts both to the same execution plan. However:

Factor Subquery CTE
Optimization May run multiple times (correlated) Runs once
Materialization May be materialized by optimizer May be materialized
Index Usage Uses indexes effectively Uses indexes effectively

Key Takeaway: Performance difference is negligible in most cases. Choose based on readability and maintainability.


When to Use Subqueries

  • Simple filtering with aggregate values
  • Quick, one-off queries
  • When the logic is straightforward
  • When working with limited data

Example:

-- Simple filter—use subquery
SELECT product_name, price
FROM Products
WHERE price > (
    SELECT AVG(price)
    FROM Products
);
Enter fullscreen mode Exit fullscreen mode

When to Use CTEs

  • Complex, multi-step logic
  • When you need to reference the same calculation multiple times
  • Recursive operations (hierarchical data)
  • Improved readability for you and your team

Example:

-- Complex logic—use CTE
WITH 
CustomerSpending AS (
    SELECT customer_id, SUM(quantity * price) AS total
    FROM Sales s
    JOIN Products p ON s.product_id = p.product_id
    GROUP BY customer_id
),
AvgSpending AS (
    SELECT AVG(total) AS avg_spent
    FROM CustomerSpending
)
SELECT cs.*
FROM CustomerSpending cs
CROSS JOIN AvgSpending av
WHERE cs.total > av.avg_spent;
Enter fullscreen mode Exit fullscreen mode

Summary

Subqueries

Type Use Case
Scalar in WHERE Filtering with aggregate values
Derived Table (FROM) Complex calculations as temporary table
Correlated Row-by-row comparisons
Nested Multi-level filtering

CTEs

Type Use Case
Non-Recursive Simplifying complex queries
Recursive Hierarchical data
Multiple Chaining calculations

Key Takeaways

  1. Subqueries are best for simple, one-time calculations where you filter or compare against a single aggregate value.

  2. CTEs shine when your logic is complex, multi-step, or needs to be referenced more than once.

  3. Choose readability: If your query is hard to read, use a CTE to break it into logical steps.

  4. Performance is equal: In most databases, the optimizer treats both similarly.

  5. CTEs are more maintainable: If someone else needs to understand your query, CTEs make it easier.


Final Recommendation

Scenario Recommended
Products above average price Subquery
Customers who spent more than average Subquery or CTE
Products never sold Subquery
Hierarchical employee data CTE (Recursive)
Multiple referenced calculations CTE
Complex multi-step analysis CTE

Remember: there is no strict rule; choose the tool that makes your code clearer and easier to maintain.


Top comments (0)