Stop Using Subqueries: 3 Advanced SQL CTE Patterns That Saved My Production Database

#sql #database #programming #dataengineering

We’ve all seen it. The massive, deeply nested SQL query with subqueries inside subqueries.
It’s impossible to read, a nightmare to debug, and usually performs terribly.

Early in my career as a BI Engineer, I wrote queries like that. Then, I learned about CTEs (Common Table Expressions).

Using the WITH clause changed how I write SQL forever. But simply replacing a subquery with a CTE is just the beginning.

Here are 3 advanced CTE patterns I use in production to handle millions of records cleanly and efficiently.

1. The "Pipeline" Pattern (Breaking Down Complex Logic)

The most common mistake is trying to do all aggregations, joins, and filtering in one giant SELECT statement.

Instead, use CTEs to create a logical "pipeline" where each step does exactly one thing. This makes debugging incredibly easy because you can SELECT * from any intermediate step to see what the data looks like.

-- Bad: Nested Subquery Nightmare
SELECT customer_id, total_spent
FROM (
    SELECT customer_id, SUM(amount) as total_spent
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY customer_id
)
WHERE total_spent > 1000;


-- Good: The CTE Pipeline
WITH completed_orders AS (
    -- Step 1: Filter raw data
    SELECT customer_id, amount
    FROM orders
    WHERE status = 'COMPLETED'
),
customer_totals AS (
    -- Step 2: Aggregate
    SELECT customer_id, SUM(amount) as total_spent
    FROM completed_orders
    GROUP BY customer_id
)
-- Final Output
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent > 1000;

2. The Recursive CTE (Navigating Hierarchies)

If you ever need to query hierarchical data—like an employee org chart, folder structures, or category trees—a recursive CTE is your best friend.

A recursive CTE references itself to loop through data until a condition is met. Let's say we want to find the entire management chain above a specific employee.

WITH RECURSIVE OrgChart AS (
    -- Base Case: Start with the specific employee
    SELECT employee_id, name, manager_id, 1 as level
    FROM employees
    WHERE employee_id = 405  -- Let's say this is 'Datta'

    UNION ALL

    -- Recursive Step: Find the manager of the previous level
    SELECT e.employee_id, e.name, e.manager_id, o.level + 1
    FROM employees e
    INNER JOIN OrgChart o ON e.employee_id = o.manager_id
)

SELECT name, level
FROM OrgChart
ORDER BY level;

3. The "Deduplication" Pattern (Using Window Functions)

Data engineering is 50% writing pipelines and 50% cleaning up duplicate records.

When you have duplicates and only want to keep the most recent record for each user, combining a CTE with the ROW_NUMBER() window function is the cleanest, most performant way to do it.

WITH RankedLogins AS (
    SELECT 
        user_id,
        login_timestamp,
        ip_address,
        -- Assign a row number partitioned by user, ordered by newest first
        ROW_NUMBER() OVER (
            PARTITION BY user_id 
            ORDER BY login_timestamp DESC
        ) as rn
    FROM user_logins
)

-- Select only the most recent login (rn = 1)
SELECT user_id, login_timestamp, ip_address
FROM RankedLogins
WHERE rn = 1;

🎯 The Bottom Line

CTEs aren't just syntax sugar; they are a structural framework for writing maintainable code.

When you are building BI dashboards or automated reporting pipelines, the SQL you write today needs to be readable by the engineer who inherits it 6 months from now. CTEs ensure your logic is modular, readable, and easy to test.

If you found this useful, I regularly share insights on Data Engineering, Python, and BI architecture over at my portfolio: dattasable.com

What is your favorite SQL trick that most beginners don't know about? Drop it in the comments! 👇