We’ve all seen it. The massive, deeply nested SQL query with subqueries inside subqueries.
It’s impossible to read, a nightmare to debug, and usually performs terribly.
Early in my career as a BI Engineer, I wrote queries like that. Then, I learned about CTEs (Common Table Expressions).
Using the WITH clause changed how I write SQL forever. But simply replacing a subquery with a CTE is just the beginning.
Here are 3 advanced CTE patterns I use in production to handle millions of records cleanly and efficiently.
1. The "Pipeline" Pattern (Breaking Down Complex Logic)
The most common mistake is trying to do all aggregations, joins, and filtering in one giant SELECT statement.
Instead, use CTEs to create a logical "pipeline" where each step does exactly one thing. This makes debugging incredibly easy because you can SELECT * from any intermediate step to see what the data looks like.
-- Bad: Nested Subquery Nightmare
SELECT customer_id, total_spent
FROM (
SELECT customer_id, SUM(amount) as total_spent
FROM orders
WHERE status = 'COMPLETED'
GROUP BY customer_id
)
WHERE total_spent > 1000;
-- Good: The CTE Pipeline
WITH completed_orders AS (
-- Step 1: Filter raw data
SELECT customer_id, amount
FROM orders
WHERE status = 'COMPLETED'
),
customer_totals AS (
-- Step 2: Aggregate
SELECT customer_id, SUM(amount) as total_spent
FROM completed_orders
GROUP BY customer_id
)
-- Final Output
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent > 1000;
2. The Recursive CTE (Navigating Hierarchies)
If you ever need to query hierarchical data—like an employee org chart, folder structures, or category trees—a recursive CTE is your best friend.
A recursive CTE references itself to loop through data until a condition is met. Let's say we want to find the entire management chain above a specific employee.
WITH RECURSIVE OrgChart AS (
-- Base Case: Start with the specific employee
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE employee_id = 405 -- Let's say this is 'Datta'
UNION ALL
-- Recursive Step: Find the manager of the previous level
SELECT e.employee_id, e.name, e.manager_id, o.level + 1
FROM employees e
INNER JOIN OrgChart o ON e.employee_id = o.manager_id
)
SELECT name, level
FROM OrgChart
ORDER BY level;
3. The "Deduplication" Pattern (Using Window Functions)
Data engineering is 50% writing pipelines and 50% cleaning up duplicate records.
When you have duplicates and only want to keep the most recent record for each user, combining a CTE with the ROW_NUMBER() window function is the cleanest, most performant way to do it.
WITH RankedLogins AS (
SELECT
user_id,
login_timestamp,
ip_address,
-- Assign a row number partitioned by user, ordered by newest first
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY login_timestamp DESC
) as rn
FROM user_logins
)
-- Select only the most recent login (rn = 1)
SELECT user_id, login_timestamp, ip_address
FROM RankedLogins
WHERE rn = 1;
🎯 The Bottom Line
CTEs aren't just syntax sugar; they are a structural framework for writing maintainable code.
When you are building BI dashboards or automated reporting pipelines, the SQL you write today needs to be readable by the engineer who inherits it 6 months from now. CTEs ensure your logic is modular, readable, and easy to test.
If you found this useful, I regularly share insights on Data Engineering, Python, and BI architecture over at my portfolio: dattasable.com
What is your favorite SQL trick that most beginners don't know about? Drop it in the comments! 👇
Top comments (2)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.