One SQL query to traverse entire hierarchies. No loops, no N+1 queries, no tears. Just elegant recursive CTEs.
Your product manager walks up to your desk with that look. You know the one.
"Hey, can you pull all products in the 'Electronics' category? Oh, and include all subcategories too. And their subcategories. You know, the whole tree."
Your internal monologue: "Oh no. Not the N+1 problem again."
Your options:
- Multiple database queries - SELECT children, then grandchildren, then great-grandchildren... (Slow. So slow.)
- Complex application logic - Recursive functions that hit the database repeatedly (N+1 hell)
- Nested loops - Building the tree in code with increasingly unreadable logic (Spaghetti)
Or... you could write one elegant SQL query with a recursive CTE and go grab coffee while your teammates wonder if you're a wizard.
This is Part 2 of 3 in our CTE series. In Part 1, we covered basic CTEs. Now we're going recursive.
TL;DR
- Recursive CTEs = CTEs that reference themselves to traverse hierarchies
- Perfect for: org charts, category trees, file systems, comment threads, bill of materials
- Two parts: Anchor (starting point) + Recursive member (the magic)
- One query instead of N queries or complex loops
- Watch out for: Infinite loops, memory usage, runaway recursion
- Always include depth limiters for safety
This series:
- Part 1: Basic CTEs (readability & maintenance)
- Part 2 (this): Recursive CTEs for hierarchical data
- Part 3: Using CTEs with ORMs that don't support them
Quick glossary:
- Recursive CTE: A CTE that references itself to traverse tree or graph structures
- Anchor member: The starting point (roots of your tree)
- Recursive member: The part that references the CTE itself
- Hierarchy/Tree: Data with parent-child relationships (employees → managers, categories → subcategories)
- UNION ALL: Combines results from anchor and recursive parts without removing duplicates (faster than UNION, which we want here)
Quick Recap: What's a CTE Again?
From Part 1: CTEs are named subqueries that make complex SQL readable and maintainable.
WITH recent_orders AS (
SELECT user_id FROM orders WHERE order_date > '2025-01-01'
)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM recent_orders);
That's a non-recursive CTE. It runs once, produces a result, done.
Now let's make it recursive. 🔄
The Hierarchy Nightmare
Hierarchical data is everywhere in software:
- E-commerce: Categories → Subcategories → Products
- Corporate: CEO → VPs → Directors → Managers → Employees
- File systems: Root → Folders → Subfolders → Files
- Social: Posts → Comments → Replies → Nested replies
- Manufacturing: Products → Assemblies → Components → Raw materials
The Traditional Approaches All Suck
Approach 1: Multiple Queries (The Slow One)
# Get parent category
parent = db.query("SELECT * FROM categories WHERE id = 1")
# Get children
children = db.query("SELECT * FROM categories WHERE parent_id = 1")
# Get grandchildren
for child in children:
grandchildren = db.query(f"SELECT * FROM categories WHERE parent_id = {child.id}")
# And so on... 😱
- Slow: N+1 queries (or worse: N² queries)
- Network overhead for each query
- Scales terribly
Approach 2: Recursive Application Code (The Bug Factory)
def get_all_descendants(category_id, visited=None):
if visited is None:
visited = set()
if category_id in visited: # Cycle detection
return []
visited.add(category_id)
children = db.query(
"SELECT * FROM categories WHERE parent_id = ?",
[category_id]
)
results = children
for child in children:
results.extend(get_all_descendants(child.id, visited))
return results
- Still N+1 queries (one per level)
- Cycle detection in application code (bugs everywhere)
- Stack overflow on deep hierarchies
- Complex to test and maintain
Approach 3: Materialized Paths (The Maintenance Nightmare)
-- Store full path in each row
CREATE TABLE categories (
id INT,
name VARCHAR(100),
path VARCHAR(1000) -- "/1/5/23/"
);
-- Query becomes simple
SELECT * FROM categories WHERE path LIKE '/1/%';
- Fast queries ✅
- Nightmare to maintain ❌
- Breaks on category moves
- Path length limits
- Denormalized data
There's a better way. Let the database do what databases are good at: set operations and recursion.
Anatomy of a Recursive CTE
A recursive CTE has two parts connected by UNION ALL:
WITH RECURSIVE cte_name AS (
-- Part 1: ANCHOR MEMBER (starting point)
SELECT id, name, parent_id, 0 AS level
FROM table_name
WHERE parent_id IS NULL -- Root rows
UNION ALL
-- Part 2: RECURSIVE MEMBER (the magic)
SELECT t.id, t.name, t.parent_id, cte.level + 1
FROM table_name t
JOIN cte_name cte ON t.parent_id = cte.id -- Reference itself!
)
SELECT * FROM cte_name;
How It Works
- Execute anchor member → Get initial rows (the roots)
- Execute recursive member using those rows
- Repeat step 2 with the new rows
- Stop when no new rows are returned
- UNION ALL combines everything into the final result
Visual example:
Iteration 0 (Anchor):
SELECT WHERE parent_id IS NULL → Returns: [CEO]
Iteration 1 (Recursive):
JOIN with [CEO] → Returns: [VP1, VP2, VP3]
Iteration 2 (Recursive):
JOIN with [VP1, VP2, VP3] → Returns: [Dir1, Dir2, Dir3, Dir4]
Iteration 3 (Recursive):
JOIN with [Dir1...Dir4] → Returns: [Manager1, Manager2, ...]
Iteration 4 (Recursive):
JOIN with [Manager1...] → Returns: [Employee1, Employee2, ...]
Iteration 5 (Recursive):
JOIN → Returns: [] ← STOP, no more rows
Final result: CEO + all VPs + all Directors + all Managers + all Employees
Key insight: The database handles the iteration. You just define the rules.
Real-World Example 1: Company Org Chart
Let's build a complete organizational hierarchy with all the trimmings.
Setup:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
department VARCHAR(50),
salary DECIMAL(10,2)
);
The Query:
WITH RECURSIVE org_hierarchy AS (
-- Anchor: Start with CEO (no manager)
SELECT
id,
name,
manager_id,
department,
salary,
0 as level,
CAST(name AS VARCHAR(1000)) as hierarchy_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Find all direct reports
SELECT
e.id,
e.name,
e.manager_id,
e.department,
e.salary,
oh.level + 1,
CONCAT(oh.hierarchy_path, ' > ', e.name)
FROM employees e
JOIN org_hierarchy oh ON e.manager_id = oh.id
)
SELECT
level,
REPEAT(' ', level) || name as indented_name,
department,
salary,
hierarchy_path
FROM org_hierarchy
ORDER BY hierarchy_path;
What You Get:
level | indented_name | department | salary | hierarchy_path
------|---------------------|------------|---------|---------------------------
0 | Sarah Chen | Executive | 250000 | Sarah Chen
1 | Mike Johnson | Sales | 150000 | Sarah Chen > Mike Johnson
2 | Anna Smith | Sales | 95000 | Sarah Chen > Mike Johnson > Anna Smith
2 | Bob Williams | Sales | 92000 | Sarah Chen > Mike Johnson > Bob Williams
1 | Lisa Anderson | Engineering| 160000 | Sarah Chen > Lisa Anderson
2 | Tom Davis | Engineering| 110000 | Sarah Chen > Lisa Anderson > Tom Davis
Benefits:
✅ Complete org structure in one query
✅ Indentation shows hierarchy visually
✅ Full path from CEO to each employee
✅ Can add filters, aggregations, whatever you need
Want just one manager's team? Change the anchor:
WHERE id = 123 -- Start from specific manager
Real-World Example 2: E-commerce Categories
Your product manager's request from the intro. Here's how to solve it elegantly.
Problem: Show all products in "Electronics" including all subcategories (Phones → Smartphones → iPhone, etc.)
The Query:
WITH RECURSIVE category_tree AS (
-- Anchor: Start with parent category
SELECT id, name, parent_id, 0 as depth
FROM categories
WHERE id = :category_id -- "Electronics" = 5
UNION ALL
-- Recursive: Get all subcategories
SELECT c.id, c.name, c.parent_id, ct.depth + 1
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT
DISTINCT p.id,
p.name,
p.price,
c.name as category_name,
ct.depth as category_depth
FROM category_tree ct
JOIN products p ON p.category_id = ct.id
JOIN categories c ON c.id = ct.id
ORDER BY ct.depth, c.name, p.name;
Result: All products in Electronics, Phones, Smartphones, Tablets, Laptops, Gaming... the entire subtree. One query.
Without recursive CTE: You'd need to either:
- Query each level separately (unknown depth = impossible)
- Load all categories and traverse in code (slow, complex)
- Maintain materialized paths (fragile)
With recursive CTE: Define the traversal rules once, database handles the rest.
Common Patterns Quick Reference
Here are battle-tested patterns you can adapt for your use cases.
Bill of Materials (Manufacturing)
Calculate all components needed to build a product, including nested assemblies:
WITH RECURSIVE bom AS (
-- Anchor: Top-level product
SELECT
product_id,
component_id,
quantity,
0 as level
FROM bill_of_materials
WHERE product_id = :target_product
UNION ALL
-- Recursive: Components of components
SELECT
bom_next.product_id,
bom_next.component_id,
bom.quantity * bom_next.quantity, -- Multiply quantities!
bom.level + 1
FROM bill_of_materials bom_next
JOIN bom ON bom_next.product_id = bom.component_id
)
SELECT
component_id,
SUM(quantity) as total_needed
FROM bom
GROUP BY component_id;
Use case: "To build 100 bikes, I need 200 wheels, 400 spokes, 200 tires..."
File System Traversal
Calculate total size of a directory and all subdirectories:
WITH RECURSIVE dir_tree AS (
-- Anchor: Starting directory
SELECT id, name, parent_id, size, 0 as depth
FROM filesystem
WHERE id = :directory_id
UNION ALL
-- Recursive: All subdirectories and files
SELECT
f.id,
f.name,
f.parent_id,
f.size,
dt.depth + 1
FROM filesystem f
JOIN dir_tree dt ON f.parent_id = dt.id
)
SELECT
SUM(size) as total_size,
COUNT(*) as total_items,
MAX(depth) as max_depth
FROM dir_tree;
Use case: "Show disk usage for this folder and everything inside it."
Social Networks (Degrees of Separation)
Find all people within 3 degrees of separation:
WITH RECURSIVE connections AS (
-- Anchor: Direct friends
SELECT
user_id,
friend_id,
1 as degree
FROM friendships
WHERE user_id = :start_user
UNION ALL
-- Recursive: Friends of friends
SELECT
c.user_id,
f.friend_id,
c.degree + 1
FROM connections c
JOIN friendships f ON c.friend_id = f.user_id
WHERE c.degree < 3 -- Stop at 3 degrees
)
SELECT DISTINCT
friend_id,
MIN(degree) as closest_degree
FROM connections
GROUP BY friend_id
ORDER BY closest_degree;
Use case: "People you may know" features, network analysis, influence mapping.
Performance: The Good and The Careful
The Good
✅ Single query vs multiple round trips to database
✅ Database-optimized traversal using indexes and join strategies
✅ Clean code that's easy to understand and maintain
✅ Handles arbitrary depth without hardcoding levels
✅ Set-based operations instead of row-by-row processing
The Careful
⚠️ Memory-intensive on large hierarchies (thousands of nodes)
⚠️ Infinite loops if your data has cycles (A → B → C → A)
⚠️ Performance varies significantly between database systems
⚠️ No early termination - processes entire tree even if you only need part
Safety First: Always Include Depth Limiters
Never write a recursive CTE without a depth limiter:
WITH RECURSIVE tree AS (
SELECT id, parent_id, 0 as level
FROM table_name
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.parent_id, tree.level + 1
FROM table_name t
JOIN tree ON t.parent_id = tree.id
WHERE tree.level < 10 -- 🎯 SAFETY LIMIT
)
SELECT * FROM tree;
Why?
- Protects against runaway recursion
- Catches data quality issues (cycles, infinite loops)
- Prevents database from running out of memory
- Makes query timeout predictable
How to choose the limit?
- Know your data: What's the realistic max depth?
- Add buffer: If org chart is 6 levels, set limit to 10
- Monitor: Log warnings when limit is hit (data quality issue)
Detecting Cycles in Your Data
If you suspect your data might have cycles (A → B → C → A), add cycle detection:
WITH RECURSIVE tree AS (
-- Anchor: Include path as array
SELECT
id,
parent_id,
ARRAY[id] as path,
0 as level
FROM table_name
WHERE parent_id IS NULL
UNION ALL
-- Recursive: Check if current id is already in path
SELECT
t.id,
t.parent_id,
tree.path || t.id, -- Append to path
tree.level + 1
FROM table_name t
JOIN tree ON t.parent_id = tree.id
WHERE NOT t.id = ANY(tree.path) -- 🎯 CYCLE DETECTION
AND tree.level < 10
)
SELECT * FROM tree;
How it works:
-
patharray stores all ancestors -
NOT t.id = ANY(tree.path)stops when we'd revisit a node - Prevents infinite loops
- Makes cycles visible in your data
Production tip: If cycles exist, fix your data. Don't rely on detection as a permanent solution.
Use EXPLAIN ANALYZE
Always verify performance with real data:
EXPLAIN ANALYZE
WITH RECURSIVE org_hierarchy AS (
-- Your recursive CTE here
)
SELECT * FROM org_hierarchy;
Look for:
- Recursive scan operations
- Number of iterations
- Memory usage (work_mem in PostgreSQL)
- Total execution time
Red flags:
- Execution time > 1 second for < 10,000 nodes
- Many more iterations than expected depth
- Hash joins on large intermediate results
Database Support Reality Check
Recursive CTEs are well-supported across major databases:
| Database | Support Since | Notes |
|---|---|---|
| PostgreSQL | 2009 (v8.4) | ⭐ Excellent: reliable, performant, well-documented |
| SQL Server | 2005 | ⭐ Native support from the beginning |
| Oracle | 2010 (11g R2) | ✅ Full support (also has CONNECT BY for legacy) |
| MySQL | 2018 (v8.0) | ✅ Supported but newer, test thoroughly |
| MariaDB | 2018 (v10.2.2) | ✅ Follows MySQL implementation |
Translation: Every modern database has recursive CTEs. They're not experimental. They're production-ready.
But your ORM still doesn't care. (We'll fix that in Part 3.)
When to Use Recursive CTEs
✅ Use Them For:
- Hierarchical data with parent-child relationships (always)
- Unknown depth structures (can't hardcode JOIN levels)
- Graph traversal with proper cycle protection
- One-time traversals where you'd otherwise write loops
- Replacing N+1 queries with single database operation
❌ Don't Use Them For:
- Simple parent-child (one level) - regular JOIN is simpler
-- Just use a JOIN
SELECT p.*, c.*
FROM parents p
JOIN children c ON c.parent_id = p.id;
- Known fixed depth - explicit JOINs are clearer
-- If you always need exactly 2 levels, be explicit
SELECT p.*, c.*, gc.*
FROM parents p
JOIN children c ON c.parent_id = p.id
JOIN grandchildren gc ON gc.parent_id = c.id;
- Non-hierarchical data - basic CTEs are better
- When application logic is simpler - sometimes code is clearer
The rule: Use recursive CTEs when the alternative is multiple queries, complex loops, or unknown depth. Don't use them just to be clever.
Coming Next
You now understand both types of CTEs:
- Non-recursive (Part 1): Clean, maintainable complex queries
- Recursive (Part 2): Elegant hierarchy traversal
But there's one problem: Doctrine, Hibernate, Eloquent, and most ORMs don't support CTEs natively.
In Part 3, we'll cover:
- ORMs that DO support CTEs (SQLAlchemy, jOOQ, Django 4.2+)
- Strategies for hostile ORMs (Repository pattern, database views, hybrid approach)
- Security (parameterized queries - non-negotiable)
- When to use CTEs vs alternatives (decision tree)
- Performance tuning (forcing materialization when needed)
"Using CTEs When Your ORM Says No (The Lazy Developer's Survival Guide)" - Coming soon.
Bottom Line
Recursive CTEs transform complex hierarchical problems into elegant, performant solutions.
The psychopathic approach: Write recursive functions with N+1 queries, hope for the best, watch it crash in production, blame the database.
The masochistic touch: Know there's a better way but stick with painful code because "it works" and you're scared of SQL.
The professional approach: Write one recursive CTE, let the database do what it's optimized for, go home on time.
Your ORM probably doesn't support them. That's fine - we'll work around it in Part 3. But don't let tool limitations stop you from solving problems elegantly.
One query to rule them all, one query to find them. One query to bring them all, and in the resultset bind them.
Got a hierarchy horror story? Share it in the comments. We've all fought the N+1 dragon.
📬 Want essays like this in your inbox?
I just launched a newsletter about thinking clearly in tech — no spam, no fluff.
Subscribe here: https://buttondown.com/efficientlaziness
Efficient Laziness — Think once, well, and move forward.
Top comments (0)