DEV Community

Cover image for Recursive CTEs: Because Writing Nested Loops for Trees is Psychopathic (and a Little Masochistic Too)
Pascal CESCATO
Pascal CESCATO

Posted on

Recursive CTEs: Because Writing Nested Loops for Trees is Psychopathic (and a Little Masochistic Too)

One SQL query to traverse entire hierarchies. No loops, no N+1 queries, no tears. Just elegant recursive CTEs.

Your product manager walks up to your desk with that look. You know the one.

"Hey, can you pull all products in the 'Electronics' category? Oh, and include all subcategories too. And their subcategories. You know, the whole tree."

Your internal monologue: "Oh no. Not the N+1 problem again."

Your options:

  1. Multiple database queries - SELECT children, then grandchildren, then great-grandchildren... (Slow. So slow.)
  2. Complex application logic - Recursive functions that hit the database repeatedly (N+1 hell)
  3. Nested loops - Building the tree in code with increasingly unreadable logic (Spaghetti)

Or... you could write one elegant SQL query with a recursive CTE and go grab coffee while your teammates wonder if you're a wizard.

This is Part 2 of 3 in our CTE series. In Part 1, we covered basic CTEs. Now we're going recursive.

TL;DR

  • Recursive CTEs = CTEs that reference themselves to traverse hierarchies
  • Perfect for: org charts, category trees, file systems, comment threads, bill of materials
  • Two parts: Anchor (starting point) + Recursive member (the magic)
  • One query instead of N queries or complex loops
  • Watch out for: Infinite loops, memory usage, runaway recursion
  • Always include depth limiters for safety

This series:

  • Part 1: Basic CTEs (readability & maintenance)
  • Part 2 (this): Recursive CTEs for hierarchical data
  • Part 3: Using CTEs with ORMs that don't support them

Quick glossary:

  • Recursive CTE: A CTE that references itself to traverse tree or graph structures
  • Anchor member: The starting point (roots of your tree)
  • Recursive member: The part that references the CTE itself
  • Hierarchy/Tree: Data with parent-child relationships (employees → managers, categories → subcategories)
  • UNION ALL: Combines results from anchor and recursive parts without removing duplicates (faster than UNION, which we want here)

Quick Recap: What's a CTE Again?

From Part 1: CTEs are named subqueries that make complex SQL readable and maintainable.

WITH recent_orders AS (
    SELECT user_id FROM orders WHERE order_date > '2025-01-01'
)
SELECT * FROM users 
WHERE id IN (SELECT user_id FROM recent_orders);
Enter fullscreen mode Exit fullscreen mode

That's a non-recursive CTE. It runs once, produces a result, done.

Now let's make it recursive. 🔄

The Hierarchy Nightmare

Hierarchical data is everywhere in software:

  • E-commerce: Categories → Subcategories → Products
  • Corporate: CEO → VPs → Directors → Managers → Employees
  • File systems: Root → Folders → Subfolders → Files
  • Social: Posts → Comments → Replies → Nested replies
  • Manufacturing: Products → Assemblies → Components → Raw materials

The Traditional Approaches All Suck

Approach 1: Multiple Queries (The Slow One)

# Get parent category
parent = db.query("SELECT * FROM categories WHERE id = 1")

# Get children
children = db.query("SELECT * FROM categories WHERE parent_id = 1")

# Get grandchildren
for child in children:
    grandchildren = db.query(f"SELECT * FROM categories WHERE parent_id = {child.id}")
    # And so on... 😱
Enter fullscreen mode Exit fullscreen mode
  • Slow: N+1 queries (or worse: N² queries)
  • Network overhead for each query
  • Scales terribly

Approach 2: Recursive Application Code (The Bug Factory)

def get_all_descendants(category_id, visited=None):
    if visited is None:
        visited = set()

    if category_id in visited:  # Cycle detection
        return []

    visited.add(category_id)

    children = db.query(
        "SELECT * FROM categories WHERE parent_id = ?", 
        [category_id]
    )

    results = children
    for child in children:
        results.extend(get_all_descendants(child.id, visited))

    return results
Enter fullscreen mode Exit fullscreen mode
  • Still N+1 queries (one per level)
  • Cycle detection in application code (bugs everywhere)
  • Stack overflow on deep hierarchies
  • Complex to test and maintain

Approach 3: Materialized Paths (The Maintenance Nightmare)

-- Store full path in each row
CREATE TABLE categories (
    id INT,
    name VARCHAR(100),
    path VARCHAR(1000)  -- "/1/5/23/"
);

-- Query becomes simple
SELECT * FROM categories WHERE path LIKE '/1/%';
Enter fullscreen mode Exit fullscreen mode
  • Fast queries ✅
  • Nightmare to maintain ❌
  • Breaks on category moves
  • Path length limits
  • Denormalized data

There's a better way. Let the database do what databases are good at: set operations and recursion.

Anatomy of a Recursive CTE

A recursive CTE has two parts connected by UNION ALL:

WITH RECURSIVE cte_name AS (
    -- Part 1: ANCHOR MEMBER (starting point)
    SELECT id, name, parent_id, 0 AS level
    FROM table_name
    WHERE parent_id IS NULL  -- Root rows

    UNION ALL

    -- Part 2: RECURSIVE MEMBER (the magic)
    SELECT t.id, t.name, t.parent_id, cte.level + 1
    FROM table_name t
    JOIN cte_name cte ON t.parent_id = cte.id  -- Reference itself!
)
SELECT * FROM cte_name;
Enter fullscreen mode Exit fullscreen mode

How It Works

  1. Execute anchor member → Get initial rows (the roots)
  2. Execute recursive member using those rows
  3. Repeat step 2 with the new rows
  4. Stop when no new rows are returned
  5. UNION ALL combines everything into the final result

Visual example:

Iteration 0 (Anchor):
    SELECT WHERE parent_id IS NULL → Returns: [CEO]

Iteration 1 (Recursive):
    JOIN with [CEO] → Returns: [VP1, VP2, VP3]

Iteration 2 (Recursive):
    JOIN with [VP1, VP2, VP3] → Returns: [Dir1, Dir2, Dir3, Dir4]

Iteration 3 (Recursive):
    JOIN with [Dir1...Dir4] → Returns: [Manager1, Manager2, ...]

Iteration 4 (Recursive):
    JOIN with [Manager1...] → Returns: [Employee1, Employee2, ...]

Iteration 5 (Recursive):
    JOIN → Returns: []  ← STOP, no more rows

Final result: CEO + all VPs + all Directors + all Managers + all Employees
Enter fullscreen mode Exit fullscreen mode

Key insight: The database handles the iteration. You just define the rules.

Real-World Example 1: Company Org Chart

Let's build a complete organizational hierarchy with all the trimmings.

Setup:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT,
    department VARCHAR(50),
    salary DECIMAL(10,2)
);
Enter fullscreen mode Exit fullscreen mode

The Query:

WITH RECURSIVE org_hierarchy AS (
    -- Anchor: Start with CEO (no manager)
    SELECT
        id,
        name,
        manager_id,
        department,
        salary,
        0 as level,
        CAST(name AS VARCHAR(1000)) as hierarchy_path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: Find all direct reports
    SELECT 
        e.id,
        e.name,
        e.manager_id,
        e.department,
        e.salary,
        oh.level + 1,
        CONCAT(oh.hierarchy_path, ' > ', e.name)
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.id
)
SELECT
    level,
    REPEAT('  ', level) || name as indented_name,
    department,
    salary,
    hierarchy_path
FROM org_hierarchy
ORDER BY hierarchy_path;
Enter fullscreen mode Exit fullscreen mode

What You Get:

level | indented_name        | department | salary  | hierarchy_path
------|---------------------|------------|---------|---------------------------
0     | Sarah Chen          | Executive  | 250000  | Sarah Chen
1     |   Mike Johnson      | Sales      | 150000  | Sarah Chen > Mike Johnson
2     |     Anna Smith      | Sales      | 95000   | Sarah Chen > Mike Johnson > Anna Smith
2     |     Bob Williams    | Sales      | 92000   | Sarah Chen > Mike Johnson > Bob Williams
1     |   Lisa Anderson     | Engineering| 160000  | Sarah Chen > Lisa Anderson
2     |     Tom Davis       | Engineering| 110000  | Sarah Chen > Lisa Anderson > Tom Davis
Enter fullscreen mode Exit fullscreen mode

Benefits:
✅ Complete org structure in one query
✅ Indentation shows hierarchy visually
✅ Full path from CEO to each employee
✅ Can add filters, aggregations, whatever you need

Want just one manager's team? Change the anchor:

WHERE id = 123  -- Start from specific manager
Enter fullscreen mode Exit fullscreen mode

Real-World Example 2: E-commerce Categories

Your product manager's request from the intro. Here's how to solve it elegantly.

Problem: Show all products in "Electronics" including all subcategories (Phones → Smartphones → iPhone, etc.)

The Query:

WITH RECURSIVE category_tree AS (
    -- Anchor: Start with parent category
    SELECT id, name, parent_id, 0 as depth
    FROM categories
    WHERE id = :category_id  -- "Electronics" = 5

    UNION ALL

    -- Recursive: Get all subcategories
    SELECT c.id, c.name, c.parent_id, ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT 
    DISTINCT p.id,
    p.name,
    p.price,
    c.name as category_name,
    ct.depth as category_depth
FROM category_tree ct
JOIN products p ON p.category_id = ct.id
JOIN categories c ON c.id = ct.id
ORDER BY ct.depth, c.name, p.name;
Enter fullscreen mode Exit fullscreen mode

Result: All products in Electronics, Phones, Smartphones, Tablets, Laptops, Gaming... the entire subtree. One query.

Without recursive CTE: You'd need to either:

  1. Query each level separately (unknown depth = impossible)
  2. Load all categories and traverse in code (slow, complex)
  3. Maintain materialized paths (fragile)

With recursive CTE: Define the traversal rules once, database handles the rest.

Common Patterns Quick Reference

Here are battle-tested patterns you can adapt for your use cases.

Bill of Materials (Manufacturing)

Calculate all components needed to build a product, including nested assemblies:

WITH RECURSIVE bom AS (
    -- Anchor: Top-level product
    SELECT 
        product_id, 
        component_id, 
        quantity, 
        0 as level
    FROM bill_of_materials
    WHERE product_id = :target_product

    UNION ALL

    -- Recursive: Components of components
    SELECT 
        bom_next.product_id,
        bom_next.component_id,
        bom.quantity * bom_next.quantity,  -- Multiply quantities!
        bom.level + 1
    FROM bill_of_materials bom_next
    JOIN bom ON bom_next.product_id = bom.component_id
)
SELECT 
    component_id, 
    SUM(quantity) as total_needed
FROM bom
GROUP BY component_id;
Enter fullscreen mode Exit fullscreen mode

Use case: "To build 100 bikes, I need 200 wheels, 400 spokes, 200 tires..."

File System Traversal

Calculate total size of a directory and all subdirectories:

WITH RECURSIVE dir_tree AS (
    -- Anchor: Starting directory
    SELECT id, name, parent_id, size, 0 as depth
    FROM filesystem
    WHERE id = :directory_id

    UNION ALL

    -- Recursive: All subdirectories and files
    SELECT 
        f.id, 
        f.name, 
        f.parent_id, 
        f.size, 
        dt.depth + 1
    FROM filesystem f
    JOIN dir_tree dt ON f.parent_id = dt.id
)
SELECT 
    SUM(size) as total_size,
    COUNT(*) as total_items,
    MAX(depth) as max_depth
FROM dir_tree;
Enter fullscreen mode Exit fullscreen mode

Use case: "Show disk usage for this folder and everything inside it."

Social Networks (Degrees of Separation)

Find all people within 3 degrees of separation:

WITH RECURSIVE connections AS (
    -- Anchor: Direct friends
    SELECT 
        user_id, 
        friend_id, 
        1 as degree
    FROM friendships
    WHERE user_id = :start_user

    UNION ALL

    -- Recursive: Friends of friends
    SELECT 
        c.user_id,
        f.friend_id,
        c.degree + 1
    FROM connections c
    JOIN friendships f ON c.friend_id = f.user_id
    WHERE c.degree < 3  -- Stop at 3 degrees
)
SELECT DISTINCT 
    friend_id,
    MIN(degree) as closest_degree
FROM connections
GROUP BY friend_id
ORDER BY closest_degree;
Enter fullscreen mode Exit fullscreen mode

Use case: "People you may know" features, network analysis, influence mapping.

Performance: The Good and The Careful

The Good

Single query vs multiple round trips to database
Database-optimized traversal using indexes and join strategies
Clean code that's easy to understand and maintain
Handles arbitrary depth without hardcoding levels
Set-based operations instead of row-by-row processing

The Careful

⚠️ Memory-intensive on large hierarchies (thousands of nodes)
⚠️ Infinite loops if your data has cycles (A → B → C → A)
⚠️ Performance varies significantly between database systems
⚠️ No early termination - processes entire tree even if you only need part

Safety First: Always Include Depth Limiters

Never write a recursive CTE without a depth limiter:

WITH RECURSIVE tree AS (
    SELECT id, parent_id, 0 as level
    FROM table_name
    WHERE parent_id IS NULL

    UNION ALL

    SELECT t.id, t.parent_id, tree.level + 1
    FROM table_name t
    JOIN tree ON t.parent_id = tree.id
    WHERE tree.level < 10  -- 🎯 SAFETY LIMIT
)
SELECT * FROM tree;
Enter fullscreen mode Exit fullscreen mode

Why?

  • Protects against runaway recursion
  • Catches data quality issues (cycles, infinite loops)
  • Prevents database from running out of memory
  • Makes query timeout predictable

How to choose the limit?

  • Know your data: What's the realistic max depth?
  • Add buffer: If org chart is 6 levels, set limit to 10
  • Monitor: Log warnings when limit is hit (data quality issue)

Detecting Cycles in Your Data

If you suspect your data might have cycles (A → B → C → A), add cycle detection:

WITH RECURSIVE tree AS (
    -- Anchor: Include path as array
    SELECT 
        id, 
        parent_id, 
        ARRAY[id] as path,
        0 as level
    FROM table_name 
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive: Check if current id is already in path
    SELECT 
        t.id,
        t.parent_id,
        tree.path || t.id,  -- Append to path
        tree.level + 1
    FROM table_name t
    JOIN tree ON t.parent_id = tree.id
    WHERE NOT t.id = ANY(tree.path)  -- 🎯 CYCLE DETECTION
      AND tree.level < 10
)
SELECT * FROM tree;
Enter fullscreen mode Exit fullscreen mode

How it works:

  • path array stores all ancestors
  • NOT t.id = ANY(tree.path) stops when we'd revisit a node
  • Prevents infinite loops
  • Makes cycles visible in your data

Production tip: If cycles exist, fix your data. Don't rely on detection as a permanent solution.

Use EXPLAIN ANALYZE

Always verify performance with real data:

EXPLAIN ANALYZE
WITH RECURSIVE org_hierarchy AS (
    -- Your recursive CTE here
)
SELECT * FROM org_hierarchy;
Enter fullscreen mode Exit fullscreen mode

Look for:

  • Recursive scan operations
  • Number of iterations
  • Memory usage (work_mem in PostgreSQL)
  • Total execution time

Red flags:

  • Execution time > 1 second for < 10,000 nodes
  • Many more iterations than expected depth
  • Hash joins on large intermediate results

Database Support Reality Check

Recursive CTEs are well-supported across major databases:

Database Support Since Notes
PostgreSQL 2009 (v8.4) ⭐ Excellent: reliable, performant, well-documented
SQL Server 2005 ⭐ Native support from the beginning
Oracle 2010 (11g R2) ✅ Full support (also has CONNECT BY for legacy)
MySQL 2018 (v8.0) ✅ Supported but newer, test thoroughly
MariaDB 2018 (v10.2.2) ✅ Follows MySQL implementation

Translation: Every modern database has recursive CTEs. They're not experimental. They're production-ready.

But your ORM still doesn't care. (We'll fix that in Part 3.)

When to Use Recursive CTEs

✅ Use Them For:

  • Hierarchical data with parent-child relationships (always)
  • Unknown depth structures (can't hardcode JOIN levels)
  • Graph traversal with proper cycle protection
  • One-time traversals where you'd otherwise write loops
  • Replacing N+1 queries with single database operation

❌ Don't Use Them For:

  • Simple parent-child (one level) - regular JOIN is simpler
-- Just use a JOIN
SELECT p.*, c.* 
FROM parents p 
JOIN children c ON c.parent_id = p.id;
Enter fullscreen mode Exit fullscreen mode
  • Known fixed depth - explicit JOINs are clearer
-- If you always need exactly 2 levels, be explicit
SELECT p.*, c.*, gc.*
FROM parents p
JOIN children c ON c.parent_id = p.id
JOIN grandchildren gc ON gc.parent_id = c.id;
Enter fullscreen mode Exit fullscreen mode
  • Non-hierarchical data - basic CTEs are better
  • When application logic is simpler - sometimes code is clearer

The rule: Use recursive CTEs when the alternative is multiple queries, complex loops, or unknown depth. Don't use them just to be clever.

Coming Next

You now understand both types of CTEs:

  • Non-recursive (Part 1): Clean, maintainable complex queries
  • Recursive (Part 2): Elegant hierarchy traversal

But there's one problem: Doctrine, Hibernate, Eloquent, and most ORMs don't support CTEs natively.

In Part 3, we'll cover:

  • ORMs that DO support CTEs (SQLAlchemy, jOOQ, Django 4.2+)
  • Strategies for hostile ORMs (Repository pattern, database views, hybrid approach)
  • Security (parameterized queries - non-negotiable)
  • When to use CTEs vs alternatives (decision tree)
  • Performance tuning (forcing materialization when needed)

"Using CTEs When Your ORM Says No (The Lazy Developer's Survival Guide)" - Coming soon.

Bottom Line

Recursive CTEs transform complex hierarchical problems into elegant, performant solutions.

The psychopathic approach: Write recursive functions with N+1 queries, hope for the best, watch it crash in production, blame the database.

The masochistic touch: Know there's a better way but stick with painful code because "it works" and you're scared of SQL.

The professional approach: Write one recursive CTE, let the database do what it's optimized for, go home on time.

Your ORM probably doesn't support them. That's fine - we'll work around it in Part 3. But don't let tool limitations stop you from solving problems elegantly.

One query to rule them all, one query to find them. One query to bring them all, and in the resultset bind them.


Got a hierarchy horror story? Share it in the comments. We've all fought the N+1 dragon.


📬 Want essays like this in your inbox?

I just launched a newsletter about thinking clearly in tech — no spam, no fluff.

Subscribe here: https://buttondown.com/efficientlaziness

Efficient Laziness — Think once, well, and move forward.

Top comments (0)