Pascal CESCATO

Posted on Nov 3

Your ORM is Lying to You About SQL (CTEs Explained)

#sql #database #performance #productivity

Readable SQL isn’t a myth. It’s a WITH.

You know that feeling when you open a SQL query you wrote six months ago, see seven levels of nested subqueries, and think "Which psychopath wrote this?" Then you check git blame and... yeah. It was you.

I've been there. We've all been there.

This is Part 1 of a 3-part series on Common Table Expressions (CTEs) - a powerful SQL feature that's been hiding in plain sight for years. Your database supports it. Your competitors probably use it. But your ORM? Nah, it's too busy pretending SQL peaked in 1999.

TL;DR

CTEs = named subqueries that make you look smart
They make complex queries cleaner, faster, and maintainable
Every modern database supports them (PostgreSQL since 2009, MySQL since 8.0, SQL Server since 2005)
Most ORMs don't (Doctrine, Hibernate, Eloquent... looking at you)
Primary benefit: readability and maintenance - performance gains are a bonus
Use EXPLAIN to verify optimization

This series:

Part 1 (this): Basic CTEs - what they are and why you need them
Part 2: Recursive CTEs for hierarchical data
Part 3: How to use CTEs when your ORM says no

Quick glossary:

CTE (Common Table Expression): A named temporary result set, like a subquery with a name
ORM (Object-Relational Mapping): Tool that lets you query databases using objects instead of SQL (Doctrine, Hibernate, Eloquent, etc.)
Materialization: When the database calculates a CTE once and stores the result temporarily

The Problem with Subqueries

Let's start with a simple example. You want to find all users who placed orders after January 1st, 2025:

SELECT * FROM users 
WHERE id IN (
    SELECT DISTINCT user_id 
    FROM orders 
    WHERE order_date > '2025-01-01'
);

Works fine. Ship it.

But what happens when you need that same list of "recent order users" in three different places in your query? Copy-paste the subquery three times? Now you have three places to update when the business logic changes. Three places to introduce bugs. Three places where a typo can ruin your weekend.

And if the subquery gets more complex - joins, aggregations, filters - you're looking at a maintenance nightmare wrapped in a readability disaster.

As I mentioned in my article on database design, queries deserve thought before execution. I PRODUCE code, I don't vomit it.

Enter CTEs: SQL for the Efficiently Lazy

Just like in LLM workflows, structure matters. In AgentKit I argued that efficient laziness is about designing reliability before execution — CTEs bring that same mindset to SQL.

Common Table Expressions (CTEs) let you name your subqueries. That's it. That's the magic.

Here's the same query with a CTE:

WITH recent_orders AS (
    SELECT DISTINCT user_id 
    FROM orders 
    WHERE order_date > '2025-01-01'
)
SELECT users.*
FROM users
JOIN recent_orders ON users.id = recent_orders.user_id;

"But that's longer!" you say. Sure, for this tiny example. But look what happens when your query grows:

WITH recent_orders AS (
    SELECT DISTINCT user_id FROM orders 
    WHERE order_date > '2025-01-01'
),
high_value_users AS (
    SELECT user_id FROM orders
    GROUP BY user_id
    HAVING SUM(total_amount) > 1000
)
SELECT 
    u.*,
    COUNT(o.id) as order_count,
    SUM(o.total_amount) as total_spent
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id
JOIN high_value_users hvu ON u.id = hvu.user_id
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id;

Each CTE has a clear name and purpose. You can reference them multiple times. Change the definition once, it updates everywhere. Debug one piece at a time.

When Should You Use a CTE?

Quick checklist:
✅ You're copying the same subquery 2+ times
✅ Your query has 2+ levels of nested subqueries
✅ You need to name intermediate results for clarity
✅ You're building complex calculations step by step

Why CTEs are better:

✅ Readable - Each part has a meaningful name
✅ Reusable - Reference the same CTE multiple times in one query
✅ Maintainable - Change the logic in one place
✅ Optimizable - The database engine can inline or materialize as needed

Important nuance: The main benefit is readability and maintainability. Performance improvements depend on your database's optimizer - sometimes the CTE is inlined (treated like a regular subquery), sometimes materialized (calculated once and cached). Both can be efficient; it depends on the query.

This is efficient laziness. Write less code, maintain less code, debug less code.

Real-World Case: 30,000 Daily Sales

Let's get practical. Imagine you're analyzing sales for a company processing 30,000 transactions per day. You need quarterly sales totals, then use those totals for multiple analyses: year-over-year comparison, growth percentage, regional breakdowns.

Without CTEs, you'd write the quarterly aggregation multiple times as subqueries. Even if the optimizer handles it well, your code is messy and hard to maintain.

With a CTE, you express your intent clearly:

WITH quarterly_sales AS (
    SELECT
        DATE_TRUNC('quarter', sale_date) AS quarter,
        SUM(sale_amount) AS total_sales
    FROM sales
    WHERE country = 'France' AND brand = 'Ikea'
    GROUP BY quarter
)
SELECT
    quarter,
    total_sales,
    LAG(total_sales) OVER (ORDER BY quarter) AS prev_quarter_sales,
    ROUND(
        100.0 * (total_sales - COALESCE(LAG(total_sales) OVER (ORDER BY quarter), 0))
        / COALESCE(LAG(total_sales) OVER (ORDER BY quarter), 1),
        2
    ) AS growth_pct
FROM quarterly_sales
ORDER BY quarter;

Quick explanation of new concepts:

DATE_TRUNC('quarter', sale_date): Groups dates by quarter (Q1 2024, Q2 2024, etc.)
LAG(): Window function that accesses the previous row's value
COALESCE(): Returns first non-null value (handles missing data gracefully)

What just happened?

CTE calculates quarterly totals once - aggregating potentially millions of rows into a few dozen quarterly summaries
Main query uses that small result set - fast window functions on already-aggregated data
Database optimizer decides - inline or materialize based on cost analysis

Real-world gains:

Code clarity: Immediately obvious what each part does
Maintainability: Change quarterly logic once, everything updates
Performance: Often 2-10x faster when the optimizer materializes (but measure with EXPLAIN!)
User happiness: Reports load faster, dashboards don't timeout

For a financial dashboard or reporting system, this isn't theoretical optimization - it's the difference between "I'll grab coffee while this loads" and "it just works."

Understanding Performance: Trust the Optimizer

"But I heard CTEs are slow!"

Let me stop you right there. This myth needs to die.

Modern databases are smart. When you write a CTE, the query optimizer analyzes the whole query and chooses the best strategy:

Inline it - Treat it like a regular subquery, optimize the whole thing together
Materialize it - Calculate once, store temporarily, reuse the result

Which approach? The optimizer decides based on:

Size of the CTE result
How many times you reference it
Available indexes
Overall query complexity

Your job: Write clear, maintainable queries. The optimizer's job: make them fast.

Verify with EXPLAIN

Don't guess at performance - measure it:

EXPLAIN ANALYZE
WITH sales_summary AS (
    SELECT region, SUM(amount) as total
    FROM sales
    WHERE date >= '2024-01-01'
    GROUP BY region
)
SELECT * FROM sales_summary WHERE total > 100000;

PostgreSQL will show you:

Whether the CTE was inlined or materialized
Scan types (sequential vs index)
Actual vs estimated rows
Execution time

MySQL, SQL Server, and Oracle have similar tools (EXPLAIN, SHOW PLAN, etc.).

💡 Pro tip: In Part 3, we'll cover advanced techniques like forcing materialization when you have evidence the optimizer is wrong. But 99% of the time? Trust the optimizer.

Database Support: The Timeline

CTEs are part of the SQL:1999 standard, but adoption took time. Here's when major databases got on board:

Database	CTE Support Since	Notes
SQL Server	2005	The pioneer - full recursive support from day one
PostgreSQL	2009 (v8.4)	Excellent implementation, developer favorite
Oracle	~2010 (11g R2)	Added WITH alongside existing CONNECT BY
MySQL	2018 (v8.0)	Better late than never - before this, nothing
MariaDB	2018 (v10.2.2)	Followed MySQL's lead the same year
IBM Db2	2004-ish (varies)	Early adopter, solid support

Translation: If you're using any remotely modern database, you have CTEs. Even MySQL finally caught up in 2018.

So CTEs are standard. Databases support them. Developers love them. Everything's great, right?

The Plot Twist: Your ORM Doesn't Care

Here's where it gets frustrating. Check your ORM's documentation for CTE support:

❌ Doctrine (PHP) - No .with() method in QueryBuilder
❌ Hibernate (Java) - Nothing in HQL or Criteria API
❌ Eloquent (Laravel) - Need third-party package (staudenmeir/laravel-cte)
❌ Active Record (Rails) - Limited support via gems, nothing native

Why? Most ORMs were architected when major databases didn't universally support CTEs. They prioritized portability (supporting older database versions) and simplicity (abstracting SQL away). Fair trade-off for 2005. Questionable for 2025.

The exceptions: SQLAlchemy (Python), jOOQ (Java), and Django 4.2+ support CTEs natively. We'll cover them in Part 3.

The result? If you want CTEs with most ORMs, you're writing raw SQL. You lose some ORM benefits (automatic joins, relationship mapping), but gain the power of modern SQL.

This isn't a bug. ORMs prefer to hide complexity rather than embrace it. But when you need advanced features, that abstraction becomes limiting.

What This Means for You

You've just learned that:

CTEs make complex queries cleaner and easier to maintain
Performance gains are common (but verify with EXPLAIN)
Every modern database supports them
Your ORM probably doesn't

So what do you do?

Short term: Start recognizing places where CTEs would help. Look for repeated subqueries, complex aggregations, or queries that make you squint.

Medium term: Learn to write CTEs in raw SQL. We'll cover practical strategies for using them with ORMs in Part 3.

Long term: Push your ORM maintainers to add CTE support. Or switch to one that already has it.

For now, just remember: CTEs exist. They're powerful. Your database already supports them.

Coming Next

In Part 2, we're diving into recursive CTEs - the feature that makes hierarchical data actually manageable.

Ever needed to traverse an org chart? Find all subcategories under a category? Calculate a bill of materials? Build a comment thread with infinite nesting?

You could write multiple database queries. Or complex application logic with loops. Or you could write one elegant recursive CTE and let the database do what databases do best.

"Recursive CTEs: Because Writing Nested Loops for Trees is Psychopathic (and a Little Masochistic Too)" - Coming soon.

Bottom Line

Common Table Expressions aren't rocket science. They're named subqueries that make your SQL readable and maintainable.

They've been part of standard SQL since 1999. Every major database supports them. They're not experimental. They're not dangerous. They're just... better for complex queries.

The primary benefit is clarity. Performance improvements are a bonus that depends on your specific query and database optimizer. But even if performance stayed the same, the readability gain alone makes CTEs worth using.

Your ORM might not support them yet. That's annoying but solvable. Don't let tool limitations stop you from writing better queries.

Remember: Your users don't care if you used an ORM or raw SQL. They care that your application is fast and reliable. Use whatever tool gets you there most effectively.

And sometimes, that tool is a WITH clause.

Next in this series:

→ Part 2: Recursive CTEs for hierarchical data
→ Part 3: Using CTEs when your ORM says no

Found this helpful? Leave a comment with your gnarliest nested subquery horror story. We've all got one.

📬 Want essays like this in your inbox?

I just launched a newsletter about thinking clearly in tech — no spam, no fluff.

Subscribe here: https://buttondown.com/efficientlaziness

Efficient Laziness — Think once, well, and move forward.

DEV Community