Menje

Posted on Apr 21

Subqueries and CTEs in SQL

#sql #subqueries #ctes

When working with SQL, you eventually run into situations where a single query just isn’t enough. You need to break a problem into parts, compute an intermediate result, and then use that result elsewhere. That’s where subqueries and Common Table Expressions (CTEs) come in.

They solve similar problems, but they do it in slightly different ways, and choosing between them can affect not just performance, but also how readable and maintainable your code is.

Let's Dive into it;

What is a Subquery?

A subquery is a query written inside another query. It produces a result that the outer query depends on.

A helpful way to think about it is like asking a question within a question. For example, instead of directly filtering employees by salary, you might first calculate the average salary and then compare each employee against that value. That inner calculation is the subquery.

Subqueries are usually embedded in clauses like WHERE, SELECT, or FROM, depending on what role they’re playing.

Types of Subqueries

Subqueries come in a few forms, depending on their behavior and the results they return.

Scalar subqueries return a single value. These are often used in comparisons, like checking whether something is above or below an average or a maximum.
Multi-row subqueries return multiple values and are typically used with operators like IN. For example, you might retrieve a list of departments and then filter employees based on whether they belong to any of them.
Correlated subqueries are a bit different. They rely on values from the outer query, meaning they don’t run just once—they run repeatedly, once for each row being processed. This makes them powerful but also potentially slow if not used carefully.
Nested subqueries are simply subqueries inside other subqueries. While technically valid, they can quickly become hard to read and are often a sign that the logic could be reorganized more clearly.

When Should You Use Subqueries?

Subqueries are useful when the logic is relatively straightforward and doesn’t need to be reused. They’re great for quick calculations or filters, especially when the result is only needed once.
They also work well when you want to keep everything compact and close together, rather than breaking it into multiple steps.
That said, once subqueries become deeply nested or start depending on each other, they can become difficult to follow. Correlated subqueries in particular should be used carefully, as they may lead to performance issues on large datasets.

What are CTEs (Common Table Expressions)?

A Common Table Expression, or CTE, is a named temporary result set defined at the beginning of a query. Instead of embedding logic inside the query, you define it upfront and then reference it by name.
In a way, CTEs let you structure your query more like a sequence of steps. You compute something, give it a name, and then use it later. This makes the overall logic easier to read and reason about.

Types and Use Cases of CTEs

Simple CTEs are used to make queries more readable. Instead of writing everything in one block, you separate parts of the logic into named sections.
Multiple CTEs allow you to build queries step by step. One CTE can depend on another, which helps when dealing with more complex transformations or filtering logic. This layered approach often feels more natural than stacking multiple subqueries.
Recursive CTEs are used for hierarchical data, such as organizational structures or category trees. They repeatedly reference themselves to build results that would otherwise require loops or multiple queries. This is something subqueries are not well-suited for.

Subqueries vs CTEs

Readability

Subqueries tend to become harder to read as they grow, especially when nested. You often have to work from the inside out to understand what’s happening.
CTEs, on the other hand, present the logic in a more structured way. You can read them from top to bottom, which makes complex queries easier to follow and debug.

Performance

There’s no simple rule here. Some subqueries are optimized very well by database engines, especially simple ones. However, correlated subqueries can be expensive because they run multiple times.
CTEs can sometimes be materialized (stored temporarily), which might affect performance depending on the database system. In other cases, they’re optimized just like subqueries.
In practice, performance depends more on how the query is written and how the data is indexed than on whether you use a subquery or a CTE.

Reusability

Subqueries are typically written inline and used once. If you need the same logic in multiple places, you often end up repeating it.
CTEs allow you to define something once and reference it multiple times within the same query. This reduces duplication and makes updates easier.

Handling Complexity
Subqueries work well for simple problems, but they can become unwieldy when layered.
CTEs are better suited for complex queries because they let you break the logic into manageable steps. This is especially useful when multiple transformations or filters are involved.

When Should You Use Each?

Subqueries are a good choice when the task is simple, the result is only needed once, and keeping the query compact makes sense.
CTEs are more appropriate when the logic is complex, when readability matters, or when you need to reuse intermediate results. They’re also the better option for hierarchical data through recursion.

Final Thoughts

Subqueries and CTEs aren’t competing tools so much as different ways of thinking about the same problem. Subqueries are quick and direct, while CTEs are structured and expressive.
In practice, it’s common to start with a subquery and then switch to a CTE as the query grows in complexity. That shift usually reflects a deeper understanding of the problem—and a move toward writing queries that are not just correct, but also clear and easy to maintain.

DEV Community