Part of the "SQL: Zero to Ninja" series, written for junior web developers.
Your boss asks: "Show me every user who spent more than the average order." You freeze. To filter by the average, you first need to calculate the average. But that is a whole query on its own. Can you put a query inside another query? Yes. That little trick is called a subquery, and it is about to make you dangerous.
The idea in one line
A subquery is a query inside another query (the inner one runs first and hands its answer to the outer one), and a CTE is just a way to give that inner query a clear name so the whole thing reads like plain English.
The metaphor: a recipe with sub-steps
Think of cooking pasta with sauce. You do not make it all in one motion. You have a sub-step: "first, make the sauce." Once the sauce is ready, you use it in the main dish.
A subquery is that sub-step happening inside the recipe, sometimes a bit buried in the instructions. A CTE is when you write "Sauce" at the top of your prep board with its own little recipe, then just say "add the Sauce" later. Same cooking, way easier to read.
Subquery #1: in WHERE with IN
You want every user who has placed at least one order. The order user ids live in the orders table. So first grab all those ids (the sub-step), then keep users whose id is in that list.
SELECT name
FROM users
WHERE id IN (
SELECT user_id -- the sub-step: all user ids that have an order
FROM orders
);
Read it out loud: "Give me users whose id appears in the list of user_ids from orders." The inner query runs first and produces a list. IN checks membership against that list.
Subquery #2: a scalar subquery (a single value)
Now the boss's question: users who spent more than the average order total. A subquery that returns exactly one value (one row, one column) is called a scalar subquery. You can drop it right where a number would go.
SELECT name, total
FROM orders
JOIN users ON users.id = orders.user_id
WHERE total > (
SELECT AVG(total) -- one single number, the average order total
FROM orders
);
The inner SELECT AVG(total) returns one number, say 45.00. SQL swaps it in, so the line becomes WHERE total > 45.00. The sauce is made, now use it.
Subquery #3: in FROM (a derived table)
You can also use a subquery as if it were a temporary table. This is called a derived table. Say you want the count of orders per user, then only the busy users:
SELECT user_id, order_count
FROM (
SELECT user_id, COUNT(*) AS order_count -- a mini-table built on the fly
FROM orders
GROUP BY user_id
) AS user_orders
WHERE order_count > 3;
The inner query builds a little table with two columns. The outer query then treats it like any normal table and filters it. Note the AS user_orders: a derived table needs a name.
A quick word on correlated subqueries
Most subqueries run once. A correlated subquery is different: it points back at the outer query, so it runs again for every single row. See the o2.user_id = o1.user_id link below:
SELECT o1.id, o1.total
FROM orders o1
WHERE o1.total > (
SELECT AVG(o2.total)
FROM orders o2
WHERE o2.user_id = o1.user_id -- depends on the current outer row
);
This finds orders bigger than that user's own average. Handy, but careful: if you have a million rows, that inner query runs a million times. It can get slow.
CTEs: the same thing, but readable
When subqueries pile up, your query turns into a nest of parentheses that nobody wants to read. A CTE (Common Table Expression) uses the WITH keyword to name a sub-step up front, like writing "Sauce" on your prep board.
Here is a nested mess:
-- Hard to read: where does one query end and the next begin?
SELECT name
FROM users
WHERE id IN (
SELECT user_id
FROM orders
WHERE total > (SELECT AVG(total) FROM orders)
);
Now the same logic with CTEs. Read it top to bottom like a story:
WITH avg_order AS (
SELECT AVG(total) AS amount FROM orders
),
big_spenders AS (
SELECT DISTINCT user_id
FROM orders, avg_order
WHERE total > avg_order.amount
)
SELECT name
FROM users
WHERE id IN (SELECT user_id FROM big_spenders);
First we name the average avg_order. Then we name the users who beat it big_spenders. Then the final query just reads "give me the names of the big spenders." Each step has a clear label.
When to use which
-
Subquery: quick, one-off, used in one spot. A simple
INor scalar check. - CTE: when the query gets hard to read, or when you want to reuse the same sub-result more than once. A CTE can be referenced again and again. A buried subquery has to be copy-pasted.
Both give the same answers. The difference is how kind you are being to the next person who reads your code (often future you).
Gotchas
-
Scalar subquery that returns more than one row. If you write
WHERE total > (SELECT total FROM orders)and that inner query returns many rows, the database errors out. A>wants one value. UseAVG,MAX, or add aLIMIT 1. - Correlated subqueries can be slow. They run per row. If a query crawls, check whether the inner part secretly depends on the outer row, and consider a JOIN or CTE instead.
-
Forgetting to name a derived table. A subquery in
FROMmust have an alias (AS user_orders), or SQL complains.
Recap
- A subquery is a query inside a query. The inner one runs first.
- IN subquery: filter by a list of values from another table.
- Scalar subquery: returns one value, drop it where a number goes.
-
Derived table: a subquery in
FROM, used like a temporary table (needs a name). - Correlated subquery: runs per row, can be slow.
- A CTE (
WITH name AS (...)) names a sub-step so the query reads top to bottom.
Your turn
Write a query that finds every product that has never been ordered. Hint: the ordered product ids live in order_items, and you want products whose id is not in that list. Try it first as a subquery, then rewrite it as a CTE. If you can explain why the CTE reads easier, you have got it.
That wraps the core querying skills. Next up: Part 11: Indexes, where we make all these queries run fast.
Top comments (1)
Part 10 Practice: Subqueries and CTEs
Let's put queries inside queries, then clean them up. These use our shared schema:
users,orders,products,order_items. Try each before looking.1. IN subquery
Find the names of all users who have placed at least one order.
Solution
Why: the inner query builds a list of every user_id that has an order.
INkeeps the users whose id shows up in that list.2. NOT IN subquery (find what is missing)
Find every product that has never been ordered.
Solution
Why:
order_itemsholds every product that was ordered.NOT INflips it: keep products whose id is not in that list. (Heads up: ifproduct_idcan be NULL,NOT INgets weird. Cleaner schemas avoid that.)3. Scalar subquery (compare to the average)
Find all orders whose total is greater than the average order total.
Solution
Why: the inner query returns one number, the average. SQL swaps it in, so you compare each order's total against that single value.
4. Subquery in FROM (derived table)
Show each user_id along with how many orders they have, but only for users with more than 2 orders.
Solution
Why: you cannot filter on
COUNT(*)in a plainWHERE. So you build the counts as a derived table first, name itcounts, then filter that. (AHAVINGclause also works, this shows the derived-table way.)5. Rewrite as a CTE
Take this nested query and rewrite it using a CTE so it reads cleanly.
Solution
Why: each step gets a clear name (
avg_order, thenbig_spenders), so you read it top to bottom instead of untangling nested parentheses.6. CTE you reuse twice
Using a CTE, find users who placed more orders than the average number of orders per user. (Hint: name the per-user counts once, then use them twice.)
Solution
Why:
order_countsis referenced twice, once for the rows and once to compute the average. With a plain subquery you would have to write that counting logic twice. The CTE lets you name it once and reuse it.Great job. If you can take a tangled nested query and turn it into a clean
WITHblock, your future self (and your teammates) will thank you. On to Part 11: Indexes.