This post is part of the SQL: Zero to Ninja series.
Your boss walks over and asks: "How much money did we make last month? And how many orders did each customer place?" You stare at a table with 5,000 rows. You are not going to count them by hand. SQL can squeeze all those rows down to a few clean numbers, and it can do it per group too. Let's learn how.
The idea in one line
Aggregate functions turn many rows into one number (like a total or a count), and GROUP BY does that math once per group instead of once for the whole table.
The metaphor: sorting laundry into baskets
Picture a giant pile of laundry on the floor. That is your table, lots of messy rows.
First you sort it into baskets by color: one basket for whites, one for darks, one for colors. That sorting step is GROUP BY. Each basket is a group.
Now you can do something to each basket. Count the shirts in it. Weigh it. Find the heaviest item. That "do one number per basket" step is the aggregate function.
Big messy pile (all rows)
|
GROUP BY color <-- sort into baskets
|
whites darks colors <-- one basket per group
|
COUNT each <-- one number per basket
One pile in, a few tidy numbers out.
The five aggregate functions
These take a whole column and give you back one value:
SELECT
COUNT(*) AS how_many, -- how many rows
SUM(total) AS revenue, -- add them all up
AVG(total) AS average, -- the average
MIN(total) AS smallest, -- the smallest one
MAX(total) AS biggest -- the biggest one
FROM orders;
No GROUP BY here, so the whole table is one big basket. You get back a single row with five numbers. That is the "many rows, one answer" trick.
GROUP BY: one number per basket
Now the real power. Say you want the number of orders per user. You do not want one big count, you want one count for each user.
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id;
Read it in plain English: "Sort the orders into baskets by user_id, then count each basket." You get one row per user:
user_id | order_count
--------+------------
1 | 4
2 | 1
3 | 7
Want revenue per order status instead? Same idea, just group by a different column:
SELECT status, SUM(total) AS revenue
FROM orders
GROUP BY status;
status | revenue
----------+--------
paid | 9800
pending | 1200
cancelled | 300
One basket per status, one sum per basket. Your boss is happy.
COUNT(*) vs COUNT(column): the NULL trap
These two look the same but are not:
-
COUNT(*)counts rows. Every row counts, no matter what is inside. -
COUNT(column)counts rows where that column is not NULL. Blanks get skipped.
Say some orders have no total filled in yet (it is NULL):
SELECT
COUNT(*) AS all_rows, -- 100
COUNT(total) AS rows_with_total -- 95 (5 were NULL, skipped)
FROM orders;
order id | total
---------+------
1 | 50
2 | NULL <-- COUNT(*) counts this, COUNT(total) skips it
3 | 30
So if you want "how many rows exist", use COUNT(*). If you want "how many actually have a value", use COUNT(column). Mixing these up gives wrong numbers that look right.
WHERE vs HAVING: filter rows, then filter baskets
This is the part that trips up everyone, so go slow.
- WHERE runs before grouping. It throws out rows you do not want, before they ever reach a basket.
- HAVING runs after grouping. It throws out whole baskets based on the aggregate.
Imagine you only care about paid orders, and only about users who placed more than 5 of them.
SELECT user_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid' -- step 1: keep only paid rows (before baskets)
GROUP BY user_id -- step 2: sort into baskets per user
HAVING COUNT(*) > 5; -- step 3: keep only big baskets (after baskets)
In laundry terms: WHERE picks which clothes go in the wash at all. HAVING looks at the finished baskets and says "only show me baskets with more than 5 shirts."
The wrong way juniors try first:
-- WRONG: you cannot use an aggregate in WHERE
SELECT user_id, COUNT(*)
FROM orders
WHERE COUNT(*) > 5 -- error! the count does not exist yet here
GROUP BY user_id;
WHERE runs before any counting happens, so COUNT(*) is meaningless there. Move that test to HAVING and it works.
A real case
"Show me each user who has spent more than 1000 dollars total, biggest spenders first."
SELECT user_id, SUM(total) AS spent
FROM orders
WHERE status = 'paid' -- only count money we actually got
GROUP BY user_id
HAVING SUM(total) > 1000 -- only the big spenders
ORDER BY spent DESC; -- richest first
That one query answers a real business question, and it reads almost like a sentence.
Gotchas
-
Selecting a naked column. If a column is in your
SELECTbut not inGROUP BYand not wrapped in an aggregate, most databases throw an error. Rule of thumb: every column in SELECT must either be in GROUP BY or inside a function like SUM or COUNT.
-- WRONG: name is not grouped and not aggregated
SELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;
WHERE vs HAVING mixup. Filtering raw rows? Use WHERE. Filtering on a count or sum? Use HAVING. If you put an aggregate in WHERE, you get an error.
Forgetting NULLs in COUNT(column). Remember,
COUNT(column)quietly skips blanks. UseCOUNT(*)when you mean "every row."
Recap
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX) turn many rows into one number.
- GROUP BY runs that math once per group, one tidy row per basket.
- WHERE filters rows before grouping. HAVING filters groups after grouping.
- COUNT(*) counts all rows. COUNT(column) skips NULLs.
- Every SELECT column must be grouped or aggregated, or you get an error.
Your turn
Using the orders table, write one query that shows, for each status, how many orders there are and the average order total, but only for statuses that have more than 3 orders. Which part is WHERE and which part is HAVING? If you can explain why, you have got it.
Next up: Part 06: JOINs, Connecting Tables, where we finally staple two tables together and answer questions that span the whole schema.
Top comments (1)
Part 05 Practice: Counting and Grouping
Time to get your hands dirty. These use the shared schema (
users,orders,products,order_items). Try each one yourself before peeking at the solution. They get harder as you go.1. How many orders are in the whole table?
Count every row in
orders.Solution
Why:
COUNT(*)counts rows, no grouping needed. The whole table is one big basket.2. What is the total, average, smallest, and biggest order total?
Get all four numbers from the
orderstable in one query.Solution
Why: Aggregate functions squeeze a whole column down to one value. Stack several in one SELECT to get a quick summary row.
3. How many orders does each user have?
Show one row per user with their order count.
Solution
Why:
GROUP BY user_idsorts orders into one basket per user, thenCOUNT(*)counts each basket.4. What is the total revenue for each order status?
Group orders by
statusand sum the totals.Solution
Why: One basket per status, one sum per basket. This is the classic "revenue per group" report.
5. Which users have placed more than 5 orders?
Show the user_id and their order count, but only for users above 5 orders.
Solution
Why: You are filtering on a count, which only exists after grouping. So it goes in
HAVING, notWHERE.6. For each status, how many paid-attention statuses have more than 3 orders, with their average total? (combine WHERE and HAVING)
Only look at orders that are not cancelled. Then group by status, and only keep statuses with more than 3 orders. Show the count and average total, biggest average first.
Solution
Why:
WHEREthrows out cancelled rows before they reach a basket.HAVINGthen drops small baskets after counting. Mixing both in one query is super common in real reports.Nice work. If you got even half of these, you are already thinking in groups and baskets. Run them against your own data, tweak the numbers, and watch how the answers change. See you in Part 06.