Mohamed Idris

Posted on May 29

Part 05: Counting and Grouping (Aggregates)

#sql #database #learning

This post is part of the SQL: Zero to Ninja series.

Your boss walks over and asks: "How much money did we make last month? And how many orders did each customer place?" You stare at a table with 5,000 rows. You are not going to count them by hand. SQL can squeeze all those rows down to a few clean numbers, and it can do it per group too. Let's learn how.

The idea in one line

Aggregate functions turn many rows into one number (like a total or a count), and GROUP BY does that math once per group instead of once for the whole table.

The metaphor: sorting laundry into baskets

Picture a giant pile of laundry on the floor. That is your table, lots of messy rows.

First you sort it into baskets by color: one basket for whites, one for darks, one for colors. That sorting step is GROUP BY. Each basket is a group.

Now you can do something to each basket. Count the shirts in it. Weigh it. Find the heaviest item. That "do one number per basket" step is the aggregate function.

Big messy pile (all rows)
        |
   GROUP BY color   <-- sort into baskets
        |
  whites   darks   colors   <-- one basket per group
        |
   COUNT each       <-- one number per basket

One pile in, a few tidy numbers out.

The five aggregate functions

These take a whole column and give you back one value:

SELECT
  COUNT(*) AS how_many,   -- how many rows
  SUM(total) AS revenue,  -- add them all up
  AVG(total) AS average,  -- the average
  MIN(total) AS smallest, -- the smallest one
  MAX(total) AS biggest   -- the biggest one
FROM orders;

No GROUP BY here, so the whole table is one big basket. You get back a single row with five numbers. That is the "many rows, one answer" trick.

GROUP BY: one number per basket

Now the real power. Say you want the number of orders per user. You do not want one big count, you want one count for each user.

SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id;

Read it in plain English: "Sort the orders into baskets by user_id, then count each basket." You get one row per user:

user_id | order_count
--------+------------
   1    |     4
   2    |     1
   3    |     7

Want revenue per order status instead? Same idea, just group by a different column:

SELECT status, SUM(total) AS revenue
FROM orders
GROUP BY status;

status    | revenue
----------+--------
paid      |  9800
pending   |  1200
cancelled |   300

One basket per status, one sum per basket. Your boss is happy.

COUNT(*) vs COUNT(column): the NULL trap

These two look the same but are not:

COUNT(*) counts rows. Every row counts, no matter what is inside.
COUNT(column) counts rows where that column is not NULL. Blanks get skipped.

Say some orders have no total filled in yet (it is NULL):

SELECT
  COUNT(*) AS all_rows,        -- 100
  COUNT(total) AS rows_with_total  -- 95 (5 were NULL, skipped)
FROM orders;

order id | total
---------+------
   1     |  50
   2     |  NULL   <-- COUNT(*) counts this, COUNT(total) skips it
   3     |  30

So if you want "how many rows exist", use COUNT(*). If you want "how many actually have a value", use COUNT(column). Mixing these up gives wrong numbers that look right.

WHERE vs HAVING: filter rows, then filter baskets

This is the part that trips up everyone, so go slow.

WHERE runs before grouping. It throws out rows you do not want, before they ever reach a basket.
HAVING runs after grouping. It throws out whole baskets based on the aggregate.

Imagine you only care about paid orders, and only about users who placed more than 5 of them.

SELECT user_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'        -- step 1: keep only paid rows (before baskets)
GROUP BY user_id             -- step 2: sort into baskets per user
HAVING COUNT(*) > 5;         -- step 3: keep only big baskets (after baskets)

In laundry terms: WHERE picks which clothes go in the wash at all. HAVING looks at the finished baskets and says "only show me baskets with more than 5 shirts."

The wrong way juniors try first:

-- WRONG: you cannot use an aggregate in WHERE
SELECT user_id, COUNT(*)
FROM orders
WHERE COUNT(*) > 5      -- error! the count does not exist yet here
GROUP BY user_id;

WHERE runs before any counting happens, so COUNT(*) is meaningless there. Move that test to HAVING and it works.

A real case

"Show me each user who has spent more than 1000 dollars total, biggest spenders first."

SELECT user_id, SUM(total) AS spent
FROM orders
WHERE status = 'paid'      -- only count money we actually got
GROUP BY user_id
HAVING SUM(total) > 1000   -- only the big spenders
ORDER BY spent DESC;       -- richest first

That one query answers a real business question, and it reads almost like a sentence.

Gotchas

Selecting a naked column. If a column is in your SELECT but not in GROUP BY and not wrapped in an aggregate, most databases throw an error. Rule of thumb: every column in SELECT must either be in GROUP BY or inside a function like SUM or COUNT.

-- WRONG: name is not grouped and not aggregated
SELECT user_id, name, COUNT(*) FROM orders GROUP BY user_id;

WHERE vs HAVING mixup. Filtering raw rows? Use WHERE. Filtering on a count or sum? Use HAVING. If you put an aggregate in WHERE, you get an error.
Forgetting NULLs in COUNT(column). Remember, COUNT(column) quietly skips blanks. Use COUNT(*) when you mean "every row."

Recap

Aggregate functions (COUNT, SUM, AVG, MIN, MAX) turn many rows into one number.
GROUP BY runs that math once per group, one tidy row per basket.
WHERE filters rows before grouping. HAVING filters groups after grouping.
COUNT(*) counts all rows. COUNT(column) skips NULLs.
Every SELECT column must be grouped or aggregated, or you get an error.

Your turn

Using the orders table, write one query that shows, for each status, how many orders there are and the average order total, but only for statuses that have more than 3 orders. Which part is WHERE and which part is HAVING? If you can explain why, you have got it.

Next up: Part 06: JOINs, Connecting Tables, where we finally staple two tables together and answer questions that span the whole schema.

Top comments (1)

Mohamed Idris • May 29

Part 05 Practice: Counting and Grouping

Time to get your hands dirty. These use the shared schema (users, orders, products, order_items). Try each one yourself before peeking at the solution. They get harder as you go.

1. How many orders are in the whole table?

Count every row in orders.

Solution

SELECT COUNT(*) AS total_orders
FROM orders;

Why: COUNT(*) counts rows, no grouping needed. The whole table is one big basket.

2. What is the total, average, smallest, and biggest order total?

Get all four numbers from the orders table in one query.

Solution

SELECT
  SUM(total) AS revenue,
  AVG(total) AS average_order,
  MIN(total) AS smallest_order,
  MAX(total) AS biggest_order
FROM orders;

Why: Aggregate functions squeeze a whole column down to one value. Stack several in one SELECT to get a quick summary row.

3. How many orders does each user have?

Show one row per user with their order count.

Solution

SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id;

Why: GROUP BY user_id sorts orders into one basket per user, then COUNT(*) counts each basket.

4. What is the total revenue for each order status?

Group orders by status and sum the totals.

Solution

SELECT status, SUM(total) AS revenue
FROM orders
GROUP BY status;

Why: One basket per status, one sum per basket. This is the classic "revenue per group" report.

5. Which users have placed more than 5 orders?

Show the user_id and their order count, but only for users above 5 orders.

Solution

SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;

Why: You are filtering on a count, which only exists after grouping. So it goes in HAVING, not WHERE.

6. For each status, how many paid-attention statuses have more than 3 orders, with their average total? (combine WHERE and HAVING)

Only look at orders that are not cancelled. Then group by status, and only keep statuses with more than 3 orders. Show the count and average total, biggest average first.

Solution

SELECT status, COUNT(*) AS order_count, AVG(total) AS avg_total
FROM orders
WHERE status <> 'cancelled'   -- filter rows BEFORE grouping
GROUP BY status
HAVING COUNT(*) > 3           -- filter baskets AFTER grouping
ORDER BY avg_total DESC;

Why: WHERE throws out cancelled rows before they reach a basket. HAVING then drops small baskets after counting. Mixing both in one query is super common in real reports.

Nice work. If you got even half of these, you are already thinking in groups and baskets. Run them against your own data, tweak the numbers, and watch how the answers change. See you in Part 06.