Vivek Kumar

Posted on Apr 27

SQL GROUP BY & HAVING: The Beginner's Guide to Summarizing Data Like a Pro

#beginners #database #sql #tutorial

If you've ever needed to answer questions like "How many orders did each customer place?" or "Which product categories generated more than $10,000 in revenue?", you've needed GROUP BY and HAVING. These two clauses are the backbone of summarizing and analyzing relational data in SQL.

In this guide, we'll go from zero to confident with both clauses. You'll understand what they do, how they differ from each other (and from WHERE), and walk away with real working examples you can adapt immediately.

What Is GROUP BY?

The GROUP BY clause collapses multiple rows that share the same value in one or more columns into a single summary row. You almost always pair it with an aggregate function — COUNT(), SUM(), AVG(), MIN(), or MAX() — to compute something meaningful about each group.

Syntax:

SELECT column_name, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column_name;

A Simple Example

Imagine an orders table:

order_id	customer_id	amount	status
1	101	50.00	completed
2	102	120.00	completed
3	101	30.00	completed
4	103	200.00	pending
5	102	75.00	completed

How many orders has each customer placed?

SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Result:

customer_id	order_count
101	2
102	2
103	1

SQL gathered all rows for customer 101 and counted them — giving you one result row per unique customer_id.

GROUP BY With Multiple Aggregate Functions

You're not limited to one aggregate per query. Here's how to get the total and average order amount per customer in one shot:

SELECT
  customer_id,
  COUNT(*)        AS order_count,
  SUM(amount)     AS total_spent,
  AVG(amount)     AS avg_order_value
FROM orders
GROUP BY customer_id;

Result:

customer_id	order_count	total_spent	avg_order_value
101	2	80.00	40.00
102	2	195.00	97.50
103	1	200.00	200.00

GROUP BY Multiple Columns

You can group by more than one column to create finer-grained buckets. Let's say you want to see order counts broken down by both customer_id and status:

SELECT
  customer_id,
  status,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id, status;

Result:

customer_id	status	order_count
101	completed	2
102	completed	2
103	pending	1

Each unique combination of customer_id + status becomes its own group.

Introducing HAVING: Filtering After Grouping

GROUP BY creates groups. HAVING lets you filter those groups based on the result of an aggregate function.

Think of HAVING as a WHERE clause that runs after grouping — it can reference aggregate results that don't exist until after grouping happens.

Syntax:

SELECT column_name, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column_name
HAVING AGGREGATE_FUNCTION(other_column) condition;

Example: Find High-Value Customers

Which customers have spent more than $100 in total?

SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 100;

Result:

customer_id	total_spent
102	195.00
103	200.00

Customer 101 (who spent $80 total) is filtered out because their aggregate didn't meet the condition.

WHERE vs. HAVING: The Critical Difference

This trips up many beginners. Here's a clear mental model:

WHERE filters individual rows before grouping
HAVING filters groups after grouping

They can work together in the same query. Here's an example using both:

Find customers with more than one completed order:

SELECT
  customer_id,
  COUNT(*) AS completed_order_count
FROM orders
WHERE status = 'completed'        -- Filter rows BEFORE grouping
GROUP BY customer_id
HAVING COUNT(*) > 1;              -- Filter groups AFTER grouping

Result:

customer_id	completed_order_count
101	2
102	2

The WHERE status = 'completed' removes the pending order for customer 103 before any grouping occurs. Then HAVING COUNT(*) > 1 removes customer groups with only one order.

The Execution Order of a SQL Query

Understanding why WHERE and HAVING behave differently becomes much clearer when you know the logical order SQL processes a query:

FROM — identify the table(s)
WHERE — filter individual rows
GROUP BY — collapse rows into groups
HAVING — filter groups
SELECT — compute output columns
ORDER BY — sort
LIMIT/OFFSET — paginate

WHERE runs at step 2, before any grouping. HAVING runs at step 4, after groups exist. This is why you cannot use an aggregate function inside a WHERE clause — the aggregates simply don't exist yet at that stage.

Common Mistakes to Avoid

1. Selecting a non-grouped, non-aggregated column

-- ❌ This will fail (or return unpredictable results in MySQL)
SELECT customer_id, amount, COUNT(*)
FROM orders
GROUP BY customer_id;

amount is neither in GROUP BY nor wrapped in an aggregate function. The database doesn't know which amount to show for the group. Fix it: either add amount to GROUP BY, or aggregate it (e.g., SUM(amount)).

2. Using WHERE to filter aggregates

-- ❌ This will throw an error
SELECT customer_id, SUM(amount)
FROM orders
WHERE SUM(amount) > 100
GROUP BY customer_id;

SUM(amount) doesn't exist at the WHERE stage. Use HAVING instead:

-- ✅ Correct
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 100;

3. Forgetting that GROUP BY treats NULLs as a single group

If your grouped column has NULL values in multiple rows, all those NULL rows will be collapsed into one group. This is usually fine — just be aware that a NULL group will appear in your results unless you filter it out with WHERE column IS NOT NULL before grouping.

4. Confusing COUNT(*) and COUNT(column)

COUNT(*) counts all rows in the group, including those with NULL values
COUNT(column) counts only rows where that column is not NULL

SELECT
  customer_id,
  COUNT(*)           AS total_rows,
  COUNT(amount)      AS non_null_amounts
FROM orders
GROUP BY customer_id;

If amount is never NULL in your table, these will be equal. But in tables with optional fields, the difference matters.

A Real-World Example: Sales Report by Category

Let's put it all together with a slightly richer scenario. Here's a products table and a sales table:

-- Find product categories with more than 5 sales,
-- showing total revenue and average sale price,
-- sorted by total revenue descending
SELECT
  p.category,
  COUNT(s.sale_id)    AS total_sales,
  SUM(s.sale_price)   AS total_revenue,
  AVG(s.sale_price)   AS avg_price
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE s.sale_date >= '2024-01-01'
GROUP BY p.category
HAVING COUNT(s.sale_id) > 5
ORDER BY total_revenue DESC;

This query:

Joins sales with products
Filters to sales from 2024 onward (WHERE)
Groups by product category
Keeps only categories with more than 5 sales (HAVING)
Sorts by revenue, highest first

Key Takeaways

GROUP BY collapses rows that share a value into summary groups; always pair it with an aggregate function
HAVING filters groups based on aggregate results — it's the post-grouping version of WHERE
WHERE and HAVING can coexist in the same query and serve complementary roles
Every column in your SELECT list must be either in GROUP BY or inside an aggregate function
SQL processes WHERE before grouping and HAVING after grouping — this is why you can't use aggregate functions in a WHERE clause

What's Next?

GROUP BY and HAVING open the door to powerful data analysis. Once you're comfortable here, great next steps include:

Window Functions — aggregate over groups without collapsing rows
CTEs — break complex GROUP BY queries into readable, step-by-step logic
Indexes on grouped columns — for performance when your tables get large

Have you run into a tricky GROUP BY or HAVING situation at work? Drop it in the comments — I'd love to see real-world examples from the community!

DEV Community

SQL GROUP BY & HAVING: The Beginner's Guide to Summarizing Data Like a Pro

What Is GROUP BY?

A Simple Example

GROUP BY With Multiple Aggregate Functions

GROUP BY Multiple Columns

Introducing HAVING: Filtering After Grouping

Example: Find High-Value Customers

WHERE vs. HAVING: The Critical Difference

The Execution Order of a SQL Query

Common Mistakes to Avoid

1. Selecting a non-grouped, non-aggregated column

2. Using WHERE to filter aggregates

3. Forgetting that GROUP BY treats NULLs as a single group

4. Confusing COUNT(*) and COUNT(column)

A Real-World Example: Sales Report by Category

Key Takeaways

What's Next?

Top comments (0)