DEV Community

Vivek Kumar
Vivek Kumar

Posted on

SQL GROUP BY & HAVING: The Beginner's Guide to Summarizing Data Like a Pro

If you've ever needed to answer questions like "How many orders did each customer place?" or "Which product categories generated more than $10,000 in revenue?", you've needed GROUP BY and HAVING. These two clauses are the backbone of summarizing and analyzing relational data in SQL.

In this guide, we'll go from zero to confident with both clauses. You'll understand what they do, how they differ from each other (and from WHERE), and walk away with real working examples you can adapt immediately.


What Is GROUP BY?

The GROUP BY clause collapses multiple rows that share the same value in one or more columns into a single summary row. You almost always pair it with an aggregate functionCOUNT(), SUM(), AVG(), MIN(), or MAX() — to compute something meaningful about each group.

Syntax:

SELECT column_name, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column_name;
Enter fullscreen mode Exit fullscreen mode

A Simple Example

Imagine an orders table:

order_id customer_id amount status
1 101 50.00 completed
2 102 120.00 completed
3 101 30.00 completed
4 103 200.00 pending
5 102 75.00 completed

How many orders has each customer placed?

SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id order_count
101 2
102 2
103 1

SQL gathered all rows for customer 101 and counted them — giving you one result row per unique customer_id.


GROUP BY With Multiple Aggregate Functions

You're not limited to one aggregate per query. Here's how to get the total and average order amount per customer in one shot:

SELECT
  customer_id,
  COUNT(*)        AS order_count,
  SUM(amount)     AS total_spent,
  AVG(amount)     AS avg_order_value
FROM orders
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id order_count total_spent avg_order_value
101 2 80.00 40.00
102 2 195.00 97.50
103 1 200.00 200.00

GROUP BY Multiple Columns

You can group by more than one column to create finer-grained buckets. Let's say you want to see order counts broken down by both customer_id and status:

SELECT
  customer_id,
  status,
  COUNT(*) AS order_count
FROM orders
GROUP BY customer_id, status;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id status order_count
101 completed 2
102 completed 2
103 pending 1

Each unique combination of customer_id + status becomes its own group.


Introducing HAVING: Filtering After Grouping

GROUP BY creates groups. HAVING lets you filter those groups based on the result of an aggregate function.

Think of HAVING as a WHERE clause that runs after grouping — it can reference aggregate results that don't exist until after grouping happens.

Syntax:

SELECT column_name, AGGREGATE_FUNCTION(other_column)
FROM table_name
GROUP BY column_name
HAVING AGGREGATE_FUNCTION(other_column) condition;
Enter fullscreen mode Exit fullscreen mode

Example: Find High-Value Customers

Which customers have spent more than $100 in total?

SELECT
  customer_id,
  SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 100;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id total_spent
102 195.00
103 200.00

Customer 101 (who spent $80 total) is filtered out because their aggregate didn't meet the condition.


WHERE vs. HAVING: The Critical Difference

This trips up many beginners. Here's a clear mental model:

  • WHERE filters individual rows before grouping
  • HAVING filters groups after grouping

They can work together in the same query. Here's an example using both:

Find customers with more than one completed order:

SELECT
  customer_id,
  COUNT(*) AS completed_order_count
FROM orders
WHERE status = 'completed'        -- Filter rows BEFORE grouping
GROUP BY customer_id
HAVING COUNT(*) > 1;              -- Filter groups AFTER grouping
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id completed_order_count
101 2
102 2

The WHERE status = 'completed' removes the pending order for customer 103 before any grouping occurs. Then HAVING COUNT(*) > 1 removes customer groups with only one order.


The Execution Order of a SQL Query

Understanding why WHERE and HAVING behave differently becomes much clearer when you know the logical order SQL processes a query:

  1. FROM — identify the table(s)
  2. WHERE — filter individual rows
  3. GROUP BY — collapse rows into groups
  4. HAVING — filter groups
  5. SELECT — compute output columns
  6. ORDER BY — sort
  7. LIMIT/OFFSET — paginate

WHERE runs at step 2, before any grouping. HAVING runs at step 4, after groups exist. This is why you cannot use an aggregate function inside a WHERE clause — the aggregates simply don't exist yet at that stage.


Common Mistakes to Avoid

1. Selecting a non-grouped, non-aggregated column

-- ❌ This will fail (or return unpredictable results in MySQL)
SELECT customer_id, amount, COUNT(*)
FROM orders
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

amount is neither in GROUP BY nor wrapped in an aggregate function. The database doesn't know which amount to show for the group. Fix it: either add amount to GROUP BY, or aggregate it (e.g., SUM(amount)).

2. Using WHERE to filter aggregates

-- ❌ This will throw an error
SELECT customer_id, SUM(amount)
FROM orders
WHERE SUM(amount) > 100
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

SUM(amount) doesn't exist at the WHERE stage. Use HAVING instead:

-- ✅ Correct
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 100;
Enter fullscreen mode Exit fullscreen mode

3. Forgetting that GROUP BY treats NULLs as a single group

If your grouped column has NULL values in multiple rows, all those NULL rows will be collapsed into one group. This is usually fine — just be aware that a NULL group will appear in your results unless you filter it out with WHERE column IS NOT NULL before grouping.

4. Confusing COUNT(*) and COUNT(column)

  • COUNT(*) counts all rows in the group, including those with NULL values
  • COUNT(column) counts only rows where that column is not NULL
SELECT
  customer_id,
  COUNT(*)           AS total_rows,
  COUNT(amount)      AS non_null_amounts
FROM orders
GROUP BY customer_id;
Enter fullscreen mode Exit fullscreen mode

If amount is never NULL in your table, these will be equal. But in tables with optional fields, the difference matters.


A Real-World Example: Sales Report by Category

Let's put it all together with a slightly richer scenario. Here's a products table and a sales table:

-- Find product categories with more than 5 sales,
-- showing total revenue and average sale price,
-- sorted by total revenue descending
SELECT
  p.category,
  COUNT(s.sale_id)    AS total_sales,
  SUM(s.sale_price)   AS total_revenue,
  AVG(s.sale_price)   AS avg_price
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE s.sale_date >= '2024-01-01'
GROUP BY p.category
HAVING COUNT(s.sale_id) > 5
ORDER BY total_revenue DESC;
Enter fullscreen mode Exit fullscreen mode

This query:

  1. Joins sales with products
  2. Filters to sales from 2024 onward (WHERE)
  3. Groups by product category
  4. Keeps only categories with more than 5 sales (HAVING)
  5. Sorts by revenue, highest first

Key Takeaways

  • GROUP BY collapses rows that share a value into summary groups; always pair it with an aggregate function
  • HAVING filters groups based on aggregate results — it's the post-grouping version of WHERE
  • WHERE and HAVING can coexist in the same query and serve complementary roles
  • Every column in your SELECT list must be either in GROUP BY or inside an aggregate function
  • SQL processes WHERE before grouping and HAVING after grouping — this is why you can't use aggregate functions in a WHERE clause

What's Next?

GROUP BY and HAVING open the door to powerful data analysis. Once you're comfortable here, great next steps include:

  • Window Functions — aggregate over groups without collapsing rows
  • CTEs — break complex GROUP BY queries into readable, step-by-step logic
  • Indexes on grouped columns — for performance when your tables get large

Have you run into a tricky GROUP BY or HAVING situation at work? Drop it in the comments — I'd love to see real-world examples from the community!

Top comments (0)