Day 6: Aggregation Functions and GROUP BY
Welcome Back! π
Today we'll explore one of PostgreSQL's most powerful features: aggregation functions and grouping data. These tools are essential for data analysis and reporting.
What You'll Learn Today
- Understanding Aggregation Functions
- Using GROUP BY clause
- HAVING clause for filtering groups
- Common aggregation patterns
- Multi-level grouping
1. Understanding Aggregation Functions
Aggregation functions perform calculations on multiple rows and return a single result.
Common Aggregation Functions
-- COUNT: Count rows
SELECT COUNT(*) FROM employees;
SELECT COUNT(email) FROM employees; -- Ignores NULL values
SELECT COUNT(DISTINCT department) FROM employees;
-- SUM: Calculate total
SELECT SUM(salary) FROM employees;
SELECT SUM(salary) AS total_payroll FROM employees;
-- AVG: Calculate average
SELECT AVG(salary) FROM employees;
SELECT ROUND(AVG(salary), 2) AS average_salary FROM employees;
-- MIN and MAX: Find minimum and maximum
SELECT MIN(salary) AS lowest_salary FROM employees;
SELECT MAX(salary) AS highest_salary FROM employees;
-- Multiple aggregations in one query
SELECT
COUNT(*) AS total_employees,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
SUM(salary) AS total_payroll
FROM employees;
2. GROUP BY Clause
The GROUP BY clause divides rows into groups and applies aggregation functions to each group.
Basic GROUP BY
-- Count employees by department
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department;
-- Average salary by department
SELECT
department,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
-- Total sales by product
SELECT
product_id,
COUNT(*) AS order_count,
SUM(quantity) AS total_quantity,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY product_id;
Multiple Column Grouping
-- Group by multiple columns
SELECT
department,
job_title,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, job_title;
-- Sales by year and month
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY year, month;
3. HAVING Clause
The HAVING clause filters groups after aggregation (while WHERE filters rows before aggregation).
-- Departments with more than 5 employees
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY employee_count DESC;
-- Departments with average salary above $60,000
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000
ORDER BY avg_salary DESC;
-- Combining WHERE and HAVING
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01' -- Filter rows first
GROUP BY department
HAVING COUNT(*) >= 3 -- Then filter groups
ORDER BY avg_salary DESC;
4. Common Aggregation Patterns
Pattern 1: Finding Top N Groups
-- Top 5 departments by total payroll
SELECT
department,
SUM(salary) AS total_payroll,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY total_payroll DESC
LIMIT 5;
Pattern 2: Percentage Calculations
-- Department salary as percentage of total
SELECT
department,
SUM(salary) AS dept_total,
ROUND(
(SUM(salary) * 100.0 / (SELECT SUM(salary) FROM employees)),
2
) AS percentage_of_total
FROM employees
GROUP BY department
ORDER BY dept_total DESC;
Pattern 3: Grouping with Conditional Aggregation
-- Count employees by salary range
SELECT
department,
COUNT(*) AS total_employees,
COUNT(CASE WHEN salary < 50000 THEN 1 END) AS low_salary_count,
COUNT(CASE WHEN salary BETWEEN 50000 AND 80000 THEN 1 END) AS mid_salary_count,
COUNT(CASE WHEN salary > 80000 THEN 1 END) AS high_salary_count
FROM employees
GROUP BY department;
5. Advanced Grouping Techniques
ROLLUP - Hierarchical Grouping
-- Subtotals by department and grand total
SELECT
department,
job_title,
COUNT(*) AS employee_count,
SUM(salary) AS total_salary
FROM employees
GROUP BY ROLLUP(department, job_title)
ORDER BY department NULLS LAST, job_title NULLS LAST;
GROUPING SETS
-- Multiple grouping combinations in one query
SELECT
department,
job_title,
COUNT(*) AS employee_count
FROM employees
GROUP BY GROUPING SETS (
(department),
(job_title),
(department, job_title),
()
);
Real-World Example: Sales Analysis
-- Create sample sales table
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
sale_date DATE,
quantity INTEGER,
unit_price DECIMAL(10,2)
);
-- Comprehensive sales analysis
SELECT
category,
COUNT(DISTINCT sale_id) AS total_orders,
SUM(quantity) AS total_units_sold,
ROUND(AVG(quantity), 2) AS avg_units_per_order,
SUM(quantity * unit_price) AS total_revenue,
ROUND(AVG(unit_price), 2) AS avg_unit_price,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale
FROM sales
GROUP BY category
HAVING SUM(quantity * unit_price) > 1000
ORDER BY total_revenue DESC;
Important Notes
- Column Selection Rule: Any column in SELECT that's not in an aggregation function must be in GROUP BY
-
WHERE vs HAVING:
- WHERE filters rows before grouping
- HAVING filters groups after aggregation
- NULL Handling: Aggregation functions ignore NULL values (except COUNT(*))
- Performance: Add indexes on columns used in GROUP BY for better performance
Common Mistakes to Avoid
-- β WRONG: Column not in GROUP BY or aggregation
SELECT department, job_title, COUNT(*)
FROM employees
GROUP BY department;
-- Error: job_title must be in GROUP BY
-- β
CORRECT
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
-- β WRONG: Using WHERE with aggregation
SELECT department, COUNT(*)
FROM employees
WHERE COUNT(*) > 5
GROUP BY department;
-- Error: Use HAVING instead
-- β
CORRECT
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
Today's Challenge! π―
Create a comprehensive employee analysis query that:
- Groups employees by department
- Shows count, average salary, min/max salary for each department
- Only includes departments with more than 3 employees
- Filters to show only departments where average salary > $55,000
- Orders results by average salary descending
-- Your solution here:
-- Write a query that meets all the requirements above
Summary
Today you learned:
- β Essential aggregation functions (COUNT, SUM, AVG, MIN, MAX)
- β GROUP BY clause for data grouping
- β HAVING clause for filtering aggregated results
- β Advanced grouping with ROLLUP and GROUPING SETS
- β Real-world data analysis patterns
Coming Up Next!
Day 7: Subqueries and Nested Queries - Learn how to write queries within queries for complex data retrieval! π
Questions? Drop them in the comments! Let's learn together! π¬
Top comments (0)