DEV Community

Hardik Kanajariya
Hardik Kanajariya

Posted on

Day 6: Aggregation Functions and GROUP BY

Day 6: Aggregation Functions and GROUP BY

Welcome Back! πŸš€

Today we'll explore one of PostgreSQL's most powerful features: aggregation functions and grouping data. These tools are essential for data analysis and reporting.


What You'll Learn Today

  1. Understanding Aggregation Functions
  2. Using GROUP BY clause
  3. HAVING clause for filtering groups
  4. Common aggregation patterns
  5. Multi-level grouping

1. Understanding Aggregation Functions

Aggregation functions perform calculations on multiple rows and return a single result.

Common Aggregation Functions

-- COUNT: Count rows
SELECT COUNT(*) FROM employees;
SELECT COUNT(email) FROM employees; -- Ignores NULL values
SELECT COUNT(DISTINCT department) FROM employees;

-- SUM: Calculate total
SELECT SUM(salary) FROM employees;
SELECT SUM(salary) AS total_payroll FROM employees;

-- AVG: Calculate average
SELECT AVG(salary) FROM employees;
SELECT ROUND(AVG(salary), 2) AS average_salary FROM employees;

-- MIN and MAX: Find minimum and maximum
SELECT MIN(salary) AS lowest_salary FROM employees;
SELECT MAX(salary) AS highest_salary FROM employees;

-- Multiple aggregations in one query
SELECT 
    COUNT(*) AS total_employees,
    AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    SUM(salary) AS total_payroll
FROM employees;
Enter fullscreen mode Exit fullscreen mode

2. GROUP BY Clause

The GROUP BY clause divides rows into groups and applies aggregation functions to each group.

Basic GROUP BY

-- Count employees by department
SELECT 
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department;

-- Average salary by department
SELECT 
    department,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

-- Total sales by product
SELECT 
    product_id,
    COUNT(*) AS order_count,
    SUM(quantity) AS total_quantity,
    SUM(total_amount) AS total_revenue
FROM orders
GROUP BY product_id;
Enter fullscreen mode Exit fullscreen mode

Multiple Column Grouping

-- Group by multiple columns
SELECT 
    department,
    job_title,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, job_title;

-- Sales by year and month
SELECT 
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY year, month;
Enter fullscreen mode Exit fullscreen mode

3. HAVING Clause

The HAVING clause filters groups after aggregation (while WHERE filters rows before aggregation).

-- Departments with more than 5 employees
SELECT 
    department,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY employee_count DESC;

-- Departments with average salary above $60,000
SELECT 
    department,
    AVG(salary) AS avg_salary,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000
ORDER BY avg_salary DESC;

-- Combining WHERE and HAVING
SELECT 
    department,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'  -- Filter rows first
GROUP BY department
HAVING COUNT(*) >= 3              -- Then filter groups
ORDER BY avg_salary DESC;
Enter fullscreen mode Exit fullscreen mode

4. Common Aggregation Patterns

Pattern 1: Finding Top N Groups

-- Top 5 departments by total payroll
SELECT 
    department,
    SUM(salary) AS total_payroll,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY total_payroll DESC
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Pattern 2: Percentage Calculations

-- Department salary as percentage of total
SELECT 
    department,
    SUM(salary) AS dept_total,
    ROUND(
        (SUM(salary) * 100.0 / (SELECT SUM(salary) FROM employees)), 
        2
    ) AS percentage_of_total
FROM employees
GROUP BY department
ORDER BY dept_total DESC;
Enter fullscreen mode Exit fullscreen mode

Pattern 3: Grouping with Conditional Aggregation

-- Count employees by salary range
SELECT 
    department,
    COUNT(*) AS total_employees,
    COUNT(CASE WHEN salary < 50000 THEN 1 END) AS low_salary_count,
    COUNT(CASE WHEN salary BETWEEN 50000 AND 80000 THEN 1 END) AS mid_salary_count,
    COUNT(CASE WHEN salary > 80000 THEN 1 END) AS high_salary_count
FROM employees
GROUP BY department;
Enter fullscreen mode Exit fullscreen mode

5. Advanced Grouping Techniques

ROLLUP - Hierarchical Grouping

-- Subtotals by department and grand total
SELECT 
    department,
    job_title,
    COUNT(*) AS employee_count,
    SUM(salary) AS total_salary
FROM employees
GROUP BY ROLLUP(department, job_title)
ORDER BY department NULLS LAST, job_title NULLS LAST;
Enter fullscreen mode Exit fullscreen mode

GROUPING SETS

-- Multiple grouping combinations in one query
SELECT 
    department,
    job_title,
    COUNT(*) AS employee_count
FROM employees
GROUP BY GROUPING SETS (
    (department),
    (job_title),
    (department, job_title),
    ()
);
Enter fullscreen mode Exit fullscreen mode

Real-World Example: Sales Analysis

-- Create sample sales table
CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    sale_date DATE,
    quantity INTEGER,
    unit_price DECIMAL(10,2)
);

-- Comprehensive sales analysis
SELECT 
    category,
    COUNT(DISTINCT sale_id) AS total_orders,
    SUM(quantity) AS total_units_sold,
    ROUND(AVG(quantity), 2) AS avg_units_per_order,
    SUM(quantity * unit_price) AS total_revenue,
    ROUND(AVG(unit_price), 2) AS avg_unit_price,
    MIN(sale_date) AS first_sale,
    MAX(sale_date) AS last_sale
FROM sales
GROUP BY category
HAVING SUM(quantity * unit_price) > 1000
ORDER BY total_revenue DESC;
Enter fullscreen mode Exit fullscreen mode

Important Notes

  1. Column Selection Rule: Any column in SELECT that's not in an aggregation function must be in GROUP BY
  2. WHERE vs HAVING:
    • WHERE filters rows before grouping
    • HAVING filters groups after aggregation
  3. NULL Handling: Aggregation functions ignore NULL values (except COUNT(*))
  4. Performance: Add indexes on columns used in GROUP BY for better performance

Common Mistakes to Avoid

-- ❌ WRONG: Column not in GROUP BY or aggregation
SELECT department, job_title, COUNT(*)
FROM employees
GROUP BY department;
-- Error: job_title must be in GROUP BY

-- βœ… CORRECT
SELECT department, COUNT(*)
FROM employees
GROUP BY department;

-- ❌ WRONG: Using WHERE with aggregation
SELECT department, COUNT(*)
FROM employees
WHERE COUNT(*) > 5
GROUP BY department;
-- Error: Use HAVING instead

-- βœ… CORRECT
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
Enter fullscreen mode Exit fullscreen mode

Today's Challenge! 🎯

Create a comprehensive employee analysis query that:

  1. Groups employees by department
  2. Shows count, average salary, min/max salary for each department
  3. Only includes departments with more than 3 employees
  4. Filters to show only departments where average salary > $55,000
  5. Orders results by average salary descending
-- Your solution here:
-- Write a query that meets all the requirements above
Enter fullscreen mode Exit fullscreen mode

Summary

Today you learned:

  • βœ… Essential aggregation functions (COUNT, SUM, AVG, MIN, MAX)
  • βœ… GROUP BY clause for data grouping
  • βœ… HAVING clause for filtering aggregated results
  • βœ… Advanced grouping with ROLLUP and GROUPING SETS
  • βœ… Real-world data analysis patterns

Coming Up Next!

Day 7: Subqueries and Nested Queries - Learn how to write queries within queries for complex data retrieval! πŸ”


Questions? Drop them in the comments! Let's learn together! πŸ’¬

PostgreSQL #Database #SQL #DataAnalysis #15DaysOfPostgreSQL

Top comments (0)