DEV Community

Kelvin
Kelvin

Posted on

Advanced SQL for Data Analytics: Advanced Techniques Every Data Analyst Should Know.

WINDOW FUNCTIONS IN SQL.

Window functions perform calculations across a set of rows related to the current row without grouping the rows into a single output. They preserve the individual row details while providing additional contextual insights.

Key components.
Window functions are defined using the mandatory OVER() clause which specifies how the rows are partitioned and ordered for the calculation.

Common types of window functions.
1: RANKING - Row_number, rank, dense_rank - It assigns numbers or ranks to rows based on order.
Rank

Row number

2: AGGREGATE - sum, avg, min, max, count - calculates sum, averages or extremes across the window.
Total

Average

Count

Max

3: VALUE/OFFSET - lag, lead - Accesses data from rows before, after or at specific points in the window.

lag
Used to compare rows i.e. comparing current value to previous value

lead
looks forward to the next row or a specified number of rows ahead.

Subqueries

A subquery (or nested query) is a SQL query written inside another query. It allows you to
perform an operation that depends on the result of another query.

  1. The SELECT clause - Subqueries in the SELECT clause add an extra computed column to the result set. Each subquery executes once for every row of the outer query.
SELECT  
name, 
(SELECT COUNT(*)  
FROM projects  
WHERE projects.employee_id = employees.employee_id) AS total_projects 
FROM employees; 
Enter fullscreen mode Exit fullscreen mode
  1. The FROM clause - A subquery in the FROM clause creates a temporary or derived table. The outer query can then select or filter data from it.
SELECT * 
FROM ( 
SELECT  
department_id,  
COUNT(*) AS total_employees 
FROM employees 
GROUP BY department_id 
) AS dept_summary; 
Enter fullscreen mode Exit fullscreen mode
  1. The WHERE or HAVING clause - Subqueries in the WHERE clause allow the outer query to filter results based on another queryโ€™s output.
SELECT  
department_name 
FROM departments 
WHERE department_id IN ( 
SELECT department_id 
FROM employees 
GROUP BY department_id 
HAVING COUNT(*) > 1 
); 
Enter fullscreen mode Exit fullscreen mode

Correlated vs Non-Correlated Subqueries

  1. Non-correlated subquery - A non-correlated subquery does not depend on the outer query. It can be executed on its own.
  2. Correlated subquery - A correlated subquery depends on the outer query. It is evaluated once per row of the outer query.

Common Table Expressions (CTEs)

A CTE (Common Table Expression) is like creating a temporary result set (or virtual table)
that exists only during the execution of a single SQL query.

WITH high_salary AS ( 
SELECT name, salary 
FROM employees 
WHERE salary > 45000 
) 
SELECT * 
FROM high_salary; 
Enter fullscreen mode Exit fullscreen mode

CTE with Joins

You can use CTEs to simplify joins

WITH emp_dept AS ( 
SELECT e.name, d.department_name, e.salary 
FROM employees e 
LEFT JOIN departments d ON e.department_id = d.department_id 
) 
SELECT * FROM emp_dept;
Enter fullscreen mode Exit fullscreen mode

Stored Procedures

A stored procedure is a named block of SQL logic stored inside the database that you can
execute using: CALL procedure_name(parameters);

Procedures are for actions, such as:

  • Insert data (add customer, add book, add order)
  • Update data (update contact, update email)
  • Delete data (delete customer safely)
  • Validation and enforcement of business rules
  • Multi-step operations (create order + check customer + log)

Procedure Syntax

CREATE OR REPLACE PROCEDURE procedure_name(param_name param_type, ...)
LANGUAGE plpgsql
AS $$
BEGIN -- SQL statements
END;
$$;

Explanation of each keyword

CREATE - creates the procedure
OR REPLACE - updates the procedure if it already exists (no need to drop first)
PROCEDURE procedure_name(...) - name + inputs (parameters)
LANGUAGE plpgsql - enables procedural features (IF, LOOP, variables)
AS $$ ... $$ - defines the body of the procedure
BEGIN ... END; - start/end of procedure logic
Call with: CALL procedure_name(...)

CREATE OR REPLACE PROCEDURE add_customer( 
p_first_name VARCHAR, 
p_last_name  VARCHAR, 
p_email      VARCHAR, 
p_contact    VARCHAR, 
p_city       VARCHAR) 
LANGUAGE plpgsql 
AS $$ 
BEGIN 
INSERT INTO customers(first_name, last_name, email, contact, city) 
VALUES (p_first_name, p_last_name, p_email, p_contact, p_city); 
END; 
$$;  
Enter fullscreen mode Exit fullscreen mode

Explanation

  • CREATE OR REPLACE PROCEDURE add_customer(...) - Creates a reusable procedure named add_customer.
  • Parameters like p_first_name VARCHAR - Inputs we pass when calling the procedure.
  • LANGUAGE plpgsql - Enables procedural execution.
  • BEGIN - Starts execution block.
  • INSERT INTO customers(...) - Specifies the table and columns being inserted.
  • VALUES (...) - Inserts the parameter values.
  • END;- Ends procedure.
  • CALL add_customer(...) - Runs the procedure by inserting values into our table.

Delete Customer Safely

CREATE OR REPLACE PROCEDURE delete_customer_safely( 
p_customer_id INT 
) 
LANGUAGE plpgsql 
AS $$ 
BEGIN 
DELETE FROM orders 
WHERE customer_id = p_customer_id; 
DELETE FROM customers 
WHERE customer_id = p_customer_id; 
END; 
$$; 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)