Beverline Otiende

Posted on Mar 3

EXPLORING SQL JOINS AND WINDOWS

#database #analytics #sql #tutorial

In real world analytics, data rarely exists in isolation, rather it is usually stored in separate but related tables. Extracting meaningful insights from these related tables usually require more advanced SQL concepts like;

Joins: which allow you to combine data from multiple tables.
Window functions: allow you to perform calculations across rows without collapsing your dataset.

Mastering these tools transforms SQL from a querying language into a powerful analytical engine. Together they unlock powerful analysis.

In this article we explore more about JOINS and WINDOWS and how we can use them in our analysis.

SQL JOINS
A SQL join combines rows from two or more tables based on a related column. Think of it like merging two Excel sheets sing a common column.

We use joins when;

You have a Normalized Database
Information is split across multiple tables
You need enriched reporting datasets

Types of Joins
Lets use these two tables to understand more about the types of joins.

The first table is the employees table

employees table
| Employee ID | Name    | Department ID | Manager ID | Salary |
| ----------- | ------- | ------------- | ---------- | ------ |
| 1           | Alice   | 1             | NULL       | 50,000 |
| 2           | Bob     | 2             | 1          | 45,000 |
| 3           | Charlie | 1             | 1          | 47,000 |
| 4           | Diana   | 3             | NULL       | 60,000 |
| 5           | Eve     | NULL          | NULL       | 40,000 |

Below is the second table, departments table

 departments table
+----+---------------+-----------------+
| #  | department_id | department_name |
+----+---------------+-----------------+
| 1  | 1             | Engineering     |
| 2  | 2             | Sales           |
| 3  | 3             | Marketing       |
| 4  | 4             | Finance         |
+----+---------------+-----------------+

1.Inner Join
An Inner Join in SQL returns only the rows that have matching values in all the tables involved in the join, excluding any non-matching data.

From our tables employee table and departments table with the common column department_id, If you want to find the rows with matching values on both the tables you do an Inner Join using the SQL query below;

SELECT employees.name, departments.department_name
FROM employees
INNER JOIN departments 
ON employees.department_id = departments.department_id;

The result of this Inner Join query is;

+----+---------+-----------------+
| #  | name    | department_name |
+----+---------+-----------------+
| 1  | Alice   | Engineering     |
| 2  | Bob     | Sales           |
| 3  | Charlie | Engineering     |
| 4  | Diana   | Marketing       |
+----+---------+-----------------+

2.Left Join(Left Outer Join)
A Left Join returns all records from the left table and the matched records from the right table. If no match exists Null values are returned for the right table's columns. It is essential for retaining all data from the primary table while appending related data.

From our table 'employees' and 'departments' with a shared column 'department_id', we can do a LEFT JOIN to find all employees and their respective departments.

SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments 
ON employees.department_id = departments.department_id;

The result of this Inner Join query is;

+----+---------+-----------------+
| #  | name    | department_name |
+----+---------+-----------------+
| 1  | Alice   | Engineering     |
| 2  | Bob     | Sales           |
| 3  | Charlie | Engineering     |
| 4  | Diana   | Marketing       |
| 5  | Eve     | NULL            |
+----+---------+-----------------+

3.Right Join(Right Outer Join)
A Right Join in SQL returns all records from the right table and the matching records from the left table.

If a row in the right table has no corresponding match in the left table, the columns from the left table will contain Null values in the result set.

From our table 'Employees' and 'Departments' with a shared column 'department_id', we can do a RIGHT JOIN to find all departments and their respective employees.

SELECT employees.name, departments.department_name
FROM employees
RIGHT JOIN departments 
ON employees.department_id = departments.department_id;

This Right Join query returns;

+----+---------+-----------------+
| #  | name    | department_name |
+----+---------+-----------------+
| 1  | Alice   | Engineering     |
| 2  | Bob     | Sales           |
| 3  | Charlie | Engineering     |
| 4  | Diana   | Marketing       |
| 5  | NULL    | Finance         |
+----+---------+-----------------+

4.Full Join (Full Outer Join)
A Full Join is a SQL operation that returns all rows from both the left and right tables, combining matching rows and including non-matching rows from either table with Null values for the columns of the table that lacks a match.

It can be thought of as a combination of a Left Join and a Right Join.

From our table 'Employees' and 'Departments' with a shared column 'department_id', we can do a FULL JOIN to find all employees and all the departments.

SELECT employees.name, departments.department_name
FROM employees
FULL JOIN departments 
ON employees.department_id = departments.department_id;

The result of this full join is;

+----+---------+-----------------+
| #  | name    | department_name |
+----+---------+-----------------+
| 1  | Alice   | Engineering     |
| 2  | Bob     | Sales           |
| 3  | Charlie | Engineering     |
| 4  | Diana   | Marketing       |
| 5  | Eve     | NULL            |
| 6  | NULL    | Finance         |
+----+---------+-----------------+

WINDOW FUNCTIONS
SQL window functions perform calculations across a set of related rows (a "window") and return a single value for each original row, without collapsing the rows. They contrast with standard aggregate functions that return a single value for an entire group of rows.

They are commonly used for tasks like Aggregates, rankings and running totals.

Basic Windows syntax
OVER() - defines the window
PARTITION BY - splits data into groups
ORER BY - defines calculation order

Types of Window Functions
The two types of Window functions are;

1. Aggregate Window functions
2. Ranking Window Functions

Let us use this Staff table to explore Window functions;

Staff table
+---------+--------+-----+-----------------+---------+
| staff_id| name   | age | department_name | salary  |
+---------+--------+-----+-----------------+---------+
| 1       | Tonny  | 26  | Engineering     | 80,000  |
| 2       | Phenny | 16  | Marketing       | 65,000  |
| 3       | Steve  | 30  | Engineering     | 75,000  |
| 4       | Juliet | 40  | Sales           | 30,000  |
| 5       | Nelly  | 22  | Marketing       | 27,000  |
| 6       | Joshua | 20  | Finance         | 55,000  |
+---------+--------+-----+-----------------+---------+

Aggregate Window Functions
These calculate aggregates over a window of rows while retaining the individual rows.

They include;

SUM() - Adds together numbers within a window.

Lets find the Sum of the Salaries from the Staff table. We use the query

SELECT name, age, department_name, salary,
SUM(salary) over (partition by Department_name) as Sum_of_Salary
FROM Staff;

From this query, we get the output;

+----+--------+-----+-----------------+---------+---------------+
| #  | name   | age | department_name | salary  | sum_of_salary |
+----+--------+-----+-----------------+---------+---------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 155,000       |
| 2  | Steve  | 30  | Engineering     | 75,000  | 155,000       |
| 3  | Joshua | 20  | Finance         | 55,000  | 55,000        |
| 4  | Phenny | 16  | Marketing       | 65,000  | 92,000        |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 92,000        |
| 6  | Juliet | 40  | Sales           | 30,000  | 30,000        |

AVG() - calculates the average within a window

Eg Finding the average from our Staff table, we use the SQL query;

SELECT name, age, department_name, salary,
AVG(salary) over (partition by Department_name) as AVG_Salary
FROM Staff;

From this query we get;

+----+--------+-----+-----------------+---------+-----------+
| #  | name   | age | department_name | salary  | avg_salary|
+----+--------+-----+-----------------+---------+-----------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 77,500    |
| 2  | Steve  | 30  | Engineering     | 75,000  | 77,500    |
| 3  | Joshua | 20  | Finance         | 55,000  | 55,000    |
| 4  | Phenny | 16  | Marketing       | 65,000  | 46,000    |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 46,000    |
| 6  | Juliet | 40  | Sales           | 30,000  | 30,000    |

MAX() - returns maximum value in the window

For example lets find the Maximum salary in the Staff table and partition by department_name. we use the sql query;

SELECT name, age, department_name, salary,
MAX(salary) over (partition by Department_name) as max_Salary
FROM Staff;

We get the output as;

+----+--------+-----+-----------------+---------+------------+
| #  | name   | age | department_name | salary  | max_salary |
+----+--------+-----+-----------------+---------+------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 80,000     |
| 2  | Steve  | 30  | Engineering     | 75,000  | 80,000     |
| 3  | Joshua | 20  | Finance         | 55,000  | 55,000     |
| 4  | Phenny | 16  | Marketing       | 65,000  | 65,000     |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 65,000     |
| 6  | Juliet | 40  | Sales           | 30,000  | 30,000     |

MIN() - returns minimum value in the window.

For example lets find the Minimum salary in the Staff table and partition by department_name. we use the SQL query;

SELECT name, age, department_name, salary,
MIN(salary) over (partition by Department_name) as min_Salary
FROM Staff;

From the query we get;

+----+--------+-----+-----------------+---------+------------+
| #  | name   | age | department_name | salary  | min_salary |
+----+--------+-----+-----------------+---------+------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 75,000     |
| 2  | Steve  | 30  | Engineering     | 75,000  | 75,000     |
| 3  | Joshua | 20  | Finance         | 55,000  | 55,000     |
| 4  | Phenny | 16  | Marketing       | 65,000  | 27,000     |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 27,000     |
| 6  | Juliet | 40  | Sales           | 30,000  | 30,000     |

COUNT() - counts the rows within a window.

For example lets find the count of names in the Staff table and partition by department_name. we use the SQL query;

SELECT name, age, department_name, salary,
COUNT(salary) over (partition by Department_name) as count_of_names
FROM Staff;

From the query we get;

+----+--------+-----+-----------------+---------+----------------+
| #  | name   | age | department_name | salary  | count_of_names |
+----+--------+-----+-----------------+---------+----------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 2              |
| 2  | Steve  | 30  | Engineering     | 75,000  | 2              |
| 3  | Joshua | 20  | Finance         | 55,000  | 1              |
| 4  | Phenny | 16  | Marketing       | 65,000  | 2              |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 2              |
| 6  | Juliet | 40  | Sales           | 30,000  | 1              |

Ranking Window Functions
These return rankings of rows within a partition based on a specific criteria.

They include;

RANK() - Assigns ranks to rows(it skips ranks for duplicates) within a window.

Lets use RANK() to rank our staff salaries in Descending order and partition by department_name. We use the query;

SELECT name, age, department_name, salary,
RANK() over (partition by Department_name order by Salary DESC) as salary_rank
FROM Staff;

The query returns;

+----+--------+-----+-----------------+---------+------------+
| #  | name   | age | department_name | salary  | salary_rank|
+----+--------+-----+-----------------+---------+------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 1          |
| 2  | Steve  | 30  | Engineering     | 75,000  | 2          |
| 3  | Joshua | 20  | Finance         | 55,000  | 1          |
| 4  | Phenny | 16  | Marketing       | 65,000  | 1          |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 2          |
| 6  | Juliet | 40  | Sales           | 30,000  | 1          |

ROW_NUMBER() - assigns unique numbers to rows.

Lets get the row numbers from our staff table based on Salary DESC and partition by department_name;

SELECT name, age, department_name, salary,
ROW_NUMBER() OVER (PARTITION BY Department_name order by Salary DESC) as staff_row_number
FROM Staff;

This query returns;

+----+--------+-----+-----------------+---------+----------------+
| #  | name   | age | department_name | salary  | staff_row_number|
+----+--------+-----+-----------------+---------+----------------+
| 1  | Tonny  | 26  | Engineering     | 80,000  | 1              |
| 2  | Steve  | 30  | Engineering     | 75,000  | 2              |
| 3  | Joshua | 20  | Finance         | 55,000  | 1              |
| 4  | Phenny | 16  | Marketing       | 65,000  | 1              |
| 5  | Nelly  | 22  | Marketing       | 27,000  | 2              |
| 6  | Juliet | 40  | Sales           | 30,000  | 1              |

DENSE_RANK()- assigns ranks to rows without skipping rank numbers for duplicates.
PERCENT_RANK() - shows the relative rank of a row as a percentage between 0 and 1.

When working with Window functions, you have to ;

Partition carefully because without Partition by the whole table is treated as one group.
Order by where necessary

In Conclusion, SQL Joins and Windows are fundamental to effective data analysis.

Joins allow us to combine related tables into meaningful datasets while window functions enable advanced calculations such as rankings and aggregate operations while keeping the rows intact.

Together, these tools transform SQL from a simple querying language into a powerful analytical resource.

DEV Community

EXPLORING SQL JOINS AND WINDOWS

Top comments (0)