A guide to SQL joins and windows functions

#sql #beginners #programming

Joins in SQL

Join is a verb that technically means to put 2 or more things together. In SQL, it pretty much means the same thing, as it involves bringing together two or more tables with a similar column reference.

Below we have sample data from 3 tables that we will use throughout to demonstrate the different types of joins and examples.
Our table shows sample data from a company showing employees, their designated departments, and the projects they are working on.

If we wanted a table that shows employees and the projects that they are currently working on, we would write our sql query as follows

Select * from employee_table e join project_table p on e.employee_id = p.employee_id;

Types of joins

1. Left Join
This is the most commonly used join. It pulls all data from the first table and pulls data that matches it from the second table.

Example
Let's say from our tables above, we want to show all the departmental records and the employees belonging to each department.
We will structure our SQL query as follows

Select * from department_table d join employee_table e on d.department_id =e.department_id;

Results have all data from the department table, and only employees matching from the employee table will be displayed

2. Right Join
The join pulls all data from the second table and displays data that matches it from the first table. It is the opposite of the left join.

3. Inner Join
This shows data that is unique to both tables.

Example
We want to see employees who have been assigned projects from the project table. Our query will be as shown below

Select * from department_table d innerjoin employee_table e on d.department_id =e.department_id;

The results will only have employees assigned to projects.

4. Full Outer Join
A full outer join joins multiple tables together and displays all data from all tables.

## Window Functions
These are functions that are run on multiple rows of data. The results are given for every row of data that we have when we run the functions.
Syntax for the window function is;

Select * function() over (partition by (field_name) order by (field_name)) from table;

The breakdown is as follows;

function()

These can either be aggregate functions or special functions.

over

This is the clause that determines the rows that a function will operate on and ensures that the original rows are maintained in the result

Partition by()

This is a clause that groups data based on category. It is not a compulsory condition, as it is only used when it is necessary to group data.

Order by()

This specifies how rows should be arranged

There are different functions associated with the window's function as follows
1. Aggregate functions
This includes count, sum, average, minimum, and maximum with a sample query below.

Select * sum/average/max/min/count() over (partition by (field_name) order by (field_name)) from table;

For sum, you get the running totals for every row, and for average, minimum, and maximum, the value is displayed in every row.

For special functions we have the following
2. Row_number() functions

Select * row_number() over (order by (field_name)) from table;

This involves assigning a unique number to each row depending on the order by column. It starts from 1

note
When the partition by function is used, the row number resets after every category.
e.g., When partitioning by department id from our sample data above, the row number resets after each department.
When only order by is used, the row numbers run from the first row to the last without reassigning.

2. Rank() functions
This assigns rank to each row and skips rows with ties, i.e., they share rank based on the criterion used for ranking.
For example, you want to rank students based on subject performance. The query used would look as follows.

Select student_name, sum(subject), rank() over (order by sum(subject)) from table;

In cases where values have a tie, we use the dense_rank function to avoid gaps in our rank column.

3. Lead and Lag functions
The lead function allows one to view values of the next rows, while the lag function allows one to view previous rows.
The syntax is as follows;

Select * lead/lag(field_name) over (order by (field_name)) from table;

4. ntile functions
This window function divides the data into groups based on a specified criterion. The syntax is as follows

Select * ntile(no of quartiles you want) over (order by (field_name)) from table;

Real-life examples in which ntiles are used are