INTRODUCTION
SQL is one of the most powerful tools that has stood the test of time when it comes to the world of data. It has proved to be so reliable and flexible enough to work on any relational data. It has many functionalities within it that make data manipulation easy to navigate.
Among the functionalities are the join and window functions.
JOIN FUNCTION
This comes in handy when dealing with data spread across multiple tables. The join function is the one used when one wants to analyse data from different tables and perform data analysis or data engineering. It uses related columns from the different normalised tables and combines the rows. It can be used in two or more tables. There are different types of joins, including:
1. Inner Join
This is one of the most common forms of join that is frequently used. It helps to join two tables based on the rows that have matching values in both tables. This means that if there are values that do not match in the two tables, they will be dropped from the resultant joined tables.
The syntax of an inner join looks like
select columns
from table1
inner join table2
on table1.column = table2.column
Assuming you have products and sales tables
select p.product_name,s.price
from products p
join sales s
on p.product_id = s.product_id
From the above example, you could see that the two tables are relatable by the product_id column in both tables, that make it easy to join the two tables.
2. Left Join
Also called Left Outer Join, this one returns every row on the left table and only the matching columns on the right table. This means there will be null values returned from the right tables against the unmatched and unreturned values on the left table.
The syntax of a left join looks like
select columns
from table1
left join table2
on table1.column = table2.column
Assuming you have products and sales tables
select p.product_name,s.price
from products p
left join sales s
on p.product_id = s.product_id
From the above examples, all the rows from the products table will be returned, and on matching rows(product_id) from the sales department will be returned. The excess rows from the products table will have null values in the sales department.
2. Right Join
Also called Right Outer Join, this one returns every row on the right table and only the matching columns on the left table. This means there will be null values returned from the left tables against the unmatched and unreturned values on the right table.
The syntax of a right join looks like
select columns
from table1
right join table2
on table1.column = table2.column
Assuming you have products and sales tables
select p.product_name,s.price
from products p
right join sales s
on p.product_id = s.product_id
From the above examples, all the rows from the products table will be returned, and on matching rows(product_id) from the sales department will be returned. The excess rows from the products table will have null values in the sales department.
3. Full Outer Join
This one combines both the Left outer join and the Right outer join. This one shows all the rows from both tables, including unmatched rows from both tables.
select columns
from table1
full outer join table2
on table1.column = table2.column
Assuming you have products and sales tables
select p.product_name,s.price
from products p
full outer join sales s
on p.product_id = s.product_id
5. Cross Join
This one returns every row from the first table combined with every row from the second, i.e., a cartesian product. Assuming you have 3 customers and 3 orders, the resulting table will have 9 rows.
The syntax of a cross join looks like
select columns
from table1
cross join table2;
Assuming you have products and sales tables
select p.product_name,s.price
from products p
cross join sales s;
6. Self Join
This is where the table is joined to itself; this is mostly used when one wants to compare rows within the same table. For this to be effective, table aliases are used to treat the same table as two separate tables in a query.
The syntax of a self-join looks like
select columns
from table1 as A
self join table1 as B
on A.column = B.column
A and B are aliases representing the same table1
select E.name, M.name
from Employees as E
self join Employees as M
on E.manager_id = M.employee_id;
7 Natural Join
This is a type of join that combines two tables with the same name and compatible data types. This one does not need to join using a condition ON because it automatically detects common columns and uses them to match rows.
The syntax of a self-join looks like
select columns
from table1
natural join table2
Assuming you have products and sales tables
select *
from products p
natural join sales s;
The downside of natural joins is that if the tables have common multiple columns with the same name, SQL will join using all the columns, producing unexpected results.
Window Functions
These are functions that perform calculations across a set of rows related to the current rows without grouping them into a single result. They return results for each row, unlike the aggregate functions, such as sum() or avg().
The window functions use over() clause to define the windows or a set of rows.
select emp_name,
department,
salary,
avg(salary)over(partition by department) as avg_department_salary
from employees;
This one calculates the average salary for each department while returning the name of each employee.
Among the common window functions are
1. Row_Number()
This one assigns a unique number to each row starting from 1, following the specified order by order by clause.
In the case of partition by, the starting number will reset on each new partition.
when using row_number() without resetting the row number
select
order_id,
product_name,
price,
row_number() over(order by price desc) as row_num
from orders;
when using row_number() while resetting the row number
select
order_id,
product_name,
price,
row_number() over(partition by order_id order by price desc) as row_num
from orders;
2. Rank()
Assigns ranks to rows but allows gaps when there are ties, for example, whenever ranking say order_id based on prices, if the first two order_ids have the same price will rank 1, the next order will be ranked 3rd.
select
order_id,
product_name,
price,
rank() over(order by price desc) as order_rank
from orders;
3. Dense_Rank()
As opposed to Rank(), this one does not allow gaps between ranks even if there are ties.
select
order_id,
product_name,
price,
dense_rank() over(order by price desc) as order_rank
from orders;
4. Lead()
This one is used to access the next rows in a particular table, useful for comparing or predicting the next period
select
date,
sale_amount,
lead(date)over(order by date) as next_day
from sale;
4. Lag()
This one is used to access the previous row values in a particular table, useful for comparing with the previous period
select
date,
sale_amount,
lag(date)over(order by date) as previous_day
from sale;
5. Ntile()
This one is mostly used when dividing a result into a specific number of equal groups. Each row will be assigned a specific group number based on the order specified in the query. It is commonly used when categorizing groups such as quartile, percentiles or rankings.
select
name,
salary,
ntile(4)over(order by salary desc) as salary_group
from employee;
Conclusion
SQL joins and window functions are powerful tools that enable analysts to work efficiently with complex datasets. Joins allow us to bring together data stored across multiple tables, while window functions provide deeper analytical capabilities without losing row-level detail. Mastering both techniques significantly improves the ability to analyse, transform, and derive insights from data.



Top comments (0)