Mastering Joins and Window SQL Functions for Relational Databases.

#sqlserver #postgressql #datascience #dataengineering

This article will explain SQL Joins and Window Functions — two powerful tools that will help you query and analyze relational data.

JOIN FUNCTIONS:

These are SQL functions that combine rows from multiple tables that have a common column or a link called a Key.

The're 2 types of keys that allow for relationships between tables in a normalised database.

Primary Key: These are unique identifiers in a table that, as the name suggests, uniquely identify a record/row in a table.

Foreign Key: A key that allows relationships between tables. The foreign key is a primary key in another table that also exists in the secondary table, i.e., 2 common columns that contain the same values, creating a commonality between the two tables.

Below is the Core syntax to perform a join:

SELECT (columns_name)
FROM table_name
[INNER | LEFT | RIGHT | FULL OUTER | CROSS] JOIN table_name_1
ON table_name_1.key = table_name_2.key;

The types of Joins.

1. INNER join:
This joins 2 tables that have similar rows. It performs an intersection, and the output is the common rows between the two tables. You can also inner join multiple tables by nesting the joins using sequential joins method.

As shown:

Select 
o.orderID,
c.CustomerName,
S.Shipping_Name
from orders o
inner join Customers c on c.customer_id = o.customer_id =
inner Join Shippers s on o.shipperID = s.shipper_id;

Inner Joins are effective in controlling Nulls or Missing values when performing queries.

This is similar to the Intersect SQL set operator.

Intersect set operator syntax example where a filter is applied
SELECT column_name FROM table_name WHERE () INTERSECT SELECT column_name FROM table_name WHERE ()

2.LEFT Join:

This combines all the rows from the left table and matching rows that exist in the second table. This join will also show all missing records.

Practical Example:

Business question: Show every customer and their total order amount. Include customers who never ordered (This counts for retention analysis).

SELECT 
    c.customer_id,
    c.name,
    c.city,
    COALESCE(SUM(o.amount), 0) AS total_spent,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o 
    ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city;

3.RIGHT Join:
The exact opposite of the left outer joins, where the join will output all the rows in the right table and the matching rows in the left table.

4.FULL(OUTER):
It combines all rows from both tables; this is rarely used and expensive because it combines all the data from both tables.

This Join is Similar to the UNION ALL SQL set operator that will combine all data from both select operations and will include null values.

5.CROSS Join:
This type of join performs a Cartesian product, where it combines the rows from both tables to show possible combinations as a single result i.e (AxB = AB). It will not display missing records.

WINDOW FUNCTIONS:

Window functions perform calculations across a set of rows related to the current selected row, but without collapsing the result set into a single result.

What this means:
A standard aggregate function like SUM or AVG collapses all the records to an individual result/row that shows the total or average for the whole selected set as a single result.

But..

Window Functions allow you to see the average right next to each record for every single row. You keep the individual details and the summary data at the same time.

Window functions are identified by the OVER clause, which defines the "window" of data the function should look at.

Syntax for window functions:

Select column_name, column_name_2, **row_number () over (order by/partition** column_name) as new_column_name
from table_name;

To Note:
Order by: Used to sort in ascending or descending order. Partition by: is used to group in a window function
Types of Window Functions:

1. ROW_NUMBER () OVER (order/partition by..):
This window function assigns a unique sequence of numbers to each row.

2. RANK ():
This function allows you to rank your data either in ascending or descending order. It will skip the next rank value in a tie situation. i.e if two rows are tied at rank 2, then the next record will be ranked 4th, not 3rd.

3. DENSE_RANK () OVER (order/partition by..):
This works the same as the rank window function, but the difference is that in the event of a tie, the next Rank assigned won't skip or have gaps. i.e If two rows have tied at rank 2, the next record will still be ranked 3rd.

4. NTILE(n) OVER (order/partition by..):
This function allows you to divide a set of rows into roughly equal parts that have been specifed. E,g NTILE(4) will divide the selected rows into 4 roughly equal parts; we say roughly to mean not all parts will always have the same number of rows.

5. LAG ():
This function will retrieve the previous value from the previous row, e.g if you want to compare current salary and previous salary for an employee

6. LEAD ():
This function will retrieve the next value in the next row. Used in comparisons and even calculating lead time days e.g you can subtract the order_date from the result of the lead() function and display the difference as the Lead time days.

Example in syntax

select order_id, order_date, customer_id, 
lead(order_date)
 over (order by order_date asc) as next_order_date, 
datediff(day, order_date, lead(order_date) over (partition by customer_id order by order_date)as lead_time_days
from orders ;

7.AGGREGATE FUNCTIONS:

These are functions that calculate the sum, average, MAX and MIN of values in a table. You can perform an aggregate across a set of rows by combining the aggregate function with the (over by / partiton by) clauses and specifying the table aliases when specifying the column.

Well, there you have it!

Let me know what your thoughts are on this article.

Adios.