Master SQL Joins and Window Functions

#postgres #database #sql #datascience

Introduction
Database management and querying practices cannot function without SQL joins and window functions. These are two distinct yet powerful mechanisms that facilitate data analysis and enable data scientists to retrieve the data they need efficiently. For instance, SQL joins combine two or more tables based on existing logical relationships and unique columns, including primary-foreign key relationships. Alternatively, a window function performs a calculation across a group of rows while keeping each row visible.
SQL Joins
Joins are SQL clauses that combine rows from one or more tables based on a related column.
SQL joins are crucial because they help.
a) Retrieve connected data stored across multiple tables.
b) Match table records based on standard columns.
c) Improve data analysis by combining related information.
d) Create meaningful result sets from separate tables.
Types of SQL Joins
INNER JOIN: Returns only rows that have matching values in both tables. It helps in combining records based on a related column.
a) Returning only matching rows from both tables.
b) Excluding non-matching data from the result set.
c) Ensuring accurate data relationships between tables.
Syntax:
SELECT table1.column1, table1.column2, table2.column1,... FROM table1 INNER JOIN table2 ON table1.matching_column = table2. matching_column;

LEFT (OUTER) JOIN: Returns all rows from the left table, and only the matched rows from the right table. It helps in:
a) Returning all records from the left table.
b) Showing matching data from the right table.
c) Displaying NULL values where no match exists in the right table.
Syntax: **
SELECT table1.column1,table1.column2,table2.column1,.... FROM table1 LEFT JOIN table2 ON table1.matching_column = table2.matching_column;
**RIGHT (OUTER) JOIN: Returns all rows from the right table, and only the matched rows from the left table. It helps in:
a) Returning all records from the right-side table.
b) Showing matching data from the left-side table.
c) Displaying NULL values where no match exists in the left table.
Syntax **
SELECT table1.column1, table1.column2, table2.column1,.... FROM table1 RIGHT JOIN table2 ON table1.matching_column = table2.matching_column;
**FULL (OUTER) JOIN: Returns all rows when there is a match in either the left or right table. It helps in:
a) Returning all rows from both tables.
b) Showing matching records from each table.
c) Displaying NULL values where no match exists in either table.
Syntax
SELECT table1.column1,table1.column2,table2.column1,.... FROM table1 FULL JOIN table2 ON table1.matching_column = table2.matching_column;

Window Functions
A window function is used to perform a calculation across a specific set of rows (the 'window' in question), defined by an OVER() clause.
Syntax
SELECT column_1, column_2, column_3, function() OVER (PARTITION BY partition_expression ORDER BY order_expression) as output_column_name FROM table_name
In this syntax:
• SELECT; defines the columns to be selected from the table_name table.
• function (); the window function applied.
• OVER; defines the partitioning and ordering of rows in the window.
• PARTITION BY; divides rows into partitions based on the specified partition_expression; if the partition_expression is not specified, the result set will be treated as a single partition.
• ORDER BY: define the order in which rows are processed within each partition; if the order_expression is not specified, rows will be processed in an undefined order.
• Finally, output_column_name is the name of your output column.
_N/B. Window functions are applied after the processing of WHERE, GROUP BY, and HAVING clauses. _

Types of SQL window functions

Aggregate window functions

a) AVG() returns the average of the values in a group, ignoring null values.
b) MAX() returns the maximum value in the expression.
c) MIN() returns the minimum value in the expression.
d) SUM() returns the sum of all the values, or only the DISTINCT values, in the expression.
e) COUNT() returns the number of items found in a group.
f) STDEV() returns the statistical standard deviation of all values in the specified expression.
g) STDEVP() returns the statistical standard deviation for the population for all values in the specified expression.
h) VAR() returns the statistical variance of all values in the specified expression; the OVER clause may follow it.
i) VARP() returns the statistical variance for the population for all values in the specified expression.

Ranking window functions

a) ROW_NUMBER() assigns a unique sequential integer to rows within a partition of a result set.
b) RANK() assigns a unique rank to each row within a partition with gaps in the ranking sequence when there are ties.
c) DENSE_RANK() assigns a unique rank to each row within a partition without gaps in the ranking sequence when there are ties.
d) PERCENT_RANK() calculates the relative rank of a row within a group of rows.
e) NTILE() distributes rows in an ordered partition into a specified number of approximately equal groups.

Value window functions

a) LAG() retrieves values from rows that precede the current row in the result set.
b) LEAD() retrieves values from rows that follow the current row in the result set.
c) FIRST_VALUE() returns the first value in an ordered set of values within a partition.
d) LAST_VALUE() returns the last value in an ordered set of values within a partition.
e) NTH_VALUE() returns the value of the nth row in the ordered set of values.
f) CUME_DIST() returns the cumulative distribution of a value in a group of values.

DEV Community

Master SQL Joins and Window Functions

Top comments (0)