Edmund Eryuba

Posted on Mar 2

Designing Efficient Queries with SQL Joins and Window Functions

#sql #database #datascience #analytics

SQL(Structured Query Language) is a powerful tool to search through large amounts of data and return specific information for analysis. Learning SQL is crucial for anyone aspiring to be a data analyst, data engineer, or data scientist, and helpful in many other fields such as web development or marketing.

SQL Joins

JOINS in SQL are commands which are used to combine rows from two or more tables, based on a related column between those tables. They are predominantly used when a user is trying to extract data from tables which have one-to-many or many-to-many relationships between them.

There are mainly four types of joins that you need to understand. They are:

(INNER) JOIN
LEFT (OUTER) JOIN
RIGHT (OUTER) JOIN
FULL (OUTER) JOIN

INNER JOIN

INNER JOIN is used to retrieve rows where matching values exist in both tables. It helps in:

Combining records based on a related column.
Returning only matching rows from both tables.
Excluding non-matching data from the result set.
Ensuring accurate data relationships between tables.

Syntax:

SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

LEFT JOIN

LEFT JOIN is used to retrieve all rows from the left table and matching rows from the right table. It helps in:

Returning all records from the left table.
Showing matching data from the right table.
Displaying NULL values where no match exists in the right table.
Performing outer joins, also known as LEFT OUTER JOIN.

Syntax:

SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
LEFT JOIN right_table
ON left_table.id = right_table.id;

RIGHT JOIN

RIGHT JOIN is used to retrieve all rows from the right table and the matching rows from the left table. It helps in:

Returning all records from the right-side table.
Showing matching data from the left-side table.
Displaying NULL values where no match exists in the left table.
Performing outer joins, also known as RIGHT OUTER JOIN.

Syntax:

SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table 
RIGHT JOIN right_tale
ON left_table.id = right_table.id;

FULL JOIN

FULL JOIN is used to combine the results of both LEFT JOIN and RIGHT JOIN. It helps in:

Returning all rows from both tables.
Showing matching records from each table.
Displaying NULL values where no match exists in either table.
Providing complete data from both sides of the join.

Syntax:

SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table 
FULL JOIN right_tale
ON left_table.id = right_table.id;

Core Insights

SQL joins are fundamental for relational data modeling, enabling the combination of rows from multiple tables based on defined relationships, typically via primary and foreign keys.

Proper join selection directly affects result cardinality, null propagation, and business logic interpretation. Performance considerations include indexing join columns, minimizing unnecessary joins and understanding join order in execution plans.

Key takeaways are that joins operationalize relational integrity, drive multi-table analytics and must be designed carefully to avoid duplication, unintended filtering or performance degradation especially in high-volume transactional or analytical databases.

SQL Window Functions

A window function in SQL is a type of function that performs a calculation across a specific set of rows (the 'window' in question), defined by an OVER() clause.

Window functions use values from one or multiple rows to return a value for each row, which makes them different from traditional aggregate functions, which return a single value for multiple rows.

Similar to aggregate function GROUP BY, a window function performs calculations across multiple rows. Unlike aggregate functions, a window function does not group rows into one single row.

Key components of SQL window functions

The syntax for window functions is as follows:

SELECT column_1, column_2, column_3, function()
OVER (PARTITION BY partition_expression ORDER BY order_expression) as output_column_name
FROM table_name

In this syntax:

The SELECT clause defines the columns you want to select from the table_name table.
The function() is the window function you want to use.
The OVER clause defines the partitioning and ordering of rows in the window.
The PARTITION BY clause divides rows into partitions based on the specified partition_expression; if not specified, the result set will be treated as a single partition.
The ORDER BY clause uses the specified order_expression to define the order in which rows will be processed within each partition; if not specified, rows will be processed in an undefined order.
Finally, output_column_name is the name of your output column.

These are the key SQL window function components. One more thing worth mentioning is that window functions are applied after the processing of WHERE, GROUP BY, and HAVING clauses. This means you can use the output of your window functions in subsequent clauses of your queries.

The OVER() clause

The OVER() clause in SQL is essentially the core of window functions. It determines the partitioning and ordering of a rowset before the associated window function is applied.
The OVER() clause can be applied with functions to compute aggregated values such as moving averages, running totals, cumulative aggregates, or top N per group results.

The PARTITION BY clause

The PARTITION BY clause is used to partition the rows of a table into groups. This comes in handy when dealing with large datasets that need to be split into smaller parts, which are easier to manage.
PARTITION BY is always used inside the OVER() clause; if it is omitted, the entire table is treated as a single partition.

The ORDER BY clause

The ORDER BY determines the order of rows within a partition; if it is omitted, the order is undefined.
For instance, when it comes to ranking functions, ORDER BY specifies the order in which ranks are assigned to rows.

Frame Specification

In the same OVER() clause, you can specify the upper and lower bounds of a window frame using one of the two subclauses, ROWS or RANGE. The basic syntax for both of these subclauses is essentially the same:

ROWS BETWEEN lower_bound AND upper_bound

RANGE BETWEEN lower_bound AND upper_bound

And in some cases, they might even return the same result. However, there's an important difference.

In the ROWS subclause, the frame is defined by beginning and ending row positions. Offsets are differences in row numbers from the current row number.

As opposed to that, in the RANGE subclause, the frame is defined by a value range. Offsets are differences in row values from the current row value.

Types of SQL Window Functions

Window functions in SQL Server are divided into three main types: aggregate, ranking, and value functions. Let's have a brief overview of each.

Aggregate Window Functions

AVG(): returns the average of the values in a group, ignoring null values.
MAX(): returns the maximum value in the expression.
MIN(): returns the minimum value in the expression.
SUM(): returns the sum of all the values, or only the DISTINCT values, in the expression.
COUNT(): returns the number of items found in a group.
STDEV(): returns the statistical standard deviation of all values in the specified expression.
STDEVP(): returns the statistical standard deviation for the population for all values in the specified expression.
VAR(): returns the statistical variance of all values in the specified expression; it may be followed by the OVER clause.
VARP(): returns the statistical variance for the population for all values in the specified expression.

Sample query:

SELECT name, salary,
  SUM(salary) OVER (PARTITION BY dept) AS dept_total,
  AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;

Ranking Window Functions

Used to assign rank or position within partitions.

ROW_NUMBER(): assigns a unique sequential integer to rows within a partition of a result set.
RANK(): assigns a unique rank to each row within a partition with gaps in the ranking sequence when there are ties.
DENSE_RANK(): assigns a unique rank to each row within a partition without gaps in the ranking sequence when there are ties.
PERCENT_RANK(): calculates the relative rank of a row within a group of rows.
NTILE(): distributes rows in an ordered partition into a specified number of approximately equal groups.

Sample query:

SELECT name, salary,
  RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employees;

Offset(Value) Window Functions

Used to access data from other rows.

LAG(): retrieves values from rows that precede the current row in the result set.
LEAD(): retrieves values from rows that follow the current row in the result set.
FIRST_VALUE(): returns the first value in an ordered set of values within a partition.
LAST_VALUE(): returns the last value in an ordered set of values within a partition.
NTH_VALUE(): returns the value of the nth row in the ordered set of values.
CUME_DIST(): returns the cumulative distribution of a value in a group of values.

Sample Query:

SELECT date, revenue,
  LAG(revenue, 1) OVER (ORDER BY date) AS prev_month,
  revenue - LAG(revenue, 1) OVER (ORDER BY date) AS change
FROM monthly_sales;

Summary

SQL window functions provide a powerful analytical layer within standard SQL, enabling complex calculations across related rows while preserving row-level granularity. Unlike GROUP BY, they do not collapse result sets, which makes them ideal for scenarios requiring both detail and aggregate insight in the same query.

The OVER() clause is central, with PARTITION BY defining logical groups, ORDER BY controlling calculation sequence, and optional frame specifications (ROWS or RANGE) refining scope.

Key functional categories include aggregate window functions for running totals and moving averages, ranking functions such as ROW_NUMBER() and RANK() for ordered comparisons and offset functions like LAG() and LEAD() for time-series or sequential analysis.

When used correctly, window functions significantly reduce query complexity, eliminate the need for self-joins in many analytical patterns and improve expressiveness in reporting and business intelligence workloads.

DEV Community

Designing Efficient Queries with SQL Joins and Window Functions

SQL Joins

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL JOIN

Core Insights

SQL Window Functions

Key components of SQL window functions

The OVER() clause

The PARTITION BY clause

The ORDER BY clause

Frame Specification

Types of SQL Window Functions

Aggregate Window Functions

Ranking Window Functions

Offset(Value) Window Functions

Summary

Top comments (0)