SQL(Structured Query Language) is a powerful tool to search through large amounts of data and return specific information for analysis. Learning SQL is crucial for anyone aspiring to be a data analyst, data engineer, or data scientist, and helpful in many other fields such as web development or marketing.
SQL Joins
JOINS in SQL are commands which are used to combine rows from two or more tables, based on a related column between those tables. They are predominantly used when a user is trying to extract data from tables which have one-to-many or many-to-many relationships between them.
There are mainly four types of joins that you need to understand. They are:
- (INNER) JOIN
- LEFT (OUTER) JOIN
- RIGHT (OUTER) JOIN
- FULL (OUTER) JOIN
INNER JOIN
INNER JOIN is used to retrieve rows where matching values exist in both tables. It helps in:
- Combining records based on a related column.
- Returning only matching rows from both tables.
- Excluding non-matching data from the result set.
- Ensuring accurate data relationships between tables.
Syntax:
SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
LEFT JOIN
LEFT JOIN is used to retrieve all rows from the left table and matching rows from the right table. It helps in:
- Returning all records from the left table.
- Showing matching data from the right table.
- Displaying NULL values where no match exists in the right table.
- Performing outer joins, also known as LEFT OUTER JOIN.
Syntax:
SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
LEFT JOIN right_table
ON left_table.id = right_table.id;
RIGHT JOIN
RIGHT JOIN is used to retrieve all rows from the right table and the matching rows from the left table. It helps in:
- Returning all records from the right-side table.
- Showing matching data from the left-side table.
- Displaying NULL values where no match exists in the left table.
- Performing outer joins, also known as RIGHT OUTER JOIN.
Syntax:
SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
RIGHT JOIN right_tale
ON left_table.id = right_table.id;
FULL JOIN
FULL JOIN is used to combine the results of both LEFT JOIN and RIGHT JOIN. It helps in:
- Returning all rows from both tables.
- Showing matching records from each table.
- Displaying NULL values where no match exists in either table.
- Providing complete data from both sides of the join.
Syntax:
SELECT left_table.id, left_table.left_val, right_table.right_val
FROM left_table
FULL JOIN right_tale
ON left_table.id = right_table.id;
Core Insights
SQL joins are fundamental for relational data modeling, enabling the combination of rows from multiple tables based on defined relationships, typically via primary and foreign keys.
Proper join selection directly affects result cardinality, null propagation, and business logic interpretation. Performance considerations include indexing join columns, minimizing unnecessary joins and understanding join order in execution plans.
Key takeaways are that joins operationalize relational integrity, drive multi-table analytics and must be designed carefully to avoid duplication, unintended filtering or performance degradation especially in high-volume transactional or analytical databases.
SQL Window Functions
A window function in SQL is a type of function that performs a calculation across a specific set of rows (the 'window' in question), defined by an OVER() clause.
Window functions use values from one or multiple rows to return a value for each row, which makes them different from traditional aggregate functions, which return a single value for multiple rows.
Similar to aggregate function GROUP BY, a window function performs calculations across multiple rows. Unlike aggregate functions, a window function does not group rows into one single row.
Key components of SQL window functions
The syntax for window functions is as follows:
SELECT column_1, column_2, column_3, function()
OVER (PARTITION BY partition_expression ORDER BY order_expression) as output_column_name
FROM table_name
In this syntax:
- The
SELECTclause defines the columns you want to select from thetable_nametable. - The
function()is the window function you want to use. - The
OVERclause defines the partitioning and ordering of rows in the window. - The
PARTITION BYclause divides rows into partitions based on the specifiedpartition_expression; if not specified, the result set will be treated as a single partition. - The
ORDER BYclause uses the specifiedorder_expressionto define the order in which rows will be processed within each partition; if not specified, rows will be processed in an undefined order. - Finally,
output_column_nameis the name of your output column.
These are the key SQL window function components. One more thing worth mentioning is that window functions are applied after the processing of WHERE, GROUP BY, and HAVING clauses. This means you can use the output of your window functions in subsequent clauses of your queries.
The OVER() clause
The OVER() clause in SQL is essentially the core of window functions. It determines the partitioning and ordering of a rowset before the associated window function is applied.
The OVER() clause can be applied with functions to compute aggregated values such as moving averages, running totals, cumulative aggregates, or top N per group results.
The PARTITION BY clause
The PARTITION BY clause is used to partition the rows of a table into groups. This comes in handy when dealing with large datasets that need to be split into smaller parts, which are easier to manage.
PARTITION BY is always used inside the OVER() clause; if it is omitted, the entire table is treated as a single partition.
The ORDER BY clause
The ORDER BY determines the order of rows within a partition; if it is omitted, the order is undefined.
For instance, when it comes to ranking functions, ORDER BY specifies the order in which ranks are assigned to rows.
Frame Specification
In the same OVER() clause, you can specify the upper and lower bounds of a window frame using one of the two subclauses, ROWS or RANGE. The basic syntax for both of these subclauses is essentially the same:
ROWS BETWEEN lower_bound AND upper_bound
RANGE BETWEEN lower_bound AND upper_bound
And in some cases, they might even return the same result. However, there's an important difference.
In the ROWS subclause, the frame is defined by beginning and ending row positions. Offsets are differences in row numbers from the current row number.
As opposed to that, in the RANGE subclause, the frame is defined by a value range. Offsets are differences in row values from the current row value.
Types of SQL Window Functions
Window functions in SQL Server are divided into three main types: aggregate, ranking, and value functions. Let's have a brief overview of each.
Aggregate Window Functions
-
AVG(): returns the average of the values in a group, ignoring null values. -
MAX(): returns the maximum value in the expression. -
MIN(): returns the minimum value in the expression. -
SUM(): returns the sum of all the values, or only the DISTINCT values, in the expression. -
COUNT(): returns the number of items found in a group. -
STDEV(): returns the statistical standard deviation of all values in the specified expression. -
STDEVP(): returns the statistical standard deviation for the population for all values in the specified expression. -
VAR(): returns the statistical variance of all values in the specified expression; it may be followed by the OVER clause. -
VARP(): returns the statistical variance for the population for all values in the specified expression.
Sample query:
SELECT name, salary,
SUM(salary) OVER (PARTITION BY dept) AS dept_total,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;
Ranking Window Functions
Used to assign rank or position within partitions.
-
ROW_NUMBER(): assigns a unique sequential integer to rows within a partition of a result set. -
RANK(): assigns a unique rank to each row within a partition with gaps in the ranking sequence when there are ties. -
DENSE_RANK(): assigns a unique rank to each row within a partition without gaps in the ranking sequence when there are ties. -
PERCENT_RANK(): calculates the relative rank of a row within a group of rows. -
NTILE(): distributes rows in an ordered partition into a specified number of approximately equal groups.
Sample query:
SELECT name, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employees;
Offset(Value) Window Functions
Used to access data from other rows.
-
LAG(): retrieves values from rows that precede the current row in the result set. -
LEAD(): retrieves values from rows that follow the current row in the result set. -
FIRST_VALUE(): returns the first value in an ordered set of values within a partition. -
LAST_VALUE(): returns the last value in an ordered set of values within a partition. -
NTH_VALUE(): returns the value of the nth row in the ordered set of values. -
CUME_DIST(): returns the cumulative distribution of a value in a group of values.
Sample Query:
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_month,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS change
FROM monthly_sales;
Summary
SQL window functions provide a powerful analytical layer within standard SQL, enabling complex calculations across related rows while preserving row-level granularity. Unlike GROUP BY, they do not collapse result sets, which makes them ideal for scenarios requiring both detail and aggregate insight in the same query.
The OVER() clause is central, with PARTITION BY defining logical groups, ORDER BY controlling calculation sequence, and optional frame specifications (ROWS or RANGE) refining scope.
Key functional categories include aggregate window functions for running totals and moving averages, ranking functions such as ROW_NUMBER() and RANK() for ordered comparisons and offset functions like LAG() and LEAD() for time-series or sequential analysis.
When used correctly, window functions significantly reduce query complexity, eliminate the need for self-joins in many analytical patterns and improve expressiveness in reporting and business intelligence workloads.





Top comments (0)