Joins and window functions are two of the most powerful tools in SQL. Understanding them well is really key in the journey of becoming a data professional.
A join combines rows from two or more tables based on a related column. The most common is the INNER JOIN, which returns only rows where a match exists in both tables. If you need to preserve all rows from one side regardless of a match, you use a LEFT or RIGHT JOIN β NULLs fill in where no match is found. A FULL OUTER JOIN goes further, returning all rows from both tables with NULLs on either side where matches are missing. Less common but worth knowing, a CROSS JOIN produces every possible combination of rows between two tables, and a SELF JOIN joins a table to itself β handy for hierarchical data like employee-manager relationships.
Here's a join query combining employees with their departments:
SELECT e.name, d.name AS department, e.salary
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
Window functions compute values across a set of rows related to the current row β without collapsing the result like GROUP BY does. They use the OVER() clause, with PARTITION BY to define groups and ORDER BY to control row ordering within each group.
Ranking functions like RANK(), DENSE_RANK(), and ROW_NUMBER() are among the most used. The difference is: RANK() skips numbers after a tie, while DENSE_RANK() does not. Aggregate functions like SUM() and AVG() can also be used as window functions. Add an ORDER BY inside OVER() and they become cumulative, perfect for running totals. LAG and LEAD let you look at previous or next row values without a self-join, making period-over-period comparisons simple. Functions like FIRST_VALUE and NTILE round out the toolkit for benchmarking and bucketing data into equal groups.
below is an example of a window function showing each student' score alongside the class average and their rank without losing any rows. Assuming that a list of students and their exam score was initially given.
SELECT
name,
subject,
score,
ROUND(AVG(score) OVER (PARTITION BY subject), 2) AS class_average,
RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS class_rank
FROM student_scores;
Top comments (0)