If you've ever stared at a SQL query and wondered what PARTITION BY actually does, this article is for you. I'm going to break down two of the most important SQL concepts Joins and Window Functions
PART 1: SQL JOINS
What is a Join?
A Join is how you combine data from two or more tables. Think of it like connecting two spreadsheets using a shared column like a customer ID that appears in both a Customers table and an Orders table.
The Tables We'll Use
Customers:
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
Orders:
| id | customer_id | amount |
|---|---|---|
| 1 | 1 | 500 |
| 2 | 1 | 300 |
| 3 | 2 | 200 |
Notice that Carol (id = 3) has no orders, and all orders belong to either Alice or Bob. Keep that in mind. it's going to matter a lot when we look at how different JOINs behave.
Understanding Left and Right Tables
Before we look at any specific JOIN type, there's one concept you need to lock in first. Every JOIN has a Left table and a Right table, and the difference matters.
The table written after FROM is the Left table
The table written after JOIN is the Right table
SELECT customers.name, orders.amount
FROM customers -- LEFT table
LEFT JOIN orders -- RIGHT table
ON customers.id = orders.customer_id;
This isn't just terminology. SQL uses this position to decide which table's rows get priority when there's no matching data on the other side.
INNER JOIN
An INNER JOIN returns rows where there is a match in BOTH tables.
SELECT customers.name, orders.amount
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;
Result:
| name | amount |
|---|---|
| Alice | 500 |
| Alice | 300 |
| Bob | 200 |
Carol is gone. She has no orders, so she doesn't appear. INNER JOIN only shows you where both sides match.
LEFT JOIN
A LEFT JOIN returns ALL rows from the left table, and matches from the right. Where there's no match, you get NULL.
SELECT customers.name, orders.amount
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;
Result:
| name | amount |
|---|---|
| Alice | 500 |
| Alice | 300 |
| Bob | 200 |
| Carol | NULL |
Carol is back. She has no orders, so her amount is NULL. Use LEFT JOIN when you don't want to lose rows from your main table.
RIGHT JOIN
The opposite of LEFT JOIN. All rows from the right table are kept, and matches come from the left. Less common, but useful in certain situations.
FULL OUTER JOIN
Returns all rows from both tables, with NULLs where there's no match on either side.
SELECT customers.name, orders.amount
FROM customers
FULL OUTER JOIN orders ON customers.id = orders.customer_id;
Use this when you want to see unmatched records from both sides.
A Simple Way to Remember Joins
INNER = Only the overlap between the two tables
LEFT = All of left + whatever matches on the right
RIGHT = All of right + whatever matches on the left
FULL = Everything from both, matched or not
Key Takeaways for Joins
- Joins combine tables using a shared column
- INNER JOIN is the most common — only matched rows
- LEFT JOIN is your friend when you don't want to lose rows from your primary table
- Always think: "What happens to rows with no match?" this is what determines which join to use
PART 2: WINDOW FUNCTIONS
What is a Window Function?
A window function performs a calculation across a set of rows related to the current row without collapsing them like GROUP BY does.
This is the key difference:
GROUP BY: Collapses multiple rows into one result per group
Window Function: Keeps all rows, and adds a calculated column to each one
Think of it like this. for each row, you "open a window" into the data, do a calculation across that window, and write the result back into that row without losing any rows.
The Syntax
function() OVER (PARTITION BY column ORDER BY column)
OVER is what makes it a window function. Without OVER, it's just a regular function.
The Table We'll Use
Sales:
| employee | dept | sales |
|---|---|---|
| Alice | IT | 500 |
| Bob | IT | 300 |
| Carol | HR | 400 |
| Dave | HR | 600 |
ROW_NUMBER() — Rank Rows Within a Group
SELECT employee, dept, sales,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY sales DESC) AS rank
FROM sales;
Result:
| employee | dept | sales | rank |
|---|---|---|---|
| Alice | IT | 500 | 1 |
| Bob | IT | 300 | 2 |
| Dave | HR | 600 | 1 |
| Carol | HR | 400 | 2 |
*What happened here?
*
PARTITION BY dept splits the data into two groups: IT and HR.
ORDER BY sales DESC sorts each group from highest to lowest sales.
ROW_NUMBER() assigns 1, 2, 3... to each row within that group.
Both Alice and Dave get rank 1 — because the ranking restarted for each department. That's the power of PARTITION BY.
RANK() vs DENSE_RANK() vs ROW_NUMBER()
These three are easy to confuse. Here's the difference when two people tie:
SELECT employee, sales,
ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,
RANK() OVER (ORDER BY sales DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk
FROM sales;
| employee | sales | row_num | rnk | dense_rnk |
|---|---|---|---|---|
| Dave | 600 | 1 | 1 | 1 |
| Alice | 500 | 2 | 2 | 2 |
| Carol | 500 | 3 | 2 | 2 |
| Bob | 300 | 4 | 4 | 3 |
ROW_NUMBER: Always unique. No ties. Just counts 1, 2, 3, 4.
RANK: Ties get the same number, then skips. Alice and Carol both get 2, next is 4.
DENSE_RANK: Ties get the same number, but does NOT skip. Next after 2 is 3.
SUM() OVER()
SELECT employee, dept, sales,
SUM(sales) OVER (PARTITION BY dept) AS dept_total
FROM sales;
Result:
| employee | dept | sales | dept_total |
|---|---|---|---|
| Alice | IT | 500 | 800 |
| Bob | IT | 300 | 800 |
| Dave | HR | 600 | 1000 |
| Carol | HR | 400 | 1000 |
Every row still exists. But now each row also shows the total for their department. With GROUP BY, you'd lose the individual rows. With SUM() OVER(), you keep them all.
LAG() and LEAD()
LAG() lets you look at the value from the previous row. LEAD() looks ahead to the next row. Perfect for comparing month-over-month changes.
SELECT employee, sales,
LAG(sales) OVER (ORDER BY sales) AS previous_sales,
LEAD(sales) OVER (ORDER BY sales) AS next_sales
FROM sales;
Result:
| employee | sales | previous_sales | next_sales |
|---|---|---|---|
| Bob | 300 | NULL | 400 |
| Carol | 400 | 300 | 500 |
| Alice | 500 | 400 | 600 |
| Dave | 600 | 500 | NULL |
The first row has no previous row, so it's NULL. The last row has no next row, so that's NULL too.
ROWS BETWEEN
This is more advanced, but incredibly useful for moving averages. It is mostly used to control the Size of Your Window
SELECT employee, sales,
AVG(sales) OVER (
ORDER BY sales
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM sales;
This calculates the average of the current row and the one before it. Change the numbers to control how many rows back or forward your window looks.
Key Takeaways for Window Functions
- Window functions do NOT collapse rows — this is the biggest difference from GROUP BY
- OVER() is what makes it a window function
- PARTITION BY is like GROUP BY but keeps all rows
- ORDER BY inside OVER() controls the order within the win
Top comments (0)