jim kinyua

Posted on Mar 2

Understanding SQL Joins and Window Functions

#sql #database #tutorial #beginners

If you've ever stared at a SQL query and wondered what PARTITION BY actually does, this article is for you. I'm going to break down two of the most important SQL concepts Joins and Window Functions

PART 1: SQL JOINS

What is a Join?

A Join is how you combine data from two or more tables. Think of it like connecting two spreadsheets using a shared column like a customer ID that appears in both a Customers table and an Orders table.

The Tables We'll Use

Customers:

id	name
1	Alice
2	Bob
3	Carol

Orders:

id	customer_id	amount
1	1	500
2	1	300
3	2	200

Notice that Carol (id = 3) has no orders, and all orders belong to either Alice or Bob. Keep that in mind. it's going to matter a lot when we look at how different JOINs behave.

Understanding Left and Right Tables

Before we look at any specific JOIN type, there's one concept you need to lock in first. Every JOIN has a Left table and a Right table, and the difference matters.

The table written after FROM is the Left table
The table written after JOIN is the Right table

SELECT customers.name, orders.amount
FROM customers          -- LEFT table
LEFT JOIN orders        -- RIGHT table
  ON customers.id = orders.customer_id;

This isn't just terminology. SQL uses this position to decide which table's rows get priority when there's no matching data on the other side.

INNER JOIN

An INNER JOIN returns rows where there is a match in BOTH tables.

SELECT customers.name, orders.amount
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;

Result:

name	amount
Alice	500
Alice	300
Bob	200

Carol is gone. She has no orders, so she doesn't appear. INNER JOIN only shows you where both sides match.

LEFT JOIN

A LEFT JOIN returns ALL rows from the left table, and matches from the right. Where there's no match, you get NULL.

SELECT customers.name, orders.amount
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;

Result:

name	amount
Alice	500
Alice	300
Bob	200
Carol	NULL

Carol is back. She has no orders, so her amount is NULL. Use LEFT JOIN when you don't want to lose rows from your main table.

RIGHT JOIN

The opposite of LEFT JOIN. All rows from the right table are kept, and matches come from the left. Less common, but useful in certain situations.

FULL OUTER JOIN

Returns all rows from both tables, with NULLs where there's no match on either side.

SELECT customers.name, orders.amount
FROM customers
FULL OUTER JOIN orders ON customers.id = orders.customer_id;

Use this when you want to see unmatched records from both sides.

A Simple Way to Remember Joins

INNER = Only the overlap between the two tables
LEFT = All of left + whatever matches on the right
RIGHT = All of right + whatever matches on the left
FULL = Everything from both, matched or not

Key Takeaways for Joins

Joins combine tables using a shared column
INNER JOIN is the most common — only matched rows
LEFT JOIN is your friend when you don't want to lose rows from your primary table
Always think: "What happens to rows with no match?" this is what determines which join to use

PART 2: WINDOW FUNCTIONS

What is a Window Function?

A window function performs a calculation across a set of rows related to the current row without collapsing them like GROUP BY does.

This is the key difference:

GROUP BY: Collapses multiple rows into one result per group
Window Function: Keeps all rows, and adds a calculated column to each one

Think of it like this. for each row, you "open a window" into the data, do a calculation across that window, and write the result back into that row without losing any rows.

The Syntax

function() OVER (PARTITION BY column ORDER BY column)

OVER is what makes it a window function. Without OVER, it's just a regular function.

The Table We'll Use

Sales:

employee	dept	sales
Alice	IT	500
Bob	IT	300
Carol	HR	400
Dave	HR	600

ROW_NUMBER() — Rank Rows Within a Group

SELECT employee, dept, sales,
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY sales DESC) AS rank
FROM sales;

Result:

employee	dept	sales	rank
Alice	IT	500	1
Bob	IT	300	2
Dave	HR	600	1
Carol	HR	400	2

*What happened here?
*
PARTITION BY dept splits the data into two groups: IT and HR.
ORDER BY sales DESC sorts each group from highest to lowest sales.
ROW_NUMBER() assigns 1, 2, 3... to each row within that group.

Both Alice and Dave get rank 1 — because the ranking restarted for each department. That's the power of PARTITION BY.

RANK() vs DENSE_RANK() vs ROW_NUMBER()

These three are easy to confuse. Here's the difference when two people tie:

SELECT employee, sales,
  ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,
  RANK()       OVER (ORDER BY sales DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk
FROM sales;

employee	sales	row_num	rnk	dense_rnk
Dave	600	1	1	1
Alice	500	2	2	2
Carol	500	3	2	2
Bob	300	4	4	3

ROW_NUMBER: Always unique. No ties. Just counts 1, 2, 3, 4.
RANK: Ties get the same number, then skips. Alice and Carol both get 2, next is 4.
DENSE_RANK: Ties get the same number, but does NOT skip. Next after 2 is 3.

SUM() OVER()

SELECT employee, dept, sales,
  SUM(sales) OVER (PARTITION BY dept) AS dept_total
FROM sales;

Result:

employee	dept	sales	dept_total
Alice	IT	500	800
Bob	IT	300	800
Dave	HR	600	1000
Carol	HR	400	1000

Every row still exists. But now each row also shows the total for their department. With GROUP BY, you'd lose the individual rows. With SUM() OVER(), you keep them all.

LAG() and LEAD()

LAG() lets you look at the value from the previous row. LEAD() looks ahead to the next row. Perfect for comparing month-over-month changes.

SELECT employee, sales,
  LAG(sales)  OVER (ORDER BY sales) AS previous_sales,
  LEAD(sales) OVER (ORDER BY sales) AS next_sales
FROM sales;

Result:

employee	sales	previous_sales	next_sales
Bob	300	NULL	400
Carol	400	300	500
Alice	500	400	600
Dave	600	500	NULL

The first row has no previous row, so it's NULL. The last row has no next row, so that's NULL too.

ROWS BETWEEN

This is more advanced, but incredibly useful for moving averages. It is mostly used to control the Size of Your Window

SELECT employee, sales,
  AVG(sales) OVER (
    ORDER BY sales
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM sales;

This calculates the average of the current row and the one before it. Change the numbers to control how many rows back or forward your window looks.

Key Takeaways for Window Functions

Window functions do NOT collapse rows — this is the biggest difference from GROUP BY
OVER() is what makes it a window function
PARTITION BY is like GROUP BY but keeps all rows
ORDER BY inside OVER() controls the order within the win

DEV Community

Understanding SQL Joins and Window Functions

PART 1: SQL JOINS

What is a Join?

The Tables We'll Use

Understanding Left and Right Tables

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL OUTER JOIN

A Simple Way to Remember Joins

Key Takeaways for Joins

PART 2: WINDOW FUNCTIONS

What is a Window Function?

The Syntax

The Table We'll Use

ROW_NUMBER() — Rank Rows Within a Group

RANK() vs DENSE_RANK() vs ROW_NUMBER()

SUM() OVER()

LAG() and LEAD()

ROWS BETWEEN

Key Takeaways for Window Functions

Top comments (0)