DEV Community

jim kinyua
jim kinyua

Posted on

Understanding SQL Joins and Window Functions

If you've ever stared at a SQL query and wondered what PARTITION BY actually does, this article is for you. I'm going to break down two of the most important SQL concepts Joins and Window Functions

PART 1: SQL JOINS

What is a Join?

A Join is how you combine data from two or more tables. Think of it like connecting two spreadsheets using a shared column like a customer ID that appears in both a Customers table and an Orders table.

The Tables We'll Use

Customers:

id name
1 Alice
2 Bob
3 Carol

Orders:

id customer_id amount
1 1 500
2 1 300
3 2 200

Notice that Carol (id = 3) has no orders, and all orders belong to either Alice or Bob. Keep that in mind. it's going to matter a lot when we look at how different JOINs behave.

Understanding Left and Right Tables

Before we look at any specific JOIN type, there's one concept you need to lock in first. Every JOIN has a Left table and a Right table, and the difference matters.

The table written after FROM is the Left table
The table written after JOIN is the Right table

SELECT customers.name, orders.amount
FROM customers          -- LEFT table
LEFT JOIN orders        -- RIGHT table
  ON customers.id = orders.customer_id;
Enter fullscreen mode Exit fullscreen mode

This isn't just terminology. SQL uses this position to decide which table's rows get priority when there's no matching data on the other side.

INNER JOIN

An INNER JOIN returns rows where there is a match in BOTH tables.

SELECT customers.name, orders.amount
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id;
Enter fullscreen mode Exit fullscreen mode

Result:

name amount
Alice 500
Alice 300
Bob 200

Carol is gone. She has no orders, so she doesn't appear. INNER JOIN only shows you where both sides match.

LEFT JOIN

A LEFT JOIN returns ALL rows from the left table, and matches from the right. Where there's no match, you get NULL.

SELECT customers.name, orders.amount
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;
Enter fullscreen mode Exit fullscreen mode

Result:

name amount
Alice 500
Alice 300
Bob 200
Carol NULL

Carol is back. She has no orders, so her amount is NULL. Use LEFT JOIN when you don't want to lose rows from your main table.

RIGHT JOIN

The opposite of LEFT JOIN. All rows from the right table are kept, and matches come from the left. Less common, but useful in certain situations.

FULL OUTER JOIN

Returns all rows from both tables, with NULLs where there's no match on either side.

SELECT customers.name, orders.amount
FROM customers
FULL OUTER JOIN orders ON customers.id = orders.customer_id;
Enter fullscreen mode Exit fullscreen mode

Use this when you want to see unmatched records from both sides.

A Simple Way to Remember Joins

INNER = Only the overlap between the two tables
LEFT = All of left + whatever matches on the right
RIGHT = All of right + whatever matches on the left
FULL = Everything from both, matched or not


Key Takeaways for Joins

  1. Joins combine tables using a shared column
  2. INNER JOIN is the most common — only matched rows
  3. LEFT JOIN is your friend when you don't want to lose rows from your primary table
  4. Always think: "What happens to rows with no match?" this is what determines which join to use

PART 2: WINDOW FUNCTIONS

What is a Window Function?

A window function performs a calculation across a set of rows related to the current row without collapsing them like GROUP BY does.

This is the key difference:

GROUP BY: Collapses multiple rows into one result per group
Window Function: Keeps all rows, and adds a calculated column to each one

Think of it like this. for each row, you "open a window" into the data, do a calculation across that window, and write the result back into that row without losing any rows.

The Syntax

function() OVER (PARTITION BY column ORDER BY column)
Enter fullscreen mode Exit fullscreen mode

OVER is what makes it a window function. Without OVER, it's just a regular function.

The Table We'll Use

Sales:

employee dept sales
Alice IT 500
Bob IT 300
Carol HR 400
Dave HR 600

ROW_NUMBER() — Rank Rows Within a Group

SELECT employee, dept, sales,
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY sales DESC) AS rank
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Result:

employee dept sales rank
Alice IT 500 1
Bob IT 300 2
Dave HR 600 1
Carol HR 400 2

*What happened here?
*

PARTITION BY dept splits the data into two groups: IT and HR.
ORDER BY sales DESC sorts each group from highest to lowest sales.
ROW_NUMBER() assigns 1, 2, 3... to each row within that group.

Both Alice and Dave get rank 1 — because the ranking restarted for each department. That's the power of PARTITION BY.

RANK() vs DENSE_RANK() vs ROW_NUMBER()

These three are easy to confuse. Here's the difference when two people tie:

SELECT employee, sales,
  ROW_NUMBER() OVER (ORDER BY sales DESC) AS row_num,
  RANK()       OVER (ORDER BY sales DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rnk
FROM sales;
Enter fullscreen mode Exit fullscreen mode
employee sales row_num rnk dense_rnk
Dave 600 1 1 1
Alice 500 2 2 2
Carol 500 3 2 2
Bob 300 4 4 3

ROW_NUMBER: Always unique. No ties. Just counts 1, 2, 3, 4.
RANK: Ties get the same number, then skips. Alice and Carol both get 2, next is 4.
DENSE_RANK: Ties get the same number, but does NOT skip. Next after 2 is 3.

SUM() OVER()

SELECT employee, dept, sales,
  SUM(sales) OVER (PARTITION BY dept) AS dept_total
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Result:

employee dept sales dept_total
Alice IT 500 800
Bob IT 300 800
Dave HR 600 1000
Carol HR 400 1000

Every row still exists. But now each row also shows the total for their department. With GROUP BY, you'd lose the individual rows. With SUM() OVER(), you keep them all.


LAG() and LEAD()

LAG() lets you look at the value from the previous row. LEAD() looks ahead to the next row. Perfect for comparing month-over-month changes.

SELECT employee, sales,
  LAG(sales)  OVER (ORDER BY sales) AS previous_sales,
  LEAD(sales) OVER (ORDER BY sales) AS next_sales
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Result:

employee sales previous_sales next_sales
Bob 300 NULL 400
Carol 400 300 500
Alice 500 400 600
Dave 600 500 NULL

The first row has no previous row, so it's NULL. The last row has no next row, so that's NULL too.

ROWS BETWEEN

This is more advanced, but incredibly useful for moving averages. It is mostly used to control the Size of Your Window

SELECT employee, sales,
  AVG(sales) OVER (
    ORDER BY sales
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM sales;
Enter fullscreen mode Exit fullscreen mode

This calculates the average of the current row and the one before it. Change the numbers to control how many rows back or forward your window looks.

Key Takeaways for Window Functions

  1. Window functions do NOT collapse rows — this is the biggest difference from GROUP BY
  2. OVER() is what makes it a window function
  3. PARTITION BY is like GROUP BY but keeps all rows
  4. ORDER BY inside OVER() controls the order within the win

Top comments (0)