If you have been writing SQL for a while, you have probably come across joins and window functions. These two features appear in a large percentage of real-world SQL queries, and for good reason. They are among the most powerful tools SQL has to offer.
However, they can be confusing at first. Understanding how tables connect with joins and how window functions analyze data across rows is a key step in becoming comfortable with SQL.
In this article, we will walk through the most common types of joins using a simple example and explain how they work.
1. What are Joins?
In SQL, joins are used to combine rows from two or more tables based on a related column between them.
Think of it as sliding two spreadsheets together so that matching values line up, the join type determines what you do with rows that have no partner on the other side.
To understand joins better, we will use the following two tables.
In the Customers table, customer_id is the primary key.
In the Orders table, order_id is the primary key, while customer_id acts as a foreign key that references the Customers table.
Table 1: Customers Table
| customer_id | customer_name |
|------------|---------------|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |
| 4 | Diana |
Table 2: Orders Table
| order_id | customer_id | total_amount |
|----------|------------|--------------|
| 101 | 1 | 250 |
| 102 | 2 | 150 |
| 103 | 1 | 300 |
| 104 | 3 | 200 |
| 105 | 5 | 100 |
NB:
- Customer 4 (Diana) has no orders
- Order 105 belongs to customer_id 5, which does not exist in Customers.
A) INNER Join
Returns rows where there is a match in both tables.
If there’s no match, the result set will not include those records.
SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
INNER JOIN Orders o
ON c.customer_id = o.customer_id;
Result: The above query returns only customers who actually have orders.
customer_name order_id total_amount
Alice 101 250
Alice 103 300
Bob 102 150
Charlie 104 200
NB
Notice that Diana (no orders) and Order 105 (no matching customer) are both absent. Only rows with a valid match on both sides make it through.
This is the most common join.
It is mostly when you want records that exist in both tables eg orders with valid customers, students with enrolled courses.
B. LEFT(OUTER) Join
This join returns all records from the left table and the matched records from the right table.
If there’s no match, NULL values are returned for the right table’s columns.
In this case, the left table is the customers table and on the right is the orders table.
SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
LEFT JOIN Orders o
ON c.customer_id = o.customer_id;
Result: The above query returns all customers even those with no orders.
customer_name order_id total_amount
Alice 101 250
Alice 103 300
Bob 102 150
Charlie 104 200
Diana NULL NULL
NB
- Diana now appears in the results, even though she has no orders — her order columns simply return NULL.
- Left Join is the foundation of the anti-join pattern — filtering to WHERE o.order_id IS NULL gives you only the customers who have never placed an order.
B.i) Anti Join
An Anti Join is used when you want to find records in one table that do not have a corresponding match in another table.
For Example:
- Find customers who haven’t purchased anything
- Identify products with no sales
Using the above table, LEFT JOIN + WHERE NULL
SELECT c.customer_id, c.customer_name
FROM Customers c
LEFT JOIN Orders o
ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
Result: For customers without matching orders, order_idreturns NULL.
The WHERE o.order_id IS NULL condition filters these unmatched rows.
This is the anti join pattern.
customer_id customer_name
4 Diana
C) RIGHT Join
Returns all rows from the right table and matching rows from the left table.
SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
RIGHT JOIN Orders o
ON c.customer_id = o.customer_id;
Result: The above query returns all orders, even those without matching customers.
customer_name order_id total_amount
Alice 101 250
Alice 103 300
Bob 102 150
Charlie 104 200
NULL 105 100
d) FULL OUTER Join
Returns all rows when there is a match in one of the tables.
SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
FULL OUTER JOIN Orders o
ON c.customer_id = o.customer_id;
Result: Includes all customers and all orders, matching where possible. Combine both LEFT + RIGHT and everything shown
customer_name order_id total_amount
Alice 101 250
Alice 103 300
Bob 102 150
Charlie 104 200
Diana NULL NULL
NULL 105 100
e) SELF Join
A Self Join is when a table is joined with itself.
Instead of joining two different tables, you treat the same table as if it were two separate tables by using table aliases.
Self joins are commonly used when a table contains hierarchical or related data within itself, such as:
- employees and managers
- categories and subcategories
Example Table: Employees
employee_id employee_name manager_id
1 Alice NULL
2 Bob 1
3 Charlie 1
4 Diana 2
NB
- Alice is the top manager (no manager).
- Bob and Charlie report to Alice.
- Diana reports to Bob.
SELECT
e.employee_name AS employee,
m.employee_name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.employee_id;
Result: The above query essentially asks: "For each employee, find the person whose employee_id matches their manager_id."
employee manager
Alice NULL
Bob Alice
Charlie Alice
Diana Bob
Let me explain:
- The employees table is referenced twice in the query.
- e represents the employee.
- m represents the manager.
- We match manager_id from one instance of the table to employee_id in the other.
2. Window Functions
While joins help you combine data across tables, window functions help you analyze data across rows — without collapsing those rows into a single result the way GROUP BY does.
This is the key distinction: GROUP BY aggregates rows and destroys their individual identity. A window function performs the same kind of calculation but preserves every row, attaching the result as a new column alongside the original data.
The syntax uses OVER()
The OVER() clause is what transforms an ordinary aggregate into a window function. An empty OVER() applies the function across the entire result set.
PARTITION BY subdivides that set into groups, restarting the calculation for each one.
a) Ranking Functions (ROW_NUMBER, RANK, DENSE_RANK)
Suppose you want to rank each order by its total_amount — highest to lowest.
Three ranking functions can do this, and their difference only shows up when values tie.
SELECT o.order_id, c.customer_name, o.total_amount,
-- Always unique; no ties are possible (1, 2, 3, 4 …)
ROW_NUMBER() OVER (ORDER BY o.total_amount DESC) AS row_num,
-- Tied rows share a rank; the next rank skips (1, 1, 3 …)
RANK() OVER (ORDER BY o.total_amount DESC) AS rnk,
-- Tied rows share a rank; no ranks are skipped (1, 1, 2 …)
DENSE_RANK() OVER (ORDER BY o.total_amount DESC) AS dense_rnk
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id;
Result:
| order_id | customer_name | total_amount | row_num | rnk | dense_rnk |
|---|---|---|---|---|---|
| 103 | Alice | 300 | 1 | 1 | 1 |
| 101 | Alice | 250 | 2 | 2 | 2 |
| 104 | Charlie | 200 | 3 | 3 | 3 |
| 102 | Bob | 150 | 4 | 4 | 4 |
NB:
- Order 105 does not appear because it has no matching customer — the INNER JOIN filters it out.
b) Aggregates Over a Window — SUM, AVG, COUNT
One of the most powerful things window functions enable is comparing each row against an overall or group-level aggregate — without GROUP BY collapsing your data.
Here, we calculate the total revenue across all orders and each order's percentage share of that total:
SELECT o.order_id, c.customer_name, o.total_amount, SUM(o.total_amount) OVER () AS grand_total,
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id;
Result:
| order_id | customer_name | total_amount | grand_total |
|---|---|---|---|
| 101 | Alice | 250 | 900 |
| 102 | Bob | 150 | 900 |
| 103 | Alice | 300 | 900 |
| 104 | Charlie | 200 | 900 |
NB
- Notice that
grand_totalis the same on every row. - The
OVER()means "look across the entire result set." - Each row keeps its own identity while also having access to the overall total.
c) LAG and LEAD — Comparing Orders Across Rows
LAG looks at the previous row's value and LEAD looks at the next row's value.
Here, we can use LAG to show how much each of Alice's orders changed compared to her previous one.
SELECT o.order_id, c.customer_name, o.total_amount,
-- We are now looking for the previous order's amount for the same customer
-- o.total_amount is the column we want to “look back” at.
-- the` 1 `returns how many rows back we want to look (1 row back in this case)
LAG(o.total_amount, 1) OVER (PARTITION BY o.customer_id
ORDER BY o.order_id) AS prev_order_amount,
-- Difference from the previous order using minus sign
o.total_amount - LAG(o.total_amount, 1)
OVER (PARTITION BY o.customer_id
ORDER BY o.order_id) AS change_from_prev
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id
ORDER BY c.customer_name, o.order_id;
Result:
| order_id | customer_name | total_amount | prev_order_amount | change_from_prev |
|---|---|---|---|---|
| 101 | Alice | 250 | NULL | NULL |
| 103 | Alice | 300 | 250 | 50 |
| 102 | Bob | 150 | NULL | NULL |
| 104 | Charlie | 200 | NULL | NULL |
NB
- Alice's second order (103) was 50 more than her first (101). Bob and - Charlie each only have one order, so their
prev_order_amountreturnsNULL— there is no previous row to look back at.
d) NTILE — Bucketing Customers by Spend
NTILE(n) divides rows into n roughly equal buckets and assigns each row a bucket number. It is perfect for segmenting customers by how much they have spent.
Since Diana has no orders, we first join and aggregate, then apply NTILE to rank the customers who do have orders:
SELECT c.customer_name, SUM(o.total_amount) AS total_spent,
NTILE(3) OVER (ORDER BY SUM(o.total_amount) DESC) AS spend_tier,
CASE NTILE(3) OVER (ORDER BY SUM(o.total_amount) DESC)
WHEN 1 THEN 'High Spender'
WHEN 2 THEN 'Mid Spender'
ELSE 'Low Spender'
END AS segment
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC;
Result:
| customer_name | total_spent | spend_tier | segment |
|---|---|---|---|
| Alice | 550 | 1 | High Spender |
| Charlie | 200 | 2 | Mid Spender |
| Bob | 150 | 3 | Low Spender |
*NB
What NTILE(3) does is divide the results into 3 equal groups (tiles).
Alice, with a combined spend of 550, lands in Tier 1. Bob and Charlie each have a single order and are bucketed accordingly. Diana does not appear because she has no orders in the Orders table.
In conclusion, as mentioned earlier, these two concepts require a lot of practice to master. The best way to build mastery is to work with a dataset you already have and understand or a spreadsheet you have exported and deliberately write one query using each join type and one query using each of the window functions covered here.

Top comments (0)