Introduction
In relational databases, data is stored in separate tables to keep it organized, reduce duplication, and improve efficiency. For example, employee details might be stored in one table while department information is stored in another.
When you need information from both tables at the same time, you use a JOIN. A JOIN combines rows from two tables using a related column, typically a primary key in one table and a foreign key in another. This relationship defines how the records are connected.
The ON clause specifies the exact condition used to match rows between the tables. SQL evaluates this condition to determine which records relate to each other. The result is a single, combined dataset that brings together information that was originally stored separately.
This guide walks through both Joins and Window Functions clearly, with examples you can follow from start to finish.
Dataset
We will use two tables throughout this entire article so the examples stay consistent and easy to follow.
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
country VARCHAR(50)
);
INSERT INTO customers VALUES
(1, 'Alice', 'Mensah', 'Kenya'),
(2, 'Brian', 'Okonkwo', 'Nigeria'),
(3, 'Carol', 'Wanjiku', 'Kenya'),
(4, 'Diana', 'Asante', 'Ghana'),
(5, 'Edward', 'Kimani', 'Kenya');
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product VARCHAR(100),
amount DECIMAL(10,2),
order_date DATE
);
INSERT INTO orders VALUES
(101, 1, 'Laptop', 900.00, '2024-01-10'),
(102, 1, 'Mouse', 25.00, '2024-02-14'),
(103, 3, 'Keyboard', 75.00, '2024-03-01'),
(104, 2, 'Monitor', 300.00, '2024-03-15'),
(105, 6, 'Headphones', 120.00, '2024-04-02');
Note:
customer_id = 6in the orders table has no matching customer. Diana (customer_id = 4) and Edward (customer_id = 5) have never placed an order. These details will matter as we go through each join type.
A Quick Note on Aliases
You will notice queries using short names like c and o instead of the full table name. These are called aliases.
-- Without aliases
SELECT customers.first_name, orders.product
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;
-- With aliases — same result, much cleaner
SELECT c.first_name, o.product
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
You assign an alias by writing a short name right after the table name in the FROM or JOIN clause. Aliases are especially useful when table names are long or when the same table appears more than once in a query. Every example from here on uses them.
Part 1 — SQL Joins
1. INNER JOIN
Returns only the rows where a match exists in both tables. Rows with no match on either side are excluded.
SELECT c.first_name, c.last_name, o.product, o.amount, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
| first_name | last_name | product | amount | order_date |
|---|---|---|---|---|
| Alice | Mensah | Laptop | 900.00 | 2024-01-10 |
| Alice | Mensah | Mouse | 25.00 | 2024-02-14 |
| Brian | Okonkwo | Monitor | 300.00 | 2024-03-15 |
| Carol | Wanjiku | Keyboard | 75.00 | 2024-03-01 |
Alice, Brian and Carol matched so they appear. Diana and Edward have no orders so they are excluded. Order 105 (customer_id = 6) has no matching customer so it is also excluded.
2. LEFT JOIN
Returns all rows from the left table and matching rows from the right. Where there is no match, the right-side columns return NULL.
The "left" table is whichever comes after FROM.
SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
| first_name | last_name | product | amount |
|---|---|---|---|
| Alice | Mensah | Laptop | 900.00 |
| Alice | Mensah | Mouse | 25.00 |
| Brian | Okonkwo | Monitor | 300.00 |
| Carol | Wanjiku | Keyboard | 75.00 |
| Diana | Asante | NULL | NULL |
| Edward | Kimani | NULL | NULL |
All five customers appear. Diana and Edward have no orders so their product and amount columns are NULL. Order 105 is still excluded because it has no matching customer in the left table.
The LEFT JOIN is also known as the LEFT OUTER JOIN.
3. RIGHT JOIN
Returns all rows from the right table and matching rows from the left. Where there is no match, the left-side columns return NULL.
SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;
| first_name | last_name | product | amount |
|---|---|---|---|
| Alice | Mensah | Laptop | 900.00 |
| Alice | Mensah | Mouse | 25.00 |
| Brian | Okonkwo | Monitor | 300.00 |
| Carol | Wanjiku | Keyboard | 75.00 |
| NULL | NULL | Headphones | 120.00 |
All five orders appear. Order 105 (Headphones) has no matching customer so first_name and last_name are NULL. Diana and Edward are gone because they live in the left table and have no matching order.
The RIGHT JOIN is also known as the RIGHT OUTER JOIN.
💡 Most developers rewrite
RIGHT JOINas aLEFT JOINby swapping the table order as it reads more naturally. These two queries return identical results:-- As RIGHT JOIN SELECT c.first_name, o.product FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id; -- Same result as LEFT JOIN (tables swapped) SELECT c.first_name, o.product FROM orders o LEFT JOIN customers c ON c.customer_id = o.customer_id;
4. FULL OUTER JOIN
Returns all rows from both tables. Where there is no match on either side, NULL fills in the gap. No row from either table is left out.
SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
| first_name | last_name | product | amount |
|---|---|---|---|
| Alice | Mensah | Laptop | 900.00 |
| Alice | Mensah | Mouse | 25.00 |
| Brian | Okonkwo | Monitor | 300.00 |
| Carol | Wanjiku | Keyboard | 75.00 |
| Diana | Asante | NULL | NULL |
| Edward | Kimani | NULL | NULL |
| NULL | NULL | Headphones | 120.00 |
Every customer and every order appears, matched or not.
⚠️ MySQL does not support
FULL OUTER JOIN. Use aUNIONto emulate it:SELECT c.first_name, o.product FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id UNION SELECT c.first_name, o.product FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id;
5. SELF JOIN
A SELF JOIN joins a table to itself. It is used when rows in a table relate to other rows in the same table. The most common example is an employee table where each employee has a manager_id that points to another employee in the same table.
We will use a separate employees table for this example:
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT
);
INSERT INTO employees VALUES
(1, 'Alice', 3),
(2, 'Brian', 3),
(3, 'Carol', 4),
(4, 'Diana', NULL);
Alice and Brian report to Carol (emp_id = 3). Carol reports to Diana (emp_id = 4). Diana is at the top so she has no manager.
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
| employee | manager |
|---|---|
| Alice | Carol |
| Brian | Carol |
| Carol | Diana |
| Diana | NULL |
Notice the aliases e and m. Because we are joining the employees table to itself, aliases are what tell SQL which copy of the table is which. Without them the query would be ambiguous and SQL would not know which name column to use. Diana returns NULL because she has no manager.
6. CROSS JOIN
A CROSS JOIN returns every possible combination of rows from both tables. There is no ON clause because you are not matching rows, you are pairing every row from one table with every row from the other.
Using our existing tables:
SELECT c.first_name, o.product
FROM customers c
CROSS JOIN orders o;
With 5 customers and 5 orders, the result has 5 x 5 = 25 rows. Every customer is paired with every product regardless of whether they ordered it.
| first_name | product |
|---|---|
| Alice | Laptop |
| Alice | Mouse |
| Alice | Keyboard |
| Alice | Monitor |
| Alice | Headphones |
| Brian | Laptop |
| Brian | Mouse |
| ... | ... |
This is useful for generating all possible combinations such as product variants, scheduling slots, or test data.
⚠️ On large tables, CROSS JOIN grows very fast. Two tables with 1,000 rows each produce 1,000,000 rows.
Join Types Summary
| Join | What It Returns |
|---|---|
INNER JOIN |
Only rows with a match in both tables |
LEFT JOIN |
All rows from the left table + matching rows from the right |
RIGHT JOIN |
All rows from the right table + matching rows from the left |
FULL OUTER JOIN |
All rows from both tables |
SELF JOIN |
A table joined to itself |
CROSS JOIN |
Every possible combination of rows from both tables |
Part 2 — Window Functions
What Are Window Functions?
Window functions perform calculations across a set of rows without collapsing them. That is what separates them from regular aggregate functions.
With GROUP BY, every group gets summarised into a single row. With window functions, every row stays in the result and the calculation is added as an extra column alongside it.
They operate over a window of rows defined by the OVER() clause.
Syntax
SELECT column_name,
function() OVER (
PARTITION BY partition_column
ORDER BY order_column
) AS output_column
FROM table_name;
| Part | What It Does |
|---|---|
function() |
The window function e.g. RANK(), SUM(), LAG()
|
OVER() |
Tells SQL this is a window function and defines the window |
PARTITION BY |
Divides rows into independent groups. The function restarts for each group |
ORDER BY |
Defines the order of rows within each partition |
AS output_column |
Names the result column |
Important: Window functions run after
WHERE,GROUP BY, andHAVING, so you cannot filter on a window function directly in aWHEREclause. Wrap it in a CTE or subquery instead.
For the window function examples we will use this employees table:
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2)
);
INSERT INTO employees VALUES
(1, 'Alice', 'Engineering', 72000),
(2, 'Brian', 'Marketing', 58000),
(3, 'Carol', 'Engineering', 65000),
(4, 'Diana', 'Marketing', 71000),
(5, 'Edward', 'Finance', 91000),
(6, 'Faith', 'Engineering', 72000);
Types of Window Functions
Window functions fall into three categories:
Aggregate: SUM(), AVG(), COUNT(), MIN(), MAX()
Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK()
Value: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
1. ROW_NUMBER()
Assigns a unique sequential number to each row within a partition, ordered by the column you specify.
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
| name | department | salary | row_num |
|---|---|---|---|
| Alice | Engineering | 72000.00 | 1 |
| Faith | Engineering | 72000.00 | 2 |
| Carol | Engineering | 65000.00 | 3 |
| Diana | Marketing | 71000.00 | 1 |
| Brian | Marketing | 58000.00 | 2 |
| Edward | Finance | 91000.00 | 1 |
The numbering restarts for each department because of PARTITION BY department. Alice and Faith earn the same salary but still get different row numbers because ROW_NUMBER is always unique.
2. RANK()
Assigns ranks within a partition. When rows tie, they receive the same rank and the next rank skips to account for the tie.
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
| name | department | salary | rank |
|---|---|---|---|
| Alice | Engineering | 72000.00 | 1 |
| Faith | Engineering | 72000.00 | 1 |
| Carol | Engineering | 65000.00 | 3 |
| Diana | Marketing | 71000.00 | 1 |
| Brian | Marketing | 58000.00 | 2 |
| Edward | Finance | 91000.00 | 1 |
Alice and Faith both earn 72,000 so they both get rank 1. The next rank jumps to 3 because rank 2 is skipped to account for the tie.
3. DENSE_RANK()
Works like RANK() but without the gaps. Tied rows share the same rank and the next rank continues without skipping.
SELECT name, department, salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;
| name | department | salary | dense_rank |
|---|---|---|---|
| Alice | Engineering | 72000.00 | 1 |
| Faith | Engineering | 72000.00 | 1 |
| Carol | Engineering | 65000.00 | 2 |
| Diana | Marketing | 71000.00 | 1 |
| Brian | Marketing | 58000.00 | 2 |
| Edward | Finance | 91000.00 | 1 |
Alice and Faith are still both rank 1 but now Carol is rank 2 not 3. No numbers are skipped.
Comparing ROW_NUMBER, RANK, and DENSE_RANK
Using the Engineering department where Alice and Faith both earn 72,000:
| name | salary | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|---|
| Alice | 72000.00 | 1 | 1 | 1 |
| Faith | 72000.00 | 2 | 1 | 1 |
| Carol | 65000.00 | 3 | 3 | 2 |
- ROW_NUMBER always unique, no ties recognised
- RANK ties get the same rank, next rank skips
- DENSE_RANK ties get the same rank, next rank does not skip
4. SUM() — Running Total
When you use SUM() as a window function with ORDER BY, it accumulates a running total as it moves down the rows.
SELECT name, department, salary,
SUM(salary) OVER (
PARTITION BY department
ORDER BY emp_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM employees;
| name | department | salary | running_total |
|---|---|---|---|
| Alice | Engineering | 72000.00 | 72000.00 |
| Carol | Engineering | 65000.00 | 137000.00 |
| Faith | Engineering | 72000.00 | 209000.00 |
| Brian | Marketing | 58000.00 | 58000.00 |
| Diana | Marketing | 71000.00 | 129000.00 |
| Edward | Finance | 91000.00 | 91000.00 |
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells SQL to include every row from the start of the partition up to the current row. The total resets for each department because of PARTITION BY.
5. AVG() — Department Average Per Row
Without ORDER BY, a window aggregate runs across the entire partition at once, giving you the group value on every individual row.
SELECT name, department, salary,
ROUND(AVG(salary) OVER (PARTITION BY department), 2) AS dept_avg
FROM employees;
| name | department | salary | dept_avg |
|---|---|---|---|
| Alice | Engineering | 72000.00 | 69666.67 |
| Carol | Engineering | 65000.00 | 69666.67 |
| Faith | Engineering | 72000.00 | 69666.67 |
| Brian | Marketing | 58000.00 | 64500.00 |
| Diana | Marketing | 71000.00 | 64500.00 |
| Edward | Finance | 91000.00 | 91000.00 |
Every employee keeps their individual row and the department average sits right next to their salary. This is the query that GROUP BY simply cannot do.
6. LAG() and LEAD()
LAG() looks back at the value of a previous row. LEAD() looks ahead at the value of a future row. Both are used for period-over-period comparisons.
SELECT
order_date,
amount,
LAG(amount, 1) OVER (ORDER BY order_date) AS prev_order,
LEAD(amount, 1) OVER (ORDER BY order_date) AS next_order
FROM orders
WHERE customer_id = 1;
| order_date | amount | prev_order | next_order |
|---|---|---|---|
| 2024-01-10 | 900.00 | NULL | 25.00 |
| 2024-02-14 | 25.00 | 900.00 | NULL |
The first row returns NULL for LAG because there is no row before it. The last row returns NULL for LEAD because there is no row after it. This is expected behaviour, not an error.
Getting the Top Earner Per Department
This is one of the most common window function patterns. It requires a CTE because you cannot filter on a window function in a WHERE clause directly.
WITH ranked AS (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;
| name | department | salary |
|---|---|---|
| Alice | Engineering | 72000.00 |
| Diana | Marketing | 71000.00 |
| Edward | Finance | 91000.00 |
We use ROW_NUMBER() here instead of RANK() so that ties produce exactly one result per department. The CTE materialises the window function result first and then the outer WHERE filters on it.
Conclusion
Joins connect data that lives in separate tables. The join type you pick decides what happens to rows with no match. INNER JOIN drops them, LEFT JOIN keeps the left side, FULL OUTER JOIN keeps everything.
Window Functions bring group-level calculations down to the row level without losing any individual rows. PARTITION BY creates the groups, ORDER BY inside OVER() controls the order within them, and functions like RANK(), SUM(), and LAG() do the work.
Master these two and most real-world SQL analysis becomes a lot more straightforward.
Have a question about a specific join or function? Drop it in the comments.
Top comments (0)