Mariam Turnesh

Posted on Mar 2

SQL Joins and Window Functions

#sql #beginners #database #dataanlysis

Introduction

In relational databases, data is stored in separate tables to keep it organized, reduce duplication, and improve efficiency. For example, employee details might be stored in one table while department information is stored in another.

When you need information from both tables at the same time, you use a JOIN. A JOIN combines rows from two tables using a related column, typically a primary key in one table and a foreign key in another. This relationship defines how the records are connected.

The ON clause specifies the exact condition used to match rows between the tables. SQL evaluates this condition to determine which records relate to each other. The result is a single, combined dataset that brings together information that was originally stored separately.

This guide walks through both Joins and Window Functions clearly, with examples you can follow from start to finish.

Dataset

We will use two tables throughout this entire article so the examples stay consistent and easy to follow.

CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    first_name    VARCHAR(50),
    last_name     VARCHAR(50),
    country       VARCHAR(50)
);

INSERT INTO customers VALUES
(1, 'Alice',   'Mensah',   'Kenya'),
(2, 'Brian',   'Okonkwo',  'Nigeria'),
(3, 'Carol',   'Wanjiku',  'Kenya'),
(4, 'Diana',   'Asante',   'Ghana'),
(5, 'Edward',  'Kimani',   'Kenya');

CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    product      VARCHAR(100),
    amount       DECIMAL(10,2),
    order_date   DATE
);

INSERT INTO orders VALUES
(101, 1, 'Laptop',     900.00, '2024-01-10'),
(102, 1, 'Mouse',       25.00, '2024-02-14'),
(103, 3, 'Keyboard',    75.00, '2024-03-01'),
(104, 2, 'Monitor',    300.00, '2024-03-15'),
(105, 6, 'Headphones', 120.00, '2024-04-02');

Note: customer_id = 6 in the orders table has no matching customer. Diana (customer_id = 4) and Edward (customer_id = 5) have never placed an order. These details will matter as we go through each join type.

A Quick Note on Aliases

You will notice queries using short names like c and o instead of the full table name. These are called aliases.

-- Without aliases
SELECT customers.first_name, orders.product
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;

-- With aliases — same result, much cleaner
SELECT c.first_name, o.product
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

You assign an alias by writing a short name right after the table name in the FROM or JOIN clause. Aliases are especially useful when table names are long or when the same table appears more than once in a query. Every example from here on uses them.

Part 1 — SQL Joins

1. INNER JOIN

Returns only the rows where a match exists in both tables. Rows with no match on either side are excluded.

SELECT c.first_name, c.last_name, o.product, o.amount, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

first_name	last_name	product	amount	order_date
Alice	Mensah	Laptop	900.00	2024-01-10
Alice	Mensah	Mouse	25.00	2024-02-14
Brian	Okonkwo	Monitor	300.00	2024-03-15
Carol	Wanjiku	Keyboard	75.00	2024-03-01

Alice, Brian and Carol matched so they appear. Diana and Edward have no orders so they are excluded. Order 105 (customer_id = 6) has no matching customer so it is also excluded.

2. LEFT JOIN

Returns all rows from the left table and matching rows from the right. Where there is no match, the right-side columns return NULL.

The "left" table is whichever comes after FROM.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

first_name	last_name	product	amount
Alice	Mensah	Laptop	900.00
Alice	Mensah	Mouse	25.00
Brian	Okonkwo	Monitor	300.00
Carol	Wanjiku	Keyboard	75.00
Diana	Asante	NULL	NULL
Edward	Kimani	NULL	NULL

All five customers appear. Diana and Edward have no orders so their product and amount columns are NULL. Order 105 is still excluded because it has no matching customer in the left table.

The LEFT JOIN is also known as the LEFT OUTER JOIN.

3. RIGHT JOIN

Returns all rows from the right table and matching rows from the left. Where there is no match, the left-side columns return NULL.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

first_name	last_name	product	amount
Alice	Mensah	Laptop	900.00
Alice	Mensah	Mouse	25.00
Brian	Okonkwo	Monitor	300.00
Carol	Wanjiku	Keyboard	75.00
NULL	NULL	Headphones	120.00

All five orders appear. Order 105 (Headphones) has no matching customer so first_name and last_name are NULL. Diana and Edward are gone because they live in the left table and have no matching order.

The RIGHT JOIN is also known as the RIGHT OUTER JOIN.

💡 Most developers rewrite RIGHT JOIN as a LEFT JOIN by swapping the table order as it reads more naturally. These two queries return identical results:
-- As RIGHT JOIN
SELECT c.first_name, o.product
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

-- Same result as LEFT JOIN (tables swapped)
SELECT c.first_name, o.product
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id;

4. FULL OUTER JOIN

Returns all rows from both tables. Where there is no match on either side, NULL fills in the gap. No row from either table is left out.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

first_name	last_name	product	amount
Alice	Mensah	Laptop	900.00
Alice	Mensah	Mouse	25.00
Brian	Okonkwo	Monitor	300.00
Carol	Wanjiku	Keyboard	75.00
Diana	Asante	NULL	NULL
Edward	Kimani	NULL	NULL
NULL	NULL	Headphones	120.00

Every customer and every order appears, matched or not.

⚠️ MySQL does not support FULL OUTER JOIN. Use a UNION to emulate it:

SELECT c.first_name, o.product
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION
SELECT c.first_name, o.product
FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id;

5. SELF JOIN

A SELF JOIN joins a table to itself. It is used when rows in a table relate to other rows in the same table. The most common example is an employee table where each employee has a manager_id that points to another employee in the same table.

We will use a separate employees table for this example:

CREATE TABLE employees (
    emp_id      INT PRIMARY KEY,
    name        VARCHAR(100),
    manager_id  INT
);

INSERT INTO employees VALUES
(1, 'Alice',  3),
(2, 'Brian',  3),
(3, 'Carol',  4),
(4, 'Diana',  NULL);

Alice and Brian report to Carol (emp_id = 3). Carol reports to Diana (emp_id = 4). Diana is at the top so she has no manager.

SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

employee	manager
Alice	Carol
Brian	Carol
Carol	Diana
Diana	NULL

Notice the aliases e and m. Because we are joining the employees table to itself, aliases are what tell SQL which copy of the table is which. Without them the query would be ambiguous and SQL would not know which name column to use. Diana returns NULL because she has no manager.

6. CROSS JOIN

A CROSS JOIN returns every possible combination of rows from both tables. There is no ON clause because you are not matching rows, you are pairing every row from one table with every row from the other.

Using our existing tables:

SELECT c.first_name, o.product
FROM customers c
CROSS JOIN orders o;

With 5 customers and 5 orders, the result has 5 x 5 = 25 rows. Every customer is paired with every product regardless of whether they ordered it.

first_name	product
Alice	Laptop
Alice	Mouse
Alice	Keyboard
Alice	Monitor
Alice	Headphones
Brian	Laptop
Brian	Mouse
...	...

This is useful for generating all possible combinations such as product variants, scheduling slots, or test data.

⚠️ On large tables, CROSS JOIN grows very fast. Two tables with 1,000 rows each produce 1,000,000 rows.

Join Types Summary

Join	What It Returns
`INNER JOIN`	Only rows with a match in both tables
`LEFT JOIN`	All rows from the left table + matching rows from the right
`RIGHT JOIN`	All rows from the right table + matching rows from the left
`FULL OUTER JOIN`	All rows from both tables
`SELF JOIN`	A table joined to itself
`CROSS JOIN`	Every possible combination of rows from both tables

Part 2 — Window Functions

What Are Window Functions?

Window functions perform calculations across a set of rows without collapsing them. That is what separates them from regular aggregate functions.

With GROUP BY, every group gets summarised into a single row. With window functions, every row stays in the result and the calculation is added as an extra column alongside it.

They operate over a window of rows defined by the OVER() clause.

Syntax

SELECT column_name,
       function() OVER (
           PARTITION BY partition_column
           ORDER BY     order_column
       ) AS output_column
FROM table_name;

Part	What It Does
`function()`	The window function e.g. `RANK()`, `SUM()`, `LAG()`
`OVER()`	Tells SQL this is a window function and defines the window
`PARTITION BY`	Divides rows into independent groups. The function restarts for each group
`ORDER BY`	Defines the order of rows within each partition
`AS output_column`	Names the result column

Important: Window functions run after WHERE, GROUP BY, and HAVING, so you cannot filter on a window function directly in a WHERE clause. Wrap it in a CTE or subquery instead.

For the window function examples we will use this employees table:

CREATE TABLE employees (
    emp_id      INT PRIMARY KEY,
    name        VARCHAR(100),
    department  VARCHAR(50),
    salary      DECIMAL(10,2)
);

INSERT INTO employees VALUES
(1, 'Alice',   'Engineering', 72000),
(2, 'Brian',   'Marketing',   58000),
(3, 'Carol',   'Engineering', 65000),
(4, 'Diana',   'Marketing',   71000),
(5, 'Edward',  'Finance',     91000),
(6, 'Faith',   'Engineering', 72000);

Types of Window Functions

Window functions fall into three categories:

Aggregate: SUM(), AVG(), COUNT(), MIN(), MAX()

Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK()

Value: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

1. ROW_NUMBER()

Assigns a unique sequential number to each row within a partition, ordered by the column you specify.

SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

name	department	salary	row_num
Alice	Engineering	72000.00	1
Faith	Engineering	72000.00	2
Carol	Engineering	65000.00	3
Diana	Marketing	71000.00	1
Brian	Marketing	58000.00	2
Edward	Finance	91000.00	1

The numbering restarts for each department because of PARTITION BY department. Alice and Faith earn the same salary but still get different row numbers because ROW_NUMBER is always unique.

2. RANK()

Assigns ranks within a partition. When rows tie, they receive the same rank and the next rank skips to account for the tie.

SELECT name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;

name	department	salary	rank
Alice	Engineering	72000.00	1
Faith	Engineering	72000.00	1
Carol	Engineering	65000.00	3
Diana	Marketing	71000.00	1
Brian	Marketing	58000.00	2
Edward	Finance	91000.00	1

Alice and Faith both earn 72,000 so they both get rank 1. The next rank jumps to 3 because rank 2 is skipped to account for the tie.

3. DENSE_RANK()

Works like RANK() but without the gaps. Tied rows share the same rank and the next rank continues without skipping.

SELECT name, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

name	department	salary	dense_rank
Alice	Engineering	72000.00	1
Faith	Engineering	72000.00	1
Carol	Engineering	65000.00	2
Diana	Marketing	71000.00	1
Brian	Marketing	58000.00	2
Edward	Finance	91000.00	1

Alice and Faith are still both rank 1 but now Carol is rank 2 not 3. No numbers are skipped.

Comparing ROW_NUMBER, RANK, and DENSE_RANK

Using the Engineering department where Alice and Faith both earn 72,000:

name	salary	ROW_NUMBER	RANK	DENSE_RANK
Alice	72000.00	1	1	1
Faith	72000.00	2	1	1
Carol	65000.00	3	3	2

ROW_NUMBER always unique, no ties recognised
RANK ties get the same rank, next rank skips
DENSE_RANK ties get the same rank, next rank does not skip

4. SUM() — Running Total

When you use SUM() as a window function with ORDER BY, it accumulates a running total as it moves down the rows.

SELECT name, department, salary,
       SUM(salary) OVER (
           PARTITION BY department
           ORDER BY emp_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM employees;

name	department	salary	running_total
Alice	Engineering	72000.00	72000.00
Carol	Engineering	65000.00	137000.00
Faith	Engineering	72000.00	209000.00
Brian	Marketing	58000.00	58000.00
Diana	Marketing	71000.00	129000.00
Edward	Finance	91000.00	91000.00

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells SQL to include every row from the start of the partition up to the current row. The total resets for each department because of PARTITION BY.

5. AVG() — Department Average Per Row

Without ORDER BY, a window aggregate runs across the entire partition at once, giving you the group value on every individual row.

SELECT name, department, salary,
       ROUND(AVG(salary) OVER (PARTITION BY department), 2) AS dept_avg
FROM employees;

name	department	salary	dept_avg
Alice	Engineering	72000.00	69666.67
Carol	Engineering	65000.00	69666.67
Faith	Engineering	72000.00	69666.67
Brian	Marketing	58000.00	64500.00
Diana	Marketing	71000.00	64500.00
Edward	Finance	91000.00	91000.00

Every employee keeps their individual row and the department average sits right next to their salary. This is the query that GROUP BY simply cannot do.

6. LAG() and LEAD()

LAG() looks back at the value of a previous row. LEAD() looks ahead at the value of a future row. Both are used for period-over-period comparisons.

SELECT
    order_date,
    amount,
    LAG(amount, 1)  OVER (ORDER BY order_date) AS prev_order,
    LEAD(amount, 1) OVER (ORDER BY order_date) AS next_order
FROM orders
WHERE customer_id = 1;

order_date	amount	prev_order	next_order
2024-01-10	900.00	NULL	25.00
2024-02-14	25.00	900.00	NULL

The first row returns NULL for LAG because there is no row before it. The last row returns NULL for LEAD because there is no row after it. This is expected behaviour, not an error.

Getting the Top Earner Per Department

This is one of the most common window function patterns. It requires a CTE because you cannot filter on a window function in a WHERE clause directly.

WITH ranked AS (
    SELECT name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;

name	department	salary
Alice	Engineering	72000.00
Diana	Marketing	71000.00
Edward	Finance	91000.00

We use ROW_NUMBER() here instead of RANK() so that ties produce exactly one result per department. The CTE materialises the window function result first and then the outer WHERE filters on it.

Conclusion

Joins connect data that lives in separate tables. The join type you pick decides what happens to rows with no match. INNER JOIN drops them, LEFT JOIN keeps the left side, FULL OUTER JOIN keeps everything.

Window Functions bring group-level calculations down to the row level without losing any individual rows. PARTITION BY creates the groups, ORDER BY inside OVER() controls the order within them, and functions like RANK(), SUM(), and LAG() do the work.

Master these two and most real-world SQL analysis becomes a lot more straightforward.

Have a question about a specific join or function? Drop it in the comments.

DEV Community