DEV Community

Cover image for SQL Joins and Window Functions
Mariam Turnesh
Mariam Turnesh

Posted on

SQL Joins and Window Functions

Introduction

In relational databases, data is stored in separate tables to keep it organized, reduce duplication, and improve efficiency. For example, employee details might be stored in one table while department information is stored in another.

When you need information from both tables at the same time, you use a JOIN. A JOIN combines rows from two tables using a related column, typically a primary key in one table and a foreign key in another. This relationship defines how the records are connected.

The ON clause specifies the exact condition used to match rows between the tables. SQL evaluates this condition to determine which records relate to each other. The result is a single, combined dataset that brings together information that was originally stored separately.

This guide walks through both Joins and Window Functions clearly, with examples you can follow from start to finish.


Dataset

We will use two tables throughout this entire article so the examples stay consistent and easy to follow.

CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    first_name    VARCHAR(50),
    last_name     VARCHAR(50),
    country       VARCHAR(50)
);

INSERT INTO customers VALUES
(1, 'Alice',   'Mensah',   'Kenya'),
(2, 'Brian',   'Okonkwo',  'Nigeria'),
(3, 'Carol',   'Wanjiku',  'Kenya'),
(4, 'Diana',   'Asante',   'Ghana'),
(5, 'Edward',  'Kimani',   'Kenya');

CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    product      VARCHAR(100),
    amount       DECIMAL(10,2),
    order_date   DATE
);

INSERT INTO orders VALUES
(101, 1, 'Laptop',     900.00, '2024-01-10'),
(102, 1, 'Mouse',       25.00, '2024-02-14'),
(103, 3, 'Keyboard',    75.00, '2024-03-01'),
(104, 2, 'Monitor',    300.00, '2024-03-15'),
(105, 6, 'Headphones', 120.00, '2024-04-02');
Enter fullscreen mode Exit fullscreen mode

Note: customer_id = 6 in the orders table has no matching customer. Diana (customer_id = 4) and Edward (customer_id = 5) have never placed an order. These details will matter as we go through each join type.


A Quick Note on Aliases

You will notice queries using short names like c and o instead of the full table name. These are called aliases.

-- Without aliases
SELECT customers.first_name, orders.product
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id;

-- With aliases — same result, much cleaner
SELECT c.first_name, o.product
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode

You assign an alias by writing a short name right after the table name in the FROM or JOIN clause. Aliases are especially useful when table names are long or when the same table appears more than once in a query. Every example from here on uses them.


Part 1 — SQL Joins

1. INNER JOIN

Returns only the rows where a match exists in both tables. Rows with no match on either side are excluded.

SELECT c.first_name, c.last_name, o.product, o.amount, o.order_date
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode
first_name last_name product amount order_date
Alice Mensah Laptop 900.00 2024-01-10
Alice Mensah Mouse 25.00 2024-02-14
Brian Okonkwo Monitor 300.00 2024-03-15
Carol Wanjiku Keyboard 75.00 2024-03-01

Alice, Brian and Carol matched so they appear. Diana and Edward have no orders so they are excluded. Order 105 (customer_id = 6) has no matching customer so it is also excluded.


2. LEFT JOIN

Returns all rows from the left table and matching rows from the right. Where there is no match, the right-side columns return NULL.

The "left" table is whichever comes after FROM.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode
first_name last_name product amount
Alice Mensah Laptop 900.00
Alice Mensah Mouse 25.00
Brian Okonkwo Monitor 300.00
Carol Wanjiku Keyboard 75.00
Diana Asante NULL NULL
Edward Kimani NULL NULL

All five customers appear. Diana and Edward have no orders so their product and amount columns are NULL. Order 105 is still excluded because it has no matching customer in the left table.

The LEFT JOIN is also known as the LEFT OUTER JOIN.


3. RIGHT JOIN

Returns all rows from the right table and matching rows from the left. Where there is no match, the left-side columns return NULL.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode
first_name last_name product amount
Alice Mensah Laptop 900.00
Alice Mensah Mouse 25.00
Brian Okonkwo Monitor 300.00
Carol Wanjiku Keyboard 75.00
NULL NULL Headphones 120.00

All five orders appear. Order 105 (Headphones) has no matching customer so first_name and last_name are NULL. Diana and Edward are gone because they live in the left table and have no matching order.

The RIGHT JOIN is also known as the RIGHT OUTER JOIN.

💡 Most developers rewrite RIGHT JOIN as a LEFT JOIN by swapping the table order as it reads more naturally. These two queries return identical results:

-- As RIGHT JOIN
SELECT c.first_name, o.product
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

-- Same result as LEFT JOIN (tables swapped)
SELECT c.first_name, o.product
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id;

4. FULL OUTER JOIN

Returns all rows from both tables. Where there is no match on either side, NULL fills in the gap. No row from either table is left out.

SELECT c.first_name, c.last_name, o.product, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode
first_name last_name product amount
Alice Mensah Laptop 900.00
Alice Mensah Mouse 25.00
Brian Okonkwo Monitor 300.00
Carol Wanjiku Keyboard 75.00
Diana Asante NULL NULL
Edward Kimani NULL NULL
NULL NULL Headphones 120.00

Every customer and every order appears, matched or not.

⚠️ MySQL does not support FULL OUTER JOIN. Use a UNION to emulate it:

SELECT c.first_name, o.product
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION
SELECT c.first_name, o.product
FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id;

5. SELF JOIN

A SELF JOIN joins a table to itself. It is used when rows in a table relate to other rows in the same table. The most common example is an employee table where each employee has a manager_id that points to another employee in the same table.

We will use a separate employees table for this example:

CREATE TABLE employees (
    emp_id      INT PRIMARY KEY,
    name        VARCHAR(100),
    manager_id  INT
);

INSERT INTO employees VALUES
(1, 'Alice',  3),
(2, 'Brian',  3),
(3, 'Carol',  4),
(4, 'Diana',  NULL);
Enter fullscreen mode Exit fullscreen mode

Alice and Brian report to Carol (emp_id = 3). Carol reports to Diana (emp_id = 4). Diana is at the top so she has no manager.

SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
Enter fullscreen mode Exit fullscreen mode
employee manager
Alice Carol
Brian Carol
Carol Diana
Diana NULL

Notice the aliases e and m. Because we are joining the employees table to itself, aliases are what tell SQL which copy of the table is which. Without them the query would be ambiguous and SQL would not know which name column to use. Diana returns NULL because she has no manager.


6. CROSS JOIN

A CROSS JOIN returns every possible combination of rows from both tables. There is no ON clause because you are not matching rows, you are pairing every row from one table with every row from the other.

Using our existing tables:

SELECT c.first_name, o.product
FROM customers c
CROSS JOIN orders o;
Enter fullscreen mode Exit fullscreen mode

With 5 customers and 5 orders, the result has 5 x 5 = 25 rows. Every customer is paired with every product regardless of whether they ordered it.

first_name product
Alice Laptop
Alice Mouse
Alice Keyboard
Alice Monitor
Alice Headphones
Brian Laptop
Brian Mouse
... ...

This is useful for generating all possible combinations such as product variants, scheduling slots, or test data.

⚠️ On large tables, CROSS JOIN grows very fast. Two tables with 1,000 rows each produce 1,000,000 rows.


Join Types Summary

Join What It Returns
INNER JOIN Only rows with a match in both tables
LEFT JOIN All rows from the left table + matching rows from the right
RIGHT JOIN All rows from the right table + matching rows from the left
FULL OUTER JOIN All rows from both tables
SELF JOIN A table joined to itself
CROSS JOIN Every possible combination of rows from both tables

Part 2 — Window Functions

What Are Window Functions?

Window functions perform calculations across a set of rows without collapsing them. That is what separates them from regular aggregate functions.

With GROUP BY, every group gets summarised into a single row. With window functions, every row stays in the result and the calculation is added as an extra column alongside it.

They operate over a window of rows defined by the OVER() clause.


Syntax

SELECT column_name,
       function() OVER (
           PARTITION BY partition_column
           ORDER BY     order_column
       ) AS output_column
FROM table_name;
Enter fullscreen mode Exit fullscreen mode
Part What It Does
function() The window function e.g. RANK(), SUM(), LAG()
OVER() Tells SQL this is a window function and defines the window
PARTITION BY Divides rows into independent groups. The function restarts for each group
ORDER BY Defines the order of rows within each partition
AS output_column Names the result column

Important: Window functions run after WHERE, GROUP BY, and HAVING, so you cannot filter on a window function directly in a WHERE clause. Wrap it in a CTE or subquery instead.


For the window function examples we will use this employees table:

CREATE TABLE employees (
    emp_id      INT PRIMARY KEY,
    name        VARCHAR(100),
    department  VARCHAR(50),
    salary      DECIMAL(10,2)
);

INSERT INTO employees VALUES
(1, 'Alice',   'Engineering', 72000),
(2, 'Brian',   'Marketing',   58000),
(3, 'Carol',   'Engineering', 65000),
(4, 'Diana',   'Marketing',   71000),
(5, 'Edward',  'Finance',     91000),
(6, 'Faith',   'Engineering', 72000);
Enter fullscreen mode Exit fullscreen mode

Types of Window Functions

Window functions fall into three categories:

Aggregate: SUM(), AVG(), COUNT(), MIN(), MAX()

Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK()

Value: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()


1. ROW_NUMBER()

Assigns a unique sequential number to each row within a partition, ordered by the column you specify.

SELECT name, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
Enter fullscreen mode Exit fullscreen mode
name department salary row_num
Alice Engineering 72000.00 1
Faith Engineering 72000.00 2
Carol Engineering 65000.00 3
Diana Marketing 71000.00 1
Brian Marketing 58000.00 2
Edward Finance 91000.00 1

The numbering restarts for each department because of PARTITION BY department. Alice and Faith earn the same salary but still get different row numbers because ROW_NUMBER is always unique.


2. RANK()

Assigns ranks within a partition. When rows tie, they receive the same rank and the next rank skips to account for the tie.

SELECT name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;
Enter fullscreen mode Exit fullscreen mode
name department salary rank
Alice Engineering 72000.00 1
Faith Engineering 72000.00 1
Carol Engineering 65000.00 3
Diana Marketing 71000.00 1
Brian Marketing 58000.00 2
Edward Finance 91000.00 1

Alice and Faith both earn 72,000 so they both get rank 1. The next rank jumps to 3 because rank 2 is skipped to account for the tie.


3. DENSE_RANK()

Works like RANK() but without the gaps. Tied rows share the same rank and the next rank continues without skipping.

SELECT name, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;
Enter fullscreen mode Exit fullscreen mode
name department salary dense_rank
Alice Engineering 72000.00 1
Faith Engineering 72000.00 1
Carol Engineering 65000.00 2
Diana Marketing 71000.00 1
Brian Marketing 58000.00 2
Edward Finance 91000.00 1

Alice and Faith are still both rank 1 but now Carol is rank 2 not 3. No numbers are skipped.


Comparing ROW_NUMBER, RANK, and DENSE_RANK

Using the Engineering department where Alice and Faith both earn 72,000:

name salary ROW_NUMBER RANK DENSE_RANK
Alice 72000.00 1 1 1
Faith 72000.00 2 1 1
Carol 65000.00 3 3 2
  • ROW_NUMBER always unique, no ties recognised
  • RANK ties get the same rank, next rank skips
  • DENSE_RANK ties get the same rank, next rank does not skip

4. SUM() — Running Total

When you use SUM() as a window function with ORDER BY, it accumulates a running total as it moves down the rows.

SELECT name, department, salary,
       SUM(salary) OVER (
           PARTITION BY department
           ORDER BY emp_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM employees;
Enter fullscreen mode Exit fullscreen mode
name department salary running_total
Alice Engineering 72000.00 72000.00
Carol Engineering 65000.00 137000.00
Faith Engineering 72000.00 209000.00
Brian Marketing 58000.00 58000.00
Diana Marketing 71000.00 129000.00
Edward Finance 91000.00 91000.00

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells SQL to include every row from the start of the partition up to the current row. The total resets for each department because of PARTITION BY.


5. AVG() — Department Average Per Row

Without ORDER BY, a window aggregate runs across the entire partition at once, giving you the group value on every individual row.

SELECT name, department, salary,
       ROUND(AVG(salary) OVER (PARTITION BY department), 2) AS dept_avg
FROM employees;
Enter fullscreen mode Exit fullscreen mode
name department salary dept_avg
Alice Engineering 72000.00 69666.67
Carol Engineering 65000.00 69666.67
Faith Engineering 72000.00 69666.67
Brian Marketing 58000.00 64500.00
Diana Marketing 71000.00 64500.00
Edward Finance 91000.00 91000.00

Every employee keeps their individual row and the department average sits right next to their salary. This is the query that GROUP BY simply cannot do.


6. LAG() and LEAD()

LAG() looks back at the value of a previous row. LEAD() looks ahead at the value of a future row. Both are used for period-over-period comparisons.

SELECT
    order_date,
    amount,
    LAG(amount, 1)  OVER (ORDER BY order_date) AS prev_order,
    LEAD(amount, 1) OVER (ORDER BY order_date) AS next_order
FROM orders
WHERE customer_id = 1;
Enter fullscreen mode Exit fullscreen mode
order_date amount prev_order next_order
2024-01-10 900.00 NULL 25.00
2024-02-14 25.00 900.00 NULL

The first row returns NULL for LAG because there is no row before it. The last row returns NULL for LEAD because there is no row after it. This is expected behaviour, not an error.


Getting the Top Earner Per Department

This is one of the most common window function patterns. It requires a CTE because you cannot filter on a window function in a WHERE clause directly.

WITH ranked AS (
    SELECT name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode
name department salary
Alice Engineering 72000.00
Diana Marketing 71000.00
Edward Finance 91000.00

We use ROW_NUMBER() here instead of RANK() so that ties produce exactly one result per department. The CTE materialises the window function result first and then the outer WHERE filters on it.


Conclusion

Joins connect data that lives in separate tables. The join type you pick decides what happens to rows with no match. INNER JOIN drops them, LEFT JOIN keeps the left side, FULL OUTER JOIN keeps everything.

Window Functions bring group-level calculations down to the row level without losing any individual rows. PARTITION BY creates the groups, ORDER BY inside OVER() controls the order within them, and functions like RANK(), SUM(), and LAG() do the work.

Master these two and most real-world SQL analysis becomes a lot more straightforward.


Have a question about a specific join or function? Drop it in the comments.

Top comments (0)