DEV Community

Mariam Turnesh
Mariam Turnesh

Posted on

SQL Joins and Window Functions Explained

The Tables We'll Use

We'll keep the same two tables throughout so examples stay consistent and easy to follow.

CREATE TABLE employees (
    emp_id    INT PRIMARY KEY,
    name      VARCHAR(100),
    dept_id   INT,
    salary    DECIMAL(10,2)
);

INSERT INTO employees VALUES
(1, 'Alice',  1, 72000),
(2, 'Brian',  2, 58000),
(3, 'Carol',  1, 65000),
(4, 'Eve',    3, 91000);

CREATE TABLE departments (
    dept_id    INT PRIMARY KEY,
    dept_name  VARCHAR(100),
    location   VARCHAR(100)
);

INSERT INTO departments VALUES
(1, 'Engineering', 'Nairobi'),
(2, 'Marketing',   'Lagos'),
(3, 'Finance',     'Accra');
Enter fullscreen mode Exit fullscreen mode

Part 1 — SQL Joins

What is a Join?

In relational databases, data is stored in separate tables to keep it organized, reduce duplication, and improve efficiency. For example: employee details might be stored in one table, while department information is stored in another.

When you need information from both tables at the same time, you use a JOIN. A JOIN combines rows from two tables using a related column, typically a primary key in one table and a foreign key in another. This relationship defines how the records are connected.

The ON clause specifies the exact condition used to match rows between the tables. SQL evaluates this condition to determine which records relate to each other. The result is a single, combined dataset that brings together information that was originally stored separately.

For instance, if both employees and departments share a column like dept_id, that column acts as the bridge between them. The JOIN uses it to match each employee to the correct department.

FROM table_a
JOIN table_b ON table_a.shared_column = table_b.shared_column
Enter fullscreen mode Exit fullscreen mode

A Note on Aliases

Before we get into join types, let's talk about aliases — the shorthand names you see on tables (e, d) in most join queries.

-- Without aliases works,but it gets unnecessarily wordy
SELECT employees.name, departments.dept_name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.dept_id;

-- With aliases it is same result, much cleaner
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

You assign an alias by putting a short name right after the table name. From that point on, e means employees and d means departments. Aliases are especially important when:

  • Your table names are long
  • You're joining multiple tables and need to be clear about which column belongs to which table
  • You're writing a self join, where the same table appears twice and needs two different names

You'll see aliases in every example from here on.


INNER JOIN

An INNER JOIN returns only the rows where a match exists in both tables. If a row in one table has no matching row in the other, it is left out entirely.

SELECT e.name, e.salary, d.dept_name, d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name salary dept_name location
Alice 72,000.00 Engineering Nairobi
Brian 58,000.00 Marketing Lagos
Carol 65,000.00 Engineering Nairobi
Eve 91,000.00 Finance Accra

Every employee has a matching department, so every employee appears. If any employee had a dept_id that didn't exist in departments, they would be excluded.

Use INNER JOIN when you only want rows with complete matching data on both sides. It's the strictest join, nothing without a match gets through.


LEFT JOIN

A LEFT JOIN returns all rows from the left table, and brings in matching data from the right table where it exists. Where there's no match, the right-side columns come back as NULL.

The "left" table is simply whichever table you write after FROM.

SELECT e.name, e.salary, d.dept_name, d.location
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name salary dept_name location
Alice 72,000.00 Engineering Nairobi
Brian 58,000.00 Marketing Lagos
Carol 65,000.00 Engineering Nairobi
Eve 91,000.00 Finance Accra

With our clean dataset this looks identical to the INNER JOIN result , that's fine. The real difference shows up the moment an employee has no matching department. They'd still appear, with NULL in the department columns.

Use LEFT JOIN when you need every record from the left table regardless of whether it has a match. This is the most commonly used join in analytics. Classic example: show all customers, including those who haven't placed an order yet.


RIGHT JOIN

A RIGHT JOIN is the mirror of a LEFT JOIN. It keeps all rows from the right table and fills NULL for unmatched left-side columns.

SELECT e.name, d.dept_name, d.location
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name dept_name location
Alice Engineering Nairobi
Carol Engineering Nairobi
Brian Marketing Lagos
Eve Finance Accra

Here all departments are protected. If a department had no employees, it would still appear with NULL for the employee name.

💡 In practice, most developers rewrite RIGHT JOIN as a LEFT JOIN by swapping the table order — it reads more naturally. These two queries return the same result:

-- As RIGHT JOIN
SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

-- Same result as LEFT JOIN (swap the tables)
SELECT e.name, d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id;
Enter fullscreen mode Exit fullscreen mode

FULL OUTER JOIN

A FULL OUTER JOIN returns every row from both tables. Where there's no match on either side, NULL fills in the gap. Nothing is left out.

SELECT e.name, d.dept_name, d.location
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name dept_name location
Alice Engineering Nairobi
Brian Marketing Lagos
Carol Engineering Nairobi
Eve Finance Accra

With our dataset every row matches, so the result looks complete. But when working with real-world data — where mismatches and missing records are common — FULL OUTER JOIN ensures nothing from either side gets silently dropped.

Use FULL OUTER JOIN when reconciling two data sources or auditing records where you cannot afford to lose rows from either side.

⚠️ MySQL doesn't support FULL OUTER JOIN natively. Use a UNION instead:

SELECT e.name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.name, d.dept_name
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

SELF JOIN

A SELF JOIN joins a table to itself. It sounds strange, but it's the natural tool for hierarchical data — the classic example being an employee table where each row has a manager_id that points to another employee in the same table.

ALTER TABLE employees ADD COLUMN manager_id INT;
UPDATE employees SET manager_id = 4 WHERE emp_id IN (1, 2, 3); -- Alice, Brian, Carol report to Eve

-- Join the table to itself to get manager names
SELECT
    e.name          AS employee,
    m.name          AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
Enter fullscreen mode Exit fullscreen mode
employee manager
Alice Eve
Brian Eve
Carol Eve
Eve NULL

Notice the aliases here — e for the employee and m for the manager. Without aliases, SQL wouldn't know which copy of the table you mean. Eve has no manager, so she returns NULL — which is why we use LEFT JOIN instead of INNER JOIN.

Use SELF JOIN when your data has a relationship with itself — org charts, categories with subcategories, or any parent-child structure stored in one table.


CROSS JOIN

A CROSS JOIN produces the Cartesian product of two tables — every row from Table A is paired with every row from Table B. There's no ON clause because you're not matching rows, you're combining all of them.

SELECT s.size_name, c.colour_name
FROM sizes s
CROSS JOIN colours c;
Enter fullscreen mode Exit fullscreen mode

If sizes has 3 rows (S, M, L) and colours has 4 rows (Black, White, Red, Navy):

size_name colour_name
S Black
S White
S Red
S Navy
M Black
M White
... ...

3 × 4 = 12 rows total.

⚠️ On large tables this explodes fast. Two tables with 1,000 rows each produce 1,000,000 rows. Always use CROSS JOIN intentionally.


Join Types at a Glance

Join Returns Drops Unmatched?
INNER JOIN Matching rows from both tables Yes — both sides
LEFT JOIN All left rows + matching right Right side only
RIGHT JOIN All right rows + matching left Left side only
FULL OUTER JOIN All rows from both tables Neither side
SELF JOIN Table joined to itself Depends on join type used
CROSS JOIN Every possible row combination N/A

Part 2 — Window Functions

What is a Window Function?

A window function performs a calculation across a set of rows without collapsing them into one. That's the key difference from GROUP BY.

Say you want each employee's salary alongside the average salary for their department:

-- GROUP BY collapses individual rows — you lose the employees
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id;
Enter fullscreen mode Exit fullscreen mode
dept_id dept_avg
1 68,500.00
2 58,000.00
3 91,000.00

The individual employees are gone. You got the group summary but lost the detail.

With a window function:

SELECT
    e.name,
    e.salary,
    d.dept_name,
    AVG(e.salary) OVER (PARTITION BY e.dept_id) AS dept_avg
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name salary dept_name dept_avg
Alice 72,000.00 Engineering 68,500.00
Carol 65,000.00 Engineering 68,500.00
Brian 58,000.00 Marketing 58,000.00
Eve 91,000.00 Finance 91,000.00

Every employee row stays. The department average is added as an extra column. Alice and Carol share 68,500 — the average of their two salaries.

The rule: GROUP BY collapses rows into summaries. Window functions add a calculation to each row without removing any.


The OVER() Clause

Every window function uses an OVER() clause. This is what defines the "window" — the set of rows the function looks at for each calculation.

function_name() OVER (
    PARTITION BY  column   -- split rows into groups
    ORDER BY      column   -- sort rows within each group
)
Enter fullscreen mode Exit fullscreen mode
  • PARTITION BY — divides the data into groups. The function restarts its calculation for each group. Without it, all rows are treated as one single window.
  • ORDER BY inside OVER() — sorts rows within each partition. Required for ranking and running totals. This is separate from the ORDER BY at the end of your query.

ROW_NUMBER, RANK, and DENSE_RANK

These three functions assign a position number to each row within a partition. The difference comes down to how they handle ties.

SELECT
    e.name,
    d.dept_name,
    e.salary,
    ROW_NUMBER() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS rnk,
    DENSE_RANK() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS dense_rnk
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode
name dept_name salary row_num rnk dense_rnk
Alice Engineering 72,000.00 1 1 1
Carol Engineering 65,000.00 2 2 2
Brian Marketing 58,000.00 1 1 1
Eve Finance 91,000.00 1 1 1

The numbering resets for each department because of PARTITION BY dept_id. To see the difference between RANK and DENSE_RANK, you need a tie. Imagine two employees both earning 72,000:

salary ROW_NUMBER RANK DENSE_RANK
72,000.00 1 1 1
72,000.00 2 1 1
65,000.00 3 3 2
58,000.00 4 4 3
  • ROW_NUMBER — always unique. Ties get different numbers.
  • RANK — tied rows share a rank. The next rank skips to account for the tie (1, 1, 3).
  • DENSE_RANK — tied rows share a rank. No numbers are skipped (1, 1, 2).

Getting the top earner per department:

-- Window functions can't go in WHERE directly.
-- Wrap them in a CTE first, then filter.
WITH ranked AS (
    SELECT
        e.name,
        d.dept_name,
        e.salary,
        ROW_NUMBER() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS rn
    FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
)
SELECT name, dept_name, salary
FROM ranked
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode
name dept_name salary
Alice Engineering 72,000.00
Brian Marketing 58,000.00
Eve Finance 91,000.00

LAG and LEAD

LAG looks back at a previous row's value. LEAD looks ahead at a future row's value. They're the standard tool for period-over-period comparisons.

SELECT
    sale_month,
    revenue,
    LAG(revenue, 1)  OVER (ORDER BY sale_month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY sale_month) AS next_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS change
FROM monthly_revenue
ORDER BY sale_month;
Enter fullscreen mode Exit fullscreen mode
sale_month revenue prev_month next_month change
2024-01 50,000 NULL 65,000 NULL
2024-02 65,000 50,000 58,000 +15,000
2024-03 58,000 65,000 71,000 -7,000
2024-04 71,000 58,000 NULL +13,000

LAG(revenue, 1) means: look back 1 row and return its revenue value. The first row returns NULL for LAG because there's no row before it — that's expected, not an error.

Use LAG/LEAD when comparing values across time periods — monthly revenue, daily active users, week-over-week sales.


Running Totals with SUM() OVER

Using SUM() as a window function with ORDER BY gives you a running total — each row accumulates the sum of all previous rows.

SELECT
    sale_date,
    daily_sales,
    SUM(daily_sales) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM daily_sales
ORDER BY sale_date;
Enter fullscreen mode Exit fullscreen mode
sale_date daily_sales running_total
2024-01-01 1,000 1,000
2024-01-02 1,500 2,500
2024-01-03 800 3,300
2024-01-04 2,200 5,500

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells SQL: include every row from the very beginning of the partition up to the current row.


NTILE — Dividing Rows into Buckets

NTILE(n) divides your rows into n equal groups and assigns each row a bucket number. It's the go-to function for creating tiers, quartiles, and percentile segments.

SELECT
    e.name,
    e.salary,
    NTILE(4) OVER (ORDER BY e.salary DESC) AS quartile,
    CASE NTILE(4) OVER (ORDER BY e.salary DESC)
        WHEN 1 THEN 'Top 25%'
        WHEN 2 THEN 'Upper Mid'
        WHEN 3 THEN 'Lower Mid'
        WHEN 4 THEN 'Bottom 25%'
    END AS salary_band
FROM employees e
ORDER BY e.salary DESC;
Enter fullscreen mode Exit fullscreen mode
name salary quartile salary_band
Eve 91,000.00 1 Top 25%
Alice 72,000.00 2 Upper Mid
Carol 65,000.00 3 Lower Mid
Brian 58,000.00 4 Bottom 25%

Window Functions at a Glance

Function What It Does
ROW_NUMBER() Unique sequential number per row — no ties
RANK() Rank with gaps after ties
DENSE_RANK() Rank without gaps after ties
NTILE(n) Divide rows into n equal buckets
LAG(col, n) Return value from n rows before current
LEAD(col, n) Return value from n rows after current
SUM() OVER() Running or partitioned total
AVG() OVER() Running or partitioned average
COUNT() OVER() Running or partitioned count
MIN() OVER() Minimum within the window
MAX() OVER() Maximum within the window

Key Takeaways

Joins:

  • Joins combine data from multiple tables using a shared column defined in the ON clause.
  • INNER JOIN — only rows that match on both sides.
  • LEFT JOIN — all rows from the left, NULL where there's no match on the right.
  • FULL OUTER JOIN — all rows from both sides, no exceptions.
  • SELF JOIN — a table joined to itself, great for hierarchical data.
  • Always use aliases to keep join queries clean and unambiguous.

Window Functions:

  • GROUP BY collapses rows into summaries. Window functions add calculations to each row without removing any.
  • PARTITION BY creates groups but keeps all individual rows intact.
  • ORDER BY inside OVER() is separate from your query-level ORDER BY.
  • You can't filter window functions in WHERE — wrap them in a CTE first.
  • ROW_NUMBER for unique ranks, RANK for gaps on ties, DENSE_RANK for no gaps.

If a specific part didn't click, drop it in the comments — happy to break it down further.

Top comments (0)