The Tables We'll Use
We'll keep the same two tables throughout so examples stay consistent and easy to follow.
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
salary DECIMAL(10,2)
);
INSERT INTO employees VALUES
(1, 'Alice', 1, 72000),
(2, 'Brian', 2, 58000),
(3, 'Carol', 1, 65000),
(4, 'Eve', 3, 91000);
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100),
location VARCHAR(100)
);
INSERT INTO departments VALUES
(1, 'Engineering', 'Nairobi'),
(2, 'Marketing', 'Lagos'),
(3, 'Finance', 'Accra');
Part 1 — SQL Joins
What is a Join?
In relational databases, data is stored in separate tables to keep it organized, reduce duplication, and improve efficiency. For example: employee details might be stored in one table, while department information is stored in another.
When you need information from both tables at the same time, you use a JOIN. A JOIN combines rows from two tables using a related column, typically a primary key in one table and a foreign key in another. This relationship defines how the records are connected.
The ON clause specifies the exact condition used to match rows between the tables. SQL evaluates this condition to determine which records relate to each other. The result is a single, combined dataset that brings together information that was originally stored separately.
For instance, if both employees and departments share a column like dept_id, that column acts as the bridge between them. The JOIN uses it to match each employee to the correct department.
FROM table_a
JOIN table_b ON table_a.shared_column = table_b.shared_column
A Note on Aliases
Before we get into join types, let's talk about aliases — the shorthand names you see on tables (e, d) in most join queries.
-- Without aliases works,but it gets unnecessarily wordy
SELECT employees.name, departments.dept_name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.dept_id;
-- With aliases it is same result, much cleaner
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
You assign an alias by putting a short name right after the table name. From that point on, e means employees and d means departments. Aliases are especially important when:
- Your table names are long
- You're joining multiple tables and need to be clear about which column belongs to which table
- You're writing a self join, where the same table appears twice and needs two different names
You'll see aliases in every example from here on.
INNER JOIN
An INNER JOIN returns only the rows where a match exists in both tables. If a row in one table has no matching row in the other, it is left out entirely.
SELECT e.name, e.salary, d.dept_name, d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
| name | salary | dept_name | location |
|---|---|---|---|
| Alice | 72,000.00 | Engineering | Nairobi |
| Brian | 58,000.00 | Marketing | Lagos |
| Carol | 65,000.00 | Engineering | Nairobi |
| Eve | 91,000.00 | Finance | Accra |
Every employee has a matching department, so every employee appears. If any employee had a dept_id that didn't exist in departments, they would be excluded.
✅ Use INNER JOIN when you only want rows with complete matching data on both sides. It's the strictest join, nothing without a match gets through.
LEFT JOIN
A LEFT JOIN returns all rows from the left table, and brings in matching data from the right table where it exists. Where there's no match, the right-side columns come back as NULL.
The "left" table is simply whichever table you write after FROM.
SELECT e.name, e.salary, d.dept_name, d.location
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
| name | salary | dept_name | location |
|---|---|---|---|
| Alice | 72,000.00 | Engineering | Nairobi |
| Brian | 58,000.00 | Marketing | Lagos |
| Carol | 65,000.00 | Engineering | Nairobi |
| Eve | 91,000.00 | Finance | Accra |
With our clean dataset this looks identical to the INNER JOIN result , that's fine. The real difference shows up the moment an employee has no matching department. They'd still appear, with NULL in the department columns.
✅ Use LEFT JOIN when you need every record from the left table regardless of whether it has a match. This is the most commonly used join in analytics. Classic example: show all customers, including those who haven't placed an order yet.
RIGHT JOIN
A RIGHT JOIN is the mirror of a LEFT JOIN. It keeps all rows from the right table and fills NULL for unmatched left-side columns.
SELECT e.name, d.dept_name, d.location
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
| name | dept_name | location |
|---|---|---|
| Alice | Engineering | Nairobi |
| Carol | Engineering | Nairobi |
| Brian | Marketing | Lagos |
| Eve | Finance | Accra |
Here all departments are protected. If a department had no employees, it would still appear with NULL for the employee name.
💡 In practice, most developers rewrite
RIGHT JOINas aLEFT JOINby swapping the table order — it reads more naturally. These two queries return the same result:
-- As RIGHT JOIN
SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
-- Same result as LEFT JOIN (swap the tables)
SELECT e.name, d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id;
FULL OUTER JOIN
A FULL OUTER JOIN returns every row from both tables. Where there's no match on either side, NULL fills in the gap. Nothing is left out.
SELECT e.name, d.dept_name, d.location
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
| name | dept_name | location |
|---|---|---|
| Alice | Engineering | Nairobi |
| Brian | Marketing | Lagos |
| Carol | Engineering | Nairobi |
| Eve | Finance | Accra |
With our dataset every row matches, so the result looks complete. But when working with real-world data — where mismatches and missing records are common — FULL OUTER JOIN ensures nothing from either side gets silently dropped.
✅ Use FULL OUTER JOIN when reconciling two data sources or auditing records where you cannot afford to lose rows from either side.
⚠️ MySQL doesn't support
FULL OUTER JOINnatively. Use aUNIONinstead:
SELECT e.name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.name, d.dept_name
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
SELF JOIN
A SELF JOIN joins a table to itself. It sounds strange, but it's the natural tool for hierarchical data — the classic example being an employee table where each row has a manager_id that points to another employee in the same table.
ALTER TABLE employees ADD COLUMN manager_id INT;
UPDATE employees SET manager_id = 4 WHERE emp_id IN (1, 2, 3); -- Alice, Brian, Carol report to Eve
-- Join the table to itself to get manager names
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
| employee | manager |
|---|---|
| Alice | Eve |
| Brian | Eve |
| Carol | Eve |
| Eve | NULL |
Notice the aliases here — e for the employee and m for the manager. Without aliases, SQL wouldn't know which copy of the table you mean. Eve has no manager, so she returns NULL — which is why we use LEFT JOIN instead of INNER JOIN.
✅ Use SELF JOIN when your data has a relationship with itself — org charts, categories with subcategories, or any parent-child structure stored in one table.
CROSS JOIN
A CROSS JOIN produces the Cartesian product of two tables — every row from Table A is paired with every row from Table B. There's no ON clause because you're not matching rows, you're combining all of them.
SELECT s.size_name, c.colour_name
FROM sizes s
CROSS JOIN colours c;
If sizes has 3 rows (S, M, L) and colours has 4 rows (Black, White, Red, Navy):
| size_name | colour_name |
|---|---|
| S | Black |
| S | White |
| S | Red |
| S | Navy |
| M | Black |
| M | White |
| ... | ... |
3 × 4 = 12 rows total.
⚠️ On large tables this explodes fast. Two tables with 1,000 rows each produce 1,000,000 rows. Always use
CROSS JOINintentionally.
Join Types at a Glance
| Join | Returns | Drops Unmatched? |
|---|---|---|
INNER JOIN |
Matching rows from both tables | Yes — both sides |
LEFT JOIN |
All left rows + matching right | Right side only |
RIGHT JOIN |
All right rows + matching left | Left side only |
FULL OUTER JOIN |
All rows from both tables | Neither side |
SELF JOIN |
Table joined to itself | Depends on join type used |
CROSS JOIN |
Every possible row combination | N/A |
Part 2 — Window Functions
What is a Window Function?
A window function performs a calculation across a set of rows without collapsing them into one. That's the key difference from GROUP BY.
Say you want each employee's salary alongside the average salary for their department:
-- GROUP BY collapses individual rows — you lose the employees
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id;
| dept_id | dept_avg |
|---|---|
| 1 | 68,500.00 |
| 2 | 58,000.00 |
| 3 | 91,000.00 |
The individual employees are gone. You got the group summary but lost the detail.
With a window function:
SELECT
e.name,
e.salary,
d.dept_name,
AVG(e.salary) OVER (PARTITION BY e.dept_id) AS dept_avg
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
| name | salary | dept_name | dept_avg |
|---|---|---|---|
| Alice | 72,000.00 | Engineering | 68,500.00 |
| Carol | 65,000.00 | Engineering | 68,500.00 |
| Brian | 58,000.00 | Marketing | 58,000.00 |
| Eve | 91,000.00 | Finance | 91,000.00 |
Every employee row stays. The department average is added as an extra column. Alice and Carol share 68,500 — the average of their two salaries.
The rule:
GROUP BYcollapses rows into summaries. Window functions add a calculation to each row without removing any.
The OVER() Clause
Every window function uses an OVER() clause. This is what defines the "window" — the set of rows the function looks at for each calculation.
function_name() OVER (
PARTITION BY column -- split rows into groups
ORDER BY column -- sort rows within each group
)
-
PARTITION BY— divides the data into groups. The function restarts its calculation for each group. Without it, all rows are treated as one single window. -
ORDER BYinsideOVER()— sorts rows within each partition. Required for ranking and running totals. This is separate from theORDER BYat the end of your query.
ROW_NUMBER, RANK, and DENSE_RANK
These three functions assign a position number to each row within a partition. The difference comes down to how they handle ties.
SELECT
e.name,
d.dept_name,
e.salary,
ROW_NUMBER() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS row_num,
RANK() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS dense_rnk
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;
| name | dept_name | salary | row_num | rnk | dense_rnk |
|---|---|---|---|---|---|
| Alice | Engineering | 72,000.00 | 1 | 1 | 1 |
| Carol | Engineering | 65,000.00 | 2 | 2 | 2 |
| Brian | Marketing | 58,000.00 | 1 | 1 | 1 |
| Eve | Finance | 91,000.00 | 1 | 1 | 1 |
The numbering resets for each department because of PARTITION BY dept_id. To see the difference between RANK and DENSE_RANK, you need a tie. Imagine two employees both earning 72,000:
| salary | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 72,000.00 | 1 | 1 | 1 |
| 72,000.00 | 2 | 1 | 1 |
| 65,000.00 | 3 | 3 | 2 |
| 58,000.00 | 4 | 4 | 3 |
-
ROW_NUMBER— always unique. Ties get different numbers. -
RANK— tied rows share a rank. The next rank skips to account for the tie (1, 1, 3). -
DENSE_RANK— tied rows share a rank. No numbers are skipped (1, 1, 2).
Getting the top earner per department:
-- Window functions can't go in WHERE directly.
-- Wrap them in a CTE first, then filter.
WITH ranked AS (
SELECT
e.name,
d.dept_name,
e.salary,
ROW_NUMBER() OVER (PARTITION BY e.dept_id ORDER BY e.salary DESC) AS rn
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
)
SELECT name, dept_name, salary
FROM ranked
WHERE rn = 1;
| name | dept_name | salary |
|---|---|---|
| Alice | Engineering | 72,000.00 |
| Brian | Marketing | 58,000.00 |
| Eve | Finance | 91,000.00 |
LAG and LEAD
LAG looks back at a previous row's value. LEAD looks ahead at a future row's value. They're the standard tool for period-over-period comparisons.
SELECT
sale_month,
revenue,
LAG(revenue, 1) OVER (ORDER BY sale_month) AS prev_month,
LEAD(revenue, 1) OVER (ORDER BY sale_month) AS next_month,
revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS change
FROM monthly_revenue
ORDER BY sale_month;
| sale_month | revenue | prev_month | next_month | change |
|---|---|---|---|---|
| 2024-01 | 50,000 | NULL | 65,000 | NULL |
| 2024-02 | 65,000 | 50,000 | 58,000 | +15,000 |
| 2024-03 | 58,000 | 65,000 | 71,000 | -7,000 |
| 2024-04 | 71,000 | 58,000 | NULL | +13,000 |
LAG(revenue, 1) means: look back 1 row and return its revenue value. The first row returns NULL for LAG because there's no row before it — that's expected, not an error.
✅ Use LAG/LEAD when comparing values across time periods — monthly revenue, daily active users, week-over-week sales.
Running Totals with SUM() OVER
Using SUM() as a window function with ORDER BY gives you a running total — each row accumulates the sum of all previous rows.
SELECT
sale_date,
daily_sales,
SUM(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily_sales
ORDER BY sale_date;
| sale_date | daily_sales | running_total |
|---|---|---|
| 2024-01-01 | 1,000 | 1,000 |
| 2024-01-02 | 1,500 | 2,500 |
| 2024-01-03 | 800 | 3,300 |
| 2024-01-04 | 2,200 | 5,500 |
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells SQL: include every row from the very beginning of the partition up to the current row.
NTILE — Dividing Rows into Buckets
NTILE(n) divides your rows into n equal groups and assigns each row a bucket number. It's the go-to function for creating tiers, quartiles, and percentile segments.
SELECT
e.name,
e.salary,
NTILE(4) OVER (ORDER BY e.salary DESC) AS quartile,
CASE NTILE(4) OVER (ORDER BY e.salary DESC)
WHEN 1 THEN 'Top 25%'
WHEN 2 THEN 'Upper Mid'
WHEN 3 THEN 'Lower Mid'
WHEN 4 THEN 'Bottom 25%'
END AS salary_band
FROM employees e
ORDER BY e.salary DESC;
| name | salary | quartile | salary_band |
|---|---|---|---|
| Eve | 91,000.00 | 1 | Top 25% |
| Alice | 72,000.00 | 2 | Upper Mid |
| Carol | 65,000.00 | 3 | Lower Mid |
| Brian | 58,000.00 | 4 | Bottom 25% |
Window Functions at a Glance
| Function | What It Does |
|---|---|
ROW_NUMBER() |
Unique sequential number per row — no ties |
RANK() |
Rank with gaps after ties |
DENSE_RANK() |
Rank without gaps after ties |
NTILE(n) |
Divide rows into n equal buckets |
LAG(col, n) |
Return value from n rows before current |
LEAD(col, n) |
Return value from n rows after current |
SUM() OVER() |
Running or partitioned total |
AVG() OVER() |
Running or partitioned average |
COUNT() OVER() |
Running or partitioned count |
MIN() OVER() |
Minimum within the window |
MAX() OVER() |
Maximum within the window |
Key Takeaways
Joins:
- Joins combine data from multiple tables using a shared column defined in the
ONclause. -
INNER JOIN— only rows that match on both sides. -
LEFT JOIN— all rows from the left,NULLwhere there's no match on the right. -
FULL OUTER JOIN— all rows from both sides, no exceptions. -
SELF JOIN— a table joined to itself, great for hierarchical data. - Always use aliases to keep join queries clean and unambiguous.
Window Functions:
-
GROUP BYcollapses rows into summaries. Window functions add calculations to each row without removing any. -
PARTITION BYcreates groups but keeps all individual rows intact. -
ORDER BYinsideOVER()is separate from your query-levelORDER BY. - You can't filter window functions in
WHERE— wrap them in a CTE first. -
ROW_NUMBERfor unique ranks,RANKfor gaps on ties,DENSE_RANKfor no gaps.
If a specific part didn't click, drop it in the comments — happy to break it down further.
Top comments (0)