🗃️ The Dataset We'll Use Throughout
Let's set up two tables and keep them consistent across every example. This way, when something changes between queries, you know exactly why.
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
full_name VARCHAR(100),
dept_id INT,
salary DECIMAL(10,2),
hire_date DATE
);
INSERT INTO employees VALUES
(1, 'Alice Mensah', 1, 72000, '2020-03-15'),
(2, 'Brian Okonkwo', 2, 58000, '2021-07-01'),
(3, 'Carol Wanjiku', 1, 65000, '2019-11-20'),
(4, 'David Kimani', NULL, 49000, '2023-01-10'), -- no department assigned
(5, 'Evelyn Njoroge', 3, 91000, '2018-06-05');
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100),
location VARCHAR(100)
);
INSERT INTO departments VALUES
(1, 'Engineering', 'Nairobi'),
(2, 'Marketing', 'Lagos'),
(3, 'Finance', 'Accra'),
(4, 'Human Resources', 'Nairobi'); -- no employees assigned yet
Two things to notice before we write a single join:
-
David (emp_id 4) has
dept_id = NULL— he exists but belongs to no department. - Department 4 (Human Resources) has no employees assigned to it yet.
These two edge cases are what separate the join types from each other. Every join answers the same question differently: what do you do with rows that have no match?
Part 1 — SQL Joins
What Is a Join?
In relational databases, data is organized into multiple tables to keep it structured and efficient. SQL Joins allow you to combine related information from these tables into one unified result. That is what a join does — it combines rows from two tables based on a matching column (usually a primary key and a foreign key).
The ON clause is where you define the match condition. SQL evaluates it row by row and decides what to include in the result based on which join type you used.
-- General shape of a join
SELECT columns
FROM table_a
[JOIN TYPE] table_b ON table_a.column = table_b.column;
INNER JOIN — Only the Matches
An INNER JOIN returns rows only when a match exists in both tables. Rows that have no match on either side are dropped.
SELECT
e.full_name,
e.salary,
d.dept_name,
d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Result:
| full_name | salary | dept_name | location |
|---|---|---|---|
| Alice Mensah | 72,000.00 | Engineering | Nairobi |
| Brian Okonkwo | 58,000.00 | Marketing | Lagos |
| Carol Wanjiku | 65,000.00 | Engineering | Nairobi |
| Evelyn Njoroge | 91,000.00 | Finance | Accra |
What happened to David? His dept_id is NULL. There is no row in departments where dept_id = NULL, so he has no match — INNER JOIN drops him entirely.
What happened to Human Resources? No employee has dept_id = 4, so it also has no match — also dropped.
✅ Use INNER JOIN when you only want records with complete, matching data on both sides. It is the strictest join — nothing without a partner gets through.
LEFT JOIN — Keep Everything on the Left
A LEFT JOIN returns all rows from the left table, plus any matching rows from the right. Where there is no match, the right-side columns are filled with NULL.
SELECT
e.full_name,
e.salary,
d.dept_name,
d.location
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
Result:
| full_name | salary | dept_name | location |
|---|---|---|---|
| Alice Mensah | 72,000.00 | Engineering | Nairobi |
| Brian Okonkwo | 58,000.00 | Marketing | Lagos |
| Carol Wanjiku | 65,000.00 | Engineering | Nairobi |
| David Kimani | 49,000.00 | NULL | NULL |
| Evelyn Njoroge | 91,000.00 | Finance | Accra |
David is back — with NULL where his department would be. Human Resources is still absent because it lives in the right table and LEFT JOIN does not protect it.
✅ Use LEFT JOIN when you need all records from one table regardless of whether they have a match. Classic use case: show all customers and their orders, including customers who haven't ordered yet.
Pro tip — the anti-join pattern:
A LEFT JOIN + WHERE IS NULL finds rows in the left table that have no match in the right table at all:
-- Find employees with no department assigned
SELECT e.full_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
-- Returns: David Kimani
RIGHT JOIN — Keep Everything on the Right
A RIGHT JOIN keeps all rows from the right table, and fills NULL for unmatched left-side columns.
SELECT
e.full_name,
d.dept_name,
d.location
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Result:
| full_name | dept_name | location |
|---|---|---|
| Alice Mensah | Engineering | Nairobi |
| Carol Wanjiku | Engineering | Nairobi |
| Brian Okonkwo | Marketing | Lagos |
| Evelyn Njoroge | Finance | Accra |
| NULL | Human Resources | Nairobi |
Human Resources now appears with NULL for full_name. David is gone — he is in the left table and has no right-table match.
💡 Most developers rewrite
RIGHT JOINas aLEFT JOINby swapping the table order. The two queries below produce identical results:
-- As RIGHT JOIN:
SELECT e.full_name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
-- Equivalent as LEFT JOIN (usually preferred for readability):
SELECT e.full_name, d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id;
FULL OUTER JOIN — Keep Everything from Both Sides
A FULL OUTER JOIN returns every row from both tables, using NULL to fill gaps where a match does not exist.
SELECT
e.full_name,
d.dept_name,
d.location
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
Result:
| full_name | dept_name | location |
|---|---|---|
| Alice Mensah | Engineering | Nairobi |
| Brian Okonkwo | Marketing | Lagos |
| Carol Wanjiku | Engineering | Nairobi |
| David Kimani | NULL | NULL |
| Evelyn Njoroge | Finance | Accra |
| NULL | Human Resources | Nairobi |
Both David and Human Resources appear — no row is lost from either table.
✅ Use FULL OUTER JOIN when reconciling two data sources, auditing for orphaned records, or merging datasets where you cannot afford to lose rows from either side.
⚠️ MySQL does not support
FULL OUTER JOINnatively. Emulate it with aUNION:
SELECT e.full_name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.full_name, d.dept_name
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
SELF JOIN — A Table That References Itself
A SELF JOIN joins a table to itself. It is the natural solution for hierarchical data — like an employee table where each row stores a manager_id pointing to another row in the same table.
-- Each employee reports to a manager who is also in the employees table
ALTER TABLE employees ADD COLUMN manager_id INT;
UPDATE employees SET manager_id = 3 WHERE emp_id IN (1, 2); -- Alice & Brian → Carol
UPDATE employees SET manager_id = 5 WHERE emp_id IN (3, 4); -- Carol & David → Evelyn
-- Find each employee and their manager's name
SELECT
e.full_name AS employee,
m.full_name AS reports_to
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
Result:
| employee | reports_to |
|---|---|
| Alice Mensah | Carol Wanjiku |
| Brian Okonkwo | Carol Wanjiku |
| Carol Wanjiku | Evelyn Njoroge |
| David Kimani | Evelyn Njoroge |
| Evelyn Njoroge | NULL |
We use aliases (e for the employee copy, m for the manager copy) to tell SQL which version of the table we mean. We use LEFT JOIN so Evelyn — who has no manager — still appears.
CROSS JOIN — Every Possible Combination
A CROSS JOIN produces the Cartesian product of two tables. There is no ON clause because you are not matching rows — you are pairing every row from Table A with every row from Table B.
SELECT
s.size_name,
c.colour_name,
s.size_name || '-' || c.colour_name AS sku
FROM sizes s -- 3 rows: S, M, L
CROSS JOIN colours c; -- 4 rows: Black, White, Red, Navy
-- Result: 3 × 4 = 12 rows
⚠️ Be careful with scale. A
CROSS JOINon two tables with 10,000 rows each produces 100,000,000 rows. Always use it intentionally.
Multi-Table Joins
You can chain as many joins as you need in a single query. SQL builds an intermediate result after each one and feeds it into the next.
-- Full order breakdown: customer → order → product → category
SELECT
c.full_name,
o.order_id,
o.order_date,
p.product_name,
cat.category_name,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) AS line_total
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
INNER JOIN categories cat ON p.category_id = cat.category_id
WHERE o.order_date >= '2024-01-01'
ORDER BY line_total DESC;
Join Type Reference
| Join Type | Returns | Drops Unmatched Rows? | Best Use Case |
|---|---|---|---|
INNER JOIN |
Matching rows from both tables | Both sides dropped | Orders with a valid customer |
LEFT JOIN |
All left + matching right | Right side only | All customers, orders or not |
RIGHT JOIN |
All right + matching left | Left side only | All departments, staffed or not |
FULL OUTER JOIN |
Everything from both tables | Neither side dropped | Reconciling two data sources |
SELF JOIN |
Same table joined to itself | Depends on join type | Employee–manager hierarchy |
CROSS JOIN |
Every row × every row | N/A | Generate all product variants |
Common Join Mistakes
1. Missing ON clause → Cartesian product
-- ❌ This silently returns employees × departments (20 rows of nonsense)
SELECT * FROM employees, departments;
-- ✅ Always specify the match condition
SELECT * FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
2. Ambiguous column names
-- ❌ SQL can't tell which table's dept_id you want
SELECT dept_id FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- ✅ Always qualify with a table alias
SELECT e.dept_id FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
3. Using INNER JOIN when you should use LEFT JOIN
If a stakeholder asks "why is David missing from this report?" — that is almost always a sign you used INNER JOIN where you needed LEFT JOIN. Any time unmatched rows should still appear in output, reach for LEFT JOIN.
Part 2 — Window Functions
The Problem They Exist to Solve
Here is a question that looks simple but trips up most beginners:
"Show me each employee's name, salary, and the average salary for their department — all on the same row."
Your first instinct might be GROUP BY:
-- ❌ GROUP BY collapses all employees into one row per department
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id;
-- Returns 3 rows (one per dept) — individual employees are gone
Window functions solve this. They perform calculations across a group of rows without collapsing those rows. Every employee row stays, and the calculation is added as an extra column.
-- ✅ Window function: every row stays, department average added
SELECT
full_name,
dept_id,
salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg_salary
FROM employees;
| full_name | dept_id | salary | dept_avg_salary |
|---|---|---|---|
| Alice Mensah | 1 | 72,000.00 | 68,500.00 |
| Carol Wanjiku | 1 | 65,000.00 | 68,500.00 |
| Brian Okonkwo | 2 | 58,000.00 | 58,000.00 |
| Evelyn Njoroge | 3 | 91,000.00 | 91,000.00 |
| David Kimani | NULL | 49,000.00 | 49,000.00 |
Alice and Carol share a department average of 68,500 — the average of their two salaries. Every individual row is preserved.
Anatomy of the OVER() Clause
Every window function is paired with an OVER() clause that has three optional components:
function_name(column) OVER (
PARTITION BY grouping_column -- divide rows into independent sub-groups
ORDER BY sort_column -- define row order within each group
ROWS BETWEEN frame_start -- define how far back/forward to look
AND frame_end
)
| Component | What It Does | Required? |
|---|---|---|
PARTITION BY |
Divides rows into groups; function restarts per group | No |
ORDER BY |
Sorts rows within each partition | Only for rankings & running totals |
| Frame clause | Controls which rows are included in the calculation | No (has default) |
ROW_NUMBER, RANK, and DENSE_RANK
These three functions assign a position number to each row within a partition. They differ only in how they handle ties.
SELECT
full_name,
dept_id,
salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dense_rnk
FROM employees;
To see the difference clearly, picture four employees with salaries: 80,000 / 72,000 / 72,000 / 58,000:
ROW_NUMBER: 1, 2, 3, 4 ← always unique, ties get different numbers
RANK: 1, 2, 2, 4 ← tied rows share a rank; next rank is skipped (no 3)
DENSE_RANK: 1, 2, 2, 3 ← tied rows share a rank; no numbers are skipped
| Function | Tied Rows Get | Next Rank After Tie | Best Used For |
|---|---|---|---|
ROW_NUMBER() |
Different numbers (arbitrary) | Continues normally | Deduplication, pagination |
RANK() |
Same rank | Skips to account for ties | Sports leaderboards |
DENSE_RANK() |
Same rank | Continues without skipping | Clean rankings, interview questions |
The Top-N per group pattern — by far the most common window function use case:
-- Top 2 earners per department
-- Window functions can't go in WHERE directly, so we wrap in a CTE
WITH ranked_employees AS (
SELECT
full_name,
dept_id,
salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees
)
SELECT full_name, dept_id, salary
FROM ranked_employees
WHERE rn <= 2;
⚠️ Why can't window functions go in WHERE? SQL's execution order is:
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Window functions run duringSELECT— afterWHEREhas already filtered rows. To filter on a window function result, always wrap it in a CTE or subquery first.
LAG and LEAD — Looking Backwards and Forwards
LAG(column, n) returns the value from n rows before the current row.
LEAD(column, n) returns the value from n rows after the current row.
Both accept an optional third argument as a default when no prior/next row exists.
Before window functions, period-over-period comparisons required a self-join:
-- ❌ Old way: self-join to get last month's revenue
SELECT a.sale_month, a.revenue, b.revenue AS prev_month_revenue
FROM monthly_revenue a
LEFT JOIN monthly_revenue b
ON b.sale_month = DATEADD(month, -1, a.sale_month);
-- ✅ New way: LAG does it in one line
SELECT sale_month, revenue,
LAG(revenue, 1, 0) OVER (ORDER BY sale_month) AS prev_month_revenue
FROM monthly_revenue;
Full month-over-month analysis:
SELECT
sale_month,
revenue,
LAG(revenue, 1) OVER (ORDER BY sale_month) AS prev_month,
LEAD(revenue, 1) OVER (ORDER BY sale_month) AS next_month,
revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS mom_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY sale_month))
/ NULLIF(LAG(revenue, 1) OVER (ORDER BY sale_month), 0),
1
) AS mom_pct_change
FROM monthly_revenue
ORDER BY sale_month;
| sale_month | revenue | prev_month | next_month | mom_change | mom_pct_change |
|---|---|---|---|---|---|
| 2024-01 | 50,000 | NULL | 65,000 | NULL | NULL |
| 2024-02 | 65,000 | 50,000 | 58,000 | +15,000 | +30.0% |
| 2024-03 | 58,000 | 65,000 | 71,000 | -7,000 | -10.8% |
| 2024-04 | 71,000 | 58,000 | NULL | +13,000 | +22.4% |
The
NULLIF(..., 0)prevents division-by-zero whenprev_monthis 0. The first row returnsNULLforLAGbecause there is no prior row — that is expected, not a bug.
Running Totals and Moving Averages — The Frame Clause
When you use ORDER BY inside a SUM() or AVG() window function, SQL needs to know which rows to include for each calculation. That is what the frame clause controls.
SELECT
sale_date,
daily_sales,
-- Running total: from the very first row up to this row
SUM(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
-- 7-day moving average: this row + the 6 rows before it
ROUND(AVG(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;
Frame clause keywords explained:
| Keyword | Meaning |
|---|---|
UNBOUNDED PRECEDING |
The very first row in the partition |
n PRECEDING |
n rows before the current row |
CURRENT ROW |
The current row itself |
n FOLLOWING |
n rows after the current row |
UNBOUNDED FOLLOWING |
The very last row in the partition |
⚠️ Default frame behaviour: When you write
ORDER BYinsideOVER()without a frame clause, SQL defaults toROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(a running total). This is fine forSUM, but it causes a silent bug withLAST_VALUE— see below.
FIRST_VALUE and LAST_VALUE
These functions return the first or last value in a window frame — useful for comparing each row against the boundary of its group.
SELECT
full_name,
dept_id,
salary,
FIRST_VALUE(full_name) OVER (
PARTITION BY dept_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS top_earner_in_dept,
LAST_VALUE(full_name) OVER (
PARTITION BY dept_id
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_earner_in_dept
FROM employees;
⚠️ Always specify
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGwhen usingLAST_VALUE. Without it, the default frame stops at the current row — meaningLAST_VALUEalways returns the current row's own value. This is one of the most common silent bugs in window function code.
NTILE, PERCENT_RANK, and CUME_DIST
These three functions answer the question: where does this row sit relative to everyone else?
SELECT
full_name,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile,
ROUND(PERCENT_RANK() OVER (ORDER BY salary), 2) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary), 2) AS cum_dist
FROM employees
ORDER BY salary DESC;
| full_name | salary | quartile | pct_rank | cum_dist |
|---|---|---|---|---|
| Evelyn Njoroge | 91,000.00 | 1 | 1.00 | 1.00 |
| Alice Mensah | 72,000.00 | 1 | 0.75 | 0.80 |
| Carol Wanjiku | 65,000.00 | 2 | 0.50 | 0.60 |
| Brian Okonkwo | 58,000.00 | 3 | 0.25 | 0.40 |
| David Kimani | 49,000.00 | 4 | 0.00 | 0.20 |
-
NTILE(4)— divides rows into 4 equal buckets. Use for tiers and segments. -
PERCENT_RANK—(rank - 1) / (total rows - 1). First row is always 0, last is always 1. -
CUME_DIST— proportion of rows with a value ≤ the current row. Useful for "what percentage of employees earn at or below this salary?"
Named Windows — Reuse OVER() Clauses
If you use the same OVER() definition multiple times in one query, SQL lets you name it with WINDOW:
-- ❌ Repetitive — same OVER() clause written three times
SELECT
full_name,
AVG(salary) OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_avg,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank,
SUM(salary) OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_total
FROM employees;
-- ✅ Named window — define once, reuse everywhere
SELECT
full_name,
AVG(salary) OVER dept_window AS dept_avg,
RANK() OVER dept_window AS dept_rank,
SUM(salary) OVER dept_window AS dept_total
FROM employees
WINDOW dept_window AS (PARTITION BY dept_id ORDER BY salary DESC);
Cleaner to read, and modern query engines can optimise it better.
Combining Window Functions with CTEs
The real power of window functions comes when you layer them inside CTEs to build multi-step analyses.
-- Find employees earning above their department average,
-- and rank them within their department
WITH dept_stats AS (
SELECT
emp_id,
full_name,
dept_id,
salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees
)
SELECT
full_name,
dept_id,
salary,
ROUND(dept_avg, 2) AS dept_avg,
ROUND(salary - dept_avg, 2) AS above_avg_by,
dept_rank
FROM dept_stats
WHERE salary > dept_avg
ORDER BY above_avg_by DESC;
Window Functions Reference
| Function | Category | What It Does | Key Detail |
|---|---|---|---|
ROW_NUMBER() |
Ranking | Unique sequential number | No ties — always distinct |
RANK() |
Ranking | Rank with gaps after ties | 1, 2, 2, 4 |
DENSE_RANK() |
Ranking | Rank without gaps after ties | 1, 2, 2, 3 |
NTILE(n) |
Ranking | Divide into n equal buckets | Quartiles, deciles |
PERCENT_RANK() |
Ranking | Relative rank as 0–1 value | First row = 0, last = 1 |
CUME_DIST() |
Ranking | Cumulative distribution | % of rows ≤ current value |
LAG(col, n) |
Navigation | Value n rows before current | Replaces self-join for time series |
LEAD(col, n) |
Navigation | Value n rows after current | Preview future period |
FIRST_VALUE(col) |
Navigation | First value in window frame | Great for comparing to group leader |
LAST_VALUE(col) |
Navigation | Last value in window frame | Always specify full frame! |
SUM(col) |
Aggregate | Running or partitioned total | With frame → running total |
AVG(col) |
Aggregate | Running or partitioned average | With n PRECEDING → moving avg |
COUNT(col) |
Aggregate | Running or partitioned count | Cumulative transaction count |
MIN(col) |
Aggregate | Minimum within window | Running minimum |
MAX(col) |
Aggregate | Maximum within window | Running maximum |
Performance Tips
For Joins:
-
Index your join columns. Unindexed
ONclause columns force full table scans. Always index foreign keys. - Filter before joining. Use a CTE or subquery to reduce rows before the join, not after.
-
Select only what you need.
SELECT *across multiple joined tables moves far more data than necessary. -
Use
EXPLAIN/EXPLAIN ANALYZE. This shows the execution plan and tells you whether SQL is using your indexes.
For Window Functions:
- Window functions execute after
WHERE— filter first to reduce the data the window function sees. - Use named
WINDOWclauses to avoid repeating identicalOVER()definitions. - On very large tables, partitioning your source data before applying window functions (via a CTE) can significantly reduce compute time.
⚡ Complete Cheat Sheet
-- ─── JOINS ───────────────────────────────────────────────────────────────────
-- INNER: only rows matching in both tables
SELECT e.full_name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- LEFT: all employees, department info where it exists
SELECT e.full_name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
-- Anti-join: employees with NO department
SELECT e.full_name FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;
-- FULL OUTER: everything from both tables
SELECT e.full_name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
-- SELF: employee + their manager
SELECT e.full_name AS employee, m.full_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
-- CROSS: all size × colour combos
SELECT s.size_name, c.colour_name
FROM sizes s CROSS JOIN colours c;
-- ─── WINDOW FUNCTIONS ────────────────────────────────────────────────────────
-- Dept average without collapsing rows
SELECT full_name, salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;
-- Top 2 earners per department
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 2;
-- RANK vs DENSE_RANK
SELECT full_name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;
-- Month-over-month change
SELECT sale_month, revenue,
LAG(revenue, 1) OVER (ORDER BY sale_month) AS prev_month,
revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS change
FROM monthly_revenue;
-- Running total
SELECT sale_date, daily_sales,
SUM(daily_sales) OVER (ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_sales;
-- 7-day moving average
SELECT sale_date, daily_sales,
ROUND(AVG(daily_sales) OVER (ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS moving_avg_7d
FROM daily_sales;
-- NTILE: quartile segmentation
SELECT customer_name, lifetime_spend,
NTILE(4) OVER (ORDER BY lifetime_spend DESC) AS quartile
FROM customers;
-- Named window (reuse OVER clause)
SELECT full_name, salary,
AVG(salary) OVER w AS dept_avg,
RANK() OVER w AS dept_rank
FROM employees
WINDOW w AS (PARTITION BY dept_id ORDER BY salary DESC);
Key Takeaways
On Joins:
- Every join type answers the same question differently: what happens to rows that have no match?
-
INNER JOINdrops unmatched rows from both sides. -
LEFT JOINprotects the left table — it is the most used join in day-to-day analytics. -
FULL OUTER JOINprotects both sides — use it when you cannot afford to lose rows from either table. - Always index join columns. Always alias your tables. Always qualify ambiguous column names.
On Window Functions:
-
GROUP BYcollapses rows. Window functions add information to rows without removing them — that distinction is everything. -
PARTITION BYdivides into groups but keeps all rows. It is notGROUP BY. -
ORDER BYinsideOVER()is independent of the query-levelORDER BY. - Window functions run during
SELECT— afterWHERE— so you must wrap them in a CTE or subquery to filter on their results. - Always specify a full frame clause when using
LAST_VALUE.
Anything unclear or want a deeper dive into a specific function? Drop it in the comments.
Top comments (0)