DEV Community

Mariam Turnesh
Mariam Turnesh

Posted on

sql join and windows function

🗃️ The Dataset We'll Use Throughout

Let's set up two tables and keep them consistent across every example. This way, when something changes between queries, you know exactly why.

CREATE TABLE employees (
    emp_id       INT PRIMARY KEY,
    full_name    VARCHAR(100),
    dept_id      INT,
    salary       DECIMAL(10,2),
    hire_date    DATE
);

INSERT INTO employees VALUES
(1, 'Alice Mensah',    1, 72000, '2020-03-15'),
(2, 'Brian Okonkwo',   2, 58000, '2021-07-01'),
(3, 'Carol Wanjiku',   1, 65000, '2019-11-20'),
(4, 'David Kimani',    NULL, 49000, '2023-01-10'),  -- no department assigned
(5, 'Evelyn Njoroge',  3, 91000, '2018-06-05');

CREATE TABLE departments (
    dept_id    INT PRIMARY KEY,
    dept_name  VARCHAR(100),
    location   VARCHAR(100)
);

INSERT INTO departments VALUES
(1, 'Engineering',    'Nairobi'),
(2, 'Marketing',      'Lagos'),
(3, 'Finance',        'Accra'),
(4, 'Human Resources', 'Nairobi');  -- no employees assigned yet
Enter fullscreen mode Exit fullscreen mode

Two things to notice before we write a single join:

  • David (emp_id 4) has dept_id = NULL — he exists but belongs to no department.
  • Department 4 (Human Resources) has no employees assigned to it yet.

These two edge cases are what separate the join types from each other. Every join answers the same question differently: what do you do with rows that have no match?


Part 1 — SQL Joins

What Is a Join?

In relational databases, data is organized into multiple tables to keep it structured and efficient. SQL Joins allow you to combine related information from these tables into one unified result. That is what a join does — it combines rows from two tables based on a matching column (usually a primary key and a foreign key).

The ON clause is where you define the match condition. SQL evaluates it row by row and decides what to include in the result based on which join type you used.

-- General shape of a join
SELECT   columns
FROM     table_a
[JOIN TYPE] table_b ON table_a.column = table_b.column;
Enter fullscreen mode Exit fullscreen mode

INNER JOIN — Only the Matches

An INNER JOIN returns rows only when a match exists in both tables. Rows that have no match on either side are dropped.

SELECT
    e.full_name,
    e.salary,
    d.dept_name,
    d.location
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

Result:

full_name salary dept_name location
Alice Mensah 72,000.00 Engineering Nairobi
Brian Okonkwo 58,000.00 Marketing Lagos
Carol Wanjiku 65,000.00 Engineering Nairobi
Evelyn Njoroge 91,000.00 Finance Accra

What happened to David? His dept_id is NULL. There is no row in departments where dept_id = NULL, so he has no match — INNER JOIN drops him entirely.

What happened to Human Resources? No employee has dept_id = 4, so it also has no match — also dropped.

Use INNER JOIN when you only want records with complete, matching data on both sides. It is the strictest join — nothing without a partner gets through.


LEFT JOIN — Keep Everything on the Left

A LEFT JOIN returns all rows from the left table, plus any matching rows from the right. Where there is no match, the right-side columns are filled with NULL.

SELECT
    e.full_name,
    e.salary,
    d.dept_name,
    d.location
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

Result:

full_name salary dept_name location
Alice Mensah 72,000.00 Engineering Nairobi
Brian Okonkwo 58,000.00 Marketing Lagos
Carol Wanjiku 65,000.00 Engineering Nairobi
David Kimani 49,000.00 NULL NULL
Evelyn Njoroge 91,000.00 Finance Accra

David is back — with NULL where his department would be. Human Resources is still absent because it lives in the right table and LEFT JOIN does not protect it.

Use LEFT JOIN when you need all records from one table regardless of whether they have a match. Classic use case: show all customers and their orders, including customers who haven't ordered yet.

Pro tip — the anti-join pattern:

A LEFT JOIN + WHERE IS NULL finds rows in the left table that have no match in the right table at all:

-- Find employees with no department assigned
SELECT e.full_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

-- Returns: David Kimani
Enter fullscreen mode Exit fullscreen mode

RIGHT JOIN — Keep Everything on the Right

A RIGHT JOIN keeps all rows from the right table, and fills NULL for unmatched left-side columns.

SELECT
    e.full_name,
    d.dept_name,
    d.location
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

Result:

full_name dept_name location
Alice Mensah Engineering Nairobi
Carol Wanjiku Engineering Nairobi
Brian Okonkwo Marketing Lagos
Evelyn Njoroge Finance Accra
NULL Human Resources Nairobi

Human Resources now appears with NULL for full_name. David is gone — he is in the left table and has no right-table match.

💡 Most developers rewrite RIGHT JOIN as a LEFT JOIN by swapping the table order. The two queries below produce identical results:

-- As RIGHT JOIN:
SELECT e.full_name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.dept_id;

-- Equivalent as LEFT JOIN (usually preferred for readability):
SELECT e.full_name, d.dept_name
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id;
Enter fullscreen mode Exit fullscreen mode

FULL OUTER JOIN — Keep Everything from Both Sides

A FULL OUTER JOIN returns every row from both tables, using NULL to fill gaps where a match does not exist.

SELECT
    e.full_name,
    d.dept_name,
    d.location
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

Result:

full_name dept_name location
Alice Mensah Engineering Nairobi
Brian Okonkwo Marketing Lagos
Carol Wanjiku Engineering Nairobi
David Kimani NULL NULL
Evelyn Njoroge Finance Accra
NULL Human Resources Nairobi

Both David and Human Resources appear — no row is lost from either table.

Use FULL OUTER JOIN when reconciling two data sources, auditing for orphaned records, or merging datasets where you cannot afford to lose rows from either side.

⚠️ MySQL does not support FULL OUTER JOIN natively. Emulate it with a UNION:

SELECT e.full_name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id
UNION
SELECT e.full_name, d.dept_name
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

SELF JOIN — A Table That References Itself

A SELF JOIN joins a table to itself. It is the natural solution for hierarchical data — like an employee table where each row stores a manager_id pointing to another row in the same table.

-- Each employee reports to a manager who is also in the employees table
ALTER TABLE employees ADD COLUMN manager_id INT;
UPDATE employees SET manager_id = 3 WHERE emp_id IN (1, 2); -- Alice & Brian → Carol
UPDATE employees SET manager_id = 5 WHERE emp_id IN (3, 4); -- Carol & David → Evelyn

-- Find each employee and their manager's name
SELECT
    e.full_name  AS employee,
    m.full_name  AS reports_to
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
Enter fullscreen mode Exit fullscreen mode

Result:

employee reports_to
Alice Mensah Carol Wanjiku
Brian Okonkwo Carol Wanjiku
Carol Wanjiku Evelyn Njoroge
David Kimani Evelyn Njoroge
Evelyn Njoroge NULL

We use aliases (e for the employee copy, m for the manager copy) to tell SQL which version of the table we mean. We use LEFT JOIN so Evelyn — who has no manager — still appears.


CROSS JOIN — Every Possible Combination

A CROSS JOIN produces the Cartesian product of two tables. There is no ON clause because you are not matching rows — you are pairing every row from Table A with every row from Table B.

SELECT
    s.size_name,
    c.colour_name,
    s.size_name || '-' || c.colour_name AS sku
FROM sizes s         -- 3 rows: S, M, L
CROSS JOIN colours c; -- 4 rows: Black, White, Red, Navy
-- Result: 3 × 4 = 12 rows
Enter fullscreen mode Exit fullscreen mode

⚠️ Be careful with scale. A CROSS JOIN on two tables with 10,000 rows each produces 100,000,000 rows. Always use it intentionally.


Multi-Table Joins

You can chain as many joins as you need in a single query. SQL builds an intermediate result after each one and feeds it into the next.

-- Full order breakdown: customer → order → product → category
SELECT
    c.full_name,
    o.order_id,
    o.order_date,
    p.product_name,
    cat.category_name,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM customers c
INNER JOIN orders      o   ON c.customer_id = o.customer_id
INNER JOIN order_items oi  ON o.order_id    = oi.order_id
INNER JOIN products    p   ON oi.product_id = p.product_id
INNER JOIN categories  cat ON p.category_id = cat.category_id
WHERE o.order_date >= '2024-01-01'
ORDER BY line_total DESC;
Enter fullscreen mode Exit fullscreen mode

Join Type Reference

Join Type Returns Drops Unmatched Rows? Best Use Case
INNER JOIN Matching rows from both tables Both sides dropped Orders with a valid customer
LEFT JOIN All left + matching right Right side only All customers, orders or not
RIGHT JOIN All right + matching left Left side only All departments, staffed or not
FULL OUTER JOIN Everything from both tables Neither side dropped Reconciling two data sources
SELF JOIN Same table joined to itself Depends on join type Employee–manager hierarchy
CROSS JOIN Every row × every row N/A Generate all product variants

Common Join Mistakes

1. Missing ON clause → Cartesian product

-- ❌ This silently returns employees × departments (20 rows of nonsense)
SELECT * FROM employees, departments;

-- ✅ Always specify the match condition
SELECT * FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

2. Ambiguous column names

-- ❌ SQL can't tell which table's dept_id you want
SELECT dept_id FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

-- ✅ Always qualify with a table alias
SELECT e.dept_id FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
Enter fullscreen mode Exit fullscreen mode

3. Using INNER JOIN when you should use LEFT JOIN

If a stakeholder asks "why is David missing from this report?" — that is almost always a sign you used INNER JOIN where you needed LEFT JOIN. Any time unmatched rows should still appear in output, reach for LEFT JOIN.


Part 2 — Window Functions

The Problem They Exist to Solve

Here is a question that looks simple but trips up most beginners:

"Show me each employee's name, salary, and the average salary for their department — all on the same row."

Your first instinct might be GROUP BY:

-- ❌ GROUP BY collapses all employees into one row per department
SELECT dept_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY dept_id;

-- Returns 3 rows (one per dept) — individual employees are gone
Enter fullscreen mode Exit fullscreen mode

Window functions solve this. They perform calculations across a group of rows without collapsing those rows. Every employee row stays, and the calculation is added as an extra column.

-- ✅ Window function: every row stays, department average added
SELECT
    full_name,
    dept_id,
    salary,
    AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg_salary
FROM employees;
Enter fullscreen mode Exit fullscreen mode
full_name dept_id salary dept_avg_salary
Alice Mensah 1 72,000.00 68,500.00
Carol Wanjiku 1 65,000.00 68,500.00
Brian Okonkwo 2 58,000.00 58,000.00
Evelyn Njoroge 3 91,000.00 91,000.00
David Kimani NULL 49,000.00 49,000.00

Alice and Carol share a department average of 68,500 — the average of their two salaries. Every individual row is preserved.


Anatomy of the OVER() Clause

Every window function is paired with an OVER() clause that has three optional components:

function_name(column) OVER (
    PARTITION BY  grouping_column    -- divide rows into independent sub-groups
    ORDER BY      sort_column        -- define row order within each group
    ROWS BETWEEN  frame_start        -- define how far back/forward to look
                  AND frame_end
)
Enter fullscreen mode Exit fullscreen mode
Component What It Does Required?
PARTITION BY Divides rows into groups; function restarts per group No
ORDER BY Sorts rows within each partition Only for rankings & running totals
Frame clause Controls which rows are included in the calculation No (has default)

ROW_NUMBER, RANK, and DENSE_RANK

These three functions assign a position number to each row within a partition. They differ only in how they handle ties.

SELECT
    full_name,
    dept_id,
    salary,
    ROW_NUMBER()  OVER (PARTITION BY dept_id ORDER BY salary DESC) AS row_num,
    RANK()        OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk,
    DENSE_RANK()  OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dense_rnk
FROM employees;
Enter fullscreen mode Exit fullscreen mode

To see the difference clearly, picture four employees with salaries: 80,000 / 72,000 / 72,000 / 58,000:

ROW_NUMBER:  1, 2, 3, 4   ← always unique, ties get different numbers
RANK:        1, 2, 2, 4   ← tied rows share a rank; next rank is skipped (no 3)
DENSE_RANK:  1, 2, 2, 3   ← tied rows share a rank; no numbers are skipped
Enter fullscreen mode Exit fullscreen mode
Function Tied Rows Get Next Rank After Tie Best Used For
ROW_NUMBER() Different numbers (arbitrary) Continues normally Deduplication, pagination
RANK() Same rank Skips to account for ties Sports leaderboards
DENSE_RANK() Same rank Continues without skipping Clean rankings, interview questions

The Top-N per group pattern — by far the most common window function use case:

-- Top 2 earners per department
-- Window functions can't go in WHERE directly, so we wrap in a CTE
WITH ranked_employees AS (
    SELECT
        full_name,
        dept_id,
        salary,
        ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT full_name, dept_id, salary
FROM ranked_employees
WHERE rn <= 2;
Enter fullscreen mode Exit fullscreen mode

⚠️ Why can't window functions go in WHERE? SQL's execution order is: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Window functions run during SELECT — after WHERE has already filtered rows. To filter on a window function result, always wrap it in a CTE or subquery first.


LAG and LEAD — Looking Backwards and Forwards

LAG(column, n) returns the value from n rows before the current row.
LEAD(column, n) returns the value from n rows after the current row.
Both accept an optional third argument as a default when no prior/next row exists.

Before window functions, period-over-period comparisons required a self-join:

-- ❌ Old way: self-join to get last month's revenue
SELECT a.sale_month, a.revenue, b.revenue AS prev_month_revenue
FROM monthly_revenue a
LEFT JOIN monthly_revenue b
    ON b.sale_month = DATEADD(month, -1, a.sale_month);

-- ✅ New way: LAG does it in one line
SELECT sale_month, revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY sale_month) AS prev_month_revenue
FROM monthly_revenue;
Enter fullscreen mode Exit fullscreen mode

Full month-over-month analysis:

SELECT
    sale_month,
    revenue,
    LAG(revenue, 1)  OVER (ORDER BY sale_month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY sale_month) AS next_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS mom_change,
    ROUND(
        100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY sale_month))
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY sale_month), 0),
        1
    ) AS mom_pct_change
FROM monthly_revenue
ORDER BY sale_month;
Enter fullscreen mode Exit fullscreen mode
sale_month revenue prev_month next_month mom_change mom_pct_change
2024-01 50,000 NULL 65,000 NULL NULL
2024-02 65,000 50,000 58,000 +15,000 +30.0%
2024-03 58,000 65,000 71,000 -7,000 -10.8%
2024-04 71,000 58,000 NULL +13,000 +22.4%

The NULLIF(..., 0) prevents division-by-zero when prev_month is 0. The first row returns NULL for LAG because there is no prior row — that is expected, not a bug.


Running Totals and Moving Averages — The Frame Clause

When you use ORDER BY inside a SUM() or AVG() window function, SQL needs to know which rows to include for each calculation. That is what the frame clause controls.

SELECT
    sale_date,
    daily_sales,

    -- Running total: from the very first row up to this row
    SUM(daily_sales) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,

    -- 7-day moving average: this row + the 6 rows before it
    ROUND(AVG(daily_sales) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_7day

FROM daily_sales
ORDER BY sale_date;
Enter fullscreen mode Exit fullscreen mode

Frame clause keywords explained:

Keyword Meaning
UNBOUNDED PRECEDING The very first row in the partition
n PRECEDING n rows before the current row
CURRENT ROW The current row itself
n FOLLOWING n rows after the current row
UNBOUNDED FOLLOWING The very last row in the partition

⚠️ Default frame behaviour: When you write ORDER BY inside OVER() without a frame clause, SQL defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (a running total). This is fine for SUM, but it causes a silent bug with LAST_VALUE — see below.


FIRST_VALUE and LAST_VALUE

These functions return the first or last value in a window frame — useful for comparing each row against the boundary of its group.

SELECT
    full_name,
    dept_id,
    salary,
    FIRST_VALUE(full_name) OVER (
        PARTITION BY dept_id
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS top_earner_in_dept,

    LAST_VALUE(full_name) OVER (
        PARTITION BY dept_id
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS lowest_earner_in_dept

FROM employees;
Enter fullscreen mode Exit fullscreen mode

⚠️ Always specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when using LAST_VALUE. Without it, the default frame stops at the current row — meaning LAST_VALUE always returns the current row's own value. This is one of the most common silent bugs in window function code.


NTILE, PERCENT_RANK, and CUME_DIST

These three functions answer the question: where does this row sit relative to everyone else?

SELECT
    full_name,
    salary,
    NTILE(4)        OVER (ORDER BY salary DESC) AS quartile,
    ROUND(PERCENT_RANK() OVER (ORDER BY salary), 2) AS pct_rank,
    ROUND(CUME_DIST()    OVER (ORDER BY salary), 2) AS cum_dist
FROM employees
ORDER BY salary DESC;
Enter fullscreen mode Exit fullscreen mode
full_name salary quartile pct_rank cum_dist
Evelyn Njoroge 91,000.00 1 1.00 1.00
Alice Mensah 72,000.00 1 0.75 0.80
Carol Wanjiku 65,000.00 2 0.50 0.60
Brian Okonkwo 58,000.00 3 0.25 0.40
David Kimani 49,000.00 4 0.00 0.20
  • NTILE(4) — divides rows into 4 equal buckets. Use for tiers and segments.
  • PERCENT_RANK(rank - 1) / (total rows - 1). First row is always 0, last is always 1.
  • CUME_DIST — proportion of rows with a value the current row. Useful for "what percentage of employees earn at or below this salary?"

Named Windows — Reuse OVER() Clauses

If you use the same OVER() definition multiple times in one query, SQL lets you name it with WINDOW:

-- ❌ Repetitive — same OVER() clause written three times
SELECT
    full_name,
    AVG(salary)  OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_avg,
    RANK()       OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank,
    SUM(salary)  OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_total
FROM employees;

-- ✅ Named window — define once, reuse everywhere
SELECT
    full_name,
    AVG(salary)  OVER dept_window AS dept_avg,
    RANK()       OVER dept_window AS dept_rank,
    SUM(salary)  OVER dept_window AS dept_total
FROM employees
WINDOW dept_window AS (PARTITION BY dept_id ORDER BY salary DESC);
Enter fullscreen mode Exit fullscreen mode

Cleaner to read, and modern query engines can optimise it better.


Combining Window Functions with CTEs

The real power of window functions comes when you layer them inside CTEs to build multi-step analyses.

-- Find employees earning above their department average,
-- and rank them within their department
WITH dept_stats AS (
    SELECT
        emp_id,
        full_name,
        dept_id,
        salary,
        AVG(salary) OVER (PARTITION BY dept_id)                  AS dept_avg,
        RANK()      OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
    FROM employees
)
SELECT
    full_name,
    dept_id,
    salary,
    ROUND(dept_avg, 2)           AS dept_avg,
    ROUND(salary - dept_avg, 2)  AS above_avg_by,
    dept_rank
FROM dept_stats
WHERE salary > dept_avg
ORDER BY above_avg_by DESC;
Enter fullscreen mode Exit fullscreen mode

Window Functions Reference

Function Category What It Does Key Detail
ROW_NUMBER() Ranking Unique sequential number No ties — always distinct
RANK() Ranking Rank with gaps after ties 1, 2, 2, 4
DENSE_RANK() Ranking Rank without gaps after ties 1, 2, 2, 3
NTILE(n) Ranking Divide into n equal buckets Quartiles, deciles
PERCENT_RANK() Ranking Relative rank as 0–1 value First row = 0, last = 1
CUME_DIST() Ranking Cumulative distribution % of rows ≤ current value
LAG(col, n) Navigation Value n rows before current Replaces self-join for time series
LEAD(col, n) Navigation Value n rows after current Preview future period
FIRST_VALUE(col) Navigation First value in window frame Great for comparing to group leader
LAST_VALUE(col) Navigation Last value in window frame Always specify full frame!
SUM(col) Aggregate Running or partitioned total With frame → running total
AVG(col) Aggregate Running or partitioned average With n PRECEDING → moving avg
COUNT(col) Aggregate Running or partitioned count Cumulative transaction count
MIN(col) Aggregate Minimum within window Running minimum
MAX(col) Aggregate Maximum within window Running maximum

Performance Tips

For Joins:

  • Index your join columns. Unindexed ON clause columns force full table scans. Always index foreign keys.
  • Filter before joining. Use a CTE or subquery to reduce rows before the join, not after.
  • Select only what you need. SELECT * across multiple joined tables moves far more data than necessary.
  • Use EXPLAIN/EXPLAIN ANALYZE. This shows the execution plan and tells you whether SQL is using your indexes.

For Window Functions:

  • Window functions execute after WHERE — filter first to reduce the data the window function sees.
  • Use named WINDOW clauses to avoid repeating identical OVER() definitions.
  • On very large tables, partitioning your source data before applying window functions (via a CTE) can significantly reduce compute time.

⚡ Complete Cheat Sheet

-- ─── JOINS ───────────────────────────────────────────────────────────────────

-- INNER: only rows matching in both tables
SELECT e.full_name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

-- LEFT: all employees, department info where it exists
SELECT e.full_name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;

-- Anti-join: employees with NO department
SELECT e.full_name FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
WHERE d.dept_id IS NULL;

-- FULL OUTER: everything from both tables
SELECT e.full_name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;

-- SELF: employee + their manager
SELECT e.full_name AS employee, m.full_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

-- CROSS: all size × colour combos
SELECT s.size_name, c.colour_name
FROM sizes s CROSS JOIN colours c;

-- ─── WINDOW FUNCTIONS ────────────────────────────────────────────────────────

-- Dept average without collapsing rows
SELECT full_name, salary,
    AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;

-- Top 2 earners per department
WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT * FROM ranked WHERE rn <= 2;

-- RANK vs DENSE_RANK
SELECT full_name, salary,
    RANK()        OVER (ORDER BY salary DESC) AS rnk,
    DENSE_RANK()  OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;

-- Month-over-month change
SELECT sale_month, revenue,
    LAG(revenue, 1)  OVER (ORDER BY sale_month) AS prev_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY sale_month) AS change
FROM monthly_revenue;

-- Running total
SELECT sale_date, daily_sales,
    SUM(daily_sales) OVER (ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_sales;

-- 7-day moving average
SELECT sale_date, daily_sales,
    ROUND(AVG(daily_sales) OVER (ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS moving_avg_7d
FROM daily_sales;

-- NTILE: quartile segmentation
SELECT customer_name, lifetime_spend,
    NTILE(4) OVER (ORDER BY lifetime_spend DESC) AS quartile
FROM customers;

-- Named window (reuse OVER clause)
SELECT full_name, salary,
    AVG(salary) OVER w AS dept_avg,
    RANK()      OVER w AS dept_rank
FROM employees
WINDOW w AS (PARTITION BY dept_id ORDER BY salary DESC);
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

On Joins:

  • Every join type answers the same question differently: what happens to rows that have no match?
  • INNER JOIN drops unmatched rows from both sides.
  • LEFT JOIN protects the left table — it is the most used join in day-to-day analytics.
  • FULL OUTER JOIN protects both sides — use it when you cannot afford to lose rows from either table.
  • Always index join columns. Always alias your tables. Always qualify ambiguous column names.

On Window Functions:

  • GROUP BY collapses rows. Window functions add information to rows without removing them — that distinction is everything.
  • PARTITION BY divides into groups but keeps all rows. It is not GROUP BY.
  • ORDER BY inside OVER() is independent of the query-level ORDER BY.
  • Window functions run during SELECT — after WHERE — so you must wrap them in a CTE or subquery to filter on their results.
  • Always specify a full frame clause when using LAST_VALUE.

Anything unclear or want a deeper dive into a specific function? Drop it in the comments.

Top comments (0)