Charles

Posted on Mar 7

Mastering SQL Joins and Window Functions: A Practical Guide with an E‑commerce Dataset

#sql #database #softwaredevelopment #datascience

Introduction & The Dataset

If you're working with SQL, you've probably used basic SELECT statements, filtered with WHERE, and maybe even done some simple aggregations. But the real power of SQL unfolds when you start combining data from multiple tables and performing complex calculations across rows without losing detail. That's where JOINs and Window Functions come in.

In this article, we'll dive deep into these advanced concepts using a realistic e‑commerce dataset.
By the end, you'll be able to:

Combine data from multiple tables using different types of joins.
Perform rankings, running totals, and comparisons across rows with window functions.
Organize complex queries using Common Table Expressions (CTEs).
Solve practical business questions that real data analysts face every day.

We'll use a consistent dataset throughout – so you can follow along, try the queries yourself, and truly master these techniques.

Meet Our Data

Imagine we're running an online store. Our database has three main tables:

customers – people who registered on our site.
products – items we sell, with prices and stock.
sales – records of purchases, linking customers to products.

Here are the table structures (simplified for clarity) and a few sample rows.

1. Customers

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone_number VARCHAR(50),
    registration_date DATE,
    membership_status VARCHAR(10)  -- 'Bronze', 'Silver', 'Gold'
);

customer_id	first_name	last_name	email	phone_number	registration_date	membership_status
1	Karen	Molina	gonzalezkimberly@glass.com	(728)697-1206	2020-08-27	Bronze
2	Elizabeth	Archer	tramirez@gmail.com	778.104.6553	2023-08-28	Silver
3	Roberta	Massey	davislori@gmail.com	+1-365-606-7458	2024-06-12	Bronze

2. Products

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    supplier VARCHAR(100),
    stock_quantity INT
);

product_id	product_name	category	price	supplier	stock_quantity
1	Laptop	Electronics	999.99	Dell	50
2	Smartphone	Electronics	799.99	Samsung	150
3	Washing Machine	Appliances	499.99	LG	30
4	Headphones	Accessories	199.99	Sony	100

3. Sales

CREATE TABLE sales (
    sale_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity_sold INT,
    sale_date DATE,
    total_amount DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

sale_id	customer_id	product_id	quantity_sold	sale_date	total_amount
1	1	1	1	2023-07-15	999.99
2	2	2	2	2023-08-20	1599.98
3	3	3	1	2023-09-10	499.99
4	4	4	3	2023-07-25	599.97

What We'll Build

We'll use these tables to answer questions like:

"Which customers have never placed an order?"
"What is the total revenue per product category?"
"Who are the top 3 customers by spending, and how do they rank within their membership tier?"
"How does each sale compare to the previous sale for the same customer?"

By the time you finish reading, you'll be able to write queries that combine these tables and apply window functions like a pro.

SQL Joins – Bringing Data Together

In a well‑designed database, data is split across multiple tables to avoid redundancy. For example, we store customer details in one table, product information in another, and sales transactions in a third. To answer real‑world questions, we need to join these tables back together.

INNER JOIN

An INNER JOIN returns only the rows that have matching values in both tables. Rows without a match are excluded.

Use case: Find all customers who have placed orders, along with their order details.

Example 1: List customers and their orders

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id;

Result:

customer_id	first_name	last_name	sale_id	sale_date	total_amount
1	Karen	Molina	1	2023-07-15	999.99
2	Elizabeth	Archer	2	2023-08-20	1599.98
3	Roberta	Massey	3	2023-09-10	499.99
4	Jacob	Adams	4	2023-07-25	599.97
5	Cynthia	Lowery	5	2023-06-18	1200.00

What happened?

The INNER JOIN combined rows from customers and sales wherever customer_id matched. Customers who have never placed an order (like some of our 50 customers) do not appear in the result.

Example 2: Add product details to the orders
We can join more than two tables. Let's include the product name and price.

SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    s.quantity_sold,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id
INNER JOIN products p ON s.product_id = p.product_id;

Result:

first_name	last_name	sale_date	product_name	quantity_sold	total_amount
Karen	Molina	2023-07-15	Laptop	1	999.99
Elizabeth	Archer	2023-08-20	Smartphone	2	1599.98
Roberta	Massey	2023-09-10	Washing Machine	1	499.99
Jacob	Adams	2023-07-25	Headphones	3	599.97
Cynthia	Lowery	2023-06-18	Refrigerator	1	1200.00

Key point: Each INNER JOIN adds another table, filtering out rows that don't match the join condition. Only orders with valid customer and product IDs appear.

LEFT JOIN (or LEFT OUTER JOIN)

A LEFT JOIN returns all rows from the left table, and matching rows from the right table. If there's no match, the right‑table columns are filled with NULL.

Use case: Show all customers, even those who have never placed an order, and include any orders they may have.

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
ORDER BY c.customer_id;

Result (first few rows):

customer_id	first_name	last_name	sale_id	sale_date	total_amount
1	Karen	Molina	1	2023-07-15	999.99
2	Elizabeth	Archer	2	2023-08-20	1599.98
3	Roberta	Massey	3	2023-09-10	499.99
4	Jacob	Adams	4	2023-07-25	599.97
5	Cynthia	Lowery	5	2023-06-18	1200.00
6	Emily	King	NULL	NULL	NULL
7	Linda	Larsen	NULL	NULL	NULL

Notice: Customers 6 and 7 (Emily King, Linda Larsen) have no sales, so the sale_id, sale_date, and total_amount are NULL. This is a great way to find customers who haven't purchased anything.

RIGHT JOIN

A RIGHT JOIN is the mirror of LEFT JOIN: it returns all rows from the right table, and matching rows from the left. Unmatched left‑table columns become NULL.
Because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order, many developers rarely use it. But it's good to know it exists.

Use case: List all products and show which ones have been sold (if any).

SELECT 
    p.product_id,
    p.product_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM sales s
RIGHT JOIN products p ON s.product_id = p.product_id
ORDER BY p.product_id;

Result (first few rows):

product_id	product_name	sale_id	sale_date	total_amount
1	Laptop	1	2023-07-15	999.99
2	Smartphone	2	2023-08-20	1599.98
3	Washing Machine	3	2023-09-10	499.99
4	Headphones	4	2023-07-25	599.97
5	Refrigerator	5	2023-06-18	1200.00
6	Smart TV	NULL	NULL	NULL
7	Microwave	NULL	NULL	NULL

Products 6 (Smart TV) and 7 (Microwave) haven't been sold – their sale details are NULL.

FULL OUTER JOIN

FULL OUTER JOIN returns all rows from both tables, matching where possible. Unmatched rows from either side are filled with NULL. It's like a combination of LEFT and RIGHT joins.

Use case: See every customer and every product side by side – which is not very common in practice, but useful for finding orphans.

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    p.product_id,
    p.product_name,
    s.sale_id
FROM customers c
FULL OUTER JOIN sales s ON c.customer_id = s.customer_id
FULL OUTER JOIN products p ON s.product_id = p.product_id
LIMIT 10;  -- just to keep the output manageable

Result (truncated):
You'll get a mix of customers without sales, products without sales, and actual transactions. Many NULL cells appear.

Because FULL OUTER JOIN can produce huge result sets, it's used less often, but it's indispensable for certain data reconciliation tasks.

CROSS JOIN

A CROSS JOIN returns the Cartesian product of two tables – every row from the first table paired with every row from the second. There's no ON condition.

Use case: Generate all possible combinations, e.g., for creating product‑customer promotional lists.

SELECT 
    c.customer_id,
    c.first_name,
    p.product_id,
    p.product_name
FROM customers c
CROSS JOIN products p
WHERE c.customer_id <= 3 AND p.product_id <= 3  -- limit for readability
ORDER BY c.customer_id, p.product_id;

Result:

customer_id	first_name	product_id	product_name
1	Karen	1	Laptop
1	Karen	2	Smartphone
1	Karen	3	Washing Machine
2	Elizabeth	1	Laptop
2	Elizabeth	2	Smartphone
2	Elizabeth	3	Washing Machine
3	Roberta	1	Laptop
3	Roberta	2	Smartphone
3	Roberta	3	Washing Machine

Every combination of the selected customers and products appears. This is handy for generating base data for analysis.

SELF JOIN

A SELF JOIN is when a table is joined with itself. It's useful for hierarchical data or comparing rows within the same table.

Use case: Find pairs of customers who have the same membership status.

SELECT 
    c1.customer_id AS customer1,
    c1.first_name AS first1,
    c1.last_name AS last1,
    c2.customer_id AS customer2,
    c2.first_name AS first2,
    c2.last_name AS last2,
    c1.membership_status
FROM customers c1
JOIN customers c2 ON c1.membership_status = c2.membership_status
WHERE c1.customer_id < c2.customer_id
ORDER BY c1.membership_status, customer1;

Result (few rows):

customer1	first1	last1	customer2	first2	last2	membership_status
1	Karen	Molina	3	Roberta	Massey	Bronze
1	Karen	Molina	8	Angela	Hanson	Bronze
1	Karen	Molina	9	Whitney	Wilson	Bronze
2	Elizabeth	Archer	5	Cynthia	Lowery	Silver

We use c1.customer_id < c2.customer_id to avoid duplicate pairs (e.g., (1,3) and (3,1)) and to exclude a customer paired with themselves.

Joining Multiple Tables – Putting It All Together

Often you need to join more than two tables. Here's a query that combines customers, sales, and products to give a complete picture of each transaction.

SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    p.category,
    s.quantity_sold,
    s.total_amount
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE c.membership_status = 'Gold'   -- only Gold members
ORDER BY s.sale_date;

Result (few rows):

first_name	last_name	sale_date	product_name	category	quantity_sold	total_amount
James	Rodriguez	2023-07-15	Gaming Console	Electronics	1	350.00
Steven	Burnett	2023-08-01	Microwave	Appliances	1	180.00

This kind of multi‑table join is the backbone of reporting and analytics.

Now that you've seen all join types, you're ready to combine data from any set of related tables. In the next section, we'll explore Window Functions – a whole new level of analytical power.

Window Functions – Analyzing Data Without Losing Detail

Window functions are a game‑changer. Unlike regular aggregate functions (GROUP BY) that collapse multiple rows into one, window functions perform calculations across a set of rows related to the current row while keeping each row intact. This makes them perfect for rankings, running totals, moving averages, and comparing values between rows.

Basic Syntax

function_name (expression) OVER (
    [PARTITION BY column(s)]
    [ORDER BY column(s)]
    [frame_clause]
)

function_name: e.g., ROW_NUMBER(), RANK(), SUM(), LAG()
PARTITION BY: divides rows into groups (optional) – the function is applied separately to each group.
ORDER BY: defines the order within each partition.
frame_clause: specifies which rows to include (for aggregate window functions).

We'll use our sales table, sometimes joined with customers or products, to illustrate.

1. ROW_NUMBER() – Unique Sequential Number

Assigns a unique integer to each row within a partition, starting at 1. Ties are broken arbitrarily.

Example: Number all sales by date

SELECT 
    sale_id,
    customer_id,
    sale_date,
    total_amount,
    ROW_NUMBER() OVER (ORDER BY sale_date) AS row_num
FROM sales;

Result:

sale_id	customer_id	sale_date	total_amount	row_num
5	5	2023-06-18	1200.00	1
1	1	2023-07-15	999.99	2
4	4	2023-07-25	599.97	3
2	2	2023-08-20	1599.98	4
3	3	2023-09-10	499.99	5

Use case: Pagination, assigning a unique identifier to each row in a result set.

2. RANK() and DENSE_RANK() – Ranking with Ties

RANK() gives the same rank to equal values, then skips the next rank(s) (e.g., 1,1,3).
DENSE_RANK() also gives the same rank to ties, but does not skip numbers (e.g., 1,1,2).

Example: Rank customers by total spending (including those with no purchases).

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 10;  -- just show top 10 for brevity

Result (first few rows):

customer_id	first_name	last_name	total_spent	rank	dense_rank
2	Elizabeth	Archer	1599.98	1	1
6	Emily	King	1500.00	2	2
1	Karen	Molina	999.99	3	3
5	Cynthia	Lowery	1200.00	4	4
4	Jacob	Adams	599.97	5	5
3	Roberta	Massey	499.99	6	6
7	Linda	Larsen	0.00	7	7
8	Angela	Hanson	0.00	7	7
9	Whitney	Wilson	0.00	7	7

Notice how RANK and DENSE_RANK handle ties differently: when multiple customers have zero spending, RANK skips from 7 to 10 (depending on how many zeros), while DENSE_RANK assigns the same rank to all zeros and then increments by 1.

3. SUM() as a Window Function – Running Totals

Using aggregate functions like SUM with OVER allows you to compute a running total without collapsing rows.

Example: Calculate a running total of sales amounts ordered by date.

SELECT 
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;

Result:

sale_id	sale_date	total_amount	running_total
5	2023-06-18	1200.00	1200.00
1	2023-07-15	999.99	2199.99
4	2023-07-25	599.97	2799.96
2	2023-08-20	1599.98	4399.94
3	2023-09-10	499.99	4899.93

Each row's running_total is the sum of all total_amount from the earliest sale up to that row.

With partitioning: You can reset the running total for each customer.

SELECT 
    customer_id,
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY sale_date) AS customer_running_total
FROM sales
ORDER BY customer_id, sale_date;

This would show, for each customer, the cumulative amount they've spent over time.

4. LAG() and LEAD() – Accessing Other Rows

LAG() lets you access data from a previous row, while LEAD() accesses a following row – great for comparing values.

Example: For each sale, show the previous sale amount (by date).

SELECT 
    sale_id,
    sale_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY sale_date) AS prev_sale_amount,
    total_amount - LAG(total_amount) OVER (ORDER BY sale_date) AS difference_from_prev
FROM sales
ORDER BY sale_date;

Result:

sale_id	sale_date	total_amount	prev_sale_amount	difference_from_prev
5	2023-06-18	1200.00	NULL	NULL
1	2023-07-15	999.99	1200.00	-200.01
4	2023-07-25	599.97	999.99	-400.02
2	2023-08-20	1599.98	599.97	1000.01
3	2023-09-10	499.99	1599.98	-1099.99

The first row has NULL because there's no previous sale.

5. NTILE() – Dividing into Buckets

NTILE(n) divides rows into n roughly equal groups (buckets) based on the ordering.

Example: Divide customers into 4 quartiles based on total spending.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;

Result (sample):

customer_id	first_name	last_name	total_spent	spending_quartile
2	Elizabeth	Archer	1599.98	1
6	Emily	King	1500.00	1
1	Karen	Molina	999.99	1
5	Cynthia	Lowery	1200.00	1
4	Jacob	Adams	599.97	2
3	Roberta	Massey	499.99	2
7	Linda	Larsen	0.00	3
8	Angela	Hanson	0.00	3
9	Whitney	Wilson	0.00	4
10	Angela	Atkins	75.00	2

Quartile 1 contains the top 25% of spenders, quartile 4 the bottom 25%.

6. Window Functions with PARTITION BY – Per‑Group Rankings

PARTITION BY restricts the window to rows with the same value in a column.

Example: Rank products by price within each category.

SELECT 
    product_id,
    product_name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank_in_category
FROM products
ORDER BY category, price_rank_in_category;

Result (first few rows):

product_id	product_name	category	price	price_rank_in_category
5	Refrigerator	Appliances	1200.00	1
3	Washing Machine	Appliances	499.99	2
7	Microwave	Appliances	180.00	3
8	Blender	Appliances	50.00	4
6	Smart TV	Electronics	1500.00	1
1	Laptop	Electronics	999.99	2
2	Smartphone	Electronics	799.99	3
9	Gaming Console	Electronics	350.00	4
12	Monitor	Electronics	250.00	5

Now you can see the most expensive product in each category at a glance.

Key Takeaways

Window functions do not collapse rows – they add calculated columns while preserving detail.

PARTITION BY is like GROUP BY within the window.
ORDER BY inside OVER controls the order for functions like ROW_NUMBER, RANK, and running totals.
Use RANK vs DENSE_RANK depending on whether you want gaps after ties.
LAG and LEAD are invaluable for time‑series and comparative analysis.
NTILE is great for segmenting data into percentiles or quartiles.

In the next section, we'll explore Common Table Expressions (CTEs) – a powerful way to organize complex queries, especially when combined with window functions.

Common Table Expressions (CTEs) – Organizing Complex Queries

A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. Think of it as a named subquery that makes your SQL more readable, reusable, and easier to debug.

Basic Syntax

WITH cte_name AS (
    -- CTE query
    SELECT ...
)
-- Main query that uses the CTE
SELECT ...
FROM cte_name;

You can also define multiple CTEs in one WITH clause by separating them with commas:

WITH 
cte1 AS (SELECT ...),
cte2 AS (SELECT ...)
SELECT ...
FROM cte1 JOIN cte2 ON ...;

CTEs are especially valuable when you need to:

Break a complex query into simple, logical steps.
Reference the same subquery multiple times in a query.
Create recursive queries (though we'll focus on non‑recursive here).

Example 1: Simple CTE – Total Spending per Customer

Suppose we want a list of customers with their total spending, but we also need to use that result in further calculations. Instead of repeating the aggregation, we can define a CTE.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;

Result (top 5 spenders):

customer_id	first_name	last_name	total_spent
2	Elizabeth	Archer	1599.98
6	Emily	King	1500.00
5	Cynthia	Lowery	1200.00
1	Karen	Molina	999.99
4	Jacob	Adams	599.97

Now customer_spending can be reused in the same query or even in subsequent CTEs.

Example 2: Chaining Multiple CTEs

You can build one CTE on top of another. This is great for step‑wise transformations.
Task: Find customers who spent more than the average spending of all customers.
We'll first compute each customer's total, then the overall average, then filter.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
),
overall_avg AS (
    SELECT AVG(total_spent) AS avg_spent
    FROM customer_spending
)
SELECT 
    cs.customer_id,
    cs.first_name,
    cs.last_name,
    cs.total_spent,
    oa.avg_spent
FROM customer_spending cs
CROSS JOIN overall_avg oa
WHERE cs.total_spent > oa.avg_spent
ORDER BY cs.total_spent DESC;

Result (sample):

customer_id	first_name	last_name	total_spent	avg_spent
2	Elizabeth	Archer	1599.98	287.65
6	Emily	King	1500.00	287.65
5	Cynthia	Lowery	1200.00	287.65
1	Karen	Molina	999.99	287.65
4	Jacob	Adams	599.97	287.65
3	Roberta	Massey	499.99	287.65

(Here the average is around 287.65 – you can see which customers are above it.)

Example 3: CTEs with Window Functions

CTEs and window functions are a match made in heaven. The CTE prepares the data, and the window function adds analytics.
Task: Rank customers by total spending and show their quartile.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS spending_rank,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;

Result (first few rows):

customer_id	first_name	last_name	total_spent	spending_rank	spending_quartile
2	Elizabeth	Archer	1599.98	1	1
6	Emily	King	1500.00	2	1
5	Cynthia	Lowery	1200.00	3	1
1	Karen	Molina	999.99	4	1
4	Jacob	Adams	599.97	5	2
3	Roberta	Massey	499.99	6	2
7	Linda	Larsen	0.00	7	3
8	Angela	Hanson	0.00	7	3
9	Whitney	Wilson	0.00	7	4
10	Angela	Atkins	75.00	10	2

Notice how RANK handles ties (customers with zero spending share rank 7) and NTILE distributes rows into four buckets as evenly as possible.

Example 4: CTEs for Readability – Multi‑Step Analysis

Imagine you need to produce a report that shows, for each product category, the top‑selling product and its sales. Without CTEs, the query could become a tangled mess of subqueries. With CTEs, it's clean:

WITH category_sales AS (
    SELECT 
        p.category,
        p.product_name,
        SUM(s.total_amount) AS product_sales
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_name
),
ranked_products AS (
    SELECT 
        category,
        product_name,
        product_sales,
        RANK() OVER (PARTITION BY category ORDER BY product_sales DESC) AS rank
    FROM category_sales
)
SELECT 
    category,
    product_name,
    product_sales
FROM ranked_products
WHERE rank = 1;

Result:

category	product_name	product_sales
Appliances	Refrigerator	1200.00
Electronics	Smart TV	1500.00
Accessories	Headphones	599.97

(Assuming only a few sales; your actual data may vary.)

Key Takeaways

CTEs improve readability by letting you name intermediate results.
They are not materialized (in most databases) – they're just syntax, but they make complex queries easier to write and debug.
You can use them with joins, window functions, and aggregations.
Multiple CTEs can be chained to build up a solution step by step.
They are essential for recursive queries (though we didn't cover that here).

Now that you've seen how CTEs can organize even the most intricate logic, you're ready to tackle advanced analytical queries that combine all these concepts.

In the final section, we'll bring everything together with some real‑world analytical problems that use joins, window functions, and CTEs in harmony.

Putting It All Together – Real‑World Analytical Problems

Now that we've covered joins, window functions, and CTEs individually, let's combine them to solve realistic business questions. These examples show how these tools work in harmony to extract valuable insights from our e‑commerce data.

Example 1: Top 3 Customers by Spending in Each Membership Tier

Business question: For each membership tier (Bronze, Silver, Gold), who are the top 3 customers by total spending?

Approach:

CTE to compute total spending per customer, including their membership status.
Window function RANK() partitioned by membership status to rank customers within their tier.
Filter to keep only ranks 1–3.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        c.membership_status,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.membership_status
),
ranked_customers AS (
    SELECT 
        customer_id,
        first_name,
        last_name,
        membership_status,
        total_spent,
        RANK() OVER (PARTITION BY membership_status ORDER BY total_spent DESC) AS rank_in_tier
    FROM customer_spending
)
SELECT 
    membership_status,
    rank_in_tier,
    customer_id,
    first_name,
    last_name,
    total_spent
FROM ranked_customers
WHERE rank_in_tier <= 3
ORDER BY membership_status, rank_in_tier;

Result (sample from our data):

membership_status	rank_in_tier	customer_id	first_name	last_name	total_spent
Bronze	1	1	Karen	Molina	999.99
Bronze	2	3	Roberta	Massey	499.99
Bronze	3	8	Angela	Hanson	0.00
Gold	1	4	Jacob	Adams	599.97
Gold	2	17	James	Rodriguez	350.00
Gold	3	18	Steven	Burnett	180.00
Silver	1	2	Elizabeth	Archer	1599.98
Silver	2	5	Cynthia	Lowery	1200.00
Silver	3	6	Emily	King	1500.00

This query combines a LEFT JOIN (to include customers with no purchases), a CTE for the base aggregation, a window function for ranking, and finally a filter on the rank. The result gives us a quick view of the best customers in each tier.

Example 2: Product Performance vs. Category Average

Business question: For each product, show its total sales and how much it differs from the average sales of products in the same category. Which products are outperforming their category?

Approach:

First CTE computes total sales per product with category.
Second CTE computes average sales per category.
Main query joins the two and calculates the difference.

WITH product_sales AS (
    SELECT 
        p.category,
        p.product_id,
        p.product_name,
        COALESCE(SUM(s.total_amount), 0) AS product_total
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_id, p.product_name
),
category_avg AS (
    SELECT 
        category,
        AVG(product_total) AS avg_in_category
    FROM product_sales
    GROUP BY category
)
SELECT 
    ps.category,
    ps.product_name,
    ps.product_total,
    ca.avg_in_category,
    ps.product_total - ca.avg_in_category AS difference_from_avg
FROM product_sales ps
JOIN category_avg ca ON ps.category = ca.category
ORDER BY ps.category, difference_from_avg DESC;

Result (sample):

category	product_name	product_total	avg_in_category	difference_from_avg
Accessories	Headphones	599.97	99.99	499.98
Accessories	Wireless Mouse	0.00	99.99	-99.99
Accessories	Keyboard	0.00	99.99	-99.99
Appliances	Refrigerator	1200.00	244.99	955.01
Appliances	Washing Machine	499.99	244.99	255.00
Appliances	Microwave	180.00	244.99	-64.99
Appliances	Blender	0.00	244.99	-244.99
Electronics	Smart TV	1500.00	424.99	1075.01
Electronics	Laptop	999.99	424.99	575.00
Electronics	Smartphone	1599.98	424.99	1174.99
Electronics	Gaming Console	350.00	424.99	-74.99
Electronics	Monitor	0.00	424.99	-424.99
Electronics	External Hard Drive	0.00	424.99	-424.99
Electronics	Tablet	0.00	424.99	-424.99
Electronics	Smartwatch	0.00	424.99	-424.99

Here, joins link products to sales, CTEs structure the intermediate calculations, and the final ORDER BY highlights top performers. We can immediately see which products are stars and which are lagging in their category.

Example 3: Customers Who Made Consecutive Purchases Within 7 Days

Business question: Identify customers who have made purchases within 7 days of a previous purchase – a sign of engaged shoppers.

Approach:

Use LAG() window function to get the previous purchase date for each customer.
Compute the date difference.
Filter where the gap is ≤ 7 days.

WITH customer_purchases AS (
    SELECT 
        customer_id,
        sale_date,
        LAG(sale_date) OVER (PARTITION BY customer_id ORDER BY sale_date) AS prev_purchase_date
    FROM sales
)
SELECT DISTINCT
    c.customer_id,
    c.first_name,
    c.last_name,
    cp.sale_date AS purchase_date,
    cp.prev_purchase_date,
    (cp.sale_date - cp.prev_purchase_date) AS days_between
FROM customer_purchases cp
JOIN customers c ON cp.customer_id = c.customer_id
WHERE cp.prev_purchase_date IS NOT NULL
  AND (cp.sale_date - cp.prev_purchase_date) <= 7
ORDER BY c.customer_id, cp.sale_date;

Result (In respect to our data)

customer_id	first_name	last_name	purchase_date	prev_purchase_date	days_between
1	Karen	Molina	2023-07-15	2023-07-10	5
5	Cynthia	Lowery	2023-06-18	2023-06-12	6

This query uses a window function (LAG) within a CTE, then joins with the customers table to get names, and finally filters. It demonstrates how window functions can identify sequential patterns.

Example 4: Monthly Sales Trends – Running Total and Month‑over‑Month Change

Business question: Show monthly sales totals, a running total, and the percentage change from the previous month.

Approach:

CTE to aggregate sales by month.
Window functions for running total (SUM() OVER) and previous month's sales (LAG).
Compute the change.

WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date)::DATE AS month,
        SUM(total_amount) AS monthly_total
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT 
    month,
    monthly_total,
    SUM(monthly_total) OVER (ORDER BY month) AS running_total,
    LAG(monthly_total) OVER (ORDER BY month) AS prev_month_total,
    CASE 
        WHEN LAG(monthly_total) OVER (ORDER BY month) IS NOT NULL 
        THEN ROUND(100.0 * (monthly_total - LAG(monthly_total) OVER (ORDER BY month)) / LAG(monthly_total) OVER (ORDER BY month), 2)
        ELSE NULL 
    END AS pct_change
FROM monthly_sales
ORDER BY month;

Result;

month	monthly_total	running_total	prev_month_total	pct_change
2023-06-01	1200.00	1200.00	NULL	NULL
2023-07-01	1599.96	2799.96	1200.00	33.33
2023-08-01	1599.98	4399.94	1599.96	0.00
2023-09-01	499.99	4899.93	1599.98	-68.75

Here, the CTE aggregates data, and window functions add running total and previous‑month comparison. This gives a concise view of sales momentum.

Conclusion

Throughout this article, we've explored three powerful SQL concepts—joins, window functions, and CTEs—using a practical e‑commerce dataset. As a data professional, knowing when and how to apply these tools is essential for building efficient, readable, and maintainable queries.

Here's a quick reference guide based on what we've learned:

Use Window Functions When:

You need to perform calculations across rows without collapsing the result set (e.g., running totals, moving averages).
You're working with rankings or percentiles (e.g., top customers by spending, quartiles).
You need to compare values between rows (e.g., previous sale amount, month‑over‑month change).
Your analysis requires retaining detail while still gaining aggregate insights.

Use CTEs When:

You're breaking down a complex query into simpler, logical steps.
The same intermediate result needs to be referenced multiple times in a query.
You want to improve readability and make your code easier to debug.
You're preparing data for window functions or further transformations.

Use Joins When:

You need to combine data from multiple related tables (e.g., customers with their orders).
Your database is normalized and information is spread across tables.
You're building reporting datasets that require fields from different sources.
You need to understand relationships between entities in your data.

By mastering these advanced SQL concepts, you're not just writing queries—you're building a toolkit for solving real business problems. Whether it's identifying top‑performing products, understanding customer behavior, or tracking sales trends over time, the techniques you've learned here will serve you across industries and datasets.

Data Analysis Step by Step;

1st Read: Git & Github Beginner's guide

If you’re also learning version control with Git, you can read my Git & GitHub beginner’s guide here:
https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952

2nd Read: Mastering Excel

After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:
https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn

3rd Read: Data Modelling & Schemas

This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.
https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l

4th Read: Data Analysis Steps in Power BI

This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.
https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6

5th Read: From Tables to Insights – A SQL Masterclass

Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.
https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg

Repo

https://github.com/Charles-Ndungu/excel-for-data-analytics