DEV Community

Cover image for Mastering SQL Joins and Window Functions: A Practical Guide with an E‑commerce Dataset
Charles
Charles

Posted on

Mastering SQL Joins and Window Functions: A Practical Guide with an E‑commerce Dataset

Introduction & The Dataset

If you're working with SQL, you've probably used basic SELECT statements, filtered with WHERE, and maybe even done some simple aggregations. But the real power of SQL unfolds when you start combining data from multiple tables and performing complex calculations across rows without losing detail. That's where JOINs and Window Functions come in.

In this article, we'll dive deep into these advanced concepts using a realistic e‑commerce dataset.
By the end, you'll be able to:

  • Combine data from multiple tables using different types of joins.
  • Perform rankings, running totals, and comparisons across rows with window functions.
  • Organize complex queries using Common Table Expressions (CTEs).
  • Solve practical business questions that real data analysts face every day.

We'll use a consistent dataset throughout – so you can follow along, try the queries yourself, and truly master these techniques.

Meet Our Data

Imagine we're running an online store. Our database has three main tables:

  • customers – people who registered on our site.
  • products – items we sell, with prices and stock.
  • sales – records of purchases, linking customers to products.

Here are the table structures (simplified for clarity) and a few sample rows.

1. Customers

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone_number VARCHAR(50),
    registration_date DATE,
    membership_status VARCHAR(10)  -- 'Bronze', 'Silver', 'Gold'
);
Enter fullscreen mode Exit fullscreen mode
customer_id first_name last_name email phone_number registration_date membership_status
1 Karen Molina gonzalezkimberly@glass.com (728)697-1206 2020-08-27 Bronze
2 Elizabeth Archer tramirez@gmail.com 778.104.6553 2023-08-28 Silver
3 Roberta Massey davislori@gmail.com +1-365-606-7458 2024-06-12 Bronze

2. Products

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    supplier VARCHAR(100),
    stock_quantity INT
);
Enter fullscreen mode Exit fullscreen mode
product_id product_name category price supplier stock_quantity
1 Laptop Electronics 999.99 Dell 50
2 Smartphone Electronics 799.99 Samsung 150
3 Washing Machine Appliances 499.99 LG 30
4 Headphones Accessories 199.99 Sony 100

3. Sales

CREATE TABLE sales (
    sale_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity_sold INT,
    sale_date DATE,
    total_amount DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Enter fullscreen mode Exit fullscreen mode
sale_id customer_id product_id quantity_sold sale_date total_amount
1 1 1 1 2023-07-15 999.99
2 2 2 2 2023-08-20 1599.98
3 3 3 1 2023-09-10 499.99
4 4 4 3 2023-07-25 599.97

What We'll Build

We'll use these tables to answer questions like:

  • "Which customers have never placed an order?"
  • "What is the total revenue per product category?"
  • "Who are the top 3 customers by spending, and how do they rank within their membership tier?"
  • "How does each sale compare to the previous sale for the same customer?"

By the time you finish reading, you'll be able to write queries that combine these tables and apply window functions like a pro.

SQL Joins – Bringing Data Together

In a well‑designed database, data is split across multiple tables to avoid redundancy. For example, we store customer details in one table, product information in another, and sales transactions in a third. To answer real‑world questions, we need to join these tables back together.

INNER JOIN

An INNER JOIN returns only the rows that have matching values in both tables. Rows without a match are excluded.

Use case: Find all customers who have placed orders, along with their order details.

Example 1: List customers and their orders

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id first_name last_name sale_id sale_date total_amount
1 Karen Molina 1 2023-07-15 999.99
2 Elizabeth Archer 2 2023-08-20 1599.98
3 Roberta Massey 3 2023-09-10 499.99
4 Jacob Adams 4 2023-07-25 599.97
5 Cynthia Lowery 5 2023-06-18 1200.00

What happened?

The INNER JOIN combined rows from customers and sales wherever customer_id matched. Customers who have never placed an order (like some of our 50 customers) do not appear in the result.

Example 2: Add product details to the orders
We can join more than two tables. Let's include the product name and price.

SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    s.quantity_sold,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id
INNER JOIN products p ON s.product_id = p.product_id;
Enter fullscreen mode Exit fullscreen mode

Result:

first_name last_name sale_date product_name quantity_sold total_amount
Karen Molina 2023-07-15 Laptop 1 999.99
Elizabeth Archer 2023-08-20 Smartphone 2 1599.98
Roberta Massey 2023-09-10 Washing Machine 1 499.99
Jacob Adams 2023-07-25 Headphones 3 599.97
Cynthia Lowery 2023-06-18 Refrigerator 1 1200.00

Key point: Each INNER JOIN adds another table, filtering out rows that don't match the join condition. Only orders with valid customer and product IDs appear.

LEFT JOIN (or LEFT OUTER JOIN)

A LEFT JOIN returns all rows from the left table, and matching rows from the right table. If there's no match, the right‑table columns are filled with NULL.

Use case: Show all customers, even those who have never placed an order, and include any orders they may have.

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
ORDER BY c.customer_id;
Enter fullscreen mode Exit fullscreen mode

Result (first few rows):

customer_id first_name last_name sale_id sale_date total_amount
1 Karen Molina 1 2023-07-15 999.99
2 Elizabeth Archer 2 2023-08-20 1599.98
3 Roberta Massey 3 2023-09-10 499.99
4 Jacob Adams 4 2023-07-25 599.97
5 Cynthia Lowery 5 2023-06-18 1200.00
6 Emily King NULL NULL NULL
7 Linda Larsen NULL NULL NULL

Notice: Customers 6 and 7 (Emily King, Linda Larsen) have no sales, so the sale_id, sale_date, and total_amount are NULL. This is a great way to find customers who haven't purchased anything.

RIGHT JOIN

A RIGHT JOIN is the mirror of LEFT JOIN: it returns all rows from the right table, and matching rows from the left. Unmatched left‑table columns become NULL.
Because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order, many developers rarely use it. But it's good to know it exists.

Use case: List all products and show which ones have been sold (if any).

SELECT 
    p.product_id,
    p.product_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM sales s
RIGHT JOIN products p ON s.product_id = p.product_id
ORDER BY p.product_id;
Enter fullscreen mode Exit fullscreen mode

Result (first few rows):

product_id product_name sale_id sale_date total_amount
1 Laptop 1 2023-07-15 999.99
2 Smartphone 2 2023-08-20 1599.98
3 Washing Machine 3 2023-09-10 499.99
4 Headphones 4 2023-07-25 599.97
5 Refrigerator 5 2023-06-18 1200.00
6 Smart TV NULL NULL NULL
7 Microwave NULL NULL NULL

Products 6 (Smart TV) and 7 (Microwave) haven't been sold – their sale details are NULL.

FULL OUTER JOIN

FULL OUTER JOIN returns all rows from both tables, matching where possible. Unmatched rows from either side are filled with NULL. It's like a combination of LEFT and RIGHT joins.

Use case: See every customer and every product side by side – which is not very common in practice, but useful for finding orphans.

SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    p.product_id,
    p.product_name,
    s.sale_id
FROM customers c
FULL OUTER JOIN sales s ON c.customer_id = s.customer_id
FULL OUTER JOIN products p ON s.product_id = p.product_id
LIMIT 10;  -- just to keep the output manageable
Enter fullscreen mode Exit fullscreen mode

Result (truncated):
You'll get a mix of customers without sales, products without sales, and actual transactions. Many NULL cells appear.

Because FULL OUTER JOIN can produce huge result sets, it's used less often, but it's indispensable for certain data reconciliation tasks.

CROSS JOIN

A CROSS JOIN returns the Cartesian product of two tables – every row from the first table paired with every row from the second. There's no ON condition.

Use case: Generate all possible combinations, e.g., for creating product‑customer promotional lists.

SELECT 
    c.customer_id,
    c.first_name,
    p.product_id,
    p.product_name
FROM customers c
CROSS JOIN products p
WHERE c.customer_id <= 3 AND p.product_id <= 3  -- limit for readability
ORDER BY c.customer_id, p.product_id;
Enter fullscreen mode Exit fullscreen mode

Result:

customer_id first_name product_id product_name
1 Karen 1 Laptop
1 Karen 2 Smartphone
1 Karen 3 Washing Machine
2 Elizabeth 1 Laptop
2 Elizabeth 2 Smartphone
2 Elizabeth 3 Washing Machine
3 Roberta 1 Laptop
3 Roberta 2 Smartphone
3 Roberta 3 Washing Machine

Every combination of the selected customers and products appears. This is handy for generating base data for analysis.

SELF JOIN

A SELF JOIN is when a table is joined with itself. It's useful for hierarchical data or comparing rows within the same table.

Use case: Find pairs of customers who have the same membership status.

SELECT 
    c1.customer_id AS customer1,
    c1.first_name AS first1,
    c1.last_name AS last1,
    c2.customer_id AS customer2,
    c2.first_name AS first2,
    c2.last_name AS last2,
    c1.membership_status
FROM customers c1
JOIN customers c2 ON c1.membership_status = c2.membership_status
WHERE c1.customer_id < c2.customer_id
ORDER BY c1.membership_status, customer1;
Enter fullscreen mode Exit fullscreen mode

Result (few rows):

customer1 first1 last1 customer2 first2 last2 membership_status
1 Karen Molina 3 Roberta Massey Bronze
1 Karen Molina 8 Angela Hanson Bronze
1 Karen Molina 9 Whitney Wilson Bronze
2 Elizabeth Archer 5 Cynthia Lowery Silver

We use c1.customer_id < c2.customer_id to avoid duplicate pairs (e.g., (1,3) and (3,1)) and to exclude a customer paired with themselves.

Joining Multiple Tables – Putting It All Together

Often you need to join more than two tables. Here's a query that combines customers, sales, and products to give a complete picture of each transaction.

SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    p.category,
    s.quantity_sold,
    s.total_amount
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE c.membership_status = 'Gold'   -- only Gold members
ORDER BY s.sale_date;
Enter fullscreen mode Exit fullscreen mode

Result (few rows):

first_name last_name sale_date product_name category quantity_sold total_amount
James Rodriguez 2023-07-15 Gaming Console Electronics 1 350.00
Steven Burnett 2023-08-01 Microwave Appliances 1 180.00

This kind of multi‑table join is the backbone of reporting and analytics.

Now that you've seen all join types, you're ready to combine data from any set of related tables. In the next section, we'll explore Window Functions – a whole new level of analytical power.

Window Functions – Analyzing Data Without Losing Detail

Window functions are a game‑changer. Unlike regular aggregate functions (GROUP BY) that collapse multiple rows into one, window functions perform calculations across a set of rows related to the current row while keeping each row intact. This makes them perfect for rankings, running totals, moving averages, and comparing values between rows.

Basic Syntax

function_name (expression) OVER (
    [PARTITION BY column(s)]
    [ORDER BY column(s)]
    [frame_clause]
)
Enter fullscreen mode Exit fullscreen mode
  • function_name: e.g., ROW_NUMBER(), RANK(), SUM(), LAG()
  • PARTITION BY: divides rows into groups (optional) – the function is applied separately to each group.
  • ORDER BY: defines the order within each partition.
  • frame_clause: specifies which rows to include (for aggregate window functions).

We'll use our sales table, sometimes joined with customers or products, to illustrate.

1. ROW_NUMBER() – Unique Sequential Number

Assigns a unique integer to each row within a partition, starting at 1. Ties are broken arbitrarily.

Example: Number all sales by date

SELECT 
    sale_id,
    customer_id,
    sale_date,
    total_amount,
    ROW_NUMBER() OVER (ORDER BY sale_date) AS row_num
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Result:

sale_id customer_id sale_date total_amount row_num
5 5 2023-06-18 1200.00 1
1 1 2023-07-15 999.99 2
4 4 2023-07-25 599.97 3
2 2 2023-08-20 1599.98 4
3 3 2023-09-10 499.99 5

Use case: Pagination, assigning a unique identifier to each row in a result set.

2. RANK() and DENSE_RANK() – Ranking with Ties

  • RANK() gives the same rank to equal values, then skips the next rank(s) (e.g., 1,1,3).
  • DENSE_RANK() also gives the same rank to ties, but does not skip numbers (e.g., 1,1,2).

Example: Rank customers by total spending (including those with no purchases).

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 10;  -- just show top 10 for brevity
Enter fullscreen mode Exit fullscreen mode

Result (first few rows):

customer_id first_name last_name total_spent rank dense_rank
2 Elizabeth Archer 1599.98 1 1
6 Emily King 1500.00 2 2
1 Karen Molina 999.99 3 3
5 Cynthia Lowery 1200.00 4 4
4 Jacob Adams 599.97 5 5
3 Roberta Massey 499.99 6 6
7 Linda Larsen 0.00 7 7
8 Angela Hanson 0.00 7 7
9 Whitney Wilson 0.00 7 7

Notice how RANK and DENSE_RANK handle ties differently: when multiple customers have zero spending, RANK skips from 7 to 10 (depending on how many zeros), while DENSE_RANK assigns the same rank to all zeros and then increments by 1.

3. SUM() as a Window Function – Running Totals

Using aggregate functions like SUM with OVER allows you to compute a running total without collapsing rows.

Example: Calculate a running total of sales amounts ordered by date.

SELECT 
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;
Enter fullscreen mode Exit fullscreen mode

Result:

sale_id sale_date total_amount running_total
5 2023-06-18 1200.00 1200.00
1 2023-07-15 999.99 2199.99
4 2023-07-25 599.97 2799.96
2 2023-08-20 1599.98 4399.94
3 2023-09-10 499.99 4899.93

Each row's running_total is the sum of all total_amount from the earliest sale up to that row.

With partitioning: You can reset the running total for each customer.

SELECT 
    customer_id,
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY sale_date) AS customer_running_total
FROM sales
ORDER BY customer_id, sale_date;
Enter fullscreen mode Exit fullscreen mode

This would show, for each customer, the cumulative amount they've spent over time.

4. LAG() and LEAD() – Accessing Other Rows

LAG() lets you access data from a previous row, while LEAD() accesses a following row – great for comparing values.

Example: For each sale, show the previous sale amount (by date).

SELECT 
    sale_id,
    sale_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY sale_date) AS prev_sale_amount,
    total_amount - LAG(total_amount) OVER (ORDER BY sale_date) AS difference_from_prev
FROM sales
ORDER BY sale_date;
Enter fullscreen mode Exit fullscreen mode

Result:

sale_id sale_date total_amount prev_sale_amount difference_from_prev
5 2023-06-18 1200.00 NULL NULL
1 2023-07-15 999.99 1200.00 -200.01
4 2023-07-25 599.97 999.99 -400.02
2 2023-08-20 1599.98 599.97 1000.01
3 2023-09-10 499.99 1599.98 -1099.99

The first row has NULL because there's no previous sale.

5. NTILE() – Dividing into Buckets

NTILE(n) divides rows into n roughly equal groups (buckets) based on the ordering.

Example: Divide customers into 4 quartiles based on total spending.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
Enter fullscreen mode Exit fullscreen mode

Result (sample):

customer_id first_name last_name total_spent spending_quartile
2 Elizabeth Archer 1599.98 1
6 Emily King 1500.00 1
1 Karen Molina 999.99 1
5 Cynthia Lowery 1200.00 1
4 Jacob Adams 599.97 2
3 Roberta Massey 499.99 2
7 Linda Larsen 0.00 3
8 Angela Hanson 0.00 3
9 Whitney Wilson 0.00 4
10 Angela Atkins 75.00 2

Quartile 1 contains the top 25% of spenders, quartile 4 the bottom 25%.

6. Window Functions with PARTITION BY – Per‑Group Rankings

PARTITION BY restricts the window to rows with the same value in a column.

Example: Rank products by price within each category.

SELECT 
    product_id,
    product_name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank_in_category
FROM products
ORDER BY category, price_rank_in_category;
Enter fullscreen mode Exit fullscreen mode

Result (first few rows):

product_id product_name category price price_rank_in_category
5 Refrigerator Appliances 1200.00 1
3 Washing Machine Appliances 499.99 2
7 Microwave Appliances 180.00 3
8 Blender Appliances 50.00 4
6 Smart TV Electronics 1500.00 1
1 Laptop Electronics 999.99 2
2 Smartphone Electronics 799.99 3
9 Gaming Console Electronics 350.00 4
12 Monitor Electronics 250.00 5

Now you can see the most expensive product in each category at a glance.

Key Takeaways

Window functions do not collapse rows – they add calculated columns while preserving detail.

  • PARTITION BY is like GROUP BY within the window.
  • ORDER BY inside OVER controls the order for functions like ROW_NUMBER, RANK, and running totals.
  • Use RANK vs DENSE_RANK depending on whether you want gaps after ties.
  • LAG and LEAD are invaluable for time‑series and comparative analysis.
  • NTILE is great for segmenting data into percentiles or quartiles.

In the next section, we'll explore Common Table Expressions (CTEs) – a powerful way to organize complex queries, especially when combined with window functions.

Common Table Expressions (CTEs) – Organizing Complex Queries

A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. Think of it as a named subquery that makes your SQL more readable, reusable, and easier to debug.

Basic Syntax

WITH cte_name AS (
    -- CTE query
    SELECT ...
)
-- Main query that uses the CTE
SELECT ...
FROM cte_name;
Enter fullscreen mode Exit fullscreen mode

You can also define multiple CTEs in one WITH clause by separating them with commas:

WITH 
cte1 AS (SELECT ...),
cte2 AS (SELECT ...)
SELECT ...
FROM cte1 JOIN cte2 ON ...;
Enter fullscreen mode Exit fullscreen mode

CTEs are especially valuable when you need to:

  • Break a complex query into simple, logical steps.
  • Reference the same subquery multiple times in a query.
  • Create recursive queries (though we'll focus on non‑recursive here).

Example 1: Simple CTE – Total Spending per Customer

Suppose we want a list of customers with their total spending, but we also need to use that result in further calculations. Instead of repeating the aggregation, we can define a CTE.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Result (top 5 spenders):

customer_id first_name last_name total_spent
2 Elizabeth Archer 1599.98
6 Emily King 1500.00
5 Cynthia Lowery 1200.00
1 Karen Molina 999.99
4 Jacob Adams 599.97

Now customer_spending can be reused in the same query or even in subsequent CTEs.

Example 2: Chaining Multiple CTEs

You can build one CTE on top of another. This is great for step‑wise transformations.
Task: Find customers who spent more than the average spending of all customers.
We'll first compute each customer's total, then the overall average, then filter.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
),
overall_avg AS (
    SELECT AVG(total_spent) AS avg_spent
    FROM customer_spending
)
SELECT 
    cs.customer_id,
    cs.first_name,
    cs.last_name,
    cs.total_spent,
    oa.avg_spent
FROM customer_spending cs
CROSS JOIN overall_avg oa
WHERE cs.total_spent > oa.avg_spent
ORDER BY cs.total_spent DESC;
Enter fullscreen mode Exit fullscreen mode

Result (sample):

customer_id first_name last_name total_spent avg_spent
2 Elizabeth Archer 1599.98 287.65
6 Emily King 1500.00 287.65
5 Cynthia Lowery 1200.00 287.65
1 Karen Molina 999.99 287.65
4 Jacob Adams 599.97 287.65
3 Roberta Massey 499.99 287.65

(Here the average is around 287.65 – you can see which customers are above it.)

Example 3: CTEs with Window Functions

CTEs and window functions are a match made in heaven. The CTE prepares the data, and the window function adds analytics.
Task: Rank customers by total spending and show their quartile.

WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS spending_rank,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
Enter fullscreen mode Exit fullscreen mode

Result (first few rows):

customer_id first_name last_name total_spent spending_rank spending_quartile
2 Elizabeth Archer 1599.98 1 1
6 Emily King 1500.00 2 1
5 Cynthia Lowery 1200.00 3 1
1 Karen Molina 999.99 4 1
4 Jacob Adams 599.97 5 2
3 Roberta Massey 499.99 6 2
7 Linda Larsen 0.00 7 3
8 Angela Hanson 0.00 7 3
9 Whitney Wilson 0.00 7 4
10 Angela Atkins 75.00 10 2

Notice how RANK handles ties (customers with zero spending share rank 7) and NTILE distributes rows into four buckets as evenly as possible.

Example 4: CTEs for Readability – Multi‑Step Analysis

Imagine you need to produce a report that shows, for each product category, the top‑selling product and its sales. Without CTEs, the query could become a tangled mess of subqueries. With CTEs, it's clean:

WITH category_sales AS (
    SELECT 
        p.category,
        p.product_name,
        SUM(s.total_amount) AS product_sales
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_name
),
ranked_products AS (
    SELECT 
        category,
        product_name,
        product_sales,
        RANK() OVER (PARTITION BY category ORDER BY product_sales DESC) AS rank
    FROM category_sales
)
SELECT 
    category,
    product_name,
    product_sales
FROM ranked_products
WHERE rank = 1;
Enter fullscreen mode Exit fullscreen mode

Result:

category product_name product_sales
Appliances Refrigerator 1200.00
Electronics Smart TV 1500.00
Accessories Headphones 599.97

(Assuming only a few sales; your actual data may vary.)

Key Takeaways

  • CTEs improve readability by letting you name intermediate results.
  • They are not materialized (in most databases) – they're just syntax, but they make complex queries easier to write and debug.
  • You can use them with joins, window functions, and aggregations.
  • Multiple CTEs can be chained to build up a solution step by step.
  • They are essential for recursive queries (though we didn't cover that here).

Now that you've seen how CTEs can organize even the most intricate logic, you're ready to tackle advanced analytical queries that combine all these concepts.

In the final section, we'll bring everything together with some real‑world analytical problems that use joins, window functions, and CTEs in harmony.

Putting It All Together – Real‑World Analytical Problems

Now that we've covered joins, window functions, and CTEs individually, let's combine them to solve realistic business questions. These examples show how these tools work in harmony to extract valuable insights from our e‑commerce data.

Example 1: Top 3 Customers by Spending in Each Membership Tier

Business question: For each membership tier (Bronze, Silver, Gold), who are the top 3 customers by total spending?

Approach:

  1. CTE to compute total spending per customer, including their membership status.
  2. Window function RANK() partitioned by membership status to rank customers within their tier.
  3. Filter to keep only ranks 1–3.
WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        c.membership_status,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.membership_status
),
ranked_customers AS (
    SELECT 
        customer_id,
        first_name,
        last_name,
        membership_status,
        total_spent,
        RANK() OVER (PARTITION BY membership_status ORDER BY total_spent DESC) AS rank_in_tier
    FROM customer_spending
)
SELECT 
    membership_status,
    rank_in_tier,
    customer_id,
    first_name,
    last_name,
    total_spent
FROM ranked_customers
WHERE rank_in_tier <= 3
ORDER BY membership_status, rank_in_tier;
Enter fullscreen mode Exit fullscreen mode

Result (sample from our data):

membership_status rank_in_tier customer_id first_name last_name total_spent
Bronze 1 1 Karen Molina 999.99
Bronze 2 3 Roberta Massey 499.99
Bronze 3 8 Angela Hanson 0.00
Gold 1 4 Jacob Adams 599.97
Gold 2 17 James Rodriguez 350.00
Gold 3 18 Steven Burnett 180.00
Silver 1 2 Elizabeth Archer 1599.98
Silver 2 5 Cynthia Lowery 1200.00
Silver 3 6 Emily King 1500.00

This query combines a LEFT JOIN (to include customers with no purchases), a CTE for the base aggregation, a window function for ranking, and finally a filter on the rank. The result gives us a quick view of the best customers in each tier.

Example 2: Product Performance vs. Category Average

Business question: For each product, show its total sales and how much it differs from the average sales of products in the same category. Which products are outperforming their category?

Approach:

  1. First CTE computes total sales per product with category.
  2. Second CTE computes average sales per category.
  3. Main query joins the two and calculates the difference.
WITH product_sales AS (
    SELECT 
        p.category,
        p.product_id,
        p.product_name,
        COALESCE(SUM(s.total_amount), 0) AS product_total
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_id, p.product_name
),
category_avg AS (
    SELECT 
        category,
        AVG(product_total) AS avg_in_category
    FROM product_sales
    GROUP BY category
)
SELECT 
    ps.category,
    ps.product_name,
    ps.product_total,
    ca.avg_in_category,
    ps.product_total - ca.avg_in_category AS difference_from_avg
FROM product_sales ps
JOIN category_avg ca ON ps.category = ca.category
ORDER BY ps.category, difference_from_avg DESC;
Enter fullscreen mode Exit fullscreen mode

Result (sample):

category product_name product_total avg_in_category difference_from_avg
Accessories Headphones 599.97 99.99 499.98
Accessories Wireless Mouse 0.00 99.99 -99.99
Accessories Keyboard 0.00 99.99 -99.99
Appliances Refrigerator 1200.00 244.99 955.01
Appliances Washing Machine 499.99 244.99 255.00
Appliances Microwave 180.00 244.99 -64.99
Appliances Blender 0.00 244.99 -244.99
Electronics Smart TV 1500.00 424.99 1075.01
Electronics Laptop 999.99 424.99 575.00
Electronics Smartphone 1599.98 424.99 1174.99
Electronics Gaming Console 350.00 424.99 -74.99
Electronics Monitor 0.00 424.99 -424.99
Electronics External Hard Drive 0.00 424.99 -424.99
Electronics Tablet 0.00 424.99 -424.99
Electronics Smartwatch 0.00 424.99 -424.99

Here, joins link products to sales, CTEs structure the intermediate calculations, and the final ORDER BY highlights top performers. We can immediately see which products are stars and which are lagging in their category.

Example 3: Customers Who Made Consecutive Purchases Within 7 Days

Business question: Identify customers who have made purchases within 7 days of a previous purchase – a sign of engaged shoppers.

Approach:

  1. Use LAG() window function to get the previous purchase date for each customer.
  2. Compute the date difference.
  3. Filter where the gap is ≤ 7 days.
WITH customer_purchases AS (
    SELECT 
        customer_id,
        sale_date,
        LAG(sale_date) OVER (PARTITION BY customer_id ORDER BY sale_date) AS prev_purchase_date
    FROM sales
)
SELECT DISTINCT
    c.customer_id,
    c.first_name,
    c.last_name,
    cp.sale_date AS purchase_date,
    cp.prev_purchase_date,
    (cp.sale_date - cp.prev_purchase_date) AS days_between
FROM customer_purchases cp
JOIN customers c ON cp.customer_id = c.customer_id
WHERE cp.prev_purchase_date IS NOT NULL
  AND (cp.sale_date - cp.prev_purchase_date) <= 7
ORDER BY c.customer_id, cp.sale_date;
Enter fullscreen mode Exit fullscreen mode

Result (In respect to our data)

customer_id first_name last_name purchase_date prev_purchase_date days_between
1 Karen Molina 2023-07-15 2023-07-10 5
5 Cynthia Lowery 2023-06-18 2023-06-12 6

This query uses a window function (LAG) within a CTE, then joins with the customers table to get names, and finally filters. It demonstrates how window functions can identify sequential patterns.

Example 4: Monthly Sales Trends – Running Total and Month‑over‑Month Change

Business question: Show monthly sales totals, a running total, and the percentage change from the previous month.

Approach:

  1. CTE to aggregate sales by month.
  2. Window functions for running total (SUM() OVER) and previous month's sales (LAG).
  3. Compute the change.
WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date)::DATE AS month,
        SUM(total_amount) AS monthly_total
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT 
    month,
    monthly_total,
    SUM(monthly_total) OVER (ORDER BY month) AS running_total,
    LAG(monthly_total) OVER (ORDER BY month) AS prev_month_total,
    CASE 
        WHEN LAG(monthly_total) OVER (ORDER BY month) IS NOT NULL 
        THEN ROUND(100.0 * (monthly_total - LAG(monthly_total) OVER (ORDER BY month)) / LAG(monthly_total) OVER (ORDER BY month), 2)
        ELSE NULL 
    END AS pct_change
FROM monthly_sales
ORDER BY month;
Enter fullscreen mode Exit fullscreen mode

Result;

month monthly_total running_total prev_month_total pct_change
2023-06-01 1200.00 1200.00 NULL NULL
2023-07-01 1599.96 2799.96 1200.00 33.33
2023-08-01 1599.98 4399.94 1599.96 0.00
2023-09-01 499.99 4899.93 1599.98 -68.75

Here, the CTE aggregates data, and window functions add running total and previous‑month comparison. This gives a concise view of sales momentum.

Conclusion

Throughout this article, we've explored three powerful SQL concepts—joins, window functions, and CTEs—using a practical e‑commerce dataset. As a data professional, knowing when and how to apply these tools is essential for building efficient, readable, and maintainable queries.

Here's a quick reference guide based on what we've learned:

Use Window Functions When:

  • You need to perform calculations across rows without collapsing the result set (e.g., running totals, moving averages).
  • You're working with rankings or percentiles (e.g., top customers by spending, quartiles).
  • You need to compare values between rows (e.g., previous sale amount, month‑over‑month change).
  • Your analysis requires retaining detail while still gaining aggregate insights.

Use CTEs When:

  • You're breaking down a complex query into simpler, logical steps.
  • The same intermediate result needs to be referenced multiple times in a query.
  • You want to improve readability and make your code easier to debug.
  • You're preparing data for window functions or further transformations.

Use Joins When:

  • You need to combine data from multiple related tables (e.g., customers with their orders).
  • Your database is normalized and information is spread across tables.
  • You're building reporting datasets that require fields from different sources.
  • You need to understand relationships between entities in your data.

By mastering these advanced SQL concepts, you're not just writing queries—you're building a toolkit for solving real business problems. Whether it's identifying top‑performing products, understanding customer behavior, or tracking sales trends over time, the techniques you've learned here will serve you across industries and datasets.

Data Analysis Step by Step;

1st Read: Git & Github Beginner's guide

If you’re also learning version control with Git, you can read my Git & GitHub beginner’s guide here:
https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952

2nd Read: Mastering Excel

After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:
https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn

3rd Read: Data Modelling & Schemas

This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.
https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l

4th Read: Data Analysis Steps in Power BI

This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.
https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6

5th Read: From Tables to Insights – A SQL Masterclass

Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.
https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg

Repo

https://github.com/Charles-Ndungu/excel-for-data-analytics

Top comments (0)