Introduction & The Dataset
If you're working with SQL, you've probably used basic SELECT statements, filtered with WHERE, and maybe even done some simple aggregations. But the real power of SQL unfolds when you start combining data from multiple tables and performing complex calculations across rows without losing detail. That's where JOINs and Window Functions come in.
In this article, we'll dive deep into these advanced concepts using a realistic e‑commerce dataset.
By the end, you'll be able to:
- Combine data from multiple tables using different types of joins.
- Perform rankings, running totals, and comparisons across rows with window functions.
- Organize complex queries using Common Table Expressions (CTEs).
- Solve practical business questions that real data analysts face every day.
We'll use a consistent dataset throughout – so you can follow along, try the queries yourself, and truly master these techniques.
Meet Our Data
Imagine we're running an online store. Our database has three main tables:
-
customers– people who registered on our site. -
products– items we sell, with prices and stock. -
sales– records of purchases, linking customers to products.
Here are the table structures (simplified for clarity) and a few sample rows.
1. Customers
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
phone_number VARCHAR(50),
registration_date DATE,
membership_status VARCHAR(10) -- 'Bronze', 'Silver', 'Gold'
);
| customer_id | first_name | last_name | phone_number | registration_date | membership_status | |
|---|---|---|---|---|---|---|
| 1 | Karen | Molina | gonzalezkimberly@glass.com | (728)697-1206 | 2020-08-27 | Bronze |
| 2 | Elizabeth | Archer | tramirez@gmail.com | 778.104.6553 | 2023-08-28 | Silver |
| 3 | Roberta | Massey | davislori@gmail.com | +1-365-606-7458 | 2024-06-12 | Bronze |
2. Products
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2),
supplier VARCHAR(100),
stock_quantity INT
);
| product_id | product_name | category | price | supplier | stock_quantity |
|---|---|---|---|---|---|
| 1 | Laptop | Electronics | 999.99 | Dell | 50 |
| 2 | Smartphone | Electronics | 799.99 | Samsung | 150 |
| 3 | Washing Machine | Appliances | 499.99 | LG | 30 |
| 4 | Headphones | Accessories | 199.99 | Sony | 100 |
3. Sales
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
quantity_sold INT,
sale_date DATE,
total_amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
| sale_id | customer_id | product_id | quantity_sold | sale_date | total_amount |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2023-07-15 | 999.99 |
| 2 | 2 | 2 | 2 | 2023-08-20 | 1599.98 |
| 3 | 3 | 3 | 1 | 2023-09-10 | 499.99 |
| 4 | 4 | 4 | 3 | 2023-07-25 | 599.97 |
What We'll Build
We'll use these tables to answer questions like:
- "Which customers have never placed an order?"
- "What is the total revenue per product category?"
- "Who are the top 3 customers by spending, and how do they rank within their membership tier?"
- "How does each sale compare to the previous sale for the same customer?"
By the time you finish reading, you'll be able to write queries that combine these tables and apply window functions like a pro.
SQL Joins – Bringing Data Together
In a well‑designed database, data is split across multiple tables to avoid redundancy. For example, we store customer details in one table, product information in another, and sales transactions in a third. To answer real‑world questions, we need to join these tables back together.
INNER JOIN
An INNER JOIN returns only the rows that have matching values in both tables. Rows without a match are excluded.
Use case: Find all customers who have placed orders, along with their order details.
Example 1: List customers and their orders
SELECT
c.customer_id,
c.first_name,
c.last_name,
s.sale_id,
s.sale_date,
s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id;
Result:
| customer_id | first_name | last_name | sale_id | sale_date | total_amount |
|---|---|---|---|---|---|
| 1 | Karen | Molina | 1 | 2023-07-15 | 999.99 |
| 2 | Elizabeth | Archer | 2 | 2023-08-20 | 1599.98 |
| 3 | Roberta | Massey | 3 | 2023-09-10 | 499.99 |
| 4 | Jacob | Adams | 4 | 2023-07-25 | 599.97 |
| 5 | Cynthia | Lowery | 5 | 2023-06-18 | 1200.00 |
What happened?
The INNER JOIN combined rows from customers and sales wherever customer_id matched. Customers who have never placed an order (like some of our 50 customers) do not appear in the result.
Example 2: Add product details to the orders
We can join more than two tables. Let's include the product name and price.
SELECT
c.first_name,
c.last_name,
s.sale_date,
p.product_name,
s.quantity_sold,
s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id
INNER JOIN products p ON s.product_id = p.product_id;
Result:
| first_name | last_name | sale_date | product_name | quantity_sold | total_amount |
|---|---|---|---|---|---|
| Karen | Molina | 2023-07-15 | Laptop | 1 | 999.99 |
| Elizabeth | Archer | 2023-08-20 | Smartphone | 2 | 1599.98 |
| Roberta | Massey | 2023-09-10 | Washing Machine | 1 | 499.99 |
| Jacob | Adams | 2023-07-25 | Headphones | 3 | 599.97 |
| Cynthia | Lowery | 2023-06-18 | Refrigerator | 1 | 1200.00 |
Key point: Each INNER JOIN adds another table, filtering out rows that don't match the join condition. Only orders with valid customer and product IDs appear.
LEFT JOIN (or LEFT OUTER JOIN)
A LEFT JOIN returns all rows from the left table, and matching rows from the right table. If there's no match, the right‑table columns are filled with NULL.
Use case: Show all customers, even those who have never placed an order, and include any orders they may have.
SELECT
c.customer_id,
c.first_name,
c.last_name,
s.sale_id,
s.sale_date,
s.total_amount
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
ORDER BY c.customer_id;
Result (first few rows):
| customer_id | first_name | last_name | sale_id | sale_date | total_amount |
|---|---|---|---|---|---|
| 1 | Karen | Molina | 1 | 2023-07-15 | 999.99 |
| 2 | Elizabeth | Archer | 2 | 2023-08-20 | 1599.98 |
| 3 | Roberta | Massey | 3 | 2023-09-10 | 499.99 |
| 4 | Jacob | Adams | 4 | 2023-07-25 | 599.97 |
| 5 | Cynthia | Lowery | 5 | 2023-06-18 | 1200.00 |
| 6 | Emily | King | NULL | NULL | NULL |
| 7 | Linda | Larsen | NULL | NULL | NULL |
Notice: Customers 6 and 7 (Emily King, Linda Larsen) have no sales, so the sale_id, sale_date, and total_amount are NULL. This is a great way to find customers who haven't purchased anything.
RIGHT JOIN
A RIGHT JOIN is the mirror of LEFT JOIN: it returns all rows from the right table, and matching rows from the left. Unmatched left‑table columns become NULL.
Because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order, many developers rarely use it. But it's good to know it exists.
Use case: List all products and show which ones have been sold (if any).
SELECT
p.product_id,
p.product_name,
s.sale_id,
s.sale_date,
s.total_amount
FROM sales s
RIGHT JOIN products p ON s.product_id = p.product_id
ORDER BY p.product_id;
Result (first few rows):
| product_id | product_name | sale_id | sale_date | total_amount |
|---|---|---|---|---|
| 1 | Laptop | 1 | 2023-07-15 | 999.99 |
| 2 | Smartphone | 2 | 2023-08-20 | 1599.98 |
| 3 | Washing Machine | 3 | 2023-09-10 | 499.99 |
| 4 | Headphones | 4 | 2023-07-25 | 599.97 |
| 5 | Refrigerator | 5 | 2023-06-18 | 1200.00 |
| 6 | Smart TV | NULL | NULL | NULL |
| 7 | Microwave | NULL | NULL | NULL |
Products 6 (Smart TV) and 7 (Microwave) haven't been sold – their sale details are NULL.
FULL OUTER JOIN
FULL OUTER JOIN returns all rows from both tables, matching where possible. Unmatched rows from either side are filled with NULL. It's like a combination of LEFT and RIGHT joins.
Use case: See every customer and every product side by side – which is not very common in practice, but useful for finding orphans.
SELECT
c.customer_id,
c.first_name,
c.last_name,
p.product_id,
p.product_name,
s.sale_id
FROM customers c
FULL OUTER JOIN sales s ON c.customer_id = s.customer_id
FULL OUTER JOIN products p ON s.product_id = p.product_id
LIMIT 10; -- just to keep the output manageable
Result (truncated):
You'll get a mix of customers without sales, products without sales, and actual transactions. Many NULL cells appear.
Because FULL OUTER JOIN can produce huge result sets, it's used less often, but it's indispensable for certain data reconciliation tasks.
CROSS JOIN
A CROSS JOIN returns the Cartesian product of two tables – every row from the first table paired with every row from the second. There's no ON condition.
Use case: Generate all possible combinations, e.g., for creating product‑customer promotional lists.
SELECT
c.customer_id,
c.first_name,
p.product_id,
p.product_name
FROM customers c
CROSS JOIN products p
WHERE c.customer_id <= 3 AND p.product_id <= 3 -- limit for readability
ORDER BY c.customer_id, p.product_id;
Result:
| customer_id | first_name | product_id | product_name |
|---|---|---|---|
| 1 | Karen | 1 | Laptop |
| 1 | Karen | 2 | Smartphone |
| 1 | Karen | 3 | Washing Machine |
| 2 | Elizabeth | 1 | Laptop |
| 2 | Elizabeth | 2 | Smartphone |
| 2 | Elizabeth | 3 | Washing Machine |
| 3 | Roberta | 1 | Laptop |
| 3 | Roberta | 2 | Smartphone |
| 3 | Roberta | 3 | Washing Machine |
Every combination of the selected customers and products appears. This is handy for generating base data for analysis.
SELF JOIN
A SELF JOIN is when a table is joined with itself. It's useful for hierarchical data or comparing rows within the same table.
Use case: Find pairs of customers who have the same membership status.
SELECT
c1.customer_id AS customer1,
c1.first_name AS first1,
c1.last_name AS last1,
c2.customer_id AS customer2,
c2.first_name AS first2,
c2.last_name AS last2,
c1.membership_status
FROM customers c1
JOIN customers c2 ON c1.membership_status = c2.membership_status
WHERE c1.customer_id < c2.customer_id
ORDER BY c1.membership_status, customer1;
Result (few rows):
| customer1 | first1 | last1 | customer2 | first2 | last2 | membership_status |
|---|---|---|---|---|---|---|
| 1 | Karen | Molina | 3 | Roberta | Massey | Bronze |
| 1 | Karen | Molina | 8 | Angela | Hanson | Bronze |
| 1 | Karen | Molina | 9 | Whitney | Wilson | Bronze |
| 2 | Elizabeth | Archer | 5 | Cynthia | Lowery | Silver |
We use c1.customer_id < c2.customer_id to avoid duplicate pairs (e.g., (1,3) and (3,1)) and to exclude a customer paired with themselves.
Joining Multiple Tables – Putting It All Together
Often you need to join more than two tables. Here's a query that combines customers, sales, and products to give a complete picture of each transaction.
SELECT
c.first_name,
c.last_name,
s.sale_date,
p.product_name,
p.category,
s.quantity_sold,
s.total_amount
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE c.membership_status = 'Gold' -- only Gold members
ORDER BY s.sale_date;
Result (few rows):
| first_name | last_name | sale_date | product_name | category | quantity_sold | total_amount |
|---|---|---|---|---|---|---|
| James | Rodriguez | 2023-07-15 | Gaming Console | Electronics | 1 | 350.00 |
| Steven | Burnett | 2023-08-01 | Microwave | Appliances | 1 | 180.00 |
This kind of multi‑table join is the backbone of reporting and analytics.
Now that you've seen all join types, you're ready to combine data from any set of related tables. In the next section, we'll explore Window Functions – a whole new level of analytical power.
Window Functions – Analyzing Data Without Losing Detail
Window functions are a game‑changer. Unlike regular aggregate functions (GROUP BY) that collapse multiple rows into one, window functions perform calculations across a set of rows related to the current row while keeping each row intact. This makes them perfect for rankings, running totals, moving averages, and comparing values between rows.
Basic Syntax
function_name (expression) OVER (
[PARTITION BY column(s)]
[ORDER BY column(s)]
[frame_clause]
)
-
function_name: e.g.,ROW_NUMBER(),RANK(),SUM(),LAG() -
PARTITION BY: divides rows into groups (optional) – the function is applied separately to each group. -
ORDER BY: defines the order within each partition. - frame_clause: specifies which rows to include (for aggregate window functions).
We'll use our sales table, sometimes joined with customers or products, to illustrate.
1. ROW_NUMBER() – Unique Sequential Number
Assigns a unique integer to each row within a partition, starting at 1. Ties are broken arbitrarily.
Example: Number all sales by date
SELECT
sale_id,
customer_id,
sale_date,
total_amount,
ROW_NUMBER() OVER (ORDER BY sale_date) AS row_num
FROM sales;
Result:
| sale_id | customer_id | sale_date | total_amount | row_num |
|---|---|---|---|---|
| 5 | 5 | 2023-06-18 | 1200.00 | 1 |
| 1 | 1 | 2023-07-15 | 999.99 | 2 |
| 4 | 4 | 2023-07-25 | 599.97 | 3 |
| 2 | 2 | 2023-08-20 | 1599.98 | 4 |
| 3 | 3 | 2023-09-10 | 499.99 | 5 |
Use case: Pagination, assigning a unique identifier to each row in a result set.
2. RANK() and DENSE_RANK() – Ranking with Ties
-
RANK()gives the same rank to equal values, then skips the next rank(s) (e.g., 1,1,3). -
DENSE_RANK()also gives the same rank to ties, but does not skip numbers (e.g., 1,1,2).
Example: Rank customers by total spending (including those with no purchases).
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT
customer_id,
first_name,
last_name,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS rank,
DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 10; -- just show top 10 for brevity
Result (first few rows):
| customer_id | first_name | last_name | total_spent | rank | dense_rank |
|---|---|---|---|---|---|
| 2 | Elizabeth | Archer | 1599.98 | 1 | 1 |
| 6 | Emily | King | 1500.00 | 2 | 2 |
| 1 | Karen | Molina | 999.99 | 3 | 3 |
| 5 | Cynthia | Lowery | 1200.00 | 4 | 4 |
| 4 | Jacob | Adams | 599.97 | 5 | 5 |
| 3 | Roberta | Massey | 499.99 | 6 | 6 |
| 7 | Linda | Larsen | 0.00 | 7 | 7 |
| 8 | Angela | Hanson | 0.00 | 7 | 7 |
| 9 | Whitney | Wilson | 0.00 | 7 | 7 |
Notice how RANK and DENSE_RANK handle ties differently: when multiple customers have zero spending, RANK skips from 7 to 10 (depending on how many zeros), while DENSE_RANK assigns the same rank to all zeros and then increments by 1.
3. SUM() as a Window Function – Running Totals
Using aggregate functions like SUM with OVER allows you to compute a running total without collapsing rows.
Example: Calculate a running total of sales amounts ordered by date.
SELECT
sale_id,
sale_date,
total_amount,
SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;
Result:
| sale_id | sale_date | total_amount | running_total |
|---|---|---|---|
| 5 | 2023-06-18 | 1200.00 | 1200.00 |
| 1 | 2023-07-15 | 999.99 | 2199.99 |
| 4 | 2023-07-25 | 599.97 | 2799.96 |
| 2 | 2023-08-20 | 1599.98 | 4399.94 |
| 3 | 2023-09-10 | 499.99 | 4899.93 |
Each row's running_total is the sum of all total_amount from the earliest sale up to that row.
With partitioning: You can reset the running total for each customer.
SELECT
customer_id,
sale_id,
sale_date,
total_amount,
SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY sale_date) AS customer_running_total
FROM sales
ORDER BY customer_id, sale_date;
This would show, for each customer, the cumulative amount they've spent over time.
4. LAG() and LEAD() – Accessing Other Rows
LAG() lets you access data from a previous row, while LEAD() accesses a following row – great for comparing values.
Example: For each sale, show the previous sale amount (by date).
SELECT
sale_id,
sale_date,
total_amount,
LAG(total_amount) OVER (ORDER BY sale_date) AS prev_sale_amount,
total_amount - LAG(total_amount) OVER (ORDER BY sale_date) AS difference_from_prev
FROM sales
ORDER BY sale_date;
Result:
| sale_id | sale_date | total_amount | prev_sale_amount | difference_from_prev |
|---|---|---|---|---|
| 5 | 2023-06-18 | 1200.00 | NULL | NULL |
| 1 | 2023-07-15 | 999.99 | 1200.00 | -200.01 |
| 4 | 2023-07-25 | 599.97 | 999.99 | -400.02 |
| 2 | 2023-08-20 | 1599.98 | 599.97 | 1000.01 |
| 3 | 2023-09-10 | 499.99 | 1599.98 | -1099.99 |
The first row has NULL because there's no previous sale.
5. NTILE() – Dividing into Buckets
NTILE(n) divides rows into n roughly equal groups (buckets) based on the ordering.
Example: Divide customers into 4 quartiles based on total spending.
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT
customer_id,
first_name,
last_name,
total_spent,
NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
Result (sample):
| customer_id | first_name | last_name | total_spent | spending_quartile |
|---|---|---|---|---|
| 2 | Elizabeth | Archer | 1599.98 | 1 |
| 6 | Emily | King | 1500.00 | 1 |
| 1 | Karen | Molina | 999.99 | 1 |
| 5 | Cynthia | Lowery | 1200.00 | 1 |
| 4 | Jacob | Adams | 599.97 | 2 |
| 3 | Roberta | Massey | 499.99 | 2 |
| 7 | Linda | Larsen | 0.00 | 3 |
| 8 | Angela | Hanson | 0.00 | 3 |
| 9 | Whitney | Wilson | 0.00 | 4 |
| 10 | Angela | Atkins | 75.00 | 2 |
Quartile 1 contains the top 25% of spenders, quartile 4 the bottom 25%.
6. Window Functions with PARTITION BY – Per‑Group Rankings
PARTITION BY restricts the window to rows with the same value in a column.
Example: Rank products by price within each category.
SELECT
product_id,
product_name,
category,
price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank_in_category
FROM products
ORDER BY category, price_rank_in_category;
Result (first few rows):
| product_id | product_name | category | price | price_rank_in_category |
|---|---|---|---|---|
| 5 | Refrigerator | Appliances | 1200.00 | 1 |
| 3 | Washing Machine | Appliances | 499.99 | 2 |
| 7 | Microwave | Appliances | 180.00 | 3 |
| 8 | Blender | Appliances | 50.00 | 4 |
| 6 | Smart TV | Electronics | 1500.00 | 1 |
| 1 | Laptop | Electronics | 999.99 | 2 |
| 2 | Smartphone | Electronics | 799.99 | 3 |
| 9 | Gaming Console | Electronics | 350.00 | 4 |
| 12 | Monitor | Electronics | 250.00 | 5 |
Now you can see the most expensive product in each category at a glance.
Key Takeaways
Window functions do not collapse rows – they add calculated columns while preserving detail.
-
PARTITION BYis likeGROUP BYwithin the window. -
ORDER BYinsideOVERcontrols the order for functions likeROW_NUMBER,RANK, and running totals. - Use
RANKvsDENSE_RANKdepending on whether you want gaps after ties. -
LAGandLEADare invaluable for time‑series and comparative analysis. -
NTILEis great for segmenting data into percentiles or quartiles.
In the next section, we'll explore Common Table Expressions (CTEs) – a powerful way to organize complex queries, especially when combined with window functions.
Common Table Expressions (CTEs) – Organizing Complex Queries
A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. Think of it as a named subquery that makes your SQL more readable, reusable, and easier to debug.
Basic Syntax
WITH cte_name AS (
-- CTE query
SELECT ...
)
-- Main query that uses the CTE
SELECT ...
FROM cte_name;
You can also define multiple CTEs in one WITH clause by separating them with commas:
WITH
cte1 AS (SELECT ...),
cte2 AS (SELECT ...)
SELECT ...
FROM cte1 JOIN cte2 ON ...;
CTEs are especially valuable when you need to:
- Break a complex query into simple, logical steps.
- Reference the same subquery multiple times in a query.
- Create recursive queries (though we'll focus on non‑recursive here).
Example 1: Simple CTE – Total Spending per Customer
Suppose we want a list of customers with their total spending, but we also need to use that result in further calculations. Instead of repeating the aggregation, we can define a CTE.
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT
customer_id,
first_name,
last_name,
total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;
Result (top 5 spenders):
| customer_id | first_name | last_name | total_spent |
|---|---|---|---|
| 2 | Elizabeth | Archer | 1599.98 |
| 6 | Emily | King | 1500.00 |
| 5 | Cynthia | Lowery | 1200.00 |
| 1 | Karen | Molina | 999.99 |
| 4 | Jacob | Adams | 599.97 |
Now customer_spending can be reused in the same query or even in subsequent CTEs.
Example 2: Chaining Multiple CTEs
You can build one CTE on top of another. This is great for step‑wise transformations.
Task: Find customers who spent more than the average spending of all customers.
We'll first compute each customer's total, then the overall average, then filter.
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
),
overall_avg AS (
SELECT AVG(total_spent) AS avg_spent
FROM customer_spending
)
SELECT
cs.customer_id,
cs.first_name,
cs.last_name,
cs.total_spent,
oa.avg_spent
FROM customer_spending cs
CROSS JOIN overall_avg oa
WHERE cs.total_spent > oa.avg_spent
ORDER BY cs.total_spent DESC;
Result (sample):
| customer_id | first_name | last_name | total_spent | avg_spent |
|---|---|---|---|---|
| 2 | Elizabeth | Archer | 1599.98 | 287.65 |
| 6 | Emily | King | 1500.00 | 287.65 |
| 5 | Cynthia | Lowery | 1200.00 | 287.65 |
| 1 | Karen | Molina | 999.99 | 287.65 |
| 4 | Jacob | Adams | 599.97 | 287.65 |
| 3 | Roberta | Massey | 499.99 | 287.65 |
(Here the average is around 287.65 – you can see which customers are above it.)
Example 3: CTEs with Window Functions
CTEs and window functions are a match made in heaven. The CTE prepares the data, and the window function adds analytics.
Task: Rank customers by total spending and show their quartile.
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT
customer_id,
first_name,
last_name,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS spending_rank,
NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
Result (first few rows):
| customer_id | first_name | last_name | total_spent | spending_rank | spending_quartile |
|---|---|---|---|---|---|
| 2 | Elizabeth | Archer | 1599.98 | 1 | 1 |
| 6 | Emily | King | 1500.00 | 2 | 1 |
| 5 | Cynthia | Lowery | 1200.00 | 3 | 1 |
| 1 | Karen | Molina | 999.99 | 4 | 1 |
| 4 | Jacob | Adams | 599.97 | 5 | 2 |
| 3 | Roberta | Massey | 499.99 | 6 | 2 |
| 7 | Linda | Larsen | 0.00 | 7 | 3 |
| 8 | Angela | Hanson | 0.00 | 7 | 3 |
| 9 | Whitney | Wilson | 0.00 | 7 | 4 |
| 10 | Angela | Atkins | 75.00 | 10 | 2 |
Notice how RANK handles ties (customers with zero spending share rank 7) and NTILE distributes rows into four buckets as evenly as possible.
Example 4: CTEs for Readability – Multi‑Step Analysis
Imagine you need to produce a report that shows, for each product category, the top‑selling product and its sales. Without CTEs, the query could become a tangled mess of subqueries. With CTEs, it's clean:
WITH category_sales AS (
SELECT
p.category,
p.product_name,
SUM(s.total_amount) AS product_sales
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
GROUP BY p.category, p.product_name
),
ranked_products AS (
SELECT
category,
product_name,
product_sales,
RANK() OVER (PARTITION BY category ORDER BY product_sales DESC) AS rank
FROM category_sales
)
SELECT
category,
product_name,
product_sales
FROM ranked_products
WHERE rank = 1;
Result:
| category | product_name | product_sales |
|---|---|---|
| Appliances | Refrigerator | 1200.00 |
| Electronics | Smart TV | 1500.00 |
| Accessories | Headphones | 599.97 |
(Assuming only a few sales; your actual data may vary.)
Key Takeaways
- CTEs improve readability by letting you name intermediate results.
- They are not materialized (in most databases) – they're just syntax, but they make complex queries easier to write and debug.
- You can use them with joins, window functions, and aggregations.
- Multiple CTEs can be chained to build up a solution step by step.
- They are essential for recursive queries (though we didn't cover that here).
Now that you've seen how CTEs can organize even the most intricate logic, you're ready to tackle advanced analytical queries that combine all these concepts.
In the final section, we'll bring everything together with some real‑world analytical problems that use joins, window functions, and CTEs in harmony.
Putting It All Together – Real‑World Analytical Problems
Now that we've covered joins, window functions, and CTEs individually, let's combine them to solve realistic business questions. These examples show how these tools work in harmony to extract valuable insights from our e‑commerce data.
Example 1: Top 3 Customers by Spending in Each Membership Tier
Business question: For each membership tier (Bronze, Silver, Gold), who are the top 3 customers by total spending?
Approach:
- CTE to compute total spending per customer, including their membership status.
-
Window function
RANK()partitioned by membership status to rank customers within their tier. - Filter to keep only ranks 1–3.
WITH customer_spending AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
c.membership_status,
COALESCE(SUM(s.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.membership_status
),
ranked_customers AS (
SELECT
customer_id,
first_name,
last_name,
membership_status,
total_spent,
RANK() OVER (PARTITION BY membership_status ORDER BY total_spent DESC) AS rank_in_tier
FROM customer_spending
)
SELECT
membership_status,
rank_in_tier,
customer_id,
first_name,
last_name,
total_spent
FROM ranked_customers
WHERE rank_in_tier <= 3
ORDER BY membership_status, rank_in_tier;
Result (sample from our data):
| membership_status | rank_in_tier | customer_id | first_name | last_name | total_spent |
|---|---|---|---|---|---|
| Bronze | 1 | 1 | Karen | Molina | 999.99 |
| Bronze | 2 | 3 | Roberta | Massey | 499.99 |
| Bronze | 3 | 8 | Angela | Hanson | 0.00 |
| Gold | 1 | 4 | Jacob | Adams | 599.97 |
| Gold | 2 | 17 | James | Rodriguez | 350.00 |
| Gold | 3 | 18 | Steven | Burnett | 180.00 |
| Silver | 1 | 2 | Elizabeth | Archer | 1599.98 |
| Silver | 2 | 5 | Cynthia | Lowery | 1200.00 |
| Silver | 3 | 6 | Emily | King | 1500.00 |
This query combines a LEFT JOIN (to include customers with no purchases), a CTE for the base aggregation, a window function for ranking, and finally a filter on the rank. The result gives us a quick view of the best customers in each tier.
Example 2: Product Performance vs. Category Average
Business question: For each product, show its total sales and how much it differs from the average sales of products in the same category. Which products are outperforming their category?
Approach:
- First CTE computes total sales per product with category.
- Second CTE computes average sales per category.
- Main query joins the two and calculates the difference.
WITH product_sales AS (
SELECT
p.category,
p.product_id,
p.product_name,
COALESCE(SUM(s.total_amount), 0) AS product_total
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
GROUP BY p.category, p.product_id, p.product_name
),
category_avg AS (
SELECT
category,
AVG(product_total) AS avg_in_category
FROM product_sales
GROUP BY category
)
SELECT
ps.category,
ps.product_name,
ps.product_total,
ca.avg_in_category,
ps.product_total - ca.avg_in_category AS difference_from_avg
FROM product_sales ps
JOIN category_avg ca ON ps.category = ca.category
ORDER BY ps.category, difference_from_avg DESC;
Result (sample):
| category | product_name | product_total | avg_in_category | difference_from_avg |
|---|---|---|---|---|
| Accessories | Headphones | 599.97 | 99.99 | 499.98 |
| Accessories | Wireless Mouse | 0.00 | 99.99 | -99.99 |
| Accessories | Keyboard | 0.00 | 99.99 | -99.99 |
| Appliances | Refrigerator | 1200.00 | 244.99 | 955.01 |
| Appliances | Washing Machine | 499.99 | 244.99 | 255.00 |
| Appliances | Microwave | 180.00 | 244.99 | -64.99 |
| Appliances | Blender | 0.00 | 244.99 | -244.99 |
| Electronics | Smart TV | 1500.00 | 424.99 | 1075.01 |
| Electronics | Laptop | 999.99 | 424.99 | 575.00 |
| Electronics | Smartphone | 1599.98 | 424.99 | 1174.99 |
| Electronics | Gaming Console | 350.00 | 424.99 | -74.99 |
| Electronics | Monitor | 0.00 | 424.99 | -424.99 |
| Electronics | External Hard Drive | 0.00 | 424.99 | -424.99 |
| Electronics | Tablet | 0.00 | 424.99 | -424.99 |
| Electronics | Smartwatch | 0.00 | 424.99 | -424.99 |
Here, joins link products to sales, CTEs structure the intermediate calculations, and the final ORDER BY highlights top performers. We can immediately see which products are stars and which are lagging in their category.
Example 3: Customers Who Made Consecutive Purchases Within 7 Days
Business question: Identify customers who have made purchases within 7 days of a previous purchase – a sign of engaged shoppers.
Approach:
- Use
LAG()window function to get the previous purchase date for each customer. - Compute the date difference.
- Filter where the gap is ≤ 7 days.
WITH customer_purchases AS (
SELECT
customer_id,
sale_date,
LAG(sale_date) OVER (PARTITION BY customer_id ORDER BY sale_date) AS prev_purchase_date
FROM sales
)
SELECT DISTINCT
c.customer_id,
c.first_name,
c.last_name,
cp.sale_date AS purchase_date,
cp.prev_purchase_date,
(cp.sale_date - cp.prev_purchase_date) AS days_between
FROM customer_purchases cp
JOIN customers c ON cp.customer_id = c.customer_id
WHERE cp.prev_purchase_date IS NOT NULL
AND (cp.sale_date - cp.prev_purchase_date) <= 7
ORDER BY c.customer_id, cp.sale_date;
Result (In respect to our data)
| customer_id | first_name | last_name | purchase_date | prev_purchase_date | days_between |
|---|---|---|---|---|---|
| 1 | Karen | Molina | 2023-07-15 | 2023-07-10 | 5 |
| 5 | Cynthia | Lowery | 2023-06-18 | 2023-06-12 | 6 |
This query uses a window function (LAG) within a CTE, then joins with the customers table to get names, and finally filters. It demonstrates how window functions can identify sequential patterns.
Example 4: Monthly Sales Trends – Running Total and Month‑over‑Month Change
Business question: Show monthly sales totals, a running total, and the percentage change from the previous month.
Approach:
- CTE to aggregate sales by month.
-
Window functions for running total (
SUM() OVER) and previous month's sales (LAG). - Compute the change.
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', sale_date)::DATE AS month,
SUM(total_amount) AS monthly_total
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
month,
monthly_total,
SUM(monthly_total) OVER (ORDER BY month) AS running_total,
LAG(monthly_total) OVER (ORDER BY month) AS prev_month_total,
CASE
WHEN LAG(monthly_total) OVER (ORDER BY month) IS NOT NULL
THEN ROUND(100.0 * (monthly_total - LAG(monthly_total) OVER (ORDER BY month)) / LAG(monthly_total) OVER (ORDER BY month), 2)
ELSE NULL
END AS pct_change
FROM monthly_sales
ORDER BY month;
Result;
| month | monthly_total | running_total | prev_month_total | pct_change |
|---|---|---|---|---|
| 2023-06-01 | 1200.00 | 1200.00 | NULL | NULL |
| 2023-07-01 | 1599.96 | 2799.96 | 1200.00 | 33.33 |
| 2023-08-01 | 1599.98 | 4399.94 | 1599.96 | 0.00 |
| 2023-09-01 | 499.99 | 4899.93 | 1599.98 | -68.75 |
Here, the CTE aggregates data, and window functions add running total and previous‑month comparison. This gives a concise view of sales momentum.
Conclusion
Throughout this article, we've explored three powerful SQL concepts—joins, window functions, and CTEs—using a practical e‑commerce dataset. As a data professional, knowing when and how to apply these tools is essential for building efficient, readable, and maintainable queries.
Here's a quick reference guide based on what we've learned:
Use Window Functions When:
- You need to perform calculations across rows without collapsing the result set (e.g., running totals, moving averages).
- You're working with rankings or percentiles (e.g., top customers by spending, quartiles).
- You need to compare values between rows (e.g., previous sale amount, month‑over‑month change).
- Your analysis requires retaining detail while still gaining aggregate insights.
Use CTEs When:
- You're breaking down a complex query into simpler, logical steps.
- The same intermediate result needs to be referenced multiple times in a query.
- You want to improve readability and make your code easier to debug.
- You're preparing data for window functions or further transformations.
Use Joins When:
- You need to combine data from multiple related tables (e.g., customers with their orders).
- Your database is normalized and information is spread across tables.
- You're building reporting datasets that require fields from different sources.
- You need to understand relationships between entities in your data.
By mastering these advanced SQL concepts, you're not just writing queries—you're building a toolkit for solving real business problems. Whether it's identifying top‑performing products, understanding customer behavior, or tracking sales trends over time, the techniques you've learned here will serve you across industries and datasets.
Data Analysis Step by Step;
1st Read: Git & Github Beginner's guide
If you’re also learning version control with Git, you can read my Git & GitHub beginner’s guide here:
https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952
2nd Read: Mastering Excel
After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:
https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn
3rd Read: Data Modelling & Schemas
This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.
https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l
4th Read: Data Analysis Steps in Power BI
This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.
https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6
5th Read: From Tables to Insights – A SQL Masterclass
Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.
https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg
Repo
Top comments (0)