Introduction
SQL is a powerful language for managing and analyzing relational databases. Two essential concepts for data analysis are Joins and Window Functions.
Joins allow you to combine rows from multiple tables based on related columns.
Window Functions perform calculations across a set of rows related to the current row, enabling ranking, cumulative sums, moving averages, and more.
In this guide, we’ll create a sample database with customers, products, sales, and inventory tables, populate them with data, and demonstrate joins and window functions with real examples.
Create Database and Schema
-- Create a new database
CREATE DATABASE business_db;
-- Connect to the database
\c business_db;
-- Create schema
CREATE SCHEMA assignment;
-- Set schema for this session
SET search_path TO assignment;
-- Verify schema
SHOW search_path;
Create Tables and Insert Data
Customers Table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
phone_number VARCHAR(50),
registration_date DATE,
membership_status VARCHAR(10)
);
INSERT INTO customers
(customer_id, first_name, last_name, email, phone_number, registration_date, membership_status)
VALUES
(1, 'Karen', 'Molina', 'gonzalezkimberly@glass.com', '(728)697-1206', '2020-08-27', 'Bronze'),
(2, 'Elizabeth', 'Archer', 'tramirez@gmail.com', '778.104.6553', '2023-08-28', 'Silver'),
(3, 'Roberta', 'Massey', 'davislori@gmail.com', '+1-365-606-7458x399', '2024-06-12', 'Bronze'),
(4, 'Jacob', 'Adams', 'andrew72@hotmail.com', '246-459-1425x462', '2023-02-10', 'Gold'),
(5, 'Cynthia', 'Lowery', 'suarezkiara@ramsey.com', '001-279-688-8177x4015', '2020-11-13', 'Silver');
Products Table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10,2),
supplier VARCHAR(100),
stock_quantity INT
);
INSERT INTO products
(product_id, product_name, category, price, supplier, stock_quantity)
VALUES
(1, 'Laptop', 'Electronics', 999.99, 'Dell', 50),
(2, 'Smartphone', 'Electronics', 799.99, 'Samsung', 150),
(3, 'Washing Machine', 'Appliances', 499.99, 'LG', 30),
(4, 'Headphones', 'Accessories', 199.99, 'Sony', 100),
(5, 'Refrigerator', 'Appliances', 1200.00, 'Whirlpool', 40);
Sales Table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
quantity_sold INT,
sale_date DATE,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
INSERT INTO sales
(sale_id, customer_id, product_id, quantity_sold, sale_date, total_amount)
VALUES
(1, 1, 1, 1, '2023-07-15', 999.99),
(2, 2, 2, 2, '2023-08-20', 1599.98),
(3, 3, 3, 1, '2023-09-10', 499.99),
(4, 4, 4, 3, '2023-07-25', 599.97),
(5, 5, 5, 1, '2023-06-18', 1200.00);
Inventory Table
CREATE TABLE inventory (
product_id INT PRIMARY KEY,
stock_quantity INT,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
INSERT INTO inventory
(product_id, stock_quantity)
VALUES
(1, 50),
(2, 150),
(3, 30),
(4, 100),
(5, 40);
SQL Joins
INNER JOIN
Returns only rows with matching values in both tables.
SELECT c.first_name, c.last_name, s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id;
## Shows customers who have made purchases.
LEFT JOIN
Returns all rows from the left table, with NULLs for unmatched right table rows.
SELECT c.first_name, c.last_name, s.total_amount
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id;
## Show all customers, even those with no purchases.
SELF JOIN
Join a table to itself.
SELECT c.first_name AS customer1, m.first_name AS customer2, c.membership_status
FROM customers c
INNER JOIN customers m ON c.membership_status = m.membership_status
WHERE c.customer_id < m.customer_id;
## Find pairs of customers with the same membership status.
Window Functions
RANK()
Assigns ranks to rows in a partition.
SELECT
c.first_name,
c.last_name,
SUM(s.total_amount) AS total_spent,
RANK() OVER(ORDER BY SUM(s.total_amount) DESC) AS customer_rank
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.first_name, c.last_name;
DENSE_RANK()
Assigns ranks to rows without gaps for ties.
SELECT
c.first_name,
c.last_name,
SUM(s.total_amount) AS total_spent,
DENSE_RANK() OVER(ORDER BY SUM(s.total_amount) DESC) AS dense_rank
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.first_name, c.last_name;
Difference from RANK(): If two customers tie for 1st place, RANK() skips 2, giving the next rank as 3, while DENSE_RANK() gives 2.
ROW_NUMBER()
Assigns a unique sequential number to rows.
SELECT
c.first_name,
c.last_name,
s.sale_date,
ROW_NUMBER() OVER(PARTITION BY c.customer_id ORDER BY s.sale_date) AS purchase_order
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id;
Cumulative SUM()
Calculate running totals without collapsing rows.
SELECT
p.product_name,
s.sale_date,
SUM(s.quantity_sold) OVER(PARTITION BY p.product_id ORDER BY s.sale_date) AS cumulative_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id;
Top Customers per Membership Tier
SELECT
c.first_name,
c.last_name,
c.membership_status,
SUM(s.total_amount) AS total_spent,
RANK() OVER(PARTITION BY c.membership_status ORDER BY SUM(s.total_amount) DESC) AS tier_rank
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.first_name, c.last_name, c.membership_status
ORDER BY c.membership_status, tier_rank;
Sample Queries for Analysis
- Total Sales Per Product
SELECT p.product_name, SUM(s.quantity_sold) AS total_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id
GROUP BY p.product_name
ORDER BY total_sales DESC;
- Customers with Purchases Over $1000
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM sales WHERE total_amount > 1000);
- Products Low in Stock
SELECT p.product_name, i.stock_quantity
FROM products p
JOIN inventory i ON p.product_id = i.product_id
WHERE i.stock_quantity < 50;
Conclusion
Joins combine data across tables and are essential for querying normalized databases.
Window Functions perform calculations over a set of rows without collapsing them, enabling ranking, cumulative totals, and analytics within groups.
Together, they allow powerful data analysis, such as finding top customers, cumulative sales trends, and product performance.
Top comments (0)