You have two tables.
The customers table knows who people are. Name, city, contact details.
The orders table knows what they bought. Product, amount, date.
Neither table alone answers the question: which customers in Mumbai spent more than 50,000 rupees last month?
You need both tables combined through their shared key. That combination is a join.
Joins are where SQL becomes genuinely powerful. The ability to link tables that were designed separately and query them as if they were one is what makes relational databases so useful for data analysis.
The Setup
import sqlite3
import pandas as pd
conn = sqlite3.connect("joins_practice.db")
conn.executescript("""
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS products;
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT,
city TEXT,
tier TEXT
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
product_id INTEGER,
amount REAL,
order_date TEXT,
status TEXT
);
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT,
category TEXT,
cost_price REAL
);
INSERT INTO customers VALUES
(1, 'Alex', 'Mumbai', 'Gold'),
(2, 'Priya', 'Delhi', 'Silver'),
(3, 'Sam', 'Bangalore', 'Bronze'),
(4, 'Jordan', 'Mumbai', 'Gold'),
(5, 'Lisa', 'Chennai', 'Silver'),
(6, 'Ravi', 'Delhi', 'Bronze');
INSERT INTO products VALUES
(101, 'Laptop', 'Electronics', 50000),
(102, 'Phone', 'Electronics', 28000),
(103, 'Headphones', 'Electronics', 1800),
(104, 'Shirt', 'Clothing', 400),
(105, 'Watch', 'Accessories', 6000);
INSERT INTO orders VALUES
(1, 1, 101, 75000, '2024-01-05', 'completed'),
(2, 2, 102, 45000, '2024-01-08', 'completed'),
(3, 1, 103, 3500, '2024-01-12', 'completed'),
(4, 3, 104, 800, '2024-01-15', 'completed'),
(5, 4, 101, 75000, '2024-01-18', 'completed'),
(6, 2, 104, 1500, '2024-01-20', 'completed'),
(7, 1, 105, 12000, '2024-02-01', 'completed'),
(8, 7, 102, 45000, '2024-02-05', 'completed'),
(9, 3, 105, 12000, '2024-02-08', 'completed'),
(10, 4, 104, 800, '2024-02-10', 'completed');
""")
conn.commit()
def q(sql):
return pd.read_sql_query(sql, conn)
print("Three tables created.")
print(q("SELECT * FROM customers"))
print(q("SELECT * FROM orders"))
Notice order 8 has customer_id = 7 which does not exist in the customers table. Customer 5 (Lisa) and customer 6 (Ravi) have no orders. These edge cases will show you exactly how different join types behave.
INNER JOIN: Only the Matches
Inner join returns rows that have a matching key in both tables. Unmatched rows from either side are dropped.
result = q("""
SELECT
c.name,
c.city,
o.order_id,
o.amount,
o.order_date
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date;
""")
print(result)
print(f"\nRows: {len(result)}")
Output:
name city order_id amount order_date
0 Alex Mumbai 1 75000.0 2024-01-05
1 Priya Delhi 2 45000.0 2024-01-08
2 Alex Mumbai 3 3500.0 2024-01-12
3 Sam Bangalore 4 800.0 2024-01-15
4 Jordan Mumbai 5 75000.0 2024-01-18
5 Priya Delhi 6 1500.0 2024-01-20
6 Alex Mumbai 7 12000.0 2024-02-01
7 Sam Bangalore 9 12000.0 2024-02-08
8 Jordan Mumbai 10 800.0 2024-02-10
Rows: 9
Order 8 (customer_id 7, no matching customer) is gone. Lisa and Ravi who have no orders are gone. Only the nine orders with valid matching customers appear.
The ON o.customer_id = c.customer_id clause is the join condition. It tells SQL which columns to use as the connecting key.
Aliases o and c shorten the table names. Without aliases, you write orders.customer_id = customers.customer_id. With aliases, o.customer_id = c.customer_id. Always use aliases. Long table names in every column reference make queries unreadable.
LEFT JOIN: Keep Everything From the Left
Left join keeps all rows from the left table. If a row from the left has no match in the right, it still appears but with NULLs for the right table's columns.
result = q("""
SELECT
c.customer_id,
c.name,
c.city,
COUNT(o.order_id) AS total_orders,
SUM(o.amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city
ORDER BY total_spent DESC;
""")
print(result)
Output:
customer_id name city total_orders total_spent
0 1 Alex Mumbai 3 90500.0
1 4 Jordan Mumbai 2 75800.0
2 2 Priya Delhi 2 46500.0
3 3 Sam Bangalore 2 12800.0
4 5 Lisa Chennai 0 NaN
5 6 Ravi Delhi 0 NaN
All six customers appear. Lisa and Ravi have zero orders and NaN for total_spent. An inner join would have removed them entirely.
Left join is the right choice when you need a complete list of one entity (customers) and want to attach related data when it exists, while still showing entities with no related data.
Use this for finding customers who have never ordered, users who have not logged in, products with no sales.
Finding Rows With No Match
Left join combined with a WHERE NULL check finds the rows that had no match.
result = q("""
SELECT
c.customer_id,
c.name,
c.city,
c.tier
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
""")
print("Customers with no orders:")
print(result)
Output:
Customers with no orders:
customer_id name city tier
0 5 Lisa Chennai Silver
1 6 Ravi Delhi Bronze
This pattern is used constantly. Find users who signed up but never bought. Find products that were added but never ordered. Find employees assigned to no projects. The technique is always the same: LEFT JOIN then WHERE right_table.key IS NULL.
RIGHT JOIN: Keep Everything From the Right
Right join is the mirror of left join. All rows from the right table survive regardless of whether they match the left.
result = q("""
SELECT
c.name,
c.city,
o.order_id,
o.amount,
o.customer_id AS order_customer_id
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id
ORDER BY o.order_id;
""")
print(result)
Output:
name city order_id amount order_customer_id
0 Alex Mumbai 1 75000.0 1
1 Priya Delhi 2 45000.0 2
...
8 NaN NaN 8 45000.0 7
...
Order 8 appears with NaN for name and city because customer_id 7 does not exist in customers. Right join kept all orders. Lisa and Ravi are gone because they have no orders.
Most developers prefer left joins over right joins. The same result can always be achieved by swapping the table order and using a left join. Consistency matters for readability.
FULL OUTER JOIN: Keep Everything
SQLite does not support FULL OUTER JOIN natively but you can simulate it.
result = q("""
SELECT c.customer_id, c.name, o.order_id, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION ALL
SELECT c.customer_id, c.name, o.order_id, o.amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
""")
print(result)
print(f"\nRows: {len(result)}")
Output includes all customers (even with no orders) and all orders (even with no matching customer). Full outer join shows you every row from both tables and flags the ones that could not find a match.
Joining Three Tables
Real queries often span three or more tables.
result = q("""
SELECT
c.name AS customer,
c.city,
p.product_name AS product,
p.category,
o.amount AS sale_price,
p.cost_price,
ROUND(o.amount - p.cost_price, 0) AS profit
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'completed'
ORDER BY profit DESC;
""")
print(result)
Output:
customer city product category sale_price cost_price profit
0 Alex Mumbai Laptop Electronics 75000.0 50000.0 25000.0
1 Jordan Mumbai Laptop Electronics 75000.0 50000.0 25000.0
2 Priya Delhi Phone Electronics 45000.0 28000.0 17000.0
...
Three tables. Two joins. One clean result showing customer, product, and profit per transaction. This is the kind of query that lives in every business intelligence tool.
Each new table gets its own JOIN clause. The ON condition connects it to one of the tables already in the query.
Self Join: A Table Joining Itself
Sometimes a table references itself. An employees table where each employee has a manager_id that points to another row in the same table.
conn.executescript("""
DROP TABLE IF EXISTS employees_hier;
CREATE TABLE employees_hier (
emp_id INTEGER PRIMARY KEY,
name TEXT,
manager_id INTEGER
);
INSERT INTO employees_hier VALUES
(1, 'Director', NULL),
(2, 'Manager A', 1),
(3, 'Manager B', 1),
(4, 'Alex', 2),
(5, 'Priya', 2),
(6, 'Sam', 3);
""")
conn.commit()
result = q("""
SELECT
e.name AS employee,
m.name AS manager
FROM employees_hier e
LEFT JOIN employees_hier m ON e.manager_id = m.emp_id
ORDER BY e.emp_id;
""")
print(result)
Output:
employee manager
0 Director None
1 Manager A Director
2 Manager B Director
3 Alex Manager A
4 Priya Manager A
5 Sam Manager B
The same table joined to itself with two different aliases. e for the employee, m for the manager. Each employee now shows who they report to. The Director has NULL because they report to nobody.
Join Performance: The Rule You Need to Know
Joins are fast when you join on indexed columns. The primary key is always indexed. Foreign keys often are not by default.
If you join on a column that is not indexed and the tables are large, the query can become very slow. Always check what columns your joins use and whether those columns have indexes.
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_product ON orders(product_id);
Two lines. These make the joins in this post dramatically faster on large tables. In real production databases, this is part of database design. But when you are writing queries against existing databases, knowing to ask "is this column indexed?" can save you from writing queries that take minutes instead of milliseconds.
A Resource Worth Knowing
Joins Visualizer at joins.spathon.com lets you click on different join types and see animated, interactive visualizations of exactly which rows survive each join. Visual, interactive, takes two minutes. Bookmark it. Every time you are uncertain which join type to use, check it.
Jeff Atwood wrote a short classic piece called "A Visual Explanation of SQL Joins" on his blog Coding Horror (codinghorror.com) using Venn diagrams that became the standard way people teach this topic. It is ten years old and still the clearest explanation of join types available. Search "Jeff Atwood visual explanation SQL joins Coding Horror."
Try This
Create sql_joins_practice.py. Use the database from this post.
Write a query that shows each product, how many times it was ordered, total revenue, and the profit margin (sale price minus cost price). Include products that were never ordered, showing zero for orders and revenue.
Find customers who have ordered in both January and February 2024. Use a self-join or subquery approach.
Show the city-level summary: city, number of customers, number of completed orders, total revenue, and average order value per customer. Only include cities with at least two customers.
Using a left join with a NULL check, find any orders where the customer_id does not exist in the customers table. These are data integrity problems.
Write a three-table join that shows customer name, product name, category, amount paid, and whether the customer is Gold, Silver, or Bronze tier. Sort by amount descending.
What's Next
One table queries. Multi-table joins. Both covered. Next is subqueries and CTEs, which let you use the result of one query as the input to another. They make complex multi-step logic readable and maintainable instead of nested and impossible to debug.
Top comments (0)