This post is part of the SQL: Zero to Ninja series.
You run a report and it shows user_id: 7 next to an order. Useful, but your boss wants the user's name, not a number. The name lives in a different table. So how do you pull two tables together into one answer? That is what a JOIN does, and once it clicks, half of SQL opens up.
The idea in one line
A JOIN matches rows from two tables using a shared id, so you can read facts from both in a single result.
Why are tables split up at all?
Look at our schema. A user has a name, email, and country. An order has a total and a status. Why not just stuff the user's name into every order row?
Because you would repeat it. If Sara places 50 orders, you would write "Sara" 50 times. Change her name once and you would have to fix 50 rows. Messy and easy to break.
So instead, the user lives once in users, and each order just holds a tiny pointer back to her:
users orders
----- ------
id | name id | user_id | total
1 | Sara 1 | 1 | 50
2 | Omar 2 | 1 | 30 <-- user_id 1 = Sara again
3 | 2 | 90 <-- user_id 2 = Omar
That user_id column is the string that ties them together. orders.user_id points at users.id. A JOIN is just you following that string.
The metaphor: matching name tags between two guest lists
Imagine a party with two clipboards. List A is "people who RSVP'd." List B is "people who paid." Each person wears a name tag with an id.
To find who both RSVP'd and paid, you walk down List A, and for each person you flip to List B looking for the same id. When the tags match, you staple their two rows together. That stapled row is one row in your result.
The big question (same as the laundry and notebook posts): what do you do with people who appear on only one list? Your answer is the whole difference between join types.
INNER JOIN: only the matches
INNER JOIN keeps only rows that have a match on both sides. No match, no row.
SELECT users.name, orders.total
FROM users
INNER JOIN orders
ON users.id = orders.user_id;
The ON part is the matching rule: "an order belongs to a user when orders.user_id equals users.id."
users.id = orders.user_id
| |
1 --- matches --- 1 keep
1 --- matches --- 1 keep
2 --- matches --- 2 keep
3 --- no order --- dropped (no match)
So a user with no orders simply does not show up. Great for "show me users with orders," bad for "show me users without orders."
LEFT JOIN: keep everyone on the left
LEFT JOIN keeps all rows from the left table, even when the right side has no match. The empty spots get filled with NULL.
SELECT users.name, orders.total
FROM users
LEFT JOIN orders
ON users.id = orders.user_id;
name | total
-----+------
Sara | 50
Sara | 30
Omar | 90
Lina | NULL <-- kept, even with no orders
This is your tool for "find what is missing." Want users who never ordered? LEFT JOIN, then keep the NULL rows. We go deep on this in the bonus post understanding-sql-left-join.md, so check that out if you want the full story.
RIGHT JOIN and FULL OUTER JOIN (the short version)
RIGHT JOIN is just LEFT JOIN with the tables flipped. It keeps everything on the right. Honestly, most people pick LEFT and reorder their tables instead of thinking backwards. Keep your brain calm and stick with LEFT.
FULL OUTER JOIN keeps both sides, matched where it can, NULL where it cannot. Heads up: MySQL does not have it (Postgres does). In MySQL you fake it by combining a LEFT and a RIGHT with UNION.
Forget the ON and boom: a cartesian explosion
If you join two tables but forget the ON rule, the database does not error. It does something scary instead: it pairs every left row with every right row.
-- DANGER: no ON
SELECT users.name, orders.total
FROM users
JOIN orders;
10 users and 1,000 orders gives you 10,000 rows of nonsense. That is a cartesian product, and it is almost never what you want. Always give your join an ON.
Joining three tables: what did each user buy?
Real questions often cross more than two tables. "What products did each user buy?" needs four: users -> orders -> order_items -> products. You just chain the joins, following the ids like stepping stones.
SELECT users.name, products.name AS product
FROM users
JOIN orders ON users.id = orders.user_id
JOIN order_items ON orders.id = order_items.order_id
JOIN products ON order_items.product_id = products.id;
users --(user_id)--> orders --(order_id)--> order_items --(product_id)--> products
Each ON is one hop. Follow the trail and you can answer almost anything about the whole schema.
Gotchas
Forgetting ON. No
ONmeans a cartesian explosion. If your result has way too many rows, this is suspect number one.Ambiguous column names. Both
usersandproductshave anamecolumn. If you just writeSELECT name, the database does not know which one. Useusers.nameor give it an alias:
SELECT u.name AS user_name, p.name AS product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items i ON o.id = i.order_id
JOIN products p ON i.product_id = p.id;
Short aliases (u, o, p) keep long queries readable.
- Surprise duplicate rows. A one-to-many join repeats the left row once per match. If Sara has 3 orders, her name shows up 3 times. That is correct, not a bug, but it surprises people. If you only want one row per user, combine joins with GROUP BY from Part 05.
Recap
- Tables are split so data is not repeated. Ids tie them back together.
- INNER JOIN keeps only matching rows on both sides.
- LEFT JOIN keeps every left row, filling missing matches with NULL.
- RIGHT JOIN is LEFT flipped. FULL OUTER JOIN keeps both sides (no MySQL).
- The ON rule is the match condition. Forget it and you get a cartesian explosion.
- Chain joins to walk across many tables, one
ONhop at a time.
Your turn
Using the schema, write a query that lists every user's name next to the names of the products they bought. Then ask yourself: which join keeps users who bought nothing, INNER or LEFT? If you can explain it to a friend, you have got it.
Next up: Part 07: INSERT, UPDATE, DELETE, where we stop just reading data and start changing it.
Top comments (1)
Part 06 Practice: JOINs
Let's connect some tables. These use the shared schema (
users,orders,products,order_items). Try each one before looking, and remember the key question for every join: who gets to go missing?1. Show each order's total next to the buyer's name. (INNER JOIN)
Only orders that actually have a matching user.
Solution
Why: INNER JOIN keeps only rows that match on both sides. The
ONrule linksorders.user_idtousers.id.2. List every user and how much each order was, keeping users who never ordered. (LEFT JOIN)
Users with no orders should still appear.
Solution
Why: LEFT JOIN keeps all rows from the left table (
users). Users with no orders show up withNULLin the total column.3. Find users who have never placed an order. (LEFT JOIN + IS NULL)
Show only the names of users with zero orders.
Solution
Why: LEFT JOIN keeps everyone, then
WHERE orders.id IS NULLkeeps only the rows where no order matched. This is the classic "find what is missing" trick.4. Show each order's total with the buyer's name, using table aliases. (aliases)
Same as question 1, but use short aliases to keep it clean.
Solution
Why: Aliases (
u,o) save typing and make long join queries easier to read. They matter even more once you join three or four tables.5. What products did each user buy? (3-table join)
List each user's name next to the product names they purchased. Chain users to orders to order_items to products.
Solution
Why: Each
ONis one hop along the ids. Aliasing the twonamecolumns avoids the ambiguous column name error, since bothusersandproductshave aname.6. How many distinct products has each user bought? (join + GROUP BY)
Show each user's name and the count of different products they purchased, most varied buyers first.
Solution
Why: One-to-many joins create repeated rows, so you bring in GROUP BY from Part 05 to fold them back into one row per user.
COUNT(DISTINCT ...)makes sure buying the same product twice only counts once.Great job. JOINs are the moment SQL goes from "one table" to "the whole database." Keep practicing the "who can go missing" question and you will pick the right join every time. See you in Part 07.