Part of the "SQL: Zero to Ninja" series, written for junior web developers.
You made a users table. You made an orders table. Now a real question hits you: how do these two tables actually know about each other? And while we are at it, why do senior devs freak out when you copy the same name into ten rows? Let me show you both, the easy way.
The idea in one line
Tables connect through relationships (one-to-many, many-to-many, one-to-one), and normalization just means: store each fact once, in one place, and point to it instead of copying it.
The metaphor: your home address
Imagine you send 50 letters this year. You could write your full home address on every single envelope, 50 times. But what happens when you move? Now you have 50 envelopes with the wrong address, and good luck fixing them all.
Smart people do this instead: they keep their address in one address book, and everywhere else they just say "see my address book." Move once, fix once, done.
That is the whole spirit of normalization. Keep one single source of truth, and point to it.
The three relationship shapes
1. One-to-many (the most common one)
One user can place many orders. But each order belongs to exactly one user.
We solve this with a foreign key: a column on the "many" side that holds the id of the "one" side. So orders gets a user_id column.
users orders
+----+--------+ +----+---------+--------+
| id | name | | id | user_id | total |
+----+--------+ +----+---------+--------+
| 1 | Sara | <----------- | 10 | 1 | 50.00 |
| 2 | Omar | <--+ | 11 | 1 | 20.00 |
+----+--------+ +-------- | 12 | 2 | 99.00 |
+----+---------+--------+
Both order 10 and order 11 point back to Sara (user 1). The arrow lives on the orders side. Rule of thumb: the foreign key goes on the "many" side.
2. Many-to-many (needs a helper table)
Now the tricky one. An order can contain many products, and a product can appear in many orders. That is many on both sides.
You cannot solve this with one foreign key. There is no clean spot for it. So you add a small table in the middle, called a join table (or link table). For us, that is order_items.
orders order_items products
+----+ +----+----------+------------+ +----+----------+
| id | | id | order_id | product_id | | id | name |
+----+ +----+----------+------------+ +----+----------+
| 10 | <----- | 1 | 10 | 100 | --> | 100| Keyboard |
| 11 | <----- | 2 | 10 | 200 | --> | 200| Mouse |
+----+ | 3 | 11 | 100 | --+ +----+----------+
+----+----------+------------+
Each row in order_items is one little fact: "this order has this product." Order 10 has a keyboard and a mouse. Product 100 (keyboard) shows up in both order 10 and order 11. The middle table makes "many on both sides" easy.
3. One-to-one (the rare one)
Sometimes one row matches exactly one row in another table. Example: a user and a user_profile (bio, avatar, theme settings). One user, one profile.
You handle it like a one-to-many, but the foreign key is marked UNIQUE so a user can have at most one profile.
users (1) ---- (1) user_profiles
People often use this to keep a busy table slim, parking rarely-used columns in a side table. Nice to know it exists, but you will reach for it the least.
Normalization: the wrong way vs the right way
Here is the mistake almost everyone makes at first. They cram the user's info straight into every order:
-- WRONG: orders copies the user's name and email into every row
orders
+----+-----------+-------------------+--------+
| id | user_name | user_email | total |
+----+-----------+-------------------+--------+
| 10 | Sara | sara@old.com | 50.00 |
| 11 | Sara | sara@old.com | 20.00 |
| 12 | Sara | sara@old.com | 99.00 |
+----+-----------+-------------------+--------+
Looks fine, until Sara changes her email to sara@new.com. Now you have to update it in every single order row. Miss one, and your data lies to you. You will have three rows saying old.com and pretend that is still Sara.
This bug has a name: an update anomaly. The same fact lives in many places, so one fact can fall out of sync with itself.
The fix is the address book idea. Store Sara once, and point to her:
-- RIGHT: store the user once, point by id
users
+----+------+----------------+
| id | name | email |
+----+------+----------------+
| 1 | Sara | sara@new.com | <-- change it here, ONE time
+----+------+----------------+
orders
+----+---------+--------+
| id | user_id | total |
+----+---------+--------+
| 10 | 1 | 50.00 | <-- just points to Sara
| 11 | 1 | 20.00 |
| 12 | 1 | 99.00 |
+----+---------+--------+
Sara changes her email? You update one row in users. Every order is instantly correct, because no order stored a copy in the first place. One source of truth.
(You may hear the fancy names for the rules behind this: 1NF, 2NF, 3NF. They exist, you can read them later, but the plain idea "do not repeat the same fact" gets you 90% of the way.)
The full schema, all together
Here is how our four tables connect:
users
| (one-to-many: user_id on orders)
v
orders
| (many-to-many through order_items)
v
order_items -----> products
Read it as: a user has many orders, and an order links to many products through order_items.
Gotchas
-
Over-normalizing everything. You do not need a separate table for every tiny thing. A
countrycolumn onusersis usually fine as plain text. Do not build a 6-table maze for data that never changes and is never reused. -
A little duplication is sometimes okay. Storing the product's price at the time of sale inside
order_itemsis smart, not a sin. Prices change, but an old order should keep the price the customer actually paid. The test is: is this a snapshot (keep it) or a live fact (point to it)? -
Forgetting the foreign key constraint. Pointing by id only helps if the id is real. Add a
FOREIGN KEYso the database refuses orphan rows (an order pointing to a user that does not exist).
Recap
-
One-to-many: foreign key on the many side (
orders.user_id). -
Many-to-many: a join table in the middle (
order_items). -
One-to-one: like one-to-many, but the key is
UNIQUE. - Normalization: store each fact once, point to it by id, change it in one spot.
- Copying the same data everywhere causes update anomalies (stale, conflicting copies).
- Do not overdo it. Snapshots like the sale price are fine to store.
Your turn
You are building a blog. A post can have many tags, and a tag can be on many posts. What three tables do you need, and which one is the join table? If you can draw the little arrow diagram for a friend, you have got it.
Next up: Part 10: Subqueries and CTEs, where we put a query inside a query and then clean it all up with a tidy WITH block.
Top comments (1)
Part 09 Practice: Relationships and Normalization
Time to design and spot problems like a pro. These use our shared schema:
users,orders,products,order_items. Try each one before peeking at the solution.1. Spot the foreign key
In a one-to-many between
usersandorders, which table gets the foreign key column, and what is it called?Solution
Why: the foreign key always lives on the "many" side, because each of the many rows needs to remember its one parent.
2. Design a join table
An order can have many products, and a product can be in many orders. Write the
CREATE TABLEfor the join table that makes this work.Solution
Why: a many-to-many needs a table in the middle. Each row links one order to one product. Two foreign keys, one to each side.
3. Spot the duplication problem
A teammate built this table. What is wrong with it, and what is the bug called?
Solution
Why: the same fact (Sara's email) is stored in many places. One source of truth means you change it once and everything stays correct.
4. One-to-one design
You want to store each user's
bioandavatar_urlin a separateuser_profilestable, with exactly one profile per user. Write the table so a user cannot have two profiles.Solution
Why: a one-to-one is just a one-to-many with a
UNIQUEon the foreign key. TheUNIQUEblocks a second profile for the same user.5. Snapshot vs live fact
Should
order_itemsstore the product's price at the time of sale, or should it always read the current price fromproducts? Explain.Solution
Why: prices change over time. If you always read the current price, an old receipt would change whenever you update the product. A past sale is a snapshot, so a little "duplication" here is correct, not a mistake.
6. Draw the relationships
Without writing SQL, name the relationship type between each pair, and where the link lives:
(a) users and orders, (b) orders and products, (c) users and user_profiles.
Solution
Why: knowing the shape tells you exactly where the key goes. Many-to-many is the only one that needs a helper table.
Nice work. If you can look at any two tables and instantly say "one-to-many, key goes here" or "many-to-many, needs a join table," you are thinking like a database designer. See you in Part 10.