DEV Community

Mohamed Idris
Mohamed Idris

Posted on

Part 09: Relationships and Normalization

Part of the "SQL: Zero to Ninja" series, written for junior web developers.

You made a users table. You made an orders table. Now a real question hits you: how do these two tables actually know about each other? And while we are at it, why do senior devs freak out when you copy the same name into ten rows? Let me show you both, the easy way.

The idea in one line

Tables connect through relationships (one-to-many, many-to-many, one-to-one), and normalization just means: store each fact once, in one place, and point to it instead of copying it.

The metaphor: your home address

Imagine you send 50 letters this year. You could write your full home address on every single envelope, 50 times. But what happens when you move? Now you have 50 envelopes with the wrong address, and good luck fixing them all.

Smart people do this instead: they keep their address in one address book, and everywhere else they just say "see my address book." Move once, fix once, done.

That is the whole spirit of normalization. Keep one single source of truth, and point to it.

The three relationship shapes

1. One-to-many (the most common one)

One user can place many orders. But each order belongs to exactly one user.

We solve this with a foreign key: a column on the "many" side that holds the id of the "one" side. So orders gets a user_id column.

users                         orders
+----+--------+               +----+---------+--------+
| id | name   |               | id | user_id | total  |
+----+--------+               +----+---------+--------+
| 1  | Sara   | <-----------  | 10 | 1       | 50.00  |
| 2  | Omar   | <--+          | 11 | 1       | 20.00  |
+----+--------+    +--------  | 12 | 2       | 99.00  |
                              +----+---------+--------+
Enter fullscreen mode Exit fullscreen mode

Both order 10 and order 11 point back to Sara (user 1). The arrow lives on the orders side. Rule of thumb: the foreign key goes on the "many" side.

2. Many-to-many (needs a helper table)

Now the tricky one. An order can contain many products, and a product can appear in many orders. That is many on both sides.

You cannot solve this with one foreign key. There is no clean spot for it. So you add a small table in the middle, called a join table (or link table). For us, that is order_items.

orders        order_items                 products
+----+        +----+----------+------------+   +----+----------+
| id |        | id | order_id | product_id |   | id | name     |
+----+        +----+----------+------------+   +----+----------+
| 10 | <----- | 1  | 10       | 100        | --> | 100| Keyboard |
| 11 | <----- | 2  | 10       | 200        | --> | 200| Mouse    |
+----+        | 3  | 11       | 100        | --+ +----+----------+
              +----+----------+------------+
Enter fullscreen mode Exit fullscreen mode

Each row in order_items is one little fact: "this order has this product." Order 10 has a keyboard and a mouse. Product 100 (keyboard) shows up in both order 10 and order 11. The middle table makes "many on both sides" easy.

3. One-to-one (the rare one)

Sometimes one row matches exactly one row in another table. Example: a user and a user_profile (bio, avatar, theme settings). One user, one profile.

You handle it like a one-to-many, but the foreign key is marked UNIQUE so a user can have at most one profile.

users (1) ---- (1) user_profiles
Enter fullscreen mode Exit fullscreen mode

People often use this to keep a busy table slim, parking rarely-used columns in a side table. Nice to know it exists, but you will reach for it the least.

Normalization: the wrong way vs the right way

Here is the mistake almost everyone makes at first. They cram the user's info straight into every order:

-- WRONG: orders copies the user's name and email into every row
orders
+----+-----------+-------------------+--------+
| id | user_name | user_email        | total  |
+----+-----------+-------------------+--------+
| 10 | Sara      | sara@old.com      | 50.00  |
| 11 | Sara      | sara@old.com      | 20.00  |
| 12 | Sara      | sara@old.com      | 99.00  |
+----+-----------+-------------------+--------+
Enter fullscreen mode Exit fullscreen mode

Looks fine, until Sara changes her email to sara@new.com. Now you have to update it in every single order row. Miss one, and your data lies to you. You will have three rows saying old.com and pretend that is still Sara.

This bug has a name: an update anomaly. The same fact lives in many places, so one fact can fall out of sync with itself.

The fix is the address book idea. Store Sara once, and point to her:

-- RIGHT: store the user once, point by id
users
+----+------+----------------+
| id | name | email          |
+----+------+----------------+
| 1  | Sara | sara@new.com   |   <-- change it here, ONE time
+----+------+----------------+

orders
+----+---------+--------+
| id | user_id | total  |
+----+---------+--------+
| 10 | 1       | 50.00  |   <-- just points to Sara
| 11 | 1       | 20.00  |
| 12 | 1       | 99.00  |
+----+---------+--------+
Enter fullscreen mode Exit fullscreen mode

Sara changes her email? You update one row in users. Every order is instantly correct, because no order stored a copy in the first place. One source of truth.

(You may hear the fancy names for the rules behind this: 1NF, 2NF, 3NF. They exist, you can read them later, but the plain idea "do not repeat the same fact" gets you 90% of the way.)

The full schema, all together

Here is how our four tables connect:

users
  | (one-to-many: user_id on orders)
  v
orders
  | (many-to-many through order_items)
  v
order_items  ----->  products
Enter fullscreen mode Exit fullscreen mode

Read it as: a user has many orders, and an order links to many products through order_items.

Gotchas

  1. Over-normalizing everything. You do not need a separate table for every tiny thing. A country column on users is usually fine as plain text. Do not build a 6-table maze for data that never changes and is never reused.
  2. A little duplication is sometimes okay. Storing the product's price at the time of sale inside order_items is smart, not a sin. Prices change, but an old order should keep the price the customer actually paid. The test is: is this a snapshot (keep it) or a live fact (point to it)?
  3. Forgetting the foreign key constraint. Pointing by id only helps if the id is real. Add a FOREIGN KEY so the database refuses orphan rows (an order pointing to a user that does not exist).

Recap

  • One-to-many: foreign key on the many side (orders.user_id).
  • Many-to-many: a join table in the middle (order_items).
  • One-to-one: like one-to-many, but the key is UNIQUE.
  • Normalization: store each fact once, point to it by id, change it in one spot.
  • Copying the same data everywhere causes update anomalies (stale, conflicting copies).
  • Do not overdo it. Snapshots like the sale price are fine to store.

Your turn

You are building a blog. A post can have many tags, and a tag can be on many posts. What three tables do you need, and which one is the join table? If you can draw the little arrow diagram for a friend, you have got it.

Next up: Part 10: Subqueries and CTEs, where we put a query inside a query and then clean it all up with a tidy WITH block.

Top comments (1)

Collapse
 
edriso profile image
Mohamed Idris

Part 09 Practice: Relationships and Normalization

Time to design and spot problems like a pro. These use our shared schema: users, orders, products, order_items. Try each one before peeking at the solution.


1. Spot the foreign key

In a one-to-many between users and orders, which table gets the foreign key column, and what is it called?

Solution

-- The "many" side gets it. Orders is the many side (one user, many orders).
-- So orders gets a user_id column pointing back to users.id

CREATE TABLE orders (
  id INT PRIMARY KEY,
  user_id INT,                       -- the foreign key
  total DECIMAL(10,2),
  status VARCHAR(20),
  created_at TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id)
);
Enter fullscreen mode Exit fullscreen mode

Why: the foreign key always lives on the "many" side, because each of the many rows needs to remember its one parent.


2. Design a join table

An order can have many products, and a product can be in many orders. Write the CREATE TABLE for the join table that makes this work.

Solution

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  FOREIGN KEY (order_id) REFERENCES orders(id),
  FOREIGN KEY (product_id) REFERENCES products(id)
);
Enter fullscreen mode Exit fullscreen mode

Why: a many-to-many needs a table in the middle. Each row links one order to one product. Two foreign keys, one to each side.


3. Spot the duplication problem

A teammate built this table. What is wrong with it, and what is the bug called?

orders
+----+-----------+----------------+--------+
| id | user_name | user_email     | total  |
+----+-----------+----------------+--------+
| 10 | Sara      | sara@mail.com  | 50.00  |
| 11 | Sara      | sara@mail.com  | 20.00  |
+----+-----------+----------------+--------+
Enter fullscreen mode Exit fullscreen mode

Solution

-- The user's name and email are copied into every order row.
-- When Sara changes her email, you must update many rows, and
-- if you miss one your data goes out of sync. This is an UPDATE ANOMALY.

-- Fix: store the user once in users, and point by id.
orders
+----+---------+--------+
| id | user_id | total  |
+----+---------+--------+
| 10 | 1       | 50.00  |
| 11 | 1       | 20.00  |
+----+---------+--------+
Enter fullscreen mode Exit fullscreen mode

Why: the same fact (Sara's email) is stored in many places. One source of truth means you change it once and everything stays correct.


4. One-to-one design

You want to store each user's bio and avatar_url in a separate user_profiles table, with exactly one profile per user. Write the table so a user cannot have two profiles.

Solution

CREATE TABLE user_profiles (
  id INT PRIMARY KEY,
  user_id INT UNIQUE,                -- UNIQUE makes it one-to-one
  bio TEXT,
  avatar_url VARCHAR(255),
  FOREIGN KEY (user_id) REFERENCES users(id)
);
Enter fullscreen mode Exit fullscreen mode

Why: a one-to-one is just a one-to-many with a UNIQUE on the foreign key. The UNIQUE blocks a second profile for the same user.


5. Snapshot vs live fact

Should order_items store the product's price at the time of sale, or should it always read the current price from products? Explain.

Solution

-- Store a snapshot of the price at sale time:
CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  unit_price DECIMAL(10,2),          -- the price the customer actually paid
  FOREIGN KEY (order_id) REFERENCES orders(id),
  FOREIGN KEY (product_id) REFERENCES products(id)
);
Enter fullscreen mode Exit fullscreen mode

Why: prices change over time. If you always read the current price, an old receipt would change whenever you update the product. A past sale is a snapshot, so a little "duplication" here is correct, not a mistake.


6. Draw the relationships

Without writing SQL, name the relationship type between each pair, and where the link lives:
(a) users and orders, (b) orders and products, (c) users and user_profiles.

Solution

(a) users <-> orders        : one-to-many   (foreign key user_id on orders)
(b) orders <-> products     : many-to-many  (join table order_items in the middle)
(c) users <-> user_profiles : one-to-one    (UNIQUE foreign key user_id on profiles)
Enter fullscreen mode Exit fullscreen mode

Why: knowing the shape tells you exactly where the key goes. Many-to-many is the only one that needs a helper table.


Nice work. If you can look at any two tables and instantly say "one-to-many, key goes here" or "many-to-many, needs a join table," you are thinking like a database designer. See you in Part 10.