Igor Nosatov

Posted on Feb 13

Haunted House: A MySQL Normalization Story

#mysql #data #webdev #programming

This is the story of how I learned what structured databases really mean—and why MySQL normalization isn't just academic mumbo-jumbo, but the difference between a sturdy home and a paranormal nightmare.

🏚️ The Haunted Architecture: What I Built Wrong

Picture a house where:

The same family portrait hangs in every room
Each bathroom has its own separate water heater
Your address is written on every wall
When you repaint one room, you have to repaint ALL rooms

Absurd, right? Yet that's exactly what my database looked like:

-- My "clever" single-table design (DON'T DO THIS)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20),
    customer_address TEXT,
    product_name VARCHAR(100),
    product_price DECIMAL(10,2),
    product_category VARCHAR(50),
    quantity INT,
    order_date DATETIME
);

Look innocent? Let me show you what happens when a customer orders three items:

+----------+---------------+-------------------+-------------------------------+
| order_id | customer_name | customer_email    | product_name                  |
+----------+---------------+-------------------+-------------------------------+
| 1        | Alice Chen    | alice@email.com   | Wireless Mouse                |
| 2        | Alice Chen    | alice@email.com   | USB-C Cable                   |
| 3        | Alice Chen    | alice@email.com   | Laptop Stand                  |
+----------+---------------+-------------------+-------------------------------+

Three rows. One customer. Three copies of her email address.

Now imagine Alice gets married and changes her email. How many rows do you update? You don't know. You have to search the entire table. And if you miss even one row? Alice now exists as two different people in your system—a data ghost.

This is called an UPDATE ANOMALY, and it's just one of the poltergeists that haunt poorly structured databases.

🔦 The Exorcism: Understanding Database Normalization

Database normalization is like calling in a professional paranormal investigator. It's the systematic process of eliminating ghosts (redundancy) and sealing doorways (dependencies) where anomalies can sneak in.

The Three Forms of Normal (3NF): Your Ghost-Busting Toolkit

Think of normalization like organizing a chaotic house:

First Normal Form (1NF): One Item Per Drawer

No repeating groups
Each cell contains a single, atomic value
Like ensuring each drawer holds only one type of item

Second Normal Form (2NF): Everything in Its Place

Must be in 1NF first
No partial dependencies
Like putting winter clothes in the winter closet, not scattered around

Third Normal Form (3NF): No Redundant Storage

Must be in 2NF first
No transitive dependencies
Like having ONE address book, not copies in every room

Let's exorcise my haunted database.

🛠️ The Rebuild: Structured Database Design

Here's how I restructured the database—transforming a haunted house into a well-organized home:

-- CUSTOMERS TABLE: One source of truth for customer data
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone VARCHAR(20),
    address TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_email (email)
);

-- PRODUCTS TABLE: One source of truth for product data
CREATE TABLE products (
    product_id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    category VARCHAR(50),
    price DECIMAL(10,2) NOT NULL,
    stock_quantity INT DEFAULT 0,
    INDEX idx_category (category)
);

-- ORDERS TABLE: The relationship between customers and their orders
CREATE TABLE orders (
    order_id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status ENUM('pending', 'processing', 'shipped', 'delivered') DEFAULT 'pending',
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    INDEX idx_customer (customer_id),
    INDEX idx_date (order_date)
);

-- ORDER_ITEMS TABLE: The bridge between orders and products
CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY AUTO_INCREMENT,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price_at_purchase DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES products(product_id),
    INDEX idx_order (order_id)
);

🎨 Visualizing the Architecture

Here's what this structure looks like:

         ┌─────────────┐
         │  CUSTOMERS  │
         │-------------|
         │ customer_id │────┐
         │ name        │    │
         │ email       │    │
         │ phone       │    │
         │ address     │    │
         └─────────────┘    │
                            │
                            │ 1:M (One to Many)
                            │
         ┌─────────────┐    │
         │   ORDERS    │◄───┘
         │-------------|
         │ order_id    │────┐
         │ customer_id │    │
         │ order_date  │    │
         │ status      │    │
         └─────────────┘    │
                            │ 1:M
                            │
         ┌─────────────┐    │
         │ ORDER_ITEMS │◄───┘
         │-------------|
         │ item_id     │
         │ order_id    │───────┐
         │ product_id  │◄──┐   │
         │ quantity    │   │   │
         │ price_at_   │   │   │
         │   purchase  │   │   │
         └─────────────┘   │   │
                           │   │
         ┌─────────────┐   │   │
         │  PRODUCTS   │◄──┘   │
         │-------------|       │ M:M (Many to Many)
         │ product_id  │       │ via ORDER_ITEMS
         │ name        │       │
         │ category    │       │
         │ price       │       │
         │ stock       │       │
         └─────────────┘       │
                               │
              Bridge Table ────┘

🎯 Why This Matters: The Ghost Stories We Prevent

Let me show you what we've fixed:

Anomaly #1: The Duplication Demon ❌ → ✅

Before: Alice's email existed in 47 rows
After: Alice's email exists in exactly ONE row

-- Update Alice's email: ONE update, not 47
UPDATE customers 
SET email = 'alice.wong@newdomain.com' 
WHERE customer_id = 42;

Anomaly #2: The Deletion Disaster ❌ → ✅

Before: Delete Alice's last order → Alice disappears from the database
After: Delete an order → Customer data remains intact

-- Delete an order safely
DELETE FROM orders WHERE order_id = 999;
-- Alice still exists in customers table!

Anomaly #3: The Insertion Impossibility ❌ → ✅

Before: Can't add a new product without creating a fake order
After: Add products anytime

-- Add a new product immediately
INSERT INTO products (name, category, price, stock_quantity)
VALUES ('4K Webcam', 'Electronics', 129.99, 50);

🚀 Real-World Query Examples: Putting It To Work

Now let's see how this structured approach makes complex queries elegant:

Query 1: Find all orders for a customer

SELECT 
    o.order_id,
    o.order_date,
    o.status,
    o.total_amount,
    COUNT(oi.order_item_id) as item_count
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = 42
GROUP BY o.order_id
ORDER BY o.order_date DESC;

Query 2: Best-selling products

SELECT 
    p.name,
    p.category,
    SUM(oi.quantity) as total_sold,
    SUM(oi.quantity * oi.price_at_purchase) as revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id
ORDER BY total_sold DESC
LIMIT 10;

Query 3: Customer lifetime value

SELECT 
    c.customer_id,
    c.name,
    c.email,
    COUNT(DISTINCT o.order_id) as order_count,
    SUM(o.total_amount) as lifetime_value,
    AVG(o.total_amount) as avg_order_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
ORDER BY lifetime_value DESC;

⚡ Performance Considerations: Speed vs. Structure

"But won't all these JOINs slow things down?" you ask.

Here's the truth: Properly indexed normalized tables often outperform denormalized ones.

Indexing Strategy

-- Speed up common queries
CREATE INDEX idx_customer_email ON customers(email);
CREATE INDEX idx_order_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_product_category ON products(category);
CREATE INDEX idx_order_items_composite ON order_items(order_id, product_id);

-- Analyze query performance
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

When to Denormalize (The Exception, Not the Rule)

Sometimes you do need to break normalization—for read-heavy analytics or caching:

-- Materialized view for dashboard (updated nightly)
CREATE TABLE customer_stats_cache (
    customer_id INT PRIMARY KEY,
    total_orders INT,
    total_spent DECIMAL(10,2),
    last_order_date DATE,
    updated_at TIMESTAMP
);

But this is a conscious trade-off, not a design flaw.

🎓 Lessons From the Haunted Database

Normalization isn't academic—it's practical ghost-busting
Every violation creates a potential bug
Think in relationships, not spreadsheets
Databases model the real world, where entities are connected
One source of truth prevents data ghosts
Update once, reflect everywhere
Foreign keys are your friend
They enforce referential integrity at the database level
Start normalized, denormalize deliberately
Optimize based on measured performance, not assumptions

🔮 Your Turn: A Challenge

Here's a database design smell for you to fix:

-- What's wrong with this table?
CREATE TABLE employee_projects (
    id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    employee_dept VARCHAR(50),
    dept_manager VARCHAR(100),
    project_name VARCHAR(100),
    project_budget DECIMAL(10,2),
    hours_worked INT
);

Hint: How many ghosts can you spot? How would you restructure this?

Drop your answer in the comments—I'd love to see different approaches!

📚 Going Deeper

Want to master database design?

Practice: Design a database for a library, restaurant, or social network
Read: "Database Design for Mere Mortals" by Michael J. Hernandez
Tool: Use dbdiagram.io to visualize your schemas
Challenge: Try modeling many-to-many relationships with attributes

DEV Community