DEV Community

丁久
丁久

Posted on • Originally published at dingjiu1989-hue.github.io

Database Design Fundamentals: Normalization, Indexing, and Schema Design

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Database Design Fundamentals: Normalization, Indexing, and Schema Design

A well-designed database makes every query simpler and faster. A poorly-designed one creates bugs, slow queries, and painful migrations forever. Here are the fundamentals every developer should know before creating tables.

1. Normalization — Reduce Redundancy

Normalization eliminates duplicate data and prevents update anomalies. You need at least 3NF (Third Normal Form) for most applications.

1NF: Atomic Values, No Repeating Groups

-- ❌ Denormalized: multiple phone numbers in one field
| id | name | phones |
| 1 | Alice | "555-0001, 555-0002"|

-- ✅ 1NF: each value is atomic, or use a separate table
| id | name |
| 1 | Alice |
| id | user_id | phone |
| 1 | 1 | 555-0001 |
| 2 | 1 | 555-0002 |

Enter fullscreen mode Exit fullscreen mode




2NF: No Partial Dependencies

Every non-key column must depend on the WHOLE primary key, not part of it. This only applies to tables with composite keys.

-- ❌ 2NF violation: course_name depends only on course_id, not the full key
| student_id | course_id | course_name | grade |
| 1 | CS101 | Intro CS | A |

-- ✅ 2NF: split into two tables
Students: student_id → course_id → grade
Courses: course_id → course_name

Enter fullscreen mode Exit fullscreen mode




3NF: No Transitive Dependencies

Non-key columns must not depend on other non-key columns.

-- ❌ 3NF violation: city_population depends on city, not directly on the key
| student_id | city | city_population |
| 1 | Boston | 675000 |

-- ✅ 3NF: city_population in a cities table
Students: student_id → city_id
Cities: city_id → name, population

Enter fullscreen mode Exit fullscreen mode



  1. Indexing — Speed Up Queries

Index Type Best For Example
B-Tree (default) Equality, range, sorting WHERE email = ?, ORDER BY created_at
Composite Multi-column queries WHERE user_id = ? AND status = ?
Partial Filtering by condition WHERE deleted_at IS NULL
Full-text (GIN/GiST) Text search WHERE body @@ to_tsquery('typescript')
Unique Enforce uniqueness UNIQUE(email)

Index Rules of Thumb

  • Index WHERE and JOIN columns. Every foreign key gets an index.
  • Composite index column order matters. Put the most selective column first.
  • Don't over-index. Each index slows down INSERT/UPDATE/DELETE.
  • Use EXPLAIN ANALYZE. Verify the index is actually being used.

3. Relationship Types

-- One-to-Many (most common):
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id), -- FK to users
title TEXT NOT NULL
); -- One user → many posts

-- Many-to-Many (use junction table):
CREATE TABLE post_tags (
post_id INT REFERENCES posts(id),
tag_id INT REFERENCES tags(id),
PRIMARY KEY (post_id, tag_id)
); -- One post → many tags, one tag → many posts

-- One-to-One (rare, use for optional extension):
CREATE TABLE user_profiles (
user_id INT PRIMARY KEY REFERENCES users(id),
bio TEXT
); -- One user → one profile

Enter fullscreen mode Exit fullscreen mode



  1. Common Schema Design Mistakes

Mistake Why It's Bad Fix
Using VARCHAR for everything No constraints, wasted space Use appropriate types (UUID, INT, TIMESTAMPTZ, TEXT)
Storing JSON blobs instead of columns Can't index, can't query efficiently Relational columns first, JSONB only for truly dynamic data
No TIMESTAMPTZ Timezone bugs are a nightmare Always use TIMESTAMPTZ, store UTC
Missing foreign key constraints Orphaned data, referential chaos Always add FK constraints (ON DELETE CASCADE or SET NULL)
EAV (Entity-Attribute-Value) Unqueriable soup Use JSONB for dynamic fields per row, or normal columns

5. Choosing Your Primary Key

Strategy Pros Cons
UUID v4 No collisions, client-generated, no sequence contention Larger (16 bytes), fragmented index, slower joins
UUID v7 Time-ordered, all UUID benefits Slightly more complex generation
Auto-increment INT Small index, fast joins, ordered Predictable, can't merge across servers, exposes count
Auto-increment BIGINT Same as INT, won't overflow Same cons. Use this over INT for new projects.
Nano ID / CUID2 URL-safe, collision-resistant String-based (slower than UUID in Postgres)

Recommendation: UUID v7 for distributed systems and public-facing IDs. BIGINT for internal tables. Avoid exposing auto-increment IDs in URLs.

Bottom line: Normalize to 3NF, index every FK and query column, use appropriate data types, and always have foreign key constraints. A well-designed schema is cheaper to fix now than later. See also: database comparison and ORM comparison.


Read the full article on AI Study Room for complete code examples, comparison tables, and related resources.

Found this useful? Check out more developer guides and tool comparisons on AI Study Room.

Top comments (0)