丁久

Posted on May 11 • Originally published at dingjiu1989-hue.github.io

Database Design Fundamentals: Normalization, Indexing, and Schema Design

#database #sql #schemadesign #backend

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Database Design Fundamentals: Normalization, Indexing, and Schema Design

A well-designed database makes every query simpler and faster. A poorly-designed one creates bugs, slow queries, and painful migrations forever. Here are the fundamentals every developer should know before creating tables.

1. Normalization — Reduce Redundancy

Normalization eliminates duplicate data and prevents update anomalies. You need at least 3NF (Third Normal Form) for most applications.

1NF: Atomic Values, No Repeating Groups

-- ❌ Denormalized: multiple phone numbers in one field

| id | name  | phones              |

| 1  | Alice | "555-0001, 555-0002"|

-- ✅ 1NF: each value is atomic, or use a separate table

| id | name  |

| 1  | Alice |

| id | user_id | phone     |

| 1  | 1       | 555-0001  |

| 2  | 1       | 555-0002  |

2NF: No Partial Dependencies

Every non-key column must depend on the WHOLE primary key, not part of it. This only applies to tables with composite keys.

-- ❌ 2NF violation: course_name depends only on course_id, not the full key

| student_id | course_id | course_name | grade |

| 1          | CS101     | Intro CS    | A     |

-- ✅ 2NF: split into two tables

Students: student_id → course_id → grade

Courses: course_id → course_name

3NF: No Transitive Dependencies

Non-key columns must not depend on other non-key columns.

-- ❌ 3NF violation: city_population depends on city, not directly on the key

| student_id | city    | city_population |

| 1          | Boston  | 675000          |

-- ✅ 3NF: city_population in a cities table

Students: student_id → city_id

Cities: city_id → name, population

Indexing — Speed Up Queries

Index Type	Best For	Example
B-Tree (default)	Equality, range, sorting	WHERE email = ?, ORDER BY created_at
Composite	Multi-column queries	WHERE user_id = ? AND status = ?
Partial	Filtering by condition	WHERE deleted_at IS NULL
Full-text (GIN/GiST)	Text search	WHERE body @@ to_tsquery('typescript')
Unique	Enforce uniqueness	UNIQUE(email)

Index Rules of Thumb

Index WHERE and JOIN columns. Every foreign key gets an index.
Composite index column order matters. Put the most selective column first.
Don't over-index. Each index slows down INSERT/UPDATE/DELETE.
Use EXPLAIN ANALYZE. Verify the index is actually being used.

3. Relationship Types

-- One-to-Many (most common):

CREATE TABLE posts (

  id SERIAL PRIMARY KEY,

  user_id INT REFERENCES users(id),  -- FK to users

  title TEXT NOT NULL

);  -- One user → many posts

-- Many-to-Many (use junction table):

CREATE TABLE post_tags (

  post_id INT REFERENCES posts(id),

  tag_id INT REFERENCES tags(id),

  PRIMARY KEY (post_id, tag_id)

);  -- One post → many tags, one tag → many posts

-- One-to-One (rare, use for optional extension):

CREATE TABLE user_profiles (

  user_id INT PRIMARY KEY REFERENCES users(id),

  bio TEXT

);  -- One user → one profile

Common Schema Design Mistakes

Mistake	Why It's Bad	Fix
Using VARCHAR for everything	No constraints, wasted space	Use appropriate types (UUID, INT, TIMESTAMPTZ, TEXT)
Storing JSON blobs instead of columns	Can't index, can't query efficiently	Relational columns first, JSONB only for truly dynamic data
No TIMESTAMPTZ	Timezone bugs are a nightmare	Always use TIMESTAMPTZ, store UTC
Missing foreign key constraints	Orphaned data, referential chaos	Always add FK constraints (ON DELETE CASCADE or SET NULL)
EAV (Entity-Attribute-Value)	Unqueriable soup	Use JSONB for dynamic fields per row, or normal columns

5. Choosing Your Primary Key

Strategy	Pros	Cons
UUID v4	No collisions, client-generated, no sequence contention	Larger (16 bytes), fragmented index, slower joins
UUID v7	Time-ordered, all UUID benefits	Slightly more complex generation
Auto-increment INT	Small index, fast joins, ordered	Predictable, can't merge across servers, exposes count
Auto-increment BIGINT	Same as INT, won't overflow	Same cons. Use this over INT for new projects.
Nano ID / CUID2	URL-safe, collision-resistant	String-based (slower than UUID in Postgres)

Recommendation: UUID v7 for distributed systems and public-facing IDs. BIGINT for internal tables. Avoid exposing auto-increment IDs in URLs.

Bottom line: Normalize to 3NF, index every FK and query column, use appropriate data types, and always have foreign key constraints. A well-designed schema is cheaper to fix now than later. See also: database comparison and ORM comparison.

Read the full article on AI Study Room for complete code examples, comparison tables, and related resources.

Found this useful? Check out more developer guides and tool comparisons on AI Study Room.

DEV Community

Database Design Fundamentals: Normalization, Indexing, and Schema Design

Database Design Fundamentals: Normalization, Indexing, and Schema Design

1. Normalization — Reduce Redundancy

1NF: Atomic Values, No Repeating Groups

2NF: No Partial Dependencies

3NF: No Transitive Dependencies

Indexing — Speed Up Queries

Index Rules of Thumb

3. Relationship Types

Common Schema Design Mistakes

5. Choosing Your Primary Key

Top comments (0)