Muhammad Junaid

Posted on Feb 22

From Idea to Database: A Practical Guide to Database Design

#database #beginners #sql #tutorial

A well-designed database makes everything downstream easier — queries are intuitive, the ORM doesn't fight you, and six months later you're not cursing past-you for shoving three concepts into one table.

A bad one? Data inconsistencies, N+1 queries, and mysterious bugs where deleting one record breaks everything else.

I spent my first year building apps without thinking about databases. I'd write APIs, mess with React, chase new frameworks — anything to avoid thinking about schemas. When I finally had to design one from scratch for a freelance client, I stared at a blank screen for hours. The problem wasn't that I didn't know SQL. It was that I had no process for turning a messy set of requirements into a clean set of tables.

So I built one. Nine steps, one realistic example, and everything I learned the hard way.

Whether you're building a SaaS, a side project, or just trying to survive your database course — the process is the same.

TL;DR — The 9 steps: (1) Understand the problem, (2) Identify entities, (3) Define attributes, (4) Map relationships, (5) Draw an ERD, (6) Define keys, (7) Normalize, (8) Write the schema, (9) Test with real queries. Full PostgreSQL schema included below.

The 9-Step Process

Step 1: Understand the Problem

Before touching any SQL, ask: what are we actually building?

Talk to the people who'll use it. If it's a personal project, talk to yourself — no judgment. You need clarity on three things:

What data needs to be stored?
What actions will users perform?
What information needs to be retrieved?

Write this down. Vague requirements lead to endless redesigns.

Our example: We're building WorkSync, a workspace collaboration platform. Remote teams use it to manage projects, share AI-generated content, collaborate in real-time, and track progress. Team members can create workspaces, start projects, assign tasks, share documents (reports, designs, code), and leave comments. There's also time tracking built in.

I'll be using PostgreSQL syntax throughout this article — if you're on MySQL or SQLite, the concepts are identical but some syntax differs (e.g., AUTO_INCREMENT instead of SERIAL, ON UPDATE CURRENT_TIMESTAMP instead of triggers for auto-updating timestamps).

Step 2: Identify Your Entities

Entities are the "things" in your system — they become your database tables. Look for nouns in your requirements. Every time you catch yourself saying "we need to track..." or "users can create...", what follows is probably an entity.

For WorkSync:

users — team members on the platform
workspaces — containers for related projects
projects — specific initiatives within a workspace
tasks — individual work items
documents — AI-generated or uploaded content
comments — feedback on tasks or documents
time_entries — time tracking for tasks

Not every noun becomes an entity. "Password" is an attribute of a user. "Assignment" is a relationship, not an entity. The test: does this thing have multiple properties? Does it exist independently? If yes to both, it's an entity.

I'm using plural, snake_case names. Pick a convention early and stick to it everywhere.

Step 3: Define Attributes

Attributes are the properties of each entity — they become your columns. For each entity, ask: what do I need to know about this thing?

users:

id — unique identifier
username — display name
email — contact and login
password_hash — never store plain passwords, ever
global_role — platform-level role (admin/member/guest)
avatar_url — profile picture
created_at — when they joined
updated_at — last profile modification

workspaces:

id — unique identifier
name — workspace title
description — purpose/overview
owner_id — who created it
created_at — when it was created
updated_at — last modification
deleted_at — soft delete timestamp (NULL means active)

Every entity below also includes id, created_at, and updated_at — I'll omit those for brevity. The full schema is in Step 8.

projects: workspace_id, name, description, status (planning/active/completed/archived), start_date, due_date, created_by
tasks: project_id, title, description, status (todo/in_progress/review/done), priority (low/medium/high/urgent), assigned_to, created_by, due_date, estimated_hours
documents: project_id, title, content, document_type (report/design/code/presentation), is_ai_generated, created_by
comments: user_id, task_id (nullable), document_id (nullable), content
time_entries: user_id, task_id, hours, description, date

A few design decisions worth explaining:

updated_at is on almost everything. You'll always need to know when something was last modified — for cache invalidation, audit trails, UI timestamps. I've never regretted adding it. I have regretted not adding it.

deleted_at instead of is_active. A nullable timestamp gives you soft deletes and tells you when it was deleted. More useful than a boolean. Only add it to tables where you actually need soft deletes.

global_role instead of role on users. We'll also have a per-workspace workspace_role later. If you name both role, you'll confuse yourself when writing queries that join these tables.

Comments use two nullable foreign keys instead of polymorphism. I'll explain why in Step 4.

Step 4: Map Relationships

This is where things get interesting. How do your entities connect?

Three types of relationships exist:

One-to-One (1:1): One record in Table A relates to exactly one in Table B.

Example: A user has one profile settings record.
This is rare. Often it means the tables should be merged, unless there's a specific reason to separate them (like keeping frequently accessed data separate from rarely used data).

One-to-Many (1:N): One record in Table A relates to many in Table B.

Example: One workspace contains many projects, but each project belongs to one workspace.
This is the most common relationship type. It appears whenever one entity "owns" or "contains" multiple instances of another.

Many-to-Many (M:N): Records in both tables can relate to many in the other.

Example: A user can be a member of many workspaces, and a workspace has many members.
This requires a junction table to work properly. More on that below.

WorkSync relationships:

users → workspaces (ownership): 1:N — one user can own many workspaces
users ↔ workspaces (membership): M:N — users join many workspaces, workspaces have many members
workspaces → projects: 1:N — one workspace contains many projects
projects → tasks: 1:N — one project has many tasks
projects → documents: 1:N — one project has many documents
users → tasks (assignment): 1:N — one user can be assigned many tasks
users → time_entries: 1:N — one user logs many time entries
tasks → time_entries: 1:N — one task has many time entries
tasks → comments: 1:N — one task has many comments
documents → comments: 1:N — one document has many comments

The pattern: In a 1:N relationship, the "many" side stores a foreign key pointing to the "one" side. In an M:N relationship, create a separate junction table that references both sides.

The M:N problem. Many-to-many relationships can't be directly represented in SQL. You need a junction table (also called a join table or bridge table). For user ↔ workspace membership, we'll create workspace_members. This is where the per-workspace role lives — admin, member, or viewer within that specific workspace, which is totally separate from the user's global_role.

Why I avoided polymorphic comments. You'll see tutorials use the commentable_type + commentable_id pattern — a single column that could reference either the tasks table or the documents table. It looks elegant. The problem? You cannot enforce a foreign key constraint on it. The database has no idea which table commentable_id points to, so referential integrity goes out the window. You've traded a clean-looking schema for orphaned records — the exact thing foreign keys exist to prevent.

Instead, I use two nullable foreign keys: task_id and document_id. A comment belongs to one or the other, enforced at the database level with a CHECK constraint:

-- A comment belongs to EITHER a task or a document, never both
-- This is enforceable at the database level
ALTER TABLE comments ADD CONSTRAINT chk_comment_target
    CHECK (
        (task_id IS NOT NULL AND document_id IS NULL) OR
        (task_id IS NULL AND document_id IS NOT NULL)
    );

You get real foreign keys, real cascading deletes, and the database actually protects your data. This pattern scales cleanly to 2–3 target tables. If you need comments on 10+ different entity types, then polymorphism or a separate comment_targets table starts making sense. But for two? Keep it simple and type-safe.

Step 5: Draw an ERD (Entity-Relationship Diagram)

Visualize your database before writing any SQL. You don't need fancy tools — even a napkin sketch helps. The point is to spot problems early, not to make something pretty.

Here's the complete ERD for WorkSync:

The diagram makes it easy to trace connections: a user creates a workspace, adds projects, creates tasks within those projects, and logs time on specific tasks. The comments table has two optional FK arrows (one from tasks, one from documents) reflecting the nullable FK approach.

Step 6: Define Primary and Foreign Keys

Keys create the structure that holds everything together.

Primary Keys (PK): Every table needs a unique identifier. I use SERIAL — PostgreSQL's auto-incrementing integer.

Avoid using business data as primary keys. Email addresses change. Usernames get updated. Phone numbers get recycled. An auto-incrementing ID never changes, which means your relationships stay stable.

A note on UUIDs: For a multi-tenant collaboration platform, you might consider UUIDs instead of auto-incrementing integers. They don't expose record counts, work well in distributed systems, and are safe to expose in URLs. The tradeoff: they're larger (16 bytes vs 4), slightly slower to index, and harder to debug. For most projects starting out, integers are fine. Switch to UUIDs when you have a real reason to.

Foreign Keys (FK): These reference a primary key in another table. They're your data integrity guardrails — the database will refuse invalid references.

Here's the junction table for workspace membership:

CREATE TABLE workspace_members (
    id SERIAL PRIMARY KEY,
    workspace_id INT NOT NULL,
    user_id INT NOT NULL,
    workspace_role VARCHAR(20) DEFAULT 'member',  -- 'admin', 'member', 'viewer'
    joined_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(workspace_id, user_id)  -- prevent duplicate memberships
);

And tasks, which demonstrates multiple FKs with different delete behaviors:

CREATE TABLE tasks (
    id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    title VARCHAR(200) NOT NULL,
    description TEXT,
    status VARCHAR(20) DEFAULT 'todo',
    priority VARCHAR(20) DEFAULT 'medium',
    assigned_to INT,              -- nullable: task can be unassigned
    created_by INT NOT NULL,
    due_date DATE,
    estimated_hours DECIMAL(5,2),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE,
    FOREIGN KEY (assigned_to) REFERENCES users(id) ON DELETE SET NULL,
    FOREIGN KEY (created_by) REFERENCES users(id) ON DELETE RESTRICT
);

Notice the three different ON DELETE strategies:

CASCADE on project_id — delete the project, delete its tasks. A task without a project is meaningless.
SET NULL on assigned_to — user leaves? Task becomes unassigned instead of disappearing. The work still exists.
RESTRICT on created_by — you can't delete a user who created tasks. Handle reassignment first. Every task should have a traceable creator for audit purposes.

These choices depend on your business logic. Be intentional about each one. If you leave ON DELETE unspecified, most databases default to RESTRICT (or NO ACTION) — which can bite you when someone tries to delete a user and gets a cryptic constraint violation with no idea which table is blocking them.

Step 7: Apply Normalization

Normalization is about organizing data to reduce redundancy. There's a whole academic framework here, but in practice you need to get to Third Normal Form (3NF) and then stop overthinking it.

First Normal Form (1NF): Each column holds one value. No comma-separated lists.

Second Normal Form (2NF): Every non-key column depends on the entire primary key. This mostly matters for composite keys.

Third Normal Form (3NF): No column depends on another non-key column.

Let me show what this means with concrete examples, not abstract definitions.

A 1NF violation might look like storing tags = "design, urgent, frontend" on a task. You can't efficiently query "find all tasks tagged urgent" without string parsing. The fix: a tags table and a task_tags junction table.

A 3NF violation would be storing workspace_name on the projects table alongside workspace_id. The workspace name depends on workspace_id, not on the project — so it's a transitive dependency. Any time the workspace is renamed, you'd have to update every project row. Instead, just JOIN through workspace_id when you need the name.

-- Bad: duplicated data, violates 3NF
CREATE TABLE projects (
    id INT PRIMARY KEY,
    workspace_id INT,
    workspace_name VARCHAR(100),  -- depends on workspace_id, not on the project
    name VARCHAR(200)
);

-- Good: store the relationship, not the data
CREATE TABLE projects (
    id INT PRIMARY KEY,
    workspace_id INT,
    name VARCHAR(200),
    FOREIGN KEY (workspace_id) REFERENCES workspaces(id)
);

Storage is cheap. Inconsistent data is expensive. Join when you need the workspace name.

Our WorkSync schema is already in 3NF — every non-key column depends only on its table's primary key, and there's no redundant data. For 95% of applications, 3NF is the sweet spot between clean design and practical query performance.

When to stop normalizing: Reach 3NF and stop. Going further often adds complexity without meaningful benefit. Sometimes intentional denormalization makes sense for performance — like caching a computed value that's expensive to calculate. But only do this when profiling proves it's necessary, not preemptively.

Step 8: Write the Schema

Now let's put it all together. Here's the complete PostgreSQL schema for WorkSync.

If you're on MySQL, the main differences are: AUTO_INCREMENT instead of SERIAL, inline INDEX inside CREATE TABLE, and ON UPDATE CURRENT_TIMESTAMP for auto-updating timestamps (PostgreSQL uses a trigger, shown below).

-- Users table
-- Note: "user" is a reserved word in PostgreSQL.
-- Always use plural table names to avoid this trap.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    global_role VARCHAR(20) DEFAULT 'member',
    avatar_url VARCHAR(500),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Workspaces
CREATE TABLE workspaces (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    owner_id INT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    deleted_at TIMESTAMP NULL DEFAULT NULL, -- soft deletes
    FOREIGN KEY (owner_id) REFERENCES users(id) ON DELETE RESTRICT
);

-- Junction table for workspace membership (M:N)
CREATE TABLE workspace_members (
    id SERIAL PRIMARY KEY,
    workspace_id INT NOT NULL,
    user_id INT NOT NULL,
    workspace_role VARCHAR(20) DEFAULT 'member', -- scoped to this workspace
    joined_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE(workspace_id, user_id)
);

-- Projects
CREATE TABLE projects (
    id SERIAL PRIMARY KEY,
    workspace_id INT NOT NULL,
    name VARCHAR(200) NOT NULL,
    description TEXT,
    status VARCHAR(20) DEFAULT 'planning',
    start_date DATE,
    due_date DATE,
    created_by INT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE,
    FOREIGN KEY (created_by) REFERENCES users(id) ON DELETE RESTRICT
);

-- Tasks
CREATE TABLE tasks (
    id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    title VARCHAR(200) NOT NULL,
    description TEXT,
    status VARCHAR(20) DEFAULT 'todo',
    priority VARCHAR(20) DEFAULT 'medium',
    assigned_to INT,
    created_by INT NOT NULL,
    due_date DATE,
    estimated_hours DECIMAL(5,2),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE,
    FOREIGN KEY (assigned_to) REFERENCES users(id) ON DELETE SET NULL,
    FOREIGN KEY (created_by) REFERENCES users(id) ON DELETE RESTRICT
);

-- Documents
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    project_id INT NOT NULL,
    title VARCHAR(200) NOT NULL,
    content TEXT,
    document_type VARCHAR(50),
    is_ai_generated BOOLEAN DEFAULT FALSE,
    created_by INT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE,
    FOREIGN KEY (created_by) REFERENCES users(id) ON DELETE RESTRICT
);

-- Comments (nullable FK approach instead of polymorphism)
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    task_id INT,
    document_id INT,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
    FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE,
    -- Ensure a comment targets exactly one entity
    CONSTRAINT chk_comment_target CHECK (
        (task_id IS NOT NULL AND document_id IS NULL) OR
        (task_id IS NULL AND document_id IS NOT NULL)
    )
);

-- Time tracking
CREATE TABLE time_entries (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    task_id INT NOT NULL,
    hours DECIMAL(5,2) NOT NULL,
    description TEXT,
    date DATE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);

-- Indexes for query performance
CREATE INDEX idx_workspaces_owner ON workspaces(owner_id);
CREATE INDEX idx_projects_workspace ON projects(workspace_id);
CREATE INDEX idx_projects_status ON projects(status);
CREATE INDEX idx_tasks_project ON tasks(project_id);
CREATE INDEX idx_tasks_assigned ON tasks(assigned_to);
CREATE INDEX idx_tasks_status_priority ON tasks(status, priority);
CREATE INDEX idx_tasks_due_date ON tasks(due_date);
CREATE INDEX idx_documents_project ON documents(project_id);
CREATE INDEX idx_comments_task ON comments(task_id);
CREATE INDEX idx_comments_document ON comments(document_id);
CREATE INDEX idx_time_entries_user ON time_entries(user_id);
CREATE INDEX idx_time_entries_task ON time_entries(task_id);
CREATE INDEX idx_time_entries_date ON time_entries(date);

-- Auto-update updated_at
-- PostgreSQL doesn't have ON UPDATE CURRENT_TIMESTAMP like MySQL
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_users_updated_at BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_workspaces_updated_at BEFORE UPDATE ON workspaces
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_projects_updated_at BEFORE UPDATE ON projects
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_tasks_updated_at BEFORE UPDATE ON tasks
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_documents_updated_at BEFORE UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_comments_updated_at BEFORE UPDATE ON comments
    FOR EACH ROW EXECUTE FUNCTION update_updated_at();

A few things worth calling out:

Indexes are created separately. PostgreSQL doesn't support inline INDEX inside CREATE TABLE. Every foreign key you'll JOIN on, every column you'll filter by — index it. The read performance difference is orders of magnitude.

Every ON DELETE is explicit. CASCADE for ownership chains (delete workspace → delete projects → delete tasks). SET NULL for optional references. RESTRICT for "handle this manually first."

The updated_at trigger. You write one trigger function and attach it to every table that has updated_at. A few extra lines, but you only write it once.

VARCHAR sizes are intentional. Username gets 50 characters (sufficient for display names), email gets 255 (the email standard), content fields use TEXT (no practical limit). Don't just slap VARCHAR(255) on everything — think about what the column actually holds.

Step 9: Review and Test

Don't ship a schema you haven't queried. This is the step most people skip, and it's where the most important bugs get caught.

Run realistic queries

Write the queries your application will actually execute. If they're awkward to write, your schema needs work.

-- All projects in a workspace with task counts
SELECT p.name, p.status, COUNT(t.id) as task_count
FROM projects p
LEFT JOIN tasks t ON p.id = t.project_id
WHERE p.workspace_id = 1
GROUP BY p.id, p.name, p.status;

-- Overdue tasks for a specific user
SELECT t.title, t.due_date, p.name as project_name
FROM tasks t
JOIN projects p ON t.project_id = p.id
WHERE t.assigned_to = 5
  AND t.due_date < CURRENT_DATE
  AND t.status != 'done';

-- Total hours logged per team member in a project
SELECT u.username, SUM(te.hours) as total_hours
FROM time_entries te
JOIN users u ON te.user_id = u.id
JOIN tasks t ON te.task_id = t.id
WHERE t.project_id = 10
GROUP BY u.id, u.username
ORDER BY total_hours DESC;

-- Comments on a specific task
SELECT c.content, u.username, c.created_at
FROM comments c
JOIN users u ON c.user_id = u.id
WHERE c.task_id = 42
ORDER BY c.created_at;

Test referential integrity

Verify that your constraints actually protect you:

Try to create a task with a non-existent project_id — should fail
Delete a project and confirm its tasks are also deleted (CASCADE)
Delete a user and confirm their assigned tasks become unassigned (SET NULL)
Try to join a workspace twice — should fail due to UNIQUE constraint
Try to delete a user who created projects — RESTRICT should block it

Walk through real user flows

Don't just think in tables. Think in user stories:

Alice creates a workspace and invites Bob
Bob creates a project and adds tasks
Alice assigns a task to Bob
Bob logs time on the task
Alice comments on the task
The project is completed and archived

Can you execute this flow smoothly? Is the data consistent at every step? If queries become convoluted or require multiple round-trips, reconsider your structure.

Run EXPLAIN on your most common queries. If you see a full table scan on a table that'll grow large, you need an index.

9 Common Mistakes (and How to Avoid Them)

1. Using Reserved Words as Table Names

user, order, group, comment — all reserved in various databases. You'll get cryptic syntax errors and waste an hour figuring out why. Use plural names (users, orders) and you sidestep this entirely.

2. Using Business Data as Primary Keys

Email addresses change. Phone numbers get recycled. Usernames get updated. None of these are stable identifiers.

-- Don't do this
CREATE TABLE users (
    email VARCHAR(255) PRIMARY KEY,  -- Will break when email changes
    name VARCHAR(100)
);

-- Do this instead
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,  -- UNIQUE constraint, not PK
    name VARCHAR(100)
);

Use a surrogate key (SERIAL or UUID) and keep business data as regular columns with UNIQUE constraints.

3. Storing Multiple Values in One Column

This is the single most common mistake I see in beginner schemas:

-- Don't do this
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    skills VARCHAR(500) -- 'python, javascript, rust'
);

-- Do this instead
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE skills (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE user_skills (
    user_id INT,
    skill_id INT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
    FOREIGN KEY (skill_id) REFERENCES skills(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, skill_id)
);

If you ever need to query "find all users who know Rust", the comma-separated approach forces you into LIKE '%rust%' — which is slow, fragile, and will also match "trust" and "rustic".

4. Forgetting Foreign Keys

Without foreign keys, your database is just a collection of unrelated tables. Nothing stops you from inserting a task with project_id = 99999 that points to nothing. Then your JOINs return phantom data and you spend a weekend debugging ghost records.

5. No ON DELETE Strategy

If you don't specify what happens when a referenced row is deleted, the default is to block the deletion entirely. That's fine — until someone tries to delete a user and gets a cryptic constraint violation with no idea which of 15 tables is blocking them. Be explicit about every foreign key's delete behavior.

6. Inconsistent Naming

Pick one convention. Stick to it. snake_case for everything:

Table names: workspace_members, time_entries
Column names: created_at, workspace_id, assigned_to
Foreign keys: {referenced_table_singular}_id (e.g., user_id, project_id)

Mixing studentID, CourseId, and enrollment_date in the same schema tells everyone you're winging it.

7. Mixing Unrelated Concepts in One Table

Each table should represent one entity. Don't create a user_and_settings table that mixes profile data with subscription details:

-- Don't do this
CREATE TABLE user_and_settings (
    id INT PRIMARY KEY,
    username VARCHAR(100),
    email VARCHAR(255),
    theme VARCHAR(20),
    notifications_enabled BOOLEAN,
    payment_method VARCHAR(50),
    subscription_end_date DATE
);

-- Separate concerns into focused tables
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(100),
    email VARCHAR(255)
);

CREATE TABLE user_settings (
    user_id INT PRIMARY KEY,
    theme VARCHAR(20),
    notifications_enabled BOOLEAN,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

CREATE TABLE subscriptions (
    id SERIAL PRIMARY KEY,
    user_id INT UNIQUE,
    payment_method VARCHAR(50),
    end_date DATE,
    FOREIGN KEY (user_id) REFERENCES users(id)
);

8. Premature Over-Engineering

Don't build for 10 million users on day one. You don't need sharding, read replicas, or a polymorphic comment system for your MVP. Start simple. Add complexity when profiling shows actual performance problems — the bottleneck is rarely where you expect it to be.

9. Not Testing with Real Queries

If you design a schema in isolation and only test it when the app is half-built, you'll discover problems too late. Write your most common queries before finalizing the schema. If they're awkward to write or require 5 JOINs for a simple feature, your design needs work.

Testing with five perfect rows reveals nothing. Load real-ish data. Test deletion cascades.

Practice Exercises

If you want to solidify this, try designing schemas for these. Don't just think about it — actually write the CREATE TABLE statements. That's where the real learning happens.

Hotel Booking System — Hotels, rooms (types: single/double/suite), guests, bookings, payments. How do you prevent double-booking a room for overlapping dates?
Food Delivery App — Restaurants, menus, customers, orders with line items, drivers, payments. What if a menu item price changes after someone ordered it?
Fitness Tracker — Users, workouts, exercises (sets/reps/weight), workout plans. How do you query "what was my bench press max last month"?
Job Board — Companies, postings, applicants, applications, saved jobs. An application goes through stages (applied → screening → interview → offer → hired/rejected). How do you model that state machine?
Library System — Books, authors (M:N), members, borrowing records, fines. How do you handle multiple copies of the same book?

Wrapping Up

Database design looks simple on paper but requires practice to get right. The good news: the process is always the same.

Understand the problem
Find your entities
Define attributes
Map relationships
Sketch an ERD
Set up keys
Normalize
Write the schema
Test with real queries

The biggest lesson I've learned: draw before you code, and query before you ship. Fixing a relationship on a whiteboard takes 30 seconds. Fixing it in production with real data takes a weekend and a lot of stress.

The first design won't be perfect. That's expected. Build it, test it, discover the issues, and refine. This iterative approach works better than trying to get it perfect upfront.

More Readings: https://junaidshaukat.com/blog

DEV Community