5 Database Design Mistakes I Keep Seeing (And How to Catch Them Early)

#database #sql #webdev #dataengineering

I've reviewed a lot of database schemas over the years — my own, my teammates', inherited ones from projects I joined mid-flight. Some patterns keep showing up. Not the dramatic "we lost all our data" kind of mistakes, but the quiet ones that make every future feature harder to build, every query slower than it should be, and every migration feel like defusing a bomb.
Here are five I see most often, why they hurt, and how to catch them before they reach production.

1. Missing Indexes on Foreign Keys

This is the most common performance mistake in database design, and it's invisible until your tables have real data.
When you add a foreign key from orders.user_id to users.id, most databases create an index on the referenced column (users.id — usually the primary key, already indexed). But they don't automatically create an index on the referencing column (orders.user_id). MySQL is the exception — it creates one implicitly. PostgreSQL, SQL Server, SQLite, and MariaDB do not.
This means every query that joins orders to users, every ON DELETE CASCADE check, and every lookup like SELECT * FROM orders WHERE user_id = ? is doing a full table scan on orders.user_id. With 1,000 rows, you'll never notice. With 1,000,000 rows, your API response times go from 20ms to 2 seconds.
How to catch it: Any time you define a foreign key, ask yourself: "Will I query by this column?" The answer is almost always yes. Add the index. Make it a habit so automatic you don't even think about it.
The tricky part is that this mistake is nearly invisible in a text-based schema. In a Prisma schema or raw DDL file, your foreign key definition and your index definitions are often far apart — sometimes dozens of lines separated. You can easily have a perfectly correct REFERENCES users(id) without a corresponding CREATE INDEX and never notice the gap.
This is one of the reasons I started designing schemas visually before writing any ORM code. When you can see a table's columns, constraints, and indexes all in one place, a foreign key column without an index is an obvious gap rather than something buried in a text file. In Relmic, every table's indexes are visible right alongside its columns and relationships, so this kind of omission is immediately apparent.

2. Using the Wrong Primary Key Type

Not all primary keys are created equal, and the choice has long-term consequences that are painful to change later.
The classic debate is auto-incrementing integers vs UUIDs, but the real mistakes I see are more subtle:
Using INT instead of BIGINT for auto-incrementing keys. A 32-bit integer maxes out at ~2.1 billion. That sounds like a lot until you realize that a high-traffic events table, a logging table, or any table with frequent inserts and deletes can burn through IDs faster than expected. Changing a primary key type on a table with millions of rows and multiple foreign key references is one of the most painful migrations in existence.
Using UUIDs as primary keys without understanding the performance trade-off. UUIDv4 values are random, which means B-tree index insertions are scattered across the index rather than appending to the end. On write-heavy tables, this causes page splits and index fragmentation. PostgreSQL's gen_random_uuid() produces v4 UUIDs by default. If you need UUIDs but also need sequential-ish insert performance, UUIDv7 (time-ordered) is the better choice — but not all databases support it natively yet.
Using VARCHAR for keys that are always fixed-length. If your key is always exactly 26 characters (like a ULID) or 36 characters (like a UUID string), use CHAR(26) or a native UUID type rather than VARCHAR(255). The storage and comparison performance difference is real at scale.
How to catch it: Before you finalize any schema, explicitly review every primary key and ask: what's the expected row count in 1 year? 5 years? Will this key be exposed in URLs or APIs? Will other services reference it? These questions should drive the type choice, not just convenience or habit.

3. Circular Dependencies Between Tables

This one creeps in gradually and rarely shows up as an error — until it blocks your migrations.
Here's how it typically happens. You have a users table and a teams table. A user belongs to a team (users.team_id → teams.id). Then someone adds a "team owner" feature: teams.owner_id → users.id. Now users depends on teams and teams depends on users. Circular dependency.
In application code, this works fine — your ORM handles it. But at the database level, it causes real problems:

You can't insert a user without a team, and you can't insert a team without a user (the owner). One of the foreign keys needs to be nullable or deferred, which weakens your data integrity.
Migration ordering becomes fragile. If you're generating DDL, which CREATE TABLE comes first?
ON DELETE CASCADE chains can create confusing or dangerous cascading delete behavior.
Backups and restores can fail if the database tries to enforce foreign keys during data load.

How to catch it: This is genuinely hard to spot in text. You need to trace the foreign key chain across multiple table definitions and hold the graph in your head. With three tables it's manageable. With twenty, it's nearly impossible to spot visually just by reading DDL.
This is another case where a visual schema editor pays for itself. When you can see every table and every relationship as lines on a canvas, a circular dependency literally looks like a loop — arrows going from A to B and back from B to A. You see it instantly and can decide how to break the cycle (nullable FK, a junction table, or restructuring the relationship) before you ever run a migration.

4. Inconsistent Naming Conventions

This mistake doesn't cause bugs. It causes confusion, and confusion causes bugs.
The most common inconsistencies I see:
Mixed singular and plural table names. user vs orders vs product_category. Pick one convention and enforce it across every table. Most ORMs have opinions here (Prisma uses singular model User, Rails uses plural users table), but whatever you choose, be consistent.
Mixed casing styles. userId vs user_id vs UserID in the same schema. PostgreSQL silently lowercases unquoted identifiers, so userId becomes userid — which leads to confusion when your ORM generates queries with quoted identifiers that don't match.
Ambiguous column names. A column called status tells you nothing without context. order_status with an ENUM or CHECK constraint is self-documenting. Similarly, type is almost always ambiguous — account_type or subscription_type is clearer.
Inconsistent foreign key naming. Some columns use user_id, others use author (when it references users.id), others use created_by. If a column references users.id, call it user_id or at minimum something that clearly indicates the referenced table.
How to catch it: This is one of the few mistakes that's actually easier to catch in a visual view than in text. When you see all your table names and column names displayed side by side, inconsistencies in naming patterns jump out. A table called UserProfile next to one called order_items is immediately jarring in a visual layout where you might read past it in a text file.

5. Premature Denormalization (or Over-Normalization)

Both extremes cause pain, but I see premature denormalization more often in modern application development.
Premature denormalization usually looks like storing computed or derived data as columns "for performance" before you've measured whether there's actually a performance problem. A common example: adding a post_count column to users instead of running SELECT COUNT(*) FROM posts WHERE user_id = ?. Now you have two sources of truth for the same data, and you need to keep them in sync — through triggers, application logic, or both. Every bug in that sync logic means your data lies to you.
The right time to denormalize is when you've measured a real performance problem, you've confirmed that an index won't solve it, and you've accepted the complexity of maintaining the redundant data. Until all three are true, normalize.
Over-normalization is the opposite extreme and tends to show up in schemas designed by people who learned database design from textbooks. When you have a user_settings table with a single row per user that's always loaded alongside the users row, you don't need a separate table — just add the columns to users. When you have a countries lookup table that contains 195 rows that never change, consider whether it's worth the join overhead or whether an enum or check constraint serves you better.
How to catch it: Before you add any column that stores derived data, write the query that computes it. Run EXPLAIN ANALYZE. If it's fast enough with an index, skip the denormalization. If it's genuinely slow, denormalize — but document why, because six months from now someone will look at the redundant column and wonder if it's safe to remove.
For over-normalization, the signal is a schema where every lookup is its own table. If you find yourself joining five tables to render a single form, and three of those tables have fewer than 20 rows that never change, you've probably over-normalized.

The Common Thread

Looking at these five mistakes, there's a pattern: most of them are hard to see when your schema lives in a text file. Missing indexes hide between separated definitions. Circular dependencies require tracing foreign keys across multiple tables. Naming inconsistencies are easy to skim past. The shape of your data model — its relationships, its bottlenecks, its structural problems — is fundamentally visual information that doesn't communicate well in a linear text format.
That's why I build my schemas visually before writing any ORM code or DDL. I use Relmic — it lets you design your schema as tables on a canvas with real database types, real constraints, and real relationships, then export to whatever format you need (SQL DDL, Prisma, SQLAlchemy, TypeScript, JSON). But regardless of what tool you use, the practice of looking at your schema as a visual artifact rather than a text file will catch problems that no amount of code review will surface.
Design first. Generate second. Debug never.

What database design mistakes have you seen most often? I'd love to hear your war stories in the comments.