Hi! The abyss has been quite intense these weeks, but we're back with a new entry. I remember when I was working in the database design of my first position a year and a half ago. The amount of things I was blundering: bad practices, poor naming conventions... I guess that's how we all start and learn at the beginning.
I've been working on database design at a startup, and this time, it feels like the system is no longer against me. It's all about best practices, scalability, and clear conventions. But this didn't happen suddenly; it took countless mistakes, hours, and pain.
I've gone through the process of learning the best practices on my own and ensuring my schemas can actually scale, so you don't have to go through the same painful process. Here are the 10 database design lessons I learned the hard way.
1. Avoid overusing enums
An enum
(enumeration) is a data type that lets you define a list of possible string values a column can hold. The database then enforces that the data inserted into that column must be one of those values. They are great for small, unchanging sets of values but they're a pain to update because it requires a schema migration.
Lookup tables are separate tables that contain a list of values, and your main table references them using a foreign key. This makes the values easy to manage, add, or remove without changing the main table's schema.
Simple Enum
in PostgreSQL:
CREATE TYPE user_status_enum AS ENUM ('active', 'inactive', 'suspended');
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
status user_status_enum NOT NULL
);
Scalable Lookup Table Schema:
CREATE TABLE post_categories (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
category_id INTEGER REFERENCES post_categories(id)
);
Backend Dictionaries
These are data structures in your application code that hold a list of valid values for a specific field.
// This data structure is dynamically populated from the database
interface CategoryDictionary {
[categoryName: string]: number;
}
const CATEGORIES: CategoryDictionary = {
'Tech': 1,
'Music': 2,
'Travel': 3
};
// Example usage
const categoryId = CATEGORIES['Tech']; // Returns 1
const isValidCategory = !!CATEGORIES[categoryName]; // Checks if a category exists
While they can be a useful tool for validation, simply using a static array of strings in the database isn't a robust solution on its own. It directly violates the principles of database normalization by creating a redundancy between your application and your database schema.
If the list of valid values changes, you have to update your code and potentially your database, leading to a disconnect between the two sources of truth. A more normalized approach is to use these dictionaries alongside a lookup table. The dictionary in your backend can be populated from the lookup table in your database on application startup. This way, the database remains the single source of truth for the list of values, and your application always has the most up-to-date data for validation.
2. Design for the Business, Not for Convenience
This principle is about creating a database schema that accurately models the real-world business concepts, rather than what's easiest for your code. A DATE
data type, for instance, represents a full calendar date. While a developer might find it convenient to store a birthday as three separate integers for year, month, and day, the business understands a birthday as a single date. Using the correct data type ensures your database reflects the truth of the business.
Bad Design:
CREATE TABLE user_birthdays (
user_id SERIAL PRIMARY KEY,
birth_month INTEGER,
birth_day INTEGER,
birth_year INTEGER
);
Good Design:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
birthday DATE
);
3. Consistency Matters
Consistency means using the same naming conventions, data types, and structural patterns throughout your entire database. For example, if you choose snake_case
(e.g., user_id
), you should use it everywhere. Inconsistent naming like user_id
in one table and userId
in another leads to confusion and bugs.
Inconsistent Naming (Bad):
CREATE TABLE users (
user_id SERIAL PRIMARY KEY
);
CREATE TABLE user_posts (
post_id SERIAL PRIMARY KEY,
userId INTEGER
);
Consistent Naming (Good):
CREATE TABLE users (
id SERIAL PRIMARY KEY
);
CREATE TABLE user_posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id)
);
4. Everything Should Serve a Purpose
Every table, column, and index in your database should exist for a specific, current business requirement. Over-engineering is the act of building features or adding complexity that isn't currently needed, often with the thought of "just in case." This adds unnecessary complexity and can harm performance. The goal is to keep the database as simple as possible while meeting all requirements.
Over-engineered (Bad):
CREATE TABLE products (
id SERIAL PRIMARY KEY,
shipping_notes TEXT,
supplier_id INTEGER,
is_on_sale BOOLEAN,
sale_price DECIMAL(10, 2)
);
If your business only has one supplier and products are never on sale, these columns are just clutter. Don't try to cover all possible future cases when a feature hasn't even been requested yet.
5. Don't Let the Database Handle Authentication
Your database is a data store, not a security server. Don't rely on features like PostgreSQL's Row-Level Security (RLS) for core authentication, please. It sounds easy to do, but youre introducing provider-lock in. What if you change the database youre working with in the future? Thats why all authentication and password hashing should be handled in your application layer.
Example RLS Policy (to show what to avoid):
-- Now your app's security is coupled to the DB...
CREATE POLICY user_access ON user_posts
FOR SELECT
USING (user_id = current_setting('app.user_id')::integer);
6. The Database Shouldn't Handle Business Logic Validations
Validation is the process of ensuring data is correct and meaningful. The database should only perform basic validations like ensuring a field is not null or that a value is unique. Complex business logic, like checking if a number is between 1 and 12 for a month, should be handled by your application code. This separation of concerns keeps your database clean and your application flexible.
Application-level validation (Better):
// In your backend code
if (month < 1 || month > 12) {
// Error handling
}
7. Separate Staging and Production Databases
Production is the public-facing environment. Staging is a pre-production environment used to test new features. You should never work directly on your production database. Mistakes are inevitable. Having separate databases creates a safety net, so you don't accidentally delete real customer data. I recommend you using a product like Hashicorp Vault to properly save the credentials and URLs to work with these environments.
8. Never Take Scalability for Granted
Scalability is the ability of a system to handle a growing amount of work. Its a mindset you must adopt from the beginning. A key part of this is understanding the impact of your design choices. An index is a data structure that improves the speed of data retrieval operations on a database table. Forgetting to add them to foreign keys is a common mistake that can lead to major performance issues as your data grows.
Adding an Index
CREATE TABLE user_posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id)
);
CREATE INDEX idx_user_posts_user_id ON user_posts (user_id);
Think About the Long Term
Always build your tables with the expectation that there will be changes and new features.
Spoiler: There will always be changes.
This means you should design for the long term. A critical part of this is adding a new column. To avoid database conflicts during a migration, a new column must be nullable or have a default value. This prevents live applications from failing when they try to insert new data. While a nullable column is a common and necessary workaround, using a default value is often a better practice from a data integrity standpoint, as it avoids nulls and maintains a cleaner dataset.
Document Your Schemas
In a startup, your data structures will evolve quickly. When using flexible data types like JSON
or JSONB
to store semi-structured data, it's easy to lose track of the schema. That's why you should document your JSON/JSONB schemas and structures through a formal specification like OpenAPI. This ensures that your entire team and any future consumers of your API understand the data format, preventing errors and ensuring consistency.
9. Use the Right Data Structures
Choosing the correct data type (e.g., BOOLEAN
, DATE
, JSONB
) for a column is important for performance and clarity. JSONB
is a PostgreSQL data type that stores JSON data in a binary format which allows you to index and query it efficiently. Its perfect for semi-structured data where the schema might be flexible.
Example with JSONB
:
CREATE TABLE social_media_posts (
id UUID PRIMARY KEY,
metadata JSONB
);
INSERT INTO social_media_posts (id, metadata)
VALUES (
'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
'{"tags": ["newmusic"], "likes": 150}'
);
10. Always Choose UUIDs Over auto_increment
IDs
An auto_increment
ID is a simple integer that automatically increases with each new row. A UUID (Universally Unique Identifier) is a 128-bit number that is globally unique. UUIDs are essential for distributed systems because they can be generated anywhere, on a client, in a microservice; without a central authority to ensure uniqueness. This prevents ID conflicts and makes your system much more scalable and secure.
Why UUIDs are better:
No Conflicts in Distributed Systems: With
auto_increment
, two different services inserting data at the same time might generate the same ID. UUIDs solve this problem.Security & Data Leaks: Predictable, sequential IDs like
1, 2, 3...
are a major security risk. A malicious user can simply try incrementing IDs in an API endpoint (e.g.,/api/users/1
, then/api/users/2
) to scrape or access other users' data. With UUIDs, the IDs are random and impossible to guess, protecting against these types of attacks.
Example in PostgreSQL:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(100) UNIQUE NOT NULL
);
-- The UUID is generated automatically
INSERT INTO users (email) VALUES ('janedoe@example.com');
Conclusion
Database design is a brutal teacher in software engineering. The most painful lessons aren't in a tutorial when you're developing; they're in production, where a small mistake can become a colossal monster. Don't be discouraged by your blunders or failures as they're the very thing that forges your mastery.
Think of it like Dark Souls. You dont get good by avoiding the bosses; you get good by facing them, learning their patterns, and finally beating them. The ten lessons in this post are your strategies. Now you're ready to face your own production monsters. I know I'll be facing more challenges, and I'm willing to keep sharing with you everything I learn on this journey.
See you in the next entry. Dont let the abyss consume you.
Top comments (1)
Totally felt this. At a previous startup we used a Postgres enum for order_status, and when “preorder” got added the type change blocked a zero-downtime deploy and left half the fleet on the old version. We rebuilt it as a lookup table and loaded a dictionary at app startup—no more deploy drama after that.