Adam McNeilly

Posted on Dec 3, 2018

Effective Database Design: Part 1

#database

Databases, while something I don't work with every day, have always been an interest of mine. I used to spend a lot of time on the StackOverflow MySQL tag just for fun. I've also spent a lot of time discussing database design with my roommate, and noticed a reoccurring theme among these conversations - we spent a lot of time discussing how to structure the database.

Questions that usually arise are "how should I organize this table" or "how do I handle a relationship between these two entities?" These questions are very important, because if we don't design our database properly, we can end up with something that will not be easily maintainable in the long run.

Let's begin with an example of an improperly designed table, and the risks we face as a result. Consider being asked to design a database table to record a student's name, year, and the classes they're taking. We may end up with one like this:

name	year	class
Adam	Freshman	Math
Adam	Freshman	Programming
Prince	Freshman	History

This table meets the requirements listed, but has a couple problems. This table has redundant data, and also lacks integrity. These two concepts are essential to proper database design.

Data Redundancy

Data is redundant if it appears more than once. I'm not referring to a specific word or identifier appearing multiple times. In the example above, there are two rows that say Adam is a freshman. This is redundant data that doesn't need to be captured twice, or even a third time if Adam took another class.

Data Integrity

Lifting an explanation from Wikipedia, "data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle."

In simple terms, I should be able to update, insert, and remove data without any negative effects on the other data. Consider our students table in the last section. It has the possibility for three data anomalies:

Update Anomaly - If I updated row 2 to modify the year column to be "Sophomore", I would have one record where Adam is a freshman and one where he is a sophomore. This inconsistency would not allow me to determine which row is correct.
Insertion Anomaly - If I wanted to insert just a new student, I couldn't do so. At least, not without setting the "class" column to null. Similarly, I can't add a record for a class without a corresponding student, and year. This limitation prevents us from properly recording data that may otherwise be important to us.
Deletion Anomaly - If I were to delete row 3, I have two problems. I've lost any record of Prince being a student, and I've lost any record of History being a class. This is a problem because I lose two pieces of information, even if I only want to delete one of them.

Normalization

So how do we design our databases to avoid these two problems? The answer is normalization. Database normalization is a process of organizing a database to avoid redundant data and ensure data integrity. A normalized database would not be at risk of these insert, update, and delete anomalies.

Normalization isn't defined by one single criterion. In other words, you don't just say "if we have X, then our database is normalized."

Instead, normalization is defined through a series of normal forms, each one building upon the last to ensure data integrity and avoid redundancy. There are several normal forms, and throughout this series we are going to work through three of them: first normal form (1NF), second normal form (2NF), third normal form (3NF).

In the next post we'll learn about this first normal form and how to design tables that comply with 1NF.

Top comments (5)

Vince Ramces Oliveros • Dec 4 '18

Great article, I can't wait for the next post. I am already familiar with RDBMS. Perhaps you can post relationships, PK and FK, and 1F to 3F(4F and above are for advance).

Adam McNeilly • Dec 4 '18

Thanks! The next posts are going to cover 1NF to 3NF. I'll post one each day this week.

I didn't plan a post specifically on relationships and keys, but maybe I'll isolate those after these are done. :)

Vince Ramces Oliveros • Dec 4 '18

Don't you think relationships are the main reason why Normalization exist? To connect those normalized table.

Adam McNeilly • Dec 4 '18

Sure, and relationships are mentioned in the later posts, they're just not the primary focus.

If at the end of the series you think there's additional ground to cover, let me know, I'd be happy to keep it going.

Liz • May 3 '20

Hi Adam,
Is there a book you can recommend on the topic? Really great posts.

DEV Community

Effective Database Design: Part 1

Data Redundancy

Data Integrity

Normalization

Top comments (5)

Read next

Real-Time Processing of Big Data: Tools and Best Practices

Why Running Databases on Kubernetes is a Recipe for Disaster: The Case for a New Platform Designed for Stateful Workloads

Persistent Storage vs. Cache Storage: Understanding the Differences and Use Cases

Exploring InstantDB: A New Approach to Local-First Databases