DEV Community

Adam McNeilly
Adam McNeilly

Posted on

Effective Database Design: Part 2

In the last post, we walked through the importance of proper database design, and learned about normalization - a process we can follow to ensure data integrity in our databases.

Normalization is achieved by making sure your database meets a number of normal forms. Each normal form builds upon the last, and each one makes your data a little better. In this post, we'll discuss how to achieve the first normal form (1NF).

First Normal Form

To ensure that a table meets 1NF, we need three things:

  • Each record is represented by a unique primary key.
  • Each column has only atomic (single) values.
  • There are no repeating groups.

I'd like to explain each bullet by starting with a table that is not well designed, and explain three steps we can take to achieve 1NF. Let's create a table that records a student, their year, and the classes they're taking.

name year classes
Adam Freshman Math, Programming
Prince Freshman History

Primary Key

Each record in a database table should have a unique identifier called a primary key. In many cases, this will be a number. In some cases, though, a string could suffice (think of a user name, or social security number - these should be unique among people).

A primary key is often one column, but does not have to be so. Two columns together could be the unique identifier for a record.

Our table so far has no unique identifier in any way. A student's name is not unique, nor is the combination of (name, year), nor is the combination of (name, year, classes). To resolve this, we can just add a unique id:

id name year classes
1 Adam Freshman Math, Programming
2 Prince Freshman History

Atomic Values

When we talk about atomic values, what we mean is that it can not be divided any further. In plain English, this means we shouldn't have a list of values in a column. Thus, our classes column does not comply with 1NF, because we shouldn't have multiple values in one cell.

The quick solution to this, is to break them out into their own column.

id name year class_one class_two
1 Adam Freshman Math Programming
2 Prince Freshman History null

However, we've created a problem here.

Repeating Groups

A table should not have a repeating group of columns. What this means is, we should not have class_one, class_two, ..., class_n. The reason this is bad is because we would end up with a lot of null values for students who don't take N classes, and if there ended up being a student with more than N classes, we'd have to modify the table to add a new column.

The quick solution to this, is to treat class as one column and add a new row for each time a student takes a class:

name year class
Adam Freshman Math
Adam Freshman Programming
Prince Freshman History

Further Improvements

This design does comply with 1NF. This is, though, the exact same table we had in the first post, that I explained was not well designed due to data redundancy and integrity issues.

Another key concept of database design is ensuring each table only represents a specific entity. Here, we have the concepts of students and of classes together in one table. Let's break them out into separate tables:

A students table:

id name year
1 Adam Freshman
2 Prince Freshman

A classes table:

id class_name
1 Math
2 Programming
3 History

The relationship between them can be captured separately, in a students_classes table:

student_id class_id
1 1
1 2
2 3

In this table, the primary key is a composition of both (student_id, class_id) together.

I hope you found this helpful! The 1NF is an important step toward database normalization, and even making these few quick updates we have data that is far better than we started with. There's no redundant data, and we aren't at risk of any data anomalies.

In the next post we'll continue to add on to this database and make sure it complies with the 2NF.

Top comments (8)

Collapse
 
dannida profile image
dannida

very simple and concise.... but why do we need the third table?

Collapse
 
adammc331 profile image
Adam McNeilly

The third table is responsible for holding the connection between students and a class.

Here it is visually if that helps:

Collapse
 
dannida profile image
dannida

Thank u... That was helpful...looking forward to part 3

Thread Thread
 
adammc331 profile image
Adam McNeilly

I just published it. :) dev.to/adammc331/effective-databas...

Collapse
 
gozippy profile image
GoZippy

So where's the Next?

Collapse
 
adammc331 profile image
Adam McNeilly

I'll post it this afternoon. :)

Collapse
 
gozippy profile image
GoZippy

Yippie

Collapse
 
dhatguy profile image
Joseph Odunsi

Is the third table really necessary?