In this blog post I chose the Database admin centric option to identify and explain different normal forms of data in my SQL database.
The most important question to ask when it comes to Database Normalization and normal forms in general, is what do those words mean? Database normalization is a technique to organize data in a database by reducing duplicated data (redundancy) and ensuring only relevant data is stored. Each of these stages of organization are called normal forms, with each form building upon the one before it to ensure the database is normalized.
While there are six levels of normalization and normal forms, typically only the first three are necessary for most applications, and in this blog post we will primarily be looking at the First and Second Normal Forms.
The requirements for the First Normal Form (1NF) are:
- Data is stored in tables with rows uniquely identified by a primary key
- Data within each table is stored in individual columns in its most reduced form
- There are no repeating or redundant columns
Looking at the COVID-19 database we have been building over the past few posts, we can see the table GeographicLocation is in violation of the first normal form because of the Country column. As our primary key for a GeographicLocation is the ZipCode, the Country for our GeographicLocation will always be USA, making the Country column in the table redundant.
After removing the Country column, our GeographicLocation table satisfies the requirements for 1NF.
The Second Normal Form has all the same requirements for 1NF, along with the additional requirement that only data related to a table’s primary key.
Looking at the table Hospital, we can see it is in violation of the requirement for 2NF because the Population column is not related to the Hospital’s primary key of HospitalID, but rather the foreign key of a GeographicLocation’s ZipCode. As the population of a GeographicLocation changes, it would be very easy to forget to update the Population value in the Hospital table because of its indirect relationship with the primary key. To solve this problem, we need to move the Population column to the GeographicLocation table where it is directly associated with the primary key of ZipCode.
Below is a video where I walk through the things I discussed in this post, as well as show the SQL commands I used to normalize and clean up my COVID-19 database. Thanks for reading/watching!
Top comments (0)