Normal Forms

Today I want to talk about improving a database through normalization. Normalization is a way of restructuring your database to reduce data redundancy and streamline your data.

For this post, we are going to look at the first two normal forms. A database is in First Normal Form (1NF) when data is stored in tables whose rows are each uniquely identified by a primary key, every column holds a single value in its most reduced (atomic) form, and there are no repeating groups. Second Normal Form (2NF) requires everything from 1NF, plus every non-key column must depend on the whole primary key. In other words, there can be no partial dependencies, where a column depends on only part of a composite key.
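To make the two definitions concrete, here is a minimal Python sketch using plain lists of dictionaries as stand-in tables. The helper names (`is_atomic`, `flatten`, `fd_holds`) and the sample rows are hypothetical and only for illustration. A list stored in one cell plays the role of a repeating group (a 1NF violation). After flattening, the table is keyed by (`person_id`, `condition`), and `name` turns out to depend on `person_id` alone, which is exactly the partial dependency that 2NF rules out.

```python
def is_atomic(rows):
    """Simplified 1NF check: no cell holds a list, tuple, or set (a repeating group)."""
    return all(
        not isinstance(value, (list, tuple, set))
        for row in rows
        for value in row.values()
    )


def flatten(rows, column):
    """Move a repeating group in `column` into one row per value."""
    return [{**row, column: item} for row in rows for item in row[column]]


def fd_holds(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine the columns in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # If this key was already seen with a different value, the dependency fails.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical data: a "condition" cell holding a list is a repeating group (not 1NF).
raw = [
    {"person_id": 1, "name": "Ana", "condition": ["asthma", "diabetes"]},
    {"person_id": 2, "name": "Ben", "condition": ["hypertension"]},
]
flat = flatten(raw, "condition")  # now keyed by (person_id, condition)

print(is_atomic(raw), is_atomic(flat))  # False True

# "name" depends on person_id alone, i.e. on only part of the composite key:
# a partial dependency, so `flat` is in 1NF but not 2NF.
print(fd_holds(flat, ["person_id"], ["name"]))  # True

# Splitting into two tables removes the partial dependency.
people = sorted({(r["person_id"], r["name"]) for r in flat})
person_conditions = sorted({(r["person_id"], r["condition"]) for r in flat})
```

After the split, `people` is keyed by `person_id` alone, and `person_conditions` contains only its composite key, so no column is left depending on part of a key.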

How this relates to my data

With normalization, we can make the data from the COVID risk chart more streamlined and less redundant. In the picture provided, we can see that my design violates 1NF. This is because I included an "Underlying Condition" column in the "Risk" table even though the underlying condition is already stored in the "Person" table, so the same data is duplicated.

To make this design satisfy 1NF, we have to remove the "Underlying Condition" column from the "Risk" table.
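As a rough sketch of what the fixed design could look like, the snippet below builds the two tables in an in-memory SQLite database using Python's standard `sqlite3` module. The column names (`person_id`, `name`, `underlying_condition`, `risk_id`, `risk_level`) and the sample rows are assumptions, since the chart itself is only shown as a picture. The underlying condition now lives only in `person`, and a join recovers it whenever a risk row needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: the underlying condition is stored once, on the person.
CREATE TABLE person (
    person_id            INTEGER PRIMARY KEY,
    name                 TEXT NOT NULL,
    underlying_condition TEXT
);
-- The risk table no longer repeats the condition; it references the person instead.
CREATE TABLE risk (
    risk_id    INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    risk_level TEXT NOT NULL
);
INSERT INTO person VALUES (1, 'Ana', 'asthma'), (2, 'Ben', NULL);
INSERT INTO risk VALUES (10, 1, 'high'), (11, 2, 'low');
""")

# Join the tables to see each risk row alongside the person's condition.
rows = conn.execute("""
    SELECT p.name, p.underlying_condition, r.risk_level
    FROM risk AS r
    JOIN person AS p ON p.person_id = r.person_id
    ORDER BY r.risk_id
""").fetchall()

for row in rows:
    print(row)

# PRAGMA table_info returns one row per column; index 1 is the column name.
risk_columns = [col[1] for col in conn.execute("PRAGMA table_info(risk)")]
```

Because the condition is stored in one place, changing it means updating one row in `person`, and there is no second copy in `risk` that could fall out of sync.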

With 2NF, there is a possible problem in the "Behavior" table: two of its columns record whether or not someone has traveled. To fix this, we can remove the "Travel Destination" column, because it is not a major factor in assessing someone's risk.
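One way to drop the column in SQLite is sketched below, again with Python's `sqlite3` module. The table layout (`behavior_id`, `person_id`, `has_traveled`, `travel_destination`, `wears_mask`) and the rows are hypothetical stand-ins for the real "Behavior" table. The sketch rebuilds the table without `travel_destination`, which works on any SQLite version. SQLite 3.35.0 and later also support `ALTER TABLE ... DROP COLUMN` directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical "Behavior" table with two overlapping travel columns.
CREATE TABLE behavior (
    behavior_id        INTEGER PRIMARY KEY,
    person_id          INTEGER NOT NULL,
    has_traveled       INTEGER NOT NULL,  -- 1 = yes, 0 = no
    travel_destination TEXT,              -- overlaps with has_traveled
    wears_mask         INTEGER NOT NULL
);
INSERT INTO behavior VALUES
    (1, 1, 1, 'Chicago', 1),
    (2, 2, 0, NULL, 0);

-- Rebuild the table without travel_destination, copying the remaining columns.
CREATE TABLE behavior_new (
    behavior_id  INTEGER PRIMARY KEY,
    person_id    INTEGER NOT NULL,
    has_traveled INTEGER NOT NULL,
    wears_mask   INTEGER NOT NULL
);
INSERT INTO behavior_new (behavior_id, person_id, has_traveled, wears_mask)
    SELECT behavior_id, person_id, has_traveled, wears_mask FROM behavior;
DROP TABLE behavior;
ALTER TABLE behavior_new RENAME TO behavior;
""")

# Inspect the rebuilt table: column names and the copied rows.
columns = [col[1] for col in conn.execute("PRAGMA table_info(behavior)")]
rows = conn.execute("SELECT * FROM behavior ORDER BY behavior_id").fetchall()
print(columns)
print(rows)
```

The rebuild-and-rename approach is more verbose than `DROP COLUMN`, but it makes the new column list explicit and does not depend on the SQLite version bundled with Python.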