DEV Community

Daragh Walsh
Daragh Walsh

Posted on

Database Normalisation (1st-4th N.F.) using an example

 Introduction

So we have just been hired as a database administrator for a car dealership. Currently they operate with only one table, which is below. I have also included some entries that exist in this table.

Initial Database Schema

Initial Database Schema

CustomerName is the name of the customer (e.g. Alan Smith).

Purchase is what the customer bought (e.g. Corolla).

Address is the address of the customer (e.g. 2 college road).

Newsfeed is what kind of marketing information the customer wishes to receive (e.g. a booklet on Toyota car offers).

Supplier is the brand which supplies the car the customer purchases (e.g. Toyota)

SupplierEmail is the contact email address for the Supplier (e.g. toyota@toyota.inc)

Price is the selling price of the car

Example Entries

Example of data entered in initial database

1st Normalisation Form

For the database to be in the 1st Normalisation Form we must satisfy the following conditions:

  1. Each cell to be single valued
  2. Entries in the column(s) is/are to be of the same type
  3. Rows are to be uniquely identified

The current iteration of the table has many flaws. For example when you view the above sample data we cannot determine whether “Alan Smith” is the two different people or one person who bought two cars. We also have comma-separated values and this is very messy to deal with.

The first thing we will need to do is add or assign a column as the primary key and no field in the current instance of the database will satisfy the conditions of a primary key. So we will add a CustomerID field which will solve the person identification issue. Once we have made that change our table schema will look as follows:

Addition of primary key to initial database

And now when we look below at the data we now see that it was in fact two /different/ people named “Alan Smith” who purchased a car from us. As a result of this change we have also removed the issue of comma-separated values and each cell is now contains a single value.

Data with new Customer ID field

2nd Normalisation Form

For the database to be in the 2nd Normalisation Form we must satisfy the following conditions:

  1. All attributes (columns that are non-key) must be dependent on the key

We must look at each column in the table and ask ourselves, is this dependent on the CustomerID? When we take a closer look we can see that the CustomerName, Address and Newsfeed are dependant. Just ask yourself, “Do I need the primary key to associate this attribute with the correct entry?”. If the answer is yes, then keep it in the table.

Now when your answer is no then you must separate this attribute out into another table. When we separate out our values we now have the following two tables:

Separation of data into purchases and customer tables

Note: for clarity sake, from now on I will highlight the primary key in orange

The issue now is that we have lost our transactions! Well, for that we will want to create a junction (or associative) table. Now for anyone who doesn’t know what a junction table is don’t worry. It is just the name given to a table that maps two or more tables together by referencing the primary keys of each table. The entries are known as compound keys. See what it looks like below for more clarity.

Junction Table containing customer ID and purchase keys

3rd Normalisation Form

For the database to be in the 3rd Normalisation Form we must satisfy the following conditions:

  1. All fields (columns) can be determined only by the key in the table (and not any other column).

Purchases and Customer ID tables

Now if we look at the table on the right (the purchases table) you will notice that the Supplier and the SupplierEmail will always be the same (i.e. the Toyota contact email will never be matched to the Renault supplier). As you add more sales of different cars you will notice that you would get duplicate data for these fields.

Now let’s say we got a new rep for Renault and she contacted us to say that we are to update the email address for Renault to their address. We would have to go through all the entries in the Purchases database and update the SupplierEmail field. This is not acceptable to us! What we will do is pull out the Supplier and SupplierEmail and place it in a different table. We will then reference this table via the use of a foreign key. A foreign key is simply an entry that references the primary key of another table.

Purchase table with a foreign key of supplier
Note: the foreign key is in blue, which now references the new supplier table

Supplier Table

We have removed the redundancy and can update the contact information in one place.

4th Normalisation Form

For the database to be in the 4th Normalisation Form we must satisfy the following conditions:

  1. No multi-valued dependencies

So a multi-valued dependency exists in a table when two attributes in the table are independent entries, but are both dependent on a third attribute.

Alt Text

This can be seen above, CustomerName, Address and Newsfeed all depend on the CustomerID. Now what happens when a customer wants to unsubscribe from receiving the monthly or quarterly brochures from their preferred car brand(s)? Do we place a /null/ in the Newsfeed column? No, this would be inefficient and just a mess to maintain. Instead we will move the Newsfeed attribute to a new table.

Alt Text

Final Database Schema

Alt Text

So in the end we arrive at this structure. These tables all adhere to the various conditions and will result in better data management and maintenance for you (the database administrator).

Thanks for taking the time to read this, let me know what thought of it! Or inform me of any changes I could make to the post.

Top comments (0)