DEV Community

Cover image for Database Normalization with SQL

Posted on


Database Normalization with SQL

Hey👋 SQL enthusiast

This is a guide on Database Normalization and how to implement normalization using SQL, let's get started.

What is Database Normalization
Normalization is a technique used for evaluating and reorganizing table structures to minimize redundancies, improve data integrity, improve storage efficiency, and reduce the need to re-design the database if new data is introduced. Database normalization removes data inconsistencies which may cause the analysis of our data to be more complicated. These inconsistencies could come from updating records, inserting, or deleting records, which is known as update anomaly, insertion, and deletion anomaly, normalization also includes the removal of duplicate records which in turn saves storage.
The figure below shows the stages of Database normalization.

Database normalization steps from 1NF to 4NF

Let's take a look at the first normal form, what does it take for a database table to be in the first normal form. Here are the rules for a table to be in first normal form (1NF)

  • Each column should contain an atomic value
  • A column should contain data of the same data type eg (if a column contains an integer all the other elements in the column should be an integer)
  • Each column should have a unique column name
  • The order in which data are stored in the columns does not matter.

let's take a look at a table to better understand the concept of 1NF

Records Table Not Normalize
Records table

As you can see from the above table there are columns with duplicate elements in the database table, from the last row, the columns are (student major Std_major, courses title CTitle, and Instructor location Ilocation).
to solve this problem we are going to create a new table and insert only one atomic value into every column in the table.

Here is the code used to create the table and insert the element into the table from the original table.

--creating a new table for the schools Record 1NF
                                 Std_Name VARCHAR(50), 
                                 Std_address VARCHAR(50), 
                                 Std_Major VARCHAR(50), 
                                 CID VARCHAR NOT NULL, 
                                 CTitle VARCHAR(50), 
                                 Iname VARCHAR(50), 
                                 Ilocation VARCHAR(100), 
                                 Grade CHAR(5))

--Inserting into the Schools Record 1NF table
INSERT INTO Schools_Record_1NF(Std_ID, Std_Name, 
                               Std_address, Std_Major, 
                               CID, CTitle, Iname, 
                               Ilocation, Grade)

--Using Case statement to deal with columns with duplicate entries
SELECT Std_ID, Std_Name, Std_address,
-- The CASE Statement here checks the first entry in the column Using (SUBSTR AND INSTR)
        SUBSTR(Std_Major, 1, INSTR(Std_Major, ',')-1)='' THEN Std_Major
        ELSE SUBSTR(Std_Major,1,INSTR(Std_Major, ',')-1)
    END AS Std_Major,
        SUBSTR(CID,1,INSTR(CID, ' ')-1)='' THEN CID
        ELSE SUBSTR(CID,1,INSTR(CID, ' ')-1)
        SUBSTR(CTitle,1,INSTR(CTitle,',')-1)='' THEN CTitle
        ELSE SUBSTR(CTitle,1,INSTR(CTitle,',')-1)
    END AS CTitle,
        SUBSTR(Iname,1,INSTR(Iname,',')-1)='' THEN Iname
        ELSE SUBSTR(Iname,1,INSTR(Iname,',')-1)
    END AS Iname, 
        SUBSTR(Ilocation,1,INSTR(Ilocation, ',')-1)='' THEN Ilocation
        ELSE SUBSTR(Ilocation,1,INSTR(Ilocation, ',')-1)
    END AS Ilocation,
        SUBSTR(Grade,1,INSTR(Grade, ',')-1)='' THEN Grade
        ELSE SUBSTR(Grade,1,INSTR(Grade, ',')-1)
    END AS Grade
FROM School_Record

-- Using the union operator to join the first and the second entry of the table

SELECT Std_ID, Std_Name, Std_address, 
-- Using SUBSTR and INSTR to access the second entry inside the columns
    SUBSTR(Std_Major,INSTR(Std_Major,',')+1) As Std_Major,
    SUBSTR(CTitle, INSTR(CTitle, ',')+1) AS CTitle,
    SUBSTR(Iname, INSTR(Iname, ',')+1) AS Iname, 
    SUBSTR(Ilocation, INSTR(Ilocation, ',')+1) AS Ilocation, 
    SUBSTR(Grade, INSTR(Grade, ',')+1) AS Grade
FROM School_Record
-- Checking the columns with multiple entries
WHERE Std_Major LIKE "%,%" or CID LIKE "% %"
Enter fullscreen mode Exit fullscreen mode

Here is the new table after applying the above code

Records Table in 1NF
Table at 1NF

one new row is added to the table rather than having columns with non atomic values.

Next up let's move into the second normal form (2NF)

The Second Normal Form (2NF)

Table not in 2NF
Table not in 2NF

The above table is not in 2NF because not all the columns in the table are depending on the primary keys of the table. e.g. columns like the student name Std_name only depends on the student Id Std_ID column which is one part of the primary key since both Std_ID and CID are the primary keys in this case and also courses title only depends on courses Id CID without depending on student Id Std_ID which means we are having a partial functional dependency.

To Convert a table to Second Normal Form
The table must be in 1NF.

  • A table in 2NF should not have a partial functional dependency.

What is a partial functional dependency?
let's understand Functional dependency first
Dependency in a table is when the columns in the table are depending on a particular column like the primary key of a table.
Partial functional dependency is when a particular column is not full depending on all the primary key columns of the table (in this case our primary key is more than one column).
To implement the second normal form 2NF we will start by creating new tables one for the student information, instructor information, and another table for the student grade information.

--Creating the Tables for the Second Normal Form 2NF
CREATE TABLE Student_Info_2NF (
                               Std_ID VARCHAR(50) NOT NULL, 
                               Std_Name VARCHAR(50), 
                               Std_address VARCHAR(50),
                               Std_Major VARCHAR(50),
                               PRIMARY KEY(Std_ID));

CREATE TABLE Instructor_Info_2NF(
                                 CID VARCHAR NOT NULL, 
                                 CTitle VARCHAR(50), 
                                 Iname VARCHAR(50), 
                                 Ilocation VARCHAR(100),
                                 PRIMARY KEY(CID));

                            Std_ID VARCHAR(50) NOT NULL,
                            CID VARCHAR NOT NULL,
                            Grade CHAR(5),
                            PRIMARY KEY(Std_ID, CID),
                            FOREIGN KEY(Std_ID) REFERENCES Student_Info_2NF(Std_ID) ,
                            FOREIGN KEY(CID) REFERENCES Instructor_Info_2NF(CID));
Enter fullscreen mode Exit fullscreen mode

students Information table
student information table

Instructor information table
Instructor information table

students grade information table
student grade information table

Next we will be moving on to the 3NF

Third Normal Form 3NF

For Relation to be in 3NF the Relation First must be in 1NF and 2NF

  • They should be no transitive dependencies

Well, transitive dependency simply means that all other columns in the table should only depend on the PRIMARY KEY Column only and primary key alone.

Based on the tables we currently have,
there is only one table that is showing an issue which is the instructor information table because the Primary key here is the course id (CID) and the instructor location can be determined by the CID and it can also be determined by the instructor name Iname column, so to solve this problem we are going to create a new table for the Instructor information and another table with the courses information.

here is the code to create the table and insert the data into the table from the Instructor information table above.

CREATE TABLE Instructor_Info_3NF (
                                 Iname VARCHAR(50) NOT NULL,
                                 Ilocation VARCHAR(100),
                                 PRIMARY KEY (Iname));

CREATE TABLE Courses_Info_3NF(
                             CID VARCHAR NOT NULL, 
                             CTitle VARCHAR(50), 
                             Iname VARCHAR(50), 
                             PRIMARY KEY(CID)
                             FOREIGN KEY(Iname) REFERENCES Instructor_Info_3NF(Iname));
Enter fullscreen mode Exit fullscreen mode
--Inserting into the Instructor_Info_3NF
INSERT INTO Instructor_Info_3NF(Iname, Ilocation)
SELECT Iname, Ilocation 
FROM Instructor_Info_2NF;

--Inserting into the Courses_Info_3NF
INSERT INTO Courses_Info_3NF(CID, CTitle, Iname )
SELECT CID, CTitle, Iname 
FROM Instructor_Info_2NF;
Enter fullscreen mode Exit fullscreen mode

Here are the new tables

Instructor information table 3NF
Instructor information table

Courses information table 3NF
Courses information table

We are going to stop on the 3NF

We have now completed the normalization process, In general most Databases are in 3NF, in this post you've understood what normalization is and also learned how to implement normalization using SQL.

Hope this blog helps you understand Database normalization. If you enjoy my blog please don't forget to like or comment on the article. It will encourage me to write more articles.

Top comments (2)

detzam profile image

Interesting stuff, takes me back to collage.
I would've added the sql code after the logical explanation.

abubakaralsadiq profile image

Thank you @detzam for the comment , I will make sure I do that next time.

Timeless DEV post...

Git Concepts I Wish I Knew Years Ago

The most used technology by developers is not Javascript.

It's not Python or HTML.

It hardly even gets mentioned in interviews or listed as a pre-requisite for jobs.

I'm talking about Git and version control of course.

One does not simply learn git