Database Normalization: A Complete Guide to 1NF, 2NF, 3NF, and BCNF

#learning #database #sql #developer

What is normalization in DBMS?

Database normalization is a process by which the database structure can be designed in a very effective way. Normalization in DBMS reduces the repetition of data and increases the integrity of data. In this process, various relational tables are created and duplicate data is removed from them. As a result, data repetition is reduced and data is not lost when inserting, deleting, updating.

Why is normalization needed?

Suppose you are creating a student information system. There is only one table in which all the information of the student (name, course, teacher) is stored. Now if you want to delete the information of a teacher, then the student's info along with the teacher and the course info will also be deleted. We will normalize so that the data is not lost during the CRUD operation. Here you have to create a separate table for students, courses, teachers through normalization so that data integrity is maintained while doing CRUD.

Normalization steps:

Normalization is usually carried out in several steps. Each step is called a normalization form (NF). Such as 1NF, 2NF, 3NF, BCNF etc. Below are some of the key steps:

First Normal Form (1NF)

Condition:

The table must not contain any repeating groups.
All column values must be atomic.

Example:

Before normalization

After normalization

The values of the subjects column in the table above are not atomic. Because in this column, different values are placed inside a column with a comma. So the 1NF values are placed in different rows. Now each column has an atomic value that satisfies the condition of 1NF.

Second Normal Form (2NF)

Condition:

The table must be in 1NF.
There must be no partial dependency (non-key columns must depend on the entire primary key).

Example:

Before normalization

After normalization

In the table above, StudentID + CourseID is the composite key. But CourseName does not depend on the entire composite key. It depends only on the CourseID so this is a partial dependency. Similarly, in the case of StudentName, it does not depend on the entire composite key. It depends only on the StudentID so this is also a partial dependency. Now after normalizing this is divided into two tables. The non-key columns in each table are now dependent on the entire primary key, which means there is no longer any partial dependency that meets the 2NF condition.

Third Normal Form (3NF)

Condition:

The table must be in 2NF.
There must be no transitive dependencies (non-key columns cannot be dependent on any other non-key column).

Example:

Before normalization

After normalization

In the table above, StudentID is the primary key that uniquely identifies each row. Here the DepartmentName depends on the DepartmentID and the DepartmentID depends on the StudentID which basically creates a transitive dependency. Because when a non-key column (DepartmentName) depends on another non-key column (DepartmentID), it is called a transitive dependency. Since there is a transitive dependency, two tables have been created by normalizing this table.

Boyce-Codd Normal Form (3.5NF)

Condition:

The table must be in 3NF.
Each determinant must be a candidate key. A determinant is a column or set of columns that determines another column.

Example:

Before normalization

After normalization

In the table above, StudentID + CourseID is the candidate key. Functional dependency is CourseID- > Instructor but here CourseID is not a candidate key but it is part of the candidate key so it violates the condition of BCNF. So now two new tables have been created through normalization so that the Determinant is always the candidate key. Actually, BCNF is an updated version of 3NF.