Prabavathy Balagurusamy

Posted on Oct 8

🧠 Database Normalization in SQL (1NF 2NF 3NF)

#programming #database #mongodb #sql

🎯 Objective

Database normalization is one of the most important concepts in database design. It ensures that data is stored efficiently, redundancy is minimized, and data integrity is maintained.

To understand 1NF, 2NF, and 3NF, identify data anomalies, and implement normalization using SQL with proper CREATE TABLE, INSERT, and JOIN queries

By the end, you’ll understand not only how to normalize a table, but why it matters.

🧱 Base Table — The Starting Point

Let’s begin with a simple (but flawed) table design that stores students, their enrolled courses, and instructor details:

At first glance, this table might look fine — it gives us all the details in one place. But let’s look deeper.

⚠️ Data Anomalies

This table has the following anomalies:

Insertion anomaly: A new course can’t be added unless at least one student is enrolled.
Update anomaly: If the instructor’s phone number changes, it must be updated in multiple rows.
Deletion anomaly: If all students of a course are removed, information about that course and instructor is lost.

We apply Normalization — a step-by-step process of structuring the database.

🧱 Step 1: Transforming to 1NF (First Normal Form)

✅ Rule: Every column should contain indivisible (atomic) values — no groups or lists within a single field.

Our original table already meets this rule since each column stores one value per cell.
However, let’s formally define this table in SQL to establish a proper structure.

CREATE TABLE StudentCourse_1NF (
StudentID VARCHAR(10),
StudentName VARCHAR(50),
CourseID VARCHAR(10),
CourseName VARCHAR(50),
Instructor VARCHAR(50),
InstructorPhone VARCHAR(15)
);

➕ Insert Sample Data:

INSERT INTO StudentCourse_1NF VALUES
('S01', 'Arjun', 'C101', 'DBMS', 'Dr. Kumar', '9876543210'),
('S01', 'Arjun', 'C102', 'Data Mining', 'Dr. Mehta', '9123456780'),
('S02', 'Priya', 'C101', 'DBMS', 'Dr. Kumar', '9876543210'),
('S03', 'Kiran', 'C103', 'AI', 'Dr. Rao', '9988776655');

💡 Observation:
Although the data follows 1NF, there’s still repeated information — the same course and instructor details appear multiple times.

🧩 Step 2: Converting to 2NF (Second Normal Form)

✅ Rule: Remove partial dependencies — every non-key attribute must depend on the entire primary key.

In our current structure, a composite key could be (StudentID, CourseID).
But attributes like CourseName, Instructor, and InstructorPhone depend only on CourseID.
This means the table breaks the rule for 2NF.

To correct this, we’ll divide the data into three distinct tables: Student, Course, and Enrollment.

🧮 Student Table

Holds all student-specific details.

CREATE TABLE Student (
StudentID VARCHAR(10) PRIMARY KEY,
StudentName VARCHAR(50)
);

INSERT INTO Student VALUES
('S01', 'Arjun'),
('S02', 'Priya'),
('S03', 'Kiran');

📘 Course Table

Contains information about each course and its instructor.

CREATE TABLE Course (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(50),
Instructor VARCHAR(50),
InstructorPhone VARCHAR(15)
);

INSERT INTO Course VALUES
('C101', 'DBMS', 'Dr. Kumar', '9876543210'),
('C102', 'Data Mining', 'Dr. Mehta', '9123456780'),
('C103', 'AI', 'Dr. Rao', '9988776655');

🧾 Enrollment Table

Links students to the courses they have enrolled in.

CREATE TABLE Enrollment (
StudentID VARCHAR(10),
CourseID VARCHAR(10),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);

INSERT INTO Enrollment VALUES
('S01', 'C101'),
('S01', 'C102'),
('S02', 'C101'),
('S03', 'C103');

✅ Result:
Now, each table only includes data that depends on its own key.
We’ve removed redundant information between students and courses — though one more dependency still exists between Instructor and InstructorPhone.

🧩 Step 3: Moving to 3NF (Third Normal Form)

✅ Rule: Eliminate transitive dependencies — non-key fields shouldn’t depend on other non-key fields.

Here, InstructorPhone depends on Instructor, not on CourseID.
That’s a transitive dependency, and it violates 3NF.

To fix this, we’ll create a new table just for instructor details.

👨‍🏫 Instructor Table

CREATE TABLE Instructor (
InstructorID VARCHAR(10) PRIMARY KEY,
InstructorName VARCHAR(50),
InstructorPhone VARCHAR(15)
);

INSERT INTO Instructor VALUES
('I01', 'Dr. Kumar', '9876543210'),
('I02', 'Dr. Mehta', '9123456780'),
('I03', 'Dr. Rao', '9988776655');

📘** Updated Course Table (After 3NF)**

CREATE TABLE Course_3NF (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(50),
InstructorID VARCHAR(10),
FOREIGN KEY (InstructorID) REFERENCES Instructor(InstructorID)
);

INSERT INTO Course_3NF VALUES
('C101', 'DBMS', 'I01'),
('C102', 'Data Mining', 'I02'),
('C103', 'AI', 'I03');

✅ Now:
Every column in every table depends directly on its primary key — there’s no indirect dependency left.
The schema is now fully normalized and easy to maintain.

🧮 Step 4: Retrieve Data Using JOINs

Let’s merge all the tables to display complete student-course-instructor details.

SELECT s.StudentName, c.CourseName, i.InstructorName, i.InstructorPhone
FROM Enrollment e
JOIN Student s ON e.StudentID = s.StudentID
JOIN Course_3NF c ON e.CourseID = c.CourseID
JOIN Instructor i ON c.InstructorID = i.InstructorID;