Introduction
Have you ever wondered why your database tables seem a bit messy? That's where database normalization comes in. It's a structured approach to designing a database to reduce data redundancy and improve data integrity. In this guide, we'll walk through the process of normalizing a simple table to 1NF, 2NF, and 3NF using Oracle Live SQL.
We'll start with a problematic base table, identify its flaws (anomalies), and then normalize it step-by-step. Let's get started!
- The Problematic Base Table
Our starting point is a single table that contains all the data about students, courses, and instructors. As you'll see, this table is highly redundant.
This table is a great example of what not to do. It suffers from several anomalies:
Update Anomaly: If Dr. Kumar's phone number changes, we have to update multiple rows.
Insertion Anomaly: We can't add a new course without a student.
Deletion Anomaly: Deleting the last student in a course also deletes the course information itself.
- Normalizing to 1NF (First Normal Form) A table is in 1NF if it contains only atomic values (single values in each cell) and a unique key. Our base table is already in 1NF, as there are no repeating groups. We just need to define a composite primary key to ensure each row is unique.
Here's the SQL statement for our 1NF table:
SQL
CREATE TABLE Courses_1NF (
StudentID VARCHAR2(5) NOT NULL,
StudentName VARCHAR2(50),
CourseID VARCHAR2(5) NOT NULL,
CourseName VARCHAR2(50),
Instructor VARCHAR2(50),
InstructorPhone VARCHAR2(15),
PRIMARY KEY (StudentID, CourseID)
);
- Normalizing to 2NF (Second Normal Form) To be in 2NF, a table must be in 1NF, and all non-key attributes must be fully dependent on the entire primary key. In our 1NF table, the primary key is (StudentID, CourseID). However, StudentName only depends on StudentID, and CourseName, Instructor, and InstructorPhone only depend on CourseID.
To fix this, we'll split our table into three smaller, more focused tables: Students_2NF, Courses_Instructors_2NF, and Enrollments_2NF.
SQL
CREATE TABLE Students_2NF (
StudentID VARCHAR2(5) PRIMARY KEY,
StudentName VARCHAR2(50)
);
CREATE TABLE Courses_Instructors_2NF (
CourseID VARCHAR2(5) PRIMARY KEY,
CourseName VARCHAR2(50),
Instructor VARCHAR2(50),
InstructorPhone VARCHAR2(15)
);
CREATE TABLE Enrollments_2NF (
StudentID VARCHAR2(5),
CourseID VARCHAR2(5),
PRIMARY KEY (StudentID, CourseID)
);
- Normalizing to 3NF (Third Normal Form) 3NF requires that a table is in 2NF and has no transitive dependencies. A transitive dependency happens when a non-key attribute depends on another non-key attribute. In our Courses_Instructors_2NF table, InstructorPhone depends on Instructor, not directly on the CourseID primary key.
To achieve 3NF, we'll separate the instructor details into their own table. This gives us four normalized tables: Students, Instructors, Courses, and Enrollments.
SQL
CREATE TABLE Students (
StudentID VARCHAR2(5) PRIMARY KEY,
StudentName VARCHAR2(50)
);
CREATE TABLE Instructors (
Instructor VARCHAR2(50) PRIMARY KEY,
InstructorPhone VARCHAR2(15)
);
CREATE TABLE Courses (
CourseID VARCHAR2(5) PRIMARY KEY,
CourseName VARCHAR2(50),
Instructor VARCHAR2(50),
FOREIGN KEY (Instructor) REFERENCES Instructors(Instructor)
);
CREATE TABLE Enrollments (
StudentID VARCHAR2(5),
CourseID VARCHAR2(5),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
- Inserting and Querying the Normalized Data Now that our schema is ready, we can insert the sample data and write a query to retrieve information from our new, well-structured tables.
SQL
-- Insert into Instructors first
INSERT INTO Instructors (Instructor, InstructorPhone) VALUES ('Dr. Kumar', '9876543210');
INSERT INTO Instructors (Instructor, InstructorPhone) VALUES ('Dr. Mehta', '9123456780');
INSERT INTO Instructors (Instructor, InstructorPhone) VALUES ('Dr. Rao', '9988776655');
-- Insert into Students
INSERT INTO Students (StudentID, StudentName) VALUES ('S01', 'Arjun');
INSERT INTO Students (StudentID, StudentName) VALUES ('S02', 'Priya');
INSERT INTO Students (StudentID, StudentName) VALUES ('S03', 'Kiran');
-- Insert into Courses
INSERT INTO Courses (CourseID, CourseName, Instructor) VALUES ('C101', 'DBMS', 'Dr. Kumar');
INSERT INTO Courses (CourseID, CourseName, Instructor) VALUES ('C102', 'Data Mining', 'Dr. Mehta');
INSERT INTO Courses (CourseID, CourseName, Instructor) VALUES ('C103', 'AI', 'Dr. Rao');
-- Insert into Enrollments
INSERT INTO Enrollments (StudentID, CourseID) VALUES ('S01', 'C101');
INSERT INTO Enrollments (StudentID, CourseID) VALUES ('S01', 'C102');
INSERT INTO Enrollments (StudentID, CourseID) VALUES ('S02', 'C101');
INSERT INTO Enrollments (StudentID, CourseID) VALUES ('S03', 'C103');
Finally, here's the query to list all students, their courses, and instructor names using JOIN statements. This is much more efficient and reliable than querying the single, denormalized table.
SQL
SELECT
S.StudentName,
C.CourseName,
C.Instructor
FROM
Students S
JOIN
Enrollments E ON S.StudentID = E.StudentID
JOIN
Courses C ON E.CourseID = C.CourseID;
Conclusion
By normalizing our database, we've eliminated data redundancy and the risk of anomalies. Our database is now more efficient to update, easier to maintain, and ensures data consistency. This is the foundation of good database design!
Top comments (0)