Hi everyone! I’m really excited to share my learnings in RDBMS through this blog. As someone who is learning and exploring RDBMS concepts, I decided to write down my understanding in the simplest way possible so that anyone—especially beginners like me—can easily follow along. Today, let’s begin with one of the most important topics: Normalization.
Normalization-->
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
In simple words: It ensures your data is stored in the right place, without unnecessary duplication.
First Normalization Form(1NF)
For example: instead of storing a multiple phone number for a single person in the same column of the table breakdown the single table into two and through connecting the any of the attributes together to ensure the 1NF and to get the desired output use Joins
let we dive into the example:
drop table if exists students_non1NF;
create table students_non1NF(
student_id INT,
student_name VARCHAR(20),
phone_no VARCHAR(100)
)
Insert Values:
INSERT INTO students_non1NF(student_id,student_name,phone_no) VALUES
(1, 'jaswant','1234567890,0986235162'),
(2,'santhosh','7185689465,8643851084'),
(3,'rakshanth','3486512964,6853264273'),
(4,'ganges','6153956426,6426492386')
This table is not in First normalization from why because in column phone _no
has two phone numbers in the single field
To avoid this problem we split the single table into two
DROP TABLE if EXISTS students;
CREATE TABLE students(
student_id INT,
student_name VARCHAR(100),
PRIMARY KEY(student_id)
);
CREATE TABLE student_phone(
student_id INT,
phone_no VARCHAR(100),
PRIMARY KEY (student_id, phone_no),
FOREIGN KEY (student_id) REFERENCES students(student_id)
);
And the insert values are:
INSERT INTO students (student_id,student_name) values (1, 'jaswant'),(2,'santhosh'),(3,'rakshanth'),
(4,'ganges');
INSERT INTO student_phone(student_id,phone_no) values (1,'1234567890'),
(1,'0986235162'),(2,'7185689465'),(2,'8643851084'),(3,'3486512964'),(3,'6853264273'),(4,'6153956426'),(4,'6426492386');
the output is:
Boom..!!!!
The problem solved now we able to give the multiple numbers for the same people and it will not violate the 1NF
to see the output then we use Join Query:
SELECT s.student_id,
s.student_name,
p.phone_no
FROM students s
LEFT JOIN student_phone p
ON s.student_id = p.student_id;
And the output is:
Second Normalization Form(2NF)
The Second Normal Form (2NF) builds on the foundation of 1NF and takes the process of organizing data one step further. A table is said to be in 2NF if it is already in 1NF and all non-key attributes are fully dependent on the entire primary key. This mainly applies when a table has a composite primary key (a primary key made up of two or more columns). If some non-key attributes depend only on part of that composite key instead of the whole key, it creates what we call a partial dependency, and this violates 2NF. For example, if we have a table where the primary key is a combination of StudentID and CourseID, and we also store the InstructorName, then InstructorName depends only on CourseID and not on the full key. To fix this, we separate the data into two tables—one for the student-course mapping and another for the course-instructor details. By doing this, we eliminate partial dependency, making the database more structured
Now let we see the table without 2NF:
CREATE TABLE Enrollment_2nonNF(
student_id INT,
course_id INT,
course_name VARCHAR(200),
instructor VARCHAR(200),
PRIMARY KEY (student_id,course_id)
)
And the insert values are:
INSERT INTO Enrollment_2nonNF(student_id,course_id,course_name,instructor) VALUES
(1,101,'Intro to SQL','Jorn'),
(2,101,'Intro to SQL','Jorn'),
(1,102,'Intro to JAVA','smith');
The output is :
Here the instructor
column is dependent on course_id
to avoid this problem let we breakdown the table into two which one is for holding student_id
, and course_id
and the another one for holding course_id
, course_name
and instructor
drop table if exists Courses;
CREATE TABLE Courses(
course_id INT,
course_name VARCHAR(200),
instructor VARCHAR(200),
PRIMARY KEY (course_id)
);
drop table if exists Enrollment;
CREATE TABLE Enrollment(
student_id INT ,
course_id INT ,
PRIMARY KEY(student_id,course_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
INSERT INTO Courses (course_id, course_name, instructor) VALUES
(101,"Intro to SQL",'Jorn'),
(102,'Intro to JAVA','smith');
INSERT INTO Enrollment (student_id,course_id) VALUES
(1,101),
(2,101),
(1,102);
The join query is:
SELECT e.student_id , c.Course_id, c.course_name, c.instructor
from Enrollment e LEFT JOIN Courses c ON e.course_id = c.course_id;
the output is:
Third Normalization Form(3NF)
The Third Normal Form (3NF) is the next step after 2NF, and its goal is to eliminate what is known as transitive dependency. A table is said to be in 3NF if it is already in 2NF and every non-key attribute depends only on the primary key, not on another non-key attribute. In simpler terms, no column should give information about another column unless it is through the primary key. For example, consider a table with course_id, course_name, instructor, and instructor_office. Here, instructor_office depends on instructor, which means it is indirectly dependent on course_id. This violates 3NF. To fix this, we split the table into two: one for course details (course_id, course_name, instructor) and another for Instructor details (instructor, instructor_office). By doing this, every non-key attribute now depends only on the primary key, and we remove unnecessary dependencies. Achieving 3NF ensures that the data is well-organized, reduces redundancy even further, and makes the database easier to maintain and scale.
The table without 3NF look like:
CREATE TABLE courses_Non3NF(
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor VARCHAR(100),
instructor_office VARCHAR(100)
);
INSERT INTO courses_Non3NF(course_id,course_name,instructor,instructor_office) VALUES
(101, 'Intro to SQL','DR.Jorn' , 'Room 101'),
(102, 'Intro to JAVA','DR. Smith' , 'Room 102');
As we already discussed now we split the table into two and the code like this:
CREATE TABLE Insturctors(
instructor VARCHAR(100) PRIMARY KEY,
instructor_office VARCHAR(100)
);
CREATE TABLE courses1(
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor VARCHAR(100),
FOREIGN KEY (instructor) REFERENCES Insturctors(instructor)
);
INSERT INTO Insturctors (instructor,instructor_office) VALUES
('DR.jorn' , 'Room 101'),
('DR.smith' , 'Room 102');
INSERT INTO courses1(course_id,course_name,instructor) VALUES
(101, 'Intro to SQL','DR.jorn' ),
(102, 'Intro to JAVA','DR.smith');
And the join query is:
select c.course_id, c.course_name , i.instructor, i.instructor_office
from courses1 c LEFT JOIN Insturctors i ON c.instructor = i.instructor;
Super..!!, these are the concepts and practical learnings in 1NF, 2NF and 3NF and beyond that we have BCNF, 4NF and 5NF these three are not very much usable even if you eager to learn these three normalization forms please search in internet you will easily understood the things
Top comments (0)