Define Normalization?
Normalization is a process in database design used to organize data efficiently. The main goal is to minimize redundancy (repeated data) and avoid inconsistencies. It ensures that each piece of data is stored only once, making the database more reliable and easier to maintain.
In practice, normalization involves breaking a large, complex table into smaller, simpler tables and linking them using foreign keys. This structure allows data to be stored logically while preserving relationships between tables.
Why is Normalization Needed?
Normalization provides several key benefits:
Eliminates Data Duplication – Storing data only once reduces redundancy, preventing unnecessary repetition.
Ensures Data Consistency – Updating a single value automatically updates it everywhere it’s used, maintaining accuracy.
Simplifies Data Updates and Maintenance – Smaller, organized tables are easier to modify without errors.
Saves Storage Space – Avoiding repeated data reduces the storage needed.
Improves Query Performance – Structured data helps the database retrieve information faster.
Normal Forms
Normalization is done in stages, called normal forms. Each stage fixes certain types of problems in the data.
First Normal Form (1NF)
A table is in 1NF if each column contains atomic (indivisible) values.
Repeating groups or arrays are not allowed.
Each record (row) must be unique, usually ensured by a primary key.
Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and no non-key attribute depends on part of a composite key.
Eliminates partial dependency, ensuring that non-key data depends on the whole primary key.
Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and no non-key attribute depends on another non-key attribute.
Eliminates transitive dependency.
Boyce-Codd Normal Form (BCNF)
- A stricter version of 3NF.
- Every determinant must be a candidate key.
- Helps remove rare anomalies not covered by 3NF.
Fourth Normal Form (4NF)
- Addresses multi-valued dependencies.
- Ensures that if one attribute depends on another, it doesn’t repeat unnecessarily with other independent attributes.
Fifth Normal Form (5NF)
- Deals with join dependencies.
- Ensures data can be reconstructed from smaller tables without redundancy.
We shall use the following data as the starting point:
1. Identify anomalies (insertion, update, deletion) in this table.
🔸 Insertion Anomaly
We can’t add a new course until at least one student registers for it, because all course data is mixed with student data.
e.g., Can’t add a new course C104 – ML – Dr. Sharma without a student.
🔸 Update Anomaly
If an instructor’s phone number changes, it must be updated in multiple rows.
e.g., Dr. Kumar’s phone number appears twice.
🔸 Deletion Anomaly
If all students drop DBMS, deleting those rows also deletes Dr. Kumar and the course DBMS information.
2. Convert the table to 1NF and write the SQL CREATE TABLE statement for it.
✅ The table already satisfies 1NF because:
- All values are atomic.
- Each row is unique (based on StudentID + CourseID).
We’ll explicitly define a composite primary key (StudentID, CourseID).
3. Convert the table to 2NF and write SQL CREATE TABLE statements for the resulting tables, including primary keys.
In the current table:
- StudentName depends only on StudentID
- CourseName, Instructor, and InstructorPhone depend only on CourseID
So we separate Student and Course information.
4. Convert the table to 3NF and write SQL CREATE TABLE statements, including foreign keys.
Here, in Courses, InstructorPhone depends on Instructor, not on CourseID.
So we separate instructor details into a new table.
5. Insert the sample data into the normalized tables using INSERT INTO statements.
6. Write a query to list all students along with their courses and instructor names using JOINs.
Top comments (0)