DEV Community

Cover image for Database Normalization and Denormalization
coder7475
coder7475

Posted on

Database Normalization and Denormalization

Normalization and denormalization are cornerstone techniques in database design, each serving distinct purposes in balancing data integrity and performance. In critical systems like medical databases, where accurate patient records and rapid data access can impact lives, these approaches are vital. Using a medical system as an example, this blog explores normalization and denormalization through a Socratic lens, encouraging you to reason through their principles, applications, and trade-offs.

What Are Database Anomalies and Why Do They Matter?

Imagine a medical database where all patient, doctor, and appointment data is stored in a single table. If a patient updates their phone number, how many records must you change to keep the data consistent? If you delete a canceled appointment, what happens if it also removes the patient’s only record? These issues—update anomalies (inconsistent updates), delete anomalies (unintended data loss), and insert anomalies (inability to add data without related records)—can disrupt a hospital’s operations.

Normalization is designed to prevent these anomalies by organizing data systematically. Let’s explore how it works using a medical system with entities like Patients, Doctors, Appointments, Medical Records, and Medical Facilities.

Normalization: Structuring Data for Integrity

Normalization is like organizing a hospital’s records into a logical system where each piece of information has a unique place, much like a librarian cataloging books to avoid duplicates and ensure easy access. It involves breaking down a large table into smaller, related tables linked by keys, eliminating redundancy and preventing anomalies.

Normalization follows a series of rules called normal forms, each addressing specific data dependencies. Let’s apply these to a medical database.

Functional Dependencies: The Building Blocks

Before diving into normal forms, consider functional dependencies. If a PatientID uniquely determines a patient’s FirstName and LastName, written as PatientID → FirstName, LastName, what does this suggest about how we should store patient data? Functional dependencies ensure attributes are logically related, guiding the normalization process.

First Normal Form (1NF): The Foundation of Organization

A table is in 1NF if:

  • All attributes are atomic (indivisible).
  • Each column has a unique name.
  • All entries in a column are of the same data type.
  • Each row is unique, typically enforced by a primary key.

Consider a denormalized table in our medical system:

AppointmentID PatientName DoctorName AppointmentDetails
1 John Doe Alice Smith 2023-01-01, 10:00, Checkup
2 Jane Roe Bob Johnson 2023-01-02, 11:00, Surgery

The AppointmentDetails column contains multiple values (date, time, type), violating 1NF.To achieve 1NF, we split the data:

AppointmentID PatientName DoctorName AppointmentDate AppointmentTime AppointmentType
1 John Doe Alice Smith 2023-01-01 10:00 Checkup
2 Jane Roe Bob Johnson 2023-01-02 11:00 Surgery

Now, each cell is atomic, and the table is in 1NF. Next up is 2NF.

Second Normal Form (2NF): Eliminating Partial Dependencies

A table is in 2NF if:

  • It is in 1NF.
  • All non-key attributes fully depend on the entire primary key, avoiding partial dependencies.

Suppose the primary key is (AppointmentID, PatientID):

AppointmentID PatientID PatientName DoctorName AppointmentDate AppointmentTime AppointmentType
1 1 John Doe Alice Smith 2023-01-01 10:00 Checkup
2 2 Jane Roe Bob Johnson 2023-01-02 11:00 Surgery

Here, PatientName depends only on PatientID, not the full key (AppointmentID, PatientID). To achieve 2NF, we split the table:

Patients Table:

PatientID FirstName LastName
1 John Doe
2 Jane Roe

Appointments Table:

AppointmentID PatientID DoctorName AppointmentDate AppointmentTime AppointmentType
1 1 Alice Smith 2023-01-01 10:00 Checkup
2 2 Bob Johnson 2023-01-02 11:00 Surgery

Now, PatientName depends only on PatientID. Now that we've eliminated partial dependencies with 2NF, the next concern is transitive dependencies—where non-key fields depend on other non-key fields

Third Normal Form (3NF): Removing Transitive Dependencies

A table is in 3NF if:

  • It is in 2NF.
  • No non-key attribute depends on another non-key attribute (transitive dependency).

Let’s add doctor specialization to the Appointments table:

AppointmentID PatientID DoctorName DoctorSpecialization AppointmentDate AppointmentTime AppointmentType
1 1 Alice Smith Cardiology 2023-01-01 10:00 Checkup
2 2 Bob Johnson Neurology 2023-01-02 11:00 Surgery

Here, DoctorSpecialization depends on DoctorName, not the primary key (AppointmentID, PatientID). This is a transitive dependency.To achieve 3NF, we create a Doctors table:

Doctors Table:

DoctorID FirstName LastName Specialization
1 Alice Smith Cardiology
2 Bob Johnson Neurology

Appointments Table (Updated):

AppointmentID PatientID DoctorID AppointmentDate AppointmentTime AppointmentType
1 1 1 2023-01-01 10:00 Checkup
2 2 2 2023-01-02 11:00 Surgery

Now, Specialization depends on DoctorID, eliminating the transitive dependency.

Boyce-Codd Normal Form (BCNF): The Strictest Form

A table is in BCNF if it is in 3NF and every determinant (an attribute that determines another) is a candidate key. Consider a table for doctor schedules:

DoctorID FacilityID FacilityName ScheduleDay
1 1 City Hospital Monday
2 2 Clinic North Tuesday

If FacilityName determines FacilityID, this violates BCNF because FacilityName is not a candidate key. To achieve BCNF, we split the table:

Facilities Table:

FacilityID FacilityName
1 City Hospital
2 Clinic North

DoctorSchedules Table:

DoctorID FacilityID ScheduleDay
1 1 Monday
2 2 Tuesday

Denormalization: Prioritizing Performance

Normalization ensures data integrity but can slow queries requiring multiple joins, such as retrieving a patient’s appointment history with doctor and facility details. In a hospital, why might quick access to such data be essential, like for an emergency room dashboard? Denormalization introduces redundancy to optimize query performance.

When to Denormalize

Denormalization is useful for:

  • Read-Heavy Operations: Dashboards showing appointment schedules.
  • Complex Queries: Combining patient, doctor, and medical record data.
  • Precomputed Results: Storing aggregates, like the number of appointments per doctor.

For example, a normalized query for appointment details might be:

SELECT a.AppointmentID, p.FirstName AS PatientFirst, p.LastName AS PatientLast,
       d.FirstName AS DoctorFirst, d.LastName AS DoctorLast, a.AppointmentDate
FROM Appointments a
JOIN Patients p ON a.PatientID = p.PatientID
JOIN Doctors d ON a.DoctorID = d.DoctorID;
Enter fullscreen mode Exit fullscreen mode

This query is slow for large datasets. A denormalized table might look like:

AppointmentID PatientFirst PatientLast DoctorFirst DoctorLast AppointmentDate
1 John Doe Alice Smith 2023-01-01
2 Jane Roe Bob Johnson 2023-01-02

A simple SELECT * FROM AppointmentDetails is faster.

Denormalization Pros and Cons

Aspect Pros Cons
Performance Faster reads due to fewer joins Slower writes due to redundancy
Query Simplicity Simpler SELECT statements Complex updates to maintain consistency
Storage - Increased storage needs
Data Integrity - Risk of inconsistencies

Applying Normalization and Denormalization to a Medical System

Consider a medical database with:

  • Patients: PatientID, FirstName, LastName, DateOfBirth, ContactNumber, Email
  • Doctors: DoctorID, FirstName, LastName, Specialization, ContactNumber
  • Appointments: AppointmentID, PatientID, DoctorID, AppointmentDate, AppointmentTime, Status
  • Medical Records: RecordID, AppointmentID, Diagnosis, Prescription, TestResults
  • Facilities: FacilityID, FacilityName, Location
  • DoctorFacilities: Junction table for many-to-many relationships between doctors and facilities.

Normalized Design

The database is split into tables linked by foreign keys, ensuring integrity. For example, Appointments references PatientID and DoctorID.

Denormalized Design

For a dashboard showing appointment schedules, a materialized view might combine data:

CREATE MATERIALIZED VIEW AppointmentSummary AS
SELECT a.AppointmentID, p.FirstName AS PatientFirst, p.LastName AS PatientLast,
       d.FirstName AS DoctorFirst, d.LastName AS DoctorLast, d.Specialization,
       a.AppointmentDate, a.AppointmentTime
FROM Appointments a
JOIN Patients p ON a.PatientID = p.PatientID
JOIN Doctors d ON a.DoctorID = d.DoctorID;
Enter fullscreen mode Exit fullscreen mode

This view speeds up queries but requires periodic refreshes to stay consistent.

Enforcing Integrity with Constraints

In a medical system, constraints ensure data reliability:

  • Primary Key: PatientID ensures unique patient records.
  • Foreign Key: PatientID in Appointments references Patients.
  • Unique Constraint: Email in Patients prevents duplicates.
  • Check Constraint: AppointmentDate ensures future dates.

Conclusion

Normalization and denormalization are complementary strategies in database design. Normalization builds a robust, anomaly-free structure, ideal for transactional medical systems where accuracy is paramount. Denormalization enhances performance for read-heavy tasks, like generating appointment reports. By understanding and balancing these approaches, you can design a medical database that is both reliable and efficient.

Further Reading

Top comments (0)