I believe you'd totally agree with me if I say we are living in the "big data" era where collecting, storing and analyzing data is a top priority for companies and organizations like schools, hospitals and all what not. Data Normalization is a term that is thrown around quite often these days in discussions about databases and database design. You might here someone say "Data Normalization should be a part the database design process".
What then is data normalization? Well, data normalization is basically the process of determining how much redundancy exists in a table or as Microsoft puts it, it is the process of organizing data in a database. The main objective of Normalization in database design is to eliminate redundant (repetitive) data and ensure data is stored logically.
There are a few rules for database normalization. These rules divide larger tables into smaller tables and links them using relationships. Each rule is called a "normal form." and there are six different normal forms. But to keep things simple we are going to look at the first four normal forms(NF) which include:
- First Normal Form(1NF)
- Second Normal Form(2NF)
- Third Normal Form(3NF) and
- Boyce-Codd Normal Form(BCNF)
It entails the following rules; Each attribute in a table should have a single atomic (indivisible) value and Each record needs to be unique. So, if a relation contains a multi-valued attribute, it violates 1NF.
For a relation to be in 2NF, it needs to be in 1NF and should not contain any partial dependencies (check out my post on functional dependencies here). So a relation is in 2NF as long as it has no partial dependencies, i.e., no non-prime attributes (attributes which are not part of any candidate key) is dependent on any proper subset of a composite primary key of the table.
For a relation to be in Third Normal Form, it must be in Second Normal form and It should not have transitive dependency(again check out my post on functional dependencies here) or all non-primary fields should be dependent on the primary key.
For a table to satisfy the Boyce-Codd normal form, it should satisfy the following two conditions: It should be in the third normal form and, for any dependency A → B, A should be a super key. The second condition sounds a bit tricky, right? In simple words, it means that for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Now you might be wondering what good is data normalization when you don't know the benefits in offers in database design. let us look at some of its benefits
- Data normalization gets rid of a number of anomalies that can make analysis of data complicated.
- Data normalization offers performance benefits by eliminating redundancies from databases thereby reducing the amount of disc space being used to store data. I can go on and on and on. The list is long.
Thank you for reading my article.