DEV Community

Cover image for Database Normalizarion
odeddugma
odeddugma

Posted on

Database Normalizarion

When we plan our SQL database, we must think about the schemas of the database, which means that we need to plan the structures of our tables in the database and the relations between them.

Unsuccessful schemas create anomalies in the database, and a database with anomaly within it called Unnormalized Form (UNF). This database lacking the efficiency of normalized database. An unnormalized data model will suffer the pitfalls of data redundancy, where multiple values and/or complex data structures may be stored within a single field or attribute, or where fields may be replicated within a single table.

Example:
We have a table describe employees of a fashion company with 2 stores (located in New-York and San-Francisco). The table contains the following fields (e for employee, s for store):

e_id e_name e_address s_num s_name s_address
111111 John New-York 50 Top Shop New-York
222222 Andrew Palo-Alto 80 Mr. Fix S. Francisco
333333 Rachel New-York 50 Top Shop New-York
444444 Sara Redwood city 80 Mr. Fix S. Francisco
555555 David San-Francisco 80 Mr. Fix S. Francisco
666666 Thomas San-Joze 80 Mr. Fix S. Francisco
777777 Arnold New-York 50 Top Shop New-York

On this table there are duplication in the last 2 columns of the table, because the s_name (the name of the store) and the s_address (the address of the store) do not change.

But the problem is more than just duplication of unnecessary data. A table like the table above can lead to problems.

What kind of problems can happen in this table?

  • Insertion anomaly – Lack of uniformity. If we add a new employee who work in "Mr. Fix" store, we need to re-right the name and address of the store. If we make a mistake and write another address, we will get lack of uniformity.

  • Update anomaly – It is a data inconsistency that results from data redundancy and a partial update of data. For example, if one store changed its location we need to change the tables in every s_address. If we forget to change one line our table will be wrongful.

  • Deletion Anomaly – It occurs when certain attributes are lost because of the deletion of other attributes. For example, if one shop is closed, we will delete it from the table. However, with that we might also delete the contact of the employees who worked there, and we don't want that.

We can normalize this table by separating it into two tables. We will create one table for the employees, and the other will be dedicated for the store details. In the employees table we will add the store number to each worker, like that:

Employees Table
e_id e_name e_address s_num
111111 John New-York 50
222222 Andrew Palo-Alto 80
333333 Rachel New-York 50
444444 Sara Redwood city 80
555555 David San-Francisco 80
666666 Thomas San-Joze 80
777777 Arnold New-York 50
Stores Table
s_num s_name s_address
50 Top Shop New-York
80 Mr. Fix S. Francisco
BCNF (Boyce & Codd Normal Forms)

Raymond Boyce and Edgar Codd suggested in the early 1970's, three rules for tables normalization, called after their names BCNF (Boyce & Codd Normal Forms). The normal forms (from least normalized to most normalized) are (In plain and simple English):

  1. 1NF – On each record there will be no redundant nor duplicate fields. That means that we should not have fields that don't contribute to the entity. We can get that information from related tables. For example: if we have the store number we don't need to put the store name and the store address at the same table.
  2. 2FN – Each column that is not a key will be fully dependent on all the features of the master key. for example: In the employees table the e_name and the e_address depends on the e_id in that table. In the stores table, the s_name and the s_address depends on the s_num.
  3. 3FN - All the attributes (e.g. database columns) are functionally dependent on solely the primary key. That means that no columns should be dependent or calculated (for example: sum, average etc.) on other columns in the table that are not primary key.

In the following years after that BCNF was declared there have been developed some new Normalization forms, but these three are the most important. Following these three rules will save your RDBMS from anomaly and will bring you better and more accurate solutions in your search.

Top comments (1)

Collapse
 
roymattar profile image
Roy Mattar

Tables should definitely be very thoroughly thought out.