When we plan our SQL database, we must think about the schemas of the database, which means that we need to plan the structures of our tables in the database and the relations between them.
Unsuccessful schemas create anomalies in the database, and a database with anomaly within it called Unnormalized Form (UNF). This database lacking the efficiency of normalized database. An unnormalized data model will suffer the pitfalls of data redundancy, where multiple values and/or complex data structures may be stored within a single field or attribute, or where fields may be replicated within a single table.
Example:
We have a table describe employees of a fashion company with 2 stores (located in New-York and San-Francisco). The table contains the following fields (e for employee, s for store):
e_id | e_name | e_address | s_num | s_name | s_address |
---|---|---|---|---|---|
111111 | John | New-York | 50 | Top Shop | New-York |
222222 | Andrew | Palo-Alto | 80 | Mr. Fix | S. Francisco |
333333 | Rachel | New-York | 50 | Top Shop | New-York |
444444 | Sara | Redwood city | 80 | Mr. Fix | S. Francisco |
555555 | David | San-Francisco | 80 | Mr. Fix | S. Francisco |
666666 | Thomas | San-Joze | 80 | Mr. Fix | S. Francisco |
777777 | Arnold | New-York | 50 | Top Shop | New-York |
On this table there are duplication in the last 2 columns of the table, because the s_name (the name of the store) and the s_address (the address of the store) do not change.
But the problem is more than just duplication of unnecessary data. A table like the table above can lead to problems.
What kind of problems can happen in this table?
Insertion anomaly – Lack of uniformity. If we add a new employee who work in "Mr. Fix" store, we need to re-right the name and address of the store. If we make a mistake and write another address, we will get lack of uniformity.
Update anomaly – It is a data inconsistency that results from data redundancy and a partial update of data. For example, if one store changed its location we need to change the tables in every s_address. If we forget to change one line our table will be wrongful.
Deletion Anomaly – It occurs when certain attributes are lost because of the deletion of other attributes. For example, if one shop is closed, we will delete it from the table. However, with that we might also delete the contact of the employees who worked there, and we don't want that.
We can normalize this table by separating it into two tables. We will create one table for the employees, and the other will be dedicated for the store details. In the employees table we will add the store number to each worker, like that:
Employees Table
e_id | e_name | e_address | s_num |
---|---|---|---|
111111 | John | New-York | 50 |
222222 | Andrew | Palo-Alto | 80 |
333333 | Rachel | New-York | 50 |
444444 | Sara | Redwood city | 80 |
555555 | David | San-Francisco | 80 |
666666 | Thomas | San-Joze | 80 |
777777 | Arnold | New-York | 50 |
Stores Table
s_num | s_name | s_address |
---|---|---|
50 | Top Shop | New-York |
80 | Mr. Fix | S. Francisco |
BCNF (Boyce & Codd Normal Forms)
Raymond Boyce and Edgar Codd suggested in the early 1970's, three rules for tables normalization, called after their names BCNF (Boyce & Codd Normal Forms). The normal forms (from least normalized to most normalized) are (In plain and simple English):
- 1NF – On each record there will be no redundant nor duplicate fields. That means that we should not have fields that don't contribute to the entity. We can get that information from related tables. For example: if we have the store number we don't need to put the store name and the store address at the same table.
- 2FN – Each column that is not a key will be fully dependent on all the features of the master key. for example: In the employees table the e_name and the e_address depends on the e_id in that table. In the stores table, the s_name and the s_address depends on the s_num.
- 3FN - All the attributes (e.g. database columns) are functionally dependent on solely the primary key. That means that no columns should be dependent or calculated (for example: sum, average etc.) on other columns in the table that are not primary key.
In the following years after that BCNF was declared there have been developed some new Normalization forms, but these three are the most important. Following these three rules will save your RDBMS from anomaly and will bring you better and more accurate solutions in your search.
Top comments (1)
Tables should definitely be very thoroughly thought out.