Normalization was first introduced as part of the relational model. It is the process of organizing data tables and columns in a way that reduces redundancies and improves integrity. This can either be done through :
synthesis : creates a normalized database design based on a known set of dependencies.
decomposition : takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies
There are three common normal forms (1st, 2nd and 3rd) plus a rather advanced form called BCNF. They are progressive : in orther to qualify for the 3rd normal form, a database schema must satisfy the rules of the 2nd normal form, and so on for the 1st normal form.
1st normal form : The information is stored in a table, each column contains atomic values, and there are not repeating groups of columns. This :
Eliminates repeating groups in individual tables.
Creates a separate table for each set of related data.
Identifies each set of related data with a primary key
Example
A design that violates the 1st normal form, the "telephone" column does not contain atomic values
customer ID
First name
Last name
Telephone
123
Pooja
Singh
555-861-2025, 192-122-1111
789
John
Doe
555-808-9633
456
San
Zhang
(555) 403-1659 Ext. 53; 182-929-2929
One solution would be to have an extra column for each phone number. But then, this will repeat conceptually the same attribute(phone number). Moreover, adding extra telephone number will require reorganizing the table by adding more column.This is definitely not practicle.
Another solution is to have a separate table for the association customer <-> Telephone: This respects the 1st normal form and there can be as many rows per customer as needed.
customer ID
First name
Last name
123
Pooja
Singh
789
John
Doe
456
San
Zhang
customer ID
Telephone
123
555-861-2025
123
192-122-1111
789
555-808-9633
456
(555) 403-1659 Ext. 53
456
182-929-2929
2nd normal form : The table is in the first normal form and all the non-key columns depend on the table's primary key. This narrows the table's purpose.
Example
A design that violates the 2nd normal form. The model full name being the primary key, there are other candidate keys like {manufacturer, model}. The "Manufacturer Country" column is dependant on a non-key column (the Manufacturer).
Manufacturer
Model
Model Full Name
Manufacturer Country
Forte
X-Prime
Forte X-Prime
Italy
Forte
Ultraclean
Forte Ultraclean
Italy
Dent-o-Fresh
EZbrush
Dent-o-Fresh EZbrush
USA
Kobayashi
ST-60
Kobayashi ST-60
Japan
Hoch
Toothmaster
Hoch Toothmaster
Germany
Hoch
X-Prime
Hoch X-Prime
Germany
The normalized design would be to split into two tables like the following:
Manufacturer
Manufacturer Country
Forte
Italy
Dent-o-Fresh
USA
Kobayashi
Japan
Hoch
Germany
Manufacturer
Model
Model Full Name
Forte
X-Prime
Forte X-Prime
Forte
Ultraclean
Forte Ultraclean
Dent-o-Fresh
EZbrush
Dent-o-Fresh EZbrush
Kobayashi
ST-60
Kobayashi ST-60
Hoch
Toothmaster
Hoch Toothmaster
Hoch
X-Prime
Hoch X-Prime
3rd normal form : The table is in second normal form and all of its columns are not transitively dependent on the primary key.
A column is said to be dependant on an another column if it can be derived from it, for example, the age can be derived from the birthday. Transitivity means this dependance might involve other columns. for example, if we consider three columns PersonID BodyMassIndex IsOverweight , the column 'IsOverweight' is transitively dependant on 'personID' through 'BodyMassIndex'.
Example
A design that violates the 3rd normal form. {Tournament, Year} is the primary key for the table and the column 'Winner Date of Birth' transitively depends on it.
Tournament
Year
Winner
Winner Date of Birth
Indiana Invitational
1998
Al Fredrickson
21 July 1975
Cleveland Open
1999
Bob Albertson
28 September 1968
Des Moines Masters
1999
Al Fredrickson
21 July 1975
Indiana Invitational
1999
Chip Masterson
14 March 1977
A design compliant with the 3rd normal form would be :