Database Normalization :: Part 1

Reading Time: 6 minutes


Introduction

Normalization helps one attain a good database design and thereby ensures continues efficiency of the database.

Normalization, which is a process for assigning attributes to entities, offers the following advantages:

i) It reduces data redundancies.

ii) It helps eliminate data anomalies.

iii) It produces controlled redundancies to link tables.

There are 7 types of Normal forms:

i) First Normal Form (1NF)

ii) Second Normal Form (2NF)

iii) Third Normal Form (3NF)

iv) Boyce-Codd Normal Form (BCNF)

v) Fourth Normal Form (4NF)

vi) Fifth Normal Form (5NF)

vii) Domain/Key Normal Form (DKNF)

In this blog, we will be looking into the first four only, rest I’ll be covering in Part 2 of Database Normalization.

First Normal Form (1NF) :-

A relation R s in First Normal Form (1NF) if and only if all underlying domains of the relation contain atomic (indivisible) values.

This definition of 1NF has been expressed in the following two forms:

1) In every tuple of the relation R, no attribute should have repeating groups.

2) In every tuple of the relation R, each attribute must have a value and that too an atomic (indivisible) value.

To bring an unnormalized relation into 1NF, we need to perform the following:

i) Remove all repeating groups from the relation.

ii) Decompose non-atomic attributes to atomic (indivisible) attributes

For example, consider the conversion of following unnormalized relations into 1NF:

a)

b)

=>

1NF Summarized:

i) All key attributes defined.

ii) No repeating groups in table.

iii) All attributes dependent on primary key.

Second Normal Form (2NF) :-

• Functional dependence is a relationship that exists between any two fields.

• Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R (at any one time).
  X -> Y

A relation R

J -> K (K is functionally dependent on J)

In relation R, K is functionally dependent upon J since for each value of J, there is only one value of K associated with (for X in J, K stores 1; for Y in J, K stores 4; for Z in J, K stores 3; i.e., precisely one value of K exists for each value of J).

On the other hand, L is not functionally dependent on J as it is not possible to uniquely determine the value of L corresponding to a given value of J. (For X in J, L stores 0 and 6 (two values), for Y in J, L stores 1 and 9 (two values)).

• Partial Dependency refers to the dependence of a non-key attribute on the portion of the composite-primary key and not the whole composite-primary key.

A relation R is in Second Normal form (2NF) if and only if it is 1NF and every non-key attribute is fully dependent on the primary key.

1) R (name, course, grade, phNo, major, dept)

F.D’s = { name -> phNo,
course -> dept,
name, course -> grade, major }

So, P.K = (name, course)

(phNo & dept must be separated from R because name & course are acting as partial key for them, i.e., phNo & dept are partially dependent on the composite-primary key of R)

=>

R1 (name, course, grade, major)

( P.K = (name, course)
grade, major being the non-key attributes for R1 are fully dependent on the primary key)

R2 (name, phNo)

( P.K = name
phNo being the non-key attribute for R2 is fully dependent on the primary key)

R3 (course, dept)

( P.K = course
dept being the non-key attribute for R3 is fully dependent on the primary key)

2) Emp_Proj (EmpNo, ProjNo, Hours, EmpName, ProjLoc)

F.D’s = { EmpNo, ProjNo -> Hours,
EmpNo -> EmpName,
ProjNo -> ProjLoc }

So, P.K = (EmpNo, ProjNo)

=>

EmpProjHours (EmpNo, ProjNo, Hours)
Emps (EmpNo, EmpName)
ProjLocation (ProjNo, ProjLoc)

2NF Summarized:

i) In 1NF

ii) Includes no partial dependencies

iii) Still possible to exhibit transitive dependencies.

Third Normal Form (3NF) :-

The elimination of transitive dependence from a 2NF relation leads to the relation in 3NF.

• Transitive dependency means a non-key attribute (say X) is dependent upon another attribute (say Y) which in turn is dependent upon primary key (say Z).

A relation R is said to be in Third Normal Form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent upon the primary key.

To convert a 2NF relation into 3NF:

    ◦ Eliminate any transitive dependence of non-key attributes on primary key (or composite keys) i.e., remove dependency of a non-key attribute on other non-key attributes.
1) R (Empname, Empno, dob, Address, Deptno, Deptname,DeptMgr)

F.D’s = { EmpNo -> EmpName, DOB, Address, DeptNo,
DeptNo -> DeptName, DeptMgr }

So, P.K = (EmpNo)

(DeptName, DeptMgr being non-key attributes of R are transitively dependent on the primary key)

=>
Converted into two 3NF relations-

R1 (EmpNo, EmpName, DOB, Address, DeptNo)

( P.K = EmpNo
F.D’s = { EmpNo -> EmpName, DOB, Address, DeptNo }
Every non-key attribute of R1 is directly dependent (non-transitively dependent) on the primary key. )

R2 (DeptNo, DeptName, DeptMgr)

( P.K = DeptNo
F.D’s = { DeptNo -> DeptName, DeptMgr }
Every non-key attribute of R2 is non-transitively dependent (directly dependent) on the primary key. )

3NF Summarized:

i) In 2NF

ii) Contains no transitive dependencies.

Boyce-Codd Normal Form (BCNF) :-

A relation is in BCNF if it is in 3NF and all of its determinants (i.e., the attributes upon which other attributes depend) are candidate key.

To convert a 3NF into BCNF, decompose such that every determinant becomes a candidate key.

R (StudID, TeachNo, ClassCode, Grade)

F.D’s = { StudID -> ClassCode, Grade
TeachNo -> ClassCode }

So, P.K = (StudID, TeachNo), which is a candidate key.

(i.e., StudID, TeachNo -> ClassCode, Grade)

But, here “ClassCode” can also be thought of as a determinant.

ClassCode -> TeachNo

So, F.D’s = { StudID -> ClassCode, Grade
ClassCode -> TeachNo }


Now, P.K = (StudID)

(But according to the definition, it must be a candidate key)

So,

R1 (StudID, ClassCode, Grade)

P.K = (StudID, ClassCode), which is a candidate key.

R2 (ClassCode, TeachNo)

P.K = (ClassCode, TeachNo), which is a candidate key.

• For relation R, the table is in 3NF since all non-key attributes “ClassCode” and “Grade” are non-transitively (directly) dependent upon the primary key (StudID, TeachNo).

But this is not in BCNF because “ClassCode” is also a determinant here, since “TeachNo” depends upon “ClassCode” also and it is not a candidate key.

BCNF Summarized:

i) In 3NF

ii) Every determinant in the table is a candidate key.

Problems with BCNF:

The potential to violate BCNF may occur in a relation that:

i) contains two or more composite candidate keys.

ii) the candidate keys overlap, that have at least one attribute in common.

Conclusion

Normalization is a formal process for determining which fields belong in which tables in a relational database. Normalization follows a set of rules worked out at the time relational databases were born. A normalized relational database provides several benefits:

i) Elimination of redundant data storage.

ii) Close modeling of real world entities, processes, and their relationships.

iii) Structuring of data so that the model is flexible.

Normalization ensures that you get the benefits relational databases offer. Time spent learning about normalization will begin paying for itself immediately.

Pros and Cons of Normalization

Pros of NormalizingCons of Normalizing
More efficient database structure.We can’t start building the database before
knowing what the user needs.
A better understanding of your data. 
More flexible database structure. 
Easier to maintain database structure. 
Avoid redundant fields. 
Ensure that distinct tables exist when necessary. 

I hope you liked the blog and was able to understand the concept of discussed Normalization forms in details.

References

Books:-

i) An Introduction to Database Systems by Bipin C. Desai
ii) Database Systems by Shio Kumar Singh
iii) Fundaments of Database Systems by Ramez Elmasri and Shamkant B. Navathe
iv) Database System Concepts by Abraham Silberschatz and Henry F. Korth

Online websites:-

i) http://web.calstatela.edu/faculty/pthomas/CIS405A/UnderstandingNormalization. pdf
ii) http://www.island-data.com/downloads/papers/normalization.pdf

Written by 

Sarfaraz Hussain is a Big Data fan working as a Software Consultant with an experience of 1+ years. He is working in technologies like Spark, Scala, Java, Hive & Sqoop and has completed his Master of Engineering with specialization in Big Data & Analytics. He loves to teach and is a huge fitness freak and loves to hit the gym when he's not coding.