Database Normalization :: Part 2

Reading Time: 6 minutes

Introduction

Normalization helps one attain a good database design and thereby ensures continues efficiency of the database.

Normalization, which is a process for assigning attributes to entities, offers the following advantages:

i) It reduces data redundancies.

ii) It helps eliminate data anomalies.

iii) It produces controlled redundancies to link tables.

There are 7 types of Normal forms:

i) First Normal Form (1NF)

ii) Second Normal Form (2NF)

iii) Third Normal Form (3NF)

iv) Boyce-Codd Normal Form (BCNF)

v) Fourth Normal Form (4NF)

vi) Fifth Normal Form (5NF)

vii) Domain/Key Normal Form (DKNF)

In my previous blog, Database Normalization :: Part 1 I’ve discussed about first four.
In this blog, we will be looking into 4NF, 5NF and DKNF.

Fourth Normal Form (4NF) :-

Multivalued Dependencies (MVD) :- Multivalued dependencies are consequences of the first normal form (1NF), which disallows an attribute in a tuple to have a set of values. If we have two or more multivalued independent attributes in the same relation schema, we get into a problem of having to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.

Informally, whenever two F.Ds A -> B and A -> C are mixed in the same relation R(A, B, C) an MVD may arise.

MVD and 4NF :- To deal with the problem of BCNF, R. Fagin introduced the idea of MVDand the 4NF. An MVD is a functional dependency where the dependency may be to a set and not just a single value. It is defined as X ->-> Y in relation R(X, Y, Z) if each X value is associated with a set of Y values in a way that does not depend on the Z values. Here X and Y are both subsets of R. The notation X ->-> Y is used to indicate that a set of attributes of Y shows an MVD on a set of attributes of X.

Thus, informally, MVDs occurs when two or more independent multi-valued facts about the same attribute occur within the same relation. There are two important things to be noted in this definition of MVD. Firstly, in order for a relation to contain an MVD, it must have three or more attributes. Secondly, it is possible to have a table containing two or more attributes which are inter-dependent multivalued facts about another attribute. This does not make the relation an MVD. For a relation to be in MVD, the attributes must be independent of each other.

Functional dependency (FD) concerns itself with the case where one attribute is potentially a ‘single-value fact’ about another. MVD, on the other hand, concerns itself with the case where one attribute value is potentially a ‘multi-valued fact’ about another.

Alike trivial FDs (If X is a candidate key, then X -> Y, Ɐ Y Є R), there are trivial MVDs also. An MVD X ->-> Y in relation R is called a trivial MVD if –

  1. Y is a subset of X
  1. X U Y = R

An MVD that satisfies neither (a) nor (b) is called non-trivial MVD.

4NF:- A relation R is said to be in 4NF if it is in BCNF and for every non-trivial MVD(X ->-> Y) in F+, X is a super key for R. The 4NF is concerned with dependencies between the elements of composite keys composed of three or more attributes.

A relation Employee decomposed into two relations.

  • Closure of a set FDs (F+) :- It is the set of all FDs that can be derived from F. It is denoted by F+.

F={A -> B,B -> C}

F+ = { A -> B, B -> C, A -> C, A -> BC, . . . }

Join Dependency and Fifth Normal Form (5NF) :-

• If R is a schema JD(R1, R2, …, Rn) exists in R if,

ПR1 (r) Join ПR2 (r) Join …. Join ПRn (r) = r

• If we have a JD(R1, R2, …, Rn), then the relation is in 5NF if every join dependency is a consequence of its relation keys.

5NF is also called Project-Join Normal Form (PJNF).

A join dependency (JD) exists in table A if every record in the table can be reconstructed by an SQL JOIN operation that reunites all tables created by its decomposition. This must hold true for all records existing in table A at the time of its decomposition and for any valid record that could have been entered prior to its decomposition. (Records added to the decomposed tables must be able to form a valid record for table A when they are united via the JOIN.) Additionally, no records should be lost and no spurious records should be added.

Example of a table with a Join Dependency

We can say that the table in Figure 6-a has a Join dependency because we can decompose it into smaller tables. (Just because we can decompose a table further doesn’t necessarily mean we should.) For example, let’s say that we wanted to keep sensitive information, such as a vendor’s discount or status, from being accessed by everyone in the office. We could decompose this table into two smaller tables (VendorStatus and VendorInformation, respectively) as Figure 6-b shows.

The result of decomposing the Vendor table.

Because of the Join dependency, we are able to execute the following SQL statement and recreate the original Vendors table. Also, we should not lose any data or gain any bogus records in the process.

SELECT VendorInformation.VendorID, VendName, Discount, Status, VendCity, VendPhoneNumber, VendWebPage
FROM VendorInformation
INNER JOIN VendorStatus
ON VendorInformation.VendorID = VendorStatus.VendorID

This SQL statement will, in fact, recreate the original Vendors table without any problem.

Domain/Key Normal Form (DKNF) :-

A relation R is said to be in DKNF if it fulfils the following requirements –

  1. Each field must be fully and properly defined.
  2. Each field must represent a characteristic of the table’s subject.
  3. Each non-key field in the table must be functionally dependent upon the entire Primary Key.
  4. Each table should represent only a single subject.
  • A table in Domain/Key Normal Form will be free of transitive dependencies, multi-valued dependencies.

Let’s see how we can further Normalize this table.

We can indeed. One of the requirements of the Domain/Key Normal Form is that the Primary Key determines the value of every non-key column in the table. This certainly isn’t the case with the Department field. We will have to remove the “Department” field in order to place the table in Domain/Key Normal Form. Figure 7-b shows the result of your modification.

The Employees table in Domain/Key Normal Form.

Conclusion

Normalization is a formal process for determining which fields belong in which tables in a relational database. Normalization follows a set of rules worked out at the time relational databases were born. A normalized relational database provides several benefits:

i) Elimination of redundant data storage.

ii) Close modeling of real world entities, processes, and their relationships.

iii) Structuring of data so that the model is flexible.

Normalization ensures that you get the benefits relational databases offer. Time spent learning about normalization will begin paying for itself immediately.

Pros and Cons of Normalization

Pros of NormalizingCons of Normalizing
More efficient database structure.We can’t start building the database before
knowing what the user needs.
A better understanding of your data. 
More flexible database structure. 
Easier to maintain database structure. 
Avoid redundant fields. 
Ensure that distinct tables exist when necessary. 

I hope you liked the blog and was able to understand the concept of discussed Normalization forms in details.

References

Books:-

i) An Introduction to Database Systems by Bipin C. Desai
ii) Database Systems by Shio Kumar Singh
iii) Fundaments of Database Systems by Ramez Elmasri and Shamkant B. Navathe
iv) Database System Concepts by Abraham Silberschatz and Henry F. Korth

Online websites:-

i) http://web.calstatela.edu/faculty/pthomas/CIS405A/UnderstandingNormalization. pdf
ii) http://www.island-data.com/downloads/papers/normalization.pdf

 

Written by 

Sarfaraz Hussain is a Big Data fan working as a Software Consultant with an experience of 1+ years. He is working in technologies like Spark, Scala, Java, Hive & Sqoop and has completed his Master of Engineering with specialization in Big Data & Analytics. He loves to teach and is a huge fitness freak and loves to hit the gym when he's not coding.