GATE Computer Science Ideas: Normalization

Showing posts with label Normalization. Show all posts

Wednesday, July 4, 2018

Database Normalization--Part 4 (Normal Forms)

Till now, we have covered up to the types of functional dependencies. Before reading this post, please read the following posts on Normalization:-

In this post, we look at some important definitions and move up to the hierarchy of normal forms. Also, we show how to state the highest normal form of a relation.

**This post is entirely theory and may be boring. It requires multiple readings.

Prime Attribute

· Any attribute which is part of at least one candidate key is called a prime attribute.

· If AF is a candidate key, then A and F are prime attributes.

Non-prime attribute

· An attribute in a relation which is not part of any candidate key is called a non-prime attribute.

· If in a relation R(A,B,C,D,E) , AD is the only candidate key, then B,C and E are non-prime attributes.

Normal Forms

· It is a property of relations which indicates the amount of redundancy in the relation.

· Normal form and degree of redundancy are inversely proportional.

· Normalization is a systematic approach applied on relations to reduce the degree of redundancy and achieve progressively higher normal forms.

Normalization Procedure

1) Determine the highest possible normal form of the given relation.

2) Decompose the relation from its existing normal form to higher normal forms.

Procedure to find highest Normal Form of the given relation

1) Find all possible Candidate Keys of the given relation.

2) Identify all the existing prime and non-prime attributes.

3) Identify all the existing Full Dependencies, Partial Dependencies, Transitive Dependencies and Overlapping Candidate Key Dependencies.

4) Refer to the definition and hierarchy of Normal Forms for finding the highest possible Normal Form of the given relation.

Hierarchy of Normal Forms

Normalization is a hierarchical procedure and each stage is designated by a particular normal form. The hierarchy of normal forms has been shown in the diagram given below:-

First Normal Form (1NF)

· A relation is said to be in 1NF if all the values in the relation are atomic and single-valued.

· According to Codd’s rules of Relational Database Management Systems, every relation will always be in 1NF by default.

· The relation instance given below does not satisfy 1NF criterion (it is not in 1NF) as Email is not a single-valued attribute.

Name	Class	Email

Ravi	1	ravi@gmail.com ravi@yahoo.com
Rahul	2	rahul@gmail.com
Rakesh	4	rakesh@hotmail.com

· The above relation instance can be converted to 1NF as shown below:-

Name	Class	Email

Ravi	1	ravi@gmail.com
Ravi	1	ravi@yahoo.com
Rahul	2	rahul@gmail.com
Rakesh	4	rakesh@hotmail.com

Second Normal Form (2NF)

A relation is said to be in 2NF if:-

· It is in 1NF.

· It has no partial dependencies.

Third Normal Form (3NF)

A relation is said to be in 3NF if:-

· it is in 2NF.

· It has no transitive dependencies.

Boyce Codd Normal Form (BCNF)

A relation is said to be in BCNF if:-

· it is in 3NF and has no overlapping candidate key dependencies.

The hierarchy of Normal Forms can be represented by the above diagram. So, if a relation is in BCNF, it has to be in 3NF already but the reverse may not be true. Similar statments can be made for the other normal forms as well.

In this post, we have seen the hierarchy of Normal Forms. In the next post, we will look at examples on Normal Forms.

Was it a nice read? Please subscribe to the blog by clicking on the 'Follow' button on the right hand side of this page. You can also subscribe by email by entering your email id in the box provided. In that case, you will be updated via email as and when new posts are published. Cheers!

Click here to read Part 5

Database Normalization--Part 3 (Types of Functional Dependencies)

Till now, we have covered up to the Closure Set of Attributes method of finding candidate keys. Before reading this post, please read the following posts on Normalization:-

In this post, we will look at the types of Functional Dependencies which is very important for understanding Database Normalization. Here, we look at 4 types of Functional Dependencies:-

Full Dependencies
Partial Dependencies
Transitive Dependencies
Overlapping Candidate Key Dependencies

****For GATE, it is very important to know and remember the exact criterion of each of these dependencies. Almost every year, at least 1 direct question comes from this topic.

Full Dependencies

· A dependency is said to be full if the determinant of the dependency is a superkey.

· Full dependencies can be represented by the diagram shown above:-

The amount of redundancy caused by each full dependency is always 0, hence normalization procedure will not try eliminating Full Dependencies. An example of a full dependency has also been shown above. The candidate key has been assumed.Partial Dependencies

· A dependency is said to be partial if non-prime attributes are depending partially on a candidate key. In other words, if there is a functional dependency in which part of a candidate key is determining non prime attribute(s), then it is called a partial dependency.

· Partial dependencies can cause redundancy in the relations and hence they will be eliminated in the normalization procedure.

· If there exists a relation that has simple candidate keys only(single attribute candidate keys), then there exists no partial dependencies.

· Partial dependencies can be represented by the diagram shown above. An example for the same has also been shown.

Transitive Dependencies

· A dependency is said to be transitive if a non-prime attribute(s) are depending on other non-prime attributes or on a combination of non-prime attributes and proper subject of candidate keys. In other words, the dependency is transitive if one or more non-prime attributes are depending on a superkey transitively but not directly.

· Transitive dependencies can cause redundancy hence normalization process tries to eliminate them.

· If all attributes in a relation are prime, the number of partial dependencies and the number of transitive dependencies are both zero. (Why? Think!)

· Transitive dependencies can be represented by the diagram shown above. An example for the same has also been shown.

Overlapping Candidate Key Dependencies

· If in a dependency XàY, X is not a superkey and Y is the proper subset of a candidate key, then the dependency is called an Overlapping Candidate Key dependency.

· Overlapping Candidate Key Dependencies can be represented by the diagram shown above. An example for the same has also been shown.

In this post, we have seen the types of Functional Dependencies. In the next post, we will look at a few definitions and develop the concept of Normal Forms.

Was it a nice read? Please subscribe to the blog by clicking on the 'Follow' button on the right hand side of this page. You can also subscribe by email by entering your email id in the box provided. In that case, you will be updated via email as and when new posts are published. Cheers!

Click here to read Part 4

Database Normalization--Part 2 (Closure Set of Attributes)

Till now, we have covered up to the Closure Set of Attributes method of finding candidate keys. Before reading this post, please read the following post on Normalization:-

Part 1: Introduction to Database Normalization

In this post, we will see how to find the candidate keys of a relation using the Closure Set of Attributes Method. We must note that candidate keys can also be found out using the axioms of Functional Dependencies mentioned. But during the GATE examination, an attempt to save time is a must.

Closure Set of Attributes Method

This method is very useful in finding out the keys of a relation which is needed for Normalization procedure. Let us go through a couple of examples to understand the procedure.

Suppose we consider the following relation R with 4 attributes (A,B,C,D) and are also given a functional dependency set (Functional dependencies are arrived at during the Requirements Analysis Phase, when the problem is being analyzed and client specifications are being noted).

R(A,B,C,D) = { AàB , BàC , CàD , DàA } -----------> given FD set

So, here we have 4 functional dependencies. Now, we will find out the entire attribute set which each attribute derives. It is a repetitive approach.

(A)⁺ = { A,B,C,D}

Closure set of A contains A by default as AàA always holds.
As we have the functional dependency AàB, so the closure set of A will contain B (A derives B)

Closure set of A contains B and B->C therefore using transitivity axiom, we can say that AàC, so the closure set of A will contain C.
Similarly, we arrive at the following conclusions:-

· (B)⁺={A,B,C,D}

· (C)⁺={A,B,C,D}

· (D)⁺={A,B,C,D}

In this example, we find that each of the attributes A,B,C and D are the candidate keys as each of them individually derives all the attributes of this relation. A candidate key is a minimal superkey. There cannot exist any other candidate key for this relation because any superset (other than itself) of a candidate key is a superkey but not a candidate key. Please read the previous sentence again if you find the language confusing. Let us take another example.

Let us consider another relation R(A,B,C,D,E,F,G,H) along with its functional dependency set as
{ABàCD,DàE,EàF,BàA,FàG,GàH}.

We are required to find out the Candidate Keys for this

relation R.

In the 1st step, we find the closure sets of the single attributes A,B,C and D to obtain:-

· (A)⁺={A}

· (B)⁺={A,B,C,D,E,F,G,H}

· (C)⁺={C}

· (D)⁺={D,E,F,G,H}

Now, at the conclusion of this 1st step, we are sure that B is a candidate key as it derives all the attributes. Also, we can conclude at this point that if any other candidate key exists for this relation, it will not have B in it (as any superset of a candidate key is a superkey but not a candidate key).

Now, we must find the closure sets of AC, CD, AD (2-attribute closures). We note here that

(AC)⁺ denotes the set that A and C can derive together, repetitively. Here,

(AC)⁺ ={A,C}
(AD)⁺={A,D,E,F,G,H}
(CD)⁺={C,D,E,F,G,H}

So none of the 2-attribute sets are candidate keys. But we still should check for the 3-attribute set
ACD. Note that B shouldn't be part of this check as it itself is a candidate key and cannot be part

of any superset of itself.

(ACD)⁺={A,C,D,E,F,G,H}

So, ACD is not a candidate key as well as it cannot derive the attribute B.

Important Note

If we look carefully at the Functional Dependencies, we will find that B is not present in Right Hand Side of any of the Functional Dependencies. This indicates that B should be a part of any candidate key (otherwise B won't be derived).

Now, if B is a part of any candidate key of this relation, there cannot be any other candidate key as B is a single attribute candidate key. So, we could have directly stated that B is the only candidate key for this relation which would have saved a lot of time. This trick comes in very handy in GATE. Any attrribute which is not present in Right Hand Side of any of the Functional Dependencies of a given relation, has to be part of any candidate key of that relation.

At this point, we know how to find the candidate keys of any given relation provided we know the Functional Dependencies which exist for that relation. In the next post, we will look at the types of functional Dependencies that exist and subsequently move on to Normal Forms.

Database Normalization--Part 1

This is the first post on this blog. It starts the topic of Database Normalization. We will be introducing the motivation behind normalization. Subsequent posts will extend the topic and conclude with important GATE questions.

Let us consider the following relational database instance:-

*This particular relation was used as an example by the teacher who taught me this subject and I think it is very nice :P

stid	stname	course	duration	fees

100	X	Java	60	10000
200	Y	Java	60	10000
300	Z	Java	60	10000

Here, we find that the rightmost three attributes have repetitive data. If we analyse further, we find that for a particular course , for example, Java, its duration and fees are fixed at a particular instance of time. So, it is meaningless to represent the duration and fees’ information with every tuple.

A better approach would have been to just store course information with every tuple and maintain a separate relation (table) for storing information about courses. In this approach, 2 relations are used instead of 1 as shown in the diagrams above. That way, there is a significant conservation of storage space. Here, we find that a Redundancy has occurred. Database Normalization takes care of redundancies.

Database Normalization is the process of splitting relations into smaller relations with the purpose of eliminating redundancies.

Redundancies

In the previous example, we saw the redundancies. It is important to analyze what causes redundancies before moving on to its harmful effects. Redundancies are nothing but repetitions caused due to what are known as Functional Dependencies. In the previous example, the following two functional dependencies exist:-

course --> duration
course --> fees

Malicious effects of Redundancies

Redundancies cause various harmful effects as follows:-

Wastage of Memory

We have already seen this in the above example.

Anomalies (Insertion, Deletion, Updation)

Let us consider updation anomalies. Suppose the fees for the Java course has increased to Rs. 12,000.

If we don't use normalization, we will have to update the new fees in all of the tuples which have Java as course. If by chance we miss at least 1 tuple, there will be an anomaly (a mistake). This is called an updation anomaly.

To prevent these problems from occurring, we go for database normalization.

Functional Dependencies

· A functional dependency is of the form XàY where X is called the determinant of the functional dependency and Y is called the dependent of the functional dependency.

If the value of determinant attribute(s) gets duplicated in the relation, then the value of dependent attribute(s) also shall get duplicated accordingly.

· If the value of the determinant attribute(s) is unique, the dependent attribute(s) can have any value (there is no restriction on the values that can be taken by the dependent attribute(s) in that case). This particular point is very important and there have been previous GATE questions asked exclusively on this point which we will see later

Let us consider the relation instance shown below:-

Animal	Height_in_feet

Cat	30
Dog	40
Cat	30
Human	2
Cat	30

The functional dependency Animalà Height_in_feet is sure to be valid for this relation instance as whenever the Animal attribute has been repeated, the corresponding value for the Height_in_feet attribute is also same. (Please note that these values are extremely ficticious :P ). We cannot say if this Functional Dependency is valid for the entire relation as only a part of it has been shown.

But if the height value in the coloured tuple were to be changed to be 31, we could have clearly stated that this functional dependency does not hold for this relation.

Trivial dependencies

A functional dependency Xà Y is said to be a trivial dependency iff X is a superkey of Y.

Axioms of Functional Dependencies

· Reflexivity: Trivial dependencies always hold.

· Augmentation: If X,Y and Z are sets of attributes and XàY holds, then XZàYZ also holds.

· Transitivity: If X,Y and Z are sets of attributes and XàY and YàZ hold, then XàZ also holds.

The above three rules are known as Armstrong’s axioms. Using these, we can derive some secondary rules as given below:-

· Decomposition: If XàYZ holds, then XàY and XàZ both hold individually.

· Composition: If XàY and AàB hold, then XAàYB holds.

· Union: If XàY and XàZ hold, then XàYZ holds.

In the next part, we will see how to find key attributes and continue with the types of functional dependencies and beyond. Was it a nice read? Cheers!

Click here to read Part 2