Wednesday, July 4, 2018

Database Normalization--Part 1


This is the first post on this blog. It starts the topic of Database Normalization. We will be introducing the motivation behind normalization. Subsequent posts will extend the topic and conclude with important GATE questions.

Let us consider the following relational database instance:-

*This particular relation was used as an example by the teacher who taught me this subject and I think it is very nice :P

stid
stname
course
duration
fees





100
X
Java
60
10000
200
Y
Java
60
10000
300
Z
Java
60
10000

Here, we find that the rightmost three attributes have repetitive data. If we analyse further, we find that for a particular course , for example, Java, its duration and fees are fixed at a particular instance of time. So, it is meaningless to represent the duration and fees’ information with every tuple. 




A better approach would have been to just store course information with every tuple and maintain a separate relation (table) for storing information about courses. In this approach, 2 relations are used instead of 1 as shown in the diagrams above. That way, there is a significant conservation of storage space. Here, we find that a Redundancy has occurred. Database Normalization takes care of redundancies.

Database Normalization is the process of splitting relations into smaller relations with the purpose of eliminating redundancies.

Redundancies

In the previous example, we saw the redundancies. It is important to analyze what causes redundancies before moving on to its harmful effects. Redundancies are nothing but repetitions caused due to what are known as Functional Dependencies. In the previous example, the following two functional dependencies exist:- 

course --> duration
course --> fees

Malicious effects of Redundancies

Redundancies cause various harmful effects as follows:-

Wastage of Memory

We have already seen this in the above example.

Anomalies (Insertion, Deletion, Updation)

Let us consider updation anomalies. Suppose the fees for the Java course has increased to Rs. 12,000.

If we don't use normalization, we will have to update the new fees in all of the tuples which have Java as course. If by chance we miss at least 1 tuple, there will be an anomaly (a mistake). This is called an updation anomaly.

To prevent these problems from occurring, we go for database normalization.

Functional Dependencies

·        A functional dependency is of the form XàY where X is called the determinant of the functional dependency and Y is called the dependent of the functional dependency.


If the value of determinant attribute(s) gets duplicated in the relation, then the value of dependent attribute(s) also shall get duplicated accordingly.

·         If the value of the determinant attribute(s) is unique, the dependent attribute(s) can have any value (there is no restriction on the values that can be taken by the dependent attribute(s) in that case). This particular point is very important and there have been previous GATE questions asked exclusively on this point which we will see later

Let us consider the relation instance shown below:-


Animal
Height_in_feet


Cat
30
Dog
40
Cat
30
Human
2
Cat
30

The functional dependency Animalà Height_in_feet is sure to be valid for this relation instance as whenever the Animal attribute has been repeated, the corresponding value for the Height_in_feet attribute is also same. (Please note that these values are extremely ficticious :P ). We cannot say if this Functional Dependency is valid for the entire relation as only a part of it has been shown.

But if the height value in the coloured tuple were to be changed to be 31, we could have clearly stated that this functional dependency does not hold for this relation.

Trivial dependencies

A functional dependency Xà Y is said to be a trivial dependency iff X is a superkey of Y.

Axioms of Functional Dependencies

·         Reflexivity: Trivial dependencies always hold.
·         Augmentation: If X,Y and Z are sets of attributes and XàY holds, then XZàYZ also holds.
·         Transitivity: If X,Y and Z are sets of attributes and XàY and YàZ hold, then XàZ also holds.

The above three rules are known as Armstrong’s axioms. Using these, we can derive some secondary rules as given below:-

·         Decomposition: If XàYZ holds, then XàY and XàZ both hold individually.
·         Composition: If XàY and AàB hold, then XAàYB holds.
·         Union: If XàY and XàZ hold, then XàYZ holds.


In the next part, we will see how to find key attributes and continue with the types of functional dependencies and beyond. Was it a nice read? Cheers!

Click here to read Part 2

No comments:

Post a Comment