This is the first post on this blog. It starts the topic of Database Normalization. We will be introducing the motivation behind normalization. Subsequent posts will extend the topic and conclude with important GATE questions.
Let us consider the following relational database instance:-
*This particular relation was used as an example by the teacher who taught me this subject and I think it is very nice :P
stid
|
stname
|
course
|
duration
|
fees
|
100
|
X
|
Java
|
60
|
10000
|
200
|
Y
|
Java
|
60
|
10000
|
300
|
Z
|
Java
|
60
|
10000
|
Here, we find that the rightmost three attributes have repetitive data. If we analyse further, we find that for a particular course , for example, Java, its duration and fees are fixed at a particular instance of time. So, it is meaningless to represent the duration and fees’ information with every tuple.
A
better approach would have been to just store course information with every
tuple and maintain a separate relation (table) for storing information about
courses. In this approach, 2 relations are used instead of 1 as shown in the diagrams above. That way, there is a significant conservation of storage space. Here,
we find that a Redundancy has occurred. Database Normalization takes care of
redundancies.
Database Normalization is the process of splitting relations into smaller relations with the purpose of eliminating redundancies.
Redundancies
In the previous example, we saw the redundancies. It is
important to analyze what causes redundancies before moving on to its harmful
effects. Redundancies are nothing but repetitions caused due to what are known
as Functional Dependencies. In the previous example, the following two
functional dependencies exist:-
course --> duration
course --> fees
course --> fees
Malicious
effects of Redundancies
Redundancies cause various harmful effects as follows:-
Wastage of Memory
We have already seen this in the above example.
Anomalies (Insertion, Deletion, Updation)
Let us consider updation anomalies. Suppose the fees for the Java course has increased to Rs. 12,000.
If we don't use normalization, we will have to update the new fees in all of the tuples which have Java as course. If by chance we miss at least 1 tuple, there will be an anomaly (a mistake). This is called an updation anomaly.
To prevent these problems from occurring, we go for database
normalization.
Functional Dependencies
· A functional dependency is of the form XàY where X is called the
determinant of the functional dependency and Y is called the dependent of the
functional dependency.
If the value of determinant attribute(s) gets
duplicated in the relation, then the value of dependent attribute(s) also shall
get duplicated accordingly.
·
If the value of the determinant attribute(s) is
unique, the dependent attribute(s) can have any value (there is no restriction
on the values that can be taken by the dependent attribute(s) in that case). This particular point is very important and there have been previous GATE questions asked exclusively on this point which we will see later
Let us consider the relation instance shown
below:-
Animal
|
Height_in_feet
|
Cat
|
30
|
Dog
|
40
|
Cat
|
30
|
Human
|
2
|
Cat
|
30
|
The functional dependency Animalà Height_in_feet is sure to be valid for this relation instance as whenever the Animal attribute has been repeated, the corresponding value for the Height_in_feet attribute is also same. (Please note that these values are extremely ficticious :P ). We cannot say if this Functional Dependency is valid for the entire relation as only a part of it has been shown.
But if the height value in the coloured tuple were to be changed to be 31, we could have clearly stated that this functional dependency does not hold for this relation.
Trivial
dependencies
A functional dependency Xà
Y is said to be a trivial dependency iff X is a superkey of Y.
Axioms of
Functional Dependencies
·
Reflexivity: Trivial dependencies always hold.
·
Augmentation: If X,Y and Z are sets
of attributes and XàY
holds, then XZàYZ
also holds.
·
Transitivity: If X,Y and Z are sets
of attributes and XàY
and YàZ
hold, then XàZ
also holds.
The above three rules are known as Armstrong’s axioms. Using
these, we can derive some secondary rules as given below:-
·
Decomposition: If XàYZ holds, then XàY and XàZ both hold individually.
·
Composition: If XàY and AàB hold, then XAàYB holds.
·
Union: If XàY and XàZ hold, then XàYZ holds.
In the next part, we will see how to find key attributes and continue with the types of functional dependencies and beyond. Was it a nice read? Cheers!
Click here to read Part 2
Click here to read Part 2
No comments:
Post a Comment