Friday, December 21, 2018

Process Synchronization: Question 2 (2017)


Solution

Here, we are given that all locks are non-reentrant, meaning that if a thread holds a lock A, it has to release lock A before it can acquire lock A again. If a thread tries to acquire a lock when it is not available, the thread blocks.

We are asked to find the minimum values of x and y such that deadlock may occur. Here, x denotes the number of threads and y denotes the number of locks used by the multi-threaded program P. In this problem, we have to think of scenarios in the program code through which deadlock may occur. Let us start from the minimum possible combination x=1 and y=1 (option D). Starting from this option helps us in a way: if we find a situation with x=1 and y=1 such that deadlock occurs, we need not check the other 3 options. We can immediately claim that option D is our answer.

If the program P executes with only 1 thread T and 1 lock A, let's see what can happen. Using the property of non-reentrant locks as given in the problem, we find that if the thread T tries to acquire lock A twice without releasing lock A in between, deadlock will occur as T will block. So, option D is the right answer.

Answer: Option D

Process Synchronization: Question 1 (2018)

Solutions to these questions are available on multiple websites across the Internet but during the examination, it is necessary to solve the questions in very quick time. Most questions on this topic are actually easy and can be solved quickly. In this series of posts, I will try to show how I would have solved the question during the exam (or how I solved it while practising).



Solution

We have to find a suitable combination of P, Q, R and S so that this is a solution to the Producer-Consumer problem. In the Producer-Consumer problem, we have a buffer shared between the Producer and the Consumer. The Producer produces an item and places it in the buffer; at this point, the number of vacant slots in the buffer reduces by 1. The Consumer retrieves an item from the buffer; at this point, the number of vacant slots increases by 1.

So, after producing an item, the producer should indicate so. How? It can invoke signal() on a semaphore indicating the number of items currently in the buffer. The value of this semaphore will increase by 1, indicating that the number of items in the buffer has increased by 1. Initially, the semaphore empty is 0, indicating that the buffer is currently empty. So, Q has to be signal(empty). So, the answer can be only between options B and C.

Let us consider option B. Here, Producer process deals with semaphore empty and Consumer process deals with semaphore full. There is no sharing. Clearly, this cannot be the answer. Option C is the solution. The semaphore full basically denotes the number of vacant spaces in the buffer. Initially, full has value N. When producer process wishes to create an item, it calls wait(full) and full reduces by 1. It then calls signal(empty) to indicate that the number of occupants in the buffer has now increased by 1.

The solution might seem long, but during the exam, collecting these thoughts together will not take more than 1 minute.

Answer: Option C

Click here to visit Question 2

Saturday, December 15, 2018

Process Synchronization--Part 6 (Semaphores-1)

In this post, we will look at a software solution to the n-process synchronization problem, semaphores.

A semaphore is an integer variable which can be accessed only through 2 atomic operations:-
  1. wait()
  2. signal()
So, if S is a semaphore variable, the only 2 ways to access S are:-

  1. wait(S)
  2. signal(S)
The wait and signal operations on a semaphore variable S can be defined as follows:-
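The definitions (following Galvin) are, in pseudocode:

```
wait(S) {
    while (S <= 0)
        ;            // busy wait
    S = S - 1;
}

signal(S) {
    S = S + 1;
}
```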


We should note that these operations will be called by processes. Also, S is a shared variable. So, all the n processes may execute statements like wait(S) and signal(S). 

Let us assume that S has been initialized with 1. Now, if a process invokes wait(S), S will become 0. Now, if another process invokes wait(S), it will start waiting in the infinite loop. When some process invokes signal(S), S will become 1 again, only then can an interested process which was waiting in the loop, proceed with its code segment after wait(S). 

**wait(S) and signal(S) need not be executed by the same process. For example, if Process P0 executes wait(S) when S is initially 1, S becomes 0. Now, if process P1 executes wait(S), it starts waiting in the infinite loop. If P0 now executes signal(S), S becomes 1 and P1 can now make S=0 and proceed with its code. But this signal(S) could have been done by some process other than P0 as well. This is a very important point which we must remember. So, from the above definition, we find that S can have a minimum value of 0 and its maximum value depends on how many processes perform signal(S).

Let us say we want to solve the n-process critical section problem using semaphores. So, as we have read earlier, any solution to the critical section problem must ensure that only 1 process is allowed to execute its critical section at any point of time.

To solve the critical section problem with n processes, code structure of any process Pi looks like this:-
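In pseudocode, a sketch of the standard structure:

```
do {
    wait(S);             // entry section
    // critical section
    signal(S);           // exit section
    // remainder section
} while (true);
```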


We should note that for a software solution to work properly, the code itself must be correct. The above code works as follows:-

  • Initially S should be 1. This, in fact, is known as a Binary Semaphore as it can hold only 2 values, 0 and 1.
  • Each process invokes wait(S) before accessing the critical section.
  • Each process invokes signal(S) after accessing the critical section.
  • A process gets the chance to enter its critical section only if S = 1, which means no other process is inside its critical section, as otherwise S would have been 0.
  • When a process exits its critical section, it invokes signal(S) to make S = 1 so that some other process can enter its critical section (which process gets a chance is a matter of Scheduling and we will come back to it after some time).
  • Mutual Exclusion holds as a process can enter only if S = 1. Once it enters, S becomes 0 and now no other process can enter as S is not equal to 1.
  • An inherent assumption made in the above claim is that S = S + 1 and S = S - 1 are atomic statements. Also, it is assumed that there is no interruption between the moment the while(S<=0); condition becomes false and the execution of S = S - 1.
  • Progress holds in above code. It is obvious.
  • Bounded waiting may not hold. When a process P0 is inside its critical section, it has performed wait(S) and set S to 0. Until P0 sets S to 1 by signal(S) [according to the above code], all the processes invoking wait(S) in the meantime will be blocked on this semaphore, and the details of these blocked processes will have to be stored somehow. Bounded Waiting will depend on the Scheduling policy, which we will get back to after some time.
So far, we have seen the concept of Semaphores, the concept of wait() and signal() operations defined on semaphore variables and a solution to the n-process critical section problem using semaphores. Now, we will see the actual implementation of Semaphores.

Implementation of Semaphores

We have stated that semaphores are integer variables with 2 operations defined on them. We have also seen that information about the processes blocked on a semaphore has to be stored along with the integer value so that they can resume operation once a process invokes signal(S).

So, we can define a semaphore as a structure (in C) or class (in Java) which has an integer data component and a pointer variable to construct a Linked List which will hold the processes which are blocked (after having invoked wait(S) on this particular semaphore). A possible implementation has been shown below and I have taken this diagram from the Operating System Course Notes of Dr. John T. Bell, Dept. of Computer Science, University of Illinois, Chicago.


Each semaphore has an integer value and also a pointer to construct a linked list which will hold processes which have invoked wait(S) when some process is already inside its critical section by having invoked wait(S) and setting it to 0.
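A sketch of this implementation in C-like pseudocode (following Galvin's blocking version; block() suspends the invoking process and wakeup(P) resumes process P):

```
typedef struct {
    int value;
    struct process *list;   // processes blocked on this semaphore
} semaphore;

wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        add this process to S->list;
        block();
    }
}

signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P);
    }
}
```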

In this implementation, the minimum value of S is not bounded. As and when new processes block, the value of S gets more and more negative. The last blocked process is released when S increases from -1 to 0. At this point, only 1 process is inside its critical section using this semaphore and no process is blocked on it. Next time, when the current working process invokes signal(S), S comes back to 1 and everything is fine!

Phew! It was a bit of a roller-coaster ride I am sure :P 

But there is a problem. The line *remove a process P from S->list does not state in which order the processes are to be removed from the blocked list of a semaphore. If a LIFO policy is followed, starvation can occur. In this case, Bounded Waiting may not hold as well.

Deadlock Problem with Semaphores


R and S are semaphores, both initialized to 1. Suppose process A executes wait(S) and is then pre-empted, after which process B executes wait(R). Now, when A tries wait(R) and B tries wait(S), both block: neither A nor B can proceed, as each needs the semaphore already 'occupied' by the other. This is an important problem which can occur with semaphore code.
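Laid out side by side, the scenario looks like this (pseudocode; both semaphores initialized to 1):

```
Process A          Process B
---------          ---------
wait(S);           wait(R);
wait(R);           wait(S);
...                ...
signal(R);         signal(S);
signal(S);         signal(R);
```

A holds S and waits for R; B holds R and waits for S. Neither process ever reaches its signal() calls, so the wait is infinite.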

We have seen binary semaphores and their applications. In the next post, we will see 2 other applications of semaphores.

Monday, August 27, 2018

Process Synchronization--Part 5 (Test, Set and Lock)

**The entire code and discussion ideas for this post have been taken from Galvin. The slides of this book helped me immensely during GATE preparation and I think they are really great, so are the ones of the book by Korth for DBMS. Whenever I happen to remember some other ones, I will jot them down in some un-related posts :P


Till now, we have discussed the following posts on Process Synchronization. Before reading this post, please read the following post on Process Synchronization:-


So far, we have seen a few software solutions to the 2-process synchronization problem. Now, we have more than 2 processes; let us assume that we have n processes. Before designing software solutions for this situation, let us diverge a bit.

Till now, in the software solutions we have seen, before entering its critical section, each process acquires some kind of a lock on the shared data item, enters the critical section, and after it has finished, releases the lock on that data item.

But what if we have a situation where a process cannot be interrupted while it executes, so that there is no question of mutual exclusion being violated? This approach of disabling interrupts is problematic in multi-processor environments: disabling interrupts on 1 processor won't suffice, because a thread on another processor may still violate mutual exclusion. So, we would have to disable and re-enable interrupts on all processors, which is difficult. Also, there will be issues with timing if we disable the clock interrupt.

What if the hardware supports atomic instructions? Operations which give a guarantee that there will be no interrupt while they are executing? One such operation is known as "Test, Set and Lock", abbreviated as TSL.

TSL can be implemented as a function which returns a boolean value. Let us look at its code first.


Before analyzing what is happening inside this piece of code, let us see how to implement Mutual Exclusion using TSL.




So, when a process wants to enter the critical section, it will execute the code given just above. It is now time to explain what exactly is meant by the fact that TSL is atomic. It means that from the time the TSL function is invoked by a process, until the return value of the TSL function is received by the process inside its while loop, the process cannot be pre-empted.

Analysis


  • Lock is a boolean variable which can hold either true or false.
  • In the function TSL, the address of the lock variable is passed, and it is accepted inside the formal parameter of the TSL function in a variable ptr, which is a pointer to the variable lock.
  • If there is some other process currently executing inside its critical section, the value of lock will be already true.
  • In the TSL function, the current value of the variable lock is saved in a local variable and returned to the calling process. Before returning, the value of lock is set to true through its pointer. This works out in both cases. If there is no process currently within the critical section and this process is getting the chance to enter, it claims its chance by setting lock to true so that no other process can enter its critical section. If, on the other hand, there is already some other process inside the critical section, then setting lock to true does no harm as it is already true.
  • If lock was false at the time TSL was called, which means there was no other process inside the critical section, then this process will get the chance to enter its critical section by exiting the waiting while loop.
  • After exiting its critical section, this process sets lock to false so that other (waiting) processes can now enter.
  • The above code implements mutual exclusion and satisfies progress as well, but does not satisfy bounded waiting, as there is no guarantee which of the waiting processes will get the chance to enter its critical section once a process sets lock to false. It may happen that a slow process always lags behind and concedes the chance to other processes.
In the next post, we will move on to Semaphores.

Click here to read Part 6

Friday, August 3, 2018

Process Synchronization--Part 4 (Peterson's Solution)

Till now, we have discussed the following posts on Process Synchronization. Before reading this post, please read the following post on Process Synchronization:-


The solution to the 2-process synchronization problem discussed in the previous post did not satisfy progress as it could lead to a deadlock situation. In this post, we discuss Peterson's solution, which satisfies Mutual Exclusion, Progress and Bounded Waiting.



Analysis

  • This solution uses both the turn variable and flag array approaches.
  • There is a small change in the waiting loop condition as well compared to the previous solution.
  • Let us consider process P0.
  • First, it expresses interest to enter the critical section by setting flag[0] = true
  • Then, it sets turn variable to 1.
  • P0 enters the waiting loop and it will continue to wait only if turn is 1 AND flag[1] = true.
  • We know that P0 had set turn to 1 in the previous statement.
  • But it will wait in the loop only if flag[1] = true, which means P0 will wait only if P1 had expressed interest to enter the critical section EARLIER than P0.
  • So, in Peterson's Solution, the process which expresses interest earlier will get the chance to enter critical section earlier.

Mutual Exclusion

MUTUAL EXCLUSION HOLDS, which we can prove by an argument similar to the one used for the previous solution.

Progress

The difference between Peterson's solution and the previous algorithm is that here, there is a turn variable which can hold either 0 or 1. It cannot hold both values at the same time. As the value of turn is used in an AND (mandatory) condition in the waiting loop, at least 1 process will be able to proceed to its critical section, so there will be no deadlock. Combining this idea with the general proof of progress in the previous algorithm, we see that PROGRESS HOLDS.

Bounded Waiting

We find here that each process sets turn to the value that favours the other process. For example, P0 sets turn to 1 and P1 sets turn to 0. So, if P1 is interested to enter the critical section (it has set flag[1] to true) and P0 is currently inside the critical section, P1 will be allowed to enter as soon as P0 exits and sets flag[0] to false; even if P0 immediately tries to re-enter, it first sets turn to 1, which lets P1 go ahead. So, P1 will enter (if it wanted to!) before P0 enters again. So, BOUNDED WAITING HOLDS.

So, we reach the end of the 2-process synchronization problem. In the next post, we will discuss a solution to the critical section problem when more than 2 processes are involved, using something called Semaphores. We will also see that Semaphores can be used for other applications as well.

Was it a nice read? Please subscribe to the blog by clicking on the 'Follow' button on the right hand side of this page. You can also subscribe by email by entering your email id in the box provided. In that case, you will be updated via email as and when new posts are published. Also, please like the Facebook Page for this blog at Gate Computer Science Ideas. Cheers!

Click here to read Part 5

Friday, July 13, 2018

Process Synchronization-Part 3 (Flag Array Solution)

Till now, we have discussed the following posts on Process Synchronization. Before reading this post, please read the following post on Process Synchronization:-


In the previous post, we discussed a solution to the 2-process synchronization problem which did not satisfy PROGRESS. Let us examine why the problem arose.


In the solution above, no process is being made to state explicitly that it is interested to enter its critical section. An inherent assumption is being made that both the processes will always be interested to enter their critical section but this might not be the case. In our next solution, we try to rectify this problem using a 2-element boolean flag array.


  • flag[0] = 1 means Process P0 is interested to enter its critical section
  • flag[0] = 0 means Process P0 is not interested to enter its critical section
  • flag[1] = 1 means Process P1 is interested to enter its critical section
  • flag[1] = 0 means Process P1 is not interested to enter its critical section
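With these flags, the code of process Pi (i = 0 or 1, with j denoting the other process) is, in pseudocode:

```
do {
    flag[i] = true;          // I am interested
    while (flag[j] == true)
        ;                    // wait while the other process is interested
    // critical section
    flag[i] = false;         // I am done
    // remainder section
} while (true);
```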

Analysis

  • Before attempting to enter its critical section, each process sets its flag variable to true indicating that it is interested to enter its critical section.
  • Each process then waits in a loop until the other process has set its flag value to false. This means that a process will wait in the loop as long as the other process has set its flag value to true and has not yet set it back to false, meaning that the other process is currently holding the pass to enter the critical section.
  • Let us say Process P1 has set its flag value to true which means it has executed the statement flag[1] = true;
  • Now, it gets pre-empted and Process P0 enters the scene and sets flag[0] = true;
  • Process P0 will then enter the waiting loop until Process P1 comes back, enters critical section, exits and sets flag[1] = false;
  • So, in this solution, the process which sets its flag value to true earlier will get to enter its critical section earlier.
Mutual Exclusion

Suppose Process P0 is currently inside its critical section. This means it has set flag[0] to true. If Process P1 now attempts to enter its critical section, it will wait in the loop because flag[0] will be true as long as P0 is inside its critical section. So, mutual exclusion HOLDS.

Progress

We need to analyze a few things here.

  • Let P0 be interested to enter its critical section while P1 is not yet in the scene.
  • P0 sets flag[0] = true;
  • while(flag[1]==true) is false so it exits waiting loop and enters critical section.
  • P0 sets flag[0] to false;
  • Now, let us say P0 wants to enter its critical section once again.
  • P0 sets flag[0] = true;
  • while(flag[1]==true) is false so it exits waiting loop and enters critical section.
  • P0 sets flag[0] to false;
So, a non-interested process P1 has not blocked an interested process P0 from entering its critical section.

**Whenever I use the term 'enter its critical section', it means that a particular process is entering its own critical section. Note that a process can enter only its own critical section. A critical section of a process is a code segment of the process which accesses shared variables. A process is a program in execution. A program consists of lines of code. Ok! Enough :P

So, for now, we can think that this solution satisfies progress. But we must analyze another situation before concluding the same.

  • Let P0 be interested to enter its critical section while P1 is not yet in the scene.
  • P0 sets flag[0] = true;
  • At this point somehow P0 gets pre-empted, P1 enters the scene and executes flag[1] = true;
  • So, now, flag[0] and flag[1] are both true;
  • Now, if P0 comes back and starts executing from where it left off, which means from the while(flag[1]==true); statement, it finds that flag[1] is true and it has to wait. If P0 gets pre-empted and P1 comes, P1 has to wait as well as flag[0] is also true. So, we have a DEADLOCK situation and so, PROGRESS is not satisfied by this solution.
Bounded Waiting

We know that the definition of bounded waiting says that there should be a finite bound on the number of times other processes are allowed to enter their critical sections after the said process has exhibited its interest to join the critical section and before it is granted the chance. Here, each process exhibits its interest by setting its flag to true. And, this should hold for all processes.

Let us suppose P0 is inside its critical section while P1 has set flag[1] to true and is waiting in its loop. The moment P0 exits and sets flag[0] to false, P1 can come out of the waiting loop and be granted the chance. So, P0 cannot enter a second time ahead of a waiting P1, and bounded waiting HOLDS.

Conclusion

So, even now, we do not have a correct solution to the 2-process synchronization problem. In the next post, we will look at Peterson's Solution which makes a small modification to the algorithm described in this post, to provide a correct solution.


Click here to read Part 4


Monday, July 9, 2018

Process Synchronization-Part 2 (Turn Variable Solution)

Till now, we have covered the Introduction to Process Synchronization up to the Critical Section Problem. Before reading this post, please read the following post on Process Synchronization:-


In this post, we will see the 1st solution to the 2-process synchronization problem which uses a shared variable turn. We name the 2 processes P0 and P1.

Turn Variable Solution to 2-process Synchronization Problem

The solution code can be shown by the diagram below:-
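In pseudocode, the two processes look like this (a sketch; turn is shared and initialized to 0 or 1):

```
/* Process P0 */             /* Process P1 */
do {                         do {
    while (turn != 0)            while (turn != 1)
        ;                            ;
    // critical section          // critical section
    turn = 1;                    turn = 0;
    // remainder section         // remainder section
} while (true);              } while (true);
```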


Analysis


  • turn is a shared variable which can hold either the value 0 or 1
  • Process P0 waits in an infinite loop until turn value is 0, only then can it enter the Critical Section
  • When P0 exits Critical Section, it sets turn = 1 to enable Process P1 to enter the Critical Section
  • The code is extremely symmetric and easy to understand. Please observe it with some time in hand.
We will now analyze the Turn Variable algorithm with respect to the 3 conditions in our previous post.

Mutual Exclusion

Let Process P0 be in its critical section. This means that turn = 0 at the moment. Until and unless P0 exits critical section and sets turn = 1, Process P1 cannot enter the critical section. The analysis is similar if we start with process P1. So, mutual exclusion HOLDS.

Progress

Let Process P0 be inside its critical section. It finishes its critical section and sets turn = 1. Now, let us assume P0 again wants to enter its critical section. It will not be able to, as long as Process P1 enters its critical section at least once and sets turn = 0 after exiting. So, if Process P1 is not interested to enter its critical section, in a way, it is blocking the entry of an interested process P0 to the critical section even though there is no process currently in its critical section. So, Progress DOES NOT HOLD.

BOUNDED WAITING

Here, we find that strict alternation exists. In other words, the only possible execution sequence is one process followed by the other. So, bounded waiting definitely HOLDS.

Conclusion

This is an incorrect solution to the 2-process synchronization problem as Progress does not hold. In the next post, we will analyze the exact reason why this problem is occurring and try to develop a better solution.




Click here to read Part 3

Friday, July 6, 2018

Process Synchronization- Part 1 (The Critical Section Problem)

This topic is extremely important for GATE. Also, most of us find it confusing. In this series, we will try to cover the basic essentials required to solve GATE questions (trust me, not much depth in this topic is needed for GATE!). The topics will include the following:-


  • Race Condition and The Critical Section Problem
  • Solutions to the 2-process Synchronization Problem
  • Hardware solutions including the Test, Set and Lock (TSL)
  • Semaphores, Mutexes and the difference between them
  • Applications of Semaphores
  • A couple of classic synchronization problems 
  • Very important previous GATE questions


Race Condition

Most of us are familiar with the notion of a process. A process can be thought of as a program in execution. In a uniprocessor environment, only 1 process can be running at a time. Suppose there are 10 processes to be executed on a certain computer. Process 1 starts executing and, in the midst of its execution, it requires some I/O time. If we keep the CPU idle during this time, it will be wasteful and we will obtain reduced throughput. What we can do is let some other process execute on the CPU in the meantime. In this way, the CPU is always kept busy (that is the aim).

But this constant getting-in and throwing-out of processes from the CPU comes with a price. When so many processes execute concurrently in this interleaved fashion, consistency can be hard to achieve, especially when there are shared variables involved. Also, depending on the order of execution of processes, the results obtained may be different. We will examine all these ideas.

Let us consider the famous Producer-Consumer problem to understand what a race condition is. While we do not need to analyze the exact code for this problem, we must understand intuitively what the problem is. In the Producer-Consumer problem, there are 2 processes, a Producer process and a Consumer process. Also, we have an array/buffer of items, let us consider the array to be infinite for now. 

To maintain the number of items available in the buffer/array, a shared variable count is needed. It is a shared variable because both the processes should be able to modify it. When the Producer process produces an item and places it in the buffer, it should increment count by 1 (execute the statement count++). When the Consumer process takes up an item from the buffer, it should decrement count by 1 (execute the statement count--).

But count++ and count-- are not atomic statements at the Assembly Language Level. In other words, these statements are individually executed as a series of 3 atomic statements shown below:-
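At the register level, the breakdown looks like this (reg1 and reg2 stand for CPU registers; the numbering 1 to 6 is used in the analysis that follows):

```
Producer (count++):          Consumer (count--):
1. reg1 = count              4. reg2 = count
2. reg1 = reg1 + 1           5. reg2 = reg2 - 1
3. count = reg1              6. count = reg2
```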


Let the initial value of count be 18. We have numbered the statements from 1 to 6 for ease of analysis. Note that the atomic statements can be interleaved in any order, which means that after execution of statement 1, the Producer process may be pre-empted while the Consumer process executes statement 4. So, (1,2,3,4,5,6) is a possible execution sequence. Also, (1,4,2,5,3,6) is another possible execution sequence.

When the sequence (1,2,3,4,5,6) occurs, the following happens:-


  • reg1 = 18
  • reg1 = 18+1 = 19
  • count = 19
  • reg2 = 19
  • reg2 = 19-1 =18
  • count = 18           (final value of count is 18-->correct)
So we find that the desired result is obtained. Now, let us see what happens when the sequence (1,2,4,5,3,6) occurs:-

  • reg1 = 18
  • reg1 = 18+1 = 19
  • reg2 = 18
  • reg2 = 18-1 = 17
  • count = reg1 = 19
  • count = reg2 = 17  (final value of count is 17--> incorrect)
If we interchange the final 2 statements in the above sequence so that (1,2,4,5,6,3) occurs, the following happens:-


  • reg1 = 18
  • reg1 = 18+1 = 19
  • reg2 = 18
  • reg2 = 18-1 = 17
  • count = reg2 = 17
  • count = reg1 = 19       (final value of count is 19--> incorrect)
So, in both the above cases, the final value of count is inconsistent and undesired. Depending on which process executes last, the final value of count is changing. This is called a race condition.

Race condition is a situation where several processes access and manipulate shared data concurrently. The final value of the shared data depends on the process which finishes last. Concurrent processes must be synchronized to prevent race conditions.


The Critical Section Problem

Suppose there are n processes, each having a particular section of code where it accesses shared variables. This code segment where a process accesses shared data is called its critical section. We must ensure that when 1 process is executing in its critical section, no other process is allowed to execute in its critical section.

The general code structure of process i can be shown by the diagram below:-
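In pseudocode, following the usual textbook layout:

```
do {
    entry section
        critical section
    exit section
        remainder section
} while (true);
```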




The entry section is usually used to obtain locks on shared variables and permission to access the Critical Section. 

The exit section is usually used to release locks on shared variables so that other processes can access their critical sections.

The remainder section is a code segment not related to the critical section.

The code is shown in a do-while loop with the assumption that a process will end when it executes some code in its remainder section which causes it to terminate.

Critical Section Problem Solution Requirements

Any solution to a critical section problem must satisfy the following 2 conditions mandatorily:-

  • Mutual Exclusion: When a process i is executing in its critical section, no other process should be allowed to enter its critical section.
  • Progress: If no process is in its critical section and we have some process which is interested to enter its critical section, only those processes not in their remainder section should be allowed to participate in the decision of which process will get the chance to enter the critical section next. In other words, only those processes which are interested to enter the critical section should be able to influence which process gets the chance to enter the critical section. A non-interested process should not be able to block an interested process from entering the critical section.
**Note: If there is INFINITE WAITING TIME to make the above decision (which also includes the notion of DEADLOCK), that means there is NO PROGRESS. This particular point is very important for GATE.

A solution to the critical section problem should ideally satisfy another condition known as Bounded Waiting the definition of which is not the same across textbooks. We can think of bounded waiting as the assurance implicit to the process that there will be a bound (finite upper limit even if it is a very high number) on the number of times other processes are allowed to enter the critical section after that process has requested to enter the critical section and before that request is granted.

In this post, we have seen up to the Critical Section Problem. In the next post, we will see various solutions to the 2-process Synchronization problem where we have only 2 processes.

Click here to read Part 2


Wednesday, July 4, 2018

Database Normalization--Part 4 (Normal Forms)


Till now, we have covered up to the types of functional dependencies. Before reading this post, please read the following posts on Normalization:-


In this post, we look at some important definitions and move up the hierarchy of normal forms. We also show how to determine the highest normal form of a relation.

**This post is entirely theory and may be boring. It requires multiple readings.

Prime Attribute

·         Any attribute which is part of at least one candidate key is called a prime attribute.
·         If AF is a candidate key, then A and F are prime attributes.


Non-prime attribute

·         An attribute in a relation which is not part of any candidate key is called a non-prime attribute.
·         If in a relation R(A,B,C,D,E) , AD is the only candidate key, then B,C and E are non-prime attributes.
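The prime/non-prime classification can be checked mechanically with the attribute-closure computation. Below is a small illustrative sketch (the function name and FDs are my own) for the relation R(A,B,C,D,E) above, assuming the FDs AD → B, B → C and AD → E, so that AD is the only candidate key:

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the left side is already in the closure, pull in the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("AD", "B"), ("B", "C"), ("AD", "E")]
print(closure("AD", fds) == set("ABCDE"))  # True: AD is a candidate key
print(closure("B", fds))                   # {'B', 'C'}: B is not a key
```

Since AD is the only candidate key here, A and D are the prime attributes and B, C, E are non-prime, matching the example above.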


Normal Forms

·         It is a property of relations which indicates the amount of redundancy in the relation.
·         Normal form and degree of redundancy are inversely related: the higher the normal form of a relation, the lower its degree of redundancy.
·         Normalization is a systematic approach applied on relations to reduce the degree of redundancy and achieve progressively higher normal forms.


Normalization Procedure

1)      Determine the highest possible normal form of the given relation.

2)      Decompose the relation from its existing normal form to higher normal forms.


Procedure to find highest Normal Form of the given relation

1)      Find all possible Candidate Keys of the given relation.

2)      Identify all the existing prime and non-prime attributes.

3)      Identify all the existing Full Dependencies, Partial Dependencies, Transitive Dependencies and Overlapping Candidate Key Dependencies.

4)      Refer to the definition and hierarchy of Normal Forms for finding the highest possible Normal Form of the given relation.
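The four steps above can be sketched in code. This is a simplified illustration (all names are my own): it takes the candidate keys as given rather than computing them, assumes the relation is already in 1NF, and applies the standard 2NF/3NF/BCNF tests:

```python
def highest_normal_form(fds, candidate_keys):
    """fds: list of (lhs, rhs) attribute strings; candidate_keys: list of strings."""
    keys = [set(k) for k in candidate_keys]
    prime = set().union(*keys)  # prime attributes: members of some candidate key

    def is_superkey(x):
        # with all candidate keys known, X is a superkey iff it contains one
        return any(k <= x for k in keys)

    # keep only the non-trivial part of each FD
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if not set(r) <= set(l)]

    if all(is_superkey(l) for l, _ in nontrivial):
        return "BCNF"  # every determinant is a superkey
    if all(is_superkey(l) or r <= prime for l, r in nontrivial):
        return "3NF"   # violating FDs all determine only prime attributes
    # partial dependency: a proper subset of a candidate key -> non-prime attribute
    has_partial = any(any(l < k for k in keys) and (r - prime)
                      for l, r in nontrivial)
    return "1NF" if has_partial else "2NF"

# R(A,B,C,D), candidate key AB; B -> C is a partial dependency
print(highest_normal_form([("AB", "D"), ("B", "C")], ["AB"]))  # 1NF
# R(A,B,C), candidate key A; A -> B, B -> C is a transitive dependency
print(highest_normal_form([("A", "B"), ("B", "C")], ["A"]))    # 2NF
print(highest_normal_form([("A", "BC")], ["A"]))               # BCNF
```

This follows the textbook definitions used in this series; step 1 (finding all candidate keys) is the hard part in practice and is done with the attribute-closure method covered earlier.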

Hierarchy of Normal Forms

Normalization is a hierarchical procedure and each stage is designated by a particular normal form. The hierarchy of normal forms has been shown in the diagram given below:-

First Normal Form (1NF)

·         A relation is said to be in 1NF if all the values in the relation are atomic and single-valued.
·         According to Codd’s rules of Relational Database Management Systems, every relation will always be in 1NF by default.
·         The relation instance given below does not satisfy 1NF criterion (it is not in 1NF) as Email is not a single-valued attribute.

Name    | Class | Email
--------|-------|------
Ravi    | 1     |
Rahul   | 2     |
Rakesh  | 4     |

·         The above relation instance can be converted to 1NF as shown below:-

Name    | Class | Email
--------|-------|------
Ravi    | 1     |
Ravi    | 1     |
Rahul   | 2     |
Rakesh  | 4     |
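The conversion above can be sketched in code. In this illustration the email values are hypothetical placeholders (the Email entries of the original table are not reproduced here); each multi-valued Email cell is flattened into one row per value, which is why Ravi appears twice:

```python
# non-1NF data: the Email attribute holds a list of values per row (hypothetical emails)
rows = [
    {"Name": "Ravi",   "Class": 1, "Email": ["ravi@a.com", "ravi@b.com"]},
    {"Name": "Rahul",  "Class": 2, "Email": ["rahul@a.com"]},
    {"Name": "Rakesh", "Class": 4, "Email": ["rakesh@a.com"]},
]

# 1NF conversion: one row per atomic Email value
flat = [
    {"Name": r["Name"], "Class": r["Class"], "Email": e}
    for r in rows
    for e in r["Email"]
]

print(len(flat))  # 4 rows: Ravi is repeated, once per email
```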

Second Normal Form (2NF)

A relation is said to be in 2NF if:-
·         It is in 1NF.
·         It has no partial dependencies.


Third Normal Form (3NF)

A relation is said to be in 3NF if:-
·         it is in 2NF.
·         It has no transitive dependencies.


Boyce Codd Normal Form (BCNF)

A relation is said to be in BCNF if:-
·         It is in 3NF and has no overlapping candidate key dependencies.

The hierarchy of Normal Forms can be represented by the above diagram. So, if a relation is in BCNF, it has to be in 3NF already, but the reverse may not be true. Similar statements can be made for the other normal forms as well.

In this post, we have seen the hierarchy of Normal Forms. In the next post, we will look at examples on Normal Forms.


Click here to read Part 5


Database Normalization--Part 3 (Types of Functional Dependencies)


Till now, we have covered up to the Closure Set of Attributes method of finding candidate keys. Before reading this post, please read the following posts on Normalization:-


In this post, we will look at the types of Functional Dependencies which is very important for understanding Database Normalization. Here, we look at 4 types of Functional Dependencies:-
  • Full Dependencies
  • Partial Dependencies
  • Transitive Dependencies
  • Overlapping Candidate Key Dependencies
**For GATE, it is very important to know and remember the exact criterion for each of these dependencies. Almost every year, at least one direct question comes from this topic.
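As a rough illustration (my own naming, with the candidate keys assumed to be known in advance), the four definitions below can be turned into a small classifier for a dependency X → Y:

```python
def classify_fd(lhs, rhs, candidate_keys):
    """Label the FD lhs -> rhs given all candidate keys of the relation."""
    keys = [set(k) for k in candidate_keys]
    prime = set().union(*keys)  # attributes appearing in some candidate key
    x, y = set(lhs), set(rhs)

    if any(k <= x for k in keys):          # determinant is a superkey
        return "full"
    if any(x < k for k in keys) and (y - prime):
        return "partial"                   # part of a key -> non-prime attribute(s)
    if y <= prime and any(y < k for k in keys):
        return "overlapping candidate key" # non-superkey -> part of a candidate key
    if y - prime:
        return "transitive"                # non-prime attributes reached indirectly
    return "other"

# R(A,B,C,D) with candidate key AB
print(classify_fd("AB", "C", ["AB"]))  # full
print(classify_fd("A", "C", ["AB"]))   # partial
print(classify_fd("C", "D", ["AB"]))   # transitive
print(classify_fd("C", "A", ["AB"]))   # overlapping candidate key
```

This is only a sketch of the definitions that follow, not an exhaustive classifier; the exact criteria are spelled out below.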


Full Dependencies

·       A dependency is said to be full if the determinant of the dependency is a superkey.
·         Full dependencies can be represented by the diagram shown above:-

The amount of redundancy caused by a full dependency is always 0, hence the normalization procedure does not try to eliminate full dependencies. An example of a full dependency has also been shown above. The candidate key has been assumed.
Partial Dependencies

·         A dependency is said to be partial if non-prime attributes depend partially on a candidate key. In other words, if there is a functional dependency in which a proper subset of a candidate key determines non-prime attribute(s), then it is called a partial dependency.

·       Partial dependencies can cause redundancy in the relations and hence they will be eliminated in the normalization procedure.

·       If a relation has simple candidate keys only (single-attribute candidate keys), then no partial dependencies exist.
·         Partial dependencies can be represented by the diagram shown above. An example for the same has also been shown.



Transitive Dependencies

·         A dependency is said to be transitive if non-prime attribute(s) depend on other non-prime attributes, or on a combination of non-prime attributes and proper subsets of candidate keys. In other words, the dependency is transitive if one or more non-prime attributes depend on a superkey transitively but not directly.

·         Transitive dependencies can cause redundancy hence normalization process tries to eliminate them.

·         If all attributes in a relation are prime, the number of partial dependencies and the number of transitive dependencies are both zero. (Why? Think!)

·      Transitive dependencies can be represented by the diagram shown above. An example for the same has also been shown.



Overlapping Candidate Key Dependencies

·         If in a dependency X → Y, X is not a superkey and Y is a proper subset of a candidate key, then the dependency is called an Overlapping Candidate Key dependency.
·         Overlapping Candidate Key Dependencies can be represented by the diagram shown above. An example for the same has also been shown.

In this post, we have seen the types of Functional Dependencies. In the next post, we will look at a few definitions and develop the concept of Normal Forms.


Click here to read Part 4