Hello friends, In this article we will try to understand the concept of normalization in DBMS or Database Normalization.
We will start with understanding what all problems arise if a table or a database is not normalized, and how normalization in DBMS solves these problems.
What is normalization in DBMS?
Normalization in DBMS is a technique of organizing the data into multiple related tables to minimize data redundancy. Before moving further, make one thing very clear that Normalization in DBMS only minimizes the data redundancy not totally finishes data redundancy. Now, you would ask what data redundancy is and why should we reduce it.
What is Data Redundancy?
Data redundancy is nothing but repetition of similar data at multiple places. And we want to reduce it not just because of repetition of similar data multiple times. It not only holds an extra space. But it also leads to multiple other issues like insertion anomalies, deletion anomalies, and updation anomalies.
For now, let’s start with a very basic example and try to understand data redundancy and the problems that we face due to it.
|Roll||Name of Students||Name of HOD||Branch||Telephone number|
So here, we have a simple student table to store student information in a college. We have columns roll number, student name, the branch or the department name, the name of the actual head of the department, you can call it a class teacher as well, and the department telephone number.
Now let’s put data for computer science students into this table.
We’ll add roll number 1,2,3, and 4, put in the student’s name, their branch name, the name of the HOD, and the department telephone numbers.
Now in this case, the branch name, the HOD name and the telephone number are the same for all the entries that we make.
Whenever a new student entry is created, this entire information is repeated. And this is what is redundancy. Now, this data redundancy not only increases the size of our Database unnecessarily, but it also leads to multiple other issues like we talked about.
So let’s see what these issues are.
There are three main problems that data redundancy causes,
- Insertion Anomaly,
- Deletion anomaly
- Updation anomaly.
So let’s start with insertion anomaly.
We already have four students’ data in our table. If we have to add a fifth student, we will have to repeat the same data, which holds the branch information like branch name, the name of the HOD and the branch telephone number again, and if we have to enter the data for 100 more students, we have to repeat the data 100 more times, which leads to insertion anomaly. Because the repetition of data will only increase as we insert more entries to our table.
Now this repetition arises because in our example, we are unknowingly unintentionally saving two pieces of information in every row, student information and branch information.
Now if we start deleting student information, the branch information is also deleted simultaneously. And when we delete the last student role from the student table, we also unintentionally delete the branch information. So now we don’t have any student information and we also don’t have the branch information because we deleted it, which leads to deletion problems. And it’s also known as deletion anomaly.
Updation Anomaly or Modification Anomaly:
So let’s say if the authority or the head of the department leaves for some reason, and the department gets a new HOD, the system administrator has to update all the student records with the new name of the HOD. This is a modification anomaly because this is held a lot of work which the administrator has to do to update every single record with the new HOD name and during modification, if even a single row is missed out, it will lead to inconsistent data. Hence, data redundancy can also lead to inconsistency in data as a result of updation anomaly.
Now you understand that data redundancy is pretty bad. Not only because it eats up extra space, but it leads to problems in inserting new data, deleting the data and a huge problem if you have to update the data and this is the place where superman (Normalization in DBMS) has to arrive and save us.
So now the million dollar question is how normalization in DBMS will solve this problem.
So normalization will break the existing student table into two different tables, student table and branch table.
The student table will save the data of the student and the branch table will save the branch information.
Now the branch information if we want to save data of computer science department students will be just one row. That will be the branch name, computer science, the name of the new HOD and the phone number of the department.
In the student table we will keep all the information of the students enrolled for this particular branch and these tables will be related by the name of the branch in our case, which is VGM.
One can see that the branch name is still getting repeated in the students table. So as I mentioned at the beginning, normalization in DBMS is not about eliminating data redundancy. It’s about minimizing data redundancy.
So now earlier where all the branch information was getting repeated with each and every student record, now only the branch name is repeated because the branch information is kept in a different table and if we want to update any branch information, we just have to update it at one place and it will automatically be updated when we fetch a student record because the student record is connected with the branch record using the branch name. if we update the HOD name, or the phone number in the branch table, it’s updated for all the students.
Now as redundancy is minimized, hence, the problems which arise due to it are also minimized with the help of database normalization.
Let’s see how the problems are solved. Now if we have to insert a new student data, we only have to enter the roll number, the student name and just the branch name because the branch information is stored separately, and we don’t have to modify or insert it again.
Then we have to delete student information at the end of a session to store information of the students for the new batch. We can just delete all the student records. And we’ll still have the branch information branch details in a separate table save.
If we have to update the name of the HOD again, or the department’s phone number, we just have to do it in one place.
See how database normalization or normalization in DBMS solves a lot of problems by just dividing the data into separate independent logical entities and relating them using a common key or you can say a unique name in our case that is the branch name.
So normalization in DBMS can be achieved in multiple ways. We have three basic normal forms in which we can achieve normalization. They are first Normal Form, second normal form and the third normal form and another more advanced normalization technique which is known as bcnf or the advance version of 3NF.
If anyone wants us to share their content on our website, please contact us or email us to firstname.lastname@example.org.
TECHSIKHSHA IS A TECHNICAL GUIDE WEBSITE WHERE YOU CAN FIND VARIOUS TYPES OF TECHNICAL TIPS. STAY WITH US AND HELP US TO GROW.