Purpose of Normalization
Normalization is the process of structuring and handling the relationship between data to minimize redundancy in the relational table and avoid the unnecessary anomalies properties from the database like insertion, update and delete. It helps to divide large database tables into smaller tables and make a relationship between them. It can remove the redundant data and ease to add, manipulate or delete table fields.
A normalization defines rules for the relational table as to whether it satisfies the normal form. A normal form is a process that evaluates each relation against defined criteria and removes the multivalued, joins, functional and trivial dependency from a relation. If any data is updated, deleted or inserted, it does not cause any problem for database tables and help to improve the relational table' integrity and efficiency.
Objective of Normalization
Types of Anomalies
Following are the types of anomalies that make the table inconsistency, loss of integrity, and redundant data.
1. Data redundancy occurs in a relational database when two or more rows or columns have the same value or repetitive value leading to unnecessary utilization of the memory.
There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repetitive when we enter a new CourseID. Hence it repeats the studRegistration, StudName and address attributes.
2. Insert Anomaly: An insert anomaly occurs in the relational database when some attributes or data items are to be inserted into the database without existence of other attributes. For example, In the Student table, if we want to insert a new courseID, we need to wait until the student enrolled in a course. In this way, it is difficult to insert new record in the table. Hence, it is called insertion anomalies.
3. Update Anomalies: The anomaly occurs when duplicate data is updated only in one place and not in all instances. Hence, it makes our data or table inconsistent state. For example, suppose there is a student 'James' who belongs to Student table. If we want to update the course in the Student, we need to update the same in the course table; otherwise, the data can be inconsistent. And it reflects the changes in a table with updated values where some of them will not.
4. Delete Anomalies: An anomaly occurs in a database table when some records are lost or deleted from the database table due to the deletion of other records. For example, if we want to remove Trent Bolt from the Student table, it also removes his address, course and other details from the Student table. Therefore, we can say that deleting some attributes can remove other attributes of the database table.
So, we need to avoid these types of anomalies from the tables and maintain the integrity, accuracy of the database table. Therefore, we use the normalization concept in the database management system.
Types of Normalization
First Normal Form (1NF): The table will be in First Normal Form (1NF) if all the attributes of the table contain only atomic values. We can also say that if a table holds the multivalued data items in attributes or composite values, the relation cannot be in the first normal form. So, we need to make it first normal form by making the entries of the table atomic.
Second Normal Form (2NF): A Relation will be in 2NF if it follows the following condition:
Third Normal Form (3NF): The table will be in Third Normal Form (3NF) if it follows the given conditions:
Note: A table should be in a 3 NF, if the Left-Hand Side (LHS) of all Functional dependency (FD) must be a Candidate Key (CK) / Super Key, Or the Right-hand side should be Prime attribute.
BCNF: It stands for Boyce Codd Normal form, which is the next version of 3NF. Sometimes, it is also pronounced as 3.5 NF. A normal form is said to be in BCNF if it follows the given conditions:
Note: There should be one candidate/Super key on the left-hand side for each functional dependency for BCNF.
Fourth Normal Form (4 NF): A relation is said to be Fourth Normal Form (4NF) if it follows the given conditions:
For example, if the dependency A -> B, for a single value of A, more than one value of B exists. Then the relation is said to be a multivalued dependency.
Following is the condition for being multivalued dependency:
Fifth Normal Form (5 NF): A relation is said to be 5NF if it follows the given conditions:
It is also known as Project Join Normal Form (PJNF).
Join dependency: A relation (R) is said to be a Join dependency if the relation (R) schema can be divided into smaller sets of tables R1, R2 … Rn that can be redesigned by joining multiple tables to the original table (R).