Purpose of Normalization

Normalization is the process of structuring and handling the relationship between data to minimize redundancy in the relational table and avoid the unnecessary anomalies properties from the database like insertion, update and delete. It helps to divide large database tables into smaller tables and make a relationship between them. It can remove the redundant data and ease to add, manipulate or delete table fields.

A normalization defines rules for the relational table as to whether it satisfies the normal form. A normal form is a process that evaluates each relation against defined criteria and removes the multivalued, joins, functional and trivial dependency from a relation. If any data is updated, deleted or inserted, it does not cause any problem for database tables and help to improve the relational table' integrity and efficiency.

Objective of Normalization

  1. It is used to remove the duplicate data and database anomalies from the relational table.
  2. Normalization helps to reduce redundancy and complexity by examining new data types used in the table.
  3. It is helpful to divide the large database table into smaller tables and link them using relationship.
  4. It avoids duplicate data or no repeating groups into a table.
  5. It reduces the chances for anomalies to occur in a database.

Types of Anomalies

Following are the types of anomalies that make the table inconsistency, loss of integrity, and redundant data.

1. Data redundancy occurs in a relational database when two or more rows or columns have the same value or repetitive value leading to unnecessary utilization of the memory.

Student Table:

StudRegistrationCourseIDStudNameAddressCourse
2056204JamesLos AngelesEconomics
2056247JamesLos AngelesEconomics
2246247Trent BoltNew YorkMathematics
2306204Ritchie RichEgyptComputer
2306208Ritchie RichEgyptAccounts

There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repetitive when we enter a new CourseID. Hence it repeats the studRegistration, StudName and address attributes.

2. Insert Anomaly: An insert anomaly occurs in the relational database when some attributes or data items are to be inserted into the database without existence of other attributes. For example, In the Student table, if we want to insert a new courseID, we need to wait until the student enrolled in a course. In this way, it is difficult to insert new record in the table. Hence, it is called insertion anomalies.

3. Update Anomalies: The anomaly occurs when duplicate data is updated only in one place and not in all instances. Hence, it makes our data or table inconsistent state. For example, suppose there is a student 'James' who belongs to Student table. If we want to update the course in the Student, we need to update the same in the course table; otherwise, the data can be inconsistent. And it reflects the changes in a table with updated values where some of them will not.

4. Delete Anomalies: An anomaly occurs in a database table when some records are lost or deleted from the database table due to the deletion of other records. For example, if we want to remove Trent Bolt from the Student table, it also removes his address, course and other details from the Student table. Therefore, we can say that deleting some attributes can remove other attributes of the database table.

So, we need to avoid these types of anomalies from the tables and maintain the integrity, accuracy of the database table. Therefore, we use the normalization concept in the database management system.

Types of Normalization

  1. First Normal Form (1NF)
  2. Second Normal Form (2NF)
  3. Third Normal Form (3NF)
  4. Boyce and Codd Normal Form (BCNF)
  5. Fourth Normal Form (4NF)
  6. Fifth Normal Form (5NF)

First Normal Form (1NF): The table will be in First Normal Form (1NF) if all the attributes of the table contain only atomic values. We can also say that if a table holds the multivalued data items in attributes or composite values, the relation cannot be in the first normal form. So, we need to make it first normal form by making the entries of the table atomic.

Second Normal Form (2NF): A Relation will be in 2NF if it follows the following condition:

  1. The table or relation should be in 1NF or First Normal Form.
  2. All the non-prime attributes should be fully functionally dependent on the candidate key.
  3. The table should not contain any partial dependency.

Third Normal Form (3NF): The table will be in Third Normal Form (3NF) if it follows the given conditions:

  1. The table or relation should be in 2NF.
  2. It should not contain any transitive dependency. A Transitive Dependency is that any non-prime attribute determines or depends on the other non-prime attribute.
    A relation is in 3NF if FD X determines Y ('X' -> 'Y') satisfies one of the following condition:
    1. If X -> Y is a trivial FD, i.e., Y is a subset of X.
    2. If X -> Y, where X is a Super key.
    3. If X -> Y, (Y - X) is a prime attribute.

Note: A table should be in a 3 NF, if the Left-Hand Side (LHS) of all Functional dependency (FD) must be a Candidate Key (CK) / Super Key, Or the Right-hand side should be Prime attribute.

BCNF: It stands for Boyce Codd Normal form, which is the next version of 3NF. Sometimes, it is also pronounced as 3.5 NF. A normal form is said to be in BCNF if it follows the given conditions:

  1. A table or relation must be in 3NF.
  2. If a relation R has functional dependencies (FD) and if A determines B, where A is a super Key, the relation is in BCNF.

Note: There should be one candidate/Super key on the left-hand side for each functional dependency for BCNF.

Fourth Normal Form (4 NF): A relation is said to be Fourth Normal Form (4NF) if it follows the given conditions:

  1. A table must be in BCNF.
  2. There should be no multivalued dependency in the table.

For example, if the dependency A -> B, for a single value of A, more than one value of B exists. Then the relation is said to be a multivalued dependency.

Following is the condition for being multivalued dependency:

  1. For a multivalued dependency A -> B, for a single value of A, there are multiple values of B.
  2. If a table has at least three columns, they have a multivalued dependency.
  3. For a relation with A, B and C columns, where B and C should be independent.

Fifth Normal Form (5 NF): A relation is said to be 5NF if it follows the given conditions:

  1. The table should be in 4NF.
  2. There should not be Join Dependency or further non-loss decomposed.

It is also known as Project Join Normal Form (PJNF).

Join dependency: A relation (R) is said to be a Join dependency if the relation (R) schema can be divided into smaller sets of tables R1, R2 … Rn that can be redesigned by joining multiple tables to the original table (R).






Latest Courses