# Normalisation in SQL

This article will provide you detailed explanation on an important SQL concept called normalisation.

Normalisation is a database design technique that eliminates undesirable characteristics including Insertion, Update, and Deletion Anomalies and reduces data redundancy and duplication. Larger tables are split into smaller tables by normalisation procedures, and these smaller tables are connected through relationships. The goal of normalisation in SQL is to get rid of superfluous (repetitive) data and make sure that data is kept correctly.

A set of guidelines known as normal forms serve as the foundation for the normalisation process. With the use of these rules, abnormalities, inconsistencies, and data duplication that might happen when data is kept in a single table are reduced or eliminated.

Edgar Codd, the inventor of the relational model, introduced the idea of data normalisation with the First Normal Form and continued to improve it with the Second and Third Normal Forms. Later, he collaborated with Raymond F. Boyce to establish the Boyce-Codd Normal Form theory.

## Different types of normal forms

Different stages of normalisation exist, and each has its own needs and standards and all of them are mentioned and explained below

1. 1NF(First Normal Form)
2. 2NF(Second Normal Form)
3. 3NF(Third Normal Form)
4. 4NF(Fourth Normal Form)
5. 5NF(Fifth Normal Form)

## 1. 1NF

If below two criteria/rule are followed by the table then the relational data is said to in first normal form.

• Single value should be present in each table cell(Atomicity).
• Each record should be distinct and unique.

Consider below example to understand the first normal form.

Ex: let us consider the below student table to understand the 1NF concept.

Roll_no Name Courses_enrolled
11 Ram English, Biology
12 Samuel Maths, Physics
13 Rehman Chemistry

As we can observe in first 2 records the Courses_enrolled column has multiple values in a specific table cell. So, it contradicts the rules of 1NF. In order to rectify this, we need to split this table and the both parts should contain one common column/attribute. This column is nothing the super key. In order to understand what is a super key in simpler version super key is an attribute which is used to uniquely identify all the records.

So, the table is divided as follows:

Student table:

Roll_no Name
11 Ram
12 Samuel
13 Rehman

Course table:

Roll_no Course
11 English
11 Biology
12 Maths
12 Physics
13 Chemistry

All the records of the table are atomic(each table cell contains single value) and all records are unique. So, the above relational data is said to be in 1NF.

## 2. 2NF

If below two criteria/rule are followed by the table then the relational data is said to in second normal form.

• The data should be in INF.
• Every non-key column in the table should depend on the complete primary key, not just a portion of it.

Ex: consider below the below table named "sales" which has following data

order_id Customer_id Customer name Product_id Quantity Product_name
101 121 James 32 3 Laptop
102 122 Rodes 33 1 Tablet
103 123 Williams 34 2 Shirts
104 124 Smith 35 1 Watches

As we observe in the table the primary key is not a single attribute but it is a combination of 2 attributes, they are customer_id, order_id. Here Customer_name attribute is only depending on the customer_id column but not on the order_id column. This is contradicting to the 2NF rule.

So, this rectify this the sales table need to be divided into two separate tables one for orders and other for customers.

Orders table:

order_id Customer_id Product_id Quantity
101 121 32 3
102 122 33 1
103 123 34 2
104 124 35 1

Customer Table:

Customer_id Customer name
121 James
122 Rodes
123 Williams
124 Smith

## 3. 3NF

If below two criteria/rule are followed by the table then the relational data is said to in third normal form.

• The data should be in 2NF
• all non-key columns must be dependent only on the primary key, and not on any other non-key columns.

Ex: consider below example to clearly understand the concept of third normal form.

Employee_id name department Department_id Manager_id Manager_name salary
501 James Marketing 121 101 Jane Hoper 20000
502 Johnson Testing 122 102 William Smith 50000
503 David Research and development 123 103 Michal Doe 75000

Here the primary key is the employee_id attribute. The department, Manager Name, and Salary columns are not depending on Employee ID, but also on the Manager ID attribute. This is contradicting to 3NF rule.

To rectify it and structure it into third normal form we need to divide the entire table into 3 parts in which one part is for employees second part is for departments and last one is for managers.

Employees Table:

Employee_id Name Department_id Manager_id Salary
501 James 121 101 20000
502 Johnson 122 102 50000
503 David 123 103 75000

Department table

Department_id Department_name
121 Marketing
122 Testing
123 Research and development

Manager Table:

Manager_id Manager_name
101 Jane Hoper
102 William Smith
103 Michal Doe

Now, the Manager_name column depends only on the Manager_id column, while the Department and Salary columns depend solely on the Department_id and Employee_id, respectively. This meets the criteria of 3NF.

## Fourth Normal form

If below two criteria/rule are followed by the table then the relational data is said to in fourth normal form.

• The data should be already in 3NF
• The relational data should not have any non-trivial multi-valued dependencies.

First we need to understand what is multi-valued dependency.

When an attribute or combination of attributes can have several values for a single value of another attribute or combination of attributes, then this situation is said to be multi-valued dependency (MVD).

Ex: consider below example to clearly understand the concept of fourth normal form. Consider below "orders" table which consists of following data

order_id Customer_id Customer name Product_id Product_name Product_description
101 121 James 32 Laptop Gaming Laptop
101 121 James 33 Book Fiction Story
103 122 Rodes 33 Book Fiction Story
104 124 Smith 35 Watches Smart watch
104 124 Smith 36 Shirt Formal Shirt

In this table we can clearly observe there is multi value dependency because there exist multiple values for Product name and Product Description for each combination of order ID and Product ID. For example, Order ID 101 and Product ID 32 have the Product Name "Laptop" and the Product Description "Gaming Laptop", while Order ID 101 and Product ID 33 have the Product Name "Book" and the Product Description "Fiction Story".

So, in order to remove this Multi value dependency and to structure the above data into fourth normal form we have to split the total table into separate parts one for orders and other for the products. So, this results to

Orders Table:

order_id Customer_id Customer name
101 121 James
103 122 Rodes
104 124 Smith

Products Table:

order_id Product_id Product_name Product_description
101 32 Laptop Gaming Laptop
101 33 Book Fiction Story
103 33 Book Fiction story
104 35 Watches Smart Watch
104 36 Shirt Formal Shirt

Now the resultant table looks in fourth normal form without any multi value dependency.

## Fifth Normal Form:

A table is regarded as normalised in 5NF when all of its non-trivial dependencies are based on the main key. As a result, there is no redundancy in the database, and any potential data abnormalities brought on by repeated updates are also eliminated.

Example:

Consider the below table for illustrating 5 NF:

Book_id Title Author_ID Author Name Author email
11 Python for beginners 101 K. Kishore [email protected]
11 Python for beginners 102 L. Ravi [email protected]
12 Software testing 103 P. Gopi [email protected]
12 Software testing 104 R. Joseph [email protected]

The relational data is redundant/repetitive i.e., same book names are assigned to two authors. Due to this redundancy, updating the email of one author may require updating all other instances of that author in the table.

We must locate the multi-valued dependencies in the table in order to apply 5NF. A multi-valued dependency exists between the "Book ID" and "Author ID" properties in this instance since both one book and one author may have numerous authors.

Based on the multi-valued dependency, we can divide the "Books" table into two tables, "Books" and "Authors". The following attributes would be present in the new tables:

Books table:

Book_id Title
11 Python for beginners
12 Software testing

Authors table:

Author_ID Author Name Author email
101 K. Kishore [email protected]
102 L. Ravi [email protected]
103 P. Gopi [email protected]
104 R. Joseph [email protected]

Now the redundancy in the original table is eliminated by decomposing the table into two new tables. This decomposition has fulfilled the requirements of fifth normal form. So, the above relational data is structured into fifth normal Form.