Normalisation in SQL

This article will provide you detailed explanation on an important SQL concept called normalisation.

Normalisation is a database design technique that eliminates undesirable characteristics including Insertion, Update, and Deletion Anomalies and reduces data redundancy and duplication. Larger tables are split into smaller tables by normalisation procedures, and these smaller tables are connected through relationships. The goal of normalisation in SQL is to get rid of superfluous (repetitive) data and make sure that data is kept correctly.

A set of guidelines known as normal forms serve as the foundation for the normalisation process. With the use of these rules, abnormalities, inconsistencies, and data duplication that might happen when data is kept in a single table are reduced or eliminated.

Edgar Codd, the inventor of the relational model, introduced the idea of data normalisation with the First Normal Form and continued to improve it with the Second and Third Normal Forms. Later, he collaborated with Raymond F. Boyce to establish the Boyce-Codd Normal Form theory.

Different types of normal forms

Different stages of normalisation exist, and each has its own needs and standards and all of them are mentioned and explained below

  1. 1NF(First Normal Form)
  2. 2NF(Second Normal Form)
  3. 3NF(Third Normal Form)
  4. 4NF(Fourth Normal Form)
  5. 5NF(Fifth Normal Form)

1. 1NF

If below two criteria/rule are followed by the table then the relational data is said to in first normal form.

  • Single value should be present in each table cell(Atomicity).
  • Each record should be distinct and unique.

Consider below example to understand the first normal form.

Ex: let us consider the below student table to understand the 1NF concept.

Roll_noNameCourses_enrolled
11RamEnglish, Biology
12SamuelMaths, Physics
13RehmanChemistry

As we can observe in first 2 records the Courses_enrolled column has multiple values in a specific table cell. So, it contradicts the rules of 1NF. In order to rectify this, we need to split this table and the both parts should contain one common column/attribute. This column is nothing the super key. In order to understand what is a super key in simpler version super key is an attribute which is used to uniquely identify all the records.

So, the table is divided as follows:

Student table:

Roll_noName
11Ram
12Samuel
13Rehman

Course table:

Roll_noCourse
11English
11Biology
12Maths
12Physics
13Chemistry

All the records of the table are atomic(each table cell contains single value) and all records are unique. So, the above relational data is said to be in 1NF.

2. 2NF

If below two criteria/rule are followed by the table then the relational data is said to in second normal form.

  • The data should be in INF.
  • Every non-key column in the table should depend on the complete primary key, not just a portion of it.

Ex: consider below the below table named "sales" which has following data

order_idCustomer_idCustomer nameProduct_idQuantityProduct_name
101121James323Laptop
102122Rodes331Tablet
103123Williams342Shirts
104124Smith351Watches

As we observe in the table the primary key is not a single attribute but it is a combination of 2 attributes, they are customer_id, order_id. Here Customer_name attribute is only depending on the customer_id column but not on the order_id column. This is contradicting to the 2NF rule.

So, this rectify this the sales table need to be divided into two separate tables one for orders and other for customers.

Orders table:

order_idCustomer_idProduct_idQuantity
101121323
102122331
103123342
104124351

Customer Table:

Customer_idCustomer name
121James
122Rodes
123Williams
124Smith

3. 3NF

If below two criteria/rule are followed by the table then the relational data is said to in third normal form.

  • The data should be in 2NF
  • all non-key columns must be dependent only on the primary key, and not on any other non-key columns.

Ex: consider below example to clearly understand the concept of third normal form.

Employee_idnamedepartmentDepartment_idManager_idManager_namesalary
501JamesMarketing121101Jane Hoper20000
502JohnsonTesting122102William Smith50000
503DavidResearch and development123103Michal Doe75000

Here the primary key is the employee_id attribute. The department, Manager Name, and Salary columns are not depending on Employee ID, but also on the Manager ID attribute. This is contradicting to 3NF rule.

To rectify it and structure it into third normal form we need to divide the entire table into 3 parts in which one part is for employees second part is for departments and last one is for managers.

Employees Table:

Employee_idNameDepartment_idManager_idSalary
501James12110120000
502Johnson12210250000
503David12310375000

Department table

Department_idDepartment_name
121Marketing
122Testing
123Research and development

Manager Table:

Manager_idManager_name
101Jane Hoper
102William Smith
103Michal Doe

Now, the Manager_name column depends only on the Manager_id column, while the Department and Salary columns depend solely on the Department_id and Employee_id, respectively. This meets the criteria of 3NF.

Fourth Normal form

If below two criteria/rule are followed by the table then the relational data is said to in fourth normal form.

  • The data should be already in 3NF
  • The relational data should not have any non-trivial multi-valued dependencies.

First we need to understand what is multi-valued dependency.

When an attribute or combination of attributes can have several values for a single value of another attribute or combination of attributes, then this situation is said to be multi-valued dependency (MVD).

Ex: consider below example to clearly understand the concept of fourth normal form. Consider below "orders" table which consists of following data

order_idCustomer_idCustomer nameProduct_idProduct_nameProduct_description
101121James32LaptopGaming Laptop
101121James33BookFiction Story
103122Rodes33BookFiction Story
104124Smith35WatchesSmart watch
104124Smith36ShirtFormal Shirt

In this table we can clearly observe there is multi value dependency because there exist multiple values for Product name and Product Description for each combination of order ID and Product ID. For example, Order ID 101 and Product ID 32 have the Product Name "Laptop" and the Product Description "Gaming Laptop", while Order ID 101 and Product ID 33 have the Product Name "Book" and the Product Description "Fiction Story".

So, in order to remove this Multi value dependency and to structure the above data into fourth normal form we have to split the total table into separate parts one for orders and other for the products. So, this results to

Orders Table:

order_idCustomer_idCustomer name
101121James
103122Rodes
104124Smith

Products Table:

order_idProduct_idProduct_nameProduct_description
10132LaptopGaming Laptop
10133BookFiction Story
10333BookFiction story
10435WatchesSmart Watch
10436ShirtFormal Shirt

Now the resultant table looks in fourth normal form without any multi value dependency.

Fifth Normal Form:

A table is regarded as normalised in 5NF when all of its non-trivial dependencies are based on the main key. As a result, there is no redundancy in the database, and any potential data abnormalities brought on by repeated updates are also eliminated.

Example:

Consider the below table for illustrating 5 NF:

Book_idTitleAuthor_IDAuthor NameAuthor email
11Python for beginners101K. Kishore[email protected]
11Python for beginners102L. Ravi[email protected]
12Software testing103P. Gopi[email protected]
12Software testing104R. Joseph[email protected]

The relational data is redundant/repetitive i.e., same book names are assigned to two authors. Due to this redundancy, updating the email of one author may require updating all other instances of that author in the table.

We must locate the multi-valued dependencies in the table in order to apply 5NF. A multi-valued dependency exists between the "Book ID" and "Author ID" properties in this instance since both one book and one author may have numerous authors.

Based on the multi-valued dependency, we can divide the "Books" table into two tables, "Books" and "Authors". The following attributes would be present in the new tables:

Books table:

Book_idTitle
11Python for beginners
12Software testing

Authors table:

Author_IDAuthor NameAuthor email
101K. Kishore[email protected]
102L. Ravi[email protected]
103P. Gopi[email protected]
104R. Joseph[email protected]

Now the redundancy in the original table is eliminated by decomposing the table into two new tables. This decomposition has fulfilled the requirements of fifth normal form. So, the above relational data is structured into fifth normal Form.






Latest Courses