Javatpoint Logo
Javatpoint Logo

How to Delete Duplicate Rows in SQL?

In this section, we learn different ways to delete duplicate rows in MySQL and Oracle. If the SQL table contains duplicate rows, then we have to remove the duplicate rows.

Preparing sample data

The script creates the table named contacts.

In the above table, we have inserted the following data.

We execute the script to recreate test data after executing a DELETE statement.

The query returns data from the contacts table:


id first_name last_name Email age
7 Ben Barnes [email protected] 21
13 Brian Blessed [email protected] 18
10 Eliza Bennett [email protected] 23
1 Kavin Peterson [email protected] 22
14 Kavin Peterson [email protected] 23
8 Mischa Barton [email protected] 20
11 Michal Krane [email protected] 17
4 Michal Jackson [email protected] 18
2 Nick Jonas [email protected] 16
3 Peter Heaven [email protected] 25
12 Peter Heaven [email protected] 25
5 Sean Bean [email protected] 20
9 Sean Bean [email protected] 20
6 Tom Baker [email protected] 30

The following SQL query returns the duplicate emails from the contact table:


email COUNT(email)
[email protected] 2
[email protected] 2
[email protected] 2

We have three rows with duplicate emails.

(A) Delete duplicate rows with the DELETE JOIN statement

Output:

Three rows had been deleted. We execute the query, given below to finds the duplicate emails from the table.

The query returns the empty set. To verify the data from the contacts table, execute the following SQL query:


id first_name last_name Email age
7 Ben Barnes [email protected] 21
13 Brian Blessed [email protected] 18
10 Eliza Bennett [email protected] 23
1 Kavin Peterson [email protected] 22
8 Mischa Barton [email protected] 20
11 Micha Krane [email protected] 17
4 Michal Jackson [email protected] 18
2 Nick Jonas [email protected] 16
3 Peter Heaven [email protected] 25
5 Sean Bean [email protected] 20
6 Tom Baker [email protected] 30

The rows id's 9, 12, and 14 have been deleted. We use the below statement to delete the duplicate rows:

Execute the script for creating the contact.


id first_name last_name email age
1 Ben Barnes [email protected] 21
2 Kavin Peterson [email protected] 22
3 Brian Blessed [email protected] 18
4 Nick Jonas [email protected] 16
5 Michal Krane [email protected] 17
6 Eliza Bennett [email protected] 23
7 Michal Jackson [email protected] 18
8 Sean Bean [email protected] 20
9 Mischa Barton [email protected] 20
10 Peter Heaven [email protected] 25
11 Tom Baker [email protected] 30

(B) Delete duplicate rows using an intermediate table

To delete a duplicate row by using the intermediate table, follow the steps given below:

Step 1. Create a new table structure, same as the real table:

Step 2. Insert the distinct rows from the original schedule of the database:

Step 3. Drop the original table and rename the immediate table to the original one.

For example, the following statements delete the rows with duplicate emails from the contacts table:

(C) Delete duplicate rows using the ROW_NUMBER() Function

Note: The ROW_NUMBER() function has been supported since MySQL version 8.02, so we should check our MySQL version before using the function.

The following statement uses the ROW_NUMBER () to assign a sequential integer to every row. If the email is duplicate, the row will higher than one.

The following SQL query returns id list of the duplicate rows:

Output:

id
9
12
14

Delete Duplicate Records in Oracle

When we found the duplicate records in the table, we had to delete the unwanted copies to keep our data clean and unique. If a table has duplicate rows, we can delete it by using the DELETE statement.

In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

Consider the table given below:

VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
02 Potato Brown
03 Onion Red
04 Onion Red
05 Onion Red
06 Pumpkin Green
07 Pumpkin Yellow



Suppose, we want to keep the row with the highest VEGETABLE_ID and delete all other copies.


MAX(VEGETABLE_ID)
2
5
6
7

We use the DELETE statement to delete the rows whose values in the VEGETABLE_ID COLUMN are not the highest.

Three rows have been deleted.


VEGETABLE_ID VEGETABLE_NAME COLOR
02 Potato Brown
05 Onion Red
06 Pumpkin Green
07 Yellow

If we want to keep the row with the lowest id, use the MIN() function instead of the MAX() function.

The above method works if we have a column that is not part of the group for evaluating duplicate. If all values in the columns have copies, then we cannot use the VEGETABLE_ID column.

Let's drop and create the vegetable table with a new structure.



VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
01 Potato Brown
02 Onion Red
02 Onion Red
02 Onion Red
03 Pumpkin Green
04 Pumpkin Yellow

In the vegetable table, the values in all columns VEGETABLE_ID, VEGETABLE_NAME, and color have been copied.

We can use the rowid, a locator that specifies where Oracle stores the row. Because the rowid is unique so that we can use it to remove the duplicates rows.

The query verifies the deletion operation:


VEGETABLE_ID VEGETABLE_NAME COLOR
01 Potato Brown
02 Onion Red
03 Pumpkin Green
04 Pumpkin Yellow

Next TopicNth Highest salary





Youtube For Videos Join Our Youtube Channel: Join Now

Feedback


Help Others, Please Share

facebook twitter pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA