How to Delete Duplicate Rows in SQL?

In this section, we learn different ways to delete duplicate rows in MySQL and Oracle. If the SQL table contains duplicate rows, then we have to remove the duplicate rows.

Preparing sample data

The script creates the table named contacts.

DROP TABLE IF EXISTS contacts;
CREATE TABLE contacts (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(30) NOT NULL,
last_name VARCHAR(25) NOT NULL, 
    email VARCHAR(210) NOT NULL,
    age VARCHAR(22) NOT NULL
);

In the above table, we have inserted the following data.

INSERT INTO contacts (first_name,last_name,email,age) 
VALUES ('Kavin','Peterson','kavin.peterson@verizon.net','21'),
       ('Nick','Jonas','nick.jonas@me.com','18'),
       ('Peter','Heaven','peter.heaven@google.com','23'),
       ('Michal','Jackson','michal.jackson@aol.com','22'),
       ('Sean','Bean','sean.bean@yahoo.com','23'),
       ('Tom ','Baker','tom.baker@aol.com','20'),
       ('Ben','Barnes','ben.barnes@comcast.net','17'),
       ('Mischa ','Barton','mischa.barton@att.net','18'),
       ('Sean','Bean','sean.bean@yahoo.com','16'),
       ('Eliza','Bennett','eliza.bennett@yahoo.com','25'),
       ('Michal','Krane','michal.Krane@me.com','25'),
       ('Peter','Heaven','peter.heaven@google.com','20'),
       ('Brian','Blessed','brian.blessed@yahoo.com','20');
       ('Kavin','Peterson','kavin.peterson@verizon.net','30'),

We execute the script to recreate test data after executing a DELETE statement.

The query returns data from the contacts table:

SELECT * FROM contacts
ORDER BY email;

id	first_name	last_name	Email	age
7	Ben	Barnes	[email protected]	21
13	Brian	Blessed	[email protected]	18
10	Eliza	Bennett	[email protected]	23
1	Kavin	Peterson	[email protected]	22
14	Kavin	Peterson	[email protected]	23
8	Mischa	Barton	[email protected]	20
11	Michal	Krane	[email protected]	17
4	Michal	Jackson	[email protected]	18
2	Nick	Jonas	[email protected]	16
3	Peter	Heaven	[email protected]	25
12	Peter	Heaven	[email protected]	25
5	Sean	Bean	[email protected]	20
9	Sean	Bean	[email protected]	20
6	Tom	Baker	[email protected]	30

The following SQL query returns the duplicate emails from the contact table:

SELECT
    email, COUNT(email)
FROM
    contacts
GROUP BY
    email
HAVING
COUNT (email) > 1;

email	COUNT(email)
[email protected]	2
[email protected]	2
[email protected]	2

We have three rows with duplicate emails.

(A) Delete duplicate rows with the DELETE JOIN statement

DELETE t1 FROM contacts t1
INNERJOIN contacts t2 
WHERE
    t1.id < t2.id AND
    t1.email = t2.email;

Output:

Three rows had been deleted. We execute the query, given below to finds the duplicate emails from the table.

SELECT
    email, 
COUNT (email)
FROM
    contacts
GROUP BY
    email
HAVING
COUNT (email) > 1;

The query returns the empty set. To verify the data from the contacts table, execute the following SQL query:

id	first_name	last_name	Email	age
7	Ben	Barnes	[email protected]	21
13	Brian	Blessed	[email protected]	18
10	Eliza	Bennett	[email protected]	23
1	Kavin	Peterson	[email protected]	22
8	Mischa	Barton	[email protected]	20
11	Micha	Krane	[email protected]	17
4	Michal	Jackson	[email protected]	18
2	Nick	Jonas	[email protected]	16
3	Peter	Heaven	[email protected]	25
5	Sean	Bean	[email protected]	20
6	Tom	Baker	[email protected]	30

The rows id's 9, 12, and 14 have been deleted. We use the below statement to delete the duplicate rows:

Execute the script for creating the contact.

DELETE c1 FROM contacts c1
INNERJ OIN contacts c2 
WHERE
    c1.id > c2.id AND
    c1.email = c2.email;

id	first_name	last_name	email	age
1	Ben	Barnes	[email protected]	21
2	Kavin	Peterson	[email protected]	22
3	Brian	Blessed	[email protected]	18
4	Nick	Jonas	[email protected]	16
5	Michal	Krane	[email protected]	17
6	Eliza	Bennett	[email protected]	23
7	Michal	Jackson	[email protected]	18
8	Sean	Bean	[email protected]	20
9	Mischa	Barton	[email protected]	20
10	Peter	Heaven	[email protected]	25
11	Tom	Baker	[email protected]	30

(B) Delete duplicate rows using an intermediate table

To delete a duplicate row by using the intermediate table, follow the steps given below:

Step 1. Create a new table structure, same as the real table:

Step 2. Insert the distinct rows from the original schedule of the database:

INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; 

Step 3. Drop the original table and rename the immediate table to the original one.

DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;

For example, the following statements delete the rows with duplicate emails from the contacts table:

-- step 1
CREATE TABLE contacts_temp
LIKE contacts;

-- step 2
INSERT INTO contacts_temp
SELECT * FROM contacts 
GROUP BY email;

-- step 3
DROP TABLE contacts;

ALTER TABLE contacts_temp
RENAME TO contacts;

(C) Delete duplicate rows using the ROW_NUMBER() Function

Note: The ROW_NUMBER() function has been supported since MySQL version 8.02, so we should check our MySQL version before using the function.

The following statement uses the ROW_NUMBER () to assign a sequential integer to every row. If the email is duplicate, the row will higher than one.

SELECT id, email, ROW_NUMBER() 
OVER (PARTITION BY email 
ORDER BY email
	) AS row_num
FROM contacts;

The following SQL query returns id list of the duplicate rows:

SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (
PARTITION BY email ORDER BY email) AS row_num
FROM
contacts
) t
WHERE
row_num> 1;

Output:

Delete Duplicate Records in Oracle

When we found the duplicate records in the table, we had to delete the unwanted copies to keep our data clean and unique. If a table has duplicate rows, we can delete it by using the DELETE statement.

In the case, we have a column, which is not the part of group used to evaluate the duplicate records in the table.

Consider the table given below:

VEGETABLE_ID	VEGETABLE_NAME	COLOR
01	Potato	Brown
02	Potato	Brown
03	Onion	Red
04	Onion	Red
05	Onion	Red
06	Pumpkin	Green
07	Pumpkin	Yellow

-- create the vegetable table
CREATE TABLE vegetables (
VEGETABLE_ID NUMBER generated BY DEFAULT AS ID ENTITY,
VEGETABLE_NAME VARCHAR2(100),
color VARCHAR2(20),
        PRIMARY KEY (VEGETABLE_ID)
);

-- insert sample rows
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Potato','Brown');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Onion','Red');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_NAME,color) VALUES('Pumpkin','Yellow');

-- query data from the vegetable table
SELECT * FROM vegetables;

Suppose, we want to keep the row with the highest VEGETABLE_ID and delete all other copies.

SELECT
MAX (VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
ORDER BY
MAX(VEGETABLE_ID);

MAX(VEGETABLE_ID)

We use the DELETE statement to delete the rows whose values in the VEGETABLE_ID COLUMN are not the highest.

DELETE FROM
vegetables
WHERE
VEGETABLE_IDNOTIN
  (
SELECT
MAX(VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
);

Three rows have been deleted.

VEGETABLE_ID	VEGETABLE_NAME	COLOR
02	Potato	Brown
05	Onion	Red
06	Pumpkin	Green
07		Yellow

If we want to keep the row with the lowest id, use the MIN() function instead of the MAX() function.

DELETE FROM
vegetables
WHERE
VEGETABLE_IDNOTIN
  (
SELECT
MIN(VEGETABLE_ID)
FROM
vegetables
GROUP BY
VEGETABLE_NAME,
color
  );

The above method works if we have a column that is not part of the group for evaluating duplicate. If all values in the columns have copies, then we cannot use the VEGETABLE_ID column.

Let's drop and create the vegetable table with a new structure.

DROP TABLE vegetables;
CREATE TABLE vegetables (
VEGETABLE_ID NUMBER,
VEGETABLE_NAME VARCHAR2(100),
Color VARCHAR2(20)
);

INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1,'Potato','Brown');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(1, 'Potato','Brown');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color)VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(2,'Onion','Red');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES(3,'Pumpkin','Green');
INSERT INTO vegetables (VEGETABLE_ID,VEGETABLE_NAME,color) VALUES('4,Pumpkin','Yellow');

SELECT * FROM vegetables;

VEGETABLE_ID	VEGETABLE_NAME	COLOR
01	Potato	Brown
01	Potato	Brown
02	Onion	Red
02	Onion	Red
02	Onion	Red
03	Pumpkin	Green
04	Pumpkin	Yellow

In the vegetable table, the values in all columns VEGETABLE_ID, VEGETABLE_NAME, and color have been copied.

We can use the rowid, a locator that specifies where Oracle stores the row. Because the rowid is unique so that we can use it to remove the duplicates rows.

DELETE
FROM
Vegetables
WHERE
rowed NOT IN
(
SELECT
MIN(rowid)
FROM
vegetables
GROUP BY
VEGETABLE_ID,
VEGETABLE_NAME,
color
  );

The query verifies the deletion operation:

VEGETABLE_ID	VEGETABLE_NAME	COLOR
01	Potato	Brown
02	Onion	Red
03	Pumpkin	Green
04	Pumpkin	Yellow

Next TopicNth Highest salary

← prev next →

For Videos Join Our Youtube Channel: Join Now

Feedback

Send your Feedback to [email protected]

Help Others, Please Share

Learn Latest Tutorials

Splunk

SPSS

Swagger

Transact-SQL

Tumblr

ReactJS

Regex

Reinforcement Learning

R Programming

RxJS

React Native

Python Design Patterns

Python Pillow

Python Turtle

Keras

Preparation

Aptitude

Reasoning

Verbal Ability

Interview Questions

Company Questions

Trending Technologies

Artificial Intelligence

AWS

Selenium

Cloud Computing

Hadoop

ReactJS

Data Science

Angular 7

Blockchain

Git

Machine Learning

DevOps

B.Tech / MCA

DBMS

Data Structures

DAA

Operating System

Computer Network

Compiler Design

Computer Organization

Discrete Mathematics

Ethical Hacking

Computer Graphics

Software Engineering

Web Technology

Cyber Security

Automata

C Programming

C++

Java

.Net

Python

Programs

Control System

Data Mining

Data Warehouse

^{Like/Subscribe us for latest updates or newsletter}

SQL Tutorial

SQL Database

SQL Table

SQL Select

SQL Clause

SQL Order By

SQL Insert

SQL Update

SQL Delete

SQL Join

SQL Keys

Difference

SQL Injection

SQL String Functions

Misc

SQL MCQ

PL/SQL Tutorial

Interview

Quiz