Javatpoint Logo
Javatpoint Logo

Find and Delete Duplicates From a Table in SQL Server

Many times, in SQL Server databases there are tables that hold repetitive or the same data. Sometimes this repetitive date needs to remove from the table to reduce the redundancy of the whole database. So, in this article, we are going to see how to find and remove duplicate rows from a table in SQL Server.

The various steps that we need to follow in order to find and remove duplicate rows from a table in SQL Server are:

  • The first step is to find the duplicate rows with the help of the GROUP BY clause, or we can also use the ROW_NUMBER() function.
  • Once we have found the duplicate rows successfully, the next step is to use the DELETE statement to remove the duplicate rows.

As discussed above, the first step is to identify or find the duplicates rows in the table that we want to delete for this, we need to perform some operations, and those operations are:

  • The first operation or step is to define criteria for the duplicates rows in a table that will distinguish that particular duplicate row from the other rows that are present in the same table of the database.
  • Now the next step is to write a query that will use the criteria that we have defined in the first step to distinguish the duplicate row in a table.

In this article, for finding the duplicate rows in a table, we will be using the Common Table Expression, also known as CTE. With the help of Common Table Expression or CTE, we can define a temporary named result set that will remain available till the execution of a statement, and that statement can be any statement like SELECT, INSERT, UPDATE, DELETE, or MERGE.

The syntax for a Common Table Expression or CTE in SQL Server is:

In the syntax written above:

  • Name_of_expression: The name_of_expression represents the name that we want to assign to our Common Table Expression or CTE. And this name is referred to later in the SQL statement in order to use this Common Table Expression or CTE.
  • Name_of_column: The name_of_column represents the column or columns on which the Common Table Expression or CTE will be defined.
  • definition_of_CTE: The definition_of_CTE represents the part where the actual definition of the Common Table Expression or CTE will be written.
  • SQL_statement: The SQL_statement represents the SQL statement for which that particular Common Table Expression or CTE is written and that Common Table Expression or CTE will remain valid till the execution of this SQL statement, such as SELECT INSERT, UPDATE, DELETE, or MERGE.

So, now we have an idea about what Common Table Expression or CTE is. We will also discuss Common Table Expression or CTE later in this article also. Now let us see how we will find the duplicate rows in a table. Let us take an example into consideration for understanding the usage of the Common Table Expression or CTE and then using it for finding and removing the duplicate entries in the table.

Let us create a table named students having three columns, namely studID, rollno, and Name having datatypes as varchar, int, and varchar, respectively. The syntax for creating a table with the schema mentioned above using the CREATE TABLE command is:

Output:

Find and Delete Duplicates From a Table in SQL Server

As shown in the image, we have successfully created a table having the schema mentioned above that is three columns, two having data type as varchar and one as int.

Now, let us add some data to the students' table. The syntax of the INSERT command in SQL Server to add data in the students table will be:

Output:

studID     rollNo      Name                          
---------- ----------- ------------------------------
101              87459 Andrew 
102              54771 Samuel 
103              22100 Nirnay 
104              94365 Paul 
105              35479 Casey 
106              74000 Martin 
107              10211 Raphel 
105              35479 Casey 
103              22100 Nirnay 
101              87459 Andrew

As we can see, ten rows of data in the students table are added successfully, and we can see the same as the output of the SELECT query.

We have created our table and added data to the table successfully, and the next step is to find the duplicates entries in the students table using Common Table Expression or CTE. So, we will be creating a Common Table Expression or CTE for displaying all the duplicate rows that are present in the students table. The syntax for creating a Common Table Expression or CTE with the operation mentioned above is:

Output:

studID     rollNo      Name                          
---------- ----------- ------------------------------
101              87459 Andrew
101              87459 Andrew
103              22100 Nirnay
103              22100 Nirnay
105              35479 Casey
105              35479 Casey

In the query written above, we have created a Common Table Expression or CTE named dup_cte, which counts the occurrences of each row in the students table with the use of the COUNT() function and will return only those rows of data from the table whose count is greater than one or we can also say that return only the duplicate rows or the rows that have multiple occurrences in the table. This data of the repetitive rows provided by the Common Table Expression or CTE is then used in the SELECT command to display all the repetitive rows of the students table. Then an INNER JOIN is applied between the students table and the rows provided by the Common Table Expression or CTE named dup_cte. And the final result is displayed here using the SELECT command in the SQL command section of the Common Table Expression or CTE. We can use any command other than the SELECT command to delete or update the data in this SQL command section of the Common Table Expression or CTE according to our need.

So now we have successfully identified or found the duplicate rows in the table that we want to delete, so the next step is to delete those duplicate rows from the table. The syntax for deleting duplicate rows from the students table using the Common Table Expression is:

Output:

studID     rollNo      Name                          
---------- ----------- ------------------------------
101              87459 Andrew
102              54771 Samuel
103              22100 Nirnay
104              94365 Paul
105              35479 Casey
106              74000 Martin
107              10211 Raphel

As we can see in the output of the above query, all the duplicate rows in the student table have been removed successfully. In this query, we have used the ROW_NUMBER() function instead of the COUNT() function to get the duplicate rows in the students table. The Common Table Expression using the ROW_NUMBER() function produced this output as a result of the Common Table Expression or CTE named dup_cte:

studID     rollNo      Name                           row_num             
---------- ----------- ------------------------------ --------------------
101              87459 Andrew                                            	1
101              87459 Andrew                                            	2
102              54771 Samuel                                            	1
103              22100 Nirnay                                            	1
103              22100 Nirnay                                            	2
104              94365 Paul                                              	1
105              35479 Casey                                             	1
105              35479 Casey                                             	2
106              74000 Martin                                            	1
107              10211 Raphel                                            	1

As we can see in the above output, each row has its occurrence number or row number displayed under the row_num named column. So, we used the output produced by the dup_cte named Common Table Expression and deleted only those rows whose row_num is greater than one. Subsequently, that will only delete those rows that appeared more than once in the table. Thus, removing the duplicate rows from the table.

So, with the help of this article, we are able to successfully find and delete duplicates from a table in SQL Server with the help of Common Table Expression or CTE, COUNT() Function, and ROW_COUNT() function.







Youtube For Videos Join Our Youtube Channel: Join Now

Feedback


Help Others, Please Share

facebook twitter pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA