How to Find and Remove duplicates in Excel
Microsoft Excel, commonly called Excel, is the world's most popular spreadsheet software or program, allowing users to record vast amounts of data in cells across multiple worksheets in a workbook. While working with massive data sets within the sheets, there may be cases when we may inadvertently enter duplicate values by typing or pasting. Also, we may receive files from other sources that may contain unnecessary duplicate data. Duplicates in Excel sheets can cause serious problems, causing our formula results to be inaccurate and the spreadsheet less useful. Therefore, learning to find and remove duplicates in Excel is essential.
This tutorial discusses various step-by-step methods to find and remove duplicates in our Excel sheet. We can try different methods and make our worksheets error-free or valuable depending on the use cases.
How to find duplicates in Excel?
Excel is powerful spreadsheet software and allows various methods to help us find duplicates in our Excel sheets. The following are some of the most commonly used methods for finding duplicates in Excel:
Finding Duplicates by using the Conditional Formatting
Using the 'Conditional Formatting' tool in Excel is the easiest way to find or highlight duplicates in Excel. It can find duplicates in a single row/column, multiple rows/columns, and even the entire worksheet. To find duplicates using the conditional formatting tool of Excel, we should perform the following steps:
First, we need to select the data where we want to locate the duplicates. In our example, we select the range A2:A18 to find the repeated or duplicate names.
To search for duplicates in the entire data, we can select the entire sheet data using the keyboard shortcut 'Ctrl + A'.
Once the desired data is selected, we must navigate the Home tab and click on the drop-down arrow associated with the Conditional Formatting button. Next, we must select 'Highlight Cells Rules' and 'Duplicate Values' in the succeeding menu lists.
We will be asked to choose the desired color combination from the next window to highlight duplicates in the selected area. We can either select any of the existing color styles or create our custom one as per our liking.
After selecting the desired color style, we must click the OK button. This will instantly highlight the cells with duplicates based on the chosen color style.
Once the duplicates are highlighted, we can review them to decide whether to keep them or remove them.
Finding Duplicates using the Excel Formula in the Conditional Formatting
Excel's Conditional Formatting tool consists of several options. It also allows us to enter the custom formula for various use cases. Because of the Formula support, we can also find duplicates using specific rules. For example, finding and highlighting duplicates without first occurrence, second occurrence, third occurrence, and so on.
We typically use the COUNTIF function to find duplicates using the formula. This function returns TRUE when any corresponding value is repeated multiple times in the given range. The formula to find duplicates in Excel is:
=COUNTIF(Cell Range, Starting cell address) > 1
We must perform the following steps to use the above formula in Excel to find and highlight the duplicates:
First, we need to select the cell or a range of cells where we want to look for duplicate values. This time, we select the entire data range (A1:C18) using the keyboard shortcut 'Ctrl + A'.
After selecting the effective range, we must navigate the Home tab > Conditional Formatting > New Rule. This will launch the 'New Formatting Rule' window.
In the next window, we must select the option 'Use a formula to determine which cells to format' under the section 'Select a Rule Type'.
After completing the last step, we see a new box to enter the desired formula. In the formula box, we apply the COUNTIF function in the following way:
Here, '$A$1:$C$18' represents the absolute reference of the selected range.
After entering the formula to find duplicates, we need to format the corresponding duplicates to highlight them. It will be easier to locate the duplicates in the sheet. For this, we need to click on the 'Format' button in the 'New Formatting Rule' window, which will launch the 'Format Cells' window.
In the next window, we must choose the formatting preferences, such as font, border, and fill style. In our example, we format the data as bold and the font color as red. Later, we must click the OK button in the 'Format Cells' window and the 'New Formatting Rule' window.
This will find and highlight the duplicates in the selected range within the worksheet.
The above image shows that all the values with two or more occurrences are highlighted. However, we can also ignore the first occurrence of the values using the COUNTIF formula.
Suppose we only want to highlight values or duplicates from the second occurrence. Then we can use the below COUNTIF formula in the Formula box under the Formatting Rule window.
Suppose we have more than two occurrences of values in our sheet. We can use the below formula to highlight the 3rd and other duplicate instances:
Similarly, we can use the below formula to highlight the 4th and other duplicate instances:
Suppose we only want to highlight any specific number of occurrences, i.e., 3rd occurrence of the values, then we can use the below formula:
Likewise, we can modify the COUNTIF formula for any other desired case and find or highlight the duplicates in our sheet.
How to remove duplicates in Excel?
Like finding duplicates in Excel, the software also provides various methods to help us remove/delete duplicates in our excel sheets. Following are some of the most commonly used methods to remove duplicates in Excel:
Deleting Duplicates using the Ribbon Tool
The most common method to delete duplicates in our Excel sheet involves using the tool from the ribbon. Although it is the easiest method, it has certain limitations. This method does not allow us to review duplicate entries. Instead, it deletes all the repeated entries at once, keeping only the first occurrence of the duplicates. To delete duplicates using the ribbon tool of Excel, we should perform the following steps:
First, we need to open the sheet to delete the duplicates and select/ highlight the area (range of cells).
Next, we must go to the Data tab and click on the 'Remove Duplicates' button under the 'Data Tools' section. This will launch another window.
In the next window, we need to select the columns from which we want to delete duplicates. If the selected range contains headers, we must tick the 'My data has headers' box. Once all the preferences are made, we must click the OK button to apply the changes.
As soon as we complete the previous step, Excel will notify us regarding the deleted number of duplicates in our sheet.
Since this method deletes random duplicates without allowing us to review them, we must be cautious as it alters our data set. This can sometimes cause errors for formulas applied to the sheet.
Deleting Duplicates using the Advanced Filter
Another easy method to remove duplicates involves using Excel's Advanced Filter tool. We first filter the duplicates using the Advanced Filter tool and copy the unique values to a different sheet. To filter/remove duplicates using the Advanced Filter, we should perform the following steps:
First, we need to select the range where we want to delete the duplicates. Next, we must navigate the Data tab and click on the 'Advanced' button under the 'Sort & Filter' section.
In the Advanced Filter dialogue box, we can filter the data in the same place or copy the filtered data in an entirely new place. It is recommended to use the 'Copy to another location' option to make the original data intact. Also, we must ensure that the List range (source data range) is correctly entered. In the 'Copy to' box, we must specify the range to record the resultant data without duplicates. Lastly, we must select the checkbox associated with the 'Unique records only' option. It will look like this:
Once all the preferences are made, we must click the OK button to apply the changes. The unique data entries (without occurrences) will be copied to the selected destination range, whether the same sheet or the new sheet.
Deleting Duplicates using the Excel Formula
Like applying the COUNTIF formula to find duplicates, we can also use this formula to remove duplicates in a sheet. In this method, the formula first combines the columns and calculates the number count for the recorded values. Later, it filters out duplicate values or counts with more than one occurrence. To delete/remove duplicates using the Excel Formula, we should perform the following steps:
First, we need to insert a new column where we will be combining the existing data columns A, B, and C. We use the Concatenation Operator '&' to combine our data columns and apply the formula in the following way:
We enter the above formula in the first resultant cell (D2) in the newly created column (Combined) and then copy it down to all the cells in the corresponding column.
After combining the data columns, we need to insert another column to find the duplicate value counts. We use the COUNTIF function to find out duplicates and apply the formula in the following way:
Like before, we again enter the formula in the first resultant cell (E2) in the newly created column (Count) and then copy it down to all the cells in the corresponding column. The COUNTIF formula helps us count the total number of occurrences of given data in the combined column, as shown below:
In the above image, we can see that the total Count is shown in column E. If the Count is '1', the corresponding value has only appeared once and is unique. If the Count is more than '1', it indicates multiple occurrences or duplicate values.
Next, we need to filter the Count in column E. Since we only need to keep the first occurrence or the unique values, we must keep the values with only a '1' count. So, we apply the Filter in the Count column by navigating to the Data tab > Soft & Filter > Filter.
Once the Filter has been inserted, we need to click the drop-down filter arrow on the header of the Count column E. After that, we must select the checkbox associated with '1' to keep only the unique values and remove all the remaining duplicate values.
Lastly, we must click the OK button, and all the duplicates will be hidden (filtered) in the sheet. The following image shows that the values with only Count '1' are displayed.
We can copy-paste the filtered data from the filtered columns A, B, and C to another area or an entirely new sheet.
Deleting Duplicates using the Power Query
The power query in Excel is an existing feature that allows us to perform specific operations. We can also take advantage of this feature to delete/ remove duplicates in our sheet. However, this does not highlight the duplicates; thereby, we cannot review the duplicates. Instead, it enables us to remove all the duplicates from the desired column with one click. To delete/remove duplicates using the Excel's Power Query, we should perform the following steps:
First, we need to select the data or a range where we want to remove duplicates.
Next, we need to go to the 'Data' tab and click on the 'From Table/ Range' button.
In older versions of Excel, the Power Query feature can be added as an add-on and accessed directly as a separate tab on the ribbon. In that case, we must navigate Power Query > From Table/ Range.
In the next window, we see a Power Query editor. We need to select the desired column where we want to remove duplicates. If we need to remove duplicates from the multiple columns, we can click on the corresponding column headers while holding down the Ctrl key on the keyboard. This will help us select multiple columns.
After selecting the column (s), we must press the right-click button on any selected column header and click the 'Remove Duplicates' option from the list. This will immediately remove the duplicates in the Power Query editor window.
Lastly, we must click on the 'Close and Load' button at the top to open the filtered data table in the existing Excel sheet. The removed values will not be displayed.
Once the duplicate values are removed, we can adjust the formatting of the sheet accordingly.
Deleting Duplicates using the Pivot Tables
We can also use Excel's Pivot Table feature to remove duplicates. Although rarely used, this feature is easy-to-use and removes duplicates in one go. However, we cannot review the duplicate values. To delete/remove duplicates using the Excel's Pivot Tables, we should perform the following steps:
- First, we need to select the data in which we need to remove the duplicate values or entries.
Next, we must go to the Insert tab and click on the 'Pivot Table' button under the 'Tables' section.
We must click on 'Select a table or range' in the next window. Here, we need to select a range with the data where we want to remove duplicates. Also, we must specify where we want to place the Pivot Table. For example purposes, we select the 'Existing Worksheet' option. Here, we need to select a cell or a range to create a pivot table. We select cell E1 in sheet 1. Once the preferences are made, we must click the OK button to apply the changes.
After clicking the OK button, we must select the columns we want to add to the Pivot Table. We need to use the right-side window for selecting the columns.
Once all the desired columns are added to Pivot Table, we must go to the Design tab and adjust the following preferences:
- Report Layout > Show
- Subtotals > Do not show subtotals
- Report Layout > Repeat all items labels
- Grand Totals > Off for rows and columns
After completing the previous step, we see that the data is filtered and there are no duplicates.
We can copy-paste the filtered data to another location if desired and adjust the formatting accordingly.
Important Points to Remember
- Any value that occurs more than once in the same Excel sheet is called a duplicate.
- It is always recommended to have a backup of the original Excel file before deleting or removing duplicates. This will help us to restore the data in case of incidental uncertainties.
- We can also use third-party tools/ add-ons, such as Duplicate Remover by Ablebits, to remove duplicates in our sheet in an easy way.