Javatpoint Logo

How to prevent duplicates in a column in Excel

Excel is widely used to store data and perform calculations. A worksheet is a grid of rows and columns in which data is entered and calculated. Sometimes the data contains duplicate values, which can lead to calculation errors. Finding and fixing them manually takes time and is difficult for large data sets. Excel provides several built-in features for finding, removing, and preventing duplicates.

Reasons for duplicate data in Excel

Duplicate data can appear in a column for several reasons, listed below:

  1. Manual Entry: Duplicates may occur when data is entered into the column by hand and the same value is entered more than once.
  2. Importing Data: When data is imported from external sources such as databases or text files, duplicates may be introduced if the source itself contains duplicates.
  3. Data Merging: Merging data from multiple sources can produce duplicates when the sources share common values.
  4. Human Error: Other common causes include human error, such as accidentally copying and pasting the same value several times or mistyping data.
  5. Formulas: Formulas can also produce duplicates if they are not set up properly or if the data they use contains duplicates.

Identifying and removing duplicates is important to maintain the accuracy and integrity of your data.

Why remove duplicate data?

Removing duplicate data in Excel is important for several reasons:

Data Integrity: Duplicate data can compromise the accuracy and reliability of your data analysis. When you remove duplicates, you ensure that each record in your data is unique, which helps to eliminate errors and inconsistencies.

Space Optimization: Duplicate data can take up valuable storage space, especially when working with large data sets. Removing duplicates helps to optimize the space your data takes up, making your workbook more efficient.

Improved Analysis: Duplicate records are counted more than once, which can skew totals, counts, and averages and lead to incorrect results. Removing duplicates ensures that your analysis is based on accurate information.

Better Organization: Duplicate data can make it difficult to locate specific records or find patterns in your data. By removing duplicates, you can better organize your data and make it easier to find the information you need.

What is filtering for unique values?

Filtering for unique values is the process of identifying and removing duplicate data points within a dataset. The objective is to retain only one occurrence of each value, so the resulting data contains no repeating elements. This process can be performed in many programming languages and data processing tools and applies to different data types, such as numbers, strings, and dates. The outcome is a cleaner, more meaningful dataset, free of repeated values that might cause confusion or misleading results in data analysis.
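To make the idea concrete outside Excel, here is a minimal Python sketch of filtering for unique values: it keeps the first occurrence of each value and drops later repeats. The function name `unique_values` and the sample list are hypothetical, and values are compared exactly (case-sensitive), which may differ from how Excel compares text.

```python
def unique_values(values):
    """Return each distinct value once, in the order it first appears."""
    # dict keys are unique and (since Python 3.7) keep insertion order,
    # so later repeats of a value are simply dropped.
    return list(dict.fromkeys(values))


# Hypothetical sample column, not the data used in this tutorial's worksheets.
column = ["Rose", "Lily", "Rose", "Tulips", "Lily", "Daisy"]
print(unique_values(column))  # ['Rose', 'Lily', 'Tulips', 'Daisy']
```

Using `dict.fromkeys` rather than `set` is a deliberate choice: a set would also remove repeats but would not preserve the original order of the column.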

Methods to remove the duplicate data in the Excel column

Excel provides several built-in features for finding, removing, and preventing duplicates, as follows:

1. Using Sort and Filter Options

The Sort & Filter group on the Data tab is a powerful tool for analyzing and organizing data. Its Advanced option can filter a range so that only unique records are displayed. The steps are as follows:

Step 1: Enter the data in the worksheet, in the range A1:A10.

Step 2: Select the range to filter for unique values. Here, the range A1:A10 is selected.

Step 3: On the Data tab, in the Sort & Filter group, click "Advanced."

Step 4: The Advanced Filter dialog box opens. Select the "Unique records only" check box.


Step 5: Click OK. Excel hides the rows that contain duplicate values and displays only the unique values.


In the example worksheet, cell A10 contains "Pine apple," a value that already appears earlier in the list. The Advanced Filter hides this duplicate (cell A10) and displays only the unique values.
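The "Unique records only" behavior can be approximated in Python as follows. This is a minimal sketch, not Excel's implementation: it assumes the filter runs in place, so the first occurrence of each value stays visible and later occurrences are hidden rather than deleted, and it assumes text is compared case-insensitively. The function `visible_rows` and the fruit names are hypothetical.

```python
def visible_rows(values):
    """Return True for rows that stay visible and False for rows that are hidden."""
    seen = set()
    visible = []
    for value in values:
        # Assumption: text is compared case-insensitively; other values as-is.
        key = value.casefold() if isinstance(value, str) else value
        visible.append(key not in seen)  # only the first occurrence stays visible
        seen.add(key)
    return visible


# Hypothetical fruit names; the real worksheet's values are not listed in the text.
column = ["Apple", "Mango", "Pine apple", "Orange", "Pine apple"]
flags = visible_rows(column)
hidden = [row for row, shown in enumerate(flags, start=1) if not shown]
print(flags)   # [True, True, True, True, False]
print(hidden)  # [5]
```

Because the sketch only computes which rows are hidden, the original list is left unchanged, which matches the idea that filtering in place does not delete data.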

2. Using Conditional Formatting to find Duplicate Values

In Excel, you can use conditional formatting to highlight duplicates in a range of cells. Conditional formatting only highlights the duplicates; to remove them, use the "Remove Duplicates" feature on the "Data" tab.

The steps to use conditional formatting are as follows:

Step 1: Enter the data in the worksheet, in the range A1:A10.

Step 2: Select the range of cells in which to find duplicate data. Here, A1:A10 is selected.

Step 3: On the Home tab, click Conditional Formatting > Highlight Cells Rules > Duplicate Values.


Step 4: The Duplicate Values dialog box opens. Its drop-down list has two options, Duplicate and Unique. To find the duplicate values, choose "Duplicate" and then choose a format. Here, "Light Red Fill with Dark Red Text" is chosen.


Step 5: Click OK. The duplicate values are highlighted with the selected color.

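Unlike the Advanced Filter, the Duplicate Values rule highlights every occurrence of a repeated value, including the first one. Here is a minimal Python sketch of that rule, assuming case-insensitive text comparison; the function `highlight_duplicates` and the sample data are hypothetical, and the sketch only models which cells would be flagged, not the formatting itself.

```python
from collections import Counter


def highlight_duplicates(values):
    """Return True for each cell whose value occurs more than once in the range."""
    # Assumption: text is compared case-insensitively; other values as-is.
    keys = [v.casefold() if isinstance(v, str) else v for v in values]
    counts = Counter(keys)
    # Every occurrence of a repeated value is flagged, including the first one.
    return [counts[k] > 1 for k in keys]


# Hypothetical flower names, not the tutorial's worksheet data.
column = ["Rose", "Lily", "Rose", "Tulips", "Daisy"]
print(highlight_duplicates(column))  # [True, False, True, False, False]
```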

2.1 Using Conditional Formatting to find Unique Values

In Excel, you can use conditional formatting to highlight unique values in a range of cells. The steps are as follows:

Step 1: Enter the data in the worksheet, in the range A1:A10.

Step 2: Select the range of cells in which to find unique values. Here, A1:A10 is selected.

Step 3: On the Home tab, click Conditional Formatting > Highlight Cells Rules > Duplicate Values.


Step 4: The Duplicate Values dialog box opens. Its drop-down list has two options, Duplicate and Unique. To find the unique values, choose "Unique" and then choose a format. Here, "Green Fill with Dark Green Text" is chosen.


Step 5: Click OK. The unique values are highlighted with the selected color.

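The Unique rule is the complement of the Duplicate rule: a cell is highlighted only if its value occurs exactly once in the range. A minimal Python sketch, assuming case-insensitive text comparison; the function `highlight_unique` and the sample data are hypothetical.

```python
from collections import Counter


def highlight_unique(values):
    """Return True for each cell whose value occurs exactly once in the range."""
    # Assumption: text is compared case-insensitively; other values as-is.
    keys = [v.casefold() if isinstance(v, str) else v for v in values]
    counts = Counter(keys)
    return [counts[k] == 1 for k in keys]


# Hypothetical flower names, not the tutorial's worksheet data.
column = ["Rose", "Lily", "Rose", "Tulips", "Daisy"]
print(highlight_unique(column))  # [False, True, False, True, True]
```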

3. Using the Remove Duplicates Option

In Excel, you can use the Remove Duplicates option on the Data tab. This option deletes the duplicate data from the worksheet, so before applying it, copy the original data to another worksheet for future reference.

The steps to use the Remove Duplicates option are as follows:

Step 1: Enter the data in the worksheet, in the range A1:A10.


Step 2: Select the range of cells that contains the duplicate data. Here, A1:A10 is selected.

Step 3: On the Data tab, in the Data Tools group, click Remove Duplicates.


Step 4: The Remove Duplicates dialog box opens. Select the column or columns to check for duplicates.


Step 5: Click OK. Excel removes the duplicate values and keeps the unique values.


Excel then displays a message stating how many duplicate values were found and removed and how many unique values remain.

Step 6: The unique values are displayed in the worksheet.

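Remove Duplicates keeps the first occurrence of each value, deletes the later ones, and reports the counts. The following Python sketch mimics that behavior for a single column. It is an approximation, assuming case-insensitive text comparison, and the summary line only paraphrases Excel's message. The function `remove_duplicates` and the sample data are hypothetical.

```python
def remove_duplicates(values):
    """Keep the first occurrence of each value and drop later repeats.

    Returns the remaining values, the number removed, and the number kept.
    """
    seen = set()
    kept = []
    for value in values:
        # Assumption: text is compared case-insensitively; other values as-is.
        key = value.casefold() if isinstance(value, str) else value
        if key not in seen:
            seen.add(key)
            kept.append(value)
    return kept, len(values) - len(kept), len(kept)


# Hypothetical flower names, not the tutorial's worksheet data.
column = ["Rose", "Lily", "Rose", "Tulips", "Lily", "Daisy"]
kept, removed, remaining = remove_duplicates(column)
print(kept)  # ['Rose', 'Lily', 'Tulips', 'Daisy']
print(f"{removed} duplicate values found and removed; {remaining} unique values remain.")
```

The function returns a new list instead of editing the input, which plays the same role as the advice above to keep a copy of the original data.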

4. Preventing Duplicate Entries Using the Data Validation Option

In Excel, you can use the Data Validation option to prevent duplicate data from being entered into a cell or range of cells. The steps to prevent duplicate entries are as follows:

Step 1: Enter the data in the worksheet, in the range A1:A10.


Step 2: Select the range of cells in which duplicate entries should be prevented. Here, the range A1:A20 is selected.

Step 3: On the Data tab, click the Data Validation drop-down arrow. Of the three options it displays, choose "Data Validation."


Step 4: The Data Validation dialog box opens. On the Settings tab, choose "Custom" from the Allow list.

Step 5: In the Formula box, enter the formula =COUNTIF($A$1:$A$20,A1)=1.

Step 6: Click OK.


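This formula counts how many cells in the range A1:A20 are equal to the value in cell A1. Because A1 is a relative reference, Excel adjusts it for each cell in the selection (A2, A3, and so on), so every cell is checked against the whole range. The count must equal 1, meaning the value occurs only once and no duplicates are allowed.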

Step 7: Try entering duplicate data in the range A11:A20. Here, the flower name Tulips, which already exists in the list, is entered in cell A11.


Excel displays a message that the value entered is not valid. Because this value already exists in the range, the entry is rejected.

If a unique value is entered in the range A11:A20, Excel accepts it.

Step 8: To display a custom error message, open the Data Validation dialog box again and click the Error Alert tab. In the Title box, enter "Duplicate Entry."

Step 9: In the Error message box, enter the message "The value already exists. All values must be unique. Try again."

Step 10: Click OK. Now, if the data entered is not valid, the custom error alert message is displayed.

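The validation rule =COUNTIF($A$1:$A$20,A1)=1 and its Stop-style error alert can be modeled in Python as below. This is a sketch under stated assumptions: column A is a list of 20 cells with None for blanks, text is compared case-insensitively (as COUNTIF does), and the wildcard and operator criteria that the real COUNTIF supports are not modeled. The names `countif_equal`, `enter_value`, and `DuplicateEntryAlert` are hypothetical, as are the flower names.

```python
def _key(value):
    # Assumption: text is compared case-insensitively, as COUNTIF does.
    return value.casefold() if isinstance(value, str) else value


def countif_equal(cells, criterion):
    """Count non-blank cells equal to criterion (plain equality only)."""
    # Simplification: real COUNTIF also accepts wildcards and criteria
    # such as ">5"; those are not modeled here.
    return sum(1 for c in cells if c is not None and _key(c) == _key(criterion))


class DuplicateEntryAlert(Exception):
    """Hypothetical stand-in for Excel's Stop-style error alert."""

    def __init__(self, title, message):
        super().__init__(f"{title}: {message}")
        self.title = title
        self.message = message


def enter_value(column, row, value,
                title="Duplicate Entry",
                message="The value already exists. All values must be unique. Try again."):
    """Try to type value into the 1-based row of column.

    Mirrors =COUNTIF($A$1:$A$20,An)=1: the candidate value is placed in
    the cell, then the rule requires it to occur exactly once in the range.
    """
    candidate = column.copy()
    candidate[row - 1] = value
    if countif_equal(candidate, value) != 1:
        # Stop alert: the entry is rejected and the cell keeps its old content.
        raise DuplicateEntryAlert(title, message)
    column[row - 1] = value


# Hypothetical flower names in A1:A10; A11:A20 start blank (None).
column_a = ["Rose", "Lily", "Tulips", "Daisy", "Lotus",
            "Jasmine", "Orchid", "Marigold", "Sunflower", "Iris"] + [None] * 10

try:
    enter_value(column_a, 11, "Tulips")  # already in A3, so it is rejected
except DuplicateEntryAlert as alert:
    print(alert.title)                   # Duplicate Entry

enter_value(column_a, 11, "Peony")       # unique, so it is accepted
print(column_a[10])                      # Peony
```

Checking a copy of the column with the candidate value already in place reflects why the formula compares the count with 1 rather than 0: the new entry counts itself.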

Summary

Preventing duplicate data is an important aspect of data management that helps ensure the accuracy and reliability of information. This tutorial explained several Excel methods for handling duplicates: the Advanced Filter, conditional formatting, the Remove Duplicates option, and data validation. By applying these methods, organizations can keep their data accurate and make informed decisions based on the data they collect.






