Chloe

Posted on Apr 13 • Edited on Jun 9

How to Highlight Duplicates in Excel: A Developer-Friendly Guide

#csharp #dotnet #productivity

If you've ever received a spreadsheet from a product manager and been asked to "just check for duplicates," you know the feeling — it sounds trivial, but the moment you open the file, you realize what you're dealing with: 10,000 rows, multiple columns, no clear definition of what “duplicate” actually means in this context.

Duplicate detection is one of those tasks that sits at the intersection of Excel power users and backend developers. It's too manual to ignore, yet too tightly coupled to a .xlsx file to fit cleanly into your usual data tooling.

In this guide, we'll walk through a set of approaches that scale with your needs—from quick UI tricks to automation and developer-friendly solutions.

What Counts as "Duplicates" in Excel?
Method 1: Conditional Formatting (Built-in UI)
Method 2: COUNTIF Formula-Driven Conditional Formatting
Method 3: VBA Macro — Client-Side Automation
Method 4: Power Query & Dynamic Arrays
Method 5: Highlight Duplicates in Excel Using C# (Server-Side Automation)
Method Comparison at a Glance
Conclusion

What Counts as "Duplicates" in Excel?

Before jumping into solutions, it's worth clarifying what we actually mean by "duplicates," because the definition can vary depending on the scenario.

Different Types of Duplicates

In most real-world cases, duplicates fall into one of three categories:

Single-column duplicates: the same value appears more than once in a column, such as repeated email addresses
Full-row duplicates: every cell in two or more rows is identical.
Composite duplicates: a combination of columns forms a duplicate key, even if no individual column is duplicated on its own. For example, the same '(user_id, date)' pair appearing twice in a transaction log.

The distinction matters, because Excel’s built-in tools don’t always handle these cases the same way.

Common Gotchas

Even when the logic seems straightforward, a few subtle issues can lead to misleading results.

For example, Excel is not case-sensitive by default, which means Apple and apple are treated as the same value. That's convenient in many cases, but it can also hide inconsistencies.

Hidden characters are another common culprit. A value like "123" is not the same as "123 " if there's a trailing space—something that's easy to miss visually.

You'll also run into type mismatches, where "123" (text) and 123 (number) are treated differently. And yes, even Excel isn’t immune to floating-point quirks—calculations like 0.1 + 0.2 may not always behave exactly as expected.

Taken together, these edge cases are why “finding duplicates” is often more nuanced than it appears.

Method 1: Conditional Formatting (Built-in UI)

The fastest way to highlight duplicates — no formulas, no code. Best for one-off checks where you just need to see the problem quickly.

How to do it:

Select the column or range you want to check, then follow this path:

Home → Conditional Formatting → Highlight Cell Rules → Duplicate Values

A dialog appears letting you choose a highlight color. Click OK and Excel immediately colors every cell whose value appears more than once in your selection.

One behavior to know:

If you select multiple columns at once, Excel checks each cell individually — not each row. So if column A has alice twice and column B has bob twice, both get highlighted, but a row where (alice, bob) appears twice would only be caught if both cells independently repeat. If you need row-level or composite duplicate detection, skip ahead to Method 2.

Customizing the format:

The default is a light red fill, but you can set any fill color, font color, or border style. In the same dialog, open the format dropdown and select "Custom Format" to access the full cell formatting options.

Limitations:

No case sensitivity control — Alice and alice are treated as the same
Dynamic ranges aren't tracked automatically — if you add rows later, you'll need to reapply or extend the rule manually
Rules stack: if you have multiple conditional formatting rules on the same range, order matters. Manage them under Home → Conditional Formatting → Manage Rules

Still, for quick, one-off checks, it's often the most efficient option.

Method 2: COUNTIF Formula-Driven Conditional Formatting

When the built-in UI isn't flexible enough — you need case sensitivity, composite keys, or more precise control over which occurrences get flagged — COUNTIF gives you that control without leaving Excel.

The idea is simple: instead of letting Excel decide what counts as a duplicate, you write a formula that returns TRUE or FALSE for each cell, and conditional formatting applies the highlight based on your logic.

Basic setup:

Select your data range, for example B2:B16, then create a new conditional formatting rule:

Home → Conditional Formatting → New Rule → "Use a formula to determine which cells to format."

Enter this formula:

=COUNTIF($B$2:$B$16, B2) > 1

This counts how many times the value in B2 appears across the entire range. If it appears more than once, the cell gets highlighted. The $ signs lock the range while letting the row reference shift as the rule applies to each cell.

If you're familiar with SQL, this is the spreadsheet equivalent of:

SELECT value
FROM table
GROUP BY value
HAVING COUNT(*) > 1

Highlight only the second occurrence onwards:

The basic formula flags every occurrence, including the first. If you want to mark only the duplicates — leaving the original untouched — use this instead:

=COUNTIF($B$2:B2, B2) > 1

The key difference is $B$2:B2 — the start of the range is anchored but the end expands as the rule moves down each row. By the time Excel evaluates row 5, the range is $B$2:B5, so it only counts occurrences up to the current row. The first appearance is never flagged; only repeats are.

Composite duplicate detection:

To check whether a combination of columns forms a duplicate — say (email, signup_date) — concatenate the values into a single key using &:

=COUNTIFS($A$2:$A$100, A2, $B$2:$B$100, B2) > 1

Or with string concatenation across more columns:

=COUNTIF($A$2:$A$100 & $B$2:$B$100, A2 & B2) > 1

This is the closest you'll get to a composite primary key check inside native Excel.

Case-sensitive duplicate detection:

COUNTIF is case-insensitive by default. For case-sensitive matching, use EXACT wrapped in SUMPRODUCT:

=SUMPRODUCT((EXACT($A$2:$A$100, A2)) * 1) > 1

This iterates over every value in the range and checks for an exact match, including case, returning a count. More expensive on large datasets, but it works.

Method 3: VBA Macro — Client-Side Automation

The three methods above all live inside Excel's UI. They work well for interactive use, but they're hard to trigger programmatically, difficult to version control, and impossible to run without opening Excel manually. If you need duplicate highlighting to happen automatically — on file open, on a button click, or as part of a repeatable workflow — VBA is the natural next step.

Basic implementation:

The core logic is straightforward: iterate over a column, track how many times each value has appeared using a Dictionary, then go back and color any cell whose value appeared more than once.

Sub HighlightDuplicates()

    Dim ws As Worksheet
    Dim rng As Range
    Dim cell As Range
    Dim dict As Object
    Dim highlightColor As Long

    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set rng = ws.Range("A2:A1000")
    highlightColor = RGB(255, 199, 206)

    Set dict = CreateObject("Scripting.Dictionary")

    For Each cell In rng
        If cell.Value <> "" Then
            Dim key As String
            key = Trim(CStr(cell.Value))
            If dict.exists(key) Then
                dict(key) = dict(key) + 1
            Else
                dict.Add key, 1
            End If
        End If
    Next cell

    For Each cell In rng
        If cell.Value <> "" Then
            If dict(CStr(cell.Value)) > 1 Then
                cell.Interior.Color = highlightColor
            Else
                cell.Interior.ColorIndex = xlNone
            End If
        End If
    Next cell

End Sub

The first loop builds the frequency map. The second loop applies or removes the highlight based on the count. Splitting into two passes keeps the logic clean and makes it easy to add a "clear all highlights first" step before reapplying.

Making it reusable:

Hard-coding the sheet name and range is fine for a one-off script, but if you plan to reuse this across files, parameterize the key variables:

Sub HighlightDuplicates(sheetName As String, columnRange As String, highlightColor As Long)
    Dim ws As Worksheet
    Dim rng As Range
    Set ws = ThisWorkbook.Sheets(sheetName)
    Set rng = ws.Range(columnRange)
    ' ... same logic as above
End Sub

Call it from elsewhere in your project:

Call HighlightDuplicates("Sheet1", "A2:A1000", RGB(255, 199, 206))

Triggering automatically on file open:

To run the macro every time the workbook opens, add it to the Workbook_Open event in ThisWorkbook:

Private Sub Workbook_Open()
    Call HighlightDuplicates("Sheet1", "A2:A1000", RGB(255, 199, 206))
End Sub

Now every time someone opens the file, duplicates are flagged automatically — no manual steps required.

Performance on large datasets:

VBA loops are slow by default because Excel recalculates and redraws after every change. For datasets over a few thousand rows, disable these before running and re-enable them after:

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

' ... your loop here ...

Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic

On a 50,000-row dataset, this alone can reduce runtime from 30+ seconds to 2–3 seconds.

Where VBA falls short:

VBA runs inside Excel, which means it requires Excel to be installed and open. It can't be triggered from a CI pipeline, a backend service, or a scheduled task on a server. If your use case involves processing Excel files without a desktop environment, the next method is what you need.

Method 4: Power Query & Dynamic Arrays

While the previous methods work well within Excel itself, they are harder to integrate into data pipelines or produce clean, structured outputs. If your goal is to output a deduplicated table, feed results into another process, or refresh the analysis automatically when data updates, Power Query and dynamic arrays are better tools.

Dynamic arrays (Excel 365 only):

If you're on Microsoft 365, the UNIQUE() and FILTER() functions let you extract duplicate values directly into a new range — no conditional formatting involved.

To get a list of values that appear more than once:

=FILTER(A2:A100, COUNTIF(A2:A100, A2:A100) > 1)

COUNTIF(A2:A100, A2:A100) returns an array of counts — one per row. FILTER() then keeps only the rows where that count exceeds 1. The result spills automatically into adjacent cells.

Note: On very large datasets, this approach may become slow because COUNTIF is evaluated across the entire range for each element.

To get just the unique values among those duplicates — without repetition:

=UNIQUE(FILTER(A2:A100, COUNTIF(A2:A100, A2:A100) > 1))

This is clean, readable, and updates live as your source data changes. The main constraint is that it requires Excel 365 — it won't work on Excel 2016 or 2019.

Power Query:

Power Query works on Excel 2016 and later, and is better suited for larger datasets or more complex transformations. The approach is to group rows by the columns you care about, count occurrences, then filter for groups with more than one row.

To open Power Query:

Data → Get Data → From Table/Range

Select your data range and click OK. In the Power Query editor:

Select the column you want to check for duplicates
Go to Home → Group By
Group by your target column, add a count column using Count Rows
Click OK — you now have a summary table with each value and its occurrence count
Filter the count column to show only rows where count > 1

Click Close & Load to output the results to a new sheet. Every time you refresh the query (Data → Refresh All), it re-runs against the latest data.

For composite duplicate detection, select multiple columns before grouping — Power Query will treat the combination as a single key.

👉 If you're familiar with SQL, you can think of this as a GROUP BY + HAVING COUNT > 1 workflow, but expressed through a visual interface.

When to use which:

Dynamic arrays are ideal when you want a live formula that updates automatically and the data fits within a single worksheet. Power Query is the better choice when the dataset is large, the transformation logic is complex, or you need a repeatable process that non-technical users can trigger with a single Refresh click.

Neither method requires code, but both offer a significant step up in capability — and both output clean data rather than just painting cells.

Method 5: Highlight Duplicates in Excel Using C# (Server-Side Automation)

Up to this point, all approaches still depend on Excel as the execution environment. That works for manual workflows, but starts to break down when you need to process files in a backend service, a CI/CD pipeline, or any environment where Microsoft Office isn't available.

In these scenarios, using a standalone .NET library becomes the most practical approach. Instead of relying on Excel itself, you work directly with the file format in code — making the solution easier to automate, deploy, and scale.

One option here is Spire.XLS, a .NET library that reads and writes Excel files independently of Office. No COM interop, no Excel process in the background, and no dependency on a desktop environment. You install it via NuGet, point it at a file, and manipulate it like any other object in your codebase.

Setup:

Install the package via NuGet:

dotnet add package Spire.XLS

Or via the NuGet Package Manager in Visual Studio:

Install-Package Spire.XLS

No further configuration needed. The library handles .xlsx and .xls formats out of the box.

Core object model:

Before writing the implementation, it helps to understand three objects you'll use throughout:

Workbook — represents the entire Excel file
Worksheet — a single sheet within the workbook
CellRange — a cell or range of cells, with properties for value, style, and formatting

Basic implementation:

The logic mirrors the VBA approach: build a frequency map, then iterate again to highlight duplicates. The difference is that this runs as a standalone C# program — no Excel required.

using Spire.Xls;
using System.Collections.Generic;
using System.Drawing;

class Program
{
    static void Main(string[] args)
    {
        Workbook workbook = new Workbook();
        workbook.LoadFromFile("data.xlsx");

        Worksheet sheet = workbook.Worksheets[0];
        int startRow = 2;
        int endRow = sheet.LastRow;
        int targetColumn = 2;

        // Build frequency map
        Dictionary<string, List<int>> valueMap = new Dictionary<string, List<int>>();

        for (int row = startRow; row <= endRow; row++)
        {
            CellRange cell = sheet.Range[row, targetColumn];
            string value = cell.Value?.Trim();

            if (string.IsNullOrEmpty(value)) continue;

            if (!valueMap.ContainsKey(value))
                valueMap[value] = new List<int>();

            valueMap[value].Add(row);
        }

        // Apply highlighting to duplicate rows
        Color highlightColor = Color.FromArgb(255, 199, 206);

        foreach (var entry in valueMap)
        {
            if (entry.Value.Count > 1)
            {
                foreach (int row in entry.Value)
                {
                    CellRange cell = sheet.Range[row, targetColumn];
                    cell.Style.Color = highlightColor;
                }
            }
        }

        workbook.SaveToFile("data_highlighted.xlsx", ExcelVersion.Version2013);
        System.Console.WriteLine("Done. Duplicates highlighted.");
    }
}

The Dictionary<string, List<int>> stores each unique value alongside the row numbers where it appears. After the first pass, any key with more than one row number in its list is a duplicate. The second pass iterates over those row numbers and applies the fill color.

Composite duplicate detection:

To check for duplicates across multiple columns — say columns 1 and 2 together — concatenate the cell values into a composite key:

string value = $"{sheet.Range[row, 1].Value?.Trim()}|{sheet.Range[row, 2].Value?.Trim()}";

The | separator prevents false matches where concatenating two different pairs could produce the same string. For example, ("ab", "c") and ("a", "bc") would both produce "abc" without a separator, but "ab|c" and "a|bc" are distinct.

Make sure the separator character does not appear in your actual data, or choose a more robust delimiter.

Skipping the header row:

The startRow = 2 in the implementation above already handles this — adjust the value to match however many header rows your file has.

Custom highlight colors:

Color.FromArgb(r, g, b) accepts any RGB value. To match the default Excel duplicate highlight:

Color highlightColor = Color.FromArgb(255, 199, 206); // light red fill
Color fontColor = Color.FromArgb(156, 0, 6);          // dark red text

cell.Style.Color = highlightColor;
cell.Style.Font.Color = fontColor;

Integrating into a .NET service:

Because this is plain C#, it slots naturally into any .NET application — an ASP.NET endpoint that processes uploaded files, a background worker that validates reports on a schedule, or a console tool invoked from a CI pipeline:

# Example GitHub Actions step
- name: Validate Excel report
  run: dotnet run --project ./ExcelValidator -- --file reports/output.xlsx

The process exits with a non-zero code if duplicates are found, which causes the pipeline step to fail — giving you automatic duplicate detection as a CI gate.

Spire.XLS vs. VBA — when to use which:

VBA is the right tool when the work stays on a desktop, the file is opened interactively, and the person running it has Excel. Spire.XLS is the right tool when the file is processed on a server, the environment has no Office installation, or you need the logic to be testable, version-controlled, and integrated into a larger codebase.

Method Comparison at a Glance

Five methods is a lot to hold in your head. Here's a summary to help you pick the right one for your situation:

Method	Requires coding	Needs Office	Automation	Best for
Conditional Formatting	No	Yes	Manual	Quick one-off visual check
COUNTIF formula	No (formula)	Yes	Semi-auto	Flexible logic, stays in Excel
Power Query / Dynamic arrays	No	Yes (365)	Auto-refresh	Periodic reports, clean output
VBA macro	Yes	Yes	Full auto	Desktop automation, event-driven
C# Spire.XLS	Yes	No	Full auto	Server-side, CI/CD, batch processing

For quick decision-making, here's a simple rule of thumb:

Spot-checking a file someone sent you → Conditional Formatting.
Logic is complex but you're staying in Excel → COUNTIF formula.
You need a refreshable, clean output table → Power Query.
Desktop automation, triggered on open or button click → VBA.
Server-side, no Office, CI/CD pipeline → C# Spire.XLS.

Conclusion

Highlighting duplicates in Excel may seem like a small task, but it often reveals deeper issues in how data is structured and validated. What starts as a quick visual check can quickly turn into questions about consistency, edge cases, and reliability. Depending on your workflow, the right approach can range from simple conditional formatting to more controlled, repeatable solutions using formulas, Power Query, or code. As your data grows and processes become more automated, duplicate detection stops being just an Excel trick and becomes part of a broader data quality strategy. Choosing the right method is less about the tools themselves, and more about where your data logic naturally belongs as your workflow evolves.

DEV Community