Let's check: many times (or a few) the need arises to remove duplicate elements from an array. I don't know... it can be because you have to prin...
The first one is sexier
The last two are problematic because they are essentially a for loop inside a for loop, which heavily increases how long the algorithms take.
Using a set to remove duplicates is a great way to solve this problem.
How does Set() do it?
The internal implementation of a Set is usually based on a hash table. A hash table is a data structure that converts keys into indexes so each entry can be quickly located in a bucket, enabling fast lookup operations.
If the data contains complex types like Object, Set is probably not a good fit.
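A quick sketch of why: Set compares objects by reference, not by value, so structural duplicates survive.
const objs = [{ a: 1 }, { a: 1 }];
console.log([...new Set(objs)].length); // 2, both objects are kept (different references)
console.log([...new Set(['A', 'B', 'A'])]); // ['A', 'B'], primitives dedupe as expected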
Don't forget reduce!
chars.reduce((acc, char) => acc.includes(char) ? acc : [...acc, char], []);
This is my preferred way. I don't like using Sets.
Or
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [];
chars.forEach((e) => {
  if (!(e in chars)) {
    uniqueChars.push(e);
  }
});
console.log(uniqueChars);
Shouldn't that be
if (!(e in uniqueChars)) {
?
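For what it's worth, the in operator checks an array's indexes (keys), not its values, so even e in uniqueChars won't behave as intended; includes is the value check needed here. A minimal corrected sketch:
let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [];
chars.forEach((e) => {
  if (!uniqueChars.includes(e)) { // includes checks values, unlike the in operator
    uniqueChars.push(e);
  }
});
console.log(uniqueChars); // ['A', 'B', 'C']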
When deduplicating array elements in Vue, you need to consider whether the array itself is reactive.
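A minimal sketch, assuming a Vue 3 ref holding the array: replacing the array wholesale keeps reactivity intact, so the view still updates.
import { ref } from 'vue';

const chars = ref(['A', 'B', 'A', 'C', 'B']); // hypothetical reactive array
chars.value = [...new Set(chars.value)];      // reassigning .value triggers the update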
Handling duplicate elements in a massive array involves various strategies, such as chunk processing and stream processing, depending on whether the entire dataset can be loaded into memory at once. Here's a structured approach:
Chunk Processing:
1. Chunk Loading: Load the massive dataset in manageable chunks, such as processing 1000 records at a time, especially useful for file-based or network data retrieval.
2. Local Deduplication with Hashing: Use a hash table (like a Map or a plain object) to locally deduplicate each chunk.
3. Merge Deduplicated Results: Combine the deduplicated results from each chunk.
4. Return Final Deduplicated Array: Return the overall deduplicated array after processing all chunks.
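A minimal sketch of this chunked approach (loadChunk and the chunk size are hypothetical; JSON.stringify stands in for whatever dedupe key fits the data):
async function deduplicateInChunks(loadChunk, chunkSize = 1000) {
  const seen = new Map(); // hash table: dedupe key -> record
  const result = [];
  let chunk;
  // loadChunk is assumed to return the next batch, or null when the data is exhausted
  while ((chunk = await loadChunk(chunkSize)) !== null) {
    for (const record of chunk) {
      const key = JSON.stringify(record); // assumption: stringify as the dedupe key
      if (!seen.has(key)) {
        seen.set(key, record);
        result.push(record);
      }
    }
  }
  return result;
}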
Considerations:
1. Performance: Chunk processing reduces memory usage and maintains reasonable time complexity for deduplication operations within each chunk.
2. Hash Collisions: In scenarios with extremely large datasets, hash collisions may occur. Consider using more sophisticated hashing techniques or combining with other methods to address this.
Stream Processing:
Stream processing is suitable for real-time data generation or situations where data is accessed via iterators. It avoids loading the entire dataset into memory at once.
Example Pseudocode:
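A possible shape for such a generator (the Set-based tracking and the stringify key are assumptions):
function* deduplicateStream(source) {
  const seen = new Set(); // tracks keys already yielded
  for (const item of source) {
    const key = typeof item === 'object' && item !== null ? JSON.stringify(item) : item;
    if (!seen.has(key)) {
      seen.add(key);
      yield item; // emit each unique element as soon as it is seen
    }
  }
}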
This generator function (deduplicateStream) yields deduplicated elements as they are processed, ensuring efficient handling of large-scale data without risking memory overflow.
In summary, chunk processing and stream processing are two effective methods for deduplicating complex arrays with massive datasets. The choice between them depends on the data source and processing requirements, and it should be adjusted and optimized for the practical scenario to achieve the desired memory usage and performance.
For a big array of objects, use reduce and Map:
[{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 2}].reduce((p, c) => p.set(c.a, c), new Map()).values()
While the code works, it can be difficult for new developers to understand. Using Map and reduce together can be confusing. It is often clearer to achieve the same result with a simpler solution.
Complexities are:
1 is O(N), while 2 and 3 are O(N^2).
So 1 should always be used, IMHO.
This is great for most situations, using a Map. My personal testing has found that arrays up to a certain length still perform better with reduce, etc. than with Maps, but beyond N values (I can't recall the exact amount, and I'm sure it varies with the types), Maps absolutely crush them because the op is O(n), as noted by DevMan. Just thought it was worth noting.
Thanks for the article. This is useful.
Imagine having an array of objects with n props and needing to remove duplicates that share only m of those properties, where m < n, keeping the first of the two or more duplicates under the same unique-constraint rule.
That's where the science begins. I would be much more interested in hearing different solutions to this topic.
Then write your own article and stop trying to sound smart on someone else's post.
Simply use reduce and Map:
[{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}].reduce((p, c) => p.set([c.a, c.b].join('|'), c), new Map()).values()
Edit: For selecting the first value, use Map.has before setting the value.
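Roughly, that edit would look like this (same composite key, but only the first occurrence is kept):
[...[{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}]
  .reduce((p, c) => {
    const key = [c.a, c.b].join('|');
    if (!p.has(key)) p.set(key, c); // first occurrence wins
    return p;
  }, new Map())
  .values()]; // yields {a: 1, b: 2, c: 3} and {a: 2, b: 3, c: 4}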
I love the first one, but I wish it allowed an "equality function" to be provided, as you can't use it to deduplicate an array of objects (usually you would check for IDs being the same and get rid of the duplicates based on that).
I usually do it using Define; it seems to be more performant. Could you show the performance of each of these approaches?
Likewise, in React, you can keep track of the elements you create using a Set and easily check whether each element has certain properties.
What is an efficient way to perform this operation "in place"?
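One possible in-place approach (a sketch): scan with a read index, overwrite with a write index, then truncate; a Set tracks what has already been kept.
function dedupeInPlace(arr) {
  const seen = new Set();
  let write = 0;
  for (let read = 0; read < arr.length; read++) {
    if (!seen.has(arr[read])) {
      seen.add(arr[read]);
      arr[write++] = arr[read]; // keep first occurrences in order
    }
  }
  arr.length = write; // truncate the leftover tail
  return arr;
}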
If I were you, I would go for the first one, the Set way.
Thank you for the article!
I want to remove items from an array of objects where multiple objects contain the same id, but I want to remove only the first one. How?
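One way to read that requirement: for each id, keep the last object and drop the earlier ones. A reduce into a Map does that, since re-setting a key overwrites the earlier value (the sample data here is hypothetical):
const items = [{id: 1, v: 'a'}, {id: 2, v: 'b'}, {id: 1, v: 'c'}];
const deduped = [...items.reduce((m, o) => m.set(o.id, o), new Map()).values()];
console.log(deduped); // [{id: 1, v: 'c'}, {id: 2, v: 'b'}], the first {id: 1} object was dropped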