Lenin Felix

3 ways to remove duplicates in an Array in JavaScript

Let's see: every so often the need arises to remove duplicate elements from an array. Maybe you have to print a shopping list, remove a student who duplicated their record in a form, or any number of other things, so let's look at some ways to do this:

1) Use Set

Using Set(), an instance that holds only unique values is created, so building it from the array implicitly removes the duplicates.

We can then take that instance and convert it back into a new array, and that's it:

let chars = ['A', 'B', 'A', 'C', 'B'];
let uniqueChars = [...new Set(chars)];

console.log(uniqueChars);

Output:

['A', 'B', 'C']

2) Using the indexOf() and filter() methods

The indexOf() method returns the index of the first occurrence of the element in the array:

let chars = ['A', 'B', 'A', 'C', 'B'];
chars.indexOf('B');

Output:

1

A duplicate element is one whose index differs from its indexOf() value:

let chars = ['A', 'B', 'A', 'C', 'B'];

chars.forEach((element, index) => {
    console.log(`${element} - ${index} - ${chars.indexOf(element)}`);
});

Output:

A - 0 - 0
B - 1 - 1
A - 2 - 0
C - 3 - 3
B - 4 - 1

To eliminate duplicates, the filter() method is used to include only the elements whose index matches their indexOf() value, since we know that filter() returns a new array built from the elements that pass the test:

let chars = ['A', 'B', 'A', 'C', 'B'];

let uniqueChars = chars.filter((element, index) => {
    return chars.indexOf(element) === index;
});

console.log(uniqueChars);

Output:

['A', 'B', 'C']

And if by chance we need the duplicates instead, we can modify our function a little, just by inverting the rule:

let chars = ['A', 'B', 'A', 'C', 'B'];

let dupChars = chars.filter((element, index) => {
    return chars.indexOf(element) !== index;
});

console.log(dupChars);

Output:

['A', 'B']

3) Using the includes() and forEach() methods

The includes() method returns true if an element is in an array and false if it is not.

The following example iterates over the elements of an array and adds to a new array only the elements that are not already there:

let chars = ['A', 'B', 'A', 'C', 'B'];

let uniqueChars = [];
chars.forEach((element) => {
    if (!uniqueChars.includes(element)) {
        uniqueChars.push(element);
    }
});

console.log(uniqueChars);

Output:

['A', 'B', 'C']  

Basically, we have several options to solve this type of problem, so don't get stuck anymore; use whichever one appeals to you.

If you liked the content, you can follow me on my social networks as @soyleninjs.


Top comments (24)

Alex Matei

The first one is sexier

Elijah Allen

The last two are problematic because you are essentially calling a for loop inside a for loop, which heavily increases how long the algorithms are going to take.

Using a Set to remove duplicates is a great way to solve this problem.

minhlhc98

If the data is a complex type like Object, Set is probably not a good way.
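That's because Set compares values with the SameValueZero rule, so two distinct objects with identical contents are not considered duplicates. A quick sketch of what happens (the sample objects are just for illustration):

const people = [{id: 1}, {id: 1}];

console.log([...new Set(people)].length); // 2 – different references, both survive

const same = {id: 1};
console.log([...new Set([same, same])].length); // 1 – the same reference is deduplicated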

VincentLavello

How does Set() do it?

PatrickStar

The internal implementation of a Set is usually based on a hash table. A hash table is a data structure that converts keys into indexes so the right bucket can be located quickly, enabling fast lookup operations.
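As a rough conceptual sketch of that idea (not how Set is actually implemented internally), a plain object used as the "buckets" gives near-constant-time membership checks while deduplicating:

// conceptual sketch only – the real Set implementation is engine-internal
const buckets = {};                // key -> already seen?
const unique = [];

for (const c of ['A', 'B', 'A', 'C', 'B']) {
    if (!buckets[c]) {             // hashed key lookup, effectively O(1)
        buckets[c] = true;
        unique.push(c);
    }
}

console.log(unique); // ['A', 'B', 'C']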

James Morgan

Don't forget reduce!

chars.reduce((acc, char) => acc.includes(char) ? acc : [...acc, char], []);
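For reference, the same one-liner run against the article's sample array (the console.log is just to show the result):

const chars = ['A', 'B', 'A', 'C', 'B'];

// reduce rebuilds the array, appending each character only if it isn't there yet
const uniqueChars = chars.reduce(
    (acc, char) => acc.includes(char) ? acc : [...acc, char],
    []
);

console.log(uniqueChars); // ['A', 'B', 'C']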

Lars Rye Jeppesen

This is my preferred way. I don't like using Sets.

Phi Byă

For a big array of objects, use reduce and a Map:

[{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 2}].reduce((p, c) => p.set(c.a, c), new Map()).values()
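One small note: .values() returns a MapIterator rather than an array, so if an array is needed it can be spread out. A minimal sketch, reusing the same sample data:

const items = [{a: 1, b: 2}, {a: 2, b: 3}, {a: 1, b: 2}]

// keyed by the `a` property, so a later object with the same key replaces the earlier one
const deduped = [...items.reduce((p, c) => p.set(c.a, c), new Map()).values()]

console.log(deduped) // [{a: 1, b: 2}, {a: 2, b: 3}]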

DevMan • Edited

Complexities are:

1 is O(N)

2 and 3 are O(N^2)

So 1 should always be used imho.

Andrew Hart • Edited

This is great for most situations, using a Map. My personal testing has found that arrays up to a certain length still perform better with reduce, etc. than Maps, but beyond N values (I can't recall the exact amount, and I'm sure it varies with the types), Maps absolutely crush them because the op is O(n), as noted by DevMan. Just thought it was worth noting.

PatrickStar

When deduplicating array elements in Vue, you need to consider whether the array itself is reactive.

function uniqueArray(arr) {
    // Check if it's a Vue.js-like reactive array
    const isVueArray = Array.isArray(arr) && arr.__ob__;

    // Function to filter unique elements in an array
    function filterUnique(arr) {
        return arr.filter((item, index, self) =>
            index === self.findIndex((t) => (
                isVueArray ? JSON.stringify(t) === JSON.stringify(item) : t === item
            ))
        );
    }

    if (Array.isArray(arr)) {
        // For simple arrays or non-reactive object arrays
        if (!isVueArray) {
            return Array.from(new Set(arr)); // Return unique values
        } else {
            // For Vue.js or similar reactive arrays
            return filterUnique(arr.slice()); // Return unique reactive array
        }
    } else {
        // Handling complex object arrays
        if (arr instanceof Object) {
            const keys = Object.keys(arr);
            const filteredKeys = filterUnique(keys);
            const result = {};
            filteredKeys.forEach(key => {
                result[key] = arr[key];
            });
            return result; // Return object with unique keys
        } else {
            return arr; // Return unchanged for non-array/non-object input
        }
    }
}
PatrickStar

Handling duplicate elements in a large dataset with an array involves various strategies, such as chunk processing and stream processing, depending on whether the entire dataset can be loaded into memory at once. Here's a structured approach:

Chunk Processing:

1. Chunk Loading: Load the massive dataset in manageable chunks, such as processing 1000 records at a time, especially useful for file-based or network data retrieval.

2. Local Deduplication with Hashing: Use a hash table (like a Map or a plain object) to locally deduplicate each chunk.

function deduplicateChunk(chunk, seen = new Map()) {
    // `seen` can be shared between chunks so that duplicates which
    // span chunk boundaries are removed as well
    let uniqueChunk = []

    for (let item of chunk) {
        if (!seen.has(item)) {
            seen.set(item, true)
            uniqueChunk.push(item)
        }
    }

    return uniqueChunk
}


3. Merge Deduplicated Results: Combine the deduplicated results from each chunk, sharing the seen map across chunks so duplicates that span chunk boundaries are removed too.

function deduplicateLargeArray(arr, chunkSize) {
    let deduplicatedArray = []
    let seen = new Map() // shared across all chunks

    for (let i = 0; i < arr.length; i += chunkSize) {
        let chunk = arr.slice(i, i + chunkSize)
        let deduplicatedChunk = deduplicateChunk(chunk, seen)
        deduplicatedArray.push(...deduplicatedChunk)
    }

    return deduplicatedArray
}


4. Return Final Deduplicated Array: Return the overall deduplicated array after processing all chunks.
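Putting the pieces together, a usage sketch with the two helpers above (the chunk size of 1000 is just illustrative):

const data = ['A', 'B', 'A', 'C', 'B'] // imagine a much larger dataset here

const result = deduplicateLargeArray(data, 1000)
console.log(result) // ['A', 'B', 'C']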

Considerations:

1. Performance: Chunk processing reduces memory usage and maintains reasonable time complexity for deduplication operations within each chunk.
2. Hash Collisions: In scenarios with extremely large datasets, hash collisions may occur. Consider using more sophisticated hashing techniques or combining with other methods to address this.

Stream Processing:

Stream processing is suitable for real-time data generation or situations where data is accessed via iterators. It avoids loading the entire dataset into memory at once.

Example Pseudocode:

function* deduplicateStream(arr) {
    let seen = new Map()

    for (let item of arr) {
        if (!seen.has(item)) {
            seen.set(item, true)
            yield item
        }
    }
}


This generator function (deduplicateStream) yields deduplicated elements as they are processed, ensuring efficient handling of large-scale data without risking memory overflow.
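For instance, the stream can be consumed lazily in a for...of loop, or collected into an array when that is acceptable memory-wise (the sample data is just the article's array):

const source = ['A', 'B', 'A', 'C', 'B']

// lazy consumption: items are deduplicated as they are pulled
for (const item of deduplicateStream(source)) {
    console.log(item) // 'A', then 'B', then 'C'
}

// or collect everything at once
const unique = [...deduplicateStream(source)]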

In summary, chunk processing and stream processing are two effective methods for deduplicating complex arrays with massive datasets. The choice between these methods depends on the data source and processing requirements, requiring adjustment and optimization based on practical scenarios to ensure desired memory usage and performance outcomes.

JasimAlrawie

Or:

let chars = ['A', 'B', 'A', 'C', 'B'];

let uniqueChars = [];
chars.forEach((e) => {
    if (!(e in chars)) {
        uniqueChars.push(e);
    }
});

console.log(uniqueChars);

Robert Wildling

Shouldn't that be if (!(e in uniqueChars)) { ?

András Tóth

I love the first one, but I wish it accepted an "equality function", as you can't use it to deduplicate an array of objects (usually you would check for IDs being the same and get rid of the duplicates based on that).

Tu Trinh

Thanks for the article. This is useful.

Vedran-Basic

Imagine having an array of objects with n props and needing to remove duplicates that match on only m of those properties, where m < n, keeping the first of the two or more duplicates under the same unique-constraint rule.

That's where the science begins. I would be much more interested in hearing different solutions to this topic.

Morné Louw

Then write your own article and stop trying to sound smart on someone else's post.

Phi Byă • Edited

Simply use reduce and Map:

[{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}].reduce((p, c) => p.set([c.a, c.b].join('|'), c), new Map()).values()

Edit: For selecting the first value, use Map.has before setting the value.
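A sketch of that "keep the first value" variant, reusing the composite key from the example above:

const items = [{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}, {a: 1, b: 2, c: 5}]

const deduped = [...items.reduce((p, c) => {
    const key = [c.a, c.b].join('|')
    // only set the key the first time it is seen, so the first duplicate wins
    return p.has(key) ? p : p.set(key, c)
}, new Map()).values()]

console.log(deduped) // [{a: 1, b: 2, c: 3}, {a: 2, b: 3, c: 4}]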

Luciano Douglas Machado Chagas

I usually do it using Define; it seems to be more performant. Could you show the performance of each of these approaches?

Tommy

What is an efficient way to perform this operation "in place"?

The Great SoluTion

If I were you I would go for the first one, the Set way.

Anton Vryckij

Thank you for the article!

Nahid Faraji

I want to dedupe an array of objects where multiple objects contain the same id, but I want to remove only the first one. How?