Originally posted on Will's blog
In a previous post we saw how to determine whether a JavaScript array contains duplicate values. Today, I want to show a few different methods I've found for removing duplicate values from an array.
Using the Array.prototype.filter()
& Array.prototype.indexOf()
methods
let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]
let uniqueArray = originalArray.filter((item, index, array) => {
return array.indexOf(item) === index
})
// uniqueArray === [1, 2, 3, 4]
The basic strategy here is to iterate through originalArray
and check to see if the index of the item we are currently examining is the same as the index of the item in the originalArray
.
Because indexOf
returns the first index that it finds for a given value, if it isn't a duplicate value then the index for that item must be the same!
Note that this method is not the most efficient: it executes in quadratic time. So if the arrays you're checking are very large in size, you may want to use a different method.
Another thing worth nothing is that we can use the same method to return only the duplicate values by inverting the comparison:
let duplicateArray = originalArray.filter((item, index, array) => {
return array.indexOf(item) !== index
})
Using Array.prototype.reduce()
& Array.prototype.includes()
let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]
let uniqueArray = originalArray.reduce((unique, item) => {
unique.includes(item) ? unique : [...unique, item]
}, [])
// uniqueArray === [1, 2, 3, 4]
Here the strategy is to keep a running list of the unique items in our reducer function's 'accumulator' (unique
). For each item in originalArray
we check to see if the accumulator includes the item under consideration.
- If it does contain the item, return the accumulator without making any changes, in effect 'skipping over' that item.
- If it does not contain the item, spread the values in the accumulator into a new array, and add the item under consideration.
Array.prototype.includes
returns a boolean value -- true
if the value is found in the array, false
if not. This boolean value drives our conditional, determining what to do with each value.
I find this approach less intuitive and harder to read, but it works.
Also note that the empty array that is passed in after the reducer function is the starting value for the accumulator, so the first pass through the reduce
, unique
is an empty array.
⚡ Using the ES6 Set
object ⚡
let originalArray = [1, 2, 3, 4, 1, 2, 3, 4]
let uniqueArray = array => [...new Set(array)]
// or
let uniqueArray = Array.from(new Set(originalArray))
// uniqueArray = [1, 2, 3, 4]
This approach harnesses the power of the Set
object, introduced in ES6.
Sets are guaranteed to preserve the order of the inserted items, and to only contain unique values. Therefore it is by definition impossible for a set to contain duplicates!
Here we call the Set
object's constructor, passing it the array we'd like to construct a Set
from. Then, once we've trimmed out all the duplicates and stored the remaining values in our Set
, we convert back to an array and return the result.
I've seen some discussion of this approach being a bit less performant if the array under consideration is very large and contains many duplicate values. However, the same discussion found that this approach is very efficient in a scenario where the data has very few duplicates.
Personally I think the conciseness of this last approach is enough of a benefit to warrant using the Set
object approach, unless there's a compelling performance reason not to.
Top comments (18)
Surely Set must be faster - the whole thing is an O(N) operation with Set, but it's getting towards
O(N**2)
with anything that has to scan likeincludes
orfilter
... (The reduce version is better thanO(N**2)
as it is only having to test the currently created unique list)I decided to see just how different it is. In this example on jsperf, you can see that the "reduce" is around 180x slower and the indexOf even worse at almost 380x slower. Also the version of "reduce" I've used is slightly faster than your one shown, because it doesn't make an array on each loop and just modifies the one being created.
Thanks for throwing together that jsperf test Mike!
Set does appear to be much faster, which is fantastic because the syntax is so much easier to grok (for me, at least).
Yes, nice that good syntax has great performance haha! Not a normal happy thing that I find! (Like the immutability is a nicer syntax but comes with a penalty).
The approach below is O(n) and should work without ES6. The issue with some of the approaches above is they are using Array.indexOf or Array.includes, which is basically another loop through the array. You basically take a performance hit when the searched values are located near the end of the array. The only downside to the method below is that it's memory intensive for large arrays. You're essentially giving up memory for performance.
A similar function using
Set
isThese have the main advantage over
Array.from(new Set(inputArray))
in that there's no need to iterate the Set after filling it with values, but I've done a little profiling and my snippet has pretty much the same characteristics and timing asArray.from(new Set(inputArray))
(just a little bit slower), even for very large arrays.tbh, I think the big win is that they aren't leaning on JS Set's latent insertion-order stability, making the code more WYSIWYG, and more portable since in many other C-style languages Sets don't have that guarantee.
Thanks Evan! That's also a great way to approach the problem.
Awesome article. First i am seeing sets. Also glad to read comment below that its runtime is better than expected because to me it is more readable. Also i think allot of the array prototype functions can make the code harder to read in the long run. Assuming you know a bit of background on sets, that last example is so concise!
That being said, could you explain how the ... spread operator works with it? Trying to get my head around that at the moment.
Thanks James!
Because the
Set
object implements the iterator property, we can use the ES6 spread syntax to cast the values from theSet
into an array.It really is doing the same thing as the second part of that example, where I called
Array.from()
.Thankyou that makes sense. Spread operator has been a tough one to learn for me. I always feel like i kind if have an idea of what it is doing in one situation but struggle to use it myself.
Sets don't come with the same prototype functions as arrays. They have a
forEach
method, but nomap
orfilter
, for example (I think). The spread operator and square braces there are simply extracting the values from the set and turning them into an array.Set works great when you are dealing with primitive values. But if you are working with some list of items (e.g. list of comment authors in this post), you're better off using a key mapping approach:
Is there any way to count how many times the duplicate object(s) appears in the array before eliminating said duplicates using this method? Asking for a noob trying to solve an annoying kata. :)
Reference link
What is quadratic time?
Use Set:
developer.mozilla.org/en-US/docs/W...
too much coolness in one function
stackblitz.com/edit/typescript-5q3ktr
O (N) time
let originalArray = [1, 2, 3, 4, 1, 2, 3, 4];
for(var i=0,obj={};i<originalArray.length;i++){
obj[originalArray[i]]=null
}
let noDuplicates=Object.keys(obj)
Same using .reduce()...