This is a repost from my Medium account. I've been planning to move away from Medium for a long time, since they introduced their awful monetization model. Anyway, it's one of my more popular posts, so I hope you enjoy it!
For a while I have been thinking about a simple problem: why can't we use .filter on arrays in JavaScript to get all the unique values in the array? The short answer is, you can! But I went on a short journey down JavaScript lane to find the best way to do this.
Overview
To me, filtering this way feels natural. The purpose of applying .filter to an array is to remove unwanted values, i.e. duplicates. There are plenty of ways of achieving this without using .filter, but these feel both less readable and less elegant. An added benefit of using .filter is that you can also neatly chain off of other array functions (i.e. .map, .reduce, and other .filter calls).
If you don't care to use .filter, there's a neat trick you can do with the ES6 Set class. This essentially converts the array to a Set, which only stores unique values, and then converts it back into an array. It's clean and elegant.
const unique = arr => [...new Set(arr)];
unique(myArray)
Version 1
After a bit of tinkering, the first version I came up with was this...
const unique = (elem, index, array) => {
  for (var i = 0; i < index; i++) {
    if (array[i] === elem) return false;
  }
  return true;
};
myArray.filter(unique);
Try it out in runkit.
I was initially pretty happy with this approach.
- It works for strings and numbers (primitives). I wasn’t trying to make this work for deep equality comparisons.
- It only looks at the array elements that precede the current element, so the first instance of a value will return true and any subsequent instances will return false. It doesn't need to look at the entire array.
- It returns false as soon as a duplicate is found (it doesn't keep checking the remainder of the array).
What could be better?
After a while, I realized that while Version 1 worked pretty well, it could be better. Consider this example: an array with 1000 items.
const myArray = [<...998 items that aren't 1>, 1, 1];
When we get to the last item, it will start looking at the items that precede it (999 items) to see if the same element already exists. It starts at array position [0] and will keep checking every item until it reaches the second-to-last 1 at array position [998].
But this feels unnecessary: we already found a 1, so we know any subsequent 1's cannot be unique. It would be great if we could somehow cache each unique value we find so as to not need to recheck the entire array for repeated values.
I then started thinking about creating my own cache object. It's not hard to store values in a cache and provide a function to check whether they already exist, but this felt like reinventing the wheel: it sounds exactly like what Set does.
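For illustration, a hand-rolled cache along those lines might have looked roughly like this (my own sketch of the abandoned idea, not one of the versions below; uniqueWithCache is a made-up name):
const uniqueWithCache = () => {
  const seen = {}; // plain object used as the cache
  return elem => {
    const key = typeof elem + ':' + elem; // crude key so that 1 and '1' don't collide
    if (seen[key]) return false;          // we already kept this value, exclude it
    seen[key] = true;                     // remember it for later iterations
    return true;                          // first occurrence, keep it
  };
};
myArray.filter(uniqueWithCache());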
Version 2
This then got me thinking again about Set() and its elegance, and I started to wonder if I could somehow apply it to a .filter.
One of the challenges with writing a function to use with .filter is that you are limited to the arguments filter provides: the current element, the index, and the initial array. It does not, unfortunately, provide you with the array being constructed for the result as you iterate (like the "accumulator" you have access to in .reduce). If it did, this would be an easier problem to solve.
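To make that concrete, here is roughly what a .reduce-based version looks like, where the accumulator gives you the result array built so far (a sketch of my own, not one of the post's versions):
const uniqueViaReduce = arr =>
  arr.reduce((acc, elem) => {
    // acc is the result array built so far, so we can check it directly
    if (!acc.includes(elem)) acc.push(elem);
    return acc;
  }, []);
uniqueViaReduce(myArray);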
After some hacking, this is what I came up with…
const unique = () => {
  let cache;
  return (elem, index, array) => {
    if (!cache) cache = new Set(array);
    return cache.delete(elem);
  };
};
myArray.filter(unique());
Try it out on runkit.
This solution first generates a cache of the entire array using Set. Then, as it iterates, it checks each value against the cache using the .delete method. .delete will remove the element from the cache and return true if it can be found, or return false if it cannot.
This means the first time a value is encountered it will return true (a successful deletion), since every element exists once in the cache, and the value is therefore included in the result. But on subsequent checks for the same value (i.e. duplicates) it will return false (the deletion fails), since that value has already been deleted, thus excluding it from the result.
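A quick illustration of that .delete behavior, with some throwaway values:
const cache = new Set([1, 2, 2, 3]); // the Set stores only 1, 2, 3 (the duplicate 2 collapses)
cache.delete(2); // true: 2 was in the cache and has now been removed
cache.delete(2); // false: 2 is no longer in the cache, so the deletion fails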
In terms of performance, this solution is great, because we find all the unique values up front, and then each iteration is purely a single check against the list of unique values. There is no need to check the same values repeatedly.
The one caveat is that this solution requires a cache to be generated and shared across each call, and because of this it requires a closure: .filter(unique()) rather than .filter(unique). But I think this is a fair tradeoff for a performant and clean solution.
The Internet’s Solution
After I published this article, I discovered a pretty neat solution on the internet. It's a one-liner, which makes it all the more beautiful: it keeps an element only when its index matches the index of its first occurrence in the array.
const unique = (x, i, a) => a.indexOf(x) == i;
myArray.filter(unique);
So I started thinking: maybe I wasted my time?
Performance
I believe performance metrics may have changed since I first posted this article, so please check JSBench for yourself before deciding on a solution.
The final piece of the puzzle was which solution is the most performant. You can see the results for yourself on JSBench.
Honestly, they're all pretty close. Version 2 is faster than Version 1, which is faster than "The Internet's Solution". That said, the differences are small, and it likely takes a very large data set to make my solution (Version 2) worthwhile. So if this isn't something you need, I'd probably just go with the one-liner.
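If you would rather sanity-check this locally instead of on JSBench, a rough console.time harness along these lines works (my sketch; the numbers will vary by engine and data shape):
const data = Array.from({ length: 100000 }, () => Math.floor(Math.random() * 1000));

console.time('Set spread');
[...new Set(data)];
console.timeEnd('Set spread');

console.time('indexOf one-liner');
data.filter((x, i, a) => a.indexOf(x) === i);
console.timeEnd('indexOf one-liner');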
Wrap up
So that's pretty much it. You can try out my solutions on runkit or copy and paste them into your project. I also created an npm package called youneek if you'd prefer that. If you like this article / solution, the best thanks you can give me is a star on GitHub.
Appendix
What if I do want to filter objects using deep equality? That’s beyond the scope of what I was trying to achieve, but you could probably use the first solution (Version 1) with something like fast-deep-equal to do the comparison.
const isEqual = require('fast-deep-equal');
const deepUnique = (elem, index, array) => {
  for (var i = 0; i < index; i++) {
    if (isEqual(array[i], elem)) return false;
  }
  return true;
};
myArray.filter(deepUnique);