Handling Array Duplicates Can Be Tricky

Milos Protic on May 11, 2019

Originally posted on my-blog. Check it out for more up-to-date content. Let's begin by defining a simple array: const cars = [ 'Mazda', ... [Read Full]
 

there is also this:

const arrayWithoutDuplicates = Array.from(new Set(arrayWithDuplicates))
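For example, with a plain array of primitives (the `cars` array here just echoes the article's opening example):

```javascript
const cars = ['Mazda', 'Ford', 'Mazda', 'Renault', 'Ford'];

// A Set keeps only unique values; Array.from turns it back into an array
const arrayWithoutDuplicates = Array.from(new Set(cars));

console.log(arrayWithoutDuplicates); // ['Mazda', 'Ford', 'Renault']
```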

 

Same question, @Eshun Sharma :) This is the reason the title says "tricky".

 

This way looks great, but it's slow on large data sets. I prefer the for loop.
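A minimal sketch of a for-loop version (the simple indexOf check shown here is O(n²), so it isn't automatically faster than a Set; the function name is illustrative):

```javascript
function dedupe(items) {
  const result = [];
  for (let i = 0; i < items.length; i++) {
    // keep the item only the first time we see it
    if (result.indexOf(items[i]) === -1) {
      result.push(items[i]);
    }
  }
  return result;
}

console.log(dedupe([1, 2, 2, 3, 1])); // [1, 2, 3]
```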

 

I have tried Set to find uniques in a simple array and it works great, but would this work if it's a nested array, an array of objects, or a multi-dimensional array?

 

Since arrays and objects are reference types, it may work in some cases, e.g.:

let x = {a: 1};
let y = [];

y.push(x);
y.push(x);

console.log(Array.from(new Set(y))); // will output [{a: 1}]

but if you want an array of objects with the same structure, it won't work:

console.log(Array.from(new Set([{a: 1}, {a: 1}]))) // will output [{a: 1}, {a: 1}]

If you want to compare structures, it will mostly depend on the kind of array you have; for example, an array of database documents may be filtered based on _id or name:

let withoutDuplicates = [];

withDuplicates.forEach(doc => {
  if (!withoutDuplicates.find(e => e.name === doc.name)) {
    withoutDuplicates.push(doc);
  }
});

In most cases the array's items will have a predetermined format; it's very rare to need a multi-purpose duplicate remover (at least I haven't needed one so far). A generic one may also do checks that aren't necessary for your case, which can hurt performance, though it may help speed up development.

This is interesting: passing objects with the same reference vs. passing identical objects with different references.

Thanks for the explanation :)

 

By visiting all the keys for each element in the source you are making an algorithm of O(nm). Also, if the arrays are long enough, it will miss the CPU cache for each source item. It's an inefficient way, basically.

An alternative is to consume more memory and keep the unique elements in a map, going through the source only once. This way each check is O(1).
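A sketch of that single-pass approach, keyed here on a hypothetical name field (the function and field names are illustrative):

```javascript
function dedupeBy(docs, key) {
  const seen = new Map(); // key value -> first doc with that value
  for (const doc of docs) {
    if (!seen.has(doc[key])) {
      seen.set(doc[key], doc); // O(1) lookup and insert
    }
  }
  return Array.from(seen.values());
}

const docs = [{ name: 'a' }, { name: 'b' }, { name: 'a' }];
console.log(dedupeBy(docs, 'name')); // [{ name: 'a' }, { name: 'b' }]
```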

If the elements are objects, the memory overhead will be small, since you only store references.

 

There's another article discussing the equality of two JS objects: one language uses the concept of object equivalence to compare data structures, while the other checks explicitly for object identity. dev.to/annarankin/equality-of-data...

 

I'd just use JSON.stringify(obj) when comparing objects. Of course, it all depends on the things you're trying to compare, but it's really the simplest way if you have a structured data set. Perhaps a performance expert can tell us whether this method is expensive, though!
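A sketch of that idea, using the serialized form as a dedup key (this assumes plain, serializable data; the function name is illustrative):

```javascript
function dedupeByJson(items) {
  const seen = new Set();
  return items.filter(item => {
    const key = JSON.stringify(item); // structural key, not reference identity
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

console.log(dedupeByJson([{ a: 1 }, { a: 1 }, { a: 2 }])); // [{ a: 1 }, { a: 2 }]
```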

 

I'm no expert, but you will lose some items from your array (everything non-serializable, like functions), it will fail on non-UTF-8 encoded strings, it throws on circular references, and key order matters.

Overall, JSON is not a great way to compare complex objects or arrays.
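For example, key order alone breaks the comparison:

```javascript
const a = { x: 1, y: 2 };
const b = { y: 2, x: 1 }; // same data, different key order

// JSON.stringify serializes keys in insertion order, so the strings differ
console.log(JSON.stringify(a) === JSON.stringify(b)); // false
```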

 

Wiltel49#2019 Updated!!! Great article about arrays, both unique and duplicate copies. Well done!!!

 
 
 

Damn, if I had this article when I was struggling with arrays, my motivation would have no end 😁 Really great!

 
 
 

The "tricky" part is related to object duplicates. I'm not sure this approach would work with an array populated with non-primitive values.
