DEV Community

Handling Array Duplicates Can Be Tricky

Milos Protic on May 11, 2019

Originally posted on my-blog. Check it out for more up-to-date content. Let's begin by defining a simple array: const cars = [ 'Mazda', ...
 
Mateus Amorim

there is also this:

const arrayWithoutDuplicates = Array.from(new Set(arrayWithDuplicates))

Milos Protic

Same question @Eshun Sharma :) This is the reason why the title says "tricky"

Eshun Sharma

I have tried Set to find unique values in a simple array and it works great, but would it work for a nested array, an array of objects, or a multidimensional array?

Mateus Amorim • Edited

Since arrays and objects are reference types, it may work in some cases, e.g.:

let x = {a: 1};
let y = [];

y.push(x);
y.push(x);

console.log(Array.from(new Set(y))); // will output [{a: 1}]

but if you have an array of distinct objects with the same structure, it won't work:

console.log(Array.from(new Set([{a: 1}, {a: 1}]))) // will output [{a: 1}, {a: 1}]

If you want to compare structures, it mostly depends on the kind of array you have. For example, an array of database documents may be filtered based on _id or name:

let withoutDuplicates = [];

withDuplicates.forEach(doc => {
  if (!withoutDuplicates.find(e => e.name === doc.name)) withoutDuplicates.push(doc);
});

In most cases the array items will have a predetermined format, so it's very rare to need a multi-purpose duplicate remover; at least I haven't needed one so far. A generic remover may also do checks that aren't really necessary for what you want to do, which may be bad for performance, although it may help speed up development.

Eshun Sharma

This is interesting: passing objects with the same reference vs. passing equal objects with different references.

Thanks for the explanation :)

Adrian B.G.

By visiting all the keys of each element in the source, you are making the algorithm O(nm). Also, if the arrays are long enough, it will miss the CPU cache for each source item. It's basically an inefficient approach.

An alternative is to consume more memory and keep the unique elements in a map, going through the source only once. This way the checks are O(1).

If the elements are objects the memory overhead will be small as you store references.
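
A minimal sketch of that single-pass idea, assuming each element can be reduced to a primitive key such as a name. The `uniqueBy` helper and the sample `docs` array are hypothetical, made up for illustration:

```javascript
// Hypothetical helper: dedupe objects by a caller-supplied key in a single pass.
function uniqueBy(items, keyOf) {
  const seen = new Map(); // key -> first item seen with that key
  for (const item of items) {
    const key = keyOf(item);
    // Map lookups are O(1) on average, so the whole pass stays linear.
    if (!seen.has(key)) seen.set(key, item);
  }
  // The Map stores references, not copies, and keeps insertion order.
  return Array.from(seen.values());
}

// Sample data is made up for illustration.
const docs = [
  { _id: 1, name: 'Mazda' },
  { _id: 2, name: 'Audi' },
  { _id: 3, name: 'Mazda' },
];

const uniqueDocs = uniqueBy(docs, doc => doc.name);
console.log(uniqueDocs); // [{ _id: 1, name: 'Mazda' }, { _id: 2, name: 'Audi' }]
```

The first occurrence of each key wins here; keeping the last one instead would just mean calling `seen.set` unconditionally.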

JimboTSoV

I'd just use JSON.stringify(obj) when comparing objects. Of course it all depends on the things you're trying to compare, but that is really the simplest way if you have a structured data set. Perhaps a performance expert can tell us if this method is expensive though!
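
Roughly, that could look like the sketch below. It assumes the items are plain JSON-serializable data whose keys are always written in the same order; `uniqueByJson` is a hypothetical name, not something from the article:

```javascript
// Hypothetical helper: treat two items as duplicates when their JSON text matches.
function uniqueByJson(items) {
  const seen = new Set(); // serialized forms encountered so far
  return items.filter(item => {
    const key = JSON.stringify(item);
    if (seen.has(key)) return false; // same JSON text seen before: drop it
    seen.add(key);
    return true;
  });
}

const uniqueItems = uniqueByJson([{ a: 1 }, { a: 1 }, { a: 2 }]);
console.log(uniqueItems); // [{ a: 1 }, { a: 2 }]
```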

Guillaume Martigny

I'm no expert, but you will lose some items in your array (everything non-serializable, like functions), it will fail on non-UTF-8 encoded strings, it throws on circular references, and order will matter.

Overall, JSON is not a great way to compare complex objects or arrays.
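
Three of those points (key order, dropped functions, circular references) are easy to check with a short snippet. The objects are made up for illustration:

```javascript
// Key order changes the serialized text, so structurally equal objects differ.
const sameDespiteOrder =
  JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 });
console.log(sameDespiteOrder); // false

// Function-valued properties are dropped, so different objects can look equal.
const sameIgnoringFunctions =
  JSON.stringify({ a: 1, run() {} }) === JSON.stringify({ a: 1 });
console.log(sameIgnoringFunctions); // true

// Circular references make JSON.stringify throw a TypeError.
const node = { name: 'loop' };
node.self = node;
let threwTypeError = false;
try {
  JSON.stringify(node);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError); // true
```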

Jonathan Schneider

Agreed, it's not great in most cases. However, sometimes it is just practical from a dev productivity point of view:

  • writing your own data-structure comparison will introduce more sources of error, especially if you need to refactor. So writing your own is only economical for frequently used code
  • order: using service workers or persisting to localStorage means your data structure will "exit" the JavaScript VM's context. Then you definitely shouldn't use it. If you always generate it the same way in the same runtime context, you're safe(r)
  • it can have better performance than other comparison algorithms, especially on smaller datasets
  • Sometimes you don't want to have functions in your objects, by design. E.g. when using redux. Here it's also safe(r)

If you use it, wrap it in your own compareXYZ() util-function, so you can
a) adjust it later
b) see that function pop up in your profiler
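
A small sketch of such a wrapper, assuming plain data objects without functions. `compareCars`, `removeDuplicateCars`, and the sample array are hypothetical names chosen for illustration:

```javascript
// Hypothetical wrapper: the one place that decides how two items are compared.
// Swapping JSON.stringify for another strategy later only touches this function,
// and because it is a named function it is easy to spot in a profiler.
function compareCars(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

function removeDuplicateCars(cars) {
  const unique = [];
  for (const car of cars) {
    // Keep the car only if nothing already kept compares equal to it.
    if (!unique.some(kept => compareCars(kept, car))) unique.push(car);
  }
  return unique;
}

// Sample data is made up for illustration.
const sampleCars = [{ make: 'Mazda' }, { make: 'Audi' }, { make: 'Mazda' }];
const dedupedCars = removeDuplicateCars(sampleCars);
console.log(dedupedCars); // [{ make: 'Mazda' }, { make: 'Audi' }]
```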

Youth

There's another article that discusses the equality of two JS objects: One language uses the concept of object equivalence to compare data structures, while the other checks explicitly for object identity. dev.to/annarankin/equality-of-data...

Ben Sinclair

I find this quite difficult to read with the reuse of names like resultItem and the double negation of things like !notFound. I think the first example is better (the one without the reduce()), because it's more readable.

Aren't you deferring the problem, though? You've moved the comparison to checking the properties of an object for equality, but if those properties are also objects... you're back to square one. So you'd need to recurse, and do a deep comparison, which is expensive and has to have some compromises of its own (like picking a max depth or facing what to do if there's recursion in the object itself).
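
To make the compromise concrete, here is a rough sketch of a recursive comparison with a depth cap. It is an illustration only: `deepEqual` and its `maxDepth` default are hypothetical, and it only handles plain objects and arrays (no Date, Map, Set, and so on):

```javascript
// Hypothetical deep comparison for plain objects and arrays.
// The depth cap is the compromise: past it, values are reported as not equal,
// which also guarantees termination on self-referencing objects.
function deepEqual(a, b, maxDepth = 10) {
  if (a === b) return true; // same reference or same primitive value
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false; // differing primitives, or only one side is an object
  }
  if (maxDepth <= 0) return false; // give up rather than recurse further
  if (Array.isArray(a) !== Array.isArray(b)) return false;

  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;

  return keysA.every(
    key =>
      Object.prototype.hasOwnProperty.call(b, key) &&
      deepEqual(a[key], b[key], maxDepth - 1)
  );
}

console.log(deepEqual({ a: { b: [1, 2] } }, { a: { b: [1, 2] } })); // true
console.log(deepEqual({ a: { b: 1 } }, { a: { b: 2 } })); // false
```

Every nested level costs another pass over the keys, which is where the expense mentioned above comes from.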

Milos Protic

Yes, I see your point about the double negation. I admit it should be done the other way around to improve readability, especially since this post was written to show what is going on under the hood while finding duplicates.

About the recursion, the assumption for the given example was that we have only one level. As you said, a deep comparison is a thing of its own which wasn't the focus here.

If you are interested, take a look at the same post on devinduct.com and see Scott Sauyet's comment. It's quite an interesting way to do the same thing.

Enrique Moreno Tent

My solution was to use _.isEqual from "lodash"

lodash.com/docs/4.17.11#isEqual

Wiltel492019 • Edited

Wiltel49#2019 Updated!!! Great article about Arrays Both Unique and Duplicate copy. Well done!!!

Milos Protic

Thank you!

Wiltel492019

U are welcome my Brother!!! Gitter done. Wiltel49#2019 AAS ITI MICHIGAN!!!

KristijanFištrek

Damn, if I had this article when I was struggling with arrays my motivation would have no end 😁 really great!

Milos Protic

Thanks mate 👍

Adam Crockett 🌀

Or just use a Set and Array.from?

Milos Protic

The "tricky" part is related to object duplicates. I'm not sure this approach would work with an array populated with non-primitive values.