Detecting unique arrays in Javascript

#javascript #algorithms #arrays #performance

When Javascript arrays contain primitive values (strings, numbers, undefined, null, booleans and Symbols), there might be cases in which you're willing to detect if the array contains any duplicated elements. in other words, you would want to determine if elements in the array are unique.

There are several approaches you can take to achieve this. let's take a closer look at our options.

Approach 1: Nested loops

In this approach, we will traverse the array, starting from the first element and for each element, we will compare this element to all the other elements to see if there is a match. to achieve this, we will use two for loops, nested into each other.

function isUnique(arr) {
  const len = arr.length;
  for (let i = 0; i < len; i++) {
    for (let j = 0; j < len; j++) {
      // if the elements match, this wouldn't be a unique array
      if (i !== j && arr[i] === arr[j]) {
        return false;
      }
    }
  }
  return true;
}

Although this approach works quite fine with small and semi-small datasets, as the input dataset grows, it gets slower and slower. The slowness of this approach is because of the nested loop. Imagine a dataset of a million numbers. in this dataset, in the worst case, our duplicated element could be the last element in the array and therefore, we would need to compare a million numbers to a million numbers (1 million * 1 million), which is quite slow.

https://jsfiddle.net/farskid/bquo7k8x/12/

Approach 2: Single loop with cached values

In this approach, instead of comparing each element to every other element, we will keep track of the elements we visit and weren't a match for a duplicated element. in other words, we cache what we traverse and just look them up for the next element to check if we've already visited such an element. Because of this visited reference, we only need to compare every element in the array to this reference and therefore, we have to traverse this array only once.

function isUnique(arr) {
  const seenValues = {}

  for (let i = 0; i < arr.length; i++) {
    // we already saw this element in the array
    if (seenValues[arr[i]]) {
      return false;
    } else {
      seenValues[arr[i]] = true
    }
  }

  return true;
}

in the worst case of a million numbers in a dataset, our duplicated element will be the last element but in this approach, we only compare 1 million times. This approach is significantly faster than approach 1. .

https://jsfiddle.net/farskid/zky1mdug/18/

Approach 3: using ES6 `set`

When ES6 came around, we were introduced to a new data structure in Javascript called Sets. Sets are collection of elements that are unique by definition, meaning that if you try to insert a duplicated element into a set, it won't have any effects.

Due to Sets being a collection of unique elements by definition, there is a technique to convert arrays into sets which in turn, results in a unique collection of items in that array, now stored into the set. then a reverse operation will be used to convert that Set back to an array.

In a sense, you could say, Set is used as an intermediate data structure to remove duplicated elements from the array.

Array -> Set -> Array

// convert an array to a set and convert back
function getUniqueArray(arr) {
  return [...new Set(arr)]
}

function isUnique(arr) {
  return getUniqueArray(arr).length === arr.length
}

in this approach, if the number of elements inside the unique array (converted back from Set) is the same as the input array length, it means this array has already been containing unique values and no duplicated values were removed from it to alter the length.

Note: You don't need to convert a Set back to array if you just want to check for uniqueness. You can skip this part of operation totally by checking Set.prototype.size.

// convert an array to a set
function arrayToSet(arr) {
  return new Set(arr)
}

function isUnique(arr) {
  return arrayToSet(arr).size === arr.length
}

Performance comparison

Using any of these 3 approaches interchangeably is fine as long as your dataset is relatively small. for larger datasets, you need to keep an eye on performance of these approaches and how many operations they could execute in a limited duration.

The short answer for performance comparison between these 3 is:

Approach 2 > Approach 3 > Approach 1.

Approach 2 (using single loop with cached values) is significantly faster than the rest. between approach 3 (Set) and approach 1 (Nested loops), approach 3 is also much faster.

To have a better understanding of these performance comparisons, take a look at this benchmark:

https://esbench.com/bench/5e0273c1170166009e5470f7

Side note for whoever is curious

Approach 1 (using nested loops) is of quadratic complexity, meaning that it will result in O(n^2) Time complexity.

Approach 2 (using single loop and cached values) is of linear complexity, meaning that it will result in O(n) Time complexity.

For approach 3, I won't have a strong opinion as I'm not fully aware of how Sets are being implemented in Javascript engines under the hood.

Conclusion for the impatient

Do not pre-optimize for a problem you don't have. Performance optimizations make sense only when you have a large dataset to bring slowness onto surface. for relatively small datasets, it won't matter which approach you take as all will behave fast enough. for larger datasets, always lean towards using approach 2 as benchmarks show it's significantly faster.

Top comments (3)

Stoyan Peshev • Dec 25 '19

I would also consider how those three approaches handle different type values.

For example Approach 2 will not handle objects well (objects as computed properties), and will throw if you pass something that can't be converted to a string, i.e. Object.create(null). Approach 1 will fail with NaN values, it will falsely report [NaN, NaN] as unique.

Approach 3 is the one I would use, or a mix of Approach 2 and 3 (using a set, or preferably a weak set as a caching tool).