DEV Community

Andrew Durber

In search of a faster uniqueBy

In a project I'm working on I'm trying to keep it as lean as possible, which means I haven't reached for libraries like Lodash. Instead, I've challenged myself to hand-roll everything I need.

I needed to get an array of items unique by a given key, just like the Lodash uniqBy. I had a quick Google around to see how other folks are approaching it.

I came across the following approach:

function uniqueBy(myArr, prop) {
  // Iterate over the array and filter out duplicates
  return myArr.filter((obj, pos, arr) => {
    // Map over the array and get the values from the key.
    return arr.map(mapObj => mapObj[prop]).indexOf(obj[prop]) === pos
  })
}
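To pin down the expected behaviour, here's that function run on some made-up sample data (the values here are mine, purely for illustration):

```javascript
// The filter/map approach from above, applied to hypothetical sample data.
function uniqueBy(myArr, prop) {
  return myArr.filter((obj, pos, arr) => {
    return arr.map(mapObj => mapObj[prop]).indexOf(obj[prop]) === pos
  })
}

const users = [
  { id: 1, name: 'Ann' },
  { id: 2, name: 'Bob' },
  { id: 1, name: 'Annie' }, // duplicate id – should be dropped
]

console.log(uniqueBy(users, 'id'))
// → [ { id: 1, name: 'Ann' }, { id: 2, name: 'Bob' } ]
```

Note that it keeps the first occurrence of each key, which is the behaviour I want throughout.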

While this works, I was not too fond of mapping inside the filter. So I set up some tests around my function and started creating benchmarks on jsPerf.

So anyway I started benchmarking

With an array of 10,000 items, this had a whopping 0.63 ops/sec. Zoinks.
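The numbers in this post come from jsPerf, but a rough local sanity check looks something like this (a sketch with made-up data, since the original test data isn't shown; absolute timings depend entirely on your machine and engine, so don't expect them to line up with the ops/sec figures):

```javascript
// Rough local timing sketch for the filter/map version.
function uniqueBy(myArr, prop) {
  return myArr.filter((obj, pos, arr) => {
    return arr.map(mapObj => mapObj[prop]).indexOf(obj[prop]) === pos
  })
}

// 10,000 made-up items with 100 distinct keys.
const data = Array.from({ length: 10000 }, (_, i) => ({ id: i % 100 }))

const start = performance.now()
const result = uniqueBy(data, 'id')
const elapsed = performance.now() - start

console.log(`${result.length} unique items in ${elapsed.toFixed(1)}ms`)
```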

Note

I just want to clarify, I'm in no way trying to criticise the author of the above function. Most people can use that and not notice any negative performance impact.

Iteration 1

So I thought, what if I moved the map outside of the filter?

function uniqueBy(myArr, prop) {
  // Get all values for the prop up front
  const vals = myArr.map(obj => obj[prop])
  return myArr.filter((obj, pos, arr) => {
    return vals.indexOf(obj[prop]) === pos
  })
}

Result: 3,067 ops/sec
Extracting the map outside of the filter had much better results, relatively speaking.

Iteration 2

Keeping the same vibe, I moved onto Array.prototype.findIndex

function uniqueBy(arr, prop) {
  return arr.filter((record, index, self) => {
    // Instead of getting the values, just get the index where the predicate is true.
    return index === self.findIndex(t => t[prop] === record[prop])
  })
}

Result: 6,962 ops/sec

But this is much of a muchness; it still makes multiple passes over the array. Time to whip out the trusty old loops without predicates.

Iteration 3

function uniqueBy(arr, prop) {
  const len = arr.length // get the length up front to ensure it's only accessed once

  const data = [] // This will be our return data

  const seen = [] // This is a collection of values we've already seen

  for (let i = 0; i < len; i++) {
    // Get the things I care about here to only access the properties once.
    const item = arr[i] // The current array item

    const val = item[prop] // The current items' value that we want to unique by

    // If there's no record of this in "seen", push it to seen and add it to our return array
    // What's with the tilde? indexOf returns -1 when nothing is found, and ~(-1) is 0, which is falsy; any found index comes out truthy. It's the bitwise NOT operator. Link at the bottom.
    if (!~seen.indexOf(val)) {
      // Mark this value as seen
      seen.push(val)
      // Add the value to the return array
      data.push(item)
    }
  }

  return data
}

Result: 15,196 ops/sec swoon

So we managed to get rid of the predicate callbacks, our tests still pass, and it's faster. Now we're getting somewhere.
It's somewhat less readable than previous iterations, but that's not my goal. We could stop here, but I think we can squeeze some more out of this.
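Before moving on, a quick aside on the tilde trick: `~n` is `-(n + 1)`, so `~(-1)` is `0` (the only falsy result), while any real index comes out truthy:

```javascript
// ~n === -(n + 1), so only indexOf's "not found" value of -1 becomes falsy.
console.log(~-1) // 0
console.log(~0)  // -1
console.log(~2)  // -3

const seen = ['a', 'b']
console.log(!~seen.indexOf('a')) // false – 'a' was found at index 0
console.log(!~seen.indexOf('z')) // true  – 'z' was not found
```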

Iteration 4

What if we use a Set? They're pretty nifty, right?

function uniqueBy(arr, prop) {
  const len = arr.length
  const data = []
  const seen = new Set() // Create a Set
  for (let i = 0; i < len; i++) {
    const item = arr[i]
    const val = item[prop]
    if (!seen.has(val)) {
      // Check if the set has the value
      seen.add(val)
      data.push(item) // push the cached item rather than re-reading arr[i]
    }
  }

  return data
}

Result: 11,133 ops/sec

Wait a minute! That's slower than the previous one. Wha-, ugh-, but it's nifty! Ah well, on we go.

Iteration 5

After perusing some benchmarks on loops, I saw that a while loop vastly outperformed a for loop.

function uniqueBy(arr, prop) {
  const len = arr.length
  const record = []
  const seen = []
  let cursor = 0
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
    cursor++
  }
  return record
}

Result: 15,164 ops/sec

Boom! The while loop makes this our fastest one yet, but it's even less readable.

Iteration 6

Hmm, from the loop benchmarks, decrementing is faster than incrementing, how does that look?

function uniqueBy(arr, prop) {
  let len = arr.length
  const record = []
  const seen = []
  while (len--) {
    const item = arr[len]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}

Result: 15,535 ops/sec

CAVEAT: We've lost the original order of the array.

These are marginal gains over the previous iteration.
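To make the caveat concrete: walking the array backwards means we keep the last occurrence of each key, and the results come out in reverse order (sample data is made up):

```javascript
// The decrementing version from above, on hypothetical sample data.
function uniqueBy(arr, prop) {
  let len = arr.length
  const record = []
  const seen = []
  while (len--) {
    const item = arr[len]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}

const input = [
  { id: 1, tag: 'first' },
  { id: 2, tag: 'second' },
  { id: 1, tag: 'third' },
]

console.log(uniqueBy(input, 'id'))
// → [ { id: 1, tag: 'third' }, { id: 2, tag: 'second' } ]
```

If order matters, stick with a forward loop; reversing the result afterwards still changes which duplicate wins.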

Iteration 7

If there's one thing I know about JavaScript, it's that property access is fast. seen doesn't need to be an array, what if we just kept a dictionary of seen keys?

function uniqueBy(arr, prop){
  const len = arr.length
  let cursor = 0
  const record = []
  const seen = {}
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
    cursor++
  }
  return record
}

Result: 24,970 ops/sec

The best one yet!
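One caveat with the object-as-dictionary approach that the benchmarks don't surface: object keys are coerced to strings, so the number 1 and the string '1' collide, and inherited properties like 'constructor' are truthy on a plain object. If that matters for your data, a Map keeps keys distinct, likely at some speed cost (a variant I haven't benchmarked; the name uniqueByMap is my own):

```javascript
// Object keys are coerced to strings, so 1 and '1' collide:
const dict = {}
dict[1] = 1
console.log(Boolean(dict['1'])) // true – the string '1' already looks "seen"

// Inherited properties are truthy too, so a val of 'constructor'
// would wrongly look "seen" on a plain object:
console.log(Boolean({}['constructor'])) // true

// A Map-based variant avoids both problems:
function uniqueByMap(arr, prop) {
  const len = arr.length
  let cursor = 0
  const record = []
  const seen = new Map()
  while (cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!seen.has(val)) {
      seen.set(val, true)
      record.push(item)
    }
    cursor++
  }
  return record
}

console.log(uniqueByMap([{ k: 1 }, { k: '1' }], 'k').length) // 2 – kept distinct
```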

Iteration 8

Okay after doing some more research on loops, I came across this little number

function uniqueBy(arr, prop){
  const record = []
  const seen = {}
  for (let i = 0, len = arr.length; i < len; ++i) { // Notice the len = arr.length
    const item = arr[i]
    const val = item[prop]
    if (!seen[val]) {
      seen[val] = 1
      record.push(item)
    }
  }
  return record
}

Result: 26,390 ops/sec

Hmm, this is the winner (so far). But why? Didn't we find the while loop faster? All that's happening is that len = arr.length caches the length, and we were already doing that?

All I can think is that it has something to do with Locality of Reference. I have no formal Computer Science background, and I'm not a particularly smart man. If someone can explain to me why this is faster, please comment 😅

I recreated these tests on ESBench here: ESBench Results if that's more your cup of tea.

Bonus

Here are some other variations I tested with negligible performance gains/losses:

++cursor vs cursor++

function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    const item = arr[cursor]
    const val = item[prop]
    if (!~seen.indexOf(val)) {
      seen.push(val)
      record.push(item)
    }
  }
  return record
}

Reducing variables (💩)

function uniqueBy(arr, prop) {
  const len = arr.length
  let cursor = -1
  const record = []
  const seen = []
  while (++cursor < len) {
    if (!~seen.indexOf(arr[cursor][prop])) {
      seen.push(arr[cursor][prop])
      record.push(arr[cursor])
    }
  }
  return record
}

Summary

This whole process is mostly a fruitless endeavour. We could have stopped at iteration number 3 and put our feet up; however, I just wanted to see how fast we could make it. I'm glad I carried on since I found the seen object approach.

You do not need to do this in your applications. You should only go this deep down the rabbit hole (and arguably further) if you're experiencing performance issues.

If you have a faster way, please ping me on Twitter @moistmakerr or comment. I'd love to know how fast we can push this.

Resources
