Justin L Beall

Optimizing Search in JavaScript

Originally posted on the Skiplist blog

Skiplist is JavaScript all the way down.

It is like a zombie virus. The language has taken over everything. And, my hand has a bite. Do I embrace my inner JavaScript kitty, become what I have always feared, or should I chop it off while I have a chance?

This week I optimized an in-memory cache lookup. The customer dataset was a few orders of magnitude larger than anticipated. As a result, we had to refactor the in-memory cache data structure.

The initial version of the cache was pair programmed and driven via TDD. I like to embrace my inner agile technical coach when I can. It may only be a coincidence, but it was easy to refactor the implementation details such that lookups now occur in constant time. Some technical details of the optimization trick are described below.


The source for the examples below can be found on GitHub.

Declarative Programming

Imperative code tells the how. Declarative code tells the what.

Let's look at an example, gathering the ages of three people:

const people = [
  {id: 1, name: "Jason", age: 38},
  {id: 2, name: "Justin", age: 34},
  {id: 3, name: "Josh", age: 33}
];

// imperative: spell out how to build the list of ages
const agesImperative = [];
for (const person of people) {
  agesImperative.push(person.age);
}

// declarative: describe what the result should be
const agesDeclarative = people.map(person => person.age);

JavaScript offers several built-in declarative helper functions, such as map, filter, find, and reduce:
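For example, using the people array from above (these are all standard Array.prototype methods):

// filter: keep only the people still in their thirties
const thirties = people.filter(person => person.age < 40);

// find: the first person named "Josh"
const josh = people.find(person => person.name === "Josh");

// reduce: fold every age into a single total
const totalAge = people.reduce((sum, person) => sum + person.age, 0);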

The indoctrinated claim: declarative code is expressive, elegant, and functional... "clean". I agree; not caring how the sausage is made allows you to enjoy it that much more! Yet there are times when the how is important.

Using Find to Search for a Value

What about a similar scenario where you are looking up a person by id in a list of a million entries?

const idToFind = 1000000;
const person = people.find(person => person.id === idToFind);

The above statement is clean: find the first person whose id is 1000000. In contrast, the imperative approach to the same linear search is about half a dozen more lines of code. Simple is awesome. Simple is clean. But Big O notation tells us that linear search, O(n), scales about as badly as a search can. We sacrifice performance for cleanliness, a trade-off I personally will pick 99.8% of the time. #empatheticprogramming

If the keys are unique and our data set is of a manageable size, we can increase performance by turning our list of people into a map of people keyed by id, then performing a hash lookup, O(1), on the id! At worst we pay a one-time O(n) arrangement step, and then every lookup is O(1).
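As a quick sketch of that arrangement step, here is one way to do it with JavaScript's built-in Map (the implementation later in this post uses a plain array keyed by id instead):

// One-time O(n) arrangement step: index people by id
const peopleById = new Map(people.map(person => [person.id, person]));

// O(1) lookup on every subsequent access
const person = peopleById.get(idToFind);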

Code Example

As good stewards of software craftsmanship, let's start off with a failing JavaScript unit test that asserts the expected behavior.

const assert = require('assert');
const Search = require("./search");

describe('Search', function () {
  const people = [];

  before(() => {
    people.push({id: 1, name: "Jason", age: 38});
    people.push({id: 2, name: "Justin", age: 34});
    people.push({id: 3, name: "Josh", age: 33});
  });

  it('should return the correct element', function () {
    const expectedName = "Justin";
    const search = new Search(people);

    const person = search.find(2);

    assert.equal(person.name, expectedName);
  });
});

Imperative

At this point, we have a red test. Let's implement our first approach, a traditional imperative search using a for loop.

class Search {
  constructor(people) {
    this.people = people;
  }

  find(id) {
    for(let person of this.people) {
      if(person.id === id) {
        return person;
      }
    }
  }
}

module.exports = Search;

I set up a test harness to evaluate performance.
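The harness itself isn't shown in the post; a minimal sketch of one might look like this, where buildPeople and the iteration count are assumptions:

const Search = require("./search");

// Hypothetical generator for a dataset of the given size
const buildPeople = (count) =>
  Array.from({length: count}, (_, i) => ({id: i + 1, name: `Person ${i + 1}`, age: i % 100}));

const timeFinds = (records, iterations) => {
  const search = new Search(buildPeople(records));
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    search.find(Math.floor(Math.random() * records) + 1);
  }
  const total = Date.now() - start;
  console.log(`>>> Average time for find for ${records} records: ${total / iterations} ms`);
  console.log(`>>> Total time for find for ${records} records: ${total} ms`);
};

timeFinds(3, 1000);
timeFinds(1000000, 1000);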

// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.617 ms
// >>> Total time for find for 1000000 records: 2906 ms

Declarative

We now have a green test that asserts the behavior and a performance test harness, so we are free to move about the cabin (refactor the internals of the find method)! Moving from imperative to declarative looks like this:

// ...

  find(id) {
    return this.people.find(person => person.id === id);
  }

// ...
// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.356 ms
// >>> Total time for find for 1000000 records: 2690 ms

Our search time on a million records is relatively unchanged. With the move from imperative to declarative, we gain code cleanliness. The code now "tells the what" in such a way that swapping in a different data structure, like a map, is easier to conceptualize. We have a reduction in cognitive load.

Optimized

Finally, what if we were performing this search within a nested loop over a large collection (this never happens!)? A couple of milliseconds per lookup, multiplied across even a few hundred outer records, could easily degrade the customer experience. So let's revisit our example using a map. A JavaScript array is an object under the hood, so it can serve as an associative array: we simply map each id, as the key, to its person object.

class Search {
  constructor(people) {
    // One-time O(n) arrangement step: index each person by id
    const peopleMap = [];
    people.forEach(person => peopleMap[person.id] = person);
    this.people = peopleMap;
  }

  find(id) {
    // O(1) associative lookup instead of a linear scan
    return this.people[id];
  }
}

module.exports = Search;

// performance output:
// Average time for find for 3 records: 0.001 ms
// Total time for find for 3 records: 2 ms
// Average time for find for 1000000 records: 0 ms
// Total time for find for 1000000 records: 302 ms

Conclusion

I think my problem with JavaScript is not that I don't like it. I hate that I like it. I am scarred by memories of pre-standardization JavaScript web development (IE6, circa 2005, with ActiveX). But I respect its current position within the development community and look forward to finding a common platform option at every layer of a solution.

Top comments

Adrian B.G.

Nice, you have built an Index, transforming a Search operation into a Lookup.

As a note, you cannot trust/use peopleMap.length, because most likely the IDs will not be incremental starting from 0, so it will become a "sparse array" and the length will be max(id) + 1. Also, if you use a String key instead of a Number, it is stored as a plain object property and is not counted by length.

In this case you have to be careful not to use String (text) properties for the index when the storage is an array (like in your case).
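A quick illustration of the sparse-array gotcha described above:

const peopleMap = [];
peopleMap[1000000] = {id: 1000000, name: "Justin", age: 34};
console.log(peopleMap.length); // 1000001 -- max index + 1, not the element count

// A String key becomes a plain object property and is ignored by length
peopleMap["abc"] = {id: "abc", name: "Ghost"};
console.log(peopleMap.length); // still 1000001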

For simplicity I would make const peopleMap = [] an object and rename it to improve readability: const peopleByID = {}. If you use numeric keys I think it will still behave like a sparse array, so [] vs {} does not matter much, but an object will push the dev NOT to use .length and other Array functions.

Multiple indexes can be added in the same way, one for each column to be searched. If the values are not unique, each entry holds a list of elements:

const peopleByAge = {};
peopleByAge[34] = [
  {id: 24, name: "Justin", age: 34},
  {id: 2, name: "Rebecca", age: 34}
];

I have found rare occasions in practice for this kind of optimization; usually the data is too big or too expensive (hard to keep up to date) to cache locally, and it lives in a key-value store or directly in the DB instead. But if you do have this situation, where you run multiple searches over the same data set, the trade-off between the extra memory for the index and the CPU savings is worth it.

Justin L Beall

The data set did become quite large; I had to add --max-old-space-size=4096 as a node parameter! I try to store only reference data in this cache, but even that is a lot.
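For reference, the flag goes on the node invocation; the entry-point filename here is just a placeholder:

node --max-old-space-size=4096 server.js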

The purpose of the cache is to limit the network IO of a data conversion: more memory overhead, less IO. But you are absolutely right, if the data cache became unmanageable I would push the data out of memory into a local key/value store.

I like the idea of using an object instead of a list!


Justin L Beall

This particular optimization happened on the back end of an Express server. The same client has a React front end and a React Native mobile application!