Cameron Nokes

Posted on Jul 21, 2020 • Edited on Aug 18, 2020 • Originally published at cameronnokes.com

Create lodash.memoize from scratch

#javascript #webdev #lodash #functional

Some times the best way to understand something is to build it on your own, from scratch. Doing this has been one of the best ways for me to deeply learn both JavaScript and common patterns and techniques that can be used to solve a variety of problems. lodash is one of the most popular JS libraries, and learning how any of its methods are implemented is good learning. I've read various parts of the lodash source for years. With that, let's dive into memoize.

What is memoizing

Memoizing is a performance optimization. Let's say you have a function that is called a lot, and in performance traces, you can see that it's an expensive function to run that often. Furthermore, you know that it's doing a lot of duplicate work. The inputs to the function don't change often, so if we store the result of the function with a key based on the inputs, we could just retrieve the result the next time we receive the same inputs to the function. Sort of like a cache. This way, we only run the expensive computation as few times as possible. This is memoization. React.useMemo is a memoization function. While we won't be going over how that works specifically, know that it's using a similar technique, it's just storing and retrieving the cached result in a different way that works with the React component tree.

Defining the API

So if we look at lodash's memoize API, we can see it takes two arguments:

a function, specifically, your computationally intense function that you don't want to run as much
(optional) a "resolver", which is a function that computes the key of the result and allows us to have more control over the caching behavior. More on this later.

And it returns a new function that wraps the function that was passed as the first argument. The new function will simply forward the arguments it receives. Wrapping a function with another function like that can be a good pattern when you want to sort of intercept the behavior of one function and modify it.

Let's start there:

function memoize(fn, resolver) {
  // TODO instantiate cache here
  return function(...args) {
    // TODO implement memoizing and resolver logic here
  }
}

Implement the logic

Next, let's instantiate our cache. The cache needs to be a key/value store. The key, by default, will be the first argument received. The value will be the result of the computation. For example, if we memoized a factorial function like this:

function factorialize(n) {
  if (n < 0) {
    return -1;
  } else if (n === 0) {
    return 1;
  } else {
    return (n * factorialize(n - 1));
  }
}
const memoizedFactorialize = memoize(factorialize);
// call it a few times to get cache entries
memoizedFactorialize(5);
memoizedFactorialize(6);
memoizedFactorialize(10);

The cache object for that would conceptually need to look something thing like this:

{
  5: 120, // because 5! = 120
  6: 720,
  10: 3628800
}

But what if the cache key itself needed to be an object? A plain JS object can't use an object type as a key, if you try that you end up getting:

{
  '[object Object]': 'result'
}

So what we really need is a Map! Maps can hold objects or primitive values as keys. We'll put our map cache in the main memoize function. This way, the returned inner function will capture it in its closure and have access to it, and the cache can be persisted through multiple calls.

function memoize(fn, resolver) {
  const cache = new Map();
  return function(...args) {
    // TODO implement memoizing and resolver logic here
  }
}

Now let's implement the main logic. First let's handle the cache hit case.

function memoize(fn, resolver) {
  const cache = new Map();
  return function(...args) {
    // set the key to the first argument by default,
    // we'll implement the resolver logic later
    const key = args[0];
    // if the cache has it
    if (cache.has(key)) {
      // return the cached entry
      return cache.get(key);
    } else {
      // TODO
    }
  }
}

Now let's do the cache miss case.

function memoize(fn, resolver) {
  const cache = new Map();
  return function(...args) {
    const key = args[0];
    if (cache.has(key)) {
      return cache.get(key);
    } else {
      // call the function to get the result
      const result = fn.apply(null, args);
      // set it in the cache and return the result
      cache.set(key, result);
      return result;
    }
  }
}

Why we are we using Function.apply? apply lets us apply the elements of the args array as individual arguments to the fn. It's how we "forward" all the arguments that we intercepted to the original function.

So, what if we had a function like this that took two arguments and caching against just the first argument didn't make sense? For example, in this searchTree function, even if the tree argument is the same, the options that are passed in might affect the resulting value.

function searchTree(searchTerm, tree, opts = { maxDepth: 3 }) {/**/}
const memoizedSearchTree = memoize(searchTree);

let orgChart = {
  id: 1,
  employees: [/* tree of employees and their reports here */]
};

// will return an array of results
memoizedSearchTree('Cameron', orgChart, { maxDepth: 1 });

// will incorrectly return the same array of results 😱
memoizedSearchTree('Cameron', orgChart, { maxDepth: 3 });
// will also incorrectly return the same array of results 😱
memoizedSearchTree('Cameron', differentOrgChart, { maxDepth: 1 });

That's where the resolver argument comes in. In this case, we can create a key based on the id of the tree, the search term, and the maxDepth. So let's create what a resolver would look like for the above:

const memoizedSearchTree = memoize(
  searchTree,
  (searchTerm, tree, opts) => `${tree.id}:${searchTerm}:${opts.maxDepth}`
);

Cool! This is what the cache would end up looking like (shown here as a plain object but it would be in a Map):

{
  '1:Cameron:1': [/* result here */],
  '1:Cameron:3': [/* different result here */],
  '2:Cameron:1': [/* different result here */]
}

Alright, with that in mind, let's implement the resolver logic, which is actually fairly simple.

function memoize(fn, resolver) {
  const cache = new Map();
  return function(...args) {
    // if we have a resolver defined, use that, otherwise, default to the first arg
    const key = resolver ? resolver.apply(null, args) : args[0];
    if (cache.has(key)) {
      return cache.get(key);
    } else {
      const result = fn.apply(null, args);
      cache.set(key, result);
      return result;
    }
  }
}

So we forward the function arguments to the resolver as well and expect that the resolver returns a string, number, or object that we can use for the cache key lookup.

That's it, our complete memoize function!

Test it out

In order to facilitate unit testing -- as well as be something that might be genuinely useful to the application -- it'd be nice to provide a way to access the cache. Let's add that in now.

function memoize(fn, resolver) {
  const cache = new Map();
  // instead of returning the function right away, store it in a variable...
  const memoized = function(...args) {
    const key = resolver ? resolver.apply(null, args) : args[0];
    if (cache.has(key)) {
      return cache.get(key);
    } else {
      const result = fn.apply(null, args);
      cache.set(key, result);
      return result;
    }
  };
  // add a method to it to get the cache
  memoized.getCache = () => cache;
  // now return the function
  return memoized;
}

Now let's do some tests.

const memoizedFactorialize = memoize(factorialize);

memoizedFactorialize(5);
memoizedFactorialize(5);
memoizedFactorialize(5);

assert(
    memoizedFactorialize.getCache().size === 1,
  `memoizedFactorialize cache size should = 1`
);

memoizedFactorialize(6);

assert(
    memoizedFactorialize.getCache().size === 2,
  `memoizedFactorialize cache size should = 2`
);

Let's test caching against an object key.

const getElementBackgroundCSS = memoize(
  el => getComputedStyle(el).background
);

getElementBackgroundCSS(document.body);
getElementBackgroundCSS(document.body);

assert(
    getElementBackgroundCSS.getCache().size === 1,
  `getElementBackgroundCSS cache size should = 1`
);

All working as expected 😎. You can view the above in a JS fiddle here.

Trade-offs with memoizing

Like many things in life, memoizing comes with trade-offs. Memoizing is the classic "trade space for speed" trade-off. Your application's RAM usage will be higher, but that will offload work from the CPU. RAM usage isn't something most browser JS apps seem to worry about or optimize for (not saying that's a good thing, just my observation). If you're worried about your cache accumulating too many entries, you could add some logic to empty it if it grows too large.

if (cache.size > 1000) {
  cache.clear();
}
cache.set(key, result);

Unfortunately, unlike C or something, JavaScript doesn't have a way to get the actual memory usage of an object. So you're best way to limit the size of the cache is to go by the number of entries.

Another alternative, if you're going to be exclusively using objects as keys, is to use a WeakMap instead of a Map. WeakMap keys are "weakly held" -- they're references to an object and the entry will get automatically dropped when the object is garbage collected. For example, if you had a function that did something computationally intensive or slow with the DOM, you could use the DOM element as the key, and then that entry would get automatically dropped when that DOM element is removed. If you used a Map with a DOM element as the key, and that DOM element was removed from the DOM, you would also need to remove it from your Map for the object to get garbage collected. Not doing that is a memory leak.

That's it 🎉

I hope this was helpful to someone.

Here's the actual memoize implementation in lodash. There are some minor differences. I recommend reading it and reading other parts of lodash that you've used before.