Anton Korzunov

Posted on Apr 16, 2019 • Edited on Nov 19, 2019

How I wrote the world’s fastest memoization library

#memoization #javascript #proxy #weakmap

There are so many memoization libraries, that it's already hard to memoize them all and find the fastest one. No joke - half of memoization libraries of yesterday would be faster tomorrow, so, if you are looking for best of the best, it would be not easy to pick one.

But speed is not something you need. So I've written another library, which is not fast. All because one thing...

So, why did I write another one? Why existing ones are not... Are not what?

Memozation

This is a common pattern to reduce or completely skip unnecessary computations. Works quite straight forward –

You’ve got a function. You call than function with some arguments. Next time you do the same – just reuse already known result. Because 😉 you have it.

All the libraries do it perfectly. The only differences are 1) how they handle function arity(the number of arguments), 2) how many results they can store, 3) and how fast they are.

By default lodash.memoize “sees” only the first argument, memoizerific uses ES5 Maps to store data, while fast-memoize stringifies all the arguments, and uses JSON as a cache key.

The speed is also differs. Ramda is 100 times faster than code without memoization, lodash is 100 times faster than ramda, and nano-memoize 100 times faster that lodash.

And they all are as fast, as many time they could "skip" calling the real function. If nano-memoize, 1.000.000 times faster than the "real" function, is capable to handle every second call - it's just 2x times faster. It's quite often case to have 0x, or even negative results.

You don’t need speed. You need memoization.

Speed comparison just above, with 1000x values, was made for Fibonacci numbers calculation. It's perfect fit for a memoization case, and all these libraries are great about memoizing function results based on simple arguments, and capable to memoize as many variants of a function call, as you need. This is great, as I've said, for Fibonacci numbers calculation, but blowing all memory limits for other tasks, as long cache size usually is NOT limited, while "the Memory" has very strict limits.

You don't have endless memory. And don't need it.

The one memoization

The first call about this problem (for me) was made by a library named memoize-one, written by Alex Reardon. The main intent was clear – it memoizes one, and only ONE result. Because you might don’t need more. You almost never need more.

"Don't do anything, if it is the same as before" - is actually the only thing React/Redux world needs. Just ability to cut off an update, shouldComponentUpdate without any side effects(like memory leaks in memoization cache)

And the main feature, React might require from memoization library to perform this task, is not speed. Speed also matters, but false positives matters more. Memoization shouldn’t be fast. It should be reliable and usable.

this is not bound to React, but proper memoization is usually required and usually is problematic for React ecosystem.
PS: and useMemo solved this. Two years after this article.

Ok, memoization

For every case, you have to write a “special” code to correctly memoize the stuff you need. Sometimes it's straightforward, sometimes it's not. Looking back to my own experience (oof, I had problems with it) you need a special mindset, and specific engineering approach to do this thing properly.

In short - all modern libraries rely on immutable structures and structural data sharing to speed up and simplify comparison, and, basically, everything you need to use a memoization library - is to provide proper arguments. Obviously - to provide proper arguments to a selector, you have to know how your data structures are made, which could be a problem without types or on an alien codebase.

const mapStateToProps = state => ({
   todos: state.todos.filter(todo => todo.active)
});

This is a simple mapStateToProps which would be called on every state change, producing an absolutely unique todos every time(.filter is returning a derived array), causing connected component to update, and trashing lifecycle hooks.

It's easy to "fix" it - just wrap with any memoization library.

const filterTodos = anyMemoization(todos => todos.filter(todo => todo.active));
const mapStateToProps = state => ({
   todos: filterTodos(state.todos)
});

Now it will react only to state.todos object change - ref equality is how it usually made. But let's make it a bit more complex :)

const filterTodos = memoize(todos => todos.filter(todo => todo.active));

const getTodos = todos => todos.map(todo => todo.text)

const mapStateToProps = state => ({
   activeTodosText: getTodos(filterTodos(state.todos))
});

This one will still react on state.todos object change. And this is something everybody expects from this code. Change todos - produce a new state.

But look closer - how it ACTUALLY should behave? Long story short - it should react only on .text of only .active todos change. 🤯
It should just keep memoizating as long, as possible. That's the goal.

Could you achieve this with memoization library of any sort? Mmmm, no.

Meanwhile, in MobX lands

The one thing, I’ve always like in MobX — laziness. Not library laziness, but mine. I could be lazy, and write a code, which would just work.

You don't have to think, about — “Oh, by the time this event got dispatched, Redux will trigger all the ConnectedComponents, mapStateToProps all the things, and may redraw half of the Application, all due to one selector of mine producing a unique value each run". Oh, I hate it!

You know — due to low-level optimizations you, and nobody except you, have to provide, but did not — Vue and Angular (data models) could be much faster out of the box. I mean React/Redux could just suck. And MobX — rocks!

And there is one more thing where Redux is not pleasant enough - reducers. That 100 thousand line long reducers, full of object spread and object rest operators.
Fortunately, we have immer and immer made this moment more enjoyable and pleasant. It gives transparency and removes frictions.

return {
  ...state,
  x: {
    ...state.x,
    y,
  }

// vs
produce(state, draft => {
 draft.x.y = y;
});

Oh, how I wish to have the same magic experience with memoization.

So what about memoization?

TL;DR - there is a library, memoization library, I've built, which shares something with MobX and immer. It just works, solving your problems.

Somehow. Magically. Invisible.

As I said in the beginning — I've built the slowest memoization library, and It is the fastest memoization library at the same time. I called it — memoize-state.

theKashey / memoize-state

The magic memoization for the State management. ✨🧠

memoize-state

Caching (aka memoization) is very powerful optimization technique - however it only makes sense when maintaining the cache itself and looking up cached results is cheaper than performing computation itself again You don't need WASM to speed up JS

Blazing fast usage-tracking based selection and memoization library, which always works....

Read me - How I wrote the world’s fastest memoization library

Reselect? Memoize-one? Most of memoization libraries remembers the parameters you provided, not what you did inside Sometimes is not easy to achive high cache hit ratio. Sometimes you have to think about how to properly dissolve computation into the memoizable parts.

I don't want to think how to use memoization, I want to use memoization!

Memoize-state is built to memoize more complex situations, even the ones which are faster to recompute, than to deside that recalculation is not needed Just because one cheap computation can cause…

View on GitHub

It is slow because it utilizes ES6 Proxy to watch what memoized function is doing, and uses complex algorithms to manage the outcome. It has something like 100x or even 1000x more code than a normal memoization library to perform this operation, and requires much, much more operations to complete.
It is fast because when it has to decide, should it return memoized value or have to refresh it, it would not compare arguments as other memoization libraries do, but it could compare only used parts of the arguments, only thing used to produce result, making it the best candidate for…

“please memoize as many cases as it is possible”

And, as long as it memoizes more “often”, it spends less time in real computations and working faster. It does not work faster - it just works.

I probably should post an example:

const filterData = memoize( data => data.filter(item => item.selected) )
// ^ we are filtering only "selected" elements

filterData([{selected:true, value:1}, {selected: false, value: 2}]) 
// ^ first call. run it
filterData([{selected:true, value:1}, {selected: false, value: 2}])
// ^ second call. It's the same, you will be given prev result

filterData([{selected:true, value:1}, {selected: false, value: 3/* HEY! */}])
// ^ value of the second, not selected element is changed, but we "dont need it".
// you will be given the old result

filterData([{selected:true, value:2}, {selected: false, value: 2}])
// value of the first argument is changed. re-run

In this example - any changes in {selected:false} element would be ignored. We don't need it. Something we were looking for.

There is no need to destructurise your selectors, no need to create 5 different selectors and complex isEqual logic - just memoize-state.

To be honest - if you will try to run this example - it would not work. filterData is returning selected items, and every time we are calling it with a new items list. While it will ignore changes in non-selected items, changing selected ones, even just proving looking the same ones would cause re-run. And this is something we have asked for. "Immutable data structures", remember?

// so this is how it would work with "immutable" data
filterData([selectedValue, notSelectedValue])
filterData([selectedValue, anotherNotSelectedValue]) // still memoized

But we might ask for something more specific and remove that "parasite" computations.

const filterData = memoize( data => data.filter(item => item.selected)[0].value/*YEP, only value*/)
// we need only a __value__ of the first selected element

In this case algorythm would understand that you dont interested in "data structures", but only in "value". So - it would react only to it.

Do not think

Do not think about how it works. It works. No matter how you will use it

const getVisibleTodos = (state, props) => 
    switch (state.visibilityFilter) {
      case 'SHOW_COMPLETED': return state.todos.filter(todo => todo.completed)
      case 'SHOW_ACTIVE': return state.todos.filter(todo => !todo.completed)
      default: return todos
    }

// works like this
const mapStateToProps = (state, props) => {
  return {
    todos: memoize(getVisibleTodos(state, props))
  }
}

// and like this
const mapStateToProps = memoize( (state, props) => {
  return {
    todos: getVisibleTodos(state, props)
  }
})

// and even with "double" memoization
const mapStateToProps = memoize( (state, props) => {
  return {
    todos: memoize(getVisibleTodos(state, props))
  }
})

No special logic. No selectors. No “argument-level” memoization. You can apply memoize-state ANYWHERE! As many times as you want. You may add another memoization inside or outside. It does not matter. And it will just track down the usage of arguments you provided, and do the job.

Memoize-state is not “internal” memoization, not “function-level” memoization, it is “external” memoization. Just wrap anything with memoize-state and get it memoized. Any pure function, which possibly could be memoized!

Stability

Writing this library was not a simple task. I wrote it, it took something about two days, I tested it, posted on Twitter, found that the library doesn’t work, I mean completely does not work, and spend two more weeks in R&D.

I fixed these problems. Next, I wrote an article about this library. Found a few more things I just did wrong. Fixed it. A year later, after myriad issues resolved and bazillion test written, I am writing this article.

How it works

How it actually works — it just wraps all given arguments with Proxy from proxyequal library and watches the object key access.

Once you run memoized function - it would know which pieces of passed arguments were used to produce a result, and which pieces were returned as a result.

It would know what you did last summer, and have you called .forEach, do you need .value or everything you are looking for is the existence of a key.

It's not a deep-equal, not shallow-equal, it's smart-equal. proxy-equal.

Speed

It is quite hard to understand the performance of this library — it is always in some balance between “cost” of memoized function, and “cost” of memoization sugar.

Standard” memoize. Function of 3 integer arguments. No changes.

memoize-one    x 6703353 ops/sec
lodash.memoize x 3095017 ops/sec
fast-memoize   x 1013601 ops/sec 
memoize-state  x 4007493 ops/sec

It's not slow, even fast than lodash

function with an object as an argument, returning a part

base            x    10095 ops/sec
memoize-one     x    10054 ops/sec
lodash.memoize  x  1695449 ops/sec
fast-memoize    x  1287216 ops/sec
memoize-state   x  1574688 ops/sec

Once you start using less than a whole object - libraries which rely on ref equality stops working, while other ones continue the race

function with an object as an argument, changing other value, returning a part

memoize-one     x   10066 ops/sec
lodash.memoize  x   92596 ops/sec
fast-memoize    x   89224 ops/sec
memoize-state   x 1469865 ops/sec

But when you starting changing some pieces of the state, you do not use - all the other libraries also slow down, while memoize-state continue working.

The power of memoize-state - ignore state updates you are not interested in. And that's a usual case for state management.

What could be build using it

React-memoize

Memoize-state working so easy and invisible to the user, that I’ve used it for another library, with memoization in mind. As Dan Abramov proposed.

The library I’ve build based not on this specification, as long there is no need is inputs if your memoization function is “external”.

import Memoize from 'react-memoize';

 <Memoize
   prop1 = "theKey"
   state = {this.state}

   compute={ ({prop1, state}) => heavyComputation(state[prop1]) }
  >
  { result => <Display>Result is: {result}</Display> }
</Memoize>

It might be not quite clear what's good about this example, but, in short - compute would be called only when state[prop1], or something exact inside would change. Memoization + shouldComponentUpdate in one bundle!
It just passes all the props (except compute) to the compute function, and renders result via function-as-children (aka renderProps).

The library is well typed, and contain few components to make your life easier. For example "Flow", you may use to process data like in a stream way.

          <MemoizedFlow 
          input={{...this.props, ...this.state}}
          flow={[
            // will react on rows or filter change
            ({rows, filter}) => ({rows: list.filter(filter)}),
            // will react on rows(from step1) change or order
            ({rows, order}) => ({rows: rows.slice().sort(order)}), // !! clone array before sort
            // will react on rows and pagination changes
            ({rows, page, perPage}) => ({rows: list.slice(page*perPage, (page+1)*perPage)}),
            // will react on something else, not related
            ({rain, bows}) => ({rainbows: rain+bows, rain: null, bows: null })
            ]}
          >
            {output => <Table {...output} onSortChange={this.onSortChange} onPageChange={this.onPageChange}/>}
          </MemoizedFlow>

That's is all. Everything else is hidden under the hood. It will know which step depends on which step from infering usage of variables provided. It would know which step should be re-run after some change, and will never do more than needed.

beautiful-react-redux

A small library which hijacts Redux and provides a beautiful memoization out of the box.

why-did-you-update-redux

Another redux-related library, which lets you debug your selectors and mapStateToProps.
As long as memoize-state is so cool - it could check your handmade selectors - are they also cool. If not - it will explain what's wrong, which function is not pure enough and help you make your application faster, without using magic memoization in production.

reactive-react-redux

And yet again - Redux related library, this time made of hooks.
There is nothing special in this library, except it's a pleasure to use it, and it would be more performant out of the box, that you may expect.
Again - it uses memoize-state underneath, to optimize your components update.

Browser support

proxy-equal the base layer for all the magic uses ES6 Proxy, which does not exist on IE11 and some ReactNative environments. proxyequal comes with proxy-ponyfill on board.
There is only one edge case, which could not polyfilled(accessing not existing properties), everything else is safe and fast.

Limitations

There is also a common “limitation” for memoization libraries - they can store, yet again, only one “last” result. And, in case you have a few different connected components, all selecting something from a single store, but with different props — you will always have your memoization broken. Not broken - just it would be useless.

There Can Be Only One! Is a good slogan for a movie, but not for real application. And this library is changing ... nothing here.

Conclusion

Original performance tests also contain not only operation/per second, but “cache-hit” parameter. It is way more important.

Your memoization function could be slow, as long it will be faster that react/redux/redraw caused by “missed” memoization and tree-rendering.

Correct reselect cascade could have 100% cache hit, but it is hard to write correct cascade, debug it, keep it up to date. Meh, it will just take time.
While “cache-hitting” ability of memoize-state is close to the ideal. It will memoize as much cases as it could.

It is 10 times bigger than normal memoization library, (should be) 10 times slower than normal memoization library, but, you know, your application will be the same 10 times fast. Without any your time spent on optimizations.

That's the goal. There is nothing “special” you have to do.

Not “faster”, but “better” - “please memoize as many cases as it is possible”.

By the way

I've got another article about another library, and that library might solve the problem with "only one result" -

Memoization Forget-Me-Bomb

Anton Korzunov ・ Apr 1 '19

#javascript #cache #memoization #optimization

DEV Community