Language support for caching sucks. Despite an ever-present need to cache data and calculations, most languages require the programmer to do all the work. There's no way to track dependencies, no way to know if the cache is valid, no way to know if the cache is even useful. In this article I'll look at some of the key problems.
This article is primarily focused on caching calculations, not the caching of data. Some of the problems and solutions apply to both, but they are somewhat different in nature.
Why cache?
In Fuse we do a lot of calculations. We need to check hit test bounds, calculate curve bounds, produce local-to-world transform matrices and calculate render bounds. I'll even consider the drawing of elements to be a calculation. I'm sure I'm missing something; the point is that we have a lot to do.
On their own, none of these individual calculations is a problem. En masse, however, for every node in the tree, they take a lot of CPU time. Caching improves the frame rate and reduces energy usage.
We also cache inherited properties like IsVisible and IsEnabled. We don't do this for efficiency reasons, but to generate events when these values change at each node. Tree-based variables and events are another avenue to consternation all on their own.
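To make that concrete, here is a minimal sketch in TypeScript of a per-node cached inherited property with a change event. The TreeNode shape and the listener mechanics are invented for illustration; this is not Fuse's actual API:

// Sketch: each node caches its effective visibility and fires an event
// only when that effective value actually changes.
type VisibleListener = (visible: boolean) => void;

class TreeNode {
    parent: TreeNode | null = null;
    children: TreeNode[] = [];
    private localVisible = true;
    private cachedIsVisible = true;
    private listeners: VisibleListener[] = [];

    onIsVisibleChanged(fn: VisibleListener) { this.listeners.push(fn); }

    setLocalVisible(value: boolean) {
        if (value === this.localVisible) return;
        this.localVisible = value;
        this.refresh();
    }

    private refresh() {
        const parentVisible = this.parent === null || this.parent.cachedIsVisible;
        const effective = parentVisible && this.localVisible;
        if (effective === this.cachedIsVisible) return; // unchanged: stop here
        this.cachedIsVisible = effective;
        this.listeners.forEach(fn => fn(effective));
        this.children.forEach(c => c.refresh()); // propagate down the tree
    }
}

The key detail is the early return: when the effective value doesn't change at a node, no event fires and the walk down the subtree stops there.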
The structure of the problem
A cached calculation looks something like this:
var cache_branch = merge( measure(leaf), measure(cone) )
Let's assume for a moment that measure and merge are both pure functions (the result depends solely on the input arguments). This means cache_branch is valid until either leaf or cone changes. But how do we know when they change?
There are two basic approaches. Either we subscribe to a leaf_changed event:
defn update_cache_branch = -> {
    cache_branch = merge( measure(leaf), measure(cone) )
}
leaf_changed.subscribe( update_cache_branch )
cone_changed.subscribe( update_cache_branch )
Or we add a leaf_version and check it each time we calculate:
defn get_cache_branch = -> {
    if leaf_version != cached_leaf_version or cone_version != cached_cone_version {
        cache_branch = merge( measure(leaf), measure(cone) )
        cached_leaf_version = leaf_version
        cached_cone_version = cone_version
    }
    return cache_branch
}
Both approaches unfortunately require that whoever changes leaf do something more: either emit the changed event or increment the version. From here on I'll refer only to the event-based approach, for simplicity.
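For illustration, here's a TypeScript sketch of what that burden looks like on the producer's side. The EventSource helper and the Leaf fields are invented for this example:

// Sketch of the producer's obligations: every mutation must either
// emit an event, bump a version, or (as here) both.
class EventSource {
    private handlers: Array<() => void> = [];
    subscribe(fn: () => void) { this.handlers.push(fn); }
    emit() { this.handlers.forEach(fn => fn()); }
}

class Leaf {
    readonly changed = new EventSource(); // for the event-based approach
    version = 0;                          // for the version-based approach
    private _size = 0;

    get size() { return this._size; }
    set size(value: number) {
        if (value === this._size) return;
        this._size = value;
        this.version++;      // easy to forget in any new setter...
        this.changed.emit(); // ...and so is this
    }
}

Every new setter added to Leaf has to repeat that same dance, which is exactly where the defects described below come from.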
The source of defects
That example is nearly trivial, yet we can already see the coming wave of defects: either the cache forgets to subscribe to an event, or the producer fails to emit one.
Nonetheless, to see the true scope of the problem, let's make it more complicated. We'll replace measure with calc_bounds, which calculates the render bounds of a child node, where leaf and cone are such nodes (elements in a UI tree, for example).
var cache_branch = merge( calc_bounds(leaf), calc_bounds(cone) )
It would no longer make sense to subscribe to leaf_changed, as we don't actually know which variables in the node are used by calc_bounds. Nor would we want to start subscribing to many different events here; that would be error-prone and inefficient.
Instead we pass on the responsibility for the caching to the node itself:
var cache_branch = merge( leaf.bounds, cone.bounds )
We can subscribe to a simple bounds_changed event on each of the nodes. This keeps this level simple, but it pushes the burden down a level.
Each node needs to cache its own bounds and emit events when it updates. It may sound simple, but something like bounds involves a lot of variables: position, size, transforms and visibility. A change in any one of those variables must invalidate the cache somehow.
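A TypeScript sketch of what that looks like on the node. The Rect type, the property names and recomputeBounds are stand-ins I've invented, and it reuses the EventSource helper from the earlier sketch:

// Sketch: every variable that feeds into the bounds must remember
// to invalidate the cache and raise the bounds_changed event.
interface Rect { x: number; y: number; w: number; h: number; }

class BoundsNode {
    readonly boundsChanged = new EventSource();
    private cachedBounds: Rect | null = null;
    private position = { x: 0, y: 0 };
    private size = { w: 0, h: 0 };
    private visible = true;

    get bounds(): Rect {
        if (this.cachedBounds === null)       // recompute lazily on demand
            this.cachedBounds = this.recomputeBounds();
        return this.cachedBounds;
    }

    setPosition(x: number, y: number) { this.position = { x, y }; this.invalidateBounds(); }
    setSize(w: number, h: number)     { this.size = { w, h };     this.invalidateBounds(); }
    setVisible(v: boolean)            { this.visible = v;         this.invalidateBounds(); }

    private invalidateBounds() {
        this.cachedBounds = null;
        this.boundsChanged.emit();
    }

    private recomputeBounds(): Rect {
        if (!this.visible) return { x: 0, y: 0, w: 0, h: 0 };
        return { x: this.position.x, y: this.position.y, w: this.size.w, h: this.size.h };
    }
}

Every property that feeds into the bounds has to remember to call invalidateBounds; forget one, and the cache silently serves stale values.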
Events are expensive
Events are never free, and that's a problem. Does the cached render bounds really need to subscribe to events for all the variables of interest? Consider now that hit test bounds and layout also have cached values. Every node in a UI tree would need to subscribe to a lot of events to keep its caches in order. The management and dispatch of these events add even more overhead.
Worse, these cached values aren't always used. It would be pointless to calculate a value, and thus subscribe to events, unless something actually wanted to use the value. It's actually quite easy to build a caching system that ends up performing worse than if there were no caches at all!
In Fuse we use a third approach to alleviate this. Instead of dynamic events we hard-code some virtual functions for key caches, such as InvalidateRenderBounds and OnRenderBoundsInvalidated. All cached values are calculated and cached lazily: after invalidation they are not calculated again until actually used.
This does wonders for efficiency, but at the expense of complexity.
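In TypeScript the pattern looks roughly like this. This is my own sketch of the idea, not Fuse's actual code; Rect is the type from the earlier sketch:

// Sketch of the hard-coded approach: no dynamic subscriptions, just a
// dirty flag and a virtual notification hook.
abstract class Visual {
    private renderBoundsDirty = true;
    private cachedRenderBounds: Rect | null = null;

    // Called by anything that changes a variable feeding the render bounds.
    protected invalidateRenderBounds() {
        if (this.renderBoundsDirty) return; // already dirty: nothing to do
        this.renderBoundsDirty = true;
        this.onRenderBoundsInvalidated();   // lets a parent invalidate its own cache
    }

    // Virtual hook; a container overrides this to chain the invalidation upward.
    protected onRenderBoundsInvalidated() {}

    // Lazy: nothing is recomputed until someone actually reads the value.
    get renderBounds(): Rect {
        if (this.renderBoundsDirty) {
            this.cachedRenderBounds = this.calcRenderBounds();
            this.renderBoundsDirty = false;
        }
        return this.cachedRenderBounds!;
    }

    protected abstract calcRenderBounds(): Rect;
}

There are no subscriptions to manage: invalidation is a cheap flag set plus one virtual call, and the expensive calculation only runs when someone reads renderBounds.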
Compiler help
There's no avoiding the need for these caches. It's just unfortunate there isn't much language support to help. All this work feels mundane and repetitive. Each time we use a variable in a cached calculation, we end up with the same basic structure of either subscribing to messages or checking versions. Each time we set a variable, we have the same basic emitting of events or incrementing of versions.
This all feels like something a compiler could be doing for us in an efficient and safe manner. I'm not clear yet on what the syntax should be. Well, that's not true: the ideal syntax would be something like:
defn get_cache_branch = -> {
    return lazy_cache_calc( merge( measure(leaf), measure(cone) ) )
}
Where lazy_cache_calc sets up everything needed for the cache: the events and subscriptions, or versions and checking, on the complete tree of dependencies. Though it seems like I might be asking for a bit much, this is essentially what reactive programming does. It's just usually reserved for higher-level programming.
I'll have to investigate further.
Top comments (5)
Have you considered memoization (here's a benchmark and a basic implementation: addyosmani.com/blog/faster-javascr...) instead? It seems to fit your use case as it works very well on 'pure' functions. You can 'nest' memoization too (so call memoized functions inside a memoized function). The speed gain is incremental, and so is the memory usage.
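For a single-argument pure function the idea is roughly this (a minimal sketch of my own in TypeScript, not the linked implementation):

// Minimal memoization sketch: cache results keyed by the argument.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
    const cache = new Map<A, R>();
    return (arg: A) => {
        if (!cache.has(arg)) cache.set(arg, fn(arg)); // compute once per input
        return cache.get(arg)!;
    };
}

const slowSquare = (n: number) => n * n; // stand-in for an expensive function
const fastSquare = memoize(slowSquare);
fastSquare(4); // calculated
fastSquare(4); // served from the cache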
These are not the types of calculations where memoization helps. They are not pure functions, being based on a nested hierarchy of data. The input space at each node is large enough that you'll also never get a cache match on any change. The cost of input comparison will also generally be more expensive than the calculation itself (matrix operations).
Ok, understood. Memoization is good on simple inputs. Yes, an inner caching mechanism would actually be good. In that case the "cache invalidation key" needs to be calculated inline when the operation occurs; a "hash of the subtree" could fit. If the function is iterative it will skip calculation on unchanged branches.
There's something called "incremental computation" or "self-adjusting computation" that hasn't made it to the mainstream yet because it's still research in progress, but I'm hoping the technology will start getting out there.
For example, Adapton: adapton.org/
Incremental: github.com/janestreet/incremental
Much of the academic research has been by Umut Acar: umut-acar.org/self-adjusting-compu...
Do you have any examples of the syntax? I can't find anything clear from those links.
The descriptions, however, make it sound like reactive expressions.