Cache eviction strategies

Introduction

Let’s start from the very beginning: what is caching? Caching is the process of storing copies of files in a cache, or temporary storage location, so that they can be accessed more quickly (link). This leads us to the question of what “more quickly” means and what the cost is. To answer it, I believe we should use our imagination a little bit. Let’s say we have a client located in New York, hm no… on the North American continent, oh no wait, let’s try something more hardcore. We have a client on planet Earth, and it makes requests to a server which sits on the Moon (remember, we are using our imagination). This server stores historical images of cities on Earth, like Berlin 2020, London 2021, etc. As you can imagine, those images don’t really change, yet every request makes us wait an extremely long time (relatively, of course). If those images don’t change, why don’t we just store all of them on the client side? That does make sense, but do we really need all of those images? You as a client might be interested only in UK cities, for instance, so why store all of them? Moreover, could you even physically store all the images locally? Imagine all the cities in the world and the images taken of each of them every day… That’s a lot. This leads us to the idea that we should store only the subset of images relevant to this specific client, but how do we solve this problem in a general way? The goal of this post is to share a few basic ways to keep only relevant data in a cache.

Engineers with different backgrounds have most likely faced different types of caches in their careers. Frontend engineers have mostly worked on caching data on the web browser side to reduce calls to the backend API. Backend engineers have most probably cared about reducing calls to some persistent storage, e.g. PostgreSQL, and caching some data to reduce response latency and avoid “long” reads from disk (in comparison to RAM access, for instance). Systems engineers have been thinking about using the L1/L2/L3 caches on the CPU, etc. All these points lead us to the idea that caching is not just one specific solution for optimising a browser, a database or something else, but a broad concept used across all parts of computer science. That is exactly why I will focus today on how it can be approached in a general way, rather than on “how to implement a CDN”.

Cache hit and miss

As with almost anything in our life, it’s all about tradeoffs. But to discuss them, we first need to look at some basic definitions, which will be used as metrics later on.

A cache hit is when an item (a piece of data) is found in the cache. Cache hits feed into the cache hit ratio, which helps you understand how well your cache is configured and how to adjust your setup while working on a caching strategy. Conversely, a cache miss is when an item is not found in the cache, and it has a corresponding cache miss ratio.

As you may have already noticed, we are specifically interested in cache eviction strategies, because we have restrictions on the size of a cache, both physical and logical. Let’s look at a few basic ways to remove elements from our cache.

FIFO

Generally speaking, it’s the same FIFO (first in, first out), aka queue, algorithm that you already know. It’s easy, right? We maintain a queue with a fixed size, and each time a new element is put into a full cache, we evict the element that entered the queue first, i.e. the oldest one.

[Image: FIFO cache eviction diagram]

Even though this approach makes sense for some use cases, don’t we want to keep the most recently used elements instead? Yes and no; I will show the problem with this approach later on.
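
To make the idea concrete, here is a minimal FIFO sketch in Go. It assumes the cached items are plain integer ids (a real cache would also map ids to values), and it deliberately does not check for duplicates, which is exactly the weakness we will return to in the LRU section.

```go
// A naive FIFO cache: new ids are appended, and once the cache is full,
// the id that entered first is evicted.
package main

import "fmt"

type FIFOCache struct {
	capacity int
	queue    []int // index 0 = oldest entry
}

func (c *FIFOCache) Put(id int) {
	if len(c.queue) == c.capacity {
		evicted := c.queue[0]
		c.queue = c.queue[1:]
		fmt.Println("evicting", evicted)
	}
	c.queue = append(c.queue, id)
}

func main() {
	cache := &FIFOCache{capacity: 4}
	for _, id := range []int{1, 2, 3, 4, 5} {
		cache.Put(id)
	}
	// 1 entered first, so it was evicted when 5 arrived.
	fmt.Println("cache state:", cache.queue) // [2 3 4 5]
}
```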

LIFO

Similar to the above, it’s the same idea you already know from LIFO (last in, first out), aka a stack. We evict first the element that entered the cache last.

[Image: LIFO cache eviction diagram]

LRU

LRU stands for Least Recently Used, and at first glance it seems like it should be similar to FIFO; in fact, it is an improvement of FIFO. As you recall, I promised to mention the problems related to FIFO. Let’s imagine the following situation with FIFO: we have a cache with a fixed size of 4 elements, and we perform these inserts into the cache (I will mark data objects by their ids): 1, 2, 1, 1, 3.

[Image: FIFO cache state after inserting 1, 2, 1, 1, 3]

As you can spot, we’ve used 50% of our cache for duplicates of the same element. If you think about a real situation, imagine you are Instagram and you cache the images of user profiles. Celebrity pages will be requested and cached much more frequently than others, and as a result your cache will be flooded with duplicated data. LRU solves this problem by not blindly placing an object at the front, but moving the object to the front if it already exists in the cache. As a result, if we perform the 1, 2, 1, 1, 3 add operations, the cache will contain the objects in the order 3, 1, 2.

[Image: LRU cache state after accessing 1, 2, 1, 1, 3]
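
Here is a minimal LRU sketch in Go, again assuming plain integer ids and a fixed capacity; a real cache would also store values, but the eviction logic is the same. A doubly linked list keeps the recency order, and a map gives O(1) lookups into it.

```go
// A minimal LRU cache: accessing an existing id moves it to the front
// instead of inserting a duplicate; when full, the least recently used
// id (at the back) is evicted.
package main

import (
	"container/list"
	"fmt"
)

type LRUCache struct {
	capacity int
	order    *list.List            // front = most recently used
	items    map[int]*list.Element // id -> node in order
}

func NewLRUCache(capacity int) *LRUCache {
	return &LRUCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int]*list.Element),
	}
}

func (c *LRUCache) Access(id int) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el) // already cached: just refresh its recency
		return
	}
	if c.order.Len() == c.capacity {
		lru := c.order.Back() // least recently used lives at the back
		c.order.Remove(lru)
		delete(c.items, lru.Value.(int))
	}
	c.items[id] = c.order.PushFront(id)
}

func main() {
	cache := NewLRUCache(4)
	for _, id := range []int{1, 2, 1, 1, 3} {
		cache.Access(id)
	}
	for e := cache.order.Front(); e != nil; e = e.Next() {
		fmt.Print(e.Value, " ") // 3 1 2
	}
}
```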

MRU

In the same way that LRU improves the FIFO algorithm, MRU (Most Recently Used) improves the LIFO algorithm. So if we perform the operations 1, 2, 1, 1, 3, the resulting cache state will be 2, 1, 3.

[Image: MRU cache state after accessing 1, 2, 1, 1, 3]
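
To make the contrast with LRU concrete, here is a minimal MRU sketch, again assuming plain integer ids. The recency bookkeeping is the same idea as in the LRU sketch (kept in a simple slice for brevity); the only change is that, when the cache is full, the most recently used id is evicted instead of the least recently used one.

```go
// A minimal MRU cache: accesses move an id to the most-recently-used
// end, and eviction removes from that same end when the cache is full.
package main

import "fmt"

type MRUCache struct {
	capacity int
	order    []int // index 0 = least recently used, last = most recently used
}

func (c *MRUCache) Access(id int) {
	for i, v := range c.order {
		if v == id {
			// Already cached: move it to the most-recently-used end.
			c.order = append(c.order[:i], c.order[i+1:]...)
			c.order = append(c.order, id)
			return
		}
	}
	if len(c.order) == c.capacity {
		// Evict the most recently used id (the last one).
		c.order = c.order[:len(c.order)-1]
	}
	c.order = append(c.order, id)
}

func main() {
	cache := &MRUCache{capacity: 4}
	for _, id := range []int{1, 2, 1, 1, 3} {
		cache.Access(id)
	}
	fmt.Println(cache.order) // [2 1 3]
}
```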

The use case might not be obvious, so I would recommend checking this article: link

TLRU

This algorithm modifies the LRU algorithm by adding a TTL (time to live) to each element: if an element hasn’t been accessed for the last X seconds (whatever TTL you configure), it will be automatically evicted. As you can imagine, this kind of eviction is only useful when the cache is not overflowing; otherwise those elements would be evicted anyway as the last ones, pushed out by more recent entries. Let’s look at an example:

[Image: TLRU eviction example with expired elements]
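
Here is a minimal sketch of just the TTL part, assuming (as described above) that the clock resets on every access and that expired entries are removed by an explicit sweep; a real TLRU cache would combine this with the LRU bookkeeping from the earlier sketch.

```go
// TTL bookkeeping for a TLRU-style cache: ids that have not been
// accessed within the TTL are evicted on Sweep.
package main

import (
	"fmt"
	"time"
)

type TLRUCache struct {
	ttl        time.Duration
	lastAccess map[int]time.Time // id -> time of last access
}

func NewTLRUCache(ttl time.Duration) *TLRUCache {
	return &TLRUCache{ttl: ttl, lastAccess: make(map[int]time.Time)}
}

func (c *TLRUCache) Access(id int) {
	c.lastAccess[id] = time.Now()
}

// Sweep evicts every id that has not been accessed within the TTL.
func (c *TLRUCache) Sweep() {
	now := time.Now()
	for id, at := range c.lastAccess {
		if now.Sub(at) > c.ttl {
			delete(c.lastAccess, id)
			fmt.Println("expired:", id)
		}
	}
}

func main() {
	cache := NewTLRUCache(100 * time.Millisecond)
	cache.Access(1)
	time.Sleep(150 * time.Millisecond)
	cache.Access(2)
	cache.Sweep() // 1 has expired, 2 is still fresh
}
```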

Bloom filters

Even though this topic is not strictly about caching, I believe it’s important to cover it too.
As you remember, a cache helps us get the required objects quicker, but what if the required piece of data is not present in our main storage at all? There is a way to reduce the number of calls to our main storage for objects that are not stored there.

Let’s start with a simple configuration. We select some hashing function, which will be used to hash the keys being looked up. For each key that exists in our main storage, we calculate a hashed value and record the fact that there is an entry for that hash value. Let’s imagine that we have a hashing function that looks like this:

[Image: example hashing function, hash(key) = key % 100]

So this would mean that if we have the values 1 and 99, we would have something like this in our bitmap:

[Image: bitmap with the bits for hash(1) = 1 and hash(99) = 99 set]

So each time a request comes in, we can calculate its hashed value and check our map to see whether such a value exists.
Well, you might think that this allows us to completely eliminate lookups for keys that do not exist in our main storage, but not really. Let’s imagine that we have the values 101 and 201. This would mean that our bitmap will look like this:

[Image: bitmap where 101 and 201 both map to the same bit, hash(101) = hash(201) = 1]

Both ids 101 and 201 give the same hashed value. So if our main storage contains only the id 101 and we try to check whether 201 is present, our Bloom filter will mark both of them as present. This leads us to the idea that a Bloom filter doesn’t let us determine for sure that a value exists, but it does let us filter out some keys that definitely do not exist.

To minimise the probability of such false positives, we can use multiple hashing functions. For example, we can calculate not only % 100 but also % 33. As a result, if a value is missing in even one of the calculated positions, it is guaranteed not to be present in the main storage. On the other hand, we also have to keep in mind that we need to balance the cost of the extra calculations against the probability of a full collision.
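
Here is a minimal Bloom filter sketch using the two toy hash functions from the text (key % 100 and key % 33) and plain boolean arrays as bitmaps. Real filters use proper hash functions over arbitrary keys, but the false-positive behaviour is the same; the key 3401 below is a hypothetical value chosen because it collides with 101 under both functions.

```go
// A toy Bloom filter with two hash functions: key % 100 and key % 33.
package main

import "fmt"

type BloomFilter struct {
	bits100 [100]bool // bit set for key % 100
	bits33  [33]bool  // bit set for key % 33
}

func (b *BloomFilter) Add(key int) {
	b.bits100[key%100] = true
	b.bits33[key%33] = true
}

// MightContain returns false only when the key is guaranteed to be
// absent; true means "possibly present", so the caller still has to
// check main storage.
func (b *BloomFilter) MightContain(key int) bool {
	return b.bits100[key%100] && b.bits33[key%33]
}

func main() {
	f := &BloomFilter{}
	f.Add(101)

	fmt.Println(f.MightContain(101))  // true: really present
	fmt.Println(f.MightContain(201))  // false: 201%100 collides with 101, but 201%33 does not
	fmt.Println(f.MightContain(3401)) // true: a false positive, both hashes collide with 101
}
```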

Conclusion

We have covered the basic approaches to managing cache eviction. Obviously, the algorithms I have listed are just some basic approaches you might use; however, many more complex algorithms are built on top of them. On a final note, I would like to mention that the idea of caching is shared across all levels of abstraction in IT systems, and these approaches can easily be reused in different places.
