Objectives
- Understand what caching is
- Understand how caching is done
- Understand the difference between a cache hit and a cache miss
- Understand what data inconsistency is and how we can minimize it in our cache services
What is caching?
I'm going to ask you to do a couple of computations, so bear with me. You can use a calculator if you want.
- What is 30 x 25? Got the answer?
- OK, what is 2 x 3?
- Great, what is 30 x 25 again?
If you noticed, the first time you answered what 30 x 25 was, it took you a while to compute the answer (which is 750, by the way). But the second time I asked, you most likely didn't have to recompute it; you just remembered the answer because you had made the computation not long ago. This is what caching is about. A cache service stores part of your data (mostly recently accessed data) so it can serve that data faster. Caching happens from the low levels of computer architecture, with pages, the L1/L2/L3 CPU caches and RAM, up to the application layer, using tools like memcached and Redis. This article will focus on the application level.
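In code, this idea is called memoization. Here's a minimal sketch using Python's standard-library `functools.lru_cache`; the `multiply` function is just a stand-in for any expensive computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def multiply(a: int, b: int) -> int:
    # Stand-in for an expensive computation.
    return a * b

multiply(30, 25)  # computed: 750
multiply(2, 3)    # computed: 6
multiply(30, 25)  # not recomputed, served from the cache
```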
How is caching done?
Sometimes, the requests coming into your system need a lot of computation before a response can be returned. Other times, a high volume of requests hitting your services puts your database under heavy pressure, leading to high latency. Caches allow you to serve data faster by not having to hit your datastore on every request. This improves your read performance, thus increasing your overall throughput. Adding a caching layer to your system means that any time a read request comes in:
- Your system checks your cache service, and if the data exists in the cache service, the request is served from there.
- If the data is not in the cache service, the request is sent to your database to get the data so it can be served. The data is also written to your cache service, so the next request finds it there (see the sketch below).
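Here's a rough sketch of that read path in Python, assuming a Redis instance on localhost accessed through the redis-py client; the `product:{id}` key format and the database helper are illustrative, not a standard:

```python
import json

import redis

cache = redis.Redis(host="localhost", port=6379)

def fetch_product_from_database(product_id: int) -> dict:
    # Stand-in for a real database query.
    return {"id": product_id, "name": "example product"}

def get_product(product_id: int) -> dict:
    key = f"product:{product_id}"

    # 1. Check the cache first.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: served without touching the database

    # 2. Cache miss: fall back to the database...
    product = fetch_product_from_database(product_id)

    # 3. ...and write the result to the cache for the next request.
    cache.set(key, json.dumps(product))
    return product
```

This pattern, where the application itself populates the cache on a miss, is commonly called cache-aside.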
The more often the requested data is found in the cache, the faster you can serve responses. A cache hit is when the requested data is found in your cache, and the cache hit rate is the rate at which that happens. The opposite is a cache miss, and the cache miss rate is the rate at which requested data is not found in the cache. The goal is to increase your cache hit rate: the higher it is, the faster you serve your responses. When your caches don't have a lot of data in them, they are said to be cold, and when they do, they are said to be warm.
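As a quick sketch of the arithmetic, the hit rate is simply the fraction of lookups served from the cache:

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    # Fraction of lookups served from the cache.
    return hits / (hits + misses)

cache_hit_rate(90, 10)  # 0.9, i.e. a 90% hit rate
```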
This was the cause of Slack's incident on 2-22-22. An issue led to their cache layer being flushed and thus becoming cold, causing a high rate of cache misses that overloaded their databases. The databases couldn't keep up with all the requests and kept timing out, which also prevented the cache service from filling back up with data, leading to even more requests hitting the databases. They had to throttle their system, failing some requests so they could serve others, and gradually warm their caches back up to a level of normalcy.
Data Inconsistency
Caches, as good as they are, are also tricky to deal with. One of the biggest issues is data inconsistency: the data in your database being different from the data in your cache. This can happen when updates to your data have not yet been synchronised to your cache service. There are different ways to resolve this, and they all have their pros and cons.
- Write through. Data is first written to your cache and then written to your datastore synchronously. This is expensive, as two writes have to be done on every update: one to the cache and one to the database.
- Write behind. Data is first written to your cache and later written to your database asynchronously. This keeps writes fast, but updates that haven't been persisted yet can be lost if the cache goes down.
They both shine in different situations. Let's say you are building a caching layer that serves the likes feature on the products in your e-commerce store. If someone likes a product, it's not critical to show them the exact number of likes at that moment; the consequences of slightly stale data are inconsequential, so you can update the cache now and update your database later. However, let's say you are caching the quantity of the products. It's very important to show the latest updated quantity, otherwise you'll end up with more orders than products. You want to use a write-through approach in such cases, even though it's computationally expensive. A sketch of both write paths follows below.
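Here's a minimal sketch of both write paths, again assuming a Redis instance on localhost; the `save_to_database` helper and the in-process background queue are illustrative simplifications (a production write-behind setup would typically use a durable queue):

```python
import json
import queue
import threading

import redis

cache = redis.Redis(host="localhost", port=6379)

def save_to_database(key: str, value: dict) -> None:
    pass  # stand-in for a real database write

# Write-through: the cache and the database are updated in the same
# request, so reads never see stale data, but every write pays twice.
def write_through(key: str, value: dict) -> None:
    cache.set(key, json.dumps(value))
    save_to_database(key, value)

# Write-behind: the request only touches the cache; a background
# worker drains the queue and persists to the database later.
write_queue = queue.Queue()

def write_behind(key: str, value: dict) -> None:
    cache.set(key, json.dumps(value))
    write_queue.put((key, value))

def database_writer() -> None:
    while True:
        key, value = write_queue.get()
        save_to_database(key, value)
        write_queue.task_done()

threading.Thread(target=database_writer, daemon=True).start()
```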
- Data invalidation. We can update the data in the cache when it is updated in the database, or we can invalidate the cached entry and create a new one that subsequent requests will be pointed to.
- Time to Live (TTL). We can also set a TTL, giving data a limited lifetime, say 60 seconds, after which it becomes obsolete. You want to add some jitter, setting each TTL within +/- 10s of your chosen time; for 60s, that's a range of 50s-70s. This ensures your cached data doesn't all expire at the same time and overload your database with requests (see the sketch below).
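A minimal sketch of TTL with jitter, assuming the same localhost Redis setup; the 60s base and 10s jitter are just the numbers from the example above:

```python
import json
import random

import redis

cache = redis.Redis(host="localhost", port=6379)

def set_with_jittered_ttl(key: str, value: dict,
                          base_ttl: int = 60, jitter: int = 10) -> None:
    # Spread expirations across base_ttl +/- jitter seconds (50s-70s here)
    # so cached entries don't all expire and hit the database at once.
    ttl = base_ttl + random.randint(-jitter, jitter)
    cache.set(key, json.dumps(value), ex=ttl)
```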
Why don't we use cache everywhere?
If caches are so good, why don't we use them all the time?
- Because caches hold temporary data, the data is stored in RAM, so when your system goes down or restarts, you lose your cached data. Due to continuous deployments, errors, autoscaling, etc., containers restart and rebalance quite frequently. This means that when your cache service is baked into your application, your cache will go cold frequently. This is why caches are recommended as standalone services (we will discuss this in another article).
- Apart from the transient nature of RAM, it is also very expensive. You need databases that write your data to disk, not only to persist your data but also to save you cost.
Conclusion
In the current climate, where everything is supposed to be blazingly fast, you want to improve your system's response times using caches. However, you also want to be aware of the extra complexity they introduce and how to mitigate their pitfalls.