João Godinho

Cache Layers in Modern Applications

What is Cache?

  • Cache is any temporary storage that holds copies of data so future requests can be served faster. The concept also applies to hardware: the CPU accesses caches (L1, L2, L3) that store copies of data from main memory to speed up access. They are faster for multiple reasons: the technology used to build caches (L1, L2, L3) differs from that of main memory, making them faster but more expensive per byte, and they sit physically closer to the CPU, which reduces data transfer latency. A cache typically stores recently and/or frequently accessed data.
    • Note: a cache is different from a data buffer. A buffer is also temporary storage, but its job is not to keep the most frequently or recently used data; it exists to smooth out producers and consumers operating at different rates.
  • This post focuses primarily on caching as an internet technology used to provide scalability, performance, cost reduction and more (from a system-design perspective). For example, web browsers cache HTML, images, and more after the first load; the OS caches DNS records; CDN servers cache content to reduce latency. (Hardware caches, DNS caches, DB caches, and others are also important, but they are out of scope here.)
  • Covered topics: the role of cache, cache layers, and how to combine them to achieve better performance and scalability.

Why use caching?

  • Increase performance: Reading from memory is much faster than reading from disk, resulting in faster data access. It significantly reduces database I/O and increases read throughput.
  • Reduce costs: Caching reduces database load, which lets you run fewer DB instances; and if your DB service or VPS charges per throughput, caching will cut that bill as well.
  • Scalability: With in-memory cache we can handle application access spikes more easily, such as Black Friday. Caching most accessed data is crucial and ensures the app will handle the load without DB bottlenecks.
  • There are two main aspects to caching: performance and data freshness.
    • We want to serve the fastest responses possible while still providing the required data freshness. The requirement varies by application domain: for example, the price of a product should never be stale, but a blog post generally doesn’t need to be perfectly fresh.

CACHE HIT and CACHE MISS

  • HIT = the data is in the cache; serve it from there (faster load)
  • MISS = the data isn’t in the cache; fetch it from the source (e.g., the database) and save a copy to the cache for next time
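
To make the HIT/MISS flow concrete, here is a minimal cache-aside lookup sketch in JavaScript (the `cache` Map and `loadFromDb` loader are illustrative stand-ins, not a specific library):

```javascript
// Minimal cache-aside lookup: a HIT returns the cached copy,
// a MISS loads from the source of truth and stores a copy.
const cache = new Map();

async function getUser(id, loadFromDb) {
  const key = `user:${id}`;
  if (cache.has(key)) return cache.get(key); // HIT: no DB round trip
  const value = await loadFromDb(id);        // MISS: go to the source
  cache.set(key, value);                     // save the copy for next time
  return value;
}
```

In a real application you would also set a TTL and bound the cache's size, which the layers below address.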

Considerations before Cache layers

  • This nomenclature and numbering are not formal definitions. I am using them as a way to explain and make caching layers easier to understand. Don’t confuse this with memory hierarchy or OSI layers; the way I used this has nothing to do with either. It is only a naming approach to help understand different levels of caching.

Layer 1: Browser Cache (plus Concept of Time to Live)

  • A natural starting point is the web browser, one of the most used categories of application in the contemporary world. As the name suggests, this cache isn’t handled by web developers but by the browser itself. Developers can still configure it correctly through their server responses; we will cover that below, but first let’s understand the mechanism.
  • When a user first accesses a web page, the browser downloads a large amount of data. To avoid re-downloading it on every refresh, the browser stores it on the user’s disk with a Time to Live (TTL), which determines how long the copy stays in the cache. The copy can be evicted in a few scenarios: the TTL expires, revalidation via ETags invalidates it, or the cache is full and the entry is replaced by other data.
  • As mentioned, this cache is provided by the browser, and it is the best cache we can have, since it lives on the user’s machine and avoids hitting our CDN or origin servers entirely. But developers need to configure it carefully: misconfigure it and users end up with an outdated cache that you can’t clear programmatically.
  • It works similarly in mobile applications, which also use HTTP headers to configure cache behavior.

HTTP Cache Headers

  • It’s crucial to configure the cache correctly by setting the best matching HTTP cache headers for server responses.
  • With HTTP cache headers you can configure not only browser caches but other caches that we will discuss throughout this article. A key point is to understand that you can tell the cache layer (in this case the web browser) things like:
    • 1. “DON’T CACHE THIS”
    • 2. “CACHE THIS FOR TIME X”
    • 3. “CACHE THIS UNDER TAG X, AND I’LL TELL YOU IF SOMETHING CHANGED”
      • This last approach allows developers to update cache programmatically without the need for a TTL or the user manually cleaning the cache.
      • Before reusing data from the cache, the browser sends a request carrying the ETag in the If-None-Match header, and the server responds with one of the following: “304 Not Modified”, which means “use the cached copy”, or “200 OK”, which means “here is a fresh response; update the cache”.
    • There are, of course, many more concerns about HTTP cache headers that I will not discuss here. For more, see HTTP Caching 101

Layer 2: Content Delivery Network Cache

  • A CDN is a network of reverse proxy servers that live closer to end users, and it can be used for many server optimizations, one of which is caching.
  • Everything we’ve discussed about HTTP cache headers for browsers works almost the same here, with one important difference: the CDN acts as a shared (“centralized”) cache, not one on the user’s machine.
  • When talking about browser cache, we ignored the following situation:
    • Imagine you have an admin user who accesses the /admin route. Should you cache that response for all your users? NO.
    • If you still want the browser to cache it, you need to understand HTTP cache headers better and set Cache-Control: private, which tells shared caches like the CDN not to store the response; only the user’s browser may keep it.
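
To make the public/private distinction concrete, here are illustrative `Cache-Control` values (the directive strings are standard; the route names are invented for this sketch):

```javascript
// Illustrative Cache-Control choices per route (routes are made up).
// - private, no-store: no cache anywhere, not even the browser
// - no-cache: any cache may store it, but must revalidate (pairs with ETag)
// - public, max-age, immutable: every layer may keep it for a long time
const cacheHeaders = {
  '/admin':    'private, no-store',                    // per-user, never shared
  '/prices':   'no-cache',                             // always revalidate freshness
  '/logo.png': 'public, max-age=31536000, immutable',  // fingerprinted static asset
};
```

A fingerprinted asset (its URL changes when its content changes) is the classic case where a year-long `max-age` is safe: stale copies are simply never requested again.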

Layer 3: Reverse Proxy Cache (Infrastructure)

  • In a case where you have multiple server instances running in a single region, you can use a reverse proxy in front of these instances to load balance traffic among them and also cache their responses. It is also configured using HTTP headers.
  • It can be implemented using Varnish, Nginx, or other reverse proxies.
  • To understand better, read the section about reverse proxy in: How Content Delivery Networks
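
As a sketch, a response cache in front of a pool of app instances might be configured in Nginx roughly like this (the cache path, zone name, and upstream name are illustrative, not from the article):

```nginx
# Illustrative Nginx reverse-proxy cache configuration
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g inactive=10m;

server {
    listen 80;
    location / {
        proxy_pass http://app_upstream;           # your pool of app instances
        proxy_cache app_cache;
        proxy_cache_valid 200 301 5m;             # cache successful responses for 5 minutes
        add_header X-Cache-Status $upstream_cache_status;  # HIT/MISS, useful for debugging
    }
}
```

The `X-Cache-Status` header makes it easy to verify from the client whether a given response was served from the proxy cache or from an instance.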

Layer 4: Application Cache

  • There are two scopes for this application cache: Local and Distributed
  • Local application cache: Only one instance has access to it (in-process cache)
    • In the case where you have multiple instances, a local cache introduces consistency issues, since each instance has its own private memory that is not shared.
    • If you have a single instance, it’s clearly a good idea to use a local in-memory application cache, since it’s the fastest type of cache available.
    • This is reminiscent of memoization: storing function results in memory to speed up future computation. Here, the same idea is applied to frequently accessed data.
    • Make sure you bound its memory usage; use an eviction library in your language, for example: NPM lru-cache - JS.
  • Distributed application cache: Used when you have multiple instances (Redis/Memcached)
    • As mentioned, horizontal scaling requires more than a single instance server, and for that we need a shared cache, also called a distributed cache.
    • It is slower than local in-memory cache since distributed cache adds network communication overhead. However, it is still much faster than querying the database directly.
  • Could we mix both local and distributed? Yes, approximating the hardware memory hierarchy with its multiple cache levels. However, be careful with this approach, since it can lead to consistency issues. If serving stale data for a short window is acceptable, it can work well; otherwise, you should avoid mixing both strategies.
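
A sketch of the mixed approach under the staleness caveat above: a small local LRU in front of a distributed cache, falling back to the database. The `redis` and `db` objects here are stand-ins for illustration, not a real client library:

```javascript
// Two-level read path: local LRU (L1) -> distributed cache (L2) -> DB.
class LruCache {
  constructor(max) { this.max = max; this.map = new Map(); }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); this.map.set(key, value); // re-insert: mark as recently used
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }
}

const local = new LruCache(1000); // bounded, per-instance

async function read(key, redis, db) {
  const l1 = local.get(key);
  if (l1 !== undefined) return l1;                    // L1 hit: no network at all
  const l2 = await redis.get(key);
  if (l2 !== null) { local.set(key, l2); return l2; } // L2 hit: one network hop
  const value = await db.load(key);                   // miss everywhere: source of truth
  await redis.set(key, value);                        // short TTLs here bound staleness
  local.set(key, value);
  return value;
}
```

Keeping the local layer small and short-lived limits how long the instances can disagree with each other, which is the consistency trade-off mentioned above.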

Last observations about these Cache Layers

  • People tend to think that only application cache exists, and that Redis/Memcached are the only options for distributed caching. This is an enormous mistake that can lead to poor performance and high costs. Imagine a multi-region application with a single Redis instance placed in one region: every other region pays cross-region latency on each cache access.
  • Another thing: static assets are not the only content you can cache in browsers and CDNs; you can (and often should) cache your API responses too.
  • The combination of these layers is the best case for application performance and scalability.

Start with the simplest caching layer that solves your problem. Browser caching and CDN caching are almost free. Application-level caching is the next step. Only add more complexity (write-through, stampede prevention, stale-while-revalidate) when you have evidence that you need it.

cache layers image

Caching caveats (mainly cache invalidation)

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

  • Caching isn’t only about lightning speed; it is also about choosing the right caching strategy to avoid serving stale data and crashing servers, especially at the application-level cache.
    • If you don’t use the right cache strategy, you will likely spend far more resources than someone who knows the strategies and when to apply each. That is not the main topic of this article, but a discussion for a future one.
    • You can serve stale data by failing to update or invalidate the cached copy when the source changes.
    • You can crash your server in several ways with misuse of caching:
      • A hot key suddenly becomes invalid, and multiple DB calls fire at the same time to repopulate it (a cache stampede).
      • If you are using a local cache solution in your origin server that keeps growing indefinitely and consumes all your RAM.
      • Distributed cache failure: your entire cache cluster goes down and now all traffic falls back to the DB.
  • One example of cache misuse: using a high-cardinality cache key that is never reused, such as user:{timestamp}
    • Congrats! Every lookup now takes an unnecessary extra trip that always ends in a miss: cache read -> miss -> db -> cache save
  • There are multiple things to take care of when using caching and multiple strategies to consider, but it can be a game changer for your application if well set. Stale data issues tend to be the most frequent ones. Also, never forget race conditions and other common issues that will be discussed in a future post.
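
One common mitigation for the hot-key stampede mentioned above is request coalescing (sometimes called “single flight”): concurrent misses for the same key share one in-flight load instead of each hitting the DB. A minimal sketch, with illustrative names:

```javascript
// Request coalescing: the first miss for a key starts the load;
// concurrent misses for the same key reuse the same promise.
const inFlight = new Map();

function loadOnce(key, loader) {
  if (inFlight.has(key)) return inFlight.get(key);  // piggyback on the running call
  const promise = loader(key).finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```

This only dedupes within one process; across many instances you would additionally need a distributed lock or a stale-while-revalidate policy, topics for that future post.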
