Caching Strategies? What's that?
Note: in case you haven't read the previous blog, Deep Dive into Caching: Techniques for High-Performance Web Apps, start there.
Before we go deep, let's understand some common policies:
Write Through: The data is written to the cache and the backing store / DB simultaneously (in parallel).
Write Around: The data is written only to the backing store / DB, bypassing the cache.
Write Behind / Write Back: The data is written to the cache first, then to the backing store / DB asynchronously in the background.
Read Through: The data is written to the backing store / DB; when it is read for the first time, it is also written to the cache. This makes the first read slow, but subsequent reads are fast.
Each of the policies above has advantages and disadvantages.
In the case of a distributed / microservice architecture, caching would be spread out further based on the scale of the whole system, and other techniques such as sharding are involved. I will be writing about this in another blog.
Problem
Ahh, when should we use which write policy?
Let's look at some use cases for each of them.
1. Write-Through Policy
Data Consistency : When strong consistency between the cache and the underlying data store is required. Any data written to the cache is immediately available in the backing store.
Simple Implementation : Easy to implement and understand since every write operation is propagated to the underlying data store.
Read-Heavy Workloads : Suitable for scenarios where read operations are more frequent than write operations, as the data in the cache is always consistent with the data store.
Examples:
Session Management : In web applications, session data needs to be consistent and immediately available across multiple nodes.
Configuration Data : Configuration settings that are frequently read but rarely changed.
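To make this concrete, here is a minimal write-through sketch in Python. The cache and db dicts are stand-ins for a real cache client and backing store; all names are illustrative assumptions, not from any specific library.

```python
# Minimal write-through sketch: every write hits the cache and the backing
# store together, so cached reads stay consistent with the store.
cache = {}  # stand-in for a real cache client
db = {}     # stand-in for the backing store / DB

def write_through(key, value):
    cache[key] = value  # write to the cache...
    db[key] = value     # ...and to the backing store as part of the same write

def read(key):
    if key in cache:
        return cache[key]
    return db.get(key)

write_through("session:42", "user=alice")
print(read("session:42"))  # served from the cache, consistent with the DB
```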
2. Write Around Policy
Write-Heavy Workloads : Suitable for applications with frequent writes and less frequent reads, reducing the number of write operations to the cache.
Cold Data : Ideal for scenarios where data is not frequently accessed after being written. The cache is not burdened with rarely accessed data.
Examples:
Bulk Data Imports : Applications that periodically import large datasets where the data is not immediately needed for reading.
Logging Systems : Systems that write log data directly to storage but only occasionally read the data for analysis.
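For contrast, here is a minimal write-around sketch under the same assumptions (illustrative names, plain dicts standing in for a cache client and a store):

```python
# Minimal write-around sketch: writes bypass the cache and go straight to the
# backing store; the cache fills only when data is actually read.
cache = {}
db = {}

def write_around(key, value):
    db[key] = value       # write only to the backing store
    cache.pop(key, None)  # common refinement: drop any stale cached copy

def read(key):
    if key not in cache:  # cache miss: load from the store on demand
        cache[key] = db.get(key)
    return cache[key]

write_around("log:2024-06-01", "imported 10k rows")
print(read("log:2024-06-01"))  # the first read populates the cache
```

Note how bulk writes never touch the cache, so rarely-read data does not evict hot entries.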
3. Write Behind (Write Back) Policy
Performance : Improves write performance by quickly acknowledging write operations and deferring the actual write to the data store.
Batch Processing : Suitable for scenarios where data can be written in batches to the underlying store, reducing the write load.
Data Freshness : Suitable when immediate consistency is not critical, and slight delays in data propagation to the data store are acceptable.
Examples:
User Activity Logging : Applications that log user actions where the logs are periodically flushed to the database.
E-commerce : Shopping cart data that is written to the cache for quick access and periodically synchronized with the database.
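Here is a minimal write-behind sketch, assuming a background thread and an in-process queue as the deferred-write mechanism; a real system would use a durable queue or batched flushes:

```python
import queue
import threading

# Minimal write-behind sketch: writes are acknowledged as soon as they hit the
# cache; a background worker flushes them to the store later.
cache = {}
db = {}
pending = queue.Queue()

def write_behind(key, value):
    cache[key] = value         # fast acknowledgement: cache only
    pending.put((key, value))  # defer the slow store write

def flush_worker():
    while True:
        key, value = pending.get()
        db[key] = value        # the slow write happens off the hot path
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()
write_behind("cart:7", "item=book,qty=2")
pending.join()                 # wait for the background flush to finish
print(db["cart:7"])            # eventually consistent with the cache
```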
4. Read Through Policy
Lazy Loading : Useful for loading data on demand, caching it only when it is actually needed.
Read-Heavy Workloads : Suitable for applications where read operations significantly outnumber write operations, and data needs to be quickly accessible after the first access.
Examples:
Product Catalogs : E-commerce applications where product details are read frequently but updated infrequently.
Content Management Systems (CMS): Systems where articles or media are read frequently after the initial publication.
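A minimal read-through sketch; the sleep simulates backing-store latency, and all names are illustrative:

```python
import time

# Minimal read-through sketch: the cache loads missing entries from the store
# itself, so the first read is slow and every later read is fast.
db = {"product:1": "Mechanical Keyboard"}
cache = {}

def slow_db_read(key):
    time.sleep(0.1)       # simulate a backing-store round trip
    return db.get(key)

def read_through(key):
    if key not in cache:  # the first access pays the store round trip once
        cache[key] = slow_db_read(key)
    return cache[key]

print(read_through("product:1"))  # slow: misses, then populates the cache
print(read_through("product:1"))  # fast: served straight from the cache
```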
When to select what
Need Consistency
Write Through : Ensures strong consistency as data is written to both the cache and the store simultaneously.
Write Around : Can lead to stale cache data until the data is read and cached.
Write Behind : Provides eventual consistency with potential lag between cache and store.
Read Through : Ensures data is cached on first access, potentially leading to stale data if not frequently updated.
Need Performance
Write Through : Can be slower for write operations due to double writes (cache and store).
Write Around : Reduces write load on the cache, faster write operations.
Write Behind : Improves write performance, but read operations may suffer if cache and store are not in sync.
Read Through : Fast read operations after initial cache miss, good for read-heavy scenarios.
Need Simplicity
Write Through : Simple to implement and ensures immediate consistency.
Write Around : Simple for write operations but requires cache management for reads.
Write Behind : More complex due to the need for asynchronous write handling and potential consistency issues.
Read Through : Straightforward for reads, requires handling of initial cache misses.
It depends on the use case you are solving.
Now let's go deeper into the strategies used while implementing a cache.
Developers: I commonly use the OG, the LRU cache, most of the time.
LRU (Least Recently Used) is a popular caching strategy, but it's not always the best fit for every use case. There are several alternative caching strategies, each with its own strengths and suitable scenarios.
There are many, each with its own use cases. Here are the main ones:
LRU : Best for scenarios where the most recently accessed items are most likely to be accessed again soon.
LFU (Least Frequently Used) : Best when access frequency is a good predictor of future accesses.
FIFO (First In, First Out) : Simple; best when the oldest data is the least useful.
Random Replacement (RR) : Simple, good for unpredictable access patterns.
Time-To-Live (TTL) : Best for time-sensitive data that becomes stale after a certain period.
Adaptive Replacement Cache (ARC) : Adapts well to changing access patterns, more complex.
Least Recently/Frequently Used (LRFU) : Balances between recency and frequency, tunable.
Segregated LRU (SLRU) : Useful for multi-segment caches with different types of data.
Most Recently Used (MRU) : Useful in specific scenarios where the most recent data is less useful.
Clock Algorithm : A variant of LRU that approximates its behavior using a circular buffer (clock) and a use bit for each page.
2Q (Two Queue) : Balances recency and frequency with separate queues.
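To ground the most common of these, here is a minimal LRU sketch built on the standard library's OrderedDict; the class name and capacity are illustrative assumptions:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")         # "a" becomes the most recently used
lru.put("c", 3)      # evicts "b", the least recently used
print(lru.get("b"))  # None
```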
Some frameworks, such as Django and Spring Boot, support these strategies by default, and many more do as well.
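In Python itself, for example, you rarely need to hand-roll an LRU; the standard library ships one as a decorator (the function below is a hypothetical stand-in for an expensive lookup):

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # stdlib LRU: keeps the 128 most recently used results
def get_product_details(product_id):
    # Hypothetical stand-in for an expensive DB or API call.
    return f"details for product {product_id}"

get_product_details(1)  # computed, then cached
get_product_details(1)  # served from the LRU cache
print(get_product_details.cache_info())  # hits=1, misses=1, ...
```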
If you reached here, you have read the whole blog. Thanks for reading; I hope it gave you a clearer picture of caches.
What did we learn
- When to use which write policy while implementing caching
- How to avoid overloading the cache by using various eviction techniques when implementing caches
Follow for more interesting blogs; it keeps me motivated to write more interesting stuff.
Here are my socials: LinkedIn
Top comments
One issue you don't address is transactional integrity.
When you have many threads and processes writing to the database and the cache, you run into a cache-consistency problem.
The definitive solution is two-phase commit (you write to the cache and the DB in the same transaction), but many of the cache tools we use today don't support that.
So the solution I've always used is to invalidate (delete) the cache entry within the same transaction as the database write, rather than writing the new value to the cache.
If the transaction fails, the old value will be reloaded. If the transaction succeeds, the new value will be loaded on the first read.
Writing to the cache directly puts us in the uncomfortable position of having to undo our change if there's some error. So this simple mechanism works well.
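A minimal sketch of this invalidate-in-transaction approach, using sqlite3 as the transactional store; the schema, key scheme, and dict cache are assumptions for illustration, not the commenter's actual code:

```python
import sqlite3

cache = {}  # stand-in for any dict-like cache client (e.g. a Redis wrapper)

def update_email(conn, user_id, email):
    with conn:  # sqlite3 transaction: commits on success, rolls back on error
        conn.execute("UPDATE users SET email = ? WHERE id = ?", (email, user_id))
        # Invalidate instead of writing the new value: if the transaction rolls
        # back, the next read simply reloads the old value from the database.
        cache.pop(f"user:{user_id}", None)

def get_email(conn, user_id):
    key = f"user:{user_id}"
    if key not in cache:  # the first read after a write repopulates the cache
        row = conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        cache[key] = row[0] if row else None
    return cache[key]
```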
This technique is similar to Two-Phase Commit (2PC), ensuring that all parts of the transaction either commit or roll back together.
This really is a smart approach. Thanks for sharing it here.