Tharindu Sathischandra

A quick introduction to Distributed Caching


Cache?

A cache is simply storage set aside on a system to hold data so that future requests for that data can be served more quickly.

Why Caching:

  • Reduce network calls
  • Avoid re-computations
  • Reduce load on the DB
  • Improve availability of data by providing continued service even when the backend server is unavailable
  • Improve data access speeds through locality of data
  • Smooth out load peaks (e.g., by avoiding round-trips between the middle tier and the data tier)

Distributed Caching


  • The cache is maintained as an external service to the application servers (front-end servers)
  • Typically deployed on a cluster of multiple nodes, forming one large logical cache (horizontal scalability)
  • Managed in a synchronized fashion, making the same cached data remotely available to all application servers (consistency)
  • Survives application server restarts and deployments
  • Preserves high availability
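
To make this concrete, here is a minimal sketch, assuming a Redis server reachable at a hypothetical host `cache.internal` and the redis-py client, of two application servers sharing one external cache:

```python
# Sketch: two application servers sharing one external cache.
# Assumes a Redis server at the hypothetical host "cache.internal"
# and the redis-py client (pip install redis).
import redis

# Both app servers connect to the same external cache service,
# not to a per-process in-memory store.
cache = redis.Redis(host="cache.internal", port=6379)

# Server A writes a value...
cache.set("feature:flags", "dark_mode=on")

# ...and server B (another process or host) reads the same value,
# because the cache lives outside both application servers.
print(cache.get("feature:flags"))  # b"dark_mode=on"
```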

Use Cases Of Distributed Caches

Database Caching to reduce DB bottleneck:

A cache layer in front of a database keeps frequently accessed data in memory, cutting down latency and unnecessary load on the database.
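
As a rough sketch (assuming Redis via the redis-py client; `query_db` is a hypothetical stand-in for the real database call), caching a query result might look like:

```python
# Sketch: caching a database query result in Redis to cut DB load.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def query_db(user_id: int) -> dict:
    # Placeholder for an expensive database query.
    return {"id": user_id, "name": "Alice"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)             # served from memory, DB untouched
    user = query_db(user_id)                  # cache miss: hit the database once
    cache.set(key, json.dumps(user), ex=300)  # keep it hot for 5 minutes
    return user
```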

Storing User Sessions:

User sessions are stored in the cache so that user state is not lost if any one instance goes down.
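
A minimal sketch of session storage, again assuming Redis via redis-py; the session ID and field names are just illustrative:

```python
# Sketch: keeping user sessions in the shared cache so any app
# instance can serve the user.
import redis

cache = redis.Redis(host="localhost", port=6379)

def save_session(session_id: str, user_id: str) -> None:
    key = f"session:{session_id}"
    cache.hset(key, mapping={"user_id": user_id, "cart_items": 0})
    cache.expire(key, 1800)  # sessions expire after 30 idle minutes

def load_session(session_id: str) -> dict:
    # Works from ANY instance: state survives if one server dies.
    return cache.hgetall(f"session:{session_id}")
```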

Cross-Module Communication & Shared Storage:

Stores shared data that is commonly accessed by all the services; it acts as a backbone for microservice communication.
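
One common way to do this is through the cache's publish/subscribe channel. A toy sketch, assuming Redis pub/sub and an illustrative channel name:

```python
# Sketch: using the cache's pub/sub channel as a lightweight
# backbone between microservices.
import redis

cache = redis.Redis(host="localhost", port=6379)

# Service B subscribes to a channel...
listener = cache.pubsub()
listener.subscribe("orders")
listener.get_message(timeout=1)  # drain the subscribe confirmation

# ...Service A (in practice, a different process) announces an event...
cache.publish("orders", "order:42:created")

# ...and Service B picks it up.
msg = listener.get_message(timeout=1)
if msg:
    print("received:", msg["data"])  # b"order:42:created"
```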

In-memory Data Stream Processing & Analytics:

Real-time processing of data as it arrives (as opposed to the traditional approach of storing data in batches and then running analytics on it).

E.g.: anomaly detection, fraud monitoring, online gaming real-time stats, real-time recommendations, payment processing, etc.
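
A toy sketch of the idea using Redis Streams (an assumption here; the post does not name a specific product), where events are processed as they arrive:

```python
# Sketch: a minimal real-time pipeline on Redis Streams.
import redis

cache = redis.Redis(host="localhost", port=6379)

# Producer: events are appended as they happen, not batched.
cache.xadd("payments", {"user": "u1", "amount": "250"})
cache.xadd("payments", {"user": "u1", "amount": "9800"})

# Consumer: read events from the start of the stream and flag
# anything suspicious on the fly (toy fraud check).
for entry_id, fields in cache.xread({"payments": "0"})[0][1]:
    if int(fields[b"amount"]) > 5000:
        print("possible fraud:", entry_id, fields)
```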

Distributed Hash Tables

  • Generally, a distributed cache is based on a Distributed Hash Table (DHT), which is similar to a hash table but spread across multiple nodes.
  • Key-value pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key.
  • A DHT allows a distributed cache to scale on the fly by continually managing the addition, removal, and failure of nodes.
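
Underneath, DHTs typically rely on consistent hashing. A toy ring (simplified; real systems add virtual nodes for better balance) might look like:

```python
# Sketch: a toy consistent-hash ring, the idea underneath a DHT.
# Keys map to points on a ring; each key is owned by the first node
# clockwise from its hash, so adding or removing a node only remaps
# a fraction of the keys.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"))  # the same key always lands on the same node
```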


Distributed Caching Strategies

There are different kinds of caching strategies that serve specific use cases.

  • Cache Aside
  • Read-Through
  • Write-Through
  • Write-Back
  • Write-Around

By selecting the proper strategy, caches can reduce response times, decrease the load on the database, and save costs. The strategy is chosen based on the data and its access patterns (data size, data uniqueness, access frequency, …).

1. Cache Aside Strategy


  • The cache sits on the side, and the application talks directly to both the cache and the DB.
  • When the user requests particular data, the system first looks for it in the cache. If present, it is simply returned; otherwise, the data is fetched from the database, the cache is updated, and the data is returned to the user (see the sketch below).
  • Data is lazy-loaded into the cache.
  • In this strategy, writes go directly to the database, which means the cache and the database might become inconsistent.

    • To avoid this, data in the cache is given a TTL (time to live). Once the TTL expires, the data is invalidated from the cache.
  • Resilient to cache failures: if the cache cluster goes down, the system can still operate by going directly to the database.

  • Best suited for:

    • Read-heavy workloads
    • Data that is not frequently updated, e.g., user profile data (name, birthday, etc.)
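
A minimal sketch of the cache-aside flow, assuming Redis via redis-py; `fetch_from_db` and `save_to_db` are hypothetical stand-ins for real database calls:

```python
# Sketch of the cache-aside read/write paths described above.
import redis

cache = redis.Redis(host="localhost", port=6379)
_db = {}  # toy stand-in for the real database

def fetch_from_db(key):
    return _db.get(key, b"default")

def save_to_db(key, value):
    _db[key] = value

def read(key: str):
    value = cache.get(key)        # 1. look in the cache first
    if value is not None:
        return value              # cache hit
    value = fetch_from_db(key)    # 2. miss: go to the database
    cache.set(key, value, ex=60)  # 3. lazy-load with a 60s TTL
    return value

def write(key: str, value) -> None:
    save_to_db(key, value)  # writes go straight to the DB;
    cache.delete(key)       # drop the stale copy (the TTL also bounds staleness)
```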

2. Read-Through


  • The cache sits in line with the database.
  • When there is a cache miss, the cache loads the missing data from the database, populates itself, and returns the data to the application.
  • The cache always stays consistent with the database.
  • Best suited for: read-heavy workloads.
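
A sketch of the read-through idea, where the cache layer itself owns the loading logic (the wrapper class below is illustrative, not a specific product's API):

```python
# Sketch: a read-through cache wraps the database, so the
# application only ever talks to the cache layer.
import redis

class ReadThroughCache:
    def __init__(self, loader, ttl=60):
        self._redis = redis.Redis(host="localhost", port=6379)
        self._loader = loader  # knows how to fetch from the DB
        self._ttl = ttl

    def get(self, key: str):
        value = self._redis.get(key)
        if value is None:
            # The CACHE (not the app) loads the missing value.
            value = self._loader(key)
            self._redis.set(key, value, ex=self._ttl)
        return value

cache = ReadThroughCache(loader=lambda key: b"row-for-" + key.encode())
print(cache.get("user:42"))  # first call loads from the "DB"; later calls hit the cache
```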

3. Write-Through


  • Data is first written to the cache and then to the database.
  • The cache sits in line with the database, and writes always go through the cache to the main database.
  • Maintains high consistency between the cache and the database (but adds a little latency to write operations, since the data must additionally be updated in the cache).
  • Best suited for: write-heavy workloads like online multiplayer games.
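
A sketch of the write-through path, again assuming Redis; `write_to_db` is a hypothetical stand-in:

```python
# Sketch: write-through updates the cache and the database
# together before acknowledging the write.
import redis

cache = redis.Redis(host="localhost", port=6379)
_db = {}  # toy stand-in for the real database

def write_to_db(key, value):
    _db[key] = value

def write_through(key: str, value: bytes) -> None:
    cache.set(key, value)    # 1. write to the cache...
    write_to_db(key, value)  # 2. ...then synchronously to the DB
    # Only now is the write acknowledged: cache and DB stay in
    # step, at the cost of extra write latency.
```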

4. Write-Back (write-behind)


  • The application writes data to the cache, which acknowledges immediately; after some delay, the cache writes the data back to the database.
  • Resilient to database failures and can tolerate some database downtime.
  • Risky: if the cache fails before the DB is updated, the data might be lost. (For this reason, write-back is often combined with other caching strategies.)
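
A toy sketch of write-back; a real implementation would use a durable queue rather than an in-memory list:

```python
# Sketch: write-back acknowledges after writing to the cache and
# flushes to the database later.
import redis

cache = redis.Redis(host="localhost", port=6379)
_db = {}     # toy stand-in for the real database
_dirty = []  # keys written to the cache but not yet to the DB

def write_back(key: str, value: bytes) -> None:
    cache.set(key, value)  # fast path: ack as soon as the cache has it
    _dirty.append(key)     # remember to persist later

def flush() -> None:
    # Runs on a timer or size threshold; if the cache dies before
    # this runs, the buffered writes are lost (the risk noted above).
    while _dirty:
        key = _dirty.pop()
        _db[key] = cache.get(key)
```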

Cache Eviction

A cache eviction algorithm decides which element to discard when the cache is full.

Some common cache eviction policies:

  • First In First Out (FIFO): Discards blocks in the order they were added, oldest first, regardless of how often they were accessed
  • Last In First Out (LIFO): Discards the most recently added block first
  • Least Recently Used (LRU): Discards the least recently used items first
  • Most Recently Used (MRU): Discards the most recently used items first
  • Least Frequently Used (LFU): Counts how often an item is needed; those used least often are discarded first
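
As an illustration of eviction, here is a tiny in-process LRU cache using only the Python standard library; distributed caches apply the same policy per node:

```python
# Sketch: a minimal LRU cache built on OrderedDict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the LEAST recently used

cache = LRUCache(capacity=2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```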

Some popular distributed caches used in the industry are Redis, Hazelcast, Ehcache, Memcached, Riak, …


