DEV Community

Deepak Singh Solanki

Originally published at deepakinsights.Medium

Cache Strategies in Distributed Systems

Introduction

In 2021, during a mock exam for 25,000 students, our system crashed. The reason was not just high traffic. All our cache keys expired at the same time. Every request hit the database directly. The database slowed down. The exam failed before it even started.

That day taught me one thing: caching alone is not enough.

In distributed systems, cache is your first line of defence. It stores frequently accessed data in high-speed memory, reduces database load, and improves response time. But if you don’t manage it properly, cache itself becomes the reason your system goes down.

In this article, I will walk you through why basic TTL breaks under pressure and what I learned about TTL Jitter, Probabilistic Early Re-computation, Mutex Locking, Stale-While-Revalidate, and Cache Warming the hard way.

Why Basic TTL Is Not Enough

When I first started working with caching, TTL felt like a complete solution. Set an expiry time, cache refreshes automatically. Simple and clean.

But here is the problem. TTL does not care about what is happening in your system. It does not know if it is 2 AM with zero traffic or an IPL match day with millions of users online. It just expires. Every single time.

In a single server system, this is manageable. But in distributed systems, you are not dealing with one cache key. You are dealing with thousands of them. And when all of them share the same TTL, they all expire at the same time.

That is exactly what happened to us. Our cache TTLs were set to expire between 5 PM and 6 PM every Sunday, the same window in which our exam started. 25,000 students hit the system, the cache expired simultaneously, and every request went straight to the database.

Basic TTL has one job: expire the cache. It does that job well. But it has no strategy for what happens next. No coordination. No awareness. Just expiry.

That gap is where distributed systems break.

How Cache Expiry Causes Traffic Spikes

In the happy path, a request comes in, the server finds the data in the cache, and sends the response back. The database is never involved.

But what if thousands of cache keys expire together? Suddenly, every request finds an empty cache and hits the database directly. Connection pool fills up. Queries start queuing. Response time increases.

Users see a slow response and start refreshing. More requests. More load. More failures. The system keeps struggling until it crashes.

This is exactly what happened on our exam day. 25,000 students hit the database directly. They refreshed. The system could not recover in time.

This chain reaction is called Thundering Herd. And basic TTL has no answer for it.

TTL Jitter

Think of traffic signals at a busy junction. Different roads get a green signal at different times to avoid all traffic moving at once; if every road turned green together, the result would be chaos. The same thing happens when all cache keys expire at the same time: the database gets hit by thousands of requests together and the system struggles to recover.

TTL Jitter solves this by adding a small random value to each cache key’s expiry time. Some keys expire a little earlier, some a little later. No two keys expire at exactly the same time. This spreads the database load across a window instead of hitting it all at once.

It is a small change but it makes a big difference during high traffic events like IPL match day or a Big Billion Day sale.
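A minimal sketch of the idea in Python, assuming a base TTL of one hour and a +/-10% jitter window (both numbers are illustrative, not from the original incident):

```python
import random

BASE_TTL = 3600          # base expiry in seconds (illustrative)
JITTER_RATIO = 0.10      # spread expiries across +/-10% of the base TTL

def ttl_with_jitter(base_ttl: int = BASE_TTL, ratio: float = JITTER_RATIO) -> int:
    """Return the base TTL plus a small random offset, so keys written
    at the same moment do not all expire at the same moment."""
    jitter = random.uniform(-ratio, ratio) * base_ttl
    return int(base_ttl + jitter)

# Each key gets a slightly different lifetime, e.g.:
# cache.set("exam:questions:42", payload, ttl=ttl_with_jitter())
```

With this in place, keys written together during a warm-up expire spread across a window of several minutes instead of a single instant.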

Probabilistic Early Re-computation

A car fuel indicator does not wait for the tank to go empty before warning you. It alerts you early, while there is still enough fuel to reach a petrol pump. You always refill it before it goes empty.

Probabilistic Early Re-computation works the same way. Instead of waiting for a cache key to fully expire, the system starts refreshing it early. But not always. It uses a probability calculation to decide whether to trigger the refresh. The closer the cache gets to its expiry, the higher the chance of early refresh. This way, the cache is always ready before it expires, and the database never gets a sudden spike of requests.

This is especially useful during high-traffic events. When millions of users are hitting the same cache key, you cannot afford to let it expire. Probabilistic Early Re-computation ensures the cache is silently refreshed in the background before expiry hits, and the thundering herd never gets a chance to start.
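One well-known way to implement this probability check is the XFetch formula (from the "Optimal Probabilistic Cache Stampede Prevention" paper). A minimal sketch, assuming you track each key's expiry timestamp and how long it took to recompute:

```python
import math
import random
import time

BETA = 1.0  # aggressiveness of early refresh; 1.0 is the usual default

def should_recompute_early(expiry_ts, compute_time, beta=BETA, now=None):
    """XFetch-style check: the closer we are to expiry, and the longer the
    value takes to rebuild, the more likely we refresh it right now."""
    now = time.time() if now is None else now
    # -log(uniform) is an exponentially distributed random value; scaling
    # it by the recompute cost pulls the decision earlier for slow keys.
    return now - compute_time * beta * math.log(random.random()) >= expiry_ts
```

Far from expiry the check almost never fires; as the key approaches its deadline, more and more requests independently decide to refresh it, so the recomputation happens before expiry without any coordination.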

Mutex / Cache Locking

Imagine a busy grocery store with a single billing counter. When one customer is being billed, others wait in the queue. Nobody jumps the counter to bill themselves. One at a time, in order.

Mutex / Cache Locking works the same way. When the cache expires, only the first request acquires the lock and hits the database to regenerate the cache. All other requests wait until the cache is ready. Once the cache is refreshed, the lock is released and the waiting requests read the data directly from the cache.

One important thing. Always set an expiry on the lock. If the request that acquired the lock crashes, the lock must auto-release. Otherwise, all waiting requests will be stuck forever and your system will freeze.
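In a real distributed system the lock would live in a shared store (for example Redis's `SET key value NX EX seconds`, which gives you exactly the auto-expiring lock described above), but the pattern is easiest to see in-process. A minimal single-process sketch:

```python
import threading
import time

class CacheWithLock:
    """Sketch: only one caller rebuilds an expired key; everyone else
    waits on the lock and then reads the freshly cached value."""

    def __init__(self):
        self._data = {}               # key -> (value, expiry_ts)
        self._locks = {}              # key -> threading.Lock
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, rebuild, ttl=60):
        entry = self._data.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                       # cache hit
        with self._lock_for(key):                 # one rebuilder at a time
            entry = self._data.get(key)           # re-check: another caller
            if entry and entry[1] > time.time():  # may have rebuilt already
                return entry[0]
            value = rebuild()                     # the single database hit
            self._data[key] = (value, time.time() + ttl)
            return value
```

The re-check inside the lock is the important detail: a request that waited in the queue must not rebuild again once the cache is already fresh.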

Stale-While-Revalidate

The government announces a petrol price hike effective from midnight. You can still buy petrol at the old price tonight, because the pumps have not updated yet and the new price applies from tomorrow.

CDN is another example. Platforms like Cloudflare continue serving cached content to users while fetching fresh content from the origin server in the background.

Stale-While-Revalidate works the same way. When a cache key expires, the system continues to serve old cached data and refreshes cache in background at the same time. Once the new data is ready, future requests get the updated cache.

This ensures users never face a slow response because of cache expiry. The system stays responsive even during cache regeneration. No waiting. No database spike. No thundering herd.
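A minimal in-process sketch of the pattern, assuming a single background refresh per key (a guard set prevents every stale hit from spawning its own refresh):

```python
import threading
import time

class SWRCache:
    """Sketch of stale-while-revalidate: an expired entry is still
    served, while one background thread fetches the replacement."""

    def __init__(self):
        self._data = {}             # key -> (value, expiry_ts)
        self._refreshing = set()    # keys with a refresh already in flight
        self._guard = threading.Lock()

    def get(self, key, fetch, ttl=60):
        entry = self._data.get(key)
        if entry is None:
            # Nothing to serve stale: fetch synchronously the first time.
            value = fetch()
            self._data[key] = (value, time.time() + ttl)
            return value
        value, expiry = entry
        if expiry <= time.time():
            with self._guard:
                start = key not in self._refreshing
                if start:
                    self._refreshing.add(key)
            if start:
                threading.Thread(
                    target=self._refresh, args=(key, fetch, ttl)).start()
        return value                # stale or fresh, but always instant

    def _refresh(self, key, fetch, ttl):
        try:
            self._data[key] = (fetch(), time.time() + ttl)
        finally:
            with self._guard:
                self._refreshing.discard(key)
```

Note that only the very first request for a key ever waits on the data source; every later request is answered from cache, stale or not.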

Cache Warming / Pre-Warming

Before every IPL match, Hotstar engineers start preparing the servers for live streaming to millions of users. They load match data, player stats, team lineups, and streaming configurations into cache well before the first ball is bowled. Everything is already ready before users open the app.

Netflix does this every night. They pre-compute the homepage for every user profile and load it into cache before you even open the app. By the time you login, your homepage is already ready.

This is called Cache Warming. Instead of waiting for the first request to hit the database and build the cache, the system proactively loads the cache before traffic arrives. This ensures the cache is already hot and ready to serve before users start coming in.

Without cache warming, the first wave of users after a big event launch hits an empty cache. Database gets slammed. The system slows down. That is exactly the thundering herd situation we want to avoid. Cache warming ensures the herd never finds an empty cache.
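The warming job itself is usually just a bulk load run before the event. A minimal sketch, where the key list and `load_from_db` loader are hypothetical stand-ins for whatever your pre-event job actually fetches:

```python
import time

def warm_cache(cache, keys, load, ttl=3600.0):
    """Before traffic arrives, bulk-load the given keys so the first
    wave of requests finds a hot cache instead of an empty one."""
    deadline = time.time() + ttl
    for key in keys:
        cache[key] = (load(key), deadline)   # value plus expiry timestamp
    return len(keys)

# Hypothetical pre-event job: warm everything the exam page will need.
# warm_cache(cache, ["exam:1:questions", "exam:1:config"], load_from_db)
```

In practice you would combine this with TTL Jitter on the `deadline`, so the warmed keys do not all expire together later, which is exactly the failure the warming was meant to prevent.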

Tradeoffs

Every caching strategy is a tradeoff between Freshness, Speed and Consistency. Freshness means how up-to-date your cached data is. Speed means how fast your system responds. Consistency means how uniform the data is across all users at the same time.

If you insist on perfectly fresh, consistent data on every request, you bypass the cache and your database may buckle under high traffic. If you prioritise speed, caching helps, but users may occasionally see data that is slightly out of date. And if you want both low latency and consistency across users, you usually accept serving slightly older data everywhere until the cache is refreshed.

When to Use Which Strategy

There is no single rule of thumb for choosing a caching strategy. It depends entirely on your use case, your traffic patterns, and how critical data freshness is for your system.

Here is a simple guide to help you decide:

  • If your system has many cache keys expiring around the same time, start with TTL Jitter. It is the simplest fix and works well as a default for all distributed systems.
  • If you have hot keys that are hit by millions of requests, use Probabilistic Early Re-computation. It ensures cache is always ready before expiry hits.
  • If rebuilding cache is expensive and you cannot afford duplicate recomputation, use Mutex / Cache Locking. One request rebuilds, others wait.
  • If your system is read-heavy and you can tolerate slightly old data, use Stale-While-Revalidate. Speed is the priority here.
  • If you know a traffic spike is coming, like an IPL match or a sale day, use Cache Warming. Prepare before the herd arrives.

And remember, these strategies are not mutually exclusive. In real systems like Hotstar or Amazon, engineers often combine multiple strategies together. For example, TTL Jitter with Cache Warming before a big event, or Mutex with Stale-While-Revalidate for read-heavy APIs.

Conclusion

After that failed exam, we made two important changes in our cache strategies. We started using multiple Cache TTL values and set them to expire at different times to avoid simultaneous expiry. We also pre-warmed our servers before every big event to ensure cache was ready before traffic arrived.

These two changes made all the difference. We load-tested the system with 30,000 students, and this time everything worked smoothly.

That experience taught me that caching is not just a performance tool. In distributed systems, how you manage cache can be the difference between a smooth experience and a total system failure.

Start simple. Use TTL Jitter as your default. Add more strategies as your system grows. And always prepare before the spike hits, not after.
