Gaurav Sharma

Posted on • Originally published at thetruecode.com

Caching Is Easy. Production Caching Is Not.

This post is part of the series The True Code of Production Systems.

The first time you add caching to a system, it feels like a superpower.

One afternoon of work. Response times drop. Database load drops. The whole system breathes easier. You ship it, you move on, and somewhere in the back of your mind you file caching under "solved problems."

That filing is the mistake.

Because caching in production is not one decision. It is ten decisions, and most teams only consciously make one of them: the performance one. The other nine happen by default, by accident, or not at all. And defaults in production have a way of becoming incidents.

This article is about all ten. But before we get into them, let us look at a system where one of those defaults caused a real problem.


A Booking System That Did Everything Right. Almost.

A platform handles seat reservations for corporate training workshops. On a normal day it serves around two to three hundred requests per minute. The engineering team is small but experienced.

Workshop availability data was cached in Redis with a TTL of sixty seconds. The reasoning was sound — availability changes only when someone books or cancels. Caching it for a minute seemed perfectly reasonable, and for months it worked exactly as designed.

Then a well-known instructor announced a new batch of workshops on LinkedIn. The post got shared widely. Within minutes, several hundred users landed simultaneously to check availability and book seats.

The cached availability keys for those workshops had expired seconds before the spike hit. Every one of those hundreds of requests checked the cache, found a miss, and went directly to the database. The database — which had been handling 20–30 direct queries per minute — received several hundred simultaneous queries in a few seconds.

Connection pool exhausted. Query times climbed from milliseconds to seconds. The application started timing out. Users saw errors. Some refreshed, which made it worse. The platform was effectively down for four minutes during the highest-traffic window it had ever seen.

The cache was there. Redis was running fine. The TTL was set. Everything was configured.

Nobody had thought about what happens when a popular key expires at exactly the wrong moment.

We will come back to this system after the ten points. By then you will know exactly what went wrong and what a one-line fix would have looked like.


What Most Developers Think Caching Is

Cache the expensive query. Set a TTL. Use Redis. Done.

That mental model is not wrong. It is just incomplete. In production, every caching decision is simultaneously three other things:

  • A consistency decision — data in cache may no longer reflect reality
  • A reliability decision — a cache misbehaving under load can damage the system it was meant to protect
  • A cost decision — the wrong caching setup charges you quietly, consistently, and across more than one bill line item

Most developers ship caching thinking only about performance. The other three dimensions show up later, usually at inconvenient moments, usually pointing back to a decision that was never consciously made.


The Ten Things Production Caching Actually Requires


1. Your Caching Pattern Is a Choice. Make It Deliberately.

Most developers use Cache Aside without ever knowing they made a choice. The code checks the cache, finds a miss, goes to the database, stores the result, and returns it. It is the most common pattern. It works. But it is one of four — and each behaves differently in production.

Cache Aside puts the application in charge. You decide when to read from cache and when to write to it. This gives you flexibility, but every invalidation is your responsibility. Miss one code path that updates the underlying data without clearing the cache, and you silently serve stale data. No error. No alert.
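A minimal Cache-Aside sketch, using an in-memory dict with expiry timestamps as a stand-in for Redis (`fetch_user_from_db` and the key format are illustrative, not from the original system):

```python
import time

CACHE = {}          # key -> (value, expires_at); stand-in for Redis
TTL_SECONDS = 60

def fetch_user_from_db(user_id):
    # Hypothetical database call; replace with your real query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                                 # cache hit
    value = fetch_user_from_db(user_id)                  # miss: go to the DB
    CACHE[key] = (value, time.time() + TTL_SECONDS)      # populate on the way out
    return value

def update_user(user_id, fields):
    # Every write path must ALSO invalidate, or readers see stale data.
    # ... persist `fields` to the database here ...
    CACHE.pop(f"user:{user_id}", None)
```

Notice that the invalidation in `update_user` is entirely the application's job; that is the flexibility and the risk in one place.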

Read Through moves that responsibility elsewhere — the cache itself fetches from the database on a miss. This keeps application code clean but creates a cold start problem: every fresh deployment begins with an empty cache, and until it warms up, your database absorbs full traffic.

Write Through writes to both cache and database on every write. Your cache is always in sync — but every write now has to complete in two places before returning to the caller.

Write Behind writes to cache immediately and updates the database asynchronously. Writes are very fast. But if the cache node goes down before the async write completes, that data is gone. Unless you have consciously decided that some data loss is acceptable, this pattern is not the right one.

Before you deploy, ask: What is my consistency requirement? Can users tolerate stale data, and if so, for how long? Which pattern actually matches that requirement?


2. Cache Invalidation: Why the Joke Is Not Actually a Joke

The two hardest things in computer science are cache invalidation and naming things. Most people chuckle and move on. They should sit with it longer.

TTL-based invalidation is what most systems use. Simple, easy to reason about, no inter-service coordination needed. The downside: TTL is a blunt instrument. Set it too long — users interact with stale data. Set it too short — you hammer the database repeatedly.

Event-based invalidation is more precise. When the underlying data changes, you immediately delete or update the cache key. The challenge is coverage: every single code path that can modify data must also trigger the invalidation. If five endpoints update a user's profile and only four of them invalidate the cache, you have a stale-data bug that appears random.
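One way to get that coverage is to route every write through a single helper that both persists and invalidates, instead of sprinkling delete calls across endpoints. A sketch with in-memory stand-ins (names are illustrative):

```python
CACHE = {}   # stand-in for Redis
DB = {}      # stand-in for the real datastore

def write_profile(user_id, profile):
    """The ONLY function allowed to mutate profiles.

    Because every endpoint calls this, no write path can
    forget to invalidate the cached copy.
    """
    DB[user_id] = profile
    CACHE.pop(f"profile:{user_id}", None)   # event-based invalidation

def read_profile(user_id):
    key = f"profile:{user_id}"
    if key not in CACHE:
        CACHE[key] = DB[user_id]            # miss: rebuild from the datastore
    return CACHE[key]
```

The point is ownership: invalidation lives in one reviewable function rather than in every endpoint that happens to touch the data.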

The situation that quietly destroys production systems is mixing both approaches across services with no shared strategy. Service A uses TTL. Service B uses events. Service C was written by a contractor six months ago. The cache becomes shared state that no single person can fully reason about.

Ask yourself: Who owns cache invalidation in my system? Is there an actual strategy, or is each service doing its own thing independently?


3. The Cache Stampede: When Your Protection Collapses All at Once

This one catches even experienced teams off guard.

A popular cache key expires. At that exact moment, your system is handling high traffic. One thousand requests check the cache. All one thousand see a miss. All one thousand go directly to the database to fetch the data and rebuild the cache. Your database — which the cache was there to protect — absorbs a spike it was never provisioned to handle alone.

This is a cache stampede (also called a thundering herd). The irony: the more effective your cache, the worse the stampede when it fails.

Three ways to protect against it:

  • Mutex / locking — Only one request rebuilds a key at a time; others wait. Prevents the database spike but risks a queue buildup if the rebuild is slow.
  • Probabilistic early expiration — Before the TTL expires, the system starts refreshing the key using a probability function based on remaining TTL and rebuild cost. Hot keys effectively never go fully cold.
  • Background refresh — A dedicated worker keeps popular keys warm by refreshing them proactively before they expire. The application never experiences a true miss.

Ask yourself: What is peak concurrent traffic on my most accessed cache key? What happens to my database if that key expires right now, at this traffic level?


4. Some Things Should Never Be Cached

Knowing what not to cache is equally important and almost never discussed.

Transactional or financial data — account balances, order statuses, payment confirmations. If a user sees a balance that was accurate 30 seconds ago and makes a financial decision based on it, no performance gain justifies that. If stale data can cause a user to take a wrong action with real consequences, it should not be cached.

Highly personalised responses — the risk here is not performance. If your cache key does not capture every dimension that makes a response unique (user ID, role, tenant, locale, feature flags), you can serve one user's data to a completely different user. This has happened at companies of every size. The incident report always traces back to a cache key that was not specific enough.

Legally or contractually sensitive content — terms and conditions, regulated pricing, compliance documentation. Serving an outdated version is not just a UX problem. Depending on the industry, it can carry legal weight.

Ask yourself: If this cached value is served 60 seconds after it was written, what is the worst realistic outcome for the user receiving it?


5. Your Eviction Policy Is a Decision, Not a Default

Every cache has a memory ceiling. When it fills up, something gets removed. The question is whether that was a deliberate engineering choice or something that just happened because nobody changed the default.

In Redis, the default eviction policy is noeviction — when memory is full, Redis stops accepting writes and returns errors. That is almost certainly not the behaviour you want under load. Many teams discover this only when they are already in an incident.
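In Redis this is a one-line configuration change. `allkeys-lru` is a common general-purpose choice, but pick whichever policy matches your workload; commands shown for `redis-cli` against a default local instance:

```shell
# Cap memory and choose what gets evicted when the cap is hit.
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Or persist the same settings in redis.conf:
#   maxmemory 2gb
#   maxmemory-policy allkeys-lru

# Verify the running configuration:
redis-cli CONFIG GET maxmemory-policy
```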

Common strategies:

| Policy | Removes | Best for |
| --- | --- | --- |
| LRU | Least recently accessed key | Most general workloads |
| LFU | Least frequently accessed key | Workloads where long-term frequency matters more than recency |
| TTL-based | Key closest to expiry | Protecting long-lived data from short-lived displacement |

Ask yourself: Have you explicitly configured your eviction policy? When your cache fills up at peak load, what should be protected and what should go?


6. The Cold Start Problem Nobody Prepares For

You deploy a new version of your application. The new instance comes up with a completely empty cache. For the first several minutes, every request is a miss. Every request goes to the database.

In a low-traffic system, barely noticeable. In a high-traffic system — or one with a database already near capacity — those first few minutes can look exactly like an incident. By the time someone traces it to the deployment, the cache has warmed up. The post-mortem notes it as "transient."

Until the next deployment.

Three approaches:

  • Cache warming on startup — Pre-populate your most-accessed keys before the new instance takes live traffic. Requires knowing your hot keys, which your observability setup should already surface.
  • Gradual traffic shifting — Old instances keep serving traffic with warm caches while new instances slowly build up state.
  • Sticky sessions during rollout — Routes users to consistent instances temporarily, limiting how many cold instances are simultaneously exposed to real traffic.
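Warming on startup can be as simple as replaying the hot-key list before the instance reports healthy to the load balancer. A sketch with in-memory stand-ins (`HOT_KEYS` would come from your observability data, not be hard-coded as it is here):

```python
CACHE = {}
DB = {"workshop:101": "12 seats", "workshop:102": "3 seats", "user:9": "profile"}

# In practice this list comes from metrics (top keys by hit count);
# hard-coded here purely for illustration.
HOT_KEYS = ["workshop:101", "workshop:102"]

def warm_cache():
    """Pre-populate the hottest keys so the first real requests are hits."""
    for key in HOT_KEYS:
        CACHE[key] = DB[key]

def ready_for_traffic():
    # Only pass the readiness check once the warm set is loaded.
    return all(key in CACHE for key in HOT_KEYS)
```

Wiring `ready_for_traffic` into the instance's readiness probe is what keeps cold instances out of the rotation until warming completes.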

Ask yourself: What does your system look like in the 5 minutes immediately after a fresh deployment? Have you ever deliberately tested it?


7. Distributed Caching Is Not Just Single-Node Caching at Bigger Scale

When you move to a distributed cache cluster, the rules change in ways that are easy to miss.

Consider a write: your application updates cache node 1. Replication to node 2 is asynchronous and hasn't completed. Another request, routed to node 2, reads that key and gets the old value. Two users, the same request, nearly the same moment — different responses.

This is not a malfunction. It is the expected behaviour of an eventually consistent distributed system. The problem surfaces when the application is designed assuming strong consistency and the cache is delivering eventual consistency. That mismatch does not produce errors. It produces silent incorrectness.

Redis Cluster uses asynchronous replication. Under normal conditions, replication lag is milliseconds and practically invisible. But in failure scenarios — a node going down, a network partition, a failover — writes that were acknowledged can be lost before they propagate.

Ask yourself: Has your application been designed knowing that cache reads across nodes may not always be consistent? What actually happens to your users if they are not?


8. Security Gaps in Caching Are Invisible Until They Are Not

Here is how it goes wrong. You cache a response containing data belonging to a specific user. A second user sends a request that generates the same cache key. They receive the first user's cached response — their personal data, their account details, their private information — served silently to someone who should never have seen it.

This is a data breach that produces no exception, no error log, and no anomaly in performance metrics. The cache is working exactly as designed. The design is the problem.

The fix requires rigorous cache key scoping. Every dimension that makes a response unique must be part of the key: user ID, tenant ID, permission level, role, locale, feature flags. Leaving any out is not a minor oversight to patch in the next sprint. It is a live security incident waiting for the right traffic pattern.
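One defensive habit is to build every cache key through a single function whose scoping dimensions are required keyword arguments, so a key can never be constructed without them. A sketch (the field names are illustrative):

```python
import hashlib

def cache_key(resource, *, user_id, tenant_id, role, locale, flags=()):
    """Every dimension that changes the response is part of the key.

    Keyword-only arguments mean a caller cannot silently 'forget' one.
    """
    parts = [resource, str(user_id), str(tenant_id), role, locale,
             ",".join(sorted(flags))]
    raw = "|".join(parts)
    # Hash to keep long keys bounded; the raw form is what guarantees uniqueness.
    return f"{resource}:{hashlib.sha256(raw.encode()).hexdigest()[:16]}"

k1 = cache_key("dashboard", user_id=1, tenant_id="acme", role="admin", locale="en")
k2 = cache_key("dashboard", user_id=2, tenant_id="acme", role="admin", locale="en")
```

Two users requesting the same resource now get two different keys by construction, which is exactly the property the incident reports say was missing.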

The second concern: what lives in your cache at rest. Session tokens, access tokens, PII embedded in cached API responses. Most teams apply strict access controls to their databases. Not all of them apply the same rigour to their cache infrastructure.

Ask yourself: Are your cache keys scoped precisely enough that no response can ever be served to the wrong user? If your cache infrastructure were accessed by someone who shouldn't have it, what would they find?


9. If You Are Not Measuring Your Cache, You Do Not Know If It Is Working

A cache you cannot observe is either working fine or silently failing — and you have no way to tell which.

Three numbers tell you almost everything:

Hit rate — percentage of requests served directly from cache. A high, stable hit rate means the cache is doing its job. A rate slowly declining over days or weeks signals that data volatility has increased, TTLs have drifted, or a deployment changed behaviour upstream.

Miss rate — how often requests fall through to the database. A sudden spike means a stampede may be in progress, an invalidation pipeline has broken, or a deployment started cold.

Eviction rate — tells you whether your cache is sized correctly. A rising eviction rate means your working set is larger than your allocated memory. Data is being pushed out before it can be reused. Your hit rate follows downward. Your database load follows upward.

Together, these three numbers tell a continuous story. Without them, you are managing critical infrastructure entirely on faith.
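If you run Redis, the raw counters already exist in `INFO stats` (`keyspace_hits`, `keyspace_misses`, `evicted_keys`); the three rates are simple derivations. A sketch with sample numbers standing in for two snapshots taken 60 seconds apart:

```python
def cache_rates(keyspace_hits, keyspace_misses, evicted_keys, window_seconds):
    """Derive the three headline rates from Redis-style raw counters."""
    total = keyspace_hits + keyspace_misses
    hit_rate = keyspace_hits / total if total else 0.0
    miss_rate = keyspace_misses / total if total else 0.0
    evictions_per_sec = evicted_keys / window_seconds
    return hit_rate, miss_rate, evictions_per_sec

# Sample deltas, as if diffed from two INFO snapshots 60 seconds apart.
hit, miss, evict = cache_rates(9_500, 500, 120, 60)
```

Remember the counters are cumulative since server start, so dashboards should plot the rate of change between snapshots, not the raw totals.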

Ask yourself: Can you pull up a live view of your cache hit rate, miss rate, and eviction rate right now? If not, that is the first thing to fix.


10. The Cost Is Real, and It Compounds Quietly

Under-provisioned cache: High eviction rates reduce hit rate → more database load → more compute needed → higher costs across multiple services.

Over-provisioned cache: You pay for memory that sits idle. Managed Redis on any major cloud provider bills idle capacity at the same rate as active capacity.

The right size comes from understanding your working set — the total data your application actually reads within a given time window. If your working set is 15 GB and your cache is 4 GB, you are not caching 15 GB. You are repeatedly evicting and re-fetching 11 GB of it, paying for database round trips on every cycle.

The other cost that accumulates quietly: data transfer. If your application instances and cache cluster live in different availability zones, you pay for cross-zone traffic on every cache read. On a high-traffic system with a high hit rate, that is an enormous number of reads. The per-request cost is small. The monthly total is not.

Ask yourself: Have you sized your cache from a working set analysis or from a number someone estimated at the start of the project? Do you know what your cross-zone cache traffic costs per month?


Back to the Booking System

Remember the platform that went down for four minutes? The cache was there. Redis was running. The TTL was set.

What they had not done was think about the stampede (point 3).

The availability keys for those popular workshops all had the same sixty-second TTL, set at roughly the same time when the workshops were first published. So they all expired together. When the traffic spike hit, every request found a cold cache simultaneously and went straight to the database.

The fix was not complicated. A background worker refreshing availability keys for popular workshops every 45 seconds would have kept those keys warm through the entire spike. The database would have seen normal traffic. Users would have seen normal response times.
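A background-refresh worker of that kind fits in a few lines. A sketch with in-memory stand-ins and hypothetical names, not the platform's actual code; a real version would also need error handling and a way to pick the "popular" set:

```python
import threading

CACHE = {}
POPULAR_WORKSHOPS = [101, 102]

def load_availability(workshop_id):
    # Hypothetical database query for remaining seats.
    return {"workshop_id": workshop_id, "seats_left": 5}

def refresh_popular_keys():
    """One refresh pass: rewrite hot keys before their TTL can lapse."""
    for wid in POPULAR_WORKSHOPS:
        CACHE[f"availability:{wid}"] = load_availability(wid)

def start_refresher(interval_seconds=45):
    # Re-run the pass every 45s, well inside the 60s TTL, so the hot
    # keys never go cold even if a spike lands right at expiry time.
    def loop():
        refresh_popular_keys()
        timer = threading.Timer(interval_seconds, loop)
        timer.daemon = True
        timer.start()
    loop()
```

A complementary mitigation, given that the keys all expired together, is adding random jitter to TTLs so hot keys do not expire in lockstep.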

One decision. Not made. Four minutes down.

That is what production caching actually looks like. Not a performance graph. A decision with a consequence.


The Thing That Ties All of This Together

Caching does not make your system faster.

Done right, it does. Done wrong, it makes your system faster right up until the moment it does not. And when it fails, it tends to fail suddenly, in ways that are difficult to trace back to a decision made quietly, months earlier, on an ordinary afternoon.

The engineers who build systems that hold up under real pressure are not necessarily smarter. They are more deliberate. They treat each of these ten things as a conscious choice rather than something that gets handled by default.

Make the choices. Write them down. Revisit them before you ship.


Production Ready Checklist

Go through this before anything involving caching reaches production. Not as a formality — as a genuine engineering checkpoint.

  • [ ] Have I consciously chosen a caching pattern and do I understand its consistency trade-offs?
  • [ ] Do I have a defined invalidation strategy with a clear owner, clear triggers, and handling for silent failures?
  • [ ] Have I protected my hottest cache keys against a stampede event?
  • [ ] Have I audited what I am caching and confirmed none of it is transactional, financial, or dangerous when stale?
  • [ ] Have I explicitly configured my eviction policy rather than accepting the default?
  • [ ] Have I planned and actually tested what happens in the first five minutes after a cold deployment?
  • [ ] Do I understand my cache cluster's replication and consistency model, and has my application been designed with that in mind?
  • [ ] Are my cache keys scoped precisely enough that no response can ever be served to the wrong user?
  • [ ] Do I have live monitoring for hit rate, miss rate, and eviction rate?
  • [ ] Have I sized my cache from a working set analysis and not from a rough estimate?

Originally published on The True Code — a series on production-critical engineering, stack-agnostic, with enough depth to actually change how you think.
