Caching is often marketed as a free performance boost: drop in Redis, flip a flag, and enjoy instant speed. In reality, a cache is another network hop, another moving part, and another place where a request can get stuck or time out under load.
The “easy win” that wasn’t
We had a hot endpoint running close to its limits, so we put a cache in front of it. The expectation was simple: fewer database hits and dramatically lower latency.
What actually showed up in the graphs:
- p95 latency went up instead of down.
- “Fast locally, slow in prod” became something we said almost every day.
- Incidents became harder to reason about, because every request now had two potential bottlenecks: the cache or the origin.
The code looked clean and the cache dashboards were green, but the user experience was clearly worse.
Where the slowdown really came from
Cache hits still cost something
Even on a hit, you pay for:
- A network round trip to the cache (if it’s remote).
- Serialization and deserialization.
- Connection pooling, TLS, retries, and all the overhead around them.
On a well‑indexed, already‑warm database, this extra hop can actually be more expensive than just running the original query.
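For a sense of what a "cheap" hit actually pays for, here is a minimal timing sketch, assuming redis-py and JSON-encoded values; the host name is made up and the numbers will differ per setup:

```python
import json
import time

import redis  # assumes the redis-py client is installed

r = redis.Redis(host="cache.internal", port=6379)  # hypothetical remote cache host

def timed_cache_hit(key: str):
    """Fetch one key and show where the time goes even on a hit."""
    t0 = time.perf_counter()
    raw = r.get(key)                          # network round trip, pooling, possibly TLS
    t1 = time.perf_counter()
    value = json.loads(raw) if raw else None  # deserialization is not free either
    t2 = time.perf_counter()
    print(f"round trip: {(t1 - t0) * 1000:.2f} ms, decode: {(t2 - t1) * 1000:.2f} ms")
    return value
```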
Misses can be double work
On a miss, the “simple” read path often turns into:
- `GET` from cache → miss
- Fetch from origin
- `SET` into cache
If the data changes frequently or isn’t reused much, you’ve just added extra work to almost every request. The hit‑rate chart might not look terrible, but most of the real cost is hiding on the miss path.
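As a rough sketch of that cache-aside read path (assuming redis-py, JSON values, and a placeholder `fetch_from_origin` standing in for the real query), every miss pays for all three steps:

```python
import json

import redis

r = redis.Redis(host="cache.internal", port=6379)  # hypothetical cache host
TTL_SECONDS = 60

def fetch_from_origin(key: str) -> dict:
    """Placeholder for the real database / service call."""
    raise NotImplementedError

def get_with_cache(key: str) -> dict:
    # 1. GET from cache
    raw = r.get(key)
    if raw is not None:
        return json.loads(raw)          # hit: pay for the hop plus decode

    # 2. Miss: fetch from origin
    value = fetch_from_origin(key)

    # 3. SET into cache -- on a miss the request pays for all three steps
    r.set(key, json.dumps(value), ex=TTL_SECONDS)
    return value
```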
Stampede and churn
When a popular key expires, a lot of requests can pile onto the origin at once. That thundering herd effect can send p95 through the roof exactly when the system is under its heaviest load.
Short TTLs plus high‑cardinality keys are another kind of tax. If entries are constantly being evicted, the cache stops behaving like a cache and turns into an expensive pass‑through layer you touch on every request.
When the cache becomes the bottleneck
When the remote cache service slows down or runs out of resources, the application is dragged down with it. In those moments:
- App logs may not show a clear error.
- APM, however, starts showing "external call" time dominating the whole request.
- You’ve basically traded “the DB is slow” days for “the cache is slow” days.
How to tell if your cache is really helping
First, measure the cache as if it were its own service:
- `cache_hit_rate` (per endpoint or key group)
- `cache_get_ms` and `cache_set_ms` (p50 / p95 / p99)
- `origin_ms` and `request_total_ms`
- `cache_timeouts` and `cache_errors`
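One way to get those numbers is to wrap the cache client and emit timings and hit/miss counters yourself. A minimal sketch, assuming redis-py and a hypothetical `metrics` helper with `timing` and `increment` functions (swap in StatsD, Prometheus, or whatever you already run):

```python
import time

import redis

import metrics  # hypothetical helper exposing timing(name, ms) and increment(name)

r = redis.Redis(host="cache.internal", port=6379)  # hypothetical cache host

def instrumented_get(key: str, group: str):
    """GET wrapper that reports cache_get_ms, hit/miss counts, and errors per key group."""
    t0 = time.perf_counter()
    try:
        raw = r.get(key)
    except redis.exceptions.RedisError:
        metrics.increment(f"cache_errors.{group}")
        return None                       # treat cache trouble as a miss
    elapsed_ms = (time.perf_counter() - t0) * 1000
    metrics.timing(f"cache_get_ms.{group}", elapsed_ms)
    metrics.increment(f"cache_hits.{group}" if raw else f"cache_misses.{group}")
    return raw
```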
Then run a small experiment:
- Bypass the cache for 5–10% of traffic and compare p95/p99.
- Look at end‑to‑end latency for both the hit path and the miss path.
If the hit path is not clearly cheaper than calling the origin, your cache is just an expensive layer of complexity.
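The bypass itself can be a deterministic hash on a request or user id, so the same callers consistently skip the cache and the two groups stay comparable. A sketch, reusing the hypothetical `get_with_cache` and `fetch_from_origin` helpers from the earlier snippet:

```python
import zlib

BYPASS_PERCENT = 10  # route ~10% of traffic straight to the origin

def should_bypass_cache(request_id: str) -> bool:
    """Deterministically put ~BYPASS_PERCENT of requests in the no-cache bucket."""
    return zlib.crc32(request_id.encode()) % 100 < BYPASS_PERCENT

def handle_read(request_id: str, key: str):
    # get_with_cache / fetch_from_origin are the hypothetical helpers from the
    # cache-aside sketch above; tag each path in your APM so p95/p99 can be compared.
    if should_bypass_cache(request_id):
        return fetch_from_origin(key)   # control group: origin only
    return get_with_cache(key)          # treatment group: normal cached path
```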
The changes that actually helped
Without throwing the cache away completely, these changes made a real difference:
- Cache less, but smarter: Only cache reads that are both expensive and reusable.
- Store smaller objects: Cache a minimal DTO instead of a huge, fully hydrated object graph.
- Use request coalescing / single‑flight: For the same key, collapse concurrent requests so only one goes to the origin.
- Add TTL jitter: Keep things from expiring all at once and reduce the risk of a stampede.
- Use stale‑while‑revalidate: Serve slightly stale but fast data while refreshing the cache in the background.
- Set tight timeouts and intentional fallbacks: Don’t let cache timeouts dictate your entire API’s latency.
Once these were in place, the cache finally started acting like a performance layer instead of “just another production problem.”
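To make a couple of those changes concrete, here is a hedged sketch of per-key single-flight plus TTL jitter and a tight cache timeout, again assuming redis-py and a placeholder `fetch_from_origin`; a real version would also want lock-map cleanup and cross-instance coordination:

```python
import json
import random
import threading

import redis

r = redis.Redis(
    host="cache.internal",       # hypothetical cache host
    port=6379,
    socket_timeout=0.05,         # tight timeouts: a slow cache must not set the API's latency
    socket_connect_timeout=0.05,
)

BASE_TTL = 60                    # seconds
_key_locks = {}
_locks_guard = threading.Lock()

def _lock_for(key: str) -> threading.Lock:
    # One lock per key, so concurrent misses in this process collapse into one origin call.
    # Coalescing across instances would need a distributed lock or a shared refresh job.
    with _locks_guard:
        return _key_locks.setdefault(key, threading.Lock())

def jittered_ttl() -> int:
    # Spread expirations out so a batch of hot keys does not expire all at once.
    return BASE_TTL + random.randint(0, BASE_TTL // 5)

def fetch_from_origin(key: str) -> dict:
    """Placeholder for the real database / service call."""
    raise NotImplementedError

def get_single_flight(key: str) -> dict:
    try:
        raw = r.get(key)
    except redis.exceptions.RedisError:
        return fetch_from_origin(key)            # fallback: skip the cache, don't stall
    if raw is not None:
        return json.loads(raw)

    with _lock_for(key):                         # only one caller per key refills
        try:
            raw = r.get(key)                     # another caller may have refilled it
        except redis.exceptions.RedisError:
            raw = None
        if raw is not None:
            return json.loads(raw)
        value = fetch_from_origin(key)
        try:
            r.set(key, json.dumps(value), ex=jittered_ttl())
        except redis.exceptions.RedisError:
            pass                                 # a failed cache write must not fail the request
        return value
```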
Which part of this story feels most familiar to you? The thundering herd, the disappointing hit rate, or the “we did everything right, so why is p95 still bad?” question? Drop your own cache war stories in the comments—they’re basically tiny post‑mortems that help the next person avoid the same mess.