In this article we’ll cover how the browser cache and HTTP headers work, when and how to use stale-while-revalidate, how service workers give you programmatic control over caching, what you should never cache, and how cache invalidation works in practice. All of it from the perspective of building for platforms that can’t afford to get this wrong.
Caching is one of those topics that every frontend developer thinks they understand, until they’re staring at a production issue where users are getting stale data, or worse, the server is getting hammered because nothing is being cached at all.
I’ve worked on platforms handling more than 10 million active users. And I can tell you, caching stops being a “nice to have” the moment your scale starts growing. It becomes the difference between a platform that feels fast and one that quietly falls apart under load.
This article is everything I’ve learned about frontend caching, written the way I wish someone had explained it to me early in my career.
First, Understand What You Are Actually Caching
Before you touch a single HTTP header or write a line of service worker code, ask yourself one question.
What is the cost of serving stale data here?
That question determines everything. Because caching is always a tradeoff between freshness and performance. The mistake most developers make is treating all resources the same way. They’re not.
Here’s how I think about it:
Static assets (JS bundles, CSS, fonts, images) - these can be cached aggressively, sometimes forever, if you version them correctly.
API responses - depends entirely on how often the data changes and how much it matters if the user sees something slightly outdated.
HTML documents - usually should not be cached aggressively, especially for authenticated apps.
User-specific data - almost never cache this without thinking carefully.
Get this mental model right first. Everything else follows from it.
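To make that mental model concrete, here's a sketch of a policy helper that maps the resource categories above to Cache-Control values. The category names and the specific values are illustrative defaults, not a prescription; the directives themselves are covered in the next sections.

```javascript
// Hypothetical policy helper. Category names and TTLs are illustrative.
function cachePolicyFor(resourceType) {
  switch (resourceType) {
    case 'static-versioned': // content-hashed JS/CSS/fonts/images
      return 'max-age=31536000, immutable';
    case 'html':             // documents: always revalidate
      return 'no-cache';
    case 'api':              // tune per endpoint; short TTL as a default
      return 'max-age=60, stale-while-revalidate=300';
    case 'user-sensitive':   // auth, payments, per-user data
      return 'no-store';
    default:
      return 'no-cache';     // when in doubt, force revalidation
  }
}
```

The point isn't this exact function, it's that the decision is made per resource category, up front, in one place.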
Browser Cache and HTTP Headers
The browser cache is your first and most powerful caching layer. It lives between the user and your server, and it’s controlled entirely through HTTP response headers.
Cache-Control
This is the header you’ll use the most. Here’s what the key directives actually mean:
Cache-Control: max-age=31536000, immutable
max-age tells the browser how many seconds to keep this resource before considering it stale. immutable tells the browser not to bother revalidating it even on a hard refresh, because the content will never change.
Use this combination for versioned static assets, your JS bundles, CSS files, and images that have a content hash in the filename. Something like main.a3f9c2.js. The hash changes every build, so the URL changes, so you never serve stale code. Cache it forever.
Cache-Control: no-cache
Despite the name, this does not mean “don’t cache.” It means “cache it, but check with the server every time before using it.” The server can respond with a 304 Not Modified and the browser uses the cached version. No full download needed.
Use this for your HTML documents. You want the browser to always check if there’s a new version, but still benefit from caching when nothing has changed.
Cache-Control: no-store
This actually means “don’t cache.” Nothing is stored anywhere. Use this for sensitive data, authentication responses, anything you never want sitting in a cache.
ETag and Last-Modified
These work alongside Cache-Control for revalidation. When the browser asks "has this changed?", the server uses these to answer.
ETag: "abc123"
Last-Modified: Mon, 01 Jan 2024 00:00:00 GMT
The browser sends back If-None-Match: "abc123" or If-Modified-Since on the next request. If nothing changed, the server returns 304 and saves the bandwidth of sending the full response again.
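Server-side, the revalidation decision is simple. This is a framework-free sketch (the request/response shapes are simplified plain objects, not any specific server API):

```javascript
// Simplified revalidation logic: compare the client's If-None-Match
// against the resource's current ETag.
function revalidate(reqHeaders, currentEtag, body) {
  if (reqHeaders['if-none-match'] === currentEtag) {
    // Nothing changed: empty 304, the browser reuses its cached copy.
    return { status: 304, headers: { ETag: currentEtag }, body: null };
  }
  // Changed (or first visit): full 200 with the current ETag.
  return { status: 200, headers: { ETag: currentEtag }, body };
}
```

The 304 path sends headers only, no body, which is where the bandwidth savings come from.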
At scale, these small savings add up to a significant reduction in server load.
Stale-While-Revalidate
This is one of the most underused caching strategies I’ve seen in frontend codebases, and it’s genuinely powerful.
Cache-Control: max-age=60, stale-while-revalidate=300
Here’s what this does. For the first 60 seconds, serve from cache, no questions asked. Between 60 and 360 seconds (max-age plus the stale-while-revalidate window), serve the stale cached version immediately, but kick off a background request to revalidate it. After 360 seconds, the response must be revalidated before serving.
The user gets an instant response. The cache gets updated in the background. No loading spinner, no waiting.
This pattern is perfect for data that changes occasionally but doesn’t need to be real-time. Think navigation menus, configuration data, content that updates a few times a day. The user always gets a fast experience and the data stays reasonably fresh.
You’ll also recognize this pattern from TanStack Query’s staleTime and gcTime configuration. The underlying idea is exactly the same, just applied at the JavaScript layer instead of the HTTP layer.
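The timeline is easier to see as a tiny state function. This is a sketch, where `ageSeconds` is the time since the response was cached:

```javascript
// Models the freshness states for: max-age=60, stale-while-revalidate=300
// Per RFC 5861, the stale window runs from max-age to max-age + swr.
function freshness(ageSeconds, maxAge = 60, swr = 300) {
  if (ageSeconds < maxAge) return 'fresh';                        // serve from cache
  if (ageSeconds < maxAge + swr) return 'stale-while-revalidate'; // serve stale, refresh in background
  return 'expired';                                               // revalidate before serving
}
```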
Service Workers - The Programmable Cache
HTTP headers give you declarative control over caching. Service workers give you programmatic control. That’s a significant difference.
A service worker sits between your app and the network, intercepting every request. You decide what happens with each one.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cachedResponse) => {
      if (cachedResponse) {
        return cachedResponse
      }
      return fetch(event.request)
    })
  )
})
This is a simple cache-first strategy. Check the cache first, fall back to the network if nothing is found. For a platform serving millions of users, this means repeat visitors often never hit your server for static assets at all.
Caching Strategies with Service Workers
Cache First - Serve from cache, fall back to network. Best for static assets that rarely change.
Network First - Try the network, fall back to cache if offline. Best for API data where freshness matters but offline support is needed.
Stale While Revalidate - Serve from cache immediately, update cache in background. Best for non-critical content where speed matters more than freshness.
Cache Only - Only serve from cache. Useful for assets you’ve pre-cached during install.
Network Only - Always go to network. For requests that should never be cached, like analytics or payment endpoints.
Pick your strategy per resource type, not globally. A single strategy for everything is almost always the wrong call.
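What "per resource type" might look like in practice is a small router consulted by the fetch handler. The URL patterns here are made up for illustration; map them to your own routes:

```javascript
// Hypothetical per-request routing: one strategy per resource type.
function strategyFor(url) {
  const { pathname } = new URL(url);
  if (pathname.startsWith('/assets/')) return 'cache-first';            // hashed static files
  if (pathname.startsWith('/api/')) return 'network-first';             // fresh data, offline fallback
  if (pathname.startsWith('/content/')) return 'stale-while-revalidate';// non-critical content
  if (pathname.startsWith('/analytics/')) return 'network-only';        // never cache
  return 'network-first';
}
```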
Pre-caching vs Runtime Caching
Pre-caching happens when the service worker installs. You explicitly list assets to cache upfront.
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open('v1').then((cache) => {
      return cache.addAll([
        '/',
        '/styles/main.css',
        '/scripts/main.js',
      ])
    })
  )
})
Runtime caching happens dynamically as requests come in. You cache responses as they’re fetched, so frequently accessed resources end up in cache naturally over time.
For most platforms, you want both. Pre-cache your critical shell, runtime cache everything else.
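The runtime half can be sketched as a fetch-and-store helper. The cache and fetch here are injected (`cacheLike`, `fetchLike`) to keep the logic framework-free; in a real service worker these would be the result of `caches.open(...)` and the global `fetch`:

```javascript
// Runtime caching sketch with injectable dependencies (illustrative names).
async function fetchAndCache(request, cacheLike, fetchLike) {
  const response = await fetchLike(request);
  // Only cache successful responses; a cached 500 is a long-lived bug.
  if (response.ok) {
    await cacheLike.put(request, response.clone ? response.clone() : response);
  }
  return response;
}
```

The `response.ok` guard matters: without it, a transient server error gets frozen into the cache and served long after the server has recovered.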
What Not to Cache
This is the part that trips people up. Caching the wrong things causes bugs that are genuinely hard to debug in production.
Never aggressively cache:
- Authentication tokens or session data
- Payment and transaction endpoints
- User-specific personalization data
- Anything that changes per user or per session
- A/B test configurations if they need to be real-time
I’ve seen teams cache API responses that included user-specific entitlements. The result was users seeing content they shouldn’t have access to, or not seeing content they’d just purchased. At scale, that’s not just a bug. It’s a trust problem.
When in doubt, don’t cache it, or use no-cache so at least revalidation happens.
Cache Invalidation - The Hard Part
There’s a famous saying in computer science, usually attributed to Phil Karlton: there are only two hard things, cache invalidation and naming things.
It’s funny because it’s true.
For static assets, content-hashed filenames solve this completely. New deploy, new hash, new URL, new cache entry. Old one expires naturally.
For API responses and service worker caches, you need a versioning strategy.
const CACHE_VERSION = 'v2'

self.addEventListener('activate', (event) => {
  event.waitUntil(
    caches.keys().then((cacheNames) => {
      return Promise.all(
        cacheNames
          .filter((name) => name !== CACHE_VERSION)
          .map((name) => caches.delete(name))
      )
    })
  )
})
Every time you deploy, bump the cache version. The activate event cleans up old caches. Users get fresh data on their next visit without you having to manually purge anything.
Caching at Scale - What Actually Changes
When you’re building for 10 million users, the fundamentals don’t change. But the consequences of getting it wrong are amplified significantly.
A few things I’ve learned from operating at that scale:
Measure before you optimize. Use Chrome DevTools, Lighthouse, and your RUM (Real User Monitoring) data to understand where your actual cache hit rates are. Don’t guess.
CDN caching and browser caching are different layers. Your CDN has its own cache headers, often separate from what the browser sees. Understand both. Misconfiguring your CDN can mean millions of users bypassing the browser cache entirely.
Cache stampedes are real. When a popular cached resource expires simultaneously for millions of users, they all hit your server at once. Stale-while-revalidate and jittered expiry times help prevent this.
Monitor your cache hit ratio. If it’s low, you’re leaving performance on the table. If it’s high but you’re fielding stale data complaints, your TTLs are too long.
Final Thoughts
Caching is not a set-it-and-forget-it feature. It’s an ongoing engineering decision that touches performance, correctness and user experience all at once.
Start with HTTP headers and get those right. Layer in stale-while-revalidate for the right resources. Add service workers when you need offline support or more granular control. And always think about invalidation before you think about caching.
The developers who get this right aren’t the ones who know the most cache directives. They’re the ones who ask the right question first.
What is the cost of serving stale data here?
Answer that honestly for every resource, and the rest follows.
Have thoughts or questions on frontend caching? Drop them in the comments; I’m always happy to discuss.
This article is part of the Frontend at Scale series.