Why Your Service Worker Cache Is Silently Breaking Your Offline Mode

A Progressive Web App that promises offline support and only mostly delivers is worse than one that promises nothing. Users learn to distrust the offline indicator after the first time it lies to them, and the entire feature stops being a feature. The most common cause of this kind of silent failure is a service worker cache that does not actually contain what the app needs to operate offline.

This piece walks through the patterns that produce the silent break and the techniques that prevent it.

The silent break, in one sentence

Your service worker caches what the user has already visited, which is usually less than what the user will try to visit while offline. The mismatch is silent because there are no errors; the app simply fails to load assets it needs and falls back to whatever skeleton or error state it has, often without the user understanding why.

The fix is not to cache more aggressively. The fix is to be deliberate about what gets cached, when, and from what trigger. Most service worker caches end up reactive: the first time the user visits a route, that route's assets are cached, and on the second visit they are served from cache. A route the user has never opened online is therefore never in the cache, so the first offline visit to it fails.

Step one: precache the critical shell

The most reliable pattern is to precache the application shell at install time. The shell is the set of assets the app needs to start: the entry HTML, the main JavaScript bundle, the main CSS, any fonts, and any images that are part of the always-visible UI.

Precaching is done in the service worker's install event:

self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open('app-shell-v1').then((cache) => {
      return cache.addAll([
        '/',
        '/main.js',
        '/main.css',
        '/fonts/inter.woff2',
        '/icons/logo.svg',
      ])
    })
  )
})

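The shell is fetched once at install time and stays cached until the version is bumped or the browser evicts the origin's storage. Note that cache.addAll is all-or-nothing: if any listed URL fails to download, the install fails and the new worker never activates, so keep the list limited to URLs that reliably exist. Provided the worker's fetch handler answers navigation requests from this cache, a user opening any route can reach a working shell even on a cold offline visit, because the shell was cached at install time, not on first visit.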
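
Precaching only fills the cache; something still has to serve from it. The following is a minimal sketch of a navigation fallback, assuming a single-page app whose shell at '/' can render every route. The names SHELL_CACHE, SHELL_URL, and respondToNavigation are illustrative and not part of any library, and the dependencies are passed in as arguments so the function can be exercised outside a worker.

```javascript
// Assumed to match the cache name and shell URL used in the install handler.
const SHELL_CACHE = 'app-shell-v1'
const SHELL_URL = '/'

// Try the network first; if it fails, answer with the precached shell.
async function respondToNavigation(request, { fetchFn, cacheStorage }) {
  try {
    return await fetchFn(request)
  } catch (networkError) {
    const cache = await cacheStorage.open(SHELL_CACHE)
    const shell = await cache.match(SHELL_URL)
    if (shell) return shell
    // Nothing cached either: surface the original failure.
    throw networkError
  }
}

// Wire it up only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    if (event.request.mode === 'navigate') {
      event.respondWith(
        respondToNavigation(event.request, { fetchFn: fetch, cacheStorage: caches })
      )
    }
  })
}
```

Multi-page apps that render different HTML per route would need to precache each document instead of falling back to a single shell.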

For the deeper offline-tolerant data fetching that needs to layer on top of this, the broader pattern lives in the Workbox documentation and the MDN service worker reference, which together cover the most common runtime caching strategies.

Step two: distinguish between cache-first and network-first routes

Once the shell is in place, runtime caching of API responses and dynamic content has to choose a strategy per route. The two that come up most often:

Cache-first means the service worker checks the cache first and only falls back to the network on miss. Good for assets that rarely change (avatars, product images, font files). Bad for API responses where staleness matters.
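
As a rough sketch of cache-first, assuming GET requests only (the Cache API cannot store other methods): cacheFirst and its options object are illustrative names, not a library API.

```javascript
// Cache-first: return the cached copy if present, otherwise fetch and store.
async function cacheFirst(request, { cacheName, fetchFn, cacheStorage }) {
  const cache = await cacheStorage.open(cacheName)
  const cached = await cache.match(request)
  if (cached) return cached

  const response = await fetchFn(request)
  // Store only successful responses. Clone first, because a body can be read once.
  if (response.ok) await cache.put(request, response.clone())
  return response
}
```

Inside a fetch handler this could be called as event.respondWith(cacheFirst(event.request, { cacheName: 'static-assets-v1', fetchFn: fetch, cacheStorage: caches })), where 'static-assets-v1' is a hypothetical cache name. Opaque cross-origin responses report ok as false, so this sketch does not cache them.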

Network-first means the service worker tries the network first and falls back to the cache only on network failure. Good for API responses where freshness matters more than offline support. The user gets fresh data when online and the last-known data when offline.
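
A comparable sketch of network-first, under the same assumptions (GET requests, illustrative function and option names):

```javascript
// Network-first: prefer fresh data, remember it, fall back to the last copy offline.
async function networkFirst(request, { cacheName, fetchFn, cacheStorage }) {
  const cache = await cacheStorage.open(cacheName)
  try {
    const response = await fetchFn(request)
    // Keep the latest good response so there is something to serve offline.
    if (response.ok) await cache.put(request, response.clone())
    return response
  } catch (networkError) {
    const cached = await cache.match(request)
    if (cached) return cached
    // Never fetched successfully before: there is nothing to fall back to.
    throw networkError
  }
}
```

The catch branch only runs when the fetch itself rejects. A slow connection that never fails will keep the user waiting, so a production version would probably add a timeout.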

A third pattern, stale-while-revalidate at the service worker layer, returns the cached response immediately and fires a background fetch that updates the cache for next time. This is the same idea as the HTTP-level stale-while-revalidate directive but implemented in JavaScript with more control.
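
One way this might look in code, as a sketch rather than a reference implementation. The waitUntil option is an assumption of this example: it is expected to wrap event.waitUntil so the worker stays alive until the background refresh finishes.

```javascript
// Stale-while-revalidate: answer from cache now, refresh the cache in the background.
async function staleWhileRevalidate(request, { cacheName, fetchFn, cacheStorage, waitUntil }) {
  const cache = await cacheStorage.open(cacheName)
  const cached = await cache.match(request)

  const refresh = fetchFn(request).then(async (response) => {
    if (response.ok) await cache.put(request, response.clone())
    return response
  })

  // Nothing cached yet: the caller has to wait for the network this one time.
  if (!cached) return refresh

  // A failed background refresh is not fatal; the cached copy was already served.
  waitUntil(refresh.catch(() => undefined))
  return cached
}
```

In a fetch handler the call could be event.respondWith(staleWhileRevalidate(event.request, { cacheName: 'api-cache-v1', fetchFn: fetch, cacheStorage: caches, waitUntil: (p) => event.waitUntil(p) })).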

The trap is using cache-first for API responses without thinking. The user updates their profile, the server stores the change, the service worker keeps returning the cached old profile because the cache hit comes back before the network attempt. The offline mode "works" in that the UI loads, but the data is wrong.
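
One way to avoid picking a strategy by accident is to make the choice explicit in a single function. This is a sketch only: the '/api/' prefix and the mapping itself are assumptions about a hypothetical app, not a recommendation for every codebase.

```javascript
// Decide the caching strategy from the request, in one auditable place.
function pickStrategy({ method, url, destination }) {
  // Writes must always reach the server, and the Cache API only stores GETs.
  if (method !== 'GET') return 'network-only'

  const { pathname } = new URL(url)
  // Hypothetical API prefix: freshness matters more than speed here.
  if (pathname.startsWith('/api/')) return 'network-first'

  // Assets that rarely change can be served straight from cache.
  if (destination === 'image' || destination === 'font') return 'cache-first'

  // Default to the option that cannot serve stale data while online.
  return 'network-first'
}
```

A Request object exposes method, url, and destination, so event.request can be passed in directly.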

Step three: handle the cache version bump

Service worker caches are versioned only by convention: a cache name is just a string, and keeping it in step with your releases is your job. When you ship a new version of the app, the new service worker installs alongside the old one, and the old caches stay around until something explicitly deletes them. If you do not handle this, dead caches accumulate and count against the origin's storage quota, which pushes the whole origin closer to failed writes or eviction.

The pattern is to clean up old caches in the activate event:

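self.addEventListener('activate', (event) => {
  // Caches the current release still needs. The shell name here must match
  // the name this release's install handler opens (v2 in this example).
  const allowList = ['app-shell-v2', 'api-cache-v1']
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(
        // Delete every cache that is not on the allowlist.
        keys.filter((key) => !allowList.includes(key)).map((key) => caches.delete(key))
      )
    )
  )
})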

Bump the version string on every release that changes the shell. The activate handler deletes everything that is not on the current allowlist, freeing storage for whatever the new version needs.
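
A small sketch of one way to keep the install and activate handlers from drifting apart: derive every cache name from a single constant. SHELL_VERSION, CACHE_NAMES, and staleCacheNames are names invented for this example.

```javascript
// Single source of truth for the release's cache names (hypothetical values).
const SHELL_VERSION = 2
const CACHE_NAMES = {
  shell: `app-shell-v${SHELL_VERSION}`,
  api: 'api-cache-v1',
}

// Given the names returned by caches.keys(), list the ones safe to delete.
function staleCacheNames(existing, current = Object.values(CACHE_NAMES)) {
  return existing.filter((name) => !current.includes(name))
}
```

The install handler would then open CACHE_NAMES.shell, and the activate handler would delete whatever staleCacheNames(await caches.keys()) returns, so a version bump is a one-line change.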

Step four: detect quota pressure before it bites

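Browsers limit the total storage a single origin can use. The limit varies by browser and by available disk space, and on devices with little free space it can be far less than the application author assumes. When the origin hits its quota, further cache writes fail with a QuotaExceededError, and when the device itself runs low on space the browser can evict an origin's data without telling the page, unless persistent storage has been granted. Either way the app stops working offline, and the team has no signal that anything is wrong.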

The fix is to estimate quota usage at runtime and log when it approaches the limit:

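if ('storage' in navigator && 'estimate' in navigator.storage) {
  navigator.storage.estimate().then(({ usage, quota }) => {
    // usage and quota are optional fields; skip the check when quota is
    // missing or zero to avoid logging NaN or Infinity.
    if (!quota) return
    const percent = ((usage || 0) / quota) * 100
    if (percent > 80) {
      console.warn(`Storage at ${percent.toFixed(1)}% of quota`)
    }
  })
}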

In production, this signal should go to your error tracking. Quota pressure is a leading indicator of impending offline breakage; catching it early lets the team prune caches before the user notices.
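
As a sketch of how that reporting might be structured: the report callback stands in for whatever error tracker you use, and checkQuota and the 0.8 default threshold are assumptions of this example.

```javascript
// Returns the usage ratio (0..1), or null when the browser cannot provide one.
async function checkQuota(storageManager, report, threshold = 0.8) {
  if (!storageManager || typeof storageManager.estimate !== 'function') return null

  const { usage, quota } = await storageManager.estimate()
  if (typeof usage !== 'number' || typeof quota !== 'number' || quota <= 0) return null

  const ratio = usage / quota
  if (ratio >= threshold) report({ usage, quota, ratio })
  return ratio
}

// Browser wiring; console.warn is a placeholder for a real tracker call.
if (typeof navigator !== 'undefined' && navigator.storage) {
  checkQuota(navigator.storage, (details) => console.warn('storage pressure', details))
}
```

Running this once per session, for example after the service worker registers, is probably enough; the estimate is approximate and does not need to be polled.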

"Offline mode is not a feature you ship and forget. It is an ongoing operational commitment, and the silent failure modes deserve as much attention as the active ones." - Dennis Traina, founder of 137Foundry

Step five: test offline mode in CI

The hardest part of shipping reliable offline support is testing it. Manual testing is unreliable because developers almost always visit the route online first, which warms the cache and masks the bug.

The fix is to run an automated test that exercises the offline flow from a cold state. Playwright and similar tools let you simulate an offline network condition and verify that key routes still work. The test should:

1. Launch a fresh browser context (no cached state).
2. Navigate to the app once online to install the service worker and precache the shell.
3. Force the browser into offline mode.
4. Navigate to each route that should work offline and verify the rendered output.

If any route fails, the precache list is incomplete or a runtime caching strategy is wrong. The test catches the bug before users see it.
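
The steps above can be sketched as a helper that takes a Playwright browser object. The function name, the base URL, the route list, and the readySelector (an element that only exists once the app has rendered) are all assumptions for illustration.

```javascript
// Returns the routes that failed to render while offline.
async function findBrokenOfflineRoutes(browser, baseUrl, routes, readySelector) {
  // A new context starts with no caches and no registered service worker.
  const context = await browser.newContext()
  try {
    const page = await context.newPage()

    // Online visit: registers the worker and lets install precache the shell.
    await page.goto(baseUrl)
    await page.evaluate(() => navigator.serviceWorker.ready.then(() => true))

    await context.setOffline(true)

    const broken = []
    for (const route of routes) {
      try {
        await page.goto(new URL(route, baseUrl).href)
        await page.waitForSelector(readySelector, { timeout: 5000 })
      } catch (error) {
        broken.push(route)
      }
    }
    return broken
  } finally {
    await context.close()
  }
}

// Possible use inside a @playwright/test spec (URL and routes are placeholders):
// test('routes work offline', async ({ browser }) => {
//   const broken = await findBrokenOfflineRoutes(
//     browser, 'http://localhost:3000', ['/', '/settings'], '#app-ready')
//   expect(broken).toEqual([])
// })
```

Waiting on navigator.serviceWorker.ready matters here: it resolves only once a worker is active, which happens after the install step and its precache have finished.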

The web development team at 137Foundry treats this as a default part of any PWA we ship with offline support. Without it, the offline mode is theoretical, and the user complaints arrive eventually.

A small observability note

One more layer worth mentioning: instrument the service worker itself. Add logging to every fetch event handler that records whether the response came from cache or network, and what the cache name was. Aggregate these logs to your observability backend.

The metric to watch is the cache hit rate per route. A healthy precache hit rate is close to 100% for shell routes; a healthy runtime cache hit rate is 50% or more for routes the user visits repeatedly. A sudden drop indicates a regression, usually that a recent code change has broken the cache key or the precache list.

The service worker context is awkward for logging because it does not have direct access to the rest of the app's logging infrastructure. The pattern that works is to post a message from the service worker to the main thread with the log payload, and let the main thread forward it to the observability backend. A few lines of code, and the offline mode now has the same observability story as the rest of the app.
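
Those few lines might look like the sketch below. The 'sw-log' message type, the forwardLog name, and the '/logs' endpoint are placeholders. The sketch sends each record to a single open window to avoid duplicates, and it drops the record when no window is open.

```javascript
// Service worker side: hand a log record to one open window, if any.
async function forwardLog(clientsApi, payload) {
  const windows = await clientsApi.matchAll({ type: 'window', includeUncontrolled: true })
  if (windows.length === 0) return false // no page open: the record is dropped
  windows[0].postMessage({ type: 'sw-log', payload })
  return true
}

// In the worker this would be called as, for example:
//   forwardLog(self.clients, { url: event.request.url, source: 'cache', cacheName })

// Main thread side: receive the record and forward it to the backend.
if (typeof window !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.addEventListener('message', (event) => {
    if (event.data && event.data.type === 'sw-log') {
      // '/logs' is a hypothetical endpoint; swap in your logging client.
      navigator.sendBeacon('/logs', JSON.stringify(event.data.payload))
    }
  })
}
```

If dropped records are a concern, the worker could buffer them and flush when a window next connects, at the cost of more state to manage.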

For the broader caching architecture, including how the service worker cache layer interacts with HTTP-level caching, the 137Foundry article on browser API caching walks through the decision matrix in detail. The web development service page covers the related architectural work we do for clients building PWAs.