minoblue

Posted on Nov 26

Browser Storage Deep Dive: Cache vs IndexedDB for Scalable PWAs

#webdev #javascript #performance #architecture

Tools for Different Jobs

The browser provides several persistent storage mechanisms, but the Cache Storage API and IndexedDB are the workhorses for modern offline-first strategies.

Cache Storage API: Use to store HTTP Responses (HTML, JS, CSS, images). It's a key-to-response store, primarily managed by the Service Worker.
IndexedDB: Use to store structured, large, queryable data (e.g., product catalogs, chat messages, complex application state). It's a full NoSQL database in the browser.
Service Workers: The orchestrator, acting as a network proxy to intercept requests and decide whether to serve from the Network, Cache, or IndexedDB.

1. What Each Storage Mechanism Is (Simple Definitions)

Mechanism	Description	Primary Use Case
Cache Storage API	A key-to-response map, accessible from both pages and Service Workers, for storing entire HTTP responses.	Static assets (JS/CSS/images), offline shell (HTML).
IndexedDB	An asynchronous, transactional, NoSQL database for storing structured objects and large datasets.	Product data, user messages, complex offline data store.
Service Worker	A background JavaScript script that intercepts network requests (`fetch` events) and handles background tasks (push, sync).	Network interception, caching logic, data synchronization.
`localStorage`	A synchronous, small key-to-string store. Not for large or mission-critical offline data.	User preferences, simple feature flags.

2. Why Both Exist: Purpose & Strengths

Feature	Cache API	IndexedDB
What to Store	Whole HTTP responses (files)	Structured objects, key-value pairs, Blobs.
Typical Use	Fast asset loading, offline navigation.	Complex data operations (search, filtering, partial updates).
Querying	No (simple match by request/URL).	Yes (indexes, key range queries, cursors).
Size & Scale	Good for assets; eviction rules browser-defined.	Suited to large datasets (MBs to GBs); draws on the same per-origin quota as the Cache API.
API Type	Promise-based request/response.	Transactional, object store, indexes.
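
Size limits are easier to reason about with numbers in hand. The sketch below wraps `navigator.storage.estimate()`, which reports an origin's combined usage and quota. The helper name `summarizeStorage` and its returned field names are illustrative, and the `storageManager` parameter exists only so the function can be exercised outside a browser.

```javascript
// Summarize origin storage usage. Pass navigator.storage in a browser.
async function summarizeStorage(storageManager) {
  const { usage = 0, quota = 0 } = await storageManager.estimate();
  const toMB = bytes => Math.round(bytes / (1024 * 1024));
  return {
    usedMB: toMB(usage),
    quotaMB: toMB(quota),
    percentUsed: quota > 0 ? Math.round((usage / quota) * 100) : 0,
  };
}

// Browser usage (assumes StorageManager support):
// summarizeStorage(navigator.storage).then(console.log);
```

The reported values are estimates, so they are best used for coarse decisions such as when to start trimming old entries.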

3. Storage Internals: Disk, RAM, and Database Engines

Both the Cache API and IndexedDB persist their data to disk, so it survives reloads and browser restarts.

💿 IndexedDB Backends

IndexedDB doesn't just store files; it uses real database engines on the user's disk:

Chromium-based Browsers (Chrome, Edge, Opera): Use LevelDB (a fast key/value store built on a log-structured merge-tree).
Firefox and Safari: Use SQLite (a transactional, relational database, typically B-tree storage).

This strong backend is why IndexedDB supports durability, transactions, and fast, indexed lookups on large datasets.

🧠 Performance Layers

Browsers may keep frequently accessed parts of the Cache and IndexedDB in RAM for speed, but the disk is the authoritative, durable store.

4. Service Worker and Caching Strategies

The Service Worker is key. It handles the fetch event, allowing you to implement specific caching strategies.

Strategy	Logic	When to Use
Cache-First	Check cache; if present, return it; otherwise, fetch from the network.	Static assets (hashed JS/CSS), icons, app shell. Fast and stale-tolerant.
Network-First	Try network; if successful, return response and update cache; otherwise, fall back to cache.	Sensitive/volatile data (checkout, stock, auth). Prioritizes freshness.
Stale-While-Revalidate (SWR)	Return cached asset immediately, and then asynchronously fetch the network version to update the cache for the next time.	Feeds, listings, product pages. Best user experience (fast display, background freshness).

Cache-First Example (Service Worker Snippet):

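self.addEventListener('fetch', event => {
  // Serve the cached response if there is one; otherwise go to the network
  event.respondWith(
    caches.match(event.request).then(resp => resp || fetch(event.request))
  );
});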
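
Stale-While-Revalidate from the table above takes a few more lines than Cache-First. The following is a minimal sketch, not a definitive implementation: `staleWhileRevalidate` is a hypothetical helper, `cache` is any object with `match` and `put` (such as the result of `caches.open`), and `fetchFn` is injected so the logic can run outside a Service Worker.

```javascript
// Stale-While-Revalidate sketch with injected cache and fetch function.
async function staleWhileRevalidate(request, cache, fetchFn = fetch) {
  const cached = await cache.match(request);

  // Always start a refresh; store a copy only when the response is OK.
  const refresh = fetchFn(request).then(async response => {
    if (response.ok) await cache.put(request, response.clone());
    return response;
  });

  if (cached) {
    // Serve the stale copy now. The caller should hand `refresh` to
    // event.waitUntil so the worker stays alive until the update lands.
    return { response: cached, refresh: refresh.catch(() => undefined) };
  }
  // Nothing cached yet: the network response is the answer.
  return { response: await refresh, refresh: Promise.resolve() };
}

// Service Worker usage ('pages-v1' is a hypothetical cache name):
// self.addEventListener('fetch', event => {
//   event.respondWith((async () => {
//     const cache = await caches.open('pages-v1');
//     const { response, refresh } = await staleWhileRevalidate(event.request, cache);
//     event.waitUntil(refresh);
//     return response;
//   })());
// });
```

Returning the refresh promise separately keeps the response fast while still letting the worker wait for the cache update.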

5. Practical Data Management: Cache vs. IndexedDB

Choosing the right store is crucial for efficiency and scaling.

🖼️ Use Cache API When:

You're storing a whole HTTP Response (including headers).
The data is a file-like asset (images, CSS).
You only need to retrieve the asset by its URL key.

🧩 Use IndexedDB When:

You need to store structured objects (JSON objects, complex records).
The dataset is large (MBs/GBs).
You require querying (by indexes, ranges) or transactions.
You need partial updates or controlled eviction (e.g., removing the 100 oldest records).
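
Controlled eviction, the last point above, comes down to choosing which records to delete. Here is a small sketch of that selection step, assuming each record carries an `id` and a numeric `fetchTime`; both field names and the helper `oldestKeys` are hypothetical.

```javascript
// Return the ids of the `count` oldest records, oldest first.
function oldestKeys(records, count) {
  return [...records]                          // copy so the input is not reordered
    .sort((a, b) => a.fetchTime - b.fetchTime) // ascending by age
    .slice(0, count)
    .map(record => record.id);
}

const sample = [
  { id: 'a', fetchTime: 300 },
  { id: 'b', fetchTime: 100 },
  { id: 'c', fetchTime: 200 },
];
console.log(oldestKeys(sample, 2)); // [ 'b', 'c' ]
```

Against a real object store, an index on the timestamp field walked with a cursor would likely avoid loading every record into memory first.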

Best Practice: For heavy paginated API responses (e.g., a product list), avoid storing the massive JSON blob in the Cache API. Instead, decompose the data and store it in IndexedDB.

Pattern: Heavy Paginated APIs (Service Worker + IndexedDB)

This pattern solves the issue of duplicated or stale data in large API responses.

Service Worker intercepts the paginated API request (/api/products?page=N).
Network Success: The Service Worker parses the JSON.
- It saves each individual item (e.g., product object) into an items Object Store in IndexedDB.
- It saves a metadata record ({page: N, itemIds: [...]}) into a pages Object Store in IndexedDB.
- It returns the original network response to the client.
Network Failure/Offline:
- The Service Worker reads the page metadata from the pages store.
- It retrieves the individual items (products) by their IDs from the items store.
- It reconstructs the JSON response and returns it to the client.

This design enables:

No Duplication: Items are stored only once, even if they appear on multiple pages.
Easy Updates: A partial update from the server only requires updating the few changed item records in IndexedDB.
Querying: You can create an index on the items store for offline search/filtering.
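
The decompose-and-rebuild steps above can be sketched with two in-memory `Map`s standing in for the `items` and `pages` object stores. The payload shape (`{ items: [{ id, ... }] }`) and the helper names are assumptions for illustration; a real Service Worker would do the same work inside an IndexedDB transaction.

```javascript
// In-memory stand-ins for the `items` and `pages` object stores.
const itemsStore = new Map(); // item id -> item
const pagesStore = new Map(); // page number -> { page, itemIds }

// Network success: split the page into item records plus page metadata.
function savePage(page, payload) {
  for (const item of payload.items) itemsStore.set(item.id, item);
  pagesStore.set(page, { page, itemIds: payload.items.map(item => item.id) });
}

// Offline: rebuild the page from metadata, skipping items that are missing.
function rebuildPage(page) {
  const meta = pagesStore.get(page);
  if (!meta) return null;
  const items = meta.itemIds.map(id => itemsStore.get(id)).filter(Boolean);
  return { page, items };
}

savePage(1, { items: [{ id: 'p1', name: 'Lamp' }, { id: 'p2', name: 'Desk' }] });
savePage(2, { items: [{ id: 'p2', name: 'Desk (sale)' }, { id: 'p3', name: 'Chair' }] });
console.log(itemsStore.size);              // 3 (p2 is stored once)
console.log(rebuildPage(1).items[1].name); // "Desk (sale)"
```

Because page 1 only references `p2` by ID, it picks up the newer record written while saving page 2, which is the "easy updates" property in practice.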

6. Development Essentials

🔑 IndexedDB API & `idb` Wrapper

The vanilla IndexedDB API is verbose and built on request events (onsuccess/onerror) rather than Promises. It's highly recommended to use a lightweight Promise-based wrapper like idb (or Dexie) to simplify transactions and object store operations.

IndexedDB Internal Structure (Simplified):
$$\text{Database} \to \text{Object Stores} \to \text{Indexes}$$

A Transaction is required for all read/write operations and ensures atomicity (all changes succeed or all fail).
Data is stored using the Structured Clone Algorithm, which handles Blobs, Maps, Sets, and circular references (better than JSON).
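
To see what structured cloning preserves that JSON does not, the global `structuredClone()` function can serve as a stand-in, since it applies the same algorithm. This is only an illustration of the serialization behavior, not of IndexedDB itself, and the record fields are made up.

```javascript
const record = {
  id: 1,
  tags: new Set(['breaking', 'world']),
  views: new Map([['mobile', 10]]),
  savedAt: new Date(0),
};
record.self = record; // circular reference

// Structured clone keeps types and cycles intact.
const copy = structuredClone(record);

// JSON flattens Sets and Maps to empty objects...
const viaJson = JSON.parse(JSON.stringify({ tags: record.tags, views: record.views }));

// ...and refuses circular structures outright.
let jsonError = null;
try {
  JSON.stringify(record);
} catch (err) {
  jsonError = err;
}

console.log(copy.tags instanceof Set);       // true
console.log(copy.self === copy);             // true
console.log(viaJson.tags);                   // {}
console.log(jsonError instanceof TypeError); // true
```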

📝 Best Practices Checklist

Version Caching: Always version your static cache names (e.g., static-v2) and implement an activate listener in the Service Worker to clear old caches.
Cache Hashing: Use hashed filenames (app.1a2b3c.js) for static assets to ensure a long TTL (time-to-live) without serving stale code.
Quota Management: Be mindful of browser quotas. IndexedDB and the Cache API draw on the same per-origin quota, which is far larger than localStorage's few megabytes. Implement logic to trim the oldest/least-used entries (e.g., remove the oldest 10 cached pages).
Feature Detect: Always check if ('serviceWorker' in navigator) before registering, and feature detect for advanced APIs like Background Sync or Push.
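
The cache-versioning item in this checklist might look like the sketch below. `deleteOldCaches` and the `CURRENT_CACHES` list are hypothetical names; `keys()` and `delete()` are the standard CacheStorage methods, and the storage object is passed in so the function can be tried with a stub.

```javascript
const CURRENT_CACHES = ['static-v2']; // cache names this worker version still uses

// Delete every cache whose name is not in `keep`.
// Inside a Service Worker, `cacheStorage` is the global `caches` object.
async function deleteOldCaches(cacheStorage, keep = CURRENT_CACHES) {
  const names = await cacheStorage.keys();
  const stale = names.filter(name => !keep.includes(name));
  await Promise.all(stale.map(name => cacheStorage.delete(name)));
  return stale;
}

// Service Worker usage:
// self.addEventListener('activate', event => {
//   event.waitUntil(deleteOldCaches(caches));
// });
```

Wrapping the cleanup in `event.waitUntil` keeps the worker in the activating state until the old caches are gone.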

⚠️ Pitfalls to Avoid

Over-caching: Caching JavaScript without cache-busting (hashing) can lead to users running stale, broken code.
Heavy JSON in Cache: Storing large, complex JSON in the Cache is inefficient. It leads to disk bloat and overhead from parsing the full response object every time. Use IndexedDB for heavy objects.
Sensitive Data: Never store sensitive user data without proper encryption and explicit consent.

📰 Real-World Use Case: Large News Website PWA

Service Workers combined with IndexedDB are the foundation of reliable, fast Progressive Web Apps (PWAs) at scale, especially for sites built around frequently updated content such as news, media, or e-commerce.

The use case below is a large news site, which must balance asset speed against content freshness: deliver the latest headlines quickly, let users read articles offline, and handle a large volume of frequently changing article data efficiently.

1. Service Worker Strategy Overview

Resource Type	Storage Mechanism	Caching Strategy	Purpose
App Shell (HTML, JS, CSS, icons)	Cache API	Cache-First (pre-cached on install)	Fast, reliable initial load and UI rendering, even when offline.
Article Images (JPG, PNG)	Cache API	Stale-While-Revalidate (SWR)	Fast display from cache, update in background for next visit.
Article Data (JSON content)	IndexedDB	IndexedDB Logic (read/write in SW)	Storage for structured, queryable data for offline reading.
Live Endpoints (Login, Comments)	Network (No Cache)	Network-Only	Prioritizes absolute freshness and security for sensitive actions.
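
One way to express this table in code is a small routing function that the `fetch` handler consults. The sketch below is an assumption-laden illustration: only `/api/v1/feed` comes from this article, while the `/media/`, `/api/login`, and `/api/comments` prefixes and the `news.example` base URL are hypothetical.

```javascript
// Map a request URL to the strategy named in the table above.
function strategyFor(requestUrl) {
  const { pathname } = new URL(requestUrl, 'https://news.example');
  if (pathname.startsWith('/api/v1/feed')) return 'indexeddb';
  if (pathname.startsWith('/api/login') || pathname.startsWith('/api/comments')) {
    return 'network-only';
  }
  if (pathname.startsWith('/media/')) return 'stale-while-revalidate'; // article images
  return 'cache-first'; // app shell: HTML, JS, CSS, icons
}

console.log(strategyFor('/api/v1/feed?name=homepage')); // "indexeddb"
console.log(strategyFor('/media/2024/cover.jpg'));      // "stale-while-revalidate"
console.log(strategyFor('/app.1a2b3c.js'));             // "cache-first"
```

Keeping the routing decision in a pure function makes it easy to unit test separately from the Service Worker lifecycle.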

2. IndexedDB Schema and Data Flow

The key to efficiency is decomposition: storing article content granularly in IndexedDB, which allows for fast lookups and partial updates.

IndexedDB Structure (Simplified)

Database: news-db (Version 1)
Object Store 1: articles (Key: articleId)
- Stores: { articleId, headline, body, author, timestamp, isRead }
Object Store 2: feeds (Key: feedName, e.g., 'homepage', 'world-news')
- Stores: { feedName, articleIds: [...], fetchTime }
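
The structure above could be created in an upgrade callback along these lines. This is a sketch using the native `createObjectStore` and `createIndex` calls; the function name `upgradeNewsDb` and the `by-timestamp` index are assumptions that the article does not specify.

```javascript
// Create the two object stores if they do not exist yet.
// `db` is the IDBDatabase handed to the upgrade callback.
function upgradeNewsDb(db) {
  if (!db.objectStoreNames.contains('articles')) {
    const articles = db.createObjectStore('articles', { keyPath: 'articleId' });
    // Hypothetical index for "latest first" queries while offline.
    articles.createIndex('by-timestamp', 'timestamp');
  }
  if (!db.objectStoreNames.contains('feeds')) {
    db.createObjectStore('feeds', { keyPath: 'feedName' });
  }
}

// Browser usage with the native API:
// const request = indexedDB.open('news-db', 1);
// request.onupgradeneeded = () => upgradeNewsDb(request.result);
```

Checking `objectStoreNames` first keeps the function safe to reuse when a later version adds more stores.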

💡 The Cache-Then-Network + IDB Flow (For News Feed)

When the client requests /api/v1/feed?name=homepage:

A. Service Worker (`sw.js`) Code Snippet (High-Level)

This implements the Cache-Then-Network pattern using the IndexedDB utilities.

// Service Worker (the IndexedDB helpers below assume a Promise wrapper such as idb)

self.addEventListener('fetch', event => {
  const url = new URL(event.request.url);
  if (url.pathname.startsWith('/api/v1/feed')) {
    event.respondWith(handleFeedRequest(event));
  }
  // ... other asset caching routes follow
});

async function handleFeedRequest(event) {
  const feedName = new URL(event.request.url).searchParams.get('name');

  // 1. Look for a previously stored copy in IndexedDB (a Response or null)
  const cachedResponse = await loadFeedFromIDB(feedName);

  // 2. Start the network request and refresh IndexedDB when it succeeds
  const networkPromise = fetch(event.request).then(async netRes => {
    if (!netRes.ok) throw new Error(`Feed request failed: ${netRes.status}`);
    // 3. Decompose and store a copy; the original response stays readable
    const data = await netRes.clone().json();
    await saveArticlesAndFeedMeta(feedName, data.articles);
    return netRes;
  });

  if (cachedResponse) {
    // 4a. Serve the stored copy now and keep the worker alive until the refresh settles
    event.waitUntil(
      networkPromise.catch(err => console.warn('Background refresh failed', err))
    );
    return cachedResponse;
  }

  // 4b. Nothing stored yet: wait for the network, or report that the feed is unavailable
  return networkPromise.catch(err => {
    console.warn('Network failed and no offline copy exists', err);
    return new Response(JSON.stringify({ error: 'offline', articles: [] }), {
      status: 503,
      headers: { 'Content-Type': 'application/json' },
    });
  });
}

B. The IndexedDB Utility: Writing (`saveArticlesAndFeedMeta`)

This part, run inside the Service Worker thread, handles the data decomposition and storage, ensuring atomicity via transactions.

// IndexedDB utility (written against the idb wrapper's API; getDBConnection is assumed to open 'news-db')

async function saveArticlesAndFeedMeta(feedName, articles) {
  const db = await getDBConnection(); // Connects to 'news-db'
  const tx = db.transaction(['articles', 'feeds'], 'readwrite');

  const articleStore = tx.objectStore('articles');
  const feedStore = tx.objectStore('feeds');

  const articleIds = [];

  // 1. Save/Update each article individually in the 'articles' store
  for (const article of articles) {
    // This allows partial updates in the future
    articleStore.put(article); 
    articleIds.push(article.articleId);
  }

  // 2. Save the feed metadata in the 'feeds' store
  const feedMeta = { 
    feedName: feedName, 
    articleIds: articleIds, 
    fetchTime: Date.now() 
  };
  feedStore.put(feedMeta); // Key is feedName

  return tx.done; // Wait for transaction to complete (atomic commit)
}

C. The IndexedDB Utility: Reading and Reconstructing (`loadFeedFromIDB`) 🔑

This is the implementation of the function responsible for serving the cached content.

// IndexedDB utility (written against the idb wrapper's API; getDBConnection is assumed to open 'news-db')

/**
 * Loads the feed metadata and reconstructs the full JSON response from individual article records.
 * @param {string} feedName - The name of the feed (e.g., 'homepage').
 * @returns {Promise<Response | null>} A reconstructed JSON Response object or null if data is missing.
 */
async function loadFeedFromIDB(feedName) {
  try {
    const db = await getDBConnection(); // IDB connection
    const tx = db.transaction(['articles', 'feeds'], 'readonly');

    const feedStore = tx.objectStore('feeds');
    const articleStore = tx.objectStore('articles');

    // 1. Get the feed metadata (list of article IDs)
    const feedMeta = await feedStore.get(feedName);

    if (!feedMeta || !feedMeta.articleIds || feedMeta.articleIds.length === 0) {
      return null;
    }

    const articles = [];

    // 2. Fetch individual article records using the stored IDs
    for (const articleId of feedMeta.articleIds) {
      const article = await articleStore.get(articleId);
      if (article) {
        articles.push(article);
      }
    }

    await tx.done;

    // 3. Reconstruct the final JSON response object structure
    const fallbackData = {
      source: 'IndexedDB Offline Cache',
      timestamp: feedMeta.fetchTime,
      articles: articles,
      isOffline: true, // Signal to the client that this data came from the local store, not the network
    };

    // Return a Response object mimicking the network response
    return new Response(JSON.stringify(fallbackData), { 
      status: 200, 
      headers: { 'Content-Type': 'application/json' }
    });

  } catch (error) {
    console.error(`Error loading ${feedName} from IDB:`, error);
    return null;
  }
}

3. Benefits and Advanced Techniques

This approach achieves several critical goals for a large PWA:

Offline Access: If the network fails, the loadFeedFromIDB function can reconstruct the entire feed page from the articles stored in IndexedDB.
Performance: The user sees the old content immediately (SWR) while the network fetches the new content in the background, significantly reducing perceived latency.
Storage Efficiency: Article content is normalized (stored only once). If an article is on the homepage and the sports page, only one copy exists in IndexedDB, preventing unnecessary disk bloat.
Background Sync: If a user submits a comment or a reaction while offline, the Service Worker can save the POST request payload to a separate IndexedDB store (e.g., outbox) and use the Background Sync API, where the browser supports it, to re-submit the request automatically once the connection is restored.
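
The outbox replay described in the last point can be sketched as follows. The `outbox` object (with `getAll` and `delete`) and the `send` function are injected stand-ins for an IndexedDB store and a `fetch` call; their names, the entry shape, and the `'outbox'` sync tag are all hypothetical.

```javascript
// Replay queued entries in order, removing each one after it is accepted.
// `send` posts one entry and resolves to a Response-like object with `ok`.
async function flushOutbox(outbox, send) {
  const pending = await outbox.getAll();
  let sent = 0;
  for (const entry of pending) {
    const response = await send(entry);
    // Throwing rejects the promise given to waitUntil, so the sync can be retried later.
    if (!response.ok) throw new Error(`Replay failed for entry ${entry.id}`);
    await outbox.delete(entry.id);
    sent += 1;
  }
  return sent;
}

// Service Worker usage (Background Sync is not available in every browser):
// self.addEventListener('sync', event => {
//   if (event.tag === 'outbox') event.waitUntil(flushOutbox(outboxStore, sendEntry));
// });
```

Deleting each entry only after a successful response means a failed replay leaves the remaining entries queued for the next attempt.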

🌐 Cache and IndexedDB Usage in Popular Web Platforms

1. WhatsApp Web (Messaging Service) 💬

Storage Type	Usage	Scenario-Based Reasoning
Cache Storage API	Low/Moderate. Stores the static PWA shell (HTML/JS/CSS bundles) and application icons.	These assets are static and rarely change, making the Cache API perfect for guaranteeing an instant UI load (`Cache-Only`).
IndexedDB	Critical/Heavy Use. Stores the entire message history locally, along with contact details, media pointers, and signal protocol keys.	Messages are structured data, often involve GBs of storage, and require fast, transactional lookups (searching chats, loading a thread by ID). The volume and necessity of queryable history dictates IndexedDB.
Service Worker	Core. Manages the persistent connection, intercepts API requests, and uses the Background Sync API to queue outgoing messages when the network is unstable.

2. Facebook / Instagram (Social Media Feed) 📸

Storage Type	Usage	Scenario-Based Reasoning
Cache Storage API	Heavy Use. Caching the primary application shell, common shared libraries, profile avatars, and reaction icons.	The UI should load instantly. A Cache-First strategy is used for all UI components to maximize perceived performance.
IndexedDB	High Use. Stores normalized feed data (post text, metadata, user details) and recent notifications.	Feeds change frequently. Data is decomposed (text is separate from images) and stored in IDB. The Service Worker uses Stale-While-Revalidate (SWR) on the feed API: it loads the stale data from IDB immediately, then fetches fresh data from the network in the background to update the store for the next visit.
Service Worker	Core. Implements the SWR strategy for feeds, manages the caching of dynamically loaded images, and handles notification delivery.
