DEV Community

Hamdi LAADHARI


Chrome Extension Network Interception: The Modern Way to Scrape Instagram (and Beyond)

How to extract data from Instagram (and similar sites) by intercepting network calls in a Chrome extension, using a minimal setup that holds up better than DOM scraping.


Why Network Interception?

Traditional DOM scraping is fragile: UI changes, dynamic loading, and anti-bot measures make it unreliable. Intercepting network calls instead lets you capture the raw data the site itself consumes, before it is rendered into the DOM.

Advantages:

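  • Unaffected by UI/layout changes (changes to the API itself can still break it)
  • Access to complete, structured data
  • Faster (no DOM parsing)
  • Less conspicuous (no extra requests are made, so anti-bot checks are less likely to trigger)
  • Works for any site that loads its data via XHR/fetch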

Architecture Overview

┌─ Page Context (Script Injection) ─┐
│   Patch fetch/XHR, intercept API  │
│   Dispatch CustomEvent w/ data    │
└───────────────────────────────────┘
           │
           ▼
┌─ Content Script ──────────────────┐
│   Listen for events, forward data │
│   to background via messaging     │
└───────────────────────────────────┘
           │
           ▼
┌─ Background Script ───────────────┐
│   Process, deduplicate, store,    │
│   or forward to external API      │
└───────────────────────────────────┘
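To make the data flow in the diagram concrete, here is a small simulation that runs outside the browser. It is only a sketch: plain `EventTarget` objects stand in for the page `window` and for `chrome.runtime` messaging, and `PayloadEvent`, `runtimeChannel`, `stored`, and `emitFromPage` are hypothetical names used for illustration.

```javascript
// Simulation only: EventTarget objects replace the real browser pieces
// so the three layers can be followed end to end under Node.
class PayloadEvent extends Event {
  constructor(type, detail) {
    super(type);
    this.detail = detail;
  }
}

const pageWindow = new EventTarget();     // stand-in for `window`
const runtimeChannel = new EventTarget(); // stand-in for chrome.runtime messaging
const stored = [];                        // stand-in for extension storage

// Background layer: receives messages and stores the payload.
runtimeChannel.addEventListener('message', (evt) => {
  if (evt.detail.type === 'API_PAYLOAD') stored.push(evt.detail.payload);
});

// Content-script layer: relays page events to the background layer.
pageWindow.addEventListener('instaApi', (evt) => {
  runtimeChannel.dispatchEvent(
    new PayloadEvent('message', { type: 'API_PAYLOAD', payload: evt.detail })
  );
});

// Page layer: the patched fetch/XHR would call this after each response.
function emitFromPage(detail) {
  pageWindow.dispatchEvent(new PayloadEvent('instaApi', detail));
}

emitFromPage({ url: '/graphql/query', body: '{"data":{}}', status: 200, method: 'POST' });
console.log(stored.length); // 1
```

Each layer only knows about the one next to it, which is why the page script never touches `chrome.*` APIs and the background script never touches the page.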

Minimal, Robust Implementation

1. Manifest Setup (MV3)

{
  "manifest_version": 3,
  "name": "Instagram Network Interceptor",
  "version": "1.0.0",
  "permissions": ["storage"],
  "host_permissions": ["https://*.instagram.com/*"],
  "background": { "service_worker": "background.js" },
  "content_scripts": [
    {
      "matches": ["https://*.instagram.com/*"],
      "js": ["content.js"],
      "run_at": "document_start"
    }
  ],
  "web_accessible_resources": [
    {
      "resources": ["public/page-interceptor.js"],
      "matches": ["https://*.instagram.com/*"]
    }
  ]
}
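The `matches` and `host_permissions` entries are Chrome match patterns. As a rough illustration of which URLs a pattern such as `https://*.instagram.com/*` selects, here is a simplified matcher. `matchesPattern` and `escapeRegex` are hypothetical helpers written for this post, not Chrome APIs, and special forms such as `<all_urls>` are deliberately not handled.

```javascript
// Simplified model of a match pattern: <scheme>://<host>/<path>.
// This is an illustration, not Chrome's own implementation.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function matchesPattern(pattern, urlString) {
  const parts = /^(\*|https?):\/\/([^/]+)(\/.*)$/.exec(pattern);
  if (!parts) return false;
  const [, scheme, host, path] = parts;

  let url;
  try {
    url = new URL(urlString);
  } catch {
    return false; // not a valid absolute URL
  }

  // Scheme: "*" stands for http or https.
  const urlScheme = url.protocol.slice(0, -1); // "https:" -> "https"
  const schemeOk = scheme === '*'
    ? urlScheme === 'http' || urlScheme === 'https'
    : scheme === urlScheme;
  if (!schemeOk) return false;

  // Host: "*.example.com" covers example.com and all of its subdomains.
  if (host.startsWith('*.')) {
    const base = host.slice(2);
    if (url.hostname !== base && !url.hostname.endsWith('.' + base)) return false;
  } else if (host !== '*' && url.hostname !== host) {
    return false;
  }

  // Path: "*" matches any run of characters.
  const pathRegex = new RegExp('^' + path.split('*').map(escapeRegex).join('.*') + '$');
  return pathRegex.test(url.pathname + url.search);
}

console.log(matchesPattern('https://*.instagram.com/*', 'https://www.instagram.com/p/abc/')); // true
console.log(matchesPattern('https://*.instagram.com/*', 'https://evilinstagram.com/'));       // false
```

Note that the leading `*.` only matches on a label boundary, so a look-alike domain that merely ends in `instagram.com` is not covered.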

2. Page Interceptor (public/page-interceptor.js)

Runs in the page context, patches fetch/XHR, and dispatches a CustomEvent with the response data.

(function () {
  // Guard against double injection
  if (window.__instaInterceptor) return;
  window.__instaInterceptor = true;

  const isApiUrl = (url) => url.includes('/graphql') || url.includes('/api/');
  const emit = (detail) => window.dispatchEvent(new CustomEvent('instaApi', { detail }));

  const originalFetch = window.fetch;
  window.fetch = async function (input, init) {
    const resp = await originalFetch.apply(this, arguments);
    try {
      // input may be a string, a URL object, or a Request
      const url = typeof input === 'string' ? input : input?.url ?? String(input);
      if (isApiUrl(url)) {
        const method = (init?.method || input?.method || 'GET').toUpperCase();
        // Read a clone without awaiting it, so the page is not kept waiting for the full body
        resp.clone().text()
          .then((body) => emit({ url, body, status: resp.status, method }))
          .catch(() => { /* body unreadable; ignore */ });
      }
    } catch (err) { /* never let interception break the page */ }
    return resp;
  };

  const originalXHROpen = XMLHttpRequest.prototype.open;
  const originalXHRSend = XMLHttpRequest.prototype.send;
  XMLHttpRequest.prototype.open = function (method, url) {
    this._method = method;
    this._url = String(url); // url may be a URL object
    return originalXHROpen.apply(this, arguments);
  };
  XMLHttpRequest.prototype.send = function () {
    this.addEventListener('load', function () {
      // responseText throws unless responseType is '' or 'text'
      const readable = this.responseType === '' || this.responseType === 'text';
      if (readable && this._url && isApiUrl(this._url)) {
        emit({ url: this._url, body: this.responseText, status: this.status, method: this._method });
      }
    });
    return originalXHRSend.apply(this, arguments);
  };
})();
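The patching technique itself is easier to see in isolation. The sketch below wraps an arbitrary method so that an observer sees each call without being able to break it, and returns a function that undoes the patch. `patchMethod` and the `api` object are hypothetical and exist only for this illustration.

```javascript
// Wrap obj[name] so that onResult observes every call.
// Errors thrown by the observer are swallowed so the caller is unaffected.
function patchMethod(obj, name, onResult) {
  const original = obj[name];
  obj[name] = function (...args) {
    const result = original.apply(this, args);
    try {
      onResult(args, result);
    } catch (err) {
      // Observer failures must never reach the code being observed.
    }
    return result;
  };
  // Calling the returned function puts the original method back.
  return function restore() {
    obj[name] = original;
  };
}

// Usage with a stand-in object instead of window.fetch:
const api = { double(n) { return n * 2; } };
const seen = [];
const restore = patchMethod(api, 'double', (args, result) => seen.push([args[0], result]));

console.log(api.double(21)); // 42
restore();
api.double(1);               // no longer observed
console.log(seen.length);    // 1
```

The interceptor above follows the same shape: keep a reference to the original, call it with the caller's `this` and arguments, and keep the observing code inside a `try` block.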

3. Content Script (content.js)

Injects the interceptor and relays events to the background script.

// Inject the page script. At document_start <head> may not exist yet,
// so fall back to the root element.
const script = document.createElement('script');
script.src = chrome.runtime.getURL('public/page-interceptor.js');
script.onload = () => script.remove(); // the patches persist after the tag is removed
(document.head || document.documentElement).appendChild(script);

// Listen for intercepted data and relay it to the background script
window.addEventListener('instaApi', (evt) => {
  chrome.runtime.sendMessage({ type: 'API_PAYLOAD', payload: evt.detail });
});
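Because the event travels over the page's own `window`, any script on the page can dispatch an `instaApi` event with arbitrary contents. A content script may therefore want to check the payload shape before forwarding it. The following is a minimal sketch; `isValidPayload` and the `MAX_BODY_LENGTH` cap are assumptions for illustration, not part of any Chrome API.

```javascript
const MAX_BODY_LENGTH = 2_000_000; // assumed cap on body size; tune as needed

// Accept only objects shaped like the interceptor's payload:
// { url: string, body: string, status: integer, method: string }
function isValidPayload(detail) {
  return (
    detail !== null &&
    typeof detail === 'object' &&
    typeof detail.url === 'string' &&
    typeof detail.body === 'string' &&
    detail.body.length <= MAX_BODY_LENGTH &&
    Number.isInteger(detail.status) &&
    typeof detail.method === 'string'
  );
}

// In the content script, the listener could then become:
//   window.addEventListener('instaApi', (evt) => {
//     if (!isValidPayload(evt.detail)) return;
//     chrome.runtime.sendMessage({ type: 'API_PAYLOAD', payload: evt.detail });
//   });

console.log(isValidPayload({ url: '/api/v1/feed', body: '{}', status: 200, method: 'GET' })); // true
console.log(isValidPayload({ url: '/api/v1/feed' }));                                          // false
```

A shape check does not prove the event came from the interceptor, but it keeps malformed data out of the background script.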

4. Background Script (background.js)

Processes, deduplicates, and stores or forwards the data.

// Note: an MV3 service worker can be shut down when idle, which resets this Set.
const processedShortcodes = new Set();

chrome.runtime.onMessage.addListener((msg, sender) => {
  if (msg.type !== 'API_PAYLOAD') return;
  try {
    const json = JSON.parse(msg.payload.body);
    // Example: extract Instagram post shortcode
    const shortcode = extractShortcode(json);
    // Skip responses without a shortcode and ones already handled
    if (!shortcode || processedShortcodes.has(shortcode)) return;
    processedShortcodes.add(shortcode);
    // Store, process, or forward to external API here.
    // Keying by shortcode avoids collisions between payloads that arrive in the same millisecond.
    chrome.storage.local.set({ [`post:${shortcode}`]: { savedAt: Date.now(), json } });
  } catch (e) {
    // Non-JSON or irrelevant response
  }
});

function extractShortcode(json) {
  // Minimal example: adapt to your API structure
  return json?.data?.shortcode || '';
}
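An in-memory Set does not survive a restart of the MV3 service worker, so deduplication state may need to live in `chrome.storage`. One way to do that is to keep the state as a plain, serializable object and update it with a pure function. This is a sketch under assumptions: `markSeen`, `MAX_ENTRIES`, and `TTL_MS` are hypothetical names and values chosen for illustration.

```javascript
const MAX_ENTRIES = 5000;           // assumed cap on remembered shortcodes
const TTL_MS = 24 * 60 * 60 * 1000; // assumed retention: one day

// `seen` is a plain object { shortcode: timestampMs }, so it can be written
// to chrome.storage as-is. Returns the updated state and whether the
// shortcode had not been seen before (or had expired).
function markSeen(seen, shortcode, now = Date.now()) {
  if (!shortcode) return { seen, isNew: false };

  // Drop expired entries.
  const fresh = Object.entries(seen).filter(([, ts]) => now - ts < TTL_MS);
  const isNew = !fresh.some(([code]) => code === shortcode);
  if (isNew) fresh.push([shortcode, now]);

  // Keep only the newest MAX_ENTRIES entries.
  fresh.sort((a, b) => a[1] - b[1]);
  return { seen: Object.fromEntries(fresh.slice(-MAX_ENTRIES)), isNew };
}

// In the service worker this might be used roughly as follows:
//   const { seen = {} } = await chrome.storage.local.get('seen');
//   const result = markSeen(seen, shortcode);
//   if (result.isNew) await chrome.storage.local.set({ seen: result.seen });

const first = markSeen({}, 'abc', 1000);
const second = markSeen(first.seen, 'abc', 2000);
console.log(first.isNew, second.isNew); // true false
```

The cap and expiry keep the stored object from growing without bound. Messages that arrive at the same moment can still race on the read and write, so a real extension would probably serialize these updates.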

Best Practices & Gotchas

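  • Always clone() a fetch response before reading its body: a body can only be consumed once, and the page still needs the original.
  • Deduplicate by post ID/shortcode to avoid reprocessing.
  • Rate limit if forwarding to external APIs.
  • Handle CSP by injecting a web-accessible file via script.src rather than inline code, which a strict Content-Security-Policy may block.
  • Respect privacy and ToS: only process public data, implement delays, and avoid logging sensitive info.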
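For the rate-limiting point, a token bucket is one common option. The sketch below is an assumption about how it could be done, not something the extension above already contains: `createRateLimiter` is a hypothetical helper, and the clock is injectable so the behaviour can be checked without waiting.

```javascript
// Token bucket: allows bursts of up to `capacity` sends and refills
// continuously at `capacity` tokens per `intervalMs`.
function createRateLimiter({ capacity, intervalMs, now = () => Date.now() }) {
  let tokens = capacity;
  let last = now();
  return function tryAcquire() {
    const current = now();
    // Refill in proportion to the time elapsed since the last call.
    tokens = Math.min(capacity, tokens + ((current - last) / intervalMs) * capacity);
    last = current;
    if (tokens >= 1) {
      tokens -= 1;
      return true; // caller may forward one payload
    }
    return false;  // caller should drop or queue the payload
  };
}

// Usage with a fake clock:
let fakeTime = 0;
const tryAcquire = createRateLimiter({ capacity: 2, intervalMs: 1000, now: () => fakeTime });
console.log(tryAcquire(), tryAcquire(), tryAcquire()); // true true false
fakeTime = 500;
console.log(tryAcquire()); // true
```

In the background script, the call that forwards to the external API would be guarded by `tryAcquire()`, with rejected payloads either queued or dropped depending on how much loss is acceptable.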

Extending to Other Platforms

You can adapt the same pattern for Twitter, LinkedIn, etc., by changing the URL detection logic in the interceptor and adding each site's hosts to the matches and host_permissions entries in the manifest.

const isTargetApi = (url) =>
  url.includes('instagram.com/graphql') ||
  url.includes('twitter.com/i/api') ||
  url.includes('linkedin.com/voyager');
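Substring checks like the ones above also match URLs that merely contain the text, for example in a query string, and they miss relative URLs. A stricter variant parses the URL and compares hostname and path separately. This is a sketch: `isTargetApiStrict` and `TARGETS` are hypothetical names, and the hosts and path prefixes are copied from the substrings above, so whether they are still current for each site is an assumption.

```javascript
// Host and path prefix pairs, taken from the substrings in the snippet above.
const TARGETS = [
  { host: 'instagram.com', pathPrefix: '/graphql' },
  { host: 'twitter.com', pathPrefix: '/i/api' },
  { host: 'linkedin.com', pathPrefix: '/voyager' },
];

// `base` resolves relative URLs; in the page script it would be location.href.
function isTargetApiStrict(rawUrl, base = 'https://example.invalid/') {
  let url;
  try {
    url = new URL(rawUrl, base);
  } catch {
    return false; // unparsable URL
  }
  return TARGETS.some(({ host, pathPrefix }) =>
    (url.hostname === host || url.hostname.endsWith('.' + host)) &&
    url.pathname.startsWith(pathPrefix)
  );
}

console.log(isTargetApiStrict('https://www.instagram.com/graphql/query'));         // true
console.log(isTargetApiStrict('https://evil.example/?u=instagram.com/graphql'));   // false
console.log(isTargetApiStrict('/graphql/query', 'https://www.instagram.com/'));    // true
```

Adding a platform then means adding one entry to `TARGETS` rather than another `includes` clause.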

Conclusion

Network interception in the page context is a robust way to extract data from modern web apps, and it tends to survive the redesigns that break DOM scraping. With a minimal Chrome extension, you can capture structured data directly from the source instead of parsing rendered markup.
