Why I moved my Transformers.js pipeline out of the Chrome MV3 service worker and into an Offscreen Document

#architecture #machinelearning #typescript #webdev

I'm building VectorTrace, an open-source Chrome extension that generates semantic embeddings of scraped web elements to detect layout changes.
On Day 1 I hit a problem that took a few hours to fully understand. I want to document the fix because I couldn't find a clean write-up for it.

The plan that didn't work

The original plan was to run the Transformers.js embedding
pipeline (all-MiniLM-L6-v2 via ONNX Runtime's WASM backend) directly in the Chrome MV3
background service worker, with a lazy re-init guard:

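```typescript
import { pipeline, type FeatureExtractionPipeline } from '@huggingface/transformers';

// Module-level cache: it only lives as long as the service worker does.
let extractor: FeatureExtractionPipeline | null = null;

async function getEmbeddingPipeline(): Promise<FeatureExtractionPipeline> {
  if (!extractor) {
    // Runs again on every cold start, i.e. after Chrome has terminated the worker.
    // 'Xenova/all-MiniLM-L6-v2' is assumed here as the Hub id of the ONNX export.
    extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  return extractor;
}
```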

This looks fine. The problem is the premise it's built on.

The MV3 service worker problem

Chrome's MV3 service workers are ephemeral. Chrome shuts them down after roughly 30 seconds of inactivity and can terminate them at other times as well.
When the service worker is terminated, every in-memory object goes with it, including a loaded Transformers.js pipeline holding a 22MB ONNX model.

The lazy re-init guard handles this, but it means that every time a user opens the extension after a few minutes of inactivity, the extension has to re-parse and re-initialize a 22MB model before doing anything. On midrange hardware that takes 2-4 seconds.

There's a second problem: WASM in MV3 service workers. MV3's default CSP restricts 'wasm-unsafe-eval'. You can add it to the manifest's content_security_policy, but ONNX Runtime Web also tries to spawn internal worker threads and fetch additional WASM files. A service worker can't create those worker threads, and the fetches can fail depending on how the extension's web-accessible resources are configured. Debugging silent WASM failures in a service worker is not fun.

The fix: Chrome Offscreen Document

Chrome's Offscreen Document API (added in Chrome 109) lets an extension create a hidden document that runs in a full page context instead of the restricted service worker environment. It stays in memory for as long as the extension needs it.

This is the correct place to run WASM. The document context means:

No suspension between uses
Full web platform APIs, including Web Workers (of the extension APIs, only chrome.runtime is available)
WASM and WebGL work as they do in a normal page (SharedArrayBuffer additionally needs cross-origin isolation)
The loaded pipeline stays resident in memory
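Getting that document up is one chrome.offscreen.createDocument call, with one catch: it rejects if the extension already has an offscreen document, and two messages arriving at the same moment can both conclude that none exists yet. Below is a minimal sketch of the creation step under a few assumptions: chrome.offscreen.hasDocument() is available in the Chrome version you target, the page lives at a hypothetical offscreen.html, and WORKERS is a suitable reason (ONNX Runtime Web may spawn worker threads). OffscreenApi is a hypothetical narrow interface over the two calls used, so the logic can run outside Chrome; in the extension you would pass chrome.offscreen.

```typescript
// Hypothetical narrow view of chrome.offscreen: only the two calls this sketch needs.
interface OffscreenApi {
  hasDocument(): Promise<boolean>;
  createDocument(params: {
    url: string;
    reasons: string[];
    justification: string;
  }): Promise<void>;
}

// In-flight creation, shared by every caller that arrives while it is pending.
let creating: Promise<void> | null = null;

function ensureOffscreenDocument(
  api: OffscreenApi,
  url = 'offscreen.html', // hypothetical path inside the extension bundle
): Promise<void> {
  if (!creating) {
    // Assigned synchronously, before any await, so concurrent callers see it.
    creating = (async () => {
      if (await api.hasDocument()) return; // already open: nothing to do
      await api.createDocument({
        url,
        reasons: ['WORKERS'], // assumption: ONNX Runtime Web may spawn worker threads
        justification: 'Run the ONNX Runtime WASM embedding pipeline outside the service worker',
      });
    })().finally(() => {
      // Clear once settled so a later call re-checks hasDocument().
      creating = null;
    });
  }
  return creating;
}
```

Sharing one in-flight promise is what prevents the duplicate createDocument call; clearing it afterwards means a later request re-checks hasDocument(), so a document that was closed gets recreated.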

The architecture becomes:

Content Script
↓ chrome.runtime.sendMessage
Background Service Worker
↓ chrome.runtime.sendMessage
Offscreen Document

The service worker's only job is to create the offscreen document when needed and route messages to it.
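One wrinkle in this chain: chrome.runtime.sendMessage from a content script is delivered to every extension context listening on chrome.runtime.onMessage, which includes the offscreen document, not only the service worker. A common way to keep the hops apart is to stamp each message with its intended recipient and have every listener ignore anything not addressed to it. A minimal sketch of that envelope follows; the embed message type, its texts field, and the helper names are hypothetical, and the listener wiring is shown only as comments because it needs the Chrome runtime.

```typescript
// Which hop a message is addressed to (hypothetical names).
type Target = 'background' | 'offscreen';

// Hypothetical request: embed the text of some scraped elements.
interface EmbedMessage {
  target: Target;
  type: 'embed';
  texts: string[];
}

// True only for a well-formed embed message addressed to `self`.
function isEmbedFor(msg: unknown, self: Target): msg is EmbedMessage {
  if (typeof msg !== 'object' || msg === null) return false;
  const m = msg as Partial<EmbedMessage>;
  return m.target === self && m.type === 'embed' && Array.isArray(m.texts);
}

// The service worker's routing step: a re-addressed copy for the offscreen document.
function readdressForOffscreen(msg: EmbedMessage): EmbedMessage {
  return { ...msg, target: 'offscreen' };
}

// Listener wiring (sketch only; needs the Chrome runtime):
//
// Service worker:
//   chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//     if (!isEmbedFor(msg, 'background')) return false; // not addressed to us
//     (async () => {
//       // ...make sure the offscreen document exists first...
//       sendResponse(await chrome.runtime.sendMessage(readdressForOffscreen(msg)));
//     })();
//     return true; // keep the channel open for the async sendResponse
//   });
//
// Offscreen document:
//   chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//     if (!isEmbedFor(msg, 'offscreen')) return false; // skips the content script's original
//     // ...run the embedding pipeline on msg.texts, then sendResponse(result)...
//     return true;
//   });
```

The re-addressed copy is a new object, so the content script's original message stays tagged for the service worker, and the offscreen document answers only the copy the service worker forwards.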

One hard constraint to know

Chrome enforces one Offscreen Document per extension. If you need multiple ONNX models (I'm planning to add a visual layout segmentation model later), they all have to load in the same document. I structured the offscreen script with a keyed pipeline registry from the start:

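```typescript
import { pipeline, type Pipeline, type PipelineType } from '@huggingface/transformers';

// One registry for every model this offscreen document hosts, keyed by task + model id.
// Promises are cached (not resolved pipelines) so concurrent requests share one load.
const pipelines: Record<string, Promise<Pipeline>> = {};

function getPipeline(task: PipelineType, modelId: string): Promise<Pipeline> {
  const key = `${task}:${modelId}`;
  if (!pipelines[key]) {
    const loading = pipeline(task, modelId, {
      dtype: 'q8',    // INT8-quantized weights
      device: 'wasm', // ONNX Runtime Web's WASM backend
    });
    // Drop a failed load from the registry so the next request can retry it.
    loading.catch(() => {
      delete pipelines[key];
    });
    pipelines[key] = loading;
  }
  return pipelines[key];
}
```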

dtype: 'q8' gives INT8 quantization: the model drops from ~90MB to
~22MB with minimal accuracy loss for sentence-similarity tasks.
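The size figures follow from bytes per weight. all-MiniLM-L6-v2 has roughly 22.7 million parameters, so float32 storage at 4 bytes each comes to about 90MB and INT8 at 1 byte each to about 22MB. The sketch below shows the idea behind INT8 quantization with a simple symmetric, per-tensor scheme (one scale per tensor, integers in [-127, 127]). It is an illustration only, not necessarily the exact scheme used to produce the q8 ONNX weights, which may for example be asymmetric or per-channel.

```typescript
// Symmetric per-tensor INT8 quantization: w is approximately q * scale, with q in [-127, 127].
function quantizeInt8(weights: Float32Array): { q: Int8Array; scale: number } {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // One scale for the whole tensor; fall back to 1 for an all-zero tensor.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const q = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round(weights[i] / scale); // 1 byte per weight instead of 4
  }
  return { q, scale };
}

// Recover approximate floats; per-weight error is at most scale / 2, up to float rounding.
function dequantizeInt8(q: Int8Array, scale: number): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) {
    out[i] = q[i] * scale;
  }
  return out;
}

// Rough storage size in MB (10^6 bytes), ignoring scales and file-format overhead.
function approxSizeMB(params: number, bytesPerWeight: number): number {
  return (params * bytesPerWeight) / 1e6;
}
```

With these helpers, approxSizeMB(22.7e6, 4) gives about 90.8 and approxSizeMB(22.7e6, 1) about 22.7, in line with the ~90MB and ~22MB figures above.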

This is Day 1 of building VectorTrace in public. The repo is at https://github.com/SathiyaSenpai/VectorTrace.
If you've hit MV3 WASM issues before, I'd be curious how you solved it.