If you're building a Chrome MV3 extension that needs to play audio, you're going to hit a wall. Service workers, which replaced background pages in MV3, can't play audio. No DOM access means no Audio objects, no <audio> elements, nothing.
I ran into this building Xeder, a Chrome extension that reads your X/Twitter feed aloud using text-to-speech. Audio playback is literally the core feature. Here's how I solved it.
The Architecture Problem
In Chrome MV3, your extension has three possible execution contexts:
- Content Script: runs on the web page (x.com in my case). Has DOM access but is sandboxed from the extension's background.
- Service Worker: the extension's "brain." Coordinates everything but has no DOM. It's event-driven and can be terminated at any time.
- Popup/Options pages: only exist when the user opens them. Not useful for continuous audio.
None of these are ideal for audio playback:
- Content script could play audio, but it's tied to the page lifecycle and would interfere with the page's own audio/media.
- Service worker literally cannot.
- Popup closes when the user clicks away.
The Solution: Offscreen Documents
Chrome MV3 introduced offscreen documents specifically for this kind of problem. An offscreen document is a hidden HTML page that your extension creates for specific purposes - including audio playback.
Here's the flow I ended up with:
Content Script (x.com)
↓ scrapes tweets, sends text
Service Worker
↓ coordinates, sends to offscreen doc
Offscreen Document (hidden)
↓ calls TTS API, plays audio
↑ reports playback state back
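None of this works without the `offscreen` permission declared in the manifest. A minimal sketch of the relevant manifest.json fields (trimmed to just what the offscreen flow needs):

```json
{
  "manifest_version": 3,
  "permissions": ["offscreen"],
  "background": { "service_worker": "background.js" }
}
```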
Creating the Offscreen Document
```javascript
// In your service worker (background.js)
async function ensureOffscreenDocument() {
  const existingContexts = await chrome.runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT']
  });
  if (existingContexts.length > 0) return;

  await chrome.offscreen.createDocument({
    url: 'offscreen.html',
    reasons: ['AUDIO_PLAYBACK'],
    justification: 'Playing TTS audio for tweet reading'
  });
}
```
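One subtlety: service worker event handlers can fire concurrently, and `chrome.offscreen.createDocument` throws if a document already exists, so two overlapping calls to `ensureOffscreenDocument` can race between the existence check and creation. A small promise guard serializes creation; this is a generic sketch (the `chrome.offscreen` calls would go inside the `create` callback, and `createOnce` is an illustrative name, not a Chrome API):

```javascript
// Deduplicate concurrent creation: callers that arrive while a creation
// is in flight all await the same promise instead of creating again.
let creating = null; // in-flight creation promise, if any

function createOnce(create) {
  if (!creating) {
    creating = Promise.resolve()
      .then(create)
      .finally(() => { creating = null; });
  }
  return creating;
}
```

Once the promise settles, `creating` is reset so a later call (say, after the service worker wakes back up) can create a fresh document.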
The Offscreen Document
```html
<!-- offscreen.html -->
<!DOCTYPE html>
<html>
  <body>
    <script src="offscreen.js"></script>
  </body>
</html>
```
```javascript
// offscreen.js
let currentAudio = null;

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'PLAY_AUDIO') {
    playAudio(message.audioData);
  }
  if (message.type === 'PAUSE_AUDIO') {
    if (currentAudio) currentAudio.pause();
  }
  if (message.type === 'RESUME_AUDIO') {
    if (currentAudio) currentAudio.play();
  }
});
```
```javascript
async function playAudio(base64Audio) {
  // Convert base64 to a blob ('audio/mpeg' is the registered MIME type for MP3)
  const audioBlob = base64ToBlob(base64Audio, 'audio/mpeg');
  const audioUrl = URL.createObjectURL(audioBlob);

  currentAudio = new Audio(audioUrl);

  currentAudio.onended = () => {
    URL.revokeObjectURL(audioUrl);
    // Tell the service worker we're ready for the next tweet
    chrome.runtime.sendMessage({ type: 'AUDIO_ENDED' });
  };

  currentAudio.onerror = () => {
    // The error event itself has no message; details live on the
    // MediaError object at currentAudio.error
    chrome.runtime.sendMessage({
      type: 'AUDIO_ERROR',
      error: currentAudio.error?.message || 'playback failed'
    });
  };

  await currentAudio.play();
}
```
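`playAudio` leans on a `base64ToBlob` helper that the snippet doesn't show. A minimal version using plain web APIs (nothing extension-specific):

```javascript
// Decode a base64 string into a Blob of the given MIME type.
function base64ToBlob(base64, mimeType) {
  const byteString = atob(base64);               // base64 -> binary string
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i);         // copy each byte
  }
  return new Blob([bytes], { type: mimeType });
}
```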
Message Passing: The Glue
The service worker orchestrates everything through chrome.runtime.sendMessage and chrome.runtime.onMessage:
```javascript
// Service worker receives tweet text from content script
let activeTabId = null;

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'SYNTHESIZE_TWEET') {
    // Remember which tab asked, so replies can be routed back to it
    activeTabId = sender.tab?.id ?? activeTabId;
    handleTweetSynthesis(message.text, message.licenseKey);
  }
  if (message.type === 'AUDIO_ENDED') {
    // AUDIO_ENDED comes from the offscreen document, which has no
    // sender.tab, so use the tab id saved when synthesis was requested
    if (activeTabId) {
      chrome.tabs.sendMessage(activeTabId, { type: 'NEXT_TWEET' });
    }
  }
});
```
```javascript
async function handleTweetSynthesis(text, licenseKey) {
  await ensureOffscreenDocument();

  // Call your TTS backend (don't put API keys in extension code!)
  const response = await fetch('https://your-backend.com/synthesize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, licenseKey })
  });
  const { audio } = await response.json(); // base64 audio

  // Send to offscreen document for playback
  chrome.runtime.sendMessage({
    type: 'PLAY_AUDIO',
    audioData: audio
  });
}
```
The Content Script: Shadow DOM for Clean UI
The player widget is injected into x.com using Shadow DOM. This is critical — without Shadow DOM, your extension's CSS would clash with X's styles (and vice versa).
```javascript
// content.js
const host = document.createElement('div');
host.id = 'xeder-root';
const shadow = host.attachShadow({ mode: 'closed' });

// Your styles are completely isolated
const style = document.createElement('style');
style.textContent = `/* your widget CSS here */`;
shadow.appendChild(style);

// Build your UI inside the shadow root
const widget = document.createElement('div');
widget.className = 'xeder-widget';
// ... build out the player UI
shadow.appendChild(widget);

document.body.appendChild(host);
```
Key Gotchas
1. Offscreen documents have a single-purpose rule. You declare a reason when creating one, Chrome expects you to use it for that purpose, and an extension can only have one offscreen document open at a time. AUDIO_PLAYBACK is the reason you want here.
2. Service workers can be terminated. If your service worker goes idle, Chrome can kill it. You need to handle reconnection — when it wakes back up, check if your offscreen document still exists.
3. Don't put API keys in extension code. Chrome extensions are just zip files. Anyone can unpack them and read your source. Proxy all API calls through a backend. I use Firebase Cloud Functions.
4. Message passing is async. Don't assume messages arrive instantly or in order. Build your state machine accordingly.
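Gotcha 4 is easier to handle with an explicit state machine than with scattered boolean flags. A minimal sketch of the idea (the names are illustrative, not Xeder's actual code): a transition table lists which message types are valid in each state, and anything else is ignored instead of corrupting state.

```javascript
// Playback states and the messages that are valid in each one.
const TRANSITIONS = {
  idle:         { SYNTHESIZE_TWEET: 'synthesizing' },
  synthesizing: { PLAY_AUDIO: 'playing', AUDIO_ERROR: 'idle' },
  playing:      { PAUSE_AUDIO: 'paused', AUDIO_ENDED: 'idle', AUDIO_ERROR: 'idle' },
  paused:       { RESUME_AUDIO: 'playing' },
};

function createPlayerState() {
  let state = 'idle';
  return {
    get state() { return state; },
    // Apply a message; late, duplicate, or out-of-order messages
    // simply don't match a transition and leave the state unchanged.
    dispatch(type) {
      const next = TRANSITIONS[state][type];
      if (next) state = next;
      return state;
    },
  };
}
```

Every incoming message goes through `dispatch`, so a stale AUDIO_ENDED arriving after the user already started the next tweet can't knock the player into a bad state.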
The "I Can't Code" Part
Full disclosure: I'm a UX designer. I designed all of this - the architecture, the flows, the UI specs - and used Claude (AI) to write the actual code. The architecture diagrams and edge case handling came from me. The JavaScript came from Claude.
The interesting thing: the design work was harder than the coding. Understanding what to build and how the pieces fit together required deep research into Chrome's extension APIs. Actually writing the code was the easier part.
The result: Xeder is live on the Chrome Web Store. It reads your X feed to you. It works. And I still can't write a for loop from memory.
If you're building a Chrome MV3 extension that needs audio, the offscreen document pattern is probably what you need. The official docs are decent but light on real-world examples. Hope this helps.