What if we could make our video calls even better?
With softer colors, better resolution, a higher frame rate, and a more flattering angle, our video calls would be much more enjoyable. But the laptop webcam falls short of all these expectations, while the camera in our pocket already offers ten times better quality.
The challenge is how to use that phone camera for video calls without installing additional drivers, pushing video through another cloud server, or building a native app for the phone. Let's explore building a Chrome extension and a mobile PWA.
Make a website see the virtual camera 🖊️
Most video conferencing platforms rely on the same browser API, MediaDevices, to capture video. To inject a virtual camera into a website, two things need to happen: the website must believe there is a new device, and calls for that device should return a live stream from the phone's camera.
The camera must be injected before the website collects a list of devices. Therefore, the content script must be injected before any page scripts run, i.e., at document_start.
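One way to arrange this is sketched below, under assumptions. Besides running at document_start, the patch has to execute in the page's own JavaScript world: a content script in the default isolated world has its own set of JavaScript globals, so overrides made there are invisible to page scripts. The helper builds a registration object for chrome.scripting.registerContentScripts; the script ID, file name, and match pattern are hypothetical.

```typescript
// The subset of chrome.scripting.RegisteredContentScript used in this sketch.
interface ContentScriptRegistration {
  id: string;
  matches: string[];
  js: string[];
  runAt: 'document_start' | 'document_end' | 'document_idle';
  world: 'ISOLATED' | 'MAIN';
  allFrames: boolean;
}

// Hypothetical ID and file name; adjust them to the actual build output.
function buildPatchRegistration(matches: string[]): ContentScriptRegistration {
  return {
    id: 'virtual-camera-patch',
    matches,
    js: ['patch.js'],
    runAt: 'document_start', // before any page script runs
    world: 'MAIN', // same JavaScript world as the page
    allFrames: true, // video apps sometimes live in iframes
  };
}

// In the service worker (requires the "scripting" permission):
// await chrome.scripting.registerContentScripts([buildPatchRegistration(['https://*/*'])]);
```

The same fields can also be declared statically in the manifest; the dynamic form is shown only to keep the example in TypeScript.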
It should intercept navigator.mediaDevices and override both getUserMedia and enumerateDevices. The overridden enumerateDevices adds a virtual entry to the device list, so the new camera appears in any selector the page uses. The overridden getUserMedia checks the requested device ID: if it matches the virtual device, the call returns a stream from the phone; otherwise, it falls through to the original function.
As an additional measure, the script can dispatch a devicechange event on navigator.mediaDevices. Sites that listen for this event re-read the device list, which helps ensure the virtual camera is picked up.
What we have now:
export function monkeyPatchMediaDevices({fakeDeviceInfo, onFakeSelected}: PatchMediaDevices): void {
  const mediaDevices = window.navigator.mediaDevices;
  // Keep bound references to the native implementations.
  const enumerateDevicesFn = mediaDevices.enumerateDevices.bind(mediaDevices);
  const getUserMediaFn = mediaDevices.getUserMedia.bind(mediaDevices);
  // Append the virtual camera to the real device list.
  mediaDevices.enumerateDevices = async () => {
    const devices = await enumerateDevicesFn();
    return [...devices, fakeDeviceInfo];
  };
  // Serve the phone stream for the virtual device; defer to the browser otherwise.
  mediaDevices.getUserMedia = async (constraints) => {
    if (checkDeviceConstraints(constraints, fakeDeviceInfo)) {
      return onFakeSelected(constraints);
    }
    return await getUserMediaFn(constraints);
  };
  // Prompt pages that listen for device changes to re-read the list.
  mediaDevices.dispatchEvent(new Event('devicechange'));
}
The end result: On any page where the extension has permission to run, the virtual camera appears as an option in a browser-based video app, without any site-specific integration.
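The checkDeviceConstraints helper used in the patch is not shown in the article. A minimal sketch of what it might look like follows, assuming the virtual camera is selected only by device ID. The structural types are simplified stand-ins for MediaStreamConstraints so the example is self-contained.

```typescript
// Simplified stand-ins for the DOM constraint types.
type DeviceIdConstraint = string | string[] | {exact?: string | string[]; ideal?: string | string[]};
interface VideoConstraints {
  deviceId?: DeviceIdConstraint;
}
interface StreamConstraints {
  video?: boolean | VideoConstraints;
}

function toList(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Returns true when the page asks for the virtual camera by device ID.
function checkDeviceConstraints(
  constraints: StreamConstraints | undefined,
  fakeDeviceInfo: {deviceId: string},
): boolean {
  const video = constraints?.video;
  // `true`, `false`, null, or a missing value carry no device ID.
  if (typeof video !== 'object' || video === null) return false;
  const deviceId = video.deviceId;
  if (deviceId === undefined) return false;
  const requested =
    typeof deviceId === 'string' || Array.isArray(deviceId)
      ? toList(deviceId)
      : [...toList(deviceId.exact), ...toList(deviceId.ideal)];
  return requested.includes(fakeDeviceInfo.deviceId);
}
```

With this version, a page that requests `{video: true}` without naming a device falls through to the original getUserMedia and gets a real camera.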
WebRTC without a dedicated server 🍏
WebRTC requires signaling – a way for two peers to exchange connection offers, answers, and ICE candidates before they can communicate directly. What if we cannot afford a dedicated server?
An alternative is to use Google Drive's appdata folder as a mailbox. Both the desktop extension and the mobile PWA read and write signaling messages in the same hidden folder, which is invisible to users and accessible only to the application that owns it.
The exchange follows a request-response pattern:
Desktop                Drive                   Mobile
┃                        ┃                        ┃
┃ 1. Write offer         ┃                        ┃
┃ └────────────────────> ┃                        ┃
┃                        ┃                        ┃
┃ 2. QR (fileId+token)   ┃                        ┃
┃ └──────────────────────╂───────┐                ┃
┃                        ┃       │                ┃
┃ 3. Poll for answer     ┃       │                ┃
┃ ├────────────────────> ┃       │                ┃
┃ ┊                      ┃       │                ┃
┃ ┊                      ┃       │                ┃ 4. Scan QR
┃ ┊                      ┃       └──────────────> ┃
┃ ┊                      ┃                        ┃
┃ ┊                      ┃                        ┃ 5. Read offer
┃ ┊                      ┃ ─────────────────────> ┃
┃ ┊                      ┃                        ┃
┃ ┊                      ┃                        ┃ 6. Write answer
┃ ┊                      ┃ <──────────────────────╂─┘
┃ ˅                      ┃                        ┃
┃ 7. Read answer         ┃                        ┃
┃ <───────────────────── ┃                        ┃
┃                        ┃                        ┃
┃ 8. Delete all files    ┃                        ┃
┃ └────────────────────> ┃                        ┃
┃                        ┃                        ┃
To avoid a separate authorization step on the mobile side and ensure that access to the file will be preserved, a Google OAuth token can be embedded directly into the QR code. This means that the QR code contains everything needed to authenticate with Drive, so scanning the QR code is the only action required to establish a connection. A new QR scan is needed for each new connection, but after the WebRTC handshake is complete, the connection can continue without interruption for any duration.
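The article does not show how the QR payload is encoded. As a sketch, assume the QR code carries a link to the PWA with the file ID and token packed into the URL fragment, which browsers do not send to the server that hosts the PWA. The payload shape and both function names are hypothetical.

```typescript
// Hypothetical payload: the Drive file ID of the offer plus the OAuth token.
interface PairingPayload {
  fileId: string;
  token: string;
}

// Desktop side: build the link that gets rendered as a QR code.
function buildPairingUrl(pwaUrl: string, payload: PairingPayload): string {
  const url = new URL(pwaUrl);
  url.hash = new URLSearchParams({fileId: payload.fileId, token: payload.token}).toString();
  return url.toString();
}

// Mobile side: recover the payload from the opened link.
function parsePairingUrl(href: string): PairingPayload | null {
  const params = new URLSearchParams(new URL(href).hash.slice(1));
  const fileId = params.get('fileId');
  const token = params.get('token');
  return fileId && token ? {fileId, token} : null;
}
```

On the phone, `parsePairingUrl(window.location.href)` would then yield the values needed to download the offer.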
/**
 * Offerer uploads its description and passes the file ID back to the calling code
 */
public async sendOffer(description: RTCSessionDescriptionInit): Promise<void> {
  const fileCreationResponse = await driveGapi.files.create({
    media: new Blob([JSON.stringify(description)], {type: 'application/json'}),
    name: `${this.options.sessionId}-offer.json`,
    mimeType: 'application/json',
    parents: ['appDataFolder'],
  });
  this.options.onDescriptionCreated(fileCreationResponse.id);
}
/**
 * Answerer downloads the offer and returns it to the calling code
 */
public async checkOffer(): Promise<RTCSessionDescriptionInit> {
  const offerFileBlob = await driveGapi.files.download(this.options.driveId);
  const offerFileJson = await offerFileBlob.text();
  return JSON.parse(offerFileJson) as RTCSessionDescriptionInit;
}
/**
 * Answerer uploads its description
 */
public async sendAnswer(description: RTCSessionDescriptionInit): Promise<void> {
  const offerFileMeta = await driveGapi.files.get(this.options.driveId);
  await driveGapi.files.create({
    media: new Blob([JSON.stringify(description)], {type: 'application/json'}),
    name: offerFileMeta.name.replace('offer', 'answer'),
    mimeType: 'application/json',
    parents: ['appDataFolder'],
  });
}
/**
 * Offerer downloads the answer and returns it to the calling code,
 * or returns null if the answer has not been uploaded yet
 */
public async checkAnswer(): Promise<RTCSessionDescriptionInit | null> {
  const listFilesResponse = await driveGapi.files.list({
    pageSize: 2,
    query: `name contains '${this.options.sessionId}'`,
    spaces: 'appDataFolder',
  });
  const listFiles = listFilesResponse?.files ?? [];
  const answerFileMeta = listFiles.find((file) => file.name.endsWith('-answer.json'));
  if (answerFileMeta) {
    const answerFileBlob = await driveGapi.files.download(answerFileMeta.id);
    const answerFileJson = await answerFileBlob.text();
    const answerDescription = JSON.parse(answerFileJson) as RTCSessionDescriptionInit;
    // The exchange is complete: remove both the offer and the answer.
    for (const file of listFiles) {
      await driveGapi.files.delete(file.id);
    }
    return answerDescription;
  }
  return null;
}
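Step 3 of the diagram, polling for the answer, is not shown in the methods above. A minimal sketch of a polling loop around checkAnswer follows. The pollUntil name, the one-second interval, and the one-minute timeout are assumptions, not values from the article.

```typescript
// Repeatedly calls `check` until it yields a non-null value or the timeout expires.
async function pollUntil<T>(
  check: () => Promise<T | null>,
  intervalMs = 1000,
  timeoutMs = 60_000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== null) return result;
    // Give up if the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for the answer');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage on the desktop side (`signaling` is an instance of the class above):
// const answer = await pollUntil(() => signaling.checkAnswer());
```

The interval trades connection latency against the number of Drive requests, and the timeout keeps an unscanned QR code from polling forever.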
Stick it together 🍍
To avoid making the user scan a QR code for every call, we can keep a long-lived WebRTC connection alive and reuse it. The ideal place for this connection would be the service worker. Since WebRTC cannot be used there, it's necessary to set up an offscreen document.
let offscreenSettingUp: Promise<void> | null = null;
export async function setupOffscreenDocument(filename = 'offscreen.html'): Promise<void> {
  const offscreenUrl = chrome.runtime.getURL(filename);
  if (await hasOffscreenDocument(offscreenUrl)) {
    return;
  }
  try {
    // Reuse the in-flight creation so concurrent callers do not create a second document.
    await (offscreenSettingUp ||= chrome.offscreen.createDocument({
      reasons: ['WEB_RTC'],
      url: offscreenUrl,
      justification: 'WebRTC stuff',
    }));
  } finally {
    // Reset even if creation fails, so a later call can retry.
    offscreenSettingUp = null;
  }
}
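The hasOffscreenDocument helper is not shown in the article. One possible version, sketched here, asks chrome.runtime.getContexts for offscreen documents with a matching URL. The RuntimeLike interface and the injectable runtime parameter are additions made so the sketch can run outside an extension.

```typescript
// Only the part of chrome.runtime that this sketch needs.
interface RuntimeLike {
  getContexts(filter: {contextTypes: string[]; documentUrls: string[]}): Promise<unknown[]>;
}

// Looks up the real chrome.runtime lazily, so the code also loads where `chrome` is absent.
const defaultRuntime = (): RuntimeLike =>
  (globalThis as unknown as {chrome: {runtime: RuntimeLike}}).chrome.runtime;

// Resolves to true when an offscreen document with the given URL already exists.
async function hasOffscreenDocument(
  offscreenUrl: string,
  runtime: RuntimeLike = defaultRuntime(),
): Promise<boolean> {
  const contexts = await runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT'],
    documentUrls: [offscreenUrl],
  });
  return contexts.length > 0;
}
```

Inside the extension, the single-argument call `hasOffscreenDocument(offscreenUrl)` used above resolves to the real chrome.runtime.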
However, since there is no way to stream video from the offscreen document to a content script, the only solution is to connect the content script to the phone directly. The offscreen document assists with the offer/answer exchange but stays out of the media path.
The architecture that makes it work:
┌────────────┐     ┌────────────────┐     ┌────────────────────┐
│ Side Panel ├───> │ Service Worker │ <───┤ Offscreen Document │
│ (React)    │ <───┤ (dispatcher)   ├───> │ (WebRTC host)      │
└────────────┘     └───────┬────────┘     └───────┬────────────┘
                           │ ▲                    │ ▲
                           │ │                    │ │
                           │ │                    │ │
                           ▼ │                    ▼ │
                   ┌─────────┴──────┐     ┌─────────┴──────┐
                   │ Content Script │ <───┤ Mobile         │
                   │ (monkey-patch) │     │ (camera)       │
                   └────────────────┘     └────────────────┘
Side Panel: The UI. A React component that displays the Google Sign-in, the QR code, and device status. All inputs go to the service worker.
Service Worker: A lightweight message dispatcher. This component never touches video. It creates an offscreen document, stores user settings, and proxies messages between the side panel, the offscreen document, and the content script(s).
Offscreen Document: The hidden WebRTC manager. It owns the long-term RTCPeerConnection and uses it to establish new ephemeral connections.
Content Script: Injected into each page. Its only job is to inject the virtual camera with the phone stream, using the monkey patch described above. The script receives the stream from the mobile device and exposes it to the page.
// Configuring peer for receiving video
peer.addTransceiver('video', {direction: 'recvonly'});
peer.addEventListener('track', ({transceiver}) => {
  const track = transceiver.receiver.track;
  track.addEventListener('mute', () => {
    this.options.onStream(null);
  });
  track.addEventListener('unmute', () => {
    this.options.onStream(new MediaStream([track]));
  });
});
// Setting up the stream in the video element
public setStream(targetStream?: MediaStream): void {
  if (targetStream) {
    const videoElement = window.document.createElement('video');
    videoElement.srcObject = targetStream;
    videoElement
      .play()
      .then(() => (this.targetVideo = videoElement));
    return;
  }
  this.disposeTargetStream();
}
// Drawing the video element or a stub animation
private loop(streamOptions: FakeMediaStreamOptions): void {
  window.requestAnimationFrame((time) => {
    if (this.targetVideo) {
      this.canvas.drawVideo(this.targetVideo);
    } else {
      this.canvas.drawStub(time);
    }
    this.loop(streamOptions);
  });
}
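The drawVideo call is not shown in the article. If the phone's video and the canvas have different aspect ratios (a portrait phone feeding a landscape canvas, for instance), one option is to center-crop the source before drawing. The coverCrop helper below is a hypothetical sketch of that calculation, not the extension's actual method.

```typescript
interface Size {
  width: number;
  height: number;
}
interface Rect extends Size {
  x: number;
  y: number;
}

// Largest centered region of `source` that has the same aspect ratio as `target`.
function coverCrop(source: Size, target: Size): Rect {
  const sourceRatio = source.width / source.height;
  const targetRatio = target.width / target.height;
  if (sourceRatio > targetRatio) {
    // Source is wider: keep the full height and trim the sides.
    const width = source.height * targetRatio;
    return {x: (source.width - width) / 2, y: 0, width, height: source.height};
  }
  // Source is taller or equal: keep the full width and trim top and bottom.
  const height = source.width / targetRatio;
  return {x: 0, y: (source.height - height) / 2, width: source.width, height};
}

// Usage with a 2D context (ctx, video, and canvas are assumed to exist):
// const c = coverCrop({width: video.videoWidth, height: video.videoHeight}, canvas);
// ctx.drawImage(video, c.x, c.y, c.width, c.height, 0, 0, canvas.width, canvas.height);
```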
What this architecture enables
The solutions chosen in each section add up to capabilities that were not designed individually. Instead, they emerge from the overall architecture.
Minimal permissions. The OAuth token stored in the QR code only has two scopes: read and write access to the app's own hidden Drive folder. It doesn't have permission to see files outside that folder or access email, contacts, or other user data.
WebRTC provides built-in security guarantees. Media streams are end-to-end encrypted regardless of the signaling method. When the phone and the laptop share a network, the video stays inside the user's Wi-Fi network; in a fully wired setup, it travels over USB tethering, which provides a stable connection that bypasses Wi-Fi entirely.
Flexibility beyond the webcam. The peer-to-peer nature of the connection allows for any type of data to be transmitted through the same channel. This includes images, screen captures, and even completely different video sources, which can replace the camera feed without affecting the signaling or transport. The architecture views video as just another type of payload among many.
Cross-platform compatibility. The desktop side is a Chrome extension, meaning it works on all operating systems that support a Chromium-based browser, such as Windows, macOS, Linux, and ChromeOS. The mobile side is a PWA, which doesn't require installation from an app store, and works on both Android and iOS. The only exceptions are platform-specific WebRTC behaviors (for example, how iOS Safari handles background streaming differently), but the core connection model remains consistent across all platforms.
The concrete result of this path is Extroid, a Chrome extension that turns your phone into a wireless webcam for any browser-based video call. It works without the need for cloud servers or drivers, and there's no need for a native app on your phone. I'd love to hear your feedback!