typical pawel

Posted on Apr 4

How We Built a Free Browser-Based Screen Recorder

#mediakit #architecture

At Flat.social, we build tools for remote teams to connect, collaborate, and have fun together. Our platform already handles real-time video, spatial audio, and interactive games for virtual offices and events. So when we decided to add a free screen recorder to our toolkit, we had a head start on working with browser media APIs. But building a screen recorder that feels polished, works reliably across browsers, and never touches a server turned out to be a deeper engineering challenge than we expected.

This article walks through the technical decisions we made, the browser APIs we used, and the problems we solved along the way.

Why Build a Screen Recorder?

Our users already use Flat.social for remote team collaboration, online brainstorming, and running virtual workplaces. A common request we kept hearing was: "Can I record what's happening on my screen to share with teammates who couldn't make it?" People wanted to capture product demos, walkthroughs, and async updates without installing desktop software or signing up for yet another SaaS tool.

We also noticed that many existing free screen recorders come with strings attached. Some upload your video to their servers. Others slap a watermark on your recording or limit you to five minutes. We wanted something different: a tool that runs entirely in the browser, saves locally to the user's device, and has zero limits. No account, no cloud, no watermark.

The Core Architecture

The screen recorder is built on three browser APIs that do most of the heavy lifting.

Screen Capture API (getDisplayMedia) lets us request access to the user's screen, a specific application window, or a single browser tab. This is the same API that powers screen sharing in video conferencing tools like Zoom and Google Meet. When the user clicks "Start Recording," the browser shows its native picker dialog, and once the user selects a source, we get a MediaStream containing the screen's video track.
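As a rough illustration, the picker request could look like the sketch below. The helper names (buildDisplayOptions, requestScreen) and the 30 fps default are assumptions made for the example, not details from our codebase. The media-devices object is passed in as a parameter so the logic can be exercised outside a browser; in a browser, navigator.mediaDevices would play that role.

```typescript
// Structural stand-ins for the DOM types, so the sketch also compiles outside a browser.
interface DisplayCaptureOptions {
  video: boolean | { frameRate?: number };
  audio: boolean;
}

interface DisplayMediaSource<S> {
  getDisplayMedia(options: DisplayCaptureOptions): Promise<S>;
}

// Hypothetical helper: the options handed to getDisplayMedia().
// The 30 fps default is an assumption, not a value from the article.
function buildDisplayOptions(wantSystemAudio: boolean, frameRate = 30): DisplayCaptureOptions {
  return { video: { frameRate }, audio: wantSystemAudio };
}

// Call this from a click handler: browsers only show the picker after a user gesture.
// The promise rejects (typically with NotAllowedError) if the user cancels the picker.
function requestScreen<S>(devices: DisplayMediaSource<S>, wantSystemAudio: boolean): Promise<S> {
  return devices.getDisplayMedia(buildDisplayOptions(wantSystemAudio));
}
```

The resolved stream carries one video track for the chosen surface and, where the browser allows it, an audio track.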

getUserMedia handles the webcam and microphone. We request camera and audio access separately from the screen capture, which gives us independent control over each stream. The user can toggle the webcam and microphone on or off at any point during the recording without interrupting the screen capture.
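One way to get this kind of toggle is the enabled flag on MediaStreamTrack, which mutes a track without stopping it. The sketch below assumes that approach; setKindEnabled is a hypothetical helper, and the tracks would come from the getUserMedia stream's getTracks() in a browser.

```typescript
// Structural stand-in for MediaStreamTrack: only the two fields this sketch needs.
interface ToggleableTrack {
  kind: string;
  enabled: boolean;
}

// Flip every track of one kind on or off and report how many were changed.
// A disabled audio track keeps flowing as silence (and a disabled video track
// as black frames), so nothing downstream is interrupted.
function setKindEnabled(tracks: ToggleableTrack[], kind: "audio" | "video", enabled: boolean): number {
  let changed = 0;
  for (const track of tracks) {
    if (track.kind === kind) {
      track.enabled = enabled;
      changed += 1;
    }
  }
  return changed;
}
```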

MediaRecorder API takes a combined MediaStream and encodes it into a video file in real time. We record in WebM format using the VP8 or VP9 codec, depending on what the browser supports. As it encodes, the MediaRecorder fires dataavailable events, and we collect the chunks they carry in memory. When the user stops recording, we assemble the chunks into a Blob and offer it as a download.
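The codec choice and chunk collection can be sketched as follows. The candidate list, its preference order, and the names pickMimeType and ChunkBuffer are assumptions for illustration. The support check is injected so the selection logic can run anywhere; in a browser it would be (t) => MediaRecorder.isTypeSupported(t).

```typescript
// Candidate container/codec strings, most preferred first (the order is an assumption).
const CANDIDATE_TYPES = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
];

// Return the first type the browser can record, or undefined to use its default.
function pickMimeType(isSupported: (type: string) => boolean): string | undefined {
  return CANDIDATE_TYPES.find((type) => isSupported(type));
}

// Collects the chunks delivered by dataavailable events; empty chunks are skipped.
class ChunkBuffer<C extends { size: number }> {
  private chunks: C[] = [];

  push(chunk: C): void {
    if (chunk.size > 0) this.chunks.push(chunk);
  }

  // Hand back everything collected so far and start over.
  drain(): C[] {
    const out = this.chunks;
    this.chunks = [];
    return out;
  }
}
```

In a browser, the chosen type would go into new MediaRecorder(stream, { mimeType }), each dataavailable event's data would be pushed into the buffer, and the drained chunks would be passed to new Blob(chunks, { type: mimeType }) once the recorder stops.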

The Hard Part: Webcam Compositing

The trickiest part of the entire project was the webcam overlay. We wanted the user's face to appear as a circular bubble in the corner of the recording, baked directly into the video file, just like Loom does it. This sounds simple, but the browser does not give you a built-in way to composite two video streams into one.

Our solution uses an off-screen canvas element. On every animation frame, we draw the current screen capture frame onto the canvas, then draw the webcam frame on top of it in a circular clip region. The canvas runs at the same frame rate as the screen capture, so the result looks smooth and natural.

We then call canvas.captureStream() to turn the canvas output into a new MediaStream. This composited stream is what we pass to the MediaRecorder, not the raw screen capture. The result is a single video file where the webcam bubble is permanently part of the recording.
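A minimal sketch of one composited frame, under stated assumptions: the drawing context is described by a small structural interface (a real 2D canvas context offers the same methods), and the center-crop of the webcam frame is our illustration of how to avoid a squashed face, not a detail confirmed above. The names drawComposite and centerSquare are hypothetical.

```typescript
// The subset of the 2D canvas context this sketch relies on.
interface Ctx2DLike<I> {
  save(): void;
  restore(): void;
  beginPath(): void;
  arc(x: number, y: number, radius: number, startAngle: number, endAngle: number): void;
  clip(): void;
  drawImage(image: I, ...coords: number[]): void;
}

interface VideoFrameSource<I> { image: I; width: number; height: number; }
interface BubbleShape { cx: number; cy: number; radius: number; }

// Largest centered square inside a frame, so the webcam is cropped rather than squashed.
function centerSquare(width: number, height: number): { sx: number; sy: number; size: number } {
  const size = Math.min(width, height);
  return { sx: (width - size) / 2, sy: (height - size) / 2, size };
}

// Draw the screen first, then the webcam inside a circular clip region on top of it.
function drawComposite<I>(
  ctx: Ctx2DLike<I>,
  screenFrame: VideoFrameSource<I>,
  camFrame: VideoFrameSource<I> | null,
  bubble: BubbleShape,
): void {
  ctx.drawImage(screenFrame.image, 0, 0, screenFrame.width, screenFrame.height);
  if (camFrame === null) return; // webcam off: screen only

  const { sx, sy, size } = centerSquare(camFrame.width, camFrame.height);
  ctx.save(); // the clip must not leak into the next frame
  ctx.beginPath();
  ctx.arc(bubble.cx, bubble.cy, bubble.radius, 0, Math.PI * 2);
  ctx.clip();
  ctx.drawImage(
    camFrame.image, sx, sy, size, size,
    bubble.cx - bubble.radius, bubble.cy - bubble.radius, bubble.radius * 2, bubble.radius * 2,
  );
  ctx.restore();
}
```

In a browser, the image values would be the two video elements playing the screen and webcam streams, and the stream returned by canvas.captureStream() is what reaches the recorder.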

Getting this to perform well took some iteration. Early versions caused frame drops on older machines because drawing two video sources onto a canvas sixty times per second is CPU-intensive. We optimized by matching the canvas resolution to the actual screen capture resolution instead of using a fixed high resolution, and by using requestAnimationFrame instead of setInterval to stay in sync with the browser's render loop.
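Both optimizations can be sketched in a few lines. The 1280x720 fallback and the helper names are assumptions; in a browser, the settings object would come from the screen track's getSettings(), and the scheduler arguments would be requestAnimationFrame and cancelAnimationFrame.

```typescript
interface CanvasSize { width: number; height: number; }

// Size the canvas from the capture track's reported settings instead of a fixed resolution.
// The fallback is an assumption for the case where the browser reports nothing.
function canvasSizeFor(
  settings: { width?: number; height?: number },
  fallback: CanvasSize = { width: 1280, height: 720 },
): CanvasSize {
  return {
    width: settings.width ?? fallback.width,
    height: settings.height ?? fallback.height,
  };
}

type FrameScheduler = (callback: (time: number) => void) => number;

// Run draw() once per scheduled frame until the returned function is called.
function startRenderLoop(
  draw: () => void,
  schedule: FrameScheduler,
  cancel: (id: number) => void,
): () => void {
  let running = true;
  let pendingId = 0;
  const tick = (): void => {
    if (!running) return; // a frame that was already queued when we stopped
    draw();
    pendingId = schedule(tick);
  };
  pendingId = schedule(tick);
  return () => {
    running = false;
    cancel(pendingId);
  };
}
```

One caveat worth knowing: browsers throttle requestAnimationFrame in background tabs, so a loop like this can slow down or pause when the recorder's tab is hidden.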

Mixing Audio Streams

Audio presented its own set of challenges. The screen capture can optionally include system audio (the sounds coming from the user's computer), and the microphone provides a separate audio track. We needed to mix these two audio sources into a single track for the final recording.

We use the Web Audio API to handle this. Each audio source is wrapped in a MediaStreamAudioSourceNode (created with createMediaStreamSource), and both sources feed into a single MediaStreamAudioDestinationNode. The destination node outputs a mixed audio stream, and we combine its audio track with the composited video stream before passing everything to the MediaRecorder.
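The node graph is small enough to sketch directly. The function below is an illustration with a hypothetical name (mixAudio); it takes the audio context as a parameter described by a structural interface, where a browser would supply a real AudioContext. The hasAudio filter is there because createMediaStreamSource throws when given a stream with no audio track.

```typescript
// The subset of the Web Audio API this sketch relies on.
interface AudioNodeLike {
  connect(destination: AudioNodeLike): unknown;
}

interface MixContext<S> {
  createMediaStreamSource(stream: S): AudioNodeLike;
  createMediaStreamDestination(): AudioNodeLike & { stream: S };
}

// Connect every stream that actually carries audio into one destination node
// and return the destination's stream, which holds the single mixed track.
function mixAudio<S>(ctx: MixContext<S>, sources: S[], hasAudio: (stream: S) => boolean): S {
  const destination = ctx.createMediaStreamDestination();
  for (const source of sources.filter((s) => hasAudio(s))) {
    ctx.createMediaStreamSource(source).connect(destination);
  }
  return destination.stream;
}
```

In a browser, hasAudio could be (s) => s.getAudioTracks().length > 0, and the mixed stream's audio track would be added to a new MediaStream alongside the canvas video track.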

One gotcha we ran into: system audio capture is only available in Chrome and Edge, and only when the user shares a browser tab or entire screen (not a specific application window). Firefox does not support system audio capture at all. We detect these capabilities at runtime and show or hide the system audio toggle accordingly, so the UI never offers something the browser cannot deliver.
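One runtime signal that is available in every browser is the captured stream itself: if the browser did not deliver system audio, the stream simply has no audio track. The sketch below shows that check as an illustration only; it is an assumption, not necessarily the detection we ship, and inspectCapture is a hypothetical name.

```typescript
// Structural stand-in for the MediaStream returned by getDisplayMedia().
interface CapturedStreamLike {
  getAudioTracks(): unknown[];
  getVideoTracks(): Array<{ getSettings(): { displaySurface?: string } }>;
}

interface CaptureInfo {
  hasSystemAudio: boolean;
  surface: string; // "monitor", "window", "browser", or "unknown" if not reported
}

// Report what the browser actually delivered after the user picked a source.
function inspectCapture(stream: CapturedStreamLike): CaptureInfo {
  const [video] = stream.getVideoTracks();
  const surface = video?.getSettings().displaySurface ?? "unknown";
  return { hasSystemAudio: stream.getAudioTracks().length > 0, surface };
}
```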

Draggable Webcam Bubble

We wanted the webcam bubble to be repositionable. The user should be able to drag it to any corner or edge of the preview before or during recording. Since the bubble's position on the preview needs to map exactly to its position in the final recording, we calculate the bubble's coordinates as percentages of the canvas dimensions rather than pixel values. This means the bubble position is resolution-independent and always lands in the right spot in the output file.
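The mapping itself is simple arithmetic. A minimal sketch, with hypothetical helper names and positions stored as fractions between 0 and 1 rather than literal percentages:

```typescript
interface Point2D { x: number; y: number; }
interface Dimensions { width: number; height: number; }

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Convert a pixel position on the preview into a resolution-independent fraction.
function toFraction(position: Point2D, preview: Dimensions): Point2D {
  return {
    x: clamp01(position.x / preview.width),
    y: clamp01(position.y / preview.height),
  };
}

// Convert the stored fraction back into pixels on the recording canvas.
function toPixels(fraction: Point2D, canvas: Dimensions): Point2D {
  return { x: fraction.x * canvas.width, y: fraction.y * canvas.height };
}
```

For example, a bubble at (480, 270) on a 640x360 preview is stored as (0.75, 0.75) and lands at (1440, 810) on a 1920x1080 canvas.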

The drag interaction itself uses pointer events with a simple offset calculation to keep the bubble anchored to where the user grabbed it, preventing the annoying "jump to center" behavior that naive drag implementations often have.
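The offset calculation can be sketched as two pure functions (names are hypothetical). The first would run on pointerdown, the second on every pointermove while the drag is active.

```typescript
interface Vec2 { x: number; y: number; }

// On pointerdown: remember where inside the bubble the user grabbed it.
function grabOffset(pointer: Vec2, bubbleOrigin: Vec2): Vec2 {
  return { x: pointer.x - bubbleOrigin.x, y: pointer.y - bubbleOrigin.y };
}

// On pointermove: keep that same point of the bubble under the pointer.
function dragTo(pointer: Vec2, offset: Vec2): Vec2 {
  return { x: pointer.x - offset.x, y: pointer.y - offset.y };
}
```

Without the stored offset, the bubble's origin would snap to the pointer on the first move, which is the jump described above.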

Privacy by Architecture

Privacy was not an afterthought; it was a design constraint from the start. The entire recording pipeline, compositing loop included, runs inside the browser. No data is sent to any server at any point. There is no upload endpoint, no analytics on the video content, and no account system.

When the user clicks download, we create a Blob URL from the in-memory video data and trigger a download through an anchor element. The video goes straight from browser memory to the user's filesystem. When the user closes the tab, all data is gone.
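A sketch of that download step, with the browser pieces injected so the flow can be followed (and tested) in isolation. The filename scheme and the names saveRecording and recordingFilename are assumptions; in a browser the three environment functions map to URL.createObjectURL, URL.revokeObjectURL, and document.createElement("a").

```typescript
interface AnchorLike {
  href: string;
  download: string;
  click(): void;
}

interface DownloadEnv<B> {
  createObjectURL(blob: B): string;
  revokeObjectURL(url: string): void;
  createAnchor(): AnchorLike;
}

// Hypothetical naming scheme: recording-YYYY-MM-DDTHH-MM-SS.webm, in UTC.
function recordingFilename(now: Date): string {
  const stamp = now.toISOString().slice(0, 19).replace(/:/g, "-");
  return `recording-${stamp}.webm`;
}

// Point a temporary anchor at the Blob URL and click it.
// Returns a cleanup function that releases the URL once it is no longer needed.
function saveRecording<B>(blob: B, filename: string, env: DownloadEnv<B>): () => void {
  const url = env.createObjectURL(blob);
  const anchor = env.createAnchor();
  anchor.href = url;
  anchor.download = filename;
  anchor.click();
  return () => env.revokeObjectURL(url);
}
```

Returning the cleanup function instead of revoking immediately lets the caller keep the same URL alive for an in-page preview and release it afterwards.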

This architecture also means we have zero infrastructure cost for the screen recorder. There are no video processing servers, no storage buckets, and no CDN bandwidth. It scales to a million users the same way it scales to one, because each user's browser does all the work.

Countdown and UX Polish

Small details matter for recording tools. We added a 3-2-1 countdown before the recording starts, giving the user time to switch windows or prepare their screen. Without it, the first few seconds of every recording would be the user fumbling to get to the right place.

We also built an instant preview that plays the recording back as soon as the user stops it. This lets people verify the content before downloading, which saves the frustrating cycle of record, download, open file, realize you missed something, and start over.

The recording controls float in a minimal toolbar that stays visible during recording without being distracting. The user can see the elapsed time, toggle the webcam and microphone, and stop the recording with a single click.

Browser Compatibility and Edge Cases

Browser compatibility is where things get messy. Chrome and Edge provide the most complete support for the APIs we use, including system audio capture and all screen sharing modes. Firefox supports screen capture and MediaRecorder but lacks system audio. Safari has limited and inconsistent support for getDisplayMedia, so we recommend Chrome for the best experience.

We also had to handle a surprising number of edge cases. What happens if the user revokes screen sharing permission mid-recording? The screen capture track fires an "ended" event, and we gracefully stop the recording and present whatever was captured up to that point. What if the webcam disconnects? We detect the lost track and continue recording without the overlay rather than crashing the entire session.
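Both cases hang off the same "ended" event on the respective track. A minimal sketch, with hypothetical handler names standing in for whatever the recording session does in response:

```typescript
// Structural stand-in for MediaStreamTrack: only the event hook this sketch needs.
interface EndableTrack {
  addEventListener(type: "ended", listener: () => void): void;
}

interface SessionHandlers {
  stopAndKeepRecording(): void; // finalize with whatever was captured so far
  hideWebcamBubble(): void;     // keep recording, just without the overlay
}

// "ended" fires when a track stops for an external reason, such as the user
// clicking the browser's stop-sharing button or a camera being unplugged.
// It does not fire when our own code calls track.stop().
function watchTracks(screenTrack: EndableTrack, camTrack: EndableTrack | null, handlers: SessionHandlers): void {
  screenTrack.addEventListener("ended", () => handlers.stopAndKeepRecording());
  camTrack?.addEventListener("ended", () => handlers.hideWebcamBubble());
}
```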

macOS adds another layer of complexity. Users need to grant Screen Recording permission at the OS level (System Settings, Privacy and Security, Screen Recording) in addition to the browser-level permission. If this permission is not granted, the browser returns a blank or black screen capture. We detect this scenario and show a clear message explaining what to do.
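One plausible way to detect the black-frame case is to sample pixels from the first captured frames and check whether they are all near black. The sketch below assumes that approach; the function name, threshold, and sampling stride are illustrative choices. In a browser, the input would be the data array from the canvas context's getImageData().

```typescript
// rgba is a flat array of RGBA bytes, four per pixel.
// Returns true if every sampled pixel is at or below the brightness threshold.
function looksBlank(rgba: ArrayLike<number>, threshold = 8, stride = 16): boolean {
  if (rgba.length < 4) return false; // no pixels, nothing to judge
  for (let i = 0; i + 2 < rgba.length; i += 4 * stride) {
    // Check red, green, and blue; alpha (i + 3) is ignored.
    if (rgba[i] > threshold || rgba[i + 1] > threshold || rgba[i + 2] > threshold) {
      return false;
    }
  }
  return true;
}
```

A legitimately dark screen would also pass this check, so a real implementation would probably require several consecutive blank frames before showing the permission hint.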

What We Learned

Building this tool reinforced a few things we already believed and taught us some new lessons.

First, browser APIs are more capable than most people realize. Five years ago, building a screen recorder with webcam compositing would have required a desktop application or at least a browser extension. Today, it works with vanilla web APIs and no dependencies.

Second, privacy-first architecture can also be the simplest architecture. By keeping everything local, we avoided building video processing pipelines, managing storage, handling uploads over flaky connections, and dealing with GDPR compliance for video data. The simplest solution was also the most private one.

Third, the Web Audio API is powerful but full of gotchas. Mixing audio streams, handling different sample rates, and dealing with browser-specific quirks in audio capture took more time than we expected.

The screen recorder now sits alongside our other free tools like the online dice roller, and it complements our core platform features for teams that need async communication alongside their real-time collaboration spaces. Whether your team uses Flat.social for a virtual calming room between intense work sessions or as a full virtual workplace, the screen recorder is there when you need to capture and share what is happening on your screen.

You can try the screen recorder for free. No download, no sign-up, no watermark.
