DEV Community

latifa ouali

How I Fixed a Race Condition in rrweb That Was Breaking 60% of My Session Recordings

I launched my SaaS yesterday. It records visitor sessions on landing pages and uses AI to tell founders exactly why their visitors leave without converting.
Within 24 hours I found a bug that was silently breaking more than half my recordings. No error. No crash. Just empty sessions with a duration of 86 seconds and zero events captured.
This is how I found it and fixed it in one day.
What was happening
When I looked at the database I saw two types of sessions. Some started with a type 4 event, the Meta event that rrweb emits immediately before its full snapshot (type 2), which is the initial DOM capture that makes playback possible. Others had nothing. Just a few Chrome extension script injections and an empty events array.
The sessions with no type 4 event had no full snapshot either, so they were completely unplayable. The recording existed but there was nothing to replay against.
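A quick way to find the broken sessions is to scan each stored events array for a full snapshot. In rrweb's EventType enum, 2 is the full snapshot and 4 is the Meta event emitted just before it. The sketch below is only an illustration: the session shape ({ id, events }) and the helper names are assumptions, not the actual WhyGoAI schema.

```javascript
// rrweb EventType values: 2 = FullSnapshot, 4 = Meta.
const EVENT_TYPE_FULL_SNAPSHOT = 2;

// Hypothetical helper: a session can be replayed only if it holds a full snapshot.
function isPlayable(events) {
  return (
    Array.isArray(events) &&
    events.some((e) => e && e.type === EVENT_TYPE_FULL_SNAPSHOT)
  );
}

// Hypothetical helper: split sessions ({ id, events } is an assumed shape)
// into playable and broken id lists.
function classifySessions(sessions) {
  const playable = [];
  const broken = [];
  for (const session of sessions) {
    (isPlayable(session.events) ? playable : broken).push(session.id);
  }
  return { playable, broken };
}
```

Running classifySessions over a day of recordings gives the share of sessions that lost their baseline, which is the number to watch before and after a fix.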
Why it happened
My tracker script was loading rrweb dynamically and lazily. The flow looked like this:
The page loads. My tracker initializes. It fetches the project routes from my backend. If the current route is tracked, it calls startRecording. startRecording calls loadRRWeb which dynamically injects a script tag for rrweb. rrweb downloads asynchronously. By the time it loads and record() is called, the user has already been on the page for 2 to 3 seconds and the initial DOM snapshot moment has passed.
Without that initial snapshot rrweb has no baseline to record against. It starts capturing mutation events but has nothing to replay them on top of. The recording is useless.
The fix
The solution was simple. Start downloading rrweb immediately when the tracker script initializes, before any async operations happen.
At the very top of my IIFE, before anything else runs, I added this:

```javascript
const rrwebPreload = document.createElement("script");
rrwebPreload.src = "https://cdn.jsdelivr.net/npm/rrweb@2.0.0-alpha.13/dist/rrweb.min.js";
document.head.appendChild(rrwebPreload);
```

Then I updated my loadRRWeb function to wait for the already-downloading script instead of injecting a new one:

```javascript
const loadRRWeb = () => {
  return new Promise((resolve, reject) => {
    // Already loaded: resolve immediately.
    if (window.rrweb) {
      resolve(window.rrweb);
      return;
    }
    // Reuse the preloaded script tag if it is still downloading.
    // Note: if that script already failed, these listeners never fire.
    const existing = document.querySelector('script[src*="rrweb"]');
    if (existing) {
      existing.addEventListener("load", () => resolve(window.rrweb));
      existing.addEventListener("error", reject);
    } else {
      // Fallback: no preload tag found, so inject one now.
      const script = document.createElement("script");
      script.src = "https://cdn.jsdelivr.net/npm/rrweb@2.0.0-alpha.13/dist/rrweb.min.js";
      script.onload = () => resolve(window.rrweb);
      script.onerror = reject;
      document.head.appendChild(script);
    }
  });
};
```

Now rrweb starts downloading as soon as the tracker script runs. By the time the async route check finishes and startRecording is called, rrweb is already loaded or nearly loaded. record() runs early enough to emit the type 4 Meta event and the full snapshot that follows it, and the recording is complete.
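One gap remains in the loader above: if the preloaded script has already failed by the time loadRRWeb attaches its listeners, neither listener fires and the promise never settles. A minimal sketch of one way around that, not the code WhyGoAI ships, is to create the script tag and a single shared promise in the same place, so the handlers are always attached before the download starts. The factory name createRRWebLoader and the injected win and doc parameters are assumptions made so the sketch can run outside a browser.

```javascript
// Same pinned build as in the post.
const RRWEB_SRC =
  "https://cdn.jsdelivr.net/npm/rrweb@2.0.0-alpha.13/dist/rrweb.min.js";

// Hypothetical factory: win and doc stand in for window and document.
function createRRWebLoader(win, doc, src = RRWEB_SRC) {
  let pending = null; // one shared promise per page

  return function loadRRWeb() {
    if (win.rrweb) return Promise.resolve(win.rrweb);
    if (pending) return pending; // download already started (or already failed)

    pending = new Promise((resolve, reject) => {
      const script = doc.createElement("script");
      script.src = src;
      // Handlers are attached before the tag is appended, so no event is missed.
      script.onload = () =>
        win.rrweb
          ? resolve(win.rrweb)
          : reject(new Error("rrweb global missing after load"));
      script.onerror = () => reject(new Error("failed to load " + src));
      doc.head.appendChild(script);
    });
    return pending;
  };
}
```

In the tracker this would be wired up as const loadRRWeb = createRRWebLoader(window, document), followed by loadRRWeb().catch(() => {}) at the very top of the IIFE to start the download. startRecording can then await loadRRWeb() later and receives the same promise, including its rejection if the CDN request failed.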
The second bug I found the same day
While I was in the code I also noticed the session duration was showing incorrectly. A session that lasted 39 seconds was showing as 20 seconds in the dashboard.
The cause was in my backend merge logic. When batched events arrived and updated an existing session, the duration was being set to whichever batch arrived last rather than the maximum duration seen across all batches.
The fix was one line. Change this:

```javascript
duration: duration || existingSession.session_data?.duration || 0
```

To this:

```javascript
duration: Math.max(duration || 0, existingSession.session_data?.duration || 0)
```

Same pattern I was already using for scroll depth. Just forgot to apply it to duration.
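To make the difference concrete, here is a minimal sketch of a merge step that keeps the maximum for both fields. It assumes each batch reports the session duration as of that batch. Only duration and session_data come from the snippets above; the function name mergeSessionData and the events and scrollDepth field names are assumptions for illustration.

```javascript
// Hypothetical merge helper; not the actual backend code.
function mergeSessionData(existingSession, batch) {
  const prev = existingSession?.session_data ?? {};
  return {
    ...prev,
    // Append the new batch of events to whatever was stored before.
    events: [...(prev.events ?? []), ...(batch.events ?? [])],
    // Keep the largest value seen so far, so a late or out-of-order
    // batch can never shrink the stored duration or scroll depth.
    duration: Math.max(batch.duration || 0, prev.duration || 0),
    scrollDepth: Math.max(batch.scrollDepth || 0, prev.scrollDepth || 0),
  };
}
```

With the old expression, a batch reporting 20 seconds that arrives after one reporting 39 seconds overwrites the stored value with 20. With Math.max the stored duration stays at 39 regardless of arrival order.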
What I learned
When you lazy load a script that needs to capture the initial state of the page, you risk losing that initial state, and on slow connections or slow devices you will lose it most of the time. Preload anything that needs to be ready before user interaction happens.
Also pin your rrweb version. Using latest means a breaking change can silently destroy your recordings overnight with no warning.
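On jsDelivr, a URL with no version, with @latest, or with a range such as @2 resolves to whatever the newest matching release is. A small guard can catch that before it ships. This is a sketch under the assumption that the script is served from jsDelivr's /npm/ path; the function name isPinnedJsDelivrUrl is made up for illustration.

```javascript
// Hypothetical guard: true only when a jsDelivr npm URL names an exact version.
function isPinnedJsDelivrUrl(url) {
  // Capture the version segment after "package@" (scoped packages allowed).
  const match =
    /^https:\/\/cdn\.jsdelivr\.net\/npm\/(?:@[^/]+\/)?[^/@]+@([^/]+)\//.exec(url);
  if (!match) return false; // no @version segment at all

  // Exact versions look like 1.2.3 or 2.0.0-alpha.13.
  // Tags (latest) and ranges (2, 2.0) do not match.
  return /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/.test(match[1]);
}
```

Called on the URL used in this post it returns true, while the same URL with rrweb@latest or with no version at all returns false.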
I am building WhyGoAI, a session recording tool that uses AI to tell founders the psychological reason behind every exit and gives them exact before and after fixes. If you are building something similar or ran into the same rrweb issue I would love to hear how you solved it.
whygoai.pro, free plan available.
