alakkadshaw

Posted on Jun 4 • Edited on Jun 8

WebRTC Reconnect: Auto-Heal a Call | @metered-ca/realtime

#webdev #javascript #webrtc #programming

WebRTC Reconnect: Drop the Network, Watch a 1:1 Call Heal Itself

WebRTC reconnect, in one sentence: raw WebRTC has no built-in reconnection — a Wi-Fi blip or a Wi-Fi→cellular handoff leaves your RTCPeerConnection stuck in disconnected/failed with no recovery — so this tutorial builds a runnable 1:1 video call with @metered-ca/realtime, then kills the network mid-call and watches the SDK auto-recover the same peer (same identity, fresh ICE/TURN underneath) with zero reconnect code on your side.

That's the whole demo: drop the network, watch the call heal. You'll read the exact state transitions as they happen — a remote peer going reconnecting → connected, both <video> tiles re-attaching on their own — without writing a reconnect button, a manual ICE-restart loop, or a peer.reconnect() call. The point of this tutorial is the thing you don't write.

Goal

By the end you'll have a WebRTC reconnect demo you can prove: a live 1:1 call, an on-screen status log, and a repeatable way to drop the network and watch @metered-ca/realtime rebuild the connection automatically — same remote peer, same identity, fresh ICE/TURN underneath — without you writing a single line of recovery logic.

Prerequisites

Node 18+ and npm (only to serve one file — the SDK itself has zero runtime dependencies).
A free publishable key (pk_live_…) from your Metered dashboard — sign up at metered.ca. This is the no-backend prototype path; nothing runs server-side.
A modern browser: Chrome 90+ / Firefox 90+ / Safari 15+. We'll use Chrome DevTools to simulate the outage because its "Offline" toggle is the cleanest trigger.
getUserMedia needs HTTPS or localhost — serve the file, don't open file://.

Why WebRTC connections drop (and why raw WebRTC won't recover)

Three everyday things break a live call, and stock WebRTC handles none of them for you:

A network change — Wi-Fi→cellular handoff, leaving a tunnel, laptop sleep/wake. Your local IP and candidate set change out from under the connection.
A transient path loss — a few seconds of packet loss flips RTCPeerConnection.iceConnectionState to disconnected, and if it doesn't recover, on to failed.
Signaling loss — the WebSocket carrying SDP/ICE drops, so even when the network returns there's no channel to renegotiate over.

Raw WebRTC gives you the events (connectionstatechange, iceconnectionstatechange) but no recovery: there is no built-in "rebuild this call." You'd have to detect disconnected, decide whether it's transient or terminal, fire an ICE restart, renegotiate over a signaling channel you also had to keep alive — and then re-attach media. That hand-rolled ladder is exactly what @metered-ca/realtime does for you, and what the rest of this page makes observable.
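
To make that ladder concrete, here is a minimal sketch of only its first rungs done by hand against a raw RTCPeerConnection: watch iceConnectionState, give disconnected a grace period, and call restartIce() on failed. The helper name attachManualIceRestart and the graceMs default are illustrative assumptions, not part of any library. Renegotiating over your own signaling channel and re-attaching media are still left undone here.

```javascript
// Sketch: hand-rolled ICE-restart trigger for a raw RTCPeerConnection.
// attachManualIceRestart and graceMs are hypothetical names for this example.
function attachManualIceRestart(pc, { graceMs = 3000 } = {}) {
  let timer = null;
  const cancelTimer = () => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  };

  const onIceStateChange = () => {
    switch (pc.iceConnectionState) {
      case "connected":
      case "completed":
        cancelTimer(); // the path healed on its own
        break;
      case "disconnected":
        // Possibly transient: wait for the grace period before forcing a restart.
        if (timer === null) {
          timer = setTimeout(() => {
            timer = null;
            pc.restartIce();
          }, graceMs);
        }
        break;
      case "failed":
        cancelTimer();
        // restartIce() only marks the next offer as an ICE restart; the browser
        // then fires "negotiationneeded" and SDP must still cross your signaling.
        pc.restartIce();
        break;
    }
  };

  pc.addEventListener("iceconnectionstatechange", onIceStateChange);
  // Return a disposer so callers can detach the listener and any pending timer.
  return () => {
    cancelTimer();
    pc.removeEventListener("iceconnectionstatechange", onIceStateChange);
  };
}
```

Even this fragment has to juggle a timer, three state branches, and cleanup, and it has not yet touched signaling or media.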

The mental model (read this before the code)

There is exactly one idea to internalize, and it's the one the older peer-ID libraries get wrong:

A transient disconnect is not a terminal close.

When the network blips, your peer hasn't left — it's briefly unreachable. @metered-ca/realtime treats that as a recoverable event and heals it on three layers (all automatic, all part of the SDK's documented resilience model):

Signaling WebSocket reconnects with jittered exponential backoff (~500 ms → 30 s), and it's close-code-aware — a graceful server shutdown is retried differently from a terminal kick (e.g. an invalid/expired token or an admin disconnect is not retried). Default ~100 attempts.
Per-peer ICE restart — the SDK runs an ICE-restart ladder (up to 9 attempts over ~121 s) to rebuild the media path. While this runs, that peer surfaces as remote.state === "reconnecting".
Channel reconciliation — on WebSocket reconnect, your RemotePeer object reference is preserved (same === identity, same remote.id, same metadata). The SDK silently swaps the underlying RTCPeerConnection for a fresh one with new TURN credentials, and your local streams auto-re-attach.
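
For intuition about the first layer, a capped exponential backoff with jitter can be sketched as below. The 500 ms base and 30 s cap mirror the range quoted above; the "full jitter" strategy and the name backoffDelayMs are assumptions made for illustration, since the SDK's exact formula is not given here.

```javascript
// Sketch: capped exponential backoff with "full jitter".
// baseMs/capMs mirror the ~500 ms -> 30 s range quoted above; the jitter
// strategy is an assumption, not the SDK's documented formula.
function backoffDelayMs(
  attempt,
  { baseMs = 500, capMs = 30_000, random = Math.random } = {}
) {
  // Ceiling doubles per attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a random delay below the ceiling so clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

// Ceilings for the first eight attempts (jitter pinned to the top of its range).
const ceilings = Array.from({ length: 8 }, (_, attempt) =>
  backoffDelayMs(attempt, { random: () => 1 })
);
console.log(ceilings); // [500, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

The jitter matters most after a server restart, when many clients lose their sockets at the same instant and would otherwise all retry together.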

What survives a reconnect: peer references, IDs, metadata, and your local stream attachments. What does not survive: the RTCPeerConnection object identity (remote.pc), any RTCDataChannel, and the remote MediaStream object identity — though stream.id stays stable, and on reconcile the SDK re-fires stream-added with that same stream.id so you just re-bind. Hold that last list — it's the whole pitfalls section.

The contrast with a deliberate teardown is the design's core. peer.close() is terminal: it tears down on purpose and you do not get auto-recovery (you'd construct a fresh MeteredPeer). Everything else — Wi-Fi drops, tunnels, laptop sleep, a flaky LTE handoff — is treated as recoverable. That clean split between recoverable blip and intentional close is precisely what trips up older peer-ID libraries that collapse a transient ICE disconnected into a terminal "destroyed".
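
One way to keep that split honest in UI code is a pure reducer with two separate paths, one for a blip and one for a departure. This is a sketch only: reduceTile and the event shapes are simplified stand-ins invented for this example, not SDK types.

```javascript
// Sketch: a pure reducer that keeps "blip" and "gone" on separate paths.
// The event shapes are simplified for illustration; they are not SDK types.
function reduceTile(tile, event) {
  switch (event.type) {
    case "state-change":
      // Transient: keep the tile and only toggle an overlay.
      return {
        ...tile,
        overlay: event.to === "reconnecting" ? "Reconnecting…" : null,
      };
    case "peer-left":
      // Terminal: the peer is gone, so the tile goes too.
      return null;
    default:
      return tile;
  }
}

let tile = { id: "peer-a", overlay: null };
tile = reduceTile(tile, { type: "state-change", to: "reconnecting" });
console.log(tile.overlay); // "Reconnecting…"
tile = reduceTile(tile, { type: "state-change", to: "connected" });
console.log(tile.overlay); // null
```

Only the peer-left branch ever returns null, so a transient drop cannot remove a tile by accident.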

The minimal runnable code

One file. It's a complete 1:1 call plus a status pill and a log, so the reconnect is something you can watch, not just trust. Save as index.html, drop in your pk_live_ key, serve, open in two tabs.

<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>WebRTC reconnect - @metered-ca/realtime</title>
    <style>
      video { width: 320px; background: #111; border-radius: 8px; margin: 4px; }
      body { font-family: system-ui, sans-serif; padding: 16px; }
      #log { font: 13px/1.5 ui-monospace, monospace; background: #0b1020; color: #d6e2ff;
             padding: 12px; border-radius: 8px; height: 160px; overflow: auto; margin-top: 12px; }
      .pill { display: inline-block; padding: 2px 10px; border-radius: 999px; font-weight: 600; }
      .connected    { background: #dcfce7; color: #166534; }
      .reconnecting { background: #fef3c7; color: #92400e; }
      .closed       { background: #fee2e2; color: #991b1b; }
    </style>
  </head>
  <body>
    <h1>WebRTC reconnect demo</h1>
    <button id="join">Join call</button>
    <span id="status" class="pill">idle</span>
    <div>
      <video id="local" autoplay playsinline muted></video>
      <video id="remote" autoplay playsinline></video>
    </div>
    <div id="log"></div>

    <script type="module">
      import { MeteredPeer } from "https://esm.sh/@metered-ca/realtime@1.0.7";

      const CHANNEL = "room-42";          // both tabs join the SAME channel
      const PK = "pk_live_REPLACE_ME";    // <-- your publishable key

      const localVideo  = document.getElementById("local");
      const remoteVideo = document.getElementById("remote");
      const statusEl    = document.getElementById("status");
      const logEl       = document.getElementById("log");

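      const log = (msg) => {
        const t = new Date().toLocaleTimeString();
        const line = document.createElement("div");
        line.textContent = `${t}  ${msg}`; // textContent: peer-supplied ids are never parsed as HTML
        logEl.prepend(line);               // newest entry first
      };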
      const setStatus = (state) => {
        statusEl.textContent = state;
        statusEl.className = "pill " + state; // styles "connected"/"reconnecting"/"closed"
      };

      document.getElementById("join").onclick = async () => {
        // 1. Local camera + mic (HTTPS or localhost)
        const localStream = await navigator.mediaDevices.getUserMedia({
          video: true,
          audio: true,
        });
        localVideo.srcObject = localStream;

        const peer = new MeteredPeer({ apiKey: PK });

        // 2. Top-level signaling health (the WebSocket layer).
        //    Local peer states: idle | joining | joined | reconnecting | leaving | closed
        peer.on("state-change", ({ from, to }) => log(`peer: ${from} -> ${to}`));

        // 3. Per-peer lifecycle - THIS is where reconnect shows up.
        peer.on("peer-joined", ({ peer: remote }) => {
          log(`peer-joined: ${remote.id}`);

          // Remote peer states: idle | connecting | connected | reconnecting | closed
          remote.on("state-change", ({ from, to }) => {
            log(`  remote ${remote.id}: ${from} -> ${to}`);
            setStatus(to);
          });

          // The SDK hands us the live stream here - and RE-FIRES this on reconcile
          // with the SAME stream.id but a NEW MediaStream object. So we just re-bind.
          remote.on("stream-added", ({ stream }) => {
            remoteVideo.srcObject = stream;
            log(`stream-added (re)bound: ${stream.id}`);
          });
          remote.on("stream-removed", () => { remoteVideo.srcObject = null; });

          // Show the peer's current state; the listener above keeps the pill in sync.
          setStatus(remote.state);
        });

        peer.on("peer-left", ({ peer: remote }) => {
          log(`peer-left: ${remote.id}`);
          remoteVideo.srcObject = null;
        });

        // 4. Publish our camera to the whole channel
        peer.addStream(localStream, { role: "camera" });

        // 5. Connect
        await peer.join(CHANNEL);
        log(`joined ${CHANNEL} as ${peer.peerId}`);
      };
    </script>
  </body>
</html>

Notice there is no reconnect code. Every line above is either UI or a listener. Recovery is the SDK's job; your job is to re-bind the stream when it re-fires stream-added, and to read state when it tells you where it is.

Step-by-step annotations

1 - The call itself (steps 1, 4, 5). getUserMedia() gets your camera/mic; peer.addStream(localStream, { role: "camera" }) fans that stream out to every peer in the channel (no per-target call(remoteId) loop); peer.join(CHANNEL) connects. Two tabs join the same CHANNEL, discover each other, and the call is up. Everything else on the page is about observing the recovery.

2 - Top-level state-change is the signaling pulse. peer.on("state-change", ({ from, to }) => …) reports the health of your signaling WebSocket — layer 1. Read the payload as { from, to } (the transition), not a single state. The local peer moves through joining → joined, and during an outage you'll see it dip to reconnecting and climb back to joined. This is the coarse signal: "is my control channel up?"

3 - Per-peer state-change is where reconnection lives. This is the important one. Each remote peer has its own state-change, and its states are different from the top-level peer's: idle | connecting | connected | reconnecting | closed. During a network blip a remote transitions to reconnecting (the ICE-restart ladder is running) and then back to connected (media path rebuilt). We mirror the `to` value straight into the on-screen pill, so the recovery is visible. One idea per layer: the top-level event is about your socket; the per-peer event is about that peer's media path.

4 - Re-bind on stream-added, never cache the stream. Here's the subtle, important bit. The remote stream arrives via remote.on("stream-added", ({ stream }) => …) — and on a reconnect the SDK re-fires stream-added with the same stream.id but a new MediaStream object. So you don't poll for a stream or hold a reference across the drop: you just point the <video> at whatever stream the event hands you, every time it fires. That single handler covers both the first attach and every reconnect re-attach. (We never touch the underlying RTCPeerConnection here — that's the footgun below.)

5 - peer-left vs. a blip. A real peer-left (payload: { peer }) means the other side intentionally close()d or genuinely went away — clear the tile. A transient drop does not fire peer-left; it fires the per-peer state-change to reconnecting. Keeping these two paths distinct is the whole "transient ≠ terminal" idea in code: don't tear your UI down on a blip you're about to recover from.
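
If you render more than one stream, annotation 4 generalizes into a small binder keyed by stream.id rather than by MediaStream object identity. The sketch below assumes hypothetical names (createStreamBinder, makeVideo, release); only stream.id and srcObject come from the text above.

```javascript
// Sketch: bind remote streams to video tiles keyed by stream.id,
// not by MediaStream object identity.
// createStreamBinder, makeVideo and release are hypothetical names.
function createStreamBinder(makeVideo) {
  const tilesById = new Map(); // stream.id -> video element

  return {
    // Call from every stream-added event, first attach and re-attach alike.
    bind(stream) {
      let video = tilesById.get(stream.id);
      if (!video) {
        video = makeVideo();
        tilesById.set(stream.id, video);
      }
      video.srcObject = stream; // always use the object from the latest event
      return video;
    },
    // Call when a stream is really gone.
    release(streamId) {
      const video = tilesById.get(streamId);
      if (video) video.srcObject = null;
      tilesById.delete(streamId);
    },
    size: () => tilesById.size,
  };
}

// In the browser this could be wired up as (makeVideo creates a <video> element):
//   const binder = createStreamBinder(() => document.createElement("video"));
//   remote.on("stream-added", ({ stream }) => binder.bind(stream));
```

Because the key is stream.id, a re-fired stream-added with a new MediaStream object lands on the same tile instead of creating a second one. How you learn the id on removal depends on the stream-removed payload, which this tutorial does not show.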

Run it

npm install @metered-ca/realtime     # optional: only for bundler projects (index.html above loads the SDK from esm.sh)
npx serve .                          # serves on http://localhost:3000

1. Open http://localhost:3000 in Tab A, click Join call, accept the camera prompt.
2. Open the same URL in Tab B, click Join call. You now have a 1:1 call; the status pill reads connected and the log shows peer-joined.
3. Now break it. In Tab A, open Chrome DevTools (Cmd+Option+I on macOS, Ctrl+Shift+I on Windows/Linux) → Network tab → change the throttling dropdown from "No throttling" to Offline. (No DevTools? Toggle your machine's Wi-Fi off for ~5 seconds, then on.) What you should see:
   - The status pill flips to reconnecting (amber) within a second or two.
   - The log prints remote … : connected -> reconnecting, and the top-level peer: line shows the signaling socket dipping (joined -> reconnecting).
   - The video may freeze on its last frame; that's expected while the media path is being rebuilt.
4. Restore the network (set throttling back to "No throttling", or turn Wi-Fi back on). Within a few seconds:
   - The log prints remote … : reconnecting -> connected and stream-added (re)bound: ….
   - The pill returns to connected (green) and both tiles resume live video.

You just watched all three resilience layers fire — socket backoff, ICE-restart ladder, channel reconciliation — without writing any of them.
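
To put a number on what you just watched, the { from, to } transitions are enough to time each outage. The sketch below is one way to do it; createOutageTimer and onOutageEnd are hypothetical names, and the clock is injectable so the logic can be checked without a browser.

```javascript
// Sketch: measure how long each outage lasts from { from, to } transitions.
// createOutageTimer and onOutageEnd are hypothetical names; `now` is
// injectable so the logic can be exercised without real time passing.
function createOutageTimer(onOutageEnd, now = () => Date.now()) {
  let startedAt = null;

  return ({ from, to }) => {
    if (to === "reconnecting" && startedAt === null) {
      startedAt = now(); // outage begins
    } else if (from === "reconnecting" && startedAt !== null) {
      // Outage ends: either healed or gave up and closed.
      onOutageEnd({ recovered: to !== "closed", ms: now() - startedAt });
      startedAt = null;
    }
  };
}

// In the demo page this could be attached next to the existing listener:
//   remote.on("state-change", createOutageTimer(({ recovered, ms }) =>
//     log(`outage ${recovered ? "healed" : "ended"} after ${ms} ms`)));
```

The same handler shape works for the top-level peer's state-change, since both events carry { from, to }.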

A note on the prototype path: with a pk_live_ key on localhost, the call usually re-establishes on host/STUN candidates alone. Across real NATs the reconnect depends on a relay — see TURN, below. The local demo proves the state machine; production needs the TURN piece behind it.

WebRTC ICE restart, auto reconnect, and the `disconnected` state — how the three map

If you searched for "webrtc ice restart" or "webrtc auto reconnect" or "webrtc connection failed", here's how those raw-WebRTC concepts line up with what the SDK is doing for you:

| Raw WebRTC concept | What it is | What `@metered-ca/realtime` does |
| --- | --- | --- |
| `iceConnectionState: "disconnected"` | Transient path loss; may self-heal | Treated as recoverable; kicks off the per-peer ICE-restart ladder. Surfaces to you as `remote.state === "reconnecting"`. |
| `iceConnectionState: "failed"` | ICE gave up on the current candidates | The ICE-restart ladder gathers fresh candidates with new TURN creds (up to 9 attempts / ~121 s) instead of you calling `restartIce()` by hand. |
| `RTCPeerConnection.restartIce()` / `createOffer({ iceRestart: true })` | The manual ICE-restart primitives | Run for you on the ladder; you never call them. |
| Signaling channel down | No path to renegotiate over | The signaling WebSocket reconnects itself (exp backoff ~500 ms→30 s, ~100 attempts) so renegotiation has a channel. |
| "auto reconnect" (the pattern) | Detect → restart → renegotiate → re-attach media | The whole pattern, automatic. Your only job: re-bind on `stream-added`, observe `state-change`. |

The takeaway: "WebRTC auto reconnect" isn't one switch — it's that whole ladder. Doing it by hand means wiring all five rows yourself and racing your own retries against the browser's. Here it's the SDK's job.

The hard part: reconnecting to the same peer (identity preservation)

Restarting ICE is the easy half. The half that bites people is identity: after the network heals, is this the same call, or did you just create a brand-new peer with a brand-new ID and lose all the per-peer state you'd built up (who they are, their metadata, your UI tile keyed to them)?

This is the sharp edge where older peer-ID libraries struggle — many key everything off a connection-scoped ID, so when the transport is rebuilt you effectively get a new peer and have to reconcile it yourself. @metered-ca/realtime is built the other way around:

The RemotePeer object reference is preserved across the drop — same === identity, same remote.id, same remote.metadata. The handler you registered in peer-joined keeps working; you don't re-wire anything.
Only the transport underneath is swapped — a fresh RTCPeerConnection with new TURN credentials.
So your tile, your remote.on("state-change") listener, and any per-peer state stay valid. You react to reconnecting/connected; you don't rebuild identity.

That's the wedge of this whole tutorial: the connection is disposable; the peer is not. Identity survives the drop, the media path is rebuilt under it.

Common pitfalls

#1 footgun - never cache remote.pc across a reconnect. This is the single mistake that turns "it just works" into "it works until the first Wi-Fi blip." The remote peer object is stable across a reconnect (same === identity, same remote.id, same metadata), but the SDK swaps the underlying RTCPeerConnection for a fresh one with new ICE/TURN. So if you reach for the low-level connection via the documented remote.pc escape hatch (to read stats, add a custom track, open a data channel), a handle you grabbed before the drop points at a dead PC afterward. Re-read remote.pc only after that peer reports state-change → connected. In this tutorial we never touch pc — the SDK re-attaches media for us — which is exactly why this demo survives a reconnect for free.

The remote MediaStream object isn't stable either — stream.id is. Same root cause. On reconcile the SDK re-fires stream-added with a new MediaStream object but the same stream.id. If you keyed UI off the stream object, it'll look "lost." Bind directly from the event payload every time it fires (as the demo does), or key off stream.id. And note: stream-removed is suppressed during reconcile — so a brief drop won't trick you into tearing the tile down.
Don't write a reconnect loop. Coming from older peer-ID libraries, the instinct is to listen for a disconnect and call something like peer.reconnect(). There is no such call here, and you don't want one — manual reconnect logic racing the SDK's own backoff is how you get the "socket opens but no events fire" class of bug. Recovery is automatic; you only observe it.
close() is terminal - it is not "disconnect". peer.close(reason?) permanently tears the instance down; you can't join() on it again, and it will not auto-recover. It's for intentional teardown (user hangs up, component unmounts), not for handling a blip. If you call close() expecting it to reconnect later, nothing will — construct a fresh MeteredPeer for a new session. Conflating user-initiated disconnect with accidental drops is the classic peer-ID-library footgun: an intentional teardown and a transient ICE disconnected are not the same event, and treating them the same is what breaks recovery.
Reconnect across real NATs needs TURN. On localhost the recovery looks free because host candidates always work. Across symmetric NATs and corporate firewalls, rebuilding the media path requires a relay — without TURN, the ICE-restart ladder has nothing to restart onto. See Next steps.
getUserMedia still needs HTTPS or localhost. Serve the file; never open file://.
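
For the remote.pc pitfall in particular, the safe pattern is to never store the connection and instead re-read it on every transition to connected. A minimal sketch, assuming only what this page describes (remote.on("state-change") with a { from, to } payload, and the remote.pc escape hatch); onEveryConnected is a hypothetical helper name.

```javascript
// Sketch: never hold remote.pc; re-read it each time the peer reports "connected".
// onEveryConnected is a hypothetical helper. remote.on and remote.pc are used
// only as described in this tutorial.
function onEveryConnected(remote, usePc) {
  remote.on("state-change", ({ to }) => {
    if (to === "connected") {
      // Fresh read: after a reconcile this is a different RTCPeerConnection.
      usePc(remote.pc);
    }
  });
}

// Possible use in the browser, with the standard getStats() call:
//   onEveryConnected(remote, async (pc) => {
//     const report = await pc.getStats();
//     report.forEach((stat) => { /* inspect stat.type, stat.bytesReceived, ... */ });
//   });
```

Any long-running work started inside usePc (a stats polling interval, for example) should itself stop when the peer leaves connected, otherwise it keeps polling the dead connection.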

FAQ

How do I reconnect a WebRTC call after a network change (Wi-Fi → cellular)?
You don't do it by hand. A network change flips ICE to disconnected/failed; @metered-ca/realtime treats that as recoverable and runs the ICE-restart ladder (fresh candidates + TURN creds, up to 9 attempts / ~121 s) while the signaling WebSocket reconnects underneath. You react to the per-peer state-change (reconnecting → connected) and re-bind the stream when stream-added re-fires.

Does the reconnect give me the same peer, or a new one?
The same one. The RemotePeer object reference, remote.id, and remote.metadata are all preserved across the drop — only the underlying RTCPeerConnection is swapped. That's the identity-preservation guarantee: your per-peer handlers and UI keyed to that peer stay valid.

What's the difference between ICE disconnected and failed here?
disconnected is a transient path loss that may self-heal; failed means ICE gave up on the current candidates. The SDK doesn't make you branch on them — both feed the same ICE-restart ladder, which gathers fresh candidates rather than waiting for the dead path to come back.

Can I just cache the RTCPeerConnection and reuse it after a reconnect?
No — that's the #1 footgun. remote.pc is a different object after a reconcile. Re-read it only after the peer reports state-change → connected; never hold a reference across a drop.

How is this different from older peer-ID libraries' reconnect (e.g. a manual reconnect() call)?
Categorically: older peer-ID libraries tend to expose a manual reconnect call and key state off a connection-scoped ID, so a transient disconnected can collapse into a terminal close and a rebuilt transport looks like a new peer. Here, recovery is automatic and the peer's identity is preserved across the rebuilt transport — you observe state, you don't drive reconnection.

Why does my reconnect work on localhost but fail for real users?
Because localhost recovers on host candidates, but real users behind symmetric NATs / firewalls need a relay. Without TURN the ICE-restart ladder has nothing to restart onto. Add TURN (next section) before you ship.
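
A quick way to check whether a relay is actually available is to look at the candidate types gathered during ICE. The sketch below parses standard ICE candidate lines (the typ field is host, srflx, prflx, or relay); the function names are hypothetical, and how you collect the lines, for example from an icecandidate listener on a raw RTCPeerConnection or from chrome://webrtc-internals, is left as an assumption.

```javascript
// Sketch: extract the candidate type from an ICE candidate line.
// "host" = local address, "srflx" = discovered via STUN, "relay" = via TURN.
function candidateType(candidateLine) {
  const match = /\btyp\s+(host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

// Tally the types seen during gathering so a missing "relay" is obvious.
function tallyCandidateTypes(candidateLines) {
  const counts = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const line of candidateLines) {
    const type = candidateType(line);
    if (type) counts[type] += 1;
  }
  return counts;
}

// Example lines use documentation-only addresses.
const sample = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.0.2.10 rport 54321",
];
console.log(tallyCandidateTypes(sample)); // { host: 1, srflx: 1, prflx: 0, relay: 0 }
```

A tally with relay at 0, as in the sample, means no TURN candidate was gathered, so there is no relayed path to fall back on behind a restrictive NAT or firewall.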

Next steps

Add TURN, or reconnects fail in the real world. This is not optional once you leave localhost. Metered's Open Relay Project provides 20 GB/month of free TURN with zero setup — the relay the ICE-restart ladder needs to rebuild a media path behind a firewall. It's the single most common reason a demo that "reconnects fine on my machine" fails for real users on mobile data or office Wi-Fi.
Deliver TURN credentials in a JWT. When you move off pk_live_ for production, switch to the tokenProvider (JWT) path. The SDK calls your provider on first connect and on every reconnect, so you can embed fresh iceServers/TURN credentials in the token's metadata; the client reads them from the welcome message and each rebuilt RTCPeerConnection gets working relay creds automatically. This is what makes reconnection robust in production — see the Realtime Messaging getting-started guide.
Reconnect a whole room, not one peer. This demo pins a single remote <video>. For a real room, attach the same per-peer state-change / stream-added handlers to every peer inside peer-joined, and render a tile each — each peer recovers independently, on its own ICE-restart ladder.
Read stats safely. Want a "reconnecting…" overlay driven by real ICE state, or bandwidth numbers? Reach for remote.pc to call getStats() — but obey the footgun: grab it fresh on connected, never hold it across a drop.
Start from the call instead. If you want the 1:1 call built up from scratch (camera, channel, fan-out) before adding resilience, the companion video-call tutorial walks the same @metered-ca/realtime call line by line.
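
For the whole-room step, the per-peer handlers from the demo can be wrapped in one function that keeps a tile per remote.id. This is a sketch under stated assumptions: attachRoom, createTile, and removeTile are hypothetical names, and the tile is a plain object standing in for whatever your UI renders; the events and payloads are the ones used earlier on this page.

```javascript
// Sketch: one tile per remote peer, keyed by the stable remote.id.
// attachRoom, createTile and removeTile are hypothetical names for this example.
function attachRoom(peer, { createTile, removeTile }) {
  const tiles = new Map(); // remote.id -> tile

  peer.on("peer-joined", ({ peer: remote }) => {
    const tile = createTile(remote.id);
    tiles.set(remote.id, tile);

    // Each peer reports its own recovery, independently of the others.
    remote.on("state-change", ({ to }) => {
      tile.status = to;
    });
    // Re-fires after a reconnect, so this also handles the re-bind.
    remote.on("stream-added", ({ stream }) => {
      tile.stream = stream;
    });
    remote.on("stream-removed", () => {
      tile.stream = null;
    });
  });

  // Only a real departure removes the tile.
  peer.on("peer-left", ({ peer: remote }) => {
    const tile = tiles.get(remote.id);
    if (tile) removeTile(tile);
    tiles.delete(remote.id);
  });

  return tiles;
}
```

In a real page, createTile would build a video element and tile.stream would map to its srcObject; the Map keyed by remote.id is what lets one peer sit in reconnecting while the others stay live.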

Recipe (for skimmers)

WebRTC reconnect in @metered-ca/realtime is listeners, not logic:

import { MeteredPeer } from "@metered-ca/realtime";

const peer = new MeteredPeer({ apiKey: "pk_live_…" });

peer.on("peer-joined", ({ peer: remote }) => {
  // Remote states: idle | connecting | connected | reconnecting | closed
  remote.on("state-change", ({ from, to }) => {
    // "reconnecting" during a blip, "connected" when healed
    updatePill(to);
  });
  // Re-fires on reconcile with a NEW MediaStream (same stream.id) - just re-bind.
  remote.on("stream-added", ({ stream }) => {
    remoteVideo.srcObject = stream;
  });
});

peer.addStream(localStream);   // fans out to the channel
await peer.join("room-42");

Transient ≠ terminal. A network drop fires the per-peer state-change ({ from, to }: reconnecting → connected); only close() is terminal. Don't write a reconnect loop.
Re-bind on stream-added, never cache. The remote peer object and remote.id are stable, but its RTCPeerConnection (remote.pc) and MediaStream object are swapped on reconnect — re-bind from the re-fired stream-added (same stream.id).
TURN is the production dependency. Across real NATs the ICE-restart ladder needs a relay; add Open Relay (20 GB/mo free) before you ship.

Last reviewed: 2026-06-03.

Verified against @metered-ca/realtime@1.0.7 (latest on npm; resolves on esm.sh) and the live Metered docs (llms-realtime-messaging.txt, llms-realtime-messaging-sdk.txt, re-fetched 2026-06-03): state-change payload is { from, to }; RemotePeer has no .streams array (streams arrive via the stream-added event, which re-fires on reconcile with a new MediaStream but the same stream.id); top-level peer states are idle | joining | joined | reconnecting | leaving | closed and remote-peer states are idle | connecting | connected | reconnecting | closed; peer-joined/peer-left carry { peer }; bundle ~13 KB gzipped (WebRTC included); free TURN = 20 GB/month via Open Relay. Sources: https://www.metered.ca/docs/llms-realtime-messaging.txt · https://www.metered.ca/docs/llms-realtime-messaging-sdk.txt · https://www.metered.ca/tools/openrelay/ · https://www.npmjs.com/package/@metered-ca/realtime