Nithin Bharadwaj

How to Build Modern WebRTC Applications: From Simple Peer Connections to Multi-User Video Chat

Building a modern WebRTC application feels like constructing a bridge between two computers anywhere in the world, allowing them to send video, audio, and data directly to each other in the blink of an eye. I want to walk you through how this bridge is built, piece by piece, from the simple idea to a system that can handle a room full of people. We'll use code to make the ideas clear, showing you the gears turning behind the scenes.

Think of two browsers trying to talk. They can't just call out to each other by name; they need an introduction. This is where signaling comes in. Before any direct video stream flows, the browsers need to exchange formal introductions and network coordinates. A signaling server, often a simple WebSocket server, handles this matchmaking. It doesn't carry the heavy media traffic; it just passes notes so the peers can find each other.

Here’s what that introductory server looks like. It manages rooms and lets peers exchange the crucial information needed to connect.

// A simple signaling server using Node.js and WebSocket
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

const activeRooms = {};

wss.on('connection', function connection(ws) {
  let currentRoom = null;
  let currentUser = null;

  ws.on('message', function incoming(rawMessage) {
    let message;
    try {
      message = JSON.parse(rawMessage);
    } catch (err) {
      return; // Ignore malformed messages instead of crashing the server
    }

    if (message.type === 'join-room') {
      currentRoom = message.roomId;
      currentUser = message.userId;

      if (!activeRooms[currentRoom]) {
        activeRooms[currentRoom] = {};
      }

      // Store this user's connection in the room
      activeRooms[currentRoom][currentUser] = ws;

      // Tell everyone else in the room that a new user arrived
      broadcastToRoom(currentRoom, currentUser, {
        type: 'new-peer',
        userId: currentUser
      });
    }

    if (message.type === 'signal') {
      // A peer is sending an offer, answer, or network candidate
      const targetUser = message.targetUserId;
      const targetWs = activeRooms[currentRoom]?.[targetUser];
      if (targetWs && targetWs.readyState === WebSocket.OPEN) {
        targetWs.send(JSON.stringify({
          type: 'signal',
          from: currentUser,
          data: message.data
        }));
      }
    }

    if (message.type === 'leave-room') {
      removeUser(currentRoom, currentUser);
    }
  });

  ws.on('close', () => {
    removeUser(currentRoom, currentUser);
  });

  function broadcastToRoom(roomId, excludeUserId, data) {
    const room = activeRooms[roomId];
    if (!room) return;
    for (const [userId, userSocket] of Object.entries(room)) {
      if (userId !== excludeUserId && userSocket.readyState === WebSocket.OPEN) {
        userSocket.send(JSON.stringify(data));
      }
    }
  }

  function removeUser(roomId, userId) {
    if (roomId && activeRooms[roomId]?.[userId]) {
      delete activeRooms[roomId][userId];
      broadcastToRoom(roomId, null, {
        type: 'peer-left',
        userId: userId
      });
      // Clean up empty rooms
      if (Object.keys(activeRooms[roomId]).length === 0) {
        delete activeRooms[roomId];
      }
    }
  }
});
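
On the browser side, the messages this server expects can be built with a few small helpers. The sketch below is one possible shape, assuming the `join-room`, `signal`, and `leave-room` message types from the server above. The names `createSignalingClient` and `handleServerMessage`, and the injected `send` callback, are illustrative and not part of any library.

```javascript
// Hypothetical client-side helpers for the signaling protocol shown above.
// `send` is any function that delivers a string, e.g. (s) => socket.send(s).
function createSignalingClient(send, roomId, userId) {
  const emit = (message) => send(JSON.stringify(message));

  return {
    // Announce ourselves so the server stores our socket under roomId/userId
    join() {
      emit({ type: 'join-room', roomId, userId });
    },
    // Relay an offer, answer, or ICE candidate to one specific peer
    signalTo(targetUserId, data) {
      emit({ type: 'signal', targetUserId, data });
    },
    leave() {
      emit({ type: 'leave-room' });
    }
  };
}

// Dispatch a raw server message ('new-peer', 'signal', 'peer-left')
// to the matching handler, if one is registered
function handleServerMessage(raw, handlers) {
  const message = JSON.parse(raw);
  const handler = handlers[message.type];
  if (handler) handler(message);
  return message.type;
}
```

Passing `send` in as a callback keeps the helpers independent of the transport, so the same code works with a browser `WebSocket` or a test double.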

With the introduction made, the real work begins. Each browser creates an RTCPeerConnection object. This is the core engine. But networks are tricky. Most devices sit behind routers that perform network address translation (NAT), often with a firewall as well, so they have no publicly reachable address of their own. To find a direct path, browsers use ICE, a framework that gathers all possible connection addresses. It uses STUN servers to discover your public address, and if no direct path works, TURN servers relay the traffic, which adds latency and server cost.
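
As a rough illustration, the configuration passed to `RTCPeerConnection` can list both kinds of server. This is a minimal sketch: `buildIceConfig` is a made-up helper, and the TURN URL, username, and credential are placeholders for whatever your own relay uses.

```javascript
// Build an RTCConfiguration with a public STUN server and an optional TURN relay.
// The TURN URL and credentials are placeholders; substitute your own server's values.
function buildIceConfig({ turn = null, relayOnly = false } = {}) {
  const iceServers = [{ urls: 'stun:stun1.l.google.com:19302' }];

  if (turn) {
    iceServers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential
    });
  }

  return {
    iceServers,
    // 'all' lets ICE try direct paths first; 'relay' forces traffic through TURN,
    // which is handy for verifying that the relay actually works
    iceTransportPolicy: relayOnly ? 'relay' : 'all'
  };
}

const exampleConfig = buildIceConfig({
  turn: {
    urls: 'turn:turn.example.com:3478', // hypothetical server
    username: 'demo-user',
    credential: 'demo-pass'
  }
});
// In a browser: new RTCPeerConnection(exampleConfig)
```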

Managing this connection is complex. You need to handle offers, answers, network candidate exchanges, and watch for when the connection drops. I usually wrap this logic in a class to keep things tidy. Here is a robust peer connection manager.

class RobustPeer {
  constructor(options = {}) {
    this.rtcConfig = {
      iceServers: [
        { urls: 'stun:stun1.l.google.com:19302' }
      ],
      ...options.config
    };
    this.pc = null;
    this.connected = false;
    this.remoteStreams = [];
    this.onTrackCallback = null;
    this.onStateChange = null;
  }

  // Start the process of calling another peer
  async makeCall() {
    this.pc = new RTCPeerConnection(this.rtcConfig);
    this.setupListeners();

    // Get the local camera/microphone
    const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    localStream.getTracks().forEach(track => this.pc.addTrack(track, localStream));

    // Create the formal "offer"
    const offer = await this.pc.createOffer();
    await this.pc.setLocalDescription(offer);

    // In a real app, you'd send this offer via the signaling server
    return offer;
  }

  // Handle an incoming call
  async receiveCall(incomingOffer) {
    this.pc = new RTCPeerConnection(this.rtcConfig);
    this.setupListeners();

    // Set the remote description (the offer from the other peer)
    await this.pc.setRemoteDescription(new RTCSessionDescription(incomingOffer));

    // Get local media
    const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    localStream.getTracks().forEach(track => this.pc.addTrack(track, localStream));

    // Create and send the "answer"
    const answer = await this.pc.createAnswer();
    await this.pc.setLocalDescription(answer);
    return answer;
  }

  setupListeners() {
    if (!this.pc) return;

    // When the other peer adds a track (sends video/audio), we get it here
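    this.pc.ontrack = (event) => {
      const [stream] = event.streams;
      // ontrack fires once per track, so audio and video arrive separately
      // but share one stream; only record and report each stream once
      if (!stream || this.remoteStreams.includes(stream)) return;
      this.remoteStreams.push(stream);
      if (this.onTrackCallback) {
        this.onTrackCallback(stream);
      }
    };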

    // Listen for ICE candidates (network path information)
    this.pc.onicecandidate = (event) => {
      if (event.candidate) {
        // Send this candidate to the other peer via signaling
        sendSignalMessage({
          type: 'ice-candidate',
          candidate: event.candidate
        });
      }
    };

    // Monitor connection health
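    this.pc.oniceconnectionstatechange = () => {
      const state = this.pc.iceConnectionState;
      // Keep the `connected` flag in sync with the actual ICE state
      this.connected = state === 'connected' || state === 'completed';
      if (this.onStateChange) this.onStateChange(state);

      if (state === 'failed') {
        // Try to recover the connection
        this.recoverConnection();
      }
    };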
  }

  // Add a received ICE candidate to the connection
  async addIceCandidate(candidate) {
    if (this.pc && this.pc.remoteDescription) {
      try {
        await this.pc.addIceCandidate(new RTCIceCandidate(candidate));
      } catch (err) {
        console.error('Error adding ICE candidate:', err);
      }
    }
  }

  async recoverConnection() {
    console.log('Connection failed, attempting recovery...');
    // This might involve creating a new offer with 'iceRestart'
    try {
      const newOffer = await this.pc.createOffer({ iceRestart: true });
      await this.pc.setLocalDescription(newOffer);
      // Send this new offer through signaling
    } catch (err) {
      console.error('Recovery attempt failed:', err);
    }
  }

  hangUp() {
    if (this.pc) {
      this.pc.close();
      this.pc = null;
      this.connected = false;
      this.remoteStreams = [];
    }
  }
}

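// Helper function to send a signal through your signaling server.
// The server shown earlier only relays messages of type 'signal' that carry a
// targetUserId, so the payload is wrapped in that envelope here.
// `signalingSocket` (an open WebSocket) and `remoteUserId` are assumed to be
// set up elsewhere in your app.
function sendSignalMessage(msg) {
  signalingSocket.send(JSON.stringify({
    type: 'signal',
    targetUserId: remoteUserId,
    data: msg
  }));
}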
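
One detail worth handling: the `addIceCandidate` method above skips any candidate that arrives before the remote description is set, and over a real network candidates can easily beat the offer or answer. A small buffer avoids losing them. The sketch below is one way to do it; `PendingCandidates` is a hypothetical helper, not a WebRTC API, and it takes the function that applies a candidate as a parameter.

```javascript
// Buffers ICE candidates until the peer connection is ready to accept them.
// `applyCandidate` is whatever function hands a candidate to the connection,
// e.g. (c) => pc.addIceCandidate(c). This class is a sketch, not a WebRTC API.
class PendingCandidates {
  constructor(applyCandidate) {
    this.applyCandidate = applyCandidate;
    this.ready = false;
    this.queue = [];
  }

  // Call for every candidate received from signaling
  add(candidate) {
    if (this.ready) {
      this.applyCandidate(candidate);
    } else {
      this.queue.push(candidate);
    }
  }

  // Call once setRemoteDescription() has resolved; returns how many were flushed
  markReady() {
    this.ready = true;
    const pending = this.queue;
    this.queue = [];
    pending.forEach((candidate) => this.applyCandidate(candidate));
    return pending.length;
  }
}
```

In use, you would route every incoming candidate through `add()` and call `markReady()` right after `setRemoteDescription()` completes in `receiveCall()` or when the answer is applied.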

Video and audio are just one part. What about sending a file, a chat message, or game data? This is where data channels shine. They let you send any data directly between peers with configurable reliability. Want your chat messages to arrive in order and guaranteed? Set it up. Don't care if a few position updates are lost in a fast-paced game? You can configure for that too.
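
To make those trade-offs concrete, here is a small sketch of option presets for `createDataChannel`. The preset names and the 150 ms deadline are invented for illustration; only `ordered`, `maxRetransmits`, and `maxPacketLifeTime` are real option keys. A channel is reliable when both limits are left unset, and a channel may set at most one of them.

```javascript
// Illustrative presets for RTCDataChannel options; the preset names are made up.
// Only `ordered`, `maxRetransmits`, and `maxPacketLifeTime` are real option keys.
const CHANNEL_PRESETS = {
  // Reliable and ordered: both limits left unset, so lost packets are retransmitted
  chat: { ordered: true },
  // Fire-and-forget: never retransmit, and let newer updates overtake older ones
  gameState: { ordered: false, maxRetransmits: 0 },
  // Best effort within a deadline: stop retrying a message after 150 ms
  liveCursor: { ordered: false, maxPacketLifeTime: 150 }
};

function channelOptionsFor(preset, overrides = {}) {
  const base = CHANNEL_PRESETS[preset];
  if (!base) throw new Error(`Unknown preset: ${preset}`);

  const options = { ...base, ...overrides };
  // A channel may set at most one of the two limits; createDataChannel() rejects both
  if (options.maxRetransmits !== undefined && options.maxPacketLifeTime !== undefined) {
    throw new Error('Set maxRetransmits or maxPacketLifeTime, not both');
  }
  return options;
}

// In a browser: pc.createDataChannel('chat', channelOptionsFor('chat'))
```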

Managing multiple data channels for different purposes needs care. Here’s a manager that handles creation, sending, buffering, and basic health checks.

class AppDataChannel {
  constructor(peerConnection, label = 'default') {
    this.pc = peerConnection;
    this.label = label;
    this.channel = null;
    this.messageQueue = [];
    this.isReady = false;
  }

  // Create and set up the data channel
  initialize(options = {}) {
    const config = {
      ordered: true, // Messages arrive in the order sent
      // Leave maxRetransmits and maxPacketLifeTime unset for a reliable channel.
      // Pass maxRetransmits: 0 in options for fire-and-forget delivery.
      ...options
    };

    this.channel = this.pc.createDataChannel(this.label, config);

    this.channel.onopen = () => {
      console.log(`Data channel '${this.label}' is open`);
      this.isReady = true;
      this.flushQueue(); // Send any messages that were waiting
    };

    this.channel.onclose = () => {
      console.log(`Data channel '${this.label}' closed`);
      this.isReady = false;
    };

    this.channel.onmessage = (event) => {
      console.log(`Received on ${this.label}:`, event.data);
      this.handleMessage(event.data);
    };

    this.channel.onerror = (error) => {
      console.error(`Data channel '${this.label}' error:`, error);
    };
  }

  // Send data. If channel isn't open, queue it.
  send(data) {
    const packet = JSON.stringify({
      timestamp: Date.now(),
      payload: data
    });

    if (this.isReady && this.channel.readyState === 'open') {
      this.channel.send(packet);
    } else {
      console.log(`Channel not ready, queuing message. Queue size: ${this.messageQueue.length}`);
      this.messageQueue.push(packet);
      // Limit queue size to prevent memory issues
      if (this.messageQueue.length > 100) {
        this.messageQueue.shift();
      }
    }
  }

  // Send all queued messages once the channel opens
  flushQueue() {
    while (this.messageQueue.length > 0 && this.isReady) {
      const msg = this.messageQueue.shift();
      this.channel.send(msg);
    }
  }

  // Override this method to handle incoming data in your app
  handleMessage(rawData) {
    try {
      const data = JSON.parse(rawData);
      console.log('Parsed message:', data);
      // Your app logic here. E.g., update UI, process a file chunk.
    } catch (e) {
      console.log('Received non-JSON data, perhaps a file chunk.');
      // Handle binary data
    }
  }

  // Send a file in chunks
  sendFile(file) {
    const CHUNK_SIZE = 16384; // 16KB
    const fileId = Date.now();
    const totalChunks = Math.ceil(file.size / CHUNK_SIZE);
    let chunkIndex = 0;

    const reader = new FileReader();
    reader.onload = (e) => {
      const chunk = e.target.result;
      this.send({
        type: 'file-chunk',
        fileId: fileId,
        name: file.name,
        mimeType: file.type, // Separate key so it doesn't overwrite `type` above
        size: file.size,
        chunkIndex: chunkIndex,
        totalChunks: totalChunks,
        data: chunk
      });

      chunkIndex++;
      if (chunkIndex < totalChunks) {
        readNextChunk();
      } else {
        console.log(`File ${file.name} sent.`);
      }
    };

    function readNextChunk() {
      const start = chunkIndex * CHUNK_SIZE;
      const end = Math.min(start + CHUNK_SIZE, file.size);
      const slice = file.slice(start, end);
      reader.readAsDataURL(slice); // Or use ArrayBuffer for pure binary
    }

    readNextChunk();
  }
}

// Usage example within your peer setup:
// const dataManager = new AppDataChannel(myRobustPeer.pc, 'chat');
// dataManager.initialize({ ordered: true });
// dataManager.send({ text: "Hello!", user: "Me" });
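
For large files, sending every chunk as fast as it can be read may fill the channel's send buffer. A common remedy is to pause when `bufferedAmount` grows and resume on the `bufferedamountlow` event. The following is a minimal sketch of that idea; `sendWithBackpressure` and the 64 KB limit are assumptions, while `bufferedAmount`, `bufferedAmountLowThreshold`, and `onbufferedamountlow` are standard `RTCDataChannel` members.

```javascript
// Sends queued chunks only while the channel's send buffer is below a limit.
// `channel` is expected to look like an RTCDataChannel: it needs send(),
// bufferedAmount, bufferedAmountLowThreshold, and onbufferedamountlow.
function sendWithBackpressure(channel, chunks, highWaterMark = 64 * 1024) {
  const pending = chunks.slice();

  const pump = () => {
    while (pending.length > 0 && channel.bufferedAmount < highWaterMark) {
      channel.send(pending.shift());
    }
    if (pending.length === 0) {
      channel.onbufferedamountlow = null; // Done, stop listening
    }
  };

  // Ask the channel to notify us once the buffer drains to half the limit
  channel.bufferedAmountLowThreshold = highWaterMark / 2;
  channel.onbufferedamountlow = pump;
  pump();

  // Lets the caller check how many chunks are still waiting
  return () => pending.length;
}
```

The function takes chunks that are already prepared as strings or binary buffers, so it could sit between the chunk reader in `sendFile` and the raw `channel.send()` call.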

Direct peer-to-peer is perfect for a call between two people. But what about a meeting with ten? In a full mesh, each person uploads their stream to nine others and downloads nine in return, which adds up to 90 streams across the call. Nine simultaneous uploads alone cripple most home internet connections. This is where the architecture shifts. Instead of a mesh, we use a Selective Forwarding Unit (SFU). Think of it as a video switchboard in the cloud. Each person sends one stream to the SFU. The SFU then decides who needs which streams and sends them out. My device only receives the three videos I’m looking at, not all nine.
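
The arithmetic is easy to check in code. This sketch counts streams for both topologies; `streamLoad` and its `visibleTiles` parameter are illustrative names, and the model ignores audio-only streams and simulcast layers.

```javascript
// Count the video streams each topology needs for a call of `participants` people.
// `visibleTiles` models an SFU forwarding only the videos a user is watching.
function streamLoad(participants, visibleTiles = participants - 1) {
  const others = Math.max(participants - 1, 0);
  const received = Math.min(visibleTiles, others);

  return {
    mesh: {
      uploadsPerPeer: others,
      downloadsPerPeer: others,
      totalStreams: participants * others
    },
    sfu: {
      uploadsPerPeer: others > 0 ? 1 : 0,
      downloadsPerPeer: received,
      // Every peer's single upload plus everything the SFU forwards back out
      totalStreams: others > 0 ? participants * (1 + received) : 0
    }
  };
}

const tenPeople = streamLoad(10, 3);
console.log(tenPeople.mesh); // { uploadsPerPeer: 9, downloadsPerPeer: 9, totalStreams: 90 }
console.log(tenPeople.sfu);  // { uploadsPerPeer: 1, downloadsPerPeer: 3, totalStreams: 40 }
```

The number that matters most for a home connection is `uploadsPerPeer`, which drops from nine to one.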

Building the logic to decide between a simple mesh and an SFU is key for a scalable app. Here’s a conceptual controller that manages this.

class ConnectionStrategyManager {
  constructor(localPeerId) {
    this.localPeerId = localPeerId;
    this.mode = 'mesh'; // 'mesh' or 'sfu'
    this.peerConnections = new Map(); // For mesh: peerId -> RobustPeer
    this.sfuConnection = null;        // For SFU: one connection to the server
    this.activeStreams = new Map();   // peerId -> MediaStream
    this.participantThreshold = 4;    // Switch to SFU at 4+ participants
  }

  async addParticipant(remotePeerId, signalingOffer) {
    const participantCount = this.peerConnections.size;

    // Decision Logic
    if (participantCount >= this.participantThreshold && this.mode !== 'sfu') {
      console.log(`Switching to SFU mode with ${participantCount + 1} participants.`);
      await this.switchToSFUMode();
    }

    if (this.mode === 'mesh') {
      await this.establishMeshConnection(remotePeerId, signalingOffer);
    } else {
      await this.connectViaSFU(remotePeerId, signalingOffer);
    }
  }

  async establishMeshConnection(peerId, offer) {
    // Create a new direct peer connection for this one person
    const newPeer = new RobustPeer();

    // Register the handler before negotiating so no track event is missed
    newPeer.onTrackCallback = (stream) => {
      this.activeStreams.set(peerId, stream);
      this.updateVideoDisplay(peerId, stream);
    };

    // receiveCall() also captures our camera/microphone and adds the tracks,
    // so the answer already carries our media. Calling makeCall() afterwards
    // would replace the connection that was just negotiated.
    const answer = await newPeer.receiveCall(offer);
    this.peerConnections.set(peerId, newPeer);

    // ... send answer via signaling to the remote peer
    return answer;
  }

  async switchToSFUMode() {
    this.mode = 'sfu';
    // 1. Close all existing mesh connections
    for (const [peerId, pc] of this.peerConnections) {
      pc.hangUp();
    }
    this.peerConnections.clear();

    // 2. Establish a single connection to the SFU server
    this.sfuConnection = new RobustPeer();

    // 3. The SFU will send us back streams from other participants.
    //    Register the handler before negotiating so no track event is missed.
    this.sfuConnection.onTrackCallback = (stream) => {
      // The SFU should signal which peer this stream belongs to.
      // getPeerIdFromStream is a placeholder you'd implement for that mapping.
      const sourcePeerId = getPeerIdFromStream(stream);
      this.activeStreams.set(sourcePeerId, stream);
      this.updateVideoDisplay(sourcePeerId, stream);
    };

    // 4. Negotiate with the SFU. `sfuServerOffer` is a placeholder for the offer
    //    delivered by your SFU server's signaling. receiveCall() also adds our
    //    local camera/microphone tracks, so our stream goes to the SFU exactly once.
    await this.sfuConnection.receiveCall(sfuServerOffer);
  }

  async connectViaSFU(peerId, signalingData) {
    // In SFU mode, we don't create a direct connection.
    // We tell the SFU server about the new participant via signaling.
    // The SFU server will then manage forwarding streams.
    sendSignalToSFUServer({
      type: 'add-participant',
      newPeerId: peerId,
      details: signalingData
    });
  }

  updateVideoDisplay(peerId, stream) {
    // Your app's UI logic to create/update a <video> element
    let videoElement = document.getElementById(`video-${peerId}`);
    if (!videoElement) {
      // First stream from this peer: create the element and attach it once
      videoElement = document.createElement('video');
      videoElement.id = `video-${peerId}`;
      videoElement.autoplay = true;
      videoElement.playsInline = true;
      document.getElementById('video-container').appendChild(videoElement);
    }
    videoElement.srcObject = stream;
  }

  // Dynamically adjust quality based on network stats
  async adaptQuality() {
    if (this.mode === 'sfu' && this.sfuConnection?.pc) {
      const stats = await this.sfuConnection.pc.getStats();
      let packetLossRate = 0;

      stats.forEach(report => {
        if (report.type === 'inbound-rtp' && report.kind === 'video') {
          // Fields missing from a report are undefined, not null, so check the type
          if (typeof report.packetsLost === 'number' && typeof report.packetsReceived === 'number') {
            const total = report.packetsLost + report.packetsReceived;
            packetLossRate = total > 0 ? report.packetsLost / total : 0;
          }
        }
      });

      // Example adaptation: limit our outgoing video bitrate if network is poor
      const senders = this.sfuConnection.pc.getSenders();
      if (packetLossRate > 0.05) { // 5% loss
        console.log('High packet loss, reducing outgoing video quality.');
        senders.forEach(sender => {
          if (sender.track?.kind === 'video') {
            const params = sender.getParameters();
            if (!params.encodings) params.encodings = [{}];
            params.encodings.forEach(enc => {
              enc.maxBitrate = 250000; // Limit to 250 kbps
              enc.scaleResolutionDownBy = 2.0; // Reduce resolution
            });
            sender.setParameters(params).catch(console.error);
          }
        });
      }
    }
  }
}

The final piece is understanding that it’s not just about getting a connection, but maintaining a good one. You need to listen to the statistics the connection provides—packet loss, latency, available bandwidth—and adapt. If the network is poor, maybe you lower the video resolution or switch to audio only. This adaptability is what separates a working demo from a professional application.
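
As one possible way to turn statistics into decisions, the sketch below computes packet loss over the most recent interval and maps it to a quality tier. The counters in an `inbound-rtp` report are cumulative, so two samples are compared. The function names, thresholds, and bitrates are illustrative guesses to tune for your own app.

```javascript
// packetsLost and packetsReceived in an 'inbound-rtp' report are cumulative,
// so compare two samples to get the loss rate for the most recent interval.
function lossRateBetween(previous, current) {
  const lost = current.packetsLost - previous.packetsLost;
  const received = current.packetsReceived - previous.packetsReceived;
  const total = lost + received;
  return total > 0 ? Math.max(lost, 0) / total : 0;
}

// Map network health to send settings. The thresholds and bitrates are
// illustrative guesses to tune for your app, not values from any standard.
function pickQualityTier({ lossRate, rttMs }) {
  if (lossRate > 0.15 || rttMs > 800) {
    return { name: 'audio-only', videoEnabled: false };
  }
  if (lossRate > 0.05 || rttMs > 300) {
    return { name: 'low', videoEnabled: true, maxBitrate: 250000, scaleResolutionDownBy: 2 };
  }
  return { name: 'high', videoEnabled: true, maxBitrate: 1500000, scaleResolutionDownBy: 1 };
}

const rate = lossRateBetween(
  { packetsLost: 10, packetsReceived: 990 },
  { packetsLost: 20, packetsReceived: 1080 }
);
console.log(rate, pickQualityTier({ lossRate: rate, rttMs: 120 }).name); // 0.1 low
```

The `maxBitrate` and `scaleResolutionDownBy` values from a tier can be applied to each video sender's encodings through `sender.setParameters()`, and the `audio-only` tier could disable the local video track.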

Building with WebRTC is an exercise in managing state, handling failure gracefully, and making smart trade-offs between latency, quality, and scalability. You start with a simple peer link, then add a signaling server to introduce them, then perhaps a TURN server to bypass difficult networks, and finally an SFU to bring everyone together. The code patterns we walked through—signaling, connection management, data channels, and strategic forwarding—form the foundation. From here, you can build video calls, live audio streams, collaborative whiteboards, or even real-time games, all powered by this direct connection between browsers.
