DEV Community

Aditya

Building Real-Time Broadcast Calling with WebRTC and WebSockets

This article walks through implementing a complete broadcast calling system in Next.js that lets users initiate group calls and establish peer-to-peer audio/video connections using WebRTC and WebSockets.

The overall system architecture has four parts:

  • WebSockets handle real-time signaling - coordinating call setup, participant management, and WebRTC negotiation messages between clients and server through persistent bidirectional connections.
  • WebRTC manages peer-to-peer media streaming - audio/video transmission, codec negotiation, and network traversal through NAT/firewalls using ICE candidates for direct client-to-client communication.
  • React Context provides centralised state management - ensuring consistent call state across components while avoiding prop drilling and maintaining a single source of truth for connection status and media streams.
  • RESTful APIs handle business logic - managing call groups, authentication, call history, and backend integration for logging and analytics.

Let's dive into the setup:
[Figure: flowchart visualising the broadcast calling system flow]

The following sections outline the key parts of the calling flow depicted in the flowchart.

- WebSocket Connection

When users authenticate, we establish a persistent WebSocket connection and prepare media stream references. The connection is meant to survive network interruptions and browser state changes: it handles session validation, automatic reconnection with exponential backoff, and graceful fallback to polling when the WebSocket transport fails.
The WebSocket layer coordinates multiple event types: incoming call notifications, WebRTC signaling messages (offer/answer/ICE candidates), participant lifecycle events (joined/left), and call state changes (accepted/declined/ended).

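import { useCallback } from 'react';
import { io } from 'socket.io-client';

// Inside the socket provider component or hook: open the authenticated connection
const connect = useCallback(() => {
  const accessToken = getCookieValue('accessToken');

  socket.current = io(socketUrl, {
    auth: { token: accessToken },          // sent with the handshake for session validation
    transports: ['websocket', 'polling'],  // try WebSocket first, fall back to HTTP long-polling
    timeout: 10000,                        // connection timeout in milliseconds
    reconnection: true,
    reconnectionAttempts: 3,               // stop retrying after three failed attempts
  });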

  socket.current.on('connect', () => {
    console.log('WebSocket connected:', socket.current?.id);
    setIsConnected(true);
  });

  // Set up event listeners for call signaling
  socket.current.on('incoming_call', handleIncomingCall);
  socket.current.on('webrtc-offer', handleWebRTCOffer);
  socket.current.on('webrtc-answer', handleWebRTCAnswer);
  socket.current.on('ice-candidate', handleIceCandidate);
}, []);
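
To make the "exponential backoff" idea concrete, here is a minimal sketch of a capped backoff calculation. It is not the library's internal code; the option names simply mirror the Socket.IO client options `reconnectionDelay`, `reconnectionDelayMax`, and `randomizationFactor`, and the `random` parameter is a hypothetical addition so the jitter can be controlled.

```javascript
// Sketch of capped exponential backoff (an illustration, not Socket.IO internals).
function backoffDelay(attempt, options = {}) {
  const {
    reconnectionDelay = 1000,     // base delay in ms
    reconnectionDelayMax = 5000,  // upper bound in ms
    randomizationFactor = 0,      // 0 disables jitter, so results are deterministic
    random = Math.random,         // hypothetical: injectable source of randomness
  } = options;

  // Double the delay on every attempt: 1s, 2s, 4s, ...
  const exponential = reconnectionDelay * 2 ** attempt;
  // Jitter multiplier in the range [1 - factor, 1 + factor]
  const jitter = 1 + randomizationFactor * (random() * 2 - 1);
  return Math.min(Math.round(exponential * jitter), reconnectionDelayMax);
}

const delays = [0, 1, 2, 3].map((attempt) => backoffDelay(attempt));
console.log(delays); // [ 1000, 2000, 4000, 5000 ]
```

With `reconnectionAttempts: 3` as in the connection code above, only the first three delays would ever be used before the client gives up.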

- Media Stream Initialisation and Global State Management with React Context

The WebRTC context manages connection status, call metadata, media stream references, and participant information.

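import { createContext, useContext, useState } from 'react';

// Context object shared by the provider and the consumer hook
const WebRTCContext = createContext(null);

const WEBRTCContextProvider = ({ children }) => {
  // Connection lifecycle status, starting at 'idle'
  const [webRTCStatus, setWebRTCStatus] = useState('idle');
  // 'voice' or 'video'
  const [callType, setCallType] = useState('voice');
  // MediaStream references for the local and remote peers
  const [localStream, setLocalStream] = useState(null);
  const [remoteStream, setRemoteStream] = useState(null);

  return (
    <WebRTCContext.Provider value={{
      webRTCStatus, setWebRTCStatus,
      callType, setCallType,
      localStream, setLocalStream,
      remoteStream, setRemoteStream
    }}>
      {children}
    </WebRTCContext.Provider>
  );
};

// Consumer hook used by the call interface (assumed implementation)
const useWebRTCContext = () => useContext(WebRTCContext);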
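
Since `webRTCStatus` is a single string shared by every component, it can help to restrict how it changes. The sketch below shows one possible transition table. Only `'idle'` appears in the code above; every other status name and event name here is a hypothetical choice made for illustration.

```javascript
// Hypothetical statuses and events; only 'idle' comes from the provider above.
const TRANSITIONS = {
  idle:       { INCOMING_CALL: 'ringing', START_CALL: 'connecting' },
  ringing:    { ACCEPT: 'connecting', DECLINE: 'idle', CANCELLED: 'idle' },
  connecting: { CONNECTED: 'connected', FAILED: 'idle', END: 'idle' },
  connected:  { END: 'idle', FAILED: 'idle' },
};

// Returns the next status, or the current one when the event is not allowed.
function nextStatus(current, event) {
  const allowed = TRANSITIONS[current] || {};
  return allowed[event] || current;
}

console.log(nextStatus('idle', 'INCOMING_CALL')); // ringing
console.log(nextStatus('idle', 'CONNECTED'));     // idle (ignored)
```

Inside a component this could be applied with the functional updater form, for example `setWebRTCStatus((status) => nextStatus(status, 'ACCEPT'))`.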

- Initiating Broadcast Calls

Call initiation combines REST API calls for persistent state with WebSocket events for real-time notification. As soon as the user clicks the broadcast call button, an API request is sent with the call group ID. The API creates call records, generates unique identifiers, and provisions the WebRTC configuration, including ICE servers. The system then broadcasts call invitations to all group members through WebSocket events. The WebRTC configuration (ICE servers, call room identifiers, media constraints) reaches the initiator through the API response and reaches receivers through socket events.

// Group page - Broadcast call initiation
const handleCallgroup = async () => {
  setIsCalling(true);

  try {
    // 1. API call to create broadcast call
    const result = await broadcastCallToGroupAction({
      groupId: "group-uuid",
    });

    if (result.error) {
      toast.error(result.error);
      return;
    }

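    const { id: broadcastId, caller } = result.data;
    // caller.webrtc carries the WebRTC configuration (ICE servers, call room)
    // that is used later, when the peer connection is created
    const { webrtc } = caller;

    // 2. Store the call identifier and call type in shared state
    setBroadCastCallId(broadcastId);
    setCallType('voice'); // or 'video'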

    toast.success("Call broadcasted to group!");

  } catch (error) {
    toast.error("Failed to initiate call");
  } finally {
    setIsCalling(false);
  }
};
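
The `webrtc` object returned for the caller eventually has to become the configuration passed to `RTCPeerConnection`. The exact response shape is not shown in this article, so the sketch below assumes a hypothetical payload of the form `{ iceServers: [{ urls, username, credential }] }` and uses placeholder server hosts.

```javascript
// Maps a (hypothetical) server payload to an RTCConfiguration-shaped object.
function toRtcConfiguration(webrtc) {
  const servers = (webrtc && webrtc.iceServers) || [];
  const iceServers = servers.map((server) => {
    const entry = { urls: server.urls };
    // TURN servers usually need credentials; STUN servers do not
    if (server.username) entry.username = server.username;
    if (server.credential) entry.credential = server.credential;
    return entry;
  });
  return { iceServers };
}

// Example payload with placeholder hosts
const config = toRtcConfiguration({
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },
    { urls: 'turn:turn.example.org:3478', username: 'demo', credential: 'secret' },
  ],
});
console.log(config.iceServers.length); // 2

// In the browser: const pc = new RTCPeerConnection(config);
```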

- Handling Incoming Calls

As mentioned previously, all group members receive an 'incoming_call' socket event when a broadcast call is initiated.

// Socket event handler for incoming calls
socket.current.on('incoming_call', (callData) => {
  console.log('Incoming call received:', callData);

  const callType = callData.callType || 'voice';
  setCallType(callType);

  const incomingCallData = {
    callId: callData.callId,
    callerName: callData.callerName || 'Unknown Caller',
    type: callType,
    broadcastCallId: callData.broadcastCallId,
    timestamp: new Date().toISOString()
  };

  // Show incoming call modal
  setIncomingCall(incomingCallData);

  // Use the normalised name so a missing callerName never shows as "undefined"
  toast.info(`Incoming ${callType} call from ${incomingCallData.callerName}`);
});

An incoming call triggers an in-app modal dialog with a ringing status and an audio ringtone.

// IncomingCallModal component
function IncomingCallModal({ call, onAccept, onDecline, isAnswering }) {
  const audioRef = useRef<HTMLAudioElement>(null);

  useEffect(() => {
    if (call && audioRef.current) {
      // Start ringtone
      const audio = audioRef.current;
      audio.loop = true;
      audio.volume = 0.7;

      audio.play().catch(error => {
        console.warn('Could not play ringtone:', error);
      });

      return () => {
        audio.pause();
        audio.currentTime = 0;
      };
    }
  }, [call]);

  // Render nothing when there is no pending call
  if (!call) return null;

  return (
    <div className="fixed inset-0 z-50 flex items-center justify-center">
      <audio ref={audioRef} preload="auto" loop crossOrigin="anonymous">
        <source src="https://www.soundjay.com/misc/sounds/ringtone_1.mp3" />
      </audio>

      <div className="bg-white rounded-lg p-6 shadow-2xl">
        <h3>Incoming {call.type} Call</h3>
        <p>{call.callerName}</p>

        <div className="flex gap-3 mt-6">
          <button onClick={() => onDecline(call.callId)}>
            Decline
          </button>
          <button onClick={() => onAccept(call.callId)}>
            Accept
          </button>
        </div>
      </div>
    </div>
  );
}

- Call Acceptance and WebRTC Handshake

When a user accepts the call, we trigger the answer API and begin establishing the WebRTC peer connection. At the same time, a 'call-cancelled' socket event is sent to the other group members so that the call is cancelled for everyone else.
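
On the receiving side, handling 'call-cancelled' mostly means clearing the pending incoming call if it is the one that was cancelled. A minimal sketch follows, assuming the event payload carries a `broadcastCallId` (the payload shape is an assumption, not something shown in this article).

```javascript
// Returns the new incoming-call state after a 'call-cancelled' event.
// Assumed payload: { broadcastCallId }.
function applyCallCancelled(incomingCall, cancelled) {
  if (!incomingCall) return null;
  // Only dismiss the modal when the cancelled call is the one currently ringing
  return incomingCall.broadcastCallId === cancelled.broadcastCallId
    ? null
    : incomingCall;
}

// Possible wiring inside the socket setup:
// socket.current.on('call-cancelled', (event) =>
//   setIncomingCall((current) => applyCallCancelled(current, event)));
```
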
Connection Negotiation
WebRTC negotiation follows the standard offer/answer pattern with enhanced error handling. Initiators create offers after joining call rooms, receivers process offers and generate answers, and both parties exchange ICE candidates for optimal connection paths.
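
The offer/answer exchange described above can be sketched as three small functions. Here `pc` is an `RTCPeerConnection`, and `emit` is any function that sends a signaling message, for example `(event, payload) => socket.current.emit(event, payload)`. The event names match the listeners registered earlier; the payload shape `{ callId, description }` is an assumption.

```javascript
// Initiator: create an offer, apply it locally, send it to the peer
async function sendOffer(pc, emit, callId) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  emit('webrtc-offer', { callId, description: offer });
}

// Receiver: apply the remote offer, create an answer, send it back
async function answerOffer(pc, emit, { callId, description }) {
  await pc.setRemoteDescription(description);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  emit('webrtc-answer', { callId, description: answer });
}

// Initiator: apply the answer, which completes the SDP negotiation
async function acceptAnswer(pc, { description }) {
  await pc.setRemoteDescription(description);
}
```

ICE candidates are exchanged alongside this flow through the 'ice-candidate' event and applied with `pc.addIceCandidate(candidate)`.
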
Media Stream Management
Media acquisition involves requesting appropriate permissions, configuring audio/video constraints based on call type, and handling device availability and quality constraints.
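
As a rough illustration of choosing constraints by call type, the sketch below builds the object passed to `navigator.mediaDevices.getUserMedia` and maps common failure names to user-facing messages. The specific resolution values and message wording are illustrative choices, not the values used in this project.

```javascript
// Builds getUserMedia constraints for a 'voice' or 'video' call
function buildMediaConstraints(callType) {
  const audio = { echoCancellation: true, noiseSuppression: true };
  if (callType !== 'video') return { audio, video: false };
  return {
    audio,
    video: { width: { ideal: 1280 }, height: { ideal: 720 }, facingMode: 'user' },
  };
}

// Translates getUserMedia error names into messages (wording is illustrative)
function describeMediaError(error) {
  switch (error && error.name) {
    case 'NotAllowedError': return 'Microphone or camera permission was denied.';
    case 'NotFoundError':   return 'No microphone or camera was found.';
    case 'NotReadableError': return 'The device is already in use.';
    default: return 'Could not access media devices.';
  }
}

// In the browser:
// const stream = await navigator.mediaDevices.getUserMedia(buildMediaConstraints(callType));
```
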
Network Traversal
The system handles various network configurations through STUN servers for NAT detection, TURN servers for relay when direct connection fails, and ICE candidate processing for optimal path selection.
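
One way to see which path a connection ends up using is to look at the type of each ICE candidate: `host` is a local address, `srflx` is a public address discovered through STUN, and `relay` is an address allocated on a TURN server. The helper below is a small sketch that reads this from a candidate line; the sample lines use documentation-only IP addresses.

```javascript
// Extracts the candidate type ("host", "srflx", "prflx" or "relay") from an
// ICE candidate line, or returns null if none is present.
function candidateType(candidateLine) {
  const match = /\btyp\s+(host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? match[1] : null;
}

const samples = [
  'candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host',
  'candidate:2 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 54321',
  'candidate:3 1 udp 41885439 198.51.100.9 50000 typ relay raddr 203.0.113.7 rport 46154',
];
console.log(samples.map(candidateType)); // [ 'host', 'srflx', 'relay' ]
```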

This implementation uses the browser's RTCPeerConnection Web API.

I have discussed this in detail in this article.

- Call Interface and Media Stream Management in a Live Call

Once connected, users see the call interface and exchange media streams. The call interface adapts to the call type (voice/video), connection status, participant count, and available screen real estate. Video calls display the remote video as the primary view, with the local video as a picture-in-picture overlay.
Local streams are attached to muted audio elements and mirrored video elements. Remote streams are attached to playable audio elements and to the main video element, with proper aspect ratio handling.

Here is the P2P call interface:

// P2P Call Interface
function P2PCallInterface({ onEndCall, onToggleMute }) {
  const { webRTCStatus, callType, localStream, remoteStream } = useWebRTCContext();

  // Attach streams to the media elements once both the stream and the element exist
  useEffect(() => {
    if (localStream) {
      // Video calls preview the local camera; voice calls use the muted local audio element
      const localElement = document.getElementById(
        callType === 'video' ? 'localVideo' : 'localAudio'
      );
      if (localElement) localElement.srcObject = localStream;
    }

    if (remoteStream) {
      const remoteAudio = document.getElementById('remoteAudio');
      const remoteVideo = document.getElementById('remoteVideo');

      if (remoteAudio) remoteAudio.srcObject = remoteStream;
      if (callType === 'video' && remoteVideo) {
        remoteVideo.srcObject = remoteStream;
      }
    }
  }, [localStream, remoteStream, callType]);

  if (callType === 'video') {
    return (
      <div className="fixed inset-0 bg-black">
        {/* Remote video - main display; muted because remoteAudio plays the sound */}
        <video id="remoteVideo" autoPlay muted playsInline className="w-full h-full object-cover" />

        {/* Local video - picture-in-picture */}
        <video 
          id="localVideo" 
          autoPlay 
          muted 
          className="absolute top-4 right-4 w-48 h-36 object-cover rounded"
        />

        {/* Controls */}
        <div className="absolute bottom-8 left-1/2 transform -translate-x-1/2">
          <button onClick={onToggleMute}>Mute/Unmute</button>
          <button onClick={onEndCall}>End Call</button>
        </div>

        <audio id="remoteAudio" autoPlay />
      </div>
    );
  }

  // Voice call interface
  return (
    <div className="fixed inset-0 bg-black/75 flex items-center justify-center">
      <div className="bg-white rounded-lg p-6">
        <h3>Voice Call Active</h3>
        <p>Status: {webRTCStatus}</p>

        <div className="flex gap-4 mt-6">
          <button onClick={onToggleMute}>Mute/Unmute</button>
          <button onClick={onEndCall}>End Call</button>
        </div>

        <audio id="localAudio" autoPlay muted />
        <audio id="remoteAudio" autoPlay />
      </div>
    </div>
  );
}
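
The `onToggleMute` and `onEndCall` props are passed in from outside, so their bodies are not shown here. One common way to implement them is to flip `enabled` on the local audio tracks for muting and to call `stop()` on every track when the call ends. The sketch below assumes that approach; the function names are hypothetical, and the functions work with any object that exposes `getAudioTracks()` and `getTracks()`, such as a `MediaStream`.

```javascript
// Toggles the microphone; returns true when the stream is now muted
function toggleMute(stream) {
  const tracks = stream ? stream.getAudioTracks() : [];
  // If any audio track is currently live, the toggle should mute all of them
  const shouldMute = tracks.some((track) => track.enabled);
  tracks.forEach((track) => { track.enabled = !shouldMute; });
  return shouldMute;
}

// Releases the camera and microphone when the call ends
function releaseStream(stream) {
  if (!stream) return;
  stream.getTracks().forEach((track) => track.stop());
}
```

Disabling a track keeps the connection alive and sends silence, while stopping a track releases the device, which is why the two actions use different calls.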

Conclusion:

This broadcast calling system demonstrates how to combine WebSockets for signalling with WebRTC for media transmission.
