Building a reliable video calling platform is one of the toughest challenges in distributed systems. When millions of users rely on your platform to connect in real-time, a single network hiccup can cascade into degraded video quality, dropped frames, or worse, a disconnected call. Understanding how to architect a system that gracefully handles network volatility while supporting multi-party calls, screen sharing, and recording requires thoughtful design across multiple layers.
Architecture Overview
A video calling platform like Zoom needs several interconnected components working in harmony. At the core, you have signaling servers that handle call initiation and coordination using WebSocket or gRPC connections. These servers manage the state of active calls, route participants to the appropriate media servers, and coordinate features like breakout rooms and screen sharing. The signaling layer is intentionally lightweight because its primary job is orchestration, not media transport.
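As a rough illustration of why the signaling layer can stay lightweight, here is a minimal in-memory sketch of the call state it might hold. The class names, the `sfu-a`/`sfu-b` server labels, and the "fewest rooms" placement policy are all assumptions made for this example, not details of any real platform.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    room_id: str
    media_server: str  # hypothetical SFU identifier
    participants: set[str] = field(default_factory=set)


class SignalingState:
    """Tracks who is in which call. It never touches audio or video."""

    def __init__(self, media_servers: list[str]) -> None:
        self.media_servers = media_servers
        self.rooms: dict[str, Room] = {}

    def join(self, room_id: str, user_id: str) -> str:
        """Add a user to a room and return the media server to connect to."""
        room = self.rooms.get(room_id)
        if room is None:
            # Assumed policy: place a new room on the server hosting the fewest rooms.
            load = {server: 0 for server in self.media_servers}
            for existing in self.rooms.values():
                load[existing.media_server] += 1
            server = min(self.media_servers, key=lambda s: load[s])
            room = Room(room_id, server)
            self.rooms[room_id] = room
        room.participants.add(user_id)
        return room.media_server

    def leave(self, room_id: str, user_id: str) -> None:
        """Remove a user and drop the room once it is empty."""
        room = self.rooms.get(room_id)
        if room is None:
            return
        room.participants.discard(user_id)
        if not room.participants:
            del self.rooms[room_id]
```

Everyone joining the same room is routed to the same media server, and the only data stored is membership, which is why this layer is cheap compared with media transport.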
The real magic happens in the media layer. Media servers (often built on WebRTC infrastructure) handle the actual audio and video streams. In a multi-party call, these servers typically act as selective forwarding units (SFUs) rather than decoding and mixing everything into a single composite stream. An SFU receives media from each participant and forwards the relevant streams to the others. This is more efficient than a full mesh topology, where everyone connects to everyone else, because each client uploads its media once instead of once per peer. Screen sharing is treated as a separate media stream with its own quality parameters, which allows HD screen content without degrading the camera video.
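The efficiency argument comes down to simple counting. The sketch below compares per-client stream counts for a full mesh and an SFU, and shows the forwarding step in its most basic form; the function names are made up for this example.

```python
def mesh_load(n: int) -> dict[str, int]:
    """Per-client stream counts and total links in a full mesh of n peers."""
    return {"uploads": n - 1, "downloads": n - 1, "links": n * (n - 1) // 2}


def sfu_load(n: int) -> dict[str, int]:
    """Per-client stream counts and total links when n peers share one SFU."""
    return {"uploads": 1, "downloads": n - 1, "links": n}


def forward(sender: str, packet: bytes, participants: list[str]) -> dict[str, bytes]:
    """Relay one packet unchanged to everyone except the sender (no decoding, no mixing)."""
    return {p: packet for p in participants if p != sender}
```

For a ten-person call, each mesh client uploads nine copies of its video over 45 peer links in total, while with an SFU each client uploads once over its single link to the server.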
Recording and storage components run asynchronously, capturing media streams without impacting real-time call quality. Meanwhile, breakout rooms are handled by the signaling layer, which creates sub-groups of participants and spins up an isolated SFU for each room. The system also includes a monitoring and analytics layer that continuously tracks network metrics such as jitter, packet loss, and latency from each participant's perspective.
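To make the monitoring metrics concrete, here is a small sketch of how jitter and packet loss could be computed per participant. The jitter estimator follows the running-average formula from RFC 3550 (RTP); the function names and the millisecond timestamps in the usage note are assumptions for illustration.

```python
def interarrival_jitter(send_times: list[float], recv_times: list[float]) -> float:
    """Smoothed jitter estimate over a sequence of packets, in the input time unit.

    For each consecutive pair, D is the change in transit time; the estimate
    moves 1/16 of the way toward |D|, as in RFC 3550.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_fraction(expected: int, received: int) -> float:
    """Fraction of expected packets that never arrived, clamped at zero."""
    if expected <= 0:
        return 0.0
    return max(0.0, (expected - received) / expected)
```

With packets sent every 20 ms and a constant network delay, `interarrival_jitter` stays at zero. It only rises when the spacing between arrivals differs from the spacing between sends.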
Key Design Decisions
The architecture prioritizes adaptability. Rather than maintaining fixed bitrates, the system continuously adjusts encoding parameters based on network conditions. Each client measures its own bandwidth, packet loss, and latency, then reports this information back to the media servers. This feedback loop allows the system to optimize quality in real-time without relying solely on server-side observations.
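The feedback loop can be sketched as a function that takes the current bitrate and the latest reported loss and returns the next target. The thresholds, step sizes, and bitrate bounds below are illustrative assumptions, not values from any specific platform, and a real controller would also weigh delay and bandwidth estimates.

```python
def next_bitrate(
    current_kbps: float,
    loss: float,
    min_kbps: float = 150.0,
    max_kbps: float = 2500.0,
) -> float:
    """Pick the next target bitrate from the reported packet loss fraction (0.0 to 1.0)."""
    if loss > 0.10:
        # Heavy loss: back off in proportion to how much is being lost.
        target = current_kbps * (1.0 - 0.5 * loss)
    elif loss < 0.02:
        # Clean link: probe upward gently.
        target = current_kbps * 1.05
    else:
        # Moderate loss: hold steady.
        target = current_kbps
    return max(min_kbps, min(max_kbps, target))
```

Calling this once per feedback report means a client at 1000 kbps seeing 20% loss would drop to roughly 900 kbps, while the same client on a clean link would creep up toward the cap.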
Design Insight: Handling Poor Internet
When one participant has poor internet connectivity, the system employs several strategies to maintain overall call quality. First, the degradation is confined to that participant's own video: other participants might receive it at a lower resolution or frame rate while still receiving high-quality streams from better-connected users. The struggling client lowers the bitrate of the video it uploads, and the media server forwards it lower-resolution versions of everyone else's video, which reduces the load on both the upload and download sides of the weak link.
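One common way to realize this is to have each sender publish several quality layers and let the server choose one per receiver. The sketch below assumes that approach; the layer names, bitrates, and the even split of the downlink budget across publishers are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


# Hypothetical quality ladder; real bitrates depend on codec and resolution.
LOW, MEDIUM, HIGH = Layer("low", 150), Layer("medium", 500), Layer("high", 1500)


def pick_layer(available: list[Layer], budget_kbps: float) -> Optional[Layer]:
    """Highest-bitrate layer that fits the budget; None means audio only."""
    fitting = [layer for layer in available if layer.kbps <= budget_kbps]
    return max(fitting, key=lambda layer: layer.kbps) if fitting else None


def plan_subscriptions(
    downlink_kbps: float, publishers: dict[str, list[Layer]]
) -> dict[str, Optional[Layer]]:
    """Choose one layer per publisher for a single receiver.

    Assumption: the receiver's downlink is split evenly across publishers.
    """
    if not publishers:
        return {}
    per_stream = downlink_kbps / len(publishers)
    return {pid: pick_layer(layers, per_stream) for pid, layers in publishers.items()}
```

A well-connected receiver gets the high layer from strong publishers and whatever the weak publisher can manage, while a weak receiver gets the low layer from everyone. Neither case forces the rest of the call down to the lowest common denominator.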
The system also uses forward error correction and packet redundancy selectively. For the struggling participant, the SFU might duplicate critical packets to improve resilience against loss. Additionally, the signaling layer might suggest they disable video temporarily or switch to audio-only mode, while keeping them in the call with the shared screen still visible. This graceful degradation ensures that one person's poor connection doesn't ruin the experience for nine others on the call.
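The simplest form of forward error correction is a single XOR parity packet per group, which lets the receiver rebuild any one lost packet without a retransmission. This is a minimal sketch that assumes equal-length packets; production media stacks use more elaborate schemes, and the function names here are made up for the example.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Parity packet for a group: the XOR of every packet in it."""
    return reduce(xor_bytes, packets)


def recover(received: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Fill in at most one missing packet (marked None) using the parity packet."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair at most one loss")
    if not missing:
        return list(received)
    present = [packet for packet in received if packet is not None]
    # XOR of the parity with all surviving packets leaves exactly the lost one.
    repaired = reduce(xor_bytes, present, parity)
    result = list(received)
    result[missing[0]] = repaired
    return result
```

The cost is one extra packet per group, so applying it only to the participant with a lossy link keeps the overhead off everyone else's connection.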
Watch the Full Design Process
Want to see how this architecture comes together in real-time? We've captured the entire system design process as an AI generates a professional architecture diagram while thinking through each component and decision.
Try It Yourself
Building complex systems shouldn't require hours of whiteboarding and sketching. Head over to InfraSketch and describe your system in plain English. In seconds, you'll have a professional architecture diagram, complete with a design document. Whether you're designing a video platform, a messaging system, or anything in between, you can iterate on real architectures instantly.
This is Day 47 of a 365-day system design challenge. Each day brings new architectures, new insights, and new ways to think about distributed systems.