1. What Is WebRTC (Quick Overview)?
WebRTC stands for Web Real-Time Communication — an open standard that enables audio and video streaming directly between browsers and apps without plugins. It’s the foundation of modern video calling on the web because it:
📌 Works in most browsers
📌 Uses real-time protocols (RTP/UDP) for low delay
📌 Secures streams with encryption
📌 Doesn’t require installation of special plugins
But at its core, WebRTC was originally designed for peer-to-peer connections, meaning one peer connects directly to another. This is great for 1-to-1 calls, but becomes complicated with more participants.
2. Peer-to-Peer (P2P) ➝ Mesh Architecture
🌐 How P2P works
Imagine you and one other person want a video call. WebRTC makes a direct connection between your device and theirs. Both devices send and receive streams directly — no server in the middle.
This is ideal for one-to-one video calls:
✔ Low latency
✔ No central media server required (only signaling needs a server)
✔ No media-server cost
🧠 But what if more people join?
If you add a third person, every participant must connect to every other participant:
A ↔ B
A ↔ C
B ↔ C
That’s 3 connections. Add a fourth and it becomes more tangled, with 6 total connections:
A↔B, A↔C, A↔D,
B↔C, B↔D,
C↔D
This pattern is called a mesh — each peer connects to all others directly.
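The growth in connections follows the pair-counting formula N × (N − 1) / 2. The sketch below illustrates it; the function names are made up for this example and are not part of any WebRTC API.

```python
from itertools import combinations

def mesh_links(peers):
    """Return every unordered pair of peers that needs a direct connection."""
    return list(combinations(peers, 2))

def mesh_link_count(n):
    """Closed form for the same count: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Show how quickly the number of connections grows with call size.
for n in (2, 3, 4, 5, 10):
    print(n, "peers ->", mesh_link_count(n), "connections")
```

Calling `mesh_links("ABCD")` lists the same six pairs shown above.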
📉 Problems with Mesh
- 🔄 Bandwidth explosion: Each peer must send its video stream to every other peer — quickly saturating upload bandwidth.
- 🖥 CPU & encoding cost: Each peer typically runs a separate video encoder for every connection, so encoding work grows with the number of participants.
- 🧪 Unreliable beyond roughly 4–6 peers, especially on mobile or slow networks.
Thus, mesh works only for very small groups (usually up to ~5 participants).
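To get a rough feel for the upload problem, here is a back-of-the-envelope sketch. The bitrate and uplink figures in the usage note are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def mesh_upload_mbps(participants, stream_mbps):
    """Upload needed by one peer: one copy of its stream per remote peer."""
    return (participants - 1) * stream_mbps

def max_mesh_participants(upload_mbps, stream_mbps):
    """Largest call size whose per-peer upload fits within upload_mbps."""
    return int(upload_mbps // stream_mbps) + 1
```

For example, assuming a 1.5 Mbps video stream and a 5 Mbps uplink, `max_mesh_participants(5, 1.5)` gives 4: a fifth participant would push each peer's upload to 6 Mbps.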
3. Beyond Mesh — Server-Mediated Architectures
To build scalable multi-party calling, we introduce a central media server. This server relieves each peer of uploading to every other peer. There are two major ways to do this:
A. SFU — Selective Forwarding Unit
🧠 What SFU does
With SFU:
- Every peer sends their stream once to the server.
- The SFU forwards streams to all other participants — but it doesn’t decode or re-encode them.
- Each peer receives the streams it wants and renders them.
SFU acts like a traffic hub: one upload from each user, and multiple forwards.
📊 Example
Imagine 5 participants:
You send your stream _once_ → SFU
SFU forwards your video to B1, B2, B3, B4 → each receives only the streams it subscribed to
Each participant still receives (N-1) streams, but they only upload once.
⭐ Advantages of SFU
- 📈 Scales better than mesh — because upload cost on the user side doesn’t explode.
- ⚡ Lower server load than an MCU: the server forwards packets without decoding or re-encoding them.
- 🎛 Clients can choose which streams to show (e.g., pin a speaker).
- 📱 Supports simulcast (multiple quality layers) — better adapts to bandwidth.
⚠ Limitations
- Still sends multiple streams to each client (could be heavy on download).
- Server introduces another hop — slightly more latency than direct mesh.
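One way an SFU can ease the download burden is to forward a lower simulcast layer to receivers with less bandwidth. The following is a simplified sketch of that idea only: the layer names and bitrates are assumptions, and real SFUs rely on live bandwidth estimation rather than a fixed, evenly split budget.

```python
# Hypothetical simulcast layers: (name, bitrate in kbps), lowest first.
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]

def pick_layer(available_kbps, layers=LAYERS):
    """Return the highest-bitrate layer that fits, or the lowest as a fallback."""
    best = layers[0]
    for layer in layers:
        if layer[1] <= available_kbps:
            best = layer
    return best[0]

def plan_forwarding(download_kbps, senders):
    """Split a receiver's download budget evenly across its subscribed senders."""
    if not senders:
        return {}
    share = download_kbps / len(senders)
    return {sender: pick_layer(share) for sender in senders}
```

With a 2000 kbps budget and four senders, each sender gets a 500 kbps share, so this sketch would forward the "medium" layer of every stream.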
B. MCU — Multipoint Control Unit
💡 What MCU does
MCU also receives streams from all peers. But unlike SFU, it decodes and mixes them into a single combined stream:
✔ Every participant receives just one stream, no matter how many others are in the call.
✔ MCU handles mixing, layout, encoding, and then sends that one stream to all clients.
🎨 Example
In a call with 5 users:
- Each user sends their stream to the MCU.
- MCU combines all 5 videos into a tiled layout (e.g., a 2×2 grid plus one picture).
- That single mixed video is sent back to each participant.
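The layout step can be sketched as a small geometry calculation. This is only one possible approach: it assumes a uniform near-square grid on a 1280×720 canvas (so 5 users land in a 3×2 grid rather than the 2×2-plus-one arrangement mentioned above), and the function names are hypothetical.

```python
import math

def grid_dimensions(tiles):
    """Smallest near-square grid (columns, rows) that holds all tiles."""
    cols = math.ceil(math.sqrt(tiles))
    rows = math.ceil(tiles / cols)
    return cols, rows

def tile_rects(tiles, width=1280, height=720):
    """Return (x, y, w, h) for each tile, row by row, on the output canvas."""
    cols, rows = grid_dimensions(tiles)
    w, h = width // cols, height // rows
    return [((i % cols) * w, (i // cols) * h, w, h) for i in range(tiles)]
```

A real MCU would then decode each incoming video, scale it into its rectangle, and encode the composed frame, which is where the server cost comes from.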
💎 Advantages of MCU
- 📉 Clients receive only one video stream — minimal CPU & bandwidth.
- 📺 Easy consistent layout for all participants.
- 📼 Good for legacy devices that can’t handle many streams.
🔥 Downsides
- 🧠 Very heavy server processing — mixing + encoding is CPU intensive.
- 💰 Expensive to scale — server resources grow with participants.
- 😴 Less flexible — clients get one view determined by server (can’t rearrange locally).
4. SFU vs MCU — A Quick Comparison
| Aspect | Mesh (P2P) | SFU | MCU |
|---|---|---|---|
| Server Required | ❌ No | ✅ Yes | ✅ Yes |
| Upload per peer | N-1 streams | 1 stream | 1 stream |
| Download per peer | N-1 streams | N-1 streams | 1 stream |
| Server CPU Load | Low | Moderate | Very High |
| Client CPU Load | High | Moderate | Very Low |
| Scalability | Poor | High | Moderate-High |
| Layout Flexibility | High | High | Low |
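The upload and download rows of the table can be expressed as a small function. This is a sketch for comparison only; it counts streams and ignores simulcast layers and audio.

```python
def streams_per_peer(architecture, n):
    """Return (uploads, downloads) per participant in a call of n people."""
    if architecture == "mesh":
        return n - 1, n - 1
    if architecture == "sfu":
        return 1, n - 1
    if architecture == "mcu":
        return 1, 1
    raise ValueError(f"unknown architecture: {architecture}")

# Compare the three architectures for a 5-person call.
for arch in ("mesh", "sfu", "mcu"):
    up, down = streams_per_peer(arch, 5)
    print(f"{arch:>4}: upload {up}, download {down}")
```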
5. Why SFU Is Dominating Modern Video Apps
Today, services like Zoom, Google Meet, Jitsi, and many WebRTC SaaS platforms rely on SFU for group calls because it:
✔ Offers the best balance between scalability and performance
✔ Allows custom layouts and controls
✔ Supports simulcast adaptation to network conditions
✔ Doesn’t overwhelm the server the way a classic MCU does (Clan Meeting)
MCU is still used for special cases like webinar broadcasting or legacy device support, but SFU is the most widely deployed.
6. Signaling, STUN & TURN — The Supporting Cast
Real-world WebRTC calls don’t connect peers by magic. Three supporting pieces are involved:
✔ Signaling
WebRTC does not define a signaling protocol itself. Your app’s backend acts as the signaling server, exchanging session metadata (SDP offers/answers and ICE candidates) so peers can discover each other and initiate connections.
✔ STUN
Helps each peer discover its own public IP address and port as seen from outside its NAT.
✔ TURN
Acts as a relay when a direct connection isn’t possible (e.g., restrictive firewalls or symmetric NATs).
Signaling and STUN do their work before any media is sent. TURN, when it is needed, keeps relaying media for the whole call.
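To make these roles more concrete, here is a minimal sketch of what a signaling backend might hand to clients and relay between them. The `iceServers`, `urls`, `username`, and `credential` field names mirror the browser's ICE server configuration. Everything else is an assumption for illustration: the server addresses, the credentials, the JSON envelope fields, and the in-memory `connections` dict that stands in for real WebSocket connections.

```python
import json

# Hypothetical server addresses and credentials; replace with your own.
ICE_CONFIG = {
    "iceServers": [
        {"urls": "stun:stun.example.org:3478"},
        {
            "urls": "turn:turn.example.org:3478",
            "username": "demo-user",
            "credential": "demo-secret",
        },
    ]
}

def signaling_message(kind, sender, recipient, payload):
    """Wrap an offer, answer, or ICE candidate in a JSON envelope for relaying."""
    if kind not in ("offer", "answer", "candidate"):
        raise ValueError(f"unsupported message kind: {kind}")
    return json.dumps(
        {"type": kind, "from": sender, "to": recipient, "payload": payload}
    )

def route(message, connections):
    """Append a message to the recipient's outbox, as a signaling server would."""
    recipient = json.loads(message)["to"]
    connections[recipient].append(message)
```

The signaling server never touches media. It only passes these small messages along until the peers (or a peer and the media server) have agreed on how to connect.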
7. Practical Examples to Visualize
🧑‍🤝‍🧑 1-to-1 Call
✔ Mesh / P2P
✔ Direct connection — minimal cost
✔ Best for simple calls
👩‍👩‍👦 Small Group (3–6 users)
✔ Mesh still kinda works
✔ But upload & CPU start suffering
🧑‍💻 Large Group (8–50+ users)
✔ Best with SFU
✔ Each user uploads once, downloads only what they want
✔ Clients can choose video layout
📺 Webinar / Broadcast
✔ MCU or Hybrid
✔ Mixed stream broadcast to many viewers
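The scenarios above can be condensed into a simple decision helper. Treat this as a sketch, not a rule: the cut-off of 6 participants for mesh is an assumption taken from the ranges above (a 7-person call is not covered there and is sent to the SFU here), and the function name and return labels are made up for this example.

```python
def choose_architecture(participants, broadcast=False):
    """Map a call size to the architecture suggested in the scenarios above."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    if broadcast:
        return "mcu-or-hybrid"
    if participants == 2:
        return "p2p"
    if participants <= 6:
        return "mesh"
    return "sfu"
```

An app could call this whenever someone joins or leaves and switch topology when the answer changes.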
8. Summary — How WebRTC Makes Video Conferencing Work
- WebRTC enables real-time audio/video streaming in browsers and apps.
- For two peers, direct P2P works fine.
- As the number of participants grows, a P2P mesh becomes inefficient.
- SFU solves this by forwarding streams through a central server with minimal processing.
- MCU mixes all media into one stream but at high server cost.
- Real apps often use hybrid models — e.g., P2P when only 2 users, SFU for groups, and even MCU for broadcasting large sessions.