Kader Khan

Posted on Jan 6

WebRTC P2P vs MCU vs SFU

#systemdesign #devops #webrtc #webdev

1. What Is WebRTC? (Quick Overview)

WebRTC (Web Real-Time Communication) is an open standard that lets browsers and apps stream audio and video, and exchange arbitrary data, directly with each other without plugins. It’s the foundation of modern video calling on the web because it:

📌 Works in all major browsers (Chrome, Firefox, Safari, Edge)
📌 Uses real-time protocols (RTP over UDP) for low delay
📌 Encrypts every stream by default (DTLS-SRTP is mandatory)
📌 Doesn’t require installing special plugins

At its core, though, WebRTC was designed for peer-to-peer connections, meaning one peer connects directly to another. This is great for 1-to-1 calls but gets complicated as more participants join.

2. Peer-to-Peer (P2P) ➝ Mesh Architecture

🌐 How P2P works

Imagine you and one other person want a video call. WebRTC creates a direct connection between your device and theirs. Both devices send and receive media directly, with no media server in the middle (a signaling server is still needed to set up the call; see section 6).

This is ideal for one-to-one video calls:

✔ Low latency
✔ No central media server required
✔ No media-server cost

🧠 But what if more people join?

If you add a third person, each participant must connect to every other participant:

A ↔ B
A ↔ C
B ↔ C

That’s 3 connections. If you add a fourth, it becomes more tangled:

6 total connections:
A↔B, A↔C, A↔D,
B↔C, B↔D,
C↔D

This pattern is called a mesh — each peer connects to all others directly.
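The connection count follows a simple formula: each of the N peers links to the other N-1, and every link is shared by two peers, so a full mesh needs N(N-1)/2 connections. A small TypeScript sketch (the function names are illustrative) that enumerates the pairs and checks the formula:

```typescript
// Enumerate every direct peer-to-peer connection in a full mesh.
function meshConnections(peers: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  peers.forEach((a, i) => {
    // Pair each peer only with the peers after it, so A↔B is not counted twice.
    peers.slice(i + 1).forEach((b) => pairs.push([a, b]));
  });
  return pairs;
}

// Closed form: N peers * (N - 1) links each / 2 because each link has two ends.
function meshConnectionCount(n: number): number {
  return (n * (n - 1)) / 2;
}

console.log(meshConnections(["A", "B", "C", "D"]).map(([x, y]) => `${x}↔${y}`).join(", "));
// A↔B, A↔C, A↔D, B↔C, B↔D, C↔D

for (const n of [2, 3, 4, 5, 10]) {
  console.log(`${n} peers -> ${meshConnectionCount(n)} connections`);
}
```

Going from 4 to 10 participants takes the mesh from 6 to 45 connections, which is why it gets tangled so quickly.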

📉 Problems with Mesh

🔄 Bandwidth explosion: each peer must upload its video separately to every other peer (N-1 copies for N participants), which quickly saturates upload bandwidth.
🖥 CPU & encoding cost: each peer typically encodes its video separately for every connection.
🧪 Not reliable beyond ~4–6 peers, especially over mobile or slow networks.

Thus, mesh works only for very small groups (usually up to ~5 participants).
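To see why the practical ceiling sits around 5, here is a back-of-the-envelope sketch. The 1.5 Mbps per-stream bitrate and 5 Mbps uplink are assumptions chosen for illustration, not measurements; plug in your own numbers.

```typescript
// Illustrative assumptions only: real bitrates depend on resolution,
// codec, and network conditions.
const STREAM_KBPS = 1500; // assumed bitrate of one outgoing video stream
const UPLINK_KBPS = 5000; // assumed upload capacity of one peer

// In a mesh, a peer uploads one copy of its stream per remote peer.
function meshUploadKbps(participants: number, streamKbps: number): number {
  return (participants - 1) * streamKbps;
}

// Largest mesh whose per-peer upload still fits within the uplink:
// (N - 1) * streamKbps <= uplinkKbps  =>  N <= floor(uplink / stream) + 1
function maxMeshSize(uplinkKbps: number, streamKbps: number): number {
  return Math.floor(uplinkKbps / streamKbps) + 1;
}

for (const n of [2, 4, 6, 8]) {
  const up = meshUploadKbps(n, STREAM_KBPS);
  const verdict = up <= UPLINK_KBPS ? "fits" : "exceeds uplink";
  console.log(`${n} participants: ${n - 1} outgoing copies, ${up} kbps upload, ${verdict}`);
}
console.log(`max mesh size: ${maxMeshSize(UPLINK_KBPS, STREAM_KBPS)}`); // 4
```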

3. Beyond Mesh — Server-Mediated Architectures

To build scalable multi-party calls, we introduce a central media server, which relieves each peer of uploading to every other peer. There are two major ways to do this:

A. SFU — Selective Forwarding Unit

🧠 What SFU does

With SFU:

Every peer sends their stream once to the server.
The SFU forwards streams to all other participants — but it doesn’t decode or re-encode them.
Each peer receives the streams it wants and renders them.

SFU acts like a traffic hub: one upload from each user, and multiple forwards.

📊 Example

Imagine 5 participants:

You send your stream once → SFU
SFU forwards your video to B1, B2, B3, B4 → each participant gets the streams they subscribed to

Each participant uploads only once but still receives up to N-1 streams (4 in this example), depending on what they subscribe to.
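The forwarding step can be sketched as a toy in-memory model. `ToySfu` and its methods are hypothetical names, and a real SFU routes RTP packets over the network with congestion control, but the core idea is the same: payloads are treated as opaque bytes and handed to subscribers, never decoded.

```typescript
// A "packet" is opaque to the SFU: it only looks at who sent it.
type Packet = { from: string; payload: Uint8Array };

class ToySfu {
  // subscriber id -> publishers that subscriber wants to receive
  private subscriptions = new Map<string, Set<string>>();
  // subscriber id -> packets delivered to that subscriber
  readonly delivered = new Map<string, Packet[]>();

  join(id: string): void {
    this.subscriptions.set(id, new Set());
    this.delivered.set(id, []);
  }

  subscribe(subscriber: string, publisher: string): void {
    if (subscriber !== publisher) this.subscriptions.get(subscriber)?.add(publisher);
  }

  // Called once per uploaded packet. The payload is forwarded unchanged
  // (no decode, no re-encode); returns how many copies were sent out.
  onPacket(packet: Packet): number {
    let forwards = 0;
    for (const [subscriber, publishers] of this.subscriptions) {
      if (publishers.has(packet.from)) {
        this.delivered.get(subscriber)?.push(packet);
        forwards++;
      }
    }
    return forwards;
  }
}

// Five participants, each subscribed to everyone else (the N-1 case).
const sfu = new ToySfu();
const ids = ["You", "B1", "B2", "B3", "B4"];
ids.forEach((id) => sfu.join(id));
for (const sub of ids) for (const pub of ids) sfu.subscribe(sub, pub);

// "You" uploads one packet; the SFU fans it out to the 4 other participants.
const fanOut = sfu.onPacket({ from: "You", payload: new Uint8Array([1, 2, 3]) });
console.log(fanOut); // 4
```

If a participant unsubscribes from someone (say, to show only the active speaker), the SFU simply stops forwarding that publisher's packets to them, which is how "downloads only what they want" works.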

⭐ Advantages of SFU

📈 Scales better than mesh — because upload cost on the user side doesn’t explode.
⚡ Lower server load than an MCU: the server only forwards packets instead of decoding and re-encoding them.
🎛 Clients can choose which streams to show (e.g., pin a speaker).
📱 Supports simulcast: each sender uploads a few quality layers of the same video, and the SFU forwards whichever layer fits each receiver’s bandwidth.
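In browsers, a sender publishes simulcast layers through `RTCPeerConnection.addTransceiver()` with several `sendEncodings` (each identified by a `rid` and often scaled down with `scaleResolutionDownBy`). The SFU then decides which layer each receiver gets. A minimal sketch of that decision, assuming hypothetical layer bitrates and a per-receiver bandwidth estimate supplied by the SFU:

```typescript
// Hypothetical simulcast layers. The q/h/f rid names follow a common
// convention; the widths and bitrates are illustrative assumptions.
interface SimulcastLayer {
  rid: string;
  width: number;
  kbps: number;
}

const LAYERS: SimulcastLayer[] = [
  { rid: "q", width: 320, kbps: 150 },
  { rid: "h", width: 640, kbps: 500 },
  { rid: "f", width: 1280, kbps: 1500 },
];

// Pick the highest-bitrate layer that fits the receiver's budget for this
// stream; if none fits, fall back to the lowest layer rather than nothing.
function pickLayer(layers: SimulcastLayer[], budgetKbps: number): SimulcastLayer {
  const sorted = [...layers].sort((a, b) => a.kbps - b.kbps);
  const fitting = sorted.filter((layer) => layer.kbps <= budgetKbps);
  const choice = fitting.length > 0 ? fitting[fitting.length - 1] : sorted[0];
  if (!choice) throw new Error("sender published no simulcast layers");
  return choice;
}

for (const budget of [100, 600, 3000]) {
  console.log(`${budget} kbps -> layer ${pickLayer(LAYERS, budget).rid}`);
}
// 100 kbps -> layer q
// 600 kbps -> layer h
// 3000 kbps -> layer f
```

The sender still encodes each layer only once, no matter how many receivers there are; the per-receiver choice happens entirely on the SFU.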

⚠ Limitations

Still sends multiple streams to each client (could be heavy on download).
Server introduces another hop — slightly more latency than direct mesh.

B. MCU — Multipoint Control Unit

💡 What MCU does

MCU also receives streams from all peers. But unlike SFU, it decodes and mixes them into a single combined stream:

✔ Every participant receives just one stream, no matter how many others are in the call.
✔ MCU handles mixing, layout, encoding, and then sends that one stream to all clients.

🎨 Example

In a call with 5 users:

Each user sends their stream to the MCU.
MCU combines all 5 videos into a tiled layout (e.g., a 2×2 grid plus one picture).
That single mixed video is sent back to each participant.
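Computing the tiled layout is one of the MCU’s jobs before it re-encodes the composite. Here is a sketch of one possible layout policy: a near-square grid filled row by row. This policy is an assumption, and it places 5 users in a 3×2 grid rather than the 2×2-plus-one arrangement in the example.

```typescript
interface Tile {
  participant: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Near-square grid: enough columns for a square, then as many rows as needed.
function gridLayout(participants: string[], canvasW: number, canvasH: number): Tile[] {
  const n = participants.length;
  if (n === 0) return [];
  const cols = Math.ceil(Math.sqrt(n));
  const rows = Math.ceil(n / cols);
  const w = Math.floor(canvasW / cols);
  const h = Math.floor(canvasH / rows);
  return participants.map((participant, i) => ({
    participant,
    x: (i % cols) * w, // column position
    y: Math.floor(i / cols) * h, // row position
    width: w,
    height: h,
  }));
}

// 5 users on a 1280x720 canvas -> 3 columns x 2 rows of 426x360 tiles.
for (const t of gridLayout(["U1", "U2", "U3", "U4", "U5"], 1280, 720)) {
  console.log(`${t.participant}: (${t.x}, ${t.y}) ${t.width}x${t.height}`);
}
```

A real MCU would then decode every incoming stream, scale each frame into its tile, and encode the composite, which is where the heavy server CPU cost comes from.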

💎 Advantages of MCU

📉 Clients receive only one video stream — minimal CPU & bandwidth.
📺 Consistent, server-controlled layout for all participants.
📼 Good for legacy devices that can’t handle many streams.

🔥 Downsides

🧠 Very heavy server processing — mixing + encoding is CPU intensive.
💰 Expensive to scale — server resources grow with participants.
😴 Less flexible — clients get one view determined by server (can’t rearrange locally).

4. Mesh vs SFU vs MCU: A Quick Comparison

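N = number of participants; stream counts assume everyone watches everyone else.

| Aspect | Mesh (P2P) | SFU | MCU |
|---|---|---|---|
| Media server required | ❌ No (signaling only) | ✅ Yes | ✅ Yes |
| Upload per peer | N-1 streams | 1 stream | 1 stream |
| Download per peer | N-1 streams | up to N-1 streams | 1 stream |
| Server CPU load | None (no media server) | Moderate | Very High |
| Client CPU load | High | Moderate | Very Low |
| Scalability | Poor | High | Moderate-High |
| Layout flexibility | High | High | Low |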
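The stream counts in the table can be reproduced with a few lines of TypeScript. This sketch assumes every participant watches every other participant and ignores simulcast, so SFU download sits at its N-1 maximum; `streamCounts` is an illustrative name.

```typescript
type Architecture = "mesh" | "sfu" | "mcu";

interface StreamCounts {
  uploadPerPeer: number;
  downloadPerPeer: number;
  serverIn: number; // streams the media server receives
  serverOut: number; // streams the media server sends
}

function streamCounts(arch: Architecture, n: number): StreamCounts {
  switch (arch) {
    case "mesh": // no media server: every peer talks to every other peer
      return { uploadPerPeer: n - 1, downloadPerPeer: n - 1, serverIn: 0, serverOut: 0 };
    case "sfu": // one upload each; the server forwards N-1 streams to each of N peers
      return { uploadPerPeer: 1, downloadPerPeer: n - 1, serverIn: n, serverOut: n * (n - 1) };
    case "mcu": // one upload each; the server sends one mixed stream per peer
      return { uploadPerPeer: 1, downloadPerPeer: 1, serverIn: n, serverOut: n };
    default:
      throw new Error("unknown architecture");
  }
}

const callSize = 5;
for (const arch of ["mesh", "sfu", "mcu"] as Architecture[]) {
  const c = streamCounts(arch, callSize);
  console.log(
    `${arch}: up ${c.uploadPerPeer}, down ${c.downloadPerPeer}, server in ${c.serverIn}, out ${c.serverOut}`,
  );
}
```

For 5 participants the SFU receives 5 streams but sends 20, so its cost shows up mostly as server bandwidth, while the MCU sends only 5 but must decode and mix all of them.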

5. Why SFU Is Dominating Modern Video Apps

Today, services like Zoom, Google Meet, Jitsi, and many WebRTC SaaS platforms rely on SFU-style routing for group calls because it:

✔ Offers the best balance between scalability and performance
✔ Allows custom layouts and controls
✔ Supports simulcast adaptation to network conditions
✔ Doesn’t overwhelm the server like classic MCU does ([Clan Meeting][2])

MCU is still used for special cases like webinar broadcasting or legacy device support, but SFU is the most widely deployed.

6. Signaling, STUN & TURN — The Supporting Cast

Real-world WebRTC calls don’t connect peers by magic; three supporting pieces do the groundwork:

✔ Signaling

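WebRTC deliberately leaves signaling to the application. Your app’s backend (often a WebSocket server) relays metadata between peers: SDP offers and answers describing the media, plus ICE candidates describing possible network paths, so peers can discover each other and start a connection.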
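Because WebRTC doesn’t define a signaling protocol, each app picks its own message format and transport. In the browser, the SDP comes from `createOffer()` or `createAnswer()` followed by `setLocalDescription()`, and candidates arrive through the `icecandidate` event; the receiving side applies them with `setRemoteDescription()` and `addIceCandidate()`. This sketch models only the relay side, with a hypothetical message shape and an in-memory `SignalingRelay` standing in for a WebSocket server:

```typescript
// Hypothetical message format: field names are this sketch's own choice.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

// In-memory stand-in for a signaling backend. It never inspects the SDP
// or candidates; it only delivers each message to its addressee.
class SignalingRelay {
  private inboxes = new Map<string, SignalMessage[]>();

  register(peerId: string): void {
    this.inboxes.set(peerId, []);
  }

  // Returns false if the target peer is not connected to the relay.
  send(msg: SignalMessage): boolean {
    const inbox = this.inboxes.get(msg.to);
    if (!inbox) return false;
    inbox.push(msg);
    return true;
  }

  // Hand over (and clear) everything queued for a peer.
  drain(peerId: string): SignalMessage[] {
    const inbox = this.inboxes.get(peerId);
    if (!inbox) return [];
    this.inboxes.set(peerId, []);
    return inbox;
  }
}

// Typical order: caller sends an offer, trickles candidates, callee answers.
// The sdp/candidate strings are placeholders, not real SDP.
const relay = new SignalingRelay();
relay.register("alice");
relay.register("bob");
relay.send({ kind: "offer", from: "alice", to: "bob", sdp: "placeholder-offer-sdp" });
relay.send({ kind: "candidate", from: "alice", to: "bob", candidate: "placeholder-candidate" });
relay.send({ kind: "answer", from: "bob", to: "alice", sdp: "placeholder-answer-sdp" });
console.log(relay.drain("bob").map((m) => m.kind).join(",")); // offer,candidate
console.log(relay.drain("alice").map((m) => m.kind).join(",")); // answer
```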

✔ STUN

Lets each peer discover its public IP address and port as seen from outside its NAT.

✔ TURN

Relays media through a server when a direct connection isn’t possible (e.g., strict firewalls or symmetric NATs).

Signaling and STUN do their work before any media flows; when TURN is needed, it keeps relaying media for the whole call.
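In the browser, STUN and TURN servers are handed to the `RTCPeerConnection` constructor through its `iceServers` list. ICE then gathers candidates of different types and ranks them with the priority formula from RFC 8445, which is why a direct or STUN-discovered path is generally preferred and a TURN relay is the fallback. A sketch of both, with placeholder server URLs and credentials (the `IceServer` interface is a local stand-in mirroring the browser’s `RTCIceServer` fields):

```typescript
// Mirrors the browser's RTCIceServer fields. Hosts and credentials are placeholders.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

const iceServers: IceServer[] = [
  { urls: "stun:stun.example.com:3478" },
  { urls: "turn:turn.example.com:3478", username: "demo-user", credential: "demo-pass" },
];

// ICE candidate types: host (local address), srflx (public address learned
// via STUN), prflx (learned during connectivity checks), relay (via TURN).
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// RFC 8445 priority: 2^24 * typePref + 2^8 * localPref + (256 - componentId).
// Defaults: a single local interface (localPref 65535) and RTP component 1.
function candidatePriority(type: CandidateType, localPref = 65535, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPref + (256 - componentId);
}

for (const t of ["host", "srflx", "relay"] as CandidateType[]) {
  console.log(`${t}: ${candidatePriority(t)}`);
}
// host: 2130706431
// srflx: 1694498815
// relay: 16777215
```

In a browser you would pass `{ iceServers }` to `new RTCPeerConnection(...)`; the priority math runs inside the browser’s ICE agent, so apps normally never compute it themselves.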

7. Practical Examples to Visualize

🧑‍🤝‍🧑 1-to-1 Call

✔ Mesh / P2P
✔ Direct connection — minimal cost
✔ Best for simple calls

👩‍👩‍👦 Small Group (3–6 users)

✔ Mesh still kinda works
✔ But upload & CPU start suffering

🧑‍💻 Large Group (8–50+ users)

✔ Best with SFU
✔ Each user uploads once, downloads only what they want
✔ Clients can choose video layout

📺 Webinar / Broadcast

✔ MCU or Hybrid
✔ Mixed stream broadcast to many viewers

8. Summary — How WebRTC Makes Video Conferencing Work

WebRTC enables real-time audio/video streaming in browsers and apps.
For two peers, direct P2P works fine.
As the number of participants grows, P2P turns into a mesh and becomes inefficient.
SFU solves this by forwarding streams through a central server with minimal processing.
MCU mixes all media into one stream but at high server cost.
Real apps often use hybrid models, e.g., P2P when only 2 users are in the call, SFU for groups, and even MCU for broadcasting large sessions.

DEV Community