A simple explanation of WebRTC scaling problems and how Mediasoup solves them using SFU architecture.
Introduction
If you have ever built a video calling app with WebRTC, it probably felt easy at first.
Then a third or fourth user joins and everything starts breaking.
- Lag increases
- Video freezes
- CPU usage spikes
- Bandwidth explodes
So how do apps like Zoom or Google Meet handle thousands of users smoothly?
They don’t rely on pure peer-to-peer connections.
They use a smarter architecture.
The Simple Promise of WebRTC
WebRTC allows direct browser-to-browser communication.
- No plugins
- No external software
- Ultra-low latency
At its core, it creates a direct connection between users.
Sounds perfect — until it isn’t.
Where Things Break
WebRTC itself is peer-to-peer, so the simplest way to build a group call is a mesh.
That means every user sends video directly to every other user.
Example:
- 2 users = 1 connection
- 4 users = 6 connections
- 10 users = 45 connections
This grows quadratically: n users need n(n-1)/2 connections.
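The counts in the list above match the pair-counting formula n(n-1)/2. A minimal sketch to reproduce them (the function name `meshConnections` is just an illustrative choice):

```typescript
// Number of distinct peer-to-peer links in a full mesh of `users` participants.
// Each pair of users shares one link, so the count is n(n-1)/2.
function meshConnections(users: number): number {
  if (!Number.isInteger(users) || users < 0) {
    throw new RangeError("users must be a non-negative integer");
  }
  return (users * (users - 1)) / 2;
}

for (const n of [2, 4, 10, 50]) {
  console.log(`${n} users -> ${meshConnections(n)} connections`);
}
// 2 users -> 1 connections
// 4 users -> 6 connections
// 10 users -> 45 connections
// 50 users -> 1225 connections
```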
Figure: Mesh architecture. Every user connects to every other user, so the number of connections grows quadratically.
The Real Problem
Every new user adds one more stream that each participant has to encode, upload, and download, which drives up:
- Bandwidth usage
- CPU load
- Network complexity
This is why:
- Browsers crash
- Calls lag
- Systems fail to scale
The Breakthrough: Mediasoup (SFU)
Instead of connecting everyone to everyone, we introduce a server.
This is called an SFU (Selective Forwarding Unit).
New flow:
- User sends stream → server
- Server forwards stream → other users
Each user now uploads a stream once, no matter how many people are in the room.
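The flow above can be modelled with a toy in-memory room. This is only a sketch of the forwarding idea, not Mediasoup's API: the class `ToySfuRoom`, its methods, and the string "packets" are all hypothetical stand-ins for real RTP traffic.

```typescript
// Callback that hands a forwarded packet to one receiving peer.
type Deliver = (fromId: string, packet: string) => void;

class ToySfuRoom {
  private readonly peers = new Map<string, Deliver>();

  join(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // The sender uploads one packet; the server forwards a copy to every other peer.
  // Returns how many copies were forwarded.
  publish(fromId: string, packet: string): number {
    let forwarded = 0;
    this.peers.forEach((deliver, peerId) => {
      if (peerId === fromId) return; // never echo a stream back to its sender
      deliver(fromId, packet);
      forwarded++;
    });
    return forwarded;
  }
}

const room = new ToySfuRoom();
const inbox: string[] = [];
for (const id of ["alice", "bob", "carol"]) {
  room.join(id, (from, packet) => inbox.push(`${id} <- ${from}: ${packet}`));
}

const copies = room.publish("alice", "frame-1");
console.log(copies); // 2
console.log(inbox);  // ["bob <- alice: frame-1", "carol <- alice: frame-1"]
```

The fan-out work moves from the sender's browser to the server: alice sends one packet, and the room produces the two copies.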
Figure: SFU architecture. A central server forwards streams efficiently.
Why This Changes Everything
Mesh architecture
- Too many connections
- High bandwidth usage
- Not scalable
SFU architecture
- One upload per user, sent to the server
- Efficient forwarding
- Scales to large rooms
This is how real systems are built.
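To put rough numbers on the comparison above, here is a small sketch that counts streams per user in each topology. It assumes every participant publishes exactly one stream and subscribes to everyone else's, and it ignores features such as simulcast; the names `perUserStreams` and `StreamLoad` are illustrative.

```typescript
type Topology = "mesh" | "sfu";

interface StreamLoad {
  upstream: number;   // streams this user uploads
  downstream: number; // streams this user downloads
}

// Streams handled by a single user in a room of `users` participants.
function perUserStreams(topology: Topology, users: number): StreamLoad {
  if (!Number.isInteger(users) || users < 1) {
    throw new RangeError("users must be a positive integer");
  }
  const others = users - 1;
  if (topology === "mesh") {
    // Mesh: a separate copy is uploaded to every other participant.
    return { upstream: others, downstream: others };
  }
  // SFU: one upload to the server, which fans it out to the others.
  return { upstream: 1, downstream: others };
}

console.log(perUserStreams("mesh", 10)); // { upstream: 9, downstream: 9 }
console.log(perUserStreams("sfu", 10));  // { upstream: 1, downstream: 9 }
```

Download cost is the same in both cases; the saving is on the upload side, which is usually the scarcer resource on home connections.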
Inside Mediasoup
Mediasoup acts as a media router.
Core components:
- Worker: handles media processing using CPU cores
- Router: represents a room
- Transport: manages WebRTC connections
- Producer: sends media
- Consumer: receives media
In a typical call, every user is both a producer (of their own camera and microphone) and a consumer (of everyone else's streams).
How a Call Actually Works
- User connects via Socket.IO
- Server creates worker and router
- User joins a room
- User sends video
- Other users receive it
Everything is routed efficiently.
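Since a router represents a room, the "create router" and "join room" steps above are often combined into a get-or-create lookup keyed by room ID. The sketch below shows that pattern under stated assumptions: `getOrCreateRoom`, the `rooms` map, and the fake router factory are hypothetical, and the function is generic so it runs without Mediasoup installed.

```typescript
// Returns the router for `roomId`, creating it with `createRouter` on first use.
// R is whatever the factory returns; in a real server it would be a router
// (or a promise of one), here it is a stand-in object.
function getOrCreateRoom<R>(
  rooms: Map<string, R>,
  roomId: string,
  createRouter: () => R,
): R {
  const existing = rooms.get(roomId);
  if (existing !== undefined) return existing;
  const router = createRouter();
  rooms.set(roomId, router);
  return router;
}

// Demo with a fake factory that just counts how many routers were created.
let routersCreated = 0;
const rooms = new Map<string, { id: number }>();
const makeFakeRouter = () => ({ id: ++routersCreated });

const first = getOrCreateRoom(rooms, "standup", makeFakeRouter);
const second = getOrCreateRoom(rooms, "standup", makeFakeRouter);
const other = getOrCreateRoom(rooms, "retro", makeFakeRouter);

console.log(first === second); // true: both users share one router
console.log(other.id);         // 2: a different room gets its own router
```

With Mediasoup, the factory could be something like `() => worker.createRouter({ mediaCodecs })`, which returns a promise. Storing that promise in the map, rather than the resolved router, is one way to avoid creating two routers when two users join the same room at nearly the same moment.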
Figure: Mediasoup flow. A producer sends media to the server, which routes it to consumers.
Tech Stack
- Frontend: React / Next.js
- Backend: Node.js
- Signaling: Socket.IO
- Media: Mediasoup
The Hard Truth
This is not easy.
You will face:
- WebRTC debugging challenges
- NAT traversal (STUN/TURN) issues
- Media synchronization problems
But once you understand it, you can build production-grade systems.
Final Insight
WebRTC gives you real-time communication.
Mediasoup gives you scalability.
That is the difference between:
- A demo project
- A real product
Originally Published
This article was originally written and published on my portfolio:
https://www.lakhsyapurohit.online/blog/real-time-video-with-webrtc-and-mediasoup
If this helped you, consider sharing it 🚀