Building real-time video apps sounds intimidating at first. You hear terms like SDP, ICE candidates, STUN, TURN, and suddenly it feels like you’re diving into networking hell.
But here’s the truth:
WebRTC becomes much easier when you stop thinking like an engineer and start thinking like a human.
Let’s walk through it as a simple, memorable story.
Contents
- A Real-World Scenario
- Step 1: They Don’t Talk Directly (At First)
- Step 2: “Here’s My Setup”
- Step 3: The Handshake
- Step 4: The Real Problem is Network Barriers
- Step 5: Asking for Public Address (STUN)
- Step 6: When Direct Connection Fails
- Step 7: The Middleman (TURN)
- Step 8: The Magic Moment (Video Starts)
- Mapping This to Real Code
- The Entire Flow in One View
- The Simplest Mental Model
- Real Product View (Corporate Meeting)
- The Reality Most Tutorials Don’t Tell You
- What Professionals Actually Do
- Final Thought
A Real-World Scenario
Imagine you’ve built a corporate meeting platform called:
meet123.com
**Rahim** (a software engineer) joins a team meeting with **Aisha** (a product manager).
At exactly 10 AM:
Rahim opens: meet123.com/room/team-sync
Aisha opens the same link
They’re now “in the same meeting.”
But… nothing happens yet.
The real question is:
How do their cameras actually connect?
Step 1: They Don’t Talk Directly (At First)
Here’s the key idea:
The two browsers don't connect to each other right away.
Instead:
- Rahim connects to a meeting server
- Aisha connects to the same server
Think of this server like a meeting coordinator.
This server is called the Signaling Server.
Step 2: “Here’s My Setup”
Before the meeting starts, both participants share their setups.
Rahim says:
“I’ve got camera, mic, and these formats.”
Aisha says:
“Same here.”
This exchange is called:
SDP (Session Description Protocol)
It includes:
- Video/audio formats (codecs)
- Resolution
- Device capabilities
- Network info
Think of it like:
“Here’s how I can communicate.”
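To make this concrete, here is a heavily trimmed illustration of what an SDP description actually looks like on the wire (real offers run to dozens of lines; the values below are made up):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
```

The `m=` lines declare the media (audio, video) and the `a=rtpmap` lines declare the codecs each side can use (here Opus and VP8).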
Step 3: The Handshake
Now they formally agree on how to communicate.
Flow:
- Rahim creates an offer
- Sends it to the backend (via WebSocket)
- Backend forwards it to Aisha
- Aisha creates an answer
- Backend sends it back to Rahim
Now both sides agree on:
- Formats
- Communication rules
But they still aren’t connected yet.
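The backend's role in this flow is deliberately dumb: it forwards offers, answers, and candidates without ever reading them. Here is a framework-agnostic sketch of that routing logic (the room and participant names are illustrative; in a real server this would sit behind Socket.IO or plain WebSockets):

```javascript
// Minimal signaling relay logic, independent of any WebSocket library.
// rooms maps a room id to the set of connected participant ids.
const rooms = new Map()

function join(roomId, participantId) {
  if (!rooms.has(roomId)) rooms.set(roomId, new Set())
  rooms.get(roomId).add(participantId)
}

// Forward a signaling message (offer, answer, or ICE candidate)
// to everyone in the room except the sender. The server never
// inspects the payload -- it is a pure relay.
function relay(roomId, senderId, type, payload) {
  const peers = rooms.get(roomId) || new Set()
  return [...peers]
    .filter((id) => id !== senderId)
    .map((id) => ({ to: id, type, payload }))
}

join("team-sync", "rahim")
join("team-sync", "aisha")

const out = relay("team-sync", "rahim", "offer", { sdp: "..." })
// out contains one message, addressed to "aisha"
```

Because the server only routes messages, it never needs to understand SDP at all; that is why signaling can be built on top of any transport you like.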
Step 4: The Real Problem is Network Barriers
Here’s where reality hits.
Rahim might be:
- On home WiFi
- Behind a router
- Using a private IP
Aisha might be:
- In an office network
- Behind a firewall
Problem:
They don’t know how to reach each other.
Step 5: Asking for Public Address (STUN)
To solve this, both ask a helper:
STUN server
Rahim asks:
“What’s my public address?”
STUN replies:
“You appear as: 103.xx.xx.xx:54321”
Aisha does the same.
Now both know how they appear on the internet.
They attempt a direct connection.
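What STUN discovery actually produces is an ICE candidate line inside the SDP. A `typ host` candidate is a local address; a `typ srflx` (server-reflexive) candidate is how you appear from the public internet, as reported by STUN. A small parser makes the difference visible (the candidate strings and addresses below are made-up examples):

```javascript
// Parse the fields of an SDP candidate line.
// parts[4] = address, parts[5] = port, parts[7] = candidate type.
function parseCandidate(line) {
  const parts = line.split(" ")
  return {
    ip: parts[4],
    port: Number(parts[5]),
    type: parts[7], // "host" = local address, "srflx" = public address via STUN
  }
}

const local = parseCandidate(
  "candidate:1 1 udp 2122260223 192.168.1.5 54321 typ host"
)
const publicView = parseCandidate(
  "candidate:2 1 udp 1686052607 103.21.45.7 54321 typ srflx raddr 192.168.1.5 rport 54321"
)
// local.type is "host"; publicView.type is "srflx"
```

Both candidates get sent to the other side, and ICE tries each pairing until one works.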
Step 6: When Direct Connection Fails
Sometimes it works and sometimes it doesn’t.
Reasons:
- Office firewalls block traffic
- Strict corporate networks
- Mobile data restrictions
So WebRTC needs a fallback.
Step 7: The Middleman (TURN)
If the direct connection fails, both sides use a relay:
TURN server
Now the flow becomes:
Rahim → TURN → Aisha
It’s like saying:
“We can’t connect directly, so let’s use a central meeting room.”
This guarantees connectivity but adds some latency and cost.
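In code, adding the TURN fallback is just one more entry in the `iceServers` list. The TURN URL and credentials below are placeholders; in production they come from your own deployment (e.g. coturn) or a hosted provider:

```javascript
// RTCPeerConnection config with both STUN (direct attempt)
// and TURN (relay fallback). turn.example.com and the
// credentials are placeholder values.
const config = {
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" },
    {
      urls: "turn:turn.example.com:3478",
      username: "meeting-user",
      credential: "secret",
    },
  ],
}
// In the browser: const pc = new RTCPeerConnection(config)
```

ICE automatically prefers direct candidates and only falls back to the relay when nothing else connects, so listing a TURN server costs nothing when a direct path exists.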
Step 8: The Magic Moment (Video Starts)
Once the connection is established:
- Camera captures video
- Audio is recorded
- Data is encoded
- Sent over the network
- Decoded on the other side
- Displayed instantly
Now the meeting is live.
Mapping This to Real Code
1. Turn on Camera & Microphone
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
})
videoElement.srcObject = stream
2. Create Peer Connection
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" }
  ]
})
3. Add Media Tracks
stream.getTracks().forEach(track => {
  pc.addTrack(track, stream)
})
4. Create Offer (Rahim)
const offer = await pc.createOffer()
await pc.setLocalDescription(offer)
socket.emit("offer", offer)
5. Aisha Responds with Answer
await pc.setRemoteDescription(offer)
const answer = await pc.createAnswer()
await pc.setLocalDescription(answer)
socket.emit("answer", answer)
6. Rahim Receives Answer
await pc.setRemoteDescription(answer)
7. Exchange ICE Candidates
pc.onicecandidate = (event) => {
  if (event.candidate) {
    socket.emit("ice-candidate", event.candidate)
  }
}
// Each side also applies the candidates it receives from the other peer
socket.on("ice-candidate", (candidate) => {
  pc.addIceCandidate(candidate)
})
8. Receive Remote Video
pc.ontrack = (event) => {
  remoteVideo.srcObject = event.streams[0]
}
The Entire Flow in One View
- Rahim sends offer
- Aisha sends answer
- Both exchange ICE candidates
- Try direct connection (STUN)
- Fallback to TURN if needed
- Meeting starts
The Simplest Mental Model
If you remember just this, you understand WebRTC:
- WebRTC → Direct video pipe
- Signaling server → Meeting coordinator
- STUN → Finds your public identity
- TURN → Backup relay when direct fails
Real Product View (Corporate Meeting)
In your actual app:
Participant:
- Clicks “Join Meeting”
- Camera starts
- Waits in room
Team Members:
- Join the same room
- Instantly connect
Everything else happens behind the scenes.
The Reality Most Tutorials Don’t Tell You
Raw WebRTC is powerful but:
- Hard to scale
- Full of edge cases
- Painful to debug
- Requires TURN infrastructure
This is where most developers get stuck.
What Professionals Actually Do
Instead of building everything manually, most teams use:
- LiveKit
- Agora
- Twilio
With these, your connection code shrinks to something like this (LiveKit-style):
const room = await connect(url, token)
await room.localParticipant.enableCameraAndMicrophone()
No need to manage:
- Offers & answers
- ICE candidates
- Network traversal
Final Thought
At its core, WebRTC is beautifully simple:
Team members trying to join a meeting,
using a coordinator to exchange info,
figuring out how to reach each other,
and using a central room only if needed.
Once you see it this way, everything clicks.