Abdullah al Mubin

WebRTC Video Calls Explained: Simpler Than You Think

Building real-time video apps sounds intimidating at first. You hear terms like SDP, ICE candidates, STUN, TURN, and suddenly it feels like you’re diving into networking hell.

But here’s the truth:

WebRTC becomes much easier when you stop thinking like an engineer and start thinking like a human.

Let’s walk through it as a simple, memorable story.

Content

  • A Real-World Scenario
  • Step 1: They Don’t Talk Directly (At First)
  • Step 2: “Here’s My Setup”
  • Step 3: The Handshake
  • Step 4: The Real Problem is Network Barriers
  • Step 5: Asking for Public Address (STUN)
  • Step 6: When Direct Connection Fails
  • Step 7: The Middleman (TURN)
  • Step 8: The Magic Moment (Video Starts)
  • Mapping This to Real Code
  • The Entire Flow in One View
  • The Simplest Mental Model
  • Real Product View (Corporate Meeting)
  • The Reality Most Tutorials Don’t Tell You
  • What Professionals Actually Do
  • Final Thought

A Real-World Scenario

Imagine you’ve built a corporate meeting platform called:

meet123.com

Rahim (a software engineer) joins a team meeting with Aisha (a product manager).

At exactly 10 AM:

Rahim opens: meet123.com/room/team-sync

Aisha opens the same link

They’re now “in the same meeting.”

But… nothing happens yet.

The real question is:

How do their cameras actually connect?

Step 1: They Don’t Talk Directly (At First)

Here’s the key idea:

Browsers don’t connect to each other right away.

Instead:

  • Rahim connects to a meeting server
  • Aisha connects to the same server

Think of this server like a meeting coordinator.

This coordinator is called the signaling server. It only passes messages between participants; it never carries the video itself.
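
To make the coordinator idea concrete, here is a minimal in-memory sketch of what a signaling server does: it remembers who is in a room and forwards messages between them without looking inside. The `SignalingRoom` class, its method names, and the peer ids are hypothetical, and a real server would deliver messages over WebSockets rather than direct callbacks.

```javascript
// Hypothetical in-memory model of a signaling room (no network involved)
class SignalingRoom {
  constructor() {
    this.peers = new Map() // peerId -> callback that delivers a message
  }

  // Register a participant and the function used to reach them
  join(peerId, deliver) {
    this.peers.set(peerId, deliver)
  }

  leave(peerId) {
    this.peers.delete(peerId)
  }

  // Forward a message to everyone in the room except the sender.
  // The payload is passed along untouched; only the sender id is stamped on.
  relay(fromId, message) {
    for (const [peerId, deliver] of this.peers) {
      if (peerId !== fromId) deliver({ ...message, from: fromId })
    }
  }
}

// Usage: Rahim's message reaches Aisha, but is not echoed back to Rahim
const room = new SignalingRoom()
const rahimInbox = []
const aishaInbox = []
room.join("rahim", (msg) => rahimInbox.push(msg))
room.join("aisha", (msg) => aishaInbox.push(msg))
room.relay("rahim", { type: "hello", payload: "ready to connect" })
console.log(aishaInbox.length, rahimInbox.length)
```

Everything in the later steps (offers, answers, network addresses) travels through this kind of relay.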

Step 2: “Here’s My Setup”

Before the meeting starts, both participants share their setups.

Rahim says:

“I’ve got camera, mic, and these formats.”

Aisha says:

“Same here.”

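These descriptions are written in a format called:

SDP (Session Description Protocol)

An SDP description includes:

  • Which media each side wants to send and receive (audio, video)
  • Supported formats (codecs) and their parameters
  • Security fingerprints used to encrypt the call
  • Network details needed to set up the connection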

Think of it like:

“Here’s how I can communicate.”
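
SDP is plain text, so you can read it yourself. As a rough sketch, the function below pulls the codec names out of a description. The sample SDP is hand-written and heavily trimmed for illustration (a real one from a browser is much longer), and `listCodecs` is a hypothetical helper, not part of any WebRTC API.

```javascript
// Collect codec names per media section from an SDP string
function listCodecs(sdp) {
  const result = {}
  let currentKind = null
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media kind> <port> <protocol> <payload types...>
      currentKind = line.slice(2).split(" ")[0]
      result[currentKind] = []
    } else if (line.startsWith("a=rtpmap:") && currentKind) {
      // a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
      const codec = line.split(" ")[1].split("/")[0]
      result[currentKind].push(codec)
    }
  }
  return result
}

// Hand-written, trimmed sample (assumption: not captured from a real call)
const sampleSdp = [
  "v=0",
  "o=- 46117317 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 98",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:98 VP9/90000"
].join("\r\n")

console.log(listCodecs(sampleSdp))
// { audio: [ 'opus' ], video: [ 'VP8', 'VP9' ] }
```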

Step 3: The Handshake

Now they formally agree on how to communicate.

Flow:

  1. Rahim creates an offer
  2. Sends it to the backend (via WebSocket)
  3. Backend forwards it to Aisha
  4. Aisha creates an answer
  5. Backend sends it back to Rahim

Now both sides agree on:

  • Formats
  • Communication rules

But they aren’t connected yet. So far they have only agreed on how to talk, not on where to send the data.
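
One way to picture the handshake is as a tiny state machine on each side. The state names below match the values of `RTCPeerConnection.signalingState`, but the transition table and the `nextState` function are a simplified sketch written for this article; it leaves out provisional answers, rollback, and the closed state.

```javascript
// Simplified offer/answer state machine (assumption: happy path only)
const TRANSITIONS = {
  "stable": {
    setLocalOffer: "have-local-offer",
    setRemoteOffer: "have-remote-offer"
  },
  "have-local-offer": { setRemoteAnswer: "stable" },
  "have-remote-offer": { setLocalAnswer: "stable" }
}

function nextState(state, action) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][action]
  if (!next) throw new Error(`Cannot ${action} while in state "${state}"`)
  return next
}

// Rahim creates the offer, then receives the answer
let rahim = "stable"
rahim = nextState(rahim, "setLocalOffer")   // "have-local-offer"
rahim = nextState(rahim, "setRemoteAnswer") // back to "stable"

// Aisha receives the offer, then creates the answer
let aisha = "stable"
aisha = nextState(aisha, "setRemoteOffer")  // "have-remote-offer"
aisha = nextState(aisha, "setLocalAnswer")  // back to "stable"

console.log(rahim, aisha) // stable stable
```

Both sides ending in "stable" is what "the handshake is done" means in practice. Applying an answer when no offer is pending throws, which mirrors the kind of ordering error you hit when signaling messages arrive out of sequence.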

Step 4: The Real Problem is Network Barriers

Here’s where reality hits.

Rahim might be:

  • On home WiFi
  • Behind a router
  • Using a private IP

Aisha might be:

  • In an office network
  • Behind a firewall

Problem:

They don’t know how to reach each other.
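
The "private IP" part is easy to check for yourself. The sketch below tests whether an IPv4 address falls in one of the private ranges reserved by RFC 1918, which are only meaningful inside a local network and cannot be reached from the internet. `isPrivateIPv4` is a hypothetical helper, and the sample addresses are illustrative.

```javascript
// True if the address is in an RFC 1918 private range
function isPrivateIPv4(ip) {
  const parts = ip.split(".").map(Number)
  const valid = parts.length === 4 &&
    parts.every((p) => Number.isInteger(p) && p >= 0 && p <= 255)
  if (!valid) throw new Error(`Not an IPv4 address: ${ip}`)

  const [a, b] = parts
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  )
}

console.log(isPrivateIPv4("192.168.1.10")) // true  (typical home WiFi address)
console.log(isPrivateIPv4("203.0.113.7"))  // false (publicly routable range)
```

If Rahim simply told Aisha "reach me at 192.168.1.10", that address would point at some unrelated device on her own network, or at nothing at all.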

Step 5: Asking for Public Address (STUN)

To solve this, both ask a helper:

STUN server

Rahim asks:

“What’s my public address?”

STUN replies:

“You appear as: 103.xx.xx.xx:54321”

Aisha does the same.

Now both know how they appear on the internet.

They share these addresses with each other through the signaling server, then attempt a direct connection.
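
Under the hood, the STUN reply carries your public address in a field called XOR-MAPPED-ADDRESS, where the port and IP are XORed with a fixed "magic cookie" value (0x2112A442). The sketch below decodes that field for IPv4. It assumes Node.js (it uses `Buffer`), `decodeXorMappedAddress` is a hypothetical helper, and the sample bytes were computed by hand for the documentation address 203.0.113.7:54321 rather than captured from a real server.

```javascript
const MAGIC_COOKIE = 0x2112a442

// value: the 8-byte XOR-MAPPED-ADDRESS attribute value for an IPv4 address
// layout: [reserved, family, xor-port (2 bytes), xor-address (4 bytes)]
function decodeXorMappedAddress(value) {
  const family = value[1]
  if (family !== 0x01) throw new Error("Only IPv4 is handled in this sketch")

  // Port is XORed with the top 16 bits of the magic cookie
  const port = value.readUInt16BE(2) ^ (MAGIC_COOKIE >>> 16)

  // Address is XORed with the whole cookie; >>> 0 keeps it unsigned
  const addr = (value.readUInt32BE(4) ^ MAGIC_COOKIE) >>> 0
  const ip = [
    addr >>> 24,
    (addr >>> 16) & 0xff,
    (addr >>> 8) & 0xff,
    addr & 0xff
  ].join(".")

  return `${ip}:${port}`
}

const sample = Buffer.from([0x00, 0x01, 0xf5, 0x23, 0xea, 0x12, 0xd5, 0x45])
console.log(decodeXorMappedAddress(sample)) // 203.0.113.7:54321
```

In a browser you never write this yourself; `RTCPeerConnection` talks to the STUN server for you. It is shown here only to take the mystery out of the question-and-answer exchange.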

Step 6: When Direct Connection Fails

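Sometimes the direct connection works, and sometimes it doesn’t.

Common reasons:

  • Office firewalls that block peer-to-peer traffic
  • Strict corporate networks that hand out a different public address for every destination, so the address STUN reported doesn’t work for the other person
  • Mobile carriers that put many users behind shared addresses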

So WebRTC needs a fallback.
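
The fallback logic is part of ICE, which collects several possible addresses (candidates) for each side and ranks them so that direct paths are tried before relayed ones (relays are covered in the next step). Here is a sketch of the candidate priority formula from the ICE specification (RFC 8445), using its recommended type preferences. The default local preference of 65535 is an assumption for illustration, and real browsers may pick different values.

```javascript
// Recommended type preferences from the ICE specification (RFC 8445)
// host  = your local address
// srflx = "server reflexive", the public address found via STUN
// prflx = "peer reflexive", discovered during connectivity checks
// relay = an address on a relay server
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 }

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component id)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return (
    TYPE_PREFERENCE[type] * 2 ** 24 +
    localPreference * 2 ** 8 +
    (256 - componentId)
  )
}

const ranked = ["relay", "host", "srflx"]
  .map((type) => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority)

console.log(ranked.map((c) => c.type)) // [ 'host', 'srflx', 'relay' ]
console.log(candidatePriority("host")) // 2130706431
```

The ordering is the point: cheapest and most direct first, relay last.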

Step 7: The Middleman (TURN)

If the direct connection fails, both sides use a relay:

TURN server

Now the flow becomes:

Rahim → TURN → Aisha

It’s like saying:

“We can’t connect directly, so let’s use a central meeting room.”

This works on almost every network, but it adds latency and cost, because all audio and video now passes through a server that you have to run or pay for.
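
In code, adding a TURN server is mostly configuration. The sketch below builds the `iceServers` list that `RTCPeerConnection` accepts. The `buildIceServers` helper is hypothetical, `turn.example.com` and the credentials are placeholders, and ports 3478 and 5349 are the conventional TURN defaults, so check what your own server actually exposes.

```javascript
// Hypothetical helper: build the iceServers list for RTCPeerConnection
function buildIceServers({ turnHost, username, credential }) {
  return [
    // STUN: used to discover the public address for direct connections
    { urls: "stun:stun.l.google.com:19302" },
    // TURN: relay fallback, offered over UDP, TCP, and TLS
    {
      urls: [
        `turn:${turnHost}:3478?transport=udp`,
        `turn:${turnHost}:3478?transport=tcp`,
        `turns:${turnHost}:5349?transport=tcp`
      ],
      username,
      credential
    }
  ]
}

// Placeholder values (assumption: replace with your own TURN deployment)
const iceServers = buildIceServers({
  turnHost: "turn.example.com",
  username: "demo-user",
  credential: "demo-secret"
})

console.log(JSON.stringify(iceServers, null, 2))
// In the browser: new RTCPeerConnection({ iceServers })
```

While testing, passing `iceTransportPolicy: "relay"` alongside `iceServers` forces all traffic through TURN, which is a quick way to confirm the relay really works before a user on a strict network finds out for you.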

Step 8: The Magic Moment (Video Starts)

Once the connection is established:

  • Camera captures video
  • Microphone captures audio
  • Both are encoded (compressed)
  • Encrypted and sent over the network
  • Decrypted and decoded on the other side
  • Played back with minimal delay

Now the meeting is live.

Mapping This to Real Code

1. Turn on Camera & Microphone

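// Ask the browser for camera and microphone access (the user sees a prompt)
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
})

// Show the local preview; videoElement is a <video> element on the page
videoElement.srcObject = stream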

2. Create Peer Connection

// STUN only for now; a TURN server would be added to this list as a fallback
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.l.google.com:19302" }
  ]
})

3. Add Media Tracks

// Attach each local track (audio and video) so it is sent to the other side
stream.getTracks().forEach(track => {
  pc.addTrack(track, stream)
})

4. Create Offer (Rahim)

// Describe what Rahim wants to send, and apply it locally
const offer = await pc.createOffer()
await pc.setLocalDescription(offer)

// Send the offer to Aisha through the signaling server
socket.emit("offer", offer)

5. Aisha Responds with Answer

// Runs on Aisha's side when the signaling server delivers Rahim's offer
socket.on("offer", async (offer) => {
  await pc.setRemoteDescription(offer)

  const answer = await pc.createAnswer()
  await pc.setLocalDescription(answer)

  socket.emit("answer", answer)
})

6. Rahim Receives Answer

// Runs on Rahim's side when Aisha's answer arrives
socket.on("answer", (answer) => pc.setRemoteDescription(answer))

7. Exchange ICE Candidates

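// Send each local candidate to the other side as it is discovered
pc.onicecandidate = (event) => {
  if (event.candidate) {
    socket.emit("ice-candidate", event.candidate)
  }
}

// Add each candidate received from the other side
socket.on("ice-candidate", (candidate) => pc.addIceCandidate(candidate))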
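
One edge case worth knowing about in this step: candidates from the other side can arrive over the signaling channel before the remote description has been set, and `addIceCandidate` rejects in that situation. A common workaround is to hold early candidates in a queue. The `CandidateQueue` class below is a hypothetical sketch of that idea, and it uses a stand-in object instead of a real `RTCPeerConnection` so it can run outside a browser.

```javascript
// Hypothetical helper: buffer remote ICE candidates until it is safe to add them
class CandidateQueue {
  constructor(pc) {
    this.pc = pc
    this.pending = []
    this.ready = false
  }

  // Call for every "ice-candidate" message from the signaling server
  async add(candidate) {
    if (this.ready) {
      await this.pc.addIceCandidate(candidate)
    } else {
      this.pending.push(candidate)
    }
  }

  // Call right after pc.setRemoteDescription(...) has resolved
  async flush() {
    this.ready = true
    for (const candidate of this.pending.splice(0)) {
      await this.pc.addIceCandidate(candidate)
    }
  }
}

// Stand-in for RTCPeerConnection so the sketch runs in Node.js
const added = []
const fakePc = { addIceCandidate: async (c) => { added.push(c) } }

const queue = new CandidateQueue(fakePc)
queue.add("candidate-1")                 // arrives early, gets buffered
  .then(() => queue.flush())             // remote description is now set
  .then(() => queue.add("candidate-2"))  // arrives late, added immediately
  .then(() => console.log(added))        // [ 'candidate-1', 'candidate-2' ]
```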

8. Receive Remote Video

// Fires when the other side's audio or video track arrives
pc.ontrack = (event) => {
  remoteVideo.srcObject = event.streams[0]
}

The Entire Flow in One View

  • Rahim sends offer
  • Aisha sends answer
  • Both exchange ICE candidates
  • Try direct connection (STUN)
  • Fallback to TURN if needed
  • Meeting starts

The Simplest Mental Model

If you remember just this, you understand WebRTC:

  • WebRTC → Direct video pipe
  • Signaling server → Meeting coordinator
  • STUN → Finds your public address
  • TURN → Backup relay when direct fails

Real Product View (Corporate Meeting)

In your actual app:

Participant:

  • Clicks “Join Meeting”
  • Camera starts
  • Waits in room

Team Members:

  • Join the same room
  • Instantly connect

Everything else happens behind the scenes.

The Reality Most Tutorials Don’t Tell You

Raw WebRTC is powerful but:

  • Hard to scale
  • Full of edge cases
  • Painful to debug
  • Requires TURN infrastructure
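
To put a number on "hard to scale": the flow in this article connects two people. If you extend it so that every participant connects directly to every other one (a full mesh), the connection count grows quickly. The arithmetic below is a sketch under that assumption, and `meshCost` is a hypothetical helper.

```javascript
// Cost of a full mesh, where everyone connects directly to everyone else
function meshCost(participants) {
  return {
    // Each pair of participants needs its own connection: n * (n - 1) / 2
    totalConnections: (participants * (participants - 1)) / 2,
    // Each participant uploads their video once per other participant
    uploadsPerParticipant: participants - 1
  }
}

for (const n of [2, 4, 10]) {
  console.log(n, meshCost(n))
}
// 2 { totalConnections: 1, uploadsPerParticipant: 1 }
// 4 { totalConnections: 6, uploadsPerParticipant: 3 }
// 10 { totalConnections: 45, uploadsPerParticipant: 9 }
```

Nine simultaneous video uploads from one laptop on office WiFi is where a ten-person mesh call tends to fall apart.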

This is where most developers get stuck.

What Professionals Actually Do

Instead of building everything manually, many teams use a managed platform or SDK such as:

  • LiveKit
  • Agora
  • Twilio

With an SDK like these, the code shrinks to a few lines. For example, with LiveKit’s JavaScript client:

import { Room } from "livekit-client"

const room = new Room()
await room.connect(url, token)
await room.localParticipant.enableCameraAndMicrophone()

No need to manage:

  • Offers & answers
  • ICE candidates
  • Network traversal

Final Thought

At its core, WebRTC is beautifully simple:

Team members trying to join a meeting,
using a coordinator to exchange info,
figuring out how to reach each other,
and using a central room only if needed.

Once you see it this way, everything clicks.
