A developer's deep dive into what actually happens when you make a phone call
So you're building a voice call feature in your app. You pick up a library, maybe WebRTC or a third-party SDK, and things just... work. But then a question hits you mid-implementation:
"Wait β how is voice data actually being sent? And how is this different from a regular phone call?"
That exact thought led me down a rabbit hole. This article breaks it all down β in plain English, with real technical depth underneath.
The Big Picture First
When you speak into a phone, your voice is just air vibrations (an analog signal). Before it can travel anywhere, through towers or the internet, it must be converted into digital data. Both call types do this. The difference is how that data travels afterward.
Your Voice (Analog)
        ↓
Digitize + Compress
        ↓
┌─────────────┐   ┌────────────────┐
│ No Internet │   │ With Internet  │
│  GSM/VoLTE  │   │  WebRTC/VoIP   │
└─────────────┘   └────────────────┘
Part 1: Normal Phone Calls (No Internet) 📞
What's happening under the hood?
A regular phone call uses your telecom operator's infrastructure (towers, cables, switching centers), completely independent of the internet.
Step-by-Step Flow
You speak 🎤
   ↓
Microphone captures analog audio
   ↓
ADC (Analog-to-Digital Converter) → digital signal
   ↓
Codec compresses it (AMR / AMR-WB / EVS)
   ↓
Sent to nearest Cell Tower 📡
   ↓
Telecom Core Network (routes the call)
   ↓
Receiver's Cell Tower 📡
   ↓
Receiver's phone decodes → plays audio 🔊
The Codec: AMR (Adaptive Multi-Rate)
This is the compression algorithm used in traditional calls. It's smart: it adapts its bitrate to network conditions.
| AMR Mode | Bitrate | Quality |
|---|---|---|
| AMR 4.75 | 4.75 kbps | Low (weak signal) |
| AMR 12.2 | 12.2 kbps | High (strong signal) |
| AMR-WB (HD Voice) | 23.85 kbps | HD quality |
What does the data look like?
Under the hood, voice is not sent as one big audio file. It's split into tiny chunks; each chunk represents about 20 milliseconds of audio.
[20ms chunk] → [20ms chunk] → [20ms chunk] → [20ms chunk] → ...
     #1             #2             #3             #4
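The 20 ms framing makes the numbers easy to sanity-check: at a given AMR bitrate, each frame carries bitrate × 0.02 s of compressed audio. A quick sketch (byte counts here ignore channel coding and transport headers):

```javascript
// Back-of-the-envelope frame sizes for 20 ms speech frames, using the AMR
// mode bitrates from the table above.
function frameBytes(bitrateKbps, frameMs = 20) {
  const bits = bitrateKbps * 1000 * (frameMs / 1000); // bits per frame
  return Math.ceil(bits / 8);                         // round up to whole bytes
}

console.log(frameBytes(4.75));  // AMR 4.75  -> 12 bytes per 20 ms frame
console.log(frameBytes(12.2));  // AMR 12.2  -> 31 bytes
console.log(frameBytes(23.85)); // AMR-WB    -> 60 bytes
console.log(1000 / 20);         // 50 frames per second of speech
```

So even "HD Voice" is only a few dozen bytes of speech data fifty times a second.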
Each frame looks something like this conceptually:
{
"type": "voice_frame",
"codec": "AMR",
"sequence": 101,
"timestamp": 2003400,
"payload": "<compressed binary audio bytes>"
}
⚠️ In reality it's binary, not JSON, but this structure represents what's inside each packet.
Circuit Switching vs VoLTE
Old GSM (2G/3G): Circuit Switching
- A dedicated "pipe" is reserved just for your call
- Like booking a private road: no one else uses it during your call
- Very stable, but inefficient (resources wasted during silence)
VoLTE (4G/5G): Packet Switching (but controlled)
- Voice is broken into packets like internet data
- But the network gives it priority (QoS, Quality of Service)
- Lower latency, HD quality, still uses telecom infrastructure
Part 2: Internet Calls (WhatsApp, WebRTC) 🌐
What's happening under the hood?
Apps like WhatsApp, Google Meet, and Discord use the internet to carry voice. The key technology here is WebRTC (Web Real-Time Communication), an open standard built into browsers and mobile OSes.
Step-by-Step Flow
You speak 🎤
   ↓
Microphone captures analog audio
   ↓
ADC → digital signal
   ↓
Opus Codec compresses it
   ↓
Packetized into UDP packets
   ↓
Sent via Internet (WiFi / 4G / 5G)
   ↓
STUN/TURN Server (for NAT traversal)
   ↓
Peer-to-Peer connection (WebRTC)
   ↓
Receiver reassembles packets → decodes → plays audio 🔊
The Codec: Opus
Opus is the go-to codec for internet voice/audio. It's open-source, low-latency, and adaptive.
| Feature | Opus |
|---|---|
| Bitrate range | 6–510 kbps |
| Latency | ~20 ms |
| Handles packet loss? | ✅ Yes (built-in FEC) |
| Quality at low bitrate | Excellent |
| Used by | WhatsApp, Discord, Zoom, WebRTC |
Opus has Forward Error Correction (FEC) built in: the encoder can embed a low-bitrate copy of the previous frame in each packet, so if a packet is lost the decoder can still reconstruct that audio. That's why internet calls still sound okay despite minor packet loss.
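To make the FEC idea concrete, here is a toy sketch (not the real Opus bitstream): each packet carries a low-quality copy of the previous frame, so a single lost packet can be patched from its successor.

```javascript
// Toy forward-error-correction illustration, NOT the real Opus format:
// each packet carries its own frame plus a low-quality copy of the previous one.
function sendWithFec(frames) {
  return frames.map((frame, i) => ({
    seq: i,
    main: frame,
    fecPrev: i > 0 ? `lq(${frames[i - 1]})` : null, // redundant copy of frame i-1
  }));
}

// Decode in order, patching a lost frame from the next packet's redundant copy.
function decode(packets, lostSeqs) {
  const received = packets.filter(p => !lostSeqs.has(p.seq));
  const out = [];
  for (let i = 0; i < packets.length; i++) {
    const pkt = received.find(p => p.seq === i);
    if (pkt) {
      out.push(pkt.main);
    } else {
      const next = received.find(p => p.seq === i + 1);
      out.push(next && next.fecPrev ? next.fecPrev : '<gap>'); // conceal or gap
    }
  }
  return out;
}

const packets = sendWithFec(['f0', 'f1', 'f2', 'f3']);
console.log(decode(packets, new Set([2])));
// -> ['f0', 'f1', 'lq(f2)', 'f3']  (lost frame recovered at lower quality)
```

The listener hears a slightly muffled 20 ms instead of a dropout, which is usually imperceptible.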
Why UDP and not TCP?
This is one of the most important decisions in real-time audio.
TCP (used in HTTP, file downloads):
- Guarantees delivery: if a packet is lost, it resends it
- Problem: resending takes time → delay → unacceptable in real-time voice
UDP (used in WebRTC voice):
- No guarantee of delivery
- No resending lost packets
- But it's fast: packets go out and don't wait
In voice calls, an audio packet that arrives 200 ms late is useless anyway. Better to skip it and keep playing forward than to wait for a retry.
TCP mindset: "Wait, I need packet #47 before I continue" ❌ (for voice)
UDP mindset: "Packet #47 is gone? Fine, move on." ✅ (for voice)
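That trade-off can be sketched as a playout-deadline check: a packet re-sent TCP-style costs roughly one round trip per retry, which pushes it past the point where the audio is still useful. The 200 ms deadline and 120 ms RTT below are illustrative numbers, not protocol constants.

```javascript
// Why retransmission (TCP-style) doesn't work for live voice: a lost packet
// re-sent after a round trip or two arrives past the playout deadline.
const PLAYOUT_DEADLINE_MS = 200; // illustrative: older audio is useless
const RTT_MS = 120;              // assumed round-trip time

function arrivesInTime(oneWayMs, retransmits = 0) {
  const arrival = oneWayMs + retransmits * RTT_MS; // each retry costs ~1 RTT
  return arrival <= PLAYOUT_DEADLINE_MS;
}

console.log(arrivesInTime(60));    // true  - normal UDP delivery makes it
console.log(arrivesInTime(60, 2)); // false - two TCP-style retries arrive too late
```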
How WebRTC Establishes Connection (Simplified)
- Signaling: both peers exchange metadata (IP, codec support) via a server
- ICE (Interactive Connectivity Establishment): finding the best network path
- STUN Server: figures out your public IP (you're usually behind a router/NAT)
- TURN Server: relays traffic if direct P2P fails (firewall situations)
- DTLS Handshake: encrypted connection established
- SRTP: voice packets flow securely, peer-to-peer
Caller                    Signaling Server                   Receiver
  |                              |                               |
  |----offer (SDP)------------->|                               |
  |                              |-------offer (SDP)----------->|
  |                              |<------answer (SDP)-----------|
  |<---answer (SDP)-------------|                               |
  |                              |                               |
  |<===============ICE Candidates exchanged=====================>|
  |                                                              |
  |<===============P2P Voice (SRTP/UDP)=========================>|
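One piece WebRTC deliberately leaves to you is the signaling channel itself. As a sketch, here is an in-memory stand-in for what a WebSocket signaling server does: forward SDP offers/answers and ICE candidates between named peers. All names and message shapes here are illustrative, not part of any WebRTC API.

```javascript
// Minimal in-memory signaling relay: forwards messages between named peers.
// A real deployment would do the same over WebSockets with auth attached.
class SignalingRelay {
  constructor() { this.peers = new Map(); } // peerId -> message handler
  register(peerId, onMessage) { this.peers.set(peerId, onMessage); }
  send(from, to, message) {
    const handler = this.peers.get(to);
    if (handler) handler({ from, ...message }); // tag sender, deliver
  }
}

const relay = new SignalingRelay();
const log = [];
relay.register('caller',   msg => log.push(`caller got ${msg.type} from ${msg.from}`));
relay.register('receiver', msg => log.push(`receiver got ${msg.type} from ${msg.from}`));

relay.send('caller', 'receiver', { type: 'offer',  sdp: '<sdp offer>' });
relay.send('receiver', 'caller', { type: 'answer', sdp: '<sdp answer>' });
relay.send('caller', 'receiver', { type: 'ice-candidate', candidate: '<candidate>' });
console.log(log);
// -> ['receiver got offer from caller',
//     'caller got answer from receiver',
//     'receiver got ice-candidate from caller']
```

Note the relay never touches the voice itself; once ICE succeeds, media flows peer-to-peer.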
What does the data look like?
{
"type": "audio_packet",
"codec": "opus",
"ssrc": 3892741023,
"sequence": 4821,
"timestamp": 96000,
"payload": "<opus encoded binary>"
}
This is an RTP (Real-time Transport Protocol) packet. WebRTC wraps it in SRTP (Secure RTP) for encryption.
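The JSON above is conceptual; on the wire, RTP has a fixed 12-byte binary header (RFC 3550). A sketch that packs the same example values (payload type 111 is a commonly used dynamic mapping for Opus, not a fixed constant):

```javascript
// Build the fixed 12-byte RTP header from RFC 3550 with the example values
// shown in the JSON above. Payload type 111 is an assumed dynamic PT for Opus.
function buildRtpHeader({ payloadType, sequence, timestamp, ssrc }) {
  const buf = new DataView(new ArrayBuffer(12));
  buf.setUint8(0, 0x80);               // version 2, no padding/extension/CSRC
  buf.setUint8(1, payloadType & 0x7f); // marker bit 0 + 7-bit payload type
  buf.setUint16(2, sequence);          // 16-bit sequence number
  buf.setUint32(4, timestamp);         // media timestamp (48 kHz clock for Opus)
  buf.setUint32(8, ssrc);              // stream identifier
  return buf;
}

const h = buildRtpHeader({ payloadType: 111, sequence: 4821, timestamp: 96000, ssrc: 3892741023 });
console.log(h.getUint16(2)); // 4821  - sequence number round-trips
console.log(h.getUint32(8)); // 3892741023 - ssrc round-trips
```

The Opus payload bytes follow immediately after these 12 header bytes; SRTP then encrypts the payload and authenticates the whole packet.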
Part 3: Side-by-Side Comparison
| Feature | Normal Call 📞 | Internet Call 🌐 |
|---|---|---|
| Network | Telecom (Jio, Airtel) | Internet (WiFi / Mobile data) |
| Protocol | GSM / VoLTE | WebRTC (RTP over UDP) |
| Codec | AMR / AMR-WB / EVS | Opus |
| Latency | ~100–150 ms | ~150–300 ms (network-dependent) |
| Data path | Operator controlled | Peer-to-peer (mostly) |
| Delivery | Guaranteed (circuit/priority) | Best-effort (UDP) |
| Encryption | Limited (operator can see) | Encrypted (DTLS + SRTP; E2E when P2P) |
| Packet loss handling | Network-level QoS | Opus FEC + NACK |
| Works without data? | ✅ Yes | ❌ No |
| Cost | Per minute or bundled | Uses ~0.3–0.5 MB/min |
| Emergency calls | ✅ Works | ❌ Cannot call 112/911 |
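The data-usage row can be sanity-checked with quick arithmetic: codec payload plus per-packet RTP/UDP/IP headers, at 50 packets per second. The 24 and 48 kbps rates below are assumed typical Opus voice bitrates, not measured values from any particular app.

```javascript
// Rough data usage per minute of an Opus call: payload + per-packet overhead.
// Header sizes: RTP (12) + UDP (8) + IPv4 (20) = 40 bytes per packet.
function megabytesPerMinute(codecKbps, packetsPerSec = 50, headerBytes = 40) {
  const codecBytesPerSec = (codecKbps * 1000) / 8;    // compressed audio
  const overheadPerSec = packetsPerSec * headerBytes; // protocol headers
  return ((codecBytesPerSec + overheadPerSec) * 60) / 1e6;
}

console.log(megabytesPerMinute(24).toFixed(2)); // 0.30 MB/min at 24 kbps
console.log(megabytesPerMinute(48).toFixed(2)); // 0.48 MB/min at 48 kbps
```

That lands right in the table's 0.3–0.5 MB/min range, which is reassuring.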
Part 4: Why Voice Sometimes Breaks on Internet Calls 🤖
Ever heard someone sound like a robot during a WhatsApp call? Here's exactly why:
1. Packet Loss
Some UDP packets don't arrive. If too many are lost in a row, the audio decoder has gaps → robotic or stuttering sound.
2. Jitter
Packets arrive out of order or unevenly spaced. WebRTC uses a jitter buffer to smooth this out, but if jitter is too high, the buffer overflows or the audio gets chopped.
Sent:     [P1]--[P2]--[P3]--[P4]--[P5]
Received: [P1]------[P3][P2]----[P5]   ← P4 lost, P2 P3 swapped
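A jitter buffer's job can be sketched in a few lines: hold incoming packets briefly, release them in sequence order, and flag anything that never arrived so the decoder can conceal it.

```javascript
// Minimal jitter-buffer sketch: play packets back in sequence order and mark
// frames that never arrived so the decoder can conceal them.
function playout(received, firstSeq, lastSeq) {
  const bySeq = new Map(received.map(p => [p.seq, p])); // index by sequence
  const out = [];
  for (let seq = firstSeq; seq <= lastSeq; seq++) {
    out.push(bySeq.has(seq) ? bySeq.get(seq).audio : `<conceal ${seq}>`);
  }
  return out;
}

// Matches the diagram above: P2/P3 arrive swapped, P4 never arrives.
const received = [
  { seq: 1, audio: 'P1' }, { seq: 3, audio: 'P3' },
  { seq: 2, audio: 'P2' }, { seq: 5, audio: 'P5' },
];
console.log(playout(received, 1, 5));
// -> ['P1', 'P2', 'P3', '<conceal 4>', 'P5']
```

Real jitter buffers also adapt their depth to measured jitter; deeper buffering smooths more reordering at the cost of added latency.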
3. Network Handoff
When you're moving (driving, walking), your phone switches between towers or WiFi ↔ 4G. During the handoff, packets drop → brief audio glitch.
4. Congestion
Your internet connection is shared. If someone starts a big download in parallel, your voice packets compete for bandwidth → delay spikes.
Part 5: As a Developer β What Should You Know?
If you're building a voice feature, here are the key decisions:
Choosing your approach
Use WebRTC if:
- Building for web/mobile app
- Need P2P, low cost at scale
- Want E2E encryption
- Don't need emergency call support
Use VoIP / SIP if:
- Need PSTN (real phone number) integration
- Need to call regular phones
- Enterprise telephony
Use a managed SDK if:
- Fast shipping matters
- Examples: Twilio, Agora, Daily.co, Vonage
Key WebRTC APIs to know
// Get user's microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
// Create peer connection
const pc = new RTCPeerConnection({
iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
// Add audio track to connection
stream.getTracks().forEach(track => pc.addTrack(track, stream));
// Create and send offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// → Send offer to other peer via your signaling server
// When you receive their answer:
await pc.setRemoteDescription(new RTCSessionDescription(answer));
Monitor call quality in real time
// Get audio stats
const stats = await pc.getStats();
stats.forEach(report => {
  if (report.type === 'inbound-rtp' && report.kind === 'audio') {
    console.log('Packets lost:', report.packetsLost);
    console.log('Jitter:', report.jitter);
  }
  // Round-trip time lives on remote-inbound-rtp reports, not inbound-rtp
  if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
    console.log('Round trip time:', report.roundTripTime);
  }
});
Quick Summary
Both call types:
Voice → Digitize → Compress → Send in 20 ms chunks → Decode → Play
Without Internet (Normal Call):
Codec: AMR | Path: Telecom towers | Protocol: GSM/VoLTE | Stable + Guaranteed
With Internet (WhatsApp/WebRTC):
Codec: Opus | Path: Internet P2P | Protocol: RTP over UDP | Flexible + Encrypted
The biggest conceptual difference:
- Normal call = a dedicated pipe reserved just for you (like booking a private road)
- Internet call = many small packets racing through shared roads, reassembled on arrival
Further Reading
- WebRTC Official Docs
- RFC 3550 β RTP Specification
- Opus Codec
- How NAT Traversal Works (STUN/TURN/ICE)
- MDN β RTCPeerConnection
If this helped you understand what's actually happening under the hood when you make a call, drop a ❤️. And if you're building something with WebRTC, feel free to ask questions in the comments!
Tags: #webrtc #voip #networking #javascript #webdev #beginners
