DEV Community

Munna Thakur

How Voice Data Travels: With Internet vs Without Internet 📞🌐

A developer's deep dive into what actually happens when you make a phone call


So you're building a voice call feature in your app. You pick up a library, maybe WebRTC or a third-party SDK, and things just... work. But then a question hits you mid-implementation:

"Wait, how is voice data actually being sent? And how is this different from a regular phone call?"

That exact thought led me down a rabbit hole. This article breaks it all down in plain English, with real technical depth underneath.


The Big Picture First

When you speak into a phone, your voice is just air vibrations (an analog signal). Before it can travel anywhere, whether through cell towers or the internet, it must be converted into digital data. Both call types do this. The difference is how that data travels afterward.

Your Voice (Analog)
      ↓
  Digitize + Compress
      ↓
  ┌─────────────┐       ┌─────────────────┐
  │ No Internet │       │  With Internet  │
  │ GSM/VoLTE   │       │  WebRTC/VoIP    │
  └─────────────┘       └─────────────────┘
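
As a rough sketch of the "digitize" step, the snippet below samples a pure sine tone and quantizes each sample to a signed 16-bit integer, which is roughly what an ADC hands to the codec. The 8 kHz sample rate, the 440 Hz tone, and the function name `digitize` are assumptions made for this illustration.

```javascript
// Sample an "analog" signal (here: a pure sine tone) and quantize it to 16-bit PCM.
function digitize(frequencyHz, sampleRateHz, durationMs) {
  const sampleCount = Math.round((sampleRateHz * durationMs) / 1000);
  const pcm = new Int16Array(sampleCount);
  for (let n = 0; n < sampleCount; n++) {
    const t = n / sampleRateHz;                                // time of sample n, in seconds
    const amplitude = Math.sin(2 * Math.PI * frequencyHz * t); // analog value in [-1, 1]
    pcm[n] = Math.round(amplitude * 32767);                    // quantize to signed 16-bit
  }
  return pcm;
}

const frame = digitize(440, 8000, 20);
console.log(frame.length);     // 160 samples in one 20 ms frame at 8 kHz
console.log(frame.byteLength); // 320 bytes of raw PCM before compression
```

Those 320 raw bytes are what the codec then squeezes down to a few dozen bytes per frame.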

Part 1: Normal Phone Calls (No Internet) 📞

What's happening under the hood?

A regular phone call uses your telecom operator's infrastructure (towers, cables, switching centers) and never touches the public internet.

Step-by-Step Flow

You speak 🎤
    ↓
Microphone captures analog audio
    ↓
ADC (Analog-to-Digital Converter) → digital signal
    ↓
Codec compresses it (AMR / AMR-WB / EVS)
    ↓
Sent to nearest Cell Tower 📡
    ↓
Telecom Core Network (routes the call)
    ↓
Receiver's Cell Tower 📡
    ↓
Receiver's phone decodes → plays audio 🔊

The Codec: AMR (Adaptive Multi-Rate)

This is the compression algorithm used in traditional calls. It's smart: it adapts the bitrate based on network conditions.

| AMR Mode | Bitrate | Quality |
| --- | --- | --- |
| AMR 4.75 | 4.75 kbps | Low (weak signal) |
| AMR 12.2 | 12.2 kbps | High (strong signal) |
| AMR-WB (HD Voice) | 23.85 kbps | HD quality |
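
To make the "adaptive" part concrete, here is a simplified sketch that picks the highest narrowband AMR mode fitting a given speech-bit budget. The eight mode bitrates are the standard AMR-NB set; the function name `pickAmrMode` and the idea of a single `availableKbps` number are assumptions for illustration. In a real network, mode selection is driven by channel-quality measurements and network signaling rather than a plain number like this.

```javascript
// The eight AMR narrowband modes, in kbps (highest quality first).
const AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, 4.75];

// Pick the highest mode that fits the bitrate budget available for speech.
function pickAmrMode(availableKbps) {
  const mode = AMR_MODES_KBPS.find(kbps => kbps <= availableKbps);
  // If nothing fits, fall back to the lowest (most robust) mode.
  return mode ?? AMR_MODES_KBPS[AMR_MODES_KBPS.length - 1];
}

console.log(pickAmrMode(13)); // 12.2 (strong signal)
console.log(pickAmrMode(6));  // 5.9
console.log(pickAmrMode(3));  // 4.75 (weak signal)
```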

What does the data look like?

Under the hood, voice is not sent as one big audio file. It's split into tiny chunks, each representing about 20 milliseconds of audio.

[20ms chunk] → [20ms chunk] → [20ms chunk] → [20ms chunk] → ...
    #1               #2               #3               #4
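
A quick bit of arithmetic shows how small each chunk is: the size of one compressed frame is just bitrate times frame duration. The helper name `frameSizeBits` is made up for this sketch, and it counts codec bits only (no headers).

```javascript
// Size of one compressed frame = bitrate x frame duration.
function frameSizeBits(bitrateKbps, frameMs = 20) {
  return Math.round(bitrateKbps * frameMs); // kbps x ms = bits
}

console.log(frameSizeBits(4.75));  // 95 bits  (AMR 4.75)
console.log(frameSizeBits(12.2));  // 244 bits (AMR 12.2), about 31 bytes
console.log(frameSizeBits(23.85)); // 477 bits (AMR-WB 23.85)
console.log(1000 / 20);            // 50 frames per second
```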

Each frame looks something like this conceptually:

{
  "type": "voice_frame",
  "codec": "AMR",
  "sequence": 101,
  "timestamp": 2003400,
  "payload": "<compressed binary audio bytes>"
}

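⚠️ In reality it's binary, not JSON, but this structure represents what's inside each packet. The sequence and timestamp fields apply when frames travel as packets (as in VoLTE); on a circuit-switched call, frames are simply sent in their reserved time slots.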

Circuit Switching vs VoLTE

Old GSM (2G/3G) → Circuit Switching

  • A dedicated "pipe" is reserved just for your call
  • Like booking a private road: no one else uses it during your call
  • Very stable, but inefficient (resources wasted during silence)

VoLTE (4G/5G) → Packet Switching (but controlled)

  • Voice is broken into packets like internet data
  • But the network gives it priority (QoS, Quality of Service)
  • Lower latency, HD quality, still uses telecom infrastructure

Part 2: Internet Calls (WhatsApp, WebRTC) 🌐

What's happening under the hood?

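Apps like WhatsApp, Google Meet, and Discord carry voice over the internet. The key technology here is WebRTC (Web Real-Time Communication), an open standard built into modern browsers and available as a library for native mobile apps. Some apps run their own variant of the stack, but the overall pattern (compressed audio in small packets over UDP) is the same.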

Step-by-Step Flow

You speak 🎤
    ↓
Microphone captures analog audio
    ↓
ADC → digital signal
    ↓
Opus Codec compresses it
    ↓
Packetized into RTP packets over UDP
    ↓
Sent via Internet (WiFi / 4G / 5G)
    ↓
Direct peer-to-peer path (found with STUN), or relayed via a TURN server
    ↓
Receiver reorders packets → decodes → plays audio 🔊

The Codec: Opus

Opus is the go-to codec for internet voice/audio. It's open-source, low-latency, and adaptive.

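| Feature | Opus |
| --- | --- |
| Bitrate range | 6 kbps – 510 kbps |
| Frame size / latency | 2.5–60 ms frames (20 ms typical); about 26.5 ms algorithmic delay by default |
| Handles packet loss? | ✅ Yes (built-in FEC) |
| Quality at low bitrate | Excellent |
| Used by | WhatsApp, Discord, Zoom, WebRTC |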

Opus has in-band Forward Error Correction (FEC): when enabled, a packet can also carry a low-bitrate copy of the previous frame, so if one packet is lost the decoder can rebuild that frame from the next one. That's why internet calls still sound okay even with minor packet loss.
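
The idea behind this kind of redundancy can be sketched in a few lines. This is a toy model, not the Opus bitstream: frames are plain strings, and `packetize` and `reconstruct` are hypothetical helpers. Each packet carries the current frame plus a degraded copy of the previous one, and the receiver falls back to that copy when a packet goes missing.

```javascript
// Sender: each packet carries the current frame plus a low-bitrate copy of the previous one.
function packetize(frames) {
  return frames.map((frame, i) => ({
    seq: i,
    primary: frame,
    redundant: i > 0 ? `${frames[i - 1]} (low-bitrate copy)` : null,
  }));
}

// Receiver: if packet N is missing, recover frame N from the copy inside packet N+1.
function reconstruct(received, totalFrames) {
  const bySeq = new Map(received.map(p => [p.seq, p]));
  const output = [];
  for (let seq = 0; seq < totalFrames; seq++) {
    const packet = bySeq.get(seq);
    const next = bySeq.get(seq + 1);
    if (packet) output.push(packet.primary);
    else if (next && next.redundant) output.push(next.redundant);
    else output.push('<concealed>'); // nothing to recover from: the decoder has to guess
  }
  return output;
}

const packets = packetize(['f0', 'f1', 'f2', 'f3']);
const afterLoss = packets.filter(p => p.seq !== 2); // packet 2 is lost in transit
console.log(reconstruct(afterLoss, 4));
// f0, f1, "f2 (low-bitrate copy)", f3
```

Note the limit: if two packets in a row are lost, the first of them has no copy to fall back on, which is why bursts of loss are audible.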

Why UDP and not TCP?

This is one of the most important decisions in real-time audio.

TCP (used in HTTP, file downloads):

  • Guarantees delivery: if a packet is lost, it resends it
  • Problem: Resending takes time → delay → unacceptable in real-time voice

UDP (used in WebRTC voice):

  • No guarantee of delivery
  • No resending lost packets
  • But it's fast: packets go out and don't wait

In voice calls, a 200ms old audio packet is useless anyway. Better to skip it and keep playing forward than wait for a retry.

TCP mindset: "Wait, I need packet #47 before I continue"  ❌ (for voice)
UDP mindset: "Packet #47 is gone? Fine, move on."        ✅ (for voice)
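
A small simulation makes the difference visible. The arrival times below are invented: one packet is sent every 20 ms, and packet 3 only shows up after a retransmission. `inOrderDelivery` models TCP-style in-order delivery, and `bestEffortDelivery` models a UDP receiver that drops anything older than a deadline. Both names and the 100 ms deadline are assumptions for this sketch.

```javascript
const FRAME_MS = 20; // one packet is sent every 20 ms, starting at t = 0

// Sequence number and the time (ms) each packet reaches the receiver.
const arrivals = [
  { seq: 1, arrivesAt: 20 },
  { seq: 2, arrivesAt: 40 },
  { seq: 3, arrivesAt: 260 }, // lost once, this is the retransmitted copy
  { seq: 4, arrivesAt: 80 },
  { seq: 5, arrivesAt: 100 },
];

// TCP-style: the app sees data strictly in order, so later packets wait behind a gap.
function inOrderDelivery(packets) {
  let releasedAt = 0;
  return [...packets]
    .sort((a, b) => a.seq - b.seq)
    .map(p => {
      releasedAt = Math.max(releasedAt, p.arrivesAt);
      return { seq: p.seq, deliveredAt: releasedAt };
    });
}

// UDP-style: hand over each packet as it arrives, drop anything that took too long.
function bestEffortDelivery(packets, maxDelayMs) {
  return packets
    .filter(p => p.arrivesAt - (p.seq - 1) * FRAME_MS <= maxDelayMs)
    .sort((a, b) => a.arrivesAt - b.arrivesAt)
    .map(p => ({ seq: p.seq, deliveredAt: p.arrivesAt }));
}

console.log(inOrderDelivery(arrivals));         // seq 4 and 5 are held until 260 ms
console.log(bestEffortDelivery(arrivals, 100)); // seq 3 is skipped, 4 and 5 play on time
```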

How WebRTC Establishes Connection (Simplified)

  1. Signaling: Both peers exchange metadata (IP, codec support) via a server
  2. ICE (Interactive Connectivity Establishment): Finding the best network path
  3. STUN Server: Figures out your public IP (you're usually behind a router/NAT)
  4. TURN Server: Relays traffic if direct P2P fails (firewall situations)
  5. DTLS Handshake: Encrypted connection established
  6. SRTP: Voice packets flow securely, peer-to-peer

Caller                  Signaling Server               Receiver
  |                           |                            |
  |----offer (SDP)----------->|                            |
  |                           |-------offer (SDP)--------->|
  |                           |<------answer (SDP)---------|
  |<---answer (SDP)-----------|                            |
  |                           |                            |
  |<==================ICE Candidates exchanged============>|
  |                                                        |
  |<================P2P Voice (SRTP/UDP)==================>|
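
The offer and answer in that exchange are SDP text. As a sketch of what "codec support" looks like inside it, the snippet below pulls the advertised codecs out of the `a=rtpmap` lines. The sample SDP is a heavily trimmed, made-up fragment, and `listCodecs` is a hypothetical helper, not part of any WebRTC API.

```javascript
// Extract the codecs advertised in an SDP blob from its a=rtpmap lines.
function listCodecs(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter(line => line.startsWith('a=rtpmap:'))
    .map(line => {
      // Format: a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]
      const [payloadType, encoding] = line.slice('a=rtpmap:'.length).split(' ');
      const [name, clockRate, channels = '1'] = encoding.split('/');
      return {
        payloadType: Number(payloadType),
        name,
        clockRate: Number(clockRate),
        channels: Number(channels),
      };
    });
}

// Trimmed sample for illustration; a real offer has many more lines.
const sampleOffer = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111 0',
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:0 PCMU/8000',
].join('\r\n');

console.log(listCodecs(sampleOffer));
// opus at 48000 Hz with 2 channels, and PCMU at 8000 Hz with 1 channel
```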

What does the data look like?

{
  "type": "audio_packet",
  "codec": "opus",
  "ssrc": 3892741023,
  "sequence": 4821,
  "timestamp": 96000,
  "payload": "<opus encoded binary>"
}

This mirrors the main fields of an RTP (Real-time Transport Protocol) packet. WebRTC sends it as SRTP (Secure RTP), which encrypts the payload and authenticates the packet.
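
For a feel of the actual binary layout, here is a minimal sketch that packs and parses the fixed 12-byte RTP header (no extensions, no CSRC list, no encryption). The payload type 111 and the fake payload bytes are assumptions; `buildRtpPacket` and `parseRtpHeader` are names invented for this example.

```javascript
// Pack the fixed 12-byte RTP header followed by the codec payload.
function buildRtpPacket({ payloadType, sequence, timestamp, ssrc }, payload) {
  const packet = new Uint8Array(12 + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint8(0, 0x80);               // version 2, no padding, no extension, no CSRCs
  view.setUint8(1, payloadType & 0x7f); // marker bit 0 + 7-bit payload type
  view.setUint16(2, sequence & 0xffff); // 16-bit sequence number (big-endian)
  view.setUint32(4, timestamp >>> 0);   // 32-bit media timestamp
  view.setUint32(8, ssrc >>> 0);        // 32-bit stream identifier
  packet.set(payload, 12);
  return packet;
}

// Read the same fields back out of a received packet.
function parseRtpHeader(packet) {
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    version: view.getUint8(0) >> 6,
    payloadType: view.getUint8(1) & 0x7f,
    sequence: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    payloadLength: packet.byteLength - 12,
  };
}

const fakeOpusFrame = new Uint8Array(60); // stand-in for real encoded audio
const rtp = buildRtpPacket(
  { payloadType: 111, sequence: 4821, timestamp: 96000, ssrc: 3892741023 },
  fakeOpusFrame
);
console.log(rtp.length);          // 72 bytes: 12 header + 60 payload
console.log(parseRtpHeader(rtp)); // same field values as the JSON above
```

With a 48 kHz RTP clock, the timestamp advances by 960 for every 20 ms frame, while the sequence number advances by 1.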


Part 3: Side-by-Side Comparison

| Feature | Normal Call 📞 | Internet Call 🌐 |
| --- | --- | --- |
| Network | Telecom (Jio, Airtel) | Internet (WiFi / Mobile data) |
| Protocol | GSM / VoLTE | WebRTC (RTP over UDP) |
| Codec | AMR / AMR-WB / EVS | Opus |
| Latency | ~100–150ms | ~150–300ms (network-dependent) |
| Data path | Operator controlled | Peer-to-peer (mostly) |
| Delivery | Reserved or prioritized (circuit / QoS) | Best-effort (UDP) |
| Encryption | Limited (operator can see) | Encrypted peer to peer (DTLS + SRTP) |
| Packet loss handling | Network-level QoS | Opus FEC + loss concealment |
| Works without data? | ✅ Yes | ❌ No |
| Cost | Per minute or bundled | Uses ~0.3–0.5 MB/min |
| Emergency calls | ✅ Works | ❌ Cannot call 112/911 |
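
The data-usage figure can be sanity-checked with some arithmetic. The sketch below estimates one-way usage from the codec bitrate plus per-packet headers. The 24 and 40 kbps bitrates are assumed examples, the 40-byte overhead assumes IPv4 + UDP + RTP with no SRTP tag or TURN framing, and `megabytesPerMinute` is a made-up helper.

```javascript
// One-way data usage for a voice stream, including per-packet header overhead.
function megabytesPerMinute(codecKbps, frameMs = 20, headerBytes = 40) {
  const packetsPerSecond = 1000 / frameMs;
  const payloadBytesPerSecond = (codecKbps * 1000) / 8;
  const headerBytesPerSecond = packetsPerSecond * headerBytes; // IPv4 (20) + UDP (8) + RTP (12)
  return ((payloadBytesPerSecond + headerBytesPerSecond) * 60) / 1_000_000;
}

console.log(megabytesPerMinute(24)); // about 0.3 MB per minute
console.log(megabytesPerMinute(40)); // about 0.42 MB per minute
```

At low bitrates the headers are a large share of the total: at 24 kbps, 2,000 of every 5,000 bytes per second are header overhead. A call sends and receives, so total usage is roughly double the one-way number.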

Part 4: Why Voice Sometimes Breaks on Internet Calls 🤖

Ever heard someone sound like a robot during a WhatsApp call? Here's exactly why:

1. Packet Loss

Some UDP packets never arrive. If too many are lost in a row, the audio decoder has gaps it cannot conceal → robotic or stuttering sound.
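
A receiver notices loss through gaps in the sequence numbers. Here is a minimal sketch, assuming the numbers are already in arrival order with no reordering or duplicates; `countLost` is a hypothetical helper. It also handles the 16-bit wraparound from 65535 back to 0.

```javascript
// Count packets missing between consecutive sequence numbers (16-bit, wraps at 65536).
function countLost(sequenceNumbers) {
  let lost = 0;
  for (let i = 1; i < sequenceNumbers.length; i++) {
    const gap = (sequenceNumbers[i] - sequenceNumbers[i - 1] + 65536) % 65536;
    lost += gap - 1; // a gap of 1 means nothing is missing
  }
  return lost;
}

console.log(countLost([100, 101, 104]));     // 2 (102 and 103 never arrived)
console.log(countLost([65534, 65535, 1, 2])); // 1 (packet 0 was lost across the wrap)
```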

2. Jitter

Packets arrive out of order or unevenly spaced. WebRTC uses a jitter buffer to smooth this out, but if jitter exceeds what the buffer can absorb, packets arrive too late to be played and the audio gets chopped.

Sent:     [P1]--[P2]--[P3]--[P4]--[P5]
Received: [P1]------[P3][P2]----[P5]  ← P4 lost, P2 P3 swapped
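
A toy jitter buffer shows how that reordering gets undone. This is a sketch, not WebRTC's adaptive implementation: the `JitterBuffer` class is invented here, it ignores sequence-number wraparound, and it assumes packets are buffered briefly before a playout clock starts calling `pop()` once per frame interval.

```javascript
class JitterBuffer {
  constructor(firstSeq) {
    this.nextSeq = firstSeq; // next sequence number due for playout
    this.held = new Map();   // packets waiting for their turn
  }

  // Called whenever a packet arrives from the network, in any order.
  push(packet) {
    // Packets older than the playout point are too late and get dropped.
    if (packet.seq >= this.nextSeq) this.held.set(packet.seq, packet);
  }

  // Called once per frame interval (e.g. every 20 ms) by the playout clock.
  pop() {
    const packet = this.held.get(this.nextSeq) ?? null; // null = gap to conceal
    this.held.delete(this.nextSeq);
    this.nextSeq++;
    return packet;
  }
}

const buffer = new JitterBuffer(1);
// Arrival order from the diagram above: P1, P3, P2, P5 (P4 never arrives)
[1, 3, 2, 5].forEach(seq => buffer.push({ seq, payload: `P${seq}` }));

const played = [];
for (let i = 0; i < 5; i++) {
  const packet = buffer.pop();
  played.push(packet ? packet.payload : 'gap');
}
console.log(played.join(' ')); // P1 P2 P3 gap P5
```

The trade-off is latency: the longer the buffer waits before playing, the more reordering it can absorb, but the more delay it adds to the call.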

3. Network Handoff

When you're moving (driving, walking), your phone switches between towers or WiFi ↔ 4G. During handoff, packets drop → brief audio glitch.

4. Congestion

Your internet is shared. If someone starts a big download in parallel, your voice packets compete for bandwidth → delay spikes.


Part 5: As a Developer, What Should You Know?

If you're building a voice feature, here are the key decisions:

Choosing your approach

Use WebRTC if:

  • Building for web/mobile app
  • Need P2P, low cost at scale
  • Want E2E encryption
  • Don't need emergency call support

Use VoIP / SIP if:

  • Need PSTN (real phone number) integration
  • Need to call regular phones
  • Enterprise telephony

Use a managed SDK if:

  • Fast shipping matters
  • Examples: Twilio, Agora, Daily.co, Vonage

Key WebRTC APIs to know

// Note: `await` needs an async function or an ES module.

// Get user's microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Create peer connection
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Add audio track to connection
stream.getTracks().forEach(track => pc.addTrack(track, stream));

// Create and send offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// → Send offer to other peer via your signaling server

// When you receive their answer ({ type: 'answer', sdp }) from signaling:
await pc.setRemoteDescription(answer);
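
The snippet above covers the caller. A sketch of the other half might look like this, assuming `pc` is an `RTCPeerConnection` and `sendToPeer` is your own signaling function (a hypothetical name, for example a WebSocket send). The message shapes are assumptions too; only the `RTCPeerConnection` methods and the `onicecandidate` handler are standard.

```javascript
// Receiver side: apply the caller's offer and reply with an answer.
async function answerCall(pc, offer, sendToPeer) {
  await pc.setRemoteDescription(offer);   // apply the caller's offer
  const answer = await pc.createAnswer(); // generate a matching answer
  await pc.setLocalDescription(answer);
  sendToPeer(pc.localDescription);        // relay it back through signaling
}

// Both sides: forward ICE candidates to the other peer as they are discovered.
function wireIceCandidates(pc, sendToPeer) {
  pc.onicecandidate = event => {
    // A null candidate signals the end of gathering, so there is nothing to send.
    if (event.candidate) sendToPeer({ type: 'candidate', candidate: event.candidate });
  };
}
```

On the receiving end of those candidate messages, each one is passed to `pc.addIceCandidate(candidate)`.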

Monitor call quality in real time

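// Get audio stats
const stats = await pc.getStats();
stats.forEach(report => {
  // Receive-side loss and jitter for the incoming audio stream
  if (report.type === 'inbound-rtp' && report.kind === 'audio') {
    console.log('Packets lost:', report.packetsLost);
    console.log('Jitter (s):', report.jitter);
  }
  // Round trip time lives on the active ICE candidate pair, not on inbound-rtp
  if (report.type === 'candidate-pair' && report.nominated && report.state === 'succeeded') {
    console.log('Round trip time (s):', report.currentRoundTripTime);
  }
});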
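
Raw counters are easier to act on as percentages. As a sketch, the helper below turns an inbound audio report into a loss percentage and a jitter value in milliseconds. `summarizeInboundAudio` is a made-up name, and the sample numbers are invented; the fields it reads (`packetsReceived`, `packetsLost`, `jitter`) are standard `inbound-rtp` stats.

```javascript
// Summarize the first inbound audio report found in an iterable of stats reports.
function summarizeInboundAudio(reports) {
  for (const report of reports) {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      const expected = report.packetsReceived + report.packetsLost;
      return {
        lossPercent: expected > 0 ? (report.packetsLost / expected) * 100 : 0,
        jitterMs: report.jitter * 1000, // getStats() reports jitter in seconds
      };
    }
  }
  return null; // no inbound audio stream yet
}

// Invented sample data shaped like getStats() output
const sample = [
  { type: 'candidate-pair', state: 'succeeded', currentRoundTripTime: 0.08 },
  { type: 'inbound-rtp', kind: 'audio', packetsReceived: 980, packetsLost: 20, jitter: 0.012 },
];
console.log(summarizeInboundAudio(sample)); // roughly 2% loss and 12 ms jitter
```

In a browser you could call it as `summarizeInboundAudio((await pc.getStats()).values())`, polling every second or so during a call.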

Quick Summary

Both call types:
  Voice → Digitize → Compress → Send in 20ms chunks → Decode → Play

Without Internet (Normal Call):
  Codec: AMR | Path: Telecom towers | Protocol: GSM/VoLTE | Stable + Prioritized

With Internet (WhatsApp/WebRTC):
  Codec: Opus | Path: Internet P2P | Protocol: RTP over UDP | Flexible + Encrypted

The biggest conceptual difference:

  • Normal call = a reserved or prioritized lane on the operator's network (like booking a private road)
  • Internet call = many small packets racing through shared roads, reassembled on arrival


If this helped you understand what's actually happening under the hood when you make a call, drop a ❤️. And if you're building something with WebRTC, feel free to ask questions in the comments!

Tags: #webrtc #voip #networking #javascript #webdev #beginners
