SIP GAMES

🎤 The Voice Courier: Meet RTP

“SDP made the rules, RTP plays the game.”


In the previous episode of SIP GAMES, we peeked inside the SDP offer (carried in the SIP INVITE) that tells your opponent how you'd like to play: which codecs, which ports, and which IPs. But who actually carries the media?

🎮 Enter RTP: the Real-time Transport Protocol.


🧳 What is RTP?

Think of RTP as the courier that carries your voice across the network — broken into little time-stamped, sequence-numbered packages.

  • SIP sets up the call
  • SDP describes the media setup
  • RTP sends the actual media (voice/video)

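RTP usually runs on top of UDP (User Datagram Protocol). UDP doesn't stop to retransmit lost packets, which suits live audio: a packet that shows up late is useless anyway, and losing an occasional one is tolerable, just like in a real conversation.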


🧬 RTP Packet Structure

Here’s the basic layout of an RTP packet, drawn 32 bits per row:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Contributing Source (CSRC) Identifiers (optional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Payload (audio/video)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Let’s decode this header:


🔍 RTP Header Fields

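| Field | What It Means |
|---|---|
| V | Version (always 2 in today's RTP) |
| P | Padding: set when extra padding bytes are appended after the payload |
| X | Extension: set when a header extension follows the fixed header |
| CC | CSRC count: how many CSRC identifiers follow (0–15, used in conferencing) |
| M | Marker bit; its meaning depends on the payload (for audio, often the start of a talkspurt) |
| PT | Payload type: identifies the codec (e.g., 0 = PCMU; 96–127 = dynamic, mapped in SDP with a=rtpmap) |
| Sequence Number | Increments by 1 per packet (wrapping after 65535); used to detect loss and reordering |
| Timestamp | Sampling instant of the payload, in units of the codec's clock rate; used for playback timing and jitter measurement |
| SSRC | Randomly chosen ID that identifies the sender's stream |
| CSRC | IDs of the sources mixed into this packet (optional) |
| Payload | Actual audio or video data |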
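To make the table concrete, here is a minimal Python sketch that decodes the fixed 12-byte header and any CSRC list from raw packet bytes using bit masks. The names `RtpHeader` and `parse_rtp_header` are hypothetical; the sketch does not parse a header extension (X) or strip padding (P), so `payload_offset` is only accurate when both bits are clear.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: list[int]
    payload_offset: int  # start of payload; ignores header extension and padding


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header plus the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit seq, 32-bit ts and SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # low 4 bits of byte 0
    end = 12 + 4 * cc                   # each CSRC is 32 bits
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:end]))
    return RtpHeader(
        version=b0 >> 6,                # top 2 bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),         # top bit of byte 1
        payload_type=b1 & 0x7F,         # remaining 7 bits
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
        payload_offset=end,
    )


h = parse_rtp_header(bytes.fromhex("80e000010000000012345678"))
print(h.marker, h.payload_type, h.sequence)  # True 96 1
```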

🕒 Packetization Time (a.k.a. ptime)

What is packetization time?

It’s the duration of audio carried in each RTP packet, often advertised in SDP with a=ptime:20 (20 ms of audio per packet).

Common values:

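| Codec | Typical ptime | Result |
|---|---|---|
| PCMU (G.711 μ-law) | 20 ms | 50 packets/sec, 160-byte payloads |
| Opus | Variable (often 20 ms) | Frames from 2.5 to 60 ms |
| G.729 | 20 ms | 50 packets/sec, 20-byte payloads (8 kbit/s) |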
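The table's numbers follow from two bits of arithmetic: the RTP timestamp advances by clock rate × ptime per packet, and a constant-bitrate codec fills bitrate × ptime ÷ 8 bytes. A small sketch, assuming the usual rates (64 kbit/s PCMU and 8 kbit/s G.729, both on an 8 kHz RTP clock; Opus on its fixed 48 kHz RTP clock). The function names are made up for illustration:

```python
def timestamp_step(clock_rate_hz: int, ptime_ms: int) -> int:
    """How far the RTP timestamp advances from one packet to the next."""
    return clock_rate_hz * ptime_ms // 1000


def payload_bytes(bitrate_bps: int, ptime_ms: int) -> int:
    """Payload size per packet for a constant-bitrate codec."""
    return bitrate_bps * ptime_ms // 8000  # bits -> bytes and ms -> s in one step


for name, clock, bitrate in [("PCMU", 8000, 64_000), ("G.729", 8000, 8_000)]:
    print(name, timestamp_step(clock, 20), payload_bytes(bitrate, 20))
# PCMU 160 160
# G.729 160 20

print("Opus", timestamp_step(48_000, 20))  # Opus 960
```

Note that G.729 and PCMU share the same timestamp step: the timestamp counts samples at the clock rate, not bytes.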

🧮 Frequency of RTP Transmission

The number of RTP packets per second follows directly from the ptime: packets/sec = 1000 ÷ ptime (in ms).

Example:

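  • If ptime is 20 ms, that’s 50 packets/sec
  • If it’s 30 ms, ~33.3 packets/sec
  • Higher ptime = fewer packets = less header overhead, but more delay and a longer gap whenever a packet is lost
  • Lower ptime = less delay and shorter gaps per lost packet, but more packets and more overhead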
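To see the overhead trade-off in numbers, this sketch adds per-packet headers to PCMU's 64 kbit/s of payload. It assumes IPv4 without options (20 bytes), UDP (8 bytes) and a bare 12-byte RTP header, and ignores Ethernet or other link-layer framing, so real on-the-wire figures will be somewhat higher:

```python
# Per-packet header sizes in bytes (assumption: IPv4 without options,
# no RTP header extension or CSRCs, link-layer framing not counted).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20


def packets_per_second(ptime_ms: float) -> float:
    """One packet every ptime milliseconds."""
    return 1000 / ptime_ms


def ip_bitrate_bps(payload_bitrate_bps: float, ptime_ms: float) -> float:
    """One-way bitrate at the IP layer, headers included."""
    payload = payload_bitrate_bps * ptime_ms / 8000   # bytes per packet
    per_packet = payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return per_packet * 8 * packets_per_second(ptime_ms)


for ptime in (10, 20, 30):
    print(f"ptime {ptime} ms: {packets_per_second(ptime):.1f} pkt/s, "
          f"{ip_bitrate_bps(64_000, ptime) / 1000:.1f} kbit/s for PCMU")
# ptime 10 ms: 100.0 pkt/s, 96.0 kbit/s for PCMU
# ptime 20 ms: 50.0 pkt/s, 80.0 kbit/s for PCMU
# ptime 30 ms: 33.3 pkt/s, 74.7 kbit/s for PCMU
```

Under these assumptions, dropping from 20 ms to 10 ms costs about 16 kbit/s more for exactly the same audio, all of it spent on headers.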

🧠 Why Do I Care?

If you're implementing RTP or trying to debug call quality:

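  • Jitter? Compare packet arrival times against their RTP timestamps
  • Audio out of sync? Look for sequence-number jumps or timestamps that don't advance by one ptime's worth of samples per packet
  • Silence or gaps? Packets were lost or arrived too late to play
  • Wrong codec? Check the Payload Type (PT) field against what SDP negotiated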
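Two of those checks are easy to automate. Below is a simplified Python sketch (the `StreamStats` class is hypothetical) that counts lost packets from sequence-number jumps, handling the 16-bit wraparound, and keeps a running jitter estimate using the smoothing formula from RFC 3550: J += (|D| - J) / 16, where D is the change in transit time between consecutive packets.

```python
class StreamStats:
    """Loss and jitter tracking for one RTP stream (one SSRC).

    Simplified sketch: a packet counted as lost is not "un-lost" if it
    later arrives out of order, and 32-bit timestamp wraparound is ignored.
    """

    def __init__(self, clock_rate_hz: int):
        self.clock_rate = clock_rate_hz
        self.expected_seq = None    # next sequence number we expect to see
        self.lost = 0               # packets skipped over by sequence jumps
        self.jitter = 0.0           # smoothed jitter, in RTP timestamp units
        self._prev_transit = None

    def on_packet(self, seq: int, rtp_ts: int, arrival_s: float) -> None:
        # Loss: compare against the expected number modulo 2**16, so the
        # wrap from 65535 back to 0 is not mistaken for a huge gap.
        if self.expected_seq is None:
            self.expected_seq = (seq + 1) & 0xFFFF
        else:
            gap = (seq - self.expected_seq) & 0xFFFF
            if gap < 0x8000:        # forward jump: `gap` packets never showed up
                self.lost += gap
                self.expected_seq = (seq + 1) & 0xFFFF
            # otherwise the packet is late or duplicated; leave state alone

        # Jitter (RFC 3550, section 6.4.1): compare each packet's transit
        # time with the previous one and smooth with a gain of 1/16.
        transit = arrival_s * self.clock_rate - rtp_ts
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16
        self._prev_transit = transit

    def jitter_ms(self) -> float:
        return self.jitter * 1000 / self.clock_rate


s = StreamStats(8000)
for seq, ts, t in [(65534, 0, 0.00), (65535, 160, 0.02), (1, 480, 0.06)]:
    s.on_packet(seq, ts, t)
print(s.lost)  # 1 (sequence 0 is missing; the wrap itself is not a gap)
```

Real receivers also keep a reorder window before declaring a packet lost; this sketch skips that to keep the core idea visible.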

RTP is everywhere in VoIP — and understanding this header lets you trace, debug, and build your own media streamers.


🛠️ Example: A Real RTP Packet (with G.711)

Let’s say we're using G.711 with 20ms ptime.

  • Payload Type: 0 (PCMU)
  • Sequence Number: 10567
  • Timestamp: 160000
  • SSRC: 0x789ABC
  • Payload: 160 bytes of G.711 μ-law data (8-bit companded samples at 8000 Hz)

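That’s 8,000 samples/sec × 0.020 sec = 160 samples, and at 1 byte per sample, 160 bytes. The timestamp advances by the same 160 per packet, so the next packet would carry sequence number 10568 and timestamp 160160.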
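Here’s a sketch that packs the example header above with Python's standard `struct` module and appends a 160-byte dummy payload (zero bytes standing in for real G.711 audio). `build_rtp_packet` is a hypothetical helper, and it assumes no padding, no header extension, and no CSRCs:

```python
import struct


def build_rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack a 12-byte RTP header (V=2, P=0, X=0, CC=0) followed by the payload."""
    first = 2 << 6                                   # version 2 in the top two bits
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


SAMPLE_RATE = 8000                         # G.711 samples per second
PTIME_MS = 20
samples = SAMPLE_RATE * PTIME_MS // 1000   # 160 samples, 1 byte each
payload = bytes(samples)                   # dummy payload, not real audio

pkt = build_rtp_packet(10567, 160000, 0x789ABC, payload)
print(len(pkt), pkt[:12].hex())
# 172 800029470002710000789abc

# The next packet: sequence +1, timestamp +160 (one ptime's worth of samples)
nxt = build_rtp_packet(10568, 160000 + samples, 0x789ABC, payload)
```

The first byte, 0x80, is simply V=2 with every other flag cleared, which is why so many RTP captures start with `80`.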


🎮 TL;DR

  • RTP carries media after SIP/SDP sets things up
  • Each RTP packet has headers: version, PT, seq, timestamp, etc.
  • Ptime defines how much media is in each packet
  • Frequency of packets is based on ptime
  • Use RTP headers to debug and analyze VoIP issues

📦 Up Next in SIP GAMES:

“Spy Tools for VoIP Agents” 🕵️‍♂️

We’ll break down the best open-source tools like Wireshark, SIPp, and rtpengine, and show you how to capture, simulate, and troubleshoot your VoIP calls like a pro.

Follow @sip_games to keep leveling up your VoIP game.
