“SDP made the rules, RTP plays the game.”
In the previous episode of SIP GAMES, we peeked inside the SDP invite that tells your opponent how you'd like to play: what codecs, what ports, and what IPs. But who actually carries the media?
🎮 Enter RTP — Real-time Transport Protocol.
🧳 What is RTP?
Think of RTP as the courier that carries your voice across the network — broken into little time-stamped, sequence-numbered packages.
- SIP sets up the call
- SDP describes the media setup
- RTP sends the actual media (voice/video)
RTP usually runs on top of UDP (User Datagram Protocol): UDP is fast because it never waits for retransmissions, and real-time audio tolerates an occasional lost packet better than a late one, just like a real conversation.
🧬 RTP Packet Structure
Here’s the basic layout of an RTP packet:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Contributing Source (CSRC) Identifiers (optional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Payload (audio/video)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Let’s decode this header:
🔍 RTP Header Fields
| Field | What It Means | 
|---|---|
| V | Version (always 2) | 
| P | Padding (if extra bytes added) | 
| X | Extension header present | 
| CC | CSRC count (used in conferencing) | 
| M | Marker bit (e.g. start of a talkspurt) | 
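| PT | Payload type (codec, e.g., 0 = PCMU; 96–127 = dynamic, mapped to a codec in SDP) | 
| Sequence Number | Increments by 1 per packet; used to detect loss and reordering | 
| Timestamp | Sampling instant of the packet's first sample, in clock-rate units; drives playback timing and jitter calculation | 
| SSRC | Randomly chosen 32-bit ID of the stream's sender | 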
| CSRC | IDs of other contributing streams (optional) | 
| Payload | Actual audio or video data | 
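To make the table concrete, here is a minimal Python sketch that decodes those fields from raw bytes. The names `RtpHeader` and `parse_rtp_header` are made up for this post, and the sketch assumes a plain packet: it reads the 12-byte fixed header and any CSRC entries, but does not interpret extension headers or padding.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header plus any CSRC entries."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # then the 32-bit timestamp and the 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # low 4 bits of the first byte: CSRC count
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc])
    return RtpHeader(
        version=b0 >> 6,               # top 2 bits
        padding=bool(b0 & 0x20),       # P bit
        extension=bool(b0 & 0x10),     # X bit
        csrc_count=cc,
        marker=bool(b1 & 0x80),        # M bit
        payload_type=b1 & 0x7F,        # low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

Feed it the UDP payload of a captured RTP packet and check `version == 2` first; if that fails, you are probably not looking at RTP.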
🕒 Packetization Time (a.k.a. ptime)
What is packetization time?
It’s the duration of audio carried in each RTP packet, often advertised in SDP with a=ptime:20 (20 ms of audio per packet).
Common values:
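| Codec | Typical ptime | Result | 
|---|---|---|
| PCMU (G.711) | 20 ms | 50 packets/sec, 160-byte payload | 
| Opus | Variable (20 ms is common) | Frames from 2.5 to 60 ms; 50 packets/sec at 20 ms | 
| G.729 | 20 ms | 50 packets/sec, 20-byte payload (small, compressed) | 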
🧮 Frequency of RTP Transmission
The number of RTP packets per second depends on the ptime in use: packets per second = 1000 / ptime in ms.
Example:
- If ptime is 20ms, that’s 50 packets/second
- If it’s 30ms, ~33.3 packets/sec
- Higher ptime = fewer packets = less overhead
- Lower ptime = lower latency and less audio lost per dropped packet, but more packets
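The trade-off above is easy to put in numbers. This is a rough sketch with made-up function names; it assumes each packet carries 40 bytes of headers (20 for IPv4, 8 for UDP, 12 for RTP) with no IP options, CSRCs, or extensions, and it ignores link-layer framing such as Ethernet.

```python
def packets_per_second(ptime_ms: float) -> float:
    """Packet rate for a given packetization time."""
    return 1000.0 / ptime_ms


def header_overhead_bps(ptime_ms: float, header_bytes: int = 40) -> float:
    """Bits per second spent on headers alone.

    header_bytes=40 is an assumption: IPv4 (20) + UDP (8) + RTP (12).
    """
    return packets_per_second(ptime_ms) * header_bytes * 8


for ptime in (10, 20, 30, 40):
    rate = packets_per_second(ptime)
    overhead_kbps = header_overhead_bps(ptime) / 1000
    print(f"ptime={ptime} ms: {rate:.1f} packets/sec, {overhead_kbps:.1f} kbit/s of headers")
```

At 20 ms that works out to 16 kbit/s of headers, which is why a 64 kbit/s G.711 stream takes roughly 80 kbit/s on the wire before link-layer framing.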
🧠 Why Do I Care?
If you're implementing RTP or trying to debug call quality:
- Jitter? Check packet arrival times and timestamps
- Audio out of sync? Sequence or timestamp mismatch
- Silence or gaps? Packets lost or arriving too late
- Wrong codec? Check the Payload Type (PT) field
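Two of those checks, loss and jitter, can be sketched in a few lines of Python. The class name `RtpStreamStats` is hypothetical, and this is a simplified take rather than a full receiver: it counts gaps in the sequence number (handling the wrap from 65535 back to 0), ignores old or reordered packets instead of un-counting them as lost, and smooths jitter with the running estimate from RFC 3550 (each new deviation contributes 1/16).

```python
class RtpStreamStats:
    """Track loss and interarrival jitter for one stream (illustrative only)."""

    def __init__(self, clock_rate_hz: int = 8000) -> None:
        self.clock_rate_hz = clock_rate_hz
        self.lost = 0
        self.jitter = 0.0  # in RTP timestamp units
        self._expected_seq = None
        self._prev_arrival = None
        self._prev_ts = None

    def update(self, seq: int, rtp_ts: int, arrival_s: float) -> None:
        if self._expected_seq is not None:
            # 16-bit modular distance from the sequence number we expected.
            gap = (seq - self._expected_seq) & 0xFFFF
            if gap >= 0x8000:
                return  # old, duplicate, or reordered packet: skipped here
            self.lost += gap
        self._expected_seq = (seq + 1) & 0xFFFF

        if self._prev_arrival is not None:
            # Signed 32-bit difference so timestamp wraparound is handled.
            ts_delta = (rtp_ts - self._prev_ts) & 0xFFFFFFFF
            if ts_delta >= 0x80000000:
                ts_delta -= 1 << 32
            arrival_delta = (arrival_s - self._prev_arrival) * self.clock_rate_hz
            deviation = abs(arrival_delta - ts_delta)
            self.jitter += (deviation - self.jitter) / 16.0
        self._prev_arrival = arrival_s
        self._prev_ts = rtp_ts

    def jitter_ms(self) -> float:
        return self.jitter * 1000.0 / self.clock_rate_hz
```

Call `update()` once per received packet with the header's sequence number, its timestamp, and your local arrival time in seconds. A steadily climbing `lost` points at the network; a high `jitter_ms()` with little loss suggests your jitter buffer is too small.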
RTP is everywhere in VoIP — and understanding this header lets you trace, debug, and build your own media streamers.
🛠️ Example: A Real RTP Packet (with G.711)
Let’s say we're using G.711 with 20ms ptime.
- Payload Type: 0 (PCMU)
- Sequence Number: 10567
- Timestamp: 160000
- SSRC: 0x789ABC
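- Payload: 160 bytes of G.711 data (8-bit samples at 8000 Hz)
The math: 8000 samples/sec × 0.020 sec = 160 samples, and at 1 byte per sample that’s 160 bytes of payload. The timestamp also advances by 160 from one packet to the next.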
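Here is a small sketch that assembles this exact packet in Python. `build_rtp_packet` is a made-up helper, and the payload is just filler: 160 bytes of 0xFF, a value commonly used for mu-law silence, standing in for real encoded audio. It assumes no padding, extension, or CSRC entries.

```python
import struct


def build_rtp_packet(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, payload: bytes, marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP header (V=2, no padding/extension/CSRC)."""
    byte0 = 2 << 6  # version 2 in the top two bits; P, X, CC all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


SAMPLE_RATE_HZ = 8000
PTIME_MS = 20
samples_per_packet = SAMPLE_RATE_HZ * PTIME_MS // 1000  # 160 samples

# G.711 uses 1 byte per sample, so payload bytes == samples per packet.
payload = bytes([0xFF]) * samples_per_packet
packet = build_rtp_packet(0, 10567, 160000, 0x789ABC, payload)
print(len(packet))  # 172: 12-byte header + 160-byte payload

# The next packet in the stream bumps seq by 1 and the timestamp by 160.
next_packet = build_rtp_packet(0, 10568, 160000 + samples_per_packet,
                               0x789ABC, payload)
```

Sending these over a UDP socket every 20 ms is the core of a toy media streamer; a real one would also pick a random starting sequence number and timestamp.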
🎮 TL;DR
- RTP carries media after SIP/SDP sets things up
- Each RTP packet has headers: version, PT, seq, timestamp, etc.
- Ptime defines how much media is in each packet
- Frequency of packets is based on ptime
- Use RTP headers to debug and analyze VoIP issues
📦 Up Next in SIP GAMES:
“Spy Tools for VoIP Agents” 🕵️
We’ll break down the best open-source tools like Wireshark, sipp, and rtpengine, and show you how to capture, simulate, and troubleshoot your VoIP calls like a pro.
Follow @sip_games to keep leveling up your VoIP game.
 
 
              
 
    