I recently built a SIP monitoring service that uses eBPF to capture SIP traffic directly in the Linux kernel and export metrics to Prometheus. The userspace part of the pipeline, from raw packet to updated Prometheus metric, takes about 3 μs per packet.
Here's how it works and what I learned along the way.
## The Problem
Monitoring SIP/VoIP infrastructure at scale requires tracking call success rates, active dialogs, and response codes — without adding latency to the signaling path.
I wanted something that:
- Processes packets in kernel space
- Exports standard Prometheus metrics
- Runs as a single container
- Tracks SIP dialogs per RFC 3261
- Implements RFC 6076 performance metrics (Session Establishment Ratio)
## Architecture

```
SIP Traffic → NIC → eBPF socket filter → ringbuf → Go poller → SIP parser → Prometheus
```
The eBPF program (written in C) attaches as a socket filter via AF_PACKET. It intercepts UDP packets on configurable SIP ports (default 5060/5061), copies them to a ring buffer, and the Go userspace process polls and parses them.
The C program does three things:
- Parse Ethernet/IP/UDP headers — handles both regular and VLAN-tagged frames
- Filter SIP traffic — checks UDP ports (configurable via environment variables)
- Copy to ringbuf — pushes matching packets to userspace
Loaded via cilium/ebpf — the Go library handles BPF map creation, program loading, and ringbuf polling.
Known limitation: the eBPF verifier requires the length passed to bpf_skb_load_bytes to be provably bounded, so I copy packets in fixed 64-byte blocks. I'm planning to migrate to AF_PACKET with PACKET_RX_RING (mmap) to support arbitrary packet sizes.
## The Go Part
The Go side is straightforward:
- Poll ringbuf for new packets
- Parse raw SIP messages (method/status, headers, Call-ID, tags)
- Update Prometheus counters
- Track SIP dialog lifecycle
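The parsing step can be sketched in a few dozen lines. This is a hypothetical, simplified `parseSIP` helper, not the exporter's actual code; it assumes the UDP payload has already been pulled from the ringbuf and ignores multi-line headers and error handling:

```go
package main

import (
	"fmt"
	"strings"
)

// sipMessage holds the fields the exporter cares about.
type sipMessage struct {
	Method  string // e.g. "INVITE"; empty for responses
	Status  string // e.g. "200"; empty for requests
	CallID  string
	FromTag string
	ToTag   string
}

// parseSIP extracts the start line, Call-ID, and From/To tags
// from a raw SIP message (minimal sketch, no error handling).
func parseSIP(raw string) sipMessage {
	var msg sipMessage
	lines := strings.Split(raw, "\r\n")
	if len(lines) == 0 {
		return msg
	}
	// Responses start with "SIP/2.0 200 OK";
	// requests start with "INVITE sip:... SIP/2.0".
	parts := strings.SplitN(lines[0], " ", 3)
	if strings.HasPrefix(lines[0], "SIP/2.0") && len(parts) >= 2 {
		msg.Status = parts[1]
	} else if len(parts) >= 1 {
		msg.Method = parts[0]
	}
	for _, l := range lines[1:] {
		name, value, ok := strings.Cut(l, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(name) {
		case "call-id", "i": // "i" is the RFC 3261 compact form
			msg.CallID = value
		case "from", "f":
			msg.FromTag = tagParam(value)
		case "to", "t":
			msg.ToTag = tagParam(value)
		}
	}
	return msg
}

// tagParam pulls the ";tag=..." parameter out of a From/To header value.
func tagParam(v string) string {
	for _, p := range strings.Split(v, ";") {
		p = strings.TrimSpace(p)
		if strings.HasPrefix(p, "tag=") {
			return strings.TrimPrefix(p, "tag=")
		}
	}
	return ""
}

func main() {
	raw := "INVITE sip:bob@example.com SIP/2.0\r\n" +
		"Call-ID: abc123@host\r\n" +
		"From: <sip:alice@example.com>;tag=a1\r\n" +
		"To: <sip:bob@example.com>\r\n"
	msg := parseSIP(raw)
	fmt.Println(msg.Method, msg.CallID, msg.FromTag)
}
```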
## Dialog Tracking

SIP dialogs are identified by {Call-ID, From tag, To tag}. Tags are sorted lexicographically for consistent IDs.

- Dialog created on a `200 OK` response to `INVITE`
- Dialog terminated on a `200 OK` response to `BYE`
- Expired dialogs cleaned up every second (based on the `Session-Expires` header, default 30 min)
## Metrics Exported

~30 Prometheus counters:

- Per-method: `sip_exporter_invite_total`, `sip_exporter_bye_total`, `sip_exporter_register_total`, etc.
- Per-status: `sip_exporter_200_total`, `sip_exporter_404_total`, `sip_exporter_500_total`, etc.
- Session count: `sip_exporter_sessions` (active dialogs gauge)
- RFC 6076 SER: `sip_exporter_ser` (Session Establishment Ratio)
The SER metric is interesting because it follows RFC 6076 exactly:
SER = (INVITE → 200 OK) / (Total INVITE - INVITE → 3xx) × 100
3xx redirects are excluded from the denominator — they're routing instructions, not failures.
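In code, the formula reduces to a few lines. This is a hypothetical helper (the counter values are assumed to come from the exporter's state):

```go
package main

import "fmt"

// sessionEstablishmentRatio computes RFC 6076 SER as a percentage:
// INVITEs answered with 200 OK over total INVITEs, with
// 3xx-redirected INVITEs excluded from the denominator.
func sessionEstablishmentRatio(invite200, inviteTotal, invite3xx uint64) float64 {
	denom := inviteTotal - invite3xx
	if denom == 0 {
		return 0 // no establishable sessions observed yet
	}
	return float64(invite200) / float64(denom) * 100
}

func main() {
	// 80 of 100 INVITEs answered, 10 redirected: SER = 80/90 ≈ 88.9%
	fmt.Printf("%.1f\n", sessionEstablishmentRatio(80, 100, 10))
}
```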
## Performance
Benchmarks on Intel i7-8665U (userspace only):
| Operation | Latency | Throughput | Memory |
|---|---|---|---|
| Packet parsing (L2→SIP) | ~124 ns | 8M pkt/sec | 32 B/op |
| SIP header parsing | ~1.2 μs | 800k pkt/sec | 350 B/op |
| Full processing (with metrics) | ~3 μs | 300k pkt/sec | 1000 B/op |
These are userspace numbers. Actual latency depends on kernel eBPF overhead and system load.
## E2E Testing
E2E tests use SIPp via testcontainers-go to generate real SIP traffic and verify that metrics match expected values. Tests cover success/failure scenarios and validate proper dialog cleanup.
## Quick Start

```yaml
services:
  sip-exporter:
    image: frzq/sip-exporter:0.5.0
    privileged: true
    network_mode: host
    environment:
      - SIP_EXPORTER_INTERFACE=eth0
```

```shell
docker-compose up -d
curl http://localhost:2112/metrics
```
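To pull the metrics into Prometheus, a minimal scrape job could look like this (assuming the exporter's metrics endpoint on port 2112 as shown above; the job name is illustrative):

```yaml
scrape_configs:
  - job_name: sip-exporter
    static_configs:
      - targets: ["localhost:2112"]
```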
## What's Next
- More RFC 6076 metrics (Session Setup Time, Response Time)
## Links

- GitHub: https://github.com/aibudaevv/sip-exporter
- Docker: `docker pull frzq/sip-exporter:0.5.0`
Happy to answer questions about the eBPF integration, SIP dialog state machine, or Prometheus metric design. Drop a comment below!