I recently built a SIP monitoring service that uses eBPF to capture SIP traffic directly in the Linux kernel and export metrics to Prometheus. The userspace part of the pipeline, from raw packet to updated Prometheus metric, takes about 3 μs per packet.
Here's how it works and what I learned along the way.
## The Problem
Monitoring SIP/VoIP infrastructure at scale requires tracking call success rates, active dialogs, and response codes — without adding latency to the signaling path.
I wanted something that:
- Processes packets in kernel space
- Exports standard Prometheus metrics
- Runs as a single container
- Tracks SIP dialogs per RFC 3261
- Implements RFC 6076 performance metrics (Session Establishment Ratio)
## Architecture

```
SIP Traffic → NIC → eBPF socket filter → ringbuf → Go poller → SIP parser → Prometheus
```
The eBPF program (written in C) attaches as a socket filter via AF_PACKET. It intercepts UDP packets on configurable SIP ports (default 5060/5061), copies them to a ring buffer, and the Go userspace process polls and parses them.
The C program does three things:
- Parse Ethernet/IP/UDP headers — handles both regular and VLAN-tagged frames
- Filter SIP traffic — checks UDP ports (configurable via environment variables)
- Copy to ringbuf — pushes matching packets to userspace
Loaded via cilium/ebpf — the Go library handles BPF map creation, program loading, and ringbuf polling.
Known limitation: the eBPF verifier requires the length passed to bpf_skb_load_bytes to be provably bounded, so I copy packets in fixed 64-byte blocks. I'm planning to migrate to AF_PACKET with PACKET_RX_RING (mmap) to support arbitrary packet sizes.
## The Go Part
The Go side is straightforward:
- Poll ringbuf for new packets
- Parse raw SIP messages (method/status, headers, Call-ID, tags)
- Update Prometheus counters
- Track SIP dialog lifecycle
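The parsing step can be sketched in a few dozen lines. This is a hypothetical, simplified `parseSIP` helper, not the exporter's actual code; it assumes the UDP payload has already been pulled from the ringbuf and ignores multi-line headers and error handling:

```go
package main

import (
	"fmt"
	"strings"
)

// sipMessage holds the fields the exporter cares about.
type sipMessage struct {
	Method  string // e.g. "INVITE"; empty for responses
	Status  string // e.g. "200"; empty for requests
	CallID  string
	FromTag string
	ToTag   string
}

// parseSIP extracts the start line, Call-ID, and From/To tags
// from a raw SIP message (minimal sketch, no error handling).
func parseSIP(raw string) sipMessage {
	var msg sipMessage
	lines := strings.Split(raw, "\r\n")
	if len(lines) == 0 {
		return msg
	}
	// Responses start with "SIP/2.0 200 OK";
	// requests start with "INVITE sip:... SIP/2.0".
	parts := strings.SplitN(lines[0], " ", 3)
	if strings.HasPrefix(lines[0], "SIP/2.0") && len(parts) >= 2 {
		msg.Status = parts[1]
	} else if len(parts) >= 1 {
		msg.Method = parts[0]
	}
	for _, l := range lines[1:] {
		name, value, ok := strings.Cut(l, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(name) {
		case "call-id", "i": // "i" is the RFC 3261 compact form
			msg.CallID = value
		case "from", "f":
			msg.FromTag = tagParam(value)
		case "to", "t":
			msg.ToTag = tagParam(value)
		}
	}
	return msg
}

// tagParam pulls the ";tag=..." parameter out of a From/To header value.
func tagParam(v string) string {
	for _, p := range strings.Split(v, ";") {
		p = strings.TrimSpace(p)
		if strings.HasPrefix(p, "tag=") {
			return strings.TrimPrefix(p, "tag=")
		}
	}
	return ""
}

func main() {
	raw := "INVITE sip:bob@example.com SIP/2.0\r\n" +
		"Call-ID: abc123@host\r\n" +
		"From: <sip:alice@example.com>;tag=a1\r\n" +
		"To: <sip:bob@example.com>\r\n"
	msg := parseSIP(raw)
	fmt.Println(msg.Method, msg.CallID, msg.FromTag)
}
```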
## Dialog Tracking

SIP dialogs are identified by {Call-ID, From tag, To tag}. Tags are sorted lexicographically for consistent IDs.

- Dialog created on a `200 OK` response to `INVITE`
- Dialog terminated on a `200 OK` response to `BYE`
- Expired dialogs cleaned up every second (based on the `Session-Expires` header, default 30 min)
## Metrics Exported

~30 Prometheus counters:

- Per-method: `sip_exporter_invite_total`, `sip_exporter_bye_total`, `sip_exporter_register_total`, etc.
- Per-status: `sip_exporter_200_total`, `sip_exporter_404_total`, `sip_exporter_500_total`, etc.
- Session count: `sip_exporter_sessions` (active dialogs gauge)
- RFC 6076 SER: `sip_exporter_ser` (Session Establishment Ratio)
The SER metric is interesting because it follows RFC 6076 exactly:
SER = (INVITE → 200 OK) / (Total INVITE - INVITE → 3xx) × 100
3xx redirects are excluded from the denominator — they're routing instructions, not failures.
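In code, the formula reduces to a few lines. This is a hypothetical helper (the counter values are assumed to come from the exporter's state):

```go
package main

import "fmt"

// sessionEstablishmentRatio computes RFC 6076 SER as a percentage:
// INVITEs answered with 200 OK over total INVITEs, with
// 3xx-redirected INVITEs excluded from the denominator.
func sessionEstablishmentRatio(invite200, inviteTotal, invite3xx uint64) float64 {
	denom := inviteTotal - invite3xx
	if denom == 0 {
		return 0 // no establishable sessions observed yet
	}
	return float64(invite200) / float64(denom) * 100
}

func main() {
	// 80 of 100 INVITEs answered, 10 redirected: SER = 80/90 ≈ 88.9%
	fmt.Printf("%.1f\n", sessionEstablishmentRatio(80, 100, 10))
}
```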
## Performance
Benchmarks on Intel i7-8665U (userspace only):
| Operation | Latency | Throughput | Memory |
|---|---|---|---|
| Packet parsing (L2→SIP) | ~124 ns | 8M pkt/sec | 32 B/op |
| SIP header parsing | ~1.2 μs | 800k pkt/sec | 350 B/op |
| Full processing (with metrics) | ~3 μs | 300k pkt/sec | 1000 B/op |
These are userspace numbers. Actual latency depends on kernel eBPF overhead and system load.
## E2E Testing
E2E tests use SIPp via testcontainers-go to generate real SIP traffic and verify that metrics match expected values. Tests cover success/failure scenarios and validate proper dialog cleanup.
## Quick Start

```yaml
services:
  sip-exporter:
    image: frzq/sip-exporter:0.5.0
    privileged: true
    network_mode: host
    environment:
      - SIP_EXPORTER_INTERFACE=eth0
```

```shell
docker-compose up -d
curl http://localhost:2112/metrics
```
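To pull the metrics into Prometheus, a minimal scrape job could look like this (assuming the exporter's metrics endpoint on port 2112 as shown above; the job name is illustrative):

```yaml
scrape_configs:
  - job_name: sip-exporter
    static_configs:
      - targets: ["localhost:2112"]
```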
## What's Next
- More RFC 6076 metrics (Session Setup Time, Response Time)
## Links

- GitHub: https://github.com/aibudaevv/sip-exporter
- Docker: `docker pull frzq/sip-exporter:0.5.0`
Happy to answer questions about the eBPF integration, SIP dialog state machine, or Prometheus metric design. Drop a comment below!