A well-tuned Asterisk server handles 300+ concurrent calls. The problem at 100+ agents isn't capacity -- it's fault tolerance. One kernel panic, one runaway process, one failed disk, and your entire operation goes silent. No graceful degradation. Just dead air and a room full of agents with nothing to do.
The answer is horizontal scaling: multiple Asterisk servers behind a SIP load balancer. In the VICIdial ecosystem, that load balancer is Kamailio. This guide covers why you need it, how to configure it, and the practical scaling roadmap from 50 agents to 500+.
What Breaks Without a Load Balancer
VICIdial's built-in multi-server support handles database replication and web interface distribution well. But for SIP traffic -- the actual calls -- it relies on Asterisk's native capabilities, which were not designed for clustered operation.
Problem 1: Uneven call distribution. VICIdial assigns calls to servers based on campaign config. It distributes leads across servers, but the distribution tracks lead assignment, not real-time server load. Server A ends up with 180 concurrent calls while Server B sits at 60.
Problem 2: No graceful failure. Asterisk crashes on Server A. Calls in progress are lost. New calls keep hitting Server A until VICIdial's keepalive detects the failure -- 30 to 60 seconds later. Every call in that window fails silently.
Problem 3: Single point of entry for inbound. Inbound calls from your carrier hit a single IP address. If that server is overloaded or down, the carrier gets a SIP error and the call is gone.
Problem 4: Codec transcoding bottleneck. If different carriers use different codecs, Asterisk must transcode. A single G.729 to G.711 conversion uses roughly 10x the CPU of a native G.711 call. Under heavy load on one server, transcoding causes audio quality issues across all calls on that box.
Kamailio fixes all of these. It sits in front of your Asterisk servers as a SIP proxy, distributing calls based on configurable algorithms, monitoring server health, and transparently rerouting traffic when servers fail.
The Architecture
```
                      +-----------+
SIP Carriers -------->| Kamailio  |
                      | Port 5060 |
                      +-----+-----+
                            |
            +---------------+---------------+
            |               |               |
       +----+----+     +----+----+     +----+----+
       |Asterisk |     |Asterisk |     |Asterisk |
       |Server 1 |     |Server 2 |     |Server 3 |
       |Port 5080|     |Port 5080|     |Port 5080|
       +----+----+     +----+----+     +----+----+
            |               |               |
            +---------------+---------------+
                            |
                      +-----+-----+
                      | VICIdial  |
                      | Database  |
                      +-----------+
```
Kamailio listens on port 5060 (standard SIP) and proxies calls to Asterisk servers listening on 5080. Each Asterisk server connects to the shared VICIdial MySQL database for routing, agent assignment, and CDR logging.
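On the Asterisk side, each server has to be rebound from the default 5060 to 5080 so it only accepts traffic relayed by Kamailio. A minimal sketch for chan_sip (the channel driver VICIdial uses by default); the IP is illustrative and should match the entry for that server in dispatcher.list:

```
; /etc/asterisk/sip.conf -- per-server binding (illustrative values)
[general]
bindaddr=10.0.0.11      ; this server's internal IP from dispatcher.list
bindport=5080           ; Kamailio proxies 5060 -> 5080
```

Firewalling 5080 so it only accepts connections from the Kamailio host (and trusted internal IPs) keeps carriers and scanners from bypassing the proxy.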
Installing and Configuring the Dispatcher
On CentOS/RHEL:
cat > /etc/yum.repos.d/kamailio.repo << 'REPO'
[kamailio]
name=Kamailio packages
baseurl=https://rpm.kamailio.org/stable/el9/
gpgcheck=0
enabled=1
REPO
yum install -y kamailio kamailio-mysql kamailio-utils kamailio-tls
The Dispatcher List
Create /etc/kamailio/dispatcher.list:
# setid destination flags priority attributes
# Set 1: Outbound Asterisk servers
1 sip:10.0.0.11:5080 0 10 weight=50;duid=ast1
1 sip:10.0.0.12:5080 0 5 weight=30;duid=ast2
1 sip:10.0.0.13:5080 0 3 weight=20;duid=ast3
# Set 2: Inbound Asterisk servers
2 sip:10.0.0.11:5080 0 10 weight=40;duid=ast1
2 sip:10.0.0.12:5080 0 10 weight=40;duid=ast2
2 sip:10.0.0.13:5080 0 10 weight=20;duid=ast3
Each line defines:
- setid -- Group identifier. Different sets for different routing (outbound vs inbound).
- destination -- SIP URI of the Asterisk server. Internal IP, port 5080.
- flags -- 0 for normal. 1 for inactive (maintenance).
- priority -- Higher = higher priority for priority-based routing.
- attributes -- Weight controls proportional distribution. duid is a unique ID for logging.
Critical Kamailio Parameters
# Dispatcher module parameters
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
modparam("dispatcher", "ds_probing_mode", 1) # Enable health probing
modparam("dispatcher", "ds_ping_interval", 15) # Probe every 15 seconds
modparam("dispatcher", "ds_probing_threshold", 3) # 3 failures = inactive
modparam("dispatcher", "ds_ping_reply_codes", "class=2;class=3;class=4")  # Treat 2xx/3xx/4xx replies as alive
The routing logic detects call direction by source IP. Calls from known carrier IPs go through Set 2 (inbound). Everything else goes through Set 1 (outbound). Algorithm 4 (weighted round-robin) distributes proportionally by weight.
The failure route is critical: if the selected server returns a 500 or 503, Kamailio marks it as suspect and tries the next server in the set. If all servers are exhausted, it returns 503 Service Unavailable.
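That logic can be sketched in kamailio.cfg roughly as follows. ds_select_dst, ds_next_dst, and ds_mark_dst are the dispatcher module's real functions; the route names, the RELAY route, and the use of permissions group 1 for carrier IPs are assumptions for illustration:

```
request_route {
    if (is_method("INVITE")) {
        # Assumption: carrier IPs are loaded into permissions group 1
        if (allow_source_address("1")) {
            # Inbound from a carrier: set 2, algorithm 4 (weighted round-robin)
            if (!ds_select_dst("2", "4")) {
                send_reply("503", "Service Unavailable");
                exit;
            }
        } else {
            # Everything else is outbound: set 1
            if (!ds_select_dst("1", "4")) {
                send_reply("503", "Service Unavailable");
                exit;
            }
        }
        t_on_failure("DISPATCH_FAILURE");
    }
    route(RELAY);   # assumed relay route calling t_relay()
}

failure_route[DISPATCH_FAILURE] {
    if (t_check_status("5[0-9][0-9]")) {
        # Mark the failed destination, then try the next server in the set
        ds_mark_dst("ip");
        if (ds_next_dst()) {
            t_on_failure("DISPATCH_FAILURE");
            t_relay();
            exit;
        }
        send_reply("503", "Service Unavailable");
    }
}
```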
Health Probing: The Part That Saves You
When ds_probing_mode=1, Kamailio sends a SIP OPTIONS probe to each server at the configured interval. If a server fails to respond ds_probing_threshold consecutive times, Kamailio marks it inactive and stops sending it calls. Once the server answers probes again, Kamailio automatically reactivates it.
Tuning Probe Parameters
# Aggressive: fast failover, risk of false positives
ds_ping_interval = 10
ds_probing_threshold = 2
# Failover: 10s * 2 = 20 seconds worst case
# Conservative: slower failover, no false positives
ds_ping_interval = 30
ds_probing_threshold = 3
# Failover: 30s * 3 = 90 seconds worst case
For 100+ agent centers, 15-second intervals with threshold 3 gives failover within 45 seconds while avoiding false positives from network hiccups.
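The worst-case window is just the probe interval times the failure threshold (a server can die immediately after answering a probe). A quick sanity check of the three profiles above:

```python
# Worst-case time for Kamailio to mark a dead server inactive:
# it can fail right after a successful OPTIONS probe, so detection
# takes up to ds_ping_interval * ds_probing_threshold seconds.
def worst_case_failover(ping_interval_s: int, probing_threshold: int) -> int:
    """Seconds until a dead server stops receiving new calls."""
    return ping_interval_s * probing_threshold

profiles = {
    "aggressive":   worst_case_failover(10, 2),  # 20 s
    "recommended":  worst_case_failover(15, 3),  # 45 s
    "conservative": worst_case_failover(30, 3),  # 90 s
}
print(profiles)
```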
Monitoring and Maintenance
Check dispatcher state:
kamcmd dispatcher.list
# FLAGS: A=Active, P=Probing, I=Inactive, D=Disabled
Take a server out for maintenance without affecting active calls:
kamcmd dispatcher.set_state i 1 sip:10.0.0.13:5080
# ...maintenance...
kamcmd dispatcher.set_state a 1 sip:10.0.0.13:5080
Weighted Round-Robin vs Load-Based
Algorithm 4 (weighted round-robin): Distributes calls proportionally by weight. Server with weight=50 gets ~50% of calls. Best for heterogeneous hardware -- 32-core server gets double the weight of 16-core.
Algorithm 10 (call-load based): Tracks active calls per server and routes to the least-loaded, adjusted by weight. Best for identical hardware where you want distribution based on actual load rather than static weights.
Start with algorithm 4. It's simpler, more predictable, and easier to debug. Switch to 10 if you see uneven load due to varying call durations or mixed inbound/outbound patterns.
Scaling Roadmap
Stage 1: Single Server (Up to 50 Agents)
No Kamailio needed. Focus on optimizing that server: AMD tuning, carrier config, agent settings.
Stage 2: Dual Server + Kamailio (50-100 Agents)
Add a second Asterisk server. Kamailio can run on one of them or on a separate lightweight VM.
1 sip:10.0.0.11:5080 0 10 weight=50;duid=ast1
1 sip:10.0.0.12:5080 0 10 weight=50;duid=ast2
Equal weights for identical hardware. Active-active with automatic failover.
Stage 3: Three Servers + Dedicated Kamailio (100-200 Agents)
Move Kamailio to its own server (2 CPU cores, 2GB RAM is plenty). Add a third Asterisk server.
Stage 4: Five+ Servers + Redundant Kamailio (200-500 Agents)
Two Kamailio instances behind a virtual IP using keepalived/VRRP for load balancer redundancy. Five or more Asterisk servers in the dispatch pool.
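A minimal keepalived sketch for the virtual IP, assuming illustrative values (eth0, VIP 10.0.0.5, router ID 51); the backup Kamailio node runs the same block with state BACKUP and a lower priority:

```
# /etc/keepalived/keepalived.conf on the primary Kamailio node
vrrp_instance KAM_VIP {
    state MASTER
    interface eth0          # interface carrying SIP traffic
    virtual_router_id 51    # must match on both nodes
    priority 150            # backup node uses e.g. 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.5/24         # the address carriers and agents point at
    }
}
```

Carriers and dispatcher clients target the VIP only; when the primary fails, VRRP moves the address to the backup within a few seconds.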
Stage 5: 10 Servers and Beyond (500+ Agents)
Separate dispatcher sets for inbound vs outbound. Geographic distribution with DNS-based failover between data centers. Dedicated media servers (rtpengine) separate from signaling. Database read replicas for reporting.
Performance Considerations
Kamailio is extremely efficient. It handles SIP signaling only, not audio. Resource requirements scale with calls per second (CPS), not concurrent calls.
| Call Volume | CPU | RAM |
|---|---|---|
| Up to 50 CPS | 2 cores | 2 GB |
| 50-200 CPS | 4 cores | 4 GB |
| 200-500 CPS | 8 cores | 8 GB |
A 100-agent center doing 200 dials/hour/agent generates approximately 5.5 CPS. A minimal Kamailio deployment handles that with negligible CPU.
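The arithmetic behind that estimate, assuming (my assumption) one SIP INVITE through the proxy per dial:

```python
# Back-of-envelope CPS estimate for sizing Kamailio.
def estimated_cps(agents: int, dials_per_hour_per_agent: int) -> float:
    """Average SIP calls per second generated by the dialer."""
    return agents * dials_per_hour_per_agent / 3600.0

cps = estimated_cps(100, 200)
print(f"{cps:.2f} CPS")
```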
Kamailio adds under 1ms latency to SIP signaling. Keep Kamailio and Asterisk on the same LAN segment to avoid introducing latency that affects call setup time and AMD detection accuracy.
By default, RTP (audio) flows directly between the carrier and Asterisk, bypassing Kamailio. This is optimal. Only proxy RTP through Kamailio if you need NAT traversal or media recording at the proxy level.
The hard part isn't running Kamailio -- it's getting the configuration right so you don't end up with one-way audio, dropped calls, or incorrect CDR logging. ViciStack has deployed Kamailio-based VICIdial architectures for centers ranging from 50 to 500+ agents, with health monitoring and automatic failover included as part of every managed deployment.
Originally published at https://vicistack.com/blog/vicidial-kamailio-load-balancing/