Your single-server VICIdial box just hit a wall. Agents get "time synchronization" errors, the hopper empties faster than it fills, and your real-time report loads like it's 2003. You've Googled "VICIdial cluster setup" and found a 15-year-old PDF, some AI-generated marketing content, and fragmented forum threads.
The core problem: Asterisk doesn't scale vertically. A 64-core server chokes at the same agent count as a quad-core. A single all-in-one VICIdial server maxes out around 20-25 agents for predictive outbound. That's not hardware — it's architecture. The solution is more servers, each doing one job well.
The Four Server Roles
Database server. The brain. Every agent login, disposition, hopper query, and real-time report flows through one MySQL instance. VICIdial uses MyISAM exclusively — not InnoDB. The VICIdial creator has been clear about this: "We do not recommend using InnoDB under any circumstances." Use SSDs at minimum, NVMe for 200+ agents, and an LSI Logic MegaRAID controller specifically. Not 3ware, not Adaptec.
Telephony servers. Each one runs Asterisk and handles calls for a slice of your agents. Golden rule: 20 agents per telephony server for predictive outbound. You can push to 25 on a high-clock quad-core. If you're running answering machine detection (AMD), cut that nearly in half, because AMD's loopback trunk architecture doubles the channel count.
The critical spec is single-thread CPU performance, not core count. A 4-core at 4.5 GHz beats an 8-core at 2.5 GHz.
Web servers. Stateless HTTP handlers — Apache serving the agent interface and admin panels. One web server handles about 150 agents, dropping to 75 with SSL/TLS. This is the one role where virtualization is genuinely fine.
Archive server. Each telephony server records calls locally. Without a central archive server, recordings scatter across every dialer and playback links randomly return "Object not found."
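To turn those per-role ratios into a first-pass server count, here is a minimal sketch. The function name and return shape are made up for illustration, and it assumes AMD cuts dialer capacity to exactly half (the rule above says "nearly"), so treat the output as a starting estimate, not a sizing guarantee.

```python
import math


def estimate_servers(agents, amd=False, tls=False):
    """Rough server counts from the rules of thumb in this article.

    Hypothetical helper: the ratios come from the text, the exact
    halving for AMD is an assumption.
    """
    agents_per_dialer = 20           # predictive outbound baseline
    if amd:
        agents_per_dialer = 10       # assumed: exactly half with AMD
    agents_per_web = 75 if tls else 150
    return {
        "telephony": math.ceil(agents / agents_per_dialer),
        "web": math.ceil(agents / agents_per_web),
        "database": 1,
        "archive": 1,
    }


if __name__ == "__main__":
    print(estimate_servers(200, tls=True))
```

For 200 agents with TLS this lands on 10 telephony servers and 3 web servers, which sits at the top of the ranges real deployments report.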
The Keepalive Flags That Break Every New Cluster
VICIdial uses numbered flags in /etc/astguiclient.conf to control which background processes run on each server. Get these wrong and you'll see duplicate calls, erratic dial levels, or a system that silently stops dialing.
The critical rule: flags 5 and 7 must run on exactly ONE telephony server. Flag 5 is the adaptive predictive algorithm — it calculates dial levels for the entire cluster. Flag 7 is the fill/balance dialer. Run either on two servers simultaneously and you get conflicting dial-level calculations, double-dialed leads, and behavior that'll make you question reality.
The database server gets VARactive_keepalives => X, which runs no keepalive processes. Web servers also get X. Regular telephony servers get 1238. Exactly one telephony server, your "primary" dialer, gets 123456789.
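The "exactly one server" rule is easy to break when you clone a dialer image, so it is worth checking mechanically. A minimal sketch, assuming you have already collected each server's VARactive_keepalives value into a dict (the server names and the function are hypothetical, not part of VICIdial):

```python
def check_keepalives(cluster):
    """Return a list of problems with cluster-wide keepalive flags.

    `cluster` maps a server name to its VARactive_keepalives string.
    Flags 5 and 7 must each appear on exactly one server.
    """
    problems = []
    for flag in ("5", "7"):
        owners = [name for name, flags in cluster.items() if flag in flags]
        if len(owners) != 1:
            problems.append(
                f"flag {flag} runs on {len(owners)} servers: {owners}"
            )
    return problems


if __name__ == "__main__":
    cluster = {
        "db1": "X",
        "web1": "X",
        "dialer1": "123456789",   # primary dialer
        "dialer2": "1238",
        "dialer3": "123456789",   # cloned from dialer1 by mistake
    }
    for problem in check_keepalives(cluster):
        print(problem)
```

An empty list means flags 5 and 7 each have a single owner. It says nothing about whether the other flags are right for each role.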
MySQL Tuning That Prevents 3 AM Meltdowns
Add skip-name-resolve to your my.cnf. Without it, MySQL does a reverse DNS lookup on every connection. In a cluster where everything is hammering the database, those lookups create a connection backlog that looks like "too many connections" — except raising max_connections doesn't fix it. This single line has saved more VICIdial clusters than any other config change.
Set max_connections = 4000, key_buffer_size = 640M (4096M for enterprise), table_open_cache = 8192, and query_cache_size = 0 (write invalidation overhead exceeds the benefit for VICIdial's access pattern). Enable concurrent_insert = 2 for MyISAM.
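If you template your configs, the settings above fit in a few lines. This is a sketch that only renders the values named in this article as a [mysqld] fragment. The function name and the enterprise switch are assumptions, and query_cache_size was removed in MySQL 8.0, so the output assumes MariaDB or an older MySQL.

```python
def render_mysqld_section(enterprise=False):
    """Render the article's recommended settings as a my.cnf fragment.

    Hypothetical helper; merge the output into your existing my.cnf
    by hand rather than overwriting the file.
    """
    settings = [
        ("skip-name-resolve", None),   # valueless flag: no reverse DNS
        ("max_connections", "4000"),
        ("key_buffer_size", "4096M" if enterprise else "640M"),
        ("table_open_cache", "8192"),
        ("query_cache_size", "0"),     # not valid on MySQL 8.0+
        ("concurrent_insert", "2"),
    ]
    lines = ["[mysqld]"]
    for key, value in settings:
        lines.append(key if value is None else f"{key} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mysqld_section(), end="")
```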
Above 200 agents, convert vicidial_live_agents to the MEMORY engine. This table tracks every agent's real-time state and gets hammered by every dialer, web server, and real-time report simultaneously. On MyISAM, table-level locks create contention. On MEMORY, it runs from RAM. Forum users consistently describe the difference as "night and day."
Archive your logs or die. vicidial_log, call_log, and vicidial_carrier_log grow indefinitely. Run ADMIN_archive_log_tables.pl --daily at 1 AM. Un-archived tables are the #1 cause of cascading failures: slow queries hold table locks, the locks pile up into "too many connections", and that turns into a cluster-wide outage.
The "No Audio Between Servers" Problem
This is the #1 reported cluster issue. Calls ring, agent picks up, silence. But only on calls crossing servers — same-server calls work fine.
Fix checklist in order of likelihood:
- Add externip and localnet to sip.conf on every dialer. Without these, Asterisk advertises its internal IP in SDP packets and the remote party sends RTP to an unreachable address.
- Open UDP 10000-20000 bidirectionally between ALL nodes. SIP on 5060 handles signaling. Actual audio travels on random high UDP ports.
- Set canreinvite=no and directmedia=no so Asterisk keeps media flowing through itself.
- Use a private switch between servers with zero firewall. Public NICs face carriers and agents. Private NICs face each other on an unfiltered gigabit switch. This eliminates the entire class of inter-server audio problems.
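The sip.conf items on that checklist can be linted before you start chasing packets. A rough sketch, assuming the settings live in the [general] section (they can also be set per peer, which this does not inspect) and that you want both canreinvite and directmedia present as the checklist says. The function is hypothetical and only reads text you hand it.

```python
def check_sip_general(sip_conf_text):
    """Return NAT/media problems found in the [general] section."""
    settings = {}
    section = None
    for raw in sip_conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop ';' comments
        if not line:
            continue
        if line.startswith("[") and "]" in line:
            section = line[1:line.index("]")]
            continue
        if section == "general" and "=" in line:
            key, value = line.split("=", 1)
            value = value.lstrip(">").strip().lower()   # accept '=>' too
            settings[key.strip().lower()] = value       # last value wins
    problems = []
    for key in ("externip", "localnet"):
        if key not in settings:
            problems.append(f"missing {key}")
    for key in ("canreinvite", "directmedia"):
        if settings.get(key) != "no":
            problems.append(f"{key} is not set to no")
    return problems


if __name__ == "__main__":
    sample = """
[general]
externip = 203.0.113.10      ; example public address
localnet = 10.0.0.0/255.255.255.0
directmedia = no
"""
    print(check_sip_general(sample))
```

A clean result only covers the config side. The UDP 10000-20000 rule and the private switch still have to be verified on the network itself.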
Capacity Planning From Real Deployments
| Scale | DB | Web | Telephony | Archive | Total |
|---|---|---|---|---|---|
| 50 agents | 1 | 1 (shared w/ DB) | 2-3 | 1 | 4-5 |
| 100 agents | 1 dedicated | 1-2 | 4-5 | 1 | 7-9 |
| 200 agents | 1 beefy | 2-3 | 8-10 | 1 | 12-15 |
| 500 agents | 1 maxed + 1 slave | 4-5 | 20-25 | 2 | 28-34 |
Around 125 agents, MyISAM table-lock contention on vicidial_live_agents starts causing cascading issues. MEMORY table conversion becomes essential. Above 300 agents with aggressive dial ratios, consider splitting into multiple independent clusters rather than scaling one further — one DB outage taking out 300+ agents is scary math.
Virtualization: Settled
The VICIdial creator: "You will actually need more hardware and spend more money when you virtualize. Best case scenario is 50% of normal bare metal capacity."
The nuance: dedicated cloud instances work. Deployments of more than 100 agents have been confirmed on AWS EC2 dedicated hosts. The distinction is actual hardware with no noisy neighbors. Shared/burstable instances fail because DAHDI needs precise 1ms kernel timer ticks.
Telephony servers: bare metal or dedicated cloud. Database: bare metal preferred. Web servers: virtualization fine. Archive: anything with enough storage.
A full 500-agent cluster on Hetzner bare metal costs roughly $860/month for 14 servers — $1.72 per agent per month for compute. Compare that to hosted dialers at $120-225/agent/month. ViciStack ships pre-configured bare metal clusters with every optimization described here already baked in.
Originally published at https://vicistack.com/blog/vicidial-cluster-guide/