Jason Shouldice

Posted on Mar 26 • Originally published at vicistack.com

How to Build a Real-Time VICIdial Wallboard with Node.js and Grafana

#voip #asterisk #sysadmin #devops

VICIdial logs everything into MySQL. Every dial attempt, every agent state change, every second of talk time, every disposition. The data is there. The problem is getting it onto a screen in a format that tells you something useful before the shift is over.

The built-in realtime_report.php works, but it has scaling issues. Every browser tab running it fires its own MySQL queries on each refresh. Ten managers watching = ten times the database load. No alerting, no historical context, no way to spot a creeping drop rate until you're already past 3%.

This walkthrough covers a two-layer approach: a Node.js WebSocket server that polls MySQL once and pushes snapshots to every connected viewer, paired with Grafana for visual dashboards with time-series charts, threshold gauges, and alerting.

The Architecture

VICIdial MySQL  -->  Node.js (poll every 2-3s)  -->  HTML Wallboard (big TV)
      |
      +-- Grafana (poll every 5-10s) --> Manager laptops, kiosk TVs
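
The "poll once, push to everyone" loop in the diagram can be sketched roughly as below. This is a minimal illustration, not the server's actual code: `startPolling` and `pollOnce` are hypothetical names, and `pollOnce` stands in for whatever runs the queries and broadcasts the result. Scheduling the next run only after the previous one finishes keeps slow polls from stacking up.

```javascript
// Runs pollOnce immediately, then again intervalMs after each run completes.
// Returns a stop() function, which is handy for graceful shutdown.
function startPolling(pollOnce, intervalMs, onError = () => {}) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await pollOnce(); // works for sync and async poll functions
    } catch (err) {
      onError(err); // a failed poll must not kill the loop
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Usage would be something like `const stop = startPolling(pollAndBroadcast, 2000)`, with `stop()` called from a SIGTERM handler.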

The Node.js server reads from three tables:

vicidial_live_agents -- MEMORY engine, one row per agent, updated every second by each browser session
vicidial_auto_calls -- MEMORY engine, one row per active call, inserted when calls start, deleted when they end
vicidial_campaign_stats -- pre-aggregated daily totals computed by VICIdial's AST_VDadapt.pl every 2-4 seconds

The MEMORY tables live entirely in RAM. Reads take under 1ms. The connection pool is capped at 3 to avoid competing with VICIdial's own cron scripts for MySQL connections.
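
A minimal sketch of the capped pool, assuming mysql2's `createPool` options. The environment variable names (`VICI_DB_HOST` and friends) and the `wallboard_ro` account are made up for illustration; `asterisk` is the default VICIdial database name, so adjust it if yours differs.

```javascript
// Hypothetical helper: builds the options object for mysql2's createPool().
// The env var names are assumptions for this example, not VICIdial settings.
function wallboardPoolOptions(env = process.env) {
  return {
    host: env.VICI_DB_HOST || '127.0.0.1',
    user: env.VICI_DB_USER || 'wallboard_ro', // assumed read-only account
    password: env.VICI_DB_PASS || '',
    database: env.VICI_DB_NAME || 'asterisk', // VICIdial's default database
    connectionLimit: 3, // hard cap so the dialer's own scripts keep their connections
    waitForConnections: true, // queue extra queries instead of failing them
    queueLimit: 0, // 0 means no limit on queued requests
  };
}

// Usage, with mysql2 installed:
//   const mysql = require('mysql2/promise');
//   const pool = mysql.createPool(wallboardPoolOptions());
```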

The WebSocket Server

The server runs on Node.js 20 with two dependencies: ws for WebSocket and mysql2 for the database connection pool. It includes a health check endpoint, graceful shutdown, ping/pong heartbeats to detect dead connections, and optional token authentication.
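
The heartbeat and token check can be sketched without much code. The snippet below follows the usual ws pattern (mark every client as not alive, ping it, and terminate anyone who never answered the previous ping). The function names and the `WALLBOARD_TOKEN` variable are hypothetical, and the wiring at the bottom is left as a comment because it needs the ws package.

```javascript
const crypto = require('crypto');

// Optional token auth: if no token is configured, everyone is allowed in.
// Hashing both values first gives equal-length buffers, which
// crypto.timingSafeEqual requires.
function isAuthorized(provided, expected) {
  if (!expected) return true;
  const a = crypto.createHash('sha256').update(String(provided)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// One heartbeat pass over a set of ws-style clients.
// Clients that did not answer the previous ping are terminated;
// the rest are marked "not alive" and pinged again.
function heartbeatSweep(clients) {
  let terminated = 0;
  for (const client of clients) {
    if (client.isAlive === false) {
      client.terminate();
      terminated += 1;
      continue;
    }
    client.isAlive = false;
    client.ping();
  }
  return terminated;
}

// Wiring with the ws package (assumed, not run here):
//   wss.on('connection', (socket, req) => {
//     const token = new URL(req.url, 'http://localhost').searchParams.get('token');
//     if (!isAuthorized(token, process.env.WALLBOARD_TOKEN)) return socket.close(1008);
//     socket.isAlive = true;
//     socket.on('pong', () => { socket.isAlive = true; });
//   });
//   setInterval(() => heartbeatSweep(wss.clients), 30000);
```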

Five queries run each cycle:

Agent status counts -- how many are talking, waiting, paused, in dispo, per campaign
Agent detail roster -- every logged-in agent with name, status, pause code, call count
Active call summary -- calls in progress grouped by campaign (ringing, live, IVR)
Queue depth -- calls waiting for an agent plus the age of the oldest waiting call
Campaign stats -- today's totals, hourly rolling windows, five-minute snapshots
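
To make the cycle concrete, here is a rough sketch of how the five result sets might be folded into one snapshot. The SQL string, the row shape and the snapshot field names are our assumptions for illustration; check the column names against your schema with DESCRIBE before using anything like this.

```javascript
// Assumed query for the first item above; verify columns on your install.
const AGENT_STATUS_SQL =
  'SELECT campaign_id, status, COUNT(*) AS agents ' +
  'FROM vicidial_live_agents GROUP BY campaign_id, status';

// Folds rows shaped like { campaign_id, status, agents } into
// { [campaign]: { [status]: count } }.
function groupStatusCounts(rows) {
  const byCampaign = {};
  for (const row of rows) {
    const bucket = byCampaign[row.campaign_id] || (byCampaign[row.campaign_id] = {});
    bucket[row.status] = (bucket[row.status] || 0) + Number(row.agents);
  }
  return byCampaign;
}

// Assembles one broadcast snapshot from the five result sets.
function buildSnapshot(results, startedAtMs, finishedAtMs) {
  return {
    generatedAt: new Date(finishedAtMs).toISOString(),
    queryMs: finishedAtMs - startedAtMs, // total time spent in MySQL this cycle
    agentCounts: groupStatusCounts(results.agentCounts),
    agents: results.agentRoster,
    activeCalls: results.activeCalls,
    queue: results.queueDepth,
    campaigns: results.campaignStats,
  };
}
```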

Total query time is typically 2-8ms. The JSON snapshot gets broadcast to every connected WebSocket client. One poll, unlimited viewers.
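
The broadcast step is the part that makes "one poll, unlimited viewers" work, so here is a small sketch of it. `broadcastSnapshot` is a hypothetical name; the clients are assumed to look like ws sockets (a `readyState` property and a `send` method). Serializing once, outside the loop, keeps the per-viewer cost to a single send.

```javascript
// readyState value for an open socket (same number as WebSocket.OPEN).
const OPEN = 1;

// Serializes the snapshot once, then sends the same string to every open client.
// Returns how many clients received it.
function broadcastSnapshot(clients, snapshot) {
  const payload = JSON.stringify(snapshot);
  let sent = 0;
  for (const client of clients) {
    if (client.readyState !== OPEN) continue; // skip connecting or closing sockets
    client.send(payload);
    sent += 1;
  }
  return sent;
}
```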

The health endpoint at /health returns connection count, last poll latency, and error count. Useful for monitoring with Prometheus, Datadog, or a simple curl in cron.
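
A health endpoint along these lines needs nothing beyond Node's built-in http module. The sketch below is an assumption about shape, not the server's real code: the `health` object, its field names and the port are all made up for the example.

```javascript
const http = require('http');

// Hypothetical in-memory counters that the poll loop would keep up to date.
const health = { connections: 0, lastPollMs: null, errorCount: 0 };

// Answers GET /health with the counters as JSON, everything else with 404.
function healthHandler(req, res) {
  if (req.method === 'GET' && req.url === '/health') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(health));
    return;
  }
  res.writeHead(404, { 'Content-Type': 'text/plain' });
  res.end('not found');
}

const healthServer = http.createServer(healthHandler);
// healthServer.listen(8080); // the port is an assumption; pick what suits your setup
```

A cron check can then be as simple as `curl -fs http://localhost:8080/health`, assuming that port.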

Grafana Setup

Install Grafana from the official RPM repository. The same repository works on AlmaLinux/Rocky scratch installs and on openSUSE Leap, which is what ViciBox is built on. Create a read-only MySQL user with SELECT grants on the specific tables you need:

vicidial_live_agents
vicidial_auto_calls
vicidial_campaign_stats
vicidial_log
vicidial_closer_log
vicidial_agent_log
vicidial_users
vicidial_carrier_log

Set maxOpenConns: 2 in the Grafana data source configuration (it lives under jsonData when the data source is provisioned from YAML). In grafana.ini, set min_refresh_interval = 5s in the [dashboards] section so users cannot create 1-second refresh dashboards that hammer the database.

Queries Worth Running

Not every metric deserves a panel. Here is what we actually use on production wallboards, organized by how often they should poll.

Every 2-5 seconds (real-time state tables, tiny and fast):

Agent status by campaign -- how many in each state
Queue depth -- calls waiting per campaign, longest wait time
Queue-to-agent ratio -- calls waiting divided by agents in READY

Every 10-30 seconds (pre-aggregated stats):

Campaign daily totals from vicidial_campaign_stats -- calls, answers, drops, drop rate, dialable leads
System-wide agent counts -- total logged in, on calls, waiting, paused, paused over 10 minutes

Every 60 seconds (log table queries -- always filter by CURDATE()):

Calls per hour today -- the classic hourly volume chart
Agent handle time -- AHT = talk_sec + dispo_sec from vicidial_agent_log
Agent leaderboard ranked by sales
Carrier ASR -- answer-seizure ratio extracted from vicidial_carrier_log
Inbound service level -- percentage of calls answered within 20 seconds
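
Most of these metrics are simple ratios, and it helps to pin the arithmetic down before writing SQL for it. The helpers below are a sketch with hypothetical names. The service-level function uses one common definition (share of answered calls picked up within the target), which may differ from how your contracts define it. The last helper covers the queue-to-agent ratio from the real-time list.

```javascript
// Average handle time in seconds: (talk + dispo) per handled call.
function averageHandleTime(talkSec, dispoSec, calls) {
  return calls > 0 ? (talkSec + dispoSec) / calls : 0;
}

// Answer-seizure ratio as a percentage of attempts.
function answerSeizureRatio(answered, attempts) {
  return attempts > 0 ? (answered / attempts) * 100 : 0;
}

// Service level: percentage of calls answered within targetSec (20 s in the text).
function serviceLevel(queueSecondsList, targetSec = 20) {
  if (queueSecondsList.length === 0) return 0;
  const within = queueSecondsList.filter((s) => s <= targetSec).length;
  return (within / queueSecondsList.length) * 100;
}

// Queue-to-agent ratio; Infinity when calls are waiting and nobody is READY.
function queueToAgentRatio(callsWaiting, readyAgents) {
  if (readyAgents > 0) return callsWaiting / readyAgents;
  return callsWaiting > 0 ? Infinity : 0;
}
```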

The single most important optimization: always filter log tables by today's date, for example WHERE call_date >= CURDATE() on vicidial_log (vicidial_agent_log keeps its timestamp in event_time instead). Without the filter, a GROUP BY on a six-month-old vicidial_log table will scan millions of rows and peg your disk I/O.
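
One way to make the date filter hard to forget is to build log-table queries through a helper that refuses to run without it. This is a sketch only: `todayQuery` is a hypothetical helper, and its inputs are meant to be hardcoded identifiers in your own source, never user input.

```javascript
// Builds a log-table query that always carries the "today only" filter.
// Throws if no date column is given, so an unfiltered scan cannot slip through.
function todayQuery({ select, from, dateColumn, groupBy }) {
  if (!dateColumn) {
    throw new Error('log-table queries need a date column to filter on');
  }
  const tail = groupBy ? ` GROUP BY ${groupBy}` : '';
  return `SELECT ${select} FROM ${from} WHERE ${dateColumn} >= CURDATE()${tail}`;
}

// The classic hourly volume chart, limited to today's rows.
const CALLS_PER_HOUR_SQL = todayQuery({
  select: 'HOUR(call_date) AS hr, COUNT(*) AS calls',
  from: 'vicidial_log',
  dateColumn: 'call_date',
  groupBy: 'hr',
});
```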

Grafana Panel Design

The panels that belong on a VICIdial wallboard:

Agents Talking (stat panel) -- big green number. Threshold colors: yellow under 5, green at 5+. If the floor is running and this is yellow, investigate.

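Drop Rate (gauge) -- the compliance panel. Green under 2%, yellow 2-3%, red at 3%+. The FTC safe harbor caps abandoned calls at 3% of calls answered by a live person, measured per campaign over each 30-day period. When this goes red, reduce the dial level immediately.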

Calls Waiting (stat with sparkline) -- green below 3, yellow at 3+, orange at 8+, red at 15+. The sparkline shows if the queue is growing or shrinking.

Agent Status Breakdown (bar gauge) -- horizontal bars showing talking/ready/paused/dispo distribution. Color-coded: green for talking, blue for ready, orange for paused, purple for dispo.

Call Volume (time series) -- hourly buckets for the last 8 hours showing total calls vs. answers. Smooth lines with 20% fill opacity.

Agent Detail (table) -- every agent with color-coded status, campaign, calls today, and seconds in current state. Managers use this to find agents who have been paused too long.
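
The threshold colors above follow the same rule Grafana uses: the highest step at or below the current value wins. If you want the HTML wallboard to match the Grafana panels, a sketch like the one below keeps both in sync. The names are hypothetical, and treating 1-2 waiting calls as green is our assumption.

```javascript
// Returns the color of the last step whose `from` is <= value.
// Steps must be sorted by `from`, ascending.
function thresholdColor(steps, value) {
  let color = steps[0].color;
  for (const step of steps) {
    if (value >= step.from) color = step.color;
  }
  return color;
}

const DROP_RATE_STEPS = [
  { from: -Infinity, color: 'green' },
  { from: 2, color: 'yellow' },
  { from: 3, color: 'red' },
];

const CALLS_WAITING_STEPS = [
  { from: -Infinity, color: 'green' },
  { from: 3, color: 'yellow' },
  { from: 8, color: 'orange' },
  { from: 15, color: 'red' },
];

const AGENTS_TALKING_STEPS = [
  { from: -Infinity, color: 'yellow' },
  { from: 5, color: 'green' },
];

// Drop rate as a percentage of answered calls.
function dropRatePct(drops, answers) {
  return answers > 0 ? (drops / answers) * 100 : 0;
}
```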

Add a campaign picker template variable that queries active campaigns. Managers can filter the entire dashboard to one campaign or view all at once.

Alerts That Matter

Drop rate over 3% -- FTC compliance. Alert yellow at 2%, red at 3%. Route to every dialer supervisor.

Queue depth spike -- calls waiting exceeds threshold for 30+ seconds. The threshold depends on your staffing model.

Agent idle over 10 minutes -- an agent in READY for 10+ minutes usually means the hopper is empty, the dial level is too low, or the session is stuck.

No agents logged in -- campaign had calls today but zero agents now. Shift coverage gap.

Carrier ASR below 30% -- over the last hour with minimum 50 attempts. Something is broken on the carrier side.

Stale records in vicidial_auto_calls -- calls not updated in 15+ minutes are zombie records. They cause phantom queue depth readings.
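
Grafana's alert rules handle most of this, but the same conditions are easy to evaluate inside the Node.js server if you want the HTML wallboard to flash as well. The sketch below covers three of the rules above. All names are hypothetical, and `lastUpdateMs` is an assumed field holding a row's last update time in epoch milliseconds.

```javascript
// Fires only after a condition has held continuously for holdMs,
// e.g. queue depth above the threshold for 30+ seconds.
class SustainedCondition {
  constructor(holdMs) {
    this.holdMs = holdMs;
    this.since = null; // when the condition first became true
  }

  update(isTrue, nowMs) {
    if (!isTrue) {
      this.since = null; // any dip resets the clock
      return false;
    }
    if (this.since === null) this.since = nowMs;
    return nowMs - this.since >= this.holdMs;
  }
}

// Carrier ASR alert: below 30% with at least 50 attempts in the window.
function carrierAsrAlert(answered, attempts, minAttempts = 50, floorPct = 30) {
  if (attempts < minAttempts) return false; // too few calls to judge
  return (answered / attempts) * 100 < floorPct;
}

// Zombie rows: calls not updated for 15+ minutes.
function findZombieCalls(calls, nowMs, maxAgeMs = 15 * 60 * 1000) {
  return calls.filter((call) => nowMs - call.lastUpdateMs >= maxAgeMs);
}
```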

Deployment Options

Option 1: systemd on the VICIdial server. Fine for under 100 agents. The Node.js process uses 30-50MB RAM. Run as nobody with ProtectSystem=strict and NoNewPrivileges=true. MySQL connects to localhost with zero network latency.

Option 2: Docker Compose on a separate monitoring VM. For 100+ agents. Grafana and the WebSocket server run in containers. MySQL connects over the LAN to the VICIdial server. Lock down the MySQL port to the dashboard server's IP only.

Put Grafana behind nginx with TLS if it is accessible outside the LAN. For the floor TV, enable anonymous auth with the Viewer role and load the dashboard URL with ?kiosk to hide the toolbar.

Polling Intervals by Deployment Size

Agents	Interval	Notes
Under 30	2 seconds	Tiny tables, negligible load
30-100	3-5 seconds	`vicidial_auto_calls` might have 200 rows at peak
100-300	5 seconds	Watch `SHOW PROCESSLIST` for lock waits
300+	10 seconds	Shift to `vicidial_campaign_stats` instead of counting MEMORY rows
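
If the server already knows how many agents are logged in, it can pick its own interval from the table above. A minimal sketch, assuming a hypothetical `pollIntervalMs` helper; the table's bands overlap at 100 and 300, so the boundary handling here (and using 3 seconds for the 3-5 second band) is our choice.

```javascript
// Maps logged-in agent count to a polling interval in milliseconds.
function pollIntervalMs(agentCount) {
  if (agentCount < 30) return 2000;
  if (agentCount <= 100) return 3000; // table says 3-5 s; lower bound chosen here
  if (agentCount < 300) return 5000;
  return 10000; // 300+: also the point to lean on vicidial_campaign_stats
}
```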

For 200+ agent installs, point the dashboard at a MySQL read replica instead of the primary. Replication lag is under 1 second for MEMORY tables.

Getting Started

Day 1: install Grafana, add MySQL data source, build four stat panels (talking, ready, paused, waiting), set 5-second refresh, put it on a TV. Two hours of work.

Day 2: add drop rate gauge, agent table, call volume chart, and the drop rate alert. That is a production wallboard.

Week 2: deploy the WebSocket server and HTML wallboard on a second TV. Add the leaderboard and carrier ASR panels.

The full Node.js server code, all SQL queries, panel JSON configurations, the systemd service file, Docker Compose setup, and the HTML wallboard client are in the complete guide on our blog.

Need help building dashboards, tuning VICIdial performance, or migrating to a modern deployment? Contact ViciStack for a free operations review. We typically see a 30-50% improvement in agent productivity within the first two weeks.
