Here's something that's true at every outbound call center I've worked with: 60-80% of the leads in the list will never convert. Disconnected numbers, wrong numbers, people who will never buy, numbers that have been called twelve times with no answer. Your agents dial them anyway, because the dialer doesn't know the difference between a lead with a 35% chance of converting and one with a 0.2% chance.
VICIdial's predictive dialer is excellent at keeping agents busy. It's not good at deciding which leads to dial. It works through the hopper in order — list position, maybe randomized, maybe weighted by list priority. But it doesn't know that leads from area code 469 convert at 8.2% while 313 converts at 1.1%. It doesn't know that leads who answered on the first attempt but said "call back later" are 4x more likely to close than leads who went to voicemail three times.
You know all of this intuitively. Your veteran agents know it. But nobody has quantified it, turned it into a number, and fed it back into the dialer.
What You Need
Three months of VICIdial call history (six months is better), Python 3.8+ on any machine that can reach your database, and about four hours. No GPU, no data science team, no PhD.
The Training Data
Pull every lead with a definitive outcome from your database. Join vicidial_list to aggregated call statistics from vicidial_log. For each lead, capture: area code, state, lead source/vendor, total call attempts, human answers, voicemails, no-answers, busy signals, average talk time, lead age in days, days since last contact, and the target variable — did they convert (SALE, XFER, APPSET) or not.
You need at least 10,000 leads with outcomes for a minimum viable model. 50,000+ gives you a solid one that can capture area-code-level and time-of-day patterns. Under 10,000, keep collecting data for another month — a model trained on thin data gives garbage scores.
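The extract described above can be sketched as one grouped join. This assumes the stock VICIdial schema; the status codes used for "human answer" and "conversion" below are illustrative, so substitute your own campaign's dispositions:

```python
# Training-data extract against the stock VICIdial schema (vicidial_list
# joined to vicidial_log). Status sets are example dispositions -- adjust
# them to the codes your campaigns actually use.
EXTRACT_QUERY = """
SELECT
    vl.lead_id,
    SUBSTR(vl.phone_number, 1, 3)                        AS area_code,
    vl.state,
    vl.source_id                                         AS lead_source,
    COUNT(log.uniqueid)                                  AS total_attempts,
    SUM(log.status IN ('SALE','XFER','APPSET','NI'))     AS human_answers,
    SUM(log.status = 'A')                                AS voicemails,
    SUM(log.status = 'NA')                               AS no_answers,
    SUM(log.status = 'B')                                AS busy_signals,
    AVG(log.length_in_sec)                               AS avg_talk_sec,
    DATEDIFF(NOW(), vl.entry_date)                       AS age_days,
    DATEDIFF(NOW(), MAX(log.call_date))                  AS days_since_contact,
    MAX(log.status IN ('SALE','XFER','APPSET'))          AS converted
FROM vicidial_list vl
JOIN vicidial_log log ON log.lead_id = vl.lead_id
WHERE log.call_date >= NOW() - INTERVAL 6 MONTH
GROUP BY vl.lead_id
"""

# Load it into a DataFrame once you have a DB connection, e.g.:
# df = pandas.read_sql(EXTRACT_QUERY, mysql_connection)
```

The GROUP BY collapses every call record into one row per lead, which is the shape the model trains on.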
Feature Engineering: Where the Value Lives
Raw data isn't useful to a model. You need features that capture behavioral patterns:
Contact rate — fraction of attempts that reached a human. Leads who've answered before are far more likely to answer again. This is consistently the #1 predictive feature.
Engagement score — logarithm of average talk time. A lead who talked for 3 minutes in a prior call has genuine interest. A lead with 8 seconds of average talk time across 5 calls is hanging up on you.
Lead freshness — exponential decay based on age in days. Fresh leads convert at 2-5x the rate of 90-day-old leads. This decay curve is surprisingly consistent across industries.
Attempt saturation — diminishing returns modeled as 1 - exp(-attempts/5). After 6-8 attempts with no human contact, the probability of ever reaching the lead drops below 3%. There's a point where additional attempts cost more than they're worth.
Time-of-day preference — the split of prior attempts across morning, afternoon, and evening. Some leads only answer during their lunch break. Others only pick up after 5 PM.
Area code and source encoding — demographics and lead quality vary enormously by geography and vendor. But be careful: bucket any area code with fewer than 50 leads in your training set as "OTHER" to avoid overfitting on tiny sample sizes.
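The features above can be sketched as a single function. The 30-day freshness decay constant and the exact input field names are assumptions; adapt them to your extract:

```python
import math

def build_features(lead, area_code_counts, min_area_n=50):
    """Turn one lead's aggregated call stats into model features.

    lead: dict with keys matching the training extract.
    area_code_counts: {area_code: lead count} over the training set,
    used to bucket thin area codes as "OTHER".
    """
    attempts = lead["total_attempts"]
    feats = {}
    # Contact rate: fraction of attempts that reached a human
    feats["contact_rate"] = lead["human_answers"] / attempts if attempts else 0.0
    # Engagement: log of average talk time (log1p handles zero seconds)
    feats["engagement"] = math.log1p(lead["avg_talk_sec"])
    # Freshness: exponential decay; 30-day half-scale is an assumed constant
    feats["freshness"] = math.exp(-lead["age_days"] / 30.0)
    # Attempt saturation: diminishing returns, 1 - exp(-attempts/5)
    feats["saturation"] = 1.0 - math.exp(-attempts / 5.0)
    # Time-of-day preference: share of prior attempts per day part
    total_tod = sum(lead["tod_attempts"].values()) or 1
    for slot in ("morning", "afternoon", "evening"):
        feats[f"tod_{slot}"] = lead["tod_attempts"].get(slot, 0) / total_tod
    # Area code, bucketed to "OTHER" below the minimum sample size
    ac = lead["area_code"]
    feats["area_code"] = ac if area_code_counts.get(ac, 0) >= min_area_n else "OTHER"
    return feats
```

Categorical outputs like `area_code` still need one-hot or target encoding before they reach the model.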
The Model
Gradient Boosting Classifier from scikit-learn. 300 estimators, max depth 5, learning rate 0.05, subsample 0.8. Nothing exotic — gradient boosting handles mixed feature types well, produces calibrated probabilities, and works out of the box without much hyperparameter tuning.
Train on 80%, test on 20%, stratified by the target variable (your conversion rate is probably 3-8%, so you need stratification to keep the class balance consistent between train and test).
Expected performance: ROC AUC of 0.72-0.82 depending on data volume and feature variety. Cross-validate with 5 folds to make sure you're not overfitting.
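A minimal training sketch with those exact hyperparameters, run here on synthetic data as a stand-in for your real feature matrix (swap in the DataFrame from the extract; the class weighting mimics a ~5% conversion rate):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature matrix; ~5% positive class
# approximates a typical outbound conversion rate.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           weights=[0.95], random_state=42)

# Stratified 80/20 split keeps the rare positive class balanced
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = GradientBoostingClassifier(n_estimators=300, max_depth=5,
                                   learning_rate=0.05, subsample=0.8,
                                   random_state=42)
model.fit(X_tr, y_tr)

# predict_proba gives the 0-1 conversion score for each lead
scores = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

# For the 5-fold check mentioned above:
# from sklearn.model_selection import cross_val_score
# cross_val_score(model, X, y, cv=5, scoring="roc_auc")
```

On real data, `model.feature_importances_` is where you confirm that contact rate and engagement dominate, as the next section describes.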
The top features are always the same: contact rate, engagement score, lead freshness, attempt saturation, source/vendor quality, and area code demographics. None of this is surprising to experienced outbound operators. What's different is the model quantifies these patterns and produces a single score between 0 and 1 for every lead in your list. Instead of a gut feeling, you have a number. Instead of a veteran agent's intuition that "469 area code leads are better," you have a measured conversion differential that feeds directly into the dialer's hopper priority.
The Lift
Typical results from a scored versus unscored A/B test running identical campaigns, same agents, same hours, same scripts:
| Metric | Scored | Control | Improvement |
|---|---|---|---|
| Contact rate | 26.6% | 22.5% | +18% |
| Close rate | 7.0% | 5.2% | +34% |
| Sales per agent-hour | 2.9 | 1.8 | +61% |
The top decile of scored leads delivers 3.4x the conversion rate of random dialing. In practical terms: if you only have time to dial 30% of your list today, the model tells you which 30% to dial to capture 66% of the available conversions. That's a massive efficiency gain — your agents spend their talk time on leads that are genuinely more likely to close.
Plugging Scores Into VICIdial
Three approaches, from simplest to most control:
Option A: The rank field. VICIdial's vicidial_list table has a rank column that accepts integers 0-999. Convert the ML score (0.0-1.0) to a rank (0-999) and push it via the VICIdial non-agent API using function=update_lead. Then set the campaign's Lead Order to rank-based sorting in the admin panel. The hopper prioritizes higher ranks.
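A sketch of the score-to-rank push. The host and credentials are placeholders, and passing `rank` through `function=update_lead` assumes your VICIdial build exposes that field via the non-agent API (verify against your version's API docs):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical dialer host and API credentials -- replace with your own
API_URL = "http://your-dialer/vicidial/non_agent_api.php"

def score_to_rank(score):
    # Map a 0.0-1.0 probability onto VICIdial's 0-999 rank range
    return max(0, min(999, int(round(score * 999))))

def build_update_url(lead_id, score, user="apiuser", password="apipass"):
    # Assembles the non-agent API call that writes the rank to the lead
    params = {
        "source": "leadscore",
        "user": user,
        "pass": password,
        "function": "update_lead",
        "lead_id": lead_id,
        "rank": score_to_rank(score),
    }
    return f"{API_URL}?{urlencode(params)}"

def push_rank(lead_id, score):
    with urlopen(build_update_url(lead_id, score), timeout=10) as resp:
        return resp.read().decode()
```

With ranks written, the campaign's rank-based Lead Order setting does the rest.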
Option B: List priority buckets. Create four lists with different campaign priorities: HOT (score > 0.30, priority 9), HIGH (0.15-0.30, priority 7), MEDIUM (0.05-0.15, priority 5), LOW (below 0.05, priority 3). Move leads between lists based on scores via API. VICIdial pulls from the highest-priority list first.
Option C: Custom hopper loading. For maximum control, disable native hopper loading and run your own cron script that populates the vicidial_hopper via the API in score order. Most control, most complexity, most things that can break.
Option A is the right starting point for most operations. It takes 2 hours to implement and works with your existing campaign configuration.
Automation
Set up cron jobs to rescore leads daily (6 AM, before the dialing shift starts) and retrain the model weekly with fresh data. Compare each retrained model's AUC against the previous one — if AUC drops more than 0.05, something changed in your business (new lead source, pricing change, best closer quit) and you need to investigate.
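The AUC comparison can be a few lines inside the weekly retrain script; the 0.05 threshold follows the rule above:

```python
def check_drift(new_auc, prev_auc, max_drop=0.05):
    """Compare a retrained model's AUC to the previous version.

    Flags the run for investigation when AUC falls more than max_drop,
    which usually means the business changed under the model.
    """
    drop = prev_auc - new_auc
    return {"drop": round(drop, 4), "investigate": drop > max_drop}
```

The retrain job would persist each week's AUC, call `check_drift` against last week's value, and alert rather than auto-deploy when `investigate` is true.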
Also set a minimum score threshold below which leads don't enter the hopper. If the model says a lead has a 0.3% conversion probability and your cost per dial attempt is $0.15, dialing that lead costs $50 per conversion in dial costs alone. Some leads aren't worth the phone time.
Common Pitfalls
Training on biased data. If your historical data only includes leads dialed during business hours, the model learns that business hours are great — not because of the time, but because that's the only data it has. Make sure your training data has variety in call times, lead sources, and attempt counts.
Overfitting to small area codes. Area codes with fewer than 50 leads produce unreliable conversion estimates. The model might learn that area code 808 has a 50% conversion rate because you called 4 leads from Hawaii and 2 bought. That's noise. Always bucket small sample sizes as "OTHER."
Ignoring model drift. Your model was trained on March data. By June, your lead sources changed, your pricing changed, a competitor launched, and your best closer quit. The model doesn't know any of this. Retrain on a schedule — the weekly job from the Automation section — and compare AUC against the previous version.
Scoring leads with no history. Brand-new leads with zero call attempts have no behavioral features — no contact rate, no talk time, no voicemail history. The model can only use demographic features (area code, state, source). These scores will cluster around the overall conversion rate for their demographics. That's fine — it's still better than random ordering. As leads accumulate call history, rescore them; the scores sharpen with each attempt.
Not setting a minimum score threshold. If a lead has a 0.3% conversion probability and your cost per dial attempt is $0.15, dialing that lead costs $50 per conversion just in phone costs. Some leads genuinely aren't worth dialing. For most operations, the minimum threshold is between 0.02 and 0.05.
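The break-even arithmetic is a one-liner worth wiring into the rescoring job; the $0.15 per-dial cost is the article's example figure:

```python
def dial_cost_per_conversion(score, cost_per_dial=0.15):
    """Expected dial spend per conversion at a given score.

    E.g. a 0.3% score at $0.15/dial works out to $50 per conversion.
    """
    return cost_per_dial / score

def min_score_for_budget(max_cost_per_conversion, cost_per_dial=0.15):
    """Lowest score worth dialing given a dial-cost budget per sale."""
    return cost_per_dial / max_cost_per_conversion
```

A $5-per-conversion dial budget at $0.15 per attempt implies a 0.03 minimum score, squarely inside the 0.02-0.05 range quoted above.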
The Economics
For a typical B2B outbound campaign with $2,000 average deal value:
- Without scoring: 4.5% conversion, $12 cost per sale, 1.8 sales per agent-hour
- With scoring (top 50% of leads): 7.2% conversion, $7.50 cost per sale, 2.9 sales per agent-hour
The conversion lift is good. The agent efficiency gain is what pays for the whole thing. More sales per hour means higher commissions, lower turnover, and lower cost per acquisition. The model pays for itself in the first week.
The hidden benefit: agent morale. When agents spend less time dialing dead numbers and more time talking to real prospects, their per-hour commission goes up and their frustration goes down. The lead scoring model doesn't just improve your numbers — it improves the daily experience of working on your floor.
Getting Started: The Minimum Viable Score
If the full pipeline sounds like a lot, here's the minimum viable approach:
- Export 3-6 months of completed leads with outcomes from VICIdial (the SQL query takes 10 minutes to write)
- Run the feature engineering and gradient boosting training script (copy from the full guide, takes 5 minutes to run)
- Score your active leads and assign ranks via the VICIdial API
- Set your campaign's Lead Order to rank-based sorting in the admin panel
That's the whole thing. Four steps. You can do it in an afternoon and start seeing results the next dialing day.
The scores won't be perfect on day one. They don't need to be. A model that's right 70% of the time is still dramatically better than random ordering, which carries no signal at all. As you accumulate more data and keep the retraining schedule, the scores sharpen. By month three, you'll have a well-calibrated model that your team will refuse to dial without.
For the complete implementation with SQL extract queries, the full Python training and scoring scripts, VICIdial API integration code, and cron automation setup, see the full guide at ViciStack.