DEV Community

Jason Shouldice

Posted on • Originally published at vicistack.com

Traditional AMD vs AI AMD: When the Upgrade Actually Pays for Itself

VICIdial's built-in AMD is a word counter. It listens to the first few seconds of a call, counts sound events, measures silence gaps, and guesses. AI-based AMD feeds the audio into a neural network trained on millions of recordings. One costs nothing. The other costs $1,500-9,000/month. The question isn't which is "better" -- it's whether the better one pays for itself in your specific operation.

After deploying both approaches across 100+ VICIdial call centers, here's where the lines cross.

How Stock AMD Decides

Asterisk's AMD() application monitors audio energy levels. Sound above the silence threshold (default 256) counts as a "word" -- a continuous voice segment. Silence between sounds marks word boundaries. If the word count exceeds maximumNumberOfWords (typically 3-4), the call is classified as machine. If a single word exceeds maximumWordLength, it is also classified as machine. If a short utterance is followed by silence (a person waiting for a response), it's classified as human.

With default parameters: 75-82% accuracy, 15-25% false positive rate. With per-carrier tuning and adaptive thresholds: 85-92% accuracy, 4-8% false positives. Detection happens in 500-2000ms depending on audio clarity.
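
The word-counting decision above can be sketched in a few lines of Python -- a simplified model of the logic, not Asterisk's actual implementation; the segment durations and threshold names are illustrative stand-ins for the real parameters:

```python
# Simplified sketch of Asterisk-style AMD word counting (illustrative, not the
# real implementation). Input: (kind, duration_ms) segments from energy detection,
# where kind is "sound" or "silence".

MAX_WORDS = 3            # analogue of maximumNumberOfWords
MAX_WORD_LEN_MS = 5000   # analogue of maximumWordLength

def classify(segments):
    words = 0
    for kind, dur in segments:
        if kind == "sound":
            words += 1
            if dur > MAX_WORD_LEN_MS:
                return "MACHINE"  # one long continuous utterance = greeting
            if words > MAX_WORDS:
                return "MACHINE"  # too many words before a pause = greeting
    return "HUMAN"  # short utterance then silence: a person waiting for a reply

# "Hello?" followed by silence
print(classify([("sound", 400), ("silence", 1500)]))  # HUMAN
# One long recorded greeting
print(classify([("sound", 6000)]))                    # MACHINE
```

Note what's absent: nothing in this loop looks at *content*. That gap is the whole story of the next section.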

What Stock AMD Cannot Do

It has zero semantic understanding. It doesn't know what's being said -- only how many "words" there are and how long they last. A human saying "Hello, this is Michael from accounting" (6 words) is indistinguishable from a short voicemail greeting to a word counter. Cell phone voicemails that start conversationally ("Hey, it's Sarah, leave a message") sound identical to a human answering casually ("Hey, it's Sarah, what's up").

It's also sensitive to audio quality. Network jitter, codec compression artifacts, background noise, and carrier-side silence suppression all affect word boundary detection. Different carriers require different parameter tuning -- sometimes dramatically different.

And the analysis window is fixed. The system must decide within a few seconds. Modern voicemail greetings that are very short (1-2 words) become indistinguishable from human answers based on word count alone.

How AI-Based AMD Decides

AI systems capture the audio stream (typically via media forking, e.g. SIPREC, or an external media interface such as Asterisk's AudioSocket), extract acoustic features -- Mel-frequency cepstral coefficients (MFCCs), spectral characteristics, pitch contours, speech rate -- and run them through a trained neural network.

The model learns things a word counter can't detect:

  • Prosody. Humans answering ringing phones have rising intonation ("Hello?"). Voicemail greetings are flat or falling ("Hi, you've reached...").
  • Background characteristics. Live answers have ambient noise -- TV, traffic, other people. Voicemail greetings are recorded in quiet rooms at close microphone range.
  • Carrier-specific patterns. Verizon's voicemail system sounds different from AT&T's. The model recognizes both.
  • Predictive beep detection. The beep hasn't played yet, but the greeting pattern that precedes it is a strong signal.
  • Confidence updating. Unlike stock AMD's binary decision, AI systems continuously refine their confidence score as more audio arrives, typically stabilizing within 1-2 seconds.
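
The confidence-updating behavior in that last bullet can be illustrated with a toy streaming loop. The per-frame scores and the exponential-smoothing scheme here are hypothetical, standing in for whatever the vendor's model actually emits:

```python
# Illustrative sketch of streaming confidence refinement (hypothetical scores
# and thresholds, not a real model). Each 100ms frame yields a machine
# probability; an exponential moving average stabilizes the running estimate.

def refine(frame_scores, alpha=0.4, threshold=0.8, floor=0.2):
    conf = 0.5  # start undecided
    for i, score in enumerate(frame_scores, start=1):
        conf = alpha * score + (1 - alpha) * conf
        if conf >= threshold:
            return "MACHINE", i * 100  # decision latency in ms
        if conf <= floor:
            return "HUMAN", i * 100
    return "UNDECIDED", len(frame_scores) * 100

# Flat-toned recorded greeting: scores consistently near 0.9
print(refine([0.9, 0.9, 0.9, 0.9, 0.9]))  # ('MACHINE', 300)
```

The early frames move the estimate quickly; later frames mostly confirm it -- which is why these systems typically stabilize within 1-2 seconds rather than deciding instantly.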

Real-World Accuracy Numbers

| Metric | Traditional (default) | Traditional (tuned) | AI-Based |
| --- | --- | --- | --- |
| Overall accuracy | 75-82% | 85-92% | 92-96% |
| False positive rate | 15-25% | 4-8% | 2-4% |
| False negative rate | 10-15% | 8-12% | 4-8% |
| Detection latency | 800-2000ms | 600-1500ms | 1000-2500ms |

The Real Tradeoff: Latency

Here's what AI AMD vendors don't emphasize enough. While AMD runs, the called party has picked up and hears silence. Every extra millisecond is another "Hello? Hello?" into dead air.

| AMD Type | Latency | Agent Connect Delay |
| --- | --- | --- |
| Traditional (clear human) | 500-800ms | 1-1.5 seconds |
| Traditional (machine) | 800-2000ms | N/A (call dropped) |
| AI-based (clear human) | 1000-1500ms | 1.5-2.5 seconds |
| AI-based (ambiguous) | 2000-3000ms | 2.5-3.5 seconds |

That extra 500-1000ms with AI AMD is noticeable. Research shows each additional second of post-answer silence drops conversation rates by 5-8%. At 2.5 seconds total delay, some prospects just hang up -- defeating the purpose entirely.
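
Putting numbers on that drop -- a back-of-envelope calculation using the figures from the text (the 5-8% per-second decline, and the 4,000 answered calls/day from the 50-agent example later in the article):

```python
# Back-of-envelope: conversations lost to AI AMD's extra latency alone.
answered_per_day = 4000    # from the 50-agent scale example below
drop_per_second = 0.065    # midpoint of the 5-8% per-second figure
extra_latency_s = 1.0      # AI AMD vs tuned traditional AMD

lost = round(answered_per_day * drop_per_second * extra_latency_s)
print(lost)  # ~260 conversations/day lost to the added dead air
```

That loss has to be netted against the false-positive savings calculated below -- the latency penalty can eat a large share of the accuracy gain.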

Some providers mitigate this with "early classification": a provisional decision at around 800ms, refined as more audio arrives. That helps, but it introduces the risk of premature misclassification.

The Cost Math

| Provider Type | Per Analyzed Call | Monthly (10K calls/day) |
| --- | --- | --- |
| Traditional (Asterisk built-in) | $0.00 | $0 |
| AI AMD SaaS (budget) | $0.005-$0.01 | $1,500-$3,000 |
| AI AMD SaaS (premium) | $0.01-$0.03 | $3,000-$9,000 |
| Self-hosted AI AMD | $0.001-$0.003 | $300-$900 + server |

Self-hosted (your own GPU server + model) is cheapest at scale if you have the engineering capability. The GPU server itself runs $500-2,000/month.

The Scale Impact

For a 50-agent center running 200 dials/hour total:

  • Traditional AMD (tuned, 8% FP rate): 10,000 dials → 4,000 answered → 320 false positives = 320 lost live connections/day
  • AI AMD (3% FP rate): same volume → 120 false positives = 120 lost live connections/day

That's 200 additional live connections reaching agents daily. At $5 per connection, that's $1,000/day, or roughly $22,000/month (22 working days) in recovered value.

But if you've already tuned traditional AMD to 4-5% false positives, the incremental gain drops to 50-80 additional connections per day. The math still works at high per-connection values ($10+ in insurance, solar, real estate) but gets marginal at lower values.
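
The arithmetic behind both scenarios, for plugging in your own false-positive rates:

```python
# Recovered-connection math behind the scale example (same inputs as the text).
dials, answer_rate = 10_000, 0.40
value_per_connection = 5.0  # dollars

def recovered(fp_old, fp_new):
    answered = dials * answer_rate
    saved = round(answered * (fp_old - fp_new))  # live calls no longer dropped as FPs
    return saved, saved * value_per_connection

print(recovered(0.08, 0.03))   # tuned-but-ordinary traditional vs AI
print(recovered(0.045, 0.03))  # well-tuned traditional leaves far less on the table
```

The second call is the whole "tune first" argument in one line: shrinking the starting false-positive rate shrinks what AI AMD can recover.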

Reliability and Dependencies

Traditional AMD: no external dependencies. If Asterisk runs, AMD runs.

AI AMD: depends on network connectivity to the AMD service, service availability, API latency (varies with provider load), and model updates that can change behavior without warning. If the AI service goes down, you need a fallback -- which means maintaining traditional AMD anyway.
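
A minimal fallback pattern, sketched in Python with placeholder classifier functions -- `ai_classify`, `local_classify`, and the timeout value are assumptions for illustration, not any vendor's real API:

```python
# Fallback sketch: ask the AI AMD service first, but fall back to the local
# (traditional) result if the service is slow or down.
import concurrent.futures

def classify_with_fallback(audio, ai_classify, local_classify, timeout_s=0.8):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(ai_classify, audio)
        try:
            return future.result(timeout=timeout_s), "ai"
        except Exception:  # timeout, connection error, service failure, etc.
            return local_classify(audio), "traditional"
```

Note the hard timeout: an AMD decision that arrives late is worse than a slightly less accurate one, given the latency economics above. (In production you'd reuse a shared pool rather than create one per call, since the executor's exit waits for the in-flight AI request to finish.)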

Decision Framework

Stick with traditional when: under 50 agents, already tuned to 4-5% FP rate, latency is critical (B2C cold calls), zero external dependencies required, or budget is tight.

Switch to AI when: 100+ agents (accuracy gains = hundreds more daily connections), high per-connection value ($10+), cell phone-heavy lists (cell voicemails are harder for stock AMD), or you've maxed out traditional tuning at 6-8% FP.

Hybrid approach: use traditional AMD as primary for clear-cut cases (70-80% of calls). Route borderline cases (TOOLONG, borderline word counts) to AI AMD for a second opinion. Traditional speed on the easy ones, AI accuracy on the hard ones.

; Hybrid dialplan concept. The machine/human labels are placeholders for your
; hangup and agent-routing logic; ai_amd_check.agi is your AI-integration AGI.
exten => _1NXXNXXXXXX,1,AMD()
exten => _1NXXNXXXXXX,n,GotoIf($["${AMDCAUSE:0:7}" = "TOOLONG"]?ai_check)  ; AMDCAUSE is "TOOLONG-<ms>", so match the prefix
exten => _1NXXNXXXXXX,n,GotoIf($["${AMDSTATUS}" = "MACHINE"]?machine:human)
exten => _1NXXNXXXXXX,n(ai_check),AGI(ai_amd_check.agi)
exten => _1NXXNXXXXXX,n,GotoIf($["${AI_RESULT}" = "MACHINE"]?machine:human)
exten => _1NXXNXXXXXX,n(machine),Hangup()
exten => _1NXXNXXXXXX,n(human),NoOp(Route to agent)

Where This Is Heading

AI AMD costs are dropping 50% year-over-year. Edge-deployed models eliminate network round trips. Asterisk 18's AudioSocket enables clean external audio streaming. Traditional AMD stays relevant for small centers, air-gapped environments, and as a fallback.

The smart strategy: master traditional AMD tuning first (free, immediate impact), then layer AI AMD when scale and per-connection value justify the spend.

ViciStack deploys both traditional and AI AMD across 100+ call centers. We know when each approach delivers ROI and when it doesn't. All included at flat-rate pricing, no per-minute charges, no AI AMD surcharges.

Originally published at https://vicistack.com/blog/vicidial-amd-vs-ai-amd/
