DEV Community

Dialphone Limited
I Tested Voicemail Transcription Accuracy Across 6 UK VoIP Providers — The Spread Was Wider Than Expected

Following my mobile notification test a couple of weeks back, I got a stack of messages asking whether voicemail transcription quality varies as much between UK VoIP providers. Short answer: yes, and the gap is bigger than you would think.

The setup

I recorded 50 voicemail messages in varied conditions, then played them back through a physical speaker into each provider's inbound line. Each provider's transcription engine then processed the same audio and returned text.

Audio conditions covered:

| Condition | Count | Description |
| --- | --- | --- |
| Quiet office (ideal) | 10 | Mic 30 cm from speaker, no background noise |
| Open-plan office | 10 | Faint keyboard typing, occasional cough |
| Mobile outdoors | 10 | Traffic noise, wind, 3-bar signal |
| Mobile indoors, weak signal | 10 | 1-bar signal, audio breaks |
| Regional accent mix | 10 | Glasgow, Geordie, West Country, Welsh, RP |

Scoring: word error rate (WER), the proportion of words incorrectly transcribed. Lower is better. Anything above 15% is unreadable in practice.
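If you want to score transcripts yourself, here is a minimal word-level WER sketch (my own illustration, not any provider's code): WER is the word-level Levenshtein edit distance (substitutions + insertions + deletions) divided by the number of reference words.

```python
# Minimal word-level WER: edit distance over words / reference length.
# Illustrative sketch only -- real evaluations also normalise punctuation,
# numbers and casing before scoring.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("thanks for calling about the voip trial",
          "thank you for calling about daves poodle"))  # ~0.71
```

Note that WER can exceed 100% when the engine hallucinates extra words, which is exactly what you see in the worst weak-signal transcripts.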

The results

From my experience running this test on our own engine and 5 competitor offerings last month, here is where things landed.

Quiet office (ideal conditions)

| Provider | WER |
| --- | --- |
| Provider A | 3.2% |
| Provider B | 4.1% |
| Provider C | 4.7% |
| Provider D | 6.9% |
| Provider E | 11.3% |
| Provider F | 14.6% |

Everything under 10% is usable. The last two would already be frustrating. Keep in mind this is the easy condition.

Open-plan office

Results shifted by 2-4 percentage points across the board. Providers A and B held under 7% WER. Provider F hit 18%.

Mobile outdoors

This is where things fell apart, and some of the results genuinely surprised me.

| Provider | WER (mobile outdoors) |
| --- | --- |
| Provider A | 8.4% |
| Provider B | 11.7% |
| Provider C | 19.2% |
| Provider D | 23.6% |
| Provider E | 31.0% |
| Provider F | 42.4% |

In the 1-bar weak-signal condition, Provider F's transcripts were frequently pure nonsense. 'Thank you for calling about Dave's poodle' was the literal output when the real message was 'thanks for calling about the VoIP trial'.

Regional accent test

Northern, Scottish and Welsh accents were where the engines really diverged. Of our 10 regional-accent recordings:

| Provider | WER (regional accents) |
| --- | --- |
| Provider A | 7.8% |
| Provider B | 9.2% |
| Provider C | 13.1% |
| Provider D | 17.4% |
| Provider E | 24.8% |
| Provider F | 29.3% |

The Glaswegian sample had Provider F transcribing 'aye, will you phone me back the morrow' as 'I Wilfred one may back them morrow'. This is not usable.

What is driving the differences

From what I can tell, providers fall into 3 categories of underlying transcription tech.

  1. Home-grown speech-to-text trained on US English (Provider F). These perform badly on UK accents, full stop.
  2. Generic cloud APIs like Google Speech or Amazon Transcribe (Providers C, D, E). Decent on neutral accents, mediocre on regional.
  3. UK-focused models fine-tuned on UK voicemail data (Providers A, B). Best results, especially on accents and noisy conditions.

Our engine at DialPhone is in that third category, trained on anonymised UK voicemail samples collected with consent over two years. We placed second in this test. Provider A (whom I am not naming) placed first, and I respect them for it.

Why this matters more than people realise

Businesses do not just read voicemail transcripts. Most UK VoIP providers now also run the transcript through classification to detect urgency, sentiment, or booking requests. A 40% WER destroys that downstream logic: if the transcription says 'Dave's poodle', there is no way the urgency classifier recovers.
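To make the downstream point concrete, here is a deliberately toy keyword-based urgency detector (purely illustrative; real providers use trained classifiers, and the keyword list is my own invention). The point is that classification operates on the transcript text, so once the words are gone, no downstream cleverness gets them back.

```python
# Toy urgency detector -- illustrative only, not any provider's pipeline.
URGENT_TERMS = {"urgent", "asap", "emergency", "immediately", "today"}

def looks_urgent(transcript: str) -> bool:
    # Crude tokenisation: lowercase, strip commas, split on whitespace.
    words = set(transcript.lower().replace(",", " ").split())
    return bool(words & URGENT_TERMS)

# Accurate transcript: the urgency signal survives.
print(looks_urgent("please call me back today, it's urgent"))     # True
# Garbled high-WER transcript of the same message: signal is gone.
print(looks_urgent("please poodle my bath the day it's searchin"))  # False
```

With a 3% WER the odd dropped word rarely kills a classification; at 40% WER the words the classifier depends on are simply not there anymore.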

If you are evaluating VoIP providers and voicemail matters to your workflow, run 5 test recordings through the trial, use varied conditions, and compute your own WER. You will likely be surprised, in both directions.

The UK VoIP market in 2026 has real quality differences on things like this. They are just rarely measured.
