
TildAlice

Posted on • Originally published at tildalice.io

Whisper Tiny vs faster-whisper: 3x Speed, 12% WER Gap

faster-whisper cuts inference time by 3x but WER jumps 12% on edge hardware — and nobody talks about the memory spikes

I've been running Whisper models on a Jetson Nano (4GB RAM, ARM Cortex-A57) for a voice-controlled robot project, and the vanilla openai-whisper Tiny model was hitting 8-second latency for 10-second audio clips. Unacceptable for real-time interaction. The internet promised that faster-whisper (the CTranslate2-optimized fork) would fix everything. It did cut latency to 2.6 seconds — but Word Error Rate spiked from 8.3% to 20.1% on my test set of noisy workshop commands.
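A head-to-head latency run like the one above can be sketched with a small timing helper. The `time_transcription` function below is my own illustration (not the author's harness), and the model names, paths, and `compute_type` in the commented usage are assumptions based on the standard `openai-whisper` and `faster-whisper` APIs:

```python
import time

def time_transcription(transcribe_fn, audio_path):
    """Run one transcription call and return (text, latency in seconds)."""
    start = time.perf_counter()
    text = transcribe_fn(audio_path)
    return text, time.perf_counter() - start

# Assumed usage against both backends (illustrative, not the original setup):
#
#   import whisper                            # vanilla openai-whisper
#   vanilla = whisper.load_model("tiny")
#   text, t = time_transcription(
#       lambda p: vanilla.transcribe(p)["text"], "clip.wav")
#
#   from faster_whisper import WhisperModel   # CTranslate2 fork
#   fast = WhisperModel("tiny", device="cpu", compute_type="int8")
#   def fast_transcribe(p):
#       segments, _ = fast.transcribe(p, beam_size=5)
#       return " ".join(s.text.strip() for s in segments)
#   text, t = time_transcription(fast_transcribe, "clip.wav")
```

Note that `faster-whisper` returns a lazy generator of segments, so the join inside `fast_transcribe` is what actually forces decoding — timing the bare `transcribe()` call alone would understate its latency.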

This isn't a "one size fits all" situation. The speed-accuracy tradeoff is real, and it gets worse when you factor in quantization, beam search width, and the fact that edge devices don't have the thermal headroom to sustain peak performance. Here's what actually happens when you benchmark both on constrained hardware, with numbers from 500 audio samples (Mozilla Common Voice English, 5-15 second clips, transcoded to 16kHz mono WAV).
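For reference, the WER figures above are the standard metric: word-level edit distance between reference and hypothesis, divided by the reference length. Libraries like `jiwer` compute this, but a minimal from-scratch sketch (the `wer` helper is my own illustration, not the author's evaluation code) makes the definition concrete:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float(len(hyp))
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        prev_diag = d[0]
        d[0] = i
        for j, hw in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(
                d[j] + 1,                # deletion (drop a reference word)
                d[j - 1] + 1,            # insertion (extra hypothesis word)
                prev_diag + (rw != hw),  # substitution, or free match
            )
            prev_diag = cur
    return d[-1] / len(ref)

# One substituted word out of four -> WER of 0.25.
print(wer("turn the arm left", "turn the arm west"))  # → 0.25
```

Because WER is normalized by reference length, it can exceed 1.0 when the model hallucinates extra words — something worth remembering when a noisy clip produces a 20%+ score.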


Continue reading the full article on TildAlice
