DEV Community

M M

I built a VAD that beats Silero, Pyannote, and WebRTC on noisy audio — here's how

I built NOVA-VAD, a lightweight, explainable Voice Activity Detector (VAD). On my benchmark of real-world noisy audio it scored higher than WebRTC VAD, Pyannote VAD, and Silero VAD.

GitHub: https://github.com/monishmal3375/nova-vad

Benchmark (100 held-out files, never seen during training)

Model          Accuracy   Lightweight   Explainable
WebRTC VAD     58.0%
Pyannote VAD   62.0%
Silero VAD     87.0%
NOVA-VAD       93.0%
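
For readers who want to reproduce this kind of number, here is a minimal sketch of how a per-file accuracy on a held-out set can be computed. It is not the benchmark script from the repo: the `accuracy` helper, the label strings, and the 50/50 class split are assumptions made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of files whose predicted label matches the ground truth."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical held-out set of 100 files (the real class balance is not stated).
labels = ["speech"] * 50 + ["noise"] * 50

# Simulate a detector that gets 7 of the 100 files wrong.
predictions = list(labels)
for i in range(7):
    predictions[i] = "noise"

print(f"{accuracy(predictions, labels):.1%}")
```

With 100 files, each misclassified file moves the score by exactly one percentage point, so 93.0% corresponds to 93 correct files.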

What makes it different

  • No PyTorch or GPU required — pure scikit-learn
  • Explains every decision with confidence scores and feature importance
  • Built-in denoiser pipeline
  • Retrainable on your own data

No existing VAD that I know of does all of these at once.
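
To make the "pure scikit-learn" and "explainable" points concrete, here is a minimal sketch of a classifier that returns a confidence score and a feature-importance ranking. It is not the NOVA-VAD implementation: the choice of `RandomForestClassifier`, the feature names, the `explain` helper, and the synthetic training data are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, borrowed from the example output in this post.
FEATURES = ["mfcc_delta1_std", "mfcc_delta2_std", "silence_ratio"]

# Synthetic stand-in data: "speech" rows get higher values than "noise" rows.
rng = np.random.default_rng(0)
speech = rng.normal(loc=[2.0, 1.5, 0.5], scale=0.3, size=(200, 3))
noise = rng.normal(loc=[0.5, 0.4, 0.2], scale=0.3, size=(200, 3))
X = np.vstack([speech, noise])
y = np.array([1] * 200 + [0] * 200)  # 1 = speech, 0 = non-speech

# Assumed model choice; any scikit-learn classifier with predict_proba would do.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)


def explain(features):
    """Return (label, confidence, ranked importances) for one feature vector."""
    proba = clf.predict_proba(np.asarray(features, dtype=float).reshape(1, -1))[0]
    idx = int(np.argmax(proba))
    label = "SPEECH" if clf.classes_[idx] == 1 else "NON-SPEECH"
    ranked = sorted(
        zip(FEATURES, clf.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return label, float(proba[idx]), ranked


label, confidence, ranked = explain([2.1, 1.4, 0.55])
print(f"Prediction: {label} ({confidence:.2%} confidence)")
for name, importance in ranked:
    print(f"  {name}: {importance:.2%}")
```

Note that `feature_importances_` is a model-level measure: it says which features the trained forest relies on overall, not why one particular file was classified the way it was.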

Example output
File: speech_001.wav

Prediction: SPEECH (93.47% confidence)

MFCC Delta 1 std (10.63%) → HIGH spectral change rate — dynamic audio like speech
MFCC Delta 2 std ( 6.14%) → HIGH acceleration — rapidly changing audio, speech-like
Silence ratio ( 5.92%) → 56% silence — mix of speech and pauses
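
The features named in this output can be approximated with a few lines of NumPy. The sketch below is an assumption about how such features might be computed, not the code from the repo: the frame length, the -40 dB threshold, and both helper names are hypothetical. Real MFCC deltas are usually computed with a regression window (for example `librosa.feature.delta`), so the plain frame-to-frame difference here is a simplification.

```python
import numpy as np


def silence_ratio(signal, frame_len=400, threshold_db=-40.0):
    """Fraction of frames whose RMS level is below threshold_db,
    measured relative to the loudest frame in the signal."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return 1.0  # the whole signal is digital silence
    level_db = 20 * np.log10(np.maximum(rms, 1e-12) / peak)
    return float(np.mean(level_db < threshold_db))


def delta_std(feature_frames):
    """Std of the frame-to-frame difference, averaged over coefficients.
    feature_frames has shape (n_frames, n_coefficients)."""
    deltas = np.diff(np.asarray(feature_frames, dtype=float), axis=0)
    return float(deltas.std(axis=0).mean())


# Toy signal at 16 kHz: 1 s of a 220 Hz tone, then 1 s of silence.
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([0.5 * np.sin(2 * np.pi * 220 * t), np.zeros(sr)])

print(silence_ratio(signal))  # half of the frames are silent -> 0.5
```

A high `delta_std` means the features change a lot from frame to frame, which is the "HIGH spectral change rate" the output describes; a constant feature matrix gives 0.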
