I built a VAD that beats Silero, Pyannote, and WebRTC on noisy audio — here's how

#python #machinelearning #audio #opensource

I built NOVA-VAD, a lightweight, explainable Voice Activity Detector (VAD). On my benchmark of real-world noisy audio, it scored higher than WebRTC VAD, Pyannote VAD, and Silero VAD.

GitHub: https://github.com/monishmal3375/nova-vad

Benchmark (100 held-out files, never seen during training)

Model	Accuracy	Lightweight	Explainable
WebRTC VAD	58.0%	✅	❌
Pyannote VAD	62.0%	❌	❌
Silero VAD	87.0%	❌	❌
NOVA-VAD	93.0%	✅	✅

What makes it different

No PyTorch or GPU required — pure scikit-learn
Explains every decision with confidence scores and feature importance
Built-in denoiser pipeline
Retrainable on your own data

None of the other VADs in the benchmark combines all three: high accuracy on noisy audio, a lightweight footprint, and explainable decisions.
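To make "confidence scores and feature importance" concrete, here is a minimal sketch of how a scikit-learn classifier can report both. It is not NOVA-VAD's actual code: the post doesn't say which estimator is used, so a `RandomForestClassifier` is assumed, and the feature names, the toy data, and the `explain` helper are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, loosely modelled on the example output in this post.
FEATURE_NAMES = ["mfcc_delta1_std", "mfcc_delta2_std", "silence_ratio"]


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for real audio features (an assumption, not real data)."""
    rng = np.random.default_rng(seed)
    # "Speech" rows: more spectral change, less silence.
    speech = rng.normal(loc=[1.0, 0.8, 0.4], scale=0.2, size=(n, 3))
    # "Non-speech" rows: less spectral change, more silence.
    noise = rng.normal(loc=[0.3, 0.2, 0.8], scale=0.2, size=(n, 3))
    X = np.vstack([speech, noise])
    y = np.array([1] * n + [0] * n)  # 1 = speech, 0 = non-speech
    return X, y


def explain(clf, x, names=FEATURE_NAMES):
    """Return (label, confidence, features ranked by global importance)."""
    proba = clf.predict_proba(x.reshape(1, -1))[0]
    idx = int(np.argmax(proba))
    label = "SPEECH" if clf.classes_[idx] == 1 else "NON-SPEECH"
    ranked = sorted(
        zip(names, clf.feature_importances_), key=lambda p: p[1], reverse=True
    )
    return label, float(proba[idx]), ranked


X, y = make_toy_data()
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

label, confidence, ranked = explain(clf, np.array([1.0, 0.8, 0.4]))
print(f"Prediction: {label} ({confidence:.2%} confidence)")
for name, weight in ranked:
    print(f"{name} ({weight:.2%})")
```

Note that `feature_importances_` is a global property of the fitted model, not a per-file explanation, so a report like the one in this post would pair it with the file's own feature values. Because the whole thing is plain scikit-learn, retraining on your own data amounts to calling `fit` again on new feature rows.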

Example output
File: speech_001.wav

Prediction: SPEECH (93.47% confidence)

MFCC Delta 1 std (10.63%) → HIGH spectral change rate — dynamic audio like speech
MFCC Delta 2 std ( 6.14%) → HIGH acceleration — rapidly changing audio, speech-like
Silence ratio ( 5.92%) → 56% silence — mix of speech and pauses
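The post doesn't define how the silence ratio is computed. One plausible definition, sketched below, is the fraction of fixed-length frames whose RMS energy falls below a threshold. The `silence_ratio` function, the 400-sample frame length, and the 0.01 threshold are illustrative assumptions, not values taken from NOVA-VAD.

```python
import numpy as np


def silence_ratio(samples, frame_len=400, threshold=0.01):
    """Fraction of non-overlapping frames whose RMS energy is below `threshold`.

    Assumes `samples` is mono audio scaled to [-1.0, 1.0]. Trailing samples
    that do not fill a whole frame are ignored.
    """
    samples = np.asarray(samples, dtype=np.float64)
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return 0.0
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(rms < threshold))


# One second of a 220 Hz tone followed by one second of digital silence at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
signal = np.concatenate([tone, np.zeros(sr)])

ratio = silence_ratio(signal)
print(f"{ratio:.0%} silence")  # half the frames are silent
```

On real noisy recordings a fixed threshold is fragile, so a value relative to the file's own noise floor would likely work better.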
