I built NOVA-VAD, a lightweight, explainable voice activity detector (VAD). On my benchmark of real-world noisy audio, it scored higher than WebRTC VAD, Pyannote VAD, and Silero VAD.
GitHub: https://github.com/monishmal3375/nova-vad
Benchmark (100 held-out files, never seen during training)
| Model | Accuracy | Lightweight | Explainable |
|---|---|---|---|
| WebRTC VAD | 58.0% | ✅ | ❌ |
| Pyannote VAD | 62.0% | ❌ | ❌ |
| Silero VAD | 87.0% | ❌ | ❌ |
| NOVA-VAD | 93.0% | ✅ | ✅ |
- No PyTorch or GPU required — pure scikit-learn
- Explains every decision with confidence scores and feature importance
- Built-in denoiser pipeline
- Retrainable on your own data
None of the other VADs in this benchmark combines all three: high accuracy, a lightweight footprint, and explainable decisions.
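To make the "confidence scores and feature importance" idea concrete, here is a minimal sketch of how a scikit-learn classifier can report both. It is not NOVA-VAD's actual code: the random forest, the synthetic training data, the feature names, and the `explain` helper are all assumptions made for illustration. Note that `feature_importances_` is a model-wide ranking, so this sketch does not show per-file attributions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, borrowed from the example output in this post.
FEATURES = ["mfcc_delta1_std", "mfcc_delta2_std", "silence_ratio"]


def make_synthetic_data(n=200, seed=0):
    """Toy stand-in for real audio features (assumption, not real data).

    Speech rows get larger delta statistics and less silence than noise rows.
    """
    rng = np.random.default_rng(seed)
    speech = np.column_stack([
        rng.normal(3.0, 0.5, n),
        rng.normal(2.0, 0.5, n),
        rng.uniform(0.2, 0.6, n),
    ])
    noise = np.column_stack([
        rng.normal(1.0, 0.5, n),
        rng.normal(0.7, 0.5, n),
        rng.uniform(0.6, 1.0, n),
    ])
    X = np.vstack([speech, noise])
    y = np.array([1] * n + [0] * n)  # 1 = speech, 0 = non-speech
    return X, y


def explain(model, x):
    """Return (label, confidence, features ranked by global importance)."""
    proba = model.predict_proba(x.reshape(1, -1))[0]
    idx = int(np.argmax(proba))
    label = "SPEECH" if model.classes_[idx] == 1 else "NON-SPEECH"
    ranked = sorted(
        zip(FEATURES, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return label, float(proba[idx]), ranked


X, y = make_synthetic_data()
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

label, confidence, ranked = explain(model, np.array([3.1, 2.2, 0.4]))
print(f"Prediction: {label} ({confidence:.2%} confidence)")
for name, weight in ranked:
    print(f"  {name} ({weight:.2%})")
```

Everything here runs on CPU with only NumPy and scikit-learn installed, which is the appeal of a tree-based model for this kind of task.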
Example output
File: speech_001.wav
Prediction: SPEECH (93.47% confidence)
MFCC Delta 1 std (10.63%) → HIGH spectral change rate — dynamic audio like speech
MFCC Delta 2 std ( 6.14%) → HIGH acceleration — rapidly changing audio, speech-like
Silence ratio ( 5.92%) → 56% silence — mix of speech and pauses
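For a sense of what a feature like the silence ratio measures, the sketch below computes the fraction of frames whose RMS energy falls below a threshold. The frame length, the threshold, and the `silence_ratio` function itself are illustrative assumptions. The post does not say how NOVA-VAD defines this feature.

```python
import math


def silence_ratio(samples, frame_len=400, threshold=0.02):
    """Fraction of non-overlapping frames whose RMS energy is below `threshold`.

    frame_len and threshold are illustrative assumptions, not NOVA-VAD's values.
    """
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            silent += 1
    return silent / len(frames)


# Toy signal at 16 kHz: 0.5 s of a 220 Hz tone, then 0.5 s of digital silence.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 2)]
signal = tone + [0.0] * (sr // 2)

ratio = silence_ratio(signal)
print(f"Silence ratio: {ratio:.0%}")  # prints "Silence ratio: 50%"
```

A value near the middle of the range, like the 56% in the example output, suggests audio that alternates between activity and pauses rather than continuous sound or continuous silence.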