wellallyTech
SleepSentry: Building a Real-time Snore Analyzer with FFT and Whisper-tiny 😴🔊

Are you keeping the whole house awake with a "chainsaw" imitation every night? Or worse, are you worried about the silent gaps in breathing that can signal sleep apnea? 🛌

Identifying sleep patterns usually requires expensive medical equipment or bulky wearables. But what if we could turn a simple smartphone into a diagnostic tool? In this tutorial, we're building SleepSentry: an audio analysis pipeline that combines traditional Digital Signal Processing (DSP) with modern AI speech recognition.

By combining audio analysis, the Fast Fourier Transform (FFT), and a fine-tuned Whisper model, we can filter out the hum of the AC and focus on the specific frequencies of human respiratory distress.
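
Before diving in, here is a minimal NumPy sketch (not part of SleepSentry itself) of what "moving from the time domain to the frequency domain" means in practice: a pure 100 Hz sine sampled at 16 kHz shows up as a single dominant peak in its FFT magnitude spectrum.

```python
import numpy as np

# Synthesize one second of a 100 Hz sine at a 16 kHz sampling rate
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 100 * t)

# rfft gives the spectrum of a real signal; rfftfreq maps bins to Hz
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)

# With a 1-second window, bins are 1 Hz apart, so the peak lands at 100 Hz
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)  # 100.0
```

The same principle lets us separate a ~100 Hz snore rumble from a ~1 kHz fan whine even when they overlap in time.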


The Architecture: From Raw Waves to Respiratory Insights

Processing audio in real-time requires a balance between accuracy and performance. We can't just throw raw 8-hour WAV files at a heavy model. We need a filter-first approach.

```mermaid
graph TD
    A[Mobile Microphone] -->|Web Audio API| B(Raw Audio Stream)
    B --> C{FFT Frequency Filter}
    C -->|High-pass/Low-pass| D[Librosa Feature Extraction]
    D -->|Segmented Audio| E[Fine-tuned Whisper-tiny]
    E --> F{Classification}
    F -->|Normal| G[Log to Dashboard]
    F -->|Apnea/Heavy Snore| H[Alert & Metadata Entry]
    H --> I[Detailed Analysis]
```

Prerequisites 🛠️

To follow along, you'll need a basic understanding of Python and JavaScript. Our tech stack includes:

  • Web Audio API: For real-time browser-based capture.
  • Librosa: The Swiss Army knife for audio processing in Python.
  • FFT (Fast Fourier Transform): To move from the time domain to the frequency domain.
  • Whisper-tiny: OpenAI's lightweight model, optimized for edge inference.

Step 1: Capturing the Sound of Silence (and Snores)

First, we use the Web Audio API to grab audio from the browser. Since we don't want to upload hours of silence, we gate recording with a simple Voice Activity Detection (VAD) check, implemented here as a low-frequency energy threshold.

```javascript
// Initializing the audio context (note: browsers require a user gesture
// before an AudioContext will start producing audio)
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const analyser = audioContext.createAnalyser();

navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
    const source = audioContext.createMediaStreamSource(stream);
    source.connect(analyser);

    // Setting up FFT parameters
    analyser.fftSize = 2048;
    const bufferLength = analyser.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);

    function checkSnore() {
        analyser.getByteFrequencyData(dataArray);
        // Each bin spans sampleRate / fftSize Hz (~23 Hz at 48 kHz), so the
        // first 10 bins cover roughly 0-230 Hz, squarely in the snoring range
        const lowFreqEnergy = dataArray.slice(0, 10).reduce((a, b) => a + b, 0) / 10;
        if (lowFreqEnergy > 150) { // values are 0-255; threshold tuned empirically
            console.log("Significant low-frequency event detected! 🚨");
            // Trigger recording/inference
        }
        requestAnimationFrame(checkSnore);
    }
    checkSnore();
}).catch(err => console.error("Microphone access denied:", err));
```
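
The same gate is useful server-side, to skip clips before running any heavy model. Here is a hedged NumPy sketch: the browser's 150-of-255 byte threshold becomes a relative energy ratio, and the 0.6 cutoff is an illustrative assumption, not a calibrated value.

```python
import numpy as np

def is_low_freq_event(frame, sr=16000, cutoff_hz=500, ratio_threshold=0.6):
    """Return True if most of the frame's spectral energy sits below cutoff_hz.

    ratio_threshold is an illustrative value, not a clinically tuned one.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
    low = spectrum[freqs < cutoff_hz].sum()
    total = spectrum.sum() + 1e-12  # avoid division by zero on pure silence
    return bool(low / total > ratio_threshold)

# A 120 Hz rumble (snore-like) trips the gate; a 2 kHz tone does not
t = np.arange(16000) / 16000
print(is_low_freq_event(np.sin(2 * np.pi * 120 * t)))   # True
print(is_low_freq_event(np.sin(2 * np.pi * 2000 * t)))  # False
```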

Step 2: Cleaning the Noise with FFT and Librosa

Once we have an audio clip, we need to clean it. Nighttime environments are full of steady broadband noise: fans, street traffic, or white noise machines. We use the FFT to estimate these static frequencies and subtract them.

```python
import librosa
import numpy as np

def preprocess_audio(file_path):
    # Load audio (downsampled to 16 kHz, the rate Whisper expects)
    y, sr = librosa.load(file_path, sr=16000)

    # Perform the Short-Time Fourier Transform (STFT)
    stft = librosa.stft(y)

    # Simple spectral subtraction: assume the first 10 frames are
    # noise-only and use their average magnitude as the noise profile
    magnitude, phase = librosa.magphase(stft)
    noise_est = np.mean(magnitude[:, :10], axis=1, keepdims=True)
    magnitude_clean = np.maximum(magnitude - 1.5 * noise_est, 0)

    # Reconstruct the time-domain audio from the cleaned spectrogram
    y_clean = librosa.istft(magnitude_clean * phase)
    return y_clean

# This strips the "static" out so Whisper can focus on breath patterns
```
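
Whisper-family models process fixed-length windows (up to 30 seconds), so before inference the cleaned audio needs to be segmented. A minimal sketch, where the 30 s window and 5 s overlap are assumptions chosen to avoid cutting a snore or gasp exactly at a boundary:

```python
import numpy as np

def chunk_audio(y, sr=16000, window_s=30, hop_s=25):
    """Split cleaned audio into fixed 30 s windows with a 5 s overlap."""
    win, hop = window_s * sr, hop_s * sr
    chunks = []
    for start in range(0, max(len(y) - win, 0) + 1, hop):
        chunk = y[start:start + win]
        if len(chunk) < win:  # pad a short clip or tail with silence
            chunk = np.pad(chunk, (0, win - len(chunk)))
        chunks.append(chunk)
    return chunks

two_minutes = np.zeros(120 * 16000, dtype=np.float32)
print(len(chunk_audio(two_minutes)))  # 4 windows, starting at 0s, 25s, 50s, 75s
```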

Step 3: Classifying with Fine-tuned Whisper-tiny

Why Whisper-tiny? Because we want this to run locally on a phone or a Raspberry Pi. While Whisper is built for speech-to-text, we can fine-tune it on "Environmental Sound Classification" or use its encoder to extract features for a simple MLP classifier.

Either way, we end up treating "snore" and "gasp" as labels the model recognizes rather than words to transcribe.
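
The encoder-as-feature-extractor route mentioned above can be sketched without loading Whisper at all. In the real pipeline, the 384-dimensional vectors (whisper-tiny's hidden size) would come from mean-pooling the encoder output over time; here random stand-ins and an untrained logistic head just make the shapes and wiring concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 384  # whisper-tiny's hidden size

def classify_embedding(embedding, weights, bias):
    """Tiny logistic head on top of pooled encoder features."""
    score = 1 / (1 + np.exp(-(embedding @ weights + bias)))
    return "heavy_snore" if score > 0.5 else "normal"

# These weights would be learned on labeled snore clips; random here
weights = rng.normal(size=EMBED_DIM)
bias = 0.0
fake_embedding = rng.normal(size=EMBED_DIM)  # stand-in for encoder output
print(classify_embedding(fake_embedding, weights, bias))
```

A head this small trains in seconds on a laptop, which is the whole appeal of freezing the encoder.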

```python
from transformers import pipeline

# Load our fine-tuned whisper-tiny model (the repo name is a placeholder;
# training clips could be drawn from datasets like ESC-50 or AudioSet)
classifier = pipeline("audio-classification",
                      model="your-username/whisper-tiny-sleep-apnea")

def analyze_breath(audio_array):
    # Expects audio already at 16 kHz, which preprocess_audio guarantees
    prediction = classifier(audio_array)
    # Example output: [{'label': 'heavy_snore', 'score': 0.88},
    #                  {'label': 'apnea_gasp', 'score': 0.12}]
    return prediction

print(analyze_breath(preprocess_audio("night_clip_001.wav")))
```

The "Official" Way: Scaling to Production 🚀

Building a prototype is easy, but making a HIPAA-compliant, medically accurate sleep monitoring system is a different beast entirely. You need to handle data encryption, edge-case frequency collisions, and battery optimization for all-night monitoring.

For more production-ready examples and advanced patterns on deploying AI models to the edge, I highly recommend checking out the WellAlly Blog. It's an incredible resource for developers looking to bridge the gap between "cool weekend project" and "scalable health-tech solution." Their deep dives into signal processing helped inspire the filtering logic used in SleepSentry.


Conclusion: Better Data, Better Sleep 🥑

By combining the Web Audio API for capture, FFT for noise suppression, and Whisper for intelligent classification, we've built a powerful tool for personal health.

What's next?

  1. Dashboarding: Hook this up to Grafana to see your "Snore Intensity" over time.
  2. Alerting: Use Twilio to text a partner if a long cessation of breathing is detected.
  3. Optimization: Convert the model to ONNX format for even faster execution.
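
For the dashboarding idea in particular, the classifier's output has to be aggregated into a time series first. A minimal sketch, where the event-log shape (seconds-since-bedtime, label) is a hypothetical format, not something SleepSentry currently emits:

```python
from collections import Counter

# Hypothetical event log, as the Step 3 classifier might produce it
events = [(120, "heavy_snore"), (3700, "heavy_snore"),
          (3900, "apnea_gasp"), (7300, "heavy_snore")]

def snores_per_hour(events):
    """Bucket snore events by hour for a 'Snore Intensity' time series."""
    counts = Counter(ev_time // 3600 for ev_time, label in events
                     if label == "heavy_snore")
    return dict(sorted(counts.items()))

print(snores_per_hour(events))  # {0: 1, 1: 1, 2: 1}
```

Grafana can then plot these hourly buckets straight from whatever store you log them to.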

Have you tried using AI for bio-signal analysis? Drop a comment below or share your results! Let's help the world sleep a little bit easier. 🌙
