Do you have an old smartphone gathering dust in a drawer? Instead of letting it go to waste, let’s turn it into a high-tech medical guardian. Today, we are building Whisp-Ear, a privacy-first, edge-based sleep monitor that detects snoring and potential sleep apnea patterns entirely within your browser.
By leveraging edge-based machine learning, WebAssembly audio processing, and real-time spectrogram analysis, we can bypass expensive medical equipment for initial screening. This tutorial explores how to orchestrate Whisper.cpp, TensorFlow Lite (TFLite), and the WebAudio API to create a seamless diagnostic tool that respects user privacy by keeping data off the cloud.
Why Edge AI for Health?
Traditional sleep apps often upload your raw audio to a server. That’s a privacy nightmare! By using Wasm (WebAssembly), we can run complex models like Whisper and custom CNNs (Convolutional Neural Networks) directly on the device. This approach provides:
- Zero Latency: Real-time feedback without round-trips to a server.
- Total Privacy: Your snoring stays on your phone.
- Cost Efficiency: No backend GPU costs for the developer.
The Architecture: How Whisp-Ear Works
The system follows a multi-stage pipeline. We first filter out background noise (like a fan or white noise) using a tiny Whisper model, then pass suspicious segments to a specialized CNN for apnea pattern recognition.
graph TD
A[Microphone Input] --> B[WebAudio API / AudioWorklet]
B --> C{Whisper.cpp Filter}
C -- No Speech/Noise --> D[Discard/Wait]
C -- Breathing/Snoring Sounds --> E[Spectrogram Conversion]
E --> F[CNN Model - TFLite]
F -- Pattern Detected --> G[Local Alert / Log]
F -- Normal --> D
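The routing logic in that diagram is simple enough to sketch directly. Here `classify` and `analyze` are hypothetical stand-ins for the Whisper filter and CNN stages (they are not real APIs, just placeholders to show the control flow):

```javascript
// Sketch of the pipeline router from the diagram above.
// `classify` stands in for the Whisper.cpp filter stage and `analyze`
// for the TFLite CNN stage; both are assumptions for illustration.
function routeSegment(segment, classify, analyze) {
  const label = classify(segment); // e.g. "noise" | "breathing" | "snoring"
  if (label === "noise") {
    return { action: "discard" }; // C -- No Speech/Noise --> D
  }
  const result = analyze(segment); // E --> F
  return result.apnea
    ? { action: "alert", score: result.score } // F -- Pattern Detected --> G
    : { action: "discard" };                   // F -- Normal --> D
}
```

Keeping the router pure like this makes each stage swappable and easy to unit-test without a microphone.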
Prerequisites
To follow along, you'll need:
- Tech Stack: Basic knowledge of JavaScript/TypeScript.
- Tools: A browser supporting `SharedArrayBuffer` (for Wasm performance).
- Models:
  - Whisper.cpp (quantized `tiny.en` model).
  - A pre-trained TFLite model for audio classification.
Step 1: Setting up the Audio Stream
First, we need to capture high-quality audio from the browser using the WebAudio API. We use an AudioWorklet to ensure the UI thread doesn't freeze during heavy processing.
// audio-processor.js
async function setupAudio() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 }); // Whisper expects 16 kHz mono
  const source = audioContext.createMediaStreamSource(stream);

  await audioContext.audioWorklet.addModule('processor.js');
  const workletNode = new AudioWorkletNode(audioContext, 'recorder-worklet');

  source.connect(workletNode);
  // Route through a zero-gain node: the graph keeps pulling audio,
  // but the microphone isn't echoed back out of the speakers.
  const mute = audioContext.createGain();
  mute.gain.value = 0;
  workletNode.connect(mute).connect(audioContext.destination);
  return workletNode;
}
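On the worklet side, `process()` hands us tiny 128-sample frames, but Whisper needs multi-second segments. A minimal buffering helper the `recorder-worklet` could use to accumulate frames (the 5-second segment length is an assumption for this sketch, not a requirement):

```javascript
// Accumulates 128-sample AudioWorklet frames into fixed-size segments.
// SEGMENT length of 5 s at 16 kHz is an arbitrary choice for this sketch.
// Works because 80000 is an exact multiple of the 128-sample frame size.
class SegmentBuffer {
  constructor(segmentSamples = 16000 * 5) {
    this.segmentSamples = segmentSamples;
    this.buffer = new Float32Array(segmentSamples);
    this.offset = 0;
  }
  // Push one frame; returns a full segment once enough audio has accumulated,
  // otherwise null. The caller would postMessage the segment to the main thread.
  push(frame) {
    let consumed = 0;
    while (consumed < frame.length) {
      const n = Math.min(frame.length - consumed, this.segmentSamples - this.offset);
      this.buffer.set(frame.subarray(consumed, consumed + n), this.offset);
      this.offset += n;
      consumed += n;
      if (this.offset === this.segmentSamples) {
        this.offset = 0;
        return this.buffer.slice(); // copy, so the internal buffer can be reused
      }
    }
    return null;
  }
}
```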
Step 2: Integrating Whisper.cpp via Wasm
Whisper is famous for transcription, but in this project, we use it as an "Intelligent Filter." It helps us distinguish between "Ambient Noise" and "Human-related Sounds."
import { Whisper } from './whisper-wasm/whisper.js';

// NOTE: this `Whisper` wrapper class and its result shape are illustrative;
// whisper.cpp's actual Wasm bindings expose a lower-level API.
const whisper = new Whisper('models/whisper-tiny-q5_1.bin');

async function processSegment(audioBuffer) {
  const result = await whisper.run(audioBuffer);
  // Whisper emits non-speech markers such as [snoring] for breathing sounds.
  // Only escalate when such a marker appears with reasonable confidence.
  const text = result.text.toLowerCase();
  if ((text.includes('snoring') || text.includes('breathing')) && result.probability > 0.7) {
    analyzeSpectrogram(audioBuffer);
  }
}
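Even a tiny quantized Whisper model is expensive to run every few seconds. A cheap energy gate can discard near-silent segments before Whisper ever sees them. A minimal RMS-based gate (the -50 dBFS threshold is a guess to tune per device, not a calibrated value):

```javascript
// Cheap pre-filter: skip segments whose RMS energy is below a threshold,
// so the expensive Whisper pass only runs on audible audio.
// The -50 dBFS default is an assumption; tune it per device and room.
function isAudible(samples, thresholdDb = -50) {
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  const db = 20 * Math.log10(rms + 1e-12); // epsilon avoids log10(0)
  return db > thresholdDb;
}
```

In the pipeline this sits between the worklet and `processSegment`: `if (isAudible(segment)) processSegment(segment);`.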
Step 3: CNN Spectrogram Analysis
Once Whisper flags a segment, we convert the audio into a Mel-spectrogram (a visual representation of sound frequencies) and feed it into a TensorFlow Lite model. This model is specifically trained to look for the "crescendo-decrescendo" pattern of obstructive sleep apnea.
import * as tf from '@tensorflow/tfjs';
import * as tflite from '@tensorflow/tfjs-tflite';

// Load the model once at startup, not on every segment.
const modelPromise = tflite.loadTFLiteModel('/models/apnea_cnn.tflite');

async function analyzeSpectrogram(audioData) {
  const model = await modelPromise;
  // Convert PCM data to a spectrogram image, then to a batched float tensor.
  const pixels = tf.browser.fromPixels(generateSpectrogram(audioData));
  const inputTensor = pixels.toFloat().div(255).expandDims(0); // [1, h, w, 3]
  const prediction = model.predict(inputTensor);
  const score = prediction.dataSync()[0];
  tf.dispose([pixels, inputTensor, prediction]); // avoid leaking GPU memory

  if (score > 0.85) {
    console.warn("⚠️ Potential Apnea Event Detected!");
    triggerAlert();
  }
}
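`generateSpectrogram` hides the real signal-processing work. At its core it is a short-time Fourier transform: slice the PCM into overlapping frames, apply a window, and take the magnitude of each frame's DFT. A naive sketch of that core, with no FFT library (so O(n²) per frame, fine for a demo); a real implementation would additionally map the bins onto mel bands and render the result into an ImageData for `tf.browser.fromPixels`. The frame and hop sizes are assumptions, not values from a trained model's preprocessing spec:

```javascript
// Naive short-time magnitude spectrogram. O(n^2) per frame because it
// computes the DFT directly; production code would use an FFT.
function magnitudeSpectrogram(pcm, frameSize = 256, hop = 128) {
  const frames = [];
  for (let start = 0; start + frameSize <= pcm.length; start += hop) {
    const mags = new Float32Array(frameSize / 2);
    for (let k = 0; k < frameSize / 2; k++) {
      let re = 0, im = 0;
      for (let n = 0; n < frameSize; n++) {
        // Hann window reduces spectral leakage between bins
        const w = 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / (frameSize - 1));
        const angle = (-2 * Math.PI * k * n) / frameSize;
        re += pcm[start + n] * w * Math.cos(angle);
        im += pcm[start + n] * w * Math.sin(angle);
      }
      mags[k] = Math.sqrt(re * re + im * im);
    }
    frames.push(mags);
  }
  return frames; // frames[t][k] = magnitude of frequency bin k at time step t
}
```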
Mastering Production-Ready AI Patterns
While building a prototype is fun, deploying health-tech at scale requires more robust architectural patterns, especially regarding signal processing and model quantization.
If you're looking to dive deeper into professional implementations, I highly recommend checking out the WellAlly Blog. They have some fantastic deep dives on optimizing mobile AI workloads and building HIPAA-compliant edge architectures that are perfect for taking a project like Whisp-Ear to the next level.
Step 4: Visualizing the Data
To make this a true "Learning in Public" project, we should visualize the breathing patterns so the user can see what's happening.
function drawWaveform(data) {
  const canvas = document.getElementById('monitor-canvas');
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  const mid = canvas.height / 2;
  const step = Math.max(1, Math.floor(data.length / canvas.width));
  // Green = normal, red = high snoring intensity
  ctx.strokeStyle = data.some((v) => Math.abs(v) > 0.5) ? 'red' : 'green';
  ctx.beginPath();
  for (let x = 0; x < canvas.width; x++) ctx.lineTo(x, mid - (data[x * step] || 0) * mid);
  ctx.stroke();
}
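Sampling one value per pixel column (as above) can miss sharp peaks. A slightly better reduction, and one that is easy to test outside the browser because it is pure JS, computes a [min, max] pair per column; `width` here stands in for `canvas.width`:

```javascript
// Reduce a PCM buffer to one [min, max] pair per pixel column -- the shape
// of data drawWaveform would actually plot as vertical lines.
function toColumns(samples, width) {
  const perColumn = Math.floor(samples.length / width);
  const columns = [];
  for (let x = 0; x < width; x++) {
    let min = Infinity, max = -Infinity;
    for (let i = x * perColumn; i < (x + 1) * perColumn; i++) {
      if (samples[i] < min) min = samples[i];
      if (samples[i] > max) max = samples[i];
    }
    columns.push([min, max]);
  }
  return columns;
}
```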
Conclusion
By combining the natural language capabilities of Whisper.cpp with the specialized pattern recognition of a CNN, we've created a powerful diagnostic tool that runs entirely in the browser. This demonstrates the incredible power of modern WebAssembly and the maturity of the JavaScript AI ecosystem.
Next Steps for you:
- Quantization: Try using 4-bit quantization on the Whisper model to reduce memory usage on older devices.
- Dataset: Use the Urbansound8K or specialized medical datasets to fine-tune your CNN.
What are you building with Edge AI? Let me know in the comments! 👇