Do you have an old smartphone gathering dust in a drawer? Instead of letting it go to waste, let’s turn it into a high-tech medical guardian. Today, we are building Whisp-Ear, a privacy-first, edge-based sleep monitor that detects snoring and potential sleep apnea patterns entirely within your browser.
By leveraging edge-based machine learning, WebAssembly audio processing, and real-time spectrogram analysis, we can bypass expensive medical equipment for initial screening. This tutorial explores how to orchestrate Whisper.cpp, TensorFlow Lite (TFLite), and the WebAudio API to create a seamless diagnostic tool that respects user privacy by keeping data off the cloud.
Why Edge AI for Health?
Traditional sleep apps often upload your raw audio to a server. That’s a privacy nightmare! By using Wasm (WebAssembly), we can run complex models like Whisper and custom CNNs (Convolutional Neural Networks) directly on the device. This approach provides:
- Zero Latency: Real-time feedback without round-trips to a server.
- Total Privacy: Your snoring stays on your phone.
- Cost Efficiency: No backend GPU costs for the developer.
The Architecture: How Whisp-Ear Works
The system follows a multi-stage pipeline. We first filter out background noise (like a fan or white noise) using a tiny Whisper model, then pass suspicious segments to a specialized CNN for apnea pattern recognition.
graph TD
A[Microphone Input] --> B[WebAudio API / AudioWorklet]
B --> C{Whisper.cpp Filter}
C -- No Speech/Noise --> D[Discard/Wait]
C -- Breathing/Snoring Sounds --> E[Spectrogram Conversion]
E --> F[CNN Model - TFLite]
F -- Pattern Detected --> G[Local Alert / Log]
F -- Normal --> D
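The routing logic in that diagram is simple enough to sketch directly. Here `classify` and `analyze` are hypothetical stand-ins for the Whisper filter and CNN stages (they are not real APIs, just placeholders to show the control flow):

```javascript
// Sketch of the pipeline router from the diagram above.
// `classify` stands in for the Whisper.cpp filter stage and `analyze`
// for the TFLite CNN stage; both are assumptions for illustration.
function routeSegment(segment, classify, analyze) {
  const label = classify(segment); // e.g. "noise" | "breathing" | "snoring"
  if (label === "noise") {
    return { action: "discard" }; // C -- No Speech/Noise --> D
  }
  const result = analyze(segment); // E --> F
  return result.apnea
    ? { action: "alert", score: result.score } // F -- Pattern Detected --> G
    : { action: "discard" };                   // F -- Normal --> D
}
```

Keeping the router pure like this makes each stage swappable and easy to unit-test without a microphone.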
Prerequisites
To follow along, you'll need:
- Tech Stack: Basic knowledge of JavaScript/TypeScript.
- Tools: A browser supporting `SharedArrayBuffer` (for Wasm performance).
- Models:
  - Whisper.cpp (quantized `tiny.en` model).
  - A pre-trained TFLite model for audio classification.
Step 1: Setting up the Audio Stream
First, we need to capture high-quality audio from the browser using the WebAudio API. We use an AudioWorklet to ensure the UI thread doesn't freeze during heavy processing.
// audio-processor.js
async function setupAudio() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 16000 }); // Whisper expects 16 kHz mono
  const source = audioContext.createMediaStreamSource(stream);

  await audioContext.audioWorklet.addModule('processor.js');
  const workletNode = new AudioWorkletNode(audioContext, 'recorder-worklet');

  source.connect(workletNode);
  // Route through a zero-gain node: the graph keeps pulling audio,
  // but the microphone isn't echoed back out of the speakers.
  const mute = audioContext.createGain();
  mute.gain.value = 0;
  workletNode.connect(mute).connect(audioContext.destination);
  return workletNode;
}
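On the worklet side, `process()` hands us tiny 128-sample frames, but Whisper needs multi-second segments. A minimal buffering helper the `recorder-worklet` could use to accumulate frames (the 5-second segment length is an assumption for this sketch, not a requirement):

```javascript
// Accumulates 128-sample AudioWorklet frames into fixed-size segments.
// SEGMENT length of 5 s at 16 kHz is an arbitrary choice for this sketch.
// Works because 80000 is an exact multiple of the 128-sample frame size.
class SegmentBuffer {
  constructor(segmentSamples = 16000 * 5) {
    this.segmentSamples = segmentSamples;
    this.buffer = new Float32Array(segmentSamples);
    this.offset = 0;
  }
  // Push one frame; returns a full segment once enough audio has accumulated,
  // otherwise null. The caller would postMessage the segment to the main thread.
  push(frame) {
    let consumed = 0;
    while (consumed < frame.length) {
      const n = Math.min(frame.length - consumed, this.segmentSamples - this.offset);
      this.buffer.set(frame.subarray(consumed, consumed + n), this.offset);
      this.offset += n;
      consumed += n;
      if (this.offset === this.segmentSamples) {
        this.offset = 0;
        return this.buffer.slice(); // copy, so the internal buffer can be reused
      }
    }
    return null;
  }
}
```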
Step 2: Integrating Whisper.cpp via Wasm
Whisper is famous for transcription, but in this project, we use it as an "Intelligent Filter." It helps us distinguish between "Ambient Noise" and "Human-related Sounds."
import { Whisper } from './whisper-wasm/whisper.js';

// NOTE: this `Whisper` wrapper class and its result shape are illustrative;
// whisper.cpp's actual Wasm bindings expose a lower-level API.
const whisper = new Whisper('models/whisper-tiny-q5_1.bin');

async function processSegment(audioBuffer) {
  const result = await whisper.run(audioBuffer);
  // Whisper emits non-speech markers such as [snoring] for breathing sounds.
  // Only escalate when such a marker appears with reasonable confidence.
  const text = result.text.toLowerCase();
  if ((text.includes('snoring') || text.includes('breathing')) && result.probability > 0.7) {
    analyzeSpectrogram(audioBuffer);
  }
}
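Even a tiny quantized Whisper model is expensive to run every few seconds. A cheap energy gate can discard near-silent segments before Whisper ever sees them. A minimal RMS-based gate (the -50 dBFS threshold is a guess to tune per device, not a calibrated value):

```javascript
// Cheap pre-filter: skip segments whose RMS energy is below a threshold,
// so the expensive Whisper pass only runs on audible audio.
// The -50 dBFS default is an assumption; tune it per device and room.
function isAudible(samples, thresholdDb = -50) {
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  const db = 20 * Math.log10(rms + 1e-12); // epsilon avoids log10(0)
  return db > thresholdDb;
}
```

In the pipeline this sits between the worklet and `processSegment`: `if (isAudible(segment)) processSegment(segment);`.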
Step 3: CNN Spectrogram Analysis
Once Whisper flags a segment, we convert the audio into a Mel-spectrogram (a visual representation of sound frequencies) and feed it into a TensorFlow Lite model. This model is specifically trained to look for the "crescendo-decrescendo" pattern of obstructive sleep apnea.
import * as tf from '@tensorflow/tfjs';
import * as tflite from '@tensorflow/tfjs-tflite';

// Load the model once at startup, not on every segment.
const modelPromise = tflite.loadTFLiteModel('/models/apnea_cnn.tflite');

async function analyzeSpectrogram(audioData) {
  const model = await modelPromise;
  // Convert PCM data to a spectrogram image, then to a batched float tensor.
  const pixels = tf.browser.fromPixels(generateSpectrogram(audioData));
  const inputTensor = pixels.toFloat().div(255).expandDims(0); // [1, h, w, 3]
  const prediction = model.predict(inputTensor);
  const score = prediction.dataSync()[0];
  tf.dispose([pixels, inputTensor, prediction]); // avoid leaking GPU memory

  if (score > 0.85) {
    console.warn("⚠️ Potential Apnea Event Detected!");
    triggerAlert();
  }
}
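`generateSpectrogram` hides the real signal-processing work. At its core it is a short-time Fourier transform: slice the PCM into overlapping frames, apply a window, and take the magnitude of each frame's DFT. A naive sketch of that core, with no FFT library (so O(n²) per frame, fine for a demo); a real implementation would additionally map the bins onto mel bands and render the result into an ImageData for `tf.browser.fromPixels`. The frame and hop sizes are assumptions, not values from a trained model's preprocessing spec:

```javascript
// Naive short-time magnitude spectrogram. O(n^2) per frame because it
// computes the DFT directly; production code would use an FFT.
function magnitudeSpectrogram(pcm, frameSize = 256, hop = 128) {
  const frames = [];
  for (let start = 0; start + frameSize <= pcm.length; start += hop) {
    const mags = new Float32Array(frameSize / 2);
    for (let k = 0; k < frameSize / 2; k++) {
      let re = 0, im = 0;
      for (let n = 0; n < frameSize; n++) {
        // Hann window reduces spectral leakage between bins
        const w = 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / (frameSize - 1));
        const angle = (-2 * Math.PI * k * n) / frameSize;
        re += pcm[start + n] * w * Math.cos(angle);
        im += pcm[start + n] * w * Math.sin(angle);
      }
      mags[k] = Math.sqrt(re * re + im * im);
    }
    frames.push(mags);
  }
  return frames; // frames[t][k] = magnitude of frequency bin k at time step t
}
```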
Mastering Production-Ready AI Patterns
While building a prototype is fun, deploying health-tech at scale requires more robust architectural patterns, especially regarding signal processing and model quantization.
If you're looking to dive deeper into professional implementations, I highly recommend checking out the WellAlly Blog. They have some fantastic deep dives on optimizing mobile AI workloads and building HIPAA-compliant edge architectures that are perfect for taking a project like Whisp-Ear to the next level.
Step 4: Visualizing the Data
To make this a true "Learning in Public" project, we should visualize the breathing patterns so the user can see what's happening.
function drawWaveform(data) {
  const canvas = document.getElementById('monitor-canvas');
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  const mid = canvas.height / 2;
  const step = Math.max(1, Math.floor(data.length / canvas.width));
  // Green = normal, red = high snoring intensity
  ctx.strokeStyle = data.some((v) => Math.abs(v) > 0.5) ? 'red' : 'green';
  ctx.beginPath();
  for (let x = 0; x < canvas.width; x++) ctx.lineTo(x, mid - (data[x * step] || 0) * mid);
  ctx.stroke();
}
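Sampling one value per pixel column (as above) can miss sharp peaks. A slightly better reduction, and one that is easy to test outside the browser because it is pure JS, computes a [min, max] pair per column; `width` here stands in for `canvas.width`:

```javascript
// Reduce a PCM buffer to one [min, max] pair per pixel column -- the shape
// of data drawWaveform would actually plot as vertical lines.
function toColumns(samples, width) {
  const perColumn = Math.floor(samples.length / width);
  const columns = [];
  for (let x = 0; x < width; x++) {
    let min = Infinity, max = -Infinity;
    for (let i = x * perColumn; i < (x + 1) * perColumn; i++) {
      if (samples[i] < min) min = samples[i];
      if (samples[i] > max) max = samples[i];
    }
    columns.push([min, max]);
  }
  return columns;
}
```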
Conclusion
By combining the natural language capabilities of Whisper.cpp with the specialized pattern recognition of a CNN, we've created a powerful diagnostic tool that runs entirely in the browser. This demonstrates the incredible power of modern WebAssembly and the maturity of the JavaScript AI ecosystem.
Next Steps for you:
- Quantization: Try using 4-bit quantization on the Whisper model to reduce memory usage on older devices.
- Dataset: Use the Urbansound8K or specialized medical datasets to fine-tune your CNN.
What are you building with Edge AI? Let me know in the comments! 👇