Coughing in the Dark: Build a Private 24/7 Respiratory Health Monitor using Whisper.cpp and Raspberry Pi

#whisper #ai #opensource #machinelearning

Privacy is the ultimate luxury in the age of AI. When it comes to sensitive data like the sounds of your breathing or coughing during the night, the last thing you want is a cloud-based voice assistant sending those snippets to a remote server.

In this tutorial, we are building a privacy-first, edge-AI respiratory monitor. By leveraging Raspberry Pi, Whisper.cpp, and MQTT, we will create a system capable of real-time audio classification to detect coughing or wheezing patterns. This setup ensures that your biometric audio data never leaves your local network while providing actionable health insights. If you are interested in edge computing, real-time audio processing, and on-device machine learning, this project is for you. 🚀

The Architecture: From Sound Waves to Data Points

To achieve 24/7 monitoring on a low-power device like the Raspberry Pi, we need an efficient pipeline. We use Whisper.cpp—a high-performance C++ port of OpenAI’s Whisper model—optimized for CPU inference.

graph TD
    A[USB Microphone] -->|Raw PCM Audio| B[Ring Buffer]
    B -->|VAD Trigger| C[Whisper.cpp Inference]
    C -->|Feature Extraction| D{Classification Logic}
    D -->|Cough/Wheeze Detected| E[Local State Engine]
    E -->|JSON Payload| F[MQTT Broker]
    F -->|Alert/Dashboard| G[Home Assistant / Mobile]
    D -->|Silence/Normal| H[Discard Data]
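
The Ring Buffer and VAD Trigger boxes in the diagram can be sketched roughly as follows. This is a minimal illustration, not what whisper.cpp ships: RingBuffer, rms, and should_run_inference are hypothetical names, a plain energy gate stands in for a real voice activity detector, and the 0.02 RMS threshold is an assumed starting point you would tune for your own microphone and room.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer for mono float PCM samples.
// Once full, each new sample overwrites the oldest one.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : buf_(capacity, 0.0f) {
        assert(capacity > 0);
    }

    void push(const float* samples, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            buf_[head_] = samples[i];
            head_ = (head_ + 1) % buf_.size();
            if (size_ < buf_.size()) ++size_;
        }
    }

    // Copy out the most recent n samples (fewer if less is stored), oldest first.
    std::vector<float> latest(std::size_t n) const {
        if (n > size_) n = size_;
        std::vector<float> out(n);
        const std::size_t start = (head_ + buf_.size() - n) % buf_.size();
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = buf_[(start + i) % buf_.size()];
        }
        return out;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<float> buf_;
    std::size_t head_ = 0;  // index where the next sample will be written
    std::size_t size_ = 0;  // number of valid samples currently stored
};

// Root-mean-square level of a window; 0 for an empty window.
inline float rms(const std::vector<float>& window) {
    if (window.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : window) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / window.size()));
}

// Hypothetical gate: only wake Whisper when the window is louder than the threshold.
inline bool should_run_inference(const std::vector<float>& window, float threshold = 0.02f) {
    return rms(window) > threshold;
}
```

At Whisper's 16 kHz sample rate, a 30-second buffer holds 480,000 floats (about 1.9 MB), so the whole thing fits comfortably in the Pi's RAM. Skipping inference on quiet windows is what keeps the CPU mostly idle overnight.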

Prerequisites

To follow along, you'll need:

Hardware: Raspberry Pi 4 (4GB+) or Raspberry Pi 5.
Audio: A decent USB Plug-and-Play microphone.
Tech Stack:
- Whisper.cpp: For the heavy lifting of audio transcription/feature extraction.
- C++: For the core logic.
- MQTT (Mosquitto): To broadcast health events to your smart home dashboard.

Step 1: Setting up Whisper.cpp for the Edge

The reference Python implementation of Whisper, especially with the larger models, is too heavy for a Pi. We will use the English-only tiny model (tiny.en) together with the highly optimized C++ implementation.

# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Download the tiny model (optimized for speed)
bash ./models/download-ggml-model.sh tiny.en

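# Install SDL2 (audio capture) and the Mosquitto client library
sudo apt install -y libsdl2-dev libmosquitto-dev

# Build the SDL2-based stream example
# (Makefile-era checkouts; newer ones build with CMake and -DWHISPER_SDL2=ON)
make -j stream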

Step 2: Implementing the Audio Monitor

We need a C++ wrapper that captures audio in chunks and feeds them into the Whisper inference engine. Unlike standard STT (Speech-to-Text), we do not care about the spoken words. Instead, we scan each transcribed segment for the bracketed non-speech annotations that Whisper sometimes emits, such as [coughing].

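#include "common.h"
#include "common-sdl.h" // audio_async and sdl_poll_events() are declared here
#include "whisper.h"
#include <mosquitto.h>

#include <chrono>
#include <cstdio>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// MQTT client handle for real-time alerting
struct mosquitto *mosq = NULL;

void send_alert(const std::string& type, float probability) {
    std::string payload = "{\"event\": \"" + type + "\", \"confidence\": " + std::to_string(probability) + "}";
    mosquitto_publish(mosq, NULL, "health/respiratory", (int) payload.length(), payload.c_str(), 0, false);
}

int main(int argc, char ** argv) {
    // 0. Connect to the MQTT broker (assumed here: Mosquitto on localhost:1883)
    mosquitto_lib_init();
    mosq = mosquitto_new(NULL, true, NULL);
    if (!mosq || mosquitto_connect(mosq, "localhost", 1883, 60) != MOSQ_ERR_SUCCESS) {
        fprintf(stderr, "Could not connect to the MQTT broker\n");
        return 1;
    }
    mosquitto_loop_start(mosq); // background thread that handles network traffic

    // 1. Initialize Whisper Context
    struct whisper_context * ctx = whisper_init_from_file_with_params(
        "models/ggml-tiny.en.bin", whisper_context_default_params());
    if (!ctx) {
        fprintf(stderr, "Could not load the Whisper model\n");
        return 1;
    }

    // 2. Setup Audio Capture (Simplification)
    audio_async reader(30000); // 30-second buffer
    if (!reader.init(0, WHISPER_SAMPLE_RATE)) { // capture device 0
        fprintf(stderr, "Could not open the microphone\n");
        return 1;
    }
    reader.resume(); // start capturing

    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;
    params.no_context     = true; // treat every window independently

    std::vector<float> data;
    while (sdl_poll_events()) { // returns false when SDL receives a quit event
        // Wait for a fresh window so the same sound is not analyzed twice
        std::this_thread::sleep_for(std::chrono::milliseconds(2000));
        reader.get(2000, data); // Get last 2 seconds

        // 3. Run Inference
        if (whisper_full(ctx, params, data.data(), (int) data.size()) == 0) {
            const int n_segments = whisper_full_n_segments(ctx);
            for (int i = 0; i < n_segments; ++i) {
                const char * text = whisper_full_get_segment_text(ctx, i);

                // Whisper sometimes labels non-speech sounds in brackets; the exact
                // wording and capitalization vary, and this match is case-sensitive
                if (strstr(text, "[coughing]") || strstr(text, "[clears throat]")) {
                    printf("⚠️ Respiratory Event Detected: %s\n", text);
                    send_alert("cough", 0.92); // placeholder: a text match carries no real confidence
                }
            }
        }
    }

    whisper_free(ctx);
    mosquitto_disconnect(mosq);
    mosquitto_loop_stop(mosq, false);
    mosquitto_destroy(mosq);
    mosquitto_lib_cleanup();
    return 0;
}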
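
The exact-string check in the loop is brittle, because the annotations may come back with different capitalization or in parentheses instead of brackets. Here is a sketch of a more forgiving matcher. The function names and the keyword list are assumptions for illustration, not part of whisper.cpp, and you would want to adjust the keywords to whatever your model actually prints.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-case a copy of the input (ASCII only, which is enough for these tags).
inline std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns true when `segment` contains a bracketed or parenthesized annotation,
// such as "[Coughing]" or "(coughs)", that mentions one of the keywords.
// The keyword list is an assumption for illustration.
inline bool is_respiratory_annotation(const std::string& segment) {
    static const std::array<const char*, 4> keywords = {
        "cough", "wheez", "clears throat", "clearing throat"};
    const std::string text = to_lower_ascii(segment);

    std::size_t open = 0;
    while ((open = text.find_first_of("[(", open)) != std::string::npos) {
        const char closer = (text[open] == '[') ? ']' : ')';
        const std::size_t close = text.find(closer, open + 1);
        if (close == std::string::npos) break;  // unterminated annotation
        const std::string inner = text.substr(open + 1, close - open - 1);
        for (const char* kw : keywords) {
            if (inner.find(kw) != std::string::npos) return true;
        }
        open = close + 1;
    }
    return false;
}

// Small self-check documenting the intended behavior.
inline void self_check() {
    assert(is_respiratory_annotation(" [Coughing]"));
    assert(is_respiratory_annotation("(clears throat) Sorry."));
    assert(!is_respiratory_annotation("I have a cough."));  // spoken word, not an annotation
}
```

Only text inside an annotation counts, so someone saying "I have a cough" out loud does not trigger an alert. In the monitor loop, is_respiratory_annotation(text) would take the place of the two strstr calls.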

Step 3: Handling Sensitive Data

The beauty of this system is that the data variable in the code above lives only in RAM and is overwritten on the next loop iteration. No audio files are ever saved to the SD card, and the only thing published over MQTT is a small JSON event. From there, we can bridge to Home Assistant to create a long-term health graph of "Coughs per Hour," helping you or your doctor identify patterns in nocturnal asthma or post-viral recovery.
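
If you would rather compute "Coughs per Hour" on the Pi itself instead of in Home Assistant, a rolling counter is enough. The sketch below is one possible approach under that assumption. RollingEventCounter is a hypothetical helper, not something from whisper.cpp or Mosquitto, and it stores only timestamps, never audio.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Counts events that occurred within a trailing time window (one hour by default).
class RollingEventCounter {
public:
    using Clock = std::chrono::steady_clock;

    explicit RollingEventCounter(std::chrono::seconds window = std::chrono::hours(1))
        : window_(window) {
        assert(window.count() > 0);
    }

    // Record one event at time `now`. Timestamps must be non-decreasing.
    void record(Clock::time_point now) {
        assert(events_.empty() || now >= events_.back());
        events_.push_back(now);
    }

    // Drop events that have aged out, then return how many remain in the window.
    std::size_t count(Clock::time_point now) {
        while (!events_.empty() && now - events_.front() >= window_) {
            events_.pop_front();
        }
        return events_.size();
    }

private:
    std::chrono::seconds window_;
    std::deque<Clock::time_point> events_;
};
```

Calling record(RollingEventCounter::Clock::now()) next to send_alert, and publishing count(...) as an extra JSON field, would give the dashboard a ready-made hourly rate. A steady clock is used so that NTP adjustments on the Pi cannot make events appear to move backward in time.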

Going Beyond: The "Official" Way 🥑

While building a DIY monitor is a great "Learning in Public" project, scaling this to handle multiple rooms, advanced noise cancellation, or clinical-grade accuracy requires a more robust architectural approach.

For advanced implementation patterns, such as optimizing Whisper for ARM64 Neon instructions or integrating zero-trust security into your IoT health pipeline, I highly recommend checking out the technical deep-dives over at WellAlly Blog. They cover production-ready AI patterns that take your edge projects from "cool prototype" to "rock-solid product."

Conclusion

Building on the edge isn't just about saving cloud costs; it's about agency over your own data. By combining Whisper.cpp with the portability of a Raspberry Pi, we've created a 24/7 sentinel for respiratory health.

What's next?

Better classification: You could run a dedicated audio classification model (like YAMNet) alongside Whisper for even better accuracy on "wheezing" vs. "background wind."
Notification: Connect the MQTT output to an ntfy.sh server for instant push notifications to your phone.

Are you running AI on the edge? Let me know in the comments what your current setup looks like! 💻👇