In Part 1 we prepared the dataset, and in Part 2 we hand‑crafted 33‑dim features and trained a small multinomial Logistic Regression model. In this final part, we’ll deploy the model on a Raspberry Pi Pico, stream live features from a microphone, and discuss what worked—and what didn’t.
Honest note: This is a learning project. The model works end‑to‑end, but it’s not perfect in the wild (e.g., baby vs doorbell mix‑ups). I’m sharing the entire process, trade‑offs, and next steps rather than pretending it’s production‑ready.
🎛️ Runtime Architecture
PC captures audio, extracts 33‑D features, and sends a CSV line per window over USB‑CDC. Pico parses the CSV, applies z‑score using training stats, runs a linear layer + softmax, then stabilizes outputs with a small hysteresis FSM. If an alarm class is active, the LED turns on.
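On the wire this is just newline‑terminated ASCII — one 33‑value CSV line per window, plus the two control messages. An illustrative exchange (feature values invented for the example, middle fields elided):

```text
PING
0.1234,-0.5678,0.0912,...,0.4431
RESET
```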
🖥️ Live capture & sender (Python)
We reuse the same extraction logic from Part 2 so the feature order matches training exactly. The script writes PING/RESET control messages and one CSV line per snippet.
Key points in `simulation.py`:
- Same pre‑processing: peak normalize → pre‑emphasis → 25 ms frames (10 ms hop), Hann window.
- 33‑dim features: 12 Goertzel band means/stds (each internally z‑scored), RMS/centroid/rolloff/ZCR/flatness.
- Ring buffer + hop timing so you get a decision roughly every hop once the first window is full.
- USB‑CDC serial to Pico at 115200 baud.
```python
feats = extract_features_live(snippet)   # same order as training:
                                         # peak normalize → pre‑emphasis → framing → FFT stats → Goertzel bands
line = ",".join(f"{v}" for v in feats) + "\n"   # "v1,v2,...,v33\n"
ser.write(line.encode("ascii"))          # ser: the pyserial USB‑CDC port to the Pico
```
Source: `python/simulation.py`
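For reference, the Goertzel recurrence behind those band features is tiny — here's a generic textbook version (shown in C++, since that's where it would live if extraction ever moved on‑device; the function name and signature are mine, not the repo's):

```cpp
#include <math.h>

// Power of a single frequency bin via the Goertzel recurrence —
// much cheaper than a full FFT when you only need a handful of bands.
float goertzel_power(const float* x, int n, float freq_hz, float fs) {
    const float w = 2.0f * (float)M_PI * freq_hz / fs;
    const float coeff = 2.0f * cosf(w);
    float s1 = 0.0f, s2 = 0.0f;              // s[n-1], s[n-2]
    for (int i = 0; i < n; ++i) {
        const float s = x[i] + coeff * s1 - s2;
        s2 = s1;
        s1 = s;
    }
    // |X(k)|^2 from the final two filter states
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```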
🧠 Inference firmware (C++)
The firmware does four simple things on each incoming feature vector:
- Parse CSV into a float array (size = `FEATS`)
- Z‑score with the training `MU` and `SIGMA` (these two steps are sketched below)
- Logistic regression (bias + weights) → softmax
- FSM per class (thresholds + consecutive frames)
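Steps 1–2 are only a few lines of C++. A minimal sketch, assuming `FEATS`, `MU`, and `SIGMA` come from `model_params.hpp`; the `strtof`-based parser and the helper name are my own, not necessarily what `main.cpp` does:

```cpp
#include <stdlib.h>

// Parse one CSV line ("v1,v2,...,v33\n") into x[FEATS],
// then normalize in place with the training statistics.
static bool parse_and_zscore(const char* line, float* x) {
    char* end = nullptr;
    for (int i = 0; i < FEATS; ++i) {
        x[i] = strtof(line, &end);
        if (end == line) return false;         // malformed field
        line = (*end == ',') ? end + 1 : end;  // skip separator
    }
    for (int i = 0; i < FEATS; ++i)
        x[i] = (x[i] - MU[i]) / SIGMA[i];      // z-score
    return true;
}
```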
Important bits from `main.cpp`:
- Stable softmax via the max‑log trick (sketched below).
- Label‑specific FSM defaults: `smoke_alarm` gets a stricter `th_on`, doorbell/baby are moderate, and "other" never lights the LED.
- USB commands: `PING` (health check) and `RESET` (clears FSM state).
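The max‑log trick just subtracts the largest logit before exponentiating, so `expf` never overflows. A sketch of how `logits_softmax` can be implemented — `W`, `B`, and `N_CLASSES` are assumed to come from `model_params.hpp`, and those names are my guesses:

```cpp
#include <math.h>

// Linear layer + numerically stable softmax.
static void logits_softmax(const float* x, float* probs) {
    float logits[N_CLASSES];
    float max_logit = -INFINITY;
    for (int c = 0; c < N_CLASSES; ++c) {
        float z = B[c];
        for (int i = 0; i < FEATS; ++i) z += W[c][i] * x[i];
        logits[c] = z;
        if (z > max_logit) max_logit = z;
    }
    float sum = 0.0f;
    for (int c = 0; c < N_CLASSES; ++c) {
        probs[c] = expf(logits[c] - max_logit);  // shift keeps expf in range
        sum += probs[c];
    }
    for (int c = 0; c < N_CLASSES; ++c) probs[c] /= sum;
}
```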
```cpp
// Per feature vector: normalize, classify, then let the FSM debounce.
zscore(x);                       // (x - MU) / SIGMA with training stats
logits_softmax(x, probs);        // linear layer + stable softmax
fsm_step(fsm, probs);            // per-class hysteresis update
bool led_on = any_alert_active(fsm);
gpio_put(LED_PIN, led_on ? 1 : 0);
```
Source: `firmware/main.cpp`
▶️ Demo (YouTube)
I recorded a short video showing the terminal logs while playing doorbell/baby/smoke alarm audio.
What to look for:
- The `top=...` label tracks the dominant class.
- `active:` flags show the FSM decisions.
- LED state correlates with the alarm classes.
⚖️ Latency vs Stability
Two knobs determine perceived latency:
- Snippet length (e.g., 0.8–1.5 s): shorter = faster decisions, but noisier.
- FSM (`th_on`/`th_off`, `need_on`): lower thresholds and fewer consecutive frames = faster, but more false positives (see the hysteresis sketch below).
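To make that trade‑off concrete, here's what one per‑class hysteresis step can look like. The struct layout and function name are my own sketch, not the firmware's actual code:

```cpp
struct ClassFSM {
    float th_on, th_off;  // enter/exit probability thresholds
    int   need_on;        // consecutive frames above th_on to activate
    int   streak = 0;
    bool  active = false;
};

// One update per window: count consecutive hits before turning on,
// and only turn off once the probability drops below th_off.
static void fsm_step_class(ClassFSM& f, float p) {
    if (!f.active) {
        f.streak = (p >= f.th_on) ? f.streak + 1 : 0;
        if (f.streak >= f.need_on) { f.active = true; f.streak = 0; }
    } else if (p < f.th_off) {
        f.active = false;
    }
}
```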
A balanced preset that worked for me:
- `SNIPPET_S=0.8`, `HOP_S=0.2` in the sender
- `smoke_alarm: need_on=1`, `doorbell: need_on=2`, `baby: need_on=2–3`
⚠️ Known Limitations (a.k.a. things I’d fix next)
- Domain shift: some classes came from a single source → struggles when room/mic changes.
- Weak baby recall: baby often confused with doorbell/other on real audio.
- “Other” is infinite: hard negatives are never enough; needs continual additions.
- No on‑device feature extraction: Pico only runs inference; features are computed on PC in this demo.
🧭 What I Learned
- Small, explainable models (like LR) are great for MCUs: fast, easy to debug, and predictable.
- Feature parity between training and runtime is non‑negotiable. One mismatch (ordering or normalization) can dominate outputs.
- FSMs matter as much as the classifier for real‑world UX; thresholds are product decisions, not just ML.
- Data diversity beats cleverness: more sources per class and hard negatives improve robustness more than model tweaks.
- Latency is a pipeline property: snippet size + hop + FSM determine UX way more than raw FLOPs.
📦 Repo Pointers
- `python/simulation.py` — live mic sender (feature extraction + USB‑CDC)
- `firmware/main.cpp` — inference + FSM + LED
- `firmware/model_params.hpp` — auto‑generated weights & normalization
Thanks for reading this series! If you try it, I’d love to hear how it behaves in your room, with your mic and alarms. PRs with more diverse data are especially welcome 🙌
🌐 Links & Connect
- GitHub repo: Edge-AI-Sound-Classifier-on-Raspberry-Pi-Pico ⭐ (if you find this useful, please give it a star!)
- LinkedIn: Ertuğrul Mutlu