DEV Community

Priyanshu Chawda

ESP32 Wi-Fi CSI Observatory: Gemma 4 Turns Noisy Wi-Fi Into Camera-Free Room Intelligence

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

What if a room-awareness system could recognize meaningful activity without capturing a single image or recording a single sound?

I built ESP32 Wi-Fi CSI Observatory, a camera-free spatial sensing system that uses Wi-Fi Channel State Information from a low-cost ESP32 DevKit V1 to observe changes inside a room.

Wi-Fi signals naturally change when a person enters, sits, walks or moves nearby. But those signals are noisy. Raw CSI packets contain spikes, unstable readings, variable packet rates and signal fluctuations that are difficult for a human operator to understand directly.

That is where Gemma 4 becomes central to the project.

Gemma 4 does not pretend to be a camera. It does not receive face images, video frames or identity information. Instead, it receives compact signal statistics produced from the ESP32 stream, selects an appropriate mathematical filtering strategy, interprets the cleaned evidence safely, explains what the system can and cannot claim, and prepares an actionable Telegram message for the operator.

The complete flow is:

Real room movement
      ↓
ESP32 Wi-Fi CSI packets
      ↓
Local Python feature extraction
      ↓
Gemma 4 filter and parameter decision
      ↓
Deterministic mathematical signal filtering
      ↓
Occupancy / activity evidence
      ↓
Gemma 4 safe explanation
      ↓
React Observatory + Telegram action

The result is a live system where invisible RF changes become readable, explainable and actionable, all without a camera.


Demo

Live ESP32 Observatory

ESP live Gemma detection

Live terminal CSI receiver showing BPM, RSSI, subcarriers, and waveform

The terminal receiver proves that the sensing input is real. During the final demonstration, an ESP32 DevKit V1 connected on COM5 streamed live CSI packets into the processing pipeline. The system displayed packet activity, RSSI, subcarrier behavior, motion energy, exploratory rhythm estimates and activity candidates while the Observatory visualized the same evidence in real time.


The Problem: Wi-Fi Signals Are Useful, but Noisy

An ESP32 can observe changes in Wi-Fi signal behavior, but the raw stream is not immediately useful.

A real CSI window may contain:

  • sudden spikes and outliers
  • unstable RSSI changes
  • short-term jitter
  • missing or weak packets
  • movement patterns mixed with background noise

A fixed filter is not always ideal. A window with sharp outliers may need different treatment from a window with smooth but continuously noisy variation.

Instead of applying one hard-coded smoothing strategy to every case, I used Gemma 4 as a signal-filtering advisor.

The local pipeline first computes compact numerical features such as:

{
  "sample_count": 29,
  "outlier_ratio": 0.18,
  "signal_std": 2.41,
  "signal_variance": 5.81,
  "rssi_std": 1.27,
  "missing_count": 0
}
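As a rough illustration of how such features could be computed, here is a minimal sketch using only the Python standard library. The function name `compute_features`, the MAD-based outlier rule, the `mad_threshold` default and the use of population statistics are my assumptions for illustration, not a copy of the repository code.

```python
import statistics
from typing import Optional, Sequence


def compute_features(
    amplitudes: Sequence[Optional[float]],
    rssi: Sequence[float],
    mad_threshold: float = 3.0,  # assumed cut-off, not taken from the project
) -> dict:
    """Summarize one CSI window as compact statistics (hypothetical helper)."""
    # None marks a missing or unusable packet in this sketch.
    present = [a for a in amplitudes if a is not None]
    features = {
        "sample_count": len(present),
        "outlier_ratio": 0.0,
        "signal_std": 0.0,
        "signal_variance": 0.0,
        "rssi_std": 0.0,
        "missing_count": len(amplitudes) - len(present),
    }
    if len(present) >= 2:
        med = statistics.median(present)
        # Median absolute deviation, scaled to approximate a standard deviation.
        mad = statistics.median([abs(a - med) for a in present])
        scale = 1.4826 * mad
        if scale > 0:
            outliers = sum(abs(a - med) > mad_threshold * scale for a in present)
            features["outlier_ratio"] = round(outliers / len(present), 4)
        features["signal_variance"] = round(statistics.pvariance(present), 4)
        features["signal_std"] = round(statistics.pstdev(present), 4)
    if len(rssi) >= 2:
        features["rssi_std"] = round(statistics.pstdev(rssi), 4)
    return features
```

Only this small dictionary would be sent to the model, so the prompt stays bounded no matter how many packets the window contains.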

Gemma 4 then chooses the most suitable filtering approach and parameters from an allowed set:

  • moving_average
  • median
  • hampel
  • lowpass
  • none

A structured Gemma decision looks like this:

{
  "filter": "hampel",
  "window_size": 5,
  "outlier_threshold": 3.0,
  "lowpass_alpha": 0.25,
  "confidence": 0.87,
  "reason": "The signal window contains spike-like outliers, so local median replacement is appropriate before interpretation."
}

This is an important architectural choice:

Gemma 4 does not fabricate sensor values and does not directly modify raw signal samples. It reasons over numerical evidence and selects the mathematical processing strategy. Deterministic Python code then applies that filter.

That separation keeps the pipeline inspectable and reliable.
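One way to enforce that separation is to treat the model output as untrusted text and validate it before any filter runs. The sketch below assumes a hypothetical `sanitize_decision` helper, illustrative parameter bounds and a median-filter safe default. None of these values come from the repository.

```python
import json

ALLOWED_FILTERS = ("moving_average", "median", "hampel", "lowpass", "none")

# Assumed safe default used whenever the model output is rejected.
SAFE_DEFAULT = {
    "filter": "median",
    "window_size": 5,
    "outlier_threshold": 3.0,
    "lowpass_alpha": 0.25,
    "confidence": 0.0,
    "reason": "fallback: model output rejected",
}


def _clamp(value, low, high, default):
    """Coerce value to a float inside [low, high], or return default."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    if number != number:  # NaN never compares equal to itself
        return default
    return min(max(number, low), high)


def sanitize_decision(raw_text: str) -> dict:
    """Parse a model reply and keep only allowed, bounded parameters."""
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return dict(SAFE_DEFAULT)
    if not isinstance(data, dict) or data.get("filter") not in ALLOWED_FILTERS:
        return dict(SAFE_DEFAULT)
    window = int(_clamp(data.get("window_size"), 3, 31, 5))
    if window % 2 == 0:
        window += 1  # an odd window keeps the filter centered on each sample
    return {
        "filter": data["filter"],
        "window_size": window,
        "outlier_threshold": _clamp(data.get("outlier_threshold"), 1.0, 6.0, 3.0),
        "lowpass_alpha": _clamp(data.get("lowpass_alpha"), 0.01, 1.0, 0.25),
        "confidence": _clamp(data.get("confidence"), 0.0, 1.0, 0.0),
        "reason": str(data.get("reason", ""))[:300],
    }
```

With a guard like this, a malformed or out-of-range reply can only ever degrade to a known filter with known parameters.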


How Gemma 4 Improves the Signal Pipeline

The project combines model reasoning with real numerical tools.

1. Mathematical filtering remains deterministic

The local Python engine implements the actual signal-processing methods:

  • Moving average for general smoothing.
  • Median filtering for short spike noise.
  • Hampel filtering for statistical outlier replacement.
  • Low-pass filtering for reducing rapid noise while preserving slower movement trends.

Gemma 4 selects which method is appropriate for the current signal window and recommends parameters such as filter window size, outlier threshold or low-pass alpha.

For example:

High outlier ratio
      → Gemma chooses median or Hampel filtering

High continuous signal variation
      → Gemma chooses moving average or low-pass filtering

Clean, stable signal
      → Gemma chooses no unnecessary filtering

This makes Gemma useful before the final explanation step. It helps determine how the system should clean and prepare uncertain Wi-Fi evidence before displaying an interpretation.
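To make the deterministic side concrete, here is a compact sketch of the four filters and a dispatcher driven by the structured decision. It is a simplified illustration rather than the contents of `filters.py`. I assume edge windows are truncated, the low-pass filter is a first-order exponential smoother, and `apply_filter` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def _window(values: Sequence[float], i: int, size: int) -> Sequence[float]:
    """Centered window around index i, truncated at the edges."""
    half = size // 2
    return values[max(0, i - half): i + half + 1]


def moving_average(values: Sequence[float], window_size: int = 5) -> List[float]:
    values = list(values)
    return [sum(w) / len(w) for w in (_window(values, i, window_size) for i in range(len(values)))]


def median_filter(values: Sequence[float], window_size: int = 5) -> List[float]:
    values = list(values)
    return [statistics.median(_window(values, i, window_size)) for i in range(len(values))]


def hampel(values: Sequence[float], window_size: int = 5, outlier_threshold: float = 3.0) -> List[float]:
    """Replace samples far from the local median with that median."""
    values = list(values)
    cleaned = list(values)
    for i, x in enumerate(values):
        w = _window(values, i, window_size)
        med = statistics.median(w)
        mad = statistics.median([abs(v - med) for v in w])
        # 1.4826 scales the MAD to a standard-deviation estimate.
        if abs(x - med) > outlier_threshold * 1.4826 * mad:
            cleaned[i] = med
    return cleaned


def lowpass(values: Sequence[float], alpha: float = 0.25) -> List[float]:
    """First-order exponential smoothing: y[i] = a*x[i] + (1-a)*y[i-1]."""
    out: List[float] = []
    for x in values:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def apply_filter(values: Sequence[float], decision: dict) -> List[float]:
    """Apply the filter named in an already validated decision."""
    name = decision.get("filter", "none")
    size = decision.get("window_size", 5)
    if name == "moving_average":
        return moving_average(values, size)
    if name == "median":
        return median_filter(values, size)
    if name == "hampel":
        return hampel(values, size, decision.get("outlier_threshold", 3.0))
    if name == "lowpass":
        return lowpass(values, decision.get("lowpass_alpha", 0.25))
    return list(values)  # "none" or anything unexpected leaves the data as-is
```

The model only picks the name and parameters. Given the same window and the same decision, the output samples are always the same.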

2. Gemma converts numerical evidence into meaning

Even after filtering, numbers such as variance, RSSI deviation, activity distance and motion energy are not naturally understandable to most people.

Without Gemma, the operator might only see:

signal_variance=5.81
signal_std=2.41
rssi_std=1.27
activity_candidate=sitting
quality=GOOD

With Gemma 4, the same evidence becomes a safe briefing:

The RF evidence is consistent with a stationary occupied room.
Signal quality is usable and motion energy is low, which supports
a sitting activity candidate. This is not visual confirmation and
cannot identify any individual.

That is the real value of Gemma in this project: it transforms technical RF measurements into understandable human decisions while preserving uncertainty.


What Gemma 4 Actually Receives

The explanation model never receives camera frames or personal identity data.

For the Observatory, Gemma receives a compact structured event containing only the evidence required for interpretation:

{
  "source": "live_esp32",
  "signal": {
    "quality": "GOOD",
    "packet_count": 1007,
    "reasons": ["stable packet stream", "usable signal variance"]
  },
  "motion": {
    "state": "low_motion",
    "display_level": "low"
  },
  "persons": {
    "range": "candidate_present"
  },
  "visual": {
    "trust": "trusted",
    "pose_state": "sitting_candidate"
  },
  "limitations": [
    "single ESP advisory mode",
    "no identity inference",
    "not a medical device"
  ]
}

Gemma returns structured JSON for the UI:

{
  "status": "trusted",
  "room_interpretation": "The CSI summary supports a stationary occupied-room candidate.",
  "why": [
    "signal quality is usable",
    "motion level is low",
    "activity evidence matches the sitting calibration pattern"
  ],
  "next_action": "Continue monitoring or collect additional labeled sitting windows.",
  "judge_caption": "Trusted Wi-Fi CSI activity state rendered without camera input.",
  "telegram_message": "Trusted CSI: stationary occupied-room candidate detected. No identity inference.",
  "confidence": 0.86
}

This gives the dashboard a consistent contract: status, explanation, reasons, next action, judge-facing caption, Telegram-safe text and confidence.
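A contract like this is only useful if it is checked before the UI renders anything. The following sketch shows one possible check. The field names come from the response above, while the `contract_errors` helper and its error strings are hypothetical.

```python
# Expected type for each field of the advisor response.
REQUIRED_FIELDS = {
    "status": str,
    "room_interpretation": str,
    "why": list,
    "next_action": str,
    "judge_caption": str,
    "telegram_message": str,
    "confidence": (int, float),
}


def contract_errors(response: dict) -> list:
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in response:
            errors.append(f"missing field: {field}")
            continue
        value = response[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for field: {field}")
    confidence = response.get("confidence")
    if (
        isinstance(confidence, (int, float))
        and not isinstance(confidence, bool)
        and not 0.0 <= confidence <= 1.0
    ):
        errors.append("confidence outside [0, 1]")
    return errors
```

If the list is not empty, the backend can discard the model reply and fall back to a locally generated briefing instead of showing partial data.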


Gemma 4 Is Also a Safety Layer

Room sensing is powerful, but it is easy to overclaim.

A noisy signal should not become a confident statement such as “a person is definitely sitting” or “heart rate is accurate.” The system is intentionally built to prevent this.

The Gemma instruction layer explicitly prevents claims about:

  • camera-like vision
  • personal identity
  • medical diagnosis
  • true body-pose reconstruction
  • certainty when signal quality is weak

The application also keeps a deterministic trust gate after the model response.

If the Wi-Fi signal is weak, unstable or blocked, the system does not allow an overconfident explanation to pass through. The final result is downgraded to a warning such as:

The ESP32 stream is visible, but the signal is not trusted enough
for a room-state claim. Improve packet rate and RSSI stability before
trusting the activity visualization.

This means Gemma is not being used merely to produce impressive language. It is being used inside a constrained evidence workflow where unsafe claims are blocked.
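The post-model gate can be very small. Below is a sketch of the idea, assuming the signal quality label from the event shown earlier is the gating input. The `apply_trust_gate` name, the `"warning"` status value and the 0.3 confidence cap are illustrative assumptions.

```python
WARNING_TEXT = (
    "The ESP32 stream is visible, but the signal is not trusted enough "
    "for a room-state claim. Improve packet rate and RSSI stability before "
    "trusting the activity visualization."
)


def apply_trust_gate(event: dict, advice: dict, max_untrusted_confidence: float = 0.3) -> dict:
    """Downgrade the model briefing whenever signal quality is not GOOD."""
    quality = event.get("signal", {}).get("quality", "UNKNOWN")
    if quality == "GOOD":
        return advice
    gated = dict(advice)  # never mutate the original model response
    gated["status"] = "warning"
    gated["room_interpretation"] = WARNING_TEXT
    gated["telegram_message"] = WARNING_TEXT
    # Cap confidence so the UI cannot present a weak signal as a strong claim.
    gated["confidence"] = min(float(advice.get("confidence", 0.0)), max_untrusted_confidence)
    return gated
```

Because the gate runs after the model and does not depend on its wording, a confident-sounding reply cannot override a weak signal.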


Calibration Coach: Gemma Helps Improve the Experiment

A Wi-Fi CSI system depends heavily on calibration. The same physical activity can look different depending on router placement, room layout and ESP32 position.

I collect labeled windows such as:

  • empty
  • sitting
  • walking

The local pipeline stores calibration evidence and evaluates whether enough samples exist for useful classification.

Gemma 4 then works as a calibration coach. It summarizes the current readiness, explains which labels are underrepresented and recommends the next capture needed to improve the system.

For example:

Sitting and empty-room samples are available, but walking evidence is
underrepresented. Capture another walking session with the ESP32 and
router positions unchanged before relying on live activity comparisons.

This makes the experiment easier to improve without hiding the limitations of a single-device prototype.
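The readiness check that feeds the coach can be a simple count per label. This is a minimal sketch. The `calibration_summary` helper and the minimum of 20 windows per label are assumptions chosen for illustration, not values from the project.

```python
from collections import Counter
from typing import Iterable

LABELS = ("empty", "sitting", "walking")


def calibration_summary(window_labels: Iterable[str], min_per_label: int = 20) -> dict:
    """Count labeled windows and flag labels with too few samples."""
    counts = Counter(window_labels)
    per_label = {label: counts.get(label, 0) for label in LABELS}
    underrepresented = [label for label in LABELS if per_label[label] < min_per_label]
    return {
        "counts": per_label,
        "underrepresented": underrepresented,
        "ready": not underrepresented,
    }
```

A compact summary like this is what Gemma would receive, so its coaching text stays tied to the actual sample counts rather than to guesses.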


Smart Telegram Alerts From Real Evidence

The system also connects the live interpretation to a useful action: Telegram alerts.

When presence-like evidence is detected, the application can prepare a short message containing useful information such as:

  • detected condition
  • supporting signal statistics
  • selected filter
  • Gemma confidence
  • safety-aware interpretation

For example:

Trusted CSI: stationary occupied-room candidate detected.
Signal quality is usable and motion is low.
Filter selected: Hampel.
This is RF-based evidence only and does not identify a person.

The Telegram flow is intentionally safe:

  • Alerts are based on processed evidence, not raw camera or microphone data.
  • A cooldown prevents repeated alert spam.
  • In the Observatory UI, the prepared message is visible before sending.
  • Delivery requires an explicit user action.
  • The interface returns a masked acknowledgment rather than exposing Telegram credentials.

So Gemma does not only explain the result on screen. It helps turn noisy sensor evidence into a concise, safe and actionable notification.
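The cooldown and the masked acknowledgment can be sketched without any network code. The class and function names below are hypothetical, the 60-second default is an assumption, and the actual delivery call is deliberately left out.

```python
import time
from typing import Callable, Optional


class AlertCooldown:
    """Allow at most one alert per cooldown period."""

    def __init__(self, cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_sent: Optional[float] = None

    def allow(self) -> bool:
        """Return True and start a new cooldown if an alert may be sent now."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._cooldown:
            return False
        self._last_sent = now
        return True


def build_alert(condition: str, filter_name: str, confidence: float) -> str:
    """Compose the short operator message shown before sending."""
    return (
        f"Trusted CSI: {condition} detected.\n"
        f"Filter selected: {filter_name}. Gemma confidence: {confidence:.2f}.\n"
        "This is RF-based evidence only and does not identify a person."
    )


def masked_ack(chat_id: str) -> dict:
    """Acknowledge delivery while hiding all but the last two characters."""
    visible = chat_id[-2:] if len(chat_id) > 2 else ""
    return {"delivered": True, "chat": "*" * (len(chat_id) - len(visible)) + visible}
```

In this sketch `allow()` would be called only after the operator confirms sending, so the cooldown limits explicit deliveries rather than background checks.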


Why I Chose Gemma 4 31B Dense

I used:

Primary model: gemma-4-31b-it
Fallback model: gemma-4-26b-a4b-it

I chose Gemma 4 31B Dense because this project needs careful reasoning over uncertain evidence, not decorative text generation.

The model must:

  • select an appropriate filter strategy from numerical signal statistics
  • return strict structured JSON for the application
  • explain why the evidence supports a candidate state
  • avoid overclaiming when the signal is weak
  • produce short operator-facing and Telegram-safe messages
  • assist calibration decisions using compact experiment summaries

For the final operator briefing, reasoning quality matters more than generating many responses quickly. The 31B Dense model is therefore used as the primary advisor.

I configured Gemma 4 26B MoE as a fallback model so the system can still return a hosted Gemma result when the primary route is unavailable or unsuitable for the current request.

Both model paths are wrapped with:

  • structured JSON responses
  • low temperature settings for repeatable output
  • bounded input summaries
  • local fallback rules
  • post-model safety gates

This lets Gemma contribute meaningfully while the system remains stable and inspectable.
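The routing between the primary model, the fallback model and the local rules can be expressed as a short chain. In this sketch the model callables are stand-ins that I inject as plain functions, so no hosted API is shown. The `advise` and `local_rule` names and the 0.10 and 2.0 thresholds are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

ALLOWED_FILTERS = ("moving_average", "median", "hampel", "lowpass", "none")


def local_rule(features: dict) -> dict:
    """Deterministic last-resort choice when no model answer is usable."""
    if features.get("outlier_ratio", 0.0) >= 0.10:  # assumed threshold
        name = "hampel"
    elif features.get("signal_std", 0.0) >= 2.0:  # assumed threshold
        name = "lowpass"
    else:
        name = "none"
    return {"filter": name, "window_size": 5, "outlier_threshold": 3.0,
            "lowpass_alpha": 0.25, "confidence": 0.0, "reason": "local rule"}


def advise(features: dict,
           models: Sequence[Tuple[str, Callable[[dict], dict]]]) -> dict:
    """Try each model in order, then fall back to the local rule."""
    for name, call_model in models:
        try:
            decision = call_model(features)
        except Exception:
            continue  # unavailable or failing route: try the next one
        if isinstance(decision, dict) and decision.get("filter") in ALLOWED_FILTERS:
            return {**decision, "source": name}
    return {**local_rule(features), "source": "local_rule"}
```

A call would pass the routes in priority order, for example `[("gemma-4-31b-it", call_primary), ("gemma-4-26b-a4b-it", call_fallback)]`, where both callables are placeholders for whatever client code wraps the hosted models.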


Architecture

┌─────────────────────────────┐
│ ESP32 DevKit V1             │
│ Live Wi-Fi CSI packets      │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Python Signal Pipeline      │
│ Parse packets + features    │
│ variance / RSSI / outliers  │
└──────────────┬──────────────┘
               │ compact numerical evidence
               ▼
┌─────────────────────────────┐
│ Gemma 4 Filter Advisor      │
│ Select filter + parameters  │
└──────────────┬──────────────┘
               │ structured decision
               ▼
┌─────────────────────────────┐
│ Deterministic DSP Tools     │
│ Median / Hampel / Low-pass  │
│ Moving average              │
└──────────────┬──────────────┘
               │ cleaned evidence
               ▼
┌─────────────────────────────┐
│ Gemma 4 Explanation Layer   │
│ Trust-aware briefing        │
│ Calibration guidance        │
│ Telegram-safe message       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ React + Three.js Observatory│
│ Timeline / Avatar / Alerts  │
└─────────────────────────────┘

Code

Repository:

https://github.com/priyanshuchawda/esp32-ai-builder

Main parts of the project:

  • src/, include/, platformio.ini: ESP32 firmware workspace for CSI streaming.
  • esp32-csi-gemma-filter/python-engine/gemma_advisor.py: Gemma 4 advisor selecting mathematical filtering strategies from summary features.
  • esp32-csi-gemma-filter/python-engine/filters.py: deterministic moving average, median, Hampel and low-pass filtering implementations.
  • esp32-csi-gemma-filter/python-engine/app.py: live serial processing pipeline connecting ESP32 evidence, Gemma decisions, filtering and alerts.
  • backend/ai_advice.py: Gemma-powered Observatory explanation layer with structured output and trust gating.
  • backend/telegram_delivery.py: explicit Telegram delivery with masked acknowledgment.
  • frontend/: React/Vite/Three.js Observatory UI showing live evidence, Gemma explanation, calibration state and alert actions.


Validation

I tested the system end to end with an ESP32 DevKit V1 connected on COM5.
Complete path:

Real ESP32 packets
→ numerical feature extraction
→ Gemma 4 filter decision
→ mathematical filtering
→ activity candidate
→ trust-aware Gemma explanation
→ Observatory visualization
→ Telegram delivery action

Why This Matters

Many room-awareness systems rely on cameras, microphones or specialized radar equipment.

This project explores a different direction:

  • no camera frames
  • no face images
  • no microphone recording
  • one low-cost ESP32
  • local numerical signal processing
  • Gemma 4 reasoning over compact evidence
  • explicit safety limits
  • actionable operator notifications

The goal is not to pretend Wi-Fi signals provide perfect vision.

The goal is to show that a small, inexpensive sensor plus a carefully constrained open model can turn uncertain RF evidence into something understandable and useful, while remaining respectful of privacy.


Final Thoughts

The most important part of this project is not the dashboard animation or the ESP32 alone.

It is the complete reasoning loop:

The ESP32 observes invisible Wi-Fi changes.

Mathematical tools extract and clean evidence.

Gemma 4 decides how to interpret that evidence safely.

The Observatory makes it understandable.

Telegram makes it actionable.

Gemma 4 transformed this from a raw RF experiment into an evidence-grounded, privacy-aware room intelligence system.

One ESP32 cannot see like a camera.

But with careful signal processing and Gemma 4 reasoning, it can explain what the room signal is trying to say.
