This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
What if a room-awareness system could recognize meaningful activity without capturing a single image or recording a single sound?
I built ESP32 Wi-Fi CSI Observatory, a camera-free spatial sensing system that uses Wi-Fi Channel State Information from a low-cost ESP32 DevKit V1 to observe changes inside a room.
Wi-Fi signals naturally change when a person enters, sits, walks or moves nearby. But those signals are noisy. Raw CSI packets contain spikes, unstable readings, variable packet rates and signal fluctuations that are difficult for a human operator to understand directly.
That is where Gemma 4 becomes central to the project.
Gemma 4 does not pretend to be a camera. It does not receive face images, video frames or identity information. Instead, it receives compact signal statistics produced from the ESP32 stream, selects an appropriate mathematical filtering strategy, interprets the cleaned evidence safely, explains what the system can and cannot claim, and prepares an actionable Telegram message for the operator.
The complete flow is:
Real room movement
↓
ESP32 Wi-Fi CSI packets
↓
Local Python feature extraction
↓
Gemma 4 filter and parameter decision
↓
Deterministic mathematical signal filtering
↓
Occupancy / activity evidence
↓
Gemma 4 safe explanation
↓
React Observatory + Telegram action
The result is a live system where invisible RF changes become readable, explainable and actionable, all without a camera.
Demo
Live ESP32 Observatory
The terminal receiver proves that the sensing input is real. During the final demonstration, an ESP32 DevKit V1 connected on COM5 streamed live CSI packets into the processing pipeline. The system displayed packet activity, RSSI, subcarrier behavior, motion energy, exploratory rhythm estimates and activity candidates while the Observatory visualized the same evidence in real time.
The Problem: Wi-Fi Signals Are Useful, but Noisy
An ESP32 can observe changes in Wi-Fi signal behavior, but the raw stream is not immediately useful.
A real CSI window may contain:
- sudden spikes and outliers
- unstable RSSI changes
- short-term jitter
- missing or weak packets
- movement patterns mixed with background noise
A fixed filter is not always ideal. A window with sharp outliers may need different treatment from a window with smooth but continuously noisy variation.
Instead of applying one hard-coded smoothing strategy to every case, I used Gemma 4 as a signal-filtering advisor.
The local pipeline first computes compact numerical features such as:
{
  "sample_count": 29,
  "outlier_ratio": 0.18,
  "signal_std": 2.41,
  "signal_variance": 5.81,
  "rssi_std": 1.27,
  "missing_count": 0
}
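As a rough illustration of this step, the sketch below computes the same six fields for one window using only the Python standard library. The function name `extract_features`, the use of `None` to mark a missing sample, and the median-absolute-deviation outlier rule with a cutoff of 3.0 are assumptions made for illustration; the repository's own feature code may differ.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales the median absolute deviation to a std-like estimate


def extract_features(amplitudes, rssi, outlier_k=3.0):
    """Summarize one CSI window. `None` entries count as missing samples (assumption)."""
    valid = [a for a in amplitudes if a is not None]
    if not valid:
        raise ValueError("window contains no usable samples")

    median = statistics.median(valid)
    mad = statistics.median([abs(a - median) for a in valid])
    limit = outlier_k * MAD_TO_SIGMA * mad
    # A zero MAD gives no usable scale, so this sketch flags nothing in that case.
    outliers = sum(1 for a in valid if abs(a - median) > limit) if mad > 0 else 0

    return {
        "sample_count": len(valid),
        "outlier_ratio": round(outliers / len(valid), 2),
        "signal_std": round(statistics.pstdev(valid), 2),
        "signal_variance": round(statistics.pvariance(valid), 2),
        "rssi_std": round(statistics.pstdev(rssi), 2),
        "missing_count": len(amplitudes) - len(valid),
    }
```

Only this summary, not the raw samples, would then be passed to the model.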
Gemma 4 then chooses the most suitable filtering approach and parameters from an allowed set:
- moving_average
- median
- hampel
- lowpass
- none
A structured Gemma decision looks like this:
{
  "filter": "hampel",
  "window_size": 5,
  "outlier_threshold": 3.0,
  "lowpass_alpha": 0.25,
  "confidence": 0.87,
  "reason": "The signal window contains spike-like outliers, so local median replacement is appropriate before interpretation."
}
This is an important architectural choice:
Gemma 4 does not fabricate sensor values and does not directly modify raw signal samples. It reasons over numerical evidence and selects the mathematical processing strategy. Deterministic Python code then applies that filter.
That separation keeps the pipeline inspectable and reliable.
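One way to enforce that separation is to validate the model's JSON before any filter runs. The sketch below is a minimal example, not the repository's code: `validate_decision`, `SAFE_DEFAULT` and the numeric bounds (window 3 to 31, threshold 1.0 to 6.0, alpha 0.01 to 1.0) are hypothetical choices. Only the five filter names and the field names come from the decision shown above.

```python
import json
import math

ALLOWED_FILTERS = ("moving_average", "median", "hampel", "lowpass", "none")

# Hypothetical safe default used whenever the model output cannot be trusted.
SAFE_DEFAULT = {
    "filter": "none", "window_size": 5, "outlier_threshold": 3.0,
    "lowpass_alpha": 0.25, "confidence": 0.0,
    "reason": "fallback: invalid model output",
}


def _bounded(value, low, high, default):
    """Clamp a finite number into [low, high]; return default for anything else."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return default
    if not math.isfinite(value):
        return default
    return min(max(float(value), low), high)


def validate_decision(raw_text):
    """Parse the model reply and keep only allowed, bounded values."""
    try:
        data = json.loads(raw_text)
    except (TypeError, ValueError):
        return dict(SAFE_DEFAULT)
    if not isinstance(data, dict) or data.get("filter") not in ALLOWED_FILTERS:
        return dict(SAFE_DEFAULT)

    window = int(_bounded(data.get("window_size"), 3, 31, 5))
    if window % 2 == 0:
        window += 1  # an odd window stays centred on one sample
    return {
        "filter": data["filter"],
        "window_size": window,
        "outlier_threshold": _bounded(data.get("outlier_threshold"), 1.0, 6.0, 3.0),
        "lowpass_alpha": _bounded(data.get("lowpass_alpha"), 0.01, 1.0, 0.25),
        "confidence": _bounded(data.get("confidence"), 0.0, 1.0, 0.0),
        "reason": str(data.get("reason", ""))[:300],
    }
```

With a check like this, an unknown filter name or malformed reply degrades to "no filtering" instead of reaching the signal code.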
How Gemma 4 Improves the Signal Pipeline
The project combines model reasoning with real numerical tools.
1. Mathematical filtering remains deterministic
The local Python engine implements the actual signal-processing methods:
- Moving average for general smoothing.
- Median filtering for short spike noise.
- Hampel filtering for statistical outlier replacement.
- Low-pass filtering for reducing rapid noise while preserving slower movement trends.
Gemma 4 selects which method is appropriate for the current signal window and recommends parameters such as filter window size, outlier threshold or low-pass alpha.
For example:
High outlier ratio
→ Gemma chooses median or Hampel filtering
High continuous signal variation
→ Gemma chooses moving average or low-pass filtering
Clean, stable signal
→ Gemma chooses no unnecessary filtering
This makes Gemma useful before the final explanation step. It helps determine how the system should clean and prepare uncertain Wi-Fi evidence before displaying an interpretation.
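The four deterministic methods can be sketched in plain Python as follows. This is an illustrative version under stated assumptions rather than the contents of filters.py: the function names are hypothetical, windows are truncated at the edges, and the low-pass filter is written as first-order exponential smoothing because the decision format exposes a `lowpass_alpha` parameter.

```python
import statistics


def _windows(samples, window_size):
    """Yield a centred window per sample, truncated at the edges."""
    half = window_size // 2
    for i in range(len(samples)):
        yield samples[max(0, i - half): i + half + 1]


def moving_average(samples, window_size=5):
    return [sum(w) / len(w) for w in _windows(samples, window_size)]


def median_filter(samples, window_size=5):
    return [statistics.median(w) for w in _windows(samples, window_size)]


def hampel(samples, window_size=5, outlier_threshold=3.0):
    """Replace a sample with the local median when it deviates too far from it."""
    cleaned = []
    for x, w in zip(samples, _windows(samples, window_size)):
        med = statistics.median(w)
        mad = statistics.median([abs(v - med) for v in w])
        limit = outlier_threshold * 1.4826 * mad
        cleaned.append(med if abs(x - med) > limit else x)
    return cleaned


def lowpass(samples, alpha=0.25):
    """Exponential smoothing: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    out = []
    for x in samples:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def apply_decision(samples, decision):
    """Run the filter named in a validated decision dict."""
    name = decision["filter"]
    if name == "moving_average":
        return moving_average(samples, decision["window_size"])
    if name == "median":
        return median_filter(samples, decision["window_size"])
    if name == "hampel":
        return hampel(samples, decision["window_size"], decision["outlier_threshold"])
    if name == "lowpass":
        return lowpass(samples, decision["lowpass_alpha"])
    if name == "none":
        return list(samples)
    raise ValueError(f"unknown filter: {name}")
```

Because the model only picks a name and parameters, the same input window and decision always produce the same cleaned output.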
2. Gemma converts numerical evidence into meaning
Even after filtering, numbers such as variance, RSSI deviation, activity distance and motion energy are not naturally understandable to most people.
Without Gemma, the operator might only see:
signal_variance=5.81
signal_std=2.41
rssi_std=1.27
activity_candidate=sitting
quality=GOOD
With Gemma 4, the same evidence becomes a safe briefing:
The RF evidence is consistent with a stationary occupied room.
Signal quality is usable and motion energy is low, which supports
a sitting activity candidate. This is not visual confirmation and
cannot identify any individual.
That is the real value of Gemma in this project: it transforms technical RF measurements into understandable human decisions while preserving uncertainty.
What Gemma 4 Actually Receives
The explanation model never receives camera frames or personal identity data.
For the Observatory, Gemma receives a compact structured event containing only the evidence required for interpretation:
{
  "source": "live_esp32",
  "signal": {
    "quality": "GOOD",
    "packet_count": 1007,
    "reasons": ["stable packet stream", "usable signal variance"]
  },
  "motion": {
    "state": "low_motion",
    "display_level": "low"
  },
  "persons": {
    "range": "candidate_present"
  },
  "visual": {
    "trust": "trusted",
    "pose_state": "sitting_candidate"
  },
  "limitations": [
    "single ESP advisory mode",
    "no identity inference",
    "not a medical device"
  ]
}
Gemma returns structured JSON for the UI:
{
  "status": "trusted",
  "room_interpretation": "The CSI summary supports a stationary occupied-room candidate.",
  "why": [
    "signal quality is usable",
    "motion level is low",
    "activity evidence matches the sitting calibration pattern"
  ],
  "next_action": "Continue monitoring or collect additional labeled sitting windows.",
  "judge_caption": "Trusted Wi-Fi CSI activity state rendered without camera input.",
  "telegram_message": "Trusted CSI: stationary occupied-room candidate detected. No identity inference.",
  "confidence": 0.86
}
This gives the dashboard a consistent contract: status, explanation, reasons, next action, judge-facing caption, Telegram-safe text and confidence.
Gemma 4 Is Also a Safety Layer
Room sensing is powerful, but it is easy to overclaim.
A noisy signal should not become a confident statement such as “a person is definitely sitting” or “heart rate is accurate.” The system is intentionally built to prevent this.
The Gemma instruction layer explicitly prevents claims about:
- camera-like vision
- personal identity
- medical diagnosis
- true body-pose reconstruction
- certainty when signal quality is weak
The application also keeps a deterministic trust gate after the model response.
If the Wi-Fi signal is weak, unstable or blocked, the system does not allow an overconfident explanation to pass through. The final result is downgraded to a warning such as:
The ESP32 stream is visible, but the signal is not trusted enough
for a room-state claim. Improve packet rate and RSSI stability before
trusting the activity visualization.
This means Gemma is not being used merely to produce impressive language. It is being used inside a constrained evidence workflow where unsafe claims are blocked.
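A deterministic gate of this kind can be very small. The sketch below assumes that any signal quality other than "GOOD" counts as untrusted and caps confidence at 0.3; both rules, and the name `apply_trust_gate`, are hypothetical. The warning text is the one quoted above, and the field names follow the event and response examples.

```python
UNTRUSTED_WARNING = (
    "The ESP32 stream is visible, but the signal is not trusted enough "
    "for a room-state claim. Improve packet rate and RSSI stability before "
    "trusting the activity visualization."
)


def apply_trust_gate(event, advice, max_untrusted_confidence=0.3):
    """Downgrade a model response when the signal evidence is not trusted."""
    quality = event.get("signal", {}).get("quality")
    if quality == "GOOD":  # assumption: only GOOD quality may pass unchanged
        return advice

    gated = dict(advice)  # leave the original model response untouched
    gated["status"] = "warning"
    gated["room_interpretation"] = UNTRUSTED_WARNING
    gated["telegram_message"] = "Untrusted CSI: no room-state claim. Check the ESP32 link."
    gated["confidence"] = min(float(advice.get("confidence", 0.0)), max_untrusted_confidence)
    return gated
```

Running the gate after the model call means a confident-sounding response cannot override weak signal evidence.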
Calibration Coach: Gemma Helps Improve the Experiment
A Wi-Fi CSI system depends heavily on calibration. The same physical activity can look different depending on router placement, room layout and ESP32 position.
I collect labeled windows such as:
- empty
- sitting
- walking
The local pipeline stores calibration evidence and evaluates whether enough samples exist for useful classification.
Gemma 4 then works as a calibration coach. It summarizes the current readiness, explains which labels are underrepresented and recommends the next capture needed to improve the system.
For example:
Sitting and empty-room samples are available, but walking evidence is
underrepresented. Capture another walking session with the ESP32 and
router positions unchanged before relying on live activity comparisons.
This makes the experiment easier to improve without hiding the limitations of a single-device prototype.
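The readiness check that feeds the coach could look like the sketch below. The function name `calibration_readiness` and the minimum of 10 windows per label are assumptions for illustration; only the three labels come from the project.

```python
from collections import Counter

LABELS = ("empty", "sitting", "walking")


def calibration_readiness(window_labels, min_per_label=10):
    """Count labeled windows and report which labels still need more captures."""
    counts = Counter(window_labels)
    shortfall = {
        label: min_per_label - counts[label]
        for label in LABELS
        if counts[label] < min_per_label
    }
    return {
        "counts": {label: counts[label] for label in LABELS},
        "ready": not shortfall,
        "underrepresented": shortfall,  # label -> number of windows still missing
    }
```

A compact summary like this is what the model would turn into a recommendation such as the walking example above.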
Smart Telegram Alerts From Real Evidence
The system also connects the live interpretation to a useful action: Telegram alerts.
When presence-like evidence is detected, the application can prepare a short message containing useful information such as:
- detected condition
- supporting signal statistics
- selected filter
- Gemma confidence
- safety-aware interpretation
For example:
Trusted CSI: stationary occupied-room candidate detected.
Signal quality is usable and motion is low.
Filter selected: Hampel.
This is RF-based evidence only and does not identify a person.
The Telegram flow is intentionally safe:
- Alerts are based on processed evidence, not raw camera or microphone data.
- A cooldown prevents repeated alert spam.
- In the Observatory UI, the prepared message is visible before sending.
- Delivery requires an explicit user action.
- The interface returns a masked acknowledgment rather than exposing Telegram credentials.
So Gemma does not only explain the result on screen. It helps turn noisy sensor evidence into a concise, safe and actionable notification.
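The cooldown and the explicit-action rule can be combined in one small gate. In the sketch below, `AlertGate`, the 300-second default and the injected `send` callable are hypothetical; the real delivery code in backend/telegram_delivery.py is not reproduced here.

```python
import time


class AlertGate:
    """Allow a send only after operator confirmation and outside the cooldown."""

    def __init__(self, cooldown_s=300.0, clock=time.monotonic):
        self._cooldown_s = cooldown_s
        self._clock = clock  # injectable so the gate can be tested without waiting
        self._last_sent = None

    def try_send(self, message, send, confirmed):
        """Return 'sent', 'cooldown' or 'needs_confirmation'."""
        if not confirmed:
            return "needs_confirmation"
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._cooldown_s:
            return "cooldown"
        send(message)  # e.g. a function that posts to Telegram (not shown)
        self._last_sent = now
        return "sent"
```

The caller supplies the delivery function, so credentials never need to pass through the gate or appear in its return value.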
Why I Chose Gemma 4 31B Dense
I used:
Primary model: gemma-4-31b-it
Fallback model: gemma-4-26b-a4b-it
I chose Gemma 4 31B Dense because this project needs careful reasoning over uncertain evidence, not decorative text generation.
The model must:
- select an appropriate filter strategy from numerical signal statistics
- return strict structured JSON for the application
- explain why the evidence supports a candidate state
- avoid overclaiming when the signal is weak
- produce short operator-facing and Telegram-safe messages
- assist calibration decisions using compact experiment summaries
For the final operator briefing, reasoning quality matters more than generating many responses quickly. The 31B Dense model is therefore used as the primary advisor.
I configured Gemma 4 26B MoE as a fallback model so the system can still return a hosted Gemma result when the primary route is unavailable or unsuitable for the current request.
Both model paths are wrapped with:
- structured JSON responses
- deterministic temperature settings
- bounded input summaries
- local fallback rules
- post-model safety gates
This lets Gemma contribute meaningfully while the system remains stable and inspectable.
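The fallback order described here (primary model, fallback model, then local rules) can be sketched as below. The model calls are passed in as plain callables so the example stays self-contained; `decide_with_fallback`, `rule_based_decision` and the thresholds 0.10 and 2.0 are hypothetical and only mirror the outlier and variation rules of thumb given earlier.

```python
ALLOWED_FILTERS = ("moving_average", "median", "hampel", "lowpass", "none")


def rule_based_decision(features):
    """Local rules: spikes -> hampel, continuous noise -> lowpass, else none."""
    if features.get("outlier_ratio", 0.0) >= 0.10:  # hypothetical threshold
        name = "hampel"
    elif features.get("signal_std", 0.0) >= 2.0:  # hypothetical threshold
        name = "lowpass"
    else:
        name = "none"
    return {
        "filter": name, "window_size": 5, "outlier_threshold": 3.0,
        "lowpass_alpha": 0.25, "confidence": 0.5,
        "reason": "local rule fallback",
    }


def decide_with_fallback(features, model_calls):
    """Try each model callable in order; use local rules if none gives a usable answer."""
    for call in model_calls:
        try:
            decision = call(features)
        except Exception:
            continue  # network error, timeout, bad JSON: move to the next option
        if isinstance(decision, dict) and decision.get("filter") in ALLOWED_FILTERS:
            return decision
    return rule_based_decision(features)
```

In use, `model_calls` would hold one wrapper for gemma-4-31b-it followed by one for gemma-4-26b-a4b-it, so the pipeline still produces a decision when both hosted routes fail.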
Architecture
┌─────────────────────────────┐
│ ESP32 DevKit V1             │
│ Live Wi-Fi CSI packets      │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Python Signal Pipeline      │
│ Parse packets + features    │
│ variance / RSSI / outliers  │
└──────────────┬──────────────┘
               │ compact numerical evidence
               ▼
┌─────────────────────────────┐
│ Gemma 4 Filter Advisor      │
│ Select filter + parameters  │
└──────────────┬──────────────┘
               │ structured decision
               ▼
┌─────────────────────────────┐
│ Deterministic DSP Tools     │
│ Median / Hampel / Low-pass  │
│ Moving average              │
└──────────────┬──────────────┘
               │ cleaned evidence
               ▼
┌─────────────────────────────┐
│ Gemma 4 Explanation Layer   │
│ Trust-aware briefing        │
│ Calibration guidance        │
│ Telegram-safe message       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│ React + Three.js Observatory│
│ Timeline / Avatar / Alerts  │
└─────────────────────────────┘
Code
Repository:
https://github.com/priyanshuchawda/esp32-ai-builder
Main parts of the project:
- src/, include/, platformio.ini: ESP32 firmware workspace for CSI streaming.
- esp32-csi-gemma-filter/python-engine/gemma_advisor.py: Gemma 4 advisor selecting mathematical filtering strategies from summary features.
- esp32-csi-gemma-filter/python-engine/filters.py: deterministic moving average, median, Hampel and low-pass filtering implementations.
- esp32-csi-gemma-filter/python-engine/app.py: live serial processing pipeline connecting ESP32 evidence, Gemma decisions, filtering and alerts.
- backend/ai_advice.py: Gemma-powered Observatory explanation layer with structured output and trust gating.
- backend/telegram_delivery.py: explicit Telegram delivery with masked acknowledgment.
- frontend/: React/Vite/Three.js Observatory UI showing live evidence, Gemma explanation, calibration state and alert actions.
Validation
I tested the system end to end with an ESP32 DevKit V1 connected on COM5.
Complete path:
Real ESP32 packets
→ numerical feature extraction
→ Gemma 4 filter decision
→ mathematical filtering
→ activity candidate
→ trust-aware Gemma explanation
→ Observatory visualization
→ Telegram delivery action
Why This Matters
Many room-awareness systems rely on cameras, microphones or specialized radar equipment.
This project explores a different direction:
- no camera frames
- no face images
- no microphone recording
- one low-cost ESP32
- local numerical signal processing
- Gemma 4 reasoning over compact evidence
- explicit safety limits
- actionable operator notifications
The goal is not to pretend Wi-Fi signals provide perfect vision.
The goal is to show that a small, inexpensive sensor plus a carefully constrained open model can turn uncertain RF evidence into something understandable and useful, while remaining respectful of privacy.
Final Thoughts
The most important part of this project is not the dashboard animation or the ESP32 alone.
It is the complete reasoning loop:
The ESP32 observes invisible Wi-Fi changes.
Mathematical tools extract and clean evidence.
Gemma 4 decides how to interpret that evidence safely.
The Observatory makes it understandable.
Telegram makes it actionable.
Gemma 4 transformed this from a raw RF experiment into an evidence-grounded, privacy-aware room intelligence system.
One ESP32 cannot see like a camera.
But with careful signal processing and Gemma 4 reasoning, it can explain what the room signal is trying to say.

