This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
I built GemmaEdge Hub, a two-device local AI vision system that keeps routine webcam analysis on an edge device and escalates harder cases to a stronger local machine.
The edge device runs Gemma 4 E2B locally for fast, private inference. When a frame is uncertain, safety-relevant, or due for a periodic audit, it sends that frame to a Mac Mini for deeper analysis. The Mac Mini also hosts a live dashboard showing the edge answer, escalated answer, confidence values, latency, and recent frames.
The core idea is simple: use the small model for the common path, and only spend bigger-model compute when the situation deserves it.
This architecture is useful for:
- Home or small-office monitoring, where ordinary frames stay local but possible smoke, fire, injury, or unusual activity gets reviewed.
- Workshop and lab safety, where an edge device can watch for risky visual cues near equipment without sending every frame across the network.
- Accessibility assistance, where quick local scene descriptions can be escalated when a scene is ambiguous or safety-related.
- Retail or front-desk awareness, where routine activity can be summarized locally and unusual situations can be logged for review.
- Edge AI prototyping, because the project makes it easy to experiment with model routing, escalation policies, and prompt-based upskilling.
Demo
The live demo runs across two Macs on the same local network:
- The MacBook Air captures webcam frames.
- Gemma 4 E2B gives a fast local answer with a confidence score.
- Routine frames stay on the edge device.
- Uncertain, safety-relevant, or audited frames are escalated.
- The Mac Mini analyzes the escalated frame and updates the dashboard in real time.
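Roughly, the hand-off from the MacBook Air to the Mac Mini looks like the sketch below. It is a simplified stand-in for air/client.py, not the actual repository code; the server address, endpoint path, and field names are assumptions:

```python
import base64

import cv2        # OpenCV for webcam capture
import requests

MAC_MINI_URL = "http://192.168.1.50:8000/escalate"  # assumed address of the Mac Mini server

def capture_frame() -> bytes:
    """Grab a single webcam frame and return it as JPEG bytes."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read from webcam")
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("could not encode frame")
    return jpeg.tobytes()

def escalate(frame_jpeg: bytes, edge_answer: str, confidence: float) -> dict:
    """Send an uncertain frame plus the edge answer to the Mac Mini for review."""
    payload = {
        "frame_b64": base64.b64encode(frame_jpeg).decode("ascii"),
        "edge_answer": edge_answer,
        "edge_confidence": confidence,
    }
    resp = requests.post(MAC_MINI_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()  # stronger-model answer, shown on the dashboard
```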
Dashboard during the demo:
http://localhost:8000
Code
Repository:
https://github.com/Prerak1520/gemmaedge-hub
Main files:
- air/sensor.py: webcam capture, local inference, and escalation decisions
- air/client.py: HTTP client for sending escalations to the Mac Mini
- mac/server.py: FastAPI server, stronger-model inference, and live dashboard
- mac/upskill_train.py: teacher-student prompt optimization
- shared/protocol.py: shared request/response schema
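For context, the contract between the two machines and the server's escalation endpoint look roughly like this. This is a minimal sketch in the spirit of shared/protocol.py and mac/server.py, not the real schema; the model names, field names, and the review_frame stub are assumptions:

```python
import time

from fastapi import FastAPI
from pydantic import BaseModel

# Request/response models in the spirit of shared/protocol.py
# (field names here are illustrative assumptions).
class EscalationRequest(BaseModel):
    frame_b64: str          # base64-encoded JPEG from the edge device
    edge_answer: str        # the local Gemma answer
    edge_confidence: float  # self-reported confidence in [0, 1]
    reason: str             # e.g. "low_confidence", "safety_keyword", "audit"

class EscalationResponse(BaseModel):
    server_answer: str      # stronger-model answer from the Mac Mini
    agrees_with_edge: bool
    latency_ms: float

def review_frame(frame_b64: str, edge_answer: str) -> str:
    """Stub for the stronger local Gemma call; the real server decodes the
    frame, runs the bigger model, and updates the dashboard."""
    return edge_answer

app = FastAPI()

@app.post("/escalate", response_model=EscalationResponse)
def escalate(req: EscalationRequest) -> EscalationResponse:
    start = time.perf_counter()
    answer = review_frame(req.frame_b64, req.edge_answer)
    return EscalationResponse(
        server_answer=answer,
        agrees_with_edge=(answer.strip() == req.edge_answer.strip()),
        latency_ms=(time.perf_counter() - start) * 1000,
    )
```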
How I Used Gemma 4
I chose Gemma 4 E2B for the edge device because it is small enough to run locally with low latency while keeping routine camera frames private. That made it the right fit for an edge-first vision workflow.
Gemma 4 powers the main loop:
- The edge model describes each webcam frame.
- The system extracts a confidence signal.
- Escalation logic decides whether the local answer is enough.
- Safety keywords and periodic audits catch overconfident answers.
- A stronger local Gemma model can review harder cases on the Mac Mini.
One important lesson was that self-reported confidence alone is not enough. During testing, the small model often returned high confidence even when the answer still deserved review. I updated the system so escalation considers low confidence, safety-relevant keywords, and periodic audits of overconfident answers.
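The resulting escalation policy can be summarized in a few lines. This is a hedged sketch of the idea rather than the exact logic in air/sensor.py; the keyword list, confidence threshold, and audit interval are assumed values:

```python
SAFETY_KEYWORDS = {"smoke", "fire", "flame", "injury", "blood", "fall"}  # illustrative list
CONFIDENCE_THRESHOLD = 0.7  # assumed threshold
AUDIT_EVERY_N = 50          # assumed audit interval (every 50th confident frame)

def should_escalate(answer: str, confidence: float, frame_index: int) -> tuple[bool, str]:
    """Return (escalate?, reason) for one edge answer."""
    text = answer.lower()
    if any(word in text for word in SAFETY_KEYWORDS):
        return True, "safety_keyword"      # safety-relevant content is always reviewed
    if confidence < CONFIDENCE_THRESHOLD:
        return True, "low_confidence"      # the edge model is unsure
    if frame_index % AUDIT_EVERY_N == 0:
        return True, "audit"               # periodic audit catches overconfident answers
    return False, "local_only"
```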
I also added a teacher-student upskilling step. The Mac Mini generates and scores improved system prompts for the smaller edge model, then the winning prompt is copied back to the edge device as skill.txt. This improves the edge model's behavior without fine-tuning weights.
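The upskilling step works roughly like the sketch below: the teacher proposes candidate system prompts, each candidate is scored on a small evaluation set, and the best one is written to skill.txt. The function names, the scoring rule, and the teacher/student callables are assumptions, not the actual mac/upskill_train.py code:

```python
from pathlib import Path
from typing import Callable, List

def generate_candidates(teacher: Callable[[str], str], base_prompt: str, n: int = 5) -> List[str]:
    """Ask the stronger model (teacher) to propose improved system prompts."""
    ask = f"Rewrite this vision system prompt to be clearer and more cautious:\n{base_prompt}"
    return [teacher(ask) for _ in range(n)]

def score(student: Callable, prompt: str,
          eval_frames: list, reference_answers: List[str]) -> float:
    """Fraction of eval frames where the edge model matches the teacher's answer."""
    hits = sum(student(prompt, frame).strip() == ref.strip()
               for frame, ref in zip(eval_frames, reference_answers))
    return hits / len(eval_frames)

def upskill(teacher, student, base_prompt: str, eval_frames: list,
            reference_answers: List[str]) -> str:
    """Pick the best-scoring prompt and copy it back to the edge device as skill.txt."""
    candidates = [base_prompt] + generate_candidates(teacher, base_prompt)
    best = max(candidates, key=lambda p: score(student, p, eval_frames, reference_answers))
    Path("skill.txt").write_text(best)
    return best
```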
Why This Fits the Build Criteria
Intentional and effective use of Gemma 4: Gemma 4 is central to the system. E2B handles fast local inference where privacy and responsiveness matter most, while escalation gives harder cases more reasoning power.
Technical implementation and code quality: The project includes separate edge and server modules, shared Pydantic protocol models, FastAPI escalation, configurable audit behavior, safer dashboard rendering, and clear setup docs.
Creativity and originality: Instead of building a single-model demo, this treats local AI like a small distributed system with routing, auditing, and teacher-student prompt improvement.
Usability and user experience: The dashboard makes the system understandable in real time by showing local answers, escalated answers, confidence, latency, and recent frames.
What I Learned
The biggest design lesson was that model orchestration matters as much as model choice. A small local model is great for privacy and responsiveness, but it needs a good policy for knowing when to ask for help. A larger local model is powerful, but it is too slow and expensive to run on every frame.
GemmaEdge Hub combines both: private edge inference by default, stronger local reasoning when needed, and a dashboard that makes the escalation path visible.


