
Marco Sbragi

GemmaLink: Your Private Eye Assistant


This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Most local AI setups are currently a dependency nightmare, forcing users into heavy Python environments, massive CUDA toolkits, or complex Docker configurations. I built GemmaLink to achieve the exact opposite: a "Zero-Cloud", local-first assistant that turns your smartphone into a targeted vision sensor for local VLMs, running entirely on a standard PC with a single-file, plug-and-play binary.

GemmaLink allows you to open a web interface on your smartphone, point it at an object, capture a precise crop via an interactive on-screen viewfinder, and chat about what the camera sees with a local model running on your machine.
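
For context, here is a minimal sketch of what that phone-side flow can look like in the browser; the `/api/ask` endpoint, the form field names, and the helper functions are illustrative placeholders of my own, not GemmaLink's actual API.

```typescript
// Minimal sketch of the phone-side flow: start the camera preview,
// grab the current frame, and send it to the assistant on the LAN.
// NOTE: "/api/ask" and the payload shape are hypothetical placeholders.

async function startCamera(video: HTMLVideoElement): Promise<void> {
  // The rear camera is preferred for pointing at documents and objects.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });
  video.srcObject = stream;
  await video.play();
}

async function askAboutFrame(video: HTMLVideoElement, question: string): Promise<Response> {
  // Draw the current frame to an off-screen canvas and encode it as JPEG.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  const blob = await new Promise<Blob>((resolve) =>
    canvas.toBlob((b) => resolve(b as Blob), "image/jpeg", 0.85)
  );

  // Ship the image and the question to the local server over the LAN only.
  const form = new FormData();
  form.append("image", blob, "frame.jpg");
  form.append("question", question);
  return fetch("/api/ask", { method: "POST", body: form });
}
```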

Unlike general-purpose tools like Google Lens, which index data on remote servers for commercial classification, GemmaLink is a strictly confidential sandbox. Because it streams data exclusively over your local network, it enables critical use cases where third-party data exposure is unacceptable:

  • Financial Confidentiality: Point your phone at bank statements or invoices to extract line items or analyze balances safely.
  • Private Medical Insights: Process the layout of localized medical data or blood test terminology without uploading your health history to the cloud.

Built-in Guardrails

Handling sensitive, real-world data requires architectural responsibility. GemmaLink therefore shows explicit notices about the system's inherent fallibility, prompting the user to consult certified professionals before acting on critical financial or medical outputs.


Demo

I have recorded a comprehensive video walkthrough showcasing the complete lifecycle: the adaptive dual-mode interface initialization, the network handshake, the high-precision viewport cropping, and the real-time Server-Sent Events (SSE) token streaming.

Watch the full demo on YouTube: GemmaLink Walkthrough & Architecture Demo


Code

The project is fully modular, with a decoupled layout in which firewall rules and port-forwarding scripts (.ps1 and .sh) stay outside the binary for user transparency and easier maintenance.

Source Code & Binary Assets (v1.0.0): GitHub Repository - eye-assistant


How I Used Gemma 4

GemmaLink is deliberately optimized for the Gemma 4 lightweight vision family. Choosing an ultra-lightweight, vision-capable model was a strategic architectural decision for two main reasons:

  1. Low-Latency Edge Performance: The primary objective was to guarantee a fast Time-To-First-Token in constrained local environments (pure CPU or Vulkan fallback) without demanding enterprise-grade hardware.
  2. Contextual Token Efficiency: Blasting full-resolution mobile snapshots at the model kills local inference speed and pollutes the attention matrix. The frontend computes the exact scale ratios (`videoWidth / videoRect.width`) between the native camera frame and the CSS-scaled preview, then dynamically crops only the pixels under the on-screen viewfinder crosshair. This surgical payload reduction keeps the image within the Gemma 4 vision input bounds, resulting in lightning-fast processing loops (see the sketch after this list).
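
To make that mapping concrete, here is a minimal browser-side sketch of the idea, assuming a viewfinder rectangle given in viewport coordinates; the `Rect` type and the `cropToViewfinder` helper are illustrative names of mine, not the project's actual code.

```typescript
// Sketch of mapping a CSS-space viewfinder rectangle back to native
// camera pixels. The Rect type and cropToViewfinder name are illustrative.
interface Rect { x: number; y: number; width: number; height: number; }

function cropToViewfinder(video: HTMLVideoElement, viewfinder: Rect): HTMLCanvasElement {
  // The on-screen <video> element is scaled by CSS, so one CSS pixel
  // covers (videoWidth / videoRect.width) native camera pixels.
  const videoRect = video.getBoundingClientRect();
  const scaleX = video.videoWidth / videoRect.width;
  const scaleY = video.videoHeight / videoRect.height;

  // Translate the viewfinder from viewport coordinates into source pixels.
  const sx = (viewfinder.x - videoRect.left) * scaleX;
  const sy = (viewfinder.y - videoRect.top) * scaleY;
  const sw = viewfinder.width * scaleX;
  const sh = viewfinder.height * scaleY;

  // Copy only the targeted region, keeping the payload small for the VLM.
  const canvas = document.createElement("canvas");
  canvas.width = Math.round(sw);
  canvas.height = Math.round(sh);
  canvas.getContext("2d")!.drawImage(video, sx, sy, sw, sh, 0, 0, sw, sh);
  return canvas;
}
```

Because only the cropped region is encoded and uploaded, the payload stays small regardless of the phone's native camera resolution.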

Driving AI without "Vibecoding"

I have over 40 years of experience writing software, and Go was not part of my traditional stack. I chose it because its high-concurrency model and clean cross-compilation were required for true single-binary deployment.

While I utilized an AI assistant to accelerate the implementation of the Go backend, this was absolutely not "vibecoding". The AI served as a syntax compiler and fast writer, but the technical steering wheel remained firmly in my hands. The deterministic state machine (preview -> ask -> response), the memory management (explicit URL.revokeObjectURL cleanups to prevent mobile memory leaks), and the streaming chunk buffer that prevents incomplete Markdown strings from flickering during SSE delivery were entirely engineered under my tight architectural directives.
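
To illustrate the two browser-side ideas above, here is a hedged sketch of how such an SSE reader and blob-URL cleanup could be structured; the `/api/stream` endpoint and the helper names are my own placeholders, not GemmaLink's actual implementation.

```typescript
// Illustrative sketch of an SSE reader that only renders on complete event
// boundaries, so half-delivered chunks never reach the Markdown renderer.
// The "/api/stream" endpoint and the render() callback are placeholders.
async function streamAnswer(prompt: string, render: (markdown: string) => void): Promise<void> {
  const res = await fetch(`/api/stream?q=${encodeURIComponent(prompt)}`);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  let buffer = "";   // holds partially received SSE frames
  let markdown = ""; // accumulated answer text

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events end with a blank line; anything after the last delimiter
    // is incomplete and stays in the buffer until the next read.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) markdown += line.slice(6);
      }
    }
    render(markdown); // re-render only when a full event has arrived
  }
}

// Memory-hygiene sketch: release the previous preview blob URL before
// creating a new one, avoiding leaks on memory-constrained phones.
function swapPreview(img: HTMLImageElement, blob: Blob): void {
  if (img.src.startsWith("blob:")) URL.revokeObjectURL(img.src);
  img.src = URL.createObjectURL(blob);
}
```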


What's Next? (Community-Driven Roadmap)

The core release (v1.0.0) is tagged and stable. I have a backlog of advanced features mapped out, which I will implement if the project gains traction:

  • Multilingual Smartphone UI: Dynamic localization driven by browser headers.
  • JSON-Driven Context Chips: Offloading the quick-question preset chips to an external, customizable chips.json for manual user tuning (see the sketch after this list).
  • Automated Hardware Dispatching: Orchestrating automatic matching of specialized llama.cpp libraries based on real-time instruction-set detection directly from the Go launcher.
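
As a rough sketch of how the chips.json idea could eventually look, here is one possible browser-side shape; since this feature is still on the roadmap, the file format, the `ContextChip` fields, and the `loadChips` helper are purely hypothetical.

```typescript
// Hypothetical shape for the planned chips.json preset file and a loader
// that turns each entry into a tappable quick-question chip.
interface ContextChip {
  label: string;   // short text shown on the chip
  prompt: string;  // full question sent to the model when tapped
}

async function loadChips(container: HTMLElement, ask: (prompt: string) => void): Promise<void> {
  const chips: ContextChip[] = await (await fetch("/chips.json")).json();
  for (const chip of chips) {
    const button = document.createElement("button");
    button.textContent = chip.label;
    button.addEventListener("click", () => ask(chip.prompt));
    container.appendChild(button);
  }
}
```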
