I’ve been playing with browser-based computer vision for a while, and I ended up building something I didn’t expect to feel this fast in practice.
It’s called FrameFind.
The first module detects whether someone is wearing glasses in real time, but the interesting part isn’t the feature itself — it’s how it runs.
Everything executes locally in the browser using ONNX Runtime Web. No backend, no uploads, no API calls. Just a camera feed and a model running on-device.
What surprised me most was how much it helped to stop running inference on full frames. Instead, I use MediaPipe FaceMesh landmarks to isolate just the eye region. The model only sees a 112x112 crop focused on the relevant area, which keeps things fast and stable.
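To make the crop step concrete, here is a minimal sketch of turning eye landmarks into a square crop rectangle. It is not the exact FrameFind code: the `eyeRegion` helper, the `Landmark` and `Rect` types, and the 0.4 padding are all assumptions for illustration, and it assumes landmarks arrive in normalized 0..1 coordinates.

```typescript
// A landmark in normalized image coordinates (0..1). Assumed input format.
interface Landmark { x: number; y: number; }

// A square crop in pixel coordinates.
interface Rect { x: number; y: number; size: number; }

// Hypothetical helper: builds a square pixel-space crop around the eye landmarks.
// `padding` is an assumed margin (fraction of the box size) so the frames of
// the glasses are less likely to be cut off.
function eyeRegion(
  eyePoints: Landmark[],
  frameWidth: number,
  frameHeight: number,
  padding: number = 0.4,
): Rect {
  if (eyePoints.length === 0) throw new Error("no eye landmarks");

  const xs = eyePoints.map((p) => p.x * frameWidth);
  const ys = eyePoints.map((p) => p.y * frameHeight);
  const minX = Math.min(...xs), maxX = Math.max(...xs);
  const minY = Math.min(...ys), maxY = Math.max(...ys);

  // Square side: the larger extent plus padding, at least 1px,
  // and never larger than the frame itself.
  const extent = Math.max(maxX - minX, maxY - minY);
  const side = Math.min(Math.max(extent * (1 + padding), 1), frameWidth, frameHeight);

  // Center the square on the landmarks, then clamp it inside the frame.
  const cx = (minX + maxX) / 2;
  const cy = (minY + maxY) / 2;
  const x = Math.min(Math.max(cx - side / 2, 0), frameWidth - side);
  const y = Math.min(Math.max(cy - side / 2, 0), frameHeight - side);

  return { x: Math.floor(x), y: Math.floor(y), size: Math.floor(side) };
}
```

With a rectangle `r` like this, a 2D canvas call such as `ctx.drawImage(video, r.x, r.y, r.size, r.size, 0, 0, 112, 112)` crops and resizes to 112x112 in one step.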
The current model is around 6.2MB and takes roughly 27ms per inference on my machine. It’s small enough that it loads quickly and can be cached for near-instant startup on repeat visits.
The pipeline ends up looking something like:
FaceMesh → eye ROI crop → tensor normalization → ONNX inference → smoothing over time
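The tensor normalization step is the easiest one to get subtly wrong, so here is a rough sketch of it. The layout (planar CHW, RGB order) and the scaling to [0, 1] are assumptions; the real model may expect a different channel order or mean/std normalization. `rgbaToChw` is a hypothetical name.

```typescript
// Converts interleaved RGBA pixels (the format canvas getImageData returns)
// into a planar CHW Float32Array scaled to [0, 1].
// Assumed model input: RGB order, no mean/std normalization.
function rgbaToChw(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  const pixels = width * height;
  if (rgba.length !== pixels * 4) {
    throw new Error("buffer size does not match width * height * 4");
  }

  const out = new Float32Array(3 * pixels);
  for (let i = 0; i < pixels; i++) {
    out[i] = rgba[i * 4] / 255;                  // R plane
    out[pixels + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * pixels + i] = rgba[i * 4 + 2] / 255; // B plane
    // The alpha channel (rgba[i * 4 + 3]) is dropped.
  }
  return out;
}
```

For a 112x112 crop, the result can be wrapped as `new ort.Tensor("float32", data, [1, 3, 112, 112])` before being passed to the session, assuming the model takes an NCHW input.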
Smoothing was necessary because raw predictions flicker a bit frame-to-frame, especially when lighting changes or the face is partially occluded.
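One common way to handle this kind of flicker is an exponential moving average with two thresholds (hysteresis). I'm not claiming this is exactly what FrameFind does; it's a minimal sketch of the idea, and the `PredictionSmoother` class and its default values are made up for illustration.

```typescript
// Smooths per-frame probabilities with an exponential moving average (EMA)
// and uses separate on/off thresholds so the label does not flip on small dips.
// All default values are assumptions, not tuned numbers.
class PredictionSmoother {
  private alpha: number;
  private onThreshold: number;
  private offThreshold: number;
  private ema: number | null = null;
  private state: boolean = false;

  constructor(alpha: number = 0.3, onThreshold: number = 0.6, offThreshold: number = 0.4) {
    this.alpha = alpha;
    this.onThreshold = onThreshold;
    this.offThreshold = offThreshold;
  }

  // Feed one raw probability (0..1); returns the stabilized boolean decision.
  update(probability: number): boolean {
    // The first frame seeds the average; later frames are blended in.
    this.ema = this.ema === null
      ? probability
      : this.alpha * probability + (1 - this.alpha) * this.ema;

    if (!this.state && this.ema >= this.onThreshold) {
      this.state = true;
    } else if (this.state && this.ema <= this.offThreshold) {
      this.state = false;
    }
    return this.state;
  }
}
```

A lower `alpha` gives a steadier label but reacts more slowly when someone actually puts glasses on or takes them off, so it's a latency versus stability tradeoff.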
The stack behind it is fairly simple:
ONNX Runtime Web for inference, MediaPipe for landmarks, and optional WebGPU acceleration depending on the environment. It also falls back gracefully when WebGPU isn’t available.
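The fallback can be as simple as building an ordered list of execution providers. This is a sketch under assumptions: `pickExecutionProviders` is a hypothetical helper, and checking for WebGPU support is reduced to a boolean here (a real check may also need to confirm that an adapter is actually available).

```typescript
// Hypothetical helper: returns ONNX Runtime Web execution provider names in
// priority order, preferring WebGPU and keeping WASM as the fallback.
function pickExecutionProviders(hasWebGPU: boolean): string[] {
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// Intended use in the browser (not executed here):
//   const providers = pickExecutionProviders("gpu" in navigator);
//   const session = await ort.InferenceSession.create(modelUrl, {
//     executionProviders: providers,
//   });
// `modelUrl` is a placeholder for wherever the .onnx file is hosted.
```

Keeping the selection in a pure function makes it easy to test without a browser or a GPU.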
I built a React hook on top of it because I wanted something you could drop into a UI without thinking too much about the underlying pipeline. There’s also a Node.js version for server-side image processing, but the browser version is the main focus.
What I’m trying to explore with this project is less “glasses detection” and more whether small, specialized vision models can make real-time UI interactions more practical in the browser.
Instead of sending frames to a server or relying on heavy cloud APIs, the idea is that more of this kind of computation can just live inside the client.
There are obvious tradeoffs, but the latency and privacy advantages are hard to ignore when everything stays on-device.
Live demo:
https://framefind.moraxh.dev/
