Most AI tools feel disconnected.
They don’t see your screen.
They don’t understand what you're doing.
So I built one that does.
Meet OpenBlob.
The local-first + context-aware combo is the right bet. Most AI tools treat your desktop like it doesn't exist and wait for you to copy-paste stuff into a chat window.
One thing that could push this further: web context. I've been building an AI-driven browser that navigates and extracts content autonomously. Your blob sees the desktop, mine sees the web. Plugging the two together means the companion could actually go fetch what you need based on what it sees you doing, instead of just reacting to it.
Happy to explore a plugin integration if you're open to it.
That’s actually a really interesting direction.
OpenBlob already does some browser interaction via Chrome/Edge (remote debugging), so it can navigate, click, type and read page context — but it’s still more command-driven than truly autonomous.
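For readers who haven't touched it, the remote-debugging channel in question is the Chrome DevTools Protocol. A minimal sketch of that kind of control loop (an illustration, not OpenBlob's actual code; the port and packages are assumptions):

```python
# Drive a Chrome tab over the DevTools Protocol (CDP).
# Assumes Chrome was launched with: chrome --remote-debugging-port=9222
# Requires: pip install requests websocket-client
import json
import time

import requests
import websocket

# Each open tab exposes a webSocketDebuggerUrl via the /json endpoint.
tabs = requests.get("http://localhost:9222/json").json()
page = next(t for t in tabs if t["type"] == "page")
ws = websocket.create_connection(page["webSocketDebuggerUrl"])

# Navigate, then read the page title back via injected JavaScript.
ws.send(json.dumps({"id": 1, "method": "Page.navigate",
                    "params": {"url": "https://example.com"}}))
ws.recv()      # response to the navigate command
time.sleep(1)  # naive wait; real code would listen for Page.loadEventFired

ws.send(json.dumps({"id": 2, "method": "Runtime.evaluate",
                    "params": {"expression": "document.title"}}))
print(json.loads(ws.recv())["result"]["result"]["value"])
ws.close()
```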
What you’re describing goes a step further:
not just controlling the browser, but understanding and navigating the web on its own.
Combining that with desktop-level awareness would be pretty powerful:
seeing what you’re doing locally, deciding what’s needed and fetching it from the web.
We’re planning a plugin / capability system, so something like this could fit really well as an extension layer.
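Purely as illustration, an extension layer like that often reduces to a small capability interface plus a dispatcher. Everything below is hypothetical, not OpenBlob's actual API:

```python
# Hypothetical capability/plugin shape: each plugin declares what it can
# handle, and the companion routes intents to the first match.
from typing import Protocol

class Capability(Protocol):
    name: str
    def can_handle(self, intent: str) -> bool: ...
    def run(self, intent: str, context: dict) -> str: ...

class WebFetchCapability:
    """Illustrative plugin that would fetch web content for the companion."""
    name = "web-fetch"

    def can_handle(self, intent: str) -> bool:
        return intent.startswith("fetch:")

    def run(self, intent: str, context: dict) -> str:
        url = intent.removeprefix("fetch:")
        return f"would fetch {url} (desktop context: {context.get('app')})"

registry: list[Capability] = [WebFetchCapability()]

def dispatch(intent: str, context: dict) -> str:
    for cap in registry:
        if cap.can_handle(intent):
            return cap.run(intent, context)
    raise LookupError(f"no capability for intent: {intent}")

print(dispatch("fetch:https://example.com", {"app": "editor"}))
```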
Definitely open to exploring that 👀
I followed you on GitHub. Is there a way to reach you?
I’ve added my email and Discord to my GitHub profile; feel free to reach out there.
This is a fascinating project! On-device AI companions need a data layer that goes beyond just storing conversation history — you also need persistent memory of past sessions, learned preferences, and efficient retrieval of relevant context.
We built moteDB (Rust-native embedded multimodal DB) with exactly this in mind: vector + time-series + structured data in one zero-dependency engine that runs on the same machine as your AI agent. No cloud required, no separate database server. Would love to follow this project and compare notes on the on-device data architecture. Best of luck with the build!
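For readers unfamiliar with the pattern, the retrieval side of such a memory layer looks roughly like this. A toy sketch with hand-made vectors, not moteDB's actual API:

```python
# On-device memory recall by cosine similarity over stored embeddings.
# Vectors here are toy values; a real setup would embed text with a
# local embedding model and persist the vectors on disk.
import numpy as np

memories = {
    "user prefers dark mode":       np.array([0.9, 0.1, 0.0]),
    "last session: editing slides": np.array([0.1, 0.8, 0.2]),
    "user's timezone is UTC+2":     np.array([0.0, 0.2, 0.9]),
}

def recall(query_vec: np.ndarray, k: int = 2) -> list[str]:
    """Return the k stored memories most similar to the query vector."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(memories, key=lambda m: cos(memories[m], query_vec),
                    reverse=True)
    return ranked[:k]

print(recall(np.array([0.2, 0.9, 0.1])))  # slide-editing memory ranks first
```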
Thank you! I'll check it out!
Love it! I vibecoded it to connect to my Llama & Whisper setup on my Strix Halo, search via SearXNG, and automatically take a screenshot of my whole desktop to go with my question ("what do you see in the middle of the screen?"). So it acts like ChatGPT with the ability to see my desktop, all purely by voice. Next up would be interacting with windows, but that's a whole lot more complex to vibecode, I guess.
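That screenshot-plus-question loop is simple to sketch. A minimal version, assuming an OpenAI-compatible local server (e.g. llama.cpp's llama-server with a vision model) on port 8080, with `mss` and `Pillow` installed:

```python
# Capture the full desktop, then ask a local vision model about it.
# Requires: pip install mss pillow requests
import base64
import io

import requests
from mss import mss
from PIL import Image

# monitors[0] is the full virtual screen spanning all displays.
with mss() as grabber:
    shot = grabber.grab(grabber.monitors[0])
    img = Image.frombytes("RGB", shot.size, shot.rgb)

buf = io.BytesIO()
img.save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode()

# Send the screenshot and the question to the local endpoint.
resp = requests.post("http://localhost:8080/v1/chat/completions", json={
    "messages": [{"role": "user", "content": [
        {"type": "text",
         "text": "What do you see in the middle of the screen?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ]}],
})
print(resp.json()["choices"][0]["message"]["content"])
```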
The snipping tool felt a bit cumbersome, and the browsing functionality isn't there yet. It's also a bit slow at the moment, but that might be Gemma 4, which isn't the fastest for quick questions.
Great project 🔥
Thank you!
Great project!
Thanks!