DEV Community

southy404

I’m building a local AI desktop companion that sees your screen — and you can help shape it

Most AI tools feel disconnected.

They don’t see your screen.
They don’t understand what you're doing.

So I built one that does.


Meet OpenBlob

OpenBlob desktop AI companion showing animated blob avatar, floating UI, and context-aware interaction on Windows desktop

An open-source, local-first desktop AI companion for Windows that doesn’t just respond — it lives on your desktop.

👉 GitHub: https://github.com/southy404/openblob

It can:

  • understand what app you’re using
  • analyze screenshots
  • help inside games, apps, and browsers
  • react visually with an animated companion
  • and yes… even play hide and seek with you

The problem with current AI assistants

Most tools today are:

  • cloud-dependent
  • context-blind
  • static
  • not fun to use

They don’t feel like part of your system.


🧠 It understands context

OpenBlob looks at:

  • active window
  • app name
  • window title

So if you’re in a game, it knows.
If you're debugging, it adapts.

This is where things start to feel different.
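To make the idea concrete, here is a minimal sketch of context detection from the app name and window title. It is not OpenBlob's actual code: `WindowInfo`, `classify_context`, and the keyword rules in `RULES` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical keyword rules (assumption, not taken from OpenBlob).
# A real companion would likely use a richer source such as user config or a model.
RULES = {
    "game": ("steam", "elden ring", "minecraft"),
    "coding": ("code.exe", "visual studio", "pycharm"),
    "browser": ("chrome.exe", "msedge.exe", "firefox.exe"),
}


@dataclass(frozen=True)
class WindowInfo:
    app_name: str      # executable name of the foreground process
    window_title: str  # title bar text of the active window


def classify_context(info: WindowInfo) -> str:
    """Return the first label whose keywords appear in the app name or title."""
    haystack = f"{info.app_name} {info.window_title}".lower()
    for label, keywords in RULES.items():
        if any(keyword in haystack for keyword in keywords):
            return label
    return "unknown"
```

Rules are checked in dictionary order, so a browser tab titled after a game would be labeled "game" here. That kind of ambiguity is one reason plain keyword matching is only a starting point.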


🖼 It can see your screen

You can take a screenshot and it will:

  • extract visible text
  • detect what you're looking at
  • generate a real search query
  • explain what's going on
Screenshot → OCR → context → reasoning → answer

Still a bit rough — but already very usable.
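The screenshot flow (Screenshot → OCR → context → reasoning → answer) could be wired together roughly like this. This is a sketch under assumptions, not the project's implementation: `Answer`, `answer_from_screenshot`, and the prompt wording are hypothetical, and the OCR engine and model call are passed in as plain functions so no specific library is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    extracted_text: str
    search_query: str
    explanation: str


def answer_from_screenshot(
    image_bytes: bytes,
    window_title: str,
    ocr: Callable[[bytes], str],        # assumption: any OCR function, bytes in, text out
    ask_model: Callable[[str], str],    # assumption: any model call, prompt in, text out
) -> Answer:
    # Step 1: OCR extracts the visible text from the screenshot.
    text = ocr(image_bytes).strip()

    # Step 2: combine the text with desktop context into one prompt prefix.
    context = f"Active window: {window_title}\nVisible text:\n{text}"

    # Step 3: reasoning, once for a search query and once for an explanation.
    query = ask_model(f"{context}\n\nWrite one web search query for this screen.").strip()
    explanation = ask_model(f"{context}\n\nExplain what is going on.").strip()

    return Answer(extracted_text=text, search_query=query, explanation=explanation)
```

Keeping the OCR and model steps as injected functions means each stage can be swapped or tested on its own.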


🎮 It actually helps inside games

Instead of:

alt-tab → google → guess

You can:

  • screenshot
  • let it detect the game
  • get a real answer

This alone changes how you play.


🤖 Multi-model AI (local-first)

Runs via Ollama with:

  • text models
  • vision models
  • fallback system

No cloud required.
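As an illustration of the fallback idea, the sketch below tries a list of models in order against Ollama's local `/api/generate` endpoint and returns the first reply. The function names and the retry policy are assumptions for this example, not OpenBlob's code, and the model names you pass in are whatever you have pulled locally.

```python
import json
import urllib.request
from typing import Callable, Sequence

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def send_to_ollama(payload: dict) -> dict:
    """POST a JSON payload to the local Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())


def generate_with_fallback(
    models: Sequence[str],
    prompt: str,
    send: Callable[[dict], dict] = send_to_ollama,
) -> tuple[str, str]:
    """Try each model in order; return (model_name, reply_text) from the first success."""
    errors = []
    for model in models:
        try:
            reply = send({"model": model, "prompt": prompt, "stream": False})
            return model, reply["response"]
        except (OSError, KeyError, ValueError) as exc:
            # Connection problems, missing fields, or bad JSON: move on to the next model.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

For a vision model, the same endpoint accepts an `images` field holding base64-encoded images, so a screenshot could travel in the same payload.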


🎨 It feels alive

The companion:

  • has moods (idle, thinking, love, sleepy)
  • reacts to interaction
  • can be “petted”
  • dances when music is playing

Small details, big difference.
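The moods can be thought of as a tiny state machine. The four mood names below come from the list above; the event names and transitions are hypothetical and only show one way such a thing might be modeled.

```python
from enum import Enum


class Mood(Enum):
    IDLE = "idle"
    THINKING = "thinking"
    LOVE = "love"
    SLEEPY = "sleepy"


# Hypothetical events (assumption): which mood each event switches the companion into.
TRANSITIONS = {
    "question_asked": Mood.THINKING,
    "answer_ready": Mood.IDLE,
    "petted": Mood.LOVE,
    "inactive_timeout": Mood.SLEEPY,
}


def next_mood(current: Mood, event: str) -> Mood:
    """Return the mood after an event; unknown events leave the mood unchanged."""
    return TRANSITIONS.get(event, current)
```

A flat lookup like this ignores the current mood. A fuller version would probably key transitions on (mood, event) pairs so that, for example, a sleepy companion reacts differently to being petted.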


🎮 The weird part (my favorite)

Hide and Seek mode

You can literally say:

“let’s play hide and seek”

And it will:

  • hide somewhere on your screen
  • peek occasionally
  • wait until you find it

Sounds dumb.

Feels surprisingly real.


⚡ New UI (WIP)

  • CTRL + SPACE to open
  • floating companion
  • instant interaction

Inspired by tools like Raycast / Arc — but alive.

⚠️ still slightly buggy


🧪 Screenshot assistant (work in progress)

  • fast snipping
  • instant processing
  • contextual answers

Works — but not perfect yet.


Why open source?

Because this shouldn’t belong to one company.

This kind of system should be:

  • transparent
  • hackable
  • community-built

Philosophy

  • local-first
  • context > prompt
  • playful + useful
  • build in public

Current state

Early stage.

  • evolving fast
  • sometimes buggy
  • lots of experiments

If you want to join

This project is wide open.

You can:

  • contribute features
  • improve UI
  • experiment with AI
  • build plugins

👉 https://github.com/southy404/openblob


Final thought

I don’t think the future of AI is chat.

I think it’s something that:

lives with you, understands your environment, and evolves

That’s what I’m trying to build.

Top comments (11)

Valentin Monteiro

The local-first + context-aware combo is the right bet. Most AI tools treat your desktop like it doesn't exist and wait for you to copy-paste stuff into a chat window.

One thing that could push this further: web context. I've been building an AI-driven browser that navigates and extracts content autonomously. Your blob sees the desktop, mine sees the web. Plugging the two together means the companion could actually go fetch what you need based on what it sees you doing, instead of just reacting to it.

Happy to explore a plugin integration if you're open to it.

southy404

That’s actually a really interesting direction.

OpenBlob already does some browser interaction via Chrome/Edge (remote debugging), so it can navigate, click, type and read page context — but it’s still more command-driven than truly autonomous.

What you’re describing goes a step further:
not just controlling the browser, but understanding and navigating the web on its own.

Combining that with desktop-level awareness would be pretty powerful:
seeing what you’re doing locally, deciding what’s needed and fetching it from the web.

We’re planning a plugin / capability system, so something like this could fit really well as an extension layer.

Definitely open to exploring that 👀

Valentin Monteiro

I followed you on GitHub. Is there a way to reach out to you?

southy404

I’ve added my email and Discord on my GitHub profile, feel free to reach out there

mote

This is a fascinating project! On-device AI companions need a data layer that goes beyond just storing conversation history — you also need persistent memory of past sessions, learned preferences, and efficient retrieval of relevant context.

We built moteDB (Rust-native embedded multimodal DB) with exactly this in mind: vector + time-series + structured data in one zero-dependency engine that runs on the same machine as your AI agent. No cloud required, no separate database server. Would love to follow this project and compare notes on the on-device data architecture. Best of luck with the build!

southy404

Thank you! I'll check it out!

Phil Lee

Love it - I vibecoded it to connect to my llama & whisper on my Strix Halo, search via SearXNG, and automatically take a screenshot of my whole desktop and follow my question ("what do you see in the middle of the screen"). So it acts like a ChatGPT that can see my desktop, all purely by voice. Next up would be interacting with windows, but that’s a whole lot more complex to vibecode, I guess.

The snipping tool felt a bit cumbersome and the browsing functionality is not there yet. It’s just a bit slow atm, but that might be Gemma 4, which is not the fastest for quick questions.

Sebastian

Great project 🔥

southy404

Thank you!

Benjamin Nguyen

Great project! I

southy404

Thanks!