Over the past few days, OpenBlob has changed a lot.
Not just visually — but fundamentally.
This is a proper progress update on where things are heading 👇
🧠 Quick recap
OpenBlob is a local-first desktop AI companion that:
- lives on your desktop
- understands your context
- can see your screen (via vision models)
- reacts in real-time
- executes actions directly on your system
👉 Repo: https://github.com/southy404/openblob
🔧 Rebuilding the core (this was the big one)
The biggest update isn’t something you see. It’s how everything works underneath. OpenBlob now has a much cleaner and more scalable structure:
Core pipeline
input (voice / text / screen)
→ intent detection
→ command router
→ execution (local first)
→ AI fallback if needed
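The pipeline above can be sketched roughly like this. This is a hypothetical illustration, not OpenBlob's actual code: names like `detect_intent`, `CommandRouter`, and `ai_fallback` are assumptions, and real intent detection would use a model rather than keyword matching.

```python
class CommandRouter:
    """Maps detected intents to locally executable handlers."""

    def __init__(self):
        self.handlers = {}

    def register(self, intent, handler):
        self.handlers[intent] = handler

    def dispatch(self, intent, payload):
        handler = self.handlers.get(intent)
        if handler is None:
            return None  # no local handler -> fall back to the AI
        return handler(payload)


def detect_intent(text):
    # Placeholder: a real system would classify with a model.
    if text.lower().startswith("open "):
        return "open_app", text[5:]
    return "chat", text


def ai_fallback(payload):
    return f"(AI) responding to: {payload}"


router = CommandRouter()
router.register("open_app", lambda app: f"launching {app}")


def handle(text):
    """input -> intent detection -> command router -> execution -> AI fallback."""
    intent, payload = detect_intent(text)
    result = router.dispatch(intent, payload)
    return result if result is not None else ai_fallback(payload)
```

The key property is that local execution is tried first and the AI only sees input that no local handler claimed.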
What changed
- Clear separation of responsibilities
- Proper command routing system
- Modular capabilities instead of chaos
- Easier to extend without breaking everything
This turns OpenBlob into something bigger than a chatbot: a runtime layer for your desktop.
🧩 Open-source friendly structure
One goal became very clear: this needs to be hackable. So the architecture is moving towards a module system like this:
📁 modules/
↳ 📁 discord/
↳ 📁 spotify/
↳ 📁 browser/
↳ 📁 system/
Each module:
- exposes commands
- runs locally
- can be extended independently
This makes it much easier to:
- build plugins
- integrate APIs
- experiment without touching the core
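One way a module contract like this could look, sketched in Python. The `Module` protocol, `commands()` method, and `SpotifyModule` names are assumptions for illustration, not the project's actual plugin API.

```python
from typing import Callable, Dict, Protocol


class Module(Protocol):
    """Assumed contract: each module exposes named commands that run locally."""
    name: str

    def commands(self) -> Dict[str, Callable[[str], str]]: ...


class SpotifyModule:
    name = "spotify"

    def commands(self):
        # Namespaced command keys keep modules independent of each other.
        return {"spotify.play": lambda track: f"playing {track}"}


def load_modules(modules):
    """Collect every command from every module into one flat registry."""
    registry = {}
    for mod in modules:
        registry.update(mod.commands())
    return registry


registry = load_modules([SpotifyModule()])
```

Because the core only sees the registry, a new module (Discord, browser, system) can be dropped into `modules/` without touching anything else.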
🎨 New UI (cleaner, faster, more alive)
The UI got a big upgrade:
- Floating bubble interface
- Glassmorphism style
- Smoother, more organic animations
- Faster interaction
Interaction now feels like:
- CTRL + SPACE → instant open
- Global voice toggle
- Minimal friction
Less “tool”. More presence.
💬 NEW: Just Chatting mode
Sometimes you don’t want commands. You just want to talk. So OpenBlob now has a Just Chatting mode:
- Pure conversation with your AI companion
- No command routing
- No execution layer
- Just dialogue
This is important because: the companion shouldn’t only do things — it should also be there.
Use cases:
- Thinking out loud
- Asking questions
- Casual conversation
- Testing personality / tone
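The mode switch boils down to bypassing the router entirely. A minimal sketch, assuming a `mode` flag on the entry point (the function names here are illustrative, not OpenBlob's):

```python
def chat_model(text):
    # Stand-in for the conversation model.
    return f"companion: thinking about '{text}'"


def route_and_execute(text):
    # Stand-in for the normal intent -> router -> execution pipeline.
    return f"executed: {text}"


def respond(text, mode="assistant"):
    if mode == "chat":
        return chat_model(text)      # pure dialogue, no execution layer
    return route_and_execute(text)   # normal command pipeline
```

In "chat" mode nothing can touch the system, which is exactly what makes it safe to just talk.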
🖼 Screenshot assistant (more usable now)
The screen pipeline is getting more solid:
screenshot
→ OCR
→ context extraction
→ reasoning
→ answer
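Stubbed out, the pipeline looks like this. The `ocr`, `extract_context`, and `reason` functions are placeholders (real OCR would call something like Tesseract); only the data flow matches the steps above.

```python
def ocr(image_bytes):
    # Placeholder: real OCR would run on the screenshot bytes.
    return "Error: null pointer at line 42"


def extract_context(text):
    # Classify what kind of content is on screen.
    kind = "error" if text.startswith("Error") else "other"
    return {"kind": kind, "snippet": text}


def reason(context):
    if context["kind"] == "error":
        return f"Looks like a crash: {context['snippet']}"
    return "Nothing notable on screen."


def answer_from_screenshot(image_bytes):
    """screenshot -> OCR -> context extraction -> reasoning -> answer."""
    return reason(extract_context(ocr(image_bytes)))
```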
Already useful for:
- Debugging
- UI understanding
- Games
- Quick research
Still improving — but getting reliable.
🎙️ NEW: real-time transcript system
This is one of the biggest new additions. OpenBlob can now:
- Listen to system audio
- Listen to microphone input
- Generate live transcripts
- Store structured sessions
Pipeline
audio (system / mic)
→ transcription
→ segmented timeline
→ structured session
→ saved as text
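A structured session could be modeled like this. The field names (`source`, `segments`, `start`/`end` offsets) are assumptions about what "structured" means here, not the actual saved format.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float  # seconds from session start
    end: float
    text: str


@dataclass
class Session:
    source: str                 # "mic" or "system"
    segments: list = field(default_factory=list)

    def add(self, start, end, text):
        """Append one transcribed chunk to the segmented timeline."""
        self.segments.append(Segment(start, end, text))

    def to_text(self):
        """Flatten the session into the saved plain-text form."""
        return "\n".join(f"[{s.start:.0f}s] {s.text}" for s in self.segments)


session = Session(source="system")
session.add(0.0, 4.0, "Welcome to the standup.")
session.add(4.0, 9.0, "First topic: the transcript overlay.")
```

Keeping per-segment timestamps (rather than one blob of text) is what makes the later steps (summaries, search, real-time help) possible.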
What it already works for
- Meetings (Meet, Zoom, etc.)
- YouTube / podcasts
- Lectures
- General audio capture
🧪 Current prototype
- Live text appearing in real-time
- Segmented transcript blocks
- Session tracking
- Simple overlay UI
It’s still early. But it works.
🔮 Where transcripts are going
This is not just speech-to-text. Next steps:
📝 Meeting assistant
- Summaries
- Key points
- Action items
🧠 Memory layer
- Link transcripts to context
- Searchable history
⚡ Real-time help
- Explain while listening
- Highlight important info
- Suggest responses
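For the searchable-history piece, even a naive sketch shows the shape. This assumes sessions saved as plain text keyed by name; a real memory layer would use embeddings rather than keyword matching.

```python
def search_transcripts(sessions, query):
    """Naive keyword search over saved session text, line by line."""
    hits = []
    for name, text in sessions.items():
        for line in text.splitlines():
            if query.lower() in line.lower():
                hits.append((name, line))
    return hits


# Hypothetical saved sessions.
sessions = {
    "standup-mon": "[0s] Discussed the overlay UI\n[30s] Action: fix OCR",
    "standup-tue": "[0s] Shipped the transcript overlay",
}
```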
⚡ Philosophy (still the same)
- Local-first
- Context > Prompt
- System-level AI
- Playful + useful
🧪 Current state
- Still experimental
- Still buggy sometimes
- Evolving very fast
But now there's much better structure, a clearer direction, and it's easier to contribute.
🤝 If you want to join
Now is actually a great time. You can:
- Build modules (Discord, Spotify, browser, etc.)
- Improve transcription
- Design UI
- Experiment with AI
👉 Join here: https://github.com/southy404/openblob
💡 Final thought
I’m starting to believe the future of AI is not a chat window in a browser.
But something that lives on your system, understands your context, and can both act and talk.
OpenBlob is slowly getting there.
