Akash

I Built a Voice-Controlled OBS Assistant (Metaltank) — Here’s What Really Happened

This post is about debugging pain, systems thinking, and the moment everything finally worked.


🎯 What I Wanted to Build

I wanted to remove clicks from OBS.

Not automate one button.

Not trigger a hotkey.

I wanted to talk to OBS.

Say things like:

  • “Metaltank mute mic”
  • “Metaltank switch scene”
  • “Metaltank start recording”

…and have OBS respond instantly.

No Stream Deck.

No keyboard shortcuts.

No mouse.

Just voice → intent → OBS WebSocket → action.

That project is called Metaltank.


🧠 What I Actually Built (So Far)

Metaltank is a Node.js-based voice controller for OBS that:

  • Connects to OBS using obs-websocket
  • Captures microphone audio using arecord
  • Streams short audio chunks to whisper.cpp (local, offline)
  • Converts speech → text
  • Parses intent using a custom rule engine
  • Executes OBS actions (mute, unmute, toggle mic, scenes, recording)

All offline.

No cloud APIs.

No paid services.
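Glued together, that pipeline is a short loop: record a chunk, transcribe it, parse it, act on it. Here is a sketch of the glue (the helper names recordChunk, transcribe, parseIntent, and executeAction are illustrative stand-ins for the real modules, not Metaltank's exact code):

```javascript
// Sketch of the capture -> transcribe -> parse -> act loop.
// The four helpers are injected so each stage can be swapped or stubbed;
// their names are hypothetical, not the actual module names.
async function listenLoop({ recordChunk, transcribe, parseIntent, executeAction }) {
  // Run forever: each iteration handles one fixed-length audio chunk.
  for (;;) {
    const wavPath = await recordChunk(3);    // 3-second chunk via arecord
    const text = await transcribe(wavPath);  // whisper.cpp server
    const intent = parseIntent(text);        // rule engine + wake word
    if (intent) await executeAction(intent); // obs-websocket call
  }
}
```

Passing the stages in as functions also makes the loop trivially testable with stubs, which matters when the real inputs are a microphone and a running OBS.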


⚙️ Tech Stack

  • Node.js (ESM)
  • OBS WebSocket
  • whisper.cpp (server mode)
  • arecord (ALSA)
  • Custom rule-based intent parser

Simple stack.

Hard execution.


😤 Why This Was Way Harder Than It Sounds

Let me be honest — nothing worked the first time.

1️⃣ Native Modules Failed (Vosk)

I initially tried vosk.

It failed because:

  • Native compilation
  • Missing build tools
  • Node-gyp issues
  • Environment limitations

Lesson:

“Offline” doesn’t always mean “easy.”


2️⃣ OBS Was “Connected” But Not Ready

I kept hitting errors like:

Error: Socket not identified
Error: Not connected

Root cause:

  • OBS WebSocket connect ≠ identify
  • Actions were being called before OBS completed its handshake

Fix:

  • Explicit OBS-ready state
  • No actions allowed before identification
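Here is roughly what that ready gate looks like. With obs-websocket-js v5 the connect() promise only resolves after the Identify handshake, but making the state explicit still catches anything fired too early or after a disconnect (the wiring below is a sketch, not Metaltank's exact code):

```javascript
// Minimal "connected is not identified" gate. Every OBS action
// goes through call(), which refuses until the Identified event fires.
class ObsGate {
  #ready = false;

  markIdentified() { this.#ready = true; }
  markClosed() { this.#ready = false; }

  async call(obs, request, params) {
    if (!this.#ready) {
      throw new Error(`OBS not identified yet: refusing ${request}`);
    }
    return obs.call(request, params);
  }
}

// Hypothetical wiring (not executed here): requires `npm i obs-websocket-js`.
async function connectOBS(gate, url = 'ws://127.0.0.1:4455', password) {
  const { default: OBSWebSocket } = await import('obs-websocket-js');
  const obs = new OBSWebSocket();
  obs.on('Identified', () => gate.markIdentified());
  obs.on('ConnectionClosed', () => gate.markClosed());
  await obs.connect(url, password); // resolves after Identify in v5
  return obs;
}
```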

3️⃣ Voice Was Triggering Commands Without Me Speaking

At one point, Metaltank muted my mic without me saying anything.

Why?

  • A simulated voice-input path was still wired in
  • The voice module executed before OBS was ready
  • No wake-word guard


Fix:

  • Strict wake word: metaltank
  • OBS readiness gate
  • Clear separation between CLI and voice input
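A minimal version of that wake-word guard plus a rule engine looks like this (the intent names and regex rules are illustrative; the real parser's table may differ):

```javascript
// Strict wake-word guard: nothing runs unless the utterance
// *starts* with the wake word. Everything after it is matched
// against a small ordered rule table.
const WAKE_WORD = 'metaltank';

const RULES = [
  { match: /\bunmute\b.*\bmic\b/,   intent: 'UNMUTE_MIC' },
  { match: /\bmute\b.*\bmic\b/,     intent: 'MUTE_MIC' },
  { match: /\btoggle\b.*\bmic\b/,   intent: 'TOGGLE_MIC' },
  { match: /\bstart\b.*\brecord/,   intent: 'START_RECORDING' },
  { match: /\bstop\b.*\brecord/,    intent: 'STOP_RECORDING' },
  { match: /\bswitch\b.*\bscene\b/, intent: 'SWITCH_SCENE' },
];

function parseIntent(transcript) {
  const text = transcript.toLowerCase().trim();
  if (!text.startsWith(WAKE_WORD)) return null; // no wake word, no action
  const command = text.slice(WAKE_WORD.length).trim();
  for (const rule of RULES) {
    if (rule.match.test(command)) return rule.intent;
  }
  return null; // wake word heard, but no rule matched
}
```

Returning null for anything without the wake word is exactly what stops stray transcription noise (or leftover simulated input) from muting your mic.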

4️⃣ whisper.cpp Flags Betrayed Me

I tried flags like:

--step
--length

They don’t exist in whisper-cli.

Fix:

  • Stop guessing flags
  • Read --help
  • Switch to whisper-server
  • POST WAV files properly

Lesson:

Always check the CLI help. Even when you’re confident.
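Posting chunks to whisper-server is straightforward with Node 18+ globals (fetch, FormData, Blob). This sketch assumes the server's /inference endpoint and its "file" multipart field, which is what recent whisper.cpp server builds expose, and a server started with something like `./whisper-server -m models/ggml-base.en.bin`:

```javascript
// Sketch: POST one WAV chunk to a locally running whisper.cpp server
// and read back the transcript from its JSON response.
async function transcribe(wav, url = 'http://127.0.0.1:8080/inference') {
  const form = new FormData();
  // "file" is the server's multipart field for the audio payload.
  form.append('file', new Blob([wav], { type: 'audio/wav' }), 'chunk.wav');

  const res = await fetch(url, { method: 'POST', body: form });
  if (!res.ok) throw new Error(`whisper-server returned ${res.status}`);

  const { text } = await res.json(); // JSON response carries a "text" field
  return text.trim();
}
```

Taking a Buffer instead of a path keeps the function pure I/O-wise: reading the WAV file stays the caller's job, which also makes the function easy to test with a stubbed fetch.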


5️⃣ Audio Was Recording… But Whisper Heard Nothing

This was the hardest part.

  • WAV files existed
  • Audio played correctly
  • Whisper returned [BLANK_AUDIO]

Root causes:

  • Chunks too short to carry a full phrase
  • Chunks dominated by silence
  • Wrong assumptions about how streaming inference works

Fix:

  • Fixed-length chunks (3 seconds)
  • File-based inference
  • Let whisper finish before deleting audio

🔥 The Moment It Worked

🗣️ Heard: mute mic
[VOICE] MUTE_MIC
🎙 Mic muted

I didn’t celebrate loudly.

I just smiled.

Because this wasn’t luck —

it was layers finally aligning.


🧩 Current Metaltank Capabilities

  • 🎙 Mute / unmute / toggle mic
  • 🎬 Scene control
  • ⏺ Recording control
  • 🧠 Continuous listening
  • 🔒 Fully offline

OBS reacts to my voice.


🧠 What I Learned

  • “Connected” doesn’t mean “ready”
  • Audio pipelines fail silently
  • Logging saves hours
  • If you’re confused, the system is confused too

Biggest lesson:

Complex systems don’t fail loudly — they fail quietly.


🚧 Still Phase 1

This is still Phase 1.

The vision is bigger:

  • Zero-click OBS setup
  • Scene creation via voice
  • Layout & webcam control
  • Full recording workflows

The goal stays simple:

No clicks. Only intent.


🏁 Final Thoughts

This project reminded me why I love engineering.

Not because things work —

but because they don’t, and you make them.

If you’re building something ambitious and it feels impossible right now:

You’re probably doing it right.
