Have you ever wanted to just talk to your AI assistant? Like, literally speak into your microphone and hear it respond naturally?
I work with AI all day, and sometimes typing feels... slow. Copy-pasting is tedious. Sometimes I'm cooking, walking, or just tired of staring at a screen. I wanted something that felt like a real conversation.
So I built Talkative Lobster — a desktop app that turns your voice into AI conversation and back again.
What It Does
You speak → Speech-to-Text → LLM → Text-to-Speech → You hear
You talk into your mic, the AI thinks, and then speaks back to you. No typing, no copy-pasting — just natural back-and-forth conversation.
Key features:
- Interrupt anytime — Just start talking to cut in mid-response, like a real conversation
- Works offline — local whisper.cpp and Piper TTS when you need privacy or have no network
- Japanese support — VOICEVOX and Kokoro voices, plus "aizuchi" (those little "mm-hmm" sounds)
- Multiple providers — ElevenLabs, OpenAI, or local options for both STT and TTS
- Privacy-first — API keys encrypted on your machine, everything routes through your own gateway
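The speak → STT → LLM → TTS → hear pipeline above is, at its core, one loop per conversational turn. A minimal sketch of the data flow in TypeScript, where `transcribe`, `complete`, and `speak` are hypothetical stand-ins for whichever providers you configure (not the app's actual API):

```typescript
// One conversational turn: mic audio in, speaker audio out.
type Audio = Float32Array;

interface Providers {
  transcribe: (input: Audio) => Promise<string>; // STT
  complete: (prompt: string) => Promise<string>; // LLM
  speak: (text: string) => Promise<Audio>;       // TTS
}

async function runTurn(mic: Audio, p: Providers): Promise<Audio> {
  const userText = await p.transcribe(mic); // speech → text
  const reply = await p.complete(userText); // text → reply
  return p.speak(reply);                    // reply → audio
}
```

The real app streams each stage and supports interruption; this only shows how the pieces connect.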
How It Works
Voice Detection
The app uses Silero VAD (Voice Activity Detection) — a neural network trained on real speech patterns. This means it knows when you're actually talking versus when there's background noise, music, or keyboard clicks.
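Silero VAD emits a per-frame speech probability, which still needs debouncing before it's useful: you don't want a single noisy frame to start or stop a segment. A hypothetical sketch of that gating logic (the thresholds are made-up illustration values, not Talkative Lobster's actual settings):

```typescript
// Debounce per-frame VAD probabilities into clean "start"/"end" events.
class SpeechGate {
  private speaking = false;
  private silentFrames = 0;

  constructor(
    private readonly onThreshold = 0.6,   // prob above this starts speech
    private readonly offThreshold = 0.35, // prob below this counts as silence
    private readonly hangoverFrames = 10, // silent frames before speech ends
  ) {}

  // Feed one frame's speech probability; returns "start", "end", or null.
  push(prob: number): "start" | "end" | null {
    if (!this.speaking) {
      if (prob >= this.onThreshold) {
        this.speaking = true;
        this.silentFrames = 0;
        return "start";
      }
      return null;
    }
    if (prob < this.offThreshold) {
      if (++this.silentFrames >= this.hangoverFrames) {
        this.speaking = false;
        return "end";
      }
    } else {
      this.silentFrames = 0; // speech resumed, reset the hangover
    }
    return null;
  }
}
```

The hangover window is what lets you pause mid-sentence without the app cutting you off.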
Speech-to-Text Options
- ElevenLabs Scribe — Most accurate and fastest of the three
- OpenAI Whisper — Reliable, widely used
- whisper.cpp — Runs locally, no network needed
Text-to-Speech Options
- ElevenLabs — Most natural voices
- VOICEVOX — Great for Japanese
- Kokoro — Japanese + English
- Piper — Lightweight, runs offline
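Swapping between providers like these is easiest behind a shared interface. A hypothetical registry sketch showing how a preferred provider can fall back to a local one (e.g. Piper) when offline — the interface is an illustration, not the app's actual code:

```typescript
interface TtsProvider {
  name: string;
  local: boolean; // true if it runs offline
  speak(text: string): Promise<Float32Array>;
}

const providers = new Map<string, TtsProvider>();

function register(p: TtsProvider): void {
  providers.set(p.name, p);
}

// Prefer the requested provider, but fall back to any local one offline.
function pick(preferred: string, online: boolean): TtsProvider {
  const p = providers.get(preferred);
  if (p && (online || p.local)) return p;
  const fallback = Array.from(providers.values()).find((q) => q.local);
  if (!fallback) throw new Error("no usable TTS provider");
  return fallback;
}
```

The same shape works for the STT side: one interface, many backends.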
LLM (OpenClaw Gateway)
The AI part runs through OpenClaw — a local gateway that:
- Lets you switch models without code changes
- Handles rate limits gracefully
- Keeps all your API keys in one place
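"Switch models without code changes" boils down to keeping the model name in config and building the request from it. A sketch of that idea, assuming the gateway exposes an OpenAI-compatible chat endpoint on localhost — the URL, port, and payload shape here are assumptions for illustration, not OpenClaw's documented API:

```typescript
interface GatewayConfig {
  baseUrl: string; // e.g. "http://localhost:8787" (hypothetical)
  model: string;   // swap models here, no code changes
}

// Build the request the app would POST to the gateway.
function buildChatRequest(cfg: GatewayConfig, userText: string) {
  return {
    url: `${cfg.baseUrl}/v1/chat/completions`,
    body: {
      model: cfg.model,
      stream: true, // stream tokens so TTS can start early
      messages: [{ role: "user" as const, content: userText }],
    },
  };
}
```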
The Challenges I Faced
Challenge 1: The Echo Problem
At first, the AI's voice was triggering the microphone, creating an infinite feedback loop. 😅
Solution: A speaker monitor that "subtracts" the audio output from the input stream. Now only your voice triggers the AI.
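A toy version of that subtraction: remove a delayed, scaled copy of what the app is playing from what the mic hears. Real echo cancellation estimates the delay and gain adaptively; here both are fixed assumptions for illustration:

```typescript
// Naive echo subtraction: out = mic - gain * playback[delayed].
function cancelEcho(
  mic: Float32Array,
  playback: Float32Array,
  delay: number, // samples between speaker output and mic pickup
  gain: number,  // echo loudness relative to playback
): Float32Array {
  const out = new Float32Array(mic.length);
  for (let i = 0; i < mic.length; i++) {
    const j = i - delay;
    const echo = j >= 0 && j < playback.length ? gain * playback[j] : 0;
    out[i] = mic[i] - echo;
  }
  return out;
}
```

With the echo removed, the VAD only fires on your voice, not the AI's.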
Challenge 2: Latency
3-4 second delays made conversations feel robotic and awkward.
Solution:
- Stream responses — start speaking as soon as the first tokens arrive
- Optimized audio buffers — balance latency vs. quality
- Added "aizuchi" sounds — those little "mm-hmm" fillers during thinking time
Challenge 3: Japanese Support
Most TTS voices sound robotic in Japanese. Japanese conversations also rely on subtle "aizuchi" backchannel sounds; without them, the exchange feels stilted.
Solution: Integrated VOICEVOX and Kokoro for natural prosody, and added optional aizuchi sounds during thinking.
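The aizuchi timing matters: one filler per long pause sounds natural, a filler every frame sounds broken. A hypothetical scheduler with a cooldown window (the sound list is an example, not the app's actual asset set):

```typescript
// Emit a backchannel sound at most once per cooldown while "thinking".
class AizuchiScheduler {
  private lastAt = -Infinity;
  private i = 0;
  private readonly sounds = ["うん", "ええ", "なるほど"]; // example set

  constructor(private readonly cooldownMs = 1500) {}

  // Call periodically while waiting on the LLM; returns a sound or null.
  maybeAizuchi(nowMs: number): string | null {
    if (nowMs - this.lastAt < this.cooldownMs) return null;
    this.lastAt = nowMs;
    return this.sounds[this.i++ % this.sounds.length];
  }
}
```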
Try It Out
Prerequisites
- Node.js 20+
- pnpm
- An OpenClaw gateway running locally
Quick Start
```shell
git clone https://github.com/coo-quack/talkative-lobster.git
cd talkative-lobster
pnpm install
pnpm dev
```
On first launch, the Settings modal walks you through connecting your gateway and choosing your preferred STT/TTS providers.
Downloads
Pre-built binaries are available.
What's Next
- More TTS providers (Azure, Google Cloud)
- Voice presets for different use cases
- Conversation history with search
- Custom wake words
- Mobile companion app
Final Thoughts
Voice interfaces are finally good enough to be useful — not just gimmicky. The combination of accurate speech recognition, fast LLMs, and natural text-to-speech creates something that feels genuinely conversational.
If you've ever wanted to just talk to your AI while doing something else — cooking, walking, debugging at 2 AM — give Talkative Lobster a try.
Built with 🦞 by the coo-quack team