
chataclaw

Talk to Your AI — Meet Talkative Lobster

Have you ever wanted to just talk to your AI assistant? Like, literally speak into your microphone and hear it respond naturally?

I work with AI all day, and sometimes typing feels... slow. Copy-pasting is tedious. Sometimes I'm cooking, walking, or just tired of staring at a screen. I wanted something that felt like a real conversation.

So I built Talkative Lobster — a desktop app that turns your voice into AI conversation and back again.


What It Does

You speak  →  Speech-to-Text  →  LLM  →  Text-to-Speech  →  You hear

You talk into your mic, the AI thinks, and then speaks back to you. No typing, no copy-pasting — just natural back-and-forth conversation.
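The pipeline above can be sketched as a single turn loop. Every stage here is injected as a function, so providers can be swapped; the names are illustrative stand-ins, not the app's actual API.

```typescript
// Minimal sketch of one conversation turn: mic -> STT -> LLM -> TTS -> speaker.
// Each stage is a stand-in; the real app wires in whichever provider you pick.
type Stage<I, O> = (input: I) => Promise<O>;

async function runTurn(
  record: () => Promise<ArrayBuffer>,          // capture one utterance from the mic
  stt: Stage<ArrayBuffer, string>,             // speech-to-text
  llm: Stage<string, string>,                  // chat completion
  tts: Stage<string, ArrayBuffer>,             // text-to-speech
  play: (audio: ArrayBuffer) => Promise<void>  // speaker output
): Promise<string> {
  const audio = await record();
  const userText = await stt(audio);
  const reply = await llm(userText);
  await play(await tts(reply));
  return reply; // returned so the app can log it or keep history
}
```

Because every stage is just an async function, swapping ElevenLabs for whisper.cpp is a matter of passing a different function, and the loop itself never changes.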

Key features:

  • Interrupt anytime — Just start talking to cut in mid-response, like a real conversation
  • Works offline — Local whisper.cpp and Piper TTS when you need privacy or no network
  • Japanese support — VOICEVOX and Kokoro voices, plus "aizuchi" (those little "mm-hmm" sounds)
  • Multiple providers — ElevenLabs, OpenAI, or local options for both STT and TTS
  • Privacy-first — API keys encrypted on your machine, everything routes through your own gateway

How It Works

Voice Detection

The app uses Silero VAD (Voice Activity Detection) — a neural network trained on real speech patterns. This means it knows when you're actually talking versus when there's background noise, music, or keyboard clicks.
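Here is a rough sketch of how per-frame VAD probabilities become "user started/stopped talking" events. `speechProb` is a hypothetical wrapper around the Silero model, which emits a speech probability per short audio frame; the threshold and hangover values are illustrative.

```typescript
// Turn per-frame speech probabilities into utterance start/end events.
// `speechProb` stands in for the real Silero VAD model wrapper.
type Frame = Float32Array;

class UtteranceDetector {
  private speaking = false;
  private silentFrames = 0;

  constructor(
    private speechProb: (f: Frame) => number, // hypothetical model wrapper
    private threshold = 0.5,                  // probability above this counts as speech
    private hangoverFrames = 15               // roughly 0.5 s of silence ends the turn
  ) {}

  // Returns "start", "end", or null for each incoming frame.
  push(frame: Frame): "start" | "end" | null {
    const isSpeech = this.speechProb(frame) > this.threshold;
    if (isSpeech) {
      this.silentFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "start";
      }
    } else if (this.speaking && ++this.silentFrames >= this.hangoverFrames) {
      this.speaking = false;
      return "end";
    }
    return null;
  }
}
```

The hangover counter is what keeps a brief pause mid-sentence from being mistaken for the end of your turn.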

Speech-to-Text Options

  • ElevenLabs Scribe — Best accuracy, fastest
  • OpenAI Whisper — Reliable, widely used
  • whisper.cpp — Runs locally, no network needed

Text-to-Speech Options

  • ElevenLabs — Most natural voices
  • VOICEVOX — Great for Japanese
  • Kokoro — Japanese + English
  • Piper — Lightweight, runs offline
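Supporting this many backends is easiest behind a small provider abstraction. The interfaces and registry below are a sketch of that idea, not the app's real API:

```typescript
// Sketch of a provider abstraction: every STT/TTS backend implements a
// tiny interface, and a registry lets the Settings UI swap them at runtime.
interface SttProvider {
  transcribe(audio: ArrayBuffer): Promise<string>;
}
interface TtsProvider {
  synthesize(text: string): Promise<ArrayBuffer>;
}

class ProviderRegistry<T> {
  private providers = new Map<string, T>();

  register(id: string, provider: T) {
    this.providers.set(id, provider);
  }

  get(id: string): T {
    const p = this.providers.get(id);
    if (!p) throw new Error(`unknown provider: ${id}`);
    return p;
  }
}
```

The conversation loop only ever sees the interface, so adding a new voice backend means registering one more implementation.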

LLM (OpenClaw Gateway)

The AI part runs through OpenClaw — a local gateway that:

  • Lets you switch models without code changes
  • Handles rate limits gracefully
  • Keeps all your API keys in one place
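As a sketch, talking to a local gateway might look like this, assuming it exposes an OpenAI-compatible chat endpoint (check OpenClaw's docs for the real route and payload; the function names here are illustrative):

```typescript
// Build the request payload separately so it is easy to inspect and test.
function buildChatRequest(model: string, prompt: string) {
  return {
    model, // switching models is a config change, not a code change
    messages: [{ role: "user" as const, content: prompt }],
  };
}

// Hypothetical call to a local gateway, assuming an OpenAI-compatible
// /v1/chat/completions endpoint.
async function ask(gatewayUrl: string, model: string, prompt: string): Promise<string> {
  const res = await fetch(`${gatewayUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, prompt)),
  });
  if (!res.ok) throw new Error(`gateway error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Because the gateway owns the API keys and rate-limit handling, the app itself only ever needs one local URL.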

The Challenges I Faced

Challenge 1: The Echo Problem

At first, the AI's voice was triggering the microphone, creating infinite loops. 😅

Solution: A speaker monitor that "subtracts" the audio output from the input stream. Now only your voice triggers the AI.
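The gating idea can be sketched like this: while the assistant is speaking, a mic frame only counts as user speech if its energy clearly exceeds what the speaker output would explain. The gain and margin values are illustrative assumptions, not the app's tuned numbers.

```typescript
// Root-mean-square energy of an audio frame.
function rms(frame: Float32Array): number {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

function isUserSpeech(
  micFrame: Float32Array,
  speakerFrame: Float32Array | null, // what the app is currently playing, if anything
  echoGain = 0.3,                    // assumed fraction of speaker audio the mic picks up
  margin = 2.0                       // user must be 2x louder than the expected echo
): boolean {
  const micLevel = rms(micFrame);
  if (!speakerFrame) return micLevel > 0.01; // simple noise floor when idle
  const expectedEcho = echoGain * rms(speakerFrame);
  return micLevel > Math.max(0.01, margin * expectedEcho);
}
```

This is also what makes barge-in possible: speaking louder than the echo estimate still gets through, so you can interrupt mid-response without the AI hearing itself.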

Challenge 2: Latency

3-4 second delays made conversations feel robotic and awkward.

Solution:

  • Stream responses — start speaking as soon as the first tokens arrive
  • Optimized audio buffers — balance latency vs. quality
  • Added "aizuchi" sounds — those little "mm-hmm" fillers during thinking time
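The streaming part can be sketched as a chunker that buffers tokens as they arrive and flushes a chunk to TTS at each sentence boundary, so speech starts before the full response exists. The boundary detection here is deliberately naive.

```typescript
// Buffer streamed LLM tokens and emit complete sentences for TTS.
class SentenceChunker {
  private buffer = "";

  // Feed one token; returns a complete sentence when one is ready.
  push(token: string): string | null {
    this.buffer += token;
    // Match up to the first sentence terminator (Latin or Japanese).
    const m = this.buffer.match(/^([\s\S]*?[.!?。！？])\s*/);
    if (!m) return null;
    this.buffer = this.buffer.slice(m[0].length);
    return m[1];
  }

  // Return whatever is left when the stream ends.
  flush(): string | null {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest || null;
  }
}
```

Each emitted sentence can be handed to the TTS provider immediately, which is what collapses the perceived latency from "whole response" to "first sentence".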

Challenge 3: Japanese Support

Most TTS voices sound robotic in Japanese. And natural Japanese conversation relies on those subtle "aizuchi" backchannel sounds; without them, the exchange feels stilted.

Solution: Integrated VOICEVOX and Kokoro for natural prosody, and added optional aizuchi sounds during thinking.
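One way to wire up the aizuchi, sketched below: if the assistant has not started speaking within a short window after the user finishes, play one brief acknowledgement clip. `playClip` and the clip names are stand-ins for the real audio player and assets.

```typescript
// Play a short "mm-hmm" style clip while the LLM is still thinking.
class AizuchiFiller {
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private playClip: (name: string) => void,   // stand-in for the audio player
    private clips = ["un", "hai", "ee"],        // illustrative clip names
    private delayMs = 700                       // silence tolerated before filling
  ) {}

  pickClip(): string {
    return this.clips[Math.floor(Math.random() * this.clips.length)];
  }

  // Call when VAD reports the user's turn ended.
  onUserFinished() {
    this.timer = setTimeout(() => this.playClip(this.pickClip()), this.delayMs);
  }

  // Call when the real response starts playing, so the filler is cancelled.
  onAssistantSpeaking() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Cancelling the timer the moment real speech starts is the important part; a filler that overlaps the actual response sounds worse than none at all.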


Try It Out

Prerequisites

  • Node.js 20+
  • pnpm
  • An OpenClaw gateway running locally

Quick Start

git clone https://github.com/coo-quack/talkative-lobster.git
cd talkative-lobster
pnpm install
pnpm dev

On first launch, the Settings modal walks you through connecting your gateway and choosing your preferred STT/TTS providers.

Downloads

Pre-built binaries available:

  • macOS (Apple Silicon): DMG
  • macOS (Intel): DMG
  • Windows: EXE
  • Linux: AppImage

What's Next

  • More TTS providers (Azure, Google Cloud)
  • Voice presets for different use cases
  • Conversation history with search
  • Custom wake words
  • Mobile companion app

Final Thoughts

Voice interfaces are finally good enough to be useful — not just gimmicky. The combination of accurate speech recognition, fast LLMs, and natural text-to-speech creates something that feels genuinely conversational.

If you've ever wanted to just talk to your AI while doing something else — cooking, walking, debugging at 2 AM — give Talkative Lobster a try.




Built with 🦞 by the coo-quack team
