I've been running OpenClaw as my personal AI assistant for a while — text-based, the usual way. It handled my emails, managed my calendar, searched the web, wrote code. It worked fine.
Then I added voice.
## The shift
The difference hit me on the first day. Instead of reading walls of text on my phone, my AI just talks to me. I ask a question while cooking — it answers out loud. I send a voice message from my car — it responds with voice. No screens, no typing, no waiting to read.
It sounds like a small change. It's not.
Text-based AI feels like email. Voice-based AI feels like a colleague sitting next to you. The personality comes through in a way that text never quite captures — the pauses, the tone, the pacing.
## The shock
The real transformation came after I tuned the voice. Once I configured a quality TTS model and dialed in the tone and emotion parameters, something clicked. My AI suddenly felt like a real person — someone with high emotional intelligence who knows when to be warm, when to be direct, when to pause before delivering news.
Chatting became genuinely enjoyable. And it still works as hard as ever — researching, coding, sending emails, managing files. That combination — a colleague who is both pleasant to talk to and extremely capable — was honestly stunning.
I didn't expect voice to change the experience this much.
## The problems
It's not perfect. A few things I'm running into:
Memory dies between sessions. My AI remembers me during a conversation, but after a restart? Gone. I have to re-teach preferences every time.
Personality drifts on model switches. When the underlying model changes (upgrades, rate-limit fallbacks), the tone shifts with it. Same AI, different vibe. It's jarring.
Batch message blindness. When I send three messages quickly, it replies to each one separately. It should read them all, then respond once.
## What I built
I packaged my voice setup into an open-source project called VoiceClaw — a voice-first interaction layer for OpenClaw.
What it does:
- Full voice pipeline (STT + TTS), pre-configured
- Persistent memory across sessions
- Consistent personality across model switches
- Multi-model routing optimized for voice
It's early. The problems above are real and unsolved. That's why I'm sharing this.
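One way to keep personality consistent across model switches is to decouple the persona from the model: routing picks whatever model is available, but the system prompt is pinned. A sketch of that idea, with hypothetical model names:

```python
# One persona prompt, pinned regardless of which model serves the turn.
PERSONA = "You are warm, direct, and concise; pause before delivering hard news."

# Preference-ordered fallback chain (hypothetical model names).
MODELS = ["primary-large", "fallback-medium", "fallback-small"]


def route(prompt, available):
    """Pick the first available model but always attach the same
    system prompt, so a fallback changes capability, not personality."""
    for model in MODELS:
        if model in available:
            return {"model": model, "system": PERSONA, "user": prompt}
    raise RuntimeError("no model available")
```

This doesn't fully solve drift, since different models interpret the same prompt differently, but it removes the most jarring source of it.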
## Come try it
If you've ever wished your AI assistant would just talk to you, give it a shot. And if you want to help solve any of the problems above, we're looking for contributors.