DEV Community

Ondrej Machala


You're Sending Voice Messages to OpenClaw. Here's What It Actually Receives.

You ask OpenClaw something via voice message on Telegram.

It responds with something off. You read it again. Then you realize: it transcribed "Kubernetes" as "Cuban eties" and your entire prompt made no sense. You go back, re-record, send again.

That loop is the problem.


When you send a voice message on Telegram, OpenClaw receives an audio file. Somewhere in its pipeline, that audio is transcribed before the AI sees your words. You never see that transcript. By the time you get a bad response, you can't tell whether the AI misunderstood your intent or your words came in garbled.

With text, that problem disappears. The AI gets exactly what you typed. You can re-read it before hitting send.
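
To make the difference concrete, here is a toy sketch in Python. Every name in it is hypothetical and the "transcriber" is a stand-in lookup table, not OpenClaw's actual pipeline. It only illustrates that on the voice path the transcript exists solely inside the pipeline, while on the text path the model receives the string you reviewed.

```python
# Toy model of the two paths. All names are hypothetical, not OpenClaw's API.

# Stand-in for a speech-to-text step that mishears a technical term.
MISHEARD = {"Kubernetes": "Cuban eties"}


def fake_transcribe(spoken: str) -> str:
    """Pretend transcription: swaps in known mis-hearings."""
    for said, heard in MISHEARD.items():
        spoken = spoken.replace(said, heard)
    return spoken


def voice_message_path(spoken: str) -> str:
    """Audio is transcribed inside the pipeline; the sender never sees this string."""
    return fake_transcribe(spoken)


def typed_text_path(typed: str) -> str:
    """Text is passed through unchanged, so the model gets what you reviewed."""
    return typed


prompt = "Restart the Kubernetes deployment"
print(voice_message_path(prompt))  # Restart the Cuban eties deployment
print(typed_text_path(prompt))     # Restart the Kubernetes deployment
```

The only difference between the two functions is whether a lossy step runs where you cannot inspect its output.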

The missing piece was a fast way to get from your voice to reviewed text before it hits the chat.


What Diction Does

Diction is an iPhone keyboard that transcribes as you speak, inside the keyboard, before you send.

The workflow becomes:

  1. Open Telegram, find OpenClaw
  2. Switch to Diction keyboard (globe icon)
  3. Tap the mic, speak your prompt
  4. The keyboard shows you the transcription immediately
  5. Read it. Fix "Cuban eties" to "Kubernetes." Fix a name, a command, whatever
  6. Hit send

OpenClaw receives clean text. You know exactly what it's working with.


Why This Beats Voice Messages for AI Prompts

Voice messages work fine for casual conversation. For AI prompts, they introduce a failure mode you can't debug.

A few things that go wrong with voice-to-OpenClaw:

  • Technical terms get mangled. "n-grams" becomes "engrams." "React hook" becomes "react who."
  • Proper nouns the transcription layer hasn't seen before come out as phonetic guesses.
  • Long prompts with multiple conditions lose a clause in the middle, and you never find out.

When you see the transcription first, you catch these before they reach the model. The AI responds to what you actually meant.
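
As an illustration of why a near miss like "engrams" is easy to spot once the text is in front of you, here is a small Python sketch that flags words sitting close to a term in a personal glossary. This is not something the article says Diction does; the glossary, the function name, and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical personal glossary of terms a transcriber tends to mangle.
GLOSSARY = ["Kubernetes", "n-grams", "React"]


def flag_suspects(transcript: str, cutoff: float = 0.8) -> dict:
    """Map each word that nearly matches a glossary term to that term."""
    lowered = {term.lower(): term for term in GLOSSARY}
    suspects = {}
    for word in transcript.split():
        token = word.strip(".,!?").lower()
        if token in lowered:
            continue  # already spelled correctly
        close = difflib.get_close_matches(token, list(lowered), n=1, cutoff=cutoff)
        if close:
            suspects[token] = lowered[close[0]]
    return suspects


print(flag_suspects("Count the engrams in this corpus."))
# {'engrams': 'n-grams'}
```

A word-by-word check like this only catches single-word near misses. A term split across two words, such as "Cuban eties," slips past it, which is why reading the transcription yourself remains the real safeguard.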


Setup

Diction is a keyboard extension: install it once, then switch to it whenever you want to dictate.

After install:

  1. Go to Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction
  2. Enable Full Access (required for the keyboard to insert text)
  3. Open any app, tap a text field, press the globe icon to switch to Diction
  4. Tap the mic

For transcription, there are three options: on-device (runs on your iPhone, no network), self-hosted (your own server, if you run one), or Diction One (their cloud).

If you use OpenClaw because you prefer keeping data local, the on-device or self-hosted options mean your voice never leaves your phone or network. OpenClaw gets text either way.
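
The three options can be summarized as a small data model. This Python sketch is only a way to lay out the trade-off described above; the class names and fields are hypothetical and do not reflect Diction's real configuration, and marking self-hosted as needing a network connection is an assumption (your phone has to reach your server somehow).

```python
from dataclasses import dataclass
from enum import Enum


class AudioScope(Enum):
    """Where the recorded audio ends up (labels are illustrative)."""
    PHONE = "stays on the phone"
    OWN_NETWORK = "stays on your own server or network"
    VENDOR_CLOUD = "goes to the vendor's cloud"


@dataclass(frozen=True)
class TranscriptionMode:
    name: str
    needs_network: bool  # assumption for self-hosted; stated for on-device
    audio_scope: AudioScope


MODES = [
    TranscriptionMode("on-device", False, AudioScope.PHONE),
    TranscriptionMode("self-hosted", True, AudioScope.OWN_NETWORK),
    TranscriptionMode("Diction One", True, AudioScope.VENDOR_CLOUD),
]


def local_only(modes):
    """Names of the modes where audio never reaches a third party."""
    return [m.name for m in modes if m.audio_scope is not AudioScope.VENDOR_CLOUD]


print(local_only(MODES))  # ['on-device', 'self-hosted']
```

In every mode the chat app on the other end only ever sees the final text.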


It Works Everywhere, Not Just Telegram

Since Diction is a keyboard, it works in any app with a text field. Same workflow for:

  • WhatsApp bots
  • NanoGPT
  • The Claude app
  • ChatGPT
  • Any web app in Safari

You don't configure anything per-app. Switch keyboard, tap mic, speak, review, send.


The app is free to download and try. Self-hosted mode is free. The basic on-device model downloads automatically.

Download on the App Store · diction.one · github.com/omachala/diction
