DEV Community

Turtleand

Posted on • Originally published at openclaw.turtleand.com

Your Telegram Bot's Voice Messages Are Missing Speed Control. Here's the Fix.

If your Telegram bot sends voice messages using TTS, you've probably noticed something missing: the speed control button.

No 1.5x. No 2x. Just plain audio that plays at one speed.

The problem is the audio format.

MP3 doesn't cut it

Most TTS providers output MP3 files. When you send these with the Bot API's sendVoice method, they technically work. They play. But Telegram doesn't treat them as proper voice messages.

You get:

  • No waveform visualization
  • No speed control (0.5x/1x/1.5x/2x)
  • Just a basic audio player

This matters if your bot sends briefings, summaries, or long-form content. A 2-minute message at 2x speed takes 1 minute. Over time, that's real savings.

The fix

Convert your MP3 to OGG Opus before sending:

ffmpeg -i input.mp3 \
  -c:a libopus \
  -b:a 48k \
  -vbr on \
  -compression_level 10 \
  -frame_duration 60 \
  -application voip \
  output.ogg

Send the .ogg file via sendVoice. Telegram now recognizes it as a voice message. Speed control buttons appear.
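For the upload itself, here is a minimal sketch using curl. BOT_TOKEN and CHAT_ID are placeholder variables you would set yourself, and send_voice is a hypothetical helper name, not something from the Bot API.

```shell
# Sketch: upload an OGG Opus file with the Bot API's sendVoice method.
# BOT_TOKEN and CHAT_ID are placeholders (assumptions), set them beforehand.
send_voice() {
  local file="$1"
  # -F sends multipart/form-data; the @ prefix uploads the file's contents.
  curl --silent --show-error --fail \
    -F "chat_id=${CHAT_ID}" \
    -F "voice=@${file}" \
    "https://api.telegram.org/bot${BOT_TOKEN}/sendVoice"
}
```

Call it as send_voice output.ogg. With --fail, curl exits non-zero on an HTTP error, so a script can stop instead of silently continuing.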

Why this works

Telegram's voice message system is built for OGG Opus. The Bot API docs mention this:

"For this to work, your audio must be in an .OGG file encoded with OPUS..."

But they don't emphasize it. MP3 files still work, so many developers never notice they're missing features.

The ffmpeg flags matter:

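  • -c:a libopus: Use the Opus codec
  • -b:a 48k: 48 kbps target bitrate (plenty for voice)
  • -vbr on: Variable bitrate
  • -compression_level 10: Highest encoder complexity (slowest encode, best quality at the given bitrate; it does not shrink the file further)
  • -frame_duration 60: 60 ms frames (less per-frame overhead at low bitrates; the added latency is irrelevant for prerecorded audio)
  • -application voip: Optimize for speech, not music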

That last one (-application voip) tells Opus to prioritize speech clarity.
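To confirm that a converted file really contains Opus before you send it, you can ask ffprobe (it ships with ffmpeg) for the codec of the first audio stream. This is a sketch; audio_codec and is_opus are hypothetical helper names.

```shell
# Sketch: print the codec name of the first audio stream in a file.
audio_codec() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeeds only if the first audio stream is Opus.
is_opus() {
  [ "$(audio_codec "$1")" = "opus" ]
}
```

For example: is_opus output.ogg && echo "ready for sendVoice". An untouched MP3 would report mp3 instead and fail the check.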

Implementation

If you control the TTS pipeline, add the conversion step after generation:

# Generate TTS (example)
edge-tts --text "Your message" --write-media output.mp3

# Convert to OGG Opus
ffmpeg -i output.mp3 -c:a libopus -b:a 48k -vbr on \
  -compression_level 10 -frame_duration 60 \
  -application voip output.ogg

# Send via Telegram using output.ogg

Or batch-convert existing files:

for mp3 in *.mp3; do
  ffmpeg -i "$mp3" -c:a libopus -b:a 48k -vbr on \
    -compression_level 10 -frame_duration 60 \
    -application voip "${mp3%.mp3}.ogg"
done
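If you rerun the batch regularly, a slightly more defensive variant may help. The sketch below skips MP3s whose .ogg is already newer and does nothing when the directory has no MP3s at all. convert_dir is a hypothetical helper name; the ffmpeg flags are the same ones used above, plus -y (overwrite stale output) and -nostdin (never wait for keyboard input).

```shell
# Sketch: convert every MP3 in a directory, skipping up-to-date outputs.
convert_dir() {
  local dir="${1:-.}" mp3 ogg
  for mp3 in "$dir"/*.mp3; do
    [ -e "$mp3" ] || continue            # glob matched nothing
    ogg="${mp3%.mp3}.ogg"
    [ "$ogg" -nt "$mp3" ] && continue    # .ogg exists and is newer
    ffmpeg -nostdin -y -i "$mp3" -c:a libopus -b:a 48k -vbr on \
      -compression_level 10 -frame_duration 60 \
      -application voip "$ogg" || return 1
  done
}
```

Run it as convert_dir ./audio, or with no argument for the current directory. It stops at the first file ffmpeg fails on, so a broken input doesn't go unnoticed.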

What the docs don't tell you

The API docs mention OGG Opus as a requirement, but don't explain what happens if you skip it. MP3 still works, so it seems fine. Until you notice your voice messages look different from native Telegram ones.

This affects any bot sending TTS audio: Google TTS, Azure Speech, ElevenLabs, OpenAI. If it outputs MP3, you'll hit this.

One ffmpeg command. Proper voice messages with speed control.


Want more OpenClaw tips? Check out the OpenClaw Lab for research notes on autonomous agents, cron jobs, voice integration, and more.
