Executive Summary
TL;DR: This project solves the problem of unstructured voice memos in Telegram by creating a Python bot that automatically transcribes them. It uses the Telegram Bot API to receive voice notes and the OpenAI Whisper API to convert them into searchable, copy-pasteable text, significantly boosting efficiency.
Key Takeaways
- The solution integrates `python-telegram-bot` for message handling, `pydub` with `ffmpeg` for Ogg Opus to MP3 audio conversion, and the `openai` library for Whisper API transcription.
- API keys are managed securely with `python-dotenv`, which loads `TELEGRAM_BOT_TOKEN` and `OPENAI_API_KEY` from a `config.env` file instead of hardcoding them.
- Temporary audio files (OGA and MP3) are downloaded, processed, and then reliably cleaned up with `os.remove` inside a `finally` block, so they are deleted even when processing fails.
Convert Voice Memos from Telegram to Text using OpenAI Whisper API
Alright, team. Darian here. Let's talk about efficiency. I used to leave myself voice memos on the go: quick thoughts, reminders, even mini-debug sessions while walking the dog. The problem? They'd pile up in my Telegram "Saved Messages," becoming a black hole of unstructured audio. Listening back to find one specific thought was a huge time sink. This little project changed that. Now I just send a voice note to a bot, and a few seconds later I get a clean text transcription back. It's searchable, copy-pasteable, and has genuinely saved me a couple of hours a week.
This isn't just a gimmick; it's a powerful way to bridge the gap between spoken ideas and actionable, written data. Let's build it.
Prerequisites
Before we dive in, make sure you have the following ready. We're all busy, so getting this sorted out first will make the process much smoother.
- A Telegram Bot Token: You can get this from the BotFather on Telegram. Just start a chat with him, create a new bot (the /newbot command), and he'll give you an API token.
- An OpenAI API Key: You'll need an account on the OpenAI platform. Grab your API key from your account dashboard.
- Python Environment: A working Python 3.8+ installation.
- FFmpeg: This is a crucial dependency for audio processing, and you'll need to install it on your system. A quick search for "install ffmpeg on [your OS]" will get you there. Pydub, the library we'll use, depends on it.
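Because pydub shells out to ffmpeg, it can be worth confirming the binary is on your PATH before the bot starts, rather than discovering it mid-conversion. Here's a minimal, standard-library-only sketch; `check_ffmpeg` is a hypothetical helper of my own, not part of pydub.

```python
import shutil
from typing import Callable, Optional


def check_ffmpeg(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path to the ffmpeg binary, or raise if it isn't on PATH.

    `which` is injectable (hypothetical design choice) so the check can be
    exercised without ffmpeg actually being installed.
    """
    path = which("ffmpeg")
    if path is None:
        raise RuntimeError(
            "ffmpeg not found on PATH; install it before using pydub conversions"
        )
    return path


if __name__ == "__main__":
    try:
        print(f"ffmpeg found at {check_ffmpeg()}")
    except RuntimeError as err:
        print(err)
```

Calling something like this at the top of `main()` turns a confusing failure on the first voice message into a clear startup error.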
The Guide: Step-by-Step
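I'll skip the standard virtual environment setup (venv, etc.) since you likely have your own workflow for that. Let's jump straight to the logic. You'll need to install a few Python libraries: python-telegram-bot (v20 or later, for the async `Application` API used below), openai (v1.0 or later, for the `OpenAI` client class), python-dotenv, and pydub. For example: `pip install "python-telegram-bot>=20" "openai>=1.0" python-dotenv pydub`.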
Step 1: Environment and Configuration
First rule of production: never hardcode secrets. We'll store our API keys in a `config.env` file. Create a file with that name in your project directory and add your keys like this:

```
TELEGRAM_BOT_TOKEN="YOUR_TELEGRAM_TOKEN_HERE"
OPENAI_API_KEY="YOUR_OPENAI_KEY_HERE"
```
Now, let's start our Python script. We'll call it `transcriber_bot.py`. We'll begin by importing the necessary libraries and loading our environment variables.
```python
import os
import logging
from dotenv import load_dotenv
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
from openai import OpenAI
from pydub import AudioSegment

# Load environment variables from config.env
load_dotenv('config.env')

# Setup basic logging
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```
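`os.getenv` silently returns `None` for a missing key, so a typo in `config.env` only surfaces when the first API call fails. A small fail-fast check right after `load_dotenv` makes that explicit. This is a sketch; `require_env` is a hypothetical helper name, not part of python-dotenv.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, failing fast if any are missing or blank."""
    values = {name: os.getenv(name, "") for name in names}
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return values
```

In the script, you would call `require_env(["TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY"])` immediately after `load_dotenv('config.env')` and pass the returned values on to the Telegram and OpenAI clients.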
Step 2: The Telegram Bot Core Logic
Next, we'll set up the main structure of our bot. This involves creating an `Application` instance and adding a `MessageHandler`. We specifically want to filter for voice messages, so we'll use `filters.VOICE`.
```python
async def handle_voice_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # This is where the magic will happen. We'll fill this in next.
    await update.message.reply_text("Processing your voice memo...")
    # (Future steps go here)


def main() -> None:
    """Start the bot."""
    telegram_token = os.getenv("TELEGRAM_BOT_TOKEN")
    if not telegram_token:
        logger.error("TELEGRAM_BOT_TOKEN not found in environment variables!")
        return

    application = Application.builder().token(telegram_token).build()

    # Add a handler for voice messages
    application.add_handler(MessageHandler(filters.VOICE, handle_voice_message))

    # Start the Bot
    logger.info("Bot is starting...")
    application.run_polling()


if __name__ == '__main__':
    main()
```
This boilerplate code sets up a listener. When the bot receives a voice message, it will call our handle\_voice\_message function.
Step 3: Downloading and Converting the Audio
Telegram voice messages are usually Opus-encoded audio in an Ogg container (`.oga` files). The Whisper API accepts a fixed list of input formats, and converting to MP3, which is on that list, sidesteps any trouble with the `.oga` extension. This is where `pydub` and `ffmpeg` shine: we'll download the file, then convert it.
Let's flesh out the `handle_voice_message` function:
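If you want to confirm what Telegram actually sent before handing it to pydub, the container is easy to recognize: every Ogg page begins with the capture pattern `OggS`, and an Opus stream's first packet starts with the `OpusHead` signature (RFC 7845). The sketch below is a hypothetical sanity check, not something the bot requires; `sniff_voice_container` is my own name, and it only inspects the first bytes of the file.

```python
def sniff_voice_container(header: bytes) -> str:
    """Classify the first bytes of an audio file.

    Returns "ogg-opus" for an Ogg stream whose first page carries an Opus ID header,
    "ogg" for other Ogg streams, and "unknown" otherwise.
    """
    if not header.startswith(b"OggS"):
        return "unknown"
    # The Opus identification header sits in the first Ogg page, right after
    # the 27-byte page header and its short segment table, so a small window suffices.
    if b"OpusHead" in header[:128]:
        return "ogg-opus"
    return "ogg"
```

Usage would look like `with open(oga_path, "rb") as f: kind = sniff_voice_container(f.read(128))`, logging a warning if the result isn't `"ogg-opus"`.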
```python
async def handle_voice_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Downloads, converts, and transcribes a voice message."""
    file_id = update.message.voice.file_id
    # Define the temporary file paths before the try block so the finally
    # clause can always reference them, even if the download itself fails
    oga_path = f'{file_id}.oga'
    mp3_path = f'{file_id}.mp3'
    try:
        # 1. Download the file
        voice_file = await context.bot.get_file(file_id)
        await voice_file.download_to_drive(oga_path)
        logger.info(f"Downloaded voice file to {oga_path}")

        # 2. Convert OGA to MP3
        audio = AudioSegment.from_ogg(oga_path)
        audio.export(mp3_path, format="mp3")
        logger.info(f"Converted {oga_path} to {mp3_path}")

        # (Transcription step comes next)
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        await update.message.reply_text("Sorry, I couldn't process that voice memo.")
    finally:
        # 4. Clean up the temporary files
        if os.path.exists(oga_path):
            os.remove(oga_path)
        if os.path.exists(mp3_path):
            os.remove(mp3_path)
        logger.info("Cleaned up temporary files.")
```
Pro Tip: In my production setups, I handle file paths more robustly, often using a dedicated `/tmp` or temporary directory structure. For this example, creating files in the local directory is fine, but always be mindful of where you're writing data, especially in a containerized environment. Cleaning up files in a `finally` block ensures they get deleted even if an error occurs.
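One way to apply that advice is the standard library's `tempfile.TemporaryDirectory`, which deletes its directory (and everything in it) when the `with` block exits, even on an exception. This is a sketch; `voice_temp_paths` is a hypothetical helper you could use in place of the local-directory paths and the manual `os.remove` calls.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Tuple


@contextmanager
def voice_temp_paths(file_id: str) -> Iterator[Tuple[str, str]]:
    """Yield (oga_path, mp3_path) inside a private temporary directory.

    The directory and any files written into it are removed when the block exits,
    whether it finishes normally or raises.
    """
    with tempfile.TemporaryDirectory(prefix="voice-") as tmp_dir:
        yield (
            os.path.join(tmp_dir, f"{file_id}.oga"),
            os.path.join(tmp_dir, f"{file_id}.mp3"),
        )
```

Inside the handler, `with voice_temp_paths(file_id) as (oga_path, mp3_path):` would wrap the download, conversion, and transcription, and the explicit cleanup in `finally` becomes unnecessary.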
Step 4: Transcribing with OpenAI Whisper
With our MP3 file ready, sending it to OpenAI is straightforward. We'll use the `openai_client.audio.transcriptions.create` method.
Let's add the transcription logic into our `handle_voice_message` function:
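One design caveat before wiring it in: `transcriptions.create` on the `OpenAI` client is a synchronous call, so while it waits on the network it blocks the bot's event loop, and other incoming messages queue up behind it. The openai library also ships an `AsyncOpenAI` client you can await directly; another option is pushing the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below demonstrates the `to_thread` pattern with a hypothetical `slow_transcribe` stand-in instead of the real API, so it runs without keys or network access.

```python
import asyncio
import time


def slow_transcribe(path: str) -> str:
    """Hypothetical stand-in for the blocking Whisper call; sleeps to simulate latency."""
    time.sleep(0.2)
    return f"transcript of {path}"


async def transcribe_without_blocking(path: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays free
    # to handle other updates while the request is in flight.
    return await asyncio.to_thread(slow_transcribe, path)


async def demo() -> list:
    start = time.perf_counter()
    # Two "memos" processed concurrently: total time is roughly one call, not two.
    results = await asyncio.gather(
        transcribe_without_blocking("memo1.mp3"),
        transcribe_without_blocking("memo2.mp3"),
    )
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the real handler, you would move the `open(mp3_path, "rb")` and `create(...)` lines into a small synchronous function and `await asyncio.to_thread(...)` on it.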
```python
async def handle_voice_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Downloads, converts, and transcribes a voice message."""
    file_id = update.message.voice.file_id
    oga_path = f'{file_id}.oga'
    mp3_path = f'{file_id}.mp3'
    try:
        await update.message.reply_text("Processing your voice memo...")

        # 1. Download the voice file from Telegram
        voice_file = await context.bot.get_file(file_id)
        await voice_file.download_to_drive(oga_path)

        # 2. Convert OGA to MP3
        audio = AudioSegment.from_ogg(oga_path)
        audio.export(mp3_path, format="mp3")

        # 3. Send to Whisper API for transcription
        with open(mp3_path, "rb") as audio_file:
            transcription = openai_client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file
            )
        transcribed_text = transcription.text
        logger.info(f"Transcription successful: {transcribed_text}")

        # 4. Reply to the user. No parse_mode here: raw transcription text can
        # contain characters like < or & that Telegram's HTML parser rejects.
        await update.message.reply_text(f"Transcription:\n\n{transcribed_text}")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        await update.message.reply_text("Sorry, I couldn't process that voice memo.")
    finally:
        # 5. Clean up
        if os.path.exists(oga_path):
            os.remove(oga_path)
        if os.path.exists(mp3_path):
            os.remove(mp3_path)
        logger.info("Cleaned up temporary files for " + file_id)
```
And that's the complete loop! The bot receives a voice note, downloads it, converts it, sends it to Whisper, and replies with the text.
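One edge case the handler doesn't cover: Telegram caps a text message at 4096 characters, so a long memo's transcription can make `reply_text` fail. Here's a minimal sketch of a hypothetical `split_for_telegram` helper that breaks the text into chunks under that limit, preferring to cut at whitespace and falling back to a hard cut for very long unbroken runs.

```python
from typing import List

TELEGRAM_MAX_MESSAGE_LENGTH = 4096  # Telegram's per-message text limit


def split_for_telegram(text: str, limit: int = TELEGRAM_MAX_MESSAGE_LENGTH) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking at whitespace when possible."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        # Last space or newline inside the allowed window; fall back to a hard cut.
        cut = max(remaining.rfind(" ", 0, limit + 1), remaining.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

In the handler, the single reply would become a loop: `for chunk in split_for_telegram(f"Transcription:\n\n{transcribed_text}"): await update.message.reply_text(chunk)`.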
Common Pitfalls
Here are a few places I've tripped up in the past. Hopefully, you can avoid them.
- `ffmpeg` Not Found: The most common issue. The `pydub` library relies on the `ffmpeg` command-line tool to decode and encode formats like Ogg and MP3. If `ffmpeg` isn't installed and available in your system's PATH, `pydub` will fail. The error message is usually pretty clear about this.
- API Key Errors: Double-check your `config.env` file. A typo in the variable name or a misplaced quote can lead to authentication failures. Make sure the file is in the same directory you're running the script from, or provide an absolute path to it.
- File Size Limits: The OpenAI Whisper API has a file size limit (currently 25 MB). For a simple voice memo bot, this is rarely an issue. But if you were adapting this for longer audio, you'd need to implement chunking: splitting the audio into smaller pieces and processing them sequentially.
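For the chunking approach above, pydub makes the slicing itself easy, since an `AudioSegment` can be sliced by milliseconds (`audio[start:end]`). The part that needs a bit of arithmetic is choosing a chunk length that keeps each exported file under the limit. A sketch, assuming you export at a known constant bitrate (pydub's `export` accepts a `bitrate` argument such as `"64k"`), so that a stream of `bitrate_kbps` produces about `bitrate_kbps / 8` bytes per millisecond; `plan_chunks` is a hypothetical helper, and the 10% safety margin is an arbitrary choice.

```python
from typing import List, Tuple

WHISPER_MAX_BYTES = 25 * 1000 * 1000  # the documented 25 MB cap, read conservatively as decimal MB


def plan_chunks(duration_ms: int, bitrate_kbps: int,
                limit_bytes: int = WHISPER_MAX_BYTES,
                safety: float = 0.9) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) ranges whose estimated size stays under limit_bytes.

    Assumes constant bitrate: bitrate_kbps kilobits/second == bitrate_kbps / 8 bytes/ms.
    """
    if duration_ms <= 0:
        return []
    bytes_per_ms = bitrate_kbps / 8
    max_chunk_ms = int(limit_bytes * safety / bytes_per_ms)
    if max_chunk_ms <= 0:
        raise ValueError("bitrate too high for the size limit")
    return [(start, min(start + max_chunk_ms, duration_ms))
            for start in range(0, duration_ms, max_chunk_ms)]
```

Each range maps directly onto a slice, `audio[start:end].export(chunk_path, format="mp3", bitrate="64k")`, which you would transcribe in order and join.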
Conclusion
You now have a fully functional, personal transcription service. This pattern is incredibly versatile. You could modify it to save transcriptions to a database, send them to a Notion page, or create a Jira ticket. It's a fantastic building block for automating any workflow that starts with a spoken idea.
Happy building,
Darian Vance
Senior DevOps Engineer, TechResolve
Read the original article on TechResolve.blog