DEV Community

Ekrem MUTLU

Building a Custom Telegram Bot with AI: Beyond Simple Commands - Text, Voice, Images, and More!

Tired of basic Telegram bots that just parrot back simple commands? Want to build something truly intelligent: a bot that can understand not only text but also voice and images, remember past conversations, and even leverage external tools? This article will guide you through building a powerful, multimodal AI-powered Telegram bot and deploying it for production using Docker.

## From Echo Chambers to Intelligent Agents

The standard "/start" and "/help" Telegram bots are a good starting point, but they lack the sophistication needed for real-world applications. We're going to move beyond simple command-response patterns and create a bot that can:

* Understand and respond to text naturally: leverage large language models (LLMs) for meaningful conversations.
* Process voice messages: transcribe audio and respond accordingly.
* Analyze images: understand the content of images and provide relevant information.
* Maintain conversation memory: remember past interactions to provide contextually relevant responses.
* Use external tools: access APIs and services to perform actions based on user requests (e.g., search the web, get the weather).
* Have a personality: define a persona for your bot to make interactions more engaging.

## Setting Up the Foundation

First, you'll need a Telegram bot token. Create a bot using BotFather on Telegram and save the token. You'll also need a Python environment with the following packages installed:

```bash
pip install python-telegram-bot openai SpeechRecognition pydub pillow
```

(Note that the speech package is published on PyPI as `SpeechRecognition` but imported as `speech_recognition`, and `pydub` needs `ffmpeg` installed on the system to convert Telegram's OGG voice files.)

Here's a basic structure for our bot:

```python
import os
import io

import telegram
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters
import openai
import speech_recognition as sr
from pydub import AudioSegment
from PIL import Image

# Read credentials from environment variables (set them in your shell,
# or pass them via `docker run -e ...` as shown in the Docker section)
TELEGRAM_BOT_TOKEN = os.environ['TELEGRAM_BOT_TOKEN']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

openai.api_key = OPENAI_API_KEY

# Initialize the bot (used later to download voice and photo files)
bot = telegram.Bot(token=TELEGRAM_BOT_TOKEN)

# Define a /start command handler
def start(update, context):
    update.message.reply_text('Hello! I am your AI-powered Telegram bot. Ask me anything!')

# Main function to run the bot
def main():
    updater = Updater(TELEGRAM_BOT_TOKEN, use_context=True)
    dp = updater.dispatcher

    dp.add_handler(CommandHandler("start", start))

    # Start the bot
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()
```

This code uses the python-telegram-bot v13 API (`Updater`, `Filters`); version 20+ switched to an async API, so pin the dependency (e.g. `pip install "python-telegram-bot==13.15"`) if you want to follow along with these examples.

This code sets up a basic bot that responds to the /start command. Let's enhance it.

## Adding Text-Based Conversation with OpenAI

We'll integrate OpenAI's GPT models to enable natural language conversations.

```python
def respond_to_text(update, context):
    user_message = update.message.text

    # Construct a prompt for OpenAI
    prompt = f"User: {user_message}\nBot: "

    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",  # text-davinci-003 has been retired; choose any suitable model
        prompt=prompt,
        max_tokens=150,
        n=1,
        stop=None,
        temperature=0.7,
    )

    bot_response = response.choices[0].text.strip()
    update.message.reply_text(bot_response)

# Register the handler inside main(), after the /start handler:
# dp.add_handler(MessageHandler(Filters.text & ~Filters.command, respond_to_text))
```
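The article returns to tool use in "Taking it Further" below; as an early taste, you can intercept messages that map to an external tool before falling back to the LLM. A minimal keyword-dispatch sketch, where `get_weather` is a stub you would replace with a real API call:

```python
import re

def get_weather(city):
    """Stub tool: swap this for a real weather API call."""
    return f"(stub) Weather for {city}: 21°C, clear."

# Each tool is a (pattern, handler) pair; handlers receive the regex match
TOOLS = [
    (re.compile(r"weather in (\w+)", re.IGNORECASE), lambda m: get_weather(m.group(1))),
]

def dispatch(user_message):
    """Try tools first; return None so the caller can fall back to the LLM."""
    for pattern, handler in TOOLS:
        m = pattern.search(user_message)
        if m:
            return handler(m)
    return None
```

In `respond_to_text`, you would call `dispatch(user_message)` first and only query OpenAI when it returns `None`. This keeps tool routing cheap and deterministic; for richer routing you could instead use the model's function-calling support.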

Now the bot can respond to text messages using OpenAI. The original `text-davinci-003` model has been retired; pick a current model (for the legacy Completions endpoint, `gpt-3.5-turbo-instruct` is the drop-in replacement) and adjust the `temperature` and `max_tokens` parameters for optimal results.

## Processing Voice Messages

To handle voice messages, we need to:

1. Download the voice message.
2. Convert it to a suitable audio format (WAV).
3. Use speech recognition to transcribe the audio.
4. Process the transcribed text with the LLM.

```python
def respond_to_voice(update, context):
    voice = update.message.voice
    file = bot.get_file(voice.file_id)

    # Download the voice message (Telegram sends voice notes as OGG/Opus)
    ogg_file = f"voice_{update.message.chat_id}_{update.message.message_id}.ogg"
    wav_file = f"voice_{update.message.chat_id}_{update.message.message_id}.wav"
    file.download(ogg_file)

    # Convert to WAV (pydub requires ffmpeg for this)
    try:
        sound = AudioSegment.from_ogg(ogg_file)
        sound.export(wav_file, format="wav")
    except Exception as e:
        update.message.reply_text(f"Error converting audio: {e}")
        os.remove(ogg_file)
        return

    # Transcribe the audio
    r = sr.Recognizer()
    with sr.AudioFile(wav_file) as source:
        audio = r.record(source)

    try:
        text = r.recognize_google(audio)
        # Process the transcribed text with OpenAI
        prompt = f"User (voice): {text}\nBot: "
        response = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",
            prompt=prompt,
            max_tokens=150,
            n=1,
            stop=None,
            temperature=0.7,
        )
        bot_response = response.choices[0].text.strip()
        update.message.reply_text(bot_response)

    except sr.UnknownValueError:
        update.message.reply_text("Could not understand audio")
    except sr.RequestError as e:
        update.message.reply_text(f"Could not request results from speech recognition service; {e}")
    except Exception as e:
        update.message.reply_text(f"Error processing voice message: {e}")
    finally:
        # Clean up temporary files
        os.remove(ogg_file)
        os.remove(wav_file)

# Register inside main():
# dp.add_handler(MessageHandler(Filters.voice, respond_to_voice))
```

## Analyzing Images

For image analysis, we'll use OpenAI's vision capabilities (if available via API access) or a suitable alternative like a pre-trained image recognition model.

```python
# Placeholder for image analysis - replace with an actual implementation
def analyze_image(image_path):
    # Simplified example: replace with real image analysis logic,
    # e.g. a pre-trained model from torchvision or an external vision API.
    try:
        img = Image.open(image_path)
        width, height = img.size
        return f"Image size: {width}x{height}. This is a placeholder for actual image analysis."
    except Exception as e:
        return f"Error analyzing image: {e}"

def respond_to_image(update, context):
    # Telegram sends several resolutions; the last entry is the largest
    photo = update.message.photo[-1]
    file = bot.get_file(photo.file_id)

    # Download the image
    image_file = f"image_{update.message.chat_id}_{update.message.message_id}.jpg"
    file.download(image_file)

    # Analyze the image and reply
    analysis_result = analyze_image(image_file)
    update.message.reply_text(analysis_result)

    # Clean up temporary file
    os.remove(image_file)

# Register inside main():
# dp.add_handler(MessageHandler(Filters.photo, respond_to_image))
```
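A slightly more informative local stand-in, using only Pillow, reports the dominant color as well as the size. It is still a stub (the function name `analyze_image_basic` is illustrative); true semantic analysis needs a vision model or API:

```python
from PIL import Image

def analyze_image_basic(image_path):
    """Local stand-in for a vision model: report size and dominant color."""
    try:
        img = Image.open(image_path).convert("RGB")
        width, height = img.size
        # Downscale before counting colors so large photos stay cheap to analyze
        small = img.resize((64, 64))
        colors = small.getcolors(64 * 64)  # list of (count, (r, g, b)) tuples
        dominant = max(colors)[1]          # color with the highest pixel count
        return (f"Image is {width}x{height}, dominant color (RGB) ~ {dominant}. "
                f"Swap this stub for a real vision model for semantic analysis.")
    except Exception as e:
        return f"Error analyzing image: {e}"
```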

*Important:* The `analyze_image` function is a placeholder. You'll need to replace it with actual image analysis logic using libraries like torchvision or cloud-based vision APIs.

## Maintaining Conversation Memory

To remember past interactions, we can store the conversation history in a dictionary keyed by chat ID. (This lives in process memory only; for production, persist it in a database so history survives restarts.)

```python
conversation_history = {}

def respond_to_text_with_memory(update, context):
    chat_id = update.message.chat_id
    user_message = update.message.text

    if chat_id not in conversation_history:
        conversation_history[chat_id] = []

    conversation_history[chat_id].append(f"User: {user_message}")

    prompt = "\n".join(conversation_history[chat_id]) + "\nBot: "

    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=150,
        n=1,
        stop=None,
        temperature=0.7,
    )

    bot_response = response.choices[0].text.strip()
    conversation_history[chat_id].append(f"Bot: {bot_response}")
    update.message.reply_text(bot_response)

    # Limit conversation history to prevent excessively long prompts
    if len(conversation_history[chat_id]) > 10:
        conversation_history[chat_id] = conversation_history[chat_id][-10:]

# Register this handler in main() *instead of* respond_to_text:
# dp.add_handler(MessageHandler(Filters.text & ~Filters.command, respond_to_text_with_memory))
```
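The same prompt-building step is also where a persona can be injected: prepend a fixed instruction before the history and the new message. A minimal sketch (the persona text and the `build_prompt` helper are illustrative, not part of any library):

```python
# Hypothetical persona text; tune the wording for your own bot
PERSONA = ("You are Clawd, a friendly, slightly sarcastic assistant. "
           "Answer concisely and stay in character.")

def build_prompt(history, user_message, persona=PERSONA, max_turns=10):
    """Prepend the persona and the trimmed history to the new user message."""
    trimmed = history[-max_turns:]  # keep the prompt short, like the handler above
    lines = [persona] + trimmed + [f"User: {user_message}", "Bot: "]
    return "\n".join(lines)
```

In `respond_to_text_with_memory`, you would replace the `prompt = ...` line with `prompt = build_prompt(conversation_history[chat_id], user_message)` so every request carries the persona.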

## Dockerizing Your Bot

To deploy your bot, Docker is invaluable. Create a Dockerfile:

```dockerfile
FROM python:3.9-slim

WORKDIR /app

# ffmpeg is needed by pydub to convert Telegram's OGG voice files
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Create a requirements.txt file with your dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Replace with your bot script name
CMD ["python", "your_bot_script.py"]
```
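The Dockerfile copies a `requirements.txt` that the article never shows. A minimal one matching the imports used here might look like this (the pinned versions are suggestions that match the v13 `python-telegram-bot` API and the pre-1.0 `openai` SDK used in the examples; pin whatever you actually test against):

```text
python-telegram-bot==13.15
openai==0.28.1
SpeechRecognition
pydub
Pillow
```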

Then, build and run the Docker image:

```bash
docker build -t your_bot_image .
docker run -d -e TELEGRAM_BOT_TOKEN="YOUR_TELEGRAM_BOT_TOKEN" -e OPENAI_API_KEY="YOUR_OPENAI_API_KEY" your_bot_image
```
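If you'd rather not pass secrets on the command line every time, an equivalent `docker-compose.yml` sketch reads them from the host environment (or an `.env` file); the service name is illustrative:

```text
services:
  bot:
    build: .
    environment:
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    restart: unless-stopped
```

Then `docker compose up -d` builds and starts the bot, and `restart: unless-stopped` keeps it running across crashes and reboots.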

Replace the placeholders with your actual values.

## Taking it Further: Tool Use and Personality

* Tool use: integrate APIs (e.g., weather, search) by adding function calls to your bot's logic.
* Personality: define a system prompt that sets the tone and behavior of your bot. Pass this prompt to OpenAI before each user message.

## Conclusion

Building a truly intelligent Telegram bot is a complex but rewarding process. By combining the power of LLMs, speech recognition, image analysis, and Docker, you can create a bot that goes far beyond simple commands. Remember to handle errors gracefully, manage resources efficiently, and prioritize user privacy.

Ready to take your Telegram bot to the next level? Check out ClawDBot Pro for a pre-built, production-ready solution with advanced features and ongoing support:

https://bilgestore.com/product/clawdbot-pro
Tags: #telegram #ai #python #chatbot
