Building a Custom Telegram Bot with AI: Beyond Simple Commands - Text, Voice, Images, and More!
Telegram bots have become incredibly versatile tools, evolving from simple command responders to sophisticated AI assistants. In this article, we'll dive deep into building a custom Telegram bot that goes beyond basic functionality. We'll explore how to integrate AI for multimodal input (text, voice, and images), implement conversation memory, enable tool use, define a distinct personality, and finally, deploy our creation to production using Docker.
From /start to Smart: Leveling Up Your Bot
Most Telegram bot tutorials stop at the /start command and maybe a few simple text-based interactions. We're aiming higher. We want a bot that can understand and respond to different media types, remember past conversations, and leverage external tools to provide more valuable assistance.
Setting Up the Foundation
First, you'll need a Telegram bot token. Create a new bot using BotFather on Telegram and obtain your unique token. You'll also need Python installed, along with the python-telegram-bot library and other dependencies we'll mention later.
```shell
pip install "python-telegram-bot<20"  # this article uses the v13 Updater API
pip install python-dotenv             # for managing API keys safely
# Dependencies for voice and image processing are installed in the relevant sections below
```
Let's start with a basic structure:
```python
import os

from dotenv import load_dotenv
from telegram.ext import CommandHandler, Filters, MessageHandler, Updater

load_dotenv()
TELEGRAM_TOKEN = os.getenv("TELEGRAM_TOKEN")

def start(update, context):
    update.message.reply_text('Hello! I am your AI-powered Telegram bot.')

def echo(update, context):
    # Echo any non-command text message back to the user
    update.message.reply_text(update.message.text)

def main():
    updater = Updater(TELEGRAM_TOKEN, use_context=True)
    dp = updater.dispatcher
    dp.add_handler(CommandHandler("start", start))
    dp.add_handler(MessageHandler(Filters.text & ~Filters.command, echo))
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()
```
This is a barebones bot that responds to /start and echoes back any text you send. Save this as bot.py and run it (after setting your TELEGRAM_TOKEN in a .env file). You should be able to interact with your bot on Telegram now.
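The `.env` file sits next to `bot.py` and holds your secrets; a minimal version looks like this (the placeholder stands in for the token BotFather gave you):

```shell
# .env — never commit this file to version control
TELEGRAM_TOKEN=<YOUR_TELEGRAM_TOKEN>
```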
Embracing Multimodal Input: Voice and Images
Voice Recognition:
To handle voice messages, we'll need a speech-to-text library like SpeechRecognition and an audio processing library like pydub to convert Telegram's OGG voice notes into a format the recognizer accepts. Note that pydub relies on ffmpeg being installed on your system.

```shell
pip install SpeechRecognition pydub
```
Here's how you can extract text from a voice message:
```python
import speech_recognition as sr
from pydub import AudioSegment

def handle_voice(update, context):
    # Download the voice note (Telegram sends voice messages as OGG/Opus)
    voice_file = context.bot.get_file(update.message.voice.file_id)
    ogg_path = 'voice.ogg'
    wav_path = 'voice.wav'
    voice_file.download(ogg_path)

    # Convert OGG to WAV with pydub — SpeechRecognition's AudioFile
    # accepts WAV/AIFF/FLAC, not OGG or MP3
    try:
        AudioSegment.from_ogg(ogg_path).export(wav_path, format="wav")
    except Exception as e:
        update.message.reply_text(f"Error converting audio: {e}")
        return

    # Transcribe the audio with the free Google Web Speech API
    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = r.record(source)
    try:
        text = r.recognize_google(audio)
        update.message.reply_text(f"You said: {text}")
    except sr.UnknownValueError:
        update.message.reply_text("Could not understand audio")
    except sr.RequestError as e:
        update.message.reply_text(f"Could not request results from Google Speech Recognition service; {e}")

# Add this handler to your dispatcher (inside main())
dp.add_handler(MessageHandler(Filters.voice, handle_voice))
```
Image Recognition:
For image processing, we'll use Pillow (PIL) and potentially a computer vision library like OpenCV or a cloud-based service like Google Cloud Vision API.
```shell
pip install Pillow
```
Here's a simple example using Pillow to get image dimensions:
```python
import io

from PIL import Image

def handle_image(update, context):
    # Telegram sends several sizes of each photo; the last entry is the largest
    photo = update.message.photo[-1]
    photo_file = context.bot.get_file(photo.file_id)
    image_bytes = photo_file.download_as_bytearray()
    try:
        image = Image.open(io.BytesIO(image_bytes))
        width, height = image.size
        update.message.reply_text(f"Image dimensions: {width}x{height}")
    except Exception as e:
        update.message.reply_text(f"Error processing image: {e}")

# Add this handler to your dispatcher (inside main())
dp.add_handler(MessageHandler(Filters.photo, handle_image))
```
Adding AI Power: Conversation Memory, Tool Use, and Personality
This is where things get really interesting. We'll use a large language model (LLM) such as OpenAI's GPT-3.5 or GPT-4 to provide the AI brainpower. For conversation memory, we can store the conversation history in a list or database, keyed by chat. For tool use, we can define functions that the LLM can call to perform specific tasks (e.g., fetching weather data, searching the web).
Example using OpenAI's ChatCompletion interface (requires the openai library and an API key; pin the package below 1.0, since the 1.x SDK removed this interface):
```python
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# In-memory history, keyed by chat ID (use a database for persistence)
conversation_history = {}

def handle_ai_message(update, context):
    chat_id = update.message.chat_id
    user_message = update.message.text
    conversation_history.setdefault(chat_id, [])
    conversation_history[chat_id].append({"role": "user", "content": user_message})
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # or your preferred model
            messages=conversation_history[chat_id],
            max_tokens=150,
        )
        ai_response = response['choices'][0]['message']['content']
        conversation_history[chat_id].append({"role": "assistant", "content": ai_response})
        update.message.reply_text(ai_response)
    except Exception as e:
        update.message.reply_text(f"AI error: {e}")

# Add this handler to your dispatcher (replacing the echo handler)
dp.add_handler(MessageHandler(Filters.text & ~Filters.command, handle_ai_message))
```
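One caveat: the in-memory history above grows without bound, and every message is resent to the model on each request, so long chats will eventually exceed the model's context window. A minimal sketch of a fix is to keep only the most recent messages; `trim_history` and `MAX_TURNS` here are illustrative names and values, not part of any library.

```python
# Cap conversation memory so prompts stay within the model's context window.
MAX_TURNS = 20  # keep the last 20 messages (10 user/assistant exchanges)

def trim_history(history, max_turns=MAX_TURNS):
    """Drop the oldest messages once the history exceeds max_turns."""
    if len(history) > max_turns:
        return history[-max_turns:]
    return history

# Example: a history of 25 messages is cut down to the newest 20
history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
trimmed = trim_history(history)
```

Call `trim_history` on `conversation_history[chat_id]` before each API request; a fancier version could summarize the dropped messages instead of discarding them.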
Tool Use Example (Simplified):
Imagine a function to fetch weather data:
```python
def get_weather(city):
    # In a real implementation, this would call a weather API
    if city == "London":
        return "The weather in London is cloudy with a chance of rain."
    return f"Weather data not available for {city}."
```
You would need to instruct the LLM to use this function. This usually involves crafting prompts like: "If the user asks about the weather, use the get_weather(city) function. The user will provide the city name." The LLM would then need to output a structured format indicating the function call and its arguments, which your bot would parse and execute.
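The parse-and-execute side can be sketched as a small dispatcher. This assumes you have prompted the model to answer tool requests with a JSON object of the shape `{"function": ..., "args": ...}`; the `dispatch_tool_call` helper and `TOOLS` registry are hypothetical names, and the weather stub from above is restated so the snippet is self-contained.

```python
import json

def get_weather(city):
    # Stubbed tool from the example above
    if city == "London":
        return "The weather in London is cloudy with a chance of rain."
    return f"Weather data not available for {city}."

# Registry mapping the names the LLM may emit to real Python callables
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(llm_output):
    """Parse the LLM's structured output and run the matching tool.

    Returns the tool's result, or None if the output is not a tool call
    (in which case the bot should treat llm_output as a normal reply).
    """
    try:
        call = json.loads(llm_output)
        func = TOOLS[call["function"]]
        return func(**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

result = dispatch_tool_call('{"function": "get_weather", "args": {"city": "London"}}')
```

Newer OpenAI models also offer built-in function calling, which returns the call in a dedicated response field instead of free text, but the parsing-based approach above works with any model you can prompt.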
Defining Personality:
The personality of your bot is shaped by the prompts you use with the LLM. You can instruct the LLM to be helpful, sarcastic, or any other persona you desire. Experiment with different prompts to find the personality that best suits your needs.
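In practice this means prepending a system message to the history before each API call. A minimal sketch, where the persona text and the `build_messages` helper are illustrative, not a fixed API:

```python
# Persona lives in a system message that is sent with every request
SYSTEM_PROMPT = {
    "role": "system",
    "content": (
        "You are a dry, mildly sarcastic assistant. Keep answers short "
        "and never use exclamation marks."
    ),
}

def build_messages(history):
    """Prepend the persona prompt to the stored conversation history."""
    return [SYSTEM_PROMPT] + history

messages = build_messages([{"role": "user", "content": "Hi"}])
```

In `handle_ai_message`, pass `build_messages(conversation_history[chat_id])` as the `messages` argument so the persona applies to every turn without being stored in the history itself.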
Production Deployment with Docker
Docker allows you to package your bot and its dependencies into a container, ensuring consistent execution across different environments.
- Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim

# ffmpeg is required by pydub for the voice-message conversion
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "bot.py"]
```
- Create a `requirements.txt` file, pinning the two libraries whose newer major versions break the code above:

```text
python-telegram-bot<20
python-dotenv
SpeechRecognition
pydub
Pillow
openai<1.0
```
- Build the Docker image:

```shell
docker build -t my-telegram-bot .
```
- Run the Docker container:

```shell
docker run -d --name my-bot \
  -e TELEGRAM_TOKEN=<YOUR_TELEGRAM_TOKEN> \
  -e OPENAI_API_KEY=<YOUR_OPENAI_API_KEY> \
  my-telegram-bot
```
Replace <YOUR_TELEGRAM_TOKEN> and <YOUR_OPENAI_API_KEY> with your actual tokens.
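If you'd rather not pass secrets on the command line, a `docker-compose.yml` can load them from the `.env` file instead. This is a sketch, assuming the `.env` file from earlier sits in the project root:

```yaml
version: "3.8"
services:
  bot:
    build: .
    env_file: .env        # loads TELEGRAM_TOKEN and OPENAI_API_KEY
    restart: unless-stopped
```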
Conclusion
Building a custom Telegram bot with AI opens up a world of possibilities. By integrating multimodal input, conversation memory, tool use, and a defined personality, you can create a truly intelligent and helpful assistant. Remember to handle API keys securely using environment variables and to leverage Docker for reliable production deployment.
Ready to take your Telegram bot development to the next level? Check out ClawDBot Pro for advanced features and a streamlined development experience!