Building a Custom Telegram Bot with AI: Beyond Simple Commands - Text, Voice, Images, and More!
Telegram bots have become incredibly versatile tools, evolving from simple command responders to sophisticated AI assistants. In this article, we'll dive deep into building a custom Telegram bot that goes beyond basic functionality. We'll explore how to integrate AI for multimodal input (text, voice, and images), implement conversation memory, enable tool use, define a distinct personality, and finally, deploy our creation to production using Docker.
From /start to Smart: Leveling Up Your Bot
Most Telegram bot tutorials stop at the /start command and maybe a few simple text-based interactions. We're aiming higher. We want a bot that can understand and respond to different media types, remember past conversations, and leverage external tools to provide more valuable assistance.
Setting Up the Foundation
First, you'll need a Telegram bot token. Create a new bot using BotFather on Telegram and obtain your unique token. You'll also need Python installed, along with the python-telegram-bot library and other dependencies we'll mention later.
```shell
pip install "python-telegram-bot<20"  # this article uses the v13 Updater API
pip install python-dotenv             # for managing API keys safely
# Dependencies for voice and image processing are installed in the relevant sections below
```
Let's start with a basic structure:
```python
import os

from dotenv import load_dotenv
from telegram.ext import CommandHandler, Filters, MessageHandler, Updater

load_dotenv()
TELEGRAM_TOKEN = os.getenv("TELEGRAM_TOKEN")

def start(update, context):
    update.message.reply_text('Hello! I am your AI-powered Telegram bot.')

def echo(update, context):
    # Echo any non-command text message back to the user
    update.message.reply_text(update.message.text)

def main():
    updater = Updater(TELEGRAM_TOKEN, use_context=True)
    dp = updater.dispatcher
    dp.add_handler(CommandHandler("start", start))
    dp.add_handler(MessageHandler(Filters.text & ~Filters.command, echo))
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()
```
This is a barebones bot that responds to /start and echoes back any text you send. Save this as bot.py and run it (after setting your TELEGRAM_TOKEN in a .env file). You should be able to interact with your bot on Telegram now.
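The `.env` file sits next to `bot.py` and holds your secrets; a minimal version looks like this (the placeholder stands in for the token BotFather gave you):

```shell
# .env — never commit this file to version control
TELEGRAM_TOKEN=<YOUR_TELEGRAM_TOKEN>
```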
Embracing Multimodal Input: Voice and Images
Voice Recognition:
To handle voice messages, we'll need a speech-to-text library like SpeechRecognition and an audio processing library like pydub to convert Telegram's OGG voice notes into a format the recognizer accepts. Note that pydub relies on ffmpeg being installed on your system.

```shell
pip install SpeechRecognition pydub
```
Here's how you can extract text from a voice message:
```python
import speech_recognition as sr
from pydub import AudioSegment

def handle_voice(update, context):
    # Download the voice note (Telegram sends voice messages as OGG/Opus)
    voice_file = context.bot.get_file(update.message.voice.file_id)
    ogg_path = 'voice.ogg'
    wav_path = 'voice.wav'
    voice_file.download(ogg_path)

    # Convert OGG to WAV with pydub — SpeechRecognition's AudioFile
    # accepts WAV/AIFF/FLAC, not OGG or MP3
    try:
        AudioSegment.from_ogg(ogg_path).export(wav_path, format="wav")
    except Exception as e:
        update.message.reply_text(f"Error converting audio: {e}")
        return

    # Transcribe the audio with the free Google Web Speech API
    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = r.record(source)
    try:
        text = r.recognize_google(audio)
        update.message.reply_text(f"You said: {text}")
    except sr.UnknownValueError:
        update.message.reply_text("Could not understand audio")
    except sr.RequestError as e:
        update.message.reply_text(f"Could not request results from Google Speech Recognition service; {e}")

# Add this handler to your dispatcher (inside main())
dp.add_handler(MessageHandler(Filters.voice, handle_voice))
```
Image Recognition:
For image processing, we'll use Pillow (PIL) and potentially a computer vision library like OpenCV or a cloud-based service like Google Cloud Vision API.
```shell
pip install Pillow
```
Here's a simple example using Pillow to get image dimensions:
```python
import io

from PIL import Image

def handle_image(update, context):
    # Telegram sends several sizes of each photo; the last entry is the largest
    photo = update.message.photo[-1]
    photo_file = context.bot.get_file(photo.file_id)
    image_bytes = photo_file.download_as_bytearray()
    try:
        image = Image.open(io.BytesIO(image_bytes))
        width, height = image.size
        update.message.reply_text(f"Image dimensions: {width}x{height}")
    except Exception as e:
        update.message.reply_text(f"Error processing image: {e}")

# Add this handler to your dispatcher (inside main())
dp.add_handler(MessageHandler(Filters.photo, handle_image))
```
Adding AI Power: Conversation Memory, Tool Use, and Personality
This is where things get really interesting. We'll use a large language model (LLM) such as OpenAI's GPT-3.5 or GPT-4 to provide the AI brainpower. For conversation memory, we can store the conversation history in a list or database, keyed by chat. For tool use, we can define functions that the LLM can call to perform specific tasks (e.g., fetching weather data, searching the web).
Example using OpenAI's ChatCompletion interface (requires the openai library and an API key; pin the package below 1.0, since the 1.x SDK removed this interface):
```python
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# In-memory history, keyed by chat ID (use a database for persistence)
conversation_history = {}

def handle_ai_message(update, context):
    chat_id = update.message.chat_id
    user_message = update.message.text
    conversation_history.setdefault(chat_id, [])
    conversation_history[chat_id].append({"role": "user", "content": user_message})
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # or your preferred model
            messages=conversation_history[chat_id],
            max_tokens=150,
        )
        ai_response = response['choices'][0]['message']['content']
        conversation_history[chat_id].append({"role": "assistant", "content": ai_response})
        update.message.reply_text(ai_response)
    except Exception as e:
        update.message.reply_text(f"AI error: {e}")

# Add this handler to your dispatcher (replacing the echo handler)
dp.add_handler(MessageHandler(Filters.text & ~Filters.command, handle_ai_message))
```
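One caveat: the in-memory history above grows without bound, and every message is resent to the model on each request, so long chats will eventually exceed the model's context window. A minimal sketch of a fix is to keep only the most recent messages; `trim_history` and `MAX_TURNS` here are illustrative names and values, not part of any library.

```python
# Cap conversation memory so prompts stay within the model's context window.
MAX_TURNS = 20  # keep the last 20 messages (10 user/assistant exchanges)

def trim_history(history, max_turns=MAX_TURNS):
    """Drop the oldest messages once the history exceeds max_turns."""
    if len(history) > max_turns:
        return history[-max_turns:]
    return history

# Example: a history of 25 messages is cut down to the newest 20
history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
trimmed = trim_history(history)
```

Call `trim_history` on `conversation_history[chat_id]` before each API request; a fancier version could summarize the dropped messages instead of discarding them.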
Tool Use Example (Simplified):
Imagine a function to fetch weather data:
```python
def get_weather(city):
    # In a real implementation, this would call a weather API
    if city == "London":
        return "The weather in London is cloudy with a chance of rain."
    return f"Weather data not available for {city}."
```
You would need to instruct the LLM to use this function. This usually involves crafting prompts like: "If the user asks about the weather, use the get_weather(city) function. The user will provide the city name." The LLM would then need to output a structured format indicating the function call and its arguments, which your bot would parse and execute.
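The parse-and-execute side can be sketched as a small dispatcher. This assumes you have prompted the model to answer tool requests with a JSON object of the shape `{"function": ..., "args": ...}`; the `dispatch_tool_call` helper and `TOOLS` registry are hypothetical names, and the weather stub from above is restated so the snippet is self-contained.

```python
import json

def get_weather(city):
    # Stubbed tool from the example above
    if city == "London":
        return "The weather in London is cloudy with a chance of rain."
    return f"Weather data not available for {city}."

# Registry mapping the names the LLM may emit to real Python callables
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(llm_output):
    """Parse the LLM's structured output and run the matching tool.

    Returns the tool's result, or None if the output is not a tool call
    (in which case the bot should treat llm_output as a normal reply).
    """
    try:
        call = json.loads(llm_output)
        func = TOOLS[call["function"]]
        return func(**call["args"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

result = dispatch_tool_call('{"function": "get_weather", "args": {"city": "London"}}')
```

Newer OpenAI models also offer built-in function calling, which returns the call in a dedicated response field instead of free text, but the parsing-based approach above works with any model you can prompt.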
Defining Personality:
The personality of your bot is shaped by the prompts you use with the LLM. You can instruct the LLM to be helpful, sarcastic, or any other persona you desire. Experiment with different prompts to find the personality that best suits your needs.
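In practice this means prepending a system message to the history before each API call. A minimal sketch, where the persona text and the `build_messages` helper are illustrative, not a fixed API:

```python
# Persona lives in a system message that is sent with every request
SYSTEM_PROMPT = {
    "role": "system",
    "content": (
        "You are a dry, mildly sarcastic assistant. Keep answers short "
        "and never use exclamation marks."
    ),
}

def build_messages(history):
    """Prepend the persona prompt to the stored conversation history."""
    return [SYSTEM_PROMPT] + history

messages = build_messages([{"role": "user", "content": "Hi"}])
```

In `handle_ai_message`, pass `build_messages(conversation_history[chat_id])` as the `messages` argument so the persona applies to every turn without being stored in the history itself.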
Production Deployment with Docker
Docker allows you to package your bot and its dependencies into a container, ensuring consistent execution across different environments.
- Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim

# ffmpeg is required by pydub for the voice-message conversion
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "bot.py"]
```
- Create a `requirements.txt` file, pinning the two libraries whose newer major versions break the code above:

```text
python-telegram-bot<20
python-dotenv
SpeechRecognition
pydub
Pillow
openai<1.0
```
- Build the Docker image:

```shell
docker build -t my-telegram-bot .
```
- Run the Docker container:

```shell
docker run -d --name my-bot \
  -e TELEGRAM_TOKEN=<YOUR_TELEGRAM_TOKEN> \
  -e OPENAI_API_KEY=<YOUR_OPENAI_API_KEY> \
  my-telegram-bot
```
Replace <YOUR_TELEGRAM_TOKEN> and <YOUR_OPENAI_API_KEY> with your actual tokens.
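If you'd rather not pass secrets on the command line, a `docker-compose.yml` can load them from the `.env` file instead. This is a sketch, assuming the `.env` file from earlier sits in the project root:

```yaml
version: "3.8"
services:
  bot:
    build: .
    env_file: .env        # loads TELEGRAM_TOKEN and OPENAI_API_KEY
    restart: unless-stopped
```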
Conclusion
Building a custom Telegram bot with AI opens up a world of possibilities. By integrating multimodal input, conversation memory, tool use, and a defined personality, you can create a truly intelligent and helpful assistant. Remember to handle API keys securely using environment variables and to leverage Docker for reliable production deployment.
Ready to take your Telegram bot development to the next level? Check out ClawDBot Pro for advanced features and a streamlined development experience!