liveavabot
How I Built a Telegram Video Avatar Bot With Python and FFmpeg

I tried to set a video avatar on Telegram last month. Recorded a short clip on my iPhone, uploaded it, and nothing happened. No error message. No toast notification. Telegram just silently rejected it.

Turns out iPhones have recorded in HEVC (H.265) by default since the iPhone 7 and iOS 11's "High Efficiency" capture setting. Telegram's video avatar system only accepts H.264. When you upload HEVC, the server validates it, fails, and tells you absolutely nothing.

I spent two hours thinking my internet connection was broken before I figured this out.

What Telegram Video Avatars Actually Require

The documentation is sparse, so I had to reverse-engineer the constraints by testing every variable independently:

  • Codec: H.264 only (libx264). HEVC, VP9, AV1 all rejected silently.
  • Resolution: Exactly 800x800 pixels, square.
  • Duration: 10 seconds max.
  • File size: 2 MB max.
  • Audio: None. Any audio track means rejection.
  • Pixel format: yuv420p.
  • Container: MP4 with the faststart flag set.

Miss any single one of these and Telegram drops the file without telling you why. The yuv420p requirement caught me off guard because I couldn't find it documented anywhere.
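Since Telegram won't tell you which constraint you violated, it helps to check a file locally before uploading. Here's a rough validator built on ffprobe's JSON output — the function names and messages are mine, and the thresholds are just the reverse-engineered spec above, not anything official:

```python
import json
import subprocess
from pathlib import Path

# Telegram's (undocumented) video avatar constraints, as reverse-engineered above.
MAX_BYTES = 2 * 1024 * 1024
MAX_SECONDS = 10.0

def violations(info: dict, size_bytes: int) -> list[str]:
    """Check ffprobe JSON output against the avatar spec; return a list of problems."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 2 MB")
    if float(info["format"].get("duration", 0)) > MAX_SECONDS:
        problems.append("longer than 10 seconds")
    for s in info["streams"]:
        if s["codec_type"] == "audio":
            problems.append("has an audio track")
        elif s["codec_type"] == "video":
            if s["codec_name"] != "h264":
                problems.append(f"codec is {s['codec_name']}, not h264")
            if (s["width"], s["height"]) != (800, 800):
                problems.append(f"resolution is {s['width']}x{s['height']}, not 800x800")
            if s.get("pix_fmt") != "yuv420p":
                problems.append(f"pixel format is {s.get('pix_fmt')}, not yuv420p")
    return problems

def check_avatar(path: str) -> list[str]:
    """Run ffprobe on a file and validate it against the spec."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return violations(json.loads(out), Path(path).stat().st_size)
```

An empty list from check_avatar means the file at least looks uploadable; it obviously can't catch constraints I haven't discovered yet.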

The FFmpeg Pipeline

Here's the command I landed on after a lot of trial and error:

ffmpeg -y -i input.mp4 -t 10 \
  -vf "crop=min(iw\,ih):min(iw\,ih),scale=800:800,fps=30,format=yuv420p" \
  -c:v libx264 -preset medium \
  -b:v 900k -maxrate 1200k -bufsize 2M \
  -an -movflags +faststart \
  output.mp4

Let me break it down piece by piece.

crop=min(iw\,ih):min(iw\,ih) takes a center-square crop from any aspect ratio (when you give the crop filter only a width and height, it centers the crop window by default). A 16:9 landscape video becomes 1:1 without stretching or distorting faces. This handles portrait phone recordings and widescreen clips equally well.

scale=800:800 hits Telegram's exact resolution requirement. Not 799, not 801. Exactly 800.

fps=30 normalizes variable frame rate sources. VFR game captures from OBS were breaking duration detection without this.

format=yuv420p forces the pixel format. Some sources use yuv444p or yuv422p, and Telegram rejects both.

-c:v libx264 -preset medium re-encodes everything to H.264. This is where HEVC from iPhones gets converted. The medium preset balances encoding speed and compression quality.

-b:v 900k -maxrate 1200k -bufsize 2M targets 900 kbps with a ceiling at 1200 kbps. This keeps most 10-second clips under the 2 MB cap. If the output still exceeds 2 MB (high-motion content, lots of fine detail), I retry at a lower bitrate until it fits.

-an strips all audio tracks completely.

-movflags +faststart moves the moov atom to the front of the file. Required for proper streaming playback.

The 900 kbps target is a sweet spot I found empirically. High enough for decent quality at 800x800, low enough that most clips fit under 2 MB on the first encode pass.
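The arithmetic backs this up. With no audio track, the video stream is essentially the whole file, so a back-of-the-envelope size estimate is just bitrate times duration (ignoring VBV details and a small amount of MP4 container overhead):

```python
# Rough output size: video bitrate * duration, since -an leaves no audio stream.
bitrate_kbps = 900
ceiling_kbps = 1200          # the -maxrate cap
duration_s = 10

estimated = bitrate_kbps * 1000 // 8 * duration_s   # kbps -> bytes
worst_case = ceiling_kbps * 1000 // 8 * duration_s

print(estimated)             # 1_125_000 bytes, about 1.07 MiB
print(worst_case)            # 1_500_000 bytes, still under the cap
print(2 * 1024 * 1024)       # Telegram's limit: 2_097_152 bytes
```

So even a clip that pins the encoder at the 1200 kbps ceiling for all 10 seconds should fit, which is why most files pass on the first encode.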

Wrapping It in an Aiogram 3 Handler

I wanted users to just send a video and get a result back. No commands, no menus. Here's the handler:

import asyncio, subprocess, tempfile
from pathlib import Path
from aiogram import Router, F
from aiogram.types import Message, FSInputFile

router = Router()
sem = asyncio.Semaphore(4)

@router.message(F.video | F.animation | F.document)
async def on_video(message: Message):
    file = message.video or message.animation or message.document
    async with sem:
        with tempfile.TemporaryDirectory() as td:
            inp = Path(td) / "in.bin"
            out = Path(td) / "out.mp4"
            await message.bot.download(file, destination=inp)

            cmd = [
                "ffmpeg", "-y", "-i", str(inp), "-t", "10",
                "-vf", "crop=min(iw\\,ih):min(iw\\,ih),"
                       "scale=800:800,fps=30,format=yuv420p",
                "-c:v", "libx264", "-preset", "medium",
                "-b:v", "900k", "-maxrate", "1200k",
                "-bufsize", "2M", "-an",
                "-movflags", "+faststart", str(out),
            ]
            proc = await asyncio.create_subprocess_exec(
                *cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            await proc.wait()

            # ffmpeg can fail (truncated upload, unsupported container, etc.);
            # don't try to send an output file that was never written.
            if proc.returncode != 0 or not out.exists():
                await message.answer("Couldn't convert that video, sorry.")
                return

            await message.answer_video(FSInputFile(out))

A few things worth noting here.

The asyncio.Semaphore(4) caps concurrent ffmpeg processes. Without it, a burst of uploads would eat all CPU and memory on a small VPS. Each conversion takes 2 to 5 seconds depending on input length and codec.

tempfile.TemporaryDirectory() handles cleanup automatically. No orphaned files accumulating on disk after crashes.

I use .bin for the input file extension because Telegram doesn't always provide a useful filename. FFmpeg detects the container format from the file header, not the extension, so it doesn't matter.

The handler accepts three input types: F.video for direct video uploads, F.animation for GIFs, and F.document for files sent as documents (which happens when people forward large videos).

Shipping It as @liveavabot

I deployed this on a Hetzner VPS running Python 3.11 with aiogram 3. The typical flow is simple: someone records a 15-second iPhone clip, sends it to the bot, gets back a 10-second 800x800 H.264 MP4 with no audio. They long-press their profile picture in Telegram, select the video, done.

The bot handles a bunch of input formats transparently. iPhone HEVC recordings (the original problem), Samsung HEVC mode videos, OBS captures in VP9 or MKV, Telegram animated stickers in WebM, Instagram Reels that are already H.264 but have the wrong aspect ratio. Everything goes through the same pipeline: crop, scale, re-encode, strip audio.

Edge Cases That Bit Me

Variable frame rate

Game capture software like OBS sometimes produces VFR video. The fps=30 filter normalizes this, but I had a few cases where ffprobe metadata was corrupted and duration detection failed entirely. I added a fallback that skips the probe and just runs the conversion blind.

The 2 MB wall

Some inputs with dense visual content (confetti, particle effects, fast camera pans) exceed 2 MB even at 900 kbps for 10 seconds of footage. The retry logic drops the bitrate incrementally until the output fits. Most clips work on the first pass. Dense footage sometimes needs a second encode at 600 to 700 kbps.
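The retry logic is just a loop over decreasing bitrates until the output fits. Roughly, where encode_once stands in for the ffmpeg invocation shown earlier (the ladder values are mine):

```python
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024

def encode_until_it_fits(encode_once, out: Path,
                         bitrates_kbps=(900, 700, 500, 350)) -> bool:
    """Re-encode at progressively lower bitrates until the output is under 2 MB.

    encode_once(bitrate_kbps) is a stand-in for the ffmpeg call, writing
    its result to `out`. Returns False if even the lowest rung doesn't fit.
    """
    for kbps in bitrates_kbps:
        encode_once(kbps)
        if out.stat().st_size <= MAX_BYTES:
            return True
    return False
```

Dense confetti-style footage typically succeeds on the second rung; in practice nothing 10 seconds long at 800x800 has survived all four.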

Looping artifacts

Telegram plays video avatars on repeat. If the last frame looks very different from the first frame, you get a visible jump on every loop. I don't do anything smart about this yet (like cross-fading the boundary), but it's on the list.

Silent failures in production

The hardest bugs to track down were all silent rejections from Telegram's server. No error codes, no status messages. I had to test each constraint in isolation to build the spec I described above. The yuv420p pixel format requirement took the longest to discover.

What I Learned

FFmpeg is doing the heavy lifting here. I wrote the wrapper, the queue management, and the Telegram integration. The actual video processing is entirely ffmpeg with the right flags.

The biggest insight: Telegram's video avatar format is strict but undocumented. Once you know the exact spec (800x800, H.264, yuv420p, no audio, under 2 MB, faststart), the conversion is straightforward. The pain was discovering those constraints in the first place.

If you want to try it, send any video to @LiveAvaBot on Telegram.

Built by me. Feedback welcome.
