DEV Community

liveavabot

How I Built a Bot to Fix Telegram's Silent Video Avatar Failures

I wanted to set a video avatar on Telegram. Recorded a short clip on my iPhone, uploaded it, and nothing happened. No error message. No warning. Telegram just silently ignored my video.

Took me way too long to figure out why. iPhones record in HEVC (H.265) by default. Telegram video avatars only accept H.264. But instead of telling you that, Telegram accepts the upload, returns a success response, and then quietly does nothing with your file.

I built @LiveAvaBot to fix this. Send it any video or GIF, get back a file ready to use as your Telegram video avatar. Here's how the conversion pipeline works.

What Telegram Actually Requires

The documentation for video avatars is thin. After testing and reading Telegram client source code, here's the actual spec:

  • Codec: H.264, High profile (encoded here with libx264)
  • Resolution: 800x800 pixels, must be square
  • Duration: 10 seconds max
  • File size: 2 MB max
  • Audio: none, must be stripped entirely
  • Pixel format: yuv420p
  • Container: MP4 with faststart flag

The tricky part: Telegram's API returns success even when the video doesn't meet these requirements. Your client shows a spinner, maybe a checkmark, and then the avatar just doesn't update. No error callback, no webhook failure. Nothing in the logs.

The FFmpeg Pipeline

The conversion needs to handle several things at once: crop to square, scale down, re-encode to H.264, strip audio, and keep the file under 2 MB.

Here's the production command:

ffmpeg -y -v error \
  -i input.mp4 \
  -t 9 \
  -vf "crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,fps=30,format=yuv420p" \
  -an \
  -c:v libx264 -profile:v high -level 4.0 \
  -preset medium -crf 23 \
  -maxrate 1400k -bufsize 2800k \
  -pix_fmt yuv420p -movflags +faststart \
  output.mp4

Breaking this down piece by piece.

crop='min(iw,ih)':'min(iw,ih)' takes a square crop from the center of the frame. Works for both landscape and portrait inputs without separate logic per aspect ratio.
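
To make the geometry concrete, here is a small sketch of the rectangle that crop expression selects. It relies on the crop filter's default offsets, which center the output in the frame; the function name center_square_crop is only for illustration.

```python
def center_square_crop(iw: int, ih: int) -> tuple:
    """Return (width, height, x, y) of the centered square crop.

    Mirrors crop='min(iw,ih)':'min(iw,ih)' with ffmpeg's default
    offsets x=(iw-ow)/2 and y=(ih-oh)/2.
    """
    side = min(iw, ih)
    return side, side, (iw - side) // 2, (ih - side) // 2


print(center_square_crop(1920, 1080))  # landscape: (1080, 1080, 420, 0)
print(center_square_crop(1080, 1920))  # portrait:  (1080, 1080, 0, 420)
```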

scale=800:800:flags=lanczos resizes to the target resolution with Lanczos resampling. Noticeably sharper than bilinear at this size.

-t 9 caps duration at 9 seconds, not 10. One second of safety margin because some containers report duration slightly longer than the actual stream content.

-an strips all audio tracks. This is critical. Even a silent audio track can cause Telegram to reject the file without any error message.

-maxrate 1400k -bufsize 2800k is the VBV (Video Buffering Verifier) constraint. At a sustained 1400 kbps peak for 9 seconds, the video stream comes to 1400 × 9 / 8 = 1575 KB, roughly 1.6 MB, which leaves headroom under the 2 MB limit. Setting bufsize to 2x maxrate gives the encoder room to spend extra bits on complex frames without blowing past the limit.
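
The size budget is easy to check by hand. The sketch below computes two rough estimates: one for the peak bitrate sustained over the whole clip, and a looser one that adds a full buffer of burst on top. The second is a simple leaky-bucket assumption on my part, not a guarantee from x264, and neither counts MP4 container overhead. Function names are hypothetical.

```python
def sustained_peak_bytes(maxrate_kbps: int, seconds: int) -> int:
    """Bytes produced if the encoder ran at maxrate for the whole clip."""
    return maxrate_kbps * 1000 * seconds // 8


def with_buffer_burst_bytes(maxrate_kbps: int, bufsize_kbit: int, seconds: int) -> int:
    """Looser estimate: sustained peak plus one full VBV buffer of burst."""
    return (maxrate_kbps * seconds + bufsize_kbit) * 1000 // 8


print(sustained_peak_bytes(1400, 9))           # 1575000
print(with_buffer_burst_bytes(1400, 2800, 9))  # 1925000
```

Both numbers stay under 2,000,000 bytes, but the second shows the margin is thinner than the first suggests, so checking the output size after encoding is still worthwhile.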

-movflags +faststart moves the MP4 moov atom to the beginning of the file. Lets Telegram start processing before the full download completes.

Handling It in Aiogram 3

The bot runs on aiogram 3. Here's a simplified version of the handler:

from aiogram import Router, F, types, Bot
from pathlib import Path
import asyncio

router = Router()

@router.message(F.video | F.animation | F.document)
async def handle_video(message: types.Message, bot: Bot):
    if message.video:
        file_id = message.video.file_id
    elif message.animation:
        file_id = message.animation.file_id
    elif message.document and message.document.mime_type \
         and message.document.mime_type.startswith("video"):
        file_id = message.document.file_id
    else:
        return

    file = await bot.get_file(file_id)
    src = Path(f"/tmp/{file_id}.mp4")
    dst = Path(f"/tmp/{file_id}_avatar.mp4")

    await bot.download_file(file.file_path, src)

    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", "-y", "-v", "error",
        "-i", str(src), "-t", "9",
        "-vf", "crop='min(iw,ih)':'min(iw,ih)',"
               "scale=800:800:flags=lanczos,fps=30,format=yuv420p",
        "-an", "-c:v", "libx264", "-profile:v", "high",
        "-level", "4.0", "-preset", "medium", "-crf", "23",
        "-maxrate", "1400k", "-bufsize", "2800k",
        "-pix_fmt", "yuv420p", "-movflags", "+faststart",
        str(dst),
        stderr=asyncio.subprocess.PIPE,
    )
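    # communicate() drains stderr while waiting; wait() alone can deadlock
    # if ffmpeg fills the stderr pipe.
    await proc.communicate()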

    if proc.returncode == 0 and dst.exists():
        await message.reply_document(
            types.FSInputFile(dst),
            caption="Your video avatar is ready. "
                    "Download it, then set it in Telegram settings.",
        )
    else:
        await message.reply("Conversion failed. Try a shorter clip.")

I accept F.video | F.animation | F.document because users send files in all three ways. GIFs come through as animation. Some clients send videos as document when the file extension is unusual. Covering all three catches most inputs.

The conversion runs as a subprocess with asyncio.create_subprocess_exec. This keeps the event loop free while ffmpeg does the heavy lifting. Median conversion time is around 11 seconds on a cheap VPS.
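
A stuck or very slow ffmpeg run shouldn't hold a handler open forever. Here is a minimal sketch of running a subprocess with a timeout, under the assumption that killing the process is an acceptable way to give up. The helper name run_with_timeout and the choice to return None on timeout are illustrative, not part of aiogram or the bot's actual code.

```python
import asyncio
from typing import Optional, Tuple


async def run_with_timeout(cmd: list, timeout: float) -> Tuple[Optional[int], bytes]:
    """Run cmd and return (returncode, stderr); returncode is None on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() reads stderr to the end, so the pipe cannot fill up.
        _, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # give up on the child process
        await proc.wait()  # reap it so no zombie is left behind
        return None, b""
    return proc.returncode, stderr
```

In a handler this would be awaited with the ffmpeg argument list and a timeout comfortably above the typical conversion time, for example 60 seconds.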

Edge Cases I Hit Along the Way

HEVC detection matters before you even start. If someone sends a .mov from an iPhone, ffmpeg handles the transcode fine on most systems. But some builds of ffmpeg don't include an HEVC decoder. I added a pre-check with ffprobe -v error -select_streams v:0 -show_entries stream=codec_name to identify the input codec before running the main pipeline. If the build lacks HEVC support, the bot can at least return a useful error instead of garbled output.
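
A sketch of that pre-check in Python, assuming ffprobe is on PATH. The `-of default=noprint_wrappers=1:nokey=1` output option is my addition so that ffprobe prints only the bare codec name, and the helper names are hypothetical. Note that this only identifies the codec; whether the local ffmpeg build can decode it is a separate question.

```python
import subprocess
from typing import Optional


def ffprobe_codec_cmd(path: str) -> list:
    """Build the ffprobe command that prints the first video stream's codec."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_codec(output: str) -> Optional[str]:
    """Return the codec name from ffprobe's output, or None if it is empty."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[0].lower() if lines else None


def probe_codec(path: str) -> Optional[str]:
    """Run ffprobe on path; None means no video stream or a probe failure."""
    result = subprocess.run(ffprobe_codec_cmd(path), capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_codec(result.stdout)
```

A caller could compare the result against "hevc" and reply with a specific message when the server's ffmpeg is known to lack that decoder.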

GIFs need normalization. Telegram "GIFs" are actually MP4s with no audio. But real GIF files from other sources can have variable frame rates that confuse the encoder. The fps=30 filter in the chain normalizes this.

CRF alone doesn't guarantee file size. CRF 23 usually produces files under 2 MB at 800x800 for 9 seconds. But high-motion clips (confetti, water, fast camera panning) can exceed it. That's why the maxrate/bufsize pair exists: it caps the bitrate, which in practice keeps the file size bounded. The tradeoff is slightly lower quality on those high-motion frames, but the file stays within Telegram's spec.
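
If an output still lands over the limit, one possible fallback is to re-encode with a lower -maxrate scaled by how far over the file was. This is a hypothetical strategy for illustration, not something the pipeline above does; the function name, the 90% safety margin, and the 300 kbps floor are all assumptions.

```python
def scaled_maxrate_kbps(
    actual_bytes: int,
    current_kbps: int,
    limit_bytes: int = 2_000_000,
    margin_pct: int = 90,
    floor_kbps: int = 300,
) -> int:
    """Suggest a maxrate for a retry, given the size of the last output.

    Returns current_kbps unchanged when the file already fits. Otherwise
    scales the bitrate down in proportion to the overshoot, aims for
    margin_pct of the limit, and never goes below floor_kbps.
    """
    if actual_bytes <= limit_bytes:
        return current_kbps
    target = current_kbps * limit_bytes * margin_pct // (100 * actual_bytes)
    return max(floor_kbps, target)


print(scaled_maxrate_kbps(1_500_000, 1400))  # 1400, already fits
print(scaled_maxrate_kbps(2_500_000, 1400))  # 1008
```

The bufsize for the retry would be scaled along with it, for example kept at 2x the new maxrate.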

Silent audio tracks are the sneakiest failure. Some screen recordings include a silent audio stream by default. The file plays fine everywhere, but Telegram rejects it as a video avatar with zero feedback. -an is not optional.

What I Shipped

The bot has 8 users right now. Small, but it works. Send it a video, get back an avatar-ready file.

If you're building something similar, the ffmpeg pipeline above is the core of it. The rest is plumbing: downloading files from Telegram's API, cleaning up temp files, handling timeouts.
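
For the temp-file part of that plumbing, one option is to do all the work inside a throwaway directory so nothing is left in /tmp even when the conversion raises. A minimal sketch using only the standard library; the helper name with_scratch_dir and the fixed file names are illustrative.

```python
import tempfile
from pathlib import Path


def with_scratch_dir(job):
    """Run job(src, dst) inside a temporary directory.

    The directory and everything in it are removed when job returns
    or raises.
    """
    with tempfile.TemporaryDirectory(prefix="avatar_") as tmp:
        src = Path(tmp) / "input.mp4"   # where the download would go
        dst = Path(tmp) / "avatar.mp4"  # where ffmpeg would write
        return job(src, dst)
```

In an async handler the same idea works by using the `with tempfile.TemporaryDirectory()` block directly around the download, conversion, and reply.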

Built by me. Try it: https://t.me/LiveAvaBot?start=devto_article_20260425

The source of most "broken" Telegram video avatars is just codec mismatch. Once you know the exact spec, ffmpeg does the rest.
