Why iPhone Videos Silently Fail as Telegram Video Avatars (and How to Fix It With ffmpeg)

#telegram #ffmpeg #python #tutorial

I run a small Telegram bot called @liveavabot. It takes any video or GIF you send it and turns it into a Telegram video avatar. Sounds trivial. It's not, because Telegram's video-avatar format is picky in ways that aren't obvious until you fail three times in a row.

This post is the build log: what the spec actually requires, why iPhone videos silently fail, and the ffmpeg pipeline I settled on.

The spec Telegram never really documents

If you dig through the Bot API and the mobile client source, the constraints for a video avatar are:

Container: MP4 with faststart (moov atom at the front).
Video codec: H.264 (AVC). Baseline or main profile is safest.
Pixel format: yuv420p. Not yuv420p10le, not HDR.
Resolution: square, 800x800 works everywhere. Some clients accept 640x640.
Duration: at most 10 seconds.
File size: at most about 2 MB. Big files get rejected or heavily re-compressed.
Audio: must be stripped. Not muted, stripped.
Frame rate: 30 fps is fine. 60 fps sometimes trips older clients.

Send anything that misses one of these and the Telegram client will accept the upload, spin for a second, and then silently show the old avatar. No error toast. No log line you can see from your side.
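
Since the client gives no feedback, a local preflight check is one way to see which constraint a file misses before you upload it. This is a minimal sketch, not something the bot is described as doing. It assumes ffprobe is on PATH, and the names probe, spec_problems, MAX_SECONDS and MAX_BYTES are made up for illustration. It does not check faststart, because the stream report used here does not expose atom order.

```python
import json
import subprocess
from pathlib import Path

MAX_SECONDS = 10.0
MAX_BYTES = 2 * 1024 * 1024  # "about 2 MB"; the exact cutoff is an assumption


def probe(path: Path) -> dict:
    """Run ffprobe and return its JSON report (container plus streams)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def spec_problems(report: dict) -> list[str]:
    """Return readable violations of the constraints above (empty list = OK)."""
    streams = report.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream"]

    problems = []
    if video.get("codec_name") != "h264":
        problems.append(f"codec is {video.get('codec_name')}, expected h264")
    if video.get("pix_fmt") != "yuv420p":
        problems.append(f"pixel format is {video.get('pix_fmt')}, expected yuv420p")
    if video.get("width") != video.get("height"):
        problems.append(f"not square: {video.get('width')}x{video.get('height')}")
    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("audio stream present")

    # ffprobe reports duration and size as strings inside "format".
    fmt = report.get("format", {})
    if float(fmt.get("duration", 0)) > MAX_SECONDS:
        problems.append(f"too long: {fmt.get('duration')} s")
    if int(fmt.get("size", 0)) > MAX_BYTES:
        problems.append(f"too big: {fmt.get('size')} bytes")
    return problems
```

Calling spec_problems(probe(Path("output.mp4"))) on a converted file should return an empty list; a raw iPhone clip will usually list several problems at once.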

Why iPhone videos in particular fail

Since iOS 11, iPhones with supported hardware record HEVC (H.265) in a .mov container by default, and HDR captures on newer models use a 10-bit pixel format (yuv420p10le). When a user shares that clip to the bot, Telegram forwards it as-is. As far as I can tell, when such a file is uploaded as an avatar, Telegram tries to process it on its side. HEVC is not on the accepted list, so the client falls back to a still frame or drops the update entirely.

The user thinks they sent a video. The bot thinks it processed it. Telegram just quietly ignored it. This is the single most common bug report I got in the first month.

The fix is to re-encode every upload to H.264 + yuv420p before handing it back to Telegram, even if the input looks like an MP4 already. Trust nothing, transcode everything.

The ffmpeg pipeline

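Here is the actual command the bot runs. Two passes: first cropdetect to find the crop box, then encode. Keep in mind that cropdetect only trims black borders. The box it reports is square only when the picture itself is square (as in the sample output below), and scale=800:800 will stretch anything else.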

# Pass 1: detect the crop box on a 2-second sample
ffmpeg -ss 0 -t 2 -i input.mov \
  -vf cropdetect=24:16:0 -f null - 2>&1 \
  | grep -oE 'crop=[0-9:]+' | tail -1
# -> crop=1080:1080:0:420

# Pass 2: crop, scale to 800x800, re-encode, strip audio
ffmpeg -y -i input.mov \
  -t 10 \
  -vf "crop=1080:1080:0:420,scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 -profile:v main -level 4.0 \
  -pix_fmt yuv420p \
  -preset veryfast -crf 26 \
  -movflags +faststart \
  -an \
  output.mp4

A few notes on the flags that matter:

-t 10 caps the output at 10 seconds. Users send 30-second clips all the time and I'd rather truncate than reject.
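format=yuv420p in the filter chain, plus -pix_fmt yuv420p on the encoder, is belt-and-suspenders. Either one alone forces 8-bit 4:2:0 output; I keep both so the conversion survives if someone later edits the filter chain. Note that this only changes bit depth and chroma layout. It does not tone-map, so HDR clips may come out looking flat or washed out.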
-an drops audio. Not -c:a copy, not -af volume=0. Strip it.
-movflags +faststart moves the moov atom to the start of the file so Telegram can begin decoding before the download finishes. Without this, the avatar often shows as a black square.
-crf 26 is my sweet spot. 23 blows past the 2 MB budget on busy scenes. 28 looks mushy.
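
CRF targets quality, not size, so no single value guarantees the 2 MB budget on every clip. One way to handle that is to retry with a higher CRF when the output is too big. The sketch below is an assumption layered on top of the command above, not the bot's actual code: CRF_LADDER, MAX_BYTES, build_encode_args and encode_within_budget are hypothetical names, and the ladder values are only a starting point.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

MAX_BYTES = 2 * 1024 * 1024     # "about 2 MB"; the exact cutoff is an assumption
CRF_LADDER = (26, 28, 30, 32)   # hypothetical fallback steps, tune to taste


def build_encode_args(src: Path, dst: Path, crf: int) -> list[str]:
    """Same encode settings as above, with the CRF as a parameter."""
    return [
        "ffmpeg", "-y", "-i", str(src), "-t", "10",
        "-vf", "scale=800:800:force_original_aspect_ratio=increase,"
               "crop=800:800,format=yuv420p",
        "-c:v", "libx264", "-profile:v", "main", "-pix_fmt", "yuv420p",
        "-preset", "veryfast", "-crf", str(crf),
        "-movflags", "+faststart", "-an", str(dst),
    ]


def run_ffmpeg(args: list[str]) -> bool:
    """Run one encode and report whether ffmpeg exited cleanly."""
    return subprocess.run(args, capture_output=True).returncode == 0


def encode_within_budget(
    src: Path,
    dst: Path,
    run: Callable[[list[str]], bool] = run_ffmpeg,
    ladder: Sequence[int] = CRF_LADDER,
    max_bytes: int = MAX_BYTES,
) -> Optional[int]:
    """Try each CRF in order; return the first one that fits, else None."""
    for crf in ladder:
        ok = run(build_encode_args(src, dst, crf))
        if ok and dst.exists() and dst.stat().st_size <= max_bytes:
            return crf
    return None
```

Each retry costs a full encode, which is cheap for 10-second 800x800 clips. The run parameter exists only so the loop can be exercised without ffmpeg installed.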

The aiogram 3 handler

Wiring it into a bot is boring in a good way. Here's the minimal handler:

import asyncio
import logging
import tempfile
from pathlib import Path

from aiogram import Router, F
from aiogram.types import Message, FSInputFile

router = Router()
log = logging.getLogger(__name__)

# F.document also matches non-video files; ffmpeg fails on those and the user gets the error reply.
@router.message(F.video | F.animation | F.document)
async def handle_video(message: Message) -> None:
    file = message.video or message.animation or message.document
    if not file:
        return

    # Work in a temp dir that is removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.bin"
        dst = Path(tmp) / "avatar.mp4"

        await message.bot.download(file, destination=src)

        # Scale the short side to 800, center-crop to a square, force 8-bit 4:2:0.
        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-y", "-i", str(src),
            "-t", "10",
            "-vf", "scale=800:800:force_original_aspect_ratio=increase,"
                   "crop=800:800,format=yuv420p",
            "-c:v", "libx264", "-profile:v", "main",
            "-pix_fmt", "yuv420p",
            "-preset", "veryfast", "-crf", "26",
            "-movflags", "+faststart",
            "-an",
            str(dst),
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.PIPE,
        )
        _, err = await proc.communicate()

        if proc.returncode != 0 or not dst.exists():
            # Keep the tail of ffmpeg's stderr so failures can be debugged later.
            log.warning("ffmpeg failed (%s): %s", proc.returncode,
                        err.decode(errors="replace")[-500:])
            await message.reply("couldn't convert this one, sorry")
            return

        # Send while the temp dir still exists; it is deleted right after.
        await message.reply_video(FSInputFile(dst), caption="upload this via Settings, Edit profile, camera icon")

I dropped the two-pass cropdetect here and used scale + crop in one filter for simplicity. It handles portrait and landscape inputs by scaling the short side to 800 and center-cropping the long side. Good enough for 90 percent of uploads.
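
If you do want the cropdetect pass inside the handler, the glue is mostly string handling: pull the last crop= value out of ffmpeg's stderr, then turn it into a filter chain. The sketch below is one possible approach rather than the bot's code. The helper names last_crop and square_filter are hypothetical, and the centered-square step is my assumption for dealing with boxes that come back non-square.

```python
import re
from typing import Optional, Tuple

Crop = Tuple[int, int, int, int]  # (width, height, x, y)

# Same pattern as the grep in pass 1, with the four numbers captured.
CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def last_crop(ffmpeg_stderr: str) -> Optional[Crop]:
    """Return the last crop box cropdetect printed, or None if there is none."""
    matches = CROP_RE.findall(ffmpeg_stderr)
    if not matches:
        return None
    w, h, x, y = (int(v) for v in matches[-1])
    return w, h, x, y


def square_filter(crop: Crop, size: int = 800) -> str:
    """Build a -vf chain that takes a centered square out of the crop box."""
    w, h, x, y = crop
    side = min(w, h)
    side -= side % 2                  # keep the side even for yuv420p
    sx = x + (w - side) // 2          # center the square horizontally
    sy = y + (h - side) // 2          # and vertically
    return (f"crop={side}:{side}:{sx}:{sy},"
            f"scale={size}:{size}:flags=lanczos,format=yuv420p")
```

When last_crop returns None, falling back to the single-pass scale + crop filter from the handler keeps the upload from failing outright.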

Packaging this as @liveavabot

The bot lives at https://t.me/LiveAvaBot?start=devto_article_20260701. Send it any video or GIF. It replies with a converted MP4 you can set as your Telegram video avatar. That's the whole product. No accounts, no menus, no upsell.

Infra is boring by design: a single 4 GB VPS, systemd unit for the aiogram process, ffmpeg from the Ubuntu repos. No queue, no worker pool. Each conversion is under a second for typical 5-second clips, so I just run them inline in the handler.
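
Running inline works as long as only a few conversions overlap. If that ever stops being true, a semaphore plus a timeout is a small step that still avoids a queue. This is a sketch under my own assumptions, not something the bot does today: run_limited, the limit of 2 and the 30-second timeout are all hypothetical.

```python
import asyncio
from typing import Tuple


async def run_limited(
    slots: asyncio.Semaphore, *args: str, timeout: float = 30.0
) -> Tuple[int, bytes]:
    """Run a command while holding one slot; kill it if it exceeds the timeout.

    Returns (exit code, stderr). A timeout is reported as (-1, b"timed out").
    """
    async with slots:
        proc = await asyncio.create_subprocess_exec(
            *args,
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            _, err = await asyncio.wait_for(proc.communicate(), timeout)
        except asyncio.TimeoutError:
            proc.kill()          # a stuck ffmpeg must not hold the slot forever
            await proc.wait()
            return -1, b"timed out"
        return proc.returncode, err


# Create the semaphore once, inside the running event loop, and share it:
#     slots = asyncio.Semaphore(2)   # hypothetical limit
#     code, err = await run_limited(slots, "ffmpeg", "-y", "-i", "in.mov", "out.mp4")
```

Handlers beyond the limit simply wait their turn, so users see a slower reply instead of a failed one.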

Edge cases I've hit

Live Photos on iPhone: they arrive as a still image plus a .mov. Handle the .mov, ignore the still.
Vertical videos: center-crop is fine for faces, terrible for full-body. I'm considering face-detection crop but it's not worth the dependency yet.
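Animated WebP: .webp from stickers. Be careful here: libwebp in ffmpeg is an encoder, and ffmpeg's native WebP decoder has historically handled only still images, so test animated files against your own build before relying on this.
10-bit HDR: covered by forcing yuv420p on the output. If neither format=yuv420p nor -pix_fmt yuv420p is set, the 10-bit frames can reach libx264 as-is, and the main-profile encode typically fails with a pixel-format or bit-depth error.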
Files larger than 20 MB: Telegram's Bot API refuses the download. I catch this and ask the user to trim first.
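
The size check from the last item can happen before any download is attempted, because Telegram includes file_size on the video, animation and document objects it sends. A minimal sketch, with the caveat that the helper name and the exact byte value of "20 MB" are my assumptions:

```python
from typing import Optional

# The Bot API documents a 20 MB download limit; the precise byte count is an assumption.
BOT_API_DOWNLOAD_LIMIT = 20 * 1024 * 1024


def too_big_to_download(file_size: Optional[int]) -> bool:
    """True when the reported size is over the limit.

    file_size is optional in the Bot API, so None means "unknown":
    let the download try and handle the error if it fails.
    """
    return file_size is not None and file_size > BOT_API_DOWNLOAD_LIMIT


# In the handler, before calling download:
#     if too_big_to_download(file.file_size):
#         await message.reply("that file is over 20 MB, please trim it first")
#         return
```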

What's next: batch mode (send a photo, get a 3-second Ken Burns pan as an avatar), and better handling of GIFs with palettes that go muddy after H.264 re-encode.

Built by me, @liveavabot. If you have a weird clip that breaks it, send it. That's how I find bugs.