Why Your iPhone Video Avatar Silently Fails in Telegram

#telegram #ffmpeg #python #tutorial

The silent failure that pushed me to build this

I took a 5-second clip on my iPhone, tried to set it as my Telegram avatar, and got nothing. The upload completed. The progress bar finished. My avatar stayed the same boring static photo. No error message, no warning.

Turns out Telegram quietly rejects iPhone videos because iOS records in HEVC (H.265) by default, and Telegram's video avatar pipeline only accepts H.264. The spec is strict, and if your file misses even one constraint, the upload "succeeds" but the avatar never updates.

I got tired of doing the ffmpeg dance by hand every time, so I packaged the conversion into a bot.

What the Telegram video avatar spec actually requires

After reading the Bot API docs and poking at the MTProto layer, here is the set of constraints a file has to meet for a video avatar to stick:

Container: MP4
Video codec: H.264 (libx264), baseline or main profile
Pixel format: yuv420p
Resolution: 800x800 square, exact
Duration: 10 seconds maximum
File size: 2MB maximum
Audio: must be removed entirely
moov atom: at the start of the file (faststart)

Miss any one of these and the upload "succeeds" but the avatar silently does not change. The 2MB cap is the painful one: spread over 10 seconds it allows an average of about 1.7 Mbit/s, while a 10-second 800x800 H.264 clip at decent quality wants to be 3 to 5 MB.
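The checklist above can be turned into a quick pre-flight check. What follows is a minimal sketch, not code from the bot: it assumes you have already parsed the JSON printed by `ffprobe -v error -show_format -show_streams -of json FILE`, and the name `check_avatar_spec` is hypothetical. It does not check the container or the moov position, because ffprobe reports MOV and MP4 under the same format name and does not expose where the moov atom sits.

```python
MAX_BYTES = 2 * 1024 * 1024   # 2MB cap from the list above
MAX_SECONDS = 10.0
# ffprobe reports libx264's baseline output as "Constrained Baseline".
ALLOWED_PROFILES = {"Baseline", "Constrained Baseline", "Main"}


def check_avatar_spec(report: dict) -> list[str]:
    """Return a list of violations; an empty list means every check passed.

    `report` is the parsed JSON from:
    ffprobe -v error -show_format -show_streams -of json FILE
    """
    problems = []
    streams = report.get("streams", [])
    fmt = report.get("format", {})

    # ffprobe emits duration and size as strings, so convert before comparing.
    if float(fmt.get("duration", 0)) > MAX_SECONDS:
        problems.append("longer than 10 seconds")
    if int(fmt.get("size", 0)) > MAX_BYTES:
        problems.append("larger than 2MB")
    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("audio stream present")

    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        problems.append("no video stream")
        return problems
    if video.get("codec_name") != "h264":
        problems.append("video codec is not H.264")
    if video.get("profile") not in ALLOWED_PROFILES:
        problems.append("profile is not baseline or main")
    if video.get("pix_fmt") != "yuv420p":
        problems.append("pixel format is not yuv420p")
    if (video.get("width"), video.get("height")) != (800, 800):
        problems.append("resolution is not 800x800")
    return problems
```

Fed the report for an untouched iPhone clip, this should flag at least the codec and the audio stream, which is a lot more than Telegram tells you.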

The ffmpeg pipeline that fixes it

Here is the actual command the bot runs. It re-encodes, crops to square, scales to 800x800, strips audio, sets yuv420p, and writes faststart:

ffmpeg -y -i input.mov \
  -t 10 \
  -vf "crop='min(in_w,in_h)':'min(in_w,in_h)',scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 -profile:v baseline -level 3.1 \
  -preset slow -crf 28 \
  -movflags +faststart \
  -an \
  output.mp4

A few notes on why each flag is there:

-t 10 hard-caps duration to 10 seconds.
The crop filter uses min(in_w,in_h) so portrait and landscape both end up square without distortion.
scale=800:800:flags=lanczos keeps detail better than bilinear on faces.
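-profile:v baseline -level 3.1 keeps the stream decodable on older Android devices. Main profile also passes the spec, but baseline is the safer target.
-crf 28 is the quality setting that usually lands under 2MB at 800x800. Higher CRF values give smaller files at lower quality.
-an strips audio. The avatar pipeline rejects any file that contains an audio stream.
+faststart moves the moov atom to the front so playback can start before the whole file has downloaded.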

If the result is still above 2MB (rare, but happens on busy scenes), the bot bumps CRF to 30 and re-runs.
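The retry described above can be sketched as a small CRF ladder. This is an illustration under assumptions rather than the bot's actual code: the helper names `ffmpeg_args` and `encode_under_cap` and the injectable `run` parameter are hypothetical, added so the loop can be exercised without ffmpeg installed. The flags themselves are the ones from the command above.

```python
import asyncio
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024
VF = ("crop='min(in_w,in_h)':'min(in_w,in_h)',"
      "scale=800:800:flags=lanczos,format=yuv420p")


def ffmpeg_args(src, dst, crf):
    """Build the argv for one encode attempt at the given CRF."""
    return ["ffmpeg", "-y", "-i", str(src), "-t", "10", "-vf", VF,
            "-c:v", "libx264", "-profile:v", "baseline", "-level", "3.1",
            "-preset", "slow", "-crf", str(crf),
            "-movflags", "+faststart", "-an", str(dst)]


async def run_ffmpeg(args):
    """Run ffmpeg without blocking the event loop; return its exit code."""
    proc = await asyncio.create_subprocess_exec(*args)
    return await proc.wait()


async def encode_under_cap(src, dst, crfs=(28, 30), run=run_ffmpeg):
    """Try each CRF in order; return the first one that fits, else None."""
    for crf in crfs:
        code = await run(ffmpeg_args(src, dst, crf))
        if code != 0:
            raise RuntimeError(f"ffmpeg exited with code {code}")
        if Path(dst).stat().st_size <= MAX_BYTES:
            return crf
    return None  # still too large after the last attempt
```

Returning the CRF that worked, rather than just True, makes it easy to log how often the fallback is needed.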

The minimal aiogram 3 handler

The bot listens for video, video_note, animation, and document messages, downloads them, runs the pipeline above, and sends back the converted file. Here is the trimmed handler (it leaves out the CRF 30 retry and simply reports an oversize result):

from aiogram import Router, F
from aiogram.types import Message, FSInputFile
import asyncio, tempfile, pathlib

router = Router()

@router.message(F.video | F.video_note | F.animation | F.document)
async def convert(msg: Message):
    file = msg.video or msg.video_note or msg.animation or msg.document
    if not file:
        return

    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "in.bin"
        dst = pathlib.Path(tmp) / "out.mp4"

        await msg.bot.download(file, destination=src)

        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-y", "-i", str(src),
            "-t", "10",
            "-vf", "crop='min(in_w,in_h)':'min(in_w,in_h)',"
                   "scale=800:800:flags=lanczos,format=yuv420p",
            "-c:v", "libx264", "-profile:v", "baseline", "-level", "3.1",
            "-preset", "slow", "-crf", "28",
            "-movflags", "+faststart", "-an",
            str(dst),
        )
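        returncode = await proc.wait()

        # A failed encode leaves no usable output, so bail out before stat().
        if returncode != 0 or not dst.exists():
            await msg.reply("Could not convert that file. Is it a video?")
            return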

        if dst.stat().st_size > 2 * 1024 * 1024:
            await msg.reply("Result still over 2MB, try a shorter or simpler clip.")
            return

        await msg.reply_video(FSInputFile(dst), caption="Set this as your video avatar.")

Two things worth flagging here. First, I use F.video | F.video_note | F.animation | F.document because iPhone Live Photos and screen recordings arrive as document with a video mime type, not video. Before I added the document filter, about 30% of clips never got processed. Second, the subprocess is awaited via asyncio.create_subprocess_exec, not subprocess.run; otherwise the whole event loop blocks while ffmpeg chews through a 4K source.
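One side effect of matching F.document is that PDFs and ZIP files reach the handler too. Here is a minimal sketch of narrowing that down, assuming the sender's client filled in `mime_type` (the field is optional in the Bot API, so a missing value is treated as "not a video"). `pick_video_source` is a hypothetical helper, and it only relies on attribute names, so it works on anything shaped like an aiogram Message.

```python
def pick_video_source(msg):
    """Return the media object to convert, or None if there is nothing usable.

    Native video types win; a document is accepted only when its
    mime type starts with "video/".
    """
    for attr in ("video", "video_note", "animation"):
        media = getattr(msg, attr, None)
        if media is not None:
            return media

    doc = getattr(msg, "document", None)
    mime = getattr(doc, "mime_type", None) or ""
    if doc is not None and mime.startswith("video/"):
        return doc
    return None
```

In the handler, this would stand in for the `msg.video or msg.video_note or ...` chain, with an early return when the result is None.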

How I packaged this as @liveavabot

The whole thing lives at https://t.me/LiveAvaBot?start=devto_article_20260524. Send any video, GIF, video note, or even a screen recording, and you get back a Telegram-spec-compliant 800x800 H.264 file ready to drop into Settings > Edit profile > Set new photo.

The bot runs on a small Hetzner VM. ffmpeg does all the heavy lifting; I just wrote the wrapper, the queueing, and the failure messages. About 80 people have used it so far, mostly folks who tried to set an iPhone clip as their avatar and hit the same silent failure I did.
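The queueing does not need to be elaborate on a small VM. One simple way to do it, sketched here as an assumption rather than a description of the production setup, is to cap how many ffmpeg processes run at once with an asyncio.Semaphore. The class name `EncodeQueue` and the default limit of 2 are hypothetical.

```python
import asyncio


class EncodeQueue:
    """Let at most `max_parallel` encode jobs run at the same time."""

    def __init__(self, max_parallel=2):
        self._max = max_parallel
        self._slots = None  # created lazily, inside the running event loop

    async def run(self, job):
        """Wait for a free slot, then await `job()` and return its result."""
        if self._slots is None:
            self._slots = asyncio.Semaphore(self._max)
        async with self._slots:
            return await job()
```

A handler would wrap its encode call in something like `await queue.run(lambda: encode(src, dst))`, so extra uploads wait for a slot instead of piling more ffmpeg processes onto the CPU.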

Edge cases and what is next

A few things still trip the pipeline:

4K HDR clips with Dolby Vision metadata sometimes need a tone-mapping pass before the crop filter. I have not wired this in yet because it adds about 8 seconds of encode time per clip.
Live Photos: I currently take the video track, but the trim point Apple stores in the metadata is sometimes off-center. Worth honoring on a future pass.
Very dark scenes get blocky at CRF 28. Bumping the preset to veryslow helps but I do not want to push encode time past 15 seconds per clip on shared hardware.
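For the HDR case, here is a rough sketch of what the tone-mapping pass could look like. It is not wired into the bot. It assumes an ffmpeg build that includes the zscale filter (libzimg), uses a commonly seen zscale/tonemap chain rather than anything tuned for avatars, and detects HDR from the `color_transfer` field that ffprobe reports for the video stream. The names `needs_tonemap` and `build_vf` are hypothetical.

```python
# Transfer functions ffprobe reports for HDR video: PQ and HLG.
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}

# Same crop and scale chain as the main command.
SQUARE_800 = ("crop='min(in_w,in_h)':'min(in_w,in_h)',"
              "scale=800:800:flags=lanczos,format=yuv420p")

# Linearize, tone-map to SDR, then convert back to BT.709.
TONEMAP = ("zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,"
           "tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv")


def needs_tonemap(video_stream: dict) -> bool:
    """True when the ffprobe stream entry describes PQ or HLG video."""
    return video_stream.get("color_transfer") in HDR_TRANSFERS


def build_vf(video_stream: dict) -> str:
    """Return the -vf string, with tone mapping placed before the crop."""
    if needs_tonemap(video_stream):
        return f"{TONEMAP},{SQUARE_800}"
    return SQUARE_800
```

Running the extra filters only when the source is flagged as HDR keeps the added encode time off ordinary SDR clips.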

If you want to extend or fork the converter, the constraints above are the part to memorize. The rest is plumbing. Built by me, @liveavabot.