The silent failure that pushed me to build this
I took a 5-second clip on my iPhone, tried to set it as my Telegram avatar, and got nothing. The upload completed. The progress bar finished. My avatar stayed the same boring static photo. No error message, no warning.
Turns out Telegram quietly rejects iPhone videos because iOS records in HEVC (H.265) by default, and Telegram's video avatar pipeline only accepts H.264. The spec is strict, and if your file misses even one constraint, the upload "succeeds" but the avatar never updates.
I got tired of doing the ffmpeg dance by hand every time, so I packaged the conversion into a bot.
What the Telegram video avatar spec actually requires
After reading the Bot API docs and poking at the MTProto layer, here is the exact set of constraints that has to be true for a video avatar to stick:
- Container: MP4
- Video codec: H.264 (libx264), baseline or main profile
- Pixel format: yuv420p
- Resolution: 800x800 square, exact
- Duration: 10 seconds maximum
- File size: 2MB maximum
- Audio: must be removed entirely
- moov atom: at the start of the file (faststart)
Miss any one of these and the upload "succeeds" but the avatar silently does not change. The 2MB cap is the painful one, because a 10-second 800x800 H.264 clip at decent quality wants to be 3 to 5 MB.
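Because the failure is silent, it can help to check a converted file against the list before uploading it. The sketch below does that from ffprobe's JSON report, as produced by ffprobe -v error -show_format -show_streams -of json. The names probe and check_avatar_spec are mine, the limits are the ones listed in this post rather than an official Telegram schema, and the moov position is not checked because ffprobe's JSON does not expose it.

```python
import json
import subprocess

MAX_BYTES = 2 * 1024 * 1024  # 2MB cap from the list above
MAX_SECONDS = 10.0
SIDE = 800


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON report (needs ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def check_avatar_spec(report: dict) -> list[str]:
    """Return human-readable violations; an empty list means no miss found."""
    problems = []
    fmt = report.get("format", {})
    streams = report.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]

    # ffprobe reports the same demuxer name for MOV and MP4, so this
    # only rules out unrelated containers such as WebM or MKV.
    if "mp4" not in fmt.get("format_name", "").split(","):
        problems.append("container is not MP4")
    if not video:
        problems.append("no video stream")
    else:
        v = video[0]
        if v.get("codec_name") != "h264":
            problems.append(f"codec is {v.get('codec_name')}, not h264")
        if v.get("pix_fmt") != "yuv420p":
            problems.append(f"pixel format is {v.get('pix_fmt')}, not yuv420p")
        if (v.get("width"), v.get("height")) != (SIDE, SIDE):
            problems.append("resolution is not 800x800")
    if audio:
        problems.append("audio stream present")
    # ffprobe emits duration and size as strings, hence the conversions.
    if float(fmt.get("duration", 0)) > MAX_SECONDS:
        problems.append("longer than 10 seconds")
    if int(fmt.get("size", 0)) > MAX_BYTES:
        problems.append("larger than 2MB")
    return problems
```

Calling check_avatar_spec(probe("output.mp4")) on a converted file should return an empty list; anything else names the constraint that was missed.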
The ffmpeg pipeline that fixes it
Here is the actual command the bot runs. It re-encodes, crops to square, scales to 800x800, strips audio, sets yuv420p, and writes faststart:
ffmpeg -y -i input.mov \
-t 10 \
-vf "crop='min(in_w,in_h)':'min(in_w,in_h)',scale=800:800:flags=lanczos,format=yuv420p" \
-c:v libx264 -profile:v baseline -level 3.1 \
-preset slow -crf 28 \
-movflags +faststart \
-an \
output.mp4
A few notes on why each flag is there:
- -t 10 hard-caps duration to 10 seconds.
- The crop filter uses min(in_w,in_h) so portrait and landscape both end up square without distortion.
- scale=800:800:flags=lanczos keeps detail better than bilinear on faces.
- -profile:v baseline -level 3.1 matches Telegram's playback target across old Android devices.
- -crf 28 is the sweet spot for staying under 2MB at 800x800.
- -an strips audio. The avatar pipeline rejects files with an audio stream entirely.
- +faststart moves the moov atom to the front so the file can stream-decode.
If the result is still above 2MB (rare, but happens on busy scenes), the bot bumps CRF to 30 and re-runs.
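The retry step can be sketched as a small loop around the command. This is a guess at the shape, not the bot's actual code: build_args, encode_with_retry, and the injectable run and size callables are hypothetical names, and the CRF ladder is just the two values mentioned above.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence

MAX_BYTES = 2 * 1024 * 1024  # 2MB avatar cap

VF = ("crop='min(in_w,in_h)':'min(in_w,in_h)',"
      "scale=800:800:flags=lanczos,format=yuv420p")


def build_args(src: str, dst: str, crf: int) -> list[str]:
    """Same ffmpeg command as above, with the CRF value made a parameter."""
    return [
        "ffmpeg", "-y", "-i", src, "-t", "10",
        "-vf", VF,
        "-c:v", "libx264", "-profile:v", "baseline", "-level", "3.1",
        "-preset", "slow", "-crf", str(crf),
        "-movflags", "+faststart", "-an", dst,
    ]


def _run(args: Sequence[str]) -> None:
    # Blocking call; raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(args, check=True)


def encode_with_retry(
    src: str,
    dst: str,
    crfs: Sequence[int] = (28, 30),
    run: Callable[[Sequence[str]], None] = _run,
    size: Callable[[str], int] = os.path.getsize,
) -> Optional[int]:
    """Try each CRF in order; return the first one that fits, else None."""
    for crf in crfs:
        run(build_args(src, dst, crf))
        if size(dst) <= MAX_BYTES:
            return crf
    return None
```

The run and size parameters exist so the loop can be exercised without ffmpeg installed. Inside an async handler, the blocking subprocess.run would need to be swapped for the asyncio subprocess call so the event loop stays free.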
The minimal aiogram 3 handler
The bot listens for video, video_note, animation, and document messages, downloads them, runs the pipeline above, and sends back the converted file. Here is the trimmed handler:
import asyncio
import pathlib
import tempfile

from aiogram import F, Router
from aiogram.types import FSInputFile, Message

router = Router()

MAX_BYTES = 2 * 1024 * 1024  # 2MB avatar cap


@router.message(F.video | F.video_note | F.animation | F.document)
async def convert(msg: Message):
    file = msg.video or msg.video_note or msg.animation or msg.document
    if not file:
        return
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "in.bin"
        dst = pathlib.Path(tmp) / "out.mp4"
        await msg.bot.download(file, destination=src)
        # Run ffmpeg as an async subprocess so the event loop stays free.
        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-y", "-i", str(src),
            "-t", "10",
            "-vf", "crop='min(in_w,in_h)':'min(in_w,in_h)',"
                   "scale=800:800:flags=lanczos,format=yuv420p",
            "-c:v", "libx264", "-profile:v", "baseline", "-level", "3.1",
            "-preset", "slow", "-crf", "28",
            "-movflags", "+faststart", "-an",
            str(dst),
        )
        # A non-zero exit code or a missing output means ffmpeg could not convert the input.
        if await proc.wait() != 0 or not dst.exists():
            await msg.reply("Could not convert this file. Is it a video?")
            return
        if dst.stat().st_size > MAX_BYTES:
            await msg.reply("Result still over 2MB, try a shorter or simpler clip.")
            return
        # Reply inside the with-block, while the temporary file still exists.
        await msg.reply_video(FSInputFile(dst), caption="Set this as your video avatar.")
Two things worth flagging here. First, I use F.video | F.video_note | F.animation | F.document because iPhone Live Photos and screen recordings arrive as document with a video mime type, not as video. Without the F.document branch, about 30% of clips never got processed. Second, ffmpeg runs through asyncio.create_subprocess_exec rather than subprocess.run. With subprocess.run, the whole event loop would block while ffmpeg chews through a 4K source.
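One side effect of F.document is that it matches every document, PDFs included, so it may be worth narrowing that branch to video MIME types before handing the file to ffmpeg. A minimal sketch, assuming the attachment exposes a mime_type attribute the way aiogram's Document type does; is_video_document is a hypothetical helper, not part of aiogram, and the SimpleNamespace objects are stand-ins for real attachments.

```python
from types import SimpleNamespace
from typing import Any, Optional


def is_video_document(doc: Optional[Any]) -> bool:
    """True when the attachment declares a video/* MIME type."""
    # mime_type can be missing or None, so fall back to an empty string.
    mime = getattr(doc, "mime_type", None) or ""
    return mime.lower().startswith("video/")


# Stand-ins for a screen recording and a PDF sent as documents.
mov = SimpleNamespace(mime_type="video/quicktime")
pdf = SimpleNamespace(mime_type="application/pdf")

print(is_video_document(mov))   # True
print(is_video_document(pdf))   # False
print(is_video_document(None))  # False
```

In the handler this would run right after picking the file, replying with a short "not a video" message when msg.document fails the check. The same condition should also be expressible in the filter itself as F.document.mime_type.startswith("video/"), though I have only sketched the plain-function version here.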
How I packaged this as @liveavabot
The whole thing lives at https://t.me/LiveAvaBot?start=devto_article_20260524. Send any video, GIF, video note, or even a screen recording, and you get back a Telegram-spec-compliant 800x800 H.264 file ready to drop into Settings, Edit profile, Set new photo.
The bot runs on a small Hetzner VM. ffmpeg is doing all the heavy lifting, I just wrote the wrapper, the queueing, and the failure messages. About 80 people have used it so far, mostly folks who tried to set an iPhone clip as their avatar and hit the same silent failure I did.
Edge cases and what is next
A few things still trip the pipeline:
- 4K HDR clips with Dolby Vision metadata sometimes need a tone-mapping pass before the crop filter. I have not wired this in yet because it adds about 8 seconds of encode time per clip.
- Live Photos: I currently take the video track, but the trim point Apple stores in the metadata is sometimes off-center. Worth honoring on a future pass.
- Very dark scenes get blocky at CRF 28. Bumping the preset to veryslow helps but I do not want to push encode time past 15 seconds per clip on shared hardware.
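For the HDR case, a commonly used ffmpeg approach is a zscale plus tonemap chain placed ahead of the crop. The sketch below only builds the -vf string. It assumes an ffmpeg build with libzimg (zscale) enabled, the hable operator and npl=100 are illustrative defaults rather than tuned values, and build_vf is a hypothetical helper. It converts HDR to SDR in general and does nothing specific to Dolby Vision metadata.

```python
# Crop-to-square and scale steps from the main command.
SQUARE = "crop='min(in_w,in_h)':'min(in_w,in_h)',scale=800:800:flags=lanczos"

# HDR to SDR: linearize, convert to float RGB, move to BT.709 primaries,
# tone-map, then return to BT.709 transfer/matrix at TV range.
TONEMAP = (
    "zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,"
    "tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv"
)


def build_vf(hdr: bool) -> str:
    """Return the -vf filter chain, with tone mapping first for HDR input."""
    steps = []
    if hdr:
        steps.append(TONEMAP)
    steps.append(SQUARE)
    steps.append("format=yuv420p")  # always finish in 8-bit 4:2:0
    return ",".join(steps)


print(build_vf(hdr=False))
print(build_vf(hdr=True))
```

Whether a clip needs the HDR branch could be decided from the color_transfer field ffprobe reports for the video stream, where smpte2084 (PQ) and arib-std-b67 (HLG) indicate HDR. Keeping the tone mapping behind a flag means SDR clips do not pay the extra encode time.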
If you want to extend or fork the converter, the constraints above are the part to memorize. The rest is plumbing. Built by me, @liveavabot.