I run a small Telegram bot called @liveavabot. It takes any video or GIF you send it and turns it into a Telegram video avatar. Sounds trivial. It's not, because Telegram's video-avatar format is picky in ways that aren't obvious until you fail three times in a row.
This post is the build log: what the spec actually requires, why iPhone videos silently fail, and the ffmpeg pipeline I settled on.
The spec Telegram never really documents
If you dig through the Bot API and the mobile client source, the constraints for a video avatar are:
- Container: MP4 with faststart (moov atom at the front).
- Video codec: H.264 (AVC). Baseline or main profile is safest.
- Pixel format: yuv420p. Not yuv420p10le, not HDR.
- Resolution: square, 800x800 works everywhere. Some clients accept 640x640.
- Duration: at most 10 seconds.
- File size: at most about 2 MB. Big files get rejected or heavily re-compressed.
- Audio: must be stripped. Not muted, stripped.
- Frame rate: 30 fps is fine. 60 fps sometimes trips older clients.
Send anything that misses one of these and the Telegram client will accept the upload, spin for a second, and then silently show the old avatar. No error toast. No log line you can see from your side.
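Because the failure is silent, it can help to check a converted file against the list above before sending it back. What follows is a minimal sketch, not the code the bot runs: it reads the JSON that ffprobe prints with -show_streams and -show_format. The names probe_file and check_avatar_spec are hypothetical, and the thresholds are assumptions copied from the list (2 MB, 10 seconds, 800x800 or 640x640).

```python
import json
import subprocess

MAX_DURATION_S = 10.0
MAX_SIZE_BYTES = 2 * 1024 * 1024          # "about 2 MB" from the list above
ALLOWED_SIZES = {(800, 800), (640, 640)}  # assumed safe square resolutions


def probe_file(path: str) -> dict:
    """Run ffprobe and return its JSON description of the file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_streams", "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def check_avatar_spec(probe: dict) -> list:
    """Return human-readable violations; an empty list means nothing was found.

    Note: this does not verify faststart, which ffprobe does not report directly.
    """
    streams = probe.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if len(video) != 1:
        return [f"expected exactly one video stream, found {len(video)}"]

    v = video[0]
    problems = []
    if v.get("codec_name") != "h264":
        problems.append(f"codec is {v.get('codec_name')}, want h264")
    if v.get("pix_fmt") != "yuv420p":
        problems.append(f"pixel format is {v.get('pix_fmt')}, want yuv420p")
    if (v.get("width"), v.get("height")) not in ALLOWED_SIZES:
        problems.append(f"resolution is {v.get('width')}x{v.get('height')}")
    if audio:
        problems.append("audio stream present, must be stripped")

    fmt = probe.get("format", {})
    # ffprobe reports duration and size as strings in its JSON output.
    if float(fmt.get("duration", 0)) > MAX_DURATION_S:
        problems.append(f"duration is {fmt.get('duration')} s, max 10")
    if int(fmt.get("size", 0)) > MAX_SIZE_BYTES:
        problems.append(f"file size is {fmt.get('size')} bytes, max about 2 MB")
    return problems
```

Calling check_avatar_spec(probe_file("output.mp4")) after a conversion gives you a log line of your own to make up for the one Telegram never shows.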
Why iPhone videos in particular fail
Since iOS 11, the iPhone camera records HEVC (H.265) in a .mov container by default, and HDR captures use the 10-bit yuv420p10le pixel format. When a user shares that clip to the bot, Telegram forwards it as-is. Telegram's server then tries to remux it into an avatar. HEVC is not on the accepted list, so the client falls back to a still frame or drops the update entirely.
The user thinks they sent a video. The bot thinks it processed it. Telegram just quietly ignored it. This is the single most common bug report I got in the first month.
The fix is to re-encode every upload to H.264 + yuv420p before handing it back to Telegram, even if the input looks like an MP4 already. Trust nothing, transcode everything.
The ffmpeg pipeline
Here is the actual command the bot runs. Two passes: first cropdetect to find the tight bounding box, then encode.
# Pass 1: detect the crop box on a 2-second sample
ffmpeg -ss 0 -t 2 -i input.mov \
  -vf cropdetect=24:16:0 -f null - 2>&1 \
  | grep -oE 'crop=[0-9:]+' | tail -1
# -> crop=1080:1080:0:420

# Pass 2: crop, scale to 800x800, re-encode, strip audio
ffmpeg -y -i input.mov \
  -t 10 \
  -vf "crop=1080:1080:0:420,scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 -profile:v main -level 4.0 \
  -pix_fmt yuv420p \
  -preset veryfast -crf 26 \
  -movflags +faststart \
  -an \
  output.mp4
A few notes on the flags that matter:
- -t 10 caps the output at 10 seconds. Users send 30-second clips all the time and I'd rather truncate than reject.
- format=yuv420p in the filter chain, plus -pix_fmt yuv420p on the encoder, is belt-and-suspenders. Either one should force 8-bit output on its own; I set both so 10-bit HDR sources always get downconverted.
- -an drops audio. Not -c:a copy, not -af volume=0. Strip it.
- -movflags +faststart moves the moov atom to the start of the file so Telegram can begin decoding before the download finishes. Without this, the avatar often shows as a black square.
- -crf 26 is my sweet spot. 23 blows past the 2 MB budget on busy scenes. 28 looks mushy.
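A fixed CRF is a trade-off: most clips fit the size budget, but a very busy one can still overshoot. One way to handle that, sketched here as an assumption and not something the bot does, is to retry at progressively higher CRF values until the output fits. The names ffmpeg_encode and encode_under_budget, the CRF ladder (26, 28, 30), and the exact 2 MB figure are all illustrative. The encode parameter is a hypothetical hook so the loop can be exercised without ffmpeg installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

SIZE_BUDGET = 2 * 1024 * 1024  # assumed budget, "about 2 MB"


def ffmpeg_encode(src: Path, dst: Path, crf: int) -> int:
    """Encode src to dst at the given CRF and return the output size in bytes.

    Encoder flags mirror the article's commands; the filter chain is the
    single-pass scale + crop variant.
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src), "-t", "10",
         "-vf", "scale=800:800:force_original_aspect_ratio=increase,"
                "crop=800:800,format=yuv420p",
         "-c:v", "libx264", "-profile:v", "main", "-pix_fmt", "yuv420p",
         "-preset", "veryfast", "-crf", str(crf),
         "-movflags", "+faststart", "-an", str(dst)],
        check=True, capture_output=True,
    )
    return dst.stat().st_size


def encode_under_budget(
    src: Path,
    dst: Path,
    encode: Callable[[Path, Path, int], int] = ffmpeg_encode,
    crfs: Sequence[int] = (26, 28, 30),
    budget: int = SIZE_BUDGET,
) -> Optional[int]:
    """Try each CRF in order; return the first that fits, or None if none do."""
    for crf in crfs:
        if encode(src, dst, crf) <= budget:
            return crf
    return None
```

Most clips would pass on the first attempt, so the extra encodes only cost time on the rare busy scene. A None result means even the mushiest setting was too large, and the caller has to decide whether to reject or trim.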
The aiogram 3 handler
Wiring it into a bot is boring in a good way. Here's the minimal handler:
import asyncio
import tempfile
from pathlib import Path

from aiogram import Router, F
from aiogram.types import Message, FSInputFile

router = Router()


@router.message(F.video | F.animation | F.document)
async def handle_video(message: Message) -> None:
    file = message.video or message.animation or message.document
    if not file:
        return

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.bin"
        dst = Path(tmp) / "avatar.mp4"
        await message.bot.download(file, destination=src)

        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-y", "-i", str(src),
            "-t", "10",
            "-vf", "scale=800:800:force_original_aspect_ratio=increase,"
                   "crop=800:800,format=yuv420p",
            "-c:v", "libx264", "-profile:v", "main",
            "-pix_fmt", "yuv420p",
            "-preset", "veryfast", "-crf", "26",
            "-movflags", "+faststart",
            "-an",
            str(dst),
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.PIPE,
        )
        _, err = await proc.communicate()
        if proc.returncode != 0 or not dst.exists():
            await message.reply("couldn't convert this one, sorry")
            return

        await message.reply_video(
            FSInputFile(dst),
            caption="upload this via Settings, Edit profile, camera icon",
        )
I dropped the two-pass cropdetect here and used scale + crop in one filter for simplicity. It handles portrait and landscape inputs by scaling the short side to 800 and center-cropping the long side. Good enough for 90 percent of uploads.
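For the remaining uploads, for example letterboxed clips, the cropdetect pass could be brought back in Python. Here is a rough sketch, assuming the stderr text of the pass-1 command has already been captured. parse_crop and build_filter are hypothetical helper names, and centering a square inside a non-square detected box is my assumption about the desired behavior.

```python
import re
from typing import Optional, Tuple

Crop = Tuple[int, int, int, int]  # (width, height, x, y)

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")
FALLBACK_VF = (
    "scale=800:800:force_original_aspect_ratio=increase,"
    "crop=800:800,format=yuv420p"
)


def parse_crop(stderr_text: str) -> Optional[Crop]:
    """Return the last crop box cropdetect printed, or None if there is none."""
    matches = CROP_RE.findall(stderr_text)
    if not matches:
        return None
    w, h, x, y = (int(n) for n in matches[-1])
    return w, h, x, y


def build_filter(crop: Optional[Crop]) -> str:
    """Build the -vf string for the encode pass."""
    if crop is None:
        # Nothing detected: fall back to the single-pass scale + crop chain.
        return FALLBACK_VF
    w, h, x, y = crop
    side = min(w, h)
    # cropdetect does not promise a square, so center one inside its box.
    x += (w - side) // 2
    y += (h - side) // 2
    return f"crop={side}:{side}:{x}:{y},scale=800:800:flags=lanczos,format=yuv420p"
```

Taking the last match mirrors the tail -1 in the shell version: cropdetect refines its estimate as it sees more frames, so the final line is usually the most reliable.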
Packaging this as @liveavabot
The bot lives at https://t.me/LiveAvaBot?start=devto_article_20260701. Send it any video or GIF. It replies with a converted MP4 you can set as your Telegram video avatar. That's the whole product. No accounts, no menus, no upsell.
Infra is boring by design: a single 4 GB VPS, systemd unit for the aiogram process, ffmpeg from the Ubuntu repos. No queue, no worker pool. Each conversion is under a second for typical 5-second clips, so I just run them inline in the handler.
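Inline conversion means every incoming clip spawns its own ffmpeg process. If a burst of uploads ever became a problem, one low-effort guard would be a semaphore plus a timeout around the subprocess call. This is a sketch under assumptions, not part of the bot: run_limited is a hypothetical helper, and the 30-second timeout and the -1 return value are arbitrary choices.

```python
import asyncio
from typing import Sequence


async def run_limited(
    sem: asyncio.Semaphore,
    args: Sequence[str],
    timeout: float = 30.0,
) -> int:
    """Run a command once a slot is free; return its exit code, or -1 on timeout."""
    async with sem:  # waits here while all slots are busy
        proc = await asyncio.create_subprocess_exec(
            *args,
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.DEVNULL,
        )
        try:
            return await asyncio.wait_for(proc.wait(), timeout)
        except asyncio.TimeoutError:
            # A stuck conversion should not hold a slot forever.
            proc.kill()
            await proc.wait()
            return -1
```

The semaphore would be created once at startup, for example asyncio.Semaphore(2), and shared by all handlers. Unlike a real queue it adds no new moving parts, so the setup stays a single process.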
Edge cases I've hit
- Live Photos on iPhone: they arrive as a still image plus a .mov. Handle the .mov, ignore the still.
- Vertical videos: center-crop is fine for faces, terrible for full-body. I'm considering face-detection crop but it's not worth the dependency yet.
- Animated WebP: .webp from stickers. ffmpeg handles it if you have libwebp built in. Ubuntu's package does.
- 10-bit HDR: covered by format=yuv420p in the filter chain. Without it, the encoder chokes with "pixel format not supported."
- Files larger than 20 MB: Telegram's Bot API refuses the download. I catch this and ask the user to trim first.
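The 20 MB case can be caught before attempting the download, because Telegram's video, animation, and document objects carry an optional file_size field. A minimal sketch follows. can_download is a hypothetical helper, and letting a missing size through (so the download attempt itself decides) is my assumption.

```python
from typing import Optional

BOT_API_DOWNLOAD_LIMIT = 20 * 1024 * 1024  # Bot API download limit, 20 MB


def can_download(file_size: Optional[int]) -> bool:
    """Return False only when Telegram reported a size above the limit."""
    if file_size is None:
        # Size not reported: allow the attempt and let the API call fail if needed.
        return True
    return file_size <= BOT_API_DOWNLOAD_LIMIT
```

In the handler this would run right after picking the file object, as in: if not can_download(file.file_size), reply with the trim request and return.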
What's next: batch mode (send a photo, get a 3-second Ken Burns pan as an avatar), and better handling of GIFs with palettes that go muddy after H.264 re-encode.
Built by me, @liveavabot. If you have a weird clip that breaks it, send it. That's how I find bugs.