My girlfriend sent me a 4 second clip from her iPhone and asked me to set it as her Telegram profile video. Telegram accepted the upload, showed a spinner, then quietly kept the old static photo. No error message, nothing to google. The upload just evaporated.
The clip was HEVC (H.265), 1080x1920, 10-bit, straight from default iOS camera settings. Telegram's video avatar pipeline wants something very different, and when it doesn't get it, it fails without telling you what went wrong. I spent an evening figuring out the actual requirements, wrote an ffmpeg wrapper, and ended up shipping it as a bot. This post is the write-up.
What Telegram actually requires
The official docs on video avatars are thin. Most of this came from reading TDLib sources and brute-force testing:
- Container: MP4
- Video codec: H.264. HEVC gets rejected, even though Telegram plays HEVC fine in regular messages.
- Resolution: square. 800x800 is what the official clients produce.
- Duration: 10 seconds max.
- File size: 2 MB max. Not 20. Two.
- Audio: stripped entirely. A muted track is not enough, the stream has to be gone.
- Pixel format: yuv420p. Modern iPhones record 10-bit HDR (yuv420p10le) and that fails.
- moov atom at the front of the file (faststart), otherwise some mobile clients choke.
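The checklist above can be turned into a quick pre-flight check. This is a minimal sketch, not Telegram's own validation: `ClipInfo` and `violations` are hypothetical names, the thresholds are simply the values from the list, and the faststart requirement is left out because it needs a look at the file's atom order rather than its stream properties.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClipInfo:
    """Properties of a clip, e.g. as reported by a probe tool (hypothetical type)."""
    container: str
    video_codec: str
    width: int
    height: int
    duration_s: float
    size_bytes: int
    has_audio: bool
    pix_fmt: str


def violations(clip: ClipInfo) -> List[str]:
    """Return the checklist items this clip breaks (empty list = looks fine)."""
    problems = []
    if clip.container != "mp4":
        problems.append("container must be MP4")
    if clip.video_codec != "h264":
        problems.append("video codec must be H.264")
    if clip.width != clip.height:
        problems.append("frame must be square")
    if clip.duration_s > 10:
        problems.append("duration over 10 seconds")
    if clip.size_bytes > 2_000_000:
        problems.append("file over 2 MB")
    if clip.has_audio:
        problems.append("audio stream present")
    if clip.pix_fmt != "yuv420p":
        problems.append("pixel format must be yuv420p")
    return problems


# A default iPhone clip like the one from the intro (size is illustrative).
iphone = ClipInfo("mov", "hevc", 1080, 1920, 4.0, 12_000_000, True, "yuv420p10le")
for problem in violations(iphone):
    print(problem)
```

Running the check before and after conversion is a cheap way to get an actual error message where Telegram gives none.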
The 2 MB cap is the one that actually hurts. Ten seconds of 800x800 video in 2 MB gives you a total budget of about 1600 kbps (2,000,000 bytes x 8 bits / 10 s). A default iPhone clip runs 15 to 50 Mbps. You are throwing away roughly 90 to 97% of the original bitrate, so every encoding decision matters.
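The budget arithmetic, spelled out as a small sketch. It assumes the 2 MB cap means 2,000,000 bytes (decimal megabytes); the function names are made up for illustration.

```python
MAX_BYTES = 2_000_000   # 2 MB cap, assumed to be decimal megabytes
MAX_SECONDS = 10        # duration cap


def budget_kbps(max_bytes: int = MAX_BYTES, seconds: float = MAX_SECONDS) -> float:
    """Average total bitrate (kbit/s) that exactly fills max_bytes in `seconds`."""
    return max_bytes * 8 / seconds / 1000


def discarded_fraction(source_mbps: float, budget: float) -> float:
    """Share of the source bitrate that has to go to fit the budget."""
    return 1 - budget / (source_mbps * 1000)


budget = budget_kbps()
print(f"budget: {budget:.0f} kbps")
for mbps in (15, 50):
    print(f"{mbps} Mbps source: drop {discarded_fraction(mbps, budget):.1%}")
```

Shorter clips get a proportionally larger per-second budget, since the cap is on total size, not on bitrate.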
The ffmpeg pipeline
I run ffmpeg twice (two separate invocations, not x264 two-pass encoding). The first run is cropdetect over the first 3 seconds, because letterboxed clips and screen recordings carry black bars that waste the tiny bitrate budget:
ffmpeg -i input.mov -t 3 -vf "cropdetect=24:16:0" -f null - 2>&1 \
| grep -o "crop=.*" | tail -1
That prints something like crop=1080:1440:0:240. Then the real encode: apply the crop, scale so the short side hits 800, center-crop to a square, normalize to 30 fps and 8-bit yuv420p, cut at 10 seconds, strip audio:
ffmpeg -i input.mov -t 10 \
  -vf "crop=1080:1440:0:240,scale=800:800:force_original_aspect_ratio=increase,crop=800:800,fps=30,format=yuv420p" \
  -c:v libx264 -profile:v high -preset slow \
  -b:v 1400k -maxrate 1600k -bufsize 2000k \
  -an -movflags +faststart \
  output.mp4
Flag notes. -an removes the audio stream completely. -b:v 1400k with a 1600k maxrate leaves headroom so a full 10 second clip lands under 2 MB. format=yuv420p downconverts 10-bit HDR sources. -movflags +faststart relocates the moov atom to the front. -preset slow buys real quality at these starved bitrates, and on an 800x800 clip it costs only a few extra seconds of CPU.
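Gluing the two ffmpeg runs together means pulling the last `crop=` value out of the cropdetect log and splicing it into the filter chain. A minimal sketch of that glue, with hypothetical helper names. One deliberate choice here: the scale step uses `scale=800:800:force_original_aspect_ratio=increase`, so the short side lands on 800 for portrait and landscape input alike before the square center crop.

```python
import re
from typing import Optional

# Matches the crop=w:h:x:y suggestion that cropdetect writes to stderr.
CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def last_crop(stderr_text: str) -> Optional[str]:
    """Return the final crop=w:h:x:y that cropdetect logged, or None if absent."""
    matches = CROP_RE.findall(stderr_text)
    if not matches:
        return None
    w, h, x, y = matches[-1]
    return f"crop={w}:{h}:{x}:{y}"


def build_filter_chain(crop: Optional[str], size: int = 800, fps: int = 30) -> str:
    """Assemble the -vf value: optional bar removal, scale, square crop, fps, 8-bit."""
    parts = []
    if crop:
        parts.append(crop)
    parts += [
        # Scale until BOTH sides are >= size, keeping the aspect ratio.
        f"scale={size}:{size}:force_original_aspect_ratio=increase",
        # crop defaults to a centered window.
        f"crop={size}:{size}",
        f"fps={fps}",
        "format=yuv420p",
    ]
    return ",".join(parts)


log = "[Parsed_cropdetect_0] w:1080 h:1440 x:0 y:240 crop=1080:1440:0:240"
print(build_filter_chain(last_crop(log)))
```

If cropdetect finds nothing to trim, `last_crop` returns None and the chain just starts at the scale step.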
If the output still overshoots 2 MB (busy footage with lots of motion does this), I retry once with the bitrate scaled down: new_bitrate = old_bitrate * (2_000_000 / actual_size) * 0.95. One retry has been enough for every real-world file I've seen.
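The retry rule is a one-liner once the output size is known. A sketch of it, assuming bitrates are tracked in kbps and the cap is 2,000,000 bytes; the function names are illustrative.

```python
TARGET_BYTES = 2_000_000  # the 2 MB cap, assumed decimal
SAFETY = 0.95             # extra 5% margin on the retry


def needs_retry(actual_bytes: int, target_bytes: int = TARGET_BYTES) -> bool:
    """True when the encoded file overshoots the cap."""
    return actual_bytes > target_bytes


def retry_bitrate_kbps(old_kbps: int, actual_bytes: int,
                       target_bytes: int = TARGET_BYTES,
                       safety: float = SAFETY) -> int:
    """Scale the bitrate by how far the file overshot, minus a safety margin."""
    return int(old_kbps * (target_bytes / actual_bytes) * safety)


# Example: a 1400 kbps encode that came out at 2.3 MB.
size = 2_300_000
if needs_retry(size):
    print(retry_bitrate_kbps(1400, size), "kbps for the second attempt")
```

The maxrate and bufsize values would presumably need scaling by the same factor on the retry, otherwise the old 1600k ceiling still allows the same peaks.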
Wiring it into aiogram 3
The bot side is small. Videos, GIFs (animations) and video files sent as documents all funnel into one handler:
from aiogram import Router, F
from aiogram.types import Message, FSInputFile

router = Router()

# One handler for videos, GIFs (animations) and files sent as documents.
@router.message(F.video | F.animation | F.document)
async def handle_media(message: Message):
    media = message.video or message.animation or message.document

    # Reject oversized uploads before downloading anything.
    if media.file_size and media.file_size > 100 * 1024 * 1024:
        await message.answer("File too big, 100 MB max.")
        return

    status = await message.answer("Converting...")
    src = await download_file(message.bot, media.file_id)
    result = await convert_to_avatar(src)  # the ffmpeg pipeline above

    await message.answer_video(
        FSInputFile(result.path),
        caption=(
            f"{result.size_kb} KB, {result.duration:.1f}s. "
            "Save the file, then set it as your profile video "
            "in Settings > Edit Profile."
        ),
    )
    # Remove the "Converting..." placeholder once the result is sent.
    await status.delete()
convert_to_avatar runs ffmpeg through asyncio.create_subprocess_exec, so the event loop stays free while a 60 MB clip gets chewed through. One gotcha: Bot API file downloads cap at 20 MB unless you run a local Bot API server, which I do, because iPhone clips blow past 20 MB constantly.
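For reference, here is roughly what the subprocess part can look like. This is a sketch and not the bot's actual `convert_to_avatar`: `run_command` and `ffmpeg_args` are hypothetical helpers, and `-y` (overwrite without prompting) is an addition so ffmpeg never waits on stdin.

```python
import asyncio
from typing import List, Sequence, Tuple


async def run_command(args: Sequence[str]) -> Tuple[int, str]:
    """Run a command without blocking the event loop; return (exit code, stderr)."""
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,  # ffmpeg logs to stderr
    )
    _, stderr = await proc.communicate()
    return proc.returncode, stderr.decode(errors="replace")


def ffmpeg_args(src: str, dst: str, vf: str, kbps: int = 1400) -> List[str]:
    """Argument list for the main encode, mirroring the command shown earlier."""
    return [
        "ffmpeg", "-y", "-i", src, "-t", "10",
        "-vf", vf,
        "-c:v", "libx264", "-profile:v", "high", "-preset", "slow",
        "-b:v", f"{kbps}k", "-maxrate", "1600k", "-bufsize", "2000k",
        "-an", "-movflags", "+faststart",
        dst,
    ]


async def encode(src: str, dst: str, vf: str) -> None:
    """Encode one clip and raise with the tail of ffmpeg's log on failure."""
    code, log = await run_command(ffmpeg_args(src, dst, vf))
    if code != 0:
        raise RuntimeError(f"ffmpeg failed ({code}): {log[-500:]}")
```

Passing the arguments as a list instead of a shell string also sidesteps quoting problems with user-supplied file names.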
Shipping it as @liveavabot
I packaged the pipeline as @LiveAvaBot. Send it any video or GIF, it replies with a ready file plus instructions. 193 people have used it so far, and iPhone HEVC clips are still the most common input, which matches the original itch.
The whole thing is one Python process: aiogram 3 plus ffmpeg plus a small queue so parallel conversions don't stack up on the VPS it runs on. ffmpeg is doing the heavy lifting, I just wrote the wrapper and the retry logic around it.
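One simple way to keep parallel conversions from piling up is a semaphore around the ffmpeg call. The sketch below swaps that in for the queue, so it is an assumption about the shape of the solution and not the bot's real code; `ConversionLimiter` and the limit of 2 are made up.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class ConversionLimiter:
    """Lets at most `max_parallel` conversions run at once; the rest wait their turn."""

    def __init__(self, max_parallel: int = 2):
        self._max = max_parallel
        self._sem: Optional[asyncio.Semaphore] = None

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        # Create the semaphore lazily so it belongs to the running event loop.
        if self._sem is None:
            self._sem = asyncio.Semaphore(self._max)
        async with self._sem:
            return await job()


async def demo() -> int:
    """Start five fake conversions and report the peak number running at once."""
    limiter = ConversionLimiter(max_parallel=2)
    running = 0
    peak = 0

    async def fake_conversion() -> None:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for the ffmpeg run
        running -= 1

    await asyncio.gather(*(limiter.run(fake_conversion) for _ in range(5)))
    return peak


print("peak parallel conversions:", asyncio.run(demo()))
```

In a handler, the call would wrap the conversion step, e.g. `await limiter.run(lambda: convert_to_avatar(src))`.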
Edge cases that cost me time
Rotation metadata. iPhones record sensor-native and set a display matrix instead of rotating pixels. Modern ffmpeg autorotates before the filter chain, but if you build filters assuming the raw stream width and height, you'll compute crops for a sideways frame. I read dimensions after autorotation, not from the raw stream info.
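The dimension swap itself is tiny. A sketch, assuming the rotation angle in degrees has already been read from the file's metadata (how it is exposed varies between ffprobe versions, so that part is left out); `display_size` is a hypothetical helper.

```python
from typing import Tuple


def display_size(width: int, height: int, rotation_degrees: int) -> Tuple[int, int]:
    """Frame size as the viewer sees it, given stored size and rotation metadata."""
    # Quarter turns (90, 270, -90, ...) swap the axes; 0 and 180 do not.
    if rotation_degrees % 180 == 90:
        return height, width
    return width, height


# A portrait iPhone clip is typically stored as 1920x1080 plus a 90 degree rotation.
print(display_size(1920, 1080, 90))
```

Crop and scale values should be computed from this display size, since that is the frame the filter chain receives after autorotation.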
GIF inputs. Palette GIFs with odd frame delays produce stuttery output unless you normalize with fps=30. Also, a "GIF" coming from Telegram is usually already an MP4 animation, so the extension means nothing.
10-bit HDR. Without format=yuv420p, some uploads got accepted and then rendered as green garbage on certain Android clients. Silent success turned out to be worse than silent failure.
Sub-second clips. Telegram accepts a 0.4 second avatar, but the loop looks broken. The bot warns instead of refusing, since that is the user's call.
Next on the list is a trim picker, choosing which 10 seconds of a longer video to keep. Right now the bot takes the first 10, which is the wrong 10 about half the time.
Built by me: @liveavabot. Free to use, no signup, and the core of the ffmpeg pipeline is basically the two commands above.