The bug that wasted my afternoon
I tried to set a video as my Telegram profile picture. I recorded a short clip on my iPhone, opened Telegram, and picked it as my avatar. Telegram accepted the file, showed a spinner, then quietly fell back to a static frame. No error. No warning. Just a still image where a looping video should have been.
Turns out Telegram has strict rules for video avatars, and iPhone video breaks almost all of them at once. The clip was HEVC, the wrong resolution, too long, and carried an audio track. Telegram does not tell you any of this. It just gives up.
I wanted the actual feature, so I dug into what the spec really needs and wrote a converter. This post is the result.
What Telegram actually requires
A Telegram video avatar is not a normal video upload. The constraints are tight:
- Codec: H.264 (AVC). HEVC is rejected.
- Container: MP4 with the moov atom at the front (faststart).
- Resolution: square. 800x800 is the safe target.
- Duration: 10 seconds or less.
- Size: roughly 2MB or less.
- No audio track.
- Pixel format: yuv420p. Some players choke on yuv444p or 10-bit.
iPhone footage fails on codec, shape, duration, audio, and often pixel format. A modern iPhone typically records HEVC, in 10-bit when HDR is enabled, at 1080x1920 or larger. Every one of those points needs fixing before Telegram will animate it.
The silent failure is the worst part. You cannot debug a process that refuses to report what went wrong.
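Since Telegram will not say which rule a clip breaks, a small pre-flight check can. The following is a minimal sketch, not part of the bot itself: it assumes ffprobe is installed next to ffmpeg, and the names probe and avatar_problems are made up for illustration. It compares ffprobe's JSON report with the checklist above. It does not check the moov atom position, which ffprobe does not report directly.

```python
import json
import subprocess

MAX_DURATION_S = 10.0
MAX_SIZE_BYTES = 2 * 1024 * 1024  # assumption: "roughly 2MB" read as 2 MiB


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON report (needs ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def avatar_problems(report: dict) -> list[str]:
    """Return a human-readable list of checklist violations (empty if none)."""
    streams = report.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream"]

    problems = []
    if video.get("codec_name") != "h264":
        problems.append(f"codec is {video.get('codec_name')}, expected h264")
    if video.get("width") != video.get("height"):
        problems.append(f"not square: {video.get('width')}x{video.get('height')}")
    if video.get("pix_fmt") != "yuv420p":
        problems.append(f"pixel format is {video.get('pix_fmt')}, expected yuv420p")
    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("has an audio track")

    # ffprobe reports duration and size as strings inside "format".
    fmt = report.get("format", {})
    if float(fmt.get("duration", 0)) > MAX_DURATION_S:
        problems.append(f"too long: {fmt.get('duration')}s")
    if int(fmt.get("size", 0)) > MAX_SIZE_BYTES:
        problems.append(f"too large: {fmt.get('size')} bytes")
    return problems
```

Calling avatar_problems(probe("input.mov")) on a typical iPhone clip should list most of the failures at once, which is the report Telegram never gives.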
Fixing it with ffmpeg
ffmpeg does all the real work here. The interesting step is cropping a vertical phone video into a centered square without squishing faces.
Detecting the crop region
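cropdetect scans frames and logs the crop rectangle that excludes black borders. The 24:16:0 arguments set the black threshold, round the dimensions to a multiple of 16, and disable the periodic reset. I run a quick pass to read it:
ffmpeg -i input.mov -vf cropdetect=24:16:0 -frames:v 60 -f null - 2>&1 | \
  grep -o 'crop=[0-9:]*' | tail -1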
That prints something like crop=1080:1080:0:420. For a phone clip I usually ignore the detected offset and take a centered square of the shorter side, but cropdetect earns its keep when the video has letterboxing or black bars.
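The two options described above, reading cropdetect's last suggestion or taking a centered square of the shorter side, can be sketched in a few lines of Python. This is an illustration only: the helper names last_detected_crop and centered_square are hypothetical, and the parser assumes nothing about the log beyond the crop=w:h:x:y token that the grep above also relies on.

```python
import re
from typing import Optional

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def last_detected_crop(ffmpeg_stderr: str) -> Optional[tuple[int, int, int, int]]:
    """Return the last crop=w:h:x:y found in ffmpeg's log, or None."""
    matches = CROP_RE.findall(ffmpeg_stderr)
    if not matches:
        return None
    w, h, x, y = (int(v) for v in matches[-1])
    return w, h, x, y


def centered_square(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centered square inside a width x height frame, as (w, h, x, y)."""
    side = min(width, height)
    return side, side, (width - side) // 2, (height - side) // 2
```

For a 1080x1920 portrait clip, centered_square returns (1080, 1080, 0, 420), the same rectangle as the crop string quoted above.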
The encode pass
Here is the command that produces a Telegram-ready avatar:
ffmpeg -y -i input.mov \
  -t 10 \
  -an \
  -vf "crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 -profile:v high -level 4.0 \
  -preset slow -crf 26 \
  -movflags +faststart \
  -pix_fmt yuv420p \
  output.mp4
Walking through the flags that matter:
- -t 10 hard-caps duration at 10 seconds.
- -an drops the audio track. Telegram wants none.
- crop='min(iw,ih)':'min(iw,ih)' takes the largest centered square that fits.
- scale=800:800:flags=lanczos resizes to the target. Lanczos keeps edges sharp.
- format=yuv420p plus -pix_fmt yuv420p force 8-bit 4:2:0, which fixes 10-bit HEVC.
- -c:v libx264 re-encodes to H.264.
- -movflags +faststart moves the moov atom to the front so playback can start immediately.
- -crf 26 is the quality knob. If the file lands above 2MB, I raise CRF a few points and re-run.
One pass, and an iPhone clip becomes a file Telegram will actually loop.
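If the command is going to be re-run with different settings, it helps to build the argument list in code. Here is a minimal sketch with the same flags as above; build_avatar_cmd is a hypothetical helper, and treating CRF, duration, and output size as the adjustable parameters is my assumption about what is worth exposing.

```python
def build_avatar_cmd(
    src: str,
    dst: str,
    crf: int = 26,
    duration: float = 10.0,
    size: int = 800,
) -> list[str]:
    """Build the ffmpeg argument list for a square, silent H.264 avatar."""
    # Same filter chain as the shell command: centered square crop,
    # Lanczos resize, then force 8-bit 4:2:0.
    vf = (
        "crop='min(iw,ih)':'min(iw,ih)',"
        f"scale={size}:{size}:flags=lanczos,format=yuv420p"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-t", f"{duration:g}",   # cap the duration
        "-an",                   # drop audio
        "-vf", vf,
        "-c:v", "libx264", "-profile:v", "high", "-level", "4.0",
        "-preset", "slow", "-crf", str(crf),
        "-movflags", "+faststart",
        "-pix_fmt", "yuv420p",
        dst,
    ]
```

The list can be handed to subprocess.run or asyncio.create_subprocess_exec as is. No shell is involved, so the single quotes inside the filter string reach ffmpeg's own filter parser untouched.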
Keeping it under 2MB
CRF 26 handles most 10-second clips, but a busy scene with lots of motion can still overshoot. My loop is simple: encode, check the file size, and if it is over 2MB raise CRF by 4 and try again. Two attempts cover almost everything. For stubborn clips I trim duration before touching quality, since 6 seconds at decent quality beats 10 seconds of mush.
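That loop can be sketched as follows. It is one reading of the procedure, not the bot's exact code: the encode callback is hypothetical (it should run ffmpeg with the given CRF and duration and return the output size in bytes), the 2MB limit is taken as 2 MiB, and the 6-second fallback is the figure mentioned above.

```python
from typing import Callable, Optional

SIZE_LIMIT = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB


def fit_under_limit(
    encode: Callable[[int, float], int],
    crfs: tuple[int, ...] = (26, 30),          # two attempts, CRF raised by 4
    durations: tuple[float, ...] = (10.0, 6.0),
    limit: int = SIZE_LIMIT,
) -> Optional[tuple[int, float]]:
    """Return the first (crf, duration) whose output fits, or None.

    For each duration the CRF steps are tried in order. If none fits,
    the clip is trimmed and the CRF starts over at the gentlest value,
    so duration is sacrificed before quality is pushed further.
    """
    for duration in durations:
        for crf in crfs:
            if encode(crf, duration) <= limit:
                return crf, duration
    return None
```

Keeping the encoder behind a callback means the retry policy can be exercised with a fake encoder, without running ffmpeg at all.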
Wiring it into a bot
Running ffmpeg by hand is fine once. I wanted to send a video to a chat and get the avatar back, so I wrapped it in an aiogram 3 handler.
import asyncio
import os
import tempfile

from aiogram import Bot, Dispatcher, F
from aiogram.types import FSInputFile, Message

dp = Dispatcher()


@dp.message(F.video | F.animation | F.document)
async def handle_video(message: Message, bot: Bot):
    # The same clip can arrive as a video, an animation (GIF), or a document.
    file = message.video or message.animation or message.document
    if not file:
        await message.answer("Send me a video or GIF.")
        return
    # The temp dir must stay open until the reply has been sent.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in")
        dst = os.path.join(tmp, "out.mp4")
        await bot.download(file, destination=src)
        ok = await convert_to_avatar(src, dst)
        if not ok:
            await message.answer("Could not convert that one. Try a shorter clip.")
            return
        await message.answer_video(
            FSInputFile(dst),
            caption="Ready. Set this as your profile video.",
        )


async def convert_to_avatar(src: str, dst: str) -> bool:
    # Run ffmpeg as a child process so the event loop stays free for other chats.
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", "-y", "-i", src, "-t", "10", "-an",
        "-vf", "crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,format=yuv420p",
        "-c:v", "libx264", "-preset", "slow", "-crf", "26",
        "-movflags", "+faststart", "-pix_fmt", "yuv420p", dst,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    await proc.wait()
    return proc.returncode == 0 and os.path.exists(dst)
The F.video | F.animation | F.document filter matters. Telegram delivers the same file as different types depending on whether the user sends a video, a GIF, or a file attachment. Catching all three saves a lot of confused users.
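One side effect of catching F.document is that any attachment matches, including PDFs and archives. A cheap guard, sketched here under assumptions, is to look at the MIME type and file name Telegram reports before downloading anything. The helper name and the extension list are my own choices, not an official list of what ffmpeg can decode.

```python
from typing import Optional

# Assumption: a small, non-exhaustive set of extensions worth trying.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".m4v", ".webm", ".mkv", ".gif", ".webp")


def is_probably_video(mime_type: Optional[str], file_name: Optional[str]) -> bool:
    """Guess from metadata alone whether an attachment is worth converting."""
    if mime_type:
        if mime_type.startswith("video/") or mime_type in ("image/gif", "image/webp"):
            return True
    # Fall back to the extension when the MIME type is missing or generic.
    if file_name:
        return file_name.lower().endswith(VIDEO_EXTENSIONS)
    return False
```

In the handler this would be called with message.document.mime_type and message.document.file_name, both of which may be None. ffmpeg still has the final say on whether the file decodes.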
asyncio.create_subprocess_exec keeps the bot responsive while ffmpeg runs. A 10-second clip encodes in a couple of seconds on a small VPS, but blocking the event loop would still freeze every other chat.
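Sending ffmpeg's stderr to DEVNULL recreates Telegram's silent failure inside the bot. A hedged alternative, shown as a sketch rather than as the bot's actual code, is to pipe stderr and keep the last few lines for logging. run_and_report is a hypothetical helper and works for any command, not only ffmpeg.

```python
import asyncio


async def run_and_report(*cmd: str, tail_lines: int = 5) -> tuple[bool, str]:
    """Run a command without blocking the event loop.

    Returns (ok, tail), where tail holds the last lines written to stderr.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() drains the pipe, so a chatty process cannot stall on a full buffer.
    _, stderr = await proc.communicate()
    lines = stderr.decode(errors="replace").strip().splitlines()
    return proc.returncode == 0, "\n".join(lines[-tail_lines:])
```

Inside convert_to_avatar it would be awaited with the same ffmpeg arguments. Adding -hide_banner and -loglevel error to the command keeps the captured tail limited to actual errors.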
Packaging it as a bot anyone can use
I cleaned this up and shipped it as @LiveAvaBot. You send a video or GIF, and it returns an 800x800 H.264 file under 2MB with audio stripped, ready to drop in as a profile video. It handles HEVC from iPhones, vertical clips, oversized files, and GIFs pulled from random websites.
Most of the credit goes to ffmpeg. I just wrote the wrapper that picks sane flags and hides the silent-failure problem behind a single message.
Edge cases and what is next
A few things I learned along the way:
- Square clips that are already H.264 still need re-encoding for the moov atom and the audio strip. Skipping the convert step because the file looks fine produced more silent failures.
- Animated WebP and some GIFs decode with a variable frame rate. Adding -r 30 to the encode smooths playback.
- Very dark footage confuses cropdetect. The centered-square crop is a more reliable default than trusting detection on every clip.
- 4K HEVC is slow to decode on a small VPS. It works, it just takes longer, and that is the next thing I want to speed up.
The whole project started because Telegram would not tell me why my avatar was a still image. Now it is one message to a bot.
I built this one myself. The bot is @LiveAvaBot. If you try it and a clip fails, I want to hear which clip.