The bug that started this
I tried to set a 6-second iPhone clip as my Telegram video avatar. The app accepted the upload, showed a spinner, then silently reverted to my old photo. No error toast, no log message, nothing. Just refusal.
Turned out my iPhone records HEVC (H.265) inside an MOV container by default since iOS 11. Telegram's video avatar slot wants H.264 in MP4. The client uploads the file, the server rejects the codec, the client doesn't bother telling you why.
This is the kind of paper cut that pushed me to write @liveavabot. Send it any video or GIF and get back an 800x800 H.264 clip that Telegram actually accepts.
What Telegram's video avatar actually requires
The spec isn't in one place in the official docs; you have to piece it together from API hints and trial and error. Here is what works:
- Codec: H.264 (libx264). HEVC is rejected.
- Container: MP4 with faststart.
- Resolution: Exactly 800x800. Square.
- Pixel format: yuv420p. Anything else and the server refuses.
- Duration: 10 seconds max. Longer clips get truncated or rejected.
- Size: 2 MB max. Hit this and you need to drop bitrate.
- Audio: Must be stripped. The avatar slot doesn't carry sound.
- Aspect ratio: Square. You need to crop or pad non-square sources.
Miss any of these and the Telegram client does the silent-revert thing.
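To check an output against this list before uploading it, here is a minimal Python sketch that inspects ffprobe's JSON (`ffprobe -v error -show_format -show_streams -of json output.mp4`). The function name `avatar_problems` is hypothetical, and the limits are the trial-and-error values above, not an official spec. It doesn't check the container or faststart, because ffprobe reports MOV and MP4 under the same demuxer name (`mov,mp4,m4a,3gp,3g2,mj2`).

```python
# Limits from the list above (trial and error, not an official spec).
MAX_DURATION_S = 10.0
MAX_SIZE_BYTES = 2 * 1024 * 1024
SIDE = 800


def avatar_problems(probe: dict) -> list[str]:
    """Return reasons a file would likely be rejected as a video avatar.

    `probe` is the parsed JSON from:
        ffprobe -v error -show_format -show_streams -of json FILE
    """
    streams = probe.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    if not video:
        return ["no video stream"]
    v = video[0]
    problems = []
    if v.get("codec_name") != "h264":
        problems.append(f"codec is {v.get('codec_name')}, want h264")
    if (v.get("width"), v.get("height")) != (SIDE, SIDE):
        problems.append(f"size is {v.get('width')}x{v.get('height')}, want {SIDE}x{SIDE}")
    if v.get("pix_fmt") != "yuv420p":
        problems.append(f"pix_fmt is {v.get('pix_fmt')}, want yuv420p")
    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("audio stream present")
    fmt = probe.get("format", {})
    # ffprobe's JSON writer reports duration and size as strings.
    if float(fmt.get("duration", 0)) > MAX_DURATION_S:
        problems.append("longer than 10 s")
    if int(fmt.get("size", 0)) > MAX_SIZE_BYTES:
        problems.append("larger than 2 MB")
    return problems
```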
The ffmpeg pipeline
I use a crop expression to cut the largest centered square out of the source, scale it to 800x800, then encode H.264 yuv420p with the audio dropped.
```bash
ffmpeg -y -i input.mov \
  -t 10 \
  -vf "crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 -profile:v main -level 4.0 -preset medium -crf 23 \
  -movflags +faststart \
  -an \
  output.mp4
```
What each piece does:
- `-t 10` caps the duration. Cheaper than measuring first.
- `crop=...` cuts a centered square whose side is the shorter dimension. A 1920x1080 source becomes 1080x1080. Any `min()` in the expression has to be quoted, otherwise its comma is read as a filter separator.
- `scale=800:800:flags=lanczos` resizes with a decent filter. Bilinear looks mushy at this small size.
- `format=yuv420p` is the chroma subsampling Telegram insists on.
- `libx264 -profile:v main -level 4.0` keeps the stream within a conservative profile and level that hardware decoders widely support.
- `-crf 23` is a starting point. If the file is over 2 MB I bump CRF to 28 and re-encode.
- `-movflags +faststart` moves the moov atom to the front so Telegram can start decoding before the whole file has arrived.
- `-an` drops audio.
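Whether faststart actually took effect can be checked by walking the top-level MP4 boxes and confirming that `moov` comes before `mdat`. This is a minimal sketch, assuming the whole file fits in memory (fine under a 2 MB cap); `top_level_boxes` and `is_faststart` are hypothetical helpers, not part of any library.

```python
import struct


def top_level_boxes(data: bytes) -> list[str]:
    """List top-level MP4 box types in file order (minimal parser)."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:
            # Size 0 means the box runs to the end of the file.
            size = len(data) - pos
        if size < 8:
            break  # malformed header; stop instead of looping forever
        boxes.append(kind.decode("latin-1"))
        pos += size
    return boxes


def is_faststart(data: bytes) -> bool:
    """True if the moov box precedes the mdat box."""
    boxes = top_level_boxes(data)
    return "moov" in boxes and "mdat" in boxes and boxes.index("moov") < boxes.index("mdat")
```

Usage: `is_faststart(open("output.mp4", "rb").read())`.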
For GIFs the same pipeline works; I just add `-r 25` to lock the frame rate and skip the cropdetect pass for tiny squares.
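If the GIF variant shares a code path with regular videos, building the argument list in a function keeps the optional `-r 25` in one place. A sketch under assumptions: `build_cmd` and its `is_gif` flag are hypothetical, and the crop passes only width and height because ffmpeg's crop filter centers by default.

```python
from pathlib import Path

# Centered square crop; crop's x/y default to (in_w-out_w)/2 and (in_h-out_h)/2.
SQUARE_VF = "crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,format=yuv420p"


def build_cmd(src: Path, dst: Path, crf: int, is_gif: bool = False) -> list[str]:
    """Return an ffmpeg argv for the avatar encode (hypothetical helper)."""
    cmd = ["ffmpeg", "-y", "-i", str(src), "-t", "10", "-vf", SQUARE_VF]
    if is_gif:
        # GIF frame delays can vary per frame; force a constant 25 fps output.
        cmd += ["-r", "25"]
    cmd += [
        "-c:v", "libx264", "-profile:v", "main", "-level", "4.0",
        "-preset", "medium", "-crf", str(crf),
        "-movflags", "+faststart", "-an",
        str(dst),
    ]
    return cmd
```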
Wrapping it in aiogram 3
The bot handler is short. The user sends a video, video_note, animation, or document; the bot downloads it, runs ffmpeg, and sends back the result.
```python
import asyncio
import tempfile
from pathlib import Path

from aiogram import F, Router
from aiogram.types import FSInputFile, Message

router = Router()

MAX_BYTES = 2 * 1024 * 1024  # Telegram's video avatar size cap

FFMPEG_CMD = [
    "ffmpeg", "-y", "-i", "{src}",
    "-t", "10",
    "-vf",
    # crop centers by default; the quotes stop the comma inside min()
    # from being read as a filter separator
    "crop='min(iw,ih)':'min(iw,ih)',"
    "scale=800:800:flags=lanczos,format=yuv420p",
    "-c:v", "libx264", "-profile:v", "main", "-level", "4.0",
    "-preset", "medium", "-crf", "{crf}",
    "-movflags", "+faststart", "-an",
    "{dst}",
]


@router.message(F.video | F.animation | F.video_note | F.document)
async def to_avatar(message: Message) -> None:
    file = message.video or message.animation or message.video_note or message.document
    if file is None:
        return
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.bin"
        dst = Path(tmp) / "out.mp4"
        await message.bot.download(file, destination=src)
        # Raise CRF (lower quality) until the output fits under the cap.
        for crf in (23, 28, 34):
            cmd = [p.format(src=src, dst=dst, crf=crf) for p in FFMPEG_CMD]
            proc = await asyncio.create_subprocess_exec(
                *cmd,
                stdin=asyncio.subprocess.DEVNULL,  # don't let ffmpeg read the bot's stdin
                stdout=asyncio.subprocess.DEVNULL,
                stderr=asyncio.subprocess.PIPE,
            )
            _, err = await proc.communicate()
            if proc.returncode != 0:
                # The real error is at the end of stderr; the start is the banner.
                await message.answer(f"ffmpeg failed: {err.decode(errors='replace')[-200:]}")
                return
            if dst.stat().st_size <= MAX_BYTES:
                break
        else:
            await message.answer("Source too dense, can't squeeze under 2 MB.")
            return
        # Send before leaving the with-block: the temp dir is deleted on exit.
        await message.answer_video_note(FSInputFile(dst))
```
A few notes on this snippet:
- The retry loop on CRF (23, 28, 34) is the simple way to hit the 2 MB cap without measuring bitrate up front. Three passes are rarely needed in practice; the first usually fits.
- `answer_video_note` sends the file back as a Telegram video note (the round bubble). `set_chat_photo` won't set an avatar here: it takes a static photo and can't change photos in private chats. So users set the avatar themselves from the returned clip, and most just want a clip they can drop in chat anyway.
- I'm using `tempfile.TemporaryDirectory()` so the source and target get cleaned up even if ffmpeg crashes.
- `F.document` is in the filter because some clients send MP4s as documents instead of videos.
What I packaged as @liveavabot
The bot is the above, plus a few production niceties: progress messages while ffmpeg runs, a queue so a single user can't pin a worker, telemetry on which codecs come in (HEVC is about 60% of iPhone uploads I see), and a payment hook for users who want to remove a daily quota.
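The bot's actual queue isn't shown here. One way to get "one job per user, a bounded number of workers overall" with plain asyncio is a per-user busy set plus a semaphore. `JobGate` and `MAX_WORKERS` are hypothetical names, and this sketch rejects a second job from a busy user instead of queueing it.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional

MAX_WORKERS = 2  # hypothetical cap on concurrent ffmpeg jobs


class JobGate:
    """At most one active job per user and `max_workers` jobs overall."""

    def __init__(self, max_workers: int = MAX_WORKERS) -> None:
        self._slots = asyncio.Semaphore(max_workers)
        self._busy: set[int] = set()

    async def run(self, user_id: int, job: Callable[[], Awaitable[str]]) -> Optional[str]:
        if user_id in self._busy:
            return None  # caller tells the user to wait for the current job
        self._busy.add(user_id)
        try:
            # Jobs from other users wait here when every slot is taken.
            async with self._slots:
                return await job()
        finally:
            self._busy.discard(user_id)
```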
You can poke at it here: t.me/LiveAvaBot. Send any short clip, it returns a Telegram-ready 800x800 MP4 in a few seconds. No login, no signup.
Edge cases I hit
- Vertical phone videos with letterboxing baked in. The black bars are part of the frame data, so a plain centered-square crop keeps them. I run cropdetect first on a 2-second probe, then crop using its result. Two passes, but the bars are gone.
- Very short GIFs. Anything under 1 second confuses cropdetect, so I bypass the crop and pad to square with `pad='max(iw,ih)':'max(iw,ih)':(ow-iw)/2:(oh-ih)/2:black` (the quotes keep the commas inside `max()` from being read as filter separators).
- Animated stickers (.tgs / .webm). Telegram sends these as `sticker`, not `animation`. The webm ones work with the same pipeline once converted. The tgs ones are Lottie JSON; I render them with `tgs-to-gif` first.
- Files over 20 MB. Bots can't download files over 20 MB through the standard Bot API. For now I tell the user to compress on their device first. The MTProto route via Telethon would lift that limit; it's on the roadmap.
- Audio-only or 1x1 pixel inputs. I sniff with `ffprobe` first and reject anything without a video stream. That saves a confusing ffmpeg error.
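The letterbox probe and the short-GIF fallback can be sketched as a filter-chain picker. It assumes the probe runs as `ffmpeg -t 2 -i input -vf cropdetect -f null -` and that its stderr is passed in. cropdetect logs `crop=W:H:X:Y` suggestions as it goes; this sketch takes the last one (taking the most frequent would be more robust). `last_crop` and `avatar_filter` are hypothetical helpers, not the bot's code.

```python
import re
from typing import Optional

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")
SCALE_TAIL = "scale=800:800:flags=lanczos,format=yuv420p"

Crop = tuple[int, int, int, int]


def last_crop(stderr: str) -> Optional[Crop]:
    """Return the last crop=W:H:X:Y suggestion from cropdetect output."""
    matches = CROP_RE.findall(stderr)
    if not matches:
        return None
    w, h, x, y = (int(v) for v in matches[-1])
    return w, h, x, y


def avatar_filter(duration_s: float, detected: Optional[Crop]) -> str:
    """Pick the -vf chain: pad short clips, otherwise de-letterbox then square."""
    if duration_s < 1.0:
        # Too short for cropdetect: pad to a square instead of cropping.
        square = "pad='max(iw,ih)':'max(iw,ih)':(ow-iw)/2:(oh-ih)/2:black"
    elif detected is not None:
        w, h, x, y = detected
        # Cut the baked-in bars first, then take the centered square.
        square = f"crop={w}:{h}:{x}:{y},crop='min(iw,ih)':'min(iw,ih)'"
    else:
        square = "crop='min(iw,ih)':'min(iw,ih)'"
    return f"{square},{SCALE_TAIL}"
```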
What's next
The HEVC handling is solved, and the bitrate-cap retry works. Two items are still open: faster CRF picking (pre-estimating from the source bitrate instead of up to three encodes) and 4K source downscaling (fine today since I scale to 800 anyway, but cropdetect is slow on 4K).
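A related way to skip the CRF retries is to compute the average bitrate the 2 MB budget allows and encode to that. Note this swaps CRF for a target bitrate, which is a different technique from estimating CRF out of the source bitrate. The 5% container overhead is a guess, `target_video_kbps` is a hypothetical helper, and the duration passed in should be the clipped one, `min(source_duration, 10)`.

```python
def target_video_kbps(duration_s: float, budget_bytes: int = 2 * 1024 * 1024,
                      overhead: float = 0.05) -> int:
    """Average video bitrate in kbit/s that fits `budget_bytes` over `duration_s`.

    `overhead` reserves a fraction of the budget for MP4 container data;
    the 5% default is an assumption, not a measured figure.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    usable_bits = budget_bytes * 8 * (1 - overhead)
    # Round down so the estimate errs on the small side.
    return int(usable_bits / duration_s / 1000)
```

The result would go into `-b:v` (for example `-b:v 1593k` for a 10-second clip), ideally with a two-pass encode (`-pass 1`, then `-pass 2`), since a single-pass average-bitrate encode can overshoot.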
I built it: @liveavabot is the wrapper, and ffmpeg does the actual encoding. The bot is just the glue that turns a 20-minute command-line session into a 3-second user experience.