My iPhone clip uploaded fine. Telegram showed the spinner, then quietly kept my old photo. No error, no toast, nothing in the logs. I re-recorded the clip twice before realizing the upload wasn't failing, it was being silently discarded.
The reason: iPhones have recorded HEVC (H.265) by default since iOS 11, and Telegram's video avatar pipeline only accepts H.264. Wrong codec means the client drops your file without a word. I lost an evening to this, then built a bot so I'd never have to debug it again.
What Telegram actually wants
The video avatar requirements aren't written down in one place. I pieced them together from the limits page, the Bot API docs, and a pile of trial uploads:
- Codec: H.264. HEVC gets silently rejected.
- Pixel format: yuv420p. iPhone HDR clips are 10-bit (yuv420p10le), and they still fail after transcoding to H.264 unless you also force 8-bit output.
- Resolution: square, 800x800 is the sweet spot.
- Duration: 10 seconds max.
- Audio: must be removed entirely. Not muted, removed.
- Size: keep it under 2MB or processing gets flaky.
Miss any one of these and you get the same symptom: spinner, then nothing. The pixel format one is nasty because ffprobe shows you "h264" and you think you're done.
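Since ffprobe's codec name alone is misleading, it helps to check the whole list at once. Here is a minimal sketch that reads ffprobe's JSON report and returns every requirement the file misses. The function names and the 2,000,000 byte ceiling are my own choices for illustration, and the limits are the ones I pieced together above, not an official Telegram spec.

```python
import json
import subprocess

MAX_DURATION_S = 10.0
MAX_SIZE_BYTES = 2_000_000  # assumption: the stricter, decimal reading of "2MB"


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON report (streams plus container format)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-show_format",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def spec_problems(report: dict) -> list:
    """Return a list of human-readable problems; an empty list means it passes."""
    streams = report.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    if not video:
        return ["no video stream"]

    v = video[0]
    problems = []
    if v.get("codec_name") != "h264":
        problems.append(f"codec is {v.get('codec_name')}, want h264")
    if v.get("pix_fmt") != "yuv420p":
        problems.append(f"pixel format is {v.get('pix_fmt')}, want yuv420p")
    if v.get("width") != v.get("height"):
        problems.append(f"not square: {v.get('width')}x{v.get('height')}")
    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("audio stream present")

    # ffprobe reports duration and size as strings in the "format" section.
    fmt = report.get("format", {})
    if float(fmt.get("duration", 0)) > MAX_DURATION_S:
        problems.append(f"duration is {fmt.get('duration')}s, want <= 10s")
    if int(fmt.get("size", 0)) > MAX_SIZE_BYTES:
        problems.append(f"size is {fmt.get('size')} bytes, want <= {MAX_SIZE_BYTES}")
    return problems
```

Calling spec_problems(probe("output.mp4")) on a transcoded HDR clip that skipped the pixel format step would flag yuv420p10le even though the codec reads h264.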
The ffmpeg pipeline
Three problems to solve: find the interesting part of the frame, crop it square, and transcode to spec.
For the crop I run cropdetect on the first couple of seconds:
ffmpeg -i input.mov -t 2 -vf "cropdetect=24:16:0" -f null - 2>&1 \
  | grep -o "crop=[0-9:]*" | tail -1
That spits out something like crop=1080:1080:0:420. Then the real encode:
ffmpeg -y -i input.mov -t 10 \
  -vf "crop=1080:1080:0:420,scale=800:800,format=yuv420p" \
  -c:v libx264 -profile:v main -preset slow -crf 26 \
  -an -movflags +faststart \
  output.mp4
What each piece does:
- -t 10 trims to the 10 second cap.
- crop squares the frame using the cropdetect result. For portrait video I bias the crop toward the top third, because center-cropping a talking head cuts the forehead off.
- scale=800:800 runs after the crop, so nothing gets upscaled more than needed.
- format=yuv420p forces 8-bit 4:2:0. This is the line that fixes iPhone HDR clips.
- -an strips the audio track.
- -movflags +faststart moves the moov atom to the front of the file.
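The crop step can be sketched in Python as two small helpers: one pulls the last crop= value out of ffmpeg's cropdetect log, the other turns that region into a square crop filter. The names last_crop and square_crop are hypothetical, and the "top third" rule here (placing the window one third of the way down the spare height instead of halfway) is just one plausible reading of the bias described above, not necessarily what the bot does.

```python
import re
from typing import Optional, Tuple

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def last_crop(ffmpeg_log: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (w, h, x, y) from the last cropdetect line, or None if absent."""
    matches = CROP_RE.findall(ffmpeg_log)
    if not matches:
        return None
    w, h, x, y = (int(n) for n in matches[-1])
    return w, h, x, y


def square_crop(w: int, h: int, x: int, y: int) -> str:
    """Build a square crop filter inside the detected region.

    Landscape regions are center-cropped horizontally. Portrait regions
    are shifted up: the window starts one third of the way into the
    spare height (assumed bias) rather than at the halfway point.
    """
    side = min(w, h)
    if h > w:
        return f"crop={side}:{side}:{x}:{y + (h - side) // 3}"
    return f"crop={side}:{side}:{x + (w - side) // 2}:{y}"
```

For a full-frame 1080x1920 portrait clip this gives crop=1080:1080:0:280, where a plain center crop would start at y=420.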
The 2MB budget over 10 seconds works out to roughly 1.6 Mbps. CRF 26 lands under that for most clips. When it doesn't, I re-encode in a loop, bumping CRF by 2 until the file fits. Three passes covers everything I've seen in practice.
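The arithmetic is 2 MB × 8 bits ÷ 10 s ≈ 1.6 Mbps. The retry loop itself is short. Below is a minimal sketch of it, assuming a 2,000,000 byte ceiling; encode_once and fit_to_budget are illustrative names rather than the bot's actual code, and the filter string is left as a parameter.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

MAX_BYTES = 2_000_000  # assumption: decimal reading of the 2MB budget


def encode_once(src: Path, dst: Path, crf: int, vf: str) -> int:
    """Run one ffmpeg encode at the given CRF and return the output size."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src), "-t", "10", "-vf", vf,
         "-c:v", "libx264", "-profile:v", "main", "-preset", "slow",
         "-crf", str(crf), "-an", "-movflags", "+faststart", str(dst)],
        check=True, capture_output=True,
    )
    return dst.stat().st_size


def fit_to_budget(
    encode: Callable[[int], int],
    start_crf: int = 26,
    step: int = 2,
    max_passes: int = 3,
) -> Optional[int]:
    """Call encode(crf) with rising CRF until the result fits the budget.

    Returns the CRF that fit, or None if every pass was still too large.
    """
    crf = start_crf
    for _ in range(max_passes):
        if encode(crf) <= MAX_BYTES:
            return crf
        crf += step  # higher CRF means lower quality and a smaller file
    return None
```

Usage would look like fit_to_budget(lambda crf: encode_once(src, dst, crf, vf)). Taking the encoder as a callable keeps the loop testable without running ffmpeg.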
Wiring it into aiogram 3
The bot side is small. Accept a video, GIF, or document, download it, run the pipeline, send the result back:
import asyncio
import tempfile
from pathlib import Path

from aiogram import Bot, F, Router
from aiogram.types import FSInputFile, Message

router = Router()

# Center-crop to the shorter side so portrait and landscape clips both work
# (crop=in_h:in_h fails on portrait input, where the height exceeds the width),
# then scale to 800x800 and force 8-bit 4:2:0.
VIDEO_FILTER = "crop='min(iw,ih)':'min(iw,ih)',scale=800:800,format=yuv420p"


@router.message(F.video | F.animation | F.document)
async def convert(msg: Message, bot: Bot):
    media = msg.video or msg.animation or msg.document
    # Bots can only download files up to 20MB.
    if media.file_size and media.file_size > 20 * 1024 * 1024:
        await msg.answer("That's over 20MB, send something smaller.")
        return

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"
        dst = Path(tmp) / "avatar.mp4"
        await bot.download(media, destination=src)

        # Argument list, not a shell string, so filenames can't inject anything.
        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-y", "-i", str(src), "-t", "10",
            "-vf", VIDEO_FILTER,
            "-c:v", "libx264", "-profile:v", "main",
            "-preset", "slow", "-crf", "26",
            "-an", "-movflags", "+faststart", str(dst),
        )
        await proc.wait()

        if proc.returncode != 0 or not dst.exists():
            await msg.answer("ffmpeg choked on that file, try another one.")
            return

        # Send while the temp directory still exists.
        await msg.answer_video(
            FSInputFile(dst),
            caption="Save this file, then set it as your profile video.",
        )
Notes on the real version: ffmpeg runs through create_subprocess_exec, never a shell string, so weird filenames can't inject anything. The production handler also runs the cropdetect pass first and retries at higher CRF when the output is over 2MB. I kept this snippet to the essentials.
Shipping it as @liveavabot
I packaged the whole thing as @LiveAvaBot. You send a video or GIF, it replies with a ready-to-use avatar file. It runs on a small VPS. ffmpeg does the heavy lifting; I just wrote the wrapper and the queue around it.
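On a small VPS the main job of a queue is to stop several slow-preset encodes from running at once. A minimal sketch of that idea with an asyncio.Semaphore follows. This is an assumption about how such a queue could look, not a description of the bot's real one, and run_limited, run_ffmpeg, and the limit of 2 are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

MAX_PARALLEL_ENCODES = 2  # assumption: tune to the VPS core count


async def run_limited(slots: asyncio.Semaphore,
                      job: Callable[[], Awaitable[T]]) -> T:
    """Wait for a free slot, then run the job; extra jobs queue up here."""
    async with slots:
        return await job()


async def run_ffmpeg(*args: str) -> int:
    """Run ffmpeg with an argument list and return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", *args,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait()
```

Inside a handler this would be used as await run_limited(slots, lambda: run_ffmpeg("-y", "-i", str(src), str(dst))), with slots = asyncio.Semaphore(MAX_PARALLEL_ENCODES) created once at startup inside the running event loop.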
Current numbers, because build logs should have numbers: 136 users, 9 conversions in the last 24 hours. Not a startup, just a tool that scratches an itch I had.
Edge cases I hit
- GIFs with odd dimensions. A 481px wide GIF breaks yuv420p encoding (dimensions must be even). The scale to 800x800 handles it, but only because 800 is even. If you change the target size, keep it even.
- Rotation metadata. Some phone videos are stored sideways with a rotate tag. Newer ffmpeg applies it automatically before filters; older builds crop the wrong axis. Check your ffmpeg version if portrait clips come out rotated.
- Black bars. Screen recordings often have letterboxing, which is exactly what cropdetect is for. Without it you get an avatar that's mostly black.
- 10-bit HDR. Worth repeating: format=yuv420p is mandatory, or Telegram rejects the file even though the codec is right.
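Two of these can be checked before encoding. The sketch below rounds a target size down to an even number and reads the rotation from one video stream entry of ffprobe's JSON output. It assumes the two places ffprobe usually reports rotation (a rotate tag on older builds, a Display Matrix side data entry on newer ones); even_side and rotation_degrees are illustrative names.

```python
def even_side(target: int) -> int:
    """Round a target dimension down to the nearest even number."""
    return target - (target % 2)


def rotation_degrees(stream: dict) -> int:
    """Return the rotation stored on an ffprobe video stream, or 0 if none.

    Older ffmpeg builds expose it as tags.rotate (a string); newer ones
    put a "rotation" value in a Display Matrix side data entry.
    """
    tag = stream.get("tags", {}).get("rotate")
    if tag is not None:
        return int(tag)
    for side_data in stream.get("side_data_list", []):
        if "rotation" in side_data:
            return int(side_data["rotation"])
    return 0
```

A non-zero result on a clip that comes out sideways points at the rotation handling of the local ffmpeg build rather than at the crop math.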
What it doesn't do yet: no face detection for smarter portrait crops, and no way to pick which 10 seconds to keep (it always takes the start). Both are on the list.
Disclosure: built by me, @liveavabot.
The annoying part was never ffmpeg. It was reverse-engineering an unwritten spec from silent failures. If you know of other undocumented Telegram media constraints, I'd like to hear about them in the comments.