DEV Community

liveavabot

Converting iPhone HEVC Videos to Telegram Avatars with FFmpeg

The Problem

I tried setting a video avatar on Telegram using a clip from my iPhone. Telegram accepted the upload, the progress bar finished, then nothing happened. No error, no toast, no avatar. The video sat in chat as a regular file.

After digging into Telegram's behavior I found the rule behind the silent rejection: a video avatar must be exactly 800x800, H.264, with no audio track, at most 10 seconds long, and at most 2MB. iPhone videos have been HEVC (H.265) by default since iOS 11, are usually portrait 1080x1920, carry audio, and often run longer than 10s. A typical clip fails every single criterion.

The frustrating part is the silence. Telegram doesn't say "wrong codec" or "too long". The upload just doesn't apply. You think you did something wrong.

So I built a bot.

What Telegram Actually Wants

The spec is strict. Part of it is in the docs for setProfilePhoto with the animation field; the rest is reverse-engineered from people complaining on forums:

  • Container: MP4
  • Video codec: H.264 (libx264)
  • Pixel format: yuv420p (not yuv420p10le, not yuv422)
  • Resolution: exactly 800x800, square
  • Duration: at most 10 seconds
  • File size: at most 2MB
  • Audio: must be absent, not muted, fully removed
  • Faststart: moov atom at the start of the file

The faststart bit caught me out. ffmpeg by default puts the moov atom at the end of the file. Telegram's parser appears to give up rather than seek backwards. Adding -movflags +faststart was the single change that fixed half of my early test cases.
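To check whether a given file already has the moov atom up front, you can walk the top-level MP4 boxes and see which of moov and mdat comes first. This is a minimal sketch, not something from the bot: the helper names top_level_boxes and is_faststart are made up for illustration, and it reads the whole file into memory, which is only reasonable for small files like a 2MB avatar.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in MP4 bytes."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of the file
            size = len(data) - offset
        if size < header:
            break  # malformed box: stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def is_faststart(data: bytes) -> bool:
    """True if moov appears before mdat among the top-level boxes."""
    order = [t for t, _, _ in top_level_boxes(data) if t in ("moov", "mdat")]
    return order[:1] == ["moov"]
```

Usage would be something like is_faststart(Path("output.mp4").read_bytes()). A file with no moov or mdat box at all is reported as not faststart.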

The FFmpeg Pipeline

I let ffmpeg do all the heavy lifting. I just wrote the wrapper. The pipeline is cropdetect, then re-encode with the exact flags Telegram needs.

# Optional first pass: detect black borders and print the last suggested crop
ffmpeg -i input.mov -vf "cropdetect=24:16:0" -f null - 2>&1 \
  | grep -oP 'crop=\K[0-9:]+' | tail -1

# Main pass: crop to a centered square, scale to 800x800, transcode, strip audio
ffmpeg -i input.mov \
  -vf "crop='min(iw,ih)':'min(iw,ih)',scale=800:800,format=yuv420p" \
  -c:v libx264 -preset veryfast -crf 28 \
  -an \
  -t 10 \
  -movflags +faststart \
  -y output.mp4

A few notes on the flags:

  • crop='min(iw,ih)':'min(iw,ih)' crops to a square whose side is the shorter input dimension. The crop filter centers the window by default, so portrait and landscape clips both get a centered square without manually computing offsets. (crop=in_h:in_h only works for landscape input: on a 1080x1920 portrait clip it asks for a 1920-wide window, which is wider than the frame, and ffmpeg rejects it.)
  • scale=800:800 resizes to the exact target. No aspect-ratio preservation is needed because the frame is already square.
  • format=yuv420p forces 8-bit 4:2:0. iPhone HEVC is often 10-bit (yuv420p10le), which Telegram does not accept.
  • -an drops audio entirely.
  • -t 10 caps duration at 10 seconds.
  • -crf 28 is aggressive but keeps the file under 2MB for typical 10s clips. For 5s clips I drop to crf 23.
  • +faststart is the magic flag mentioned above.
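If you would rather pass explicit numbers to the crop filter, the centered square is easy to work out by hand: the side is the shorter dimension, and the offset along the longer dimension is half of what is left over. A small sketch, where the function names are hypothetical and the width and height are assumed to be the dimensions after any rotation has been applied:

```python
def centered_square_crop(width: int, height: int) -> tuple[int, int, int]:
    """Return (side, x, y) of the largest centered square inside the frame."""
    side = min(width, height)
    x = (width - side) // 2   # non-zero only for landscape input
    y = (height - side) // 2  # non-zero only for portrait input
    return side, x, y


def crop_filter(width: int, height: int) -> str:
    """Build an explicit crop=w:h:x:y filter string for ffmpeg."""
    side, x, y = centered_square_crop(width, height)
    return f"crop={side}:{side}:{x}:{y}"
```

For a 1080x1920 portrait clip this gives a 1080-pixel square starting 420 pixels from the top. Having the numbers explicit also makes it easy to nudge y upward when the subject sits near the top of the frame.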

For files that still exceed 2MB after the first encode I step the CRF up (28, 30, 32, 34) and re-encode until the size fits. It is a linear walk rather than a true bisection. Crude, but it works for the long tail of weird inputs.
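The retry loop can be sketched roughly as follows. This is an illustration under assumptions, not the bot's actual code: encode_until_fits is a made-up name, and encode stands for any callable that runs ffmpeg at the given CRF and returns the output path.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

MAX_BYTES = 2 * 1024 * 1024  # the 2MB avatar limit


def encode_until_fits(
    encode: Callable[[int], Path],
    crfs: Iterable[int] = (28, 30, 32, 34),
    max_bytes: int = MAX_BYTES,
) -> Optional[int]:
    """Try each CRF in order; return the first one whose output fits.

    Returns None if every attempt is missing or too large.
    """
    for crf in crfs:
        out = encode(crf)
        if out.exists() and out.stat().st_size < max_bytes:
            return crf
    return None
```

Passing the encoder in as a callable keeps the loop testable without ffmpeg. In the worst case it costs four full encodes, which is acceptable for 10-second clips.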

The Aiogram 3 Handler

The bot side is small. Aiogram 3 routes any video, video note, animation, or document to one converter:

import asyncio
import tempfile
from pathlib import Path

from aiogram import F, Router
from aiogram.types import FSInputFile, Message

MAX_BYTES = 2 * 1024 * 1024  # Telegram's 2MB avatar limit
# min(iw,ih) keeps the crop inside portrait frames; in_h alone would exceed the width
VF = "crop='min(iw,ih)':'min(iw,ih)',scale=800:800,format=yuv420p"

router = Router()

@router.message(F.video | F.video_note | F.animation | F.document)
async def convert(msg: Message):
    file = msg.video or msg.video_note or msg.animation or msg.document
    if not file:
        return

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.mp4"
        dst = Path(tmp) / "out.mp4"

        f = await msg.bot.get_file(file.file_id)
        await msg.bot.download_file(f.file_path, src)

        proc = await asyncio.create_subprocess_exec(
            "ffmpeg", "-i", str(src),
            "-vf", VF,
            "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
            "-an", "-t", "10", "-movflags", "+faststart",
            "-y", str(dst),
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.DEVNULL,
        )
        try:
            await asyncio.wait_for(proc.wait(), timeout=30)
        except asyncio.TimeoutError:
            # wait_for only cancels the wait; kill ffmpeg and reap it ourselves
            proc.kill()
            await proc.wait()
            await msg.answer("Conversion timed out, try a shorter clip.")
            return

        if proc.returncode != 0 or not dst.exists():
            await msg.answer("Could not convert this file.")
        elif dst.stat().st_size < MAX_BYTES:
            await msg.answer_video(FSInputFile(dst))
        else:
            await msg.answer("File too big after encode, try a shorter clip.")

asyncio.wait_for with a 30-second timeout is important. ffmpeg occasionally hangs on malformed input (especially screen recordings with odd codecs), and you don't want a stuck process per user piling up. The timeout by itself only cancels the wait, so the handler also has to kill the ffmpeg process when it trips, otherwise the stuck process keeps running.

Shipping It as @liveavabot

I wrapped this into @LiveAvaBot. The user sends a video, the bot replies with the converted MP4, the user forwards that to "Edit profile photo" in Telegram settings. That's the entire flow. No login, no signup, no settings menu.

Practical bits I added on top:

  • Per-user queue, one ffmpeg process per chat at a time, so spammy uploads don't fork-bomb the server.
  • 30-second ffmpeg timeout (above), with a friendly error if it trips.
  • Telegram Stars payments for batch conversions, mostly experimental, not a serious revenue path yet.
  • Logs of input format so I can see what people actually throw at it (mostly iPhone HEVC, occasional Android H.264, a surprising number of screen recordings).
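The one-process-per-chat rule can be approximated with one asyncio.Lock per chat ID. A minimal sketch, assuming a single bot process; run_exclusive and _chat_locks are hypothetical names, and this is not necessarily how the bot's queue is implemented:

```python
import asyncio
from collections import defaultdict

# One lock per chat ID, created lazily on first use.
_chat_locks: defaultdict[int, asyncio.Lock] = defaultdict(asyncio.Lock)


async def run_exclusive(chat_id: int, job):
    """Run the coroutine function `job` while holding this chat's lock.

    Jobs for the same chat run one at a time; different chats run in parallel.
    """
    async with _chat_locks[chat_id]:
        return await job()
```

Inside a handler this would wrap the ffmpeg call, for example await run_exclusive(msg.chat.id, do_convert), where do_convert is whatever coroutine function runs the conversion. The dictionary grows with the number of chats seen, so a long-running bot may want to evict idle entries.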

Current state: 84 users, slow organic growth from word of mouth.

Edge Cases and Lessons

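  • HDR (HLG, Dolby Vision) from newer iPhones produces washed-out output when transcoded naively. Adding colorspace=bt709:iall=bt709:fast=1 to the filter chain fixes it for most clips. Note that iall=bt709 makes the filter treat the input as BT.709 regardless of its tags, so this is a tag override rather than real tone mapping. A zscale plus tonemap chain is the more thorough route if your ffmpeg build includes libzimg.
  • Vertical videos under 10 seconds sometimes get over-cropped if the subject sits near the top of the frame, because the square crop is centered. Running cropdetect first costs an extra ffmpeg pass but produces noticeably better framing on portraits. Keep in mind that cropdetect only finds black borders, so it helps most with letterboxed or pillarboxed clips.
  • GIFs route through the same pipeline. ffmpeg reads them fine. The quirk is they often have no duration set in the container. -t 10 still applies correctly, because as an output option it cuts by timestamp rather than by container metadata.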
  • 2MB limit is the real ceiling. Codec and resolution are easy. Staying under 2MB on a 10-second 800x800 H.264 clip means crf 28 or higher, which looks acceptable for an avatar but would be ugly at full screen.
  • Screen recordings are the worst case. Variable framerate, weird color spaces, sometimes 10-bit. The step-up-the-CRF fallback exists mainly for these.
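For the cropdetect pass, the bot side needs to pull the last crop=w:h:x:y suggestion out of ffmpeg's stderr and then fit a square inside it. A rough sketch of that parsing step, assuming the stderr text has already been captured; last_crop and square_inside are illustrative names, not part of ffmpeg or of the bot:

```python
import re

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def last_crop(stderr_text: str):
    """Return the last (w, h, x, y) that cropdetect printed, or None."""
    matches = CROP_RE.findall(stderr_text)
    if not matches:
        return None
    w, h, x, y = map(int, matches[-1])
    return w, h, x, y


def square_inside(w: int, h: int, x: int, y: int):
    """Largest centered square inside the detected region, as (w, h, x, y)."""
    side = min(w, h)
    return side, side, x + (w - side) // 2, y + (h - side) // 2
```

The result can be formatted as crop=w:h:x:y and placed in front of scale=800:800 in the filter chain. Taking the last suggestion mirrors the tail -1 in the shell version.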

Next on the list is detecting "this video is already valid" and skipping the re-encode entirely. About 5% of uploads are already 800x800 H.264 muted (likely from other converters), and re-encoding them is wasted CPU and quality loss.
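A sketch of what that check could look like, not something the bot runs today. It assumes the dictionary is the parsed output of ffprobe -v error -show_format -show_streams -of json; the function name avatar_spec_problems is made up for illustration. It does not look at the moov atom position, so faststart would need a separate check.

```python
MAX_BYTES = 2 * 1024 * 1024
MAX_SECONDS = 10.0


def avatar_spec_problems(probe: dict) -> list[str]:
    """Return the reasons a probed file misses the avatar spec (empty if none)."""
    problems = []
    streams = probe.get("streams", [])
    videos = [s for s in streams if s.get("codec_type") == "video"]

    if any(s.get("codec_type") == "audio" for s in streams):
        problems.append("audio track present")

    if len(videos) != 1:
        problems.append("expected exactly one video stream")
    else:
        v = videos[0]
        if v.get("codec_name") != "h264":
            problems.append(f"codec is {v.get('codec_name')}, not h264")
        if v.get("pix_fmt") != "yuv420p":
            problems.append(f"pixel format is {v.get('pix_fmt')}, not yuv420p")
        if (v.get("width"), v.get("height")) != (800, 800):
            problems.append("resolution is not 800x800")

    fmt = probe.get("format", {})
    # ffprobe reports duration and size as strings; treat missing values as failures
    if float(fmt.get("duration", "inf")) > MAX_SECONDS:
        problems.append("longer than 10 seconds or duration unknown")
    if int(fmt.get("size", MAX_BYTES)) >= MAX_BYTES:
        problems.append("2MB or larger, or size unknown")
    return problems
```

An empty list means the upload could be passed through untouched. Anything else goes down the normal re-encode path, and the reasons are handy to log.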

If you've got an iPhone clip that won't stick as a TG avatar, throw it at the bot and see what comes back: https://t.me/LiveAvaBot?start=devto_article_20260526

Built by me, @liveavabot.
