DEV Community

liveavabot

Converting iPhone HEVC Video to a Telegram Video Avatar With FFmpeg

The Problem: iPhone Videos Silently Fail as Telegram Avatars

You record a clip on your iPhone, open Telegram, tap your profile picture, pick "Set Video", select the clip. Telegram chews on it for a second, then nothing happens. No error. No upload. Just silence.

I hit this myself last winter. The clip looked fine in the Photos app. Played fine on the Mac. But Telegram refused to accept it as a video avatar. After a lot of poking, the answer turned out to be boring: iPhone records video in HEVC (H.265) inside a .mov container, and Telegram's video-avatar slot only accepts H.264 in a specific shape. The official iOS Telegram client transcodes for you. The desktop client and most third-party uploaders do not. So the file gets rejected with no feedback.

That is the bug I built @liveavabot to fix.

What Telegram Actually Wants From a Video Avatar

The Telegram video-avatar slot (setProfilePhoto with video) is fussy. After reading the API docs and breaking things several times, here is what actually works:

  • Codec: H.264 (libx264), not HEVC.
  • Container: .mp4 with +faststart so the moov atom is at the front.
  • Pixel format: yuv420p. Anything else and clients downstream get unhappy.
  • Resolution: square, ideally 800x800. 640x640 works too but 800 is the sweet spot.
  • Duration: 10 seconds maximum, hard cap.
  • Audio: stripped. Avatars are silent.
  • File size: under ~2 MB for clean upload on slow connections.

If any one of these is off, Telegram rejects the upload silently or shows a generic error. The HEVC issue is the most common because HEVC has been the iPhone default since iOS 11.
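One way to catch a bad file before Telegram swallows it silently is to check the output against this checklist with ffprobe. The sketch below is an illustration, not part of the bot: probe and avatar_problems are hypothetical helper names, the limits are the ones listed above, and the audio and faststart items are not covered.

```python
import json
import subprocess
from pathlib import Path
from typing import List

MAX_DURATION_S = 10.0   # hard cap from the checklist
MAX_BYTES = 2_000_000   # the article's soft target, not a documented limit


def probe(path: Path) -> dict:
    """Ask ffprobe for the first video stream and container info as JSON."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries",
            "stream=codec_name,pix_fmt,width,height:format=duration,size",
            "-of", "json",
            str(path),
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def avatar_problems(info: dict) -> List[str]:
    """Return human-readable violations; an empty list means the file passes."""
    stream = info["streams"][0]
    fmt = info["format"]
    problems = []
    if stream.get("codec_name") != "h264":
        problems.append(f"codec is {stream.get('codec_name')}, expected h264")
    if stream.get("pix_fmt") != "yuv420p":
        problems.append(f"pixel format is {stream.get('pix_fmt')}, expected yuv420p")
    if stream.get("width") != stream.get("height"):
        problems.append(f"frame is {stream.get('width')}x{stream.get('height')}, not square")
    # ffprobe reports duration and size as strings in its JSON output.
    if float(fmt.get("duration", 0)) > MAX_DURATION_S:
        problems.append("longer than 10 seconds")
    if int(fmt.get("size", 0)) > MAX_BYTES:
        problems.append("larger than 2 MB")
    return problems
```

Calling avatar_problems(probe(Path("output.mp4"))) on a converted clip should return an empty list. Run on an untouched iPhone .mov, it typically flags the codec first.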

Using FFmpeg cropdetect to Find the Square

Phone videos are 9:16 or 16:9. A naive scale=800:800 stretches the image into a squashed mess. I needed a smart crop that finds the meaningful part of each frame.

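FFmpeg has cropdetect for the first half of this. It reports the bounding box of the non-black picture area, so it is good at stripping letterbox and pillarbox bars, but it does not choose a subject or force a square. If the detected box is not already square, it still has to be trimmed to 1:1 before scaling, or the stretch comes back. You run it in analysis mode first, parse the output for the suggested crop box, then use that crop in the real encode. Two passes, but cheap on small clips.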

ffmpeg -i input.mov -vf "cropdetect=24:16:0" -f null - 2>&1 \
  | grep -oE 'crop=[0-9:]+' | tail -n 1

That prints something like crop=720:720:0:480. Then you apply it, scale to 800x800, and encode:

ffmpeg -i input.mov \
  -vf "crop=720:720:0:480,scale=800:800,format=yuv420p" \
  -c:v libx264 -preset veryfast -crf 28 \
  -movflags +faststart \
  -t 10 -an \
  output.mp4

The flags worth knowing: -t 10 caps the output at 10 seconds, -an drops audio, and -movflags +faststart moves the moov atom to the start of the file so Telegram can begin validating the upload without reading the whole file. -crf 28 is aggressive but keeps most clips under 2 MB. Bump it to 30 if you need smaller files, at the cost of more visible compression.
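The crop passed to the encode only avoids stretching if it is square. When cropdetect returns a non-square box (for example the full 1080x1920 frame of a portrait clip with no black bars), the box can be shrunk to the largest square inside it first. This is a minimal sketch under that assumption; square_crop and its anchor parameter are hypothetical names, not something the bot is known to use.

```python
def square_crop(box: str, anchor: str = "center") -> str:
    """Shrink a cropdetect box "w:h:x:y" to the largest square inside it.

    anchor="center" keeps the middle of the longer side.
    anchor="start" keeps the top edge (portrait) or left edge (landscape).
    """
    w, h, x, y = (int(part) for part in box.split(":"))
    side = min(w, h)
    side -= side % 2  # keep the side even, which 4:2:0 video prefers
    if anchor == "center":
        x += (w - side) // 2
        y += (h - side) // 2
    elif anchor != "start":
        raise ValueError(f"unknown anchor: {anchor!r}")
    return f"{side}:{side}:{x}:{y}"
```

The result drops straight into the filter chain, as in crop=1080:1080:0:420,scale=800:800,format=yuv420p. A box that is already square passes through unchanged.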

Wrapping FFmpeg in Python

I shell out to ffmpeg with asyncio.create_subprocess_exec. No PyAV, no MoviePy. Both add weight and neither gives me anything the CLI does not.

import asyncio
import re
from pathlib import Path

CROPDETECT_RE = re.compile(r"crop=(\d+:\d+:\d+:\d+)")

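async def detect_crop(src: Path) -> str:
    """Run cropdetect over the clip and return the last suggested w:h:x:y box."""
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", "-i", str(src),
        "-vf", "cropdetect=24:16:0",
        "-f", "null", "-",
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,  # cropdetect logs its results to stderr
    )
    _, stderr = await proc.communicate()
    matches = CROPDETECT_RE.findall(stderr.decode("utf-8", "ignore"))
    # Fall back to the full frame if cropdetect reported nothing.
    return matches[-1] if matches else "in_w:in_h:0:0"

async def to_avatar(src: Path, dst: Path) -> None:
    """Crop, scale to 800x800, and encode src as a silent H.264 MP4 at dst."""
    crop = await detect_crop(src)
    vf = f"crop={crop},scale=800:800,format=yuv420p"
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", "-y", "-i", str(src),
        "-vf", vf,
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
        "-movflags", "+faststart",
        "-t", "10", "-an",
        str(dst),
    )
    # Fail loudly instead of leaving a missing or truncated output file.
    if await proc.wait() != 0:
        raise RuntimeError(f"ffmpeg exited with code {proc.returncode}")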

Two passes, two subprocesses. Total runtime on a 5 second iPhone HEVC clip: about 1.2 seconds on a cheap VPS. Good enough.
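With two subprocesses per request, a hung ffmpeg would block the handler indefinitely. A minimal sketch of a shared runner with a timeout follows; run_checked and the 60 second default are my own illustrative choices, not taken from the bot.

```python
import asyncio


async def run_checked(*cmd: str, timeout: float = 60.0) -> bytes:
    """Run a command, return its stderr, and raise on non-zero exit or timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        _, stderr = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # The process is still running after the timeout: kill it and reap it.
        proc.kill()
        await proc.wait()
        raise
    if proc.returncode != 0:
        tail = stderr.decode("utf-8", "ignore")[-500:]
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}: {tail}")
    return stderr
```

Both passes could go through it: the cropdetect pass reads the returned stderr, and the encode pass only cares that no exception was raised.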

The aiogram 3 Handler

The bot side is short. aiogram 3 makes this almost boring, which is the whole point.

import shutil
import tempfile
import uuid
from pathlib import Path

from aiogram import F, Router
from aiogram.types import FSInputFile, Message

# to_avatar is the coroutine from the previous section.

router = Router()

@router.message(F.video | F.animation | F.document)
async def handle_video(message: Message) -> None:
    file = message.video or message.animation or message.document
    if not file:
        return

    # One scratch directory per request, removed in the finally block below.
    work = Path(tempfile.mkdtemp())
    src = work / f"in_{uuid.uuid4().hex}.bin"
    dst = work / f"out_{uuid.uuid4().hex}.mp4"

    try:
        tg_file = await message.bot.get_file(file.file_id)
        await message.bot.download_file(tg_file.file_path, destination=src)

        await to_avatar(src, dst)

        if dst.stat().st_size > 2_000_000:
            await message.answer("Output is over 2 MB. Try a shorter clip.")
            return

        await message.answer_video(
            FSInputFile(dst),
            caption="Ready. Save this, then set it as your profile video.",
        )
    finally:
        # Clean up on every exit path, including the early return and errors.
        shutil.rmtree(work, ignore_errors=True)

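The F.video | F.animation | F.document filter catches the three ways Telegram delivers video: native video, GIF (animation), or raw document upload (which is how most iPhone clips arrive from desktop). One handler, three sources. The catch is that F.document matches every document, not only video files, so PDFs and images sent as files reach this handler too.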
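A cheap pre-check before downloading can turn away documents that are obviously not video. This is a minimal sketch, assuming the MIME type and file name Telegram attaches to a document are good enough for a first pass; looks_like_video and the suffix list are illustrative, not from the bot.

```python
from pathlib import PurePath
from typing import Optional

# Illustrative list; extend it to whatever containers you want to accept.
VIDEO_SUFFIXES = {".mov", ".mp4", ".m4v", ".webm", ".mkv"}


def looks_like_video(mime_type: Optional[str], file_name: Optional[str]) -> bool:
    """Return True if the document metadata hints at a video file."""
    if mime_type and mime_type.startswith("video/"):
        return True
    if file_name and PurePath(file_name).suffix.lower() in VIDEO_SUFFIXES:
        return True
    return False
```

In the handler this would sit right after the file lookup, for example: if message.document and not looks_like_video(message.document.mime_type, message.document.file_name), reply with a short hint and return. The metadata is only a hint; ffmpeg still has the final say on whether the file decodes.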

Packaging It as @liveavabot

The whole thing lives on a $4 Hetzner VPS. Bot framework: aiogram 3. Queue: just asyncio, no Redis, no Celery. Storage: tempfile, deleted after send. Stars billing wired in through Telegram's native invoice flow, no Stripe.
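"Just asyncio" still needs some cap on how many ffmpeg processes run at once on a small VPS. One common way to get that, shown here as an assumption rather than the bot's actual queue, is an asyncio.Semaphore around each job. run_limited and the limit of 2 in the usage note are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_limited(sem: asyncio.Semaphore, job: Callable[[], Awaitable[T]]) -> T:
    """Run job() once a slot is free; excess callers wait their turn."""
    async with sem:
        return await job()
```

Create the semaphore once at startup, inside the running event loop, for example sem = asyncio.Semaphore(2), and wrap the conversion call as await run_limited(sem, lambda: to_avatar(src, dst)). Requests beyond the limit simply wait, which is all the queueing a single small box needs.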

You can try it here: https://t.me/LiveAvaBot?start=devto_article_20260518. Send a video, get back a square clip that actually works as your profile video.

Lessons and Edge Cases

A few things that bit me along the way.

Portrait clips often need a top-anchored crop, not center. cropdetect handles this if you let it look at enough frames. A 2-second analysis window is too short for some clips, so I bumped it to 4 seconds.
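One way to bound the analysis window, sketched here as an assumption about how it could be done rather than how the bot does it, is to pass -t as an input option so ffmpeg only reads the first few seconds for the cropdetect pass. The helper names are hypothetical.

```python
import re
from typing import List, Optional

CROP_RE = re.compile(r"crop=(\d+:\d+:\d+:\d+)")


def cropdetect_args(src: str, window_s: float = 4.0) -> List[str]:
    """Build an analysis-only ffmpeg command limited to the first window_s seconds."""
    return [
        "ffmpeg",
        "-t", str(window_s),  # placed before -i, so it limits how much input is read
        "-i", src,
        "-vf", "cropdetect=24:16:0",
        "-f", "null", "-",
    ]


def last_crop(stderr_text: str) -> Optional[str]:
    """Return the final crop box cropdetect logged, or None if it logged none."""
    matches = CROP_RE.findall(stderr_text)
    return matches[-1] if matches else None
```

The argument list can be handed to asyncio.create_subprocess_exec as *cropdetect_args(str(src)), and the captured stderr goes through last_crop.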

4K input is slow on a $4 VPS. I added an upfront scale=-2:1080 downscale before cropdetect when the source is huge. Quality stays fine because the output is 800x800 anyway.
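One detail matters when downscaling before cropdetect: the crop box it reports is in the coordinates of the downscaled frame, so the encode pass has to apply the same scale before the crop. A sketch of keeping the two filter chains in sync follows; the function names are hypothetical and the source height is assumed to come from a prior probe.

```python
from typing import List

CROPDETECT = "cropdetect=24:16:0"
MAX_HEIGHT = 1080  # downscale threshold from the paragraph above


def prescale(src_height: int) -> List[str]:
    """Return the downscale step for oversized sources, or nothing."""
    return [f"scale=-2:{MAX_HEIGHT}"] if src_height > MAX_HEIGHT else []


def analysis_chain(src_height: int) -> str:
    """Filter chain for the cropdetect pass."""
    return ",".join(prescale(src_height) + [CROPDETECT])


def encode_chain(src_height: int, crop: str) -> str:
    """Filter chain for the encode pass, using the same prescale as the analysis."""
    steps = prescale(src_height) + [f"crop={crop}", "scale=800:800", "format=yuv420p"]
    return ",".join(steps)
```

For a portrait 4K clip (2160x3840), scale=-2:1080 leaves a frame only about 600 pixels wide, narrower than the 800x800 target, so capping the shorter side instead may be worth testing.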

Some users send .mov files renamed to .mp4. Do not trust the extension. ffmpeg sniffs the container correctly, so just let it.

If the source is shorter than 1 second, Telegram rejects the avatar as too short. I now refuse anything under 1 second on the bot side with a clear error message instead of letting ffmpeg produce a broken file.
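A length check like this can run before any encoding. The sketch below reads the container duration with ffprobe and turns a too-short clip into a message. probe_duration and duration_error are hypothetical names, and the wording of the message is mine.

```python
import subprocess
from typing import Optional

MIN_DURATION_S = 1.0  # threshold from the paragraph above


def probe_duration(path: str) -> float:
    """Read the container duration in seconds using ffprobe."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    # Raises ValueError if ffprobe could not determine a duration.
    return float(out.strip())


def duration_error(seconds: float) -> Optional[str]:
    """Return a user-facing message if the clip is too short, else None."""
    if seconds < MIN_DURATION_S:
        return f"Clip is {seconds:.1f} s long. Send at least {MIN_DURATION_S:.0f} second of video."
    return None
```

Clips longer than 10 seconds need no error here, since -t 10 already trims them during the encode.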

Next on the list: HDR tone-mapping for newer iPhone clips, which currently come out flat after the H.264 conversion. ffmpeg has a zscale filter for this but it needs a custom build on Debian, which I have been putting off.
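For reference, a commonly circulated zscale plus tonemap chain for HDR to SDR conversion looks like the sketch below. It is untested against the clips described here, it requires an ffmpeg build with zscale (libzimg) available, and the hable operator and npl=100 value are assumptions to tune rather than recommendations. avatar_filters is a hypothetical helper.

```python
# HDR to SDR: linearize, convert primaries to BT.709, tone-map, then
# return to a BT.709 transfer and matrix at TV range.
HDR_TO_SDR = (
    "zscale=t=linear:npl=100,"
    "format=gbrpf32le,"
    "zscale=p=bt709,"
    "tonemap=tonemap=hable:desat=0,"
    "zscale=t=bt709:m=bt709:r=tv"
)


def avatar_filters(crop: str, hdr: bool) -> str:
    """Build the encode filter chain, tone-mapping first when the source is HDR."""
    steps = [HDR_TO_SDR] if hdr else []
    steps += [f"crop={crop}", "scale=800:800", "format=yuv420p"]
    return ",".join(steps)
```

Deciding whether a clip is HDR at all is a separate step, typically by reading the stream's color transfer with ffprobe, and is left out of this sketch.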

Built by me, @liveavabot. Code is messier than the snippets here but the shape is the same.
