DEV Community

Alex Shev

How I Made AI Video Uploads Boring with a Terminal Skill

AI video demos are getting easier to generate.

Publishing them is still weirdly fragile.

Last week I had a simple task: take an AI-generated demo video, attach it to a post, and publish it without the platform silently dropping the media.

The video looked fine locally.

The upload did not.

Wrong codec. Too large. Slow processing. Missing preview. No clear error. The browser said the upload was okay, but the composer did not actually register the file.

That is the kind of problem I do not want to solve from memory at 11 PM.

So I turned it into a Terminal Skill.

Not a giant automation platform. Not a magic agent prompt. Just a small repeatable workflow that takes a messy video file and produces an X-safe default MP4 with checks before and after the conversion.

Here is the use case.


The actual problem

AI video tools often export files that are technically valid but awkward for social platforms.

Common issues:

  • file is too large
  • codec is accepted by QuickTime but not by the platform
  • pixel format is not yuv420p
  • MP4 index (the moov atom) sits at the end of the file, which delays web playback
  • resolution is higher than needed
  • audio track is missing or weird
  • video plays fine locally but platform processing hangs

You can fix most of this with FFmpeg.

But knowing that FFmpeg exists is not the hard part.

The problem is remembering the exact flags, running them consistently, checking the output, and not trusting the browser upload until the platform shows a real preview.

That is where a Terminal Skill helps.


What I mean by a Terminal Skill

For this workflow, a Terminal Skill is a small folder with:

  • one script that does the conversion
  • one validation step before conversion
  • one validation step after conversion
  • a short SKILL.md explaining when to use it
  • predictable input and output paths
  • logs that make it obvious what happened

The important part is not the script itself.

The important part is that the workflow becomes reusable by a human or an agent without rediscovering the rules every time.

The skill answers:

When should I use this?
What input does it expect?
What output should it produce?
How do I know it worked?
When should I stop instead of publishing?

That last question matters a lot.

For external platforms, "the command succeeded" is not the same as "the post is safe to publish."


The folder structure

I kept the structure boring:

x-safe-video/
  SKILL.md
  make-x-safe.sh
  examples/
  output/

The script does the mechanical work.

The SKILL.md documents the operating rules:

# X-Safe Video

Use this when preparing generated video for X/Twitter upload.

Input:
- MP4, MOV, or WebM
- preferably under 2 minutes

Output:
- MP4
- H.264 video
- AAC audio if audio exists
- yuv420p pixel format
- faststart metadata
- scaled down if needed

Stop if:
- ffprobe cannot read the file
- output has no video stream
- output is larger than the target platform limit
- upload composer does not show a real video preview
- platform does not show Uploaded 100%

That is the difference between a script and a skill.

A script says:

Run this command.

A skill says:

Here is the workflow, the boundary, and the definition of done.

The conversion script

Here is the simplified version.

#!/usr/bin/env bash
set -euo pipefail

INPUT="${1:?Usage: ./make-x-safe.sh input-video}"
OUTDIR="${OUTDIR:-./output}"
mkdir -p "$OUTDIR"

if [[ ! -f "$INPUT" ]]; then
  echo "Input file not found: $INPUT" >&2
  exit 1
fi

BASENAME="$(basename "$INPUT")"
NAME="${BASENAME%.*}"
OUTPUT="$OUTDIR/${NAME}_x_safe.mp4"

echo "Inspecting input..."
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=codec_name,width,height,pix_fmt,duration \
  -of default=noprint_wrappers=1 \
  "$INPUT"

echo "Converting to X-safe MP4..."
ffmpeg -y -i "$INPUT" \
  -vf "scale='min(1280,iw)':-2" \
  -c:v libx264 \
  -profile:v high \
  -pix_fmt yuv420p \
  -preset medium \
  -crf 23 \
  -movflags +faststart \
  -c:a aac \
  -b:a 128k \
  "$OUTPUT"

echo "Inspecting output..."
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=codec_name,width,height,pix_fmt,duration \
  -of default=noprint_wrappers=1 \
  "$OUTPUT"

BYTES=$(wc -c < "$OUTPUT")
MB=$((BYTES / 1024 / 1024))

echo "Output: $OUTPUT"
echo "Size: ${MB}MB"

This is not the most advanced FFmpeg command in the world.

That is the point.

The goal is not to make the cleverest media pipeline. The goal is to make a reliable default that works under pressure.
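One of the stop conditions in SKILL.md is about output size, which the script above only reports. Here is a minimal sketch of turning that report into a hard stop. The `check_size` helper and the `MAX_MB` variable are hypothetical names, not part of the original script, and the real limit depends on the platform and account, so it has to be looked up rather than assumed.

```shell
# Hypothetical helper: fail when a file is larger than a limit given in MB.
# The limit is an assumption supplied by the caller, not a documented X value.
check_size() {
  local file="$1" max_mb="$2"
  local bytes mb
  bytes=$(wc -c < "$file")
  # Round up so a file slightly over the limit is never reported as under it.
  mb=$(( (bytes + 1048575) / 1048576 ))
  if [ "$mb" -gt "$max_mb" ]; then
    echo "STOP: ${mb}MB exceeds the ${max_mb}MB limit" >&2
    return 1
  fi
  echo "Size OK: ${mb}MB"
}

# Possible use at the end of make-x-safe.sh (MAX_MB is a placeholder):
# check_size "$OUTPUT" "$MAX_MB"
```

Because the script runs with `set -e`, a non-zero return from the helper would end the run before anyone gets as far as the upload step.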


Why these flags matter

The important pieces:

-c:v libx264

H.264 is still the safest default for social uploads.

-pix_fmt yuv420p

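This avoids the classic "works locally, fails elsewhere" problem: other pixel formats, such as yuv444p or 10-bit variants, play in some desktop players but are rejected or mishandled by many browsers and platforms.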

-movflags +faststart

This moves the MP4 index (the moov atom) to the beginning of the file so web playback can start before the whole file has downloaded.

-vf "scale='min(1280,iw)':-2"

This keeps smaller videos unchanged and scales oversized ones down to a practical width.
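The scale expression is easier to trust once the arithmetic is spelled out. The sketch below approximates what the filter computes: `min(1280,iw)` caps the width without ever upscaling, and `-2` picks the height that preserves the aspect ratio, rounded to an even number as yuv420p requires. `scaled_dims` is a hypothetical helper written for illustration, not an FFmpeg command, and its rounding is my reading of the filter's behavior rather than a guarantee.

```shell
# Hypothetical helper that mimics scale='min(1280,iw)':-2 for a given input size.
scaled_dims() {
  local in_w="$1" in_h="$2" max_w=1280
  local out_w out_h
  # min(1280, iw): cap the width at 1280, never upscale.
  if [ "$in_w" -gt "$max_w" ]; then out_w="$max_w"; else out_w="$in_w"; fi
  # -2: keep the aspect ratio, then round the height to the nearest even number.
  out_h=$(( (out_w * in_h + in_w) / (2 * in_w) * 2 ))
  echo "${out_w}x${out_h}"
}

scaled_dims 1920 1080   # prints 1280x720
scaled_dims 640 360     # prints 640x360
```

One consequence worth knowing: only the width is capped, so a portrait 1080x1920 clip passes through at full size.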

-crf 23

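Good enough quality without creating a monster file. CRF is a constant-quality setting: lower values mean higher quality and larger files, and 23 is the libx264 default.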

Could I tune this per video? Yes.

Do I want to think about that every time I need to publish a 20-second AI demo? No.


The verification step is the real skill

The conversion is only half the workflow.

The platform check matters more.

My rule now:

Do not trust "upload succeeded."
Trust only the composer state.

For an AI video post, I want to see:

  • video preview visible
  • upload progress completed
  • platform disclosure selected if needed
  • post button enabled
  • final published post shows the video

If the browser automation says the file was uploaded but the composer does not show the video, the workflow stops.

That sounds obvious, but this is where a lot of automation breaks.

The automation checks the API call or the file input state, not the actual user-facing publishing state.

The skill's job is to keep that distinction explicit.


Making it agent-friendly

The next step is making the workflow easy for an AI agent to use.

That means the skill needs plain instructions:

## Agent Instructions

1. Run `./make-x-safe.sh <video>`.
2. Read the output path from stdout.
3. Confirm `ffprobe` shows:
   - codec_name=h264
   - pix_fmt=yuv420p
4. Check file size.
5. Upload through the real platform composer.
6. Verify visual preview before posting.
7. After posting, open the final URL and verify the video is embedded.

Notice what is missing:

No vague "make it work."

No "post when ready."

No giant prompt with 40 edge cases.

The skill gives the agent a small operating procedure with a clear stop condition.

That is much easier to trust.
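Step 3 of the agent instructions can also be made mechanical, so the agent reads an exit code instead of interpreting ffprobe text. This is a rough sketch, assuming a hypothetical `check_probe` helper that receives the `key=value` lines ffprobe prints with `-of default=noprint_wrappers=1`:

```shell
# Hypothetical helper: reads ffprobe key=value lines on stdin and fails
# unless the video stream is H.264 with the yuv420p pixel format.
# Empty input (no video stream found) fails as well.
check_probe() {
  local probe
  probe="$(cat)"
  if ! printf '%s\n' "$probe" | grep -qx 'codec_name=h264'; then
    echo "STOP: video codec is not h264" >&2
    return 1
  fi
  if ! printf '%s\n' "$probe" | grep -qx 'pix_fmt=yuv420p'; then
    echo "STOP: pixel format is not yuv420p" >&2
    return 1
  fi
  echo "OK: h264 / yuv420p"
}

# Possible use after conversion:
# ffprobe -v error -select_streams v:0 \
#   -show_entries stream=codec_name,pix_fmt \
#   -of default=noprint_wrappers=1 "$OUTPUT" | check_probe
```

This only covers the local half of the procedure. The composer preview and the final published post still have to be checked on the platform itself.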


What changed

Before the skill:

Generate video.
Try upload.
Watch it fail.
Search old commands.
Re-encode.
Try again.
Hope the preview appears.

After the skill:

Generate video.
Run one command.
Check output.
Upload.
Verify preview.
Publish.
Verify final post.

The workflow is not glamorous.

It is better than glamorous: it is boring.

And boring is what I want from production media prep.


The bigger lesson

This is why I keep coming back to Terminal Skills.

A useful AI workflow is usually not one huge autonomous agent.

It is a set of small, documented, reusable capabilities:

  • prepare a video for upload
  • inspect a repo
  • generate thumbnails
  • validate a draft
  • resize images
  • check links
  • publish only after approval

Each skill removes one fragile piece of manual memory.

Each skill gives the agent a narrower job.

Each skill creates a cleaner definition of done.

That is the part I think a lot of AI tooling conversations miss.

The future is not just "agents can use tools."

The useful version is:

Agents can use well-defined skills with clear boundaries.

That is how the work becomes repeatable.

And for AI video uploads, repeatable beats clever every time.


If you build with AI agents, what is the workflow you keep fixing manually?

That is probably your next Terminal Skill.

Disclosure: I used AI assistance to draft and edit this article, then reviewed the workflow, commands, and claims before publishing.
