AI video demos are getting easier to generate.
Publishing them is still weirdly fragile.
Last week I had a simple task: take an AI-generated demo video, attach it to a post, and publish it without the platform silently dropping the media.
The video looked fine locally.
The upload did not.
Wrong codec. Too large. Slow processing. Missing preview. No clear error. The browser said the upload was okay, but the composer did not actually register the file.
That is the kind of problem I do not want to solve from memory at 11 PM.
So I turned it into a Terminal Skill.
Not a giant automation platform. Not a magic agent prompt. Just a small repeatable workflow that takes a messy video file and produces an X-safe default MP4 with checks before and after the conversion.
Here is the use case.
The actual problem
AI video tools often export files that are technically valid but awkward for social platforms.
Common issues:
- file is too large
- codec is accepted by QuickTime but not by the platform
- pixel format is not yuv420p
- metadata is not optimized for web playback
- resolution is higher than needed
- audio track is missing or weird
- duration is fine visually but platform processing hangs
You can fix most of this with FFmpeg.
But knowing that FFmpeg exists is not the problem.
The problem is remembering the exact flags, running them consistently, checking the output, and not trusting the browser upload until the platform shows a real preview.
That is where a Terminal Skill helps.
What I mean by a Terminal Skill
For this workflow, a Terminal Skill is a small folder with:
- one script that does the conversion
- one validation step before conversion
- one validation step after conversion
- a short SKILL.md explaining when to use it
- predictable input and output paths
- logs that make it obvious what happened
The important part is not the script itself.
The important part is that the workflow becomes reusable by a human or an agent without rediscovering the rules every time.
The skill answers:
When should I use this?
What input does it expect?
What output should it produce?
How do I know it worked?
When should I stop instead of publishing?
That last question matters a lot.
For external platforms, "the command succeeded" is not the same as "the post is safe to publish."
The folder structure
I kept the structure boring:
x-safe-video/
  SKILL.md
  make-x-safe.sh
  examples/
  output/
The script does the mechanical work.
The SKILL.md documents the operating rules:
# X-Safe Video
Use this when preparing generated video for X/Twitter upload.
Input:
- MP4, MOV, or WebM
- preferably under 2 minutes
Output:
- MP4
- H.264 video
- AAC audio if audio exists
- yuv420p pixel format
- faststart metadata
- scaled down if needed
Stop if:
- ffprobe cannot read the file
- output has no video stream
- output is larger than the target platform limit
- upload composer does not show a real video preview
- platform does not show Uploaded 100%
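The size stop condition above can be turned into a hard check. This is a minimal sketch, not part of the original skill: the function name `check_size_limit` is hypothetical, and the limit is passed in by the caller because the actual platform limit is not stated here and should be looked up for your account type.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file exists and is within
# the byte limit the caller supplies. The limit itself is an assumption
# you must fill in from the platform's current documentation.
check_size_limit() {
  local file="$1" limit="$2" bytes
  if [ ! -f "$file" ]; then
    echo "STOP: file not found: $file" >&2
    return 1
  fi
  # Arithmetic expansion strips the padding some wc builds add.
  bytes=$(( $(wc -c < "$file") ))
  if [ "$bytes" -gt "$limit" ]; then
    echo "STOP: $file is $bytes bytes, limit is $limit" >&2
    return 1
  fi
  echo "OK: $file is $bytes bytes (limit $limit)"
}

# Example call, with MAX_BYTES set by you:
#   check_size_limit "./output/demo_x_safe.mp4" "$MAX_BYTES"
```

A non-zero exit here gives a human or an agent an unambiguous reason to stop before opening the composer.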
That is the difference between a script and a skill.
A script says:
Run this command.
A skill says:
Here is the workflow, the boundary, and the definition of done.
The conversion script
Here is the simplified version.
#!/usr/bin/env bash
set -euo pipefail

INPUT="${1:?Usage: ./make-x-safe.sh input-video}"
OUTDIR="${OUTDIR:-./output}"
mkdir -p "$OUTDIR"

if [[ ! -f "$INPUT" ]]; then
  echo "Input file not found: $INPUT" >&2
  exit 1
fi

# Derive the output name from the input name, minus its extension.
BASENAME="$(basename "$INPUT")"
NAME="${BASENAME%.*}"
OUTPUT="$OUTDIR/${NAME}_x_safe.mp4"

echo "Inspecting input..."
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=codec_name,width,height,pix_fmt,duration \
  -of default=noprint_wrappers=1 \
  "$INPUT"

echo "Converting to X-safe MP4..."
ffmpeg -y -i "$INPUT" \
  -vf "scale='min(1280,iw)':-2" \
  -c:v libx264 \
  -profile:v high \
  -pix_fmt yuv420p \
  -preset medium \
  -crf 23 \
  -movflags +faststart \
  -c:a aac \
  -b:a 128k \
  "$OUTPUT"

echo "Inspecting output..."
ffprobe -v error \
  -select_streams v:0 \
  -show_entries stream=codec_name,width,height,pix_fmt,duration \
  -of default=noprint_wrappers=1 \
  "$OUTPUT"

# Report the size in whole megabytes (integer division rounds down).
BYTES=$(wc -c < "$OUTPUT")
MB=$((BYTES / 1024 / 1024))
echo "Output: $OUTPUT"
echo "Size: ${MB}MB"
This is not the most advanced FFmpeg command in the world.
That is the point.
The goal is not to make the cleverest media pipeline. The goal is to make a reliable default that works under pressure.
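The script prints the output stream fields but does not fail if they are wrong. One way to close that gap is a small check that reads the same `key=value` lines ffprobe emits with `-of default=noprint_wrappers=1`. This is a sketch under assumptions: `verify_probe_fields` is a hypothetical name, and the expected values are taken from the SKILL.md output rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads ffprobe key=value lines on stdin and fails
# unless the video stream is H.264 with the yuv420p pixel format.
verify_probe_fields() {
  local fields want status=0
  fields="$(cat)"
  if [ -z "$fields" ]; then
    # ffprobe printed nothing for v:0, so there is no video stream.
    echo "STOP: no video stream reported" >&2
    return 1
  fi
  for want in codec_name=h264 pix_fmt=yuv420p; do
    if ! printf '%s\n' "$fields" | grep -qx "$want"; then
      echo "STOP: expected $want" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call against a converted file:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,pix_fmt \
#     -of default=noprint_wrappers=1 "$OUTPUT" | verify_probe_fields
```

Matching whole lines with `grep -x` avoids accepting a value that merely contains the expected text.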
Why these flags matter
The important pieces:
-c:v libx264
H.264 is still the safest default for social uploads.
-pix_fmt yuv420p
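This avoids the classic "works locally, fails elsewhere" problem: some tools export 4:2:2 or 4:4:4 pixel formats that desktop players decode fine but many browsers and upload pipelines reject or mishandle.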
-movflags +faststart
This moves the index metadata (the moov atom) to the beginning of the file so web playback can start before the whole file has downloaded.
-vf "scale='min(1280,iw)':-2"
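This caps the width at 1280 pixels: narrower videos keep their width, wider ones are scaled down, and -2 picks a height that preserves the aspect ratio and stays even, which libx264 needs for yuv420p.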
-crf 23
Good enough quality without creating a monster file. 23 is the libx264 default; lower values mean higher quality and larger files.
Could I tune this per video? Yes.
Do I want to think about that every time I need to publish a 20-second AI demo? No.
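To make the scale expression concrete, here is a small sketch that estimates the dimensions it produces. It assumes FFmpeg rounds the derived height to the nearest multiple of 2, so treat the result as an estimate rather than a guarantee. The function name `scaled_dims` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimates the result of scale='min(1280,iw)':-2
# for an input of width $1 and height $2.
scaled_dims() {
  local iw="$1" ih="$2" max_w=1280 ow oh
  # min(1280, iw): never upscale, only cap the width.
  if [ "$iw" -gt "$max_w" ]; then ow="$max_w"; else ow="$iw"; fi
  # Nearest integer to (ow * ih) / (2 * iw), doubled so the height is even.
  oh=$(( ((ow * ih + iw) / (2 * iw)) * 2 ))
  echo "${ow}x${oh}"
}

scaled_dims 1920 1080   # prints 1280x720
scaled_dims 640 360     # prints 640x360
scaled_dims 1080 1920   # prints 1080x1920
```

The last case is worth noticing: the cap applies to width only, so a 1080x1920 portrait video passes through at full size.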
The verification step is the real skill
The conversion is only half the workflow.
The platform check matters more.
My rule now:
Do not trust "upload succeeded."
Trust only the composer state.
For an AI video post, I want to see:
- video preview visible
- upload progress completed
- platform disclosure selected if needed
- post button enabled
- final published post shows the video
If the browser automation says the file was uploaded but the composer does not show the video, the workflow stops.
That sounds obvious, but this is where a lot of automation breaks.
The automation checks the API call or the file input state, not the actual user-facing publishing state.
The skill's job is to keep that distinction explicit.
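One way to keep the distinction explicit is to have whoever looked at the composer report each state, and refuse to continue unless all of them are confirmed. The sketch below is purely illustrative: the function name and the state names are hypothetical, and how each state is observed (by eye, screenshot, or DOM check) is left open.

```shell
#!/usr/bin/env bash
# Hypothetical gate: reads key=value lines on stdin, for example
# "preview_visible=yes", and fails unless every required state is "yes".
publish_gate() {
  local observed state status=0
  observed="$(cat)"
  for state in preview_visible upload_complete post_button_enabled; do
    if ! printf '%s\n' "$observed" | grep -qx "${state}=yes"; then
      echo "STOP: ${state} not confirmed" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call:
#   printf 'preview_visible=yes\nupload_complete=yes\npost_button_enabled=yes\n' \
#     | publish_gate && echo "Safe to publish"
```

A missing state counts as a failure, so "the upload call returned OK" on its own never passes the gate.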
Making it agent-friendly
The next step is making the workflow easy for an AI agent to use.
That means the skill needs plain instructions:
## Agent Instructions
1. Run `./make-x-safe.sh <video>`.
2. Read the output path from stdout.
3. Confirm `ffprobe` shows:
   - codec_name=h264
   - pix_fmt=yuv420p
4. Check file size.
5. Upload through the real platform composer.
6. Verify visual preview before posting.
7. After posting, open the final URL and verify the video is embedded.
Notice what is missing:
No vague "make it work."
No "post when ready."
No giant prompt with 40 edge cases.
The skill gives the agent a small operating procedure with a clear stop condition.
That is much easier to trust.
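Step 2 of the agent instructions, reading the output path from stdout, can be sketched as a small filter. It relies on the `Output: ` line the script prints. The helper name `extract_output_path` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pulls the path out of the "Output: ..." line
# printed by make-x-safe.sh. Takes the last match in case an earlier
# line happens to start with the same prefix.
extract_output_path() {
  sed -n 's/^Output: //p' | tail -n 1
}

# Example call:
#   OUT_PATH="$(./make-x-safe.sh demo.mov | extract_output_path)"
```

FFmpeg writes its log to stderr, so the captured stdout holds only the script's own messages and the ffprobe fields.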
What changed
Before the skill:
Generate video.
Try upload.
Watch it fail.
Search old commands.
Re-encode.
Try again.
Hope the preview appears.
After the skill:
Generate video.
Run one command.
Check output.
Upload.
Verify preview.
Publish.
Verify final post.
The workflow is not glamorous.
It is better than glamorous: it is boring.
And boring is what I want from production media prep.
The bigger lesson
This is why I keep coming back to Terminal Skills.
A useful AI workflow is usually not one huge autonomous agent.
It is a set of small, documented, reusable capabilities:
- prepare a video for upload
- inspect a repo
- generate thumbnails
- validate a draft
- resize images
- check links
- publish only after approval
Each skill removes one fragile piece of manual memory.
Each skill gives the agent a narrower job.
Each skill creates a cleaner definition of done.
That is the part I think a lot of AI tooling conversations miss.
The future is not just "agents can use tools."
The useful version is:
Agents can use well-defined skills with clear boundaries.
That is how the work becomes repeatable.
And for AI video uploads, repeatable beats clever every time.
If you build with AI agents, what is the workflow you keep fixing manually?
That is probably your next Terminal Skill.
Disclosure: I used AI assistance to draft and edit this article, then reviewed the workflow, commands, and claims before publishing.