Ngawang Tenzin

Posted on Jun 29

Building an AI Video Production Workflow — Claude Cowork + ElevenLabs + ffmpeg

#ai #claude #worldcup #workflow

Nine days ago I had right-hand tendon surgery. Stuck at home with limited mobility, I decided to deep-dive into Claude's documentation—Cowork, scheduled tasks, skills.md, and how Claude integrates with user devices. I've always been a huge World Cup fan, and it hit me: what if I could automate an entire prediction channel using Claude?

Two weeks later, I've published 14 AI-generated World Cup shorts on YouTube and TikTok. This is the technical breakdown of how I did it—the stack, the integrations, the gotchas, and what I learned about building production workflows with Claude Cowork.

The Problem

Creating consistent sports analysis content requires research, scripting, voiceover recording, video editing, and uploads. All manual. All time-intensive.

The Solution

An automated pipeline, with a human approval gate before anything goes public:

Claude Opus researches fixtures and writes scripts
ElevenLabs generates voiceover (consistent voice: Arthur)
Python/Pillow creates motion-graphic stat cards
ffmpeg assembles the final video with zoom-pan effects
Claude in Chrome automates YouTube + TikTok uploads
WebSearch re-verifies the live bracket before each post

No cameras, no studio, no manual editing. Just real-time data + original graphics.

The Stack

Claude Opus: script writing, research, and stat validation. All predictions are framed as opinion, and every stat is sourced.

ElevenLabs (standalone): the standalone connector, not Higgsfield, because it bills my own account. Voice: Arthur.

ffmpeg: local video assembly. Each of the 7 stat cards (1188×2112) is looped with a zoom-pan effect and crossfades in and out, then the result is muxed with a speed-adjusted voiceover to fit the 60-second runtime.

Python/Pillow: card generation. The cards are static PNGs; the motion comes from ffmpeg's zoompan filter.

Claude in Chrome: YouTube Studio and TikTok Studio automation. It reads the *_PUBLISH.md packs, fills in the title, description, and pinned comment, and stops for my approval before publishing anything.

WebSearch: daily bracket verification. Fixtures change during the knockout rounds, and this check catches changes before upload.
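As a rough illustration of the Python/Pillow card step, here is a minimal sketch that renders one static card on the 1188×2112 canvas mentioned above. The function name `render_stat_card`, the colors, the layout, and the use of Pillow's default bitmap font are all assumptions for illustration, not the channel's actual design, and the stat text is a placeholder.

```python
from PIL import Image, ImageDraw, ImageFont

# Canvas size from the article: larger than the final frame so ffmpeg can pan.
CARD_SIZE = (1188, 2112)


def render_stat_card(title, stats, size=CARD_SIZE):
    """Return a static card image with a title and one line per (label, value) stat."""
    width, height = size
    img = Image.new("RGB", size, (12, 18, 40))  # assumed dark background
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # placeholder font; see the note below
    margin = width // 10

    # Accent bar across the top, then the title underneath it.
    draw.rectangle([margin, margin, width - margin, margin + 8], fill=(255, 200, 0))
    draw.text((margin, margin + 40), title, fill=(255, 255, 255), font=font)

    # One line of text per stat, stacked vertically.
    y = margin + 160
    for label, value in stats:
        draw.text((margin, y), f"{label}: {value}", fill=(220, 220, 220), font=font)
        y += 120
    return img
```

Calling `render_stat_card("Brazil vs Japan", [("Example stat", "placeholder")]).save("card_01.png")` would write a PNG for ffmpeg to animate. A real card would probably load a TrueType font with `ImageFont.truetype` so the text is legible at this resolution.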

Key Integration Points

Why standalone ElevenLabs, not Higgsfield?

My Higgsfield credits ran out mid-render on the first video (Brazil vs Japan), so I switched to the standalone ElevenLabs connector, which bills my own account. Lesson learned early.

Why ffmpeg for video assembly?

Reproducibility. Every card is a static PNG, and the motion comes from ffmpeg's zoompan filter applied uniformly. The math:

7 cards × specific durations (6.5s + 9s + 9s + 8s + 11s + 10s + 6.5s = 60s)
Voiceover sped up by ~8% with atempo=1.08 to land just under 60s
Crossfades (0.35s in/out per card) baked in
No UI, no manual clicking—purely programmatic
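The arithmetic above can be sketched in a few lines of Python that also build (but do not run) an ffmpeg command for a single card. The card durations and the 0.35s fade come from the list above; the frame rate, zoom rate, output size, and function names are assumptions, and the exact filter chain the channel uses is not shown in this post.

```python
CARD_DURATIONS = [6.5, 9, 9, 8, 11, 10, 6.5]  # seconds per card, from the article
FADE = 0.35          # seconds of fade in/out per card, from the article
FPS = 30             # assumption: output frame rate
OUT_SIZE = "1080x1920"  # assumption: final vertical frame size


def card_filter(duration, fps=FPS, fade=FADE):
    """Build a -vf filter string: slow centered zoom, then fade in and fade out."""
    frames = round(duration * fps)
    zoom = (
        "zoompan=z='min(zoom+0.0005,1.10)'"      # assumed zoom rate and cap
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"  # keep the zoom centered
        f":d={frames}:s={OUT_SIZE}:fps={fps}"
    )
    fade_in = f"fade=t=in:st=0:d={fade}"
    fade_out = f"fade=t=out:st={duration - fade:.2f}:d={fade}"
    return ",".join([zoom, fade_in, fade_out])


def card_command(png_path, out_path, duration):
    """Return the argv list for rendering one animated card clip."""
    return [
        "ffmpeg", "-y", "-i", png_path,
        "-vf", card_filter(duration),
        "-t", str(duration), "-pix_fmt", "yuv420p",
        out_path,
    ]


total_seconds = sum(CARD_DURATIONS)
```

Here `total_seconds` comes out to 60.0, which matches the target runtime. This sketch fades each clip to and from black; overlapping crossfades between clips (for example with ffmpeg's `xfade` filter) would shorten the total by the overlap, so the durations would need adjusting in that case.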

The automation gate

Claude drives the YouTube/TikTok uploads, but stops before anything goes public. I review title/description/pinned comment, re-check the bracket, and give explicit approval. That human gate is critical.
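In my setup the gate lives in the Claude conversation, but the same idea can be expressed as a small script. This is a minimal sketch under assumptions: the `approval_gate` name, the dictionary keys, and the prompt wording are hypothetical.

```python
def approval_gate(pack, ask=input):
    """Show the publish pack and return True only on an explicit 'yes'.

    `pack` is a dict with the fields to review; `ask` is injectable so the
    gate can be exercised without a terminal.
    """
    for field in ("title", "description", "pinned_comment"):
        print(f"{field}: {pack.get(field, '<missing>')}")
    answer = ask("Bracket re-checked and pack approved? Type 'yes' to publish: ")
    # Anything other than a literal "yes" (including an empty answer) blocks the upload.
    return answer.strip().lower() == "yes"
```

Defaulting to "do not publish" on any ambiguous answer is the point of the design: a typo or a stray Enter key should never push a video public.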

The Workflow Per Video

Pick fixture
Research with verified sources (every stat cites a URL)
Draft script (130–150 words, predictions framed as opinion)
My approval
Generate voiceover + cards
ffmpeg build (2–3 min)
Generate PUBLISH.md pack (title options, description, pinned comment, thumbnail prompt)
My final approval
Claude uploads to both platforms
Archive into 06_published/

Total end-to-end: ~45–60 min (most time is platform compression + ingest)
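One step in this workflow that is easy to check mechanically is the 130–150 word script budget. A minimal sketch, assuming words are simply whitespace-separated tokens (the function name is hypothetical):

```python
MIN_WORDS, MAX_WORDS = 130, 150  # script length range from the workflow above


def check_script_length(script):
    """Return (word_count, within_range) for a draft script."""
    count = len(script.split())
    return count, MIN_WORDS <= count <= MAX_WORDS
```

Running this on a draft before the approval step flags scripts that are likely to overrun or underfill the 60-second voiceover.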

The Gotchas

ElevenLabs output path is wrong: always copy the mp3 into the build folder manually before running ffmpeg
ffmpeg crops too tight: pre-render cards at 1188×2112, not 1080×1920, so ffmpeg can scale and pan from the larger canvas
Voiceover timing misses 60s: adjust the atempo filter (try 1.06 or 1.10) and check the result with ffprobe
Wrong account signed into Chrome: close Chrome, sign out everywhere, then sign back in to the correct Google/TikTok account. The extension remembers the profile
Bracket changed: the WebSearch re-check always runs before upload and catches changes before posting
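For the voiceover timing gotcha, the atempo value can be computed instead of guessed. The sketch below measures the mp3 with ffprobe and derives the tempo needed to hit a target length. The 59.5s target is an assumption standing in for "just under 60s", and the function names are hypothetical.

```python
import subprocess


def probe_duration(path):
    """Return the media duration in seconds as reported by ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def atempo_for(duration, target=59.5):
    """Return the atempo factor that stretches `duration` seconds to `target`."""
    if duration <= 0 or target <= 0:
        raise ValueError("duration and target must be positive")
    return round(duration / target, 3)
```

For example, a raw voiceover of 64.26 seconds gives a factor of 1.08, which would then be passed to ffmpeg as `-filter:a atempo=1.08`. Re-running `probe_duration` on the output confirms the result.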

The Rules (Non-Negotiable)

No broadcast footage, highlights, or copyrighted clips — only original motion graphics + text
Every stat cites a current source (no exceptions)
Predictions framed as opinion/entertainment, never as fact
AI use disclosed on every upload (description + pinned comment)
Every script gets written approval before production
Re-check the live bracket before each post (it changes daily during knockouts)
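Two of these rules (every stat has a source, AI use is disclosed) can be partly checked by a script before the human review. This is a crude sketch under assumptions: the data shapes and the function name are hypothetical, and looking for the word "AI" is only a rough proxy for a real disclosure.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def rule_violations(stats, description, pinned_comment):
    """Return a list of problems; an empty list means these checks passed.

    `stats` is a list of (claim, source) pairs, where source should contain a URL.
    """
    problems = []
    for claim, source in stats:
        if not URL_RE.search(source or ""):
            problems.append(f"stat without source URL: {claim}")
    for name, text in (("description", description), ("pinned comment", pinned_comment)):
        # Tokenize to letters only so "AI-generated" counts but "said" does not.
        if "ai" not in re.findall(r"[a-z]+", text.lower()):
            problems.append(f"no AI disclosure in {name}")
    return problems
```

A check like this does not replace the written approval; it only stops obviously incomplete packs from reaching the review step.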

Why This Matters

This approach decouples content creation from studio production. Recovery time becomes learning time. A tendon injury becomes an excuse to build something interesting.

The channel is live at https://www.youtube.com/channel/UCuLBTmlmLr8AHAhqwLKE6Ew with 14 Round of 32 videos posted. Each one is 100% generated, 100% transparent about AI use, and 100% sourced.

Curious about the stack? Check the channel description + pinned comments for full disclosure + links to the tools.

DEV Community