DEV Community

Ngawang Tenzin

Building an AI Video Production Workflow — Claude Cowork + ElevenLabs + ffmpeg

Nine days ago I had right-hand tendon surgery. Stuck at home with limited mobility, I decided to deep-dive into Claude's documentation—Cowork, scheduled tasks, skills.md, and how Claude integrates with user devices. I've always been a huge World Cup fan, and it hit me: what if I could automate an entire prediction channel using Claude?

Since then, I've published 14 AI-generated World Cup shorts on YouTube and TikTok. This is the technical breakdown of how I did it: the stack, the integrations, the gotchas, and what I learned about building production workflows with Claude Cowork.

The Problem

Creating consistent sports analysis content requires research, scripting, voiceover recording, video editing, and uploads. All manual. All time-intensive.

The Solution

An automated pipeline with human approval gates:

  1. Claude Opus researches fixtures and writes scripts
  2. ElevenLabs generates voiceover (consistent voice: Arthur)
  3. Python/Pillow creates motion-graphic stat cards
  4. ffmpeg assembles the final video with zoom-pan effects
  5. Claude in Chrome automates YouTube + TikTok uploads
  6. WebSearch re-verifies the live bracket before each post

No cameras, no studio, no manual editing. Just real-time data + original graphics.

The Stack

Claude Opus: script writing, research, and stat validation. All predictions are framed as opinion, and every stat is sourced.

ElevenLabs (standalone): voiceover in one consistent voice (Arthur). I use the standalone connector rather than Higgsfield because it bills my own account.

ffmpeg: local video assembly. Each of the 7 stat cards (1188×2112) loops with a zoom-pan effect and crossfades in and out, then the sequence is muxed with a speed-adjusted voiceover to fit 60 seconds.

Python/Pillow: card generation. The cards are static PNGs; the motion comes from ffmpeg's zoompan filter.

Claude in Chrome: YouTube Studio and TikTok Studio automation. It reads the *_PUBLISH.md packs, fills in the title, description, and pinned comment, and stops for my approval before publishing anything.

WebSearch: daily bracket verification. Fixtures change during the knockouts, and this catches changes before upload.

Key Integration Points

Why standalone ElevenLabs, not Higgsfield?

Higgsfield hit 0 credits mid-render on the first video (Brazil vs Japan). I switched to standalone ElevenLabs, which bills my own account. Lesson learned early.

Why ffmpeg for video assembly?

Reproducibility. Every card is a static PNG. Motion comes from ffmpeg's zoompan filter, applied uniformly. The math:

  • 7 cards × specific durations (6.5s + 9s + 9s + 8s + 11s + 10s + 6.5s = 60s)
  • Voiceover sped up ~8% with atempo=1.08 to land just under 60s
  • Crossfades (0.35s in/out per card) baked in
  • No UI, no manual clicking—purely programmatic
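The timing math above can be sketched in Python. This is a minimal sketch, not the exact command used for the channel: the 30 fps frame rate, the 1080×1920 output size, the centered zoom expressions, and the helper names `max_zoom` and `card_filter` are all assumptions for illustration. Only the card durations, the 0.35s fades, and the 1188×2112 card size come from the post.

```python
# Card durations from the post, in seconds.
CARD_SECONDS = [6.5, 9, 9, 8, 11, 10, 6.5]

FPS = 30                     # assumed frame rate
FADE = 0.35                  # fade in/out length per card (from the post)
SRC_W, SRC_H = 1188, 2112    # pre-rendered card size (from the post)
OUT_W, OUT_H = 1080, 1920    # assumed output size for vertical shorts


def max_zoom() -> float:
    """Largest zoom factor that still maps source pixels 1:1 or smaller."""
    return SRC_W / OUT_W


def card_filter(seconds: float, fps: int = FPS) -> str:
    """Build a hypothetical ffmpeg -vf string for one static card."""
    frames = round(seconds * fps)
    zoom_cap = max_zoom()
    step = (zoom_cap - 1.0) / frames  # zoom increment per output frame
    return (
        f"zoompan=z='min(zoom+{step:.6f},{zoom_cap:.2f})'"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={OUT_W}x{OUT_H}:fps={fps},"
        f"fade=t=in:st=0:d={FADE},"
        f"fade=t=out:st={seconds - FADE:.2f}:d={FADE}"
    )


if __name__ == "__main__":
    print("total seconds:", sum(CARD_SECONDS))
    for i, seconds in enumerate(CARD_SECONDS, start=1):
        print(f"card {i}: {card_filter(seconds)}")
```

Rendering the cards 10% larger than the output (1188/1080 = 1.1) means the zoom can reach 1.1 before ffmpeg has to upscale any source pixels, which is presumably why the larger canvas avoids soft or tightly cropped frames.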

The automation gate

Claude drives the YouTube/TikTok uploads, but stops before anything goes public. I review title/description/pinned comment, re-check the bracket, and give explicit approval. That human gate is critical.

The Workflow Per Video

  1. Pick fixture
  2. Research with verified sources (every stat cites a URL)
  3. Draft script (130–150 words, predictions framed as opinion)
  4. My approval
  5. Generate voiceover + cards
  6. ffmpeg build (2–3 min)
  7. Generate PUBLISH.md pack (title options, description, pinned comment, thumbnail prompt)
  8. My final approval
  9. Claude uploads to both platforms
  10. Archive into 06_published/

Total end-to-end: ~45–60 min. Most of that time is platform-side compression and ingest, not the local build.
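The two approval gates in this workflow can be modeled as a step list that halts whenever a gate is not approved. This is only an illustrative sketch: the `Step` class, the `run` function, and the `approve` callback are hypothetical names, and the real workflow runs inside Claude Cowork rather than a Python script.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Step:
    name: str
    needs_approval: bool = False


# The ten steps from the list above; gates are marked explicitly.
WORKFLOW = [
    Step("pick fixture"),
    Step("research with sources"),
    Step("draft script"),
    Step("script approval", needs_approval=True),
    Step("generate voiceover and cards"),
    Step("ffmpeg build"),
    Step("generate PUBLISH.md pack"),
    Step("final approval", needs_approval=True),
    Step("upload to both platforms"),
    Step("archive into 06_published/"),
]


def run(steps: Iterable[Step], approve: Callable[[str], bool]) -> list[str]:
    """Run steps in order; stop at the first gate that is not approved."""
    completed = []
    for step in steps:
        if step.needs_approval and not approve(step.name):
            break  # nothing after a rejected gate runs
        completed.append(step.name)
    return completed


if __name__ == "__main__":
    # Approve the script but withhold final approval: upload never runs.
    print(run(WORKFLOW, lambda name: name == "script approval"))
```

The design point is that the upload step sits behind the final gate, so a rejected or unanswered approval leaves the video built but unpublished.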

The Gotchas

  1. ElevenLabs path wrong — Always copy the mp3 to the build folder manually before running ffmpeg
  2. ffmpeg crops too tight — Pre-render cards at 1188×2112, not 1080×1920. ffmpeg scales + pans from the larger canvas
  3. VO timing misses 60s — Adjust the atempo filter (try 1.06 or 1.10) and test with ffprobe
  4. Wrong account signed into Chrome — Close Chrome, sign out everywhere, sign back in to the correct Google/TikTok account. Extension remembers the profile
  5. Bracket changed — WebSearch re-check always runs before upload. Catches it before posting
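For gotcha 3, the tempo factor can be computed instead of guessed. The sketch below assumes a 59.5s target (a stand-in for "just under 60s") and uses hypothetical helper names; the ffprobe invocation is the standard way to print a file's container duration.

```python
import subprocess


def probe_seconds(path: str) -> float:
    """Return the media duration in seconds, as reported by ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def atempo_for(raw_seconds: float, target_seconds: float = 59.5) -> float:
    """Tempo factor that shrinks (or stretches) raw audio to the target."""
    if raw_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return round(raw_seconds / target_seconds, 3)


if __name__ == "__main__":
    # Example with a made-up raw length; use probe_seconds("vo.mp3") on a real file.
    tempo = atempo_for(64.26)
    print(f"ffmpeg -i vo.mp3 -filter:a atempo={tempo} vo_fit.mp3")
```

A factor above 1.0 speeds the voiceover up. After applying it, running ffprobe on the output confirms the new length before the final mux.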

The Rules (Non-Negotiable)

  1. No broadcast footage, highlights, or copyrighted clips — only original motion graphics + text
  2. Every stat cites a current source (no exceptions)
  3. Predictions framed as opinion/entertainment, never as fact
  4. AI use disclosed on every upload (description + pinned comment)
  5. Every script gets written approval before production
  6. Re-check the live bracket before each post (it changes daily during knockouts)
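Some of these rules are mechanical enough to check before the approval step. The following is a rough sketch under stated assumptions: the `check_pack` function, the `claim`/`source` keys, and the keyword test for AI disclosure are all hypothetical, and the 130 to 150 word range comes from the script step in the workflow. It complements the human review and does not replace it.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def check_pack(script: str, stats: list[dict],
               description: str, pinned_comment: str) -> list[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    problems = []

    # Script length, per the workflow's 130-150 word target.
    words = len(script.split())
    if not 130 <= words <= 150:
        problems.append(f"script is {words} words, expected 130-150")

    # Every stat must carry a source URL.
    for stat in stats:
        if not URL_RE.search(stat.get("source", "")):
            problems.append(f"stat without source URL: {stat.get('claim', '?')}")

    # Crude disclosure check: the standalone word "AI" must appear.
    for label, text in (("description", description),
                        ("pinned comment", pinned_comment)):
        if "ai" not in re.findall(r"[a-z]+", text.lower()):
            problems.append(f"{label} does not disclose AI use")

    return problems


if __name__ == "__main__":
    issues = check_pack("too short", [{"claim": "x"}], "hello", "AI-assisted")
    print("\n".join(issues))
```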

Why This Matters

This approach decouples content creation from studio production. Recovery time becomes learning time. A tendon injury becomes an excuse to build something interesting.

The channel is live at https://www.youtube.com/channel/UCuLBTmlmLr8AHAhqwLKE6Ew with 14 Round of 32 videos posted. Each one is 100% generated, 100% transparent about AI use, and 100% sourced.

Curious about the stack? Check the channel description + pinned comments for full disclosure + links to the tools.
