Three weeks ago, I had zero videos on YouTube. Today I have 29 — including 4 long-form videos over 7 minutes each. Total budget: $0. Total manual editing: zero.
Here's exactly how the pipeline works, what broke along the way, and the brutal truth about what happened to the views.
The Pipeline
Every video goes through the same automated pipeline:
Script → TTS → Subtitle Sync → B-roll/Infographics → Build → Upload
Each step is a Python script. The whole thing runs on a Mac Mini with 64GB unified memory.
Step 1: Script Generation
The AI agent writes scripts following a proven formula I discovered through A/B testing:
- Hook: Contrarian statement ("Don't save in a savings account")
- Korean financial context: Specific to Korea's jeonse system, mandatory insurance culture, salary structures
- Concrete numbers: Always include specific amounts (₩270,000/month, not "a lot")
- Personal tone: Written as if sharing with a friend, not lecturing
The winning formula? Counter-intuitive claim + Korea-specific context + real numbers. My best performer (1,635 views) was "Don't Follow the 50/30/20 Rule" — because in Korea, housing costs alone eat 50%+ of most salaries.
Step 2: Text-to-Speech
I use Edge TTS (Microsoft's free API) with specific rate tuning:
import edge_tts

# Korean: +45% speed (natural conversation pace)
# English: +15% speed
communicate = edge_tts.Communicate(text, voice="ko-KR-SunHiNeural", rate="+45%")
Getting the speed right took multiple iterations. +95% sounded like an auctioneer. -30% sounded like a meditation app. +45% hits the sweet spot for Korean financial content.
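A minimal sketch of how the per-language rates might be wired into one helper, assuming the `edge-tts` package is installed. `RATES`, `VOICES`, `rate_for`, and `synthesize` are hypothetical names for illustration, and the English voice is an assumed placeholder; only the Korean voice and the two rates come from the tuning above.

```python
# Speaking rates from the tuning described above.
RATES = {"ko": "+45%", "en": "+15%"}
# The Korean voice is the one used above; the English voice is an assumption.
VOICES = {"ko": "ko-KR-SunHiNeural", "en": "en-US-AriaNeural"}


def rate_for(lang: str) -> str:
    """Return the edge-tts rate string for a language code."""
    return RATES[lang]


async def synthesize(text: str, lang: str, out_path: str) -> None:
    """Render text to an MP3 file using the tuned rate for that language."""
    import edge_tts  # imported lazily so the helpers above work without it

    communicate = edge_tts.Communicate(text, voice=VOICES[lang], rate=rate_for(lang))
    await communicate.save(out_path)


# Usage (needs network access):
#   import asyncio
#   asyncio.run(synthesize("Don't save in a savings account", "en", "voice.mp3"))
```

Keeping the rates in one table means a retune touches a single line instead of every call site.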
Step 3: Subtitle Sync (The Hardest Part)
This is where things got interesting. I have a silence compression script that removes dead air from TTS output. Great for pacing — terrible for subtitle timing.
The subtitles would drift up to 7 seconds out of sync after silence removal.
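The drift is cumulative: every cut moves everything after it earlier, so late subtitles are off by the sum of all silence removed before them. A small sketch of that arithmetic, assuming the compressor cuts silent intervals out entirely and can report them. The `removed` list and `shift_timestamp` are hypothetical, and this is not the fix used in this pipeline, which re-derives timings with Whisper instead.

```python
def shift_timestamp(t: float, removed: list[tuple[float, float]]) -> float:
    """Map a time in the original audio to the time in the compressed audio.

    `removed` holds (start, end) intervals, in seconds, that were cut out.
    """
    shift = 0.0
    for start, end in sorted(removed):
        if t >= end:
            shift += end - start   # the whole cut lies before t
        elif t > start:
            shift += t - start     # t falls inside a cut; snap to its start
    return t - shift


cuts = [(2.0, 3.5), (10.0, 12.0)]   # hypothetical cuts: 3.5 s of silence in total
print(shift_timestamp(1.0, cuts))   # 1.0  (before any cut, unchanged)
print(shift_timestamp(5.0, cuts))   # 3.5  (after the first cut)
print(shift_timestamp(15.0, cuts))  # 11.5 (after both cuts)
```

A few dozen cuts of a fraction of a second each are enough to add up to several seconds by the end of a video.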
The fix: subtitle_sync.py — a custom script that uses MLX Whisper (running locally on Apple Silicon) to generate word-level timestamps, then realigns every subtitle line:
# After silence compression, ALWAYS run:
python3 subtitle_sync.py fix voice.mp3 subs.ass
# Before upload, ALWAYS verify:
python3 subtitle_sync.py check voice.mp3 subs.ass
The Whisper model (mlx-community/whisper-large-v3-turbo) runs entirely on-device. No API costs, no latency.
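The realignment idea can be sketched in a few lines. This is not the actual `subtitle_sync.py`; it assumes the word list comes from a Whisper pass with word-level timestamps enabled, and that the transcript contains the same words in the same order as the subtitle lines. Real transcripts rarely match that cleanly, so a robust version would need fuzzy matching. `Word`, `realign`, and `ass_time` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def realign(lines: list[str], words: list[Word]) -> list[tuple[float, float, str]]:
    """Give each subtitle line the time span of the words it contains."""
    timed, i = [], 0
    for line in lines:
        n = len(line.split())
        chunk = words[i:i + n]
        if not chunk:
            break  # transcript ran out; remaining lines stay untimed
        timed.append((chunk[0].start, chunk[-1].end, line))
        i += n
    return timed


def ass_time(t: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc."""
    cs = round(t * 100)
    h, cs = divmod(cs, 360_000)
    m, cs = divmod(cs, 6_000)
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


# Hypothetical word timings for a six-word hook split over two subtitle lines.
words = [Word("Don't", 0.0, 0.3), Word("save", 0.3, 0.6), Word("in", 1.2, 1.3),
         Word("a", 1.3, 1.4), Word("savings", 1.4, 1.8), Word("account", 1.8, 2.4)]
lines = ["Don't save", "in a savings account"]

for start, end, text in realign(lines, words):
    print(f"{ass_time(start)} --> {ass_time(end)}  {text}")
```

Because the timings come from the compressed audio itself, the result does not depend on knowing where the silence was removed.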
Step 4: Visual Assets
For Shorts (< 60s): dark gradient background + ASS subtitles burned in (FontSize 78, bold, outline 4 — optimized for mobile).
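A rough sketch of what the Shorts build could look like as a single FFmpeg call, assuming an FFmpeg build with libass. It swaps the gradient for a solid dark canvas to keep the filter graph short; the colour value, frame rate, codecs, and the `shorts_command` helper are all assumptions rather than the pipeline's actual settings. The font size, bold, and outline settings live in the `.ass` file's style, not in this command.

```python
import shlex


def shorts_command(audio: str, subs: str, out: str) -> list[str]:
    """Build an ffmpeg call: vertical dark canvas + audio + burned-in ASS subtitles."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", "color=c=0x101820:s=1080x1920:r=30",  # solid 9:16 background
        "-i", audio,
        "-vf", f"ass={subs}",      # burn the subtitles into the picture
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",               # stop when the audio ends
        out,
    ]


cmd = shorts_command("voice.mp3", "subs.ass", "short.mp4")
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when file names contain spaces.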
For Long-form (7-10 min): 4 custom infographics (matplotlib) + 20 B-roll clips + automated timeline assembly via build_video.py.
Step 5: Upload
YouTube Data API v3 handles uploads. Metadata (title, description, tags, playlist) is pre-generated in JSON. OAuth token refreshes automatically.
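As an illustration, pre-generated metadata could be shaped into API request bodies like this. The JSON key names (`category_id`, `privacy`, `playlist`), the default category, the sample description, and both helper functions are hypothetical; the nested `snippet`/`status` layout follows the `videos.insert` and `playlistItems.insert` resources.

```python
import json


def video_body(meta: dict) -> dict:
    """Shape pre-generated metadata into a videos.insert request body."""
    return {
        "snippet": {
            "title": meta["title"],
            "description": meta["description"],
            "tags": meta.get("tags", []),
            "categoryId": meta.get("category_id", "27"),  # assumed default: Education
        },
        "status": {"privacyStatus": meta.get("privacy", "private")},
    }


def playlist_body(playlist_id: str, video_id: str) -> dict:
    """Shape a playlistItems.insert body that adds an uploaded video to a playlist."""
    return {
        "snippet": {
            "playlistId": playlist_id,
            "resourceId": {"kind": "youtube#video", "videoId": video_id},
        }
    }


# Hypothetical metadata, as it might look after loading the pre-generated JSON file.
meta = json.loads("""{
  "title": "Don't Follow the 50/30/20 Rule",
  "description": "Why the rule breaks down in Korea.",
  "tags": ["budgeting", "korea"],
  "playlist": "PL_EXAMPLE"
}""")

body = video_body(meta)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

With `google-api-python-client`, a body like this would typically be passed to `youtube.videos().insert(part="snippet,status", body=body, media_body=...)`, and the playlist is attached in a second call once the upload returns a video ID.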
The Results: Honest Numbers
First 8 days (Mar 6-12): 🚀
- 8 videos, 6,194 total views
- Best video: 1,635 views ("Don't Follow 50/30/20")
- 7 subscribers
Days 9-23 (Mar 13-26): 📉
- 15 more videos (including 4 long-form 7+ min)
- Views per day: 0-1
- New subscribers: 0
The Algorithm Cliff
My theory:
- 8 videos in 8 days from a new channel = spam pattern
- One bad video (13 views on a saturated topic) killed momentum
- New channel test period ended on Day 8
What I Learned
✅ "Don't do X" + Korea-specific context = 1,635 views
✅ Automated pipeline = consistent quality at scale
❌ Daily uploads = algorithm punishment
❌ Oversaturated topics = dead on arrival
❌ More videos ≠ more views
The real lesson: Automation solves production. It doesn't solve distribution.
The Tech Stack (All Free)
| Component | Tool |
|---|---|
| Script writing | AI agent |
| TTS | Edge TTS (Microsoft) |
| Subtitle sync | MLX Whisper (local) |
| Video build | FFmpeg + Python |
| Infographics | Matplotlib |
| Upload | YouTube Data API v3 |
See the Videos
🎬 Best performer (1,635 views): Don't Follow the 50/30/20 Rule
🎬 First long-form (7m20s): Complete Guide to Splitting Bank Accounts
🎬 Latest long-form (8m23s): The Credit Card Points Trap
📺 Channel: MaxMini Dev
The pipeline works. The content quality is there. Now it's a distribution problem — and that's a much harder automation challenge.
Building in public, from first video to (hopefully) monetization.