Maksims Gavrilovs

Posted on Jun 5 • Edited on Jun 12 • Originally published at dev.to

Zero to Autopilot, Part 1: I Built an AI That Runs a YouTube Channel (the landscape, and my $10 wake-up call)

#ai #machinelearning #python #video

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 1 of 7. I'm an AI engineer and this is the full build log of an autonomous AI short-video channel — one that writes, renders, publishes, and decides what to make next, then grades its own homework. No face, no film crew, no me clicking "upload" at midnight.

Data status (Part 1): real-now. Everything below is code, costs, and public facts I can verify today. The juicy audience metrics from my own channel are held back until Part 7, so they have time to become real signal instead of noise.

The two-billion-view problem

Late 2025, a channel called Bandar Apna Dost crossed ~2 billion views and an estimated $4.25M/year (~₹38 crore). Its content? Short AI clips of a monkey and a Hulk-ish dude. No dialogue. No plot. No discernible reason to exist. (techlusive, Business Standard)

Cue every dev's reaction: "...I have a GPU and zero shame, how hard can this be?"

Pretty hard, actually — because here's the part the get-rich-quick threads leave out. A few months later YouTube's "AI slop" crackdown nuked an estimated 4.7 billion views across 16 channels, ~35M subs, and nearly $10M in revenue. Among the bodies: Three Minute Wisdom, a ~1.7M-sub / ~2B-view faceless AI channel, most of its catalog vaporized. (OutlierKit, Miraflow)

So the lay of the land in mid-2026:

Faceless AI video is a real, monetizable category. Billions of views, real revenue, nobody's face required.
It's also a ban speedrun if you ship slop. The platforms are now actively rm -rf-ing low-effort content at scale.

I looked at that and saw a clean engineering problem with two non-negotiable constraints: don't make slop, and don't go broke making it. This series is me brute-forcing both.

Why "faceless" is catnip for an engineer

Faceless means narration + visuals do all the work. No on-camera talent, no lighting rig, no "can you do Tuesday?" Every input is a file that an LLM or a model can spit out. Which means the whole thing is programmable — and anything programmable can be measured, costed, and (eventually) left to run while you sleep.

The winning recipe is boringly well-documented: pick a niche, nail a 2-second hook, stay on-brand, keep people watching to the end, and build a deep library so the algorithm has something to binge-feed. Notice what's not on that list: a human, per video. That's a system, not a craft.

The channels getting deleted skipped the system and cranked the volume knob to 11. The survivors — and the non-AI GOATs like Kurzgesagt and CrashCourse — win on structure, pacing, and actually having a point. My bet: an engineer can clear that quality bar and the volume bar if each video is cheap enough to run hundreds of experiments, with a learning loop deciding which ones to rerun.

Exhibit A: my first video quietly ate $10

Here's video #1, live on the channel — Lobachevsky, the guy who broke geometry:

🎬 The heretic who broke geometry → youtube.com/shorts/gaR76MiAK0U

I did the rookie thing: reached for AI image-to-video on every single scene, because that's what the shiny demos show. It looked great. Then I checked the bill.

Ten dollars. One Short.

The villain is one line of arithmetic — hosted AI video is priced per second, not per clip:

# studio/providers/video.py — real per-second prices (verified on fal.ai, June 2026)
FAL_MODELS = {
    "kling":    {"per_s": 0.07},   # 150s Short ≈ $10.50   <-- oof
    "ltx":      {"per_s": 0.04},   # cheapest hosted i2v
    "seedance": {"per_s": 0.30},   # 150s ≈ $45 (lol no)
    "hailuo":   {"per_s": 0.045},
    "wan":      {"per_s": 0.16},
}

150 seconds × $0.07 = $10.50, no matter how you slice the clips. Now do the napkin math on a content strategy: at ~$10/video, a hundred experiments is a thousand bucks, and you cannot run a "post a lot and learn" loop you can't afford to repeat. The economics were quietly DOA.
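The napkin math is easy to script. Here's a minimal sketch that reuses the per-second prices from the table above; `video_cost` and `campaign_cost` are illustrative names, not functions from the repo, and the 150-second length is just the Short from this example.

```python
# Per-second prices copied from the FAL_MODELS table above.
# Function names are hypothetical, for illustration only.
PER_SECOND_USD = {
    "kling": 0.07,
    "ltx": 0.04,
    "seedance": 0.30,
    "hailuo": 0.045,
    "wan": 0.16,
}


def video_cost(model: str, seconds: float) -> float:
    """Cost of generating `seconds` of footage, billed per second."""
    return PER_SECOND_USD[model] * seconds


def campaign_cost(model: str, seconds: float, videos: int) -> float:
    """Cost of repeating a same-length video `videos` times."""
    return video_cost(model, seconds) * videos


if __name__ == "__main__":
    # Cheapest model first, so the spread is obvious at a glance.
    for name in sorted(PER_SECOND_USD, key=PER_SECOND_USD.get):
        one = video_cost(name, 150)
        hundred = campaign_cost(name, 150, 100)
        print(f"{name:<9} ${one:>6.2f} per Short   ${hundred:>8.2f} per 100")
```

Even the cheapest hosted model lands in the hundreds of dollars for a hundred 150-second experiments, which is why the fix has to be structural rather than "pick a cheaper model."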

Plot twist: I'd solved this before, in a past life

Before AI ate my career, I shipped indie games. And indie game dev is a master class in faking expensive things for free, because you've got a $0 art budget and a build due Saturday. You don't buy motion — you engineer the feeling of motion: parallax scrolling layers, drifting backgrounds, snappy cuts, a little camera push. Cheap tricks, real game-feel.

Same energy, new domain. Why pay $10.50 for AI video when I can take one still image and add:

drift / Ken-Burns — slow pan + zoom, the still breathes;
parallax — split the frame into depth planes and slide them at different speeds (the background literally drifts behind a static subject);
cuts & transitions — rhythm beats AI motion for retention anyway.

All in ffmpeg. All free. That's the entire Part 3 of this series, and it's where most of the $10 goes to die. Spoiler: it does not look like slop.

(These stills don't move on the page — but every free effect is playing live in the effects gallery. Drift, parallax, rain, embers, glitch, all $0. Part 3 dissects how.)
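To make the drift / Ken-Burns idea concrete before Part 3, here's a rough sketch of how one still could become a slow push-in using ffmpeg's `zoompan` filter. This is not the repo's actual effect code: `ken_burns_cmd`, its defaults, and the file names are assumptions, and it assumes the still already has the target 9:16 aspect ratio.

```python
import subprocess


def ken_burns_cmd(still: str, out: str, seconds: float = 5.0, fps: int = 30,
                  max_zoom: float = 1.2, size: str = "1080x1920") -> list[str]:
    """Build an ffmpeg command that slowly zooms into the centre of one still."""
    frames = int(seconds * fps)
    step = (max_zoom - 1.0) / frames           # zoom added per output frame
    zoompan = (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        ":x='iw/2-(iw/zoom/2)'"                # keep the crop window centred
        ":y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={size}:fps={fps}"      # d = output frames for the single input image
    )
    return ["ffmpeg", "-y", "-i", still, "-vf", zoompan,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", out]


if __name__ == "__main__":
    cmd = ken_burns_cmd("scene_01.png", "scene_01.mp4")   # hypothetical file names
    print(" ".join(cmd))
    # Uncomment to actually render (requires ffmpeg on PATH and the input file):
    # subprocess.run(cmd, check=True)
```

The design point is that motion here is a deterministic function of one image and a few numbers, so it costs CPU time instead of per-second API fees.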

Exhibit B: the six-cent video

Killing AI video was step one. Step two was realizing Nano Banana isn't always the move. For a goofy "why do cats have fur" Short, I didn't need photoreal noir; I needed clean flat cartoon. Enter Flux Schnell at $0.003 per megapixel, roughly half a cent an image.

Here's that one, live:

🎬 Why do cats have fur? → youtube.com/shorts/FWtEJjeK_vI

And the receipts, straight from its manifest:

| Stage | Provider | Cost |
|---|---|---|
| Script | local LLM | $0.00 |
| Visuals (10 images) | `fal-flux-schnell` | $0.054 |
| Motion (all scenes) | Ken-Burns (ffmpeg) | $0.00 |
| Voice | `edge-tts` (neural) | $0.00 |
| Sound FX + music | `fal-elevenlabs-sfx` + local bed | $0.0076 |
| Save + Publish | ffmpeg / YouTube API | $0.00 |
| TOTAL | | ≈ $0.06 |

From $10.50 → six cents. Same pipeline, different knobs. That's a ~175× cost cut, and it's the difference between "fun demo" and "I can run hundreds of these and let a bandit pick the winners." (Full cost teardown: Part 4.)
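For anyone who wants to check the receipts, here's a small sketch that sums the per-stage costs from the table and compares them with the $10.50 AI-video Short. The dict layout is an assumption for illustration, not the real manifest schema. Note that the unrounded total is $0.0616, which works out to roughly 170×; rounding to six cents gives the ~175× figure.

```python
# Per-stage costs copied from the table above.
# The dict layout is hypothetical, not the real manifest schema.
MANIFEST_USD = {
    "script": 0.0,
    "visuals": 0.054,     # 10 images
    "motion": 0.0,
    "voice": 0.0,
    "sound": 0.0076,      # SFX + music bed
    "publish": 0.0,
}

AI_VIDEO_BASELINE_USD = 150 * 0.07   # the 150 s Short at $0.07 per second

total = sum(MANIFEST_USD.values())
ratio = AI_VIDEO_BASELINE_USD / total

print(f"total per video:        ${total:.4f}")
print(f"cost cut vs. AI video:  ~{ratio:.0f}x")
print(f"per-image cost:         ${MANIFEST_USD['visuals'] / 10:.4f}")
```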

That $0.0076 line is quietly important, too: it's an AI sound layer — generated SFX plus a music bed ducked under the narration — and atmosphere is a big reason cheap doesn't read as slop. The how is in Part 3.

The gap I'm actually building into

After mapping the field, two things were suspiciously absent from every faceless-AI playbook:

Cost honesty. Everyone screenshots the $4M. Nobody publishes a per-second price table or admits their first video cost $10. So they never explain how to afford video #100.
Autonomy. "Just post consistently for 6 months" — cool, that's a full-time job done by hand. Nobody treats what to make next as a decision a system can learn: explore vs. exploit, a memory of what won, a verdict on every bet.

That's the thesis. Over the next six parts I'll build a channel that:

turns a one-line idea into a finished, well-directed vertical Short (Part 2),
moves nearly all motion off paid AI video onto free custom effects (Part 3),
drives cost per video from ~$10 toward pennies (Part 4),
remembers what worked via a per-channel journal + self-reflection (Part 5),
decides what to make next with a Thompson-sampling bandit over a falsifiable hypothesis (Part 6),
and runs itself on a schedule, grading each post 48–72h later (Part 7).
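As a preview of the bandit step in that list, here's a minimal Beta-Bernoulli Thompson-sampling sketch. It is a textbook version under stated assumptions, not the code from Part 6: the arm names, the simulated win rates, and the idea that each graded video collapses to a binary "won / lost" verdict are all made up for illustration.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        # Beta(1, 1) prior per arm, stored as [wins + 1, losses + 1].
        self.params = {arm: [1, 1] for arm in arms}
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a plausible win rate for each arm and pick the highest draw."""
        draws = {arm: self.rng.betavariate(a, b)
                 for arm, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, arm, won):
        """Record the verdict for one published video."""
        self.params[arm][0 if won else 1] += 1


if __name__ == "__main__":
    # Hypothetical arms with made-up "true" win rates, used only to simulate verdicts.
    true_rates = {"heretic-hook": 0.6, "explainer": 0.3, "listicle": 0.1}
    bandit = ThompsonBandit(true_rates, seed=0)
    sim = random.Random(1)
    for _ in range(500):
        arm = bandit.choose()
        bandit.update(arm, sim.random() < true_rates[arm])
    for arm, (a, b) in bandit.params.items():
        print(f"{arm:<13} plays={a + b - 2:>3}  posterior mean={a / (a + b):.2f}")
```

Sampling from each arm's posterior, rather than always picking the current best, is what gives the explore vs. exploit balance: uncertain arms still get occasional plays, and clear losers fade out on their own.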

The learning loop is already showing its teeth. A batch of near-identical clips dumped in the same minute cannibalized itself (3–6 views each, brutal). Meanwhile one video, a real mathematician framed as a heretic with a "this breaks reality" hook in the first two seconds, pulled roughly 50× the views of the channel's other Shorts. The rest of this series is the machine I'm building so that becomes a repeatable pattern, not a lucky roll.

It's all open source — and it's a live experiment

The whole studio is on GitHub — slope-studio (one letter from "slop", which, given the genre, is either a typo or a mission statement). Every line of code in this series lives there: the 7-stage pipeline, the free ffmpeg effects, the cost model, the bandit. Part 2 is the guided tour, with a one-command smoke test you can run with zero API keys.

And this isn't a retrospective with the numbers airbrushed in — it's a live experiment you can watch compound or faceplant in public. Every Short the system ships asks viewers to subscribe, because the whole point is watching an autonomous channel grow from zero. Consider it subscribing to the test harness.

What I'd tell another AI engineer

Takeaway: Treat content as a pipeline, not a craft. The instant every input — script, image, motion, voice, sound — is a function call with a measured cost, three superpowers unlock: you can drive unit cost toward zero, run hundreds of cheap experiments, and bolt a learning loop on top that decides which experiments to repeat. The folks making millions optimized the system and the volume. The folks getting deleted only had volume. The alpha is the system.

Next — Part 2: Idea → Published in 7 Stages. The actual architecture: every stage as an independent CLI subcommand, the runs/<id>/ artifact flow, a manifest that records measured cost per stage, and how a single line of text becomes an uploaded Short without me touching a video editor.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe (watch the experiment from zero): the Lobachevsky Short

Sources: techlusive · Business Standard · OutlierKit (AI-slop crackdown) · Miraflow (faceless explosion 2026). View/revenue figures are third-party estimates.