<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maksims Gavrilovs</title>
    <description>The latest articles on DEV Community by Maksims Gavrilovs (@dasein108).</description>
    <link>https://dev.to/dasein108</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2834130%2F3829c1e4-0e97-4998-b2f2-2904b1d4f698.jpg</url>
      <title>DEV Community: Maksims Gavrilovs</title>
      <link>https://dev.to/dasein108</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dasein108"/>
    <language>en</language>
    <item>
      <title>Zero to Autopilot, Part 2: One Line of Text to a Published Short, in 7 Stages</title>
      <dc:creator>Maksims Gavrilovs</dc:creator>
      <pubDate>Sat, 06 Jun 2026 06:17:19 +0000</pubDate>
      <link>https://dev.to/dasein108/zero-to-autopilot-part-2-one-line-of-text-a-published-short-in-7-stages-inp</link>
      <guid>https://dev.to/dasein108/zero-to-autopilot-part-2-one-line-of-text-a-published-short-in-7-stages-inp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Zero to Autopilot — Building a Self-Improving AI Media Channel.&lt;/em&gt; Part 2 of 7. &lt;a href="https://dev.to/dasein108/zero-to-autopilot-part-1-i-built-an-ai-that-runs-a-youtube-channel-the-landscape-and-my-10-1ki6"&gt;Part 1&lt;/a&gt; covered the landscape and my $10 wake-up call. This one is the architecture: how a single line of text becomes an uploaded Short without me ever opening a video editor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data status (Part 2): real-now.&lt;/strong&gt; Code, file layout, and measured costs straight from the repo. No audience metrics — those are sandbagged to Part 7.&lt;/p&gt;

&lt;p&gt;⭐ &lt;strong&gt;The whole thing is open source: &lt;a href="https://github.com/dasein108/slope-studio" rel="noopener noreferrer"&gt;github.com/dasein108/slope-studio&lt;/a&gt;.&lt;/strong&gt; Clone along — there's a zero-API-key smoke test at the bottom.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6hs3s6u05bs6wep3vys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6hs3s6u05bs6wep3vys.png" alt="The opening frame of the Lobachevsky Short — railroad tracks vanishing into a question mark. This is what " width="768" height="1344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The mental model: a video is a Makefile
&lt;/h2&gt;

&lt;p&gt;Most "AI video generator" tools are a single monolith — one giant button, one black box, and when scene 14 comes out cursed you get to regenerate all 14. I've shipped enough software to know that's the wrong shape.&lt;/p&gt;

&lt;p&gt;So I stole the model from build systems: &lt;strong&gt;a video is a directed pipeline of stages, each stage is a pure function from files to files, and the whole thing is idempotent.&lt;/strong&gt; Re-run a stage, it skips work that's already done. Blow away one artifact, only that stage (and its dependents) rebuild. It's &lt;code&gt;make&lt;/code&gt; with a YouTube upload at the end.&lt;/p&gt;

&lt;p&gt;Here's the pipeline, top to bottom:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; idea ──► [1 script] ──► 01_script.json        (timed scenes + narration)
            │
            ├──► [2 visuals] ──► 02_visuals/scene_NN.png
            │
            ├──► [2.5 narrate] ─► 05_voice/scenes/*.mp3 + timing.json + captions.srt
            │
            ├──► [3 clips] ────► 03_clips/scene_NN.mp4   (animate the stills)
            │
            ├──► [4 stitch] ───► 04_stitched.mp4         (transitions, no audio)
            │
            ├──► [5 voice] ────► 05_voice/final.mp4      (TTS + music muxed)
            │
            ├──► [6 save] ─────► 06_final.mp4            (platform master)
            │
            └──► [7 publish] ──► YouTube
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every arrow writes a file. Every file lives under one run directory. Which brings us to the most important design decision in the whole project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything is a file under &lt;code&gt;runs/&amp;lt;id&amp;gt;/&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;No database. No hidden state. One run = one directory, and the directory &lt;strong&gt;is&lt;/strong&gt; the state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runs/lobachevsky/
├── project.json          # the manifest: provider + cost + done-flag per stage
├── 01_script.json        # scenes, narration, title, hashtags
├── 02_visuals/scene_01..15.png
├── 03_clips/scene_NN.mp4
├── 04_stitched.mp4
├── 05_voice/
│   ├── scenes/*.mp3       # per-scene TTS
│   ├── timing.json        # per-scene durations (drives clip lengths)
│   ├── captions.srt
│   └── final.mp4
├── 06_final.mp4          # the master you upload
├── 06_final.json         # SEO title/description/tags
└── 07_publish.json       # the YouTube video id, once live
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sounds almost too simple, but it buys you everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Debuggability&lt;/strong&gt; — something looks off? Open the PNG. Read the JSON. No "inspect the pipeline state" tooling needed; &lt;code&gt;ls&lt;/code&gt; and an image viewer are the debugger.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resumability&lt;/strong&gt; — kill the process at scene 9, restart, it picks up at scene 9.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency&lt;/strong&gt; — each stage checks for its own output and skips the work when that output already exists. Re-running &lt;code&gt;visuals&lt;/code&gt; won't re-bill you for 15 images you already have (pass &lt;code&gt;--force&lt;/code&gt; when you actually want to regenerate).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version control of &lt;em&gt;artifacts&lt;/em&gt;&lt;/strong&gt; — every authored video in the repo is a folder you can diff, copy, or hand-edit.&lt;/li&gt;
&lt;/ul&gt;
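
&lt;p&gt;The skip-if-present behaviour behind resumability and idempotency can be sketched in a few lines. This is a minimal illustration, not the repo's code: &lt;code&gt;run_idempotent&lt;/code&gt; and &lt;code&gt;fake_render&lt;/code&gt; are hypothetical names, and the real stages may check more than file existence.&lt;/p&gt;

```python
from pathlib import Path
from typing import Callable


def run_idempotent(
    outputs: list[Path],
    build: Callable[[Path], None],
    force: bool = False,
) -> list[Path]:
    """Build only the outputs that are missing; return the ones actually built."""
    built = []
    for out in outputs:
        if out.exists() and not force:
            continue  # already on disk: skip it, so nothing is re-billed
        out.parent.mkdir(parents=True, exist_ok=True)
        build(out)
        built.append(out)
    return built


def fake_render(path: Path) -> None:
    """Hypothetical stand-in for a paid image call; it just writes a few bytes."""
    path.write_bytes(b"png")
```

&lt;p&gt;Calling it twice on the same outputs does work only the first time. Deleting one file and calling it again rebuilds just that file, which is the same property that lets a killed run pick up where it stopped.&lt;/p&gt;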

&lt;p&gt;Canonical paths live in exactly one place (&lt;code&gt;studio/paths.py&lt;/code&gt;), so no stage ever hardcodes a filename:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scene_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;visuals_dir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scene_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;02&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;master&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;06_final.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Each stage is a CLI subcommand (and they chain)
&lt;/h2&gt;

&lt;p&gt;The pipeline is a &lt;a href="https://typer.tiangolo.com/" rel="noopener noreferrer"&gt;Typer&lt;/a&gt; app. Every stage is its own subcommand, so you can run the whole thing or surgically poke one stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# the whole pipeline, one idea in, one Short out:&lt;/span&gt;
studio run &lt;span class="s2"&gt;"lobachevsky geometry explained in a fun way"&lt;/span&gt; &lt;span class="nt"&gt;--duration&lt;/span&gt; 150

&lt;span class="c"&gt;# or drive it stage by stage and inspect between steps:&lt;/span&gt;
&lt;span class="nv"&gt;RID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;studio init &lt;span class="s2"&gt;"lobachevsky..."&lt;/span&gt; &lt;span class="nt"&gt;--duration&lt;/span&gt; 150&lt;span class="si"&gt;)&lt;/span&gt;
studio script  &lt;span class="nv"&gt;$RID&lt;/span&gt;     &lt;span class="c"&gt;# → 01_script.json   (read it! confirm the narration is real)&lt;/span&gt;
studio visuals &lt;span class="nv"&gt;$RID&lt;/span&gt;     &lt;span class="c"&gt;# → 02_visuals/*.png&lt;/span&gt;
studio status  &lt;span class="nv"&gt;$RID&lt;/span&gt;     &lt;span class="c"&gt;# render the manifest: what's done, what it cost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stage order is one list, and &lt;code&gt;run&lt;/code&gt; just walks it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;STAGE_ORDER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;script&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;visuals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;narrate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;clips&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stitch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;save&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a stage = write a pure function in &lt;code&gt;stages/&lt;/code&gt;, add a subcommand, drop its name in that list. Adding a &lt;em&gt;provider&lt;/em&gt; (a new image model, a new TTS) doesn't touch the pipeline at all — more on that next.&lt;/p&gt;
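
&lt;p&gt;As a rough sketch of what "walks the list" could look like: the &lt;code&gt;STAGE_ORDER&lt;/code&gt; list is the one shown above, while the &lt;code&gt;STAGES&lt;/code&gt; registry, the &lt;code&gt;stage&lt;/code&gt; decorator, and &lt;code&gt;run_all&lt;/code&gt; are assumptions made up for this example. The actual repo wires stages through Typer subcommands and may look quite different.&lt;/p&gt;

```python
from typing import Callable

STAGE_ORDER = ["script", "visuals", "narrate", "clips", "stitch", "audio", "voice", "save"]

# Hypothetical registry: stage name -> function that takes a run id.
STAGES: dict[str, Callable[[str], None]] = {}


def stage(name: str):
    """Decorator that registers a stage function under its name."""
    def register(fn: Callable[[str], None]) -> Callable[[str], None]:
        STAGES[name] = fn
        return fn
    return register


def run_all(run_id: str, start: str = "script") -> list[str]:
    """Walk STAGE_ORDER from `start`, calling each registered stage in turn."""
    executed = []
    for name in STAGE_ORDER[STAGE_ORDER.index(start):]:
        fn = STAGES.get(name)
        if fn is None:
            continue  # stage not implemented in this sketch
        fn(run_id)
        executed.append(name)
    return executed


log: list[str] = []


@stage("script")
def script(run_id: str) -> None:
    log.append(f"script {run_id}")


@stage("visuals")
def visuals(run_id: str) -> None:
    log.append(f"visuals {run_id}")
```

&lt;p&gt;With this shape, a new stage is one decorated function plus one entry in the list, and the walker itself never changes.&lt;/p&gt;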

&lt;h2&gt;
  
  
  The provider contract: every model reports its own cost
&lt;/h2&gt;

&lt;p&gt;Here's the design choice I'm proudest of, because it's what makes the &lt;em&gt;whole rest of the series&lt;/em&gt; possible. Every media-producing provider — every LLM, image model, video model, TTS — returns the same dataclass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;cost_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;     &lt;span class="c1"&gt;# the REAL cost, computed by the provider
&lt;/span&gt;    &lt;span class="n"&gt;latency_s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;note&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;cost_usd&lt;/code&gt; is not an estimate I jotted in a spreadsheet. The Nano Banana provider returns &lt;code&gt;$0.039&lt;/code&gt;. The kling provider computes &lt;code&gt;seconds × $0.07&lt;/code&gt;. The Ken-Burns animator returns &lt;code&gt;$0.00&lt;/code&gt;. So when a stage runs, the manifest records &lt;strong&gt;measured&lt;/strong&gt; cost, not guessed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StageRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;cost_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;total_cost_usd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cost_usd&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the foundation. You can't optimize what you don't measure, and you definitely can't put a &lt;em&gt;budget-aware bandit&lt;/em&gt; (Part 6) on top of costs you're guessing at. Every dollar in this series is a real dollar the system reported on itself.&lt;/p&gt;
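
&lt;p&gt;To make the contract concrete, here is a small sketch of a provider that computes its own cost and a helper that sums what a stage spent. The &lt;code&gt;GenResult&lt;/code&gt; fields and the $0.07 per second and $0.00 prices come from the text above. The &lt;code&gt;PerSecondVideo&lt;/code&gt; class and &lt;code&gt;stage_cost&lt;/code&gt; helper are hypothetical, and no real API is called.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class GenResult:
    path: Optional[Path] = None
    cost_usd: float = 0.0
    latency_s: float = 0.0
    provider: str = ""
    note: str = ""


class PerSecondVideo:
    """Hypothetical provider billed per second of output video."""

    def __init__(self, name: str, per_s: float):
        self.name = name
        self.per_s = per_s

    def generate(self, out: Path, seconds: float) -> GenResult:
        started = time.perf_counter()
        # A real provider would call the hosted model here and write `out`.
        return GenResult(
            path=out,
            cost_usd=round(seconds * self.per_s, 4),
            latency_s=time.perf_counter() - started,
            provider=self.name,
        )


def stage_cost(results: list[GenResult]) -> float:
    """Sum the provider-reported costs for one stage."""
    return round(sum(r.cost_usd for r in results), 4)


kling = PerSecondVideo("kling", per_s=0.07)
kenburns = PerSecondVideo("kenburns", per_s=0.0)
clips = [kling.generate(Path(f"scene_{i:02d}.mp4"), seconds=5) for i in (1, 2)]
clips.append(kenburns.generate(Path("scene_03.mp4"), seconds=5))
print(stage_cost(clips))
```

&lt;p&gt;The stage never needs to know a price list. It only adds up whatever each provider reported, so swapping a paid model for a free animator changes the ledger without touching the pipeline.&lt;/p&gt;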

&lt;h2&gt;
  
  
  Watching it actually run
&lt;/h2&gt;

&lt;p&gt;Here's the real log from the Lobachevsky run — note each stage announcing its provider and cost as it goes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;» visuals
visuals 15 images via fal-nanobanana  $0.585
» clips
clips 15 clips via fal-i2v  $0.75
» stitch
stitch 15 clips
» voice
voice captions=burn via edge  $0.0
» save
save runs/lobachevsky/06_final.mp4
done lobachevsky  total $1.335
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
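
&lt;p&gt;The numbers in that log can be cross-checked with a few lines of arithmetic. The per-stage costs below are copied from the log. Treating &lt;code&gt;stitch&lt;/code&gt; and &lt;code&gt;save&lt;/code&gt; as $0.00 is an assumption based on the log printing no cost for them.&lt;/p&gt;

```python
# Per-stage costs copied from the log; stitch and save print no cost there.
stage_costs = {"visuals": 0.585, "clips": 0.75, "stitch": 0.0, "voice": 0.0, "save": 0.0}

scenes = 15
per_image = round(stage_costs["visuals"] / scenes, 4)  # cost of one still
per_clip = round(stage_costs["clips"] / scenes, 4)     # average cost of one clip
total = round(sum(stage_costs.values()), 4)            # same rounding as total_cost_usd

print(f"per image ${per_image}, per clip ${per_clip}, total ${total}")
```

&lt;p&gt;The per-image figure matches the $0.039 the Nano Banana provider reports, and the sum matches the total on the last line of the log.&lt;/p&gt;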



&lt;p&gt;Fifteen stills, fifteen animated clips, narration, captions, muxed and mastered — &lt;strong&gt;$1.34&lt;/strong&gt;, fully automated, from one line of text. (That run used a bit of paid AI video; the all-Ken-Burns version of the same Short is &lt;strong&gt;$0.585&lt;/strong&gt;, and the cheap-tier playbook from Part 1 gets a similar video to &lt;strong&gt;six cents&lt;/strong&gt;. The cost knobs are Part 4.) Here's a frame from the finished thing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kjax2pode3vute8zzvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kjax2pode3vute8zzvv.png" alt="A scene from the finished Lobachevsky Short — generated still, free motion, real narration." width="768" height="1344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the data shape underneath each scene — the script stage emits timed scenes the rest of the pipeline consumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 01_script.json (one scene)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"start_s"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"end_s"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"narration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What if everything you were taught about parallel lines was secretly a lie?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"visual_prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"railroad tracks vanishing toward a glowing question mark, retro poster"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"on_screen_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"...a lie?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"motion_hint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"slow push-in toward the vanishing point"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;narration&lt;/code&gt; drives the TTS (and therefore the clip length — audio leads, video follows, so nothing ever desyncs). &lt;code&gt;visual_prompt&lt;/code&gt; drives the image model. &lt;code&gt;motion_hint&lt;/code&gt; drives the free animator. One JSON object, three downstream stages.&lt;/p&gt;
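
&lt;p&gt;The "audio leads, video follows" rule can be sketched as deriving each clip's length from its measured narration. The layout of &lt;code&gt;timing.json&lt;/code&gt; used here (scene id mapped to seconds) is an assumption, as are the &lt;code&gt;clip_lengths&lt;/code&gt; helper and its &lt;code&gt;pad_s&lt;/code&gt; tail padding. The real file may be shaped differently.&lt;/p&gt;

```python
import json

# Assumed shape of timing.json: scene id -> narration duration in seconds.
timing_json = '{"1": 7.42, "2": 5.1, "3": 9.86}'


def clip_lengths(timing: dict[str, float], pad_s: float = 0.3) -> dict[int, float]:
    """Give each clip the length of its narration plus a small tail pad."""
    return {int(sid): round(dur + pad_s, 2) for sid, dur in timing.items()}


lengths = clip_lengths(json.loads(timing_json))
total_s = round(sum(lengths.values()), 2)
print(lengths, total_s)
```

&lt;p&gt;Because the clip length is computed from the audio rather than from the script's planned &lt;code&gt;start_s&lt;/code&gt; and &lt;code&gt;end_s&lt;/code&gt;, a narration that runs long simply produces a longer clip.&lt;/p&gt;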

&lt;h2&gt;
  
  
  Try it yourself (zero API keys, zero dollars)
&lt;/h2&gt;

&lt;p&gt;The repo ships an offline mode so you can watch the whole pipeline run without a single key or cent. Stub providers stand in for the paid ones; everything else is real ffmpeg:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/dasein108/slope-studio
&lt;span class="nb"&gt;cd &lt;/span&gt;slope-studio
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[fal]"&lt;/span&gt;

&lt;span class="c"&gt;# free, offline, end-to-end smoke test:&lt;/span&gt;
studio run &lt;span class="s2"&gt;"how black holes bend time"&lt;/span&gt; &lt;span class="nt"&gt;--duration&lt;/span&gt; 12 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--script-provider&lt;/span&gt; stub &lt;span class="nt"&gt;--image-provider&lt;/span&gt; stub &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--video-provider&lt;/span&gt; kenburns &lt;span class="nt"&gt;--voice-provider&lt;/span&gt; edge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll get a real &lt;code&gt;runs/&amp;lt;id&amp;gt;/&lt;/code&gt; folder with a stitched, narrated &lt;code&gt;06_final.mp4&lt;/code&gt; — built entirely from free local tooling. (Heads up: &lt;code&gt;stub&lt;/code&gt; is a &lt;em&gt;wiring&lt;/em&gt; generator — it emits placeholder text so you can test the plumbing. Swap in a real LLM key before you spend money on visuals, or you'll lovingly render meaningless filler. Ask me how I know.)&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell another AI engineer
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Resist the monolith. Model your AI pipeline as &lt;strong&gt;stages of pure file-to-file functions over a single run directory&lt;/strong&gt;, make each one an independently runnable command, and give every provider a uniform result type that reports its own cost. You get free debuggability (&lt;code&gt;ls&lt;/code&gt; is your inspector), free resumability, free idempotency, and — crucially — a &lt;em&gt;measured&lt;/em&gt; cost ledger that everything smarter you build later (budgets, auto-strategies, bandits) gets to stand on. Boring architecture is a feature.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Next — Part 3: Free Motion.&lt;/strong&gt; The fun part. AI video is $0.07/second; I'm going to take a single still image and give it real motion — drift, parallax with subject inpainting, kinetic type, atmospheric rain and embers — for &lt;strong&gt;$0.00&lt;/strong&gt;, with a deep dive into the ffmpeg filtergraphs and the indie-game-dev tricks behind them. (Spoiler: it's all already running in the &lt;strong&gt;&lt;a href="https://dasein108.github.io/slope-studio/" rel="noopener noreferrer"&gt;live effects gallery&lt;/a&gt;&lt;/strong&gt;.)&lt;/p&gt;

&lt;p&gt;▶ &lt;strong&gt;Live effects gallery:&lt;/strong&gt; &lt;a href="https://dasein108.github.io/slope-studio/" rel="noopener noreferrer"&gt;dasein108.github.io/slope-studio&lt;/a&gt;&lt;br&gt;
⭐ &lt;strong&gt;Star the repo to follow along:&lt;/strong&gt; &lt;a href="https://github.com/dasein108/slope-studio" rel="noopener noreferrer"&gt;github.com/dasein108/slope-studio&lt;/a&gt;&lt;br&gt;
🔔 &lt;strong&gt;Subscribe to the channel&lt;/strong&gt; to watch the experiment grow from zero: &lt;a href="https://www.youtube.com/shorts/gaR76MiAK0U" rel="noopener noreferrer"&gt;the Lobachevsky Short&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>architecture</category>
      <category>video</category>
    </item>
    <item>
      <title>Zero to Autopilot, Part 1: I Built an AI That Runs a YouTube Channel (the landscape, and my $10 wake-up call)</title>
      <dc:creator>Maksims Gavrilovs</dc:creator>
      <pubDate>Fri, 05 Jun 2026 14:59:45 +0000</pubDate>
      <link>https://dev.to/dasein108/zero-to-autopilot-part-1-i-built-an-ai-that-runs-a-youtube-channel-the-landscape-and-my-10-1ki6</link>
      <guid>https://dev.to/dasein108/zero-to-autopilot-part-1-i-built-an-ai-that-runs-a-youtube-channel-the-landscape-and-my-10-1ki6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Zero to Autopilot — Building a Self-Improving AI Media Channel.&lt;/em&gt; Part 1 of 7. I'm an AI engineer and this is the full build log of an autonomous AI short-video channel — one that writes, renders, publishes, &lt;em&gt;and&lt;/em&gt; decides what to make next, then grades its own homework. No face, no film crew, no me clicking "upload" at midnight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data status (Part 1): real-now.&lt;/strong&gt; Everything below is code, costs, and public facts I can verify today. The juicy audience metrics from my own channel are sandbagged until Part 7, so they have time to become real instead of noise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslxzml259mt8z9wz6qkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslxzml259mt8z9wz6qkw.png" alt="A lone figure walking through a crowd of silhouettes — a real frame from my channel, generated for $0.039." width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The two-billion-view problem
&lt;/h2&gt;

&lt;p&gt;Late 2025, a channel called &lt;strong&gt;Bandar Apna Dost&lt;/strong&gt; crossed &lt;strong&gt;~2 billion views&lt;/strong&gt; and an estimated &lt;strong&gt;$4.25M/year (~₹38 crore)&lt;/strong&gt;. Its content? Short AI clips of a monkey and a Hulk-ish dude. No dialogue. No plot. No discernible reason to exist. (&lt;a href="https://www.techlusive.in/news/how-this-indian-ai-generated-youtube-channel-is-pulling-billions-of-views-and-millions-in-revenue-1635923/" rel="noopener noreferrer"&gt;techlusive&lt;/a&gt;, &lt;a href="https://www.business-standard.com/technology/tech-news/india-youtube-bandar-apna-dost-channel-global-ai-video-charts-slop-content-125123100396_1.html" rel="noopener noreferrer"&gt;Business Standard&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Cue every dev's reaction: &lt;em&gt;"...I have a GPU and zero shame, how hard can this be?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Pretty hard, actually — because here's the part the get-rich-quick threads leave out. A few months later YouTube's &lt;strong&gt;"AI slop" crackdown&lt;/strong&gt; nuked an estimated &lt;strong&gt;4.7 billion views across 16 channels&lt;/strong&gt;, ~35M subs, and nearly &lt;strong&gt;$10M in revenue&lt;/strong&gt;. Among the bodies: &lt;strong&gt;Three Minute Wisdom&lt;/strong&gt;, a ~1.7M-sub / ~2B-view faceless AI channel, most of its catalog vaporized. (&lt;a href="https://outlierkit.com/resources/youtube-ai-slop-crackdown-2026/" rel="noopener noreferrer"&gt;OutlierKit&lt;/a&gt;, &lt;a href="https://miraflow.ai/blog/faceless-youtube-channel-explosion-ai-million-subscriber-creators-2026" rel="noopener noreferrer"&gt;Miraflow&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;So the lay of the land in mid-2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faceless AI video is a &lt;strong&gt;real, monetizable&lt;/strong&gt; category. Billions of views, real revenue, nobody's face required.&lt;/li&gt;
&lt;li&gt;It's also a &lt;strong&gt;ban speedrun&lt;/strong&gt; if you ship slop. The platforms are now actively &lt;code&gt;rm -rf&lt;/code&gt;-ing low-effort content at scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I looked at that and saw a clean engineering problem with two non-negotiable constraints: &lt;strong&gt;don't make slop, and don't go broke making it.&lt;/strong&gt; This series is me brute-forcing both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "faceless" is catnip for an engineer
&lt;/h2&gt;

&lt;p&gt;Faceless means &lt;strong&gt;narration + visuals do all the work.&lt;/strong&gt; No on-camera talent, no lighting rig, no "can you do Tuesday?" Every input is a file that an LLM or a model can spit out. Which means the whole thing is &lt;em&gt;programmable&lt;/em&gt; — and anything programmable can be measured, costed, and (eventually) left to run while you sleep.&lt;/p&gt;

&lt;p&gt;The winning recipe is boringly well-documented: pick a niche, nail a 2-second hook, stay on-brand, keep people watching to the end, and build a deep library so the algorithm has something to binge-feed. Notice what's &lt;em&gt;not&lt;/em&gt; on that list: a human, per video. That's a &lt;strong&gt;system&lt;/strong&gt;, not a craft.&lt;/p&gt;

&lt;p&gt;The channels getting deleted skipped the system and cranked the volume knob to 11. The survivors — and the non-AI GOATs like Kurzgesagt and CrashCourse — win on structure, pacing, and actually having a point. My bet: an engineer can clear that quality bar &lt;em&gt;and&lt;/em&gt; the volume bar &lt;strong&gt;if&lt;/strong&gt; each video is cheap enough to run hundreds of experiments, with a learning loop deciding which ones to rerun.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exhibit A: my first video quietly ate $10
&lt;/h2&gt;

&lt;p&gt;Here's video #1, live on the channel — Lobachevsky, the guy who broke geometry:&lt;/p&gt;

&lt;p&gt;🎬 &lt;strong&gt;&lt;a href="https://www.youtube.com/shorts/gaR76MiAK0U" rel="noopener noreferrer"&gt;The heretic who broke geometry → youtube.com/shorts/gaR76MiAK0U&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I did the rookie thing: reached for &lt;strong&gt;AI image-to-video on every single scene&lt;/strong&gt;, because that's what the shiny demos show. It looked great. Then I checked the bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ten dollars. One Short.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The villain is one line of arithmetic — hosted AI video is priced &lt;strong&gt;per second&lt;/strong&gt;, not per clip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# studio/providers/video.py — real per-second prices (verified on fal.ai, June 2026)
&lt;/span&gt;&lt;span class="n"&gt;FAL_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;per_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.07&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# 150s Short ≈ $10.50   &amp;lt;-- oof
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ltx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;per_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.04&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# cheapest hosted i2v
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seedance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;per_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;# 150s ≈ $45 (lol no)
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hailuo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;per_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.045&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;per_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.16&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;150 seconds × $0.07 = &lt;strong&gt;$10.50&lt;/strong&gt;, no matter how you slice the clips. Now do the napkin math on a content &lt;em&gt;strategy&lt;/em&gt;: at ~$10/video, a hundred experiments is a thousand bucks, and a "post a lot and learn" loop only works if you can afford to repeat it. The economics were quietly DOA.&lt;/p&gt;
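
&lt;p&gt;To make that arithmetic reusable, here is a minimal sketch that prices a Short of any length against the table above. The per-second prices are copied from &lt;code&gt;FAL_MODELS&lt;/code&gt;; the helper names &lt;code&gt;short_cost&lt;/code&gt; and &lt;code&gt;experiment_budget&lt;/code&gt; are made up for illustration and are not claimed to exist in the repo.&lt;/p&gt;

```python
# Per-second prices (USD) copied from the FAL_MODELS table above.
PER_SECOND_USD = {
    "kling": 0.07,
    "ltx": 0.04,
    "seedance": 0.30,
    "hailuo": 0.045,
    "wan": 0.16,
}


def short_cost(model: str, seconds: float) -> float:
    """Cost of generating `seconds` of footage with one hosted model."""
    return PER_SECOND_USD[model] * seconds


def experiment_budget(model: str, seconds: float, n_videos: int) -> float:
    """Total spend for a batch of same-length Shorts (hypothetical helper)."""
    return short_cost(model, seconds) * n_videos


# Price a 150-second Short on every model, then a 100-video batch on kling.
for name in PER_SECOND_USD:
    print(f"{name:9s} 150s Short = ${short_cost(name, 150):6.2f}")
print(f"100 kling Shorts = ${experiment_budget('kling', 150, 100):,.2f}")
```

&lt;p&gt;Because the price scales with total seconds, cutting the same 150 seconds into more or fewer clips leaves the bill unchanged. Only fewer paid seconds or a cheaper model moves it.&lt;/p&gt;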

&lt;h2&gt;
  
  
  Plot twist: I'd solved this before, in a past life
&lt;/h2&gt;

&lt;p&gt;Before AI ate my career, I shipped indie games. And indie game dev is a master class in &lt;strong&gt;faking expensive things for free&lt;/strong&gt;, because you've got a $0 art budget and a build due Saturday. You don't buy motion — you &lt;em&gt;engineer the feeling&lt;/em&gt; of motion: parallax scrolling layers, drifting backgrounds, snappy cuts, a little camera push. Cheap tricks, real game-feel.&lt;/p&gt;

&lt;p&gt;Same energy, new domain. Why pay $10.50 for AI video when I can take &lt;strong&gt;one still image&lt;/strong&gt; and add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;drift / Ken-Burns&lt;/strong&gt; — slow pan + zoom, the still breathes;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parallax&lt;/strong&gt; — split the frame into depth planes and slide them at different speeds (the background literally drifts behind a static subject);&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cuts &amp;amp; transitions&lt;/strong&gt; — rhythm beats AI motion for retention anyway.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in ffmpeg. All free. That's the entire Part 3 of this series, and it's where most of the $10 goes to die. Spoiler: it does &lt;strong&gt;not&lt;/strong&gt; look like slop —&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhfbrm0rmlztmteu16ju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhfbrm0rmlztmteu16ju.png" alt="Noir woodcut village, lone figure — a single $0.039 Nano Banana still, animated for free." width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb6sn4j9lk21cbj4cas5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb6sn4j9lk21cbj4cas5.png" alt="A man reading in an empty train carriage — same pipeline, free motion. The art *direction* is what kills the slop vibe, not an expensive model." width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(These stills don't move on the page — but every free effect is playing live in the &lt;strong&gt;&lt;a href="https://dasein108.github.io/slope-studio/" rel="noopener noreferrer"&gt;effects gallery&lt;/a&gt;&lt;/strong&gt;. Drift, parallax, rain, embers, glitch, all $0. Part 3 dissects how.)&lt;/p&gt;
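
&lt;p&gt;For a feel of how little code the drift effect needs, here is a rough sketch that builds an ffmpeg command around the stock &lt;code&gt;zoompan&lt;/code&gt; filter. This is an assumption about one way to do it, not the repo's actual implementation (that is Part 3's job); the function name, file names, and default values are hypothetical.&lt;/p&gt;

```python
import shlex


def ken_burns_cmd(image: str, out: str, seconds: float = 5.0, fps: int = 30,
                  max_zoom: float = 1.2, size: str = "1080x1920") -> list[str]:
    """Build an ffmpeg argv that slowly zooms into the centre of one still."""
    frames = int(seconds * fps)
    step = (max_zoom - 1.0) / frames  # zoom added per output frame
    vf = (
        # Grow the zoom a little each frame, capped at max_zoom.
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        # One input image is stretched into `frames` output frames.
        f":d={frames}"
        # Keep the crop window centred while zooming.
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":s={size}:fps={fps}"
    )
    return ["ffmpeg", "-y", "-i", image, "-vf", vf,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", out]


# Hypothetical file names; this only prints the command, it does not run ffmpeg.
cmd = ken_burns_cmd("scene_01.png", "scene_01.mp4")
print(shlex.join(cmd))
```

&lt;p&gt;The function only assembles the argument list, so it can be inspected or logged before anything is rendered. Passing the list to &lt;code&gt;subprocess.run&lt;/code&gt; would do the actual encode, at a cost of CPU time and nothing else.&lt;/p&gt;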

&lt;h2&gt;
  
  
  Exhibit B: the six-cent video
&lt;/h2&gt;

&lt;p&gt;Killing AI video was step one. Step two was realizing &lt;strong&gt;Nano Banana isn't always the move.&lt;/strong&gt; For a goofy "why do cats have fur" Short, I didn't need photoreal noir — I needed clean flat cartoon. Enter &lt;strong&gt;Flux Schnell&lt;/strong&gt; at &lt;strong&gt;$0.003 per megapixel&lt;/strong&gt;, roughly &lt;strong&gt;half a cent an image&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobcgq8u38r1guruxbz1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobcgq8u38r1guruxbz1w.png" alt="A Flux Schnell cat — about $0.005 to generate. Right tool, right price." width="800" height="1400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's that one, live:&lt;/p&gt;

&lt;p&gt;🎬 &lt;strong&gt;&lt;a href="https://www.youtube.com/shorts/FWtEJjeK_vI" rel="noopener noreferrer"&gt;Why do cats have fur? → youtube.com/shorts/FWtEJjeK_vI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And the receipts, straight from its manifest:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Script&lt;/td&gt;
&lt;td&gt;local LLM&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visuals (10 images)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fal-flux-schnell&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.054&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Motion (all scenes)&lt;/td&gt;
&lt;td&gt;Ken-Burns (ffmpeg)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;edge-tts&lt;/code&gt; (neural)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sound FX + music&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;fal-elevenlabs-sfx&lt;/code&gt; + local bed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.0076&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Save + Publish&lt;/td&gt;
&lt;td&gt;ffmpeg / YouTube API&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ $0.06&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From &lt;strong&gt;$10.50 → six cents.&lt;/strong&gt; Same pipeline, different knobs. That's a &lt;strong&gt;~175× cost cut&lt;/strong&gt;, and it's the difference between "fun demo" and "I can run hundreds of these and let a bandit pick the winners." (Full cost teardown: Part 4.)&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;$0.0076&lt;/code&gt; line is quietly important, too: it's an &lt;strong&gt;AI sound layer&lt;/strong&gt; — generated SFX plus a music bed ducked under the narration — and atmosphere is a big reason cheap doesn't read as &lt;em&gt;slop&lt;/em&gt;. The how is in Part 3.&lt;/p&gt;
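
&lt;p&gt;The receipts are easy to re-check. Below is a small sketch that sums the table and compares it with the all-AI-video baseline. The stage costs are copied from the table above; the tuple layout and function names are illustrative assumptions, not the repo's real manifest schema.&lt;/p&gt;

```python
# Stage costs (USD) copied from the table above: (stage, provider, cost).
MANIFEST = [
    ("script", "local LLM", 0.0),
    ("visuals", "fal-flux-schnell", 0.054),
    ("motion", "Ken-Burns (ffmpeg)", 0.0),
    ("voice", "edge-tts", 0.0),
    ("sound", "fal-elevenlabs-sfx + local bed", 0.0076),
    ("publish", "ffmpeg / YouTube API", 0.0),
]

BASELINE_USD = 150 * 0.07  # the all-AI-video Short: 150 s at $0.07/s


def total_cost(manifest) -> float:
    """Sum the measured cost of every stage."""
    return sum(cost for _stage, _provider, cost in manifest)


def paid_stages(manifest):
    """Names of the stages that cost anything at all."""
    return [stage for stage, _provider, cost in manifest if cost != 0]


total = total_cost(MANIFEST)
print(f"total        ${total:.4f}")
print(f"paid stages  {paid_stages(MANIFEST)}")
print(f"cost cut     {BASELINE_USD / total:.0f}x vs ${BASELINE_USD:.2f}")
```

&lt;p&gt;On the unrounded total of $0.0616 the cut works out to about 170×; rounding the total to six cents gives the 175× figure. Either way, only two of the six stages cost anything.&lt;/p&gt;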

&lt;h2&gt;
  
  
  The gap I'm actually building into
&lt;/h2&gt;

&lt;p&gt;After mapping the field, two things were suspiciously absent from every faceless-AI playbook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost honesty.&lt;/strong&gt; Everyone screenshots the $4M. Nobody publishes a per-second price table or admits their first video cost $10. So they never explain how to afford video #100.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy.&lt;/strong&gt; "Just post consistently for 6 months" — cool, that's a full-time job done by hand. Nobody treats &lt;em&gt;what to make next&lt;/em&gt; as a decision a system can learn: explore vs. exploit, a memory of what won, a verdict on every bet.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the thesis. Over the next six parts I'll build a channel that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;turns a one-line idea into a finished, well-directed vertical Short (&lt;strong&gt;Part 2&lt;/strong&gt;),&lt;/li&gt;
&lt;li&gt;moves nearly all motion off paid AI video onto &lt;strong&gt;free custom effects&lt;/strong&gt; (&lt;strong&gt;Part 3&lt;/strong&gt;),&lt;/li&gt;
&lt;li&gt;drives cost per video from ~$10 toward &lt;strong&gt;pennies&lt;/strong&gt; (&lt;strong&gt;Part 4&lt;/strong&gt;),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;remembers&lt;/strong&gt; what worked via a per-channel journal + self-reflection (&lt;strong&gt;Part 5&lt;/strong&gt;),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;decides&lt;/strong&gt; what to make next with a Thompson-sampling bandit over a &lt;em&gt;falsifiable&lt;/em&gt; hypothesis (&lt;strong&gt;Part 6&lt;/strong&gt;),&lt;/li&gt;
&lt;li&gt;and &lt;strong&gt;runs itself&lt;/strong&gt; on a schedule, grading each post 48–72h later (&lt;strong&gt;Part 7&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The learning loop is already showing its teeth. A batch of near-identical clips dumped in the same minute cannibalized itself (3–6 views each, brutal). Meanwhile one video, a real mathematician framed as a heretic with a "this breaks reality" hook in the first two seconds, pulled roughly &lt;strong&gt;50× the views of the channel's other Shorts.&lt;/strong&gt; The rest of this series is the machine I'm building so that's a &lt;em&gt;repeatable pattern&lt;/em&gt;, not a lucky roll.&lt;/p&gt;
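
&lt;p&gt;As a taste of the explore-vs-exploit idea before Part 6, here is a minimal Beta-Bernoulli Thompson-sampling sketch. It is a textbook version, not the repo's bandit: the arm names are invented, and treating each graded Short as a simple win or loss (say, beating the channel median at the 48–72h check) is an assumption made for illustration.&lt;/p&gt;

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        # [wins, losses] per arm; adding 1 to each gives a flat Beta(1, 1) prior.
        self.stats = {arm: [0, 0] for arm in arms}
        self.rng = random.Random(seed)

    def choose(self) -> str:
        """Sample a plausible win rate per arm and play the highest draw."""
        draws = {
            arm: self.rng.betavariate(wins + 1, losses + 1)
            for arm, (wins, losses) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, arm: str, won: bool) -> None:
        """Store the verdict for one published Short."""
        self.stats[arm][0 if won else 1] += 1


# Hypothetical arms: two content formats competing for the next slot.
bandit = ThompsonBandit(["heretic-hook", "plain-explainer"], seed=7)
for _ in range(20):
    bandit.record("heretic-hook", won=True)
    bandit.record("plain-explainer", won=False)
print(bandit.choose())
```

&lt;p&gt;Arms with little data produce wide, noisy draws, so they still get picked now and then (explore). Arms with a strong record produce consistently high draws and take most of the slots (exploit).&lt;/p&gt;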

&lt;h2&gt;
  
  
  It's all open source — and it's a live experiment
&lt;/h2&gt;

&lt;p&gt;The whole studio is on GitHub — &lt;strong&gt;&lt;a href="https://github.com/dasein108/slope-studio" rel="noopener noreferrer"&gt;&lt;code&gt;slope-studio&lt;/code&gt;&lt;/a&gt;&lt;/strong&gt; (one letter from "slop", which, given the genre, is either a typo or a mission statement). Every line of code in this series lives there: the 7-stage pipeline, the free ffmpeg effects, the cost model, the bandit. Part 2 is the guided tour, with a one-command smoke test you can run with zero API keys.&lt;/p&gt;

&lt;p&gt;And this isn't a retrospective with the numbers airbrushed in — it's a &lt;strong&gt;live experiment&lt;/strong&gt; you can watch compound or faceplant in public. Every Short the system ships asks viewers to subscribe, because the whole point is watching an autonomous channel grow from zero. Consider it subscribing to the test harness.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell another AI engineer
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Treat content as a pipeline, not a craft. The instant every input — script, image, motion, voice, sound — is a function call with a &lt;em&gt;measured&lt;/em&gt; cost, three superpowers unlock: you can drive unit cost toward zero, run hundreds of cheap experiments, and bolt a learning loop on top that decides which experiments to repeat. The folks making millions optimized the system and the volume. The folks getting deleted &lt;em&gt;only&lt;/em&gt; had volume. The alpha is the system.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Next — Part 2: Idea → Published in 7 Stages.&lt;/strong&gt; The actual architecture: every stage as an independent CLI subcommand, the &lt;code&gt;runs/&amp;lt;id&amp;gt;/&lt;/code&gt; artifact flow, a manifest that records measured cost per stage, and how a single line of text becomes an uploaded Short without me touching a video editor.&lt;/p&gt;

&lt;p&gt;▶ &lt;strong&gt;Live effects gallery:&lt;/strong&gt; &lt;a href="https://dasein108.github.io/slope-studio/" rel="noopener noreferrer"&gt;dasein108.github.io/slope-studio&lt;/a&gt;&lt;br&gt;
⭐ &lt;strong&gt;Star the repo:&lt;/strong&gt; &lt;a href="https://github.com/dasein108/slope-studio" rel="noopener noreferrer"&gt;github.com/dasein108/slope-studio&lt;/a&gt;&lt;br&gt;
🔔 &lt;strong&gt;Subscribe&lt;/strong&gt; (watch the experiment from zero): &lt;a href="https://www.youtube.com/shorts/gaR76MiAK0U" rel="noopener noreferrer"&gt;the Lobachevsky Short&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://www.techlusive.in/news/how-this-indian-ai-generated-youtube-channel-is-pulling-billions-of-views-and-millions-in-revenue-1635923/" rel="noopener noreferrer"&gt;techlusive&lt;/a&gt; · &lt;a href="https://www.business-standard.com/technology/tech-news/india-youtube-bandar-apna-dost-channel-global-ai-video-charts-slop-content-125123100396_1.html" rel="noopener noreferrer"&gt;Business Standard&lt;/a&gt; · &lt;a href="https://outlierkit.com/resources/youtube-ai-slop-crackdown-2026/" rel="noopener noreferrer"&gt;OutlierKit (AI-slop crackdown)&lt;/a&gt; · &lt;a href="https://miraflow.ai/blog/faceless-youtube-channel-explosion-ai-million-subscriber-creators-2026" rel="noopener noreferrer"&gt;Miraflow (faceless explosion 2026)&lt;/a&gt;. View/revenue figures are third-party estimates.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>video</category>
    </item>
  </channel>
</rss>
