Maksims Gavrilovs

Posted on Jun 8 • Edited on Jun 12

Zero to Autopilot, Part 4: The Cost Collapse — $10.50 → $0.06 per Video

#ai #python #cost #video

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 4 of 7. Part 1 landscape · Part 2 pipeline · Part 3 free motion. Now the headline number: how a video went from $10.50 to six cents.

Data status (Part 4): real-now. Every figure is a measured cost_usd from the manifest, not an estimate. Code is straight from the repo.

Where the money actually goes

After Part 3, motion is free — I animate stills in ffmpeg for $0. So a video's cost collapses to just two line items that can cost real money:

Images — one still per scene.
AI video — if and only if I choose to use it on a scene.

Everything else (script on a local LLM, narration on edge-TTS, stitching, muxing, publishing) is already $0. So the cost game is entirely about those two knobs. Let's turn them down without making slop.

Knob 1: the per-second video bomb

Recap of the villain from Part 1 — hosted AI image-to-video bills per second of output. The cost of one clip isn't a flat fee; it's duration × rate, snapped to the model's accepted duration grid:

# studio/providers/video.py
def estimate_cost(provider: str, model: str, seconds: float) -> float:
    spec = FAL_MODELS.get(model, FAL_MODELS["kling"])
    return round(_clip_dur(model, seconds) * spec["per_s"], 4)   # seconds × $/s

At kling's $0.07/s, a 150-second Short with AI video on every scene is ~$10.50. That was my first video. The fix isn't a cheaper model (though ltx at $0.04/s helps) — it's using AI video far more selectively, which I'll get to. First, the cheaper knob.
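To make the "duration × rate, snapped to a grid" arithmetic concrete, here is a minimal self-contained sketch. The per-second rates are the ones quoted above; the `durations` grids and the body of `_clip_dur` are assumptions for illustration, not the repo's actual values, and the `provider` argument is kept only to mirror the signature shown above.

```python
# Per-second rates come from this post. The duration grids are ASSUMED for
# illustration and may not match what each model really accepts.
FAL_MODELS = {
    "kling": {"per_s": 0.07, "durations": (5, 10)},
    "ltx":   {"per_s": 0.04, "durations": (5, 10)},
}


def _clip_dur(model: str, seconds: float) -> int:
    """Snap a requested duration up to the shortest allowed clip that covers it."""
    grid = FAL_MODELS.get(model, FAL_MODELS["kling"])["durations"]
    for allowed in grid:
        if seconds <= allowed:
            return allowed
    return grid[-1]  # longer than anything on the grid: cap at the longest clip


def estimate_cost(provider: str, model: str, seconds: float) -> float:
    """Billed seconds times dollars per second (provider is unused in this sketch)."""
    spec = FAL_MODELS.get(model, FAL_MODELS["kling"])
    return round(_clip_dur(model, seconds) * spec["per_s"], 4)


# A 150 s Short cut into fifteen 10 s scenes, with AI video on every scene.
scene_durations = [10.0] * 15
kling_total = round(sum(estimate_cost("fal-i2v", "kling", s) for s in scene_durations), 2)
ltx_total = round(sum(estimate_cost("fal-i2v", "ltx", s) for s in scene_durations), 2)
print(f"kling: ${kling_total:.2f}   ltx: ${ltx_total:.2f}")
```

Under this assumed grid a 7-second scene bills as a full 10-second clip, which is why the estimate uses the snapped duration rather than the requested one.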

Knob 2: right-size the image model

I had been defaulting every image to Nano Banana ($0.039/img), Google's Gemini 2.5 Flash Image. It's gorgeous and, crucially, supports character-reference consistency, which you want for photoreal or recurring-character content like my noir Kafka series.

But a goofy "why do cats have fur" explainer doesn't need photoreal noir. It needs a clean, flat cartoon look, and for that, Flux Schnell at $0.003/megapixel (~half a cent an image) is perfect.

Same pipeline, one config change, ~8× cheaper images when the style allows. The lesson generalizes: don't pay for capabilities the scene doesn't use. Photoreal + character-ref? Nano Banana. Flat/graphic/cartoon? Flux. The system keeps both wired as image and image_cheap.
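A quick back-of-envelope check of that ratio, as a sketch. The prices are the ones quoted above; `flux_image_cost` is a hypothetical helper that assumes billing is strictly linear in megapixels, which may not match how the provider rounds.

```python
NANO_BANANA_PER_IMAGE = 0.039   # flat price per image, as quoted in this post
FLUX_SCHNELL_PER_MP = 0.003     # price per megapixel, as quoted in this post
FLUX_APPROX_PER_IMAGE = 0.005   # the post's "~half a cent an image" figure


def flux_image_cost(width: int, height: int) -> float:
    """Per-image Flux cost, ASSUMING billing is linear in megapixels."""
    return width * height / 1_000_000 * FLUX_SCHNELL_PER_MP


ratio = NANO_BANANA_PER_IMAGE / FLUX_APPROX_PER_IMAGE
print(f"Nano Banana vs Flux: ~{ratio:.1f}x per image")
print(f"1024x1024 Flux still: ${flux_image_cost(1024, 1024):.4f}")
```

Because Flux is priced per megapixel and Nano Banana per image, the exact multiple moves with the output resolution; smaller stills widen the gap.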

The tiers: one knob to set them all

Rather than fiddle providers per stage, I bundled the choices into four tiers. This is the whole config:

# studio/tiers.py
TIER_PRESETS = {
    "free":     {"image": "card",            "voice": "edge",       "strategy": "kenburns"},
    "cheap":    {"image": "fal-flux-schnell", "voice": "edge",      "strategy": "kenburns",
                 "sfx": "local", "music": "local"},
    "balanced": {"image": "fal-nanobanana",  "voice": "edge",       "strategy": "auto"},   # fill AI within budget
    "premium":  {"image": "fal-nanobanana",  "voice": "openai-tts", "strategy": "all"},    # AI every scene
}

And the resulting cost ladder for a 150s Short:

Tier	Images	Video strategy	~Cost / 150s	When
free	offline card	Ken-Burns	$0	wiring / drafts
cheap	Flux Schnell	Ken-Burns	~$0.06	budget volume
balanced	Nano Banana	`auto` (AI on hero scenes)	= your `--max-cost`	best per dollar
premium	Nano Banana	AI every scene	$6–10+	quality first

--tier sets everything; any --*-provider flag still overrides a single choice. The interesting one is balanced, because of how auto works.
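One way the "tier sets everything, a flag overrides one choice" rule could be wired is sketched below. `resolve_config` is a hypothetical name and the merge logic is an assumption; only the `TIER_PRESETS` table is taken from the post.

```python
from __future__ import annotations

TIER_PRESETS = {
    "free":     {"image": "card",             "voice": "edge",       "strategy": "kenburns"},
    "cheap":    {"image": "fal-flux-schnell", "voice": "edge",       "strategy": "kenburns",
                 "sfx": "local", "music": "local"},
    "balanced": {"image": "fal-nanobanana",   "voice": "edge",       "strategy": "auto"},
    "premium":  {"image": "fal-nanobanana",   "voice": "openai-tts", "strategy": "all"},
}


def resolve_config(tier: str, **overrides: str | None) -> dict[str, str]:
    """Start from the tier preset, then let explicit per-stage choices win.

    Hypothetical helper: an override of None means "flag not passed".
    """
    if tier not in TIER_PRESETS:
        raise ValueError(f"unknown tier: {tier!r}")
    config = dict(TIER_PRESETS[tier])  # copy so the shared preset is never mutated
    config.update({key: value for key, value in overrides.items() if value is not None})
    return config


# Cheap tier, but with Nano Banana stills for a recurring-character episode.
cfg = resolve_config("cheap", image="fal-nanobanana", voice=None)
print(cfg)
```

Copying the preset before merging keeps one run's overrides from leaking into the next.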

`auto`: spend the budget where it matters

Most scenes are fine as a drifting still. A few — the hook, the climax, the outro — earn real AI motion. So auto is a tiny greedy knapsack: rank scenes by priority, then spend the AI budget on the highest-priority ones that fit, Ken-Burns the rest.

Priority is either explicitly set on a scene, or inferred by a hero heuristic:

# studio/stages/clips.py
def _effective_priority(scene, index, total):
    if scene.priority:        return float(scene.priority)
    if index == 0:            return 3.0    # the hook
    if index >= total - 2:    return 2.5    # outro / CTA
    # ...else an evenly-spread beat gets a mid priority

Then fill the budget greedily, highest priority first:

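n = len(scenes)
per_scene = {}   # scene id -> provider; scenes left out stay Ken-Burns (free)
spent = 0.0      # running total of estimated AI-video cost
budget = max_cost if max_cost is not None else float("inf")
# Highest priority first; ties go to the earlier scene (hence -i with reverse=True).
for i in sorted(range(n), key=lambda i: (_effective_priority(scenes[i], i, n), -i), reverse=True):
    c = video.estimate_cost("fal-i2v", model, scenes[i].duration_s)
    if spent + c <= budget:
        per_scene[scenes[i].id] = "fal-i2v"   # animate this one with AI
        spent += c
    # else: it stays Ken-Burns (free)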

So --tier balanced --max-cost 1.50 means: "give me AI motion on the hook and a couple of key beats, free motion everywhere else, and never spend more than $1.50." You get the perceptual punch of AI video where viewers actually notice it, at a fraction of all-AI cost.
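Here is a self-contained sketch of that greedy fill, runnable end to end. The `Scene` dataclass, the `plan` function name, and the `MID_PRIORITY` value are assumptions (the excerpt above elides the mid-priority branch), and the cost is a plain duration × kling rate with no duration-grid snapping.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

PER_S = 0.07          # kling's per-second rate, as quoted in this post
MID_PRIORITY = 1.0    # ASSUMED placeholder; the real mid-priority logic is elided


@dataclass
class Scene:
    id: str
    duration_s: float
    priority: Optional[float] = None


def effective_priority(scene: Scene, index: int, total: int) -> float:
    if scene.priority:
        return float(scene.priority)   # explicit priority always wins
    if index == 0:
        return 3.0                     # the hook
    if index >= total - 2:
        return 2.5                     # outro / CTA
    return MID_PRIORITY                # stand-in for the elided branch


def plan(scenes: list[Scene], max_cost: Optional[float]) -> tuple[dict[str, str], float]:
    """Give AI motion to the highest-priority scenes that fit; Ken-Burns the rest."""
    budget = max_cost if max_cost is not None else float("inf")
    n = len(scenes)
    per_scene = {s.id: "kenburns" for s in scenes}   # free motion by default
    spent = 0.0
    order = sorted(range(n),
                   key=lambda i: (effective_priority(scenes[i], i, n), -i),
                   reverse=True)
    for i in order:
        cost = round(scenes[i].duration_s * PER_S, 4)
        if spent + cost <= budget:
            per_scene[scenes[i].id] = "fal-i2v"
            spent += cost
    return per_scene, round(spent, 4)


scenes = [Scene(f"s{i}", 10.0) for i in range(6)]
scenes[3].priority = 5.0   # an explicitly flagged climax
assignment, spent = plan(scenes, max_cost=1.50)
print(assignment, spent)
```

Note that the loop does not stop at the first scene that fails to fit: a shorter, lower-priority scene can still claim leftover budget, which is what makes it a greedy fill and not a strict top-N cut.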

The pre-flight that refuses to overspend

Costs are estimated before a single API call. auto trims to fit; the rigid strategies (all/hybrid) abort if the estimate exceeds the budget rather than surprise you with a bill:

$ studio estimate lobachevsky --budget 3
  kling   150s → $10.50   ❌ over budget
  ltx     150s → $6.00    ❌ over budget
  auto    (fills $3.00)   ✅ AI on 6 hero scenes, Ken-Burns the rest

studio run defaults to --max-cost 3 and the clips stage won't blow past it. A running guard backstops the estimate in case a provider returns something unexpected. The golden rule from Part 2 pays off here: because every provider reports its real cost, the budget logic is exact, not hopeful.
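The two layers described above, a pre-flight check and a running guard, might look roughly like this. `OverBudgetError`, `preflight`, and `SpendGuard` are hypothetical names, and having the guard raise after recording a reported cost is an assumption about how the backstop behaves.

```python
from __future__ import annotations


class OverBudgetError(RuntimeError):
    """Raised when estimated or reported spend exceeds the cap."""


def preflight(strategy: str, clip_costs: list[float], max_cost: float) -> float:
    """Abort a rigid strategy before any paid call if its estimate is over budget."""
    estimate = round(sum(clip_costs), 4)
    if estimate > max_cost:
        raise OverBudgetError(
            f"{strategy}: estimated ${estimate:.2f} exceeds --max-cost ${max_cost:.2f}"
        )
    return estimate


class SpendGuard:
    """Running backstop that tracks the costs providers actually report."""

    def __init__(self, max_cost: float) -> None:
        self.max_cost = max_cost
        self.spent = 0.0

    def charge(self, reported_cost: float) -> None:
        self.spent += reported_cost
        if self.spent > self.max_cost:
            # The money for this call is already spent; stop the pipeline
            # from making any further paid calls.
            raise OverBudgetError(f"spent ${self.spent:.2f} of ${self.max_cost:.2f}")


# Fifteen 10 s kling clips at $0.70 each against the default $3 cap.
try:
    preflight("all", [0.70] * 15, max_cost=3.0)
    aborted = False
except OverBudgetError as err:
    aborted = True
    print(err)
```

The pre-flight can only be as accurate as the estimate, so the guard exists for the case where a provider's reported cost differs from what was predicted.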

The receipts

Same ~150s video, every tier, measured from the manifests:

Build	Images	Video	Sound	Total
premium (my first video)	Nano Banana	kling, every scene	—	~$10.50
balanced	Nano Banana ($0.585)	a few AI clips ($0.75)	—	$1.34
cheap (Nano + free motion)	Nano Banana	Ken-Burns	—	$0.585
cheap (Flux + free motion + AI SFX)	Flux ($0.054)	Ken-Burns	$0.0076	$0.06

$10.50 → $0.06. About a 175× cut, and the cheap version isn't a toy — it's a published Short with real narration, free motion, and atmosphere. The quality lever moved to art direction and pacing (free), not the size of the model bill.
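The totals above are just sums of per-step `cost_usd` values. As a sketch, assuming a hypothetical manifest shape of one entry per paid step (the real manifest schema is not shown in this post), the two itemized rows add up like this:

```python
# Hypothetical manifest shape: one entry per paid step, each with a cost_usd.
# The dollar amounts are the line items from the table above.
manifests = {
    "balanced": [
        {"stage": "images", "cost_usd": 0.585},
        {"stage": "clips",  "cost_usd": 0.75},
    ],
    "cheap-flux": [
        {"stage": "images", "cost_usd": 0.054},
        {"stage": "sfx",    "cost_usd": 0.0076},
    ],
}


def total_cost(entries: list) -> float:
    """Sum the reported cost of every paid step in one manifest."""
    return round(sum(entry["cost_usd"] for entry in entries), 4)


PREMIUM_TOTAL = 10.50  # the all-AI first video

balanced_total = total_cost(manifests["balanced"])
cheap_total = total_cost(manifests["cheap-flux"])
print(f"balanced: ${balanced_total}   cheap: ${cheap_total}")
print(f"cut vs premium: ~{PREMIUM_TOTAL / cheap_total:.0f}x")
```

Using the unrounded cheap total, the cut comes out slightly below the 175× headline, which is computed from the rounded $0.06.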

A fair caveat, though: $0.06 is the floor — a deliberately minimal Short. Once I turn the art-direction layer all the way up — parallax with generated plates, atmosphere, a vintage grade, a few Nano-Banana hero stills where they earn it — a fully art-directed, near-premium video lands around $0.15–0.25. That's still 40–65× cheaper than the ~$10 all-AI cut, at quality I genuinely can't tell apart in a feed. So read this as a ladder, not a single number:

Build	~Cost	When
floor (minimal effects)	$0.06	volume, throwaway tests
fully effected, near-premium	~$0.15–0.25	the realistic everyday build
premium (AI video every scene)	~$10	almost never worth it

The honest anchor is that middle rung. "The $0.06 Short" is the hook; "a great-looking Short for a quarter" is the number I actually run on.

A field update: what the catalog actually cost

I wrote that ladder as a forecast. Since then I've built a real back-catalog, so I can replace the forecast with the receipts — and the receipts are blunter than I expected. Across the dated runs in the repo, the median cost is well under a cent, and the cheapest published Shorts — full 60-second explainers with narration and free motion — measured $0.006. That's a tenth of the $0.06 I just called the floor. The real floor turned out an order of magnitude lower:

Real video (measured from its manifest)	What it used	Cost
Chandrasekhar (60s)	1 Flux still, free motion, edge-TTS	$0.006
Gödel, "math can't prove itself" (60s)	2 Flux stills, free motion, edge-TTS	$0.012
Galois, "the duel"	Nano stills + a little AI SFX	$0.18
Rabies (60s)	5 Nano stills + SFX + a music bed	$0.41
Fermat, "the margin note"	Nano stills + `ltx` AI clips + music	$0.78

What moves the needle is never the script or the motion; those are free in every row. It's exactly three opt-in knobs: Nano stills instead of Flux (about $0.14–0.20 a video), the paid audio layer (AI SFX plus a stable-audio music bed, about $0.20), and any AI video clips (ltx at $0.40 a hero beat). Turn all three off and you land at about six-tenths of a cent ($0.006). Turn all three on and you're still under a dollar. The only way back to a $10 video is AI motion on every scene, which, as the receipts above keep saying, is almost never worth it.
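Those three knobs can be turned into a rough estimator. This is a sketch only: `rough_cost` is a hypothetical helper, the Nano figure is the midpoint of the range quoted above, and real videos will differ with scene count and clip length.

```python
BASE = 0.006            # cheapest measured Short: Flux still, free motion, edge-TTS
NANO_STILLS = 0.17      # ASSUMED midpoint of the ~$0.14-0.20 range
PAID_AUDIO = 0.20       # AI SFX plus a music bed, approximate
LTX_PER_HERO_BEAT = 0.40


def rough_cost(nano_stills: bool = False, paid_audio: bool = False,
               ltx_hero_beats: int = 0) -> float:
    """Back-of-envelope video cost from the three opt-in knobs."""
    cost = BASE
    if nano_stills:
        cost += NANO_STILLS
    if paid_audio:
        cost += PAID_AUDIO
    cost += LTX_PER_HERO_BEAT * ltx_hero_beats
    return round(cost, 3)


all_off = rough_cost()
all_on = rough_cost(nano_stills=True, paid_audio=True, ltx_hero_beats=1)
print(f"all off: ${all_off}   all on (one hero beat): ${all_on}")
```

With one hero beat, all three knobs together stay under a dollar; each additional AI clip adds roughly another $0.40.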

The one line item I never cut: sound

Cost-optimizing sounds like "cut everything," but the real skill is knowing what punches above its price — and then keeping it. The audio layer is the clearest case. AI sound effects plus a music bed run about $0.0076 to $0.20 a video, rounding error next to the image and video knobs, and they do more for perceived quality than anything else on the list.

The reason is that sound doesn't just decorate the picture — it cues the viewer's imagination to render the rest. A gust of wind, a distant bell, a low cello under a line of narration: the still shows a single frozen frame, but the soundscape makes the mind supply the motion, the depth, and the room the scene lives in. A fuller "video" plays out in the viewer's head that the image never actually contained. A real share of the production value a viewer feels is happening behind their own eyes, prompted by a few cents of audio.

So when I trim cost, sound is the last thing to go, and usually it never does. It's the highest return-on-investment line in the whole pipeline: pennies for atmosphere and liveness you can't buy any other way. "Right-size the spend" cuts both directions — kill the costs that don't earn their keep, and protect the cheap ones that punch far above their weight.

And cheaper actually wins

That last claim isn't theoretical. My most expensive video was the premium Lobachevsky cut — AI video on every scene, ~$10.50, hours of fussing. One of my cheapest real bets was Ramanujan: 8 Nano-Banana stills, free ffmpeg motion plus a sliver of cheap ltx on the hero beats, $0.65 measured, start to finish in about an hour:

🎬 Ramanujan: Math's Divine Genius → youtube.com/shorts/rsk8XruZWBQ

The 65-cent video outperformed the ten-dollar one. (Full numbers land in Part 7, per the series' data policy — but the direction is already unambiguous.) That's the empirical version of the whole argument: once free motion clears the "doesn't look like slop" bar, extra dollars buy shockingly little. Production quality is barely a success factor — the hook, the subject, and the story are. So the right move is to floor the cost and spend your real effort on which videos to make.

And you don't have to take my word that this scales. Channels like Cuentos de la Choza — Spanish folklore and horror tales — sit at 400k+ subscribers across 1,200+ videos, built on AI-generated stills, narration, and simple motion. Sit with that catalog size for a second: at 1,200 videos, nobody is paying per-second for AI video on every scene. The unit economics simply don't allow it. The "post at volume" play and the "drive cost to the floor" play are the same play — which is the entire reason the rest of this series exists.

Why this is the whole ballgame

A $10 video is a precious artifact you agonize over. A six-cent video is an experiment. At six cents, a hundred attempts cost six dollars, so I can stop guessing what works and start measuring it. Cheap unit cost is what turns "make content" into "run a search over content."

Which raises the obvious question: if I can cheaply make hundreds of videos, which hundreds should I make? That needs a brain — a memory of what worked and a way to decide what to try next. That's the back half of this series.

What I'd tell another AI engineer

Takeaway: Cost-optimize by removing capabilities you aren't using, not by buying the cheapest everything. Free motion killed the per-second video bill; right-sizing the image model (photoreal vs flat) cut images ~8×; an auto strategy spends the remaining budget only on the scenes that perceptually earn it; and a pre-flight estimate makes the cap exact. The payoff isn't the saved dollars — it's that a cheap-enough unit cost converts a craft into a search, which is the only thing that makes the learning loop (next) affordable.

Next — Part 5: Memory & Self-Reflection. Now that videos are cheap, the channel needs to remember. I'll build the per-channel journal — a long-term strategy plus an episodic ledger of every bet, with virality scoring and an LLM reflection step that turns measured outcomes into an updated game plan.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe to watch the experiment grow from zero: the Lobachevsky Short
