DEV Community: Maksims Gavrilovs

Zero to Autopilot, Part 9: Anatomy of a $25 AI Company

Maksims Gavrilovs — Fri, 10 Jul 2026 12:56:36 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 9 — the internals. Part 8 told the story; this one opens the machine. Full prior arc: Part 1 … Part 7.

Data status: real-now — real runtime config and code from the repo. Open source.

In Part 1 the interesting claim was that the thing runs itself. Here I want to answer the engineer's next question: how? Not "an LLM does it," but the actual mechanics — how a goal becomes tasks, what the company remembers, how an agent decides what to do when it wakes up, and how the pieces hand off without stepping on each other. I'll name the theory where it's useful, because it turns out this design maps onto a well-worn stack, and the places where it doesn't are the interesting ones.

Everything below is the real runtime (Paperclip): a local daemon plus an embedded Postgres, agents that are Claude Code sessions, and a company you can read as a folder.

1. The goal, and how it becomes tasks

The company has a goal tree. At the root: unlock YouTube monetization. Under it, child goals — reach 1,000 subscribers, reach the watch-time threshold. Each is a row with a level (company / team / agent / task) and a parentId, so goals form a hierarchy.

Here's the part people assume and get wrong: the runtime does not decompose the goal into tasks. There's no planner reading the goal and emitting a work breakdown. A goal is just data. The only link between a goal and the work under it is a single field on each ticket — goalId — that an agent stamps when it creates the ticket.

So how does the goal actually produce today's three video ideas? Through a prompt. The Growth Lead agent's instructions say, in effect, "own progress toward this goal." When it wakes, it reads the goal's metrics and the channel state, reasons about the most useful next move, and creates the tasks itself — a script here, a measurement there — each tagged with the goal. The decomposition lives in an agent's head, not in the scheduler.

The classic vocabulary for this is worth borrowing. Planning theory's ideal is goal-driven decomposition by an orchestrator that owns "high-level task comprehension, planning, and decomposition," and it warns that planning should be a pipeline, not a prompt: something you can instrument and constrain (survey, arXiv:2602.10479). The Belief–Desire–Intention model gives the cleanest frame: desires = the goal and its constraints, beliefs = world state and memory, intentions = the plans and tool-calls the agent commits to ([CoALA-adjacent; ChatBDI, AAMAS 2025]). Map that onto the company: the goal tree is desires, the tickets and journal are beliefs, the assigned issues are intentions.

The honest divergence: Paperclip's decomposition is a prompt, not an instrumented pipeline. The goal steers behavior only as much as the Growth Lead's prompt makes it, and goalId is a flat tag with no automatic roll-up of task completion into goal progress. It's cheaper and it works, but it's the weakest joint in the system. The goal is scaffolding, not control flow. (Progress toward the goal is measured separately, by the marketing engine, not computed by the runtime from closed tickets.)
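
To make the "flat tag" concrete, here is a minimal sketch of the data model, assuming only the field names mentioned above (level, parentId, goalId). The class names, sample ids, the "team" level on the child goals, and the tickets_for_goal helper are hypothetical and are not Paperclip's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Goal:
    id: str
    title: str
    level: str                # company | team | agent | task
    parentId: Optional[str]   # None at the root of the tree


@dataclass(frozen=True)
class Ticket:
    id: str
    title: str
    goalId: Optional[str]     # the only link from a ticket back to a goal


# Sample rows; ids and levels are made up for illustration.
goals = [
    Goal("g-root", "Unlock YouTube monetization", "company", None),
    Goal("g-subs", "Reach 1,000 subscribers", "team", "g-root"),
    Goal("g-watch", "Reach the watch-time threshold", "team", "g-root"),
]
tickets = [
    Ticket("t-1", "Script: next bet", "g-subs"),
    Ticket("t-2", "Measure matured videos", "g-watch"),
    Ticket("t-3", "Untagged chore", None),
]


def tickets_for_goal(goal_id, goals, tickets):
    """Collect ids of tickets tagged with a goal or any of its descendants.

    The runtime described above performs no such roll-up; this is the
    query a reader would have to write on top of the flat tag.
    """
    wanted = {goal_id}
    changed = True
    while changed:                      # walk down the tree until no new children
        changed = False
        for g in goals:
            if g.parentId in wanted and g.id not in wanted:
                wanted.add(g.id)
                changed = True
    return [t.id for t in tickets if t.goalId in wanted]
```

Calling tickets_for_goal("g-root", goals, tickets) gathers the tagged tickets under the whole tree and skips the untagged one, which is exactly the bookkeeping the runtime leaves to whoever wants it.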

2. Two kinds of memory

Ask "where's the memory" and the surprising answer is: there's no vector store, no embeddings, no "agent memory" module. There are two plain stores doing two different jobs.

Coordination memory is the tickets and their exhaust, all in Postgres: issue comments (the reasoning and handoff ledger), issue documents (durable artifacts like a plan or spec attached to a ticket), the activity log (an append-only audit stream), and per-run NDJSON transcripts (the black-box recorder of every agent turn). This is "who did what, why, and what's attached."
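
For readers who have not met NDJSON, it is simply one JSON object per line. A minimal sketch of appending to and reading back such a transcript follows; the event fields (turn, role, text) are assumptions for illustration, since the text above only says the transcripts are NDJSON.

```python
import io
import json


def append_event(stream, event):
    """Write one event as a single JSON line (the NDJSON convention)."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(stream):
    """Parse an NDJSON stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# An in-memory stream stands in for a per-run transcript file.
transcript = io.StringIO()
append_event(transcript, {"turn": 1, "role": "agent", "text": "new heartbeat, check env and inbox"})
append_event(transcript, {"turn": 2, "role": "tool", "text": "inbox: 1 assigned ticket"})

transcript.seek(0)
events = read_events(transcript)
```

Because every line is independently parseable, a transcript can be appended to turn by turn and still be read if the run dies halfway through.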

Learning memory is a separate file the marketing engine owns — the journal — holding every bet, its measured metrics, a virality score, and the strategy the company has learned (winning patterns, losing patterns, seeds for the next idea). This is "what actually works."

The split is deliberate, and the theory backs it. Cognitive-architecture work (CoALA, arXiv:2309.02427) tiers agent memory into working (the ephemeral context of one decision), episodic (time- and task-indexed experience), semantic (durable knowledge), and procedural (skills/code). Paperclip's stores land on that grid cleanly: working memory is the context assembled for one heartbeat; the comments, activity log, and run transcripts are a genuine episodic store (they're literally indexed by when and which task); documents and the journal's strategy are semantic; and the agents' instruction files are procedural. Multi-agent memory research adds a rule Paperclip follows by accident: keep memory two-layered — a shared global layer plus a local per-agent layer — because fully-shared memory homogenizes agents and erodes role specialization ([LLM-MAS memory survey]). The per-agent instruction bundle is that local layer; the tickets and journal are the shared one.

The divergence here: nothing consolidates. Comments and activity grow without bound; there's no summarize/forget/prune policy, which the literature flags as a drift-and-bloat hazard ([MemGPT, arXiv:2310.08560]; [MemInsight]). The company's only real "consolidation" is the Analytics agent periodically distilling measured bets into updated strategy. For a small channel that's fine. At scale it wouldn't be.

3. How an agent decides what to do

Agents aren't long-running processes polling a queue. Each is dormant until woken, runs one heartbeat — one Claude Code process — then exits. So "decision-making" is per-wake, and the whole company is a set of stimulus→response reflexes over tickets.

Four things wake an agent:

assignment — set a ticket's assigneeAgentId to the agent. This is the backbone: reassign a ticket, the new owner wakes. It works even with idle heartbeats off.
schedule — a cron routine fires and creates a ticket assigned to an agent (the daily 9am kick to the Growth Lead).
idle heartbeat — a per-agent self-wake on an interval. I turned these off; ~240 no-op wakes a day is pure token burn and the chain runs fine without them.
mention — an @handle in a comment. Unreliable: a wrong handle wakes nobody, which is exactly how QA gates silently stalled until I switched handoffs to reassignment.

On wake, the runtime injects context — PAPERCLIP_WAKE_REASON, PAPERCLIP_TASK_ID, the issue to act on, the workspace. The agent's opening move is a fixed reflex I can see in the transcripts: "new heartbeat — check env and inbox." If there's a task, work it. If nothing is assigned and there's no wake context, exit. (This is why firing a bare heartbeat at the Growth Lead did nothing once: with no driver ticket, it correctly no-ops. You steer this company by creating and assigning tickets, not by poking agents.)
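
The wake reflex can be sketched as a small pure function. This is an illustration under assumptions: PAPERCLIP_TASK_ID is the variable named above, but the inbox argument, the return values, and the function name are hypothetical, and PAPERCLIP_WAKE_REASON is left out because its possible values are not described here.

```python
def decide_on_wake(env, inbox):
    """Return the action for one heartbeat as an (action, ticket_id) pair.

    env   : mapping of environment variables injected by the runtime
    inbox : list of ticket ids currently assigned to this agent (hypothetical)
    """
    task_id = env.get("PAPERCLIP_TASK_ID")
    if task_id:
        # The wake carried a driver ticket: work it.
        return ("work", task_id)
    if inbox:
        # No explicit driver, but something is assigned: take the first ticket.
        return ("work", inbox[0])
    # Nothing assigned and no wake context: the correct move is to no-op.
    return ("exit", None)
```

A bare heartbeat corresponds to decide_on_wake({}, []), which exits; that is the no-op described above.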

The tidy way to describe the loop is a POMDP control cycle: perceive → update memory → decide → act, feeding measured feedback back in (arXiv:2601.12560). Each heartbeat is one timestep of that; the "update memory" step — reading the ticket's comments and documents before acting — is retrieval-augmented generation by another name. The control style is firmly reactive/event-driven, not a central deliberative planner. That's a deliberate trade: it's cheap, cron-friendly, and reproducible, at the cost of the instrumentable planning stage the theory would prefer.

4. Coordination: tickets, not chatter

The company coordinates through one artifact (the ticket) and one signal (assignment). A handoff is literally: change the ticket's assignee, which wakes the next owner. Comments are advisory (reasoning, mentions), not a transport. Documents are the artifacts that ride along. One agent owns a ticket at a time, enforced by an atomic checkout lock.
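
A minimal in-memory sketch of those two rules, handoff by reassignment and single ownership by atomic checkout. The TicketBoard class, its method names, and the agent names are hypothetical; the real runtime keeps tickets in Postgres, and how it implements the lock is not shown here.

```python
import threading


class TicketBoard:
    """In-memory stand-in for the ticket table; not Paperclip's API."""

    def __init__(self, wake):
        self._lock = threading.Lock()
        self._assignee = {}      # ticket id -> agent currently assigned
        self._checked_out = {}   # ticket id -> agent holding the checkout
        self._wake = wake        # callback fired whenever a ticket is assigned

    def assign(self, ticket_id, agent):
        """A handoff: change the assignee, release any checkout, wake the new owner."""
        with self._lock:
            self._assignee[ticket_id] = agent
            self._checked_out.pop(ticket_id, None)
        self._wake(agent, ticket_id)

    def checkout(self, ticket_id, agent):
        """Atomically claim a ticket; only the assignee can, and only once."""
        with self._lock:
            if self._assignee.get(ticket_id) != agent:
                return False
            if ticket_id in self._checked_out:
                return False
            self._checked_out[ticket_id] = agent
            return True


woken = []
board = TicketBoard(wake=lambda agent, ticket: woken.append((agent, ticket)))
board.assign("SLO-80", "screenwriter")
first = board.checkout("SLO-80", "screenwriter")    # succeeds
second = board.checkout("SLO-80", "screenwriter")   # already checked out
stranger = board.checkout("SLO-80", "producer")     # not the assignee
board.assign("SLO-80", "qa-critic")                 # handoff wakes QA
```

The check and the claim happen under one lock, so two wakes racing for the same ticket cannot both win. Comments never appear in this sketch, which matches their advisory role.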

In the taxonomy of multi-agent shapes — chain, star/hub-and-spoke, mesh — this is a star (the Growth Lead orchestrates) using ticket coordination rather than a blackboard or message-passing.

Topology isn't cosmetic: it measurably affects both goal attainment and cost, and dense mesh topologies burn 2–12× the tokens of a chain (arXiv:2601.12560; [arXiv:2505.22467]). Tickets are on the cheap end, which is part of how the bill stays at $25.

The satisfying part is that the two classic failure modes of an orchestrator-worker company are named in the literature, and Paperclip's design is exactly their prescribed mitigations (arXiv:2602.10479; silent failures are ~75% of multi-agent errors, [arXiv:2606.08162]):

Silent worker failure — a worker stops, the orchestrator assumes progress. Mitigation: heartbeats plus "never end a turn with an in-flight ticket." Paperclip has both, plus an Observability agent that hunts stuck tickets. The one rule every agent's prompt repeats is: before you exit, set the ticket done or blocked — never leave it hanging.
Goal-semantic drift, the "telephone effect" — the goal's meaning degrades as it passes down layers. Mitigation: periodic re-alignment to the root goal. That's the Analytics agent measuring every bet against the goal's real metrics and rewriting strategy — a re-grounding step baked into the loop.

5. Why the guardrails live in code, not the prompt

This is the lesson I earned the hard way, and the one place the theory is bluntly prescriptive. Stability against drift comes from bounded autonomy: hard limits on tokens, time, tool-calls, and money; fail-safe termination; and, critically, policy-mediated tool execution and observability as first-class requirements (arXiv:2602.10479; [Agent Contracts, arXiv:2601.08815]).

Paperclip runs each agent as a Claude Code session with dangerouslySkipPermissions on. Which means the agents' tools are not policy-mediated. And it shows: the Producer's prompt clearly said "never commit to the main branch" and "never git add -A," and an agent did both anyway — one of them swept a live API token into a commit, caught only by a push-protection hook. A per-video budget wasn't passed once, so a producer generated premium AI video on all nine scenes and a single Short cost $2.81 instead of twelve cents.

The fix in every case was the same shape: stop asking, start enforcing. A pre-commit hook now blocks secrets and direct-to-main commits regardless of what any agent intends. Channel runs now derive their spend cap from the budget automatically, so the expensive default can't apply. None of that lives in a prompt, because a prompt is a suggestion an LLM will cheerfully ignore in the same breath it agrees with it. Creative latitude belongs in the prompt; rights — spending money, publishing, touching git — belong behind code you control.
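
As an illustration of "enforce, don't ask," here is a sketch of the check a pre-commit hook could run. It is not the repo's actual hook: the function name, the single regex, and the protected-branch set are assumptions, and a real hook would lean on a maintained secret scanner rather than one pattern.

```python
import re

# Hypothetical pattern: a key-like name followed by a long opaque value.
SECRET_PATTERNS = [
    re.compile(
        r"""(api[_-]?key|token|secret)\s*[:=]\s*['"]?[A-Za-z0-9_-]{16,}""",
        re.IGNORECASE,
    ),
]
PROTECTED_BRANCHES = {"main"}


def commit_violations(branch, staged_diff):
    """Return reasons to reject a commit; an empty list means allow it."""
    problems = []
    if branch in PROTECTED_BRANCHES:
        problems.append(f"direct commit to protected branch '{branch}'")
    for line in staged_diff.splitlines():
        # Only inspect added lines, not file headers, context, or removals.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(p.search(line) for p in SECRET_PATTERNS):
            problems.append("possible secret in staged changes")
            break
    return problems
```

The point of the shape is that the decision depends only on the branch and the diff. Nothing in it consults what the agent was told or what it intended.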

What the design gets right, and where it's thin

Mapping the whole thing back to the literature, the scorecard is honest:

Right: the org shape (a star with ticket coordination) and its two failure modes are textbook, and the mitigations — heartbeats, "never leave a ticket hanging," and goal-re-alignment through measurement — are exactly what the theory prescribes. The two-memory split (coordination vs learning) and per-role local prompts match best practice. The per-heartbeat POMDP loop is a clean, cheap control model.
Thin: goal→task decomposition is a prompt, not an instrumented pipeline, and goalId is a flat tag — the goal is scaffolding, not control flow. Memory never consolidates. Tools aren't policy-mediated, so safety has to be bolted on from outside.

That last gap is the whole thesis in one line: an agent company is only as safe as the code around it, not the instructions inside it. The interesting engineering isn't the agents — it's the harness that lets you trust them with the publish button and the credit card.

That's the machine. If you want the story instead (the overnight ships, the three times I stepped in, the receipts), that's Part 8: The $25 Company.

⭐ Repo: github.com/dasein108/slope-studio — the company package, the agents, the pipeline.
📚 Foundational build log: the Zero to Autopilot series, Part 1.

Zero to Autopilot, Part 8: The $25 Company — an Org of AI Agents That Runs My Channel

Maksims Gavrilovs — Thu, 02 Jul 2026 18:54:08 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 8. Part 1 built the channel; Parts 2–7 made it run itself (finale: Part 7). This part replaces that single loop with a company of agents that manages the channel.

Data status: real-now — costs, code, architecture, and qualitative outcomes, all measured today. Repo is open source.

I opened the dashboard one morning expecting nothing, and three videos had already shipped overnight. Scripts written, scenes rendered, voiced, captioned, QA'd, published to YouTube. I hadn't touched anything. There was no notification waiting for me either, because at no point did the work need a human. The 9am job had fired, a handful of agents passed the job between themselves, and by the time I looked it was done.

That's the system I want to describe. Not the videos, the org that makes them.

What this is

I run a faceless AI YouTube Shorts channel. Software does the whole thing: it picks a topic, writes a 60-second script, generates the keyframes and motion, synthesizes a voiceover, mixes audio, and uploads. The whole operation runs on about $25 a month, and it's structured like an actual company, with a CEO, a growth lead, a QA critic, a producer, the works.

For a while the brain of it was a single loop, one function on a timer that walked through ideate → produce → measure → learn and picked the next action each tick. It worked. It also had no judgment.

One process did everything, which meant nothing checked anything. The thing that wrote the script was the same thing that decided the script was good enough to spend money rendering. It never noticed a bug or re-thought a budget. It ran its if-statements and stopped there.

So I replaced the loop with a company.

The one goal

Everything below serves a single number. The company has one goal, the kind you'd give a real team: cross YouTube's monetization bar (1,000 subscribers plus the watch-time threshold). The CEO agent owns it; the Growth Lead works toward it. Every video is a bet placed against that goal, and every measurement is scored relative to it. How a goal like that actually turns into today's three video ideas is the interesting part, and it's the subject of Part 9. For now: there's a goal, and the org exists to move it.

The company

The brain is now eight role-specialized LLM agents running on Paperclip, a small local runtime that gives each agent an identity, a task inbox, and the ability to hand work to another agent. They don't chat in a free-for-all. They pass tickets, like a real team.

Here's what each one actually owns:

CEO / Operator. The board. Owns the company goal (for me, unlocking monetization), the budget caps, and the publishing policy. It approves spend and policy changes and makes the final call on direction. The other agents escalate decisions here; it doesn't write or render anything itself. (Sonnet 4.6.)
Growth Lead. The initiator, and the closest thing to a manager. Every cycle it reads the channel's state and decides the single most useful move right now: make something, measure matured videos, or reflect and update strategy. When it's "make," it picks which bet from the backlog and writes the SEO framing (title candidate, hook promise, target keyword), then hands the bet to the Screenwriter. (Sonnet 4.6.)
Screenwriter. Turns a one-line bet into a real script: a hook that lands in the first three seconds, one concrete idea actually explained, and visual prompts the renderer can use. If QA sends it back, it rewrites against the specific notes. (Opus 4.8, because script quality is the product.)
QA / Critic. The independent gate, and the reason I trust the thing. It runs twice: once on the script before any money is spent (is the hook real, is the payoff there, does the title overpromise, are the prompts safe to render), and once on the final video before publish (does it play, is the audio clean, does the metadata match). It can block either gate and send the work back. (Opus 4.8.)
Producer. Turns a passed script into a finished, published video. It runs the whole render pipeline (keyframes, motion clips, stitching, sound, voice, master, metadata), publishes to YouTube, and links the result back to the channel's journal so it can be measured later. It works inside the budget cap. (Sonnet 4.6.)
Analytics & Learning. Closes the loop. It waits for videos to mature (~60 hours), pulls the real YouTube numbers, scores each bet's virality relative to the channel's own history, and rewrites the strategy: what's winning, what's losing, what to try next. (Sonnet 4.6.)
Observability / Ops. The watchdog. Looks for stuck tickets, failed renders, published videos that never got linked, and budget drift, then opens incidents. (Haiku 4.5.)
Secretary. A daily Telegram digest so I can read the state of the company without opening anything. (Haiku 4.5.)

The model tiering is deliberate and it's a cost decision: Opus only where judgment is the product (writing and gating), everything coordinational on cheaper models. And because the agents run on a Claude subscription rather than metered API calls, the reasoning is effectively free. The only thing that costs real money is the AI video generation itself.

How a day runs

There is exactly one timer left in the system: a 9am job that wakes the Growth Lead. Everything after that is event-driven. Finishing one step assigns the next ticket to the next agent, and being assigned a ticket is what wakes that agent. A comment doesn't wake anyone; the assignment does.

Here's what that looks like in practice. These are the actual tickets for a single Short, "Hilbert's Infinite Hotel," from idea to measured:

Ticket	Real task name	Assignee
SLO-80	Script: j0034 — Hilbert's Infinite Hotel: The Paradox That Breaks Infinity	QA Critic
SLO-82	Produce: j0034 — Hilbert's Infinite Hotel (paid stages authorized)	QA Critic
SLO-85	Packaging/SEO gate: j0034 — Hilbert's Infinite Hotel	Growth Lead
SLO-86	Publish approval: j0034 — Hilbert's Infinite Hotel	CEO Operator
SLO-87	Publish: j0034 — Hilbert's Infinite Hotel	Producer
SLO-97	Measure + learn: j0033, j0034, j0036	Analytics & Learning

(The assignee is the agent that owned the ticket when it closed; a "Script" ticket finishes assigned to QA because that's who it was handed to for the gate. The numbers skip around because two sibling Shorts moved through the same morning's cycle in parallel: SLO-81, 83, and 84 belong to "The Arrow of Time," which the cron produced the same day.) Six tickets, six handoffs, none of which needed me.

How it decides what to make

This part isn't an LLM guessing. The scouting is mostly statistics.

Every published video is recorded as a falsifiable bet with a measured outcome. Analytics turns the winners and losers into explicit patterns and idea seeds. A plain Thompson-sampling bandit sits over the learned theme and format features and decides, per slot, whether to exploit a known winner or explore something new. Growth Lead takes the bandit's pick plus the learned patterns and shapes the actual bet. The creative agent only enters at the end, working from evidence rather than vibes.
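
For readers new to Thompson sampling, a minimal Beta-Bernoulli sketch follows. It simplifies the setup described above: each theme is treated as one arm with win/loss counts, whereas the real bandit works over learned theme and format features. The function name and the counts are illustrative, not the channel's data.

```python
import random


def thompson_pick(arms, rng):
    """Pick one arm by Thompson sampling over Beta(wins + 1, losses + 1).

    arms : dict mapping arm name -> (wins, losses)
    rng  : a random.Random instance, so the caller controls the seed
    """
    best_name, best_draw = None, -1.0
    for name in sorted(arms):                 # sorted for a stable draw order
        wins, losses = arms[name]
        draw = rng.betavariate(wins + 1, losses + 1)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name


arms = {
    "tragic-genius paradox": (7, 1),   # illustrative counts only
    "horror": (0, 4),
    "untested format": (0, 0),
}

# Repeat the decision under many seeds to see the explore/exploit mix.
picks = [thompson_pick(arms, random.Random(seed)) for seed in range(200)]
share = {name: picks.count(name) / len(picks) for name in arms}
```

The proven winner takes most slots, the untested arm still gets picked some of the time because its posterior is wide, and the known loser almost never wins a draw. That is the exploit-or-explore decision made per slot.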

What the evidence said for my channel: tragic-genius stories wrapped around a paradox, in math and physics, win. Melancholy, horror, and demographic quiz formats lose. The system figured that out from its own measured history, not from me.

What I control vs what the agents do

The clean line, because it's the whole point:

What I control (the levers):

The goal (what "winning" means: subscribers, watch time, monetization).
Budget caps: per-video and daily spend.
Promotion velocity: how many videos per day, how aggressively to push.
Publishing policy: attended (I approve each) vs unattended.
Content vision and guardrails: themes to chase or avoid.
Strategic development (curation): the occasional analysis brief that reshapes what the company makes.

What the agents do on their own:

Scout the next topic (bandit + learned patterns).
Write and rewrite scripts.
Gate quality at the script and the final cut.
Render, voice, publish, and link each video.
Measure real performance and rewrite the strategy.
Notice stuck work and fix their own coordination.

I set the rules of the game. They play it.

Why I barely touch it

Standing the company up was a one-time cost, and a small one. I declared it in a single config: the company and its goal (cross monetization), the eight agents with their roles, models, and prompts, one cron routine (the 9am wake), and the policy knobs — budget caps, promotion velocity (how many videos a day, how hard to push), and whether publishing is attended or not. Paperclip read that file, created the agents and the goal tree, and the org existed. No per-agent babysitting, no wiring handoffs by hand; the handoffs are just tickets the agents pass among themselves.

Then I was heads-down on other work, and travelling for a few days. Worth being honest about one fragility here: Paperclip runs on my laptop, so when the lid is shut the 9am cron doesn't fire. While I was away the channel missed a few days of publishing — the price of hosting your autonomous company under your own desk instead of on a server. The rest of the week it ran without me.

Across that week I stepped in by hand three times, and every touch was direction, not labor:

I adjusted the content vision — moved to three videos a day, with one slot reserved as a deliberate experimental bet so the bandit always keeps exploring, not just exploiting. I left it as a comment on the policy task and the Growth Lead translated it into how it picks bets.
I rebranded the channel — an SEO-driven rename from Starship Pilot to Paradox Noir, to match what the data said was winning: paradoxes, told dark.
I asked it to study its own failures, and it re-tooled production. The most recent one, filed just this morning: a one-line brief — analyze the unpopular videos and find the common patterns — and I went back to other work. The Analytics agent ranked every loser and came back with seven loss patterns: off-brand genres failed without exception, a mechanism with no named human lost, vague stakes lost, and, against my own instinct, the videos carrying the most visual effects flopped hardest. The CEO agent turned that into policy on its own: a genre kill-list, a Kling-over-LTX rule, zero effects by default, and a mandatory screenwriter checklist — a named human in the first three seconds, one concrete outcome, a rejection-or-vindication arc, a keyword-first title. It wrote the rules into fresh tickets for the Screenwriter and the Producer and closed the loop without me.

Note the shape of that third one. I didn't run the analysis or write the rules; I asked one question and the company rewrote how it writes and produces. That's the management I actually do now: set the goal, now and then point at a weakness, read what comes back. Scouting, scripting, QA, rendering, publishing, measurement — none of it needs me. I check in to steer and to read, not to drive.

Receipts

Small channel, honest numbers. To date the company has logged 53 bets, produced 43 videos, and measured 41, for 5,388 views and 22 new subscribers. Nobody's quitting their day job. But the distribution is the interesting part:

Video	Views	Retention	Subs	Note
"The mathematician who proved a theorem decades too early"	523	70%	7	best — 100th percentile
"The Cat That Is Alive AND Dead: Superposition"	667	23%	3	most views, but reach without conversion

The cat video got the most eyeballs and taught the least: 23% retention, weak subs. The "genius ahead of his time" video got fewer views but held 70% retention and converted seven subscribers. The company now knows the difference, and the bandit weights toward the second kind.

On cost, the math is the whole pitch. A clean Short runs 7 to 25 cents. Across 21 days the channel spent $13.30 generating 39 videos, a run-rate of about $19 a month at three Shorts a day. The agents' reasoning doesn't add to that: it rides a flat Claude subscription instead of metered API calls, so video generation is the only real spend. Call it roughly $25 a month with headroom, for a company that writes, judges, and ships on its own.
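
The run-rate figure can be reproduced from the numbers in the paragraph above; the variable names are just for illustration.

```python
spend_usd = 13.30   # generation spend over the window, from the text
days = 21
videos = 39

per_day = spend_usd / days            # average daily spend
monthly_run_rate = per_day * 30       # projected over a 30-day month
avg_per_video = spend_usd / videos    # average cost per video in the window
```

Here monthly_run_rate comes out at about $19, matching the text, and avg_per_video at about $0.34. That average includes the single $2.81 outlier.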

Exactly one video blew past the per-video cap: $2.81. Worth describing why, because it's the kind of thing autonomy quietly does to your wallet. The render pipeline defaults its spend cap to $3 when no cap is passed, and the produce step didn't pass the channel's much tighter budget. So nothing stopped the Producer from generating premium AI video (the costliest model, around $0.31 a scene) on all nine scenes of the Short. Nine premium clips, about $2.80, for one 114-second video. When I looked at the data, spend and virality were negatively correlated: the expensive model wasn't measurably better, it just emptied the budget faster. The fix was to make every channel run derive its cap from the budget automatically, so the $3 default can never apply again.
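
A sketch of the shape of that fix, under assumptions: the function names and the 25-cent channel budget are hypothetical, while the $3 default and the nine scenes at about $0.31 each come from the paragraph above.

```python
DEFAULT_PIPELINE_CAP_USD = 3.00   # the render pipeline's fallback, per the text


def resolve_spend_cap(channel_budget_usd, explicit_cap_usd=None):
    """Pick the per-video cap so the pipeline default can never apply.

    The tighter of the explicit cap and the channel budget wins; a missing
    budget is an error rather than a silent fall-through to the default.
    """
    if channel_budget_usd is None:
        raise ValueError("channel budget is required; refusing to use the default cap")
    if explicit_cap_usd is None:
        return channel_budget_usd
    return min(explicit_cap_usd, channel_budget_usd)


def within_cap(scene_costs_usd, cap_usd):
    """True if the planned scenes fit under the cap."""
    return sum(scene_costs_usd) <= cap_usd


premium_run = [0.31] * 9                          # nine premium scenes
cap = resolve_spend_cap(channel_budget_usd=0.25)  # hypothetical channel budget
```

Under the $3 default, within_cap(premium_run, DEFAULT_PIPELINE_CAP_USD) is true, which is how the expensive run slipped through. Against the derived cap it is false, so the run would be refused before any money is spent.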

That's the whole system: a goal, a budget, a cadence, and eight agents that pass tickets until something ships. I set the rules and read the dashboard. The channel does the rest.

Part 9 — *Anatomy of a $25 AI Company* goes inside the machine: how the goal becomes tasks, the two kinds of memory the company runs on, how an agent decides what to do when it wakes, and why the guardrails have to live in code and not in a prompt. (Link when published.)

It's all open source: the company package, the agents, the render pipeline, the bandit. Go read it, fork it, or point out what I got wrong.

⭐ Repo: github.com/dasein108/slope-studio
▶ Live effects gallery: dasein108.github.io/slope-studio
📚 This is a continuation of the Zero to Autopilot series — start at Part 1: I built an AI that runs a YouTube channel, which covers the channel, the pipeline, the cost collapse, the memory, and the bandit, everything this piece builds on.

Zero to Autopilot, Part 7: Closing the Loop — the Channel That Runs Itself

Maksims Gavrilovs — Fri, 12 Jun 2026 02:28:39 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 7 of 7 — the finale. Part 1 landscape · 2 pipeline · 3 free motion · 4 cost · 5 memory · 6 bandit. Now I remove myself from the loop.

Data status (Part 7): real-now. Metrics refreshed from YouTube on 2026-06-12: 24 measured videos, 1,742 total views, 48 likes, +10 subscribers, $5.04 measured production spend, 7 wins, 6 losses, and 11 neutral results. Small-channel data is noisy, but it is real enough to grade the loop honestly.

Everything's built. Now delete the operator.

Here's where six parts leave us. I can: turn an idea into a finished Short (Parts 2–3), for six cents (Part 4); remember every bet and score it against my own portfolio (Part 5); and decide what to make next with a bandit (Part 6). Each of those is a command I run. The last step is making me unnecessary: a scheduler that runs the whole cycle while I sleep.

There's exactly one thing that makes this hard, and it's not the AI. It's time.

The crux: measurement is deferred

A freshly published Short's metrics are meaningless. Views/day, retention, engagement — they don't stabilize for 48–72 hours. So you cannot write the loop as a straight-line script (ideate → produce → measure → learn), because between produce and measure there's a two-to-three-day wait, and during that wait the loop should be doing other useful things (producing the next bet, reflecting on older ones).

So the loop isn't a script. It's a state machine over time. Each tick, it asks one question: given the journal and the current time, what is the single most useful thing to do right now?

# studio/marketing/loop.py — the five possible actions
#   measure  →  one or more videos have matured; fetch their stats
#   learn    →  enough new measurements have accrued; reflect into strategy
#   ideate   →  the backlog is running low; generate fresh bets
#   produce  →  cadence allows another video; make the next backlog bet (budget-sized)
#   idle     →  nothing due (waiting on maturation or the produce cadence)

`plan()` — one tick, one decision

The whole engine is a pure function: plan(journal, now) → Plan. It returns the one due action, in priority order. Measuring matured videos comes first (that data unlocks everything else), then reflecting, then refilling the backlog, then producing:

from pydantic import BaseModel      # assumed: BaseModel here is pydantic's

class Plan(BaseModel):
    phase: str                      # cold-start | optimizing
    next: str = "idle"              # measure | learn | ideate | produce | idle
    measure_due: list[str] = []     # entry ids past the maturation window
    learn: bool = False             # enough new measurements to reflect?
    produce_entry: str = ""         # the bet to produce next (chosen by the bandit)
    produce_max_cost: float | None = None   # budget cap for that produce

And the core of the decision — note it's all time-driven off published_at and a few cadence knobs:

def plan(j, now=None):
    cfg = j.loop_config
    phase = "cold-start" if j.in_cold_start else "optimizing"

    # 1) measure: deployed videos past the maturation window, not yet measured
    measure_due = [e.id for e in j.deployed()
                   if _age_hours(e.published_at, now) >= cfg.maturation_hours]

    # 2) learn: count measurements newer than the last reflection
    new_measured = [e for e in j.measured()
                    if not j.last_learn_at or e.metrics.fetched_at > j.last_learn_at]
    learn = len(new_measured) >= cfg.learn_every

    # 3) produce/ideate cadence → pick the next bet with the bandit (Part 6)
    # ...priority: measure > learn > ideate (backlog low) > produce (cadence ok) > idle

The knobs are all in one config — maturation window, produce cadence, how often to reflect, when to refill the backlog:

class LoopConfig(BaseModel):
    maturation_hours: float = 60.0          # wait ~2.5 days before measuring
    min_hours_between_produces: float = 20.0  # ≈ 1 video/day
    daily_produce_cap: int = 2
    learn_every: int = 3                    # reflect after 3 new measurements
    backlog_min: int = 2                    # ideate when planned bets drop below this
    select: str = "bandit"                  # next-bet picker (Part 6)
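
The priority order and these knobs can be put together in a self-contained sketch. This is not the repo's plan(): the State summary, the Knobs class, and next_action are hypothetical stand-ins for the journal and LoopConfig, and the daily produce cap and the bandit pick are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Knobs:                        # mirrors a subset of the LoopConfig defaults
    maturation_hours: float = 60.0
    min_hours_between_produces: float = 20.0
    learn_every: int = 3
    backlog_min: int = 2


@dataclass
class State:                        # hypothetical summary of the journal
    deployed_published_at: List[datetime] = field(default_factory=list)
    new_measurements: int = 0       # measured since the last reflection
    backlog_size: int = 0           # planned bets waiting to be produced
    last_produce_at: Optional[datetime] = None


def next_action(state, now, knobs=None):
    """Return the single due action: measure > learn > ideate > produce > idle."""
    knobs = knobs or Knobs()
    maturation = timedelta(hours=knobs.maturation_hours)
    if any(now - t >= maturation for t in state.deployed_published_at):
        return "measure"            # matured data unlocks everything else
    if state.new_measurements >= knobs.learn_every:
        return "learn"
    if state.backlog_size < knobs.backlog_min:
        return "ideate"
    cadence = timedelta(hours=knobs.min_hours_between_produces)
    if state.last_produce_at is None or now - state.last_produce_at >= cadence:
        return "produce"
    return "idle"                   # waiting on maturation or the produce cadence
```

Because the function depends only on the state and the clock, a video published 10 hours ago never triggers "measure"; the tick falls through to whatever else is due, and the same video triggers it once 60 hours have passed.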

The driver: `tick` and `autopilot`

plan() decides; two CLI commands act. studio marketing tick runs exactly one due action and exits — perfect for a cron job. studio marketing autopilot loops ticks for a session. Put tick on a schedule (cron, a systemd timer, /loop) and the channel runs itself:

# one cron line ≈ a self-running channel
0 */6 * * *  cd /path/to/slope-studio && studio marketing tick --channel pilot

Every 6 hours it wakes, asks plan() what's due, does that one thing — measure a matured video, reflect, ideate, or produce the next bandit-picked bet — and goes back to sleep. The deferred-measurement problem disappears because the state machine simply doesn't measure until published_at + maturation_hours, and spends the wait producing and reflecting instead.

A reproducibility detail that matters here: the bandit's RNG is seeded from journal state, so tick called twice in the same state makes the same decision. No double-producing, no races.
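
One way to get that property, sketched under assumptions: hash a canonical serialization of the journal state and use it as the seed. The function names, the state shape, and the entry ids are hypothetical, and a uniform choice stands in for the bandit.

```python
import hashlib
import json
import random


def rng_for_state(journal_state):
    """Build a random.Random seeded from a stable hash of the journal state."""
    canonical = json.dumps(journal_state, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def pick_bet(journal_state):
    """Choose a backlog entry; the same state always yields the same choice."""
    rng = rng_for_state(journal_state)
    return rng.choice(sorted(journal_state["backlog"]))


state = {"backlog": ["j0041", "j0042", "j0043"], "measured": 24}
first = pick_bet(state)
second = pick_bet(state)    # same state, so the same pick
```

Two ticks that see the same journal make the same pick, so a retried or overlapping cron run cannot choose a different bet. Once a produce is recorded the state changes, and so does the seed.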

Even the channel setup is automated

One loose end: a new channel needs a brand. So that's a lego-block too: studio brand <spec.json> generates a full kit (banner, profile avatar, a transparent watermark logo, plus keywords and an SEO description) into runs/_brand/<slug>/. Text-free generated art, with the wordmark composited into the safe area using Pillow. Zero-to-channel, including the identity, is scriptable.

Internal agents vs skill-based orchestration

There are two ways to run this kind of loop.

The first is internal agent orchestration: the system owns the whole state machine, calls its own tools, and treats every step as part of one product. That is what studio marketing tick does. It knows the journal schema, the maturation window, the budget config, and the next due action. It is tight, reproducible, and cron-friendly.

The second is skill-based orchestration: the same work is decomposed into portable operating instructions that any capable external LLM can follow — Claude, Codex, Gemini, or whatever agent shell you prefer. In that mode, the skill is the durable interface: measure this channel, learn from the journal, pick a bet, deploy it, report the result. The external model brings reasoning, writing, critique, and research; the CLI remains the deterministic I/O layer. That is less sealed than a pure internal agent, but more flexible: you can swap models, run the same marketing workflow from different agent environments, and keep the operational knowledge outside any one vendor's hidden prompt.

In practice I want both. The internal autopilot handles boring scheduled execution. The skills let a stronger external agent step in for strategy, critique, and one-off investigation without rewriting the studio.

The first thing autonomy taught me: cheap + automated = automated garbage

The day the loop ran end-to-end with no one watching, it published a video that was technically fine and still felt wrong. The frames moved. The narration lined up. The audio ducked under the voice. It had all the machinery from the first six parts.

But the story was weak.

Some early Shorts were raw in a way that only became obvious after watching a batch together: not enough concrete explanation, inconsistent emotional arc, pretty frames carrying a script that didn't quite earn the viewer's minute. I had spent six articles making production cheap, and the first lesson of autonomy was blunt: a cheap content machine can manufacture weak stories faster.

Effects are polish; content is the product. A bandit picks a good topic, but "topic" is not a script. The writing still has to deliver a real fact and a real feeling, and nothing in the pipeline was checking for that.

So I added a new stage between script and spend: a content critic (stages/critic.py). It's an LLM-as-judge that reads the scenario and scores it on four things before a cent goes to image or video generation:

# studio/models.py — the bar a scenario has to clear
CRITIC_CRITERIA = {
    "topic_revealed": "viewer comes away KNOWING the thing",
    "fact_explained": "a concrete fact/idea/event is STATED and EXPLAINED",
    "informative_interesting": "teaches something non-obvious with a curiosity gap",
    "emotional_payoff": "lands a clear emotion",
}

Each criterion returns pass/fail, a 1–5 score, one specific note, and revision_notes the writer can act on. The important part is not the prompt. It's where the prompt sits in control flow:

for attempt in range(retries + 1):
    verdict = critic(script)
    if verdict.passed:
        return script
    script = write_again(revision_notes=verdict.revision_notes)

The real code keeps the best-scoring attempt, caps retries with --critic-retries, and can either proceed-best (--critic on) or abort (--critic strict). No framework, no infinite loop, just a bounded script -> critic -> rewrite gate inside studio run. The headless cron inherits it by default, which is the entire point: the gate has to live where there is no human in the seat.
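A fuller sketch of that gate, with the best-attempt bookkeeping and the two modes spelled out. This is not the code in stages/critic.py: the Verdict shape, the gate signature, the CriticRejected exception, and the toy critic and writer are all stand-ins for illustration.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    passed: bool
    score: float
    revision_notes: str = ""

class CriticRejected(Exception):
    """Raised in strict mode when no attempt clears the bar."""

def gate(script, critic, rewrite, retries=2, strict=False):
    """Bounded script -> critic -> rewrite loop that remembers the best attempt."""
    best_script, best = script, None
    for attempt in range(retries + 1):
        verdict = critic(script)
        if best is None or verdict.score > best.score:
            best_script, best = script, verdict
        if verdict.passed:
            return script
        if attempt < retries:          # don't rewrite after the final judgment
            script = rewrite(script, verdict.revision_notes)
    if strict:
        raise CriticRejected(best.revision_notes)
    return best_script                 # proceed with the best-scoring attempt

# Toy stand-ins: the critic passes any script that states the equation.
def toy_critic(s):
    ok = "x^n + y^n = z^n" in s
    return Verdict(ok, 4.5 if ok else 2.0, "" if ok else "state the equation")

def toy_rewrite(s, notes):
    return s + " Fermat claimed x^n + y^n = z^n has no positive whole-number solutions for n > 2."

print(gate("A note in a margin.", toy_critic, toy_rewrite))
```

The loop can run at most retries + 1 critic calls, so the worst-case spend on judging is known before the run starts.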

The rewrite that made the failure legible

The cleanest example was Fermat.

I had an older Short about Fermat's Last Theorem: the note in the margin that took 358 years to solve. It had the right ingredients — Fermat's taunt, Andrew Wiles, a famous unsolved problem — but the story was soft. It gestured at the myth more than it explained the hook.

The critic made the problem concrete:

{
  "fact_explained": "2/5 — Wiles' proof is mentioned, but the fix and concepts are not explained",
  "emotional_payoff": "2/5 — highlight Wiles' despair after the fatal flaw, then the triumph"
}

That is a useful failure. "Make it better" is vague. "State the equation, explain the 358-year gap, show the fatal hole, then land Wiles alone finding the fix" is executable.

So I re-made the Short with the same basic media path but a stronger scenario: Fermat's Last Theorem. The new narration opens with the actual equation shape, names the margin note, gives Wiles the seven-year attic beat, and spends the payoff on the near-collapse of the proof:

"In 1994 he unveiled the proof. Then a referee found a fatal hole in it. For a year it looked dead, until, alone, Wiles suddenly saw how to fix it."

The rewrite is still an early read, so I do not mix it into the mature cohort dashboard below. But the direction was not subtle:

Version	URL	Views	Likes	Cost	Result
softer story	`F3STKw8Nlr8`	12	0	$0.776	P29, neutral
critic-guided rewrite	`rozAXRztijQ`	119	3	$0.208	P92, win

Roughly 10x the views, some actual likes, and less money spent because the rewrite reused the cheap path instead of treating the whole thing as a fresh premium render. That's the kind of result I want from an eval: not an abstract "quality score," but a concrete edit that changes the video and the market response.

There is a second lesson hiding inside this one. An eval can expose weak output, but it cannot author the fix by itself. On another video, the critic made the writer-model choice obvious: The Universe Has No Edge failed with a cheap writer, then passed after switching to a stronger writer model. The cost floor and the quality floor live in different places. Keep visuals and motion cheap, but do not cheap out on the script when the whole video depends on it.

The results

I refreshed the YouTube measurements on 2026-06-12. The journal had 24 measured videos, 4 planned bets, 1,742 total views, 48 likes, 10 new subscribers, 0 comments, and $5.04 of measured production spend. The average measured cost was about $0.21 per video, with 7 wins, 11 neutral results, and 6 losses by the channel's own portfolio-relative scoring.

The top of the portfolio is not one format. That's the useful fact. The loop found wins in philosophy, physics, math, and even poetry, but the winners all had a sharper emotional or conceptual promise than the flops.

Rank	Video	Views	Likes	Retention	Cost	Percentile
1	Diogenes and the rich man's spotless palace	268	9	74.29%	$0.207	P100
2	Black hole information paradox	170	4	n/a	$0.234	P96
3	Fermat's Last Theorem	119	3	77.68%	$0.208	P92
4	Rubaiyat — Awake!	132	15	61.40%	$0.084	P88
5	The Universe Has No Edge	182	1	50.07%	$0.581	P83

The best result was not the most expensive one. Diogenes cost $0.207 and landed P100. Fermat's rewrite cost $0.208 and landed P92. Rubaiyat Awake cost $0.084 and landed P88. The signal is not "spend more." The signal is that the idea, story shape, and hook have to earn their minute before the pipeline spends anything.

There is also a weird failure mode I do not want to over-explain: some videos appear to get no initial push at all. A few more are near-zero after several days: Rabies has 2 views, Population of Italy has 4, and the first Galois version has 5. I cannot tell from this data whether that is a metadata problem, a topic problem, a batch-upload penalty, a Shorts distribution quirk, or simply YouTube deciding not to kick-start those uploads. The honest takeaway is that a small-channel autopilot is not only learning audience taste; it is also learning around platform distribution randomness.

That changes how I read losses. A video with 150 views and weak engagement is a content lesson. A video with 0 views is partly a distribution lesson. The loop can still score both, but the strategy should treat them differently: content critique for videos that got a chance, packaging and cadence experiments for videos that never entered the room.

The whole arc, in one breath

A faceless AI channel is a search problem. Make the unit cost trivial (free motion + right-sized models → six cents), record every video as a falsifiable bet with measured cost and portfolio-relative score, let a bandit exploit what wins while exploring the rest, and wrap it in a time-aware state machine that runs the cycle unattended. None of it needed a vector DB, a fine-tune, or a render farm — just boring architecture, honest cost accounting, and a willingness to let the data, not the ego, pick the next video.

What I'd tell another AI engineer

Takeaway: When an agent's feedback is delayed, don't model the workflow as a pipeline — model it as a state machine over time whose tick asks "what's the single most useful thing to do now?" Deferred reward (here, 48–72h of metric maturation) is the norm in real systems, not the exception; a plan(state, now) → one action function handles it cleanly, stays cron-friendly, and (seeded from state) stays reproducible. Automate the boring 90%, be loud about the 10% you can't, and let the loop compound.

That's the series. Zero to autopilot: a channel that writes, renders, publishes, scores, and decides — for cents, on a schedule. It's all open source; go break it, fork it, or beat it.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe to watch the experiment continue: the channel

Zero to Autopilot, Part 6: A Thompson-Sampling Bandit That Picks the Next Video

Maksims Gavrilovs — Thu, 11 Jun 2026 14:31:53 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 6 of 7. Part 5 gave the channel a memory. This part gives it a decision — the explore/exploit engine that picks what to make next.

Data status (Part 6): real-now (mechanism). The bandit, its math, and the real bets it's choosing among are shown below. Which arms won (the quantitative payoff) lands in Part 7, once the data matures.

The dilemma, made concrete

Part 5 ends with the channel remembering that the "heretic mathematician" format won big. So… just make that forever? No — that's how a channel flatlines. But chasing novelty every time throws away everything you learned. This is the explore/exploit dilemma, and for a small channel it bites hard: you have maybe one video a day of budget, so every pick is expensive. Over-exploit and you plateau; over-explore and you never compound.

The honest first version of this in my code was a fixed 60/40 split — 60% of the time make something like a known winner, 40% try something new. It works, but it's dumb in two specific ways:

It over-explores weak arms — a 40% explore rate keeps spending on themes that have already proven mediocre.
It's context-blind — it treats "make a winner" as one bucket, ignoring which features of past videos actually drove the wins.

A contextual bandit fixes both. But first, a phase gate.

Phase 1: cold-start (you have no baseline yet)

You can't run a bandit with zero data — and worse, on a brand-new channel even your "good" videos get tiny numbers, so absolute scores lie. So the channel runs a cold-start phase first: the first 10 deployed videos are pure exploration, deliberately spread across themes, with no winner/loser judgment at all.

@property
def in_cold_start(self) -> bool:
    return self.deployed_count < self.bootstrap_target   # default 10

Relative scoring (the portfolio percentile from Part 5) only unlocks once there are enough videos to be a portfolio. Until then: explore, gather, don't pretend you know anything. After that, the bandit takes over.

There's a hidden footgun here: the seed set teaches the bandit what the universe looks like. If the first ten videos are all the same shape, or all weak scripts, the posterior doesn't learn "audience taste" — it learns your bad sampling strategy. Cold-start needs varied but hooky seed videos: different themes, different emotional promises, different formats, each still a real falsifiable bet. You are not feeding it random content. You are giving it enough distinct arms that "exploit the winner" will mean something later.

Phase 2: a warm-started contextual Thompson bandit

Once there's a baseline, picking the next bet becomes a Thompson-sampling problem. Three design decisions make it fit this domain:

1. Context = what's knowable before production. A bet's features are its theme and tags. Not its effects or animators — those only exist after rendering, so they're a learning/attribution concern, not a selection signal.

def _features(e: Entry) -> list[tuple[str, str]]:
    """Selection context known at planning time: theme + tags."""
    feats = []
    if e.theme: feats.append(("theme", e.theme.strip().lower()))
    feats += [("tag", t.strip().lower()) for t in e.tags if t.strip()]
    return feats

2. Per-feature Beta-Bernoulli posteriors, warm-started from the channel's base rate. Each feature (theme:infinity, tag:heretic-format, …) gets its own Beta(α, β) win-probability posterior. The key trick: instead of an optimistic flat Beta(1,1) prior — which makes every brand-new arm look amazing and causes over-exploration — I warm-start the prior from the channel's actual base win rate, with a weak pseudo-count so real data dominates fast:

def posteriors(measured, prior_strength=2.0):
    base = _base_rate(_evidence(measured))                  # channel's actual win rate
    pa, pb = max(base*prior_strength, 0.5), max((1-base)*prior_strength, 0.5)
    stats = defaultdict(lambda: [pa, pb])                   # every feature starts here
    for e, win in _evidence(measured):
        for f in _features(e):
            stats[f][0 if win else 1] += 1.0               # +win → α, +loss → β
    return stats

A win on a feature pushes its α up; a loss pushes β up. Wins and losses are the relative outcomes from Part 5 — only measured, non-cold-start bets with a real percentile count as evidence.
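A small worked example of the warm start, using the same prior formula as posteriors() above. The 0.25 base rate and the 3-wins-1-loss history are made-up numbers for illustration, and warm_prior and mean are hypothetical helper names.

```python
def warm_prior(base, prior_strength=2.0):
    # Pseudo-counts scaled by the channel's base win rate,
    # floored at 0.5 so neither Beta parameter collapses to zero.
    return max(base * prior_strength, 0.5), max((1 - base) * prior_strength, 0.5)

def mean(alpha, beta):
    """Expected win probability of a Beta(alpha, beta) posterior."""
    return alpha / (alpha + beta)

base = 0.25                      # assumed channel win rate
pa, pb = warm_prior(base)
print((pa, pb))                  # (0.5, 1.5)
print(mean(pa, pb))              # 0.25: a brand-new arm looks average
print(mean(1.0, 1.0))            # 0.5: the flat Beta(1,1) prior starts optimistic

# After 3 wins and 1 loss on a feature, the data outweighs either prior:
print(round(mean(pa + 3, pb + 1), 3))    # 0.583
print(round(mean(1.0 + 3, 1.0 + 1), 3))  # 0.667
```

With only two pseudo-counts of prior, four real outcomes are already enough to move the estimate most of the way toward the observed rate.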

3. Score a candidate by Thompson-sampling its features and averaging. For each planned bet, draw a sample from each of its features' posteriors and average them. Arms with little history have wide posteriors, so they sometimes draw high — that's exploration emerging naturally from the uncertainty, no explicit explore-rate knob needed:

def score(e, stats, prior, rng):
    feats = _features(e)
    if not feats: return rng.betavariate(*prior)
    samples = [rng.betavariate(*stats.get(f, prior)) for f in feats]
    return sum(samples) / len(samples)

def pick(planned, measured, ...):
    # highest Thompson draw wins; well-proven arms usually win,
    # but uncertain arms self-explore via their wide posteriors
    return rank(planned, measured, ...)[0]

A proven feature (tag:heretic-format with lots of wins) has a tight, high posterior and usually wins the draw — exploit. A fresh theme has a wide posterior and occasionally spikes — explore. The split is adaptive and per-feature, not a global 60/40.
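To see that adaptive split emerge, here is a small simulation under assumed numbers: one proven feature with a tight posterior against one never-seen feature that falls back to the prior. The posterior parameters and feature names are invented for the demo; score() has the same shape as the function above but takes a feature list directly.

```python
import random

def score(feats, stats, prior, rng):
    """Average one Thompson draw per feature."""
    if not feats:
        return rng.betavariate(*prior)
    draws = [rng.betavariate(*stats.get(f, prior)) for f in feats]
    return sum(draws) / len(draws)

prior = (0.5, 1.5)                                  # assumed warm-start prior
stats = {("tag", "heretic-format"): (12.5, 3.5)}    # assumed: many wins, few losses
proven = [("tag", "heretic-format")]
fresh = [("theme", "persian poetry")]               # unseen, so it uses the prior

rng = random.Random(0)
trials = 10_000
fresh_wins = sum(
    score(fresh, stats, prior, rng) > score(proven, stats, prior, rng)
    for _ in range(trials)
)
print(f"fresh arm out-draws the proven arm in {fresh_wins / trials:.1%} of trials")
```

The fresh arm wins only a small share of draws, and that share shrinks or grows on its own as evidence accumulates, which is the behavior a fixed 60/40 split cannot give you.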

One practical detail: pick() is stochastic (that's the whole point), but the caller passes a state-seeded RNG, so the same journal state yields the same pick. That matters because the autonomous driver in Part 7 calls this from two places per cycle and they must agree.

Where do new candidates come from? `ideate`

The bandit chooses among planned bets — but something has to generate them, or it'd just reshuffle the same backlog. That's ideate: an LLM proposes new bets from three inputs — the learned Strategy, the most relevant past episodes (via recall from Part 5), and live trend signals gathered by web search:

# ideate.generate(): build the prompt from learned state + recalled winners + trends
query = " ".join([*j.strategy.next_seeds, *j.strategy.winning_patterns, ...])
episodes = memory.recall_block(j, query, k=6)   # the relevant past, not the recent past
# → LLM returns new bets: {idea, hook, assumption, goal, theme, tags}

So exploration isn't random either — it's informed exploration: new bets that rhyme with what's working and with what's currently trending, each still a falsifiable hypothesis. (No LLM key? It falls back to deterministic seeds from the strategy.)

The loop, end to end

Put together, the decision engine is a closed cycle:

ideate ──► backlog of planned bets (each: idea + hook + assumption + tags)
   ▲                │
   │                ▼
 learn        bandit.pick()  ── exploit proven theme+tags, explore uncertain ones
   ▲                │
   │                ▼
 measure ◄──── produce + publish  (the cheap pipeline from Parts 2–4)

One guard rail sits inside that produce step. The bandit picks what to make, but a topic isn't a script — so before any money is spent, the chosen bet's scenario passes through a content critic that can send it back for a rewrite if the writing is hollow. The bandit chooses the bet; the critic guards the execution. That gate is its own Part 7 story (it exists because the autopilot, unsupervised, shipped an uninformative video); here it's enough to know the loop won't spend on a good pick with a bad script.

And it's running on real bets right now. The journal's winning pattern — the heretic-mathematician format, tag:heretic-format — means the bandit favors arms carrying that feature, which is why the backlog filled with Cantor (infinity → asylum), Galois (algebra → fatal duel), Russell (one sentence breaks math), Gödel (math can't prove itself). Each is the same proven feature (heretic + tragedy + paradox) on a new theme (set theory, group theory, logic) — textbook exploit-the-feature-while-exploring-the-instance. The bandit didn't invent the format; the memory learned it and the bandit is pressing it, while leaving room for the occasional wildcard to keep finding new winners.

And those wildcards are real, not hypothetical. Alongside the math-mystery core, the loop has spent explore-picks on genuinely different lanes: deadpan academic humor ("how mathematicians catch a lion"), science-horror (a 100%-fatal-virus explainer), and a run of atmospheric Persian poetry. Each carries a theme+tags combination the posteriors had never seen, so their wide priors occasionally win the Thompson draw and buy a probe into fresh territory. Which of those probes hardened into new winning arms is the quantitative reveal I'm saving for Part 7 — the point here is that the exploration is informed and deliberate, emerging from each arm's uncertainty, not a blind 40% dice roll.

What I'd tell another AI engineer

Takeaway: A fixed explore/exploit split is a code smell — it's a constant where you want a posterior. Make exploration emerge from uncertainty: per-feature Beta-Bernoulli posteriors, Thompson-sampled, and the wide posteriors of under-tried arms self-explore for free. Two domain details earned their keep: warm-start the prior from your own base rate (a flat optimistic prior over-explores), and only use features knowable at decision time as context (everything else is post-hoc attribution). Seed the RNG from state so an autonomous caller is reproducible. The result is a picker with one honest knob (prior_strength) instead of a magic split.

Next — Part 7: Autopilot. Every piece now exists — cheap production, memory, scoring, a bandit, ideation. The finale wires them into a scheduler that runs the whole loop unattended (handling the 48–72h measurement wait), and — finally — reveals the real numbers: what the channel did, what the autonomous loop decided, and what actually worked.


Zero to Autopilot, Part 5: Teaching a YouTube Channel to Remember

Maksims Gavrilovs — Thu, 11 Jun 2026 14:22:32 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 5 of 7. Part 1 landscape · Part 2 pipeline · Part 3 free motion · Part 4 cost collapse, which together turn an idea into a published Short for six cents. Now the back half: giving the channel a brain. This part is memory; Part 6 is deciding.

Data status (Part 5): real-now (qualitative). The memory architecture and the patterns it has already learned are real and shown below. The quantitative virality scores are defined here but reported with real numbers in Part 7, after the data matures (≥1 week).

Cheap content is a search problem

Here's where Part 4 leaves us: I can make a hundred videos for six bucks. That sounds great until you realize it just moves the hard problem. Making videos was never the bottleneck — knowing which videos to make is. A hundred random Shorts is a hundred coin flips. To make it a search, the channel needs to remember what it tried and what happened.

So I gave it a memory — and I modeled it on how human memory actually splits:

Semantic memory — the durable, generalized lessons ("tragic-genius math stories work").
Episodic memory — the specific events ("on June 3 I posted the Lobachevsky one and it hit 50×").
Retrieval — pulling the relevant episodes back up when facing a new decision.

In the code that's three pieces: a long-term Strategy, an episodic Entry[] ledger, and a recall() function. All of it lives in a per-channel journal (runs/_marketing/<channel>/journal.json + a human-readable .md).

Every video is a falsifiable bet

The unit of episodic memory is the Entry, and its most important design choice is that a video isn't just content — it's a hypothesis. Before anything renders, an entry states what it believes and how it'll be judged:

class Entry(BaseModel):
    idea: str
    hook: str = ""
    assumption: str = ""   # WHY we think this goes viral  ← the falsifiable claim
    goal: str = ""         # the target, e.g. ">=P75 virality vs the channel's portfolio"
    theme: str = ""
    tags: list[str] = []
    explore: bool = True   # an exploration bet, or exploiting a known winner?

These aren't hypothetical — here are three real entries from my channel's journal, each a stated bet:

{ "idea": "The Madman Who Counted Infinity: Cantor",
  "hook": "He proved some infinities are BIGGER than others — and it drove him to the asylum.",
  "assumption": "Counterintuitive 'sizes of infinity' + tragic-genius arc = the exact Lobachevsky formula that hit 50x.",
  "theme": "infinity / set theory", "tags": ["math-mystery","heretic-format","explore"] }

{ "idea": "The Equation Written the Night Before a Duel: Galois",
  "hook": "A 20-year-old invented modern algebra in one night — then died in a duel at dawn.",
  "assumption": "Ticking-clock tragedy + 'one night of genius' is an irresistible curiosity gap." }

{ "idea": "One Sentence That Destroyed All of Mathematics: Russell",
  "assumption": "'One sentence breaks everything' is a pure curiosity gap; paradoxes are trending." }

Writing the assumption down before publishing is the whole trick. When the numbers land, I'm not asking "did it do well?" but "was my stated assumption right?" That's the difference between a content diary and a science.

And it pays off most when a bet is wrong. I ran two videos in the same "deadpan academic humor" lane: one on the absurd, straight-faced "how mathematicians catch a lion," and one on relatable "which scientist are you?" lab-personality bait. The first landed; the second didn't. Because both assumptions were on the record, the lesson came out precise instead of vague: it isn't that "humor works," it's that the absurd, specific method is the hook and broad relatability is not. Two falsifiable bets turned a hunch into a rule the next idea inherits — which is exactly what the reflection step (below) writes down.

The entry also remembers how it was made

Each entry doesn't just record the bet and the outcome — it captures its own production telemetry, pulled from the run manifest (the measured-cost ledger from Part 2 finally pays a second dividend):

    cost_usd: float = 0.0          # measured $ to produce
    tier: str = ""                 # free | cheap | balanced | premium
    video_model: str = ""          # kling | ltx | … | kenburns
    animators: list[str] = []      # distinct animators across scenes
    effects: list[str] = []        # fx + atmosphere used
    n_scenes: int = 0

So later I can ask not just "do heretic-mathematician stories win?" but "do the Flux-Schnell, kinetic-heavy, 60-second ones win?" The memory spans content and craft.

That join turns out to matter more than I expected. Once cost, model, effects, music provider, SFX provider, and market outcome sit on the same row, the channel can ask craft questions too: did the $0.20 music bed actually earn its keep, or did a free synth drone do the job? Did the video win because of the topic, the sound, the animation style, or because the script finally had a real story? The first version of this was just "latest metrics." I later added age-bucket snapshots (1d, 3d, 7d, 14d, 30d) because comparing a one-day upload to a thirty-day upload is lying with extra steps. The real slice-and-compare receipts stay in Part 7; the important design point here is that the memory row is no longer just an idea log. It's the place where production choices meet market feedback.

Scoring virality — against yourself

When results come in, each entry gets a virality score. The composite is deliberately simple and weighted toward what "viral" actually feels like — velocity — while guarding against cheap reach that doesn't convert:

W_VELOCITY, W_RETENTION, W_ENGAGEMENT, W_SUBS = 0.5, 0.2, 0.2, 0.1

def virality(m):
    return (
        W_VELOCITY  * math.log10(m.velocity + 1)          # views/day, log-damped
        + W_RETENTION  * (m.retention or 0)/100
        + W_ENGAGEMENT * min(m.engagement * 20, 1.0)      # ~5% engagement saturates
        + W_SUBS       * min(subs_conv * 50, 1.0)         # ~2% sub-rate saturates
    )
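A worked example of the composite, with invented metrics. The excerpt above uses subs_conv as a free variable; this sketch assumes it is the subscribers-per-view rate and reads it from the metrics object, which may not match the repo.

```python
import math
from types import SimpleNamespace

W_VELOCITY, W_RETENTION, W_ENGAGEMENT, W_SUBS = 0.5, 0.2, 0.2, 0.1

def virality(m):
    return (
        W_VELOCITY * math.log10(m.velocity + 1)          # views/day, log-damped
        + W_RETENTION * (m.retention or 0) / 100
        + W_ENGAGEMENT * min(m.engagement * 20, 1.0)     # saturates at 5%
        + W_SUBS * min(m.subs_conv * 50, 1.0)            # saturates at 2%
    )

# Hypothetical video: 99 views/day, 70% retention, 3% engagement, 1% sub-rate.
m = SimpleNamespace(velocity=99, retention=70.0, engagement=0.03, subs_conv=0.01)
# 0.5*log10(100) + 0.2*0.70 + 0.2*0.6 + 0.1*0.5 = 1.0 + 0.14 + 0.12 + 0.05
print(round(virality(m), 3))  # 1.31
```

Note the asymmetry in the weights: the three quality terms together cap at 0.5, while every 10× in views/day adds another 0.5, so velocity dominates once a video gets real distribution.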

But an absolute score is meaningless for a small channel — 800 views might be a smash or a dud depending on your baseline. So the score that decides anything is relative to the channel's own portfolio:

def relativize(scores):   # percentile rank within THIS channel's history
    n = len(scores)
    return [round(100.0 * sum(s <= x for s in scores)/n, 1) for x in scores]

def outcome(percentile, cold_start):
    if cold_start:            return "cold-start"
    if percentile >= 75:      return "win"
    if percentile <= 25:      return "loss"
    return "neutral"

A video is a win if it lands in the top quartile of my own videos, a loss in the bottom quartile. Self-relative grading means the loop keeps working whether the channel does 50 views or 50,000 — it's always chasing better than my median, which is exactly what compounding growth needs. (The real percentile numbers go public in Part 7.)
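A quick run of those two functions on made-up scores shows both the quartile cut-offs and the scale invariance. The four scores are hypothetical, and cold_start is given a default here only to keep the demo short.

```python
def relativize(scores):
    """Percentile rank of each score within the same list."""
    n = len(scores)
    return [round(100.0 * sum(s <= x for s in scores) / n, 1) for x in scores]

def outcome(percentile, cold_start=False):
    if cold_start:
        return "cold-start"
    if percentile >= 75:
        return "win"
    if percentile <= 25:
        return "loss"
    return "neutral"

scores = [0.42, 1.31, 0.95, 0.10]            # hypothetical virality scores
pcts = relativize(scores)
print(pcts)                                   # [50.0, 100.0, 75.0, 25.0]
print([outcome(p) for p in pcts])             # ['neutral', 'win', 'win', 'loss']

# A channel 1000x bigger gets exactly the same grades.
print(relativize([s * 1000 for s in scores]) == pcts)   # True
```

Only the ordering of scores matters, so the grading survives any change in the channel's overall size.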

Virality is the post-publish eval — a verdict from the market, after the fact. It turns out to have a mirror image: a pre-spend eval that judges a scenario before a cent is spent — a content critic that asks "does this script actually reveal a fact and land a feeling?" and reworks it if not. Two judges, two timings: one on the idea after the audience sees it, one on the script before the camera rolls. The pre-spend critic earns its own story in Part 7 — it exists because the autopilot, left alone, cheerfully published something hollow.

Recall: pulling up the relevant past

When the channel is about to decide what to make next (Part 6), it shouldn't reason from its entire history — it should pull the episodes relevant to the current direction. That's recall(), and I kept it deliberately dependency-free: relevance is lexical token-overlap, ties broken by virality, so a relevant winner outranks a relevant flop:

def recall(j, query, k=6):
    """Top-k measured episodes most relevant to `query`, best first.
    Ties broken by virality, so a relevant winner outranks a relevant flop."""
    q = _tokens(query)
    scored = [(_relevance(q, _episode_tokens(e)), e.virality or 0.0, e)
              for e in j.measured()]                 # only measured bets have a lesson
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)   # by relevance, then virality
    return [e for rel, _, e in scored[:k] if rel > 0.0]

The seam is intentional — you could swap in embeddings here — but lexical works, costs nothing, and runs offline. The default in this whole project is "free and local unless paying clearly wins."

Reflection: turning outcomes into strategy

The last piece closes the loop. After a few new videos are measured, a reflect() step feeds the scored bets to an LLM and asks it to update the long-term strategy — what's winning, what's losing, what to try next:

class Strategy(BaseModel):
    niche: str = ""
    current_direction: str = ""
    winning_patterns: list[str] = []
    losing_patterns: list[str] = []
    next_seeds: list[str] = []        # concrete idea seeds for the next ideation

This isn't aspirational — it's the actual current strategy in my channel's journal right now, rewritten by the LLM reflecting on real outcomes:

"niche": "math & physics mystery — rebels, paradoxes, forbidden knowledge (anime-noir visuals)",
"winning_patterns": [
  "Outsider-genius figures, mysticism, and high personal stakes (early death, divine inspiration) in math/physics",
  "Intellectual shock + curiosity gaps framed around 'everything breaking' or a foundational paradox",
  "Absurdist, deadpan academic humor rooted in one specific bizarre concept (mathematicians hunting a lion)",
  "Highly active, vivid, grand imagery in short poetic forms — not contemplative or melancholic ones"
],
"losing_patterns": [
  "Contemplative, melancholic, abstract poetry that lacks active imagery and a dramatic hook",
  "Pure science-horror missing the 'mystery / rebel / paradox' element central to the niche",
  "Generic 'relatable academic humor' that isn't rooted in a truly absurd, deadpan concept",
  "Historical mysteries lacking an immediate, shocking, or deeply personal angle"
]

The important thing isn't the list, it's that the list moved. The very first lesson this loop ever recorded was the cat-anatomy flop from Part 1: don't batch-dump near-identical clips (that series cannibalized itself at three-to-six views each). Everything above is what it has reflected its way toward since — through the math-hero winners, then a deliberate push outside the core into deadpan humor and a run of poetry reels. Look at that first losing pattern: "melancholic poetry that lacks active imagery." The loop learned that from my own poetry experiments underperforming, and wrote itself a rule about it. That's the system caught in the act of learning, not a strategy I typed in.

There's a heuristic fallback too (top and bottom performers by score) so reflection still works with no LLM key, but with one the lessons get sharper and feed straight back into the next idea. reflect() writes Strategy; ideation (Part 6) reads it. The snake eats its tail, and gets smarter each lap.
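The post only says the fallback uses "top and bottom performers by score", so the following is one assumed reading of that, not the repo's reflect(). The entry dicts, the heuristic_reflect name, and the way themes become patterns are all hypothetical.

```python
def heuristic_reflect(measured, k=2):
    """No-LLM fallback: treat the top-k and bottom-k entries by score as the patterns."""
    ranked = sorted(measured, key=lambda e: e["virality"], reverse=True)
    k = min(k, len(ranked) // 2)              # never count one entry as both
    winners = ranked[:k]
    losers = ranked[len(ranked) - k:]
    return {
        "winning_patterns": [e["theme"] for e in winners],
        "losing_patterns": [e["theme"] for e in losers],
        "next_seeds": [f"another take on {e['theme']}" for e in winners],
    }

measured = [  # hypothetical scored entries
    {"theme": "set theory", "virality": 1.3},
    {"theme": "melancholic poetry", "virality": 0.2},
    {"theme": "deadpan humor", "virality": 0.9},
    {"theme": "science-horror", "virality": 0.4},
]
strategy = heuristic_reflect(measured)
print(strategy["winning_patterns"])   # ['set theory', 'deadpan humor']
print(strategy["losing_patterns"])    # ['science-horror', 'melancholic poetry']
```

A fallback like this can only echo labels it already has; the LLM version earns its cost by generalizing across entries into a rule such as "poetry needs active imagery."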

What I'd tell another AI engineer

Takeaway: If you want a system that improves, make every action a falsifiable bet recorded before the outcome: the idea, the why, and the bar to clear. Split memory the way human memory splits, into durable strategy, an episodic ledger, and cheap retrieval, and score outcomes relative to the agent's own history so the loop is scale-invariant. Capture production telemetry alongside results so the agent can learn craft, not just content. None of this needs a vector DB or a fine-tune: a JSON ledger, a weighted score, token-overlap recall, and one reflection prompt already close the loop.

Next — Part 6: The Bandit. Memory tells the channel what worked; now it has to decide what to try next, balancing exploiting known winners against exploring new bets. I'll wire up a warm-started Thompson-sampling bandit over theme+tags — the actual explore/exploit engine that picks the next video.

Zero to Autopilot, Part 4: The Cost Collapse — $10.50 → $0.06 per Video

Maksims Gavrilovs — Mon, 08 Jun 2026 14:37:26 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 4 of 7. Part 1 landscape · Part 2 pipeline · Part 3 free motion. Now the headline number: how a video went from $10.50 to six cents.

Data status (Part 4): real-now. Every figure is a measured cost_usd from the manifest, not an estimate. Code is straight from the repo.

Where the money actually goes

After Part 3, motion is free — I animate stills in ffmpeg for $0. So a video's cost collapses to just two line items that can cost real money:

Images — one still per scene.
AI video — if and only if I choose to use it on a scene.

Everything else (script on a local LLM, narration on edge-TTS, stitching, muxing, publishing) is already $0. So the cost game is entirely about those two knobs. Let's turn them down without making slop.

Knob 1: the per-second video bomb

Recap of the villain from Part 1 — hosted AI image-to-video bills per second of output. The cost of one clip isn't a flat fee; it's duration × rate, snapped to the model's accepted duration grid:

# studio/providers/video.py
def estimate_cost(provider: str, model: str, seconds: float) -> float:
    spec = FAL_MODELS.get(model, FAL_MODELS["kling"])
    return round(_clip_dur(model, seconds) * spec["per_s"], 4)   # seconds × $/s

At kling's $0.07/s, a 150-second Short with AI video on every scene is ~$10.50. That was my first video. The fix isn't a cheaper model (though ltx at $0.04/s helps) — it's using AI video far more selectively, which I'll get to. First, the cheaper knob.
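
The snippet above leans on `FAL_MODELS` and `_clip_dur`, which the post does not show. A self-contained sketch of the same arithmetic follows; the per-second rates are the ones quoted here, while the duration grids and the round-up rule are placeholders I am assuming.

```python
# Rates are quoted in the article; the duration grids are hypothetical.
FAL_MODELS = {
    "kling": {"per_s": 0.07, "durations": (5, 10)},
    "ltx": {"per_s": 0.04, "durations": (5, 10)},
}


def _clip_dur(model: str, seconds: float) -> int:
    """Snap a requested length up to the shortest accepted duration covering it."""
    grid = FAL_MODELS.get(model, FAL_MODELS["kling"])["durations"]
    for d in grid:
        if d >= seconds:
            return d
    return grid[-1]  # longer than the model allows: bill the longest clip


def estimate_cost(model: str, seconds: float) -> float:
    """Billed seconds times dollars per second."""
    spec = FAL_MODELS.get(model, FAL_MODELS["kling"])
    return round(_clip_dur(model, seconds) * spec["per_s"], 4)


def short_cost(model: str, scene_seconds: list[float]) -> float:
    """Total for a Short with AI video on every scene."""
    return round(sum(estimate_cost(model, s) for s in scene_seconds), 2)
```

Under these assumptions, fifteen 10-second kling clips reproduce the $10.50 figure, and a 6-second request is billed as 10 seconds, which is why the grid matters.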

Knob 2: right-size the image model

I had been defaulting every image to Nano Banana ($0.039/img) — Google's Gemini 2.5 Flash Image. It's gorgeous and, crucially, supports character-reference consistency, which you want for photoreal or recurring-character content like my noir Kafka series.

But a goofy "why do cats have fur" explainer doesn't need photoreal noir. It needs clean flat cartoon — and for that, Flux Schnell at $0.003/megapixel (~half a cent an image) is perfect.

Same pipeline, one config change, ~8× cheaper images when the style allows. The lesson generalizes: don't pay for capabilities the scene doesn't use. Photoreal + character-ref? Nano Banana. Flat/graphic/cartoon? Flux. The system keeps both wired as image and image_cheap.

The tiers: one knob to set them all

Rather than fiddle providers per stage, I bundled the choices into four tiers. This is the whole config:

# studio/tiers.py
TIER_PRESETS = {
    "free":     {"image": "card",            "voice": "edge",       "strategy": "kenburns"},
    "cheap":    {"image": "fal-flux-schnell", "voice": "edge",      "strategy": "kenburns",
                 "sfx": "local", "music": "local"},
    "balanced": {"image": "fal-nanobanana",  "voice": "edge",       "strategy": "auto"},   # fill AI within budget
    "premium":  {"image": "fal-nanobanana",  "voice": "openai-tts", "strategy": "all"},    # AI every scene
}

And the resulting cost ladder for a 150s Short:

Tier	Images	Video strategy	~Cost / 150s	When
free	offline card	Ken-Burns	$0	wiring / drafts
cheap	Flux Schnell	Ken-Burns	~$0.06	budget volume
balanced	Nano Banana	`auto` (AI on hero scenes)	= your `--max-cost`	best per dollar
premium	Nano Banana	AI every scene	$6–10+	quality first

--tier sets everything; any --*-provider flag still overrides a single choice. The interesting one is balanced, because of how auto works.

`auto`: spend the budget where it matters

Most scenes are fine as a drifting still. A few — the hook, the climax, the outro — earn real AI motion. So auto is a tiny greedy knapsack: rank scenes by priority, then spend the AI budget on the highest-priority ones that fit, Ken-Burns the rest.

Priority is either explicitly set on a scene, or inferred by a hero heuristic:

# studio/stages/clips.py
def _effective_priority(scene, index, total):
    if scene.priority:        return float(scene.priority)
    if index == 0:            return 3.0    # the hook
    if index >= total - 2:    return 2.5    # outro / CTA
    # ...else an evenly-spread beat gets a mid priority

Then fill the budget greedily, highest priority first:

budget = max_cost if max_cost is not None else float("inf")
for i in sorted(range(n), key=lambda i: (_effective_priority(scenes[i], i, n), -i), reverse=True):
    c = video.estimate_cost("fal-i2v", model, scenes[i].duration_s)
    if spent + c <= budget:
        per_scene[scenes[i].id] = "fal-i2v"   # animate this one with AI
        spent += c
    # else: it stays Ken-Burns (free)

So --tier balanced --max-cost 1.50 means: "give me AI motion on the hook and a couple of key beats, free motion everywhere else, and never spend more than $1.50." You get the perceptual punch of AI video where viewers actually notice it, at a fraction of all-AI cost.

The pre-flight that refuses to overspend

Costs are estimated before a single API call. auto trims to fit; the rigid strategies (all/hybrid) abort if the estimate exceeds the budget rather than surprise you with a bill:

$ studio estimate lobachevsky --budget 3
  kling   150s → $10.50   ❌ over budget
  ltx     150s → $6.00    ❌ over budget
  auto    (fills $3.00)   ✅ AI on 6 hero scenes, Ken-Burns the rest

studio run defaults to --max-cost 3 and the clips stage won't blow past it. A running guard backstops the estimate in case a provider returns something unexpected. The golden rule from Part 2 pays off here: because every provider reports its real cost, the budget logic is exact, not hopeful.
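
The trim-versus-abort split can be sketched in a few lines. `OverBudget` and `preflight` are hypothetical names; only the strategy names and the behavior come from the text above.

```python
class OverBudget(Exception):
    """Raised before any API call when a rigid strategy cannot fit the cap."""


def preflight(strategy: str, estimate: float, max_cost: float) -> float:
    """Return the amount cleared to spend, or refuse up front."""
    if strategy == "auto":
        return min(estimate, max_cost)  # auto trims itself to fit
    if estimate > max_cost:
        raise OverBudget(
            f"{strategy}: estimated ${estimate:.2f} exceeds --max-cost ${max_cost:.2f}"
        )
    return estimate
```

The check runs on estimates only, so it costs nothing to fail; the running guard mentioned above would sit separately, inside the stage that actually spends.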

The receipts

Same ~150s video, every tier, measured from the manifests:

Build	Images	Video	Sound	Total
premium (my first video)	Nano Banana	kling, every scene	—	~$10.50
balanced	Nano Banana ($0.585)	a few AI clips ($0.75)	—	$1.34
cheap (Nano + free motion)	Nano Banana	Ken-Burns	—	$0.585
cheap (Flux + free motion + AI SFX)	Flux ($0.054)	Ken-Burns	$0.0076	$0.06

$10.50 → $0.06. About a 175× cut, and the cheap version isn't a toy — it's a published Short with real narration, free motion, and atmosphere. The quality lever moved to art direction and pacing (free), not the size of the model bill.

A fair caveat, though: $0.06 is the floor — a deliberately minimal Short. Once I turn the art-direction layer all the way up — parallax with generated plates, atmosphere, a vintage grade, a few Nano-Banana hero stills where they earn it — a fully art-directed, near-premium video lands around $0.15–0.25. That's still 40–65× cheaper than the ~$10 all-AI cut, at quality I genuinely can't tell apart in a feed. So read this as a ladder, not a single number:

Build	~Cost	When
floor (minimal effects)	$0.06	volume, throwaway tests
fully effected, near-premium	~$0.15–0.25	the realistic everyday build
premium (AI video every scene)	~$10	almost never worth it

The honest anchor is that middle rung. "The $0.06 Short" is the hook; "a great-looking Short for a quarter" is the number I actually run on.

A field update: what the catalog actually cost

I wrote that ladder as a forecast. Since then I've built a real back-catalog, so I can replace the forecast with the receipts — and the receipts are blunter than I expected. Across the dated runs in the repo, the median cost is well under a cent, and the cheapest published Shorts — full 60-second explainers with narration and free motion — measured $0.006. That's a tenth of the $0.06 I just called the floor. The real floor turned out an order of magnitude lower:

Real video (measured from its manifest)	What it used	Cost
Chandrasekhar (60s)	1 Flux still, free motion, edge-TTS	$0.006
Gödel, "math can't prove itself" (60s)	2 Flux stills, free motion, edge-TTS	$0.012
Galois, "the duel"	Nano stills + a little AI SFX	$0.18
Rabies (60s)	5 Nano stills + SFX + a music bed	$0.41
Fermat, "the margin note"	Nano stills + `ltx` AI clips + music	$0.78

What moves the needle is never the script or the motion — those are free in every row. It's exactly three opt-in knobs: Nano stills instead of Flux (about $0.14–0.20 a video), the paid audio layer (AI SFX plus a stable-audio music bed, about $0.20), and any AI video clips (ltx at $0.40 a hero beat). Turn all three off and you land at about six-tenths of a cent ($0.006). Turn all three on and you're still under a dollar. The only way back to a $10 video is AI motion on every scene, which — as the receipts above keep saying — you almost never should.

The one line item I never cut: sound

Cost-optimizing sounds like "cut everything," but the real skill is knowing what punches above its price — and then keeping it. The audio layer is the clearest case. AI sound effects plus a music bed run about $0.0076 to $0.20 a video, rounding error next to the image and video knobs, and they do more for perceived quality than anything else on the list.

The reason is that sound doesn't just decorate the picture — it cues the viewer's imagination to render the rest. A gust of wind, a distant bell, a low cello under a line of narration: the still shows a single frozen frame, but the soundscape makes the mind supply the motion, the depth, and the room the scene lives in. A fuller "video" plays out in the viewer's head that the image never actually contained. A real share of the production value a viewer feels is happening behind their own eyes, prompted by a few cents of audio.

So when I trim cost, sound is the last thing to go, and usually it never does. It's the highest return-on-investment line in the whole pipeline: pennies for atmosphere and liveness you can't buy any other way. "Right-size the spend" cuts both directions — kill the costs that don't earn their keep, and protect the cheap ones that punch far above their weight.

And cheaper actually wins

That last claim isn't theoretical. My most expensive video was the premium Lobachevsky cut — AI video on every scene, ~$10.50, hours of fussing. One of my cheapest real bets was Ramanujan: 8 Nano-Banana stills, free ffmpeg motion plus a sliver of cheap ltx on the hero beats, $0.65 measured, start to finish in about an hour:

🎬 Ramanujan: Math's Divine Genius → youtube.com/shorts/rsk8XruZWBQ

The 65-cent video outperformed the ten-dollar one. (Full numbers land in Part 7, per the series' data policy — but the direction is already unambiguous.) That's the empirical version of the whole argument: once free motion clears the "doesn't look like slop" bar, extra dollars buy shockingly little. Production quality is barely a success factor — the hook, the subject, and the story are. So the right move is to floor the cost and spend your real effort on which videos to make.

And you don't have to take my word that this scales. Channels like Cuentos de la Choza — Spanish folklore and horror tales — sit at 400k+ subscribers across 1,200+ videos, built on AI-generated stills, narration, and simple motion. Sit with that catalog size for a second: at 1,200 videos, nobody is paying per-second for AI video on every scene. The unit economics simply don't allow it. The "post at volume" play and the "drive cost to the floor" play are the same play — which is the entire reason the rest of this series exists.

Why this is the whole ballgame

A $10 video is a precious artifact you agonize over. A six-cent video is an experiment. At six cents, a hundred attempts cost six dollars — so I can stop guessing what works and start measuring it. Cheap unit cost is what turns "make content" into "run a search over content."

Which raises the obvious question: if I can cheaply make hundreds of videos, which hundreds should I make? That needs a brain — a memory of what worked and a way to decide what to try next. That's the back half of this series.

What I'd tell another AI engineer

Takeaway: Cost-optimize by removing capabilities you aren't using, not by buying the cheapest everything. Free motion killed the per-second video bill; right-sizing the image model (photoreal vs flat) cut images ~8×; an auto strategy spends the remaining budget only on the scenes that perceptually earn it; and a pre-flight estimate makes the cap exact. The payoff isn't the saved dollars — it's that a cheap-enough unit cost converts a craft into a search, which is the only thing that makes the learning loop (next) affordable.

Next — Part 5: Memory & Self-Reflection. Now that videos are cheap, the channel needs to remember. I'll build the per-channel journal — a long-term strategy plus an episodic ledger of every bet, with virality scoring and an LLM reflection step that turns measured outcomes into an updated game plan.

Zero to Autopilot, Part 3: Giving a Still Image Real Motion for $0.00

Maksims Gavrilovs — Mon, 08 Jun 2026 01:13:35 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 3 of 7. Part 1 was the landscape and my $10 wake-up call; Part 2 was the 7-stage pipeline. This one is the engineering centerpiece: replacing paid AI video with free motion.

Data status: real-now — real ffmpeg filtergraphs from the repo. Every effect here is playing in the live gallery (dasein108.github.io/slope-studio); code is open source.

Viewers don't need generated video. They need motion.

The recap from Part 1 is one line of arithmetic: hosted AI image-to-video bills per second — kling at $0.07/s makes a 150-second Short cost about $10.50. Fine for a single hero shot; absurd as the default for every scene when the whole strategy depends on making hundreds of cheap experiments.

But viewers were never asking for generated video. They want the feeling of motion: a still that drifts, breathes, and cuts on the beat holds attention perfectly well. I'd internalized this years ago shipping indie games, where the entire craft is faking expensive things with cheap math — no budget for a particle artist, so you write a particle system; no budget for animation, so you parallax-scroll a few layers and call it atmosphere. The same instinct ports straight to AI media. Everything below is one still image, ffmpeg, and zero dollars.

ffmpeg is the whole trick: an effect is a string

The quiet hero here is ffmpeg. It ships with roughly 400 built-in filters, and an "effect" is just a few of them chained with commas — no render engine, no GPU shaders, no SDK, no per-call cost. One binary you already have. Every motion in this series is an ffmpeg filtergraph, which means adding an effect is adding a string.

Here is the entire implementation of oldfilm, the vintage look:

"[0:v]colorchannelmixer=.393:.769:.189:0:.349:.686:.168:0:.272:.534:.131,"  # → sepia
"eq=contrast=1.12:saturation=0.82:brightness='0.035*sin(27*t)+0.025*sin(11*t)':eval=frame,"  # flicker (eval=frame re-evaluates the expression on every frame)
"noise=alls=22:allf=t,"     # film grain, re-rolled every frame
"vignette=PI/4[v]"          # darkened corners

Read it like a Unix pipe; each comma is "then":

colorchannelmixer — a 3×3 RGB matrix that maps the image to a sepia tone.
eq=…brightness='…sin(t)…' — t is the frame's timestamp, so brightness wobbles over time: the projector-gate flicker. Time expressions are what make an effect animate — sin(t) here, a creeping zoom in Ken-Burns next.
noise=allf=t — the t (temporal) flag re-randomizes the grain every frame, so it shimmers instead of sitting frozen.
vignette=PI/4 — darken the corners.

Four stock filters, one string, and it moves. A glitch is rgbashift + noise; chromatic aberration is just rgbashift; rain is a particle layer composited with overlay. The reason this channel can afford hundreds of videos isn't a cheaper model — it's that the effect budget is a text editor and ffmpeg -filter_complex.
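
"Adding an effect is adding a string" can be shown literally. This is a sketch, not the repo's code: `chain` and `ffmpeg_args` are hypothetical helpers, and the `eval=frame` option is included because the `eq` filter evaluates its expressions only once unless told otherwise.

```python
def chain(filters: list[str], src: str = "[0:v]", dst: str = "[v]") -> str:
    """Join filters with commas between an input pad and an output pad."""
    return src + ",".join(filters) + dst


OLDFILM = chain([
    "colorchannelmixer=.393:.769:.189:0:.349:.686:.168:0:.272:.534:.131",  # sepia
    "eq=contrast=1.12:saturation=0.82:"
    "brightness='0.035*sin(27*t)+0.025*sin(11*t)':eval=frame",  # flicker
    "noise=alls=22:allf=t",  # grain, re-rolled every frame
    "vignette=PI/4",  # darkened corners
])


def ffmpeg_args(image: str, out: str, seconds: float, graph: str) -> list[str]:
    """Command line that loops one still through the graph for `seconds`."""
    return ["ffmpeg", "-y", "-loop", "1", "-i", image,
            "-filter_complex", graph, "-map", "[v]",
            "-t", str(seconds), out]
```

Handing `ffmpeg_args("still.png", "out.mp4", 4, OLDFILM)` to `subprocess.run` would render the clip; the file names are placeholders.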

The effect families

That one binary buys a whole vocabulary. The catalog sorts into a handful of families, each answering a different question — what does this scene need?

Camera motion — kenburns, motion-drift{left,right,up,down}, motion-zoom{in,out}, pulse. The cheapest possible life: a still pans, drifts, breathes. The default for most scenes.
Depth — parallax, blurred-parallax. Real 2.5D: the foreground subject holds still while the background drifts behind it. For scenery with a clear subject.
Kinetic type — kinetic. Emphasis: a headline slides in over the shot. For the hook or a key stat, not every scene.
Atmosphere — rain, snow, fog, embers, blood, petals, leaves, wind. Mood and a sense of place — the emotional weather, composited for free.
Colour & look grades — grain, vignette, oldfilm, sunrise, sunset, godrays, chroma. Tone and era. This family does the most to separate intentional from slop: grain and a vignette alone (the cover image is one still run through six of these) read as "graded by someone who cares."
Impact — flash[-white/-yellow/-red/-black], blood. A 2–3 frame punch for an action beat. Rare by design.
Characters — puppet (a cutout figure that hops or nods), talkinghead (Rhubarb lip-sync). A figure that acts or speaks, with no avatar model.
Vector — manim. Literal concept and maths visualization, 3Blue1Brown-style. The education power tool (and the one I haven't tamed — more below).
Transitions — cut, fade, dissolve, wipeleft, slideup, slice. Rhythm: how one scene becomes the next.

They're all the same idea underneath — a filtergraph string — so the rest of this piece takes apart the three most interesting ones.

How the motion is wired

Each scene names an animator, and one dispatch function routes to the implementation. The important property is the last line: anything that fails falls back to Ken-Burns and records why in the manifest, so a missing optional dependency degrades the look instead of breaking the render.

# studio/animate.py
a = (animator or "kenburns").strip()
if a == "kenburns" or a == "":   ffmpeg.ken_burns(image, dst, seconds)
elif a.startswith("motion-"):    ffmpeg.motion(image, dst, seconds, preset=a.split("-", 1)[1])
elif a == "kinetic":             return _kinetic(scene, image, dst, seconds)
elif a == "parallax":            return _parallax(scene, image, dst, seconds)
elif a == "slice":               return _slice(scene, image, dst, seconds)
elif a == "puppet":              return _puppet(scene, image, dst, seconds)
elif a == "talkinghead":         return _talkinghead(scene, image, dst, seconds, audio)
elif a == "manim":               return _manim(scene, dst, seconds)

The workhorse, Ken-Burns, is a single zoompan expression — over-scale the source 2× first so the crop never reaches an edge:

# studio/ffmpeg.py — ken_burns()
vf = (f"scale={w*2}:{h*2},"   # over-scale the source 2× before zoompan
      f"zoompan=z='min(zoom+0.0012,1.12)':d={frames}:s={w}x{h}:fps={fps}:"
      f"x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'")

z='min(zoom+0.0012,1.12)' creeps the zoom in a hair per frame, capped at 1.12×. The motion-* presets are the same machine with different z/x/y expressions — a whole family of movement from one filtergraph.
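
It helps to see how quickly that expression reaches its cap. The sketch below replays the zoom arithmetic in plain Python, assuming zoompan starts from a zoom of 1 and applies the expression once per output frame.

```python
def zoom_ramp(frames: int, step: float = 0.0012, cap: float = 1.12) -> list[float]:
    """Replay z='min(zoom+step,cap)' frame by frame, starting from zoom = 1."""
    zoom, ramp = 1.0, []
    for _ in range(frames):
        zoom = min(zoom + step, cap)
        ramp.append(zoom)
    return ramp


def frames_until_cap(step: float = 0.0012, cap: float = 1.12) -> int:
    """Number of frames of visible zoom before the expression pins at the cap."""
    ramp = zoom_ramp(frames=10_000, step=step, cap=cap)
    return ramp.index(cap) + 1
```

With the defaults the ramp takes about 100 frames (0.12 / 0.0012), a little over three seconds at an assumed 30 fps. On longer scenes the image then holds at 1.12×, so a slower step may suit long narration lines.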

Parallax, the one effect ffmpeg can't do alone

Parallax — hold the subject still, drift the background behind it for depth — is the exception to "an effect is a string." ffmpeg can composite layers but it can't find a subject, so this one needs a small, very indie-dev hack first: rembg cuts the subject (the static foreground), Python builds a clean background plane, and only then does ffmpeg drift the back and overlay the front.

The "clean background" is the whole problem. The naive version drifts the original still behind the cutout — but that still already contains the subject, so you get a creepy ghost twin smearing across the back. The fix is to give ffmpeg a background that's complete behind the subject, two ways:

Inpaint it out of the same image (default) — a free blur-diffusion fill: repeatedly blur, then re-stamp the known pixels so the subject's hole heals with its surroundings.
Generate a separate plate — re-prompt the scene without the subject (--parallax-plates, +1 still). Cleaner, no inpaint guesswork.

# studio/animate.py — _inpaint_subject() (heal the subject's hole)
for _ in range(iters):
    blurred = bg.filter(ImageFilter.GaussianBlur(radius))
    bg = Image.composite(bg, blurred, subject_mask)  # keep outside, heal inside

There's also a cheaper third option that embraces the twin: blur the drifting plane hard so the duplicate melts into soft bokeh (blurred-parallax) — on busy backgrounds it reads as dreamy depth-of-field rather than a brittle cutout. A bug turned into a second legitimate look.
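
The blur-diffusion fill is easy to try without any imaging library. This sketch swaps Pillow's GaussianBlur for a 3×3 box blur over plain lists of grayscale values, and assumes a mask that is True inside the subject's hole; it illustrates the technique and is not the repo's `_inpaint_subject()`.

```python
Grid = list[list[float]]


def box_blur(img: Grid) -> Grid:
    """3x3 mean filter; neighbours outside the image are ignored."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def heal(img: Grid, hole: list[list[bool]], iters: int = 50) -> Grid:
    """Repeatedly blur, then re-stamp the known pixels so only the hole changes."""
    cur = [row[:] for row in img]
    for _ in range(iters):
        blurred = box_blur(cur)
        cur = [[blurred[y][x] if hole[y][x] else img[y][x]  # heal inside, keep outside
                for x in range(len(img[0]))]
               for y in range(len(img))]
    return cur
```

Each pass lets the surrounding colour bleed one pixel further into the hole, so the iteration count should scale with the subject's size. With Pillow, note that `Image.composite` takes its first image where the mask is white, so the argument order depends on the mask's polarity.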

Text, and the font library that wasn't there

Kinetic type slides a headline in over a gently pulsing still. The text is rendered by Pillow into a transparent PNG and overlay-ed with an animated y so it rises into place:

# headline rises and settles over the first 0.6s
over = "[bg][t]overlay=x=(W-w)/2:y='H*0.18 - 50*min(t/0.6,1)':format=auto[v]"
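
That `y` expression is a clamped linear ramp. The same formula in Python, for checking where the headline starts and settles; the parameter names are mine, the constants are from the filtergraph above.

```python
def headline_y(t: float, frame_h: float, anchor: float = 0.18,
               rise_px: float = 50.0, settle_s: float = 0.6) -> float:
    """Top edge of the headline at time t: rises rise_px, then holds."""
    return frame_h * anchor - rise_px * min(t / settle_s, 1.0)
```

On a 1920-pixel-tall frame the text starts near y = 345.6 and comes to rest 50 pixels higher from 0.6 s onward.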

Why Pillow and not ffmpeg's drawtext? Because the box this renders on has an ffmpeg built without libfreetype and without libass — so drawtext and subtitles= both simply fail. Rather than fight the build, I render all text — headlines and burned caption strips alike — as Pillow PNGs and overlay them. The constraint forced a more portable design that happens to give pixel-perfect typographic control.

Choosing the effect: the model proposes, code constrains

A library this size is worthless if every scene defaults to Ken-Burns — which is exactly where this started. So a small art-direction layer (studio/artdirect.py) decides, with a deliberately hybrid policy:

The script model proposes a per-scene animator / atmosphere / fx / transition, choosing from a documented menu in its prompt, so the picks match the scene's mood — a duel gets embers and a red flash; a memory gets oldfilm; a landscape gets parallax.
A deterministic pass then constrains it: it validates the names, fills anything the model skipped with position and keyword heuristics (hook → kinetic, scenery → parallax), and applies taste caps — a flash is an impact, so it survives on at most one scene; a single atmosphere can't blanket the whole video.
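
A compact sketch of that deterministic pass follows. The scene keys (`text`, `animator`, `fx`), the keyword list, and the choice to keep the first flash are assumptions; only the animator names and the three rules come from the description above.

```python
ANIMATORS = {"kenburns", "kinetic", "parallax", "slice", "puppet", "talkinghead", "manim"}
SCENERY_WORDS = {"landscape", "mountain", "sea", "forest", "city"}  # hypothetical list


def _valid(name: object) -> bool:
    return isinstance(name, str) and (name in ANIMATORS or name.startswith("motion-"))


def constrain(scenes: list[dict]) -> list[dict]:
    """Validate the model's picks, fill the gaps, and apply the flash cap."""
    out, flash_used = [], False
    for index, scene in enumerate(scenes):
        scene = dict(scene)  # never mutate the model's proposal
        animator = scene.get("animator")
        if not _valid(animator):  # unknown or missing: fall back to heuristics
            words = set(scene.get("text", "").lower().split())
            if index == 0:
                animator = "kinetic"  # hook
            elif words & SCENERY_WORDS:
                animator = "parallax"  # scenery
            else:
                animator = "kenburns"
        scene["animator"] = animator
        if scene.get("fx") == "flash":
            if flash_used:
                scene["fx"] = None  # taste cap: one flash per video
            flash_used = True
        out.append(scene)
    return out
```

The pass is a pure function of the proposal, so the same script always gets the same art direction, with or without a model in front of it.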

"Model proposes, code constrains" recurs throughout this project; it's a good default whenever you want a model's judgement without its inconsistency. And because the same pass runs on the keyless stub path, every video gets real art direction instead of a wall of identical pans.

One concrete payoff: cheap punctuation for violence without gore (which also keeps the image model's content filter happy). A red flash on the cut plus a blood overlay, a few frames total — the viewer's mind fills in the rest, the narration carries the meaning, and it costs nothing.

The one I haven't cracked: manim

Manim, the engine behind 3Blue1Brown, is the most promising tool here and the least solved. True vector animation — a circle morphing into a square, a graph plotting itself, an equation transforming term by term — is close to a cheat code for an educational channel, rendered crisp for $0. A scene can carry a manim_code field the model writes, and the pipeline renders it.

The catch is getting a model to author good, literal, compiling manim on demand. It reaches for abstract moving lines when what sells is the literal shape; the code is indentation-sensitive; and a meaningful fraction of generated scenes fail and fall back to Ken-Burns. For now it's hand-authored for hero beats, not trusted to the loop — the single biggest unlock left for the educational side, and squarely on the roadmap. If you've cracked LLM→manim, I genuinely want to hear it.

And the ears

Visual motion is only half of "not slop"; a silent Short feels dead. So there's a matching audio layer — AI-generated sound effects plus a music bed ducked under the narration via sidechain compression (the voice always wins; the bed sits at −24 dB). On one Short that entire layer cost $0.0076. The "make it feel produced" budget, picture and sound together, rounds to zero.

The road not taken: self-hosting the video model

There's a tempting middle path I should address, because every engineer asks it: the video models are open-weight now — why not run one locally and get real AI video for free too? I have a MacBook M4 with 36 GB of unified memory, so I wired a local ComfyUI + Wan 2.2 5B backend into the pipeline as a local-i2v provider and found out. Short version: it works, it's free, and it's a draft-tier toy you should keep out of your render path.

The log, honestly:

fp8 weights are broken on Apple's MPS backend — they load and produce NaN. So everything is GGUF-quantized (Wan 5B at Q4 ≈ 3.4 GB, plus a ~3.6 GB text encoder).
The full-precision version (~22 GB resident) plus the video VAE-decode spike blew past physical RAM, and because MPS has no real offload, macOS swapped and hung the whole machine — not the process, the OS. The fix is a PyTorch MPS watermark cap so a runaway allocation kills the process cleanly instead.
Even stable, it's slow: a 2-second clip took about 15 minutes, and per-step time climbs steeply once memory pressure starts evicting.
And it improvises. On the Persian-miniature still below, Wan added genuine motion — then warped the ornate border and invented a hooded figure that wasn't in the source.
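
For anyone reproducing the watermark cap: PyTorch reads an MPS high-watermark ratio from the environment. A minimal sketch, assuming this is the cap meant above and that 0.7 is a sensible fraction (both are my assumptions, not the repo's setting):

```python
import os

MPS_WATERMARK_ENV = "PYTORCH_MPS_HIGH_WATERMARK_RATIO"


def cap_mps_memory(ratio: float = 0.7) -> str:
    """Set the MPS allocation ceiling as a fraction of recommended memory."""
    if ratio <= 0:
        raise ValueError("a ratio of 0 disables the limit; pass a positive fraction")
    os.environ[MPS_WATERMARK_ENV] = str(ratio)
    return os.environ[MPS_WATERMARK_ENV]


cap_mps_memory(0.7)  # do this before `import torch` so the allocator sees it
```

When the model runs in a separate process such as ComfyUI, the variable has to be exported in the shell that launches that process instead.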

Set that against the hosted option — kling renders a 6-second hero clip in under a minute for about 42 cents — and "free" local generation costs you 15+ minutes, a fragile machine, and a worse result. Free isn't free when it's measured in wall-clock. So the verdict loops right back to this article's thesis: free ffmpeg motion for the overwhelming majority of scenes, a few cents of hosted video for the rare hero shot, and if you must run local, cap it to 1–2 seconds of motion on one or two scenes and Ken-Burns the rest. It stays in the repo as a draft-tier provider — glad I tried it, glad I didn't ship it.

That last pattern is exactly how I built this 55-second Rubaiyat reel: two of its four scenes got ~2 seconds of local Wan motion (then hold the last frame for the rest of the line), the other two are pure Ken Burns — total video-generation cost, $0. It's the honest sweet spot for local i2v on a Mac: a brief breath of real generated motion where it counts, free camera motion everywhere else.

What I'd tell another AI engineer

Before paying a generative model, ask what the viewer actually needs — usually the perception of motion and intention, not literally generated video. A zoompan expression, a parallax composite, a grain overlay, and a ducked music bed deliver that for nothing, and the indie-game-dev instinct (fake the expensive thing with cheap math) ports directly to AI media. Route every effect through one module, give each a graceful fallback, and the pipeline gets cheaper and sturdier at once.

Next — Part 4: The Cost Collapse, $10 → $0.06. With motion free, the full cost model: per-second video math, right-sizing the image model (Nano Banana vs Flux Schnell), the tier system, the auto strategy that spends only on hero scenes, and the --max-cost pre-flight that refuses to overspend.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe to watch the experiment grow from zero: the channel

Zero to Autopilot, Part 2: One Line of Text → a Published Short, in 7 Stages

Maksims Gavrilovs — Sat, 06 Jun 2026 06:17:19 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 2 of 7. Part 1 covered the landscape and my $10 wake-up call. This one is the architecture: how a single line of text becomes an uploaded Short without me ever opening a video editor.

Data status (Part 2): real-now. Code, file layout, and measured costs straight from the repo. No audience metrics — those are sandbagged to Part 7.

⭐ The whole thing is open source: github.com/dasein108/slope-studio. Clone along — there's a zero-API-key smoke test at the bottom.

The mental model: a video is a Makefile

Most "AI video generator" tools are a single monolith — one giant button, one black box, and when scene 14 comes out cursed you get to regenerate all 14. I've shipped enough software to know that's the wrong shape.

So I stole the model from build systems: a video is a directed pipeline of stages, each stage is a pure function from files to files, and the whole thing is idempotent. Re-run a stage, it skips work that's already done. Blow away one artifact, only that stage (and its dependents) rebuild. It's make with a YouTube upload at the end.
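
The make-like behavior comes down to one check per stage. A sketch of that check, with `run_stage` and `build` as hypothetical names:

```python
from pathlib import Path
from typing import Callable


def run_stage(output: Path, build: Callable[[Path], None], force: bool = False) -> bool:
    """Build `output` only if it is missing (or forced). Returns True if work ran."""
    if output.exists() and not force:
        return False  # already done: skip, and skip the bill
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)
    return True
```

Deleting one artifact and re-running is then the whole "rebuild" command, and a forced run is the only way to pay twice for the same file.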

Here's the pipeline, top to bottom:

 idea ──► [1 script] ──► 01_script.json        (timed scenes + narration)
            │
            ├──► [2 visuals] ──► 02_visuals/scene_NN.png
            │
            ├──► [2.5 narrate] ─► 05_voice/scenes/*.mp3 + timing.json + captions.srt
            │
            ├──► [3 clips] ────► 03_clips/scene_NN.mp4   (animate the stills)
            │
            ├──► [4 stitch] ───► 04_stitched.mp4         (transitions, no audio)
            │
            ├──► [5 voice] ────► 05_voice/final.mp4      (TTS + music muxed)
            │
            ├──► [6 save] ─────► 06_final.mp4            (platform master)
            │
            └──► [7 publish] ──► YouTube

Every arrow writes a file. Every file lives under one run directory. Which brings us to the most important design decision in the whole project.

Everything is a file under `runs/<id>/`

No database. No hidden state. One run = one directory, and the directory is the state:

runs/lobachevsky/
├── project.json          # the manifest: provider + cost + done-flag per stage
├── 01_script.json        # scenes, narration, title, hashtags
├── 02_visuals/scene_01..15.png
├── 03_clips/scene_NN.mp4
├── 04_stitched.mp4
├── 05_voice/
│   ├── scenes/*.mp3       # per-scene TTS
│   ├── timing.json        # per-scene durations (drives clip lengths)
│   ├── captions.srt
│   └── final.mp4
├── 06_final.mp4          # the master you upload
├── 06_final.json         # SEO title/description/tags
└── 07_publish.json       # the YouTube video id, once live

This sounds almost too simple, but it buys you everything:

Debuggability — something looks off? Open the PNG. Read the JSON. No "inspect the pipeline state" tooling needed; ls and an image viewer are the debugger.
Resumability — kill the process at scene 9, restart, it picks up at scene 9.
Idempotency — stages check for their own output and skip it. Re-running visuals won't re-bill you for 15 images you already have (--force when you actually want to regenerate).
Version control of *artifacts* — every authored video in the repo is a folder you can diff, copy, or hand-edit.

Canonical paths live in exactly one place (studio/paths.py), so no stage ever hardcodes a filename:

def scene_image(d: Path, sid: int) -> Path:
    return visuals_dir(d) / f"scene_{sid:02d}.png"

def master(d: Path) -> Path:
    return d / "06_final.mp4"

Each stage is a CLI subcommand (and they chain)

The pipeline is a Typer app. Every stage is its own subcommand, so you can run the whole thing or surgically poke one stage:

# the whole pipeline, one idea in, one Short out:
studio run "lobachevsky geometry explained in a fun way" --duration 150

# or drive it stage by stage and inspect between steps:
RID=$(studio init "lobachevsky..." --duration 150)
studio script  $RID     # → 01_script.json   (read it! confirm the narration is real)
studio visuals $RID     # → 02_visuals/*.png
studio status  $RID     # render the manifest: what's done, what it cost

The stage order is one list, and run just walks it:

STAGE_ORDER = ["script", "visuals", "narrate", "clips", "stitch", "audio", "voice", "save"]

Adding a stage = write a pure function in stages/, add a subcommand, drop its name in that list. Adding a provider (a new image model, a new TTS) doesn't touch the pipeline at all — more on that next.
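
Walking the list is about as small as an orchestrator gets. A sketch under the assumption that stages are zero-argument callables and that completed stage names are tracked in a set; the real manifest tracks more than that.

```python
from typing import Callable

STAGE_ORDER = ["script", "visuals", "narrate", "clips", "stitch", "audio", "voice", "save"]


def run_pipeline(stages: dict[str, Callable[[], None]], done: set[str]) -> list[str]:
    """Run every stage not yet done, in order; return the names that ran."""
    ran = []
    for name in STAGE_ORDER:
        if name in done:
            continue  # resumability: finished stages are skipped
        stages[name]()
        done.add(name)
        ran.append(name)
    return ran
```

A new stage needs an entry in `stages` and a slot in `STAGE_ORDER`; nothing in the loop changes.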

The provider contract: every model reports its own cost

Here's the design choice I'm proudest of, because it's what makes the whole rest of the series possible. Every media-producing provider — every LLM, image model, video model, TTS — returns the same dataclass:

from dataclasses import dataclass
from pathlib import Path

@dataclass
class GenResult:
    path: Path | None = None
    cost_usd: float = 0.0     # the REAL cost, computed by the provider
    latency_s: float = 0.0
    provider: str = ""
    note: str = ""

That cost_usd is not an estimate I jotted in a spreadsheet. The Nano Banana provider returns $0.039. The kling provider computes seconds × $0.07. The Ken-Burns animator returns $0.00. So when a stage runs, the manifest records measured cost, not guessed:

class StageRecord(BaseModel):
    done: bool = False
    provider: str = ""
    cost_usd: float = 0.0

class Manifest(BaseModel):
    # ...
    def total_cost_usd(self) -> float:
        return round(sum(s.cost_usd for s in self.stages.values()), 4)

This is the foundation. You can't optimize what you don't measure, and you definitely can't put a budget-aware bandit (Part 6) on top of costs you're guessing at. Every dollar in this series is a real dollar the system reported on itself.
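
To see the contract end to end, here is a dependency-free sketch that uses plain dataclasses in place of the pydantic models. The `record` method and the `nano_banana_image` stub are hypothetical; the prices are the ones quoted in this series.

```python
from dataclasses import dataclass, field


@dataclass
class GenResult:
    cost_usd: float = 0.0  # reported by the provider itself
    provider: str = ""


@dataclass
class StageRecord:
    done: bool = False
    provider: str = ""
    cost_usd: float = 0.0


@dataclass
class Manifest:
    stages: dict[str, StageRecord] = field(default_factory=dict)

    def record(self, stage: str, results: list[GenResult]) -> None:
        """Fold a stage's provider results into one manifest entry."""
        cost = round(sum(r.cost_usd for r in results), 4)
        provider = results[0].provider if results else ""
        self.stages[stage] = StageRecord(done=True, provider=provider, cost_usd=cost)

    def total_cost_usd(self) -> float:
        return round(sum(s.cost_usd for s in self.stages.values()), 4)


def nano_banana_image() -> GenResult:
    """Stub provider: no API call, just the per-image price."""
    return GenResult(cost_usd=0.039, provider="fal-nanobanana")
```

Fifteen stub images plus $0.75 of clips sum to the same $1.335 that the balanced build reports, which is the point: the total is derived from what providers returned, never typed in.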

Six small LLMs, not one big one

A thing worth flagging early, because it shapes the whole design: there is no single "AI" in this system. There are six narrow LLM jobs, each doing one small thing, each with a deterministic fallback so the pipeline runs with zero API keys. Where each call sits:

idea
 └─► [scriptwriter LLM] ──► timed scenes + narration
        └─► [art-director LLM] picks each scene's motion + look (animator, fx, atmosphere)
              └─► [vision LLM] locates a face's mouth for lip-sync (only on talkinghead)
 visuals → clips → stitch → voice → save
        └─► [SEO LLM] polishes title / description / tags before publish
 (growth loop)
   [ideator LLM] next falsifiable bet (+ web-search trends) → produce → measure →
   [reflector LLM] turns measured results into an updated strategy ─┘

Role	Where	Job	Fallback (keyless)
Scriptwriter	`stages/script.py`	idea → timed scenes + narration	offline `stub` split
Art director	`artdirect.py`	pick per-scene animator / fx / atmosphere / transition	heuristic rules
Vision / mouth locator	`animate._detect_mouth`	find a face's mouth (pos + size) for lip-sync	explicit coords / default
SEO metadata	`stages/metadata.py`	polish title / description / tags	script-derived
Ideator	`marketing/ideate.py`	next viral bet + trend signals	strategy seeds
Reflector	`marketing/learn.py`	measured bets → updated strategy	top/bottom heuristic

And, deliberately, the parts that must be reproducible and auditable are not LLMs: the explore/exploit bandit (Part 6) is plain Thompson sampling, and virality scoring (Part 5) is a fixed formula. LLMs write and judge taste; statistics make the decisions. Keeping that line clean is most of what makes the system debuggable.
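The keyless fallbacks in the table above all follow one shape: try the model, and drop to a deterministic rule if it is missing or fails. A minimal sketch of that shape, with hypothetical names (`write_script`, `split_stub`, `broken_llm`) that are not taken from the repo:

```python
from typing import Callable, List, Optional


def split_stub(idea: str, n_scenes: int = 3) -> List[str]:
    """Deterministic keyless fallback: placeholder narration, one line per scene."""
    return [f"Scene {i + 1}: {idea}" for i in range(n_scenes)]


def write_script(idea: str,
                 llm: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Use the LLM when one is configured and it works; otherwise fall back."""
    if llm is not None:
        try:
            scenes = llm(idea)
            if scenes:
                return scenes
        except Exception:
            pass  # any provider failure drops through to the fallback
    return split_stub(idea)


def broken_llm(idea: str) -> List[str]:
    """Simulates a provider that cannot run, e.g. because no key is set."""
    raise RuntimeError("no API key")


keyless = write_script("how black holes bend time")
recovered = write_script("how black holes bend time", llm=broken_llm)
```

Both calls return the same placeholder scenes, so the stages downstream can run and be tested without any credentials.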

Watching it actually run

Here's the real log from the Lobachevsky run — note each stage announcing its provider and cost as it goes:

» visuals
visuals 15 images via fal-nanobanana  $0.585
» clips
clips 15 clips via fal-i2v  $0.75
» stitch
stitch 15 clips
» voice
voice captions=burn via edge  $0.0
» save
save runs/lobachevsky/06_final.mp4
done lobachevsky  total $1.335

Fifteen stills, fifteen animated clips, narration, captions, muxed and mastered: $1.34, fully automated, from one line of text. (That run used a bit of paid AI video; the all-Ken-Burns version of the same Short is $0.585, and the cheap-tier playbook from Part 1 gets a similar video to six cents. The cost knobs are Part 4.)

And the data shape underneath each scene — the script stage emits timed scenes the rest of the pipeline consumes:

// 01_script.json (one scene)
{
  "id": 1,
  "start_s": 0, "end_s": 8,
  "narration": "What if everything you were taught about parallel lines was secretly a lie?",
  "visual_prompt": "railroad tracks vanishing toward a glowing question mark, retro poster",
  "on_screen_text": "...a lie?",
  "motion_hint": "slow push-in toward the vanishing point"
}

narration drives the TTS (and therefore the clip length — audio leads, video follows, so nothing ever desyncs). visual_prompt drives the image model. motion_hint drives the free animator. One JSON object, three downstream stages.
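As an illustration of "one JSON object, three downstream stages", the sketch below parses the scene shown above and routes its fields. The `Scene` dataclass, the `routing` dictionary, and `clip_length_s` are hypothetical; in particular, using the measured narration length as the clip length is my reading of "audio leads, video follows".

```python
import json
from dataclasses import dataclass


@dataclass
class Scene:
    id: int
    start_s: float
    end_s: float
    narration: str
    visual_prompt: str
    on_screen_text: str
    motion_hint: str


raw = """
{
  "id": 1,
  "start_s": 0, "end_s": 8,
  "narration": "What if everything you were taught about parallel lines was secretly a lie?",
  "visual_prompt": "railroad tracks vanishing toward a glowing question mark, retro poster",
  "on_screen_text": "...a lie?",
  "motion_hint": "slow push-in toward the vanishing point"
}
"""
scene = Scene(**json.loads(raw))

# Each downstream consumer reads exactly one field of the scene.
routing = {
    "tts": scene.narration,
    "image_model": scene.visual_prompt,
    "animator": scene.motion_hint,
}


def clip_length_s(scene: Scene, narration_audio_s: float) -> float:
    """Audio leads: use the measured narration length, else the planned slot."""
    planned = scene.end_s - scene.start_s
    return narration_audio_s if narration_audio_s > 0 else planned
```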

Try it yourself (zero API keys, zero dollars)

The repo ships an offline mode so you can watch the whole pipeline run without a single key or cent. Stub providers stand in for the paid ones; everything else is real ffmpeg:

git clone https://github.com/dasein108/slope-studio
cd slope-studio
uv venv && source .venv/bin/activate
uv pip install -e ".[fal]"

# free, offline, end-to-end smoke test:
studio run "how black holes bend time" --duration 12 \
  --script-provider stub --image-provider stub \
  --video-provider kenburns --voice-provider edge

You'll get a real runs/<id>/ folder with a stitched, narrated 06_final.mp4 — built entirely from free local tooling. (Heads up: stub is a wiring generator — it emits placeholder text so you can test the plumbing. Swap in a real LLM key before you spend money on visuals, or you'll lovingly render meaningless filler. Ask me how I know.)

What I'd tell another AI engineer

Takeaway: Resist the monolith. Model your AI pipeline as stages of pure file-to-file functions over a single run directory, make each one an independently runnable command, and give every provider a uniform result type that reports its own cost. You get free debuggability (ls is your inspector), free resumability, free idempotency, and — crucially — a measured cost ledger that everything smarter you build later (budgets, auto-strategies, bandits) gets to stand on. Boring architecture is a feature.

Next — Part 3: Free Motion. The fun part. AI video is $0.07/second; I'm going to take a single still image and give it real motion — drift, parallax with subject inpainting, kinetic type, atmospheric rain and embers — for $0.00, with a deep dive into the ffmpeg filtergraphs and the indie-game-dev tricks behind them. (Spoiler: it's all already running in the live effects gallery.)

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo to follow along: github.com/dasein108/slope-studio
🔔 Subscribe to the channel to watch the experiment grow from zero: the Lobachevsky Short

Zero to Autopilot, Part 1: I Built an AI That Runs a YouTube Channel (the landscape, and my $10 wake-up call)

Maksims Gavrilovs — Fri, 05 Jun 2026 14:59:45 +0000

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 1 of 7. I'm an AI engineer and this is the full build log of an autonomous AI short-video channel — one that writes, renders, publishes, and decides what to make next, then grades its own homework. No face, no film crew, no me clicking "upload" at midnight.

Data status (Part 1): real-now. Everything below is code, costs, and public facts I can verify today. The juicy audience metrics from my own channel are sandbagged until Part 7, so they have time to become real instead of noise.

The two-billion-view problem

Late 2025, a channel called Bandar Apna Dost crossed ~2 billion views and an estimated $4.25M/year (~₹38 crore). Its content? Short AI clips of a monkey and a Hulk-ish dude. No dialogue. No plot. No discernible reason to exist. (techlusive, Business Standard)

Cue every dev's reaction: "...I have a GPU and zero shame, how hard can this be?"

Pretty hard, actually — because here's the part the get-rich-quick threads leave out. A few months later YouTube's "AI slop" crackdown nuked an estimated 4.7 billion views across 16 channels, ~35M subs, and nearly $10M in revenue. Among the bodies: Three Minute Wisdom, a ~1.7M-sub / ~2B-view faceless AI channel, most of its catalog vaporized. (OutlierKit, Miraflow)

So the lay of the land in mid-2026:

Faceless AI video is a real, monetizable category. Billions of views, real revenue, nobody's face required.
It's also a ban speedrun if you ship slop. The platforms are now actively rm -rf-ing low-effort content at scale.

I looked at that and saw a clean engineering problem with two non-negotiable constraints: don't make slop, and don't go broke making it. This series is me brute-forcing both.

Why "faceless" is catnip for an engineer

Faceless means narration + visuals do all the work. No on-camera talent, no lighting rig, no "can you do Tuesday?" Every input is a file that an LLM or a model can spit out. Which means the whole thing is programmable — and anything programmable can be measured, costed, and (eventually) left to run while you sleep.

The winning recipe is boringly well-documented: pick a niche, nail a 2-second hook, stay on-brand, keep people watching to the end, and build a deep library so the algorithm has something to binge-feed. Notice what's not on that list: a human, per video. That's a system, not a craft.

The channels getting deleted skipped the system and cranked the volume knob to 11. The survivors — and the non-AI GOATs like Kurzgesagt and CrashCourse — win on structure, pacing, and actually having a point. My bet: an engineer can clear that quality bar and the volume bar if each video is cheap enough to run hundreds of experiments, with a learning loop deciding which ones to rerun.

Exhibit A: my first video quietly ate $10

Here's video #1, live on the channel — Lobachevsky, the guy who broke geometry:

🎬 The heretic who broke geometry → youtube.com/shorts/gaR76MiAK0U

I did the rookie thing: reached for AI image-to-video on every single scene, because that's what the shiny demos show. It looked great. Then I checked the bill.

Ten dollars. One Short.

The villain is one line of arithmetic — hosted AI video is priced per second, not per clip:

# studio/providers/video.py — real per-second prices (verified on fal.ai, June 2026)
FAL_MODELS = {
    "kling":    {"per_s": 0.07},   # 150s Short ≈ $10.50   <-- oof
    "ltx":      {"per_s": 0.04},   # cheapest hosted i2v
    "seedance": {"per_s": 0.30},   # 150s ≈ $45 (lol no)
    "hailuo":   {"per_s": 0.045},
    "wan":      {"per_s": 0.16},
}

150 seconds × $0.07 = $10.50, no matter how you slice the clips. Now do the napkin math on a content strategy: at ~$10/video, a hundred experiments is a thousand bucks, and you cannot run a "post a lot and learn" loop you can't afford to repeat. The economics were quietly DOA.
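The napkin math is easy to reproduce for every model in the price table. A small sketch using only the per-second prices listed above; the function and variable names are mine:

```python
# Per-second prices copied from the FAL_MODELS table above.
FAL_PER_SECOND = {
    "kling": 0.07,
    "ltx": 0.04,
    "seedance": 0.30,
    "hailuo": 0.045,
    "wan": 0.16,
}


def video_cost_usd(model: str, seconds: float) -> float:
    """Per-second pricing: cost depends on total seconds, not clip count."""
    return round(FAL_PER_SECOND[model] * seconds, 2)


short_s = 150
costs = {model: video_cost_usd(model, short_s) for model in FAL_PER_SECOND}
cheapest = min(costs, key=costs.get)
hundred_experiments = round(costs["kling"] * 100, 2)
```

Even the cheapest hosted model in the table comes to $6.00 for a 150-second Short, so switching models alone does not get a hundred experiments into pocket-change territory.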

Plot twist: I'd solved this before, in a past life

Before AI ate my career, I shipped indie games. And indie game dev is a master class in faking expensive things for free, because you've got a $0 art budget and a build due Saturday. You don't buy motion — you engineer the feeling of motion: parallax scrolling layers, drifting backgrounds, snappy cuts, a little camera push. Cheap tricks, real game-feel.

Same energy, new domain. Why pay $10.50 for AI video when I can take one still image and add:

drift / Ken-Burns — slow pan + zoom, the still breathes;
parallax — split the frame into depth planes and slide them at different speeds (the background literally drifts behind a static subject);
cuts & transitions — rhythm beats AI motion for retention anyway.

All in ffmpeg. All free. That's the entire Part 3 of this series, and it's where most of the $10 goes to die. Spoiler: it does not look like slop.

(These stills don't move on the page — but every free effect is playing live in the effects gallery. Drift, parallax, rain, embers, glitch, all $0. Part 3 dissects how.)
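For a feel of how little a basic drift takes, here is a sketch that builds an ffmpeg command around the stock `zoompan` filter. This is not the studio's filtergraph (Part 3 covers the real ones); the function names, zoom amount, and output settings are illustrative assumptions. The command is only constructed here, not executed.

```python
from typing import List


def kenburns_filter(duration_s: float, fps: int = 30, width: int = 1080,
                    height: int = 1920, max_zoom: float = 1.2) -> str:
    """Build a zoompan filter string: slow centered push-in on one still."""
    frames = int(round(duration_s * fps))
    step = (max_zoom - 1.0) / max(frames, 1)  # zoom increment per output frame
    return (
        f"zoompan=z='min(zoom+{step:.6f},{max_zoom})'"
        f":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":d={frames}:s={width}x{height}:fps={fps}"
    )


def kenburns_command(image: str, out: str, duration_s: float) -> List[str]:
    """Argument list for ffmpeg; pass it to subprocess.run to render."""
    return ["ffmpeg", "-y", "-i", image,
            "-vf", kenburns_filter(duration_s),
            "-pix_fmt", "yuv420p", out]


vf = kenburns_filter(8.0)
cmd = kenburns_command("scene_01.png", "scene_01.mp4", 8.0)
```

The whole effect is one string, so it costs nothing to generate and is trivial to vary per scene.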

Exhibit B: the six-cent video

Killing AI video was step one. Step two was realizing Nano Banana isn't always the move. For a goofy "why do cats have fur" Short, I didn't need photoreal noir; I needed clean flat cartoon. Enter Flux Schnell at $0.003 per megapixel, roughly half a cent an image.

Here's that one, live:

🎬 Why do cats have fur? → youtube.com/shorts/FWtEJjeK_vI

And the receipts, straight from its manifest:

Stage	Provider	Cost
Script	local LLM	$0.00
Visuals (10 images)	`fal-flux-schnell`	$0.054
Motion (all scenes)	Ken-Burns (ffmpeg)	$0.00
Voice	`edge-tts` (neural)	$0.00
Sound FX + music	`fal-elevenlabs-sfx` + local bed	$0.0076
Save + Publish	ffmpeg / YouTube API	$0.00
TOTAL		≈ $0.06

From $10.50 → six cents. Same pipeline, different knobs. That's a ~175× cost cut, and it's the difference between "fun demo" and "I can run hundreds of these and let a bandit pick the winners." (Full cost teardown: Part 4.)
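The receipts can be checked in a few lines. This sketch sums the table above and recomputes the ratio; the dictionary keys are my own labels for the table rows.

```python
# Stage costs copied from the manifest table above (USD).
stage_costs = {
    "script": 0.0,
    "visuals": 0.054,      # 10 images via fal-flux-schnell
    "motion": 0.0,
    "voice": 0.0,
    "sound": 0.0076,
    "save_publish": 0.0,
}

total = round(sum(stage_costs.values()), 4)
per_image = round(stage_costs["visuals"] / 10, 4)

# The ~175x figure uses the rounded six cents; the unrounded total gives ~170x.
ratio_rounded = round(10.50 / round(total, 2))
ratio_exact = round(10.50 / total)
```

The unrounded total is $0.0616, so the cut is about 170× on exact numbers and about 175× on the rounded six cents. Either way the order of magnitude is the point.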

That $0.0076 line is quietly important, too: it's an AI sound layer — generated SFX plus a music bed ducked under the narration — and atmosphere is a big reason cheap doesn't read as slop. The how is in Part 3.

The gap I'm actually building into

After mapping the field, two things were suspiciously absent from every faceless-AI playbook:

Cost honesty. Everyone screenshots the $4M. Nobody publishes a per-second price table or admits their first video cost $10. So they never explain how to afford video #100.
Autonomy. "Just post consistently for 6 months" — cool, that's a full-time job done by hand. Nobody treats what to make next as a decision a system can learn: explore vs. exploit, a memory of what won, a verdict on every bet.

That's the thesis. Over the next six parts I'll build a channel that:

turns a one-line idea into a finished, well-directed vertical Short (Part 2),
moves nearly all motion off paid AI video onto free custom effects (Part 3),
drives cost per video from ~$10 toward pennies (Part 4),
remembers what worked via a per-channel journal + self-reflection (Part 5),
decides what to make next with a Thompson-sampling bandit over a falsifiable hypothesis (Part 6),
and runs itself on a schedule, grading each post 48–72h later (Part 7).

The learning loop is already showing its teeth. A batch of near-identical clips dumped in the same minute cannibalized itself (3–6 views each — brutal). Meanwhile one video — a real mathematician framed as a heretic, with a "this breaks reality" hook in the first two seconds — hit roughly 50× the channel's other Shorts. The rest of this series is the machine I'm building so that's a repeatable pattern, not a lucky roll.

It's all open source — and it's a live experiment

The whole studio is on GitHub — slope-studio (one letter from "slop", which, given the genre, is either a typo or a mission statement). Every line of code in this series lives there: the 7-stage pipeline, the free ffmpeg effects, the cost model, the bandit. Part 2 is the guided tour, with a one-command smoke test you can run with zero API keys.

And this isn't a retrospective with the numbers airbrushed in — it's a live experiment you can watch compound or faceplant in public. Every Short the system ships asks viewers to subscribe, because the whole point is watching an autonomous channel grow from zero. Consider it subscribing to the test harness.

What I'd tell another AI engineer

Takeaway: Treat content as a pipeline, not a craft. The instant every input — script, image, motion, voice, sound — is a function call with a measured cost, three superpowers unlock: you can drive unit cost toward zero, run hundreds of cheap experiments, and bolt a learning loop on top that decides which experiments to repeat. The folks making millions optimized the system and the volume. The folks getting deleted only had volume. The alpha is the system.

Next — Part 2: Idea → Published in 7 Stages. The actual architecture: every stage as an independent CLI subcommand, the runs/<id>/ artifact flow, a manifest that records measured cost per stage, and how a single line of text becomes an uploaded Short without me touching a video editor.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe (watch the experiment from zero): the Lobachevsky Short

Sources: techlusive · Business Standard · OutlierKit (AI-slop crackdown) · Miraflow (faceless explosion 2026). View/revenue figures are third-party estimates.
