Maksims Gavrilovs

Posted on Jun 12 • Originally published at dev.to

Zero to Autopilot, Part 7: Closing the Loop — the Channel That Runs Itself

#python #ai #automation #machinelearning

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 7 of 7 — the finale. Part 1 landscape · 2 pipeline · 3 free motion · 4 cost · 5 memory · 6 bandit. Now I remove myself from the loop.

Data status (Part 7): real-now. Metrics refreshed from YouTube on 2026-06-12: 24 measured videos, 1,742 total views, 48 likes, +10 subscribers, $5.04 measured production spend, 7 wins, 6 losses, and 11 neutral results. Small-channel data is noisy, but it is real enough to grade the loop honestly.

Everything's built. Now delete the operator.

Here's where six parts leave us. I can: turn an idea into a finished Short (Parts 2–3), for six cents (Part 4); remember every bet and score it against my own portfolio (Part 5); and decide what to make next with a bandit (Part 6). Each of those is a command I run. The last step is making me unnecessary: a scheduler that runs the whole cycle while I sleep.

There's exactly one thing that makes this hard, and it's not the AI. It's time.

The crux: measurement is deferred

A freshly published Short's metrics are meaningless. Views/day, retention, engagement — they don't stabilize for 48–72 hours. So you cannot write the loop as a straight-line script (ideate → produce → measure → learn), because between produce and measure there's a two-to-three-day wait, and during that wait the loop should be doing other useful things (producing the next bet, reflecting on older ones).

So the loop isn't a script. It's a state machine over time. Each tick, it asks one question: given the journal and the current time, what is the single most useful thing to do right now?

# studio/marketing/loop.py — the five possible actions
#   measure  →  one or more videos have matured; fetch their stats
#   learn    →  enough new measurements have accrued; reflect into strategy
#   ideate   →  the backlog is running low; generate fresh bets
#   produce  →  cadence allows another video; make the next backlog bet (budget-sized)
#   idle     →  nothing due (waiting on maturation or the produce cadence)

`plan()` — one tick, one decision

The whole engine is a pure function: plan(journal, now) → Plan. It returns the one due action, in priority order. Measuring matured videos comes first (that data unlocks everything else), then reflecting, then refilling the backlog, then producing:

from pydantic import BaseModel  # assumed: BaseModel here is pydantic's


class Plan(BaseModel):
    phase: str                      # cold-start | optimizing
    next: str = "idle"              # measure | learn | ideate | produce | idle
    measure_due: list[str] = []     # entry ids past the maturation window
    learn: bool = False             # enough new measurements to reflect?
    produce_entry: str = ""         # the bet to produce next (chosen by the bandit)
    produce_max_cost: float | None = None   # budget cap for that produce

And the core of the decision — note it's all time-driven off published_at and a few cadence knobs:

def plan(j, now=None):
    cfg = j.loop_config
    phase = "cold-start" if j.in_cold_start else "optimizing"

    # 1) measure: deployed videos past the maturation window, not yet measured
    measure_due = [e.id for e in j.deployed()
                   if _age_hours(e.published_at, now) >= cfg.maturation_hours]

    # 2) learn: count measurements newer than the last reflection
    new_measured = [e for e in j.measured()
                    if not j.last_learn_at or e.metrics.fetched_at > j.last_learn_at]
    learn = len(new_measured) >= cfg.learn_every

    # 3) produce/ideate cadence → pick the next bet with the bandit (Part 6)
    # ...priority: measure > learn > ideate (backlog low) > produce (cadence ok) > idle
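
The elided priority step can be sketched as a small pure function. This is a minimal illustration, not the project's code: `TickInputs` and `resolve_next` are hypothetical names, and the rule that `produce` also needs a non-empty backlog is an assumption. Only the priority order itself (measure > learn > ideate > produce > idle) comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class TickInputs:
    """Hypothetical summary of what plan() has already computed for this tick."""
    measure_due: list[str] = field(default_factory=list)  # matured, unmeasured ids
    learn: bool = False       # enough new measurements to reflect
    backlog_size: int = 0     # planned bets not yet produced
    backlog_min: int = 2      # mirrors LoopConfig.backlog_min
    cadence_ok: bool = False  # produce cadence and daily cap allow a video


def resolve_next(t: TickInputs) -> str:
    """Return the single due action: measure > learn > ideate > produce > idle."""
    if t.measure_due:
        return "measure"
    if t.learn:
        return "learn"
    if t.backlog_size < t.backlog_min:
        return "ideate"
    # Assumption: producing requires at least one planned bet in the backlog.
    if t.cadence_ok and t.backlog_size > 0:
        return "produce"
    return "idle"
```

Because each tick returns exactly one action, a video that matured while the backlog is also low gets measured first, and ideation waits for the next tick.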

The knobs are all in one config — maturation window, produce cadence, how often to reflect, when to refill the backlog:

from pydantic import BaseModel  # assumed: BaseModel here is pydantic's


class LoopConfig(BaseModel):
    maturation_hours: float = 60.0          # wait ~2.5 days before measuring
    min_hours_between_produces: float = 20.0  # ≈ 1 video/day
    daily_produce_cap: int = 2
    learn_every: int = 3                    # reflect after 3 new measurements
    backlog_min: int = 2                    # ideate when planned bets drop below this
    select: str = "bandit"                  # next-bet picker (Part 6)
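
To make the time knobs concrete, here is a hedged sketch of the two checks they imply. The helper names (`age_hours`, `is_mature`, `cadence_allows_produce`) and the `produce_times` input are hypothetical, and reading `daily_produce_cap` as a rolling 24-hour window is an assumption. The real loop may count calendar days instead.

```python
from datetime import datetime, timedelta, timezone


def age_hours(then: datetime, now: datetime) -> float:
    """Hours elapsed between two timezone-aware datetimes."""
    return (now - then).total_seconds() / 3600.0


def is_mature(published_at: datetime, now: datetime,
              maturation_hours: float = 60.0) -> bool:
    """A video is measurable once it is at least maturation_hours old."""
    return age_hours(published_at, now) >= maturation_hours


def cadence_allows_produce(produce_times: list[datetime], now: datetime,
                           min_hours_between: float = 20.0,
                           daily_cap: int = 2) -> bool:
    """True if both the spacing rule and the (assumed rolling) daily cap pass."""
    if not produce_times:
        return True
    # Spacing rule: the most recent produce must be old enough.
    if age_hours(max(produce_times), now) < min_hours_between:
        return False
    # Cap rule: count produces inside the last 24 hours.
    last_24h = [t for t in produce_times if age_hours(t, now) < 24.0]
    return len(last_24h) < daily_cap
```

For example, with the defaults a video published 59 hours ago is not yet measurable, and a produce 5 hours ago blocks the next one until the 20-hour gap has passed.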

The driver: `tick` and `autopilot`

plan() decides; two CLI commands act. studio marketing tick runs exactly one due action and exits — perfect for a cron job. studio marketing autopilot loops ticks for a session. Put tick on a schedule (cron, a systemd timer, /loop) and the channel runs itself:

# one cron line ≈ a self-running channel
0 */6 * * *  cd /path/to/slope-studio && studio marketing tick --channel pilot

Every 6 hours it wakes, asks plan() what's due, does that one thing — measure a matured video, reflect, ideate, or produce the next bandit-picked bet — and goes back to sleep. The deferred-measurement problem disappears because the state machine simply doesn't measure until published_at + maturation_hours, and spends the wait producing and reflecting instead.

A reproducibility detail that matters here: the bandit's RNG is seeded from journal state, so tick called twice in the same state makes the same decision. No double-producing, no races.
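
One way to get that property, sketched under assumptions: hash a canonical snapshot of the journal and use the digest as the seed. `rng_for_state` and `pick_bet` are hypothetical names, the state is assumed to be JSON-serializable, and the uniform `choice` is only a stand-in for the bandit draw from Part 6.

```python
import hashlib
import json
import random


def rng_for_state(state: dict) -> random.Random:
    """Derive a deterministic RNG from a JSON-serializable snapshot of state."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed)


def pick_bet(state: dict, candidates: list[str]) -> str:
    """Stand-in for the bandit draw: same state and candidates, same pick."""
    # Sorting the candidates removes any dependence on list order.
    return rng_for_state(state).choice(sorted(candidates))
```

Two ticks that read the same journal state then agree on the next bet. Whether the project also takes a lock against overlapping cron runs is not stated here, so treat the seed as covering repeatability only.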

Even the channel setup is automated

One loose end: a new channel needs a brand. So that's a Lego block too: studio brand <spec.json> generates a full kit (banner, profile avatar, a transparent watermark logo, plus keywords and an SEO description) into runs/_brand/<slug>/. The generated art is text-free, and the wordmark is composited on top with Pillow, inside the safe area. Zero-to-channel, including the identity, is scriptable.

Internal agents vs skill-based orchestration

There are two ways to run this kind of loop.

The first is internal agent orchestration: the system owns the whole state machine, calls its own tools, and treats every step as part of one product. That is what studio marketing tick does. It knows the journal schema, the maturation window, the budget config, and the next due action. It is tight, reproducible, and cron-friendly.

The second is skill-based orchestration: the same work is decomposed into portable operating instructions that any capable external LLM can follow — Claude, Codex, Gemini, or whatever agent shell you prefer. In that mode, the skill is the durable interface: measure this channel, learn from the journal, pick a bet, deploy it, report the result. The external model brings reasoning, writing, critique, and research; the CLI remains the deterministic I/O layer. That is less sealed than a pure internal agent, but more flexible: you can swap models, run the same marketing workflow from different agent environments, and keep the operational knowledge outside any one vendor's hidden prompt.

In practice I want both. The internal autopilot handles boring scheduled execution. The skills let a stronger external agent step in for strategy, critique, and one-off investigation without rewriting the studio.

The first thing autonomy taught me: cheap + automated = automated garbage

The day the loop ran end-to-end with no one watching, it published a video that was technically fine and still felt wrong. The frames moved. The narration lined up. The audio ducked under the voice. It had all the machinery from the first six parts.

But the story was weak.

Some early Shorts were raw in a way that only became obvious after watching a batch together: not enough concrete explanation, inconsistent emotional arc, pretty frames carrying a script that didn't quite earn the viewer's minute. I had spent six articles making production cheap, and the first lesson of autonomy was blunt: a cheap content machine can manufacture weak stories faster.

Effects are polish; content is the product. A bandit picks a good topic, but "topic" is not a script. The writing still has to deliver a real fact and a real feeling, and nothing in the pipeline was checking for that.

So I added a new stage between script and spend: a content critic (stages/critic.py). It's an LLM-as-judge that reads the scenario and scores it on four things before a cent goes to image or video generation:

# studio/models.py — the bar a scenario has to clear
CRITIC_CRITERIA = {
    "topic_revealed": "viewer comes away KNOWING the thing",
    "fact_explained": "a concrete fact/idea/event is STATED and EXPLAINED",
    "informative_interesting": "teaches something non-obvious with a curiosity gap",
    "emotional_payoff": "lands a clear emotion",
}

Each criterion returns pass/fail, a 1–5 score, one specific note, and revision_notes the writer can act on. The important part is not the prompt. It's where the prompt sits in control flow:

for attempt in range(retries + 1):
    verdict = critic(script)
    if verdict.passed:
        return script
    script = write_again(revision_notes=verdict.revision_notes)

The real code keeps the best-scoring attempt, caps retries with --critic-retries, and can either proceed-best (--critic on) or abort (--critic strict). No framework, no infinite loop, just a bounded script → critic → rewrite gate inside studio run. The headless cron inherits it by default, which is the entire point: the gate has to live where there is no human in the seat.
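
A minimal sketch of such a bounded gate, assuming a critic that returns a pass flag, an aggregate score, and revision notes. `Verdict`, `critic_gate`, and the `critic` and `rewrite` callables are hypothetical stand-ins rather than the code in stages/critic.py. The `strict` flag loosely mirrors the proceed-best versus abort behavior described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    score: float          # assumed aggregate, e.g. mean of the 1 to 5 scores
    revision_notes: str


def critic_gate(script: str,
                critic: Callable[[str], Verdict],
                rewrite: Callable[[str, str], str],
                retries: int = 2,
                strict: bool = False) -> str:
    """Run at most retries + 1 critic passes; keep the best-scoring attempt."""
    best_script, best_verdict = None, None
    for attempt in range(retries + 1):
        verdict = critic(script)
        if verdict.passed:
            return script
        if best_verdict is None or verdict.score > best_verdict.score:
            best_script, best_verdict = script, verdict
        if attempt < retries:  # no rewrite after the final attempt
            script = rewrite(script, verdict.revision_notes)
    if strict:
        raise RuntimeError("script failed the critic after all retries")
    return best_script  # proceed with the best failing attempt
```

The loop is bounded by construction, so a stubborn script costs at most `retries + 1` critic calls before the gate either proceeds with the best attempt or aborts.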

The rewrite that made the failure legible

The cleanest example was Fermat.

I had an older Short about Fermat's Last Theorem: the margin note whose claim took 358 years to prove. It had the right ingredients (Fermat's taunt, Andrew Wiles, a famous unsolved problem), but the story was soft. It gestured at the myth more than it explained the hook.

The critic made the problem concrete:

{
  "fact_explained": "2/5 — Wiles' proof is mentioned, but the fix and concepts are not explained",
  "emotional_payoff": "2/5 — highlight Wiles' despair after the fatal flaw, then the triumph"
}

That is a useful failure. "Make it better" is vague. "State the equation, explain the 358-year gap, show the fatal hole, then land Wiles alone finding the fix" is executable.

So I re-made the Short with the same basic media path but a stronger scenario: Fermat's Last Theorem. The new narration opens with the actual equation shape, names the margin note, gives Wiles the seven-year attic beat, and spends the payoff on the near-collapse of the proof:

"In 1994 he unveiled the proof. Then a referee found a fatal hole in it. For a year it looked dead, until, alone, Wiles suddenly saw how to fix it."

The rewrite is still an early read, so I do not mix it into the mature cohort dashboard below. But the direction was not subtle:

| Version | Video ID | Views | Likes | Cost | Result |
|---|---|---|---|---|---|
| softer story | `F3STKw8Nlr8` | 12 | 0 | $0.776 | P29, neutral |
| critic-guided rewrite | `rozAXRztijQ` | 119 | 3 | $0.208 | P92, win |

Roughly 10x the views, some actual likes, and less money spent because the rewrite reused the cheap path instead of treating the whole thing as a fresh premium render. That's the kind of result I want from an eval: not an abstract "quality score," but a concrete edit that changes the video and the market response.
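
The same two rows can be read as cost per view. That metric is derived here and is not one the journal is said to track. `cost_per_view` is a hypothetical helper and the inputs are the table values.

```python
def cost_per_view(cost_usd: float, views: int) -> float:
    """Production dollars spent per view; undefined for zero views."""
    if views <= 0:
        raise ValueError("cost per view needs at least one view")
    return cost_usd / views


softer = cost_per_view(0.776, 12)     # about $0.065 per view
rewrite = cost_per_view(0.208, 119)   # about $0.0017 per view

view_ratio = 119 / 12                 # about 9.9x the views
efficiency_ratio = softer / rewrite   # about 37x cheaper per view
```

With only two videos this is an illustration of the arithmetic, not a trend.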

There is a second lesson hiding inside this one. An eval can expose weak output, but it cannot author the fix by itself. On another video, the critic made the writer-model choice obvious: The Universe Has No Edge failed with a cheap writer, then passed after switching to a stronger writer model. The cost floor and the quality floor live in different places. Keep visuals and motion cheap, but do not cheap out on the script when the whole video depends on it.

The results

I refreshed the YouTube measurements on 2026-06-12. The journal had 24 measured videos, 4 planned bets, 1,742 total views, 48 likes, 10 new subscribers, 0 comments, and $5.04 of measured production spend. The average measured cost was about $0.21 per video, with 7 wins, 11 neutral results, and 6 losses by the channel's own portfolio-relative scoring.

The top of the portfolio is not one format. That's the useful fact. The loop found wins in philosophy, physics, math, and even poetry, but the winners all had a sharper emotional or conceptual promise than the flops.

| Rank | Video | Views | Likes | Retention | Cost | Percentile |
|---|---|---|---|---|---|---|
| 1 | Diogenes and the rich man's spotless palace | 268 | 9 | 74.29% | $0.207 | P100 |
| 2 | Black hole information paradox | 170 | 4 | n/a | $0.234 | P96 |
| 3 | Fermat's Last Theorem | 119 | 3 | 77.68% | $0.208 | P92 |
| 4 | Rubaiyat — Awake! | 132 | 15 | 61.40% | $0.084 | P88 |
| 5 | The Universe Has No Edge | 182 | 1 | 50.07% | $0.581 | P83 |

The best result was not the most expensive one. Diogenes cost $0.207 and landed P100. Fermat's rewrite cost $0.208 and landed P92. Rubaiyat Awake cost $0.084 and landed P88. The signal is not "spend more." The signal is that the idea, story shape, and hook have to earn their minute before the pipeline spends anything.

There is also a weird failure mode I do not want to over-explain: some videos appear to get no initial push at all. A few more are near-zero after several days: Rabies has 2 views, Population of Italy has 4, and the first Galois version has 5. I cannot tell from this data whether that is a metadata problem, a topic problem, a batch-upload penalty, a Shorts distribution quirk, or simply YouTube deciding not to kick-start those uploads. The honest takeaway is that a small-channel autopilot is not only learning audience taste; it is also learning around platform distribution randomness.

That changes how I read losses. A video with 150 views and weak engagement is a content lesson. A video with 0 views is partly a distribution lesson. The loop can still score both, but the strategy should treat them differently: content critique for videos that got a chance, packaging and cadence experiments for videos that never entered the room.
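
That split could be expressed as a tiny triage rule. This is a sketch only: `triage_loss` is a hypothetical name, and the 50-view threshold is an assumed placeholder, since the text gives examples (150 views versus near-zero) rather than a cutoff.

```python
def triage_loss(views: int, min_views_for_content_signal: int = 50) -> str:
    """Label a losing video by which lesson it can plausibly teach.

    The threshold is an assumption for illustration, not a measured value.
    """
    if views < min_views_for_content_signal:
        # Too few impressions to judge the script: test packaging and cadence.
        return "distribution"
    # The video got a chance and still lost: send it to content critique.
    return "content"
```

Under this placeholder threshold, Rabies (2 views), Population of Italy (4), and the first Galois version (5) would all be routed to packaging and cadence experiments rather than script critique.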

The whole arc, in one breath

A faceless AI channel is a search problem. Make the unit cost trivial (free motion + right-sized models → six cents), record every video as a falsifiable bet with measured cost and portfolio-relative score, let a bandit exploit what wins while exploring the rest, and wrap it in a time-aware state machine that runs the cycle unattended. None of it needed a vector DB, a fine-tune, or a render farm — just boring architecture, honest cost accounting, and a willingness to let the data, not the ego, pick the next video.

What I'd tell another AI engineer

Takeaway: When an agent's feedback is delayed, don't model the workflow as a pipeline — model it as a state machine over time whose tick asks "what's the single most useful thing to do now?" Deferred reward (here, 48–72h of metric maturation) is the norm in real systems, not the exception; a plan(state, now) → one action function handles it cleanly, stays cron-friendly, and (seeded from state) stays reproducible. Automate the boring 90%, be loud about the 10% you can't, and let the loop compound.

That's the series. Zero to autopilot: a channel that writes, renders, publishes, scores, and decides — for cents, on a schedule. It's all open source; go break it, fork it, or beat it.

▶ Live effects gallery: dasein108.github.io/slope-studio
⭐ Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe to watch the experiment continue: the channel
