
Maksims Gavrilovs


Zero to Autopilot, Part 5: Teaching a YouTube Channel to Remember

Series: Zero to Autopilot — Building a Self-Improving AI Media Channel. Part 5 of 7. Part 1 landscape · Part 2 pipeline · Part 3 free motion · Part 4 cost collapse, which together turn an idea into a published Short for six cents. Now the back half: giving the channel a brain. This part is memory; Part 6 is deciding.

Data status (Part 5): real-now (qualitative). The memory architecture and the patterns it has already learned are real and shown below. The quantitative virality scores are defined here but reported with real numbers in Part 7, after the data matures (≥1 week).

A frame from the channel's breakout winner — the video the memory system is built to learn from and repeat.

Cheap content is a search problem

Here's where Part 4 leaves us: I can make a hundred videos for six bucks. That sounds great until you realize it just moves the hard problem. Making videos was never the bottleneck — knowing which videos to make is. A hundred random Shorts is a hundred coin flips. To make it a search, the channel needs to remember what it tried and what happened.

So I gave it a memory — and I modeled it on how human memory actually splits:

  • Semantic memory — the durable, generalized lessons ("tragic-genius math stories work").
  • Episodic memory — the specific events ("on June 3 I posted the Lobachevsky one and it hit 50×").
  • Retrieval — pulling the relevant episodes back up when facing a new decision.

In the code that's three pieces: a long-term Strategy, an episodic Entry[] ledger, and a recall() function. All of it lives in a per-channel journal (runs/_marketing/<channel>/journal.json + a human-readable .md).
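A minimal sketch of how those three pieces could sit in one JSON file. It uses stdlib dataclasses as a stand-in for the pydantic models shown later; the `strategy` and `entries` keys and the `to_json` / `from_json` / `save` names are assumptions made for illustration. Only `measured()` is a name the article's own `recall()` relies on.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Strategy:                       # semantic memory: durable lessons
    niche: str = ""
    winning_patterns: list[str] = field(default_factory=list)
    losing_patterns: list[str] = field(default_factory=list)


@dataclass
class Entry:                          # episodic memory: one bet, one outcome
    idea: str
    assumption: str = ""
    virality: Optional[float] = None  # None until metrics arrive


@dataclass
class Journal:
    strategy: Strategy = field(default_factory=Strategy)
    entries: list[Entry] = field(default_factory=list)

    def measured(self) -> list[Entry]:
        """Entries that already have a market verdict."""
        return [e for e in self.entries if e.virality is not None]

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses and lists
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "Journal":
        raw = json.loads(text)
        return cls(
            strategy=Strategy(**raw["strategy"]),
            entries=[Entry(**e) for e in raw["entries"]],
        )

    def save(self, path: Path) -> None:
        # e.g. runs/_marketing/<channel>/journal.json
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(self.to_json(), encoding="utf-8")
```

Keeping the whole journal in one plain file means it can be diffed, hand-edited, and versioned alongside the runs it describes.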

Every video is a falsifiable bet

The unit of episodic memory is the Entry, and its most important design choice is that a video isn't just content — it's a hypothesis. Before anything renders, an entry states what it believes and how it'll be judged:

from pydantic import BaseModel


class Entry(BaseModel):
    idea: str
    hook: str = ""
    assumption: str = ""   # WHY we think this goes viral  ← the falsifiable claim
    goal: str = ""         # the target, e.g. ">=P75 virality vs the channel's portfolio"
    theme: str = ""
    tags: list[str] = []
    explore: bool = True   # an exploration bet, or exploiting a known winner?

These aren't hypothetical — here are three real entries from my channel's journal, each a stated bet:

{ "idea": "The Madman Who Counted Infinity: Cantor",
  "hook": "He proved some infinities are BIGGER than others — and it drove him to the asylum.",
  "assumption": "Counterintuitive 'sizes of infinity' + tragic-genius arc = the exact Lobachevsky formula that hit 50x.",
  "theme": "infinity / set theory", "tags": ["math-mystery","heretic-format","explore"] }

{ "idea": "The Equation Written the Night Before a Duel: Galois",
  "hook": "A 20-year-old invented modern algebra in one night — then died in a duel at dawn.",
  "assumption": "Ticking-clock tragedy + 'one night of genius' is an irresistible curiosity gap." }

{ "idea": "One Sentence That Destroyed All of Mathematics: Russell",
  "assumption": "'One sentence breaks everything' is a pure curiosity gap; paradoxes are trending." }

Writing the assumption down before publishing is the whole trick. When the numbers land, I'm not asking "did it do well?" but "was my stated assumption right?" That's the difference between a content diary and a science.

And it pays off most when a bet is wrong. I ran two videos in the same "deadpan academic humor" lane: one on the absurd, straight-faced "how mathematicians catch a lion," and one on relatable "which scientist are you?" lab-personality bait. The first landed; the second didn't. Because both assumptions were on the record, the lesson came out precise instead of vague: it isn't that "humor works," it's that the absurd, specific method is the hook and broad relatability is not. Two falsifiable bets turned a hunch into a rule the next idea inherits — which is exactly what the reflection step (below) writes down.
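One way to make "bet first, render second" mechanical is a small guard that refuses to proceed while the claim or the bar is blank. This is a sketch under assumptions: the `Bet` class and `missing_bet_fields` helper are hypothetical, and the article does not say whether the real pipeline enforces anything like this.

```python
from dataclasses import dataclass


@dataclass
class Bet:  # hypothetical stand-in for the hypothesis fields of an Entry
    idea: str
    assumption: str = ""
    goal: str = ""


def missing_bet_fields(bet: Bet) -> list[str]:
    """Names of the hypothesis fields that are still blank."""
    required = ("idea", "assumption", "goal")
    return [name for name in required if not getattr(bet, name).strip()]
```

A render step could call `missing_bet_fields(bet)` and stop if the list is non-empty, so no video ships without a claim that can later be judged right or wrong.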

The entry also remembers how it was made

Each entry doesn't just record the bet and the outcome — it captures its own production telemetry, pulled from the run manifest (the measured-cost ledger from Part 2 finally pays a second dividend):

    # ...more fields on Entry (continued)
    cost_usd: float = 0.0          # measured $ to produce
    tier: str = ""                 # free | cheap | balanced | premium
    video_model: str = ""          # kling | ltx | … | kenburns
    animators: list[str] = []      # distinct animators across scenes
    effects: list[str] = []        # fx + atmosphere used
    n_scenes: int = 0

So later I can ask not just "do heretic-mathematician stories win?" but "do the Flux-Schnell, kinetic-heavy, 60-second ones win?" The memory spans content and craft.

That join turns out to matter more than I expected. Once cost, model, effects, music provider, SFX provider, and market outcome sit on the same row, the channel can ask craft questions too: did the $0.20 music bed actually earn its keep, or did a free synth drone do the job? Did the video win because of the topic, the sound, the animation style, or because the script finally had a real story? The first version of this was just "latest metrics." I later added age-bucket snapshots (1d, 3d, 7d, 14d, 30d) because comparing a one-day upload to a thirty-day upload is lying with extra steps. The real slice-and-compare receipts stay in Part 7; the important design point here is that the memory row is no longer just an idea log. It's the place where production choices meet market feedback.
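The age-bucket idea can be sketched as two small helpers: one maps a video's age to the largest bucket it has fully reached, the other compares videos only at a shared bucket. The function names and the `{video_id: {bucket_days: views}}` layout are assumptions for illustration, not the journal's actual schema.

```python
from typing import Optional

AGE_BUCKETS_DAYS = (1, 3, 7, 14, 30)  # the buckets named in the text


def bucket_for(age_days: float) -> Optional[int]:
    """Largest bucket the video has fully reached, or None if under a day old."""
    reached = [b for b in AGE_BUCKETS_DAYS if age_days >= b]
    return max(reached) if reached else None


def comparable_views(snapshots: dict[str, dict[int, int]], bucket: int) -> dict[str, int]:
    """Views per video at the same age bucket; videos lacking it are skipped."""
    return {vid: snaps[bucket] for vid, snaps in snapshots.items() if bucket in snaps}
```

Skipping videos that have not reached the bucket yet is the point: a three-day-old upload simply does not take part in a 7d comparison instead of being judged on partial data.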

Scoring virality — against yourself

When results come in, each entry gets a virality score. The composite is deliberately simple and weighted toward what "viral" actually feels like — velocity — while guarding against cheap reach that doesn't convert:

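import math

W_VELOCITY, W_RETENTION, W_ENGAGEMENT, W_SUBS = 0.5, 0.2, 0.2, 0.1

def virality(m):
    # subscribers gained per view; the `subs_conv` field name on `m` is an assumption
    subs_conv = getattr(m, "subs_conv", 0.0) or 0.0
    return (
        W_VELOCITY  * math.log10(m.velocity + 1)          # views/day, log-damped
        + W_RETENTION  * (m.retention or 0)/100           # % watched, scaled to 0-1
        + W_ENGAGEMENT * min((m.engagement or 0) * 20, 1.0)   # ~5% engagement saturates
        + W_SUBS       * min(subs_conv * 50, 1.0)         # ~2% sub-rate saturates
    )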

But an absolute score is meaningless for a small channel — 800 views might be a smash or a dud depending on your baseline. So the score that decides anything is relative to the channel's own portfolio:

def relativize(scores):   # percentile rank within THIS channel's history
    n = len(scores)       # portfolio size; an empty list yields an empty result
    return [round(100.0 * sum(s <= x for s in scores)/n, 1) for x in scores]

def outcome(percentile, cold_start):
    if cold_start:            return "cold-start"
    if percentile >= 75:      return "win"
    if percentile <= 25:      return "loss"
    return "neutral"

A video is a win if it lands in the top quartile of my own videos, a loss in the bottom quartile. Self-relative grading means the loop keeps working whether the channel does 50 views or 50,000 — it's always chasing better than my median, which is exactly what compounding growth needs. (The real percentile numbers go public in Part 7.)
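To see the whole scoring path in one place, here is a worked example that runs four videos through the composite, the percentile rank, and the verdict. The metric values are invented for illustration and are not the channel's data; the `Metrics` container and its `subs_conv` field are assumptions, while the weights and formulas follow the article.

```python
import math
from dataclasses import dataclass
from typing import Optional

W_VELOCITY, W_RETENTION, W_ENGAGEMENT, W_SUBS = 0.5, 0.2, 0.2, 0.1


@dataclass
class Metrics:                    # hypothetical container; units are assumptions
    velocity: float               # views per day
    retention: Optional[float]    # average % watched, 0-100
    engagement: float             # interactions / views, as a fraction
    subs_conv: float              # subscribers gained / views, as a fraction


def virality(m: Metrics) -> float:
    return (
        W_VELOCITY * math.log10(m.velocity + 1)
        + W_RETENTION * (m.retention or 0) / 100
        + W_ENGAGEMENT * min(m.engagement * 20, 1.0)
        + W_SUBS * min(m.subs_conv * 50, 1.0)
    )


def relativize(scores: list[float]) -> list[float]:
    n = len(scores)
    return [round(100.0 * sum(s <= x for s in scores) / n, 1) for x in scores]


def outcome(percentile: float, cold_start: bool = False) -> str:
    if cold_start:
        return "cold-start"
    if percentile >= 75:
        return "win"
    if percentile <= 25:
        return "loss"
    return "neutral"


# Invented portfolio of four videos
portfolio = [
    Metrics(velocity=9, retention=40, engagement=0.01, subs_conv=0.0),
    Metrics(velocity=99, retention=60, engagement=0.03, subs_conv=0.01),
    Metrics(velocity=999, retention=80, engagement=0.05, subs_conv=0.02),
    Metrics(velocity=0, retention=None, engagement=0.0, subs_conv=0.0),
]
scores = [virality(m) for m in portfolio]      # about 0.62, 1.29, 1.96, 0.0
percentiles = relativize(scores)               # [50.0, 75.0, 100.0, 25.0]
verdicts = [outcome(p) for p in percentiles]   # neutral, win, win, loss
```

Two things stand out. The log damping means a 100x jump in velocity (9 to 999 views/day) adds only 1.0 to the score, so the other three terms still matter. And because the rank counts ties and the video itself (`s <= x`), a four-video portfolio labels two videos as wins; the quartile cut-offs only behave like true quartiles once the history is longer.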

Virality is the post-publish eval — a verdict from the market, after the fact. It turns out to have a mirror image: a pre-spend eval that judges a scenario before a cent is spent — a content critic that asks "does this script actually reveal a fact and land a feeling?" and reworks it if not. Two judges, two timings: one on the idea after the audience sees it, one on the script before the camera rolls. The pre-spend critic earns its own story in Part 7 — it exists because the autopilot, left alone, cheerfully published something hollow.

Recall: pulling up the relevant past

When the channel is about to decide what to make next (Part 6), it shouldn't reason from its entire history — it should pull the episodes relevant to the current direction. That's recall(), and I kept it deliberately dependency-free: relevance is lexical token-overlap, ties broken by virality, so a relevant winner outranks a relevant flop:

def recall(j, query, k=6):
    """Top-k measured episodes most relevant to `query`, best first.
    Ties broken by virality, so a relevant winner outranks a relevant flop."""
    q = _tokens(query)
    scored = [(_relevance(q, _episode_tokens(e)), e.virality or 0.0, e)
              for e in j.measured()]                 # only measured bets have a lesson
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)   # by relevance, then virality
    return [e for rel, _, e in scored[:k] if rel > 0.0]

The seam is intentional — you could swap in embeddings here — but lexical works, costs nothing, and runs offline. The default in this whole project is "free and local unless paying clearly wins."
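The tokenizer and relevance helpers behind `recall()` are not shown above. One plausible shape, assuming plain word sets and Jaccard overlap, is sketched here; the names `tokens` and `relevance`, the stopword list, and the choice of Jaccard are all assumptions, not the repo's implementation.

```python
import re

_WORD = re.compile(r"[a-z0-9]+")
_STOP = {"the", "a", "an", "of", "and", "to", "in", "that", "who"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with a few common stopwords removed."""
    return {w for w in _WORD.findall(text.lower()) if w not in _STOP}


def relevance(query: set[str], episode: set[str]) -> float:
    """Jaccard overlap: shared tokens / all tokens; 0.0 if either side is empty."""
    if not query or not episode:
        return 0.0
    return len(query & episode) / len(query | episode)
```

For example, `relevance(tokens("The paradox of infinity"), tokens("infinity / set theory paradox"))` shares two of four distinct tokens and scores 0.5, while an episode with no shared tokens scores 0.0 and is filtered out by the `rel > 0.0` check in `recall()`. An embedding-based variant would replace only `relevance`, which is the seam the article describes.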

Reflection: turning outcomes into strategy

The last piece closes the loop. After a few new videos are measured, a reflect() step feeds the scored bets to an LLM and asks it to update the long-term strategy — what's winning, what's losing, what to try next:

class Strategy(BaseModel):
    niche: str = ""
    current_direction: str = ""
    winning_patterns: list[str] = []
    losing_patterns: list[str] = []
    next_seeds: list[str] = []        # concrete idea seeds for the next ideation

This isn't aspirational — it's the actual current strategy in my channel's journal right now, rewritten by the LLM reflecting on real outcomes:

"niche": "math & physics mystery — rebels, paradoxes, forbidden knowledge (anime-noir visuals)",
"winning_patterns": [
  "Outsider-genius figures, mysticism, and high personal stakes (early death, divine inspiration) in math/physics",
  "Intellectual shock + curiosity gaps framed around 'everything breaking' or a foundational paradox",
  "Absurdist, deadpan academic humor rooted in one specific bizarre concept (mathematicians hunting a lion)",
  "Highly active, vivid, grand imagery in short poetic forms — not contemplative or melancholic ones"
],
"losing_patterns": [
  "Contemplative, melancholic, abstract poetry that lacks active imagery and a dramatic hook",
  "Pure science-horror missing the 'mystery / rebel / paradox' element central to the niche",
  "Generic 'relatable academic humor' that isn't rooted in a truly absurd, deadpan concept",
  "Historical mysteries lacking an immediate, shocking, or deeply personal angle"
]

The important thing isn't the list, it's that the list moved. The very first lesson this loop ever recorded was the cat-anatomy flop from Part 1: don't batch-dump near-identical clips (that series cannibalized itself at three-to-six views each). Everything above is what it has reflected its way toward since — through the math-hero winners, then a deliberate push outside the core into deadpan humor and a run of poetry reels. Look at that first losing pattern: "melancholic poetry that lacks active imagery." The loop learned that from my own poetry experiments underperforming, and wrote itself a rule about it. That's the system caught in the act of learning, not a strategy I typed in.

There's a heuristic fallback too (top and bottom performers by score) so reflection still works with no LLM key, but with one the lessons get sharper and feed straight back into the next idea. reflect() writes Strategy; ideation (Part 6) reads it. The snake eats its tail, and gets smarter each lap.
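The no-LLM fallback is described only as "top and bottom performers by score". A minimal sketch of that idea, assuming each measured bet carries a theme and a percentile, might look like this; `ScoredBet`, `heuristic_reflect`, and the reuse of the 75/25 cut-offs are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScoredBet:  # hypothetical stand-in for a measured Entry
    theme: str
    percentile: float


def heuristic_reflect(bets: list[ScoredBet], top_n: int = 3) -> tuple[list[str], list[str]]:
    """Return (winning themes, losing themes) with no LLM involved."""
    ranked = sorted(bets, key=lambda b: b.percentile, reverse=True)
    # dict.fromkeys de-duplicates themes while keeping rank order
    winners = list(dict.fromkeys(b.theme for b in ranked if b.percentile >= 75))
    losers = list(dict.fromkeys(b.theme for b in reversed(ranked) if b.percentile <= 25))
    return winners[:top_n], losers[:top_n]
```

The output is cruder than an LLM's phrasing, since it can only name themes and cannot explain why they worked, but it keeps `winning_patterns` and `losing_patterns` populated so ideation always has something to read.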

What I'd tell another AI engineer

Takeaway: If you want a system that improves, make every action a falsifiable bet recorded before the outcome: the idea, the why, and the bar to clear. Split memory into a durable strategy, an episodic ledger, and cheap retrieval, mirroring human memory, and score outcomes relative to the agent's own history so the loop is scale-invariant. Capture production telemetry alongside results so the agent can learn craft, not just content. None of this needs a vector DB or a fine-tune: a JSON ledger, a weighted score, token-overlap recall, and one reflection prompt already close the loop.


Next — Part 6: The Bandit. Memory tells the channel what worked; now it has to decide what to try next, balancing exploiting known winners against exploring new bets. I'll wire up a warm-started Thompson-sampling bandit over theme+tags — the actual explore/exploit engine that picks the next video.

Live effects gallery: dasein108.github.io/slope-studio
Star the repo: github.com/dasein108/slope-studio
🔔 Subscribe to watch the experiment grow from zero: the Lobachevsky Short
