I recently started my YouTube channel.
And I got stuck choosing which video to make next, so instead of trusting pure intuition, I built a tiny system that generates video concepts, generates synthetic audience personas, and asks those personas what they would actually click on and keep watching.
This is not a serious prediction engine.
It is a fun, weird, very unfinished project that became possible because LLMs are now good enough to turn messy notes into structured artifacts fast.
And that is exactly why I like it.
GitHub link, if you want to check out the code immediately
Why I built it
I had a painfully familiar creator problem: too many possible ideas, not enough production time, and no reliable way to decide which one deserved attention first.
That kind of decision is usually made with a mix of taste, mood, vague audience intuition, and procrastination disguised as "research."
I wanted something slightly more explicit.
At some point I realized that the core YouTube loop is conceptually simple:
- A person sees a feed of options.
- They decide whether to click.
- If they click, they decide whether to keep watching.
That is obviously not the full platform, but it is enough to ask an interesting question:
What happens if I build a tiny synthetic version of that loop and use it before production?
LLMs make this much more practical than it would have been even a year or two ago. They are still unreliable in many ways, but they are very good at:
- turning rough notes into candidate ideas
- turning audience descriptions into plausible personas
- (most important) roleplaying reactions with enough specificity to make tradeoffs visible
That combination is powerful for prototyping systems quickly.
Also: this project was fun. That matters. I think we should build more things just because new tools unlock weird experiments that were previously too annoying to attempt.
What the project actually does
The repo is a small Python pipeline with a few CLI commands.
At a high level it does three things.
1. Generate personas
The first step reads source documents about the channel and the audience and produces structured personas.
These are not just demographic buckets. The goal is to create people with context:
- job
- location
- current situation
- media habits
- motivations
- dislikes
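To make that concrete, here is a minimal sketch of what such a persona record might look like. The field names and the `summary` helper are my assumptions for illustration, not the repo's actual schema:

```python
# Hypothetical persona record; field names are assumptions, not the repo's schema.
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str
    job: str
    location: str
    current_situation: str
    media_habits: list[str] = field(default_factory=list)
    motivations: list[str] = field(default_factory=list)
    dislikes: list[str] = field(default_factory=list)

    def summary(self) -> str:
        # Compact one-liner that could be passed into a simulation prompt.
        return f"{self.job} in {self.location}; cares about {', '.join(self.motivations)}"

pm = Persona(
    name="Asha",
    job="product manager",
    location="London",
    current_situation="shipping a metrics dashboard under deadline",
    media_habits=["watches YouTube over lunch"],
    motivations=["avoiding vanity metrics"],
    dislikes=["fluffy intros"],
)
```

The point of the structure is that every field gives the model something specific to react with when it later roleplays a click decision.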
In the committed debug snapshot, the project generates seven personas, including:
- a product manager in London
- a systems engineering PhD student in Bengaluru
- an operations supervisor in Ohio
- a supply chain consultant in Tokyo
- a retired audio engineer in Vancouver
That matters because "the average viewer" is fake. Real people click for specific reasons. They have limited time, specific taste, and different thresholds for fluff.
2. Generate video candidates
The second step generates candidate videos from either:
- an idea bank
- or an existing script that needs different packaging
Each candidate includes:
- a title
- a thumbnail concept
- a 10-second hook
- a 30-second hook
- a full arc
That means the system is not only comparing topics. It can also compare framing.
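A candidate can be sketched the same way. Again, this is an illustrative shape with hypothetical field names, not the repo's actual data model:

```python
# Hypothetical candidate record; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class VideoCandidate:
    title: str
    thumbnail_concept: str
    hook_10s: str   # the first 10 seconds, pitched as text
    hook_30s: str   # the first 30 seconds
    arc: str        # the full narrative arc

cand = VideoCandidate(
    title="How Systems Drift When Nobody Owns the Feedback Loop",
    thumbnail_concept="a broken loop diagram over a dashboard",
    hook_10s="Every metric you track is quietly lying to someone.",
    hook_30s="Here is how a healthy system drifts once its feedback loop has no owner.",
    arc="symptom -> missing owner -> drift mechanics -> reassigning the loop",
)
```

Because packaging fields (title, thumbnail, hooks) are separate from the arc, two candidates can share a topic and differ only in framing.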
3. Run a click + retention simulation (NOT YET IMPLEMENTED)
The third step samples a persona, shows them a subset of videos, and asks which one they would click.
If retention is enabled, it then asks how far they would keep watching:
- 10 seconds
- 30 seconds
- or to the end
The current simulation logic is intentionally simple. In simulate_choices.py, the model gets a persona summary and a small candidate set, returns a single choice, and optionally returns a coarse retention outcome.
That simplicity is a feature. I wanted something inspectable, not a fake-complex black box.
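The round structure described above can be sketched in a few lines. This is my reconstruction of the idea, not the repo's code: the `ask_model` callable stands in for the LLM call, and the stub below is only there to make the sketch runnable.

```python
# Sketch of one simulation round; the real repo makes an LLM call where
# stub_model is used here. All names are illustrative assumptions.
import random

RETENTION_BUCKETS = ["10s", "30s", "end"]

def run_round(personas, candidates, ask_model, k=3, rng=random):
    """Sample a persona, show it k candidates, record click + coarse retention."""
    persona = rng.choice(personas)
    shown = rng.sample(candidates, k=min(k, len(candidates)))
    choice, retention = ask_model(persona, shown)
    return {"persona": persona, "clicked": choice, "retention": retention}

def stub_model(persona, shown):
    # Trivial stand-in for the LLM judge: always clicks the first option.
    return shown[0], "30s"

result = run_round(
    ["product manager in London"],
    ["Title A", "Title B"],
    stub_model,
    k=2,
    rng=random.Random(0),
)
```

Ranking then reduces to running many rounds and counting which candidates win clicks and survive to "end".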
A concrete example from the repo
In the committed debug snapshot, one of the top-ranked ideas is:
Metrics Aren't the Destination: Tracing True Goals Through System Signals
Another strong one is:
How Systems Drift When Nobody Owns the Feedback Loop
That result is interesting to me for two reasons.
First, it suggests the system is not merely selecting the broadest or most sensational title.
Second, it reflects the actual kind of content I want to make: systems, incentives, feedback loops, measurement, and decision-making.
So even though the simulation is tiny, it already does something useful: it externalizes taste into artifacts I can inspect instead of leaving it as a vague feeling in my head.
What I like technically
My favorite part is not the ranking itself. It is the decomposition.
A lot of AI tooling jumps straight to "generate 10 titles" and stops there.
This project splits the problem into separate stages:
- audience model
- content candidates
- packaging
- simulated decisions
That is useful because when something looks wrong, I can ask where it went wrong.
Was the audience description weak?
Were the personas too similar?
Were the titles all basically the same?
Did the prompt bias the choice?
Because each stage writes its own artifact, the system is debuggable. That is a much better place to be than blindly prompting ChatGPT and pretending the answer means something.
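The artifact-per-stage idea is simple enough to sketch. This helper is a hypothetical illustration of the pattern, not the repo's implementation:

```python
# Illustrative sketch: each pipeline stage dumps its output to its own
# JSON file, so any stage can be inspected or rerun in isolation.
import json
from pathlib import Path

def write_artifact(stage: str, payload, out_dir: Path = Path("artifacts")) -> Path:
    out_dir.mkdir(exist_ok=True)
    path = out_dir / f"{stage}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

p = write_artifact("personas", [{"job": "product manager", "location": "London"}])
```

With this layout, "were the personas too similar?" becomes a question you answer by opening `personas.json`, not by rerunning the whole pipeline.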
What is still bad
A realistic description of this project is important, because otherwise AI writing turns into theater very quickly.
This project is still very far from "ready."
I spent around a day on it. The current version is a prototype with enough structure to be interesting, not enough rigor to be trusted.
Some obvious limitations:
- The debug run is tiny. In the committed snapshot it uses only 5 rounds, which is nowhere near enough to claim robust signal.
- The personas are synthetic. They are only as good as the notes and prompts used to generate them.
- The model is both generating and judging, which can create prompt-shaped bias.
- Retention is extremely coarse: 10s, 30s, or end.
- There is no grounding in actual channel analytics yet.
- There is no feedback loop from real published performance back into the simulator.
So no, this does not "predict YouTube."
What it does is give me a structured pre-production thinking tool.
That is still valuable.
Why I think this matters anyway
The value here is not certainty. The value is decision hygiene.
Creative work often breaks because the reasoning stays invisible. You pick an idea because it "feels right," then post-rationalize later.
This project forces me to write down assumptions:
- who the content is for
- what they care about
- what makes them click
- what makes them leave
Once those assumptions exist as files, I can review them, challenge them, version them, and improve them.
That is the part I find genuinely exciting.
LLMs are at their best when they help convert fuzzy intuition into inspectable intermediate artifacts. This project is a good example of that.
The weird question: should we simulate people?
This is the part I do not want to hide behind "it's just a tool."
Simulating people is weird.
On one hand, we already simulate users all the time, just with lower-resolution language:
- ideal customer profiles
- conversion funnels
- audience segments
This project is really just a more dynamic and more explicit version of that habit.
The goal is not "how do I manipulate an audience better?" (though one day, maybe it will be).
The goal is "how do I pressure-test ideas before spending days making the wrong thing?"
Why I am sharing it now
Because this is exactly the kind of project I like seeing in public:
- small and easy to grasp
- opinionated
- technically legible
- obviously incomplete
- interesting enough to start a conversation
- a little crazy
That is what this repo is.
Discussion?
The technical side is fun, but the bigger question is where this kind of thing goes next. I really want to dive deep here:
- If companies like Meta can collect enough data to build increasingly accurate simulations of people, does advertising become fundamentally more manipulative than it already is?
- Even if audience simulation works, where is the line between "making better decisions" and optimizing systems against people too effectively?
- And on the positive side: if we can emulate human reactions well enough, could this become a genuinely useful tool for business, education, product design, and communication, not just marketing?
Thank you all,
Subscribe:
YouTube