I wired 908 creator dossiers into my Substack commenter. Here is what changed.

#ai #automation #llm #showdev

What is the point of a commenter that knows nothing about who it is talking to?

I had around 410 Substack creators in my comment pool. The engine would pick one, draft something, post it. Comments were technically correct. They were also obviously generic. A comment that could have been written for anyone is a comment that reads like it was written for no one.

The fix was already sitting on disk. A prior workflow had generated 908 creator dossiers as JSONL: each record has known_for, recent_themes, a voice descriptor, suggested hook and angle fields, and an avoid list. None of that was wired into the commenter. So I wired it in.

Four functions, no regressions

creator_index() is a lazy loader. It reads the JSONL on first call and builds a dict keyed on normalized host. Normalization means lowercasing the URL and stripping the scheme, any path, and a leading www. It does real work because Substack URLs in the wild are inconsistent. After loading, QA filters drop 33 records tagged as duplicates or wrong niche. 908 in, 875 out. If the file is missing, the function returns an empty dict and everything falls back to prior behavior. No file means no enrichment, not a crash.
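A minimal sketch of that loader, under stated assumptions: the file path, the `url` and `qa_status` field names, and the exact status strings are hypothetical, since the post only describes the behavior. The cache decorator stands in for whatever lazy-load mechanism the real code uses.

```python
import json
from functools import lru_cache
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical path, field names, and status values (not from the real code).
DOSSIER_PATH = Path("creator_dossiers.jsonl")
DROP_STATUSES = {"duplicate", "wrong_niche"}


def normalize_host(url: str) -> str:
    """Lowercase the URL and strip scheme, path, and a leading 'www.'."""
    raw = url.strip().lower()
    # urlsplit only fills in the host when a scheme or '//' prefix is present.
    if "://" not in raw and not raw.startswith("//"):
        raw = "//" + raw
    host = urlsplit(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def load_index(path: Path) -> dict:
    """Build {normalized_host: record}; a missing file yields an empty dict."""
    if not path.exists():
        return {}
    index = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("qa_status") in DROP_STATUSES:
                continue  # QA filter: duplicates and wrong-niche records
            host = normalize_host(record.get("url", ""))
            if host:
                index[host] = record
    return index


@lru_cache(maxsize=1)
def creator_index() -> dict:
    """Lazy loader: the file is read on the first call only."""
    return load_index(DOSSIER_PATH)
```

With this shape, `normalize_host("https://WWW.Example.substack.com/p/post/")` and `normalize_host("example.substack.com/about")` both resolve to the same key, which is the point of normalizing before building the dict.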

load_authors() expands the pool. It unions in the enriched creators whose QA status is ok or unchecked and whose host match is certain. The pool grows from around 410 to around 1094. This is the biggest single effect of the whole change. The commenter now has range instead of repeatedly hitting the same small slice.
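The union step might look roughly like this. The `qa_status` and `host_match` field names and the `"certain"` value are assumptions for illustration, and `expand_pool` is a hypothetical stand-in for the relevant part of load_authors().

```python
def expand_pool(base_hosts, index, allowed_status=("ok", "unchecked")):
    """Union the existing pool with enriched creators that pass the filters.

    base_hosts: hosts already in the comment pool.
    index: {normalized_host: dossier record}.
    Field names below are assumed, not taken from the real code.
    """
    pool = list(dict.fromkeys(base_hosts))  # dedupe, keep original order
    seen = set(pool)
    for host, record in index.items():
        if record.get("qa_status", "unchecked") not in allowed_status:
            continue
        if record.get("host_match") != "certain":
            continue
        if host not in seen:
            seen.add(host)
            pool.append(host)
    return pool
```

Keeping the original pool first and appending only new hosts means an empty index returns the old pool unchanged, which matches the fallback behavior described above.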

_enrichment_block() is where the context actually lands. If a dossier exists for the target, it builds a prompt block with known_for and recent_themes, plus the voice, hook, and angle fields, which are included only when the dossier clears a confidence floor. The avoid list goes in too. This block is injected after the no-first-person-claims rule, so the ordering of constraints is preserved.
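As a rough sketch of the block builder: the field names known_for, recent_themes, voice, hook, angle, and avoid come from the dossier description above, while the `confidence` field, the 0.6 floor, the label wording, and the `build_prompt` helper are assumptions made for illustration.

```python
def enrichment_block(dossier, confidence_floor=0.6):
    """Return a prompt block for one creator, or '' when there is no dossier."""
    if not dossier:
        return ""
    lines = ["Creator context:"]
    if dossier.get("known_for"):
        lines.append(f"Known for: {dossier['known_for']}")
    themes = dossier.get("recent_themes") or []
    if themes:
        lines.append("Recent themes: " + ", ".join(themes))
    # Voice, hook, and angle share one confidence gate (assumed field name).
    if dossier.get("confidence", 0.0) >= confidence_floor:
        for key, label in (("voice", "Voice"), ("hook", "Suggested hook"),
                           ("angle", "Suggested angle")):
            if dossier.get(key):
                lines.append(f"{label}: {dossier[key]}")
    avoid = dossier.get("avoid") or []
    if avoid:
        lines.append("Avoid: " + "; ".join(avoid))
    return "\n".join(lines)


def build_prompt(rules, dossier, anchor_rule):
    """Insert the block right after anchor_rule so rule order is preserved."""
    block = enrichment_block(dossier)
    out = []
    for rule in rules:
        out.append(rule)
        if block and rule == anchor_rule:
            out.append(block)
    return "\n".join(out)
```

Returning an empty string for a missing dossier keeps the old prompt byte-for-byte identical when there is nothing to inject.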

_live_score() nudges selection. Each candidate gets a base score multiplied by a factor in the range [1 - w, 1 + w] with w = 0.15. Unknown handles score exactly 1.0. The nudge is deliberately small: it biases toward higher tier and higher confidence creators without making selection deterministic. If you want no nudge at all, set SUBSTACK_CREATOR_QUALITY_WEIGHT=0.
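One way to get a factor in that range is to map a quality value q in [0, 1] to 1 + w(2q - 1). The sketch below assumes a single hypothetical `quality` field that already combines tier and confidence; how the real code derives it is not stated in this post. The weighted draw uses the standard library's `random.choices`.

```python
import random


def live_factor(dossier, w=0.15):
    """Multiplier in [1 - w, 1 + w]; exactly 1.0 for unknown creators."""
    if not dossier:
        return 1.0
    # 'quality' is an assumed field in [0, 1]; clamp defensively.
    quality = min(max(float(dossier.get("quality", 0.5)), 0.0), 1.0)
    return 1.0 + w * (2.0 * quality - 1.0)


def pick(candidates, index, w=0.15, rng=random):
    """Weighted random pick from (host, base_score) pairs.

    Base scores must be positive. The nudge shifts the odds slightly;
    it never makes the choice deterministic.
    """
    hosts = [host for host, _ in candidates]
    weights = [base * live_factor(index.get(host), w) for host, base in candidates]
    return rng.choices(hosts, weights=weights, k=1)[0]
```

With w = 0.15, the best possible dossier is weighted only about 35% more heavily than the worst one (1.15 versus 0.85), and w = 0 collapses every factor to 1.0.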

Full kill switch: SUBSTACK_CREATOR_ENRICH_ENABLED=0 restores the old pool and the old prompts. Both flags are in the env, not the plist, so you can flip them without a launchctl reload.
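For illustration, reading the two flags could look like the sketch below. The variable names are the ones given above; the parsing rules (only the literal "0" disables enrichment, an unparseable or negative weight falls back to 0.15, values above 1 are capped) are my assumptions, not documented behavior.

```python
import os


def enrich_enabled(env=None):
    """Kill switch: enrichment is on unless the flag is exactly '0'."""
    env = os.environ if env is None else env
    return env.get("SUBSTACK_CREATOR_ENRICH_ENABLED", "1").strip() != "0"


def quality_weight(env=None, default=0.15):
    """Selection nudge weight w; 0 disables the nudge entirely."""
    env = os.environ if env is None else env
    raw = env.get("SUBSTACK_CREATOR_QUALITY_WEIGHT", "")
    try:
        weight = float(raw)
    except ValueError:
        return default  # unset or unparseable value
    # NaN and negatives fail this comparison and fall back to the default.
    return min(weight, 1.0) if weight >= 0.0 else default
```

Reading the values at call time rather than at import time is what lets a flipped flag take effect on the next run without touching anything else.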

The real tradeoff

The enrichment block adds tokens to every generation call when a dossier exists. That is a genuine cost. I chose to accept it because a generic comment to a creator you theoretically have a dossier on is worse than no comment at all. It signals you did not read their work, which is the one thing comments are supposed to signal.

The QA drop of 33 records is also a real cost. Some of those are probably legitimate creators with messy deduplication. A better QA pass would recover some of them. I left it because the 875 that survived are clean and going from 410 to 1094 was already the meaningful win.

What I would do differently

The confidence gating applies a single floor to voice, hook, and angle as a group. In practice voice is more stable than angle. An angle from six months ago might be stale in ways that voice is not. I would split those into separate thresholds if I were starting over.
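A sketch of what split thresholds could look like, assuming one `confidence` value per dossier. The floor values are illustrative placeholders, not tuned numbers, and `gated_fields` is a hypothetical helper.

```python
# Illustrative floors only: angle goes stale fastest, so it gets the
# strictest gate; voice is the most stable, so it gets the loosest.
FIELD_FLOORS = {"voice": 0.5, "hook": 0.6, "angle": 0.75}


def gated_fields(dossier, floors=FIELD_FLOORS):
    """Return only the fields whose own floor the dossier's confidence clears."""
    confidence = dossier.get("confidence", 0.0)
    return {
        key: dossier[key]
        for key, floor in floors.items()
        if dossier.get(key) and confidence >= floor
    }
```

A dossier at confidence 0.65 would then contribute voice and hook but hold back the angle.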

I would also make the enrichment opt-in per dossier rather than opt-out globally. Right now the kill switch is all or nothing. A per-record flag would let you audit individual dossiers without disabling the whole feature.

Neither of those is a blocker. The core thing works: 11 tests added, full suite 50 passed, and comments now go out knowing something real about who they are going to.
