Auto-Assign Characters: how AudioProducer.ai turns a chapter into a line-by-line speaker map

#ai #audio #tts #writing

If you have ever tried to turn a novel chapter into a multi-voice audio drama by hand, the first thing you discover is that the Generate Audio step is the easy part. The hard part is the bookkeeping. Who is speaking on this line? Who is speaking on the next one? Is this third paragraph narration or interior monologue? Is the italicized text on the cake a character, or is it a label that should be read by the narrator?

This article is about how the Auto-Assign Characters pass in AudioProducer.ai handles that bookkeeping, what its output actually looks like to a writer in the editor, and the failure modes that show up on real manuscripts. It is the companion piece to the earlier Auto-Assign Sounds article: same pipeline, different pass. Sounds covers the audio backdrop. Characters covers who reads what.

What the pass actually produces

The input is plain chapter text. You can paste it into a blank project or have it come in via EPUB import: either way the pass operates on the same shape, a flat list of paragraphs and lines.

The output is a line-level speaker map. Every line of the chapter ends up tagged with one of three things:

Narrator for prose, scene-setting, action beats, attribution clauses, anything that the third-person voice carries.
A named character for dialogue. Alice. White Rabbit. Kael. Eryndor. Whatever names the chapter actually uses.
An in-world label for text that exists inside the story world but does not come out of a person's mouth. The canonical examples from the editor screenshot of Alice in Wonderland: "Cake Label", "Label on the jar", "Bottle Label". These read out loud in the final audio, but they are clearly not the narrator and they are clearly not Alice.

The third category is the one most pipelines either miss or collapse into the narrator. It matters because the writer almost certainly wants those lines voiced differently from the narration: a different voice, a different prosody, often a noticeably shorter clip with a different ambient soundscape behind it. Surfacing the label as a first-class speaker, not as narration, is what makes that possible at the per-line level.
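One way to picture the three-way tagging is as a small data structure. The sketch below is illustrative only: the class names, field names, and sample lines are assumptions made for this article, not AudioProducer.ai's internal schema.

```python
from dataclasses import dataclass
from enum import Enum


class SpeakerKind(Enum):
    NARRATOR = "narrator"
    CHARACTER = "character"
    LABEL = "label"  # in-world text such as "Cake Label"


@dataclass(frozen=True)
class Speaker:
    name: str
    kind: SpeakerKind


@dataclass
class TaggedLine:
    text: str
    speaker: Speaker


NARRATOR = Speaker("Narrator", SpeakerKind.NARRATOR)
ALICE = Speaker("Alice", SpeakerKind.CHARACTER)
CAKE_LABEL = Speaker("Cake Label", SpeakerKind.LABEL)

# Illustrative lines, paraphrased for the example rather than quoted from the book.
speaker_map = [
    TaggedLine("Her eye fell on a little glass box under the table.", NARRATOR),
    TaggedLine("EAT ME", CAKE_LABEL),
    TaggedLine("Well, I'll eat it.", ALICE),
]


def lines_of_kind(lines, kind):
    """Return the text of every line whose speaker has the given kind."""
    return [line.text for line in lines if line.speaker.kind is kind]


label_lines = lines_of_kind(speaker_map, SpeakerKind.LABEL)
```

Because the label is its own speaker rather than a flag on a narrator line, anything downstream that works per speaker (voice choice, for instance) can treat it separately.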

The editor is the review surface, not the publish target

The pass is explicit about being a starting point. You do not run Auto-Assign Characters and ship the audio. You run it, look at the speaker map in the editor, and adjust. The editor exposes four operations that map one-to-one onto the failure modes of any line-level attribution:

Re-tag a line. Select the line, assign a different speaker.
Split a line. When the model bundled two utterances together (Alice said something, then the White Rabbit answered, but the model glued them into one line), split them and re-tag each half.
Merge lines. Inverse of split. When the model over-segmented (a long quote got chopped at a comma the model thought was a clause boundary), merge them back into one speaker turn.
Add a missing character. If the model invented a new speaker name for someone who was already in your character list (a diminutive, a title, a nickname that did not match any existing tag), you add the canonical character explicitly and re-tag the affected lines.
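The four operations can be sketched as plain list edits over a speaker map. This is a minimal model, assuming a line is just text plus a speaker name; the function names and the character-offset split are hypothetical, and the real editor works through its UI rather than through any API shown here.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    speaker: str


def retag(lines, index, speaker):
    """Assign a different speaker to one line."""
    lines[index].speaker = speaker


def split(lines, index, at, second_speaker):
    """Split one line at character offset `at`; the second half gets its own speaker."""
    line = lines[index]
    first, second = line.text[:at].rstrip(), line.text[at:].lstrip()
    if not first or not second:
        raise ValueError("split offset must leave text on both sides")
    lines[index:index + 1] = [Line(first, line.speaker), Line(second, second_speaker)]


def merge(lines, index):
    """Merge line `index` with the line after it, keeping the first line's speaker."""
    a, b = lines[index], lines[index + 1]
    lines[index:index + 2] = [Line(f"{a.text} {b.text}", a.speaker)]


def add_character(characters, name):
    """Add a canonical character name if it is not already in the list."""
    if name not in characters:
        characters.append(name)


# Two utterances glued into one line: split them and give the second half to Alice.
lines = [Line('"Who are you?" "I hardly know, sir."', "Caterpillar")]
split(lines, 0, lines[0].text.index('"I hardly'), "Alice")

# A nickname tag that should have matched an existing character.
characters = ["Alice", "Caterpillar"]
nickname_lines = [Line('"Oh dear! I shall be late!"', "the Rabbit")]
add_character(characters, "White Rabbit")
retag(nickname_lines, 0, "White Rabbit")
```

Every operation is a deterministic edit to the map, which matches the yes-or-no review question: after the edit, the line either has the right speaker or it does not.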

The thing to notice is what the editor does not have: no "regenerate this paragraph with a slightly different prompt." The review surface is structured edits to the speaker map, not free-text prompt churn. That is deliberate. It means the writer never has to read model output to decide if the model "got it right" in some squishy sense. The question is just: does this line have the correct speaker tag, yes or no.

Failure modes (and how to make them go away)

The Auto-Assign Characters pass is reliable on text that uses conventional dialogue mechanics. Where it gets noisy is on stylistic choices that defeat the cues a reader uses to attribute speech. From the customer-support FAQ:

If many lines are wrong, often it's because the source text uses unusual dialogue conventions (e.g., no quotation marks, unusual attribution patterns). Standardize punctuation in the source and re-run Auto-Assign.

In practice the three patterns that produce the worst attribution noise are:

No quotation marks at all. Some literary fiction renders dialogue as italic-only or em-dash-prefixed. The model has nothing to anchor on, and dialogue ends up tagged as narration. If you want a clean speaker map on text like this, the simplest fix is to add quotation marks in your source before running the pass. The audio output is the same: the marks are not spoken, they are just attribution cues for the model.
Attribution inside long compound sentences. A line that runs "I would rather not, said the Caterpillar, settling back on its mushroom and exhaling another cloud of smoke that drifted over the Hatter's ear." will sometimes get the attribution recovered correctly and sometimes get split across speakers. The fix is editorial: shorter sentences, or attribution-before-quote, produce cleaner output.
Unnamed background speakers. A crowd scene with "someone shouted from the back" or "a voice from the doorway" tends to get tagged as Narrator (because the speaker has no name to match against the character list). If you want it voiced distinctly, add an explicit character (Background Voice 1, Voice from Doorway) and re-tag.

None of these are model bugs in the usual sense. They are the same edge cases a copyeditor would flag for any narrator-and-cast read. The editor is structured around fixing them line by line rather than fighting the model.
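For the first pattern, standardizing punctuation can be partly scripted before the text is pasted in. The function below is a crude heuristic written for this article, not part of the product: it assumes each paragraph is a single line, and it wraps the whole paragraph in quotation marks when it opens with an em dash or a double hyphen. Paragraphs that mix speech with attribution still need a manual pass.

```python
import re

# A paragraph that opens with an em dash (U+2014) or a double hyphen.
DASH_DIALOGUE = re.compile(r"^\s*(?:\u2014|--)\s*(.+?)\s*$")


def quote_dash_dialogue(paragraph: str) -> str:
    """Wrap dash-prefixed dialogue in straight double quotes; leave other text alone."""
    match = DASH_DIALOGUE.match(paragraph)
    if match is None:
        return paragraph
    return f'"{match.group(1)}"'


converted = [
    quote_dash_dialogue("\u2014 Who are you?"),
    quote_dash_dialogue("-- I hardly know, sir."),
    quote_dash_dialogue("Alice looked up at the mushroom."),
]
```

Only the source text changes; the speaker map is still produced by re-running Auto-Assign on the cleaned chapter.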

Carrying characters across a series

The pass operates per chapter, but writers operate per book or per series. The bookkeeping that survives across runs is the character list with assigned voices, not the per-line speaker map. The mechanics:

In a new project, the three-dot menu next to "Add Character" lets you import the character list (with assigned voices and per-character settings) from another project. The new project starts with Alice already pointing at the female_30s_dry voice you picked in book 1.
Inside a single project, the character editing menu supports grouping characters into folders. For ensemble casts the flat list gets unwieldy fast: folders by location, by POV, by plotline, or by chapter range keep the panel scannable.

The Auto-Assign Characters pass on chapter 7 of book 3 then starts from a populated character list and tags new lines against the canonical names. You do not need to re-tell the system that "Kael" is the same character it was twelve chapters and three months ago.
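As a rough model of the import, think of each project as holding a dictionary of character records keyed by canonical name. Everything below is an assumption for illustration: the record fields, the `male_40s_warm` voice id, and the choice to keep an existing record when names collide are not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    voice: str
    folder: str = ""  # optional grouping, e.g. by POV or location


def import_characters(source, target):
    """Copy characters from `source` into a new dict, keeping any that `target` already has."""
    merged = dict(target)
    for name, character in source.items():
        merged.setdefault(name, character)
    return merged


book1 = {
    "Alice": Character("Alice", "female_30s_dry", folder="POV"),
    "Kael": Character("Kael", "male_40s_warm"),  # hypothetical voice id
}

# A new project for book 3 starts empty and pulls in the cast from book 1.
book3 = import_characters(book1, {})
```

The point of the sketch is what travels: names, voices, and settings move between projects, while each chapter's speaker map is rebuilt by its own run of the pass.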

What this means for the AI Words quota

Auto-Assign Characters counts against the AI Words meter, not the Audio Generation Words meter. They are separate quotas. From the product copy:

Both meters get the full plan allowance independently. They don't share a single budget. So a Professional Writer subscriber has 100K AI Words and a separate 100K Audio Generation Words per month.

In practical terms: running Auto-Assign Characters on a 5,000-word chapter eats 5,000 AI Words and zero Audio Generation Words. You can run it, look at the speaker map, adjust voices in the Characters panel, run Auto-Assign Sounds (also AI Words, also free of the audio budget), and only then trigger Generate Audio. Every step before Generate Audio is reviewable without spending any of your audio rendering allowance. That matters when you are iterating on a draft: re-running Auto-Assign after a source edit draws only on the AI Words meter and leaves the audio budget untouched.
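The arithmetic for one chapter can be worked through with two independent counters. The sketch assumes the Professional Writer allowance quoted above, assumes Auto-Assign Sounds is charged one AI Word per source word (the same rate as Characters), and assumes Generate Audio is charged one Audio Generation Word per source word; the class and method names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Meters:
    ai_words: int
    audio_words: int

    def spend_ai(self, words: int) -> None:
        """Charge an Auto-Assign run against the AI Words meter only."""
        if words > self.ai_words:
            raise ValueError("not enough AI Words left")
        self.ai_words -= words

    def spend_audio(self, words: int) -> None:
        """Charge an audio render against the Audio Generation Words meter only."""
        if words > self.audio_words:
            raise ValueError("not enough Audio Generation Words left")
        self.audio_words -= words


meters = Meters(ai_words=100_000, audio_words=100_000)
chapter_words = 5_000

meters.spend_ai(chapter_words)  # Auto-Assign Characters
meters.spend_ai(chapter_words)  # Auto-Assign Sounds (assumed same rate)
meters.spend_ai(chapter_words)  # re-run Characters after a source edit
before_render = (meters.ai_words, meters.audio_words)

meters.spend_audio(chapter_words)  # Generate Audio
after_render = (meters.ai_words, meters.audio_words)
```

Under those assumptions, three Auto-Assign runs leave 85,000 AI Words and the full 100,000 Audio Generation Words; only the render moves the audio meter, down to 95,000.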

What the pass does not try to do

Two things worth being explicit about, since the surrounding industry copy often blurs them.

The pass does not generate dialogue. It tags existing prose. Every word in the output comes from text the writer put in. If the input is the sentence "Hello," said Alice to the rabbit, the pass labels the quoted "Hello," as Alice and the rest as Narrator. It does not invent or rewrite.

The pass does not enforce voice continuity across runs by itself. Voice assignments live on the character record, not on the speaker map. If you reassign Alice from voice A to voice B and re-generate the audio, every line tagged "Alice" picks up voice B. The speaker map stays the same; the audio sounds different. That separation is what makes voice experimentation cheap.
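The separation can be shown with two small structures: a speaker map that stores only speaker names, and a character record that stores voices. The voice ids `voice_n`, `voice_a`, and `voice_b` and the `render_plan` function are placeholders for this sketch, not product identifiers.

```python
# The speaker map stores names only; it never mentions a voice.
speaker_map = [
    ("Narrator", "Alice looked up at the mushroom."),
    ("Alice", "Who are you?"),
    ("Alice", "I hardly know."),
]

# Voices live on the character record (placeholder voice ids).
voices = {"Narrator": "voice_n", "Alice": "voice_a"}


def render_plan(speaker_map, voices):
    """Resolve each line's voice at render time by looking up its speaker."""
    return [(voices[speaker], text) for speaker, text in speaker_map]


plan_before = render_plan(speaker_map, voices)

# Reassign Alice: one change on the character record, no edits to the map.
voices["Alice"] = "voice_b"
plan_after = render_plan(speaker_map, voices)
```

Because the lookup happens at render time, trying a different voice is a one-field change followed by a re-render, and the reviewed speaker map is never at risk.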

A short note on iteration shape

The natural workflow is paste-or-import, run Auto-Assign Characters, scan the speaker map in the editor for obvious misses (narrator-vs-character flips, missing in-world labels, over-segmented quotes), fix those in place, then run Auto-Assign Sounds and scan again. The two passes are independent: a re-run of Sounds does not touch the character map and vice versa. If a chapter feels off after audio renders, the diagnosis usually points at one of the two artifacts (wrong speaker on a key line, or wrong soundscape under a scene), not at "the model"; the structured editing is what makes that diagnosable.
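If the speaker map is available outside the editor, the scan for narrator-vs-character flips can be approximated with a simple check. This is a hypothetical helper, assuming the map is a list of (speaker, text) pairs and that quoted speech on a line tagged Narrator is worth a second look; it will also flag legitimate cases such as a narrator quoting a sign, so treat it as a pointer for review and not as a verdict.

```python
QUOTE_CHARS = ('"', "\u201c", "\u201d")  # straight and curly double quotes


def flag_narrator_quotes(speaker_map):
    """Return indexes of lines tagged Narrator that contain a double quotation mark."""
    return [
        index
        for index, (speaker, text) in enumerate(speaker_map)
        if speaker == "Narrator" and any(ch in text for ch in QUOTE_CHARS)
    ]


speaker_map = [
    ("Narrator", "Alice looked up at the mushroom."),
    ("Narrator", '"Who are you?"'),  # likely a missed character line
    ("Alice", '"I hardly know, sir."'),
]

to_review = flag_narrator_quotes(speaker_map)
```

Here only the second line is flagged, which is the kind of miss that a re-tag in the editor fixes.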

If you want to try the pass on your own text, go to audioproducer.ai, paste a chapter into a blank project, and click Auto-Assign. The speaker map shows up in the editor.

Disclosure: this article was drafted by an AI agent working on behalf of the AudioProducer.ai team.