AudioProducer.ai

How AudioProducer.ai's Auto-Assign pipeline turns a chapter into a multi-voice audio drama

Making a multi-voice audiobook used to mean booking a studio, casting voice actors, and waiting weeks for production. We built AudioProducer.ai to compress that pipeline into something a writer can run from a browser in an afternoon — plain chapter text in, finished audio drama out, with the AI doing the bulk of the markup work.

This post is a walkthrough of the two passes that do the heavy lifting: Auto-Assign Characters and Auto-Assign Sounds. We'll cover what each pass takes as input, what it produces, and how the editor surfaces the result for you to tune. The goal is to make the pipeline legible — so when you sit down with a chapter and click the buttons, you know what the system is actually doing on your behalf.

The shape of the pipeline

The product treats every project as a sequence of chapters. For each chapter, the pipeline runs in four phases:

  1. Source text in. Paste a chapter into the editor, or upload an .epub and the project gets pre-populated with chapter structure, titles, and body text.
  2. Auto-Assign Characters. One-click AI pass that reads the chapter and tags every line by speaker — narrator, named characters, even in-world labels.
  3. Auto-Assign Sounds. Second one-click AI pass that analyzes the scene and places music beds, ambient soundscapes, and one-shot sound effects from the built-in library.
  4. Generate Audio. A single button renders the chapter into a finished audio file using the assigned voices and placed sounds.

Auto-Assign is a starting point, not a final answer. The editor is built around the idea that you'll keep what the AI got right and correct what it got wrong, in seconds per line — not by hand-tagging the whole chapter.
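To make the data flow concrete, here's a minimal sketch of the four phases as a chain of functions. Everything here is illustrative stand-in code, not the real AudioProducer.ai internals; the function names and record shapes are assumptions.

```python
# Hypothetical sketch of the four-phase pipeline. Each pass enriches the
# same per-line records; none of these names are the real product API.

def auto_assign_characters(chapter_text):
    # Phase 2 (stubbed): tag every non-empty line with a speaker.
    return [{"text": line, "speaker": "Narrator"}
            for line in chapter_text.splitlines() if line.strip()]

def auto_assign_sounds(tagged_lines):
    # Phase 3 (stubbed): attach a (possibly empty) list of sound placements.
    return [{**line, "sounds": []} for line in tagged_lines]

def generate_audio(placed_lines):
    # Phase 4 (stubbed): stand-in for the render step.
    return "chapter_01.mp3"

chapter = "Alice was beginning to get very tired.\n'Oh dear!' cried the Rabbit."
audio = generate_audio(auto_assign_sounds(auto_assign_characters(chapter)))
```

The point of the shape: each pass reads what the previous one produced, so correcting one line's tag doesn't invalidate the rest of the chapter.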

Auto-Assign Characters — what it actually does

Input: raw chapter text. Output: every line attributed to a speaker, with each speaker populated as a per-character voice slot on the project.

Three kinds of speakers come out of this pass:

  • Narrator. Anything that isn't a character speaking goes to the narrator track. Description, scene-setting, action beats.
  • Named characters. "Alice", "the White Rabbit", "Eryndor" — every distinct named voice in the chapter gets its own slot. The AI handles attribution heuristics (who's speaking based on dialogue tags, conversational context, scene cues) so you don't have to walk every line manually.
  • In-world labels. This is the part that surprises new users. The AI catches text that isn't spoken by a character but should still be voiced distinctly — labels on jars, signs in a scene, captions a narrator reads aloud. In our editor's Alice in Wonderland example, you'll see entries like "Cake Label", "Label on the jar", and "Bottle Label" as distinct voice slots alongside Alice and the White Rabbit. Give those labels their own narrator voice (or even their own character voice if you want), and they read differently than the surrounding prose.

After the pass, the Characters panel shows you every speaker the AI extracted with a voice already provisionally assigned. You can swap any of those voices from the 132-voice library on your Voices page — or replace them with a voice you've cloned yourself.
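The output shape is worth seeing once. Below is a hypothetical sketch of what the pass produces on the Alice example, with in-world labels sitting alongside named characters as first-class voice slots; the record layout is an assumption for illustration.

```python
# Illustrative post-pass output: every line carries a speaker, and label
# text ("EAT ME") gets its own slot just like a named character would.
assigned = [
    {"speaker": "Narrator",     "text": "There seemed to be no use in waiting."},
    {"speaker": "Alice",        "text": "Curiouser and curiouser!"},
    {"speaker": "Cake Label",   "text": "EAT ME"},
    {"speaker": "Bottle Label", "text": "DRINK ME"},
]

# The Characters panel is essentially the distinct set of speakers.
speakers = sorted({line["speaker"] for line in assigned})
```

Each entry in `speakers` is what gets a provisional voice after the pass, and what you can re-voice from the library.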

Correcting what the AI gets wrong

Auto-Assign is good but not perfect. Common cases where it gets a line wrong:

  • The source uses unusual dialogue conventions (no quotation marks, character-speech embedded in narration paragraphs, attribution patterns the AI hasn't seen often).
  • A scene has two characters with similar names and the attribution-by-context heuristic picks the wrong one.
  • A line of free indirect speech ("Alice thought it odd that the rabbit was wearing a waistcoat...") could be the narrator or could be inside Alice's head — judgment call.

The editor handles the fix with a two-click pattern: select the line, pick the right character from the dropdown, done. You don't re-run the whole pass; the rest of the chapter's tags are preserved. For source texts with widespread attribution problems, the more efficient move is usually to standardize the punctuation in the source first and re-run the pass — the AI's accuracy is much higher on well-marked-up source.
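The key property of that two-click fix is that it's a point edit: one line's tag changes, everything else is untouched. A minimal sketch of the idea (illustrative helper, not product code):

```python
def reassign(lines, index, new_speaker):
    # The two-click fix: change one line's speaker, leave the rest intact.
    lines = list(lines)  # copy so the original tagging is preserved
    lines[index] = {**lines[index], "speaker": new_speaker}
    return lines

tagged = [
    {"speaker": "Narrator", "text": "Alice thought it odd that the rabbit wore a waistcoat."},
    {"speaker": "Narrator", "text": "Down the rabbit hole she went."},
]
# Judgment call: read the free indirect speech as inside Alice's head.
fixed = reassign(tagged, 0, "Alice")
```

Because it's a point edit, no re-run of the pass is needed and no other tags can regress.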

Auto-Assign Sounds — what it actually does

Input: the same chapter text, now with characters assigned. Output: music beds, ambient soundscapes, and one-shot sound effects placed at the right moments.

The pass distinguishes three audio types:

  • Music beds — long-form atmospheric tracks that play under sections of text. Use them to set tone for a scene or sequence.
  • Ambient soundscapes — environmental layers (wind, rain, crowd noise, ocean). These set place rather than mood.
  • One-shot SFX — discrete events tied to a specific moment in the text. The chips show up inline in the editor at the moment they play, with the sound name and duration: "Distant Thunder (4s)", "Wind Howl (6s)", "Stones Launching (3s)".

In practice, what this looks like for an action scene: the chapter opens, the AI places a tense music bed under the first few paragraphs, layers a wind-howl soundscape over the storm description, and drops a "Stones Launching" SFX exactly on the line where the slingshot fires. All in one click; the chips are visible in the editor view, so you can see exactly what was placed where.
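For that action scene, the placements could be sketched as a list of records, one per chip; the field names here are assumptions for illustration, not the real placement format.

```python
# Hypothetical placement records for the action-scene example. The editor
# renders one-shot SFX as inline chips like "Distant Thunder (4s)".
placements = [
    {"kind": "music",   "name": "Tense Strings",    "start_para": 0},
    {"kind": "ambient", "name": "Wind Howl",        "start_para": 1},
    {"kind": "sfx",     "name": "Stones Launching", "duration_s": 3, "line": 12},
]

def chip(p):
    # Format a one-shot SFX chip the way it reads in the editor view.
    return f'{p["name"]} ({p["duration_s"]}s)'

sfx_chips = [chip(p) for p in placements if p["kind"] == "sfx"]
```

Beds and soundscapes anchor to spans of text; one-shots anchor to a single moment, which is why only they carry a duration chip inline.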

The tune-it pattern is the same as for characters: keep what fits, replace what doesn't. The Sounds panel of the editor lets you swap any placed track for a different one from the library, drag SFX to different moments, or remove placements that read as noise.

The voice library angle

The Auto-Assign Characters pass gets you to "every speaker has a voice." The Voices page is where you decide which voice each speaker gets.

A few notes that matter for picking voices:

  • 132 voices in the library as of this writing, across a mix of male / female / unlabeled, middle-aged / young / older, plus dedicated child-male and child-female voices for kids' content. Accent coverage is mostly American with British, US-Southern, Irish, Australian, Indian, and Spanish-accented English in the mix.
  • The library is actively growing. New voices land regularly; the canonical source is your in-app Voices page.
  • Per-line emotion control. Same voice, different inflection per line. You attach an emotion tag (anger, fear, calm, etc.) to specific dialogue lines in the editor.
  • Voice cloning. You can clone a voice (your own, or any voice you're authorized to use) and use it like any library voice. Useful for narrating in your own voice without a recording rig, for distinct character voices that aren't in the library, or for brand consistency on a podcast.

Voice changes don't require re-running Auto-Assign — they take effect on the next Generate Audio.

Putting it together

The flow, end to end, from a fresh project:

  1. Create a project; either paste a chapter into the editor or import an .epub.
  2. Click Auto-Assign Characters. Review the Characters panel; correct any obvious miscasts.
  3. Click Auto-Assign Sounds. Review the placed music / SFX in the editor; swap or remove what doesn't fit.
  4. Open the Voices page; swap library voices into the character slots that need them, or assign a cloned voice if you've made one.
  5. Click Generate Audio. The chapter renders into a downloadable audio file.

No external audio software in the loop. No separate DAW for mixing. The editor is the place where the audio production happens and the audio file is the output.

Pauses and pacing (briefly)

One detail that comes up frequently for writers used to manually mixing audiobooks: pause control. Pauses are configurable at four levels:

  • Inline pauses for dramatic effect inside a paragraph.
  • Project-wide default pause between paragraph breaks (set once per project).
  • Per-paragraph override when a specific transition needs more breath.
  • Intro pauses for the project intro (title / author / narrator) and chapter intros.

Multiple consecutive blank lines collapse to a single pause, which is usually what you want.
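That collapsing behavior is simple to picture. A sketch of the idea with a regex (illustrative only; the real renderer's rules may differ):

```python
import re

def collapse_blank_lines(text):
    # Runs of two or more blank lines become a single paragraph break,
    # i.e. one default pause at render time.
    return re.sub(r"\n{3,}", "\n\n", text)

src = "Paragraph one.\n\n\n\nParagraph two."
```

So extra whitespace in your source doesn't stack up into long dead air; one paragraph break means one pause.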

A note on what the pipeline doesn't do

Auto-Assign covers character attribution and sound placement. A few things sit outside the pipeline:

  • Publishing to Audible, Spotify, or Apple Podcasts — the output is export-ready, but you upload to those platforms yourself.
  • Royalty or sales tracking for audiobooks sold elsewhere.
  • Non-EPUB import: .docx, .pdf, .mobi, and .txt aren't supported import formats today; for those, paste chapter-by-chapter into a blank project or convert your source to .epub first.

We mention these so the pipeline picture is accurate — the audio production is end-to-end inside AudioProducer.ai; distribution is your last step outside it.

Try it

There's a free tier (1,200 words per month, no credit card) on audioproducer.ai. Pick a chapter, run both Auto-Assigns, swap a couple of voices, click Generate. The fastest way to develop intuition for what the pipeline does well — and where you'll spend the most editing time — is to feed it ten pages of your own writing and see what comes back.


Disclosure: this article was drafted by an AI agent working on behalf of the AudioProducer.ai team.
