Chapter-marker survival across the EPUB to multi-voice audio pipeline

#ai #audio #tts #epub

A chapter is the smallest unit a listener actually navigates. They open the audiobook in the middle of Chapter 7, leave it playing while they do the dishes, and come back later. The publisher cares about that unit too: when you upload to a major audiobook platform, each chapter typically ships as a separate audio file with its own title metadata, and the index your distributor builds depends on those splits.

Somewhere between "writer ships an EPUB" and "listener taps Chapter 7," the chapter boundaries have to survive intact through every pipeline stage. Inside our pipeline, that survival is the boring part. It is also where misalignments compound, because every later stage assumes the prior stage got the chapter list right.

This post walks through what AudioProducer.ai does to keep the chapter unit intact end-to-end, and where it gets stretched.

The chapter is a unit, all the way through

Before we walk the pipeline stage by stage, here is what stays constant. From import to download, a chapter is:

A title (the heading the listener will hear in the chapter intro and see in their player's index).
A body (the text that becomes spoken audio).
A set of annotations the editor attaches as production progresses: per-line speaker, per-character voice, per-paragraph emotion tag, per-paragraph sound annotation.
One rendered audio file when the writer hits Generate.

Every stage in the pipeline operates against one of these four properties of a chapter, and only one chapter at a time. There is no "whole book" rendering pass. The book is a list of chapters, each of which is its own audio render with its own production state. That isolation is what lets per-chapter operations stay tractable: re-render one chapter without touching the others, swap a character voice in chapter 3 without re-rendering chapter 4. It is also what makes chapter integrity the load-bearing thing to get right.
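To make that shape concrete, here is a minimal Python sketch of a chapter as a data structure. This is not AudioProducer.ai's internal model; every class and field name below is an assumption made for illustration. The only things taken from the description above are the four properties and the fact that the book is just a list of chapters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SoundAnnotation:
    offset: int  # position within the chapter body (hypothetical unit: characters)
    kind: str    # e.g. "music", "ambient", "effect" (labels are illustrative)


@dataclass
class Chapter:
    title: str
    body: str
    # Annotation state owned by this chapter only (field names are assumptions).
    speaker_by_line: dict[int, str] = field(default_factory=dict)
    emotion_by_paragraph: dict[int, str] = field(default_factory=dict)
    sounds: list[SoundAnnotation] = field(default_factory=list)
    audio_path: Optional[str] = None  # set once this chapter has been rendered


@dataclass
class Project:
    # The book is nothing more than an ordered list of chapters.
    chapters: list[Chapter] = field(default_factory=list)


def mark_rendered(project: Project, index: int, audio_path: str) -> None:
    """Record a finished render for one chapter; no other chapter is touched."""
    project.chapters[index].audio_path = audio_path
```

Because each chapter carries its own render state, marking chapter 3 as rendered says nothing about chapter 4, which is the isolation the rest of this post relies on.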

Stage 1: Import, getting chapters out of the EPUB

An EPUB is a zipped collection of XHTML files plus a navigation document (an XHTML nav document, commonly named nav.xhtml, in EPUB 3; an NCX file, commonly named toc.ncx, in EPUB 2). The chapter boundary is conceptually wherever the navigation document points; in practice the source text inside each chapter file ranges from "one heading and the prose" to "multiple sub-headings, embedded images, and footnotes."
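For readers who have not looked inside an EPUB 3 navigation document, the sketch below shows one way to pull the declared chapter list out of it using only the Python standard library. It is an illustration under assumptions, not our importer: the function name parse_nav_toc is hypothetical, it expects the nav document as an already-extracted string, it flattens nested entries into document order, and it ignores EPUB 2 NCX files.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def parse_nav_toc(nav_xhtml: str) -> list[tuple[str, str]]:
    """Return (display title, href) pairs from the toc nav of an EPUB 3 nav document."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        # epub:type may hold several space-separated values.
        types = nav.get(f"{{{EPUB_NS}}}type", "").split()
        if "toc" not in types:
            continue  # skip landmarks, page-list, and other nav elements
        entries = []
        for link in nav.iter(f"{{{XHTML_NS}}}a"):
            # Join text across inline markup and normalize whitespace.
            title = " ".join("".join(link.itertext()).split())
            entries.append((title, link.get("href", "")))
        return entries
    return []  # no toc nav found
```

Note that the display title comes from the link text, not from the file name in the href, which is why a chapter can be stored as ch01.xhtml and still be listed as "Chapter 1: A Beginning."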

When a writer imports an EPUB into a new AudioProducer.ai project, the project comes pre-populated with the chapter structure, titles, and body text. She does not paste anything by hand. From the editor's point of view, the chapter list is the navigation document, projected as one editable chapter per nav entry.

What that does not tell you, and what most writers realize on first import, is how much non-chapter content was hiding in the EPUB. Common cases:

Front matter as its own chapter. Title page, copyright, dedication, table of contents, and acknowledgements often each ship as a separate nav entry in the source EPUB. They show up as chapters in the project after import. The writer's job at that point is to decide which of these she wants in the audio.
Back matter. "About the author," "Also by this writer," footnote bundles. Same shape, opposite end.
Part dividers. Books structured into Parts (Part I, Part II) often surface a Part page as its own nav entry, separate from the first chapter inside that part.
E-reader-optimized titles. Chapter titles in an EPUB sometimes look like 01_chapter01.xhtml rather than "Chapter 1: A Beginning." The display title in the nav document is the one we use, but the writer may want to rename it for the audio version, where the chapter intro reads the title aloud.

The pipeline does not try to be clever about any of this. The chapter list after import is exactly what the EPUB declared, with all its naming and segmentation quirks. The editor exposes rename and remove operations so the writer can shape the imported chapter list into the chapter list she actually wants spoken.
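As a rough picture of what rename and remove amount to, here is a small Python sketch that treats the imported chapter list as plain data. The function names and the dictionary shape are assumptions for illustration; in the product these are editor actions, not an API we are documenting here.

```python
def remove_chapters(chapters: list[dict], titles_to_drop: list[str]) -> list[dict]:
    """Return a new chapter list without the named entries; order is preserved."""
    drop = set(titles_to_drop)
    return [chapter for chapter in chapters if chapter["title"] not in drop]


def rename_chapter(chapters: list[dict], old_title: str, new_title: str) -> list[dict]:
    """Return a new chapter list with one title changed; bodies are untouched."""
    return [
        {**chapter, "title": new_title} if chapter["title"] == old_title else chapter
        for chapter in chapters
    ]


# Example: shape an imported list into the list the writer wants spoken.
imported = [
    {"title": "Copyright", "body": "All rights reserved."},
    {"title": "01_chapter01.xhtml", "body": "It began with rain."},
    {"title": "About the author", "body": "She lives by the sea."},
]
shaped = remove_chapters(imported, ["Copyright", "About the author"])
shaped = rename_chapter(shaped, "01_chapter01.xhtml", "Chapter 1: A Beginning")
```

Both helpers return new lists rather than mutating the import, so the declared chapter list stays available if the writer changes her mind.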

Stage 2: Editor, annotations attach to chapters not to the book

Once the chapter list is finalized, the writer runs the two Auto-Assign passes: Characters and Sounds. Both run per chapter, and the output of each pass is annotation state owned by the chapter:

The speaker map Auto-Assign Characters produces is a per-line tag living on that chapter's text.
The sound annotations Auto-Assign Sounds places (music beds, ambient soundscapes, one-shot sound effects) are positioned at offsets within that chapter's body.
The per-character voice assignments are project-level, but they only matter at render time, when a chapter's speaker map references a character.

This per-chapter ownership is the practical reason chapter boundaries get to be load-bearing. Re-running Auto-Assign Characters on a single chapter does not touch the speaker maps of the others. Editing a character's voice in the Characters panel changes the audio of every chapter where that character speaks once those chapters are re-generated, but it does not re-trigger the Auto-Assign pass, and chapters where she does not speak are unaffected. There is no global state to corrupt by working chapter-locally.
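One consequence of keeping speaker maps per chapter is that "which chapters does this voice change affect?" becomes a simple lookup. The Python sketch below illustrates the idea; the function name and the shape of the speaker maps (chapter title mapped to a line-number-to-character dictionary) are assumptions, not the product's actual storage format.

```python
def chapters_needing_rerender(
    speaker_maps: dict[str, dict[int, str]], character: str
) -> list[str]:
    """Return the titles of chapters whose speaker map references the character."""
    return [
        title
        for title, speaker_by_line in speaker_maps.items()
        if character in speaker_by_line.values()
    ]


# Example: Mara's voice changes; only the chapters she speaks in are affected.
speaker_maps = {
    "Chapter 1": {0: "Narrator", 1: "Mara"},
    "Chapter 2": {0: "Narrator", 1: "Tobias"},
    "Chapter 3": {0: "Mara", 1: "Tobias"},
}
affected = chapters_needing_rerender(speaker_maps, "Mara")
```

Nothing in this lookup reads or writes another chapter's annotations, which is the chapter-local property described above.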

When the same characters span a series, the import-characters-from-another-project action (three-dot menu next to Add Character) carries the full character list, with voice assignments, across into the new book. The chapter-level annotation state stays per-book; only the voice library is reused.

Stage 3: Render, one chapter one audio file

Generating audio runs against one chapter at a time. The chapter's body, the speaker map, the per-paragraph emotion tags, and the per-paragraph sound annotations all feed into a single render that produces one finished audio file. The chapter intro (the title read aloud, an optional intro sound, a configurable pause) sits at the head of that file.

A few things follow from that:

When the writer has enabled the chapter intro, the chapter file always begins with it. If the listener jumps to chapter 7 in their player, the first thing they hear is "Chapter 7" (or the custom ${name} template the writer set up in Edit Project, Chapter Intro), then the configured pause, then the chapter body.
Pauses inside the chapter are governed by the project-wide default plus any per-paragraph overrides the writer set in the editor. Multiple consecutive paragraph breaks collapse to a single pause, so an EPUB that uses double-blank-lines for scene transitions does not accidentally produce a multi-second silence.
The render is the unit of repetition. If a writer wants to swap a character voice mid-production, she changes the voice in the Characters panel and re-generates the chapters that character speaks in. The rest does not need to re-render.

Stage 4: Export, chapter as the downloadable unit

Each chapter is downloadable as its own audio file from the project. This is the part that closes the loop with how audiobook distribution actually wants the audio: most upload flows ask for one file per chapter, with the chapter title attached as metadata. The per-chapter file matches that one-to-one.

AudioProducer.ai does not handle the upload to any specific platform; the writer takes the per-chapter audio files from there and uploads them wherever she is publishing. But the export shape is already lined up with what major audiobook platforms expect: one finished audio file per chapter, each named after the chapter, each ready to drop into a publisher's upload form.
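If you post-process the downloaded files before uploading, a naming helper along these lines keeps the one-file-per-chapter mapping easy to read. This is a sketch under assumptions: the function name is hypothetical, the zero-padded number prefix and the .mp3 extension are illustrative choices rather than what the export produces, and the character filter is deliberately conservative.

```python
import re


def export_filename(index: int, title: str, extension: str = ".mp3") -> str:
    """Build a filesystem-friendly name for one chapter's audio file."""
    # Replace anything that is not a word character, hyphen, or space.
    safe = re.sub(r"[^\w\- ]+", " ", title)
    # Collapse the whitespace left behind and trim the ends.
    safe = re.sub(r"\s+", " ", safe).strip()
    return f"{index:02d} - {safe or 'Untitled'}{extension}"


name = export_filename(7, "Chapter 7: The Storm/Aftermath")
```

The number prefix keeps files sorted in reading order in a file browser even when chapter titles do not start with a number.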

Takeaway

Chapter-marker survival end-to-end is unglamorous and load-bearing at the same time. The chapter list is declared in the EPUB, projected into the editor as the unit Auto-Assign runs against, anchored as one render per chapter, and exported as one file per chapter. Each stage assumes the chapter list it inherits is the truth. Most of what writers spend time on after import (renaming, removing front matter, deciding which Part dividers stay) is shaping that list to be exactly the chapter list the listener will eventually navigate.

If you want to hear what the per-chapter shape sounds like in practice, the AudioProducer.ai audiobook samples walk through the finished side of the pipeline.

Disclosure: this article was drafted by an AI agent working on behalf of the AudioProducer.ai team.