Importing an EPUB into an AI voice pipeline: what the chapter list looks like before audio runs

#ai #audio #tts #writing

If you build any pipeline that takes a book as input, the first concrete problem is not "what does the model do." It is "how do I get the book into a shape the model can consume, chapter by chapter, without typing it twice."

In AudioProducer.ai that input shape is one of two things: a blank project where you paste text directly, or an EPUB upload that populates the project's chapter list automatically. We support EPUB and paste. We do not support .docx, .txt, .pdf, or .mobi today; for those, the route is to convert the source to EPUB first (Calibre is the usual answer) and then import.
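For the convert-first route, a minimal sketch of scripting the conversion with Calibre's ebook-convert command-line tool, which takes an input file and an output file and infers the formats from the extensions. The function names and the output-directory handling are illustrative assumptions, not part of AudioProducer.ai, and the sketch assumes Calibre is installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, output_dir="."):
    """Build the ebook-convert argument list for turning `source` into an EPUB."""
    src = Path(source)
    dst = Path(output_dir) / (src.stem + ".epub")
    # ebook-convert picks input and output formats from the file extensions.
    return ["ebook-convert", str(src), str(dst)]


def convert_to_epub(source, output_dir="."):
    """Run Calibre's converter and return the path of the EPUB it was asked to write."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    cmd = build_convert_command(source, output_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Conversion quality depends on the source format, so it is worth opening the resulting EPUB in a reader before uploading it.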

This article is about the EPUB side: what comes across when you upload one, the predictable places where the chapter structure does not match what the writer expected, and how the review surface in front of audio generation handles those cases before any compute is spent.

What an EPUB actually is, briefly

An .epub file is a ZIP archive with a known directory layout. Inside it there is a package document (conventionally named content.opf) whose manifest lists every document in the package; a navigation document (commonly nav.xhtml for EPUB 3, or toc.ncx for the older EPUB 2 path) that defines the table of contents; and one or more XHTML files that hold the actual prose.
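To see that layout for yourself, here is a small sketch using only the Python standard library. It reads META-INF/container.xml to find the package document, then lists the manifest entries. The function names are made up for illustration, and this is not a description of how AudioProducer.ai parses uploads.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_package_path(zf):
    """Return the path of the package document named in META-INF/container.xml."""
    root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]


def list_manifest(source):
    """Return (href, media_type, properties) for every manifest item.

    `source` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        opf = ET.fromstring(zf.read(find_package_path(zf)))
    items = []
    for item in opf.findall("opf:manifest/opf:item", OPF_NS):
        items.append(
            (item.get("href"), item.get("media-type"), item.get("properties", ""))
        )
    return items
```

In an EPUB 3 package, the navigation document is the manifest item whose properties include nav, whatever the file happens to be called.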

The interesting fact for any chapter-splitting code: there is no single canonical way an EPUB encodes "where chapter 4 begins." Some books have one XHTML file per chapter, with the nav document pointing at each file's root. Some have a smaller number of XHTML files, each containing several chapters, with the nav document pointing at specific anchors inside them. Some have one big file with chapter titles as <h1> or <h2> headings and no nav-doc detail beyond a single top-level entry. All three are valid EPUB, and all three are real in the wild.
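A rough way to tell those three shapes apart is to look at where the nav entries point. The sketch below takes the list of hrefs from a nav document and labels the layout. The function name, the labels, and the classification rule are illustrative assumptions, and real books will mix the patterns.

```python
from collections import Counter
from urllib.parse import urldefrag


def classify_nav_layout(nav_hrefs):
    """Guess which of the three layouts a list of nav hrefs suggests."""
    # Split each href into (file, fragment), e.g. "part1.xhtml#ch2".
    targets = [urldefrag(href) for href in nav_hrefs]
    entries_per_file = Counter(target.url for target in targets)

    if len(targets) <= 1:
        # One top-level entry (or none): headings inside the file are the
        # only chapter signal left.
        return "single-entry"
    if any(count > 1 for count in entries_per_file.values()):
        # Several entries land in the same file, so they must use anchors.
        return "anchors-within-files"
    return "file-per-chapter"
```

For example, classify_nav_layout(["ch1.xhtml", "ch2.xhtml"]) falls into the file-per-chapter case, while two entries pointing into the same file with different fragments fall into the anchors case.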

If you have ever tried writing your own EPUB-to-chapters parser, you already know the implication: there is no single rule that covers every file. Whatever you do is going to be wrong on some books, which means you need a surface in front of the writer that lets them see what you got and adjust before the rest of the pipeline runs.

What we populate when you upload one

When an EPUB lands in AudioProducer.ai as a new project, three things show up in the project: the chapter list (each item gets a row), the chapter titles as they were in the source, and the body text of each chapter ready to be marked up. From the writer's point of view, the slow part of bootstrapping a project is now done. They can open chapter 1, run Auto-Assign Characters and Auto-Assign Sounds, pick voices, and generate audio.

The chapter list is the review surface. It is intentionally separate from the audio pipeline. Auto-Assign runs and audio generation both work per-chapter, and they both cost compute. The chapter list is where you fix structure before that cost is paid. If the import produced ten chapters but your book has eight, this is the place to see and reconcile that.

Where the chapter list does not match expectations

The most common reasons the auto-populated chapter list does not match what the writer pictured, in roughly the order we see them:

Front matter shows up as chapters. Many EPUBs include a copyright page, dedication, epigraph, acknowledgments, or a publisher logo page as separate XHTML files with their own nav entries. From the EPUB's point of view they are chapters. From the writer's point of view they are not. They appear in the imported list, often before chapter 1.

Back matter shows up as chapters. Same pattern in reverse. About-the-author, "also by this author" lists, sample chapters of a different book, ad pages. They were structurally chapters in the source; they probably should not be narrated.

Section grouping reads as a chapter. Books that have Parts (Part I, Part II) sometimes encode each Part's title page as its own XHTML file with a nav entry. That title page is then one row in the chapter list, with the actual numbered chapters following.

Chapter titles are not the titles the writer would have picked. EPUB metadata is sometimes optimized for an e-reader's navigation pane, not for being read aloud. Titles like Chapter_01_v3_final are real. Titles that are just numbers (1, 2, 3) are common. Titles in mixed case where the writer's manuscript was all-caps, or vice versa, happen routinely.

Chapter boundaries do not match the writer's mental boundaries. Some books bundle several short numbered chapters into one XHTML file, and the EPUB's nav document does not have anchor-level granularity into that file. The imported chapter list ends up with one row whose body text contains Chapter 17, Chapter 18, and Chapter 19 together.

None of these are bugs in the EPUB or in any specific parser; they are just consequences of the format being structurally permissive. The takeaway in the pipeline is that the chapter list is where you discover them.
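If you are building your own importer, the mismatches above are regular enough to flag automatically for the reviewer. What follows is a heuristic sketch only: the Row type, the flag names, the keyword list, and the 50-word threshold are all assumptions made for illustration, not AudioProducer.ai behavior, and any rule like this will have false positives.

```python
import re
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical shape of one imported chapter-list row."""
    title: str
    body: str


MATTER = re.compile(
    r"\b(copyright|dedication|epigraph|acknowledge?ments|about the author|also by)\b",
    re.I,
)
PART_PAGE = re.compile(r"^part\s+([ivxlcdm]+|\d+)\b", re.I)
FILENAME_LIKE = re.compile(r"_|\.x?html?$", re.I)
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+\d+\b", re.I | re.M)


def flag_row(row):
    """Return a list of reasons this row may deserve a second look."""
    flags = []
    title = row.title.strip()
    if MATTER.search(title):
        flags.append("front-or-back-matter")
    # A Part title with almost no body is probably a section title page.
    if PART_PAGE.match(title) and len(row.body.split()) < 50:
        flags.append("part-title-page")
    if title.isdigit():
        flags.append("number-only-title")
    if FILENAME_LIKE.search(title):
        flags.append("filename-like-title")
    # More than one "Chapter N" heading in the body suggests bundled chapters.
    if len(CHAPTER_HEADING.findall(row.body)) > 1:
        flags.append("several-chapters-in-one-row")
    return flags
```

The point of flags like these is to direct attention, not to delete anything: the decision to remove or rename a row stays with the writer.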

What the review surface does

The chapter list shows, for each row, the chapter title and the body that landed in that row. Two things you can do before running Auto-Assign on anything:

Remove a chapter. Front matter, back matter, and section title pages you do not want narrated come out here. If you decide later that you do want the copyright page read out (some audiobooks do open with one), you can rebuild the project; in practice the more common move is to keep them out.

Rename a chapter. The chapter intro feature reads the chapter name out loud at the start of every chapter, optionally with a custom template like Now beginning ${name}. That means the chapter title is not just an organizational label; it is content that listeners hear. Chapter_01_v3_final becomes a real audio artifact unless you rename it. The chapter list is where that rename happens, before any audio generation runs.
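To preview what a listener would hear, you can expand the intro template against each title yourself. This sketch uses Python's string.Template, which happens to use the same ${name} placeholder syntax. How AudioProducer.ai expands the template internally is not something this article specifies, so treat the function below as a stand-in for illustration.

```python
from string import Template


def render_intro(template, chapter_name):
    """Fill the ${name} placeholder with the chapter title."""
    return Template(template).substitute(name=chapter_name)


def preview_intros(template, titles):
    """Return the spoken intro line for every chapter title, in order."""
    return [render_intro(template, title) for title in titles]
```

Running preview_intros("Now beginning ${name}", ["Chapter_01_v3_final", "The Long Road"]) gives "Now beginning Chapter_01_v3_final" and "Now beginning The Long Road", which makes the case for renaming the first one before any audio is generated.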

Once the chapter list reads the way the writer expects, the rest of the workflow is per-chapter and reversible: open a chapter, click Auto-Assign Characters, click Auto-Assign Sounds, pick voices, listen to the generated audio, edit lines or markup that did not land right, re-generate.

What this does not handle

A few honest limits worth naming.

Other manuscript formats. As above, we support EPUB and blank/paste. .docx, .txt, .pdf, and .mobi are not currently importable. Pasting chapter-by-chapter into blank projects is the workaround. Converting to EPUB first is the better workaround for anything over a few chapters.

EPUBs whose internal structure encodes "chapter" in a way the writer disagrees with. The chapter list reflects what the EPUB says, not what the writer meant. The fix is in the review step, not in the import step. If the same mis-split recurs on every import (e.g., every Part page comes through as its own row), it is often easier to re-export the EPUB from your source with cleaner structure than to fix it in the project every time.

Re-import. If you change the EPUB and want to re-import it into the same project, the pattern is to start a new project rather than overwrite. Project-level customizations (voices, sound design, edits in the markup) live with the project, so a re-import on top would not be a clean merge.

What this changes about the workflow

The shortest read on what EPUB import gives you, before any model touches the manuscript: a chapter list to look at, on a page that does not cost compute, with the option to remove and rename rows until the structure matches the book the writer thinks they are working on. The audio pipeline runs after that, per-chapter and on your signal. The cost-bearing steps (Auto-Assign passes, audio generation) sit on the other side of a review the writer controls.

If you want to see what your own manuscript looks like through this path, the free tier is the easiest way in: 1,200 words per month, no credit card. Upload an EPUB, look at the chapter list, fix anything that landed wrong, run Auto-Assign on chapter 1, generate the audio, and listen. Most of the questions a writer has about whether the rest of the workflow fits how they work are answerable inside that one chapter.

Start a project at audioproducer.ai.

Disclosure: this article was drafted by an AI agent working on behalf of the AudioProducer.ai team.