Mykyta Chernenko

I Built an AI Pipeline for Books, Here's the Architecture

We Treated Book Generation as a Compiler Pipeline. Here's What We Learned From 50K Books.

Most AI writing tools are chat wrappers. Paste a prompt, get text, copy into Google Docs, repeat. For a full book that's hundreds of round trips, and you lose all context between them.

I've spent 3 years in the AI + publishing space. Published books myself, built a reading platform (NanoReads, 130+ books, 341K readers), talked to hundreds of authors. The same complaints kept coming up: AI loses track of what happened 10 chapters ago, every chapter sounds different, dialogue is flat, and the output is full of "Moreover," and "Furthermore," and "It's worth noting that."

These aren't model quality problems. After generating 50K+ books on our platform (AIWriteBook), we're pretty confident the bottleneck is the specification pipeline, not the language model.

The architecture

We treat book creation as a multi-stage compilation pipeline:

```
Book Metadata -> Character Graph -> Chapter Outlines -> Chapter Content
     |               |                  |                  |
  (schema)       (schema)           (schema)          (streaming)
```

Each stage produces schema-constrained structured output that feeds the next stage. Nothing is freeform until the final prose generation.
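The stage contracts can be sketched as TypeScript types. This is illustrative, not our actual code: the interface and function names are hypothetical, and only the shapes mirror the JSON examples below.

```typescript
// Illustrative stage contracts: each stage consumes the previous
// stage's schema-constrained output. All names are hypothetical.
interface BookMetadata {
  title: string;
  genres: string[];
  tone: string[];
  style: string[];
}

interface Character {
  name: string;
  voice: string;
  motivation: string;
  arc: string;
}

interface ChapterOutline {
  chapter_number: number;
  title: string;
  events: string[];
  characters: string[]; // names of characters present in this chapter
  word_count: number;
}

// Cheap runtime guard before handing metadata to the next stage.
function isBookMetadata(x: any): x is BookMetadata {
  return typeof x?.title === "string" &&
    Array.isArray(x?.genres) &&
    Array.isArray(x?.tone) &&
    Array.isArray(x?.style);
}
```

Because every intermediate artifact is typed, a stage can reject malformed output and retry instead of silently passing garbage downstream.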

Stage 1: Book metadata

User provides title + description. AI generates structured details:

```json
{
  "title": "The Dragon's Reluctant Mate",
  "genres": ["Fantasy", "Romance"],
  "tone": ["dark", "romantic", "suspenseful"],
  "style": ["dialogue-heavy", "fast-paced"],
  "target_audience": "Adult fantasy romance readers",
  "plot_techniques": ["enemies-to-lovers", "slow-burn", "foreshadowing"],
  "writing_style": "..."
}
```

Everything downstream uses this as context. Tone, style, and audience are constraints, not suggestions.

Stage 2: Character graph

Each character is a structured node with voice, motivation, arc, internal conflict. The important bit: when generating a chapter, we only pass the characters present in that chapter. The model gets their specific voice patterns, current arc position, relationship dynamics with the other characters in the scene.

```json
{
  "name": "Kira Ashvane",
  "role": "protagonist",
  "voice": "Sharp, clipped sentences. Uses sarcasm as defense.",
  "motivation": "Prove she doesn't need the dragon clan's protection",
  "internal_conflict": "Craves belonging but fears vulnerability",
  "arc": "Isolation -> reluctant alliance -> trust -> sacrifice"
}
```

This is why dialogue doesn't all sound the same. The model has explicit voice specs per character instead of trying to infer them from nothing.
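The "only pass the characters present in this chapter" rule is a simple filter over the character graph. A minimal sketch, with hypothetical names:

```typescript
interface CharacterNode {
  name: string;
  voice: string;
  arc: string;
}

// Keep the prompt context small: select only the characters the outline
// says appear in this chapter, preserving their full voice specs.
function charactersForChapter(
  graph: CharacterNode[],
  presentNames: string[],
): CharacterNode[] {
  const present = new Set(presentNames);
  return graph.filter((c) => present.has(c.name));
}
```

The payoff is twofold: shorter prompts, and no token budget wasted on characters the model should not be writing in this scene.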

Stage 3: Chapter outlines

This turned out to be the most important stage. Each chapter gets a spec:

```json
{
  "chapter_number": 3,
  "title": "The Binding Ceremony",
  "events": ["Kira is forced to attend the bonding ritual", "..."],
  "locations": ["Dragon temple, obsidian halls lit by bioluminescent moss"],
  "twists": ["The ritual reveals Kira has dormant dragon magic"],
  "character_interactions": [
    {
      "characters": ["Kira", "Draethor"],
      "dynamic": "hostile tension with undercurrent of curiosity"
    }
  ],
  "word_count": 2800
}
```

We ran an internal comparison: same book concept, same voice training, one group used the default generated outline, the other spent time customizing it.

| Metric | Default outline | Customized outline |
| --- | --- | --- |
| Export rate | 16% | 34% |
| Satisfaction | 3.4/5 | 4.3/5 |
| Regenerations/chapter | 1.8 | 0.7 |
| Completion rate | 41% | 72% |

A mediocre model with a detailed outline beats a good model with a vague outline. This is the same lesson as software: garbage requirements produce garbage output regardless of how good the team is.

Stage 4: Chapter generation

The only streaming stage. The model receives book metadata, relevant characters with voice specs, this chapter's outline, previous chapter summaries for continuity, and the author's writing style samples.

Two-model strategy: Gemini Flash for all structural work (fast, cheap, good at structured output), frontier model for actual prose.
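The two-model split is effectively a routing table keyed on pipeline stage. A sketch (the model identifiers here are placeholders, not real API model IDs):

```typescript
type Stage = "metadata" | "characters" | "outline" | "prose";

// Structural stages go to the fast, cheap model; only the final prose
// stage pays for the frontier model. Identifiers are placeholders.
function modelForStage(stage: Stage): string {
  return stage === "prose" ? "frontier-model" : "gemini-flash";
}
```

Since the structural stages produce schema-constrained JSON rather than prose, the cheap model is good enough for three of the four stages, which keeps per-book cost dominated by the one stage where quality actually shows.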

Voice training

Authors can upload 3-5 writing samples. We extract style features and use them as few-shot examples during generation.

The numbers from our data:

  • 2.4x higher export rate with voice training
  • 41% fewer regeneration requests
  • 67% less manual editing

Fewer than 3 samples: marginal improvement. More than 5: diminishing returns. We were surprised how narrow the sweet spot is.

Without voice training, the output sounds like default GPT. Authors recognize it instantly and either abandon the book or spend hours rewriting. With voice training, most of the "AI slop" problem disappears. The model is capable of varied prose, it just needs examples to anchor on.
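One way to wire the samples in as few-shot anchors, clamped to the 3-5 sample sweet spot described above. The prompt wording and function name are invented for illustration:

```typescript
// Clamp to the observed sweet spot: fewer than 3 samples barely helps,
// more than 5 hits diminishing returns, so we never pass more than 5.
function buildVoicePrompt(samples: string[], instruction: string): string {
  const picked = samples.slice(0, 5);
  const shots = picked
    .map((s, i) => `Example ${i + 1} of the author's voice:\n${s}`)
    .join("\n\n");
  return `${shots}\n\nMatch the voice shown above.\n${instruction}`;
}
```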

Fiction and nonfiction are different pipelines

Fiction uses the character graph + plot continuity pipeline above.

Nonfiction is a separate architecture. Authors upload reference materials (research papers, coaching notes, blog posts, whatever their source material is). We extract content pieces and assign relevant ones to each chapter.

```
Reference Files -> Content Extraction -> Book Structure Selection
                                               |
                               Chapter Outlines (with assigned references)
                                               |
                               Chapter Content (with citations)
```

Nonfiction with reference materials vs. without:

  • 38% higher export rate
  • Satisfaction: 4.4/5 vs. 3.5/5

When the model has specific data, named studies, and real quotes to ground its writing, the output gets noticeably better. Without references it falls back on training-data generalizations, and readers can feel the difference.
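Assigning extracted pieces to chapters is a relevance-matching problem. As a naive sketch, here is keyword-overlap scoring; the real assignment in the pipeline is model-driven, and all names here are hypothetical:

```typescript
interface ContentPiece {
  id: string;
  text: string;
}

// Naive relevance: count word overlap between a chapter's topic keywords
// and each extracted piece, then keep the top N scoring pieces.
function assignReferences(
  pieces: ContentPiece[],
  chapterKeywords: string[],
  topN: number,
): ContentPiece[] {
  const kw = new Set(chapterKeywords.map((k) => k.toLowerCase()));
  return pieces
    .map((p) => ({
      piece: p,
      score: p.text.toLowerCase().split(/\W+/).filter((w) => kw.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.piece);
}
```

Even this crude version illustrates the design constraint: a chapter should only ever see the references assigned to it, not the whole upload, for the same context-budget reason as the character filter.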

Things we learned from 50K books

Chapter length sweet spot is 2,000-3,500 words. Below that, chapters feel underdeveloped. Above 3,500, the model starts repeating itself with different phrasing, introducing tangents, padding with unnecessary description. Above 5,000, quality drops hard. If a chapter needs to be long, splitting it works better than generating one long one.
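The "split rather than generate long" heuristic in sketch form, using the 3,500-word cliff from the numbers above (the function is illustrative, not our production code):

```typescript
// Split any word-count target above the quality cliff into roughly equal
// parts that each land inside the 2,000-3,500 word sweet spot.
function splitWordTarget(target: number, max = 3500): number[] {
  if (target <= max) return [target];
  const parts = Math.ceil(target / max);
  const each = Math.round(target / parts);
  const out: number[] = Array(parts).fill(each);
  out[parts - 1] = target - each * (parts - 1); // absorb rounding remainder
  return out;
}
```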

Genre matters a lot. Romance has a 31% export rate. Literary fiction has 11%. Humor is 13%. Poetry is 9%. The pattern: AI does well with genres that have established conventions and lots of training data. It struggles with voice-dependent and creativity-dependent writing. Makes sense intuitively, but it was useful to see the numbers.

Only 23% of generated books get exported for publishing. The ones that do share traits: 3.2x more time on outline editing, voice training enabled in 74% of cases, at least one manual edit in 89% of chapters. The books that make it to publish are iterated on, not one-click generated.

Multilingual quality varies a lot. Spanish, French, German are close to English quality. Polish, Russian, Japanese, Korean are good but noticeably lower. Smaller languages are usable for drafts. This maps directly to training data volume. For authors writing in smaller languages, generating in English and translating works better than generating natively.

Stack

  • Frontend: Next.js, Tailwind, Supabase client
  • Backend: Supabase Edge Functions (Deno)
  • AI: Gemini Flash (structural), frontier models (prose)
  • 30+ languages supported

Wrapping up

The main thing we took away from building this: the quality problem in AI-generated books is a specification problem, not a model problem. If you give the model a vague prompt and hit generate, you get slop. If you give it a detailed character graph, a structured outline, voice samples, and proper constraints, the output is genuinely good.

If you want to poke at it, there's a free tier that gives you a full 7-chapter book: aiwritebook.com

Happy to answer questions about the architecture, the data, or anything about AI + publishing.


Tags: #ai #writing #books #showdev #webdev #productivity
