German Yamil

Posted on May 3 • Edited on May 22

Python Content Pipeline Architecture: The JSON Queue That Drives Everything

#python #automation #tutorial #beginners

The outline.json Format That Drives My Automated Python Ebook Pipeline

Everything in the pipeline starts with one file: outline.json.

It's the manifest. It defines what the pipeline generates, validates, translates, and publishes. Change the file, run the pipeline, get a different book.

Here's the full format, every field, and a real working example.

🎁 Free: AI Publishing Checklist — 7 steps in Python · Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)

The Schema

{
  "title": "string — full book title",
  "subtitle": "string — used in Gumroad listing and KDP metadata",
  "author": "string",
  "language_primary": "en",
  "language_secondary": "es",
  "target_word_count": 22000,
  "chapters": [
    {
      "number": 1,
      "title": "string — chapter title (used as H1)",
      "slug": "string — used for filename: chapter-01-intro",
      "word_target": 2200,
      "code_file": "string — script_01_intro.py",
      "learning_objective": "string — what the reader can do after this chapter",
      "prerequisites": ["chapter-slug-1"],
      "tags": ["python", "automation"],
      "notes": "string — optional hints for the generation prompt"
    }
  ],
  "style_guide": {
    "voice": "string",
    "audience": "string",
    "code_conventions": ["Use only Python stdlib", "All functions must have docstrings"],
    "avoid": ["passive voice", "marketing language"]
  },
  "gumroad": {
    "price_cents": 999,
    "customizable_price": true,
    "tags": ["python", "ebook", "automation"]
  }
}

Field Reference

Top-level fields

Field	Type	Required	Description
`title`	string	✅	Full book title. Used in EPUB metadata and Gumroad listing
`subtitle`	string	✅	Subtitle for KDP and Gumroad. Aim for keyword richness
`author`	string	✅	Author name as it appears on the cover and EPUB metadata
`language_primary`	string	✅	Source language code (`en`)
`language_secondary`	string	❌	Target language for translation (`es`). Omit to skip translation
`target_word_count`	integer	❌	Total book target. Used for validation: the sum of the chapters' `word_target` values must be within 15% of this number
`chapters`	array	✅	Array of chapter objects (see below)
`style_guide`	object	❌	Injected into every chapter prompt to enforce consistent voice
`gumroad`	object	❌	Used by `gumroad_create.py` to build the listing automatically
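The `target_word_count` rule in the table can be sketched as a small check. This is a minimal illustration, not the pipeline's own validator: the name `check_word_budget` is hypothetical, and measuring the 15% against `target_word_count` (not against the chapter sum) is an assumption.

```python
def check_word_budget(outline: dict, tolerance: float = 0.15) -> bool:
    """Return True if the chapter word targets sum to within `tolerance` of the book target.

    Hypothetical helper. Assumption: the tolerance is relative to target_word_count.
    """
    target = outline.get("target_word_count")
    if target is None:
        # The field is optional, so there is nothing to validate against.
        return True
    total = sum(ch["word_target"] for ch in outline["chapters"])
    return abs(total - target) <= tolerance * target


# Ten chapters of 2,200 words match a 22,000-word target exactly.
outline = {
    "target_word_count": 22000,
    "chapters": [{"word_target": 2200} for _ in range(10)],
}
print(check_word_budget(outline))
```

Returning `True` when the field is missing mirrors the table: `target_word_count` is optional, so its absence should not block a run.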

Chapter fields

Field	Type	Required	Description
`number`	integer	✅	Chapter number. Determines processing order and filename prefix
`title`	string	✅	Chapter title. Used as H1 in the generated markdown
`slug`	string	✅	Filename-safe identifier. Output files: `{slug}-en.md`, `{slug}-es.md`, `{slug}.py`
`word_target`	integer	✅	Target word count for this chapter. Enforced: ±15% tolerance
`code_file`	string	✅	Output Python script name. This file goes through both validation gates
`learning_objective`	string	✅	Injected into prompt: "After this chapter, the reader will be able to..."
`prerequisites`	array	❌	List of chapter slugs that must be in `DONE` state before this chapter can start
`tags`	array	❌	Topic tags for this chapter. Used to tune prompt focus
`notes`	string	❌	Free-form hints injected into the generation prompt for this chapter only
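Two of the chapter rules above, the ±15% tolerance on `word_target` and the `prerequisites` gate, are simple enough to sketch. The function names and the `states` mapping (slug to state name) are hypothetical, and how the real pipeline stores chapter state is not shown in this post.

```python
from __future__ import annotations


def within_tolerance(actual_words: int, word_target: int, tolerance: float = 0.15) -> bool:
    """Check a generated chapter's word count against its target (±15% by default)."""
    return abs(actual_words - word_target) <= tolerance * word_target


def is_ready(chapter: dict, states: dict[str, str]) -> bool:
    """A chapter may start only when every prerequisite slug is in the DONE state.

    `states` is an assumed mapping of chapter slug -> state name.
    """
    return all(states.get(slug) == "DONE" for slug in chapter.get("prerequisites", []))


chapter_2 = {
    "slug": "chapter-02-validation",
    "word_target": 2200,
    "prerequisites": ["chapter-01-architecture"],
}
print(is_ready(chapter_2, {"chapter-01-architecture": "RUNNING"}))
print(is_ready(chapter_2, {"chapter-01-architecture": "DONE"}))
print(within_tolerance(2000, chapter_2["word_target"]))
```

A chapter with no `prerequisites` key is always ready, because `all()` over an empty list is `True`.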

style_guide fields

Field	Type	Description
`voice`	string	Tone descriptor injected into every prompt: `"direct, technical, no marketing language"`
`audience`	string	Audience definition: `"Python developers with 2+ years experience"`
`code_conventions`	array	Rules applied to every code block: `["Use only Python stdlib", "All variable names in English"]`
`avoid`	array	Patterns to suppress: `["passive voice", "hedging language", "numbered lists for 2-item sets"]`
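One way the `style_guide` object could be turned into prompt text is shown below. This is only a sketch: `render_style_guide` and the label wording ("Voice:", "Avoid:") are assumptions, and the actual prompt template used by the pipeline is not part of this post.

```python
def render_style_guide(style_guide: dict) -> str:
    """Turn a style_guide object into a plain-text preamble for a generation prompt.

    Hypothetical helper; the line labels are illustrative, not the pipeline's template.
    """
    lines = []
    if style_guide.get("voice"):
        lines.append(f"Voice: {style_guide['voice']}")
    if style_guide.get("audience"):
        lines.append(f"Audience: {style_guide['audience']}")
    for rule in style_guide.get("code_conventions", []):
        lines.append(f"Code rule: {rule}")
    for pattern in style_guide.get("avoid", []):
        lines.append(f"Avoid: {pattern}")
    return "\n".join(lines)


style_guide = {
    "voice": "direct, technical, no marketing language",
    "audience": "Python developers with 2+ years experience",
    "code_conventions": ["Use only Python stdlib"],
    "avoid": ["passive voice"],
}
print(render_style_guide(style_guide))
```

Every field is read with `.get()`, so a partial or empty style guide produces a shorter preamble and does not raise an error.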

Real Working Example

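This is the outline.json used to produce The AI Publishing Pipeline. The listing shows five chapters, whose word targets sum to 11,000; the 22,000-word target and the 10 chapters mentioned later suggest the remaining entries are omitted here: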

{
  "title": "The AI Publishing Pipeline",
  "subtitle": "Automated Ebook System for Python Developers",
  "author": "German Yamil",
  "language_primary": "en",
  "language_secondary": "es",
  "target_word_count": 22000,
  "chapters": [
    {
      "number": 1,
      "title": "Architecture Overview: The Four-State Pipeline",
      "slug": "chapter-01-architecture",
      "word_target": 2200,
      "code_file": "script_01_state_machine.py",
      "learning_objective": "set up the chapter state machine and understand PENDING, RUNNING, DONE, NEEDS_REVIEW transitions",
      "tags": ["python", "architecture", "state-machine"]
    },
    {
      "number": 2,
      "title": "Code Validation: AST Parsing and Subprocess Isolation",
      "slug": "chapter-02-validation",
      "word_target": 2200,
      "code_file": "script_02_validation.py",
      "learning_objective": "implement two-gate code validation that prevents broken scripts from shipping",
      "prerequisites": ["chapter-01-architecture"],
      "notes": "Show both gates as composable functions. Include a deliberate failure example."
    },
    {
      "number": 3,
      "title": "Crash Recovery: Making Long Runs Resumable",
      "slug": "chapter-03-crash-recovery",
      "word_target": 2200,
      "code_file": "script_03_recovery.py",
      "learning_objective": "implement startup state normalization so any crash is safely recoverable"
    },
    {
      "number": 4,
      "title": "Translation QA: Bilingual Output with Semantic Validation",
      "slug": "chapter-04-translation",
      "word_target": 2200,
      "code_file": "script_04_translation_qa.py",
      "learning_objective": "generate Spanish translations and validate them with code fence diffing and word ratio checks"
    },
    {
      "number": 5,
      "title": "EPUB Assembly: Pandoc, Metadata, and epubcheck",
      "slug": "chapter-05-epub",
      "word_target": 2200,
      "code_file": "script_05_epub_assembly.py",
      "learning_objective": "assemble chapters into a validated EPUB3 file using Pandoc with proper metadata"
    }
  ],
  "style_guide": {
    "voice": "direct, technical, first-person singular, no marketing language",
    "audience": "Python developers with 2+ years experience who want to automate content production",
    "code_conventions": [
      "Use only Python stdlib unless the chapter is specifically about a third-party library",
      "All variable names, function names, and comments must be in English even in translated chapters",
      "Every function must have a docstring",
      "Include inline comments for non-obvious logic"
    ],
    "avoid": [
      "passive voice",
      "phrases like 'it is important to note'",
      "numbered lists for sets of 2 items (use prose instead)",
      "ending sections with 'In summary,...'"
    ]
  },
  "gumroad": {
    "price_cents": 999,
    "customizable_price": true,
    "tags": ["python", "ebook", "automation", "publishing"]
  }
}
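Before running a pipeline on an outline like this, it can help to check the cross-references between chapters. The sketch below is an assumption-laden example, not part of the published pipeline: `lint_outline` is a hypothetical name, and it assumes a prerequisite must refer to a chapter listed earlier in the array.

```python
def lint_outline(outline: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen_slugs = set()
    seen_numbers = set()
    for ch in outline["chapters"]:
        if ch["slug"] in seen_slugs:
            problems.append(f"duplicate slug: {ch['slug']}")
        if ch["number"] in seen_numbers:
            problems.append(f"duplicate chapter number: {ch['number']}")
        for prereq in ch.get("prerequisites", []):
            # Assumption: a prerequisite must name a chapter listed earlier.
            if prereq not in seen_slugs:
                problems.append(f"{ch['slug']}: unknown or later prerequisite {prereq}")
        seen_slugs.add(ch["slug"])
        seen_numbers.add(ch["number"])
    return problems


sample = {
    "chapters": [
        {"number": 1, "slug": "chapter-01-architecture"},
        {"number": 2, "slug": "chapter-02-validation",
         "prerequisites": ["chapter-01-architecture"]},
        {"number": 3, "slug": "chapter-03-crash-recovery",
         "prerequisites": ["chapter-09-missing"]},
    ]
}
for problem in lint_outline(sample):
    print(problem)
```

Collecting every problem in a list, instead of raising on the first one, lets you fix a hand-edited outline in a single pass.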

How the Pipeline Uses outline.json

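import json

# Required fields, taken from the field reference tables above
REQUIRED_TOP_LEVEL = ("title", "subtitle", "author", "language_primary", "chapters")
REQUIRED_CHAPTER = ("number", "title", "slug", "word_target", "code_file", "learning_objective")

def load_outline(path: str) -> dict:
    """Load outline.json and check that the required fields are present."""
    with open(path, encoding="utf-8") as f:
        outline = json.load(f)
    # Raise explicit errors: assert statements are skipped when Python runs with -O
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in outline]
    if missing:
        raise ValueError(f"outline is missing required fields: {missing}")
    if not outline["chapters"]:
        raise ValueError("outline must define at least one chapter")
    for ch in outline["chapters"]:
        missing = [key for key in REQUIRED_CHAPTER if key not in ch]
        if missing:
            raise ValueError(f"chapter {ch.get('slug', '?')} is missing: {missing}")
    return outline

outline = load_outline("outline.json")
# style_guide is optional, so fall back to an empty dict when it is absent
style_guide = outline.get("style_guide", {})

# Chapter, ChapterState, and process_chapter are defined elsewhere in the pipeline
for chapter_def in outline["chapters"]:
    chapter = Chapter.from_dict(chapter_def)
    if chapter.state == ChapterState.DONE:
        continue  # skip already-done chapters
    process_chapter(chapter, style_guide)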

The style guide is injected into the generation prompt for every chapter. This is what makes the voice consistent across 10 chapters even though each is generated independently.
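The loop above relies on `Chapter` and `ChapterState`, which this post does not define. A minimal sketch of what they might look like follows, using the four state names from chapter 1's learning objective. The field layout, the default `PENDING` state, and the absence of persisted state are all assumptions; a real pipeline would presumably restore each chapter's saved state and not reset it on every load.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChapterState(Enum):
    """The four states named in the outline: PENDING, RUNNING, DONE, NEEDS_REVIEW."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    NEEDS_REVIEW = "NEEDS_REVIEW"


@dataclass
class Chapter:
    """Hypothetical in-memory form of one entry of outline["chapters"]."""
    number: int
    title: str
    slug: str
    word_target: int
    code_file: str
    learning_objective: str
    prerequisites: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    notes: str = ""
    # Assumption: a chapter loaded from the outline starts as PENDING.
    state: ChapterState = ChapterState.PENDING

    @classmethod
    def from_dict(cls, data: dict) -> "Chapter":
        """Build a Chapter from a chapter definition, filling optional fields."""
        return cls(
            number=data["number"],
            title=data["title"],
            slug=data["slug"],
            word_target=data["word_target"],
            code_file=data["code_file"],
            learning_objective=data["learning_objective"],
            prerequisites=list(data.get("prerequisites", [])),
            tags=list(data.get("tags", [])),
            notes=data.get("notes", ""),
        )


chapter = Chapter.from_dict({
    "number": 3,
    "title": "Crash Recovery: Making Long Runs Resumable",
    "slug": "chapter-03-crash-recovery",
    "word_target": 2200,
    "code_file": "script_03_recovery.py",
    "learning_objective": "implement startup state normalization so any crash is safely recoverable",
})
print(chapter.slug, chapter.state.value)
```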

Forking for a New Book

To produce a new book:

1. Copy the schema above.
2. Change title, subtitle, and author.
3. Write your 10 chapters: a title, slug, and learning_objective for each.
4. Update style_guide.audience and code_conventions for your domain.
5. Run python3 generate_chapters.py --outline outline.json

The pipeline handles everything else.
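The first few forking steps can also be scripted. The sketch below builds a skeleton outline from a list of chapter titles, following the slug and `code_file` naming seen in the example (`chapter-01-...`, `script_01_....py`). The helpers `slugify` and `scaffold_outline` are hypothetical and not part of the published pipeline, and the `TODO` placeholders are meant to be filled in by hand.

```python
import json
import re


def slugify(number: int, title: str) -> str:
    """Build a filename-safe slug such as chapter-01-intro from a chapter title."""
    words = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"chapter-{number:02d}-{words}"


def scaffold_outline(title, subtitle, author, chapter_titles, word_target=2200):
    """Return a skeleton outline dict containing every required field."""
    chapters = []
    for number, chapter_title in enumerate(chapter_titles, start=1):
        slug = slugify(number, chapter_title)
        # Reuse the slug's words for the script name: script_01_intro.py
        suffix = slug.split("-", 2)[2].replace("-", "_")
        chapters.append({
            "number": number,
            "title": chapter_title,
            "slug": slug,
            "word_target": word_target,
            "code_file": f"script_{number:02d}_{suffix}.py",
            "learning_objective": "TODO",
        })
    return {
        "title": title,
        "subtitle": subtitle,
        "author": author,
        "language_primary": "en",
        "target_word_count": word_target * len(chapters),
        "chapters": chapters,
    }


skeleton = scaffold_outline("My Book", "A Subtitle", "Your Name", ["Intro", "Data Loading"])
print(json.dumps(skeleton, indent=2))
```

Setting `target_word_count` to the sum of the chapter targets keeps the skeleton inside the 15% validation window by construction.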

Full pipeline code: germy5.gumroad.com/l/xhxkzz — pay what you want, min $9.99.

If this saved you time, the ❤️ button helps other developers find it.
