If you search "AI ebook generator" you will find SaaS tools that promise a book in minutes. They work fine for general non-fiction. They fail for technical content — broken code examples, shallow explanations, no AST validation, no control over the prompt structure. Building your own takes a day. It pays back on every book after the first.
## Why Off-the-Shelf Tools Fall Short
The core problem is that technical ebooks require:
- Code that actually runs — SaaS generators treat code as text. There is no execution or AST check.
- Consistent terminology — "container" means something specific in Docker chapters. Generic tools drift.
- Controllable depth — a chapter on async Python needs different treatment than a chapter on CI/CD.
- Resumability — a 10-chapter generation run takes 15–20 minutes and costs API credits. If it crashes at chapter 8, you need checkpointing.
None of the major SaaS AI ebook generators handle any of these. They are content mills with an LLM front end.
## The Core Generation Loop
The minimum viable AI ebook generator is a prompt template, a state machine, and an output validator.
```python
#!/usr/bin/env python3
"""
generator.py — LLM generation loop with checkpoint.json state machine
"""
import ast
import json
import re
import sys
from pathlib import Path

import anthropic

CHECKPOINT_FILE = Path("checkpoint.json")
CHAPTERS_DIR = Path("chapters")


# ── State Machine ─────────────────────────────────────────────────────────────

def load_state() -> dict:
    """Load generation state; resume from last successful stage."""
    if CHECKPOINT_FILE.exists():
        state = json.loads(CHECKPOINT_FILE.read_text())
        completed = sum(1 for v in state.get("chapters", {}).values() if v["status"] == "done")
        print(f"Resuming: {completed} chapters already done")
        return state
    return {"stage": "init", "chapters": {}, "errors": []}


def save_state(state: dict):
    CHECKPOINT_FILE.write_text(json.dumps(state, indent=2))


def mark_chapter(state: dict, slug: str, status: str, path: str = ""):
    state["chapters"].setdefault(slug, {})
    state["chapters"][slug]["status"] = status
    if path:
        state["chapters"][slug]["path"] = path
    save_state(state)


# ── Prompt Engineering ────────────────────────────────────────────────────────

SYSTEM_PROMPT = """You are writing a chapter for a technical ebook targeting senior developers.
Rules:
- Every code example must be complete and syntactically valid Python 3.11+
- No filler phrases like "In this chapter we will explore..."
- Use concrete examples, not abstract descriptions
- Target 1100-1300 words
- Markdown output only"""


def build_chapter_prompt(chapter: dict, book_context: str) -> str:
    return (
        f"Book context: {book_context}\n\n"
        f"Chapter title: {chapter['title']}\n"
        f"Topics to cover:\n"
        + "\n".join(f"- {t}" for t in chapter["topics"])
        + "\n\nWrite the full chapter now."
    )


# ── Generation ────────────────────────────────────────────────────────────────

def generate_chapter(
    client: anthropic.Anthropic,
    chapter: dict,
    book_context: str,
    model: str = "claude-sonnet-4-5",
) -> str:
    response = client.messages.create(
        model=model,
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": build_chapter_prompt(chapter, book_context),
        }],
    )
    return response.content[0].text


# ── Validation ────────────────────────────────────────────────────────────────

def extract_python_blocks(content: str) -> list[str]:
    return re.findall(r"```python\n(.*?)```", content, re.DOTALL)


def validate_chapter(content: str) -> tuple[bool, list[str]]:
    """AST-check all Python fences. Returns (ok, errors)."""
    errors = []
    blocks = extract_python_blocks(content)
    if not blocks:
        # Warn but don't fail — some chapters may have no code
        print("  Warning: no Python blocks found")
    for i, block in enumerate(blocks):
        try:
            ast.parse(block)
        except SyntaxError as e:
            errors.append(f"Block {i}: {e.msg} at line {e.lineno}")
    return len(errors) == 0, errors


# ── Main Loop ─────────────────────────────────────────────────────────────────

def run(book_config: dict):
    client = anthropic.Anthropic()
    state = load_state()
    CHAPTERS_DIR.mkdir(exist_ok=True)

    chapters = book_config["chapters"]
    book_context = book_config["context"]

    for chapter in chapters:
        slug = chapter["slug"]
        ch_state = state["chapters"].get(slug, {})

        # Skip completed chapters
        if ch_state.get("status") == "done":
            print(f"  Cached: {slug}")
            continue

        print(f"  Generating: {chapter['title']}")
        mark_chapter(state, slug, "generating")

        try:
            content = generate_chapter(client, chapter, book_context)
        except Exception as e:
            print(f"  API error: {e}")
            mark_chapter(state, slug, "api_error")
            sys.exit(1)

        # Validate
        ok, errors = validate_chapter(content)
        if not ok:
            print("  Validation failed:\n" + "\n".join(errors))
            # Save draft anyway for inspection
            draft_path = CHAPTERS_DIR / f"{slug}_DRAFT.md"
            draft_path.write_text(content)
            mark_chapter(state, slug, "validation_failed", str(draft_path))
            # Continue to next chapter; fix manually and re-run
            continue

        # Persist
        ch_path = CHAPTERS_DIR / f"{slug}.md"
        ch_path.write_text(content)
        mark_chapter(state, slug, "done", str(ch_path))
        print(f"  Done: {ch_path}")

    done = [s for s in state["chapters"].values() if s.get("status") == "done"]
    failed = [s for s in state["chapters"].values() if s.get("status") == "validation_failed"]
    print(f"\nSummary: {len(done)} done, {len(failed)} need review")


# ── Config ────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    config = {
        "context": "A practical guide to FastAPI production deployment on AWS for senior Python developers.",
        "chapters": [
            {
                "slug": "ch01-project-structure",
                "title": "Production FastAPI Project Structure",
                "topics": [
                    "Layered architecture (routers, services, repositories)",
                    "Dependency injection patterns",
                    "Settings management with pydantic-settings",
                ],
            },
            {
                "slug": "ch02-async-patterns",
                "title": "Async Patterns That Actually Matter",
                "topics": [
                    "When async helps and when it hurts",
                    "Background tasks vs Celery",
                    "Avoiding common async pitfalls",
                ],
            },
        ],
    }
    run(config)
```
## The State Machine in Detail
The `checkpoint.json` state machine is the feature that makes this production-grade. States per chapter:

- `generating` — API call in flight
- `done` — content written to disk, AST-clean
- `validation_failed` — draft saved, needs human review
- `api_error` — unrecoverable, pipeline halted
When you re-run after a partial failure, the loop reads checkpoint state and skips done chapters. You only pay for regeneration when validation fails or you explicitly want a fresh version.
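For illustration, here is what a hypothetical `checkpoint.json` might look like after a run where chapter 1 succeeded and chapter 2 failed validation (the field names follow the `mark_chapter` helper above):

```json
{
  "stage": "init",
  "chapters": {
    "ch01-project-structure": {
      "status": "done",
      "path": "chapters/ch01-project-structure.md"
    },
    "ch02-async-patterns": {
      "status": "validation_failed",
      "path": "chapters/ch02-async-patterns_DRAFT.md"
    }
  },
  "errors": []
}
```

On the next run, chapter 1 is skipped as cached and only chapter 2 is regenerated.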
## Custom vs. Off-the-Shelf: The Real Tradeoff
| Factor | SaaS Tool | Custom Pipeline |
|---|---|---|
| Setup time | 5 minutes | 4–8 hours |
| Code validation | None | AST + optional exec |
| Prompt control | Limited templates | Full control |
| Cost per book | $15–50/month subscription | ~$1.50 API costs |
| Resumability | Usually none | Full checkpoint |
| Multi-language | Rare | Straightforward |
For non-technical content — marketing books, guides, how-tos — SaaS tools are fine. For code-heavy technical books, the custom pipeline pays for itself on the first book where you catch a broken example before it ships to readers.
## Extending the Loop
The generation loop above is intentionally minimal. Common extensions:
- Temperature control: lower temperature (0.3–0.5) for more consistent code style
- Retry logic: exponential backoff on API rate limits
- Word count enforcement: post-process and flag chapters outside 1000–1400 words
- Cross-chapter consistency: pass previous chapter summaries as context
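The retry extension, for instance, can be sketched as a small wrapper — `with_retries` is a hypothetical helper of mine, not part of the generator above:

```python
import random
import time


def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # in practice, catch anthropic.RateLimitError
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter to avoid thundering herd
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Retry {attempt + 1}/{max_attempts} in {delay:.1f}s: {exc}")
            time.sleep(delay)
```

Wrapping the `generate_chapter` call in `run()` with this helper turns transient rate-limit errors into short delays instead of a halted pipeline.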
Each extension is a function you own. No SaaS tool gives you that control.
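Word count enforcement, as one example, is a few lines — a minimal sketch, assuming you strip code fences first and accept a naive whitespace split (`check_word_count` is my name, not from the script above):

```python
import re


def check_word_count(content: str, low: int = 1000, high: int = 1400) -> tuple[bool, int]:
    """Count prose words, ignoring fenced code blocks; flag out-of-range chapters."""
    prose = re.sub(r"```.*?```", "", content, flags=re.DOTALL)
    count = len(prose.split())
    return low <= count <= high, count
```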
If you want the complete system — prompts, validation, state machine, Pandoc compilation, Gumroad integration — rather than building from scratch: Python Ebook Automation Pipeline ($12.99, 30-day refund).
📋 Free: AI Publishing Checklist — 7 steps to ship a technical ebook with Python (PDF, free)
Full pipeline + 10 scripts: germy5.gumroad.com/l/xhxkzz — $12.99 launch price