DEV Community

German Yamil

Show Dev: I Built a Python Pipeline That Writes, Validates, and Publishes Bilingual Ebooks — Here's Everything

Six weeks ago I had an idea that felt slightly ridiculous.

What if I built an automated pipeline that generates a technical ebook — and then used that pipeline to produce the ebook that documents itself?

The ebook about the pipeline would be the proof that the pipeline works.


🎁 Free resource: AI Publishing Checklist — 7 steps to ship a technical ebook with Python (free, no email required) · Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)


What I built

A Python pipeline that:

  1. Takes an outline.json (10 chapters, each with title, word target, code deliverable)
  2. Generates each chapter using the Claude API
  3. Validates every code snippet through two gates before advancing
  4. Translates each chapter to Spanish with QA checks
  5. Assembles two EPUBs (EN + ES) with Pandoc
  6. Creates the Gumroad product listing via API
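
The outline format itself is not shown in this post. As a minimal sketch, assuming a top-level `chapters` array and the hypothetical field names `title`, `word_target`, and `code_deliverable` (taken from the description in step 1), loading and sanity-checking the file could look like this:

```python
import json
from dataclasses import dataclass


@dataclass
class ChapterSpec:
    # Field names are assumptions; the post only says each chapter has
    # a title, a word target, and a code deliverable.
    title: str
    word_target: int
    code_deliverable: str


def load_outline(text: str) -> list[ChapterSpec]:
    """Parse outline JSON text into chapter specs, failing fast on bad input."""
    raw = json.loads(text)
    chapters = [ChapterSpec(**item) for item in raw["chapters"]]
    for ch in chapters:
        if ch.word_target <= 0:
            raise ValueError(f"Bad word target for chapter {ch.title!r}")
    return chapters
```

Failing on a malformed outline before any API call is made keeps a typo in the input from costing a generation run.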

Total active time per book: 4–6 hours. The pipeline runs the rest unattended.

The core: two-gate code validation

Many technical ebooks ship code that was never run. In this pipeline, a snippet that has not passed both gates cannot reach the book.

Gate 1: AST parsing

import ast

def validate_syntax(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except SyntaxError as e:
        print(f"Syntax error at line {e.lineno}: {e.msg}")
        return False

Gate 2: Subprocess isolation

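import os
import subprocess
import tempfile

def validate_execution(code: str, timeout: int = 30) -> bool:
    # Run the snippet in a fresh temp directory so it cannot touch project files.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(code)
        try:
            result = subprocess.run(
                ["python3", path],
                capture_output=True, timeout=timeout, cwd=tmpdir
            )
        except subprocess.TimeoutExpired:
            # subprocess.run raises on timeout; treat a hang as a failed gate.
            print(f"Timed out after {timeout}s")
            return False
        if result.returncode != 0:
            print(result.stderr.decode())
            return False
        return True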

A chapter only reaches DONE state when both return True. There is no override.
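
How snippets are pulled out of a chapter is not shown here. A minimal sketch, assuming chapters are Markdown with python-tagged fences and that the two gates are passed in as callables (`extract_python_blocks` and `chapter_passes` are hypothetical names, not the pipeline's own):

```python
import re
from typing import Callable

FENCE = "`" * 3  # built this way so the example can itself sit inside a fence


def extract_python_blocks(markdown: str) -> list[str]:
    """Return the body of every python-fenced block in a Markdown chapter."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return pattern.findall(markdown)


def chapter_passes(markdown: str, gates: list[Callable[[str], bool]]) -> bool:
    """True only if every snippet passes every gate, in order."""
    for snippet in extract_python_blocks(markdown):
        for gate in gates:
            if not gate(snippet):
                return False  # first failure stops the chapter
    return True
```

With the functions above this would be called as `chapter_passes(text, [validate_syntax, validate_execution])`, so the cheap AST check runs before the subprocess. Note that a chapter with no snippets passes trivially in this sketch.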

The state machine

PENDING → RUNNING → DONE
                  ↘ NEEDS_REVIEW → (fix) → PENDING

Every state change writes to disk immediately. If the process crashes mid-generation, the next run resets RUNNING chapters to PENDING and skips DONE ones. I've had 3 crashes during production — no data loss, no re-doing finished chapters.
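
The state file format is not shown in this post. A minimal sketch of the crash-recovery behavior described above, assuming a single JSON file that maps a chapter id to its state (the function names and file layout are assumptions):

```python
import json
import os
import tempfile
from enum import Enum


class State(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    NEEDS_REVIEW = "NEEDS_REVIEW"


def save_states(path: str, states: dict[str, State]) -> None:
    """Write the state file atomically so a crash never leaves it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({k: v.value for k, v in states.items()}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_states(path: str) -> dict[str, State]:
    """Load states, resetting any chapter left RUNNING by a crashed run."""
    with open(path) as f:
        states = {k: State(v) for k, v in json.load(f).items()}
    for chapter, state in states.items():
        if state is State.RUNNING:
            states[chapter] = State.PENDING
    return states


def next_chapters(states: dict[str, State]) -> list[str]:
    """Chapters the next run should work on: only PENDING ones."""
    return [c for c, s in states.items() if s is State.PENDING]
```

Writing to a temporary file and renaming it means the state file on disk is always either the old version or the new one, which is what makes "resume after crash" safe.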

Translation QA

After English generation, the pipeline generates Spanish and checks:

  • Code fence count: EN and ES must contain the same number of ``` fences. Mismatch = dropped code block = hard failure
  • Word ratio: Spanish typically runs 10–15% longer than English. Deviation > 20% flags for review

import re

def validate_translation(en_content: str, es_content: str) -> bool:
    # Hard failure: a fence-count mismatch means a code block was dropped.
    en_fences = len(re.findall(r'```', en_content))
    es_fences = len(re.findall(r'```', es_content))
    if en_fences != es_fences:
        raise ValueError(f"Fence mismatch: EN={en_fences}, ES={es_fences}")
    en_words = len(en_content.split())
    es_words = len(es_content.split())
    if en_words == 0:
        raise ValueError("English chapter is empty")
    # Soft check: warn (but do not fail) when lengths differ by more than 20%.
    ratio = abs(en_words - es_words) / en_words
    if ratio > 0.20:
        print(f"Word ratio warning: {ratio:.2%} deviation")
    return True

The economics (stated plainly)

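| Item | Value |
| --- | --- |
| Infrastructure cost | $20/month (Claude Code Pro only) |
| Price | $9.99+ (pay what you want) |
| Break-even | About 2 sales at the $9.99 minimum (before platform fees) |
| Time per book | 4–6 hours active |
| Marginal cost, book #10 | Same as book #1 |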

What failed and what I learned

Failure 1: I published 10 articles on one day in April. Dev.to and Google appear to have suppressed the batch: those articles averaged 11 views each, versus 54 for the ones I published individually.

Lesson: Space content by at least 24 hours. One article per day maximum.

Failure 2: Translation sometimes produced code with Spanish variable names. I added an explicit instruction to the prompt: "All variable names, function names, and comments must remain in English."
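
A prompt instruction can be backed by a check. The sketch below is an assumption about how that could be done, not part of the pipeline as described: it compares the identifiers in an English snippet with those in its Spanish counterpart and reports any that appear in only one of them.

```python
import ast


def identifiers(code: str) -> set[str]:
    """Collect variable, function, class, and argument names from Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


def renamed_identifiers(en_code: str, es_code: str) -> set[str]:
    """Names that appear in only one edition; empty when the code was left intact."""
    return identifiers(en_code) ^ identifiers(es_code)
```

An empty result means the translator left the names alone. Comments are not covered, because `ast` discards them during parsing.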

Failure 3: Some generated scripts imported pandas or numpy, which aren't installed in the clean subprocess environment. I fixed this by adding to the prompt: "Use only Python stdlib. No third-party imports."

Failure 4 (ongoing): 0 sales so far after 16 days. 268 Dev.to views total. My rough estimate is that I need ~3,000–5,000 views before expecting consistent sales. Working on volume.

The meta-proof

The ebook that documents this pipeline was produced by this pipeline.

Every one of its 10 chapters passed both validation gates before shipping. The Spanish edition was checked by the translation QA script. The EPUB was assembled and validated by epubcheck with zero errors.
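
The exact build commands are not shown in this post. A minimal sketch of the assemble-then-validate step, assuming the chapters are Markdown files and that `pandoc` and `epubcheck` are both on the PATH (file names and function names are hypothetical):

```python
import subprocess


def pandoc_command(chapters: list[str], output: str, title: str, lang: str) -> list[str]:
    """Build a pandoc invocation that merges Markdown chapters into one EPUB."""
    return [
        "pandoc", *chapters,
        "-o", output,
        "--toc",
        "--metadata", f"title={title}",
        "--metadata", f"lang={lang}",
    ]


def build_and_check(chapters: list[str], output: str, title: str, lang: str) -> bool:
    """Run pandoc, then epubcheck; True only if both exit with status 0."""
    build = subprocess.run(pandoc_command(chapters, output, title, lang))
    if build.returncode != 0:
        return False
    check = subprocess.run(["epubcheck", output])
    return check.returncode == 0
```

Calling `build_and_check` once with `lang="en"` and once with `lang="es"` would produce the two editions, each accepted only if epubcheck exits cleanly.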

I could claim this. Or I could build a system where it's the only possible outcome. I chose the second.


Free 7-step checklist: germy5.gumroad.com/l/vlvhld — free, no email

Full pipeline (10 scripts + complete ebook): germy5.gumroad.com/l/xhxkzz — pay what you want, min $9.99, 30-day refund


Questions? What part of the architecture would you build differently? Drop a comment — genuinely curious.
