DEV Community

German Yamil

How to Validate Python Code Examples in Technical Ebooks Using AST and Subprocess

Many technical ebooks contain code that doesn't run. The author wrote it, it looked right, but it was never executed. Readers discover this the hard way.

Here's a two-layer validation system that makes it structurally impossible to ship code examples that fail to parse or crash when run.

Why Two Layers?

A single validator isn't enough:

  • Syntax-only (AST): Fast, but doesn't catch import errors, missing files, or runtime failures
  • Runtime-only (subprocess): Thorough, but spawning a fresh interpreter per snippet is too slow to run on every keystroke (and running snippets inside your main process instead, e.g. with exec(), is dangerous)

The solution: run them in order. Syntax first (cheap), then runtime (thorough).

Layer 1: AST Syntax Check

import ast

def check_syntax(code: str) -> tuple[bool, str]:
    """
    Returns (is_valid, error_message).
    Pure static analysis — no code is executed.
    """
    try:
        ast.parse(code)
        return True, ""
    except SyntaxError as e:
        return False, f"SyntaxError at line {e.lineno}: {e.msg}"
    except ValueError as e:
        return False, f"ValueError: {e}"

This handles:

  • Missing colons, mismatched brackets
  • Malformed escape sequences, such as a truncated \x in a string literal
  • f-string syntax errors
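  • Invalid encoding declarations, but only when the source is passed as bytes; for a str (which is what check_syntax receives), ast.parse ignores the coding comment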
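A quick way to see Layer 1 at work is to feed ast.parse one broken snippet per case. The snippets below are made up for illustration; each should raise SyntaxError without anything being executed. Unrecognized escapes such as \d behave differently: they only emit a warning (DeprecationWarning, or SyntaxWarning from Python 3.12), so ast.parse accepts them unless warnings are turned into errors.

```python
import ast

# Made-up broken snippets, one per kind of error Layer 1 should reject
broken_snippets = {
    "missing colon": "def foo()\n    pass",
    "mismatched brackets": "x = [1, 2)",
    "truncated \\x escape": 's = "\\x"',
    "bad f-string": 'msg = f"{}"',
}

for label, source in broken_snippets.items():
    try:
        ast.parse(source)
        print(f"{label}: not caught")
    except SyntaxError as e:
        # Nothing was executed: ast.parse only builds a syntax tree
        print(f"{label}: SyntaxError at line {e.lineno}: {e.msg}")
```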

It does NOT handle: missing imports, undefined variables, runtime logic errors.
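ast.parse checks only the grammar. A few errors are raised later, by the compiler, and slip past it: a bare return at module level parses fine, for example. One hedged option is to call the built-in compile() instead, which runs those extra checks and still executes nothing. check_syntax_strict below is a hypothetical variant, not part of the original pipeline; like ast.parse, it does not notice undefined names or missing modules, which is exactly what Layer 2 is for.

```python
import ast

def check_syntax_strict(code: str) -> tuple[bool, str]:
    """Hypothetical Layer 1 variant: parse AND compile, but never execute."""
    try:
        compile(code, "<snippet>", "exec")
        return True, ""
    except SyntaxError as e:
        return False, f"SyntaxError at line {e.lineno}: {e.msg}"
    except ValueError as e:  # e.g. null bytes on older Python versions
        return False, f"ValueError: {e}"

source = "return 42"
ast.parse(source)                    # accepted: the grammar allows a bare return
print(check_syntax_strict(source))   # rejected by the compiler

# Runtime-only problems still pass: nothing is imported or looked up
print(check_syntax_strict("import nonexistent_pkg\nprint(undefined_var)"))
```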

Layer 2: Subprocess Isolation

import os
import subprocess
import sys
import tempfile

def check_runtime(code: str, timeout: int = 10) -> tuple[bool, str]:
    """
    Executes code in a separate Python process inside a temporary directory.
    Returns (is_valid, error_output).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Write code to a temp file
        script_path = os.path.join(tmpdir, "validate_script.py")
        with open(script_path, "w", encoding="utf-8") as f:
            f.write(code)

        try:
            result = subprocess.run(
                [sys.executable, script_path],  # Same interpreter (and packages) as the pipeline
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=tmpdir,           # Working directory = temp dir
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point
            return False, f"TimeoutExpired: still running after {timeout}s"

        if result.returncode == 0:
            return True, ""

        # Keep the last 500 chars of stderr: a traceback ends with the
        # exception type and message, which is what you need for diagnosis
        return False, result.stderr.strip()[-500:]

Key design decisions:

  1. tempfile.TemporaryDirectory() — auto-cleaned up when the with block exits
  2. cwd=tmpdir — relative file paths in the script resolve to the temp directory, not your project root
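  3. timeout=10 — prevents infinite loops from hanging your pipeline. When it expires, subprocess.run kills the child and raises subprocess.TimeoutExpired, so catch that exception and report it as a failure instead of letting it crash the run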
  4. capture_output=True — stderr gives you the actual Python traceback
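To see why the timeout needs handling, here is a minimal, self-contained sketch. run_with_timeout is a hypothetical helper, and it uses python -c instead of a temp file for brevity. subprocess.run kills the child when the timeout expires and then raises subprocess.TimeoutExpired; catching that exception is what turns an infinite loop into an ordinary failure.

```python
import subprocess
import sys

def run_with_timeout(code: str, timeout: float) -> tuple[bool, str]:
    """Hypothetical helper: run code in a child interpreter, treating a timeout as failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was killed and waited for before this exception was raised
        return False, f"Timed out after {timeout}s"
    return result.returncode == 0, result.stderr.strip()

ok, err = run_with_timeout("while True: pass", timeout=1)
print(ok, err)
```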

Combining Both Layers

from enum import Enum

# check_syntax() (Layer 1) and check_runtime() (Layer 2) are defined above

class ValidationResult(Enum):
    PASS = "PASS"
    SYNTAX_FAIL = "SYNTAX_FAIL"
    RUNTIME_FAIL = "RUNTIME_FAIL"

def validate_code_snippet(
    code: str,
    timeout: int = 10
) -> dict:
    """
    Full validation pipeline for a code snippet.
    Returns structured result dict.
    """
    # Layer 1: syntax
    syntax_ok, syntax_err = check_syntax(code)
    if not syntax_ok:
        return {
            "result": ValidationResult.SYNTAX_FAIL.value,
            "error": syntax_err,
            "layer": 1,
        }

    # Layer 2: runtime
    runtime_ok, runtime_err = check_runtime(code, timeout)
    if not runtime_ok:
        return {
            "result": ValidationResult.RUNTIME_FAIL.value,
            "error": runtime_err,
            "layer": 2,
        }

    return {
        "result": ValidationResult.PASS.value,
        "error": None,
        "layer": None,
    }

Integrating With a Chapter Pipeline

If you're building a content pipeline that tracks each chapter's status in a checkpoint file, hook the validator into it like this:

import json

CHECKPOINT_FILE = "checkpoint.json"

def process_chapter(chapter_id: str, code_snippets: list[str]):
    """
    Validates all code snippets in a chapter.
    Sets chapter status to NEEDS_REVIEW if any fail.
    """
    with open(CHECKPOINT_FILE, "r+") as f:
        data = json.load(f)
        chapter = next(c for c in data["chapters"] if c["id"] == chapter_id)

        chapter["status"] = "RUNNING"
        f.seek(0)
        json.dump(data, f, indent=2)
        f.truncate()

    # Validate each snippet
    failed_snippets = []
    for i, snippet in enumerate(code_snippets):
        result = validate_code_snippet(snippet)
        if result["result"] != "PASS":
            failed_snippets.append({
                "snippet_index": i,
                "result": result,
            })

    # Update checkpoint
    with open(CHECKPOINT_FILE, "r+") as f:
        data = json.load(f)
        chapter = next(c for c in data["chapters"] if c["id"] == chapter_id)

        if failed_snippets:
            chapter["status"] = "NEEDS_REVIEW"
            chapter["failed_snippets"] = failed_snippets
        else:
            chapter["status"] = "DONE"
            # Drop failures recorded by an earlier run of this chapter
            chapter.pop("failed_snippets", None)

        f.seek(0)
        json.dump(data, f, indent=2)
        f.truncate()

    return len(failed_snippets) == 0
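process_chapter rewrites checkpoint.json in place with seek, dump and truncate, so a crash in the middle of a write can leave a truncated, unparseable checkpoint. A common remedy, sketched here as a hypothetical helper write_checkpoint_atomic, is to write the JSON to a temporary file in the same directory and then swap it in with os.replace, which the Python docs describe as atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile

def write_checkpoint_atomic(data: dict, path: str = "checkpoint.json") -> None:
    """Hypothetical helper: write JSON to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the target
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the old checkpoint untouched and clean up the partial temp file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In process_chapter, each seek/dump/truncate sequence would then become: load the data with a plain open(CHECKPOINT_FILE), modify it, and call write_checkpoint_atomic(data, CHECKPOINT_FILE).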

What This Catches

Real failures this system blocked during testing:

| Error type | Caught by | Example |
|---|---|---|
| Missing colon | Layer 1 (AST) | def foo() |
| Undefined variable | Layer 2 (subprocess) | print(undefined_var) |
| Import not found | Layer 2 (subprocess) | import nonexistent_pkg |
| Infinite loop | Layer 2 (timeout) | while True: pass |
| File not found | Layer 2 (subprocess) | open("missing.csv") |
| Wrong indentation | Layer 1 (AST) | Misaligned blocks |

What This Doesn't Catch

Be honest about limitations:

  • Logic errors — code runs but produces wrong output
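  • Side effects — the subprocess is not a sandbox: cwd=tmpdir only redirects relative paths, so a script can still write to absolute paths, delete files, or make network calls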
  • Long-running scripts — a legitimate example that needs longer than timeout seconds is reported as a failure, so adjust timeout accordingly
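  • Scripts that require user input — input() either blocks waiting on stdin (usually until the timeout kills it) or raises EOFError, depending on what the pipeline's own stdin is connected to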
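If you'd rather have input() fail fast than wait for the timeout, one option (not part of the original check_runtime) is to pass stdin=subprocess.DEVNULL to subprocess.run. The child then sees an empty stdin, input() raises EOFError, and the snippet is reported as an ordinary runtime failure. The snippet below is a made-up example.

```python
import subprocess
import sys

# Made-up snippet that waits for the reader to type a name
snippet = 'name = input("Your name: ")\nprint(f"Hello, {name}")'

result = subprocess.run(
    [sys.executable, "-c", snippet],
    stdin=subprocess.DEVNULL,  # empty stdin: input() hits EOF immediately
    capture_output=True,
    text=True,
    timeout=10,
)
print(result.returncode)                        # non-zero: the snippet failed
print(result.stderr.strip().splitlines()[-1])   # last traceback line names EOFError
```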

For a technical ebook pipeline, these are acceptable tradeoffs: the goal is to verify that every code example compiles and executes cleanly, not to reach full unit-test coverage.
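The runtime layer only looks at the exit code, so a logic error passes as long as nothing raises. If the book's style allows it, one cheap way to close part of that gap is to end a snippet with an assert on its documented output; a wrong result then becomes a non-zero exit that Layer 2 reports. The snippet and the simulated typo below are made up for illustration, and asserts are skipped when Python runs with -O, so this relies on the pipeline not using that flag.

```python
import subprocess
import sys

# Made-up book snippet that pins its documented output with an assert
snippet = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(10)
print(result)
assert result == 55, f"expected 55, got {result}"
"""

# Simulate a typo in the recursion: the code still runs, but the answer is wrong
buggy = snippet.replace("fibonacci(n - 2)", "fibonacci(n - 1)")

for label, code in [("correct", snippet), ("buggy", buggy)]:
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, timeout=10)
    print(label, proc.returncode)  # 0 only while the assertion holds
```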

Complete Example

# Test the validator
snippets = [
    # Valid
    """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
""",
    # Syntax error
    """
def broken(
    print("missing closing paren")
""",
    # Runtime error
    """
import this_package_does_not_exist
""",
]

for i, snippet in enumerate(snippets):
    result = validate_code_snippet(snippet.strip())
    status = "✅" if result["result"] == "PASS" else "❌"
    print(f"Snippet {i+1}: {status} {result['result']}")
    if result["error"]:
        # The last line of a traceback names the exception and its message
        print(f"  Error: {result['error'].splitlines()[-1][:100]}")

Output:

Snippet 1: ✅ PASS
Snippet 2: ❌ SYNTAX_FAIL
  Error: SyntaxError at line 1: '(' was never closed
Snippet 3: ❌ RUNTIME_FAIL
  Error: ModuleNotFoundError: No module named 'this_package_does_not_exist'
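The example above feeds hand-written strings to the validator. In a real pipeline the snippets come from the chapter text itself. Assuming chapters are stored as Markdown with fenced python blocks (the article doesn't say how chapters are stored), a hypothetical extract_python_snippets helper can collect them with a regular expression. FENCE holds the three-backtick marker; the pattern matches an opening fence tagged python, the block body, and the next closing fence line.

```python
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker

# Opening fence tagged "python", the body (non-greedy), then a closing fence line
PYTHON_BLOCK_RE = re.compile(
    "^" + FENCE + r"python[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

def extract_python_snippets(markdown: str) -> list[str]:
    """Hypothetical helper: return every fenced python block body, in document order."""
    return [m.group(1) for m in PYTHON_BLOCK_RE.finditer(markdown)]

chapter = "\n".join([
    "# Chapter 1",
    FENCE + "python",
    "print('hello')",
    FENCE,
    "Some prose.",
    FENCE + "bash",
    "ls -la",
    FENCE,
])
print(extract_python_snippets(chapter))  # only the python block is returned
```

The returned list can be passed directly to process_chapter as code_snippets.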

This validation system is part of a larger pipeline I built for producing technical ebooks end-to-end: writing, validation, translation QA, EPUB assembly, and marketing asset generation — all for $20/month.

If you want the complete pipeline with all scripts: germy5.gumroad.com/l/xhxkzz ($19.99, 30-day refund).
