DEV Community

How We Built an AI Pipeline That Turns Prompts into Print-Ready Booklets

When we set out to build Booklet AI, the brief was deceptively simple: "Type a topic, get back a print-ready booklet — magazine layout, professional typography, downloadable as PDF."

In practice, that one sentence hides a small army of problems: prompt-to-content generation, layout decisions across variable content lengths, image sourcing, font fallbacks, headless rendering, PDF export, and an interactive editor sitting on top of it all. This post walks through the architecture we landed on, the parts that surprised us, and the pieces we'd build differently next time.

The shape of the problem

A "booklet" is not a slide deck and not a long-form article — it's somewhere in between. Think 8–24 A4 pages, magazine-style spreads, with a cover, table of contents, alternating image/text layouts, and pull quotes. Users feed in one of three input types:

  1. A free-text prompt ("Make me a 12-page booklet about regenerative agriculture")
  2. A PDF or DOCX upload ("Turn this report into a magazine")
  3. A URL ("Turn this article into a designed spread")

The output is an HTML document that renders cleanly to PDF and is also editable inside a web editor.

High-level pipeline

User input
   │
   ▼
[Content extraction]   ← PDFBox / readability / GPT-4 cleanup
   │
   ▼
[Outline generation]   ← LLM, structured JSON output
   │
   ▼
[Per-page content gen] ← LLM with section-aware prompts
   │
   ▼
[Layout assignment]    ← rule-based template picker
   │
   ▼
[Image generation]     ← Replicate / DALL·E, async
   │
   ▼
[HTML assembly]        ← server-side templating (Thymeleaf)
   │
   ▼
[Headless render]      ← Puppeteer
   │
   ▼
[PDF export]           ← Puppeteer pdf() + post-processing

Each stage is its own service call, queued and retryable. The whole pipeline takes 30–90 seconds for a typical 12-page booklet.

Stage 1: Content extraction

For PDF and DOCX inputs, we tried three approaches before landing on the one we ship:

  • Pure PDFBox text extraction — fast but loses heading structure and reading order on multi-column layouts.
  • OCR-first (Tesseract) — slow and overkill for digital PDFs.
  • PDFBox + GPT cleanup — extract raw text, then run a small LLM call to restore structure ("split this into sections with heading levels"). This is what we ended up shipping. ~5–10s for a 30-page report.

For URL inputs, Mozilla's Readability.js wrapped in a tiny Node service does almost all the work.

The lesson: don't over-engineer extraction. The downstream LLM is going to rewrite this content anyway. You just need enough signal — headings, paragraphs, reading order — for the outline step to make good decisions.

Stage 2: Outline generation

This is where the pipeline either goes well or goes off the rails. The outline determines how many pages the booklet has, what each page is about, and what kind of layout each page wants. We iterated a lot here.

The prompt enforces a structured JSON schema:

{
  "title": "...",
  "subtitle": "...",
  "pages": [
    {
      "page_number": 1,
      "type": "cover" | "toc" | "section_intro" | "content" | "quote" | "back",
      "heading": "...",
      "key_points": ["...", "..."],
      "suggested_layout": "image_left" | "image_right" | "full_bleed" | "two_column",
      "image_prompt": "..." 
    }
  ]
}

Three things made this reliable:

  1. JSON Schema in the prompt: we paste a small JSON Schema into the prompt and ask the model to conform, then reject and retry when the response fails to parse.
  2. Length budgeting upfront: we tell the model the target page count and word budget per page. Without this, models love to write 800 words for a page that only fits 180.
  3. Layout suggestions generated alongside content: letting the model pick suggested_layout based on the content type produces better-looking books than picking layouts post-hoc.
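
A minimal sketch of the reject-and-retry loop from point 1, assuming the field names and allowed values from the schema above. The names validateOutline, generateOutline, and callModel are hypothetical (callModel stands in for whatever LLM client is in use), and treating suggested_layout as required on every page is an assumption, not something the schema states.

```javascript
// Allowed values copied from the outline schema shown in the post.
const PAGE_TYPES = ['cover', 'toc', 'section_intro', 'content', 'quote', 'back'];
const LAYOUTS = ['image_left', 'image_right', 'full_bleed', 'two_column'];

// Returns a list of problems; an empty list means the outline is usable.
function validateOutline(raw) {
  let outline;
  try {
    outline = JSON.parse(raw);
  } catch (err) {
    return [`not valid JSON: ${err.message}`];
  }
  if (outline === null || typeof outline !== 'object') {
    return ['outline is not an object'];
  }
  const errors = [];
  if (typeof outline.title !== 'string') errors.push('title must be a string');
  if (!Array.isArray(outline.pages) || outline.pages.length === 0) {
    errors.push('pages must be a non-empty array');
    return errors;
  }
  outline.pages.forEach((page, i) => {
    if (!PAGE_TYPES.includes(page?.type)) {
      errors.push(`pages[${i}].type is invalid`);
    }
    // Assumption: every page carries a layout suggestion.
    if (!LAYOUTS.includes(page?.suggested_layout)) {
      errors.push(`pages[${i}].suggested_layout is invalid`);
    }
  });
  return errors;
}

// `callModel` is a hypothetical async function: prompt in, raw model text out.
async function generateOutline(callModel, prompt, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(prompt);
    if (validateOutline(raw).length === 0) return JSON.parse(raw);
  }
  throw new Error(`no valid outline after ${maxAttempts} attempts`);
}
```

Returning a list of problems rather than a boolean leaves room to feed the errors back into the retry prompt, which may help the model correct itself on the next attempt.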

Stage 3: Per-page content generation

The outline is cheap to generate (one call). Per-page content is where the cost lives. We learned to:

  • Generate pages in parallel — pages are independent given the outline, so we fire 6–8 LLM calls concurrently with a small connection pool.
  • Pin the model per booklet — switching models mid-booklet creates jarring tone shifts. We pick one model per generation job and stick with it.
  • Keep the outline in every prompt's system message — gives each page-level call awareness of what comes before and after, which prevents repetition.

Streaming responses don't help here: the user is waiting on the entire booklet, not watching text appear page by page.
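
The bounded parallelism described above can be sketched as a small worker-pool helper. This is an illustration under assumptions, not the production code: mapWithConcurrency is a hypothetical name, and generatePage in the usage note stands in for the per-page LLM call.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order, regardless of completion order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```

With an outline in hand, usage might look like `await mapWithConcurrency(outline.pages, 6, (page) => generatePage(outline, page))`. A rejected call rejects the whole batch here; wrapping the worker in a per-page retry is one way to soften that.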

Stage 4: Image generation (the surprising bottleneck)

Images turned out to be the slowest, flakiest stage. Why:

  • Image gen APIs have wildly different latencies (Replicate model cold starts can be 30s+).
  • Failure rates are non-trivial (~3–5% per image), and a 12-page booklet needs 6–10 images.
  • Quality is inconsistent on style adherence — "magazine editorial photography" means different things to different models.
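
To see why a 3–5% per-image failure rate matters, here is the back-of-the-envelope arithmetic, assuming failures are independent (a simplification; real outages tend to be correlated). The helper name anyFailure is illustrative.

```javascript
// Probability that at least one of `images` independent requests fails.
function anyFailure(perImageFailureRate, images) {
  return 1 - Math.pow(1 - perImageFailureRate, images);
}

console.log(anyFailure(0.03, 6).toFixed(3));  // low end: 3% rate, 6 images
console.log(anyFailure(0.05, 10).toFixed(3)); // high end: 5% rate, 10 images
```

Under that assumption, roughly 17% to 40% of booklets would hit at least one failed image, which is why a per-image fallback is worth having.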

What worked:

  • Async-from-the-start — image gen runs in parallel with HTML assembly. We render the booklet with placeholder gradients first; images stream in as they're ready.
  • Per-image timeout + fallback — if image gen for page 7 fails or takes >40s, we substitute a curated stock image keyed off the page's image_prompt. Better a relevant photograph than a broken booklet.
  • Cropping happens server-side — we ask for square (1024x1024) images and crop to the layout's aspect ratio in the same service that handles the upload to OSS. This avoids round-trips and lets us cache aggressively.
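
The per-image timeout and fallback could look roughly like the sketch below. The function name imageOrFallback and its two callback parameters are hypothetical; the 40-second default mirrors the threshold mentioned above.

```javascript
// Resolve with the generated image, or with the fallback if generation
// rejects or takes longer than `timeoutMs`.
async function imageOrFallback(generate, fallback, timeoutMs = 40_000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('image generation timed out')),
      timeoutMs
    );
  });
  try {
    // Whichever settles first wins: the generator or the timer.
    return await Promise.race([generate(), timeout]);
  } catch (err) {
    // Timeout and generation errors both end up here.
    return fallback(err);
  } finally {
    clearTimeout(timer);
  }
}
```

Note that Promise.race does not cancel the losing request. If the image API supports cancellation (for example via an AbortSignal), wiring that in would avoid paying for images that are never used.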

Stage 5: HTML assembly

The booklet template is server-rendered HTML with a CSS-paged-media stylesheet. Each layout type (image_left, image_right, etc.) is a small Thymeleaf fragment composed into a full page.

Key decisions:

  • A4 dimensions hardcoded into CSS (@page { size: A4; } plus width: 210mm; height: 297mm on each page wrapper). Makes the PDF predictable.
  • Print-aware CSS — fonts loaded via @font-face, page breaks via break-after: page, no position: fixed (Puppeteer pdf() handles it inconsistently across pages).
  • One HTML document per booklet, not one per page — easier to navigate in the editor, easier to render in one Puppeteer pass.

Stage 6: Headless render + PDF export

Puppeteer is the unsung hero here. The export endpoint:

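const puppeteer = require('puppeteer');

// Render the booklet at `bookletUrl` to an A4 PDF buffer.
// (The function name is illustrative.)
async function exportBookletPdf(bookletUrl) {
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--font-render-hinting=none'],
  });
  try {
    const page = await browser.newPage();
    await page.goto(bookletUrl, { waitUntil: 'networkidle0', timeout: 60_000 });
    // Wait for webfonts so the PDF does not render with fallback fonts.
    await page.evaluateHandle('document.fonts.ready');

    return await page.pdf({
      format: 'A4',
      printBackground: true,
      preferCSSPageSize: true,
      margin: { top: 0, right: 0, bottom: 0, left: 0 },
    });
  } finally {
    // Close the browser even if navigation or rendering throws.
    await browser.close();
  }
}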

Things we learned the hard way:

  • document.fonts.ready matters. Without it, you ship PDFs with fallback fonts on the first page.
  • networkidle0 is not enough for image-heavy pages — we explicitly await Promise.all(images.map(img => img.complete || new Promise(r => img.onload = r))) before calling pdf().
  • preferCSSPageSize: true is the difference between "the PDF respects my A4 layout" and "Chrome guesses the page size".
  • --font-render-hinting=none produces cleaner text in PDFs across font weights.
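
The image wait from the second point can be written as a function that runs in the page context. This is a sketch rather than the exact production code: waitForImages is a hypothetical name, and it also resolves on the error event, since a promise that only listens for load never settles when an image fails.

```javascript
// Runs inside the page (browser context), so it may only use browser globals.
// Resolves with the number of images it had to wait for.
function waitForImages() {
  const pending = Array.from(document.images)
    .filter((img) => !img.complete)
    .map(
      (img) =>
        new Promise((resolve) => {
          // Resolve on error too, so one broken image cannot hang the export.
          img.addEventListener('load', resolve, { once: true });
          img.addEventListener('error', resolve, { once: true });
        })
    );
  return Promise.all(pending).then(() => pending.length);
}

// Usage with an existing Puppeteer page, just before page.pdf():
//   await page.evaluate(waitForImages);
```

Puppeteer serializes the function and awaits the promise it returns, so the call only completes once every image has either loaded or failed.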

Post-processing (page numbering, bookmarks, link annotations) we do with pdf-lib after Puppeteer hands back the buffer.

What we'd build differently

Three things, in priority order:

  1. A real document model, not HTML-as-source-of-truth. Storing the booklet as JSON (structured content + layout decisions) and rendering to HTML on demand would make collaboration features and version diffing dramatically simpler. We're paying interest on this every sprint.

  2. A proper image queue with priority. Right now image gen is fire-and-forget per booklet. Sharing one queue across all jobs with priority tiers (paid users first, retries scheduled) would smooth out tail latency.

  3. Server-side font subsetting. Booklets ship with the full webfont files embedded in the PDF. For a 12-page booklet using one weight of one font, that's hundreds of KB of unused glyphs. fontTools subsetting in the export step would cut PDF size by 60–80%.

Try it

If you want to see the output of this pipeline, the live tool is at bookletai.org — paste a topic or upload a PDF and you get a booklet in about a minute. The first one is free; we'd love feedback on the layout decisions, especially what the AI gets wrong.


This is a write-up of decisions made on the Booklet AI team. Questions, critiques, or "you should have done X instead" comments very welcome.
