You have a simple task: generate a PDF from your web app. The instinct is obvious: render HTML, print to PDF, done. After all, you already know HTML and CSS, and your content is probably already in some templated HTML format.
This works fine until it doesn't. Page breaks split tables in half. Fonts render differently on your server versus your laptop. Headers and footers need manual positioning. And don't get me started on multi-column layouts or page numbers in a table of contents.
Here's when HTML-to-PDF makes sense, when it falls apart, and what to use instead.
The HTML-to-PDF Illusion
HTML was designed for screens, not paper. When you convert HTML to PDF, you're forcing a screen layout engine to think in pages. This works reasonably well for:
- Simple reports with linear content flow
- Existing HTML content you can't easily restructure
- Quick prototypes where pixel-perfect output isn't critical
The common tools in this space share the same underlying approach: load your HTML into a browser rendering engine (headless Chromium for Puppeteer and Gotenberg, Qt WebKit for wkhtmltopdf) and capture the printed output.
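# Puppeteer example (the print subcommand comes from the third-party puppeteer-cli wrapper, not from Puppeteer itself)
npx puppeteer-cli print ./report.html ./report.pdf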
# wkhtmltopdf
wkhtmltopdf --page-size A4 report.html report.pdf
# Gotenberg (self-hosted API; v7+ route, which expects the form field "files" and a file named index.html)
curl -X POST http://localhost:3000/forms/chromium/convert/html \
-F "files=@index.html" \
-o report.pdf
The problem isn't that these tools don't work. They work fine for simple cases. The problem is what happens when your requirements grow beyond "simple case."
Where HTML-to-PDF Breaks Down
Page Breaks and Layout Control
CSS has break-before, break-after, and break-inside: avoid, plus the legacy page-break-before and page-break-after aliases. In theory, these give you control. In practice, browser rendering engines optimize for screens first. Complex layouts with multi-column sections, fixed headers, and footers often produce unpredictable breaks.
Your CSS says "don't break inside this table," but the browser engine has already calculated the page height differently than expected. Now your table header sits alone on page 7 while the data spills to page 8.
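The usual mitigation is to inject print rules and let the browser paginate. Here is a minimal sketch using Puppeteer's `page.setContent`, `page.addStyleTag`, and `page.pdf`. The `renderPdf` name, the selectors, and the margins are illustrative assumptions, and the engine may still break wherever its own layout pass decides.

```javascript
// Print rules injected before printing. These are standard CSS fragmentation
// properties; engines honor them to varying degrees.
const PRINT_CSS = `
  thead { display: table-header-group; } /* repeat header rows on each printed page */
  tr, figure, pre { break-inside: avoid; } /* ask the engine not to split these */
  h1, h2, h3 { break-after: avoid; }       /* keep headings with the content below */
`;

// Footer markup: Puppeteer fills elements carrying the classes "pageNumber"
// and "totalPages" when displayHeaderFooter is enabled.
const FOOTER_HTML = `
  <div style="font-size: 9px; width: 100%; text-align: center;">
    <span class="pageNumber"></span> / <span class="totalPages"></span>
  </div>`;

// `page` is a Puppeteer Page, for example from `await browser.newPage()`.
async function renderPdf(page, html, outputPath) {
  await page.setContent(html, { waitUntil: "networkidle0" });
  await page.addStyleTag({ content: PRINT_CSS });
  return page.pdf({
    path: outputPath,
    format: "A4",
    printBackground: true,
    displayHeaderFooter: true,
    headerTemplate: "<span></span>", // empty header; only the footer is used
    footerTemplate: FOOTER_HTML,
    margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
  });
}
```

Note that `break-inside: avoid` is a request, not a guarantee: a table taller than one page has to break somewhere, which is exactly where the surprises start.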
Consistency Across Environments
Your local Chrome produces a perfect PDF. Your CI pipeline running Chromium produces something almost identical, but the line spacing is slightly different and one image is 2 pixels lower. Same HTML, same CSS, different output.
This isn't a bug in the tools. It's the nature of browser engines. They're designed for interactive rendering with font substitution, sub-pixel positioning, and GPU acceleration. None of these optimize for deterministic document output.
Complex Document Features
Try implementing these in pure HTML/CSS:
- Automatic table of contents with page numbers
- Cross-references like "see Figure 3 on page 12"
- Academic citations with auto-generated bibliography
- Multi-column layouts that reflow correctly across page boundaries
- Headers and footers with page numbers, section titles, and total page count
Each is possible with enough JavaScript and post-processing. But you're now building a document engine on top of a layout engine that was never meant for documents.
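To give a sense of that post-processing, here is a minimal sketch of a single step: turning measured heading positions into table-of-contents page numbers. `assignPageNumbers` is a hypothetical helper, and it assumes one continuous flow with a fixed usable page height, which real documents rarely have.

```javascript
// Hypothetical helper: map each heading's vertical offset (in px, measured
// from the top of the rendered flow) to a 1-based page number.
// Assumes every page has the same usable height and no forced breaks.
function assignPageNumbers(headings, pageHeightPx) {
  if (!(pageHeightPx > 0)) {
    throw new RangeError("pageHeightPx must be a positive number");
  }
  return headings.map(({ title, offsetPx }) => ({
    title,
    page: Math.floor(offsetPx / pageHeightPx) + 1,
  }));
}

// Example input: offsets you would collect in the browser, e.g. via
// element.getBoundingClientRect().top + window.scrollY for each heading.
const toc = assignPageNumbers(
  [
    { title: "Executive Summary", offsetPx: 120 },
    { title: "Sales Data", offsetPx: 1480 },
    { title: "Breakdown by Region", offsetPx: 2300 },
  ],
  1000
);
console.log(toc.map((entry) => `${entry.title} ... ${entry.page}`).join("\n"));
```

Even this is only the first pass: inserting the table of contents pushes every heading down, so the offsets have to be measured again, and any forced page break invalidates the fixed-height assumption.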
Decision Framework: Which Approach to Use
Use this mental model:
| Your Need | Best Approach | Why |
|---|---|---|
| Simple invoice from existing HTML | HTML-to-PDF | One-time conversion, no complex layout |
| Report with charts and TOC | Native document engine | Automatic indices, deterministic output |
| 500 personalized contracts | Template fill + bulk render | Reuse template, fill variables at scale |
| Real-time document from app data | Markdown/JSON via API | Schema validation, AI-friendly generation |
| Complex academic paper | Native with citations | Bibliography, cross-references, math |
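If you want the same rule of thumb in code, for example to route jobs inside a service, the table can be sketched as a small function. The field names and the order of the checks are illustrative assumptions; adjust them to your own priorities.

```javascript
// Hypothetical helper that mirrors the decision table. Checks run from the
// most specific requirement to the least specific one.
function recommendApproach(needs = {}) {
  const {
    documentCount = 1,      // how many personalized copies are needed
    needsCitations = false, // bibliography, cross-references, math
    needsToc = false,       // table of contents or other automatic indices
    fromAppData = false,    // generated on demand from structured app data
  } = needs;

  if (needsCitations) return "Native with citations";
  if (documentCount > 1) return "Template fill + bulk render";
  if (needsToc) return "Native document engine";
  if (fromAppData) return "Markdown/JSON via API";
  return "HTML-to-PDF";
}

console.log(recommendApproach({ documentCount: 500 })); // Template fill + bulk render
```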
Let's look at each approach with code examples.
Approach 1: Native Document Generation
Instead of HTML, you define documents in a structured format designed for paper: Markdown extended with document primitives, or a JSON schema that describes every element explicitly.
The key difference: the rendering engine thinks in pages from the start, not after the fact.
Here's generating a report with a chart and table of contents using the Autype API:
curl -X POST https://api.autype.com/api/v1/dev/render/markdown \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "# Quarterly Report\n\n:::toc\n:::\n\n## Executive Summary\n\nRevenue increased by 23% compared to the previous quarter.\n\n## Sales Data\n\n:::chart{\n type: \"bar\",\n width: 400,\n height: 200\n}\nlabels: [\"Q1\", \"Q2\", \"Q3\", \"Q4\"]\ndatasets:\n - label: \"Revenue (M$)\"\n data: [12.4, 15.2, 18.1, 22.3]\n:::\n\n### Breakdown by Region\n\n| Region | Sales | Growth |\n|--------|-------|--------|\n| Europe | 8.2M | +18% |\n| NA | 9.1M | +25% |\n| APAC | 5.0M | +31% |\n",
"document": {
"type": "pdf",
"size": "A4"
},
"defaults": {
"fontFamily": "Helvetica",
"fontSize": 11,
"header": {
"left": "Quarterly Report",
"right": "{{pageNumber}}/{{totalPages}}"
}
}
}'
Response:
{
"jobId": "r_8f3a2b1c4d5e",
"status": "PROCESSING",
"format": "PDF",
"creditCost": 5,
"createdAt": "2024-12-15T10:30:00Z"
}
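Because the response reports `PROCESSING`, the client has to wait for the job to finish. The sketch below is a generic poller, not the documented Autype flow: `checkStatus` is a function you supply (for instance, one that fetches the job status route from the API reference), and treating every status other than `PROCESSING` as final is an assumption.

```javascript
// Generic poller. `checkStatus` must resolve to an object with a `status`
// field. Assumption: any status other than "PROCESSING" is terminal.
async function pollUntilDone(checkStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await checkStatus();
    if (job.status !== "PROCESSING") return job;
    if (attempt < maxAttempts) {
      // Wait before the next check so the API is not hammered
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job still processing after ${maxAttempts} checks`);
}
```

Capping the number of attempts keeps a stuck job from blocking a worker forever; the interval and the cap are starting points to tune against real render times.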
The engine handles:
- Page breaks that respect content boundaries
- Automatic table of contents with page numbers
- Charts rendered inline without external libraries
- Consistent typography across all output
You can also use JSON for more granular control:
const documentJson = {
  document: {
    type: "pdf",
    size: "A4",
    orientation: "portrait"
  },
  defaults: {
    fontFamily: "Helvetica",
    fontSize: 11,
    header: {
      left: "Quarterly Report",
      right: "{{pageNumber}}/{{totalPages}}"
    }
  },
  sections: [
    {
      id: "main",
      type: "flow",
      content: [
        { type: "h1", text: "Quarterly Report" },
        { type: "toc", title: "Contents" },
        { type: "h2", text: "Executive Summary" },
        { type: "text", text: "Revenue increased by 23% compared to the previous quarter." },
        {
          type: "chart",
          config: {
            type: "bar",
            data: {
              labels: ["Q1", "Q2", "Q3", "Q4"],
              datasets: [{ label: "Revenue (M$)", data: [12.4, 15.2, 18.1, 22.3] }]
            }
          },
          width: 400,
          height: 200
        }
      ]
    }
  ]
};
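// Validate before rendering
const validation = await fetch("https://api.autype.com/api/v1/dev/render/validate", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.AUTYPE_API_KEY,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ config: documentJson })
});
// Inspect the result; the exact response shape is defined by the API reference
console.log(validation.status, await validation.text());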
This validates the schema before you spend credits on rendering. If the structure is invalid, you get specific error paths.
Approach 2: Template-Based Bulk Generation
When you have a contract template and need 500 personalized versions, you don't generate each from scratch. You define a template once, mark the variable placeholders, and fill them programmatically.
The template can be defined in the Autype editor with visual tools, then rendered via API:
# Bulk render from a saved template
curl -X POST https://api.autype.com/api/v1/dev/bulk-render \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"documentId": "d_contract_template_2024",
"format": "PDF",
"items": [
{
"clientName": "Acme Corporation",
"contractDate": "2024-12-15",
"amount": "€50,000",
"projectDescription": "Annual maintenance agreement"
},
{
"clientName": "Beta Industries",
"contractDate": "2024-12-16",
"amount": "€125,000",
"projectDescription": "Platform development phase 2"
}
]
}'
You can also upload a CSV or Excel file with hundreds of rows:
curl -X POST https://api.autype.com/api/v1/dev/bulk-render/file \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@contracts.csv" \
-F "documentId=d_contract_template_2024" \
-F "format=PDF"
Each row generates one PDF. All documents render in parallel, and you get a ZIP download when complete.
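If the rows come from your own database, you have to produce that CSV first, and values like "€50,000" contain a comma that must be quoted. Here is a minimal sketch of a CSV writer with standard quoting. `toCsv` is a hypothetical helper, and using the template variable names as column headers is an assumption based on the JSON example above.

```javascript
// Quote a field when it contains a comma, a double quote, or a line break.
// Embedded double quotes are doubled, as is conventional for CSV.
function escapeCsvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string with a header row followed by one line per record.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvField).join(",");
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column])).join(",")
  );
  return [header, ...lines].join("\r\n") + "\r\n";
}

const csv = toCsv(
  [
    {
      clientName: "Acme Corporation",
      contractDate: "2024-12-15",
      amount: "€50,000",
      projectDescription: "Annual maintenance agreement",
    },
  ],
  ["clientName", "contractDate", "amount", "projectDescription"]
);
console.log(csv);
```

Write the result to `contracts.csv` as UTF-8 so the euro sign survives the upload.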
Integrating with Automation Tools
n8n Workflow
You can connect document generation to any data source using n8n. The Autype n8n node provides 40+ operations:
# n8n workflow concept
Workflow:
- Trigger: Webhook receives form submission
- Node 1: Extract form data (client name, service, amount)
- Node 2: Autype → Render from Markdown template
- Node 3: Google Drive → Upload PDF
- Node 4: Gmail → Send PDF to client
The node handles async job polling automatically. You submit the render job, n8n waits for completion, then passes the download URL to the next node.
Make.com Integration
For no-code automation, the Make.com integration uses a slightly different variable syntax to avoid conflicts with Make's own templating:
# Use ${varName} instead of {{varName}} in your templates
Dear ${clientName},
Your invoice for ${amount} is attached.
This lets Make process its own variables while passing the correct values to Autype.
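If you already have templates written with double braces, the conversion can be scripted. This is a minimal sketch: `toMakeSyntax` is a hypothetical helper, and whether built-in placeholders such as `pageNumber` should also be converted is not covered here, so the `skip` list lets you leave selected names untouched.

```javascript
// Rewrite {{name}} placeholders as ${name}. Names listed in `skip` keep
// their original double-brace form.
function toMakeSyntax(template, skip = []) {
  const placeholder = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return template.replace(placeholder, (match, name) =>
    skip.includes(name) ? match : "${" + name + "}"
  );
}

console.log(toMakeSyntax("Dear {{clientName}}, your invoice for {{ amount }} is attached."));
```

Only simple identifiers are matched; anything more elaborate inside the braces is left as it is so the script never silently rewrites something it does not understand.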
When to Stick with HTML-to-PDF
None of this means HTML-to-PDF is always wrong. Use it when:
- Your content is already HTML and restructuring would be expensive
- Layout requirements are simple (linear flow, no complex tables or multi-column)
- You're generating a few documents and can manually verify output
- You need self-hosting and already have the infrastructure for headless browsers
The tools are mature and well-documented. Just know their limits.
Performance Comparison
For a 50-page document with charts, tables, and headers:
| Method | Render Time | Output Consistency | Complex Features |
|---|---|---|---|
| Puppeteer (HTML) | 8-15 seconds | Varies by environment | Manual implementation |
| wkhtmltopdf | 5-12 seconds | Varies by environment | Manual implementation |
| Native engine (Autype) | < 8 seconds | Deterministic | Built-in |
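Native engines tend to render faster because they skip browser startup and page loading, though the ranges above overlap, so measure with your own documents before committing. The output is deterministic because the renderer is purpose-built for documents.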
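Render times depend heavily on the document and the hardware, so it is worth timing each candidate on your own content. Here is a minimal sketch of a timing harness; `benchmark` is a hypothetical helper, and `render` is whatever async function wraps the tool you are testing.

```javascript
// Run `render` several times and report min, median, and max wall-clock time.
async function benchmark(label, render, runs = 5) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await render();
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  const mid = Math.floor(timings.length / 2);
  const medianMs =
    timings.length % 2 === 1 ? timings[mid] : (timings[mid - 1] + timings[mid]) / 2;
  return { label, runs, minMs: timings[0], medianMs, maxMs: timings[timings.length - 1] };
}
```

Report the median rather than a single run: the first run often includes one-off costs such as browser startup or a cold cache.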
The Real Question
When choosing a PDF generation approach, the question isn't "which tool is best." The question is "what kind of documents do you actually need to generate?"
- Simple, one-off documents from HTML: HTML-to-PDF tools work fine
- Professional documents with TOC, charts, citations: Use a native document engine
- High-volume personalized documents: Template fill with bulk rendering
- AI-generated documents: Markdown/JSON input with schema validation
HTML is excellent for web pages. Documents are not web pages. Using the right tool for the medium saves hours of debugging layout issues that shouldn't exist in the first place.