Why Most Developers Get PDF Generation Wrong (And How to Fix It)

#python #webdev #tutorial #productivity

PDF generation seems simple until it isn't. After building a PDF API that handles thousands of documents, I keep seeing the same mistakes. Here they are, with a better approach for each.

Mistake 1: Using Puppeteer for Everything

Puppeteer spins up a full Chromium browser to render each PDF. It works, but:

~150MB RAM per instance
2-5 second cold starts
Can crash or run out of memory under concurrent load

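Better approach: If your documents are static HTML and CSS, use a purpose-built renderer like WeasyPrint (Python) or wkhtmltopdf. They skip the browser overhead. The trade-off is that WeasyPrint does not run JavaScript, and wkhtmltopdf is built on an old WebKit engine and is no longer actively maintained.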

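# WeasyPrint: ~30ms per document, ~50MB RAM (varies with document complexity)
from weasyprint import HTML

html_content = "<h1>Hello</h1>"  # any HTML string
pdf = HTML(string=html_content).write_pdf()  # returns the PDF as bytes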

Mistake 2: Building HTML Templates with String Concatenation

I've seen this in production codebases:

# Please don't do this
html = "<html><body><h1>" + company_name + "</h1>"
html += "<p>Invoice #" + str(invoice_num) + "</p>"
# ... 200 more lines of this

This is hard to maintain, open to HTML injection (any unescaped user input lands straight in the markup), and painful to style.
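
To make the injection risk concrete, here is a small sketch using only the standard library. The `company_name` value is a made-up hostile input, and `html.escape` stands in for what a template engine's autoescaping would do for you.

```python
from html import escape

# Hypothetical user-supplied value that contains markup.
company_name = '<script>alert("x")</script>'

# Naive concatenation copies the markup straight into the document.
unsafe_html = "<h1>" + company_name + "</h1>"

# Escaping turns the markup into inert text.
safe_html = "<h1>" + escape(company_name) + "</h1>"

print(unsafe_html)
print(safe_html)
```

Even if your renderer never runs scripts, injected tags can still break the layout or pull in remote resources, so escaping is worth having either way.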

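Better approach: Use Jinja2 templates with proper HTML/CSS, and turn on autoescaping (it is off by default, so a bare Template is just as injectable as concatenation):

from jinja2 import Environment, FileSystemLoader, select_autoescape

# Load templates from the current directory and escape values in .html templates
env = Environment(
    loader=FileSystemLoader("."),
    autoescape=select_autoescape(["html"]),
)
template = env.get_template("invoice.html")
html = template.render(
    company=company_name,
    invoice_number=invoice_num,
    items=line_items
)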

Mistake 3: Not Handling Currency Formatting

Floats and money don't mix:

>>> 0.1 + 0.2
0.30000000000000004  # Great, now your invoice is wrong

Better approach: Use integers (cents) or Decimal:

from decimal import Decimal

subtotal = Decimal("99.99")
tax = subtotal * Decimal("0.08")
total = subtotal + tax  # Exact: 107.9892
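
The exact total still carries four decimal places, so it needs rounding to cents before it goes on an invoice. A minimal sketch, assuming half-up rounding and the same 8% rate; rounding rules for tax vary by jurisdiction, so treat the rounding mode as a placeholder. The integer-cents variant is included for comparison.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Decimal route: round the tax to whole cents before adding it.
subtotal = Decimal("99.99")
tax = (subtotal * Decimal("0.08")).quantize(CENT, rounding=ROUND_HALF_UP)
total = subtotal + tax

# Integer route: keep everything in cents (assumes non-negative amounts).
subtotal_cents = 9999
tax_cents = (subtotal_cents * 8 + 50) // 100  # 8% with half-up rounding
total_cents = subtotal_cents + tax_cents

print(f"{total:.2f}")
print(f"{total_cents // 100}.{total_cents % 100:02d}")
```

Both routes print 107.99. Whichever you pick, round once at a defined point (per line or per invoice) and stick to it, so the printed lines always add up to the printed total.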

Mistake 4: Ignoring Page Breaks

Your PDF looks great... until the content wraps to page 2 and a table row gets split in half.

/* Prevent table rows from splitting across pages */
tr {
    page-break-inside: avoid;
}

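/* Keep the totals section together. `auto` lets the renderer break before it
   only when needed; use `always` if you want to force a new page. */
.totals-section {
    page-break-before: auto;
    page-break-inside: avoid;
}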

Mistake 5: Generating PDFs Synchronously in Request Handlers

A 500ms PDF render ties up the worker handling that request for the full duration. Under load, you run out of workers and your API becomes unresponsive.

Better approach: Queue PDF generation as a background task, or use an API that handles the rendering for you:

import requests

response = requests.post(
    "https://documint.anethoth.com/api/v1/invoices",
    json={
        "vendor": {"name": "Acme"},
        "customer": {"name": "Client"},
        "items": [{"description": "Service", "quantity": 1, "unit_price": 99.99}],
    },
    headers={"X-API-Key": "your_key"},
    timeout=30,  # don't hang forever if the service is slow
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error body
pdf_bytes = response.content
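
For the background-task route, here is a minimal sketch using only the standard library. The `render_pdf` function is a hypothetical stand-in for your real renderer, and the `submit_pdf_job` / `get_pdf` helpers and the in-memory `jobs` dict are illustrative names, not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional
import uuid

# Hypothetical stand-in for the real renderer; swap in your own call,
# for example HTML(string=html).write_pdf() from WeasyPrint.
def render_pdf(html: str) -> bytes:
    return b"%PDF-stub " + html.encode("utf-8")

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future; in-memory only, lost on restart

def submit_pdf_job(html: str) -> str:
    """Queue a render and return immediately with a job id."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(render_pdf, html)
    return job_id

def get_pdf(job_id: str, timeout: Optional[float] = None) -> bytes:
    """Wait for the job to finish (up to timeout seconds) and return the PDF bytes."""
    return jobs.pop(job_id).result(timeout=timeout)

# The request handler would call submit_pdf_job and respond right away.
job_id = submit_pdf_job("<h1>Invoice</h1>")
pdf_bytes = get_pdf(job_id, timeout=5)
```

In a web handler you would return the job id immediately and let the client poll or receive a callback. Threads keep the sketch short; since rendering is CPU-bound, a process pool or a dedicated job queue is likely a better fit in production.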

The Lazy Way: Use an API

If you don't want to deal with template engines, CSS page breaks, font rendering, and PDF libraries — just send JSON and get back a PDF:

# Free tier: 10 invoices/month, no credit card
curl -X POST https://documint.anethoth.com/api/v1/demo-invoice \
  -H "Content-Type: application/json" \
  -d '{"vendor": {"name": "Your Company"}, "customer": {"name": "Client"}, "items": [{"description": "Consulting", "quantity": 10, "unit_price": 150}]}'

You can also convert raw HTML to PDF for free:

curl -X POST https://documint.anethoth.com/api/v1/html-to-pdf \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello World</h1><p>This becomes a PDF.</p>"}'