PDF generation seems simple until it isn't. After building a PDF API that handles thousands of documents, here are the mistakes I see developers make — and better approaches.
Mistake 1: Using Puppeteer for Everything
Puppeteer spins up a full Chromium browser to render each PDF. It works, but:
- ~150MB RAM per instance
- 2-5 second cold starts
- Crashes under concurrent load
Better approach: Use a purpose-built renderer like WeasyPrint (Python) or wkhtmltopdf. They handle CSS properly without the browser overhead.
# WeasyPrint: ~30ms per document, ~50MB RAM
from weasyprint import HTML
pdf = HTML(string=html_content).write_pdf()
Mistake 2: Building HTML Templates with String Concatenation
I've seen this in production codebases:
# Please don't do this
html = "<html><body><h1>" + company_name + "</h1>"
html += "<p>Invoice #" + str(invoice_num) + "</p>"
# ... 200 more lines of this
This is unmaintainable, insecure (XSS), and impossible to style.
Better approach: Use Jinja2 templates with proper HTML/CSS:
from jinja2 import Template
template = Template(open("invoice.html").read())
html = template.render(
company=company_name,
invoice_number=invoice_num,
items=line_items
)
Mistake 3: Not Handling Currency Formatting
Floats and money don't mix:
>>> 0.1 + 0.2
0.30000000000000004 # Great, now your invoice is wrong
Better approach: Use integers (cents) or Decimal:
from decimal import Decimal
subtotal = Decimal("99.99")
tax = subtotal * Decimal("0.08")
total = subtotal + tax # Exact: 107.9892
Mistake 4: Ignoring Page Breaks
Your PDF looks great... until the content wraps to page 2 and a table row gets split in half.
/* Prevent table rows from splitting across pages */
tr {
page-break-inside: avoid;
}
/* Force page break before totals section */
.totals-section {
page-break-before: auto;
page-break-inside: avoid;
}
Mistake 5: Generating PDFs Synchronously in Request Handlers
A 500ms PDF generation blocks your entire request. Under load, your API becomes unresponsive.
Better approach: Queue PDF generation as a background task:
# Or use an API that handles this for you
import requests
response = requests.post(
"https://documint.anethoth.com/api/v1/invoices",
json={"vendor": {"name": "Acme"}, "customer": {"name": "Client"}, "items": [{"description": "Service", "quantity": 1, "unit_price": 99.99}]},
headers={"X-API-Key": "your_key"}
)
pdf_bytes = response.content
The Lazy Way: Use an API
If you don't want to deal with template engines, CSS page breaks, font rendering, and PDF libraries — just send JSON and get back a PDF:
# Free tier: 10 invoices/month, no credit card
curl -X POST https://documint.anethoth.com/api/v1/demo-invoice -H "Content-Type: application/json" -d '{"vendor": {"name": "Your Company"}, "customer": {"name": "Client"}, "items": [{"description": "Consulting", "quantity": 10, "unit_price": 150}]}'
You can also convert raw HTML to PDF for free:
curl -X POST https://documint.anethoth.com/api/v1/html-to-pdf -H "Content-Type: application/json" -d '{"html": "<h1>Hello World</h1><p>This becomes a PDF.</p>"}'
What PDF generation mistakes have you run into? Drop them in the comments — I bet there are patterns I haven't seen yet.
Top comments (0)