Digital Troubadour

Posted on Mar 18

Generating PDFs from HTML in Node.js (and why I stopped using Puppeteer)

#node #javascript #performance #backend

Last year I was adding invoice generation to a side project. Three days later I was still debugging Puppeteer on a DigitalOcean droplet.

Blank PDFs, fonts not loading, the process eating 400+MB of RAM for a single render.

I shipped something that worked 90% of the time and crossed my fingers.

If that sounds familiar, here's what I've learned since.

What's actually wrong with Puppeteer

Nothing, on your laptop. The problems start when you deploy.

Memory. Chromium is a full browser. Each instance takes 300–500MB. If a render crashes mid-way, the browser process doesn't always clean up after itself. Run this under any real traffic and you'll watch your server slowly run out of memory.
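
One mitigation, as a minimal sketch: close the browser in a finally block so a render that throws can't leave a Chromium process behind. The launch function is passed in so the pattern stands on its own; with Puppeteer you would pass () => puppeteer.launch(). The name renderPdf is made up for this example.

```javascript
// Sketch: always close the browser, even when the render throws.
// `launch` is injected; with Puppeteer it would be () => puppeteer.launch().
async function renderPdf(launch, html) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait for network activity to settle so fonts and images have loaded.
    await page.setContent(html, { waitUntil: "networkidle0" });
    return await page.pdf({ format: "A4" });
  } finally {
    // Runs on success and on failure, so no orphaned Chromium process.
    await browser.close();
  }
}
```

This only covers errors that are actually thrown. A Chromium process that hangs still needs a timeout around the whole call.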

Cold starts. Launching a Chromium instance takes 1–3 seconds, every time. On a serverless function that scales to zero, every request that lands on a cold instance pays that latency.
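
On a long-running server (this doesn't help on serverless), the usual workaround is to launch Chromium once and reuse it across requests. A minimal sketch, with a hypothetical getBrowser helper and the launch function injected, so with Puppeteer you'd pass () => puppeteer.launch():

```javascript
// Hypothetical helper: launch once, then hand out the same browser promise.
let browserPromise = null;

function getBrowser(launch) {
  if (!browserPromise) {
    browserPromise = launch().catch((err) => {
      // Don't cache a failed launch forever; let the next call retry.
      browserPromise = null;
      throw err;
    });
  }
  return browserPromise;
}
```

Each request then opens and closes its own page instead of its own browser. The trade-off is that the shared Chromium process stays resident, so you pay the memory cost permanently in exchange for skipping the startup cost.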

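Fonts and assets. HTML passed to page.setContent() has no base URL, so relative paths have nothing to resolve against, and file:// references are typically blocked or point at files that only exist on your laptop. These failures are silent: the asset just doesn't load and the PDF renders wrong. It looks fine locally and broken in production, and you spend an hour figuring out why.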
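
A workaround that sidesteps path resolution entirely is to inline the font as a base64 data URI before handing the HTML to the renderer. A minimal sketch; fontFaceCss is a name made up for this example:

```javascript
// Build an @font-face rule with the font bytes embedded as a data URI,
// so the renderer never has to resolve a path.
// Works as written for woff and woff2, where the MIME subtype and the
// format() keyword are the same string.
function fontFaceCss(family, fontBuffer, format = "woff2") {
  const base64 = fontBuffer.toString("base64");
  return (
    `@font-face { font-family: "${family}"; ` +
    `src: url("data:font/${format};base64,${base64}") format("${format}"); }`
  );
}
```

You'd read the font with fs.readFileSync() at startup, pass the buffer in, and put the returned rule inside a style tag in the template. It makes the HTML bigger, but the font can no longer go missing.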

Server dependencies. Chromium needs libglib, libnss, libatk, and a handful of other system libraries that aren't on a vanilla Ubuntu server. Every new environment is a fresh debugging session. Every Docker image is 400MB heavier.

None of this is Puppeteer's fault — it's a browser automation tool being asked to do something it wasn't really designed for.

The other options people try

wkhtmltopdf

Uses an old Qt WebKit build to render HTML. Fast, lightweight, no browser process to manage. The catch: the last release was in 2020, the project has since been archived, and CSS support is frozen around 2013. No modern flexbox, no grid, no CSS variables. If your template uses any modern layout it'll look wrong.

Fine if you're generating PDFs from HTML that was already simple. Not fine if you're building something new.

PDFKit / jsPDF

You describe the document in code — place text at this coordinate, draw a line here, set this font. Very precise, and works well for documents with fixed layouts.

The problem is you can't reuse HTML templates. Everything has to be rebuilt in the library's API. A simple invoice with a dynamic line-item table takes a surprising amount of code to get right. Any design change means editing that code.
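
To give a sense of the bookkeeping, here's a sketch of just the row-positioning logic for a line-item table, before any drawing calls. This is not PDFKit's or jsPDF's API; layoutRows and its options are hypothetical, and the defaults are PDF points chosen to roughly fit an A4 page.

```javascript
// Decide where each table row goes, starting a new page when a row
// would cross the bottom margin. The caller then draws each item at (page, y).
function layoutRows(items, { top = 72, bottom = 770, rowHeight = 24 } = {}) {
  const placed = [];
  let page = 1;
  let y = top;
  for (const item of items) {
    if (y + rowHeight > bottom) {
      // Row doesn't fit: move to the top of the next page.
      page += 1;
      y = top;
    }
    placed.push({ page, y, item });
    y += rowHeight;
  }
  return placed;
}
```

And that's before column widths, text wrapping inside a cell, or repeating the table header on each page, all of which HTML and CSS give you for free.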

An API

Send HTML, get a PDF back. The rendering infrastructure is someone else's problem. No Chromium to manage, no system dependencies, nothing to deploy.

This is where most teams land after they've burned enough time on the alternatives.

What using an API actually looks like

Here's a basic Node.js example using LightningPDF:

const response = await fetch("https://lightningpdf.dev/api/v1/pdf/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    html: `
      <html>
        <head><script src="https://cdn.tailwindcss.com"></script></head>
        <body class="p-8 font-sans">
          <h1 class="text-2xl font-bold">Invoice #1042</h1>
          <p class="text-gray-500 mt-1">Due: March 31, 2026</p>
        </body>
      </html>
    `,
    options: { format: "A4" }
  })
});

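// Fail loudly on HTTP errors instead of trying to decode an error body as a PDF.
if (!response.ok) {
  throw new Error(`PDF generation failed: HTTP ${response.status}`);
}

// The PDF comes back base64-encoded in data.pdf.
const { data } = await response.json();
const pdfBuffer = Buffer.from(data.pdf, "base64");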

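That's it. Your existing HTML templates work as-is. Tailwind classes work without a build step because the template loads Tailwind's Play CDN script, though Tailwind's docs describe that script as a development tool, so for production templates you may prefer to inline compiled CSS. The migration from Puppeteer is mostly just deleting the browser setup and teardown code.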

Templates for documents you generate repeatedly

If you're generating invoices or reports where the structure is always the same but the data changes, you can build the template once in a visual designer and just pass data at render time:

body: JSON.stringify({
  template_id: "invoice-001",
  data: {
    company: "Acme Corp",
    invoice_number: "1042",
    items: [
      { name: "Web development", quantity: 10, price: 150 },
      { name: "Design review", quantity: 2, price: 200 }
    ]
  }
})

No HTML string concatenation in your app code. The template lives separately and gets filled in at render time.
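
Put together as a full request, that might look like the sketch below. It assumes template renders go to the same generate endpoint and return the same { data: { pdf } } shape as the raw-HTML example, which the snippets imply but don't state, so check the API docs. renderTemplate is a hypothetical helper, and fetchImpl is only there so it can be exercised without a network call.

```javascript
// Assumption: templates use the same endpoint as raw HTML renders.
const GENERATE_URL = "https://lightningpdf.dev/api/v1/pdf/generate";

async function renderTemplate(templateId, data, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(GENERATE_URL, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ template_id: templateId, data }),
  });
  if (!response.ok) {
    throw new Error(`PDF request failed with HTTP ${response.status}`);
  }
  // Assumption: same response shape as the raw-HTML example.
  const { data: result } = await response.json();
  return Buffer.from(result.pdf, "base64");
}
```

From there, fs.promises.writeFile("invoice-1042.pdf", buffer) saves it, or you can send the buffer straight back from a route handler with a Content-Type of application/pdf.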

Batch generation

For bulk jobs — end-of-month invoices, report runs — use the async endpoint and get a webhook when it's done:

const response = await fetch("https://lightningpdf.dev/api/v1/pdf/async", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({
    template_id: "monthly-statement",
    data: { user_id: "usr_123", month: "February" },
    webhook_url: "https://yourapp.com/webhooks/pdf-ready"
  })
});
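
For a real batch you'd send one of these requests per user. Firing thousands at once is a good way to hit a rate limit, so here's a small sketch of a concurrency cap. submitAll and submitOne are hypothetical names; submitOne would wrap the fetch call above for a single user. I don't know the API's actual rate limits, so the default of 5 is arbitrary.

```javascript
// Run submitOne over every job with at most `limit` requests in flight.
// Failures are recorded per job instead of aborting the whole batch.
async function submitAll(jobs, submitOne, limit = 5) {
  const results = new Array(jobs.length);
  let next = 0;

  async function worker() {
    // Each worker pulls the next unclaimed index until none are left.
    while (next < jobs.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await submitOne(jobs[i]) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, jobs.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results;
}
```

Anything that comes back with ok: false can be retried later without resubmitting the jobs that were already accepted.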

Rough performance numbers

Approach	Typical render time	Memory overhead
Puppeteer (self-hosted)	2–4s	300–500MB per instance
wkhtmltopdf	0.5–1s	Low
API (simple docs)	<100ms	Not your problem
API (complex CSS)	1–3s	Not your problem

The speed difference for simple documents is significant. A Go-native renderer can produce a basic invoice in under 100ms. Chromium only kicks in when the HTML is complex enough to need it.
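
These are rough figures, so measure against your own templates before deciding. A minimal timing sketch; measure is a hypothetical helper, and render stands for whichever approach you're testing:

```javascript
// Run an async render function several times and report the median latency in ms.
async function measure(render, runs = 10) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await render();
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  const mid = Math.floor(timings.length / 2);
  const median =
    timings.length % 2 === 1
      ? timings[mid]
      : (timings[mid - 1] + timings[mid]) / 2;
  return { median, min: timings[0], max: timings[timings.length - 1] };
}
```

The median is more useful than the average here, because the first run usually includes a cold start that skews the mean.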

Is this worth it for a small project?

Probably yes, just because of deployment complexity. Even if you only generate 20 PDFs a month, not having to install Chromium on every server is worth something. Most PDF APIs have a free tier that covers low volume.

For anything with real traffic or batch generation, the difference is more obvious — you're not thinking about memory limits or process management at all.

What are you currently using for PDF generation? Curious if anyone has found a way to make self-hosted Puppeteer not terrible in production.

DEV Community