We thought Puppeteer was "good enough." Turns out, "good enough" breaks at scale. Here's what nobody tells you about building PDF pipelines with headless browsers.
The Cold, Hard Truth About Puppeteer PDF Generation
If you've ever Googled "generate PDF from HTML in Node.js," you've probably landed on Puppeteer. It's fast, works for MVPs, and it's open source. But when our app started generating thousands of PDFs a day, we learned some hard lessons. The kind of lessons that don't show up in the docs.
Here's what went wrong (and what I wish I knew sooner):
1. Zombie Chrome Processes Will Haunt You
Remember that "just use headless Chrome in Docker" tip? Works fine, until it doesn't.
If you launch a browser for each PDF job, Puppeteer spawns a fresh set of Chrome processes every time. When things go wrong (timeouts, crashes, OS bugs, or an error path that never reaches browser.close()), those processes get left behind. Suddenly, your servers are chewing through memory and "OOM killed" becomes your favorite Slack notification.
Lesson:
Managing Chrome at scale is a game of whack-a-mole. You'll burn time on process cleanup instead of shipping features.
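One way to cut down on leaked processes is to make cleanup unconditional. The sketch below is a minimal illustration, not what we ran in production: `withBrowser` is a hypothetical helper, and `launch` is injected so the helper stays independent of how Chrome gets started (in a real service it would be something like `() => puppeteer.launch()`).

```javascript
// Sketch: always attempt browser cleanup, whether the job succeeds, throws, or times out.
// `withBrowser` is a hypothetical helper name; `launch` is any function returning a
// Puppeteer-style browser object with close() and process().
async function withBrowser(launch, job, timeoutMs = 30000) {
  const browser = await launch();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`PDF job timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  try {
    // Whichever settles first wins: the job or the timeout.
    return await Promise.race([job(browser), timeout]);
  } finally {
    clearTimeout(timer);
    try {
      await browser.close();
    } catch {
      // close() failed (for example, the browser already crashed):
      // fall back to killing the child process if one is still attached.
      const proc = browser.process ? browser.process() : null;
      if (proc) proc.kill('SIGKILL');
    }
  }
}
```

This doesn't cover every failure mode. If Chrome is wedged badly enough that `browser.close()` itself hangs, you would still need a deadline around the close call or an external reaper.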
2. Performance Bottlenecks Sneak Up Fast
Need to generate PDFs in parallel? Say hello to the bottleneck monster.
Each browser you launch through Puppeteer uses 100MB+ RAM and brings its own set of Chromium processes. Try spinning up 10, 20, or 50 at once. You'll hit resource limits fast, especially on smaller VMs or containers.
Lesson:
Scaling horizontally gets expensive. For bursty workloads, expect weird delays, timeouts, and server crashes unless you invest in orchestration.
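The simplest form of "orchestration" is a cap on how many render jobs run at once. Here's a rough sketch of an in-process limiter. `createLimiter` is a hypothetical name, and the right limit depends on your memory budget, so treat the numbers as placeholders.

```javascript
// Sketch: run at most `limit` async tasks at a time; extra tasks wait in a FIFO queue.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and start the next queued task, if any.
        active--;
        next();
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

// Usage (renderPdf is a placeholder for your own render function):
//   const run = createLimiter(4);
//   const pdfs = await Promise.all(jobs.map((job) => run(() => renderPdf(job))));
```

This only smooths out bursts inside one process. It doesn't make each Chrome any cheaper, and a queue that keeps growing still means you need more capacity.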
3. Docker Nightmares & OS Compatibility
Works on my machine. Fails in prod. Welcome to container hell.
Dockerizing Puppeteer means juggling fonts, dependencies, glibc versions, missing libraries, and obscure startup flags. "Error: failed to launch Chromium" is the bane of your CI/CD pipeline.
Lesson:
Upgrading Chrome? Be ready for surprise production outages and dependency chaos.
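To give a flavor of those "obscure startup flags," here's a small sketch that builds launch options for a container. The two Chromium flags and the `PUPPETEER_EXECUTABLE_PATH` variable are real. `buildLaunchOptions` and `PDF_DISABLE_SANDBOX` are names made up for this example.

```javascript
// Sketch: build Puppeteer launch options for a container from environment variables.
function buildLaunchOptions(env = process.env) {
  const args = [
    // Docker's default /dev/shm is small (64 MB); this makes Chromium use /tmp instead.
    '--disable-dev-shm-usage',
  ];

  // Disabling the sandbox weakens isolation, so make it an explicit opt-in
  // and only use it when rendering HTML you trust.
  // PDF_DISABLE_SANDBOX is a hypothetical variable name.
  if (env.PDF_DISABLE_SANDBOX === '1') {
    args.push('--no-sandbox');
  }

  const options = { headless: true, args };

  // Point at a Chrome binary baked into the image when one is provided.
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}

// Usage: const browser = await puppeteer.launch(buildLaunchOptions());
```

Pinning the Chrome binary in the image and pointing `executablePath` at it is one way to make upgrades a deliberate step rather than a surprise.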
4. Debugging Is a Time Sink
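Your beautiful HTML works in the browser... but your PDF is a hot mess.
PDF output from headless Chrome often differs from what you see in your desktop browser: fonts that aren't installed on the server go missing, print CSS rules apply by default, background colors and images are skipped unless you enable them, and animations never play. Debugging means endless screenshotting, CSS tweaking, rerunning jobs, and a healthy dose of frustration.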
Lesson:
Expect to lose hours chasing rendering quirks and "why is my chart missing?" mysteries.
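A few of these quirks have well-known knobs. The sketch below shows the ones we'd reach for first. `renderPdf` is a hypothetical wrapper, `page` is a Puppeteer page you've already opened, and the A4 format is just an assumption for the example.

```javascript
// Sketch: render HTML to PDF with settings that bring the output closer to the on-screen view.
async function renderPdf(page, html, { screenCss = true } = {}) {
  if (screenCss) {
    // page.pdf() uses print CSS by default; switch to screen rules to match the browser.
    await page.emulateMediaType('screen');
  }
  // Wait until network activity settles so images and stylesheets are loaded.
  await page.setContent(html, { waitUntil: 'networkidle0' });
  // Wait for web fonts; return a plain value so the result can be serialized.
  await page.evaluate(() => document.fonts.ready.then(() => true));
  // printBackground is off by default, which drops background colors and images.
  return page.pdf({ format: 'A4', printBackground: true });
}
```

None of this helps with fonts that simply aren't installed in the container, or with charts that draw after the network goes idle. Those still need their own fixes.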
5. Security & Maintenance Are Never "Done"
Running browsers on your backend? Prepare for patch marathons.
Chromium ships security patches almost weekly. Falling behind means risking exploits and stability issues. You'll end up maintaining a mini browser farm just to keep up.
Lesson:
Security and compliance become ongoing headaches, especially if you handle sensitive data.
What Actually Works at Scale?
After all this pain, we created a PDF-as-a-Service model called Reportgen.io:
- No more Chrome orchestration: Just send HTML & data, get a PDF back.
- Async & scalable: Handles spikes and queues without server bottlenecks.
- Simple REST API: Integrates with any stack (Go, Node.js, Python, etc.).
- Secure & auto-updating: Someone else worries about browser patches and exploits.
Bonus:
We got back days of engineering time, with no more server babysitting or "Chromium upgrade Fridays."
TL;DR
- Puppeteer is fine for hobby projects.
- At scale? It'll break your infrastructure and your sanity.
- Consider a dedicated PDF API and let someone else handle the mess.
Curious? Try Reportgen.io for free. Skip the Puppeteer pain and focus on shipping.
Want more technical war stories or a step-by-step migration guide? Drop a comment below!
Top comments (1)
all of these hit hard, especially the memory leak one. had puppeteer slowly eat a 2GB container over 12hrs before i noticed
ended up switching PDF/screenshot work to snapapi.pics. REST API, no chromium on your infra, handles scale without the ops pain