DEV Community

Martin Danielson

Generate HTML as PDF using Next.js & Puppeteer running on Serverless (Vercel/AWS Lambda)

I wanted to share my experience of generating PDFs from web pages using Puppeteer, deployed on Vercel (or AWS Lambda). While there are many resources on similar topics, I found that none of them provided a complete solution. Since this is a common use case, whether for invoicing, scraping, or testing, I hope my insights can help others facing similar challenges.

Background: One of my businesses rents out a space in a venue, and while we only generate about 30 invoices per year, doing it by hand is tedious and error-prone. So I decided to build an app that generates invoices from HTML (which is easier for me to work with since I am familiar with front-end development). I also needed the system to send invoices by email, manage different billing periods, and be easy to extend with new customers.

Rather than covering the logic for calculating when and how to send invoices, I will focus here on how I set up the PDF generation and hosted the entire solution on Vercel. Note that all of this is written from memory, so I cannot provide exact error codes and may have missed some details.

Setting Up PDF Generation with Puppeteer

I created a simple Next.js page to render my invoices. The next step was converting the HTML of a page into a PDF. I have used Puppeteer before, and after some research, I found it was still one of the best options for this use case.

Here’s the basic code to generate the PDF from a URL:

import puppeteer from "puppeteer";
// ...
// Launch the Chrome build that the puppeteer package downloads at install time.
const browser = await puppeteer.launch();
const page = await browser.newPage();

// Render the invoice page and write it to disk as a PDF.
await page.goto("http://localhost:3000/invoice");
await page.pdf({ path: "a.pdf" });

await browser.close();

Issue with Puppeteer and Serverless limits

The code above worked perfectly locally, but when I deployed it, the endpoint returned a 500 error.

Puppeteer relies on Chrome to render pages, and the Chrome binary that comes with the puppeteer package exceeds the size limits for serverless functions.

After some digging, I opted to use puppeteer-core — which lets you specify your own browser — together with a version of Chromium called @sparticuz/chromium-min, which is tailored for serverless environments. However, even this was too large to upload directly to my project (over 50MB).

Solution: Using external storage for Chromium

The solution was to host the smaller Chromium binary in external Blob storage (the same storage I use for the PDFs) and reference it at runtime. Here’s how I configured Puppeteer to use this custom Chromium version:

import chromium from "@sparticuz/chromium-min";
import puppeteer from "puppeteer-core";

const browser = await puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  // URL of the hosted Chromium archive, fetched and unpacked at runtime.
  executablePath: await chromium.executablePath("URL_to_chromium-min_tar"),
  headless: chromium.headless,
});

Additionally, to ensure that Puppeteer runs in development with the standard Chromium version (useful for debugging), I used this code:

import chromium from "@sparticuz/chromium-min";
import { executablePath } from "puppeteer";
import puppeteer from "puppeteer-core";

const browser = await puppeteer.launch(
  process.env.NODE_ENV === "production"
    ? {
        // Production: serverless Chromium fetched from external storage.
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath("URL_to_chromium-min_tar"),
        headless: chromium.headless,
      }
    : // Development: the Chrome binary installed by the full puppeteer package.
      { executablePath: executablePath() }
);

Handling timeouts and optimizing the process

When running in a serverless environment, Puppeteer can be slow due to the overhead of downloading, unpacking, and loading Chromium.
Just launching the browser took about 15 seconds, before even loading a page and generating the PDFs, which led to timeouts.

Vercel’s default serverless timeout is 10 seconds. This can easily be extended to 60 seconds on their free tier:

export const maxDuration = 60;
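To see which step uses up that time budget, each step can be wrapped in a small timing helper. The following is only a sketch; the `timed` name and its `log` parameter are made up for this illustration and are not part of Puppeteer or Vercel.

```javascript
// Hypothetical helper: runs an async step and logs how long it took.
// `step` is any function returning a promise; `log` defaults to console.log.
async function timed(label, step, log = console.log) {
  const start = Date.now();
  try {
    return await step();
  } finally {
    // Logged on success and on failure, so a slow step that throws still shows up.
    log(`${label}: ${Date.now() - start} ms`);
  }
}

// Possible usage inside an async handler (options as in the snippets above):
//   const browser = await timed("launch", () => puppeteer.launch(options));
//   const page = await timed("newPage", () => browser.newPage());
```

The logged durations show up in the function logs, which makes it easier to tell whether the browser launch or the page load is the part worth optimizing.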

However, this was still not enough to generate a number of PDFs, so I also made two optimizations:

  1. Singleton pattern: Reused the browser instance across multiple PDF generations to reduce overhead.
  2. Optimized waitUntil parameter: I switched from networkidle2 (which waits until there have been no more than two network connections for at least 500 ms) to the default load event, which is more efficient for my use case.
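The singleton idea can be sketched roughly like this. It is a minimal illustration, not the exact code from my project: `createBrowserSingleton` is a hypothetical name, and `launch` stands for whatever function starts the browser, for example a wrapper around `puppeteer.launch`.

```javascript
// Hypothetical helper: returns a getBrowser() function that launches at most once.
// `launch` is any function that resolves to a browser object,
// e.g. () => puppeteer.launch(options).
function createBrowserSingleton(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (browserPromise === null) {
      // Cache the promise rather than the resolved browser, so concurrent
      // callers share one launch instead of starting several.
      browserPromise = Promise.resolve()
        .then(launch)
        .catch((error) => {
          // Do not cache a failed launch; the next call should retry.
          browserPromise = null;
          throw error;
        });
    }
    return browserPromise;
  };
}
```

With this in place, each PDF generation calls `getBrowser()` and closes only its own page instead of the whole browser. Keep in mind that module-level state like this only survives for as long as the same function instance stays warm, so a cold start still pays the full launch cost.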

Additionally, instead of writing the PDF to disk and reading it back, you can use the buffer that is returned and pass it to Vercel’s Blob storage:

const buffer = await page.pdf();
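Put together, rendering and uploading could look something like the sketch below. The `renderPdfToBlob` name and its parameters are hypothetical. The `put` argument is passed in and is assumed to behave like `put` from the `@vercel/blob` package (called with a pathname, a body, and options, and resolving to an object with a `url`). The `access` and `contentType` options shown are likewise assumptions about that client.

```javascript
// Hypothetical helper: render a URL to PDF and upload it without touching the disk.
// `browser` is a Puppeteer browser; `put` is assumed to work like put() from @vercel/blob.
async function renderPdfToBlob(browser, put, pageUrl, blobName) {
  const page = await browser.newPage();
  try {
    // "load" is Puppeteer's default; spelled out here to contrast with "networkidle2".
    await page.goto(pageUrl, { waitUntil: "load" });

    // Without a `path` option nothing is written to disk; the PDF bytes are returned.
    const pdf = await page.pdf();

    // Depending on the Puppeteer version this is a Buffer or a Uint8Array,
    // so normalize to a Buffer before handing it to the storage client.
    const blob = await put(blobName, Buffer.from(pdf), {
      access: "public",
      contentType: "application/pdf",
    });
    return blob.url;
  } finally {
    // Close the page, not the browser, so a shared browser can be reused.
    await page.close();
  }
}
```

Passing `put` in as an argument is only there to keep the sketch independent of a specific storage client; importing it directly works just as well.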

Bonus Tips

Set Extra HTTP Headers for Vercel: When running on Vercel, your deployments are protected on some domains (e.g. your preview environments). This led to me generating a lot of PDFs of the Vercel login page. The fix is to send the protection bypass secret as a header:

// Set this before page.goto() so the header is sent with the page request.
await page.setExtraHTTPHeaders({
  "x-vercel-protection-bypass": process.env.VERCEL_AUTOMATION_BYPASS_SECRET,
});

CSS for PDF Layout: To render the PDF in a print-friendly format and without margins, tell Puppeteer to prefer the page size declared in CSS, and declare that size in a print stylesheet:

const buffer = await page.pdf({ preferCSSPageSize: true });

@media print {
   @page {
     size: A4 portrait;
     margin: 0;
   }
}

Securing Your Endpoint: I tried securing the API endpoint using the Authorization header, but Vercel has an issue with forwarding headers when using rules. The authorization header is passed under x-vercel-sc-headers, but it’s not guaranteed to work reliably. I decided to leave it for now.

Final thoughts

If you’ve made it this far, thank you! I hope this guide helps anyone trying to generate PDFs from HTML using Puppeteer in serverless environments like Vercel.

I welcome suggestions and tips. Also, if you feel that I did not credit some resource or author, please contact me and I will credit them accordingly.

Credits to resources I used to figure all this out:
