Mate Boer

Posted on Feb 5, 2019 • Edited on Feb 7, 2019 • Originally published at blog.risingstack.com

Generating PDF from HTML with Node.js and Puppeteer

#node #pdf #javascript #html


In this article I’m going to show how you can generate a PDF document from a heavily styled React page using Node.js, Puppeteer, headless Chrome & Docker.

Background: A few months ago one of the clients of RisingStack asked us to develop a feature where the user would be able to request a React page in PDF format. That page is basically a report/result for patients with data visualization, containing a lot of SVGs. Furthermore, there were some special requests to manipulate the layout and make some rearrangements of the HTML elements. So the PDF should have different styling and additions compared to the original React page.

As the assignment was a bit more complex than what could have been solved with simple CSS rules, we first explored possible implementations. Essentially we found 3 main solutions. This blogpost will walk you through these possibilities and the final implementations.

A personal comment before we get started: it’s quite a hassle, so buckle up!

Client side or Server side?

It is possible to generate a PDF file both on the client-side and on the server-side. However, it probably makes more sense to let the backend handle it, as you don’t want to use up all the resources the user’s browser can offer.
Even so, I’ll still show solutions for both methods.

Option 1: Make a Screenshot from the DOM

At first sight, this solution seemed to be the simplest, and it turned out to be true, but it has its own limitations. If you don’t have special needs, like selectable or searchable text in the PDF, it is a good and simple way to generate one.

This method is plain and simple: create a screenshot from the page, and put it in a PDF file. Pretty straightforward. We used two packages for this approach:

html2canvas, to make a screenshot from the DOM
jsPDF, a library to generate PDF files

Let’s start coding.

npm install html2canvas jspdf

import html2canvas from 'html2canvas'
import jsPdf from 'jspdf'

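function printPDF () {
    const domElement = document.getElementById('your-id')
    return html2canvas(domElement, {
        // onclone receives the cloned document, so the live page is left untouched
        onclone: (clonedDoc) => {
            clonedDoc.getElementById('print-button').style.visibility = 'hidden'
        }
    })
    .then((canvas) => {
        const imgData = canvas.toDataURL('image/png')
        const pdf = new jsPdf()
        // Stretch the image to the page width and keep its aspect ratio
        const width = pdf.internal.pageSize.getWidth()
        const height = (canvas.height * width) / canvas.width
        pdf.addImage(imgData, 'PNG', 0, 0, width, height)
        pdf.save('your-filename.pdf')
    })
}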

And that’s it!

Make sure you take a look at the html2canvas onclone option. It can prove to be handy when you quickly need to take a snapshot and manipulate the DOM (e.g. hide the print button) before taking the picture. I can see quite a lot of use cases for this package. Unfortunately, ours wasn’t one, as we needed to handle the PDF creation on the backend side.
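The `width` and `height` passed to `addImage` have to come from somewhere, and a single A4 page only fits so much of a long report. Below is a minimal sketch of the arithmetic, assuming jsPDF's default page (A4 portrait, measured in millimetres, 210 × 297 mm); `fitToPage` is a hypothetical helper, not part of jsPDF or html2canvas.

```javascript
// Minimal sketch: fit a canvas screenshot to the page width and work out
// how many pages a tall screenshot needs. Dimensions are in millimetres,
// assuming jsPDF's default A4 portrait page (210 x 297 mm).
const A4 = { width: 210, height: 297 }

function fitToPage(canvasWidth, canvasHeight, page = A4) {
  if (canvasWidth <= 0 || canvasHeight <= 0) {
    throw new RangeError('canvas dimensions must be positive')
  }
  // Scale so the image spans the full page width, keeping the aspect ratio
  const width = page.width
  const height = (canvasHeight * page.width) / canvasWidth
  // Each page shows a page.height tall window of the scaled image;
  // the offset is how far the image must be shifted up on that page
  const pages = Math.ceil(height / page.height)
  const offsets = Array.from({ length: pages }, (_, i) => i * page.height)
  return { width, height, pages, offsets }
}
```

With jsPDF, each entry in `offsets` would roughly become `pdf.addImage(imgData, 'PNG', 0, -offset, width, height)` on its own page (calling `pdf.addPage()` before every page after the first); whatever falls outside the page is clipped. Note that the slices cut straight through text rows, which is one reason this approach struggles with long, data-heavy pages.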

Option 2: Use only a PDF Library

There are several libraries out there on NPM for this purpose, like jsPDF (mentioned above) or PDFKit. The problem with them is that I would have had to recreate the page structure if I wanted to use these libraries. That definitely hurts maintainability, as I would have needed to apply all subsequent changes to both the PDF template and the React page.
Take a look at the code below. You need to create the PDF document yourself, by hand. You could traverse the DOM and figure out how to translate each element into PDF primitives, but that is a tedious job. There must be an easier way.

const PDFDocument = require('pdfkit')
const fs = require('fs')

const doc = new PDFDocument()
doc.pipe(fs.createWriteStream('output.pdf'))
doc.font('fonts/PalatinoBold.ttf')
   .fontSize(25)
   .text('Some text with an embedded font!', 100, 100)

doc.image('path/to/image.png', {
   fit: [250, 300],
   align: 'center',
   valign: 'center'
})

doc.addPage()
   .fontSize(25)
   .text('Here is some vector graphics...', 100, 100)

doc.end()

This snippet is from the PDFKit docs. Still, this approach can be useful if your target is a PDF file straight away and not the conversion of an already existing (and ever-changing) HTML page.
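To see why translating the DOM by hand gets tedious, here is a toy sketch that walks a tiny, hypothetical element tree and turns it into a flat list of text-drawing instructions. The node shape (`{ tag, text, children }`), the font sizes and the spacing are all assumptions for illustration; a real page would also need CSS, images, SVG, tables and line wrapping.

```javascript
// Toy sketch: walk a hypothetical element tree and emit text-drawing
// instructions with a running y position. Only headings and paragraphs
// are handled; everything else a real page needs is left out.
const FONT_SIZES = { h1: 25, h2: 18, p: 12 } // assumed sizes, in points
const LINE_GAP = 10 // assumed spacing between blocks, in points
const MARGIN = 72 // assumed one-inch margin, in points

function layout(node, state = { y: MARGIN, ops: [] }) {
  const size = FONT_SIZES[node.tag]
  if (size && typeof node.text === 'string') {
    state.ops.push({ fontSize: size, text: node.text, x: MARGIN, y: state.y })
    state.y += size + LINE_GAP
  }
  // Unknown tags are skipped, but their children are still visited
  for (const child of node.children || []) {
    layout(child, state)
  }
  return state.ops
}
```

Each instruction maps onto a PDFKit call such as `doc.fontSize(op.fontSize).text(op.text, op.x, op.y)`. Even this toy version ignores page overflow, which hints at how much work a faithful translation of a styled React page would take.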

Final Option 3: Puppeteer, Headless Chrome with Node.js

What is Puppeteer? The documentation says:

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
It’s basically a browser which you can run from Node.js. If you read the docs, the first thing they say about Puppeteer is that you can use it to ‘Generate screenshots and PDFs of pages’. Excellent! That’s what we were looking for.
Let’s install Puppeteer with npm i puppeteer, and implement our use case.

const puppeteer = require('puppeteer')

async function printPDF() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://blog.risingstack.com', {waitUntil: 'networkidle0'});
  const pdf = await page.pdf({ format: 'A4' });

  await browser.close();
  return pdf
}

This is a simple function that navigates to a URL and generates a PDF file of the site. First, we launch the browser (PDF generation is only supported in headless mode), then we open a new page and navigate to the given URL.

Setting the waitUntil: 'networkidle0' option means that Puppeteer considers navigation to be finished when there have been no network connections for at least 500 ms. (Check the API docs for further information.)
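To make the `networkidle0` rule concrete, here is a simplified model of it (an illustration, not Puppeteer's actual implementation): given the start and end time of each request in milliseconds, it returns the moment at which navigation would count as finished. The function name and the `{ start, end }` request shape are assumptions.

```javascript
// Simplified model of the networkidle0 rule: navigation counts as finished
// once no request has been in flight for `quietMs` milliseconds.
// Requests are { start, end } pairs in ms; this is an illustration only.
function networkIdleAt(requests, quietMs = 500) {
  // Turn requests into +1 / -1 events and sort them by time
  const events = []
  for (const { start, end } of requests) {
    events.push({ t: start, delta: 1 }, { t: end, delta: -1 })
  }
  // At equal timestamps, starts come before ends, so a request that begins
  // exactly when another one ends keeps the network busy
  events.sort((a, b) => a.t - b.t || b.delta - a.delta)

  let inFlight = 0
  let idleSince = 0 // the network is idle before the first request
  for (const { t, delta } of events) {
    if (inFlight === 0 && t - idleSince >= quietMs) {
      // A full quiet window elapsed before this event happened
      return idleSince + quietMs
    }
    inFlight += delta
    if (inFlight === 0) idleSince = t
  }
  return idleSince + quietMs
}
```

For example, with requests at 0–100 ms, 50–300 ms and 900–1000 ms, the network is quiet from 300 ms until the next request starts at 900 ms, so the model reports 800 ms and never waits for the late request.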

After that, we save the PDF to a variable, we close the browser and return the PDF.
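One thing the function above does not guard against: if `page.goto` or `page.pdf` throws, `browser.close()` is never reached and a Chrome process is left running. Here is a minimal sketch of a wrapper that closes the browser in a `finally` block. `withBrowser` is a hypothetical helper, and the launcher is passed in as a function so the pattern can be exercised without Chrome.

```javascript
// Run `work(browser)` and always close the browser afterwards, even if
// `work` throws. `launch` is injected, e.g. () => puppeteer.launch({ headless: true })
async function withBrowser(launch, work) {
  const browser = await launch()
  try {
    return await work(browser)
  } finally {
    await browser.close()
  }
}

// Possible usage with Puppeteer (not executed here):
// const pdf = await withBrowser(
//   () => puppeteer.launch({ headless: true }),
//   async (browser) => {
//     const page = await browser.newPage()
//     await page.goto('https://blog.risingstack.com', { waitUntil: 'networkidle0' })
//     return page.pdf({ format: 'A4' })
//   }
// )
```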

Note: The page.pdf method receives an options object, where you can also save the file to disk with the path option. If path is not provided, the PDF won’t be saved to disk and you’ll get a buffer instead. Later on, I discuss how you can handle it.
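If you keep the buffer, a cheap sanity check before writing or sending it is worth considering: every PDF file starts with the `%PDF-` signature. Below is a minimal sketch using only Node's `fs` module. `looksLikePdf` and `savePdf` are hypothetical helpers, and they accept any `Uint8Array` (Node's `Buffer` is a subclass) in case your Puppeteer version returns a plain typed array rather than a `Buffer`.

```javascript
const fs = require('fs')

// Every PDF file begins with the ASCII bytes "%PDF-" followed by a version
const PDF_SIGNATURE = Buffer.from('%PDF-')

function looksLikePdf(data) {
  if (!(data instanceof Uint8Array) || data.length < PDF_SIGNATURE.length) {
    return false
  }
  // Compare the first bytes against the signature
  return PDF_SIGNATURE.equals(Buffer.from(data.subarray(0, PDF_SIGNATURE.length)))
}

// Hypothetical helper: write the bytes returned by page.pdf() to disk,
// refusing anything that does not carry the PDF signature
function savePdf(data, filePath) {
  if (!looksLikePdf(data)) {
    throw new Error('data does not look like a PDF')
  }
  fs.writeFileSync(filePath, data)
  return filePath
}
```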

In case you need to log in first to generate a PDF from a protected page, first you need to navigate to the login page, inspect the form elements for ID or name, fill them in, then submit the form:

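await page.type('#email', process.env.PDF_USER)
await page.type('#password', process.env.PDF_PASSWORD)
// Wait for the navigation triggered by submitting the form to finish
await Promise.all([page.waitForNavigation(), page.click('#submit')])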

Always store login credentials in environment variables, do not hardcode them!
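A small guard makes a missing variable fail early with a clear message instead of producing a login attempt with empty values. Here is a minimal sketch; `readPdfCredentials` is a hypothetical helper, and the variable names are the ones used in the snippet above.

```javascript
// Read the login credentials used by the PDF service from the environment,
// failing fast with a clear message when one of them is missing
function readPdfCredentials(env = process.env) {
  const required = ['PDF_USER', 'PDF_PASSWORD']
  const missing = required.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  return { user: env.PDF_USER, password: env.PDF_PASSWORD }
}
```

At startup you would call `const { user, password } = readPdfCredentials()` once and pass `user` and `password` to `page.type`, so a misconfigured deployment fails before Chrome is even launched.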

Style Manipulation

Puppeteer also covers the style manipulation our client asked for. You can insert style tags before generating the PDF, and Puppeteer will generate a file with the modified styles.

await page.addStyleTag({ content: '.nav { display: none} .navbar { border: 0px} #print-button {display: none}' })
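If the list of elements to hide keeps growing, generating the CSS string from a list of selectors keeps the `addStyleTag` call readable. Here is a minimal sketch; `buildPdfStyles` is a hypothetical helper, and the selectors in the usage line are the ones from the snippet above.

```javascript
// Build a stylesheet that hides the given selectors and applies optional
// extra declarations, for use with page.addStyleTag({ content })
function buildPdfStyles(hiddenSelectors, extraRules = {}) {
  const rules = []
  if (hiddenSelectors.length > 0) {
    rules.push(`${hiddenSelectors.join(', ')} { display: none; }`)
  }
  for (const [selector, declarations] of Object.entries(extraRules)) {
    const body = Object.entries(declarations)
      .map(([prop, value]) => `${prop}: ${value};`)
      .join(' ')
    rules.push(`${selector} { ${body} }`)
  }
  return rules.join('\n')
}
```

Usage: `await page.addStyleTag({ content: buildPdfStyles(['.nav', '#print-button'], { '.navbar': { border: '0px' } }) })`.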

Send file to the client and save it

Okay, now you have generated a PDF file on the backend. What to do now?
As I mentioned above, if you don’t save the file to disk, you’ll get a buffer. You just need to send that buffer with the proper content type to the front-end.

printPDF().then(pdf => {
    res.set({ 'Content-Type': 'application/pdf', 'Content-Length': pdf.length })
    res.send(pdf)
})
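For readers not using Express, here is a sketch of the same response with Node's built-in `http` module. The `Content-Disposition` header is an addition not present in the Express snippet; it suggests a download filename to the browser. `sendPdf` and the default filename are hypothetical.

```javascript
// Send a PDF buffer as an HTTP response using only Node's http API.
// `res` is an http.ServerResponse (or anything with writeHead/end).
function sendPdf(res, pdf, filename = 'report.pdf') {
  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Content-Length': pdf.length,
    // Ask the browser to download the file under this name
    'Content-Disposition': `attachment; filename="${filename}"`
  })
  res.end(pdf)
}
```

Wired into a server, it might look like `http.createServer((req, res) => printPDF().then((pdf) => sendPdf(res, pdf)))`, where `printPDF` is the Puppeteer function from earlier; a real handler would also need to answer with an error status when PDF generation fails.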

Now you can simply send a request to the server to get the generated PDF.

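import axios from 'axios'

function getPDF() {
  return axios.get(`${API_URL}/your-pdf-endpoint`, {
    // Receive the raw bytes instead of letting axios parse the body as text
    responseType: 'arraybuffer',
    headers: {
      'Accept': 'application/pdf'
    }
  })
}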

Once you’ve sent the request, the buffer should start downloading. Now the last step is to convert the buffer into a PDF file.

savePDF = () => {
  this.openModal('Loading…') // open modal
  return getPDF() // API call
    .then((response) => {
      // Wrap the received bytes in a Blob and trigger a download through a temporary link
      const blob = new Blob([response.data], { type: 'application/pdf' })
      const link = document.createElement('a')
      link.href = window.URL.createObjectURL(blob)
      link.download = 'your-file-name.pdf'
      link.click()
      this.closeModal() // close modal
    })
    .catch((err) => {
      /** error handling **/
    })
}

<button onClick={this.savePDF}>Save as PDF</button>

That was it! If you click on the save button, the PDF will be saved by the browser.

Using Puppeteer with Docker

I think this is the trickiest part of the implementation, so let me save you a couple of hours of Googling.
The official documentation states that “getting headless Chrome up and running in Docker can be tricky”. The official docs have a Troubleshooting section where, at the time of writing, you can find all the necessary information on installing Puppeteer with Docker.
If you install Puppeteer on the Alpine image, make sure you scroll down to the Alpine part of that page. Otherwise, you might gloss over the fact that you cannot run the latest Puppeteer version there, and that you also need to disable /dev/shm usage with a flag:

const browser = await puppeteer.launch({
  headless: true,
  args: ['--disable-dev-shm-usage']
});

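Otherwise, the Puppeteer subprocess might run out of memory before it even gets started properly: by default Docker gives a container only 64 MB of /dev/shm, which is typically too small for Chrome, and the flag makes Chrome write its shared memory files to /tmp instead. More info about that can be found in the troubleshooting section mentioned above.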
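One way to keep the flag out of local development is to add it only when running inside a container. Here is a minimal sketch. Checking for `/.dockerenv` is a common heuristic rather than a guarantee (other container runtimes do not create that file), and `isInDocker` and `launchOptions` are hypothetical helpers.

```javascript
const fs = require('fs')

// Heuristic: Docker usually creates /.dockerenv in the container's root
function isInDocker(marker = '/.dockerenv') {
  return fs.existsSync(marker)
}

// Build the options object passed to puppeteer.launch()
function launchOptions(inDocker = isInDocker()) {
  const args = []
  if (inDocker) {
    // Docker's default /dev/shm is small (64 MB); make Chrome use /tmp instead
    args.push('--disable-dev-shm-usage')
  }
  return { headless: true, args }
}
```

Usage: `const browser = await puppeteer.launch(launchOptions())`. If you would rather not rely on a heuristic, passing an explicit environment variable into the container and checking that instead is just as simple.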

Option 3 + 1: CSS Print Rules

One might think that simply using CSS print rules is easy from a developer’s standpoint. No NPM modules, just pure CSS. But how do they fare when it comes to cross-browser compatibility?
When choosing CSS print rules, you have to test the outcome in every browser to make sure it provides the same layout, and there is no guarantee that it does.
For example, inserting a page break after a given element can hardly be considered an esoteric use case, yet you might be surprised that you need workarounds to get it working in Firefox.
Unless you are a battle-hardened CSS magician with a lot of experience in creating printable pages, this can be time-consuming.
Print rules are great if you can keep the print stylesheets simple.
Let’s see an example.

@media print {
    .print-button {
        display: none;
    }

    .content div {
        page-break-after: always; /* legacy property, for older browsers */
        break-after: page;
    }
}

The CSS above hides the print button and inserts a page break after every div inside an element with the class content. There is a great article that summarizes what you can do with print rules and what their difficulties are, including browser compatibility.
Taking everything into account, CSS print rules are great and effective if you want to make a PDF from a not so complex page.

Summary: PDF from HTML with Node.js and Puppeteer

So let’s quickly go through the options we covered here for generating PDF files from HTML pages:

Screenshot from the DOM: This can be useful when you need to create snapshots from a page (for example to create a thumbnail), but falls short when you have a lot of data to handle.
Use only a PDF library: If you need to create PDF files programmatically from scratch, this is a perfect solution. Otherwise, you need to maintain the HTML and PDF templates which is definitely a no-go.
Puppeteer: Despite being relatively difficult to get it working on Docker, it provided the best result for our use case, and it was also the easiest to write the code with.
CSS print rules: If your users are educated enough to know how to print to a file and your pages are relatively simple, it can be the most painless solution. As you saw in our case, it wasn’t. Happy printing!

Oldest comments (11)

hmadrigal • Feb 6 '19 • Edited

I would like to suggest an additional option. The wkhtmltopdf library (wkhtmltopdf.org) can be used; it is open source and based on WebKit, which helps you reduce rendering issues. It also has options for customizing headers and footers through custom JS/HTML. It is available on most major operating systems. The only drawback is that it is a native binary, so you either need bindings (e.g. DllImport in C#) or have to call the executable as a new process (specifying the options on the command line).

Caley Woods • Feb 12 '19

This is what I ended up going with several years ago when there was a request to be able to receive a PDF copy of an invoice that we were already creating with HTML / CSS. Worked well enough.

Richard MTP • Feb 7 '19

Does this support multiple pages? What happens if a page has more than 10000 rows?

Mate Boer • Feb 11 '19

One way to do it: stackoverflow.com/questions/485102...

vladislav • Feb 7 '19

If your production node has strict firewall rules, you need additional firewall configuration to download Puppeteer.

create1666 • May 9 '19

Hello

Nice job

McShaze • Apr 14 '21

Yep, it's true!


kritikgarg • Mar 13 '23 • Edited

👏📄 Great article,Mate! Generating PDFs from HTML using Node.js and Puppeteer seems like a very useful tool for businesses and developers. It's amazing to see how technology has made it so easy to convert HTML to PDF and create professional-looking documents. I appreciate the clear explanations and code samples provided in the article. well done!!💡👍