Generate a PDF from HTML with puppeteer

Damien Cosset ・2 min read

Introduction

This is one of those frustration posts where I've just spent hours working on something and finally managed to get a working solution. I learned quite a bit, but I feel like it should not have taken me that much time...

Anyway, the goal was to generate a PDF from HTML, then send it back to the browser so the user could download it. I tried a lot of different things, and it's more than likely my solution is not the most elegant or the fastest, but fuck it, it works.

I consider this post a place where I can store this solution, just in case I forget it in the future. I'll know where to look. Let's jump into the actual solution.

The solution!

Front-end

Let's start with the front-end.

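import { saveAs } from 'file-saver'

const downloadPDF = () => {
    fetch('/api/invoices/create-pdf', {
        method: 'POST',
        // fetch has no `data` option: the payload goes in `body` as a JSON string
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            invoiceDetails,
            invoiceSettings,
            itemsDetails,
            organisationInfos,
            otherDetails,
            clientDetails
        })
    })
        .then(res => {
            if (!res.ok) throw new Error(`PDF request failed with status ${res.status}`)
            return res.arrayBuffer()
        })
        .then(buffer => {
            // Wrap the binary data in a Blob typed as a PDF, then trigger the download
            const blob = new Blob([buffer], { type: 'application/pdf' })
            saveAs(blob, 'invoice.pdf')
        })
        .catch(e => alert(e))
}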

This is the function that does everything. We are generating an invoice in my case.

1) A fetch with the POST method. This is the part where we send the proper data to the server, which generates our PDF. (server code will follow)

2) The response we get needs to be converted into an ArrayBuffer.

3) We create a Blob (Binary Large Object) with the new Blob() constructor. The constructor takes an array of parts as its first argument, which is why our ArrayBuffer is wrapped in square brackets when we pass it in. Each part can be binary data (an ArrayBuffer, a typed array, another Blob) or a string. Also, notice the type application/pdf.

4) Finally, I'm using the saveAs function from the file-saver package to create the file on the front end!
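
One thing the steps above do not cover: if the server answers with an error page, those bytes would still be saved as invoice.pdf. A minimal sketch of a sanity check that could run before creating the Blob, relying on the fact that PDF files start with the characters %PDF-. The looksLikePdf helper is hypothetical and not part of my original code.

```javascript
// Hypothetical helper: returns true when the data starts with the "%PDF-" signature
function looksLikePdf(arrayBuffer) {
    const signature = [0x25, 0x50, 0x44, 0x46, 0x2d] // "%PDF-"
    const length = Math.min(arrayBuffer.byteLength, signature.length)
    const bytes = new Uint8Array(arrayBuffer, 0, length)
    return bytes.length === signature.length &&
        signature.every((byte, i) => bytes[i] === byte)
}

// Two small sample payloads, built from plain ASCII strings
const toArrayBuffer = text => Uint8Array.from(text, c => c.charCodeAt(0)).buffer
const pdfBytes = toArrayBuffer('%PDF-1.7 ...')
const htmlBytes = toArrayBuffer('<h1>Error</h1>')

console.log(looksLikePdf(pdfBytes))  // true
console.log(looksLikePdf(htmlBytes)) // false
```

In the download function, this check would sit between the arrayBuffer() call and the new Blob() call.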

Back-end

Here is the back-end side. There is a whole Express application around it, but I'll just show you the controller that handles this PDF problem.

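const fs = require('fs')
const path = require('path')
const puppeteer = require('puppeteer')

module.exports = {
    createPDF: async function(req, res, next) {
        // Load the HTML template from disk
        const content = fs.readFileSync(
            path.resolve(__dirname, '../invoices/templates/basic-template.html'),
            'utf-8'
        )
        const browser = await puppeteer.launch({ headless: true })
        try {
            const page = await browser.newPage()
            await page.setContent(content)
            // page.pdf() resolves with the binary content of the PDF
            const buffer = await page.pdf({
                format: 'A4',
                printBackground: true,
                margin: {
                    left: '0px',
                    top: '0px',
                    right: '0px',
                    bottom: '0px'
                }
            })
            res.setHeader('Content-Type', 'application/pdf')
            res.end(buffer)
        } catch (e) {
            next(e)
        } finally {
            // Always close the browser, even when rendering fails
            await browser.close()
        }
    }
}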

1) I am using puppeteer to create a PDF from the HTML content. The HTML content comes from an HTML file that I simply read with readFileSync.

2) We store the buffer data returned by page.pdf() and send it back to the front-end. This is the response that the front-end later converts into an ArrayBuffer.
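
The response could also carry headers that describe the PDF, which helps when the endpoint is opened directly instead of through fetch. A minimal sketch, assuming res is a Node/Express response object. The sendPdf helper and the fake response below are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: `res` is assumed to be a Node/Express response object
function sendPdf(res, buffer, filename) {
    res.setHeader('Content-Type', 'application/pdf')
    res.setHeader('Content-Length', buffer.length)
    // "attachment" asks the browser to download the file rather than display it
    res.setHeader('Content-Disposition', `attachment; filename="${filename}"`)
    res.end(buffer)
}

// Tiny stand-in for a response object, just to show what gets set
const fakeRes = {
    headers: {},
    body: null,
    setHeader(name, value) { this.headers[name] = value },
    end(data) { this.body = data }
}

sendPdf(fakeRes, Buffer.from('%PDF-1.7'), 'invoice.pdf')
console.log(fakeRes.headers['Content-Disposition'])
// attachment; filename="invoice.pdf"
```

With the file-saver approach on the front end, the Content-Disposition header is optional, since saveAs already picks the file name.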

Done

Well, looking at the code, it really looks easier now than it actually was when I tried to solve this problem. It took me close to 10 hours to find a proper answer. 10 FREAKING HOURS!!!!

Note to self: if you get frustrated, walk away from the computer, get some fresh air, and come back later...

Happy Coding <3

Damien Cosset

@damcosset

French web developer mostly interested in JavaScript and Java

Discussion

I went through something similar a few days ago... and it took me a while to figure it out.

Here's my approach, without using puppeteer.
I'm using Vue in this case, but I'm pretty sure the concept is applicable in other cases as well.

// Client
<template>
    <!-- Use CSS to give it full width & height -->
    <div class="c-file">
        <iframe
            class="c-file__display"
            v-if="src"
            :src="src"
        />
    </div>
</template>

<script>
/* ... */
 async created () {
        const config = {
            headers: new Headers({
                'Content-Type': 'application/json',
            }),
            method: "POST",
            body: JSON.stringify(body) // Dynamic data
        }
        fetch(url, config)
            .then(res => res.blob())
            .then(res => {
                // Re-wrap the response so the blob is typed as a PDF
                const blob = new Blob([res], { type: 'application/pdf' });
                // createObjectURL takes a single argument
                this.src = URL.createObjectURL(blob)
            })
 }
</script>

  // Server
  /* ... */
   const pdf = require('html-pdf')
   // Generate HTML string
   const content = getPDFContent(req.body.data);

   res.setHeader('Content-Type', 'application/pdf')
   pdf.create(content).toStream( (err, stream) => {
      // Without this check, a failed render would crash on stream.pipe
      if (err) {
         res.statusCode = 500
         return res.end()
      }
      stream.pipe(res);
   });

 

html-pdf uses PhantomJS under the hood. Where is your app hosted? It didn't work for me on Docker-based cloud providers like Now.

 

The app isn’t hosted (yet), it is all on localhost. I haven’t tried docker nor used cloud providers too often, but I’m really curious about why this approach wouldn’t work.

Please let me know if you find a solution!

We've had success hosting a similar Puppeteer-based converter using Google Cloud Functions (I don't work for Google): github.com/Courtsite/shuttlepdf. There is a bit of latency, but it is a reasonable trade-off for ease of deployment, scalability, and reliability.

Well, in my case, PhantomJS wasn't found by the library on my Docker-based hosting.

There is a dockerized PhantomJS available, but you need full access to the deployment Dockerfile in order to have it installed as part of the deployment.

Overall, I would recommend moving away from html-pdf because it's not maintained anymore.

Generally speaking, you should probably avoid Phantomjs. With headless Chromium, there really isn't any need for it. Indeed, I think it is no longer maintained.

 

Lolly Post.

I understand your frustration.
In my case, I spent "One freaking Week" coding a serverless microservice that takes {html,options} and returns back the buffer, a generic solution.

The first challenge was that I needed to fit puppeteer in 50 MB, which is the maximum size a serverless function can take on Now.

Second challenge was debugging why the f**** HTML didn't render at all.

After hours of trial and error I found that, for some reason, if the HTML string contains the # character, it somehow becomes "invalid" and puppeteer fails silently 0_0

Third, in my particular case, the HTML uses Bootstrap, and page.setContent alone didn't wait for it to load correctly. I needed a workaround: navigating to the HTML as a data URL, something like page.goto(`data:text/html,${html}`), which does wait for external resources to load.

Four (and this one is the reason why I almost threw the computer through the window): the HTML markup in my case was rendered at runtime using ReactDOM and the go-to-hell react-inline-css library, which ALWAYS WRAPS the generated CSS with this selector: #ReactInlineCss ....

See the # character there? Goto point #2 above (x_x)

Well, thank God the pain is gone. If you wanna learn how to make it serverless like I did, hop over to Now's blog (zeit.co/blog).
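
A possible explanation for the # problem above, assuming the HTML was handed to the browser inside a data: URL: in a URL, # starts the fragment, so everything after it is cut from the document. Percent-encoding the HTML avoids that. This is only a sketch, and toDataUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a data: URL whose payload is percent-encoded,
// so "#" becomes "%23" instead of starting a URL fragment.
function toDataUrl(html) {
    return `data:text/html;charset=utf-8,${encodeURIComponent(html)}`
}

const html = '<style>#ReactInlineCss p { color: red }</style><p>Invoice</p>'

// Naive concatenation: the URL parser treats everything after "#" as a fragment
const naive = new URL(`data:text/html,${html}`)
// Encoded version: no fragment, the whole document stays in the payload
const encoded = new URL(toDataUrl(html))

console.log(naive.hash !== '')   // true
console.log(encoded.hash === '') // true
```

With Puppeteer, a URL built this way could be passed to page.goto(url, { waitUntil: 'networkidle0' }) so that external stylesheets get a chance to load before the PDF is rendered.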

 

Sorry, that made me laugh 😁

 

Hey Vladimir, how were you able to fix the second point?

 

Sweet lord thank you for this.
I was using the html-pdf package and had no issues until I had to either deploy to Azure App Service or run it in a Docker container.
Didn't realize that PhantomJS is pretty much no longer supported, so this was a life saver.

Also, if anyone was curious, I was having issues running puppeteer in a docker container. Found this gem in order to make chromium work in the container:

const browser = await puppeteer.launch({ headless: true,  args: ['--no-sandbox', '--disable-setuid-sandbox'], ignoreHTTPSErrors: true, dumpio: false });
 

Interesting post, thanks for sharing!

I have a question: where is the code in which you inject the data into the template?
I mean, I can see where you load the HTML template and render it into a PDF, and I can see the front-end sending some data to customise the PDF like invoiceDetails, clientDetails etc. but how do you actually put that data inside the template?

 

I have a "react" template on the front end, the one you see.
There is an HTML template on the back end, with placeholders for the future data. After that, to populate the HTML template on the back end, it's just String.replace() calls.
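
A minimal sketch of what that placeholder replacement could look like. The {{name}} placeholder syntax and the fillTemplate and escapeHtml helpers are assumptions for illustration, not the actual code from the project.

```javascript
// Escape characters that have a special meaning in HTML
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
}

// Replace every {{key}} placeholder with the matching value from `data`.
// Unknown placeholders are left untouched so they are easy to spot in the output.
function fillTemplate(template, data) {
    return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
        Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : match
    )
}

const template = '<h1>Invoice {{number}}</h1><p>Client: {{clientName}}</p>'
const html = fillTemplate(template, { number: 42, clientName: 'ACME & Co' })

console.log(html)
// <h1>Invoice 42</h1><p>Client: ACME &amp; Co</p>
```

The filled string would then be what gets passed to page.setContent() instead of the raw template.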

 

And it even works on aws lambda ;-)

 

please provide the complete code

 

Bootstrap is not working in my pdf.html file.
If there is a solution, please let me know.