Aki Rautio

Posted on Jul 22, 2019 • Edited on Dec 21, 2020 • Originally published at akirautio.com

Generate a PDF in AWS Lambda with NodeJS and Puppeteer

#pdf #puppeteer #serverless #javascript

Recently I needed to solve a problem that involves generating a PDF file based on database content. Since these PDFs are not generated very often, it doesn't make sense to keep a service running 24/7. Luckily both Google (Functions) and AWS (Lambda) have an event-driven service which only runs on request.

Originally I was planning to use Python and ReportLab for this project, but the connection to a PostgreSQL database ended up being too complex to configure. With NodeJS I had already done a small project with a database connection, so I knew that it would work.

For NodeJS I still needed a package to generate the PDF, and I found the following options:

I ended up choosing Puppeteer for this project. It's a bit overkill for the current use case, but at the same time it is more future-proof because the layout is built on HTML and CSS.

To make my life easier I'm using the serverless package to handle deployment to AWS Lambda and chrome-aws-lambda to help with deploying puppeteer to AWS Lambda. The full list of required dependencies is the following:

"dependencies": {
  "chrome-aws-lambda": "1.18.1",
  "knex": "0.18.3",
  "pg": "7.11.0",
  "pg-hstore": "2.3.2",
  "pug": "2.0.4",
  "puppeteer-core": "1.18.1"
},
"devDependencies": {
  "serverless": "1.40.0",
  "serverless-apigw-binary": "0.4.4",
  "serverless-offline": "4.9.4"
}

Aside from the main requirements, I'm using knex, pg, and pg-hstore to handle the database connection and pug as a template engine. For local testing I'm using serverless-offline, and to get binary responses through to the client I'm using serverless-apigw-binary.

Creating a lambda function

The process of creating a PDF goes as follows:

1. Fetch the data that will be used to create the report (in my case from the database with knex).
2. Create an HTML template that will be combined with the data (I'm using pug here).
3. Load puppeteer and open the HTML with puppeteer.
4. Generate a PDF page with puppeteer.
5. Return the PDF as a base64 string.

'use strict'
const chromium = require('chrome-aws-lambda')
const pug = require('pug')
const fs = require('fs')
const path = require('path')

const knex = require('./src/db')

module.exports.pdf = async (event, context) => {
  const yearMonth = ((event || {}).pathParameters || {}).yearMonth || ''
  // Expected format is YYYY-MM, for example 2019-07
  const year = yearMonth.length === 7 && yearMonth.substring(0, 4)
  const month = yearMonth.length === 7 && yearMonth.substring(5, 7)

  // Select a date (the Date constructor takes a zero-based month index)
  const selDate = new Date(year, month - 1)
  const filter = {
    month: selDate.toLocaleString('en', { month: 'long' }),
    year: selDate.getFullYear()
  }


  // 1. Load database data with Knex
  const result = await knex
    .select()
    .from('sales')
    .where({
      year: filter.year,
      month: selDate.getMonth() + 1
    })

  // 2. Create html
  const template = pug.compileFile('./src/template.pug')
  const html = template({ ...filter, result })

  // 3. Open puppeteer
  let browser = null
  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    })

    const page = await browser.newPage()
    // Wait for the content to be set before generating the PDF
    await page.setContent(html)

    // 4. Create pdf file with puppeteer
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
    })

    // 5. Return PDF as base64 string
    const response = {
      headers: {
        'Content-type': 'application/pdf',
        'content-disposition': 'attachment; filename=test.pdf'
      },
      statusCode: 200,
      body: pdf.toString('base64'),
      isBase64Encoded: true
    }
    // An async handler resolves with its return value
    return response
  } catch (error) {
    // Rethrow so that Lambda reports the invocation as failed
    throw error
  } finally {
    if (browser !== null) {
      await browser.close()
    }
  }
}
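
The path parameter handling at the top of the handler can also be made stricter. The following is only a minimal sketch, assuming the parameter arrives as YYYY-MM; the helper name parseYearMonth is hypothetical and not part of the original project.

```javascript
'use strict'

// Hypothetical helper (not in the original project): parses a "YYYY-MM"
// string into numeric parts. Returns null when the input does not match.
function parseYearMonth(yearMonth) {
  const match = /^(\d{4})-(\d{2})$/.exec(yearMonth || '')
  if (match === null) {
    return null
  }
  const year = Number(match[1])
  const month = Number(match[2]) // 1-12, as used in the database query
  if (month < 1 || month > 12) {
    return null
  }
  // The Date constructor expects a zero-based month index
  const date = new Date(year, month - 1)
  return {
    year,
    month,
    monthName: date.toLocaleString('en', { month: 'long' })
  }
}
```

With a helper like this, the handler could return a response with statusCode 400 when the result is null instead of querying the database with an invalid date.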

Deployment to AWS lambda

As mentioned earlier, we are using Serverless for deployment so that the configuration is not too heavy.

service:
  name: PDF

plugins:
  - serverless-offline
  - serverless-apigw-binary

provider:
  name: aws
  runtime: nodejs8.10
  region: eu-central-1
  stage: ${opt:stage, 'development'}
  environment:
    ENV: ${self:provider.stage}

custom:
  apigwBinary:
    types:
      - '*/*'

functions:
  pdf:
    handler: pdf.pdf
    events:
      - http:
          path: pdf
          method: get
          cors: true

The key here is that we enable */* for apigwBinary so that the PDF goes through in the correct format.
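
To illustrate why the binary setting matters: the handler returns the PDF as a base64 string, and API Gateway has to decode it back into bytes before it reaches the client. The sketch below shows that round trip using only Node's Buffer. The helper name toBinaryResponse is hypothetical, and the sample bytes are a placeholder standing in for real PDF output.

```javascript
'use strict'

// Hypothetical helper: builds a response in the same shape the handler returns
function toBinaryResponse(pdfBuffer, filename) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/pdf',
      'Content-Disposition': `attachment; filename=${filename}`
    },
    body: pdfBuffer.toString('base64'),
    isBase64Encoded: true
  }
}

// Placeholder bytes standing in for real PDF output (assumption for the demo)
const fakePdf = Buffer.from('%PDF-1.4 example bytes', 'latin1')
const response = toBinaryResponse(fakePdf, 'test.pdf')

// Decoding the body restores the original bytes
const decoded = Buffer.from(response.body, 'base64')
console.log(decoded.equals(fakePdf)) // true
```

If the decoding step is skipped, the client receives the base64 text itself, which a PDF viewer cannot open.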

And with that we have everything needed to generate a PDF in AWS Lambda. In my experience, generating the PDF with 1024 MB of memory took about 4000 ms, which would put the total price close to 1 euro per 20,000 PDF generations after the free tier.
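
The estimate above can be checked with some quick arithmetic. This is a rough sketch, assuming a price of 0.0000166667 USD per GB-second and 0.20 USD per million requests. Both prices are assumptions that are not stated in the article, and they vary by region and over time.

```javascript
'use strict'

// Assumed prices (not from the article); check the current AWS pricing page
const PRICE_PER_GB_SECOND = 0.0000166667
const PRICE_PER_REQUEST = 0.2 / 1000000

// Estimates the cost in USD, ignoring the free tier
function estimateCost(invocations, memoryMb, durationMs) {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000) * invocations
  return gbSeconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST
}

const cost = estimateCost(20000, 1024, 4000)
console.log(cost.toFixed(2)) // 1.34
```

Under these assumptions the result is about 1.34 USD for 20,000 generations, which is in the same range as the estimate above.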

If you want to try it out yourself, I have created a repository on GitHub.

Top comments (31)

Esteban Gatjens • Mar 20 '20

Hi,
Thanks for the article.
I had an issue where the PDF randomly came out empty. To solve it, setContent should wait for everything to be loaded.

await page.setContent(html, { waitUntil: ['load', 'domcontentloaded', 'networkidle0'] });

Aki Rautio • Mar 20 '20

Thanks :) Very good point and an interesting find. I haven't seen this when loading HTML as a string, but it can very well happen.

waitUntil is very good to use, and even necessary if the page itself loads external content.
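
As a sketch of how the waiting could fit together with the PDF step: the function name renderPdf below is hypothetical, and it takes the Puppeteer page as a parameter so that the order of the calls is easy to follow. The waitUntil values are the ones from the comment above.

```javascript
'use strict'

// Hypothetical helper: `page` is expected to be a Puppeteer Page object
async function renderPdf(page, html) {
  // Resolve only after the content and its external resources have loaded
  await page.setContent(html, {
    waitUntil: ['load', 'domcontentloaded', 'networkidle0']
  })
  // Generate the PDF only after setContent has finished
  return page.pdf({ format: 'A4', printBackground: true })
}
```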

Maximiliano Schvindt • Apr 19 '21

Thanks for your comment! I had the same issue trying to run puppeteer on an EC2 Linux instance, but with your adjustment it is working now.

Yash Rajesh Dave • Dec 20 '20

I tried following this step but the page of the PDF is still empty.

Aki Rautio • Dec 20 '20

Do you have some content that is loaded conditionally and could cause the issue? I have seen that if there is a long enough gap between two loading elements, Puppeteer will treat the page as final before all the data has been loaded.

john-leal-viva • Jan 26 '22

Hi - great article, but I am getting the following error: ERROR in ../../chrome-aws-lambda/build/puppeteer/lib/Browser.js.map 1:10
Module parse failed: Unexpected token (1:10)
You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See webpack.js.org/concepts#loaders

As it is a Webpack compilation error, do you have any suggestions?

john-leal-viva • Jan 26 '22

Do I need a typescript loader with my webpack?

Aki Rautio • Jan 26 '22

Thanks for the feedback :)

This certainly is a webpack-related issue, and it's a bit hard to debug without knowing the configuration. But the error message basically tells that you are trying to handle the .map file with webpack. The map files are for debugging and do not need to be handled by webpack, so there definitely is something off in the configuration.

A couple of points to check in webpack:

Puppeteer should only be imported server side, so check that your build is only for the server.
You can most likely exclude node_modules from transpiling on the server side, since puppeteer has already been transpiled to JavaScript that most reasonably new NodeJS versions should understand.

john-leal-viva • Jan 26 '22

Interesting.
I attached an image of the full error message and an image of my Webpack.config file. It won't even let me deploy to lambda since there is the Webpack compilation error. There are 5 errors and each error is for a file in chrome-aws-lambda/source/puppeteer/lib/ directory.

Additionally, how might I put the transpiling on server side compared to how it would be now? That I am also unfamiliar with.

And thank you for the help. I'm an intern and junior in college and this is my first three weeks working in NodeJs!

Aki Rautio • Jan 26 '22

Hmm, could you try to reshare the images? For some reason they didn't end up in the post.

Also, is there any specific reason you are using webpack for AWS Lambda? It might be that you don't even need it.

john-leal-viva • Jan 26 '22

Honestly you are right - I am trying with a clean slate and new service without Webpack to hopefully avoid this headache. I will let you know!

john-leal-viva • Jan 26 '22

I just want to say thank you because I have been trying to figure out how to make it work for the past day and a half. I just isolated it into its own service without Webpack and it worked in less than 10 minutes.

Aki Rautio • Jan 26 '22

Great to hear :)

Henrique de Castilhos • Jul 29 '19

Hi! Thanks for the article, it helped me a lot. I didn't know about chrome-aws-lambda and it was frustrating me terribly; I couldn't use puppeteer with serverless at all, and now I finally got it working.

Aymon Fournier • Feb 7 '20

Error: Chromium revision is not downloaded. Run "npm install" which, doesn't work

Aki Rautio • Feb 7 '20

Is this happening on aws or when running locally?

Aymon Fournier • Feb 8 '20 • Edited

Locally. Works fine on lambda, how do I run it locally?

Also thank you very much for this code, saved me 5 days of work

Aki Rautio • Feb 8 '20

The original package that we are using here suggests the following: github.com/alixaxel/chrome-aws-lam...

Charanjit Singh • May 25 '20

I have read about the named @page rule in CSS, but it is not working. Any idea why? I want to make a mixture of landscape and portrait pages.

Aki Rautio • May 25 '20

Any chance you could share your CSS? I haven't tried exactly this kind of scenario, but it could be that puppeteer has some limitations regarding CSS.

Charanjit Singh • Jun 1 '20

    <style>
      @page :first {
        display: none;
      }
      @page { size : portrait }
      @page rotated { size : landscape }
      h3 { page : rotated }

      p, h3 {
        page-break-after: always;
      }
    </style>
    <p>First Page.</p>
    <h3>Hello world</h3>
    <p>Second Page.</p>
    <button>Print!</button>

Snir Cohen • Dec 27 '20 • Edited

I've tried using getObject from S3, converting the HTML file into a string, then using puppeteer and setContent (+ waitUntil with all flags) and then uploading the PDF to S3 again (used Buffer.from(pdf, 'base64')), but I get a blank page when I open the file. Any ideas?
My HTML string starts with <!DOCTYPE ...
Should I remove that? Or does setContent wipe all page HTML data before it sets the new string I provide?
Also, I do have some scripts that fetch JS code and assets inside the HTML.

Aki Rautio • Dec 28 '20

The result of the pug template used in the article looks like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>PDF Generator</title>
    <style>
      body {
        font-family: Helvetica;
      }
      h1 {
        font-size: 36px;
        border-bottom: 1px solid black;
      }
      h3 {
        font-size: 16px;
      }
    </style>
  </head>
  <body>
    <h1>Monthly report</h1>
    <h3>January - 2020</h3>
    <div id="body">Here comes the values</div>
  </body>
</html>

So I would guess the HTML part is okay. I would suggest debugging the setup by trying the code with only puppeteer and simple HTML, and then adding the JavaScript elements one by one to pinpoint the part where it starts to fail.

Anwar Gul • Apr 6 '21

Thank You so much it helped a lot

nikhil9-gemini • Apr 30 '21 • Edited

I'm getting the following error:
Error: Cannot find module '/home/nikhilsrivastva/Desktop/HR Onboarding/onboarding BE/hronboardingcodebase/services/certification/.webpack/service/apis/puppeteer/lib/Browser' at webpackContextResolve

Also, puppeteer/lib has no Browser file in it with the latest versions of puppeteer.
Can someone solve this issue?

Aki Rautio • May 4 '21

It seems like this is something related to webpack. It would help to understand what kind of setup you have, to point you in the right direction. :)

aviban • Feb 11 '20

"errorMessage": "Cannot find module 'iltorb'"

Aki Rautio • Feb 11 '20

This probably could help you:
github.com/alixaxel/chrome-aws-lam...

Saurabh Sharma • Dec 11 '19

Hi,
This article was very helpful to me.
I wanted to store the PDF that you created in an S3 bucket, but it's in base64. Can you guide me on how I can do that?

Aki Rautio • Dec 11 '19

I haven't done this with a PDF, but I have another lambda function which saves files this way. You could also put the PDF variable straight into Body before turning it into base64, and that should work.

const AWS = require('aws-sdk')

const s3 = new AWS.S3()

// `callback` is the Lambda handler callback and BUCKET is the bucket name
s3.upload(
  {
    Body: Buffer.from(pdfBase64, 'base64'),
    Bucket: BUCKET,
    Key: 'path for the file'
  },
  (err, data) => {
    if (err) {
      callback(err, null)
    } else {
      callback(null, {
        statusCode: 200,
        body: true,
        isBase64Encoded: false
      })
    }
  }
)
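
The same upload could also be written for an async handler. This is only a sketch, assuming the aws-sdk v2 S3 client, where upload(...).promise() returns a promise; the helper name savePdfToS3 and its parameters are hypothetical. It passes the PDF Buffer from page.pdf() straight to Body, so no base64 conversion is needed.

```javascript
'use strict'

// Hypothetical helper: `s3` is expected to be an AWS.S3 client (aws-sdk v2)
async function savePdfToS3(s3, bucket, key, pdfBuffer) {
  const result = await s3
    .upload({
      Bucket: bucket,
      Key: key,
      Body: pdfBuffer, // raw bytes, no base64 conversion needed
      ContentType: 'application/pdf'
    })
    .promise()
  // Location is the URL of the uploaded object
  return result.Location
}
```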
