This post was originally published on my blog
Intro
Recently I had to create two serverless functions for a client: one that generates a PDF document from an existing HTML layout, and one that merges it with other PDF documents provided by users in an upload form.
In this article, we will use examples based on real-world applications, going through project configuration, AWS configuration, and project deployment.
Content
- Setting Up
- Setting up serverless configuration
- Setting up a Lambda Layer
- Working with Puppeteer
- Uploading PDF to S3
- Deploying to AWS
TL;DR:
- Lambda function Github Repo
- Login demo app Github Repo
Setting Up
Serverless Framework
We will be using the Serverless Framework to easily deploy our resources to the cloud.
Open up a terminal and type the following command to install Serverless globally using npm.
npm install -g serverless
Initial Project Setup
Create a new serverless project:
serverless create --template aws-nodejs --path pdf-generator
This is going to create a new folder named pdf-generator containing two files: handler.js and serverless.yml. For now, we will leave the files as-is.
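For reference, the scaffolded handler.js is just a minimal stub, roughly along these lines (the exact contents depend on your framework version):
// Rough shape of the handler.js generated by the aws-nodejs template;
// the exact message and export name vary between Serverless versions.
"use strict";

module.exports.hello = async (event) => {
  return {
    statusCode: 200,
    body: JSON.stringify({ message: "Go Serverless! Your function executed successfully!", input: event }),
  };
};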
Installing Dependencies
We will need the following dependencies to work with Puppeteer in our project:
- chrome-aws-lambda: Chromium Binary for AWS Lambda and Google Cloud Functions.
- puppeteer-core: Puppeteer-core is intended to be a lightweight version of Puppeteer for launching an existing browser installation or for connecting to a remote one.
- aws-sdk: AWS SDK Library to interact with AWS Services.
- serverless-webpack: A Serverless v1.x & v2.x plugin to build your lambda functions with Webpack.
- node-loader: A Webpack loader that lets us bundle native Node modules with the .node extension.
npm install chrome-aws-lambda puppeteer-core
npm install -D aws-sdk node-loader serverless-webpack
Configuring Webpack
Once we have our project dependencies installed, we are going to configure Webpack to package our code and reduce the size of our cloud function. This will save us a lot of problems, since an unbundled function that ships Chromium can easily reach hundreds of megabytes, and AWS rejects packages that exceed the Lambda size limits.
Create the file webpack.config.js in our project root, and add the following code:
module.exports = {
  target: "node",
  mode: "development",
  module: {
    rules: [
      {
        test: /\.node$/,
        loader: "node-loader",
      },
    ],
  },
  externals: ["aws-sdk", "chrome-aws-lambda"],
};
In the code above we set the following options for Webpack:
- We use development mode, so our code isn't minified and we can trace errors in AWS CloudWatch
- We load native Node modules into our bundle using node-loader
- We exclude aws-sdk and chrome-aws-lambda from our bundle, since AWS provides a built-in aws-sdk library, and for chrome-aws-lambda we are going to use a Lambda Layer because Webpack can't bundle the library as-is
Setting up serverless configuration
Next, we are going to configure our serverless.yml
file, for now, we will add some environment variables, a lambda layer to use chrome-aws-lambda
, and add Webpack to the list of plugins.
First, we define global variables to use across all of our functions.
custom:
  app_url: https://puppeteer-login-demo.vercel.app
  app_user: admin@admin.com
  app_pass: 123456789
Here we are defining custom properties that we can access in our configuration file using the ${self:someProperty} syntax; in our case, we can access our properties as ${self:custom.someProperty}.
Now we define our environment variables inside our function so that our handler can access them.
functions:
  generate-pdf:
    handler: handler.handler
    environment:
      APP_URL: ${self:custom.app_url}
      APP_USER: ${self:custom.app_user}
      APP_PASS: ${self:custom.app_pass}
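These values are then exposed to the handler as ordinary environment variables, for example:
// Inside handler.js, the variables defined above are read from process.env
const { APP_URL, APP_USER, APP_PASS } = process.env;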
Now add the plugins section at the end of our file, so we can use Webpack with our lambdas.
plugins:
  - serverless-webpack

package:
  individually: true
So far our serverless.yml should look like the following:
service: pdf-generator
frameworkVersion: '2'

custom:
  app_url: https://puppeteer-login-demo.vercel.app
  app_user: admin@admin.com
  app_pass: 123456789

provider:
  name: aws
  stage: dev
  region: us-east-1
  runtime: nodejs12.x
  lambdaHashingVersion: 20201221

functions:
  generate-pdf:
    handler: handler.handler
    environment:
      APP_URL: ${self:custom.app_url}
      APP_USER: ${self:custom.app_user}
      APP_PASS: ${self:custom.app_pass}

plugins:
  - serverless-webpack

package:
  individually: true
Setting up a Lambda Layer
To use the chrome-aws-lambda library we need to treat it as an external dependency; for this, we can create our own Lambda Layer or use a community-hosted one. Here I'll explain both options, and you can decide whichever one you want to use.
Own Hosted Layer
First, we have to package the library as a zip file; open up a terminal and type:
git clone --depth=1 https://github.com/alixaxel/chrome-aws-lambda.git && \
cd chrome-aws-lambda && \
make chrome_aws_lambda.zip
The above will create a chrome_aws_lambda.zip file, which can be uploaded to your Layers console.
Community Hosted Layer
This repository hosts a community Lambda Layer that we can use directly in our function. At the time of writing, the latest version is 24:
arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:24
Now we have to add this layer to our serverless.yml file and specify that our function is going to use it; in this case, we are going to use the community version.
functions:
  generate-pdf:
    handler: handler.handler
    layers:
      - arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:24
Working with Puppeteer
Now that our project is configured, we are ready to start developing our lambda function.
First, we load the chromium library and create a new browser instance in our handler.js file to work with Puppeteer.
"use strict";
const chromium = require("chrome-aws-lambda");
exports.handler = async (event, context) => {
  let browser = null;

  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
      ignoreHTTPSErrors: true,
    });

    const page = await browser.newPage();
  } catch (e) {
    console.log(e);
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }
};
In this example, we will use an app that requires a login to view the report we want to convert to PDF, so first we are going to navigate to the login page and use the environment variables to simulate a login and access the report.
await page.goto(`${process.env.APP_URL}/login`, {
  waitUntil: "networkidle0",
});

await page.type("#email", process.env.APP_USER);
await page.type("#password", process.env.APP_PASS);

await page.click("#loginButton");
await page.waitForNavigation({ waitUntil: "networkidle0" });
In the above code we carry out the following steps:
- Navigate to the login page
- Find the inputs with IDs email and password and type the user and password credentials from the environment variables
- Click the button with ID loginButton
- Wait for the next page to be fully loaded (in our example we are redirected to a Dashboard); see the note below on ordering the click and the navigation wait
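One caveat with the snippet above: starting waitForNavigation only after the click can, on a very fast redirect, miss the navigation and hang until the timeout. The pattern recommended in the Puppeteer documentation is to start waiting before triggering the click:
// Start waiting for the navigation before triggering it,
// so a fast post-login redirect cannot be missed.
await Promise.all([
  page.waitForNavigation({ waitUntil: "networkidle0" }),
  page.click("#loginButton"),
]);
Either form works for this demo app; the combined version is just more robust.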
Now we are logged in, so we can navigate to the report URL that we want to convert to a PDF file.
await page.goto(`${process.env.APP_URL}/invoice`, {
  waitUntil: ["domcontentloaded", "networkidle0"],
});
Here we go to the invoice page and wait until the content is fully loaded.
Now that we are on the page we want to convert, we create our PDF file and keep it in a buffer so we can upload it to AWS S3 later.
const buffer = await page.pdf({
  format: "letter",
  printBackground: true,
  margin: { top: "0.5cm", right: "0.5cm", bottom: "0.5cm", left: "0.5cm" },
});
In the above code we added a few options to the pdf method:
- format: the paper size of our file
- printBackground: print background graphics
- margin: add a 0.5cm margin on all sides of the print area
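If you also need page numbers or a custom header and footer, the pdf method supports header and footer templates as well. The snippet below is purely illustrative and not part of the invoice report we are generating:
// Optional: render a centered footer with page numbers.
// displayHeaderFooter, headerTemplate and footerTemplate are standard Puppeteer pdf options.
const bufferWithFooter = await page.pdf({
  format: "letter",
  printBackground: true,
  displayHeaderFooter: true,
  headerTemplate: "<span></span>", // keep the header empty
  footerTemplate:
    '<div style="font-size:8px; width:100%; text-align:center;">' +
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  margin: { top: "1cm", right: "0.5cm", bottom: "1cm", left: "0.5cm" },
});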
So far our handler.js should look like this:
"use strict";
const chromium = require("chrome-aws-lambda");
exports.handler = async (event, context) => {
  let browser = null;

  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
      ignoreHTTPSErrors: true,
    });

    const page = await browser.newPage();

    await page.goto(`${process.env.APP_URL}/login`, {
      waitUntil: "networkidle0",
    });

    await page.type("#email", process.env.APP_USER);
    await page.type("#password", process.env.APP_PASS);

    await page.click("#loginButton");
    await page.waitForNavigation({ waitUntil: "networkidle0" });

    await page.goto(`${process.env.APP_URL}/invoice`, {
      waitUntil: ["domcontentloaded", "networkidle0"],
    });

    const buffer = await page.pdf({
      format: "letter",
      printBackground: true,
      margin: { top: "0.5cm", right: "0.5cm", bottom: "0.5cm", left: "0.5cm" },
    });
  } catch (e) {
    console.log(e);
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }
};
Uploading PDF to S3
We can now generate our PDF file with Puppeteer; next, we are going to configure our function to create a new S3 bucket and upload our file to it.
First, we are going to define in our serverless.yml file the resources for creating and using our S3 bucket.
service: pdf-generator
frameworkVersion: '2'

custom:
  app_url: https://puppeteer-login-demo.vercel.app
  app_user: admin@admin.com
  app_pass: 123456789
  bucket: pdf-files

provider:
  name: aws
  stage: dev
  region: us-east-1
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - s3:PutObject
            - s3:PutObjectAcl
          Resource: "arn:aws:s3:::${self:custom.bucket}/*"
  runtime: nodejs12.x
  lambdaHashingVersion: 20201221

functions:
  generate-pdf:
    handler: handler.handler
    timeout: 25
    layers:
      - arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:24
    environment:
      APP_URL: ${self:custom.app_url}
      APP_USER: ${self:custom.app_user}
      APP_PASS: ${self:custom.app_pass}
      S3_BUCKET: ${self:custom.bucket}

plugins:
  - serverless-webpack

package:
  individually: true

resources:
  Resources:
    FilesBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${self:custom.bucket}
Here we defined the FilesBucket resource that Serverless is going to create, and we also defined the permissions our Lambda has over the bucket; for now, we just need permission to put files.
Now, in our handler.js, we load the AWS library and instantiate a new S3 object.
const AWS = require("aws-sdk");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });
Now, we just need to save our buffer variable to our S3 bucket.
const s3result = await s3
  .upload({
    Bucket: process.env.S3_BUCKET,
    Key: `${Date.now()}.pdf`,
    Body: buffer,
    ContentType: "application/pdf",
    ACL: "public-read",
  })
  .promise();

await page.close();
await browser.close();

return s3result.Location;
Here we uploaded our file to the bucket, closed our Chromium session, and returned the new file URL.
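As written, the handler returns the bare file URL, which is fine for direct invocations. If you later expose the function through API Gateway, you would typically wrap it in a standard Lambda proxy response instead, something like:
// Hypothetical API Gateway-style response wrapping the uploaded file URL
return {
  statusCode: 200,
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: s3result.Location }),
};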
Deploying to AWS
First, we need to add our AWS credentials to Serverless in order to deploy our functions; please visit the Serverless documentation to select the appropriate auth method for you.
Now, open the package.json file to add our deployment commands.
"scripts": {
  "deploy": "sls deploy",
  "remove": "sls remove"
},
Here we added two new commands, deploy and remove. Open up a terminal and type:
npm run deploy
Now our function is bundled and deployed to AWS Lambda!
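To try it out, you can invoke the function with the Serverless CLI (sls invoke -f generate-pdf) or from another Node script through the AWS SDK. The sketch below assumes the default Serverless naming convention of service-stage-functionName, i.e. pdf-generator-dev-generate-pdf:
// Invoke the deployed function and print the returned PDF URL.
// The function name assumes the default <service>-<stage>-<function> pattern.
const AWS = require("aws-sdk");
const lambda = new AWS.Lambda({ region: "us-east-1" });

lambda
  .invoke({ FunctionName: "pdf-generator-dev-generate-pdf" })
  .promise()
  .then((res) => console.log("PDF URL:", JSON.parse(res.Payload.toString())))
  .catch(console.error);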