Wojciech Matuszewski for AWS Community Builders

Posted on Nov 14, 2021

Getting the most out of AWS Lambda free compute - wrapper scripts

#aws #serverless #lambda #cdk

Since its inception, AWS Lambda has quickly become a go-to tool in many developers' toolbelts. This course of events is entirely understandable. A versatile compute platform with per-millisecond billing is a dream come true for many developers.

This blog post will show you how to squeeze even more value from the AWS Lambda service. By utilizing a particular part of the invocation lifecycle, we will get away with up to ten seconds of free AWS Lambda compute.

Let us dive in.

All the code from this article is available on my GitHub.

AWS Lambda invocation lifecycle

Here is an image representing the various stages your function goes through during an invocation. The source of the image, the AWS documentation page, is a great resource for learning more about the nitty-gritty details of the AWS Lambda service itself.

I will not be spending much time here – describing each phase is out of scope for this blog post. The most critical information I would like you to keep in mind is that the Init phase is entirely free. You do not pay for it. The Init phase can last up to ten seconds.

Keep all of this in mind as our entire master plan of using AWS Lambda as free compute is based on this idea.

You are most likely already utilizing the `Init` phase

The Init phase covers all the code executed outside of the primary AWS Lambda handler. If you follow common best practices, like initializing third-party libraries outside the main handler, you are already taking advantage of the free compute time.

Here is an example of initializing third-party dependencies outside the primary AWS Lambda handler written in Node.js.

// The next two lines are executed in the `Init` phase.
import S3 from "aws-sdk/clients/s3";
const s3Client = new S3();

export const handler = async () => {...}

And here is an example written in Golang.

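package main

import (
    "context"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

// Package-level, so the handler can reuse the client across invocations.
var s3Client *s3.Client

// Everything in `main` before `lambda.Start` is executed in the `Init` phase.
func main() {
    cfg, err := config.LoadDefaultConfig(context.Background())
    if err != nil {
        panic(err)
    }
    s3Client = s3.NewFromConfig(cfg)
    // I could easily perform a network request here if I wanted to.
    lambda.Start(handler)
}

func handler(ctx context.Context) error {...}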

To achieve our objective of free compute, all we have to do is pack as much code as possible before our handler executes.
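As a minimal sketch of that idea, the snippet below performs a CPU-bound precomputation at module scope, so in AWS Lambda it would run during the Init phase and not inside the handler. The `buildSquares` helper and the table size are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: builds a lookup table of squares.
function buildSquares(size) {
  const table = new Array(size);
  for (let i = 0; i < size; i += 1) {
    table[i] = i * i;
  }
  return table;
}

// Module scope: in AWS Lambda this runs once per execution environment,
// during the `Init` phase.
const squares = buildSquares(1000);

// The handler only reads the precomputed table.
const handler = async (event) => {
  return { square: squares[event.n] };
};

module.exports = { handler, buildSquares };
```

Every warm invocation reuses the same `squares` table, so the cost of building it is paid only when a new execution environment starts.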

The Node.js problem

The main function in languages like Golang and Rust executes in the Init phase, allowing developers to perform all kinds of operations, including waiting for HTTP responses, for free.

Sadly, this is not the case for Node.js in the context of AWS Lambda handlers. To the best of my knowledge, at the time of writing the AWS Lambda Node.js runtime does not support the top-level await feature (despite supporting Node.js 14.x), so developers writing Lambda functions are limited to synchronous actions in their Init phase code.

import S3 from "aws-sdk/clients/s3";

const s3Client = new S3();

// The following will not work, even when deployed as Node.js 14.x ESM handler.
const s3Result = await s3Client.getObject({Bucket: "bucketName", Key: "file.txt"}).promise()

// The following will work, but the result will not be available to you in the handler.
// This is because Node.js will NOT wait for the promise inside the IIFE to resolve before running the handler.
let otherS3Result;
(async () => {
    otherS3Result = await s3Client.getObject({Bucket: "bucketName", Key: "file.txt"}).promise()
})()

export const handler = async () => {...}
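The race described in the comments above can be reproduced locally. The following is a small simulation, not the real AWS Lambda runtime: `fakeGetObject` is a hypothetical stand-in for the S3 call, and calling `handler()` right after module load only approximates what the runtime might do.

```javascript
// Hypothetical stand-in for an S3 call: resolves after a short delay.
function fakeGetObject() {
  return new Promise((resolve) => {
    setTimeout(() => resolve("file contents"), 20);
  });
}

let otherS3Result;
(async () => {
  otherS3Result = await fakeGetObject();
})();

// Synchronous module code finishes here, while the promise above is still pending.
const handler = async () => otherS3Result;

// Simulates the handler being invoked right after the module has loaded.
const firstInvocation = handler();

firstInvocation.then((value) => {
  // The IIFE has not assigned the variable yet at this point.
  console.log("value seen by the first invocation:", value);
});
```

In this simulation the first invocation sees `undefined`, because nothing forces the handler to wait for the pending promise.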

The inability to wait for asynchronous code in the Init phase of the AWS Lambda execution environment greatly limits our options. In my experience, most Lambda functions spend much of their time idling, waiting for HTTP responses to requests fired in the main handler code.

Would it not be nice to move some of that idling to the Init phase, so we do not have to pay for it?

It turns out there is a way: enter AWS Lambda wrapper scripts.

Wrapper scripts to the rescue

I first learned about AWS Lambda wrapper scripts while reading about AWS Lambda extensions and topics regarding AWS Lambda custom runtimes.

Wrapper scripts are, well, scripts designed to allow you to augment the runtime environment. Think of passing special flags to the Node.js runtime like --enable-source-maps.

In addition to passing command-line arguments to your runtime of choice, AWS Lambda wrapper scripts allow you to run arbitrary shell commands or invoke binaries, all of them executed in the Init phase of the AWS Lambda execution environment.

Let us explore how one might leverage this fact to circumvent the inability to reliably wait for asynchronous operations in the Init phase in the context of Node.js AWS Lambda runtime.

Writing the scripts

Let us say we want to fetch a piece of data from S3 needed for our handler to operate correctly – think configuration file. We do not want to do that inside the handler definition since we would be paying for idle time.

Instead, armed with the knowledge about wrapper scripts, let us write a Node.js script to do that for us. We can then execute that script as part of the wrapper script logic, giving us confidence that the S3 data will be in place before the handler is invoked.

The first step is to write the s3_downloader script. The following is a sample, self-contained Node.js script whose role is to fetch a configuration file from S3.

// wrapper-script/index.js

const S3 = require("aws-sdk/clients/s3");
const fs = require("fs-extra");

const s3Client = new S3();

async function main() {
  const result = await s3Client
    .getObject({
      Key: process.env.CONFIG_FILE_KEY,
      Bucket: process.env.BUCKET_NAME
    })
    .promise();

  await fs.writeJson(
    process.env.CONFIG_FILE_PATH,
    JSON.parse(result.Body.toString()),
    {
      encoding: "utf-8"
    }
  );
}

process.on("unhandledRejection", (e) => {
  console.log("unhandled error", e);

  process.exit(1);
});

main();

Since Node.js waits for the event loop to be empty before exiting, we are guaranteed to have our main function fully executed before the program exits.

Take note of the environment variables. Since the environment variables specified for the AWS Lambda function are available in the Init phase, they are also available to us in the wrapper scripts and every binary or script they invoke.
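Because the downloader depends entirely on those variables, it can help to fail fast when one of them is missing. The following is a minimal sketch, assuming the three variable names used in the script above; the `readRequiredEnv` helper is hypothetical and not part of the original code.

```javascript
// Variable names taken from the downloader script.
const REQUIRED_VARIABLES = ["CONFIG_FILE_KEY", "BUCKET_NAME", "CONFIG_FILE_PATH"];

// Hypothetical helper: returns the required values,
// or throws an error listing every variable that is missing.
function readRequiredEnv(env = process.env) {
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(REQUIRED_VARIABLES.map((name) => [name, env[name]]));
}

module.exports = { readRequiredEnv };
```

Calling `readRequiredEnv()` at the top of `main` would turn a misconfigured function into an immediate rejection, which the `unhandledRejection` handler then converts into a non-zero exit code.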

The second step is to write the wrapper script itself. As I alluded to earlier, its primary purpose is to provide additional parameters to the runtime of your choice, but in our case, the script will invoke the Node.js script from above.

Here is an example of the wrapper script that invokes the index.js script depicted earlier.

#!/bin/bash
# wrapper-script/wrapper-script

args=("$@")

# Resolve the directory this script lives in.
script_full_path="$(dirname "$(readlink -f "$0")")"

node "${script_full_path}/index.js"

s3_downloader_script_result=$?

if [ $s3_downloader_script_result -ne 0 ]; then
    echo "Error: 's3_downloader_script' script returned with non-zero status code: $s3_downloader_script_result"
    exit $s3_downloader_script_result
fi

exec "${args[@]}"

Here, it is crucial to handle errors correctly. Remember: unless we explicitly exit from the script, our handler will be invoked (the exec "${args[@]}" part). There would be no point in invoking our AWS Lambda handler if we did not manage to download the config file from S3.
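The exit-status check relies on the downloader calling `process.exit(1)` when something goes wrong. As a rough local illustration of that contract, the sketch below starts child Node.js processes and reads their exit status, which is the value the wrapper script sees in `$?`. The `runNodeSnippet` helper is hypothetical.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: runs a snippet in a child Node.js process
// and returns its exit status, like `$?` in the wrapper script.
function runNodeSnippet(source) {
  const result = spawnSync(process.execPath, ["-e", source]);
  return result.status;
}

// A script that exits with 1 is what the wrapper script treats as a failure.
const failingStatus = runNodeSnippet("process.exit(1)");

// A script that exits with 0 lets the wrapper script continue to `exec`.
const passingStatus = runNodeSnippet("process.exit(0)");

console.log({ failingStatus, passingStatus });
```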

Of course, I'm not smart enough to write such scripts on my own. The wrapper script logic was heavily inspired by this great article.

Wiring it up together

With the scripts written, all that is left is to wire the pieces together and deploy the infrastructure. I'm going to use AWS CDK for that.

First, let us tackle the wrapper script deployment. Wrapper scripts are deployed as layers. The wrapper script and an AWS Lambda handler connect through a particular environment variable. More on that during the handler configuration.

The following is the declaration of the AWS Lambda layer containing our scripts.

import { join } from "path";
import * as lambda from "@aws-cdk/aws-lambda";

// Other AWS-CDK code...

const wrapperScriptLayer = new lambda.LayerVersion(this, "wrapperScriptLayer", {
  code: lambda.Code.fromAsset(join(__dirname, "../wrapper-script"))
});

Next up on our TODO list is the AWS Lambda handler and the S3 bucket. Luckily for us, AWS CDK makes it easy to define those resources.

import * as lambda from "@aws-cdk/aws-lambda";
import * as s3 from "@aws-cdk/aws-s3";

// Other AWS-CDK code...

const assetsBucket = new s3.Bucket(this, "assets-bucket");

const handler = new lambda.Function(this, "handler", {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: "index.handler",
  code: lambda.Code.fromInline(`
    const fs = require("fs");
    module.exports.handler = async () => {
      const configFile = fs.readFileSync(process.env.CONFIG_FILE_PATH, "utf8");
      console.log(configFile.toString());
    }
  `),
  layers: [wrapperScriptLayer],
  environment: {
    CONFIG_FILE_PATH: "/tmp/config.json",
    AWS_LAMBDA_EXEC_WRAPPER: "/opt/wrapper-script",
    BUCKET_NAME: assetsBucket.bucketName,
    CONFIG_FILE_KEY: "config.json"
  }
});

assetsBucket.grantRead(handler);

There are three crucial things to keep in mind:

By specifying the AWS_LAMBDA_EXEC_WRAPPER environment variable, you tell the AWS Lambda runtime where the wrapper script is located. Please note that the contents of AWS Lambda layers are unpacked into the /opt directory.
The wrapper script shares IAM permissions with your AWS Lambda handler.
The wrapper script shares the environment variables with your AWS Lambda handler.
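Since the handler and the wrapper script share `CONFIG_FILE_PATH`, the handler can read the file the downloader wrote. Here is a minimal sketch of a handler that does so, assuming the file contains valid JSON. The `loadConfig` helper and the caching variable are hypothetical additions for illustration.

```javascript
const fs = require("fs");

// Hypothetical helper: reads and parses the JSON file written by the wrapper script.
function loadConfig(configPath) {
  if (!fs.existsSync(configPath)) {
    throw new Error(`Config file not found at ${configPath}`);
  }
  return JSON.parse(fs.readFileSync(configPath, "utf8"));
}

// Cached so the file is read and parsed once per execution environment.
let cachedConfig;

const handler = async () => {
  if (cachedConfig === undefined) {
    cachedConfig = loadConfig(process.env.CONFIG_FILE_PATH);
  }
  return cachedConfig;
};

module.exports = { handler, loadConfig };
```

The explicit existence check gives a clearer error than a raw `ENOENT` if the wrapper script was not configured for the function.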

Running the handler

With the infrastructure and the code in place, let us run the AWS Lambda function and see if the configuration file is available to our handler.

Notice the massive difference between the Init duration and the Billed duration. Keep in mind that we only pay for the Billed duration. The more compute you push into the Init phase, the less you will pay for some of your AWS Lambda invocations (more on that later).

In my test scenario, the configuration file size is trivial – the cost gain from using the wrapper script to fetch the data is negligible. But the more time you spend in the Init phase, the more significant your cost savings will be.
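To get a feeling for the numbers, here is a back-of-the-envelope sketch. The price per GB-second is an assumption (check the AWS Lambda pricing page for your region and architecture), and the `estimateSavings` helper and its inputs are hypothetical.

```javascript
// Assumed price in USD per GB-second; verify against the current AWS Lambda pricing page.
const PRICE_PER_GB_SECOND = 0.0000166667;

// Hypothetical helper: estimated savings in USD from moving work into the Init phase.
// Only cold starts count, since that is when the Init phase runs.
function estimateSavings({ movedMs, memoryMb, coldStarts }) {
  const gbSeconds = (movedMs / 1000) * (memoryMb / 1024) * coldStarts;
  return gbSeconds * PRICE_PER_GB_SECOND;
}

// 500 ms moved out of the handler, 1024 MB of memory, one million cold starts.
const savings = estimateSavings({ movedMs: 500, memoryMb: 1024, coldStarts: 1000000 });

console.log(`Estimated savings: $${savings.toFixed(2)}`);
```

Under these assumptions, the estimate comes to roughly eight dollars per million cold starts, which shows why the technique pays off mainly when the moved work is long or cold starts are frequent.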

Too good to be true

If something sounds too good to be true, it probably is. The topic this blog post touches on is no exception to this rule.

As good as wrapper scripts are, there are some significant limitations that you must consider before incorporating this solution into your codebase.

Apart from the ten-second limit on the Init phase, the most notable thing to consider is that the wrapper script is only invoked on an AWS Lambda cold start. This makes it an ideal solution for fetching configuration data or performing static compute tasks, but do not expect to build a free API with this method (if you manage to do that, please let me know).
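The cold-start-only behavior can be pictured with a toy simulation. This is not the real AWS Lambda runtime: `createExecutionEnvironment` is a hypothetical model of a single execution environment, in which the init step stands in for the wrapper script and the other Init phase code.

```javascript
// Hypothetical model of one execution environment; not the real Lambda runtime.
function createExecutionEnvironment(init, handler) {
  let state;
  let initialized = false;
  return {
    invoke(event) {
      if (!initialized) {
        // Cold start: the wrapper script and the Init code would run here.
        state = init();
        initialized = true;
      }
      return handler(event, state);
    },
  };
}

let initRuns = 0;
const environment = createExecutionEnvironment(
  () => {
    initRuns += 1;
    return { source: "downloaded during init" };
  },
  (event, config) => ({ event, config })
);

environment.invoke("first"); // cold start, init runs
environment.invoke("second"); // warm invocation, init is skipped

console.log("init runs:", initRuns);
```

Both invocations see the same configuration object, which also means that data fetched by the wrapper script stays unchanged for as long as the execution environment is reused.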