Introduction
In part 1, part 2, part 3 and part 4 of this series, we talked about SnapStart in general and ran the first tests to compare the cold starts of a Lambda function written in plain Java with the AWS SDK for Java version 2 against the same function built with the Micronaut, Quarkus, and Spring Boot frameworks, with and without SnapStart enabled. We saw that enabling SnapStart led to a huge decrease in cold start times in all cases for our example application, but these cold starts were still quite noticeable. In this part of the series, we are going to discuss further optimization techniques like priming, measure end-to-end AWS API Gateway request latency for cold starts with SnapStart enabled, and explore how much additional time the deployment of a Lambda function with SnapStart enabled takes.
Priming
Before we talk about the priming technique, let's summarize the measured cold start times (in milliseconds) with SnapStart enabled and without priming:
| Framework | p50 | p90 | p99 |
|---|---|---|---|
| Pure Lambda | 1266.05 | 1306.85 | 1326.81 |
| Micronaut | 1468.18 | 1595.61 | 1641.23 |
| Quarkus | 1337.16 | 1374.76 | 1473.87 |
| Spring Boot | 1222.52 | 1877.08 | 1879.78 |
The Java managed runtime uses the open-source Coordinated Restore at Checkpoint (CRaC) project to provide hook support. The managed Java runtime contains a customized CRaC context implementation that calls your Lambda function’s runtime hooks before completing snapshot creation and after restoring the execution environment from a snapshot.
SnapStart and runtime hooks give you new ways to build your Lambda functions for low startup latency. You can use the pre-snapshot hook to make your Java application as ready as possible for the first invocation. Do as much as possible within your function before the snapshot is taken. This is called priming.
Let's see how we can use priming. First, we need to add the CRaC project dependency to the pom.xml:
<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
    <version>0.1.3</version>
</dependency>
We'll make use of the org.crac.Resource interface, which provides two methods: beforeCheckpoint and afterRestore. The first one is invoked before snapshot creation completes and is a good place to implement priming. In this method, we make a call to retrieve a product item from DynamoDB with a static id (0 in this case),
productDao.getProduct("0");
which in turn calls the DynamoDB client's getItem method, which forces the Jackson marshallers to initialize, quite an expensive one-time operation in the life cycle of the Lambda function. This product doesn't necessarily need to be present in DynamoDB; the main goal of this call is to trigger the class loading and initialization steps. With that, we intend to lower the cold start even more.
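For illustration, a getProduct implementation based on the AWS SDK for Java v2 typically looks roughly like the sketch below; the table name, key attribute, and the mapping helper are assumptions for this sketch, not the exact code of the example application:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.GetItemResponse;

public class DynamoProductDao implements ProductDao {

    // Re-used across invocations; its first getItem call triggers the expensive
    // one-time initialization (e.g. Jackson marshallers) mentioned above.
    private final DynamoDbClient dynamoDbClient = DynamoDbClient.create();

    @Override
    public Product getProduct(String id) {
        GetItemRequest request = GetItemRequest.builder()
                .tableName("Products")                                      // assumed table name
                .key(Map.of("id", AttributeValue.builder().s(id).build()))  // assumed key attribute
                .build();
        GetItemResponse response = dynamoDbClient.getItem(request);
        // assumed mapping helper; returns null if the item does not exist
        return response.hasItem() ? Product.fromAttributeMap(response.item()) : null;
    }
}
```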
Let's see how it works for each of our scenarios individually. I added the priming implementation for all described cases to my example application here.
1) Pure Java example
We implement the org.crac.Resource interface directly in the Lambda function handler class itself (the one on which you enable SnapStart), see, and call productDao.getProduct("0") in the beforeCheckpoint method. It works out of the box before snapshot creation completes. You can add additional logging in the beforeCheckpoint method and find it in CloudWatch Logs during the deployment of your Lambda function.
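A minimal sketch of what such a handler can look like; the handler class name, the API Gateway event types, and the Product type are assumptions based on the example application:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Core;
import org.crac.Resource;

public class GetProductByIdHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

    private final ProductDao productDao = new DynamoProductDao();

    public GetProductByIdHandler() {
        // register this handler as a CRaC resource so the hooks below are invoked
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // priming: trigger class loading and one-time initialization before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // nothing to do after restore in this sketch
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
        Product product = productDao.getProduct(request.getPathParameters().get("id"));
        // simplified response mapping for the sketch
        return new APIGatewayProxyResponseEvent().withStatusCode(200).withBody(String.valueOf(product));
    }
}
```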
2) Spring Boot example
It works the same way as for the pure Java example, see.
3) Micronaut example
First, we additionally need the micronaut-crac dependency in the pom.xml:
<dependency>
    <groupId>io.micronaut.crac</groupId>
    <artifactId>micronaut-crac</artifactId>
    <version>1.1.1</version>
    <scope>compile</scope>
</dependency>
Then we create a separate priming implementation for our example, as described in the official documentation.
This implementation basically does the same as we do in the pure Java example:
import jakarta.inject.Singleton;
import org.crac.Context;
import org.crac.Resource;

@Singleton
public class ProductAPIResource implements io.micronaut.crac.OrderedResource {

    private final ProductDao productDao;

    public ProductAPIResource(ProductDao productDao) {
        this.productDao = productDao;
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // priming: trigger class loading and one-time initialization before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // nothing to do after restore
    }
}
The main difference is that the class implements the io.micronaut.crac.OrderedResource interface, which is the Micronaut Framework's CRaC integration, and is annotated with @Singleton. Together, this ensures that SnapStart uses the CRaC API to let the application execute custom code before the snapshot is taken and after the execution environment is restored from it.
4) Quarkus example
The CRaC implementation is similar to the Micronaut example: we also create a separate priming implementation, as described in the official documentation.
This implementation looks like this:
// jakarta.* imports apply to Quarkus 3; on Quarkus 2 use the javax.* equivalents
import io.quarkus.runtime.Startup;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import org.crac.Core;
import org.crac.Resource;

@Startup
@ApplicationScoped
public class ProductAPIResource implements Resource {

    private static final ProductDao productDao = new DynamoProductDao();

    @PostConstruct
    public void init() {
        // register this bean as a CRaC resource so the hooks below are invoked
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // priming: trigger class loading and one-time initialization before the snapshot is taken
        productDao.getProduct("0");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // nothing to do after restore
    }
}
With the two annotations, @Startup and @ApplicationScoped, the bean is created eagerly at application startup, and its @PostConstruct method registers it with the global CRaC context. This ensures that SnapStart uses the CRaC API to let the application execute custom code before the snapshot is taken and after the execution environment is restored from it.
Now that we have explored how to add priming to all our scenarios, let's measure the cold start times (in milliseconds) after 100 invocations each. We got the following results:
| Framework | p50 | p90 | p99 |
|---|---|---|---|
| Pure Lambda | 352.45 | 401.43 | 433.76 |
| Micronaut | 597.91 | 732.01 | 755.53 |
| Quarkus | 459.24 | 493.33 | 510.32 |
| Spring Boot | 600.66 | 1065.37 | 1173.93 |
We see a huge decrease in cold start times, by up to roughly 900 milliseconds, in all scenarios. Even if the effect of priming varies from scenario to scenario (with DynamoDB we see one of the biggest possible optimizations), it's one of the must-have optimization techniques to consider. With priming, we also achieved cold starts that are comparable to, or even lower than, those with GraalVM Native Image, and they generally look very promising, so they won't impact your public-facing applications that much.
Measuring end-to-end AWS API Gateway latency
Measuring the cold start times for AWS Lambda with SnapStart enabled is one thing, but it's more useful to see the full picture and therefore to measure the full end-to-end AWS API Gateway request latency.
Here are the results (in milliseconds) that I got for 100 requests that produced a cold start for each scenario, with SnapStart enabled and with priming:
| Framework | p50 | p90 | p99 |
|---|---|---|---|
| Pure Lambda | 877 | 1090 | 1098 |
| Micronaut | 1083 | 1221 | 1570 |
| Quarkus | 946 | 1094 | 1243 |
| Spring Boot | 1068 | 2021 | 2222 |
Measuring additional deployment time for the SnapStart-enabled Lambda function
It's logical that each new deployment of a Lambda function with SnapStart enabled takes longer than without SnapStart because of the snapshotting and, possibly, the pre-snapshot hook execution. We'd like to measure how much longer it takes. I only ran my experiments with sam deploy (without hot deployment). I excluded the time to upload the source code, which depends on artifact size and framework, and focused only on the deployment itself, so the difference between pure Java and the frameworks used (Micronaut, Quarkus, or Spring Boot) is negligible.
- Without SnapStart and without the use of a version and alias in the AWS SAM template, deployment took approximately 31 seconds.
- Without SnapStart and with the initial creation of a version and alias in the AWS SAM template, deployment took approximately 1 minute.
- Without SnapStart and with creating a newer version and modifying the existing alias in the AWS SAM template, deployment took approximately 41 seconds.
Then I enabled SnapStart on one Lambda function, which requires publishing a version and using an alias (AutoPublishAlias: liveVersion).
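For reference, enabling SnapStart in the AWS SAM template boils down to publishing a version via an alias and applying SnapStart to published versions. A minimal sketch, in which the resource name, handler, and runtime are placeholders rather than values from the example application:

```yaml
GetProductByIdFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: com.example.GetProductByIdHandler::handleRequest  # placeholder handler
    Runtime: java17                                            # placeholder runtime
    AutoPublishAlias: liveVersion
    SnapStart:
      ApplyOn: PublishedVersions
```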
- With SnapStart and with the initial creation of a version and alias in the AWS SAM template, deployment took approximately 3 minutes.
- With SnapStart and with creating a newer version and modifying the existing alias in the AWS SAM template, deployment took approximately 2 minutes and 40 seconds.
So we observe that enabling SnapStart on a single Lambda function increases the deployment time by 2 or even more minutes, depending on the scenario (creating or modifying the alias).
When we re-ran the experiment with two Lambda functions, the deployment time increased only by several seconds in all scenarios, as SAM deploys all Lambda functions in parallel.
Conclusions and next steps
In this part of the series, we discussed further optimization techniques like priming and observed that it reduces the cold start time significantly further. That's why we consider this technique a must-have, even if it means slightly modifying the code. We also measured end-to-end AWS API Gateway request latency for cold starts with SnapStart and priming enabled to see the overall result. Finally, we explored how much additional time the deployment of a Lambda function with SnapStart enabled takes and saw that it adds 2 minutes or even more. There is a lot of room for improvement here to provide a smoother developer experience.
In the next part of the series, we'll look at scenarios involving other AWS services, like SQS and SNS, to see how enabling SnapStart and priming affects them. We're also going to test our application with steady traffic to see whether that has any effect on decreasing the resume times and, therefore, the cold starts.