Serverless applications on AWS with Lambda using Java 25, API Gateway and DynamoDB - Part 7 Lambda performance optimization approaches

#aws #java #serverless #awslambda

Introduction

In the previous articles of the series about how to develop, run, and optimize Serverless applications on AWS with Lambda using Java 25, API Gateway, and DynamoDB, we used:

Managed Java 25 runtime
GraalVM Native Image deployed as Lambda Custom Runtime

We also measured Lambda performance (cold and warm starts) with the following settings:

Lambda functions used 1024 MB of memory
Java compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
Lambda x86_64 architecture used
Default Apache HTTP Client (version 4.5) used to connect to DynamoDB

In this article, we'll introduce some additional Lambda performance (cold and warm starts) optimization approaches that you can apply to our sample application. You'll need to measure the performance yourself to figure out whether they provide the desired improvements.

Please keep in mind that you can also deploy our sample application on AWS Lambda as a (Docker) Container Image. I didn't cover this approach in this series, but you can look into my article series Lambda function using Docker Container Image for a step-by-step introduction on how to do it. Expect the cold start to be quite long. Lambda SnapStart isn't available for Lambda functions deployed as a Container Image. Instead, you can use Ahead-of-Time (AOT) and CDS caches for the Container Image and then measure the Lambda performance.

Lambda performance optimization approaches

To find a good balance between the cold and warm start times of the Lambda function, you can try out the optimization techniques introduced below. I have not taken any additional measurements with our sample application with Java and GraalVM 25, but I have done so using older Java, GraalVM, and dependency versions. I'll provide references to my relevant articles. The measurements I did back then might already be outdated, so I strongly recommend that you re-measure.
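When you re-measure, one common way to summarize collected cold or warm start durations is with percentiles rather than a single average. The following is only a minimal sketch, assuming you have already exported the durations (in milliseconds) from your logs; the class name, the sample values, and the choice of the nearest-rank method are illustrative assumptions, not the method used in this series.

```java
import java.util.Arrays;

class LatencyPercentiles {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Rank is 1-based; multiply before dividing to limit rounding error.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical cold start durations in ms, not real measurements.
        double[] coldStartsMs = {2310, 2450, 2280, 2900, 2395, 2510, 3120, 2340, 2475, 2600};
        for (double p : new double[] {50, 90, 99}) {
            System.out.printf("p%.0f = %.0f ms%n", p, percentile(coldStartsMs, p));
        }
    }
}
```

Comparing the same percentiles before and after each change makes it easier to see whether an optimization helps the typical case, the tail, or both.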

We can apply the following approaches to both the managed Java runtime and GraalVM Native Image. For the managed Java runtime, this also covers the variants with SnapStart enabled and with priming techniques applied on top:

Try out different Lambda memory settings. We performed all measurements with 1024 MB of memory for the Lambda function. With different memory settings, you might achieve a better price-performance trade-off. See my article Measuring cold and warm starts and deployment time with Java 21 using different Lambda memory settings for further examples, performance measurements, and conclusions.
Try out the Lambda arm64 architecture, which runs on AWS Graviton2 processors and has supported SnapStart since July 2024. This can provide a better cost-performance trade-off compared to the x86 architecture. See my article AWS Lambda performance with Java 21: x86 vs arm64 - Initial measurements for some insights.
Try out different synchronous HTTP clients to establish an HTTP connection to DynamoDB. We have performed all measurements so far with the default synchronous Apache HTTP Client version 4.5. There are other options like the UrlConnection and AWS CRT HTTP clients, which provide different performance trade-offs for the cold and warm starts. See my article Measuring cold and warm starts with Java 21 using different synchronous HTTP clients for further examples, performance measurements, and conclusions. GraalVM Native Image also supports the AWS CRT HTTP Client, and I did some measurements using a pure Java Lambda function in my article Measuring cold and warm starts with GraalVM 23 and AWS CRT HTTP Client. An Apache 5.x-based HTTP client has also been released recently, so you can try it out as well.
Explore whether an asynchronous HTTP client for DynamoDB is an option for your use case. The default asynchronous HTTP Client is NettyNio. There is another option, the AWS CRT async HTTP client, which provides different performance trade-offs for the cold and warm starts. See my article Measuring cold and warm starts with Java 21 using different asynchronous HTTP clients for further examples, performance measurements, and conclusions.
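To make the memory-setting trade-off from the first item in this list more concrete, here is a minimal sketch of the underlying arithmetic. It assumes that the duration-based part of the Lambda bill scales with configured memory multiplied by billed duration (GB-seconds) and it ignores the separate per-request charge. The price constant, the class and method names, and the durations are hypothetical placeholders, not measurements from this series.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MemoryCostSketch {

    // ASSUMPTION: placeholder price per GB-second. Look up the current price
    // for your region and architecture (x86_64 and arm64 are priced differently).
    static final double ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667;

    /** Duration-based cost of a single invocation (per-request charge not included). */
    static double costPerInvocation(int memoryMb, double billedDurationMs, double pricePerGbSecond) {
        double memoryGb = memoryMb / 1024.0;
        double seconds = billedDurationMs / 1000.0;
        return memoryGb * seconds * pricePerGbSecond;
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical average warm start durations (ms) per memory setting.
        Map<Integer, Double> measuredAvgDurationMs = new LinkedHashMap<>();
        measuredAvgDurationMs.put(512, 14.0);
        measuredAvgDurationMs.put(1024, 8.0);
        measuredAvgDurationMs.put(2048, 6.5);

        for (Map.Entry<Integer, Double> e : measuredAvgDurationMs.entrySet()) {
            double perMillion = 1_000_000
                    * costPerInvocation(e.getKey(), e.getValue(), ASSUMED_PRICE_PER_GB_SECOND);
            System.out.printf("%d MB, %.1f ms avg -> %.4f USD per 1M invocations%n",
                    e.getKey(), e.getValue(), perMillion);
        }
    }
}
```

Replacing the placeholder durations with your own measurements shows whether a higher memory setting pays for itself through shorter execution times, or only adds cost.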

We can apply the following approaches primarily to the managed Java runtime on Lambda. This also covers the variants with SnapStart enabled and with priming techniques applied on top:

Try out different Java compilation options for the Lambda function. We have performed all measurements so far with the compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1". We can provide other compilation options to the Lambda function using an environment variable called JAVA_TOOL_OPTIONS. Different options can lead to different trade-offs between cold and warm starts. See my article Measuring cold and warm starts with Java 21 using different compilation options for further examples, performance measurements, and conclusions. For GraalVM Native Image, the choice of Java compilation method doesn't have much impact on the Lambda performance. This is because our application is already compiled natively.
Further exclude unused dependencies. With that, we can especially reduce the cold start times (also with SnapStart enabled); see my article Measuring cold starts with Java 21 using different deployment artifact sizes. In the case of GraalVM Native Image, only reachable Java classes, functions, and methods become part of the Native Image, so excluding unused dependencies may not help that much.
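When experimenting with compilation options on the managed Java runtime, it can help to confirm which flags the JVM actually received. The following is a minimal sketch, assuming it runs on a regular JVM: it compares the flags found in JAVA_TOOL_OPTIONS with the input arguments reported by the standard RuntimeMXBean. The class and helper names are hypothetical, and the whitespace split deliberately does not handle quoted values.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JvmOptionsCheck {

    /** Splits a JAVA_TOOL_OPTIONS-style string into flags (plain whitespace split, no quoting support). */
    static List<String> splitOptions(String raw) {
        if (raw == null || raw.isBlank()) {
            return List.of();
        }
        return Arrays.stream(raw.trim().split("\\s+")).collect(Collectors.toList());
    }

    /** Returns the expected flags that do not appear among the actual JVM arguments. */
    static List<String> missingFlags(List<String> expected, List<String> actual) {
        return expected.stream()
                .filter(flag -> !actual.contains(flag))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> expected = splitOptions(System.getenv("JAVA_TOOL_OPTIONS"));
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("JAVA_TOOL_OPTIONS flags: " + expected);
        System.out.println("JVM input arguments:     " + actual);
        System.out.println("Not found in JVM args:   " + missingFlags(expected, actual));
    }
}
```

Logging this once from the function's static initializer or constructor during a test deployment is one way to rule out a misconfigured environment variable before comparing measurements.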

We can apply the following approach primarily to the managed Java runtime on Lambda with SnapStart enabled:

Search for further Lambda SnapStart priming potential beyond the techniques we introduced in this series. For this, you can use the AWS Lambda Profiler Extension for Java. I described it in my article Improving Lambda performance with Lambda SnapStart and priming.

We can apply the following approach primarily to the GraalVM Native Image:

Try out Profile-Guided Optimizations to see whether you can further improve Lambda performance. The difficulty with this technique is that you'll need to perform some additional semi-automated steps: run your application either with the Lambda emulator locally or in an extra environment to obtain the profile of your application, and then use that profile to generate the optimized Native Image. You can use a Lambda extension for this, but it still requires a lot of additional work. With Lambda SnapStart enabled, AWS does the comparable work for us. I really appreciate that I don't need to care about generating, encrypting, storing, and restoring the snapshots/profiles.

Conclusion

In this article, we introduced additional Lambda performance optimization approaches that we can use in our sample application. Try them out on your own to figure out whether they provide the desired Lambda performance improvements.

Please also watch out for another series where I use a relational serverless Amazon Aurora DSQL database and additionally the Hibernate ORM framework instead of DynamoDB to do the same Lambda performance measurements.

If you like my content, please follow me on GitHub and give my repositories a star!

Please also check out my website for more technical content and upcoming public speaking activities.