Introduction
In the previous parts of our series, we measured the cold starts of the Lambda function with Java 21 runtime without SnapStart enabled, with SnapStart enabled, and also applied DynamoDB invocation priming optimization with different Lambda memory settings, Lambda deployment artifact sizes, Java compilation options, (a)synchronous HTTP clients, and the usage of different Lambda layers. For all these measurements, we used the default garbage collection algorithm G1.
In this article, we'd like to explore the impact of Java garbage collection algorithms on the performance of the Lambda function with Java 21 runtime. We'll also re-measure everything for G1 so that the results are comparable, with the same Java 21 minor version in use for all garbage collection algorithms.
Java Garbage collection algorithms
For our measurements, we'll use the following Java garbage collection algorithms with their default settings (please refer to the linked documentation for more detailed information about each algorithm):
- Garbage-First (G1) Garbage Collector. This is the garbage collection algorithm used by default. You can set it explicitly in the AWS SAM template by adding -XX:+UseG1GC to the JAVA_TOOL_OPTIONS environment variable.
- The Parallel Collector. You can set it explicitly in the AWS SAM template by adding -XX:+UseParallelGC to the JAVA_TOOL_OPTIONS environment variable.
- Shenandoah GC. Oracle JDK doesn't provide it, but Amazon Corretto 21 JDK does. You can set it explicitly in the AWS SAM template by adding -XX:+UseShenandoahGC to the JAVA_TOOL_OPTIONS environment variable.
- The Z Garbage Collector. There are 2 different ZGC variants: the default (non-generational) one and the newer generational one. You can set them explicitly in the AWS SAM template by adding -XX:+UseZGC or -XX:+UseZGC -XX:+ZGenerational, respectively, to the JAVA_TOOL_OPTIONS environment variable.
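To check that a flag passed via JAVA_TOOL_OPTIONS actually took effect, the running JVM can be asked which collector beans it registered. The following is a minimal sketch using only the standard java.lang.management API; the class name ActiveGcReporter is hypothetical and not part of the application used in this series. The exact bean names depend on the JVM, but with G1, for example, they start with "G1".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the article's application code.
class ActiveGcReporter {

    // Returns the names of the garbage collector beans registered in the running JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The JVM reads JAVA_TOOL_OPTIONS at startup; this prints null if it is unset.
        System.out.println("JAVA_TOOL_OPTIONS = " + System.getenv("JAVA_TOOL_OPTIONS"));
        System.out.println("Active GC beans   = " + collectorNames());
    }
}
```

In a Lambda function, the same collectorNames() call could be logged once during initialization, so the selected algorithm shows up in the CloudWatch logs.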
Measuring cold and warm starts with Java 21 using different garbage collection algorithms
In our experiment, we'll use a slightly modified version of the application introduced in part 9. You can find the application code here. There are basically 2 Lambda functions. Both respond to API Gateway requests and retrieve from DynamoDB the product whose id was received from the API Gateway. One Lambda function, GetProductByIdWithPureJava21LambdaWithGCAlg, can be used with and without SnapStart, and the second one, GetProductByIdWithPureJava21LambdaAndPrimingWithGCAlg, uses SnapStart and DynamoDB request invocation priming.
The results of the experiment below are based on reproducing more than 100 cold and approximately 100,000 warm starts, with the experiment running for approximately 1 hour. For this (and the experiments from my previous article), I used the load test tool hey, but you can use whatever tool you want, like Serverless-artillery or Postman. We ran the experiments with 1024 MB of memory assigned to the Lambda functions and with JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1" (Java client compilation without profiling), which offers a very good trade-off between cold and warm start times.
Unfortunately, I couldn't get the Lambda function to start with the Z Garbage Collector (neither the default nor the generational variant). It failed with the following error:
Failed to commit memory (Operation not permitted)
[error][gc] Forced to lower max Java heap size from 872M(100%) to 0M(0%)
[error][gc] Failed to allocate initial Java heap (512M)
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
I tried memory settings larger than 1024 MB, such as 2048 MB and more, but the same error still appeared.
Let's look into the results of our measurements with the other 3 garbage collection algorithms.
Abbreviation c is for the cold start, and w is for the warm start.
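The p50 to p99.9 columns in the tables below are percentiles of the measured durations. As a rough illustration of what such a column means, here is a minimal sketch that computes percentiles with the nearest-rank method. This is an assumption for illustration only: the tooling used for the actual measurements may interpolate differently, and both the class name LatencyPercentiles and the sample durations are made up.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; the sample data below is made up.
class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank into the sorted samples
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up warm start durations in ms, with one slow outlier.
        double[] durationsMs = {5.1, 5.4, 5.5, 6.0, 6.2, 7.1, 7.3, 16.8, 48.0, 1900.0};
        for (double p : new double[] {50, 75, 90, 99, 99.9}) {
            System.out.printf("p%s = %.2f ms%n", p, percentile(durationsMs, p));
        }
        // p100 is simply the maximum.
        System.out.printf("max = %.2f ms%n", percentile(durationsMs, 100));
    }
}
```

With only a handful of samples, the high percentiles collapse onto the maximum, which is one reason the experiment collects around 100,000 warm starts before reporting p99.9.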
Cold (c) and warm (w) start time without SnapStart enabled in ms:
| GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 3655.17 | 3725.25 | 3811.88 | 4019.25 | 4027.30 | 4027.83 | 5.46 | 6.10 | 7.10 | 16.79 | 48.06 | 1929.79 |
| Parallel Collector | 3714.10 | 3789.09 | 3857.87 | 3959.44 | 4075.89 | 4078.25 | 5.55 | 6.20 | 7.10 | 15.38 | 130.13 | 2017.92 |
| Shenandoah | 3963.40 | 4019.25 | 4096.30 | 4221.00 | 4388.78 | 4390.76 | 5.82 | 6.45 | 7.39 | 17.06 | 71.02 | 2159.21 |
Cold (c) and warm (w) start time with SnapStart enabled without Priming in ms:
| GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 1867.27 | 1935.68 | 2152.02 | 2416.57 | 2426.25 | 2427.35 | 5.47 | 6.11 | 7.05 | 17.41 | 51.24 | 1522.04 |
| Parallel Collector | 1990.62 | 2047.12 | 2202.07 | 2402.12 | 2418.99 | 2419.32 | 5.68 | 6.35 | 7.45 | 18.04 | 147.83 | 1577.21 |
| Shenandoah | 2195.47 | 2301.07 | 2563.37 | 3004.89 | 3029.01 | 3030.36 | 5.73 | 6.41 | 7.51 | 17.97 | 75.00 | 1843.34 |
Cold (c) and warm (w) start time with SnapStart enabled and with DynamoDB invocation Priming in ms:
| GC Algorithm | c p50 | c p75 | c p90 | c p99 | c p99.9 | c max | w p50 | w p75 | w p90 | w p99 | w p99.9 | w max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | 833.50 | 875.34 | 1089.53 | 1205.26 | 1269.56 | 1269.80 | 5.46 | 6.10 | 7.16 | 16.39 | 46.19 | 499.13 |
| Parallel Collector | 900.18 | 975.12 | 1058.41 | 1141.94 | 1253.17 | 1253.99 | 5.82 | 6.61 | 7.75 | 16.87 | 49.64 | 487.73 |
| Shenandoah | 1065.84 | 1131.71 | 1331.96 | 1473.44 | 1553.59 | 1554.95 | 5.77 | 6.40 | 7.39 | 17.20 | 65.06 | 500.48 |
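To compare the collectors across the three scenarios, it can help to express each cold start time as an absolute and relative overhead over G1. The sketch below does this for the c p50 values copied from the three tables above; the class name ColdStartOverhead is hypothetical, and the same calculation can be repeated for any other column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; the numbers are the c p50 values from the tables above.
class ColdStartOverhead {

    // Relative overhead of a candidate over a baseline, in percent.
    static double relativeOverheadPercent(double baselineMs, double candidateMs) {
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        // Cold start p50 in ms per scenario, in the order {G1, Parallel Collector, Shenandoah}.
        Map<String, double[]> p50 = new LinkedHashMap<>();
        p50.put("without SnapStart", new double[] {3655.17, 3714.10, 3963.40});
        p50.put("SnapStart, no priming", new double[] {1867.27, 1990.62, 2195.47});
        p50.put("SnapStart with priming", new double[] {833.50, 900.18, 1065.84});

        // Print the difference to G1 in ms and in percent for each scenario.
        p50.forEach((scenario, v) -> System.out.printf(
                "%-24s Parallel: %+.2f ms (%+.2f%%), Shenandoah: %+.2f ms (%+.2f%%)%n",
                scenario,
                v[1] - v[0], relativeOverheadPercent(v[0], v[1]),
                v[2] - v[0], relativeOverheadPercent(v[0], v[2])));
    }
}
```

Looking at both the absolute and the relative numbers is useful here, because the G1 baseline itself shrinks considerably once SnapStart and priming are enabled.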
Conclusion
In this article, we explored the impact of Java garbage collection algorithms (G1, Parallel Collector, and Shenandoah) on the performance of the Lambda function with Java 21 runtime. We saw quite a difference between the performance of those algorithms. With default settings, G1 (the default collector) delivered the lowest cold and warm start times in most of our measurements, sometimes by far. By using SnapStart with priming of the DynamoDB request, the performance results are, as expected, much closer to each other.
Please refer to the documentation of each garbage collection algorithm to tune settings like min and max memory, which can provide significant improvement in performance, and do your own measurements.