Artur Bartosik

Posted on Feb 17, 2023 • Edited on May 11, 2023

Cold starts with SnapStart for Java Frameworks (Spring Boot vs Quarkus vs Micronaut)

#serverless #java #quarkus #springboot

At re:Invent 2022, AWS paid a lot of attention to serverless. The main keynote by AWS CTO Dr. Werner Vogels was saturated with asynchronous approaches and event-driven architecture. The new announcements were also closely related to these topics, e.g. the Step Functions and EventBridge improvements.

Apart from all these trendy announcements, AWS also shows that it invests in technologies across the board. It doesn't cut itself off from Java, among others. This is evidenced by one of the largest serverless announcements: SnapStart.

I have been working with Java since the beginning of my professional career, so I was very excited about this announcement. I was aware that JVM-based languages have so far not been the main players for Lambda functions, mainly because of their long cold starts.
I hadn't checked in a long time whether anything had improved. So I decided that the release of SnapStart was a good moment to evaluate cold starts for Java and its frameworks: Spring Boot, Quarkus, and Micronaut.

What is SnapStart?

Typically, AWS Lambda sets up a new execution environment each time a function is first invoked or when the function is scaled up to handle increased traffic. As you probably know, applications written in Java need some time to initialize and start up before they can accept traffic. This is the nature of the JVM. SnapStart was created to address this issue.

When SnapStart is enabled, Lambda creates a snapshot of the initialized execution environment (memory and disk state) ahead of time and persists it in a cache for low-latency access. This eliminates the need for the function to spend time on initialization when an event arrives, as Lambda can quickly resume from the persisted snapshot instead.

I said "ahead of time" because snapshot creation happens when you publish a function version, and SnapStart works only for published versions of a Lambda function (you can't just use $LATEST).
Normally, the Init phase is the stage during which Lambda performs tasks like preparing the runtime container, downloading the function code, and initializing it, before moving on to the next phase. The Init phase is limited to 10 seconds. When SnapStart is activated, the Init phase happens earlier: yes, when you publish a function version. In this case the 10-second timeout doesn't apply, and snapshot initialization can take up to 15 minutes.
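
To make the split between the Init phase and the invocation concrete, here is a minimal sketch in plain Java. It deliberately doesn't implement the real RequestHandler interface from aws-lambda-java-core, so that it runs without dependencies. The class GreetingHandler, its handleRequest(String) method and the loadTemplates() helper are hypothetical names of mine, not code from the measured functions. The idea: whatever runs in a field initializer, static block or constructor belongs to the Init phase, which with SnapStart is executed at publish time and ends up in the snapshot, while handleRequest runs for every event.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java stand-in for a Lambda handler (hypothetical; a real handler would
// implement com.amazonaws.services.lambda.runtime.RequestHandler).
class GreetingHandler {

    // Counts how often the expensive initialization runs.
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    // Init phase: built once per handler instance. With SnapStart this work
    // is done when the version is published and is part of the snapshot.
    private final Map<String, String> templates = loadTemplates();

    // Hypothetical expensive setup (config parsing, DI wiring, warm-up, ...).
    private static Map<String, String> loadTemplates() {
        INIT_RUNS.incrementAndGet();
        return Map.of("en", "Hello, %s!", "de", "Hallo, %s!");
    }

    // Invocation: runs for every event and should stay cheap.
    String handleRequest(String name) {
        return String.format(templates.get("en"), name);
    }

    public static void main(String[] args) {
        GreetingHandler handler = new GreetingHandler();        // Init
        System.out.println(handler.handleRequest("Lambda"));    // invocation 1
        System.out.println(handler.handleRequest("SnapStart")); // invocation 2
        System.out.println("init runs: " + INIT_RUNS.get());
    }
}
```

The more work you move into the initialization part, the more SnapStart can take off the cold start, because that part is no longer paid when the event arrives.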

Someone more curious might ask how it is possible to create a snapshot of an initialized function. The answer is hidden behind a few magic terms.
First of all, CRaC (Coordinated Restore at Checkpoint), an open source OpenJDK project. It focuses on a Java API for saving and restoring the state of a JVM, including the currently running application, so-called checkpointing. CRaC builds on the next key project, CRIU (Checkpoint/Restore in Userspace), which allows an application running on Linux to be paused and restarted at some later point in time, potentially on a different machine. The last key piece of the SnapStart puzzle is Firecracker and its microVMs. SnapStart uses micro virtual machine (microVM) snapshots to checkpoint and restore full applications. Interestingly, it turns out that Amazon engineers from the Firecracker and Corretto (the AWS JDK distribution) teams were involved in the CRaC project at an early stage.
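
The "coordinated" part of CRaC means that the application is notified before a checkpoint is taken and after it is restored. In the real library this is the org.crac.Resource interface with its beforeCheckpoint and afterRestore methods, registered via Core.getGlobalContext().register(...). To stay runnable without that dependency, the sketch below uses a hypothetical stand-in interface CheckpointAware (without the Context parameter) and a hypothetical FakeRuntime driver. Nothing is actually checkpointed here; the driver only calls the callbacks, and the ordering between multiple resources is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

class CracHookSketch {

    // Hypothetical stand-in for org.crac.Resource (the real methods also take
    // a Context argument and may throw Exception).
    interface CheckpointAware {
        void beforeCheckpoint();
        void afterRestore();
    }

    // Hypothetical driver: it does NOT snapshot anything, it only invokes
    // the callbacks around the point where a snapshot would be taken.
    static class FakeRuntime {
        private final List<CheckpointAware> resources = new ArrayList<>();

        void register(CheckpointAware resource) {
            resources.add(resource);
        }

        void simulateCheckpointAndRestore() {
            resources.forEach(CheckpointAware::beforeCheckpoint);
            // ... the real runtime would freeze the process here and resume it later ...
            resources.forEach(CheckpointAware::afterRestore);
        }
    }

    // A resource holding state that should not survive a snapshot as-is.
    static class ConnectionHolder implements CheckpointAware {
        private boolean open = true; // "opened" during initialization
        int reconnects = 0;

        @Override
        public void beforeCheckpoint() {
            open = false;            // close sockets, flush buffers, etc.
        }

        @Override
        public void afterRestore() {
            open = true;             // re-establish what was closed
            reconnects++;
        }

        boolean isOpen() {
            return open;
        }
    }

    public static void main(String[] args) {
        FakeRuntime runtime = new FakeRuntime();
        ConnectionHolder holder = new ConnectionHolder();
        runtime.register(holder);
        runtime.simulateCheckpointAndRestore();
        System.out.println("open=" + holder.isOpen() + ", reconnects=" + holder.reconnects);
    }
}
```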

This means that AWS has long since taken the first steps to address the cold start problem for Java. It confirms my thesis that AWS invests in breakthrough technologies and knows that Java and JVM are still important in the IT market.

But unfortunately, not everything is so beautiful. Because SnapStart operates on a memory dump, it and the methods it is based on introduce some challenges:

Randomness - all results of java.util.Random operations can be the same across environments restored from one snapshot, so use java.security.SecureRandom instead, because Amazon handles it in Corretto. But if one of your dependencies uses the former, you will still be in trouble.
Connection keeping - the state of connections that your function establishes during the initialization phase isn't guaranteed when Lambda resumes from a snapshot. In most cases, network connections that an AWS SDK establishes resume automatically, but other connections you need to handle on your own.
Stale credentials - the snapshot also caches things like injected secrets and passwords (of course, the whole snapshot is encrypted). Passwords can be rotated automatically, but a snapshot can be used for a long time and knows nothing about the password change. The snapshot is immutable, so it will continue to use stale credentials. This applies not only to secrets: you need to protect yourself against this for any frequently changing data that you pull from external sources into function memory. But don't worry, you have tools to handle it, e.g. the post-snapshot hook.
Other Lambda features that SnapStart doesn't support - provisioned concurrency, arm64 architecture, EFS, larger ephemeral storage (max 512 MB)
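
The randomness pitfall from the list above can be reproduced without Lambda. The sketch below is only a simulation under a loud assumption: it uses Java serialization as a crude stand-in for a memory snapshot, which is not how SnapStart works internally. One initialized java.util.Random is "snapshotted" once and "restored" twice, like two execution environments resumed from the same snapshot. The class name and the snapshot/restore helpers are mine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Random;

class SnapshotRandomDemo {

    // "Snapshot": freeze the generator's internal state into bytes.
    static byte[] snapshot(Random rng) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(rng);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize generator", e);
        }
        return bytes.toByteArray();
    }

    // "Restore": every resumed environment starts from the same frozen state.
    static Random restore(byte[] image) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return (Random) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore generator", e);
        }
    }

    public static void main(String[] args) {
        Random initialized = new Random();  // created during "Init"
        byte[] image = snapshot(initialized);

        Random envA = restore(image);       // first resumed environment
        Random envB = restore(image);       // second resumed environment

        long a = envA.nextLong();
        long b = envB.nextLong();
        System.out.println("same value in both environments: " + (a == b)); // prints true
    }
}
```

Both "environments" produce the same first value because they share the seed captured at snapshot time. That is exactly why a generator created during initialization should not be the source of anything that must be unique per invocation.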

I'm curious whether SnapStart will remain reserved for the Java runtime or whether AWS is preparing a similar trick for other runtimes. Presumably, other runtimes can't use snapshotting in quite the same way as the JVM, and for some, e.g. Golang, it wouldn't make sense to even attempt it. But I would like to see a SnapStart-like solution for Node, which can also have hiccups during cold start, especially with a large number of dependencies.

Spring Boot vs Quarkus vs Micronaut - introduction of competitors

Before we get to the merits, a brief introduction of competitors. Overall, all 3 frameworks are similar in terms of functionality and are suitable for building web apps and microservices, but they have different design goals and trade-offs.

BTW, all the source code, as always, can be found on my GitHub.

Spring Boot

Spring Boot is the most widely adopted and well-established of the three frameworks. The Pivotal product also has the largest and most active community. It has been around for over a decade. It provides a wide range of features and is highly configurable, making it a good choice for large and complex applications. However, Spring Boot is more resource-intensive than Quarkus and Micronaut. Spring Boot uses a traditional Just-in-Time (JIT) compilation approach, which can result in longer startup times compared to Quarkus and Micronaut, which use Ahead-of-Time (AOT) compilation. AOT compilation pre-compiles the code at build time, resulting in faster startup times and a smaller memory footprint. Also, runtime dependency injection adds some overhead and complexity to Spring-based workloads.

Quarkus

Quarkus is a relatively new framework that aims to provide the same functionality as Spring Boot, but with a smaller footprint and faster startup time. The project, initiated by Red Hat, was designed with native compilation on GraalVM in mind. It aims to be an effective platform for serverless, cloud, and Kubernetes environments. Quarkus uses Ahead-of-Time (AOT) compilation to reduce startup time and memory usage. The community strongly appreciates the speed and convenience of development. More and more projects boast of migrating microservice workloads from Spring Boot to Quarkus. Finally, I can add that the documentation is really good.

Micronaut

Micronaut, like Quarkus, is a relatively new framework that has been gaining popularity in recent years. It has a very Spring-inspired programming model. It also uses Reactor (instead of Vert.x, which Quarkus uses). So if you are coming from the Spring world, you will find many similar patterns and techniques in Micronaut, e.g. Mono and Flux from Reactor Core. At the same time, Micronaut aims to avoid the downsides of Spring. It minimizes the use of reflection and proxies and doesn't use runtime bytecode generation. The source I found says that performance is a tiny bit better with Quarkus, but the difference is negligible.

Measurements and charts

Let's move on to the main point: measurements. They were the main reason for writing this article. I wanted to measure what a cold start looks like in 2023 for Java and its most popular frameworks. How much does SnapStart improve things? Does SnapStart also affect warm starts? How do resource changes (Lambda memory) affect cold and warm start performance? I hope the charts and tables below will help you answer these questions.
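
The tables below report p50, p90, p99 and max. As a reminder of what these columns mean, here is a small sketch that computes them with the nearest-rank method. The article doesn't say which method or tool produced the published percentiles, so nearest-rank is an assumption on my side. The sample durations are made up for illustration and are not taken from the measurements, and LatencyStats is a hypothetical helper name.

```java
import java.util.Arrays;

class LatencyStats {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up durations in ms (illustration only).
        double[] durationsMs = {210, 190, 250, 205, 900, 198, 220, 215, 230, 202};
        for (double p : new double[] {50, 90, 99, 100}) {
            System.out.printf("p%.0f = %.1f ms%n", p, percentile(durationsMs, p));
        }
    }
}
```

A single slow outlier (900 ms here) barely moves p50 but dominates p99 and max, which is worth keeping in mind when reading the max columns.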

Vanilla Java

Non SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	0%	754.9	790.8	826.9	904	11.3	37	228.2	275.4
256	0%	566.6	599.9	666.9	676.1	1.9	15.4	107.1	328.2
512	0%	549.4	474.3	502.1	529.5	1.6	8.4	50.9	97.3
1024	0%	426.1	445.5	466.1	489.7	1.6	3.3	20.6	25.5
4096	0%	301.7	327.7	415.9	450.4	1.5	2.5	13.2	21.1

SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	0%	705.3	773.3	817.8	896.2	17.4	52.6	268	479.7
256	0%	401.6	447.9	473.2	536.6	7.7	20.3	120.2	214.3
512	0%	231.9	261.9	311.7	1174.6	1.7	9.7	53.5	135.6
1024	0%	203.9	231.6	367.3	399.1	1.6	3.7	24.8	53.4
4096	0%	241.5	353.4	484.1	501.4	1.5	2.6	14	25.7

[Chart: Java cold start median, SnapStart comparison]

[Chart: Java warm start median, SnapStart comparison]

Spring Boot

Non SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	10.5%	5584.1	6867.7	7119.3	7157.4	28.3	1135.7	3582.5	3808.8
512	0%	3515.4	3647.8	3725.2	3762.8	12.1	20.2	52.9	180.6
1024	0%	3396.6	3512.3	3599.6	3599.6	3.9	9.2	18.5	94.7
4096	0%	2366.4	2525.2	3127.5	3191.1	3.4	5	10.6	33.4

SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	60.3%	3920	5027.8	5149.8	5173.7	2399.7	3717.8	3931.8	4141.4
512	0%	515.3	554.2	598.7	611.4	5.6	18.6	37.1	54.3
1024	0%	347.3	381.1	451.6	1270	3.8	9.1	17.1	32.3
4096	0%	350.4	417.3	604.7	641	3.6	5.7	16.9	65.1

[Chart: Spring Boot cold start median, SnapStart comparison]

[Chart: Spring Boot warm start median, SnapStart comparison]

Quarkus

Non SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	0%	3452.7	3543.6	3732.7	3757.3	50.5	73.1	213.8	317.2
512	0%	2738.2	2818.7	2890	2899.4	16.5	34	93.1	189.5
1024	0%	2305.7	2387.7	2512.6	4079.8	5.8	11.4	14.5	70.2
4096	0%	1676.2	1823	2028.8	2046.3	4.4	7.7	18.4	42.6

SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	0%	1918.5	1977.8	2034.1	2063.4	48.8	63.5	168.2	264.1
512	0%	1059.1	1115.8	1144.8	1148.2	16	33.7	94.6	172.5
1024	0%	583.38	622	690.4	711.2	5.5	13.6	37.9	64.3
4096	0%	455.7	498.6	556	566.1	4.3	7.4	20.2	57.2

[Chart: Quarkus cold start median, SnapStart comparison]

[Chart: Quarkus warm start median, SnapStart comparison]

Micronaut

Non SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	0%	3758.9	3912.2	4362.3	4262.7	29.1	46.8	174.1	315.7
512	0%	3391.1	3626	3916.1	3941.4	9.3	18.9	55	151.1
1024	0%	3146.2	3357.5	3680.2	3723.9	3.6	9.6	25.7	82
4096	0%	2517.6	2628.2	2738.1	2881.9	3.2	4.6	12.7	45.9

SnapStart		Cold Start (ms)				Warm Start (ms)
memory (MB)	error rate	p50	p90	p99	max	p50	p90	p99	max
128	100%	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
256	0%	1725.4	1814.6	2257.8	2289.8	29.1	44.2	175.2	231.8
512	0%	677	729.7	798.4	809.1	11.4	20.5	65.8	91
1024	0%	468.6	518.9	626.3	1373.1	3.7	8.6	25.5	97.7
4096	0%	388.8	439.2	562.1	594.3	3.1	4.9	16.7	67.1

[Chart: Micronaut cold start median, SnapStart comparison]

[Chart: Micronaut warm start median, SnapStart comparison]

Spring Boot vs Quarkus vs Micronaut vs Java - cold start charts

[Chart: Spring Boot, Quarkus, Micronaut, Java - cold start median without SnapStart]

[Chart: Spring Boot, Quarkus, Micronaut, Java - cold start median with SnapStart enabled]

General conclusions and observations

Only the pure Java Lambda can run with the 128 MB configuration and handle traffic. The frameworks fail with java.lang.OutOfMemoryError.
Spring Boot also fails part of the requests with the 256 MB configuration. This spoils the chart presentation for 256 MB.
The function package size is the largest for Spring Boot (13.7 MB) and the smallest for Micronaut (11.8 MB), apart from pure Java, whose package is around 1 MB.
The biggest difference in performance can be observed when upgrading memory from 256 MB to 512 MB. This applies to all frameworks, for both cold and warm starts.
Enabling SnapStart brings the greatest benefit for Spring Boot: an almost 10x shorter cold start in a few configurations. For Quarkus it is about 4x shorter on average, and for Micronaut roughly 6x.
For all frameworks with SnapStart enabled, the cold start comes close to the pure Java cold start, which is a fantastic result.
Looking at the median charts, you can see that Quarkus had the shortest cold starts without SnapStart. On the other hand, with SnapStart enabled it performs worst.
SnapStart doesn't significantly affect warm starts. It's hard to say whether it has any effect at all.
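
The speed-up factors mentioned in the list above can be re-derived from the p50 cold start columns of the tables. The sketch below does this for the 1024 MB rows only, so it is a spot check of one configuration and not the average over all of them. The numbers are copied from the tables. SpeedupCheck and speedup are hypothetical names of mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpeedupCheck {

    // How many times shorter the cold start is with SnapStart enabled.
    static double speedup(double withoutSnapStartMs, double withSnapStartMs) {
        return withoutSnapStartMs / withSnapStartMs;
    }

    public static void main(String[] args) {
        // p50 cold start at 1024 MB in ms: {without SnapStart, with SnapStart}.
        Map<String, double[]> p50 = new LinkedHashMap<>();
        p50.put("Vanilla Java", new double[] {426.1, 203.9});
        p50.put("Spring Boot", new double[] {3396.6, 347.3});
        p50.put("Quarkus", new double[] {2305.7, 583.38});
        p50.put("Micronaut", new double[] {3146.2, 468.6});

        p50.forEach((name, ms) ->
                System.out.printf("%-12s x%.1f%n", name, speedup(ms[0], ms[1])));
    }
}
```

For this row the factors come out at roughly x9.8 for Spring Boot, x4.0 for Quarkus and x6.7 for Micronaut, with pure Java at about x2.1.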
