Saeed Ghanavat

Posted on Jun 29 • Originally published at ghanavats.tech on Jun 26

Lambda Cold Starts Are Not the Whole Story: How I Took a .NET API from ~2s to <1s with SnapStart

#net #awslambda #ecs #lambdasnapstart

APIs keep growing in popularity, and there are countless ways to host them. Whichever you pick, the cost of running an API stays tightly coupled to how you architect your application.

If cost is a factor you want to get right, then you need to revisit the architecture of your application and understand the trade-offs you are making. One of my favourite cloud architecture approaches is serverless architecture.

There are many serverless or serverless-style services on AWS. Some examples are:

API Gateway
Lambda
DynamoDB
S3
SQS
Fargate

And many more.

Serverless does not mean there are no servers. Come on 😁. It means there are no servers for you to provision, patch, operate, or babysit. There are obviously servers under the hood. They are just managed by AWS.

In this article, I am going to focus on a real problem I encountered while building my AWS Clean Architecture Starter Kit: latency with Lambda APIs after inactivity.

This is not theory. I tested it, broke it down, got misled by some numbers, corrected the measurements, and eventually got the API below 1 second after inactivity.

ECS

One of the best ways to host your application backend is containerisation. I am not going to go deep into how to deploy to ECS in this article. That is not the point here.

The point is to compare the architectural trade-off between running an API continuously in containers versus running it on demand with Lambda.

ECS supports two common ways to run containers:

ECS on EC2
ECS on Fargate

Let’s not argue about EC2. We all know EC2 is not serverless. But what about Fargate?

AWS calls Fargate a serverless compute engine for containers. Fair enough. You do not provision or manage the underlying compute infrastructure. However, from an application architecture point of view, I still do not treat ECS/Fargate as the same kind of serverless experience as Lambda.

Why?

Because with ECS/Fargate, you still manage important parts of the container architecture:

task definitions
ECS services
desired task count
deployment behaviour
scaling policies
container health checks
networking
load balancing

That is not a bad thing. It is just a different trade-off.

Applications running in ECS/Fargate generally continue running while your service maintains running tasks. You pay for those running tasks, even if the API is sitting there doing nothing. That may be perfectly acceptable for latency-sensitive APIs, but it is not the same cost model as Lambda.

And that is as much as I want to say about ECS in this article.

Lambda

On the other side, we have Lambda.

Lambda is much closer to the serverless model most people think about. You do not provision servers, you do not manage the runtime host, and you do not manually scale EC2 instances. Lambda handles the execution environment and scales horizontally as requests come in.

In my AWS Clean Architecture Starter Kit, I chose Lambda for the API hosting model. It is easy to deploy, easy to configure, and easy to connect to API Gateway using CDK.

But in architecture, nothing is free.

You always pay somewhere.

With Lambda, one of the things you need to understand properly is cold start latency.

Lambda Cold Starts

One of the important aspects of serverless architecture is how the service behaves when it has not been used for a while.

With Lambda, AWS may reuse an existing execution environment for later invocations. But after a period of inactivity, or when Lambda needs to scale out, there may be no warm execution environment ready for your request. In that case, Lambda has to prepare one.

That extra preparation time is what we call a cold start.

For a normal non-SnapStart Lambda cold start, the rough shape is:

Cold start path = Init Duration + Duration

For a SnapStart-restored Lambda invocation, the rough shape is:

SnapStart restored path = Restore Duration + Duration

Those fields matter. Client-side tools such as Postman or curl are useful, but they do not tell the full Lambda story on their own.
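As a minimal sketch of the two formulas above, the shell function below pulls the startup field (Init Duration or Restore Duration) and Duration out of a CloudWatch REPORT line and adds them. The function name lambda_path_ms is made up for this example. The two sample lines are hand-built for illustration: they reuse the Duration, Init Duration, and Restore Duration values from the tests later in this article, while the request IDs and the billed and memory values are placeholders, not measurements.

```shell
#!/bin/sh
# Sketch: startup (Init or Restore Duration) + Duration from one REPORT line.
# Reads the line on stdin, prints "<startup> <duration> <total>" in milliseconds.
lambda_path_ms() {
  awk '{
    startup = 0; duration = 0
    for (i = 2; i < NF; i++) {
      if ($i != "Duration:") continue
      kind = $(i - 1)   # the word just before "Duration:"
      if (kind == "Init") { startup = $(i + 1) }
      else if (kind == "Restore" && $(i - 2) != "Billed") { startup = $(i + 1) }
      else if (kind != "Billed" && kind != "Restore") { duration = $(i + 1) }
    }
    printf "%.2f %.2f %.2f\n", startup, duration, startup + duration
  }'
}

# Hand-built sample lines: request IDs, billed and memory values are placeholders.
cold_line='REPORT RequestId: example-cold Duration: 646.94 ms Billed Duration: 647 ms Memory Size: 2048 MB Max Memory Used: 128 MB Init Duration: 478.54 ms'
restored_line='REPORT RequestId: example-restored Duration: 169.07 ms Billed Duration: 170 ms Memory Size: 2048 MB Max Memory Used: 128 MB Restore Duration: 452.75 ms Billed Restore Duration: 453 ms'

printf '%s\n' "$cold_line" | lambda_path_ms       # 478.54 646.94 1125.48
printf '%s\n' "$restored_line" | lambda_path_ms   # 452.75 169.07 621.82
```

The loop deliberately skips Billed Duration and Billed Restore Duration, because those are billing figures rather than the time the request actually waited.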

Figure: Response time during a cold start. The response from Lambda took almost 2 seconds.

Cold starts also happen when a function is invoked for the first time or when Lambda has to create a new execution environment during scale-out.

This is the part you need to understand as an architect:

Can your API tolerate that first-request latency?

There is no universal answer. It depends on the workload.

For internal tools, admin APIs, low-traffic apps, event-driven workloads, and async workloads, Lambda can be an excellent fit.

For APIs that need consistently low p99 latency, especially after idle periods or during scale-out, you need to be more careful.

In my case, the endpoint was simple. It was a /api/people/{id} endpoint reading a record from DynamoDB. Nothing crazy. No heavy computation. No massive payload.

And yet, after inactivity, I was seeing responses around 2 seconds.

That was too slow for me to ignore.

So I investigated.

Lambda SnapStart

Lambda SnapStart is designed to improve startup performance by taking a snapshot of the initialised execution environment and restoring from that snapshot later.

In simple terms:

Lambda initialises your function.
Lambda takes a snapshot of memory and disk state.
Later, instead of doing a normal initialisation from scratch, Lambda restores from that snapshot.

That sounds like magic. It is not. It is engineering. And it still has trade-offs.

SnapStart applies to published versions. This part is critical.

If you enable SnapStart but keep invoking $LATEST, you are not testing SnapStart properly. You need to publish a version and invoke that version, usually through an alias such as dev, test, or prod.
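The starter kit in this article is deployed with CDK, so treat the following as an AWS CLI sketch of the same idea rather than the setup I actually used. It enables SnapStart, publishes a version, and moves an alias to that version. The helper names publish_snapstart_version and snapstart_status are made up for this example, and the function and alias names are whatever you pass in.

```shell
#!/bin/sh
# Sketch: enable SnapStart, publish a version, and point an alias at it.
# Requires the AWS CLI with credentials; nothing runs until you call a function.
publish_snapstart_version() {
  fn=$1; alias_name=$2

  # SnapStart is configured on the function but only applies to published versions.
  aws lambda update-function-configuration \
    --function-name "$fn" \
    --snap-start ApplyOn=PublishedVersions >/dev/null || return 1
  aws lambda wait function-updated --function-name "$fn" || return 1

  # Publishing a version is what triggers snapshot creation.
  version=$(aws lambda publish-version --function-name "$fn" \
    --query Version --output text) || return 1

  # Move an existing alias so callers stop hitting $LATEST.
  aws lambda update-alias --function-name "$fn" \
    --name "$alias_name" --function-version "$version" >/dev/null || return 1

  echo "$version"
}

# Prints the SnapStart optimisation status for a version or alias.
snapstart_status() {
  aws lambda get-function-configuration \
    --function-name "$1" --qualifier "$2" \
    --query 'SnapStart.OptimizationStatus' --output text
}
```

A call such as publish_snapstart_version my-api-function dev prints the new version number, and snapstart_status my-api-function dev should report On once the snapshot is ready. The sketch assumes the alias already exists (otherwise create-alias is needed first) and that the API Gateway integration targets the alias rather than the unqualified function.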

Figure: SnapStart is enabled and will be available on the next published version.

Once a new version is published and SnapStart optimisation is complete, SnapStart shows as On for the published version.

Figure: SnapStart is fully enabled (On) for the published version.

But here is the mistake I nearly made:

Enabling SnapStart is not the same as optimising for SnapStart.

That distinction matters.

Measuring the Right Thing

At first, I tested with Postman. That was useful, but it was also misleading.

Postman response time includes more than the backend execution time. It can include:

socket initialisation
DNS lookup
TCP handshake
TLS handshake
waiting for first byte
download time
client-side processing

So I moved to curl to get cleaner timing breakdowns.

This is the command I used:

curl -s -H "x-api-key: your_api_key" -o /dev/null \
  -w "namelookup: %{time_namelookup}s\nconnect: %{time_connect}s\nappconnect: %{time_appconnect}s\npretransfer: %{time_pretransfer}s\nstarttransfer_ttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" \
  "https://apigatewayid.execute-api.eu-west-1.amazonaws.com/dev/api/people/person_id"

The key field was:

curl backend wait = starttransfer_ttfb - pretransfer

That gave me a better view of how long the client waited after the connection was ready.
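The subtraction above can be scripted. This is a small sketch, not the exact script I ran: backend_wait_ms and measure_backend_wait are hypothetical helper names, and the URL and API key are supplied by the caller. curl reports its -w timings in seconds, so the result is multiplied by 1000.

```shell
#!/bin/sh
# Sketch: backend wait in ms = (time_starttransfer - time_pretransfer) * 1000.
# Both arguments are in seconds, as curl's -w variables report them.
backend_wait_ms() {
  awk -v pre="$1" -v ttfb="$2" 'BEGIN { printf "%.0f\n", (ttfb - pre) * 1000 }'
}

# Runs one request and prints the backend wait for it.
# $1 = URL, $2 = API key (both supplied by the caller).
measure_backend_wait() {
  timings=$(curl -s -o /dev/null -H "x-api-key: $2" \
    -w '%{time_pretransfer} %{time_starttransfer}' "$1") || return 1
  # Intentionally unquoted so the two numbers become two arguments.
  backend_wait_ms $timings
}

# Using the Test #3 numbers from this article (135 ms and 896 ms):
backend_wait_ms 0.135 0.896   # prints 761
```

Keeping the arithmetic in its own function means the same calculation can be applied to timings that were already recorded, without repeating the request.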

But even curl was not enough on its own.

I also checked:

API Gateway Latency
API Gateway IntegrationLatency
Lambda Init Duration
Lambda Restore Duration
Lambda Duration
DynamoDB SuccessfulRequestLatency

That last one was important.

DynamoDB was not the problem.

Across the tests, DynamoDB SuccessfulRequestLatency stayed low: around 6–14 ms. The slow part was not the DynamoDB table. The slow part was the first SDK/network/runtime path from the restored Lambda environment.
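If you prefer pulling these metrics from the command line instead of the console, something like the following works as a starting point. It is a sketch under assumptions: metric_average is a made-up helper name, the table, API, and stage names in the usage note are placeholders, and the timestamps must cover the window in which you ran the test.

```shell
#!/bin/sh
# Sketch: average of one CloudWatch metric between two ISO-8601 timestamps.
# Usage: metric_average NAMESPACE METRIC START END DIMENSION...
# Requires the AWS CLI with credentials; nothing runs until you call it.
metric_average() {
  ns=$1; metric=$2; start=$3; end=$4
  shift 4   # whatever is left is passed through as dimensions
  aws cloudwatch get-metric-statistics \
    --namespace "$ns" --metric-name "$metric" \
    --dimensions "$@" \
    --start-time "$start" --end-time "$end" \
    --period 60 --statistics Average \
    --query 'Datapoints[].Average' --output text
}
```

For example, metric_average AWS/DynamoDB SuccessfulRequestLatency 2024-01-01T10:00:00Z 2024-01-01T10:15:00Z Name=TableName,Value=People Name=Operation,Value=GetItem returns the per-minute averages for GetItem on a hypothetical People table. The same helper covers API Gateway with the AWS/ApiGateway namespace, the Latency or IntegrationLatency metric, and the ApiName and Stage dimensions of a REST API.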

Prepare your application for SnapStart

This is the part most people will miss.

I enabled SnapStart, published the version, wired the alias correctly, waited 5–6 minutes, tested again, and the response was still poor.

In fact, it was worse in one test.

That was annoying, but it was also useful.

It forced me to stop assuming and start measuring properly.

Memory Was a Bottleneck

My Lambda initially had 1024 MB memory.

That was not enough for this .NET Lambda API path.

I increased memory gradually:

1024 MB
1800 MB
2048 MB

The first jump, from 1024 MB to 1800 MB, made a meaningful difference: the response time dropped from 2230 ms to 1390 ms. The jump from 1800 MB to 2048 MB gave a smaller improvement, down to 1200 ms, but it was still worth testing.

This matters because Lambda memory is tied to CPU allocation. More memory gives your function more CPU capacity, which can help with .NET runtime work, JSON serialisation, AWS SDK work, TLS/signing, and framework overhead.
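AWS documents that a function gets the equivalent of one vCPU at 1,769 MB, which may be part of why the step up to 1800 MB helped most. A memory sweep like mine can be scripted roughly as below. This is a CLI sketch, not my CDK setup, and apply_memory is a hypothetical helper name. The important detail is that a published version keeps the configuration it was published with, so a memory change only reaches a SnapStart version after you publish again.

```shell
#!/bin/sh
# Sketch: change the memory size, then publish a new version so the change
# (and a fresh snapshot) applies to what the alias will serve.
# Requires the AWS CLI with credentials; nothing runs until you call it.
apply_memory() {
  fn=$1; memory_mb=$2
  aws lambda update-function-configuration \
    --function-name "$fn" --memory-size "$memory_mb" >/dev/null || return 1
  aws lambda wait function-updated --function-name "$fn" || return 1
  # Prints the number of the newly published version.
  aws lambda publish-version --function-name "$fn" \
    --query Version --output text
}
```

You would call apply_memory my-api-function 1800, point the alias at the printed version, wait for the idle period, measure, and repeat with the next size.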

Hello World Warm-Up Was Not Enough

I found out that for .NET applications, JIT compilation and assembly loading time can also be a bottleneck. That is when I learned about the SnapStart runtime hooks, in particular RegisterBeforeSnapshot().

I then tried warming a hello-world endpoint before the snapshot.

That helped a bit, but it did not solve the real problem.

Why?

Because hello-world warmed the wrong path.

My real endpoint was:

/api/people/{id}

That endpoint exercised:

ASP.NET Core routing
minimal API path
handler resolution
repository logic
DynamoDB SDK
response mapping

A hello-world endpoint does not warm all of that.

So I changed the warm-up to target the real endpoint path.

builder.Services.AddAWSLambdaBeforeSnapshotRequest(
    new HttpRequestMessage(HttpMethod.Get, "api/people/00000000-0000-0000-0000-000000000001"));

I used a fixed diagnostic ID. Do not use random production data for this. Create a harmless diagnostic record that exists only to warm the path.

Also, do not warm endpoints that write data. That is asking for trouble.

The point is not to run business logic for its own sake. The point is to warm deterministic and reusable startup paths before Lambda creates the snapshot.
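Creating the diagnostic record is a one-off step. A minimal sketch with the AWS CLI is shown below. The attribute names Id and Name, the helper name seed_warmup_record, and the table name in the usage note are assumptions for illustration, so adjust them to your own table's key schema.

```shell
#!/bin/sh
# Sketch: insert one harmless, fixed-ID record that exists only for warm-up.
# "Id" and "Name" are hypothetical attribute names, not taken from the starter kit.
# Requires the AWS CLI with credentials; nothing runs until you call it.
seed_warmup_record() {
  table=$1; id=$2
  item=$(printf '{"Id": {"S": "%s"}, "Name": {"S": "snapstart-warmup"}}' "$id")
  aws dynamodb put-item --table-name "$table" --item "$item"
}
```

For example: seed_warmup_record People 00000000-0000-0000-0000-000000000001. The write happens once at setup time, outside the Lambda, so the warm-up request itself stays read-only.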

Important SnapStart Warning: Uniqueness

SnapStart snapshots initialised state.

That means you must be careful with anything generated during initialisation.

Do not generate these before the snapshot if they must be unique after restore:

request IDs
unique runtime IDs
secrets
random seeds
entropy used for security-sensitive randomness
per-request timestamps

If a value must be unique per request, generate it during the request.

If a value must be unique per restored execution environment, regenerate it after restore.

This is not a theoretical concern. It is part of using SnapStart correctly.

The Tests

I tested the same endpoint:

GET /api/people/{id}

I used three scenarios:

No SnapStart, no warm-up
SnapStart enabled, no warm-up
SnapStart enabled, with real-path warm-up

All timings below are normalised to milliseconds.

Test #1 — No SnapStart, No Warm-Up

This test gave me the baseline for a normal Lambda cold start without SnapStart.

Metric	Value
Endpoint	`/api/people/{id}`
Mode	No SnapStart, no warm-up
API Gateway IntegrationLatency	1300 ms
API Gateway Latency	1300 ms
API Gateway Overhead	0 ms
Lambda Init Duration	478.54 ms
Lambda Restore Duration	N/A
CloudWatch REPORT Duration	646.94 ms
CloudWatch Lambda Duration Metric	632.9 ms
DynamoDB SuccessfulRequestLatency	13.6 ms
curl pretransfer	119.031 ms
curl TTFB	1469.751 ms
curl total	1467 ms

The important number here is the Lambda-side total:

Init Duration + REPORT Duration = 1125.48 ms

So the backend cold path was already above 1 second before adding client-side/network overhead.

Figure: Log result with no SnapStart and no warm-up mechanism configured.

Test #2 — SnapStart Enabled, No Warm-Up

This was the annoying one.

SnapStart was enabled. The function was published. The alias was pointing to the published version. CloudWatch showed Restore Duration.

So SnapStart was working.

But the API was not faster.

Metric	Value
Endpoint	`/api/people/{id}`
Mode	SnapStart enabled, no warm-up
API Gateway IntegrationLatency	1700 ms
API Gateway Latency	1700 ms
API Gateway Overhead	0 ms
Lambda Init Duration	N/A
Lambda Restore Duration	579.74 ms
CloudWatch REPORT Duration	993.37 ms
CloudWatch Lambda Duration Metric	993 ms
DynamoDB SuccessfulRequestLatency	11 ms
curl pretransfer	101 ms
curl TTFB	1829 ms
curl total	1830 ms

The key calculation:

Restore Duration + REPORT Duration = 1573.11 ms

This was worse than the normal non-SnapStart cold start.

That is the part people need to understand.

SnapStart was technically working, but the application was not prepared properly for SnapStart. The first restored request still paid expensive first-use costs.

Figure: Log result with SnapStart enabled and no warm-up mechanism configured.

Test #3 — SnapStart Enabled, Real-Path Warm-Up

This is where the result finally became useful.

I kept SnapStart enabled, but added warm-up for the actual /api/people/{id} path before snapshot creation.

Metric	Value
Endpoint	`/api/people/{id}`
Mode	SnapStart enabled, real-path warm-up
API Gateway IntegrationLatency	733 ms
API Gateway Latency	733 ms
API Gateway Overhead	0 ms
Lambda Init Duration	N/A
Lambda Restore Duration	452.75 ms
CloudWatch REPORT Duration	169.07 ms
CloudWatch Lambda Duration Metric	169 ms
DynamoDB SuccessfulRequestLatency	6.6 ms
curl pretransfer	135 ms
curl TTFB	896 ms
curl total	896.8 ms

The key calculation:

Restore Duration + REPORT Duration = 621.82 ms

This was the result I was looking for.

The API response after inactivity dropped below 1 second from curl.

Figure: Log result with SnapStart enabled and the warm-up mechanism configured.

All tests aggregated

Here is the full comparison, slightly simplified for readability.

Mode	API Gateway Latency	Init / Restore Duration	REPORT Duration	Lambda Duration Metric	curl pretransfer	curl TTFB	curl total
No SnapStart, no warm-up	1300 ms	Init: 478.54 ms	646.94 ms	633 ms	119 ms	1469 ms	1467 ms
SnapStart, no warm-up	1700 ms	Restore: 579.74 ms	993.37 ms	993 ms	101 ms	1829 ms	1830 ms
SnapStart, real-path warm-up	733 ms	Restore: 452.75 ms	169.07 ms	169 ms	135 ms	896 ms	896.8 ms

The numbers come from CloudWatch logs and metrics, except the last three columns, which come from the curl calls.
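To make the comparison table above easier to read, here is a small sketch that recomputes two derived numbers per mode: the Lambda-side total (Init or Restore Duration plus REPORT Duration) and the curl backend wait (TTFB minus pretransfer). The input rows are copied from the table, and the summarise helper is just a name chosen for this example.

```shell
#!/bin/sh
# Sketch: derive Lambda-side total and curl backend wait for each test mode.
# Input columns, separated by "|":
#   mode | startup ms (Init or Restore) | REPORT Duration ms | curl pretransfer ms | curl TTFB ms
summarise() {
  awk -F'|' '{
    printf "%s: lambda=%.2f ms, curl_wait=%.0f ms\n", $1, $2 + $3, $5 - $4
  }'
}

summarise <<'EOF'
No SnapStart, no warm-up|478.54|646.94|119|1469
SnapStart, no warm-up|579.74|993.37|101|1829
SnapStart, real-path warm-up|452.75|169.07|135|896
EOF
# No SnapStart, no warm-up: lambda=1125.48 ms, curl_wait=1350 ms
# SnapStart, no warm-up: lambda=1573.11 ms, curl_wait=1728 ms
# SnapStart, real-path warm-up: lambda=621.82 ms, curl_wait=761 ms
```

The gap between the two numbers in each row (roughly 140 to 225 ms here) is time the client waited that does not show up in the Lambda REPORT fields, presumably the network round trip plus API Gateway and invocation overhead.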

Lessons learned

The biggest lesson was this:

SnapStart alone was not enough.

In my test, SnapStart without warm-up was worse than the normal non-SnapStart cold start.

That does not mean SnapStart is bad. It means I had not prepared the application properly.

The meaningful improvement came when I warmed the actual request path before the snapshot.

The latency was mainly coming from the Lambda cold/restore path, first-use application code, SDK/network setup, and framework/runtime costs.

Conclusion

Using Lambda to host APIs is exciting, cost-effective, and massively scalable. But it can be a poor choice if you use it blindly and ignore the cold start behaviour.

The lazy take is:

Lambda is bad for latency-sensitive APIs because after idle it can take 2 seconds.

That statement is incomplete.

In my opinion and after all my investigations:

Lambda can be poor for latency-sensitive APIs if you do not understand and optimise the cold/restore path.

In my case, the API did show around 2 seconds after inactivity. But the cause was not simply “Lambda is slow”.

The real causes were:

cold/restore startup cost
low memory allocation
first-use .NET/runtime costs
first-use AWS SDK/network path
warming the wrong endpoint path
measuring with client-side tools without separating backend time properly

After enabling SnapStart correctly, increasing memory, and warming the real /api/people/{id} path before snapshot creation, I got the response to sub-second after inactivity.

That is the real architectural takeaway.

Lambda is not the answer for every API. ECS/Fargate is still a very strong option, especially when you need consistently low latency and always-running services.

But Lambda is also not weak just because cold starts exist.

The truth is more boring and more useful:

You need to understand the trade-off, measure properly, and optimise the actual path your users hit.

No magic. No marketing fluff. Just architecture.
