With APIs gaining popularity on a daily basis, and with countless ways to host them, the cost of running APIs remains tightly coupled to how you design the architecture of your application.
If cost is a factor you want to get right, then you need to revisit the architecture of your application and understand the trade-offs you are making. One of my favourite cloud architecture approaches is serverless architecture.
There are many serverless or serverless-style services on AWS. Some examples are:
- API Gateway
- Lambda
- DynamoDB
- S3
- SQS
- Fargate
And many more.
Serverless does not mean there are no servers. Come on. It means there are no servers for you to provision, patch, operate, or babysit. There are obviously servers under the hood. They are just managed by AWS.
In this article, I am going to focus on a real problem I encountered while building my AWS Clean Architecture Starter Kit: latency with Lambda APIs after inactivity.
This is not theory. I tested it, broke it down, got misled by some numbers, corrected the measurements, and eventually got the API below 1 second after inactivity.
ECS
One of the best ways to host your application backend is containerisation. I am not going to go deep into how to deploy to ECS in this article. That is not the point here.
The point is to compare the architectural trade-off between running an API continuously in containers versus running it on demand with Lambda.
ECS supports two common ways to run containers:
- ECS on EC2
- ECS on Fargate
Let's not argue about EC2. We all know EC2 is not serverless. But what about Fargate?
AWS calls Fargate a serverless compute engine for containers. Fair enough. You do not provision or manage the underlying compute infrastructure. However, from an application architecture point of view, I still do not treat ECS/Fargate as the same kind of serverless experience as Lambda.
Why?
Because with ECS/Fargate, you still manage important parts of the container architecture:
- task definitions
- ECS services
- desired task count
- deployment behaviour
- scaling policies
- container health checks
- networking
- load balancing
That is not a bad thing. It is just a different trade-off.
Applications running in ECS/Fargate generally continue running while your service maintains running tasks. You pay for those running tasks, even if the API is sitting there doing nothing. That may be perfectly acceptable for latency-sensitive APIs, but it is not the same cost model as Lambda.
And that is as much as I want to say about ECS in this article.
Lambda
On the other side, we have Lambda.
Lambda is much closer to the serverless model most people think about. You do not provision servers, you do not manage the runtime host, and you do not manually scale EC2 instances. Lambda handles the execution environment and scales horizontally as requests come in.
In my AWS Clean Architecture Starter Kit, I chose Lambda for the API hosting model. It is easy to deploy, easy to configure, and easy to connect to API Gateway using CDK.
But in architecture, nothing is free.
You always pay somewhere.
With Lambda, one of the things you need to understand properly is cold start latency.
Lambda Cold Starts
One of the important aspects of serverless architecture is how the service behaves when it has not been used for a while.
With Lambda, AWS may reuse an existing execution environment for later invocations. But after a period of inactivity, or when Lambda needs to scale out, there may be no warm execution environment ready for your request. In that case, Lambda has to prepare one.
That extra preparation time is what we call a cold start.
For a normal non-SnapStart Lambda cold start, the rough shape is:
Cold start path = Init Duration + Duration
For a SnapStart-restored Lambda invocation, the rough shape is:
SnapStart restored path = Restore Duration + Duration
Those fields matter. Client-side tools such as Postman or curl are useful, but they do not tell the full Lambda story on their own.
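To make the two formulas concrete, here is a minimal sketch that reads Lambda REPORT lines from CloudWatch Logs and prints the startup cost plus the handler duration. The function name lambda_path_ms is hypothetical, the sketch assumes the REPORT fields are tab-separated, and the sample values are illustrative rather than measured.

```shell
# lambda_path_ms: reads REPORT lines on stdin and prints, for each one,
# (Init Duration or Restore Duration) + Duration in milliseconds.
# Assumption: fields in the REPORT line are separated by tabs.
lambda_path_ms() {
  awk -F'\t' '
    /^REPORT/ {
      startup = 0; duration = 0
      for (i = 1; i <= NF; i++) {
        n = split($i, kv, ": ")
        if (n < 2) continue
        # Exact key match, so "Billed Duration" is not mistaken for "Duration".
        if (kv[1] == "Duration") duration = kv[2] + 0
        if (kv[1] == "Init Duration" || kv[1] == "Restore Duration") startup = kv[2] + 0
      }
      printf "%.2f\n", startup + duration
    }
  '
}

# Illustrative input: 300 ms of init plus 200 ms of handler time.
printf 'REPORT RequestId: example\tDuration: 200.00 ms\tBilled Duration: 200 ms\tInit Duration: 300.00 ms\n' | lambda_path_ms
# prints 500.00
```

Warm invocations have neither Init Duration nor Restore Duration in their REPORT line, so for those the sketch prints the Duration alone.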
Cold starts also happen when a function is invoked for the first time or when Lambda has to create a new execution environment during scale-out.
This is the part you need to understand as an architect:
Can your API tolerate that first-request latency?
There is no universal answer. It depends on the workload.
For internal tools, admin APIs, low-traffic apps, event-driven workloads, and async workloads, Lambda can be an excellent fit.
For APIs that need consistently low p99 latency, especially after idle periods or during scale-out, you need to be more careful.
In my case, the endpoint was simple. It was a /api/people/{id} endpoint reading a record from DynamoDB. Nothing crazy. No heavy computation. No massive payload.
And yet, after inactivity, I was seeing responses around 2 seconds.
That was too slow for me to ignore.
So I investigated.
Lambda SnapStart
Lambda SnapStart is designed to improve startup performance by taking a snapshot of the initialised execution environment and restoring from that snapshot later.
In simple terms:
- Lambda initialises your function.
- Lambda takes a snapshot of memory and disk state.
- Later, instead of doing a normal initialisation from scratch, Lambda restores from that snapshot.
That sounds like magic. It is not. It is engineering. And it still has trade-offs.
SnapStart applies to published versions. This part is critical.
If you enable SnapStart but keep invoking $LATEST, you are not testing SnapStart properly. You need to publish a version and invoke that version, usually through an alias such as dev, test, or prod.

Once a new version is published and SnapStart optimisation is complete, SnapStart shows as On for the published version.
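If you are not driving this through CDK, the publish-and-alias step can be sketched with the AWS CLI. This is a minimal sketch, not the setup used in the starter kit: the function name publish_snapstart_version and the my-api-function argument are hypothetical, and it assumes SnapStart is already enabled on the function and that the alias already exists.

```shell
# publish_snapstart_version FUNCTION_NAME ALIAS
# Publishes a new version, points the alias at it, and prints the
# SnapStart optimisation status reported for that version.
publish_snapstart_version() (
  fn="$1"
  alias_name="$2"

  # SnapStart snapshots are created for published versions, not for $LATEST.
  version="$(aws lambda publish-version --function-name "$fn" \
    --query 'Version' --output text)" || exit 1

  # Move the alias so callers invoke the published version.
  aws lambda update-alias --function-name "$fn" --name "$alias_name" \
    --function-version "$version" >/dev/null || exit 1

  # Should read "On" once optimisation of the snapshot has completed.
  status="$(aws lambda get-function-configuration --function-name "$fn" \
    --qualifier "$version" --query 'SnapStart.OptimizationStatus' --output text)" || exit 1

  echo "version=$version snapstart=$status"
)

# Usage (hypothetical function name):
#   publish_snapstart_version my-api-function dev
```

Whatever fronts the function, API Gateway in this case, has to integrate with the alias rather than the unqualified function name, otherwise requests still land on $LATEST.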

But here is the mistake I nearly made:
Enabling SnapStart is not the same as optimising for SnapStart.
That distinction matters.
Measuring the Right Thing
At first, I tested with Postman. That was useful, but it was also misleading.
Postman response time includes more than the backend execution time. It can include:
- socket initialisation
- DNS lookup
- TCP handshake
- TLS handshake
- waiting for first byte
- download time
- client-side processing
So I moved to curl to get cleaner timing breakdowns.
This is the command I used:
curl -s -H "x-api-key: your_api_key" -o /dev/null \
-w "namelookup: %{time_namelookup}s\nconnect: %{time_connect}s\nappconnect: %{time_appconnect}s\npretransfer: %{time_pretransfer}s\nstarttransfer_ttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" \
"https://apigatewayid.execute-api.eu-west-1.amazonaws.com/dev/api/people/person_id"
The key field was:
curl backend wait = starttransfer_ttfb - pretransfer
That gave me a better view of how long the client waited after the connection was ready.
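The subtraction can be scripted so it is applied the same way on every run. A minimal sketch, assuming the exact -w labels from the command above; the function name backend_wait is hypothetical and the sample timings are illustrative.

```shell
# backend_wait: reads the curl -w output on stdin and prints
# (starttransfer_ttfb - pretransfer) in milliseconds.
backend_wait() {
  awk '
    $1 == "pretransfer:"        { sub(/s$/, "", $2); pre = $2 }
    $1 == "starttransfer_ttfb:" { sub(/s$/, "", $2); ttfb = $2 }
    END { printf "%.1f\n", (ttfb - pre) * 1000 }
  '
}

# Illustrative input in the same shape curl prints.
printf 'pretransfer: 0.100000s\nstarttransfer_ttfb: 0.850000s\n' | backend_wait
# prints 750.0
```

Because the curl command sends the response body to /dev/null, its -w output is the only thing on stdout and can be piped straight into backend_wait.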
But even curl was not enough on its own.
I also checked:
- API Gateway Latency
- API Gateway IntegrationLatency
- Lambda Init Duration
- Lambda Restore Duration
- Lambda Duration
- DynamoDB SuccessfulRequestLatency
That last one was important.
DynamoDB was not the problem.
Across the tests, DynamoDB SuccessfulRequestLatency stayed low: around 6-14 ms. The slow part was not the DynamoDB table. The slow part was the first SDK/network/runtime path from the restored Lambda environment.
Prepare your application for SnapStart
This is the part most people will miss.
I enabled SnapStart, published the version, wired the alias correctly, waited 5-6 minutes, tested again, and the response was still poor.
In fact, it was worse in one test.
That was annoying, but it was also useful.
It forced me to stop assuming and start measuring properly.
Memory Was a Bottleneck
My Lambda initially had 1024 MB memory.
That was not enough for this .NET Lambda API path.
I increased memory gradually:
- 1024 MB
- 1800 MB
- 2048 MB
The first jump made a meaningful difference: the response time fell from 2230 ms to 1390 ms. The jump from 1800 MB to 2048 MB was smaller but still worth testing, bringing a slight further improvement to 1200 ms.
This matters because Lambda memory is tied to CPU allocation. More memory gives your function more CPU capacity, which can help with .NET runtime work, JSON serialisation, AWS SDK work, TLS/signing, and framework overhead.
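To put the two memory steps on the same scale, the relative improvement can be worked out from the response times quoted above (2230 ms, 1390 ms, 1200 ms). A minimal sketch; the helper name pct_drop is hypothetical.

```shell
# pct_drop BEFORE_MS AFTER_MS
# Prints the percentage reduction in latency from BEFORE_MS to AFTER_MS.
pct_drop() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_drop 2230 1390   # 1024 MB -> 1800 MB, prints 37.7
pct_drop 1390 1200   # 1800 MB -> 2048 MB, prints 13.7
```

The second step buys far less than the first, which is why each increase is worth measuring against its extra cost rather than assumed to pay off.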
Hello World Warm-Up Was Not Enough
I found out that for .NET applications, JIT compilation and assembly loading time can also be a bottleneck. This was when I learned about runtime hooks, in particular RegisterBeforeSnapshot().
I then tried warming a hello-world endpoint before the snapshot.
That helped a bit, but it did not solve the real problem.
Why?
Because hello-world warmed the wrong path.
My real endpoint was:
/api/people/{id}
That endpoint exercised:
- ASP.NET Core routing
- minimal API path
- handler resolution
- repository logic
- DynamoDB SDK
- response mapping
A hello-world endpoint does not warm all of that.
So I changed the warm-up to target the real endpoint path.
builder.Services.AddAWSLambdaBeforeSnapshotRequest(
new HttpRequestMessage(HttpMethod.Get, "api/people/00000000-0000-0000-0000-000000000001"));
I used a fixed diagnostic ID. Do not use random production data for this. Create a harmless diagnostic record that exists only to warm the path.
Also, do not warm endpoints that write data. That is asking for trouble.
The point is not to run business logic for its own sake. The point is to warm deterministic and reusable startup paths before Lambda creates the snapshot.
Important SnapStart Warning: Uniqueness
SnapStart snapshots initialised state.
That means you must be careful with anything generated during initialisation.
Do not generate these before the snapshot if they must be unique after restore:
- request IDs
- unique runtime IDs
- secrets
- random seeds
- entropy used for security-sensitive randomness
- per-request timestamps
If a value must be unique per request, generate it during the request.
If a value must be unique per restored execution environment, regenerate it after restore.
This is not a theoretical concern. It is part of using SnapStart correctly.
The Tests
I tested the same endpoint:
GET /api/people/{id}
I used three scenarios:
- No SnapStart, no warm-up
- SnapStart enabled, no warm-up
- SnapStart enabled, with real-path warm-up
All timings below are normalised to milliseconds.
Test #1 - No SnapStart, No Warm-Up
This test gave me the baseline for a normal Lambda cold start without SnapStart.
| Metric | Value |
|---|---|
| Endpoint | /api/people/{id} |
| Mode | No SnapStart, no warm-up |
| API Gateway IntegrationLatency | 1300 ms |
| API Gateway Latency | 1300 ms |
| API Gateway Overhead | 0 ms |
| Lambda Init Duration | 478.54 ms |
| Lambda Restore Duration | N/A |
| CloudWatch REPORT Duration | 646.94 ms |
| CloudWatch Lambda Duration Metric | 632.9 ms |
| DynamoDB SuccessfulRequestLatency | 13.6 ms |
| curl pretransfer | 119.031 ms |
| curl TTFB | 1469.751 ms |
| curl total | 1467 ms |
The important number here is the Lambda-side total:
Init Duration + REPORT Duration = 1125.48 ms
So the backend cold path was already above 1 second before adding client-side/network overhead.
Test #2 - SnapStart Enabled, No Warm-Up
This was the annoying one.
SnapStart was enabled. The function was published. The alias was pointing to the published version. CloudWatch showed Restore Duration.
So SnapStart was working.
But the API was not faster.
| Metric | Value |
|---|---|
| Endpoint | /api/people/{id} |
| Mode | SnapStart enabled, no warm-up |
| API Gateway IntegrationLatency | 1700 ms |
| API Gateway Latency | 1700 ms |
| API Gateway Overhead | 0 ms |
| Lambda Init Duration | N/A |
| Lambda Restore Duration | 579.74 ms |
| CloudWatch REPORT Duration | 993.37 ms |
| CloudWatch Lambda Duration Metric | 993 ms |
| DynamoDB SuccessfulRequestLatency | 11 ms |
| curl pretransfer | 101 ms |
| curl TTFB | 1829 ms |
| curl total | 1830 ms |
The key calculation:
Restore Duration + REPORT Duration = 1,573.11 ms
This was worse than the normal non-SnapStart cold start.
That is the part people need to understand.
SnapStart was technically working, but the application was not prepared properly for SnapStart. The first restored request still paid expensive first-use costs.
Test #3 - SnapStart Enabled, Real-Path Warm-Up
This is where the result finally became useful.
I kept SnapStart enabled, but added warm-up for the actual /api/people/{id} path before snapshot creation.
| Metric | Value |
|---|---|
| Endpoint | /api/people/{id} |
| Mode | SnapStart enabled, real-path warm-up |
| API Gateway IntegrationLatency | 733 ms |
| API Gateway Latency | 733 ms |
| API Gateway Overhead | 0 ms |
| Lambda Init Duration | N/A |
| Lambda Restore Duration | 452.75 ms |
| CloudWatch REPORT Duration | 169.07 ms |
| CloudWatch Lambda Duration Metric | 169 ms |
| DynamoDB SuccessfulRequestLatency | 6.6 ms |
| curl pretransfer | 135 ms |
| curl TTFB | 896 ms |
| curl total | 896.8 ms |
The key calculation:
Restore Duration + REPORT Duration = 621.82 ms
This was the result I was looking for.
The API response after inactivity dropped below 1 second from curl.
All tests aggregated
Here is the full comparison - a bit simplified to help with readability.
| Mode | API Gateway Latency | Init / Restore Duration | REPORT Duration | Lambda Duration Metric | curl pretransfer | curl TTFB | curl total |
|---|---|---|---|---|---|---|---|
| No SnapStart, no warm-up | 1300 ms | Init: 478.54 ms | 646.94 ms | 633 ms | 119 ms | 1469 ms | 1467 ms |
| SnapStart, no warm-up | 1700 ms | Restore: 579.74 ms | 993.37 ms | 993 ms | 101 ms | 1829 ms | 1830 ms |
| SnapStart, real-path warm-up | 733 ms | Restore: 452.75 ms | 169.07 ms | 169 ms | 135 ms | 896 ms | 896.8 ms |
The numbers come from CloudWatch logs, except the last three columns, which come from API calls made with curl.
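The Lambda-side totals for the three modes can be recomputed from the Init / Restore Duration and REPORT Duration columns and compared against the first row. A minimal sketch; the function name compare_to_baseline and the pipe-separated input format are assumptions made for illustration, while the numbers are the ones in the table.

```shell
# compare_to_baseline: reads "mode|startup_ms|duration_ms" rows on stdin.
# The first row is treated as the baseline for the delta column.
compare_to_baseline() {
  awk -F'|' '
    NR == 1 { base = $2 + $3 }
    {
      total = $2 + $3
      printf "%s: %.2f ms (%+.2f ms vs baseline)\n", $1, total, total - base
    }
  '
}

compare_to_baseline <<'EOF'
No SnapStart, no warm-up|478.54|646.94
SnapStart, no warm-up|579.74|993.37
SnapStart, real-path warm-up|452.75|169.07
EOF
# prints:
# No SnapStart, no warm-up: 1125.48 ms (+0.00 ms vs baseline)
# SnapStart, no warm-up: 1573.11 ms (+447.63 ms vs baseline)
# SnapStart, real-path warm-up: 621.82 ms (-503.66 ms vs baseline)
```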
Lessons learned
The biggest lesson was this:
SnapStart alone was not enough.
In my test, SnapStart without warm-up was worse than the normal non-SnapStart cold start.
That does not mean SnapStart is bad. It means I had not prepared the application properly.
The meaningful improvement came when I warmed the actual request path before the snapshot.
The latency was mainly coming from the Lambda cold/restore path, first-use application code, SDK/network setup, and framework/runtime costs.
Conclusion
Using Lambda to host APIs is exciting, cost-effective, and massively scalable. But it can be a poor choice if you use it blindly and ignore the cold start behaviour.
The lazy take is:
Lambda is bad for latency-sensitive APIs because after idle it can take 2 seconds.
That statement is incomplete.
In my opinion, and after all my investigations, a more accurate version is:
Lambda can be poor for latency-sensitive APIs if you do not understand and optimise the cold/restore path.
In my case, the API did show around 2 seconds after inactivity. But the cause was not simply "Lambda is slow".
The real causes were:
- cold/restore startup cost
- low memory allocation
- first-use .NET/runtime costs
- first-use AWS SDK/network path
- warming the wrong endpoint path
- measuring with client-side tools without separating backend time properly
After enabling SnapStart correctly, increasing memory, and warming the real /api/people/{id} path before snapshot creation, I got the response to sub-second after inactivity.
That is the real architectural takeaway.
Lambda is not the answer for every API. ECS/Fargate is still a very strong option, especially when you need consistently low latency and always-running services.
But Lambda is also not weak just because cold starts exist.
The truth is more boring and more useful:
You need to understand the trade-off, measure properly, and optimise the actual path your users hit.
No magic. No marketing fluff. Just architecture.




