If you've ever noticed occasional spikes in your Lambda function's response time — especially after a period of inactivity — you've already met the cold start.
Cold starts are one of the most discussed pain points in serverless architectures. In this article, we'll break down exactly what happens during a cold start, what makes it worse, and — most importantly — how to fix it in production.
Warm Start vs Cold Start
When a Lambda function is invoked, AWS needs to find an execution environment to run your code. Two scenarios can happen:
- Warm Start: An idle execution environment left over from a previous invocation already exists, and AWS reuses it. Module-level init code is skipped and your handler runs immediately, typically with under 10ms of overhead.
- Cold Start: No idle environment is available, so AWS must provision a brand-new execution environment before your code can run. This takes anywhere from 100ms to several seconds, depending on your runtime, package size, and configuration.
```
Invocation arrives
        │
        ▼
Idle environment available?
        │
   YES  │  NO
  ┌─────┴──────┐
  ▼            ▼
Warm Start   Cold Start
(reuse env)  (provision new env)
  │            │
  └─────┬──────┘
        ▼
Execute handler
```
Cold starts are triggered in three main situations:
- First invocation: The function has never been called, or all previous environments have been recycled.
- Scaling out: Concurrent requests exceed the number of available warm environments, forcing Lambda to spin up new ones.
- After idle timeout: AWS recycles environments that haven't been used for a period of time (typically 5–15 minutes, though this is not officially documented).
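A common way to see which invocations were cold is a module-level flag: module code runs once per execution environment, so the flag is only `True` on the first invocation an environment serves. The sketch below is a minimal version of that pattern; the handler and the response field names are hypothetical. `AWS_LAMBDA_INITIALIZATION_TYPE` is an environment variable set by Lambda (`on-demand`, `provisioned-concurrency`, or `snap-start`). With Provisioned Concurrency, init runs before the first request arrives, so a `True` flag there does not mean the caller waited for init.

```python
import os
import time

# Module-level code runs once per execution environment (during init),
# so this flag is True only for the first invocation in each environment.
_COLD_START = True
_INIT_TIMESTAMP = time.time()


def handler(event, context):
    global _COLD_START
    was_cold = _COLD_START
    _COLD_START = False  # every later invocation in this environment is warm

    # Set by Lambda: 'on-demand', 'provisioned-concurrency', or 'snap-start'.
    # Falls back to 'unknown' when running outside Lambda (e.g. local tests).
    init_type = os.environ.get('AWS_LAMBDA_INITIALIZATION_TYPE', 'unknown')

    print(f'cold_start={was_cold} init_type={init_type}')
    return {
        'coldStart': was_cold,
        'initType': init_type,
        'environmentAgeSeconds': round(time.time() - _INIT_TIMESTAMP, 3),
    }
```

Logging the flag on every invocation lets you count cold starts per function without any extra tooling.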
The 6 Internal Phases of a Cold Start
A Lambda cold start is not a single operation — it's a pipeline of 6 sequential phases. Understanding each phase tells you exactly where time is being spent.
Phase 1: Execution Environment Provisioning
AWS Lambda runs on Firecracker microVMs. When a cold start is triggered, AWS must allocate a new microVM slot from its fleet, apply your function's resource configuration (memory, CPU, timeout), and prepare the isolated sandbox.
Typical duration: 50–200ms
AWS optimization: Lambda maintains a pool of pre-initialized Firecracker slots to reduce this overhead.
Phase 2: Code & Layer Download
Lambda does not keep your deployment package permanently mounted. During a cold start, it downloads your code package and any attached Lambda Layers from S3 into the execution environment.
Typical duration: 10ms–2s (depends on package size)
Key insight: A 50MB zipped package takes significantly longer to download and extract than a 1MB package. This is one of the most controllable factors.
Phase 3: Environment Variables & Config Injection
Lambda injects your configured environment variables, function metadata, and any AWS-managed credentials (via IAM role) into the execution environment.
Typical duration: ~10ms
Note: Secrets Manager or Parameter Store lookups are not part of this phase. If your init code makes them, they run during user code initialization (Phase 6) and add network round-trips there.
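Environment variables are already present in the process environment when your init code runs, so reading them is a local lookup with no network call, unlike an SSM or Secrets Manager request. The sketch below loads typed config from environment variables; the variable names `TABLE_NAME`, `LOG_LEVEL`, and `CACHE_TTL_SECONDS`, and the `AppConfig` class, are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    table_name: str
    log_level: str
    cache_ttl_seconds: int


def load_config(env=os.environ) -> AppConfig:
    # Hypothetical variable names; these are plain dictionary lookups, no network I/O.
    return AppConfig(
        table_name=env.get('TABLE_NAME', 'brands'),
        log_level=env.get('LOG_LEVEL', 'INFO'),
        cache_ttl_seconds=int(env.get('CACHE_TTL_SECONDS', '300')),
    )


# Cheap enough to run at module level: it only reads the process environment.
CONFIG = load_config()
```

Passing `env` as a parameter also makes the loader easy to test with a plain dictionary.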
Phase 4: VPC Network Attachment (if applicable)
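If your function is configured to run inside a VPC, its execution environment must be connected to your VPC. Since the move to Hyperplane, Lambda creates shared Hyperplane ENIs when you create or update the function's VPC settings (one per subnet and security group combination), so a cold start involves:
- Connecting the environment to an existing Hyperplane ENI
- Configuring routing and security group rules
- Establishing connectivity to your VPC subnets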
Typical duration: historically 10–30s with the old per-function ENI model, now typically under 1s since Lambda moved to shared Hyperplane ENIs in 2019
Key insight: Avoid VPC unless you actually need to access private resources. Many teams add VPC "just in case" and pay the cold start penalty unnecessarily.
Phase 5: Runtime Initialization
The language runtime starts up inside the execution environment. This is where the JVM boots for Java, the Node.js event loop initializes, or the Python interpreter loads.
Typical duration by runtime:
| Runtime | Typical Cold Start Overhead |
|---|---|
| Python 3.12 | ~50ms |
| Node.js 20.x | ~50ms |
| Java 21 (standard) | ~500ms–1s |
| Java 21 (SnapStart) | ~100ms |
| Go 1.x | ~20ms |
| .NET 8 | ~200ms |
Phase 6: User Code Initialization
This is the code outside your handler function — the "init" phase. Everything at module level runs here:
```python
import boto3

# This runs ONCE per execution environment, during the cold start (init phase)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')
ssm = boto3.client('ssm')
config = ssm.get_parameter(Name='/app/config')  # ← network call: adds cold start latency!

def handler(event, context):
    # This runs on EVERY invocation (warm or cold)
    result = table.get_item(Key={'id': event['brandId']})
    # get_item omits 'Item' when no item matches the key
    return result.get('Item')
```
Key insight: Initializing clients outside the handler is good practice (they get reused on warm starts). But avoid heavy operations like fetching large configs or establishing multiple DB connections — these directly add to your cold start time.
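If a config value is only needed by some requests, or should be refreshed periodically, one option is to fetch it on first use and cache it for a fixed TTL instead of calling SSM at module level. The sketch below is a minimal version of that idea; the `CachedValue` class is hypothetical, and the fetcher is injected so it can be replaced with a fake in tests. The boto3 wiring in the comment reuses the `/app/config` parameter from the example above.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar('T')


class CachedValue(Generic[T]):
    """Fetches a value on first use and re-fetches it after ttl_seconds."""

    def __init__(self, fetch: Callable[[], T], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at = float('-inf')  # forces a fetch on the first get()

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Hypothetical wiring with boto3 (client created at init, but no SSM call yet):
# ssm = boto3.client('ssm')
# app_config = CachedValue(
#     lambda: ssm.get_parameter(Name='/app/config')['Parameter']['Value'],
#     ttl_seconds=300,
# )
# Inside the handler: value = app_config.get()
```

The first request that needs the value pays for the SSM call; warm invocations within the TTL reuse the cached copy.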
What Makes Cold Starts Worse
| Factor | Impact | Notes |
|---|---|---|
| Large deployment package | 🔴 High | Every MB adds download + extract time |
| Many Lambda Layers | 🟠 Medium | Each layer is downloaded separately |
| VPC attachment | 🟠 Medium | Now mitigated by Hyperplane, but still adds ~100–500ms |
| Java / .NET runtime | 🔴 High | JVM startup is inherently slow |
| Heavy init code | 🔴 High | SDK clients, DB connections, config fetches |
| Low memory allocation | 🟡 Low-Medium | More memory = more CPU = faster init |
| Infrequent invocations | 🔴 High | Environments get recycled between calls, so most calls start cold |
6 Strategies to Reduce Cold Start Latency
Strategy 1: Provisioned Concurrency (The Nuclear Option)
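Provisioned Concurrency keeps a specified number of execution environments initialized and ready at all times. It is configured on a published version or alias, not on $LATEST. Requests served by these environments never see a cold start; traffic beyond the provisioned amount spills over to on-demand environments, which can still start cold.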
```yaml
# serverless.yml
functions:
  brandLookup:
    handler: handler.handler   # file handler.py, function handler
    provisionedConcurrency: 5  # 5 environments always initialized
    events:
      - http:
          path: /brand/{id}
          method: get
```
When to use: Latency-sensitive APIs (e.g., your brand lookup endpoint needs to respond in <50ms P99).
Cost: You pay for provisioned concurrency even when idle. Use Application Auto Scaling to schedule it only during peak hours.
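```python
# Auto Scaling Provisioned Concurrency with boto3
import boto3

client = boto3.client('application-autoscaling')

# The alias must be registered as a scalable target before scheduling actions
client.register_scalable_target(
    ServiceNamespace='lambda',
    ResourceId='function:brandLookup:prod',
    ScalableDimension='lambda:function:ProvisionedConcurrency',
    MinCapacity=1,
    MaxCapacity=10,
)

client.put_scheduled_action(
    ServiceNamespace='lambda',
    ResourceId='function:brandLookup:prod',
    ScalableDimension='lambda:function:ProvisionedConcurrency',  # required
    ScheduledActionName='scale-up-business-hours',
    Schedule='cron(0 8 * * ? *)',  # 8 AM UTC daily
    ScalableTargetAction={
        'MinCapacity': 10,
        'MaxCapacity': 10
    }
)
```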
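A scale-up action on its own leaves provisioned concurrency at its peak value around the clock, so a schedule usually needs a matching scale-down action. The sketch below builds the keyword arguments for both `put_scheduled_action` calls. The helper name, the 8 PM UTC scale-down time, and the off-peak capacity are assumptions; the alias must already be registered as a scalable target.

```python
def business_hours_actions(function_name, alias, peak, off_peak,
                           up_cron='cron(0 8 * * ? *)',
                           down_cron='cron(0 20 * * ? *)'):
    """Build kwargs for two Application Auto Scaling scheduled actions."""
    common = {
        'ServiceNamespace': 'lambda',
        'ResourceId': f'function:{function_name}:{alias}',
        'ScalableDimension': 'lambda:function:ProvisionedConcurrency',
    }
    return [
        {**common,
         'ScheduledActionName': 'scale-up-business-hours',
         'Schedule': up_cron,  # 8 AM UTC daily by default
         'ScalableTargetAction': {'MinCapacity': peak, 'MaxCapacity': peak}},
        {**common,
         'ScheduledActionName': 'scale-down-after-hours',
         'Schedule': down_cron,  # 8 PM UTC daily by default (assumption)
         'ScalableTargetAction': {'MinCapacity': off_peak,
                                  'MaxCapacity': off_peak}},
    ]


# Usage (requires AWS credentials and a registered scalable target):
# import boto3
# client = boto3.client('application-autoscaling')
# for action in business_hours_actions('brandLookup', 'prod', peak=10, off_peak=1):
#     client.put_scheduled_action(**action)
```

Keeping the action definitions in a pure function makes it easy to review the schedule before anything is sent to AWS.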
Strategy 2: Scheduled Warm-Up with EventBridge (The Budget Option)
If Provisioned Concurrency is too expensive, you can use EventBridge to ping your function every few minutes, keeping environments alive.
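```python
# handler.py: handle warm-up pings gracefully
def process_brand_request(event):
    # Placeholder for your real business logic (hypothetical)
    ...

def handler(event, context):
    # Scheduled pings carry the custom input {"source": "warmup"}
    if event.get('source') == 'warmup':
        print('Warm-up ping received, skipping business logic')
        return {'statusCode': 200, 'body': 'warm'}
    return process_brand_request(event)
```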
```yaml
# serverless.yml
functions:
  brandLookup:
    handler: handler.handler   # must match def handler(...) in handler.py
    events:
      - schedule:
          rate: rate(5 minutes)
          input:
            source: warmup
```
Limitation: A single scheduled ping keeps only one execution environment warm. Under sudden traffic spikes, you'll still see cold starts on the new environments that scale out.
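One way to keep more than one environment warm is to send several warm-up invocations at the same time, so Lambda cannot serve them all from a single environment. The sketch below assumes the handler's warm-up branch sleeps briefly (for example 100ms) so that concurrent pings overlap; without that, a fast ping can finish and free its environment for the next one. The `warm_up` helper and the concurrency count are hypothetical, and the invoke callable is injected; with boto3 it could wrap `lambda_client.invoke`.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def warm_up(invoke: Callable[[bytes], dict], concurrency: int) -> List[dict]:
    """Fire `concurrency` warm-up pings in parallel and collect the responses."""
    payload = json.dumps({'source': 'warmup'}).encode()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(invoke, payload) for _ in range(concurrency)]
        return [f.result() for f in futures]


# Hypothetical boto3 wiring (the warm-up branch of the handler would also
# need a short sleep so the pings overlap):
# import boto3
# lambda_client = boto3.client('lambda')
# warm_up(
#     lambda p: lambda_client.invoke(FunctionName='brandLookup', Payload=p),
#     concurrency=5,
# )
```

This trades a few extra invocations per schedule tick for more warm capacity; it still cannot predict spikes larger than the chosen concurrency.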
Strategy 3: Minimize Your Deployment Package
This is the highest ROI optimization — it costs nothing and directly reduces Phase 2 duration.
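```bash
# Bad: shipping everything
pip install -r requirements.txt -t ./package
# Result: 45MB package including boto3, botocore, etc.

# Good: boto3 and botocore are pre-installed in the Lambda Python runtime,
# so install from a requirements file that leaves them out
# (requirements-lambda.txt is a hypothetical name). Avoid --no-deps:
# it skips the transitive dependencies of every package, not just boto3.
pip install -r requirements-lambda.txt -t ./package
```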
```yaml
# serverless.yml: exclude unnecessary files
package:
  patterns:
    - '!node_modules/aws-sdk/**'  # SDK v2 is preinstalled only on Node.js 16 and earlier runtimes
    - '!**/__pycache__/**'
    - '!**/*.pyc'
    - '!tests/**'
    - '!*.md'
```
Target: Keep your zipped package under 5MB for Python/Node.js. Under 1MB is ideal.
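To check a build against the 5MB target before deploying, you can zip the package directory in memory and measure the compressed size. The sketch below uses only the standard library; the `./package` path and the 5MB threshold come from the text above, the helper name is hypothetical, and the result approximates but will not exactly match the artifact your deploy tool produces.

```python
import io
import os
import zipfile

TARGET_BYTES = 5 * 1024 * 1024  # 5MB target from the text above


def zipped_size(directory: str) -> int:
    """Return the size in bytes of `directory` compressed as a ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to the package root, as Lambda expects
                archive.write(path, os.path.relpath(path, directory))
    return len(buffer.getvalue())


if __name__ == '__main__':
    size = zipped_size('./package')
    status = 'OK' if size <= TARGET_BYTES else 'TOO LARGE'
    print(f'{size / 1024 / 1024:.2f} MB zipped ({status})')
```

Running this as a CI step catches a dependency that suddenly bloats the package before it reaches production.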
Strategy 4: Avoid Unnecessary VPC Configuration
Only put your Lambda in a VPC if it needs to access genuinely VPC-private resources.
```
Do you need to access:
├── DynamoDB?     → NO VPC needed
├── S3?           → NO VPC needed
├── SQS/SNS?      → NO VPC needed
├── RDS/Aurora?   → YES, VPC required
├── ElastiCache?  → YES, VPC required
└── Internal ALB? → YES, VPC required
```
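DynamoDB and S3, the primary data stores in most serverless architectures, do not require a VPC. If a function must live in a VPC anyway (to reach RDS, for example), add gateway VPC endpoints for DynamoDB and S3 so it can reach them privately without routing through a NAT gateway.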
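To audit which functions are attached to a VPC, you can inspect each function's `VpcConfig`. In the sketch below, the filter is a pure function over the dictionaries returned by boto3's Lambda `list_functions`; a function counts as VPC-attached when its `VpcConfig` lists at least one subnet. The helper names are hypothetical, and the paginated boto3 call is shown as a comment.

```python
from typing import Iterable, List


def vpc_attached(function_config: dict) -> bool:
    """True if the Lambda function configuration has VPC subnets attached."""
    vpc = function_config.get('VpcConfig') or {}
    return bool(vpc.get('SubnetIds'))


def vpc_functions(configs: Iterable[dict]) -> List[str]:
    """Names of functions that run inside a VPC."""
    return [c['FunctionName'] for c in configs if vpc_attached(c)]


# Usage with boto3 (requires credentials):
# import boto3
# paginator = boto3.client('lambda').get_paginator('list_functions')
# configs = (fn for page in paginator.paginate() for fn in page['Functions'])
# for name in vpc_functions(configs):
#     print(name, '-> check whether it really needs VPC access')
```

Each name in the output is a candidate for the decision tree above: if it only talks to DynamoDB, S3, SQS, or SNS, it may not need the VPC at all.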
Strategy 5: Choose the Right Runtime (and Use SnapStart for Java)
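AWS Lambda SnapStart (available for Java 11 and later; Python 3.12+ and .NET 8 gained support in late 2024) takes a snapshot of the initialized execution environment when you publish a function version and restores from that snapshot on cold start, reducing Java cold starts from ~1s to ~100ms. It applies only to published versions and aliases, not to $LATEST.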
```yaml
# serverless.yml
functions:
  brandProcessor:
    handler: com.brandfetch.Handler
    runtime: java21
    snapStart: true
```
For new functions, Python, Node.js, and Go are the best choices for cold-start-sensitive workloads.
Strategy 6: Lazy Load Heavy Dependencies
Don't initialize everything at module load time. Defer expensive operations until they're actually needed.
```python
import boto3
import os

_rekognition_client = None
_opensearch_client = None

def get_rekognition():
    global _rekognition_client
    if _rekognition_client is None:
        _rekognition_client = boto3.client('rekognition')
    return _rekognition_client

def get_opensearch():
    global _opensearch_client
    if _opensearch_client is None:
        # Deferred import: opensearchpy is only loaded if vector search is used
        from opensearchpy import OpenSearch
        _opensearch_client = OpenSearch(
            hosts=[{'host': os.environ['OPENSEARCH_HOST'], 'port': 443}],
            use_ssl=True
        )
    return _opensearch_client

def handler(event, context):
    # analyze_logo and search_similar_logos are your own business functions, defined elsewhere
    if event['type'] == 'image_analysis':
        return analyze_logo(get_rekognition(), event['imageUrl'])
    if event['type'] == 'vector_search':
        return search_similar_logos(get_opensearch(), event['embedding'])
    raise ValueError(f"Unsupported event type: {event['type']}")
```
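The global-variable pattern above can also be expressed with `functools.lru_cache`: wrapping a zero-argument factory makes it run on the first call and return the cached object afterwards, once per execution environment. The sketch below shows that alternative; the `lazy` helper is hypothetical, and moving `import boto3` inside the factory also defers the import cost until a client is actually needed.

```python
import functools
from typing import Callable, TypeVar

T = TypeVar('T')


def lazy(factory: Callable[[], T]) -> Callable[[], T]:
    """Wrap a zero-argument factory so it runs once, on the first call."""
    return functools.lru_cache(maxsize=None)(factory)


@lazy
def get_rekognition():
    import boto3  # deferred: the import cost is paid only on first use
    return boto3.client('rekognition')
```

Since a Lambda execution environment handles one invocation at a time, the cache needs no extra locking in this setting.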
Real-World Cold Start Numbers
Based on production measurements and community benchmarks (2024):
| Runtime | Package Size | VPC | P50 Cold Start | P99 Cold Start |
|---|---|---|---|---|
| Python 3.12 | 1MB | No | 180ms | 420ms |
| Python 3.12 | 50MB | No | 890ms | 1,800ms |
| Python 3.12 | 1MB | Yes | 280ms | 650ms |
| Node.js 20.x | 1MB | No | 150ms | 380ms |
| Java 21 | 10MB | No | 800ms | 1,500ms |
| Java 21 (SnapStart) | 10MB | No | 90ms | 200ms |
| Go 1.x | 5MB | No | 80ms | 180ms |
Key takeaway: Package size has a bigger impact than most engineers expect. A Python function with a 50MB package has a P50 cold start 5x worse than the same function with a 1MB package.
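To produce numbers like these for your own functions, one approach is to collect the `REPORT` lines Lambda writes to CloudWatch Logs after each invocation: cold-started invocations include an `Init Duration` field. The sketch below parses such lines and reports the cold start rate plus nearest-rank P50/P99 of init duration. The sample lines in any test are illustrative, and how you fetch the lines (Logs Insights, `filter_log_events`, an export) is left open. `Init Duration` covers runtime and user code init (roughly Phases 5 and 6 above), not provisioning or code download.

```python
import math
import re
from typing import Dict, Iterable, List

INIT_RE = re.compile(r'Init Duration: ([\d.]+) ms')


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cold_start_stats(report_lines: Iterable[str]) -> Dict[str, float]:
    total = 0
    inits: List[float] = []
    for line in report_lines:
        if not line.startswith('REPORT'):
            continue  # skip START/END and application log lines
        total += 1
        match = INIT_RE.search(line)
        if match:  # only cold-started invocations carry Init Duration
            inits.append(float(match.group(1)))
    stats = {'invocations': total, 'cold_starts': len(inits),
             'cold_start_rate': len(inits) / total if total else 0.0}
    if inits:
        stats['init_p50_ms'] = percentile(inits, 50)
        stats['init_p99_ms'] = percentile(inits, 99)
    return stats
```

The cold start rate is also the input you need for the first question in the decision framework.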
Decision Framework: Do You Actually Need to Optimize?
```
Is cold start latency causing user-visible issues?
│
├── NO → Don't optimize. Cold starts are <1% of invocations
│        for most functions with moderate traffic.
│
└── YES → What's your traffic pattern?
          │
          ├── Steady traffic (>1 req/min)
          │   → Environments stay warm naturally.
          │     Focus on package size + init code.
          │
          ├── Bursty traffic (sudden spikes)
          │   → Provisioned Concurrency + Auto Scaling
          │
          └── Infrequent / scheduled jobs
              → EventBridge warm-up, or just accept it
```
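The decision tree above can be encoded as a small helper that classifies traffic from per-minute request counts. The sketch below is hypothetical: the 1 req/min threshold comes from the tree, but the rule that a peak minute at 5x the average counts as "bursty" is an assumption you should tune to your own traffic.

```python
from statistics import mean
from typing import Sequence


def classify_traffic(per_minute: Sequence[int], burst_ratio: float = 5.0) -> str:
    """Rough traffic label from per-minute request counts (hypothetical rules)."""
    avg = mean(per_minute)
    if avg <= 1:
        return 'infrequent'  # ~1 req/min or less: environments get recycled
    if max(per_minute) >= burst_ratio * avg:
        return 'bursty'      # peak minute far above the average (assumed rule)
    return 'steady'


def recommend(user_visible_issue: bool, per_minute: Sequence[int]) -> str:
    if not user_visible_issue:
        return "Don't optimize"
    return {
        'steady': 'Focus on package size + init code',
        'bursty': 'Provisioned Concurrency + Auto Scaling',
        'infrequent': 'EventBridge warm-up, or just accept it',
    }[classify_traffic(per_minute)]
```

Per-minute counts are available from the function's CloudWatch `Invocations` metric at one-minute resolution.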
Summary
| Phase | Your Control | Best Action |
|---|---|---|
| Environment provisioning | ❌ None | Use Provisioned Concurrency |
| Code download | ✅ High | Minimize package size |
| Config injection | ❌ Minimal | Nothing to tune here; SSM/Secrets Manager calls belong to user code init |
| VPC attachment | ✅ High | Avoid VPC unless necessary |
| Runtime init | ✅ Medium | Choose Python/Node/Go; use SnapStart for Java |
| User code init | ✅ High | Lazy load, keep init lightweight |
Cold starts are a real challenge in serverless production systems — but they're also highly addressable. The engineers who struggle most with cold starts are usually the ones who haven't measured which phase is actually costing them time.
Measure first. Optimize the right thing.
Next in this series: **Part 2 — Lambda Triggers Deep Dive: S3, EventBridge, API Gateway & SQS**