James Lee

Posted on May 26

Cold Start in AWS Lambda: Causes, Phases & How to Fix It

#architecture #aws #performance #serverless

If you've ever noticed occasional spikes in your Lambda function's response time — especially after a period of inactivity — you've already met the cold start.

Cold starts are one of the most discussed pain points in serverless architectures. In this article, we'll break down exactly what happens during a cold start, what makes it worse, and — most importantly — how to fix it in production.

Warm Start vs Cold Start

When a Lambda function is invoked, AWS needs to find an execution environment to run your code. Two scenarios can happen:

Warm Start: An idle execution environment already exists from a previous invocation. AWS reuses it directly. Your code runs immediately. This is fast — typically under 10ms overhead.

Cold Start: No idle environment is available. AWS must provision a brand new execution environment from scratch before your code can run. This takes time — anywhere from 100ms to several seconds depending on your configuration.

Invocation arrives
       │
       ▼
Idle environment available?
       │
   YES │                    NO
       │                    │
       ▼                    ▼
  Warm Start           Cold Start
  (reuse env)       (provision new env)
       │                    │
       └────────┬───────────┘
                ▼
         Execute handler

Cold starts are triggered in three main situations:

First invocation: The function has never been called, or all previous environments have been recycled.
Scaling out: Concurrent requests exceed the number of available warm environments, forcing Lambda to spin up new ones.
After idle timeout: AWS recycles environments that haven't been used for a period of time (typically 5–15 minutes, though this is not officially documented).
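
To see which of these situations a function is actually hitting, a common trick is to flag the first invocation in each execution environment. The sketch below is only an illustration, not an AWS API: `_is_cold_start` and the returned field names are invented for this example. It relies solely on the fact that module-level code runs once per environment.

```python
import time

# Module scope runs once per execution environment, i.e. on a cold start.
_is_cold_start = True
_env_created_at = time.time()


def handler(event, context):
    """Report whether this invocation was the first in its environment."""
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # every later invocation in this environment is warm

    return {
        "coldStart": cold,
        "environmentAgeSeconds": round(time.time() - _env_created_at, 3),
    }
```

Logging the `coldStart` flag alongside your normal request logs makes it possible to correlate latency spikes with new environments.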

The 6 Internal Phases of a Cold Start

A Lambda cold start is not a single operation — it's a pipeline of 6 sequential phases. Understanding each phase tells you exactly where time is being spent.

Phase 1: Execution Environment Provisioning

AWS Lambda runs on Firecracker microVMs. When a cold start is triggered, AWS must allocate a new microVM slot from its fleet, apply your function's resource configuration (memory, CPU, timeout), and prepare the isolated sandbox.

Typical duration: 50–200ms

AWS optimization: Lambda maintains a pool of pre-initialized Firecracker slots to reduce this overhead.

Phase 2: Code & Layer Download

Lambda does not keep your deployment package permanently mounted. During a cold start, it downloads your code package and any attached Lambda Layers from S3 into the execution environment.

Typical duration: 10ms–2s (depends on package size)

Key insight: A 50MB zipped package takes significantly longer to download and extract than a 1MB package. This is one of the most controllable factors.

Phase 3: Environment Variables & Config Injection

Lambda injects your configured environment variables, function metadata, and any AWS-managed credentials (via IAM role) into the execution environment.

Typical duration: ~10ms

Note: Secrets Manager or Parameter Store lookups are not part of this phase. They run in your own init code (Phase 6), where each call adds network latency to the cold start.

Phase 4: VPC Network Attachment (if applicable)

If your function is configured to run inside a VPC, AWS must attach an Elastic Network Interface (ENI) to your execution environment. This involves:

Allocating an ENI from the Hyperplane ENI pool
Configuring routing and security group rules
Establishing connectivity to your VPC subnets

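Typical duration: historically 10–30s (the old model, which created an ENI per execution environment), now under 1s thanks to AWS Hyperplane (launched 2019), which creates shared network interfaces when the function is created or its VPC settings change rather than on each cold start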

Key insight: Avoid VPC unless you actually need to access private resources. Many teams add VPC "just in case" and pay the cold start penalty unnecessarily.

Phase 5: Runtime Initialization

The language runtime starts up inside the execution environment. This is where the JVM boots for Java, the Node.js event loop initializes, or the Python interpreter loads.

Typical duration by runtime:

Runtime	Typical Cold Start Overhead
Python 3.12	~50ms
Node.js 20.x	~50ms
Java 21 (standard)	~500ms–1s
Java 21 (SnapStart)	~100ms
Go 1.x	~20ms
.NET 8	~200ms

Phase 6: User Code Initialization

This is the code outside your handler function — the "init" phase. Everything at module level runs here:

import boto3
import json

# This runs ONCE during cold start (init phase)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('brands')
ssm = boto3.client('ssm')
config = ssm.get_parameter(Name='/app/config')  # ← adds cold start latency!

def handler(event, context):
    # This runs on EVERY invocation (warm or cold)
    result = table.get_item(Key={'id': event['brandId']})
    return result.get('Item')  # 'Item' is absent when the key does not exist

Key insight: Initializing clients outside the handler is good practice (they get reused on warm starts). But avoid heavy operations like fetching large configs or establishing multiple DB connections — these directly add to your cold start time.
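
One way to keep a config fetch off the init path is to load it on first use and cache it for the life of the environment. The following is a minimal sketch under that assumption. `CachedValue`, `ttl_seconds`, and the `loader` callable are hypothetical names for illustration, and in a real function the loader would wrap the `ssm.get_parameter` call shown above.

```python
import time
from typing import Any, Callable, Optional


class CachedValue:
    """Fetch a value on first use and reuse it until the TTL expires."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float = 300.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._value: Optional[Any] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Any:
        now = time.monotonic()
        expired = self._loaded_at is None or (now - self._loaded_at) >= self._ttl
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Stand-in loader so the sketch runs without AWS credentials.
# In Lambda this could be something like:
#   lambda: ssm.get_parameter(Name='/app/config')['Parameter']['Value']
calls = []


def fake_loader():
    calls.append(1)
    return {"feature_flag": True}


config = CachedValue(fake_loader, ttl_seconds=300)


def handler(event, context):
    # Nothing is fetched at import time; the first call to get() pays the cost once.
    return config.get()["feature_flag"]
```

This moves the fetch from the init phase to the first invocation that needs it, so it pays off mainly when only some code paths use the config. If every request needs it, the first request still waits for the lookup.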

What Makes Cold Starts Worse

Factor	Impact	Notes
Large deployment package	🔴 High	Every MB adds download + extract time
Many Lambda Layers	🟠 Medium	Each layer is downloaded separately
VPC attachment	🟠 Medium	Now mitigated by Hyperplane, but still adds ~100–500ms
Java / .NET runtime	🔴 High	JVM and .NET CLR startup is inherently slower than Python, Node.js, or Go
Heavy init code	🔴 High	SDK clients, DB connections, config fetches
Low memory allocation	🟡 Low-Medium	More memory = more CPU = faster init
Infrequent invocations	🔴 High	Environments get recycled, every call is cold

6 Strategies to Reduce Cold Start Latency

Strategy 1: Provisioned Concurrency (The Nuclear Option)

Provisioned Concurrency keeps a specified number of execution environments initialized and ready at all times. These environments never experience a cold start.

# serverless.yml
functions:
  brandLookup:
    handler: handler.handler  # file handler.py, function handler
    provisionedConcurrency: 5  # 5 environments always warm
    events:
      - http:
          path: /brand/{id}
          method: get

When to use: Latency-sensitive APIs (e.g., your brand lookup endpoint needs to respond in <50ms P99).

Cost: You pay for provisioned concurrency even when idle. Use Application Auto Scaling to schedule it only during peak hours.

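# Auto Scaling Provisioned Concurrency with boto3
import boto3

client = boto3.client('application-autoscaling')

# The alias must already be registered as a scalable target
# (register_scalable_target) before scheduled actions can be attached.
client.put_scheduled_action(
    ServiceNamespace='lambda',
    ResourceId='function:brandLookup:prod',
    ScalableDimension='lambda:function:ProvisionedConcurrency',  # required parameter
    ScheduledActionName='scale-up-business-hours',
    Schedule='cron(0 8 * * ? *)',  # 8 AM UTC daily
    ScalableTargetAction={
        'MinCapacity': 10,
        'MaxCapacity': 10
    }
)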
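
The scheduled action above only scales up. To limit cost to peak hours you also need a matching scale-down action, and the alias has to be registered as a scalable target first. Below is a sketch of the request parameters for all three calls, assuming a hypothetical 08:00 to 18:00 UTC window and the `function:brandLookup:prod` resource from the example. The helper name `business_hours_schedule` and the capacity values are invented here.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def business_hours_schedule(resource_id, peak, off_peak, start_hour=8, end_hour=18):
    """Build kwargs for register_scalable_target and two put_scheduled_action calls."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
    }
    target = {**common, "MinCapacity": off_peak, "MaxCapacity": peak}
    scale_up = {
        **common,
        "ScheduledActionName": "scale-up-business-hours",
        "Schedule": f"cron(0 {start_hour} * * ? *)",
        "ScalableTargetAction": {"MinCapacity": peak, "MaxCapacity": peak},
    }
    scale_down = {
        **common,
        "ScheduledActionName": "scale-down-after-hours",
        "Schedule": f"cron(0 {end_hour} * * ? *)",
        "ScalableTargetAction": {"MinCapacity": off_peak, "MaxCapacity": off_peak},
    }
    return target, [scale_up, scale_down]


target, actions = business_hours_schedule(
    "function:brandLookup:prod", peak=10, off_peak=1
)

# With AWS credentials configured, the calls would look like this:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   for action in actions:
#       client.put_scheduled_action(**action)
```

Building the parameters separately from the API calls keeps the schedule easy to review and unit test before anything is applied to an account.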

Strategy 2: Scheduled Warm-Up with EventBridge (The Budget Option)

If Provisioned Concurrency is too expensive, you can use EventBridge to ping your function every few minutes, keeping environments alive.

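# handler.py: handle warm-up pings gracefully
def process_brand_request(event):
    # Placeholder for the real business logic (not shown in this article)
    raise NotImplementedError

def handler(event, context):
    # The scheduled event below sends {"source": "warmup"} as its payload
    if event.get('source') == 'warmup':
        print('Warm-up ping received, skipping business logic')
        return {'statusCode': 200, 'body': 'warm'}

    return process_brand_request(event)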

# serverless.yml
functions:
  brandLookup:
    handler: handler.handler  # file handler.py, function handler
    events:
      - schedule:
          rate: rate(5 minutes)
          input:
            source: warmup

Limitation: This only keeps a small number of environments warm. Under sudden traffic spikes, you'll still see cold starts on the new instances that scale out.
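
To get a feel for how far a single warm-up ping falls short, you can estimate the number of environments a burst needs as request rate multiplied by average duration. This is a back-of-the-envelope sketch, and the traffic figures are invented for illustration.

```python
import math


def required_environments(requests_per_second: float, avg_duration_ms: float) -> int:
    """Estimate concurrent execution environments: rate x average duration."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def cold_environments(requests_per_second, avg_duration_ms, warm_environments):
    """Environments that must be created cold when a burst arrives."""
    needed = required_environments(requests_per_second, avg_duration_ms)
    return max(0, needed - warm_environments)


# Hypothetical burst: 200 req/s at 120 ms each, with 1 environment kept warm.
print(required_environments(200, 120))  # 24
print(cold_environments(200, 120, 1))   # 23
```

In this made-up scenario the ping saves one cold start out of 24, which is why the warm-up approach suits low, steady traffic rather than spikes.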

Strategy 3: Minimize Your Deployment Package

This is the highest ROI optimization — it costs nothing and directly reduces Phase 2 duration.

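# Bad: shipping everything
pip install -r requirements.txt -t ./package
# Result: 45MB package including boto3, botocore, etc.

# Good: remove boto3 and botocore from requirements.txt first, since the
# Lambda Python runtime already includes them. Do not use --no-deps for this:
# it skips every transitive dependency, not just boto3.
pip install -r requirements.txt -t ./package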

# serverless.yml — exclude unnecessary files
package:
  patterns:
    - '!node_modules/@aws-sdk/**'  # Node.js 18+ runtimes bundle AWS SDK v3, not v2
    - '!**/__pycache__/**'
    - '!**/*.pyc'
    - '!tests/**'
    - '!*.md'

Target: Keep your zipped package under 5MB for Python/Node.js. Under 1MB is ideal.
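
Before trimming, it helps to see what is taking up the space. The script below is a rough sketch that sums file sizes per top-level entry of a build directory (it assumes the `./package` path from the example above). It reports uncompressed sizes, so the zipped figure will be smaller.

```python
import os
from collections import defaultdict


def size_by_top_level(package_dir: str) -> list[tuple[str, int]]:
    """Return (top-level entry, total bytes) pairs, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        for name in files:
            # Files directly in package_dir are keyed by their own name.
            top = name if rel == "." else rel.split(os.sep)[0]
            totals[top] += os.path.getsize(os.path.join(root, name))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def print_report(package_dir: str, limit: int = 10) -> None:
    """Print the largest top-level entries in megabytes."""
    for entry, size in size_by_top_level(package_dir)[:limit]:
        print(f"{size / 1_048_576:8.2f} MB  {entry}")


if __name__ == "__main__":
    print_report("./package")
```

The largest entries are usually the first candidates for removal or for replacement with a lighter dependency.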

Strategy 4: Avoid Unnecessary VPC Configuration

Only put your Lambda in a VPC if it needs to access genuinely VPC-private resources.

Do you need to access:
├── DynamoDB?     → NO VPC needed
├── S3?           → NO VPC needed
├── SQS/SNS?      → NO VPC needed
├── RDS/Aurora?   → YES, VPC required
├── ElastiCache?  → YES, VPC required
└── Internal ALB? → YES, VPC required

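DynamoDB and S3, the primary data stores in most serverless architectures, do not require a VPC. If the function has to run in a VPC for other reasons, add VPC endpoints (gateway endpoints for DynamoDB and S3) so it can still reach those services privately without a NAT gateway.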

Strategy 5: Choose the Right Runtime (and Use SnapStart for Java)

AWS Lambda SnapStart (available for Java 11 and later managed runtimes, including Java 21) takes a snapshot of the initialized execution environment when a function version is published and restores from it on cold start, reducing Java cold starts from ~1s to ~100ms.

# serverless.yml
functions:
  brandProcessor:
    handler: com.brandfetch.Handler
    runtime: java21
    snapStart: true

For new functions, Python, Node.js, and Go are the best choices for cold-start-sensitive workloads.

Strategy 6: Lazy Load Heavy Dependencies

Don't initialize everything at module load time. Defer expensive operations until they're actually needed.

import boto3
import os

_rekognition_client = None
_opensearch_client = None

def get_rekognition():
    global _rekognition_client
    if _rekognition_client is None:
        _rekognition_client = boto3.client('rekognition')
    return _rekognition_client

def get_opensearch():
    global _opensearch_client
    if _opensearch_client is None:
        from opensearchpy import OpenSearch
        _opensearch_client = OpenSearch(
            hosts=[{'host': os.environ['OPENSEARCH_HOST'], 'port': 443}],
            use_ssl=True
        )
    return _opensearch_client

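def handler(event, context):
    # analyze_logo and search_similar_logos are application helpers (not shown)
    if event['type'] == 'image_analysis':
        return analyze_logo(get_rekognition(), event['imageUrl'])

    if event['type'] == 'vector_search':
        return search_similar_logos(get_opensearch(), event['embedding'])

    # Fail loudly instead of silently returning None for unknown types
    raise ValueError(f"Unsupported event type: {event['type']}")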

Real-World Cold Start Numbers

Based on production measurements and community benchmarks (2024):

Runtime	Package Size	VPC	P50 Cold Start	P99 Cold Start
Python 3.12	1MB	No	180ms	420ms
Python 3.12	50MB	No	890ms	1,800ms
Python 3.12	1MB	Yes	280ms	650ms
Node.js 20.x	1MB	No	150ms	380ms
Java 21	10MB	No	800ms	1,500ms
Java 21 (SnapStart)	10MB	No	90ms	200ms
Go 1.x	5MB	No	80ms	180ms

Key takeaway: Package size has a bigger impact than most engineers expect. In the table above, a Python function with a 50MB package has a P50 cold start roughly 5x slower than the same function with a 1MB package (890ms vs 180ms).

Decision Framework: Do You Actually Need to Optimize?

Is cold start latency causing user-visible issues?
│
├── NO → Don't optimize. Cold starts are <1% of invocations
│         for most functions with moderate traffic.
│
└── YES → What's your traffic pattern?
          │
          ├── Steady traffic (>1 req/min)
          │   → Environments stay warm naturally.
          │     Focus on package size + init code.
          │
          ├── Bursty traffic (sudden spikes)
          │   → Provisioned Concurrency + Auto Scaling
          │
          └── Infrequent / scheduled jobs
              → EventBridge warm-up, or just accept it

Summary

Phase	Your Control	Best Action
Environment provisioning	❌ None	Use Provisioned Concurrency
Code download	✅ High	Minimize package size
Config injection	✅ Medium	Avoid SSM calls in init
VPC attachment	✅ High	Avoid VPC unless necessary
Runtime init	✅ Medium	Choose Python/Node/Go; use SnapStart for Java
User code init	✅ High	Lazy load, keep init lightweight

Cold starts are a real challenge in serverless production systems — but they're also highly addressable. The engineers who struggle most with cold starts are usually the ones who haven't measured which phase is actually costing them time.

Measure first. Optimize the right thing.
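
One place to start measuring: Lambda writes a REPORT line to CloudWatch Logs for every invocation, and that line includes an `Init Duration` field only when the invocation was a cold start. The sketch below parses such lines and summarizes the cold-start rate and init times. The sample log lines are made up for illustration, and the percentile is a simple nearest-rank calculation.

```python
import math
import re

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def summarize(report_lines):
    """Return cold-start count, rate, and P50/P99 init duration in ms."""
    reports = [line for line in report_lines if line.startswith("REPORT")]
    inits = sorted(
        float(m.group(1)) for line in reports if (m := INIT_RE.search(line))
    )

    def percentile(p):
        # Nearest-rank percentile over the sorted init durations.
        if not inits:
            return None
        rank = max(1, math.ceil(p / 100 * len(inits)))
        return inits[rank - 1]

    return {
        "invocations": len(reports),
        "cold_starts": len(inits),
        "cold_start_rate": len(inits) / len(reports) if reports else 0.0,
        "init_p50_ms": percentile(50),
        "init_p99_ms": percentile(99),
    }


# Invented sample lines in the shape of Lambda REPORT entries.
sample = [
    "REPORT RequestId: a1\tDuration: 35.10 ms\tBilled Duration: 36 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 70 MB\tInit Duration: 210.50 ms",
    "REPORT RequestId: a2\tDuration: 4.20 ms\tBilled Duration: 5 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 70 MB",
    "REPORT RequestId: a3\tDuration: 3.90 ms\tBilled Duration: 4 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 71 MB",
    "REPORT RequestId: a4\tDuration: 41.00 ms\tBilled Duration: 41 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 72 MB\tInit Duration: 385.00 ms",
]
print(summarize(sample))
```

Running this over a day of exported logs shows whether cold starts are frequent enough, and slow enough, to justify any of the strategies above.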

Next in this series: **Part 2 — Lambda Triggers Deep Dive: S3, EventBridge, API Gateway & SQS**
