Imagine rushing to grab your morning coffee, only to find the barista needs to boot up the espresso machine first. That's essentially what happens during a Lambda cold start – and just like your coffee delay, it can be frustrating. But fear not! Today we'll dive deep into how provisioned concurrency can help you serve up those functions piping hot.
Understanding the cold start problem
When a Lambda function hasn't been used recently, AWS needs to spin up a new execution environment before running your code. This initialization process includes:
- Downloading your code
- Bootstrapping the runtime
- Loading your dependencies
- Running initialization code
Here's a simple Node.js function that demonstrates the cold start impact:
const mongoose = require('mongoose');

// This connection happens during cold start
let conn = null;

const connectToDb = async () => {
  if (conn == null) {
    conn = await mongoose.connect(process.env.MONGODB_URI, {
      serverSelectionTimeoutMS: 5000
    });
  }
  return conn;
};

exports.handler = async (event) => {
  // Connection time will impact cold start duration
  await connectToDb();
  // Rest of your handler code...
};
In my testing, this simple function with a database connection could take 800ms-2s to cold start, compared to 10-50ms for warm starts.
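If you want to verify cold versus warm starts in your own logs, Lambda's REPORT log line in CloudWatch includes an Init Duration field on cold starts. You can also flag them yourself with a module-scope variable, since module code runs exactly once per execution environment. A minimal sketch:

// Module scope runs once per execution environment, so this flag
// marks the first (cold) invocation in each environment
let coldStart = true;

exports.handler = async (event) => {
  console.log(coldStart ? 'cold start' : 'warm start');
  coldStart = false;
  // ... rest of the handler
};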
Enter provisioned concurrency
Provisioned concurrency is like having a barista who keeps the espresso machine running even during quiet periods. It maintains a pool of pre-initialized execution environments ready to respond instantly to incoming requests.
To enable it using AWS CDK:
const fn = new lambda.Function(this, 'MyFunction', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  // Other configuration...
});

// Provisioned concurrency is configured on a published version or an
// alias, not on the unqualified function
const alias = new lambda.Alias(this, 'LiveAlias', {
  aliasName: 'live',
  version: fn.currentVersion,
  provisionedConcurrentExecutions: 5, // Keep 5 instances warm
});
The magic behind the scenes
When you enable provisioned concurrency, AWS does something clever:
- Creates the specified number of execution environments
- Runs your initialization code
- Freezes the environments in a ready state
- Maintains this pool, replacing any that become unhealthy
This means your function starts executing almost immediately when triggered, as the heavy lifting has already been done.
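You can actually watch this pool come to life: while the environments initialize, the configuration reports IN_PROGRESS, then flips to READY. Here's a sketch using the AWS SDK for JavaScript v3, assuming the 'live' alias from the CDK example above (the function name is a placeholder for whatever yours is called):

const { LambdaClient, GetProvisionedConcurrencyConfigCommand } = require('@aws-sdk/client-lambda');

const client = new LambdaClient({});

const checkPool = async () => {
  const config = await client.send(new GetProvisionedConcurrencyConfigCommand({
    FunctionName: 'MyFunction', // placeholder name
    Qualifier: 'live',          // the alias carrying provisioned concurrency
  }));
  // Status is IN_PROGRESS while environments initialize, then READY
  console.log(config.Status, config.AvailableProvisionedConcurrentExecutions);
};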
Best practices and optimization tips
1. Smart initialization code
Move as much initialization as possible into global scope:
const AWS = require('aws-sdk');

// Good: Done once during provisioned concurrency initialization
const client = new AWS.DynamoDB.DocumentClient();
const tableName = process.env.TABLE_NAME;

// Bad: Would run on every invocation
exports.handler = async (event) => {
  const client = new AWS.DynamoDB.DocumentClient();
  // ...
};
2. Precise concurrency levels
Monitor your function's concurrency with CloudWatch metrics such as ConcurrentExecutions and ProvisionedConcurrencySpilloverInvocations (invocations that exceeded the warm pool), and adjust provisioned concurrency accordingly. Over-provisioning wastes money, while under-provisioning leads to cold starts.
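To put numbers behind that, you can pull the metric data directly. A sketch with the AWS SDK for JavaScript v3 (the function name is a placeholder, and note that per-function emission of ConcurrentExecutions can depend on your concurrency configuration):

const { CloudWatchClient, GetMetricStatisticsCommand } = require('@aws-sdk/client-cloudwatch');

const cw = new CloudWatchClient({});

const getPeakConcurrency = async () => {
  const now = new Date();
  const res = await cw.send(new GetMetricStatisticsCommand({
    Namespace: 'AWS/Lambda',
    MetricName: 'ConcurrentExecutions',
    Dimensions: [{ Name: 'FunctionName', Value: 'MyFunction' }], // placeholder
    StartTime: new Date(now.getTime() - 24 * 60 * 60 * 1000),    // last 24 hours
    EndTime: now,
    Period: 300, // 5-minute buckets
    Statistics: ['Maximum'],
  }));
  const points = res.Datapoints ?? [];
  return points.length ? Math.max(...points.map((d) => d.Maximum)) : 0;
};

The peak value this returns is a reasonable starting point for how many instances to provision.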
3. Application Auto Scaling
Set up auto-scaling to automatically adjust provisioned concurrency based on utilization:
// Auto-scaling attaches to the alias from the earlier example
// (Duration comes from 'aws-cdk-lib')
const target = alias.addAutoScaling({
  minCapacity: 2,
  maxCapacity: 10
});

target.scaleOnUtilization({
  utilizationTarget: 0.75,
  scaleInCooldown: Duration.seconds(60),
  scaleOutCooldown: Duration.seconds(60)
});
Cost considerations
Provisioned concurrency isn't free – you pay for:
- The time your provisioned instances are available
- The compute time used during function execution
- Any additional instances that spin up beyond your provisioned amount
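To make the first item concrete, here's a rough back-of-the-envelope estimator. The default rate below is the us-east-1 provisioned concurrency price at the time of writing; treat it as a placeholder and check the current AWS pricing page:

// Rough monthly cost of keeping a provisioned pool warm (excludes
// per-invocation duration and request charges)
const estimateMonthlyCost = (instances, memoryGb, pricePerGbSecond = 0.0000041667) => {
  const secondsPerMonth = 30 * 24 * 60 * 60; // ~2.6M seconds
  return instances * memoryGb * secondsPerMonth * pricePerGbSecond;
};

// 5 instances at 1 GB: roughly $54/month before any invocations
console.log(estimateMonthlyCost(5, 1).toFixed(2));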
A practical approach is to:
- Identify functions that are latency-sensitive
- Monitor their actual usage patterns
- Apply provisioned concurrency selectively
- Use scheduled provisioned concurrency for predictable load patterns (see the sketch below)
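For that last point, the same Application Auto Scaling target from the earlier example supports scheduled actions. A sketch, assuming the `target` from the auto-scaling snippet and business hours expressed in UTC:

import * as appscaling from 'aws-cdk-lib/aws-applicationautoscaling';

// Warm up the pool before the weekday morning rush...
target.scaleOnSchedule('ScaleUpWeekdayMornings', {
  schedule: appscaling.Schedule.cron({ weekDay: 'MON-FRI', hour: '8', minute: '0' }),
  minCapacity: 10,
});

// ...and scale back down in the evening
target.scaleOnSchedule('ScaleDownEvenings', {
  schedule: appscaling.Schedule.cron({ hour: '20', minute: '0' }),
  minCapacity: 2,
});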
Measuring the impact
I built a simple testing framework to measure the difference:
import concurrent.futures
import requests
import time
import statistics

def invoke_function(url):
    start = time.time()
    response = requests.post(url)
    return time.time() - start

def run_load_test(url, concurrent_requests):
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrent_requests) as executor:
        futures = [executor.submit(invoke_function, url) for _ in range(concurrent_requests)]
        times = [f.result() for f in concurrent.futures.as_completed(futures)]
    return {
        'avg': statistics.mean(times),
        # statistics has no quantile(); quantiles() with n=20 puts the
        # 95th percentile at index 18
        'p95': statistics.quantiles(times, n=20)[18],
        'max': max(times)
    }

# Run once before and once after enabling provisioned concurrency

# Results with provisioned concurrency disabled
print("Without PC:", run_load_test(lambda_url, 100))

# Results with provisioned concurrency enabled
print("With PC:", run_load_test(lambda_url, 100))
Real-world results
In production environments, I've seen:
- Cold starts reduced from 1-2s to under 100ms
- P95 latency improved by 80%
- More consistent performance during traffic spikes
Beyond provisioned concurrency
While provisioned concurrency is powerful, consider these complementary strategies:
- Using smaller dependencies to reduce initialization time
- Implementing connection pooling for databases
- Leveraging Lambda SnapStart for Java functions
- Using external caching services for frequently accessed data
Conclusion
Provisioned concurrency is a powerful tool for reducing Lambda cold starts, but it requires careful planning and monitoring to be used effectively.
Remember: like that perfect cup of coffee, the key is finding the right balance – between performance, cost, and complexity – that works for your specific use case.
For more tips and insights, follow me on Twitter @Siddhant_K_code and stay updated with the latest & detailed tech content like this.