You’ve probably heard that AWS Lambda provisioned concurrency eliminates cold starts. What they don’t tell you? It can turn your $50 Lambda bill into $500+ overnight if you’re not careful.
Here’s the thing: provisioned concurrency is incredibly powerful, but most developers either avoid it entirely (suffering cold starts) or implement it carelessly (suffering bill shock). There’s a sweet spot in between, and Terraform makes it possible to hit it consistently.
🎯 The Problem: Cold Starts vs. Cost Explosions
Lambda cold starts happen when AWS needs to initialize a new execution environment. For a typical Node.js function, this adds 200-500ms of latency. For Java or .NET? You’re looking at 2-5 seconds.
The traditional solution is provisioned concurrency - keeping execution environments warm and ready. But here’s the catch:
- On-demand: ~$0.20 per 1M requests + duration at $0.0000166667 per GB-second
- Provisioned concurrency: $0.0000041667 per GB-second just to keep capacity warm (billed whether or not it serves traffic) + $0.20 per 1M requests + duration at a reduced $0.0000097222 per GB-second (us-east-1 prices)
For a 512MB function with 10 provisioned instances running 24/7:
Monthly cost = 10 instances × 0.5 GB × 730 hours × 3600 seconds × $0.0000041667
≈ $55 per month
Just to keep one function warm, before you've processed a single request. Bump the memory to a few GB, raise the instance count, or repeat this across a dozen functions, and you're in $500+ territory.
🧠 The Strategy: Dynamic Provisioned Concurrency
The secret is treating provisioned concurrency like autoscaling for EC2 - scale it up when you need it, down when you don’t. Here’s how to do it right with Terraform.
📊 Step 1: Calculate Your Break-Even Point
Not every function needs provisioned concurrency. Use this formula:
Break-even daily invocations = (Provisioned cost per day) / (Cold start cost savings per invocation)
Cold start cost = Cold start duration (s) × Memory (GB) × $0.0000166667
Example: 512MB function, 300ms cold start, 20% of requests are cold starts
Daily provisioned cost (1 instance) = 1 × 0.5 × 86400 × $0.0000041667 = $0.18
Cold start cost per invocation = 0.3 × 0.5 × $0.0000166667 = $0.0000025
Savings per invocation (20% cold) = $0.0000025 × 0.2 = $0.0000005
Break-even = $0.18 / $0.0000005 = 360,000 invocations/day
If you're processing fewer than 360k requests per day, provisioned concurrency on this function costs you money - you're buying latency, not saving on compute.
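If you'd rather not redo this math by hand for every function, here's a minimal sketch of the same formula as Terraform locals. The file name and local names are illustrative, and the prices assume us-east-1:
# break_even.tf (illustrative) - the break-even formula as locals
locals {
  memory_gb           = 0.5
  cold_start_seconds  = 0.3
  cold_start_rate     = 0.2           # fraction of requests that hit a cold start
  pc_price_per_gb_s   = 0.0000041667  # provisioned concurrency, us-east-1
  duration_price_gb_s = 0.0000166667  # on-demand duration, us-east-1

  # Cost of keeping one instance warm for a day
  daily_provisioned_cost = 1 * local.memory_gb * 86400 * local.pc_price_per_gb_s

  # Billed compute saved per invocation by avoiding cold starts
  savings_per_invocation = local.cold_start_seconds * local.memory_gb * local.duration_price_gb_s * local.cold_start_rate
}

output "break_even_daily_invocations" {
  # ≈ 360,000 with the defaults above
  value = local.daily_provisioned_cost / local.savings_per_invocation
}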
🛠️ Step 2: Implement Smart Scaling with Terraform
Here’s a production-ready Terraform configuration that scales provisioned concurrency based on actual demand:
Basic Lambda Function Setup
# main.tf
resource "aws_lambda_function" "api_handler" {
filename = "lambda_function.zip"
function_name = "api-handler"
role = aws_iam_role.lambda_role.arn
handler = "index.handler"
runtime = "nodejs20.x"
memory_size = 512
timeout = 30
environment {
variables = {
ENVIRONMENT = "production"
}
}
publish = true # Required for provisioned concurrency
}
# Create an alias for stable endpoint
resource "aws_lambda_alias" "live" {
name = "live"
description = "Live traffic alias"
function_name = aws_lambda_function.api_handler.function_name
function_version = aws_lambda_function.api_handler.version
}
Provisioned Concurrency with Auto Scaling
# provisioned_concurrency.tf
# Enable provisioned concurrency on the alias
resource "aws_lambda_provisioned_concurrency_config" "api_handler" {
  function_name                     = aws_lambda_alias.live.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 2 # Minimum instances

  # Auto scaling adjusts this value at runtime; ignore it here so that
  # every terraform apply doesn't reset the scaler's decision
  lifecycle {
    ignore_changes = [provisioned_concurrent_executions]
  }
}
# Auto Scaling target
resource "aws_appautoscaling_target" "lambda_target" {
  max_capacity       = 10 # Maximum instances
  min_capacity       = 2  # Minimum instances (must match provisioned concurrency)
  resource_id        = "function:${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"

  depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}
# Target tracking: add capacity when utilization climbs above 70%,
# remove it when utilization falls back
resource "aws_appautoscaling_policy" "lambda_scale_up" {
  name               = "lambda-scale-up"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.7 # The utilization metric is a 0-1 ratio, so 0.7 = 70%

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }

    scale_in_cooldown  = 300 # Wait 5 min before scaling down
    scale_out_cooldown = 60  # Wait 1 min before scaling up again
  }
}
Scheduled Scaling for Predictable Workloads
If your traffic follows a pattern (business hours, weekend drops), use scheduled actions:
# scheduled_scaling.tf
# Scale up for business hours (Mon-Fri 8 AM - 6 PM US Eastern)
resource "aws_appautoscaling_scheduled_action" "scale_up_business_hours" {
  name               = "scale-up-business-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)" # 8 AM weekdays
  timezone           = "America/New_York"        # Schedules default to UTC without this

  scalable_target_action {
    min_capacity = 5
    max_capacity = 15
  }
}
# Scale down for weekday nights
resource "aws_appautoscaling_scheduled_action" "scale_down_off_hours" {
  name               = "scale-down-off-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 18 ? * MON-FRI *)" # 6 PM weekdays
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 3
  }
}
# Weekend minimal capacity
resource "aws_appautoscaling_scheduled_action" "scale_down_weekend" {
  name               = "scale-down-weekend"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 0 ? * SAT *)" # Midnight Saturday
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 2
  }
}
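Three near-identical resources are a hint to refactor. Here's a hedged sketch of the same schedules driven by a single map and for_each - the schedules local and the resource label are illustrative names, not a prescribed pattern:
# One resource block, many schedules
locals {
  schedules = {
    "business-hours" = { cron = "cron(0 8 ? * MON-FRI *)", min = 5, max = 15 }
    "off-hours"      = { cron = "cron(0 18 ? * MON-FRI *)", min = 1, max = 3 }
    "weekend"        = { cron = "cron(0 0 ? * SAT *)", min = 1, max = 2 }
  }
}

resource "aws_appautoscaling_scheduled_action" "schedule" {
  for_each = local.schedules

  name               = "lambda-pc-${each.key}"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = each.value.cron
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = each.value.min
    max_capacity = each.value.max
  }
}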
IAM Permissions
# iam.tf
resource "aws_iam_role" "lambda_role" {
name = "api-handler-lambda-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
role = aws_iam_role.lambda_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
# Application Auto Scaling doesn't need a hand-rolled IAM role for Lambda.
# It uses a service-linked role that AWS creates automatically the first
# time you register a Lambda scalable target. To make that explicit in
# Terraform (import it first if it already exists in your account):
resource "aws_iam_service_linked_role" "lambda_autoscaling" {
  aws_service_name = "lambda.application-autoscaling.amazonaws.com"
}
📈 Step 3: Monitor and Optimize
Create CloudWatch alarms and dashboards to track your investment. Start with an alarm that flags idle capacity you're paying for:
# monitoring.tf
resource "aws_cloudwatch_metric_alarm" "high_cost_alert" {
alarm_name = "lambda-provisioned-concurrency-high-cost"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "ProvisionedConcurrencyUtilization"
namespace = "AWS/Lambda"
period = "300"
statistic = "Average"
threshold = "30" # Alert if avg utilization < 30%
alarm_description = "Provisioned concurrency may be over-provisioned"
treat_missing_data = "notBreaching"
dimensions = {
FunctionName = aws_lambda_function.api_handler.function_name
Resource = "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
}
}
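To go with the alarm, here's a minimal one-widget dashboard sketch. The dashboard name, widget layout, and us-east-1 region are assumptions - adjust to taste:
resource "aws_cloudwatch_dashboard" "provisioned_concurrency" {
  dashboard_name = "lambda-provisioned-concurrency"

  dashboard_body = jsonencode({
    widgets = [{
      type   = "metric"
      x      = 0
      y      = 0
      width  = 12
      height = 6
      properties = {
        title  = "Provisioned Concurrency Utilization (0-1)"
        region = "us-east-1"
        stat   = "Average"
        period = 300
        metrics = [[
          "AWS/Lambda", "ProvisionedConcurrencyUtilization",
          "FunctionName", aws_lambda_function.api_handler.function_name,
          "Resource", "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
        ]]
      }
    }]
  })
}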
💰 Real-World Cost Comparison
Let’s say you have an API with these characteristics:
- Traffic: 50k requests/day during business hours (10 hours), 5k requests overnight
- Function: 512MB, 200ms average execution, 400ms cold start
- Cold start rate: 20% without provisioned concurrency
Without Provisioned Concurrency
Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day
Cold start penalty: 55k × 0.2 × 0.4s × 0.5GB × $0.0000166667 = $0.04/day
Total: ~$4/month (plus the latency your users eat on every cold start)
With Always-On Provisioned Concurrency (5 instances)
Provisioned cost: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day
Compute cost: ~$0.09/day (a bit less in practice: duration on provisioned capacity bills at a lower rate)
Request cost: $0.01/day
Total: ~$30/month
With Scheduled Provisioned Concurrency (5 instances @ 10hrs, 1 instance @ 14hrs)
Provisioned cost: ((5 × 0.5 × 36000) + (1 × 0.5 × 50400)) × $0.0000041667 = $0.48/day
Compute cost: ~$0.09/day
Request cost: $0.01/day
Total: ~$18/month
Savings: ~$12/month vs. always-on, while still eliminating cold starts during peak hours. Scale this across 20 functions and you're saving roughly $2,900/year. Notice that both options cost more than running cold (~$4/month): this function sits well under the 360k/day break-even, so you're explicitly buying latency, not saving money - a trade that can still be worth it for a customer-facing API.
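If you'd rather have Terraform compute these scenario numbers than a napkin, here's a small sketch using the same simplified model as above (names illustrative, us-east-1 pricing):
# Daily provisioned-concurrency cost: always-on vs. scheduled
locals {
  mem_gb  = 0.5
  pc_rate = 0.0000041667 # $/GB-second

  # Scheduled plan: 5 instances for 10 hours, 1 instance for 14 hours
  scheduled_gb_s = (5 * local.mem_gb * 10 * 3600) + (1 * local.mem_gb * 14 * 3600)

  always_on_daily = 5 * local.mem_gb * 86400 * local.pc_rate # ≈ $0.90
  scheduled_daily = local.scheduled_gb_s * local.pc_rate     # ≈ $0.48
}

output "daily_provisioned_cost" {
  value = {
    always_on = local.always_on_daily
    scheduled = local.scheduled_daily
  }
}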
🎓 Best Practices Checklist
✅ Always use an alias - Provisioned concurrency requires versioning
✅ Start small - Begin with min_capacity = 1 or 2, monitor utilization
✅ Set cooldown periods - Prevent thrashing (scale_in_cooldown = 300s minimum)
✅ Match workload patterns - Use scheduled actions for predictable traffic
✅ Monitor utilization - Alert when average utilization < 40% (you’re wasting money)
✅ Test scaling behavior - Simulate load spikes in staging first
✅ Version carefully - Updating function code triggers reprovisioning; requests fall back to on-demand (cold starts) while the new version warms
⚠️ Common Gotchas
1. The “Insufficient Data” Trap
When capacity scales way down and traffic stops, the target-tracking CloudWatch alarms can get stuck in "INSUFFICIENT_DATA" and stop reacting. Solution: keep min_capacity at 1 rather than chasing zero, and lean on scheduled actions for known idle periods.
2. Deployment Delays
Updating a function with provisioned concurrency takes 2-3 minutes to reprovision. For CI/CD, either:
- Accept the delay
- Use blue/green deployments with two aliases (sketched below)
- Temporarily disable provisioned concurrency during deploys
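For the blue/green option, here's a hedged sketch of what two aliases might look like. The blue/green names and the var.previous_version variable are hypothetical, and the cutover itself (repointing your API Gateway integration, or using the alias routing_config for weighted traffic) is left to your deploy pipeline:
# Deploy to the idle alias, let it warm, then shift traffic
resource "aws_lambda_alias" "blue" {
  name             = "blue"
  function_name    = aws_lambda_function.api_handler.function_name
  function_version = aws_lambda_function.api_handler.version
}

resource "aws_lambda_alias" "green" {
  name             = "green"
  function_name    = aws_lambda_function.api_handler.function_name
  function_version = var.previous_version # hypothetical: tracks the prior version
}

# Each alias carries its own warm pool, so the incoming alias is already
# provisioned before any traffic moves to it
resource "aws_lambda_provisioned_concurrency_config" "blue" {
  function_name                     = aws_lambda_function.api_handler.function_name
  qualifier                         = aws_lambda_alias.blue.name
  provisioned_concurrent_executions = 2
}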
3. Region Limits
Provisioned concurrency draws from your account's regional concurrency quota (1,000 by default), and at least 100 of that must stay unreserved for on-demand invocations. Request a quota increase through Service Quotas or AWS Support if you need more.
🚀 Quick Start Commands
# Initialize Terraform
terraform init
# Preview changes
terraform plan
# Deploy
terraform apply
# Check provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
  --function-name api-handler \
  --qualifier live
# Monitor scaling activity
aws application-autoscaling describe-scaling-activities \
  --service-namespace lambda \
  --resource-id "function:api-handler:live"
🎯 When to Use Provisioned Concurrency
Great for:
- Customer-facing APIs with strict latency SLAs (<100ms p99)
- Functions with >2s cold starts (Java, .NET, large dependencies)
- Predictable, high-volume traffic patterns
- Functions processing >100k requests/day
Skip it for:
- Internal tools and batch processing
- Low-traffic endpoints (<10k requests/day)
- Functions with fast cold starts (<200ms)
- Unpredictable, sporadic workloads
📚 Complete Example Repository
Here’s a complete working example structure:
lambda-provisioned-concurrency/
├── main.tf
├── provisioned_concurrency.tf
├── scheduled_scaling.tf
├── iam.tf
├── monitoring.tf
├── variables.tf
├── outputs.tf
└── lambda_function/
├── index.js
└── package.json
variables.tf:
variable "function_name" {
description = "Lambda function name"
type = string
default = "api-handler"
}
variable "min_capacity" {
description = "Minimum provisioned concurrency instances"
type = number
default = 2
}
variable "max_capacity" {
description = "Maximum provisioned concurrency instances"
type = number
default = 10
}
variable "target_utilization" {
description = "Target utilization percentage for auto scaling"
type = number
default = 70
}
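These variables aren't wired into the earlier snippets, which hardcode values for readability. Hooking them up is mechanical - for example, the scaling target becomes:
# provisioned_concurrency.tf, variable-driven (replaces the hardcoded version)
resource "aws_appautoscaling_target" "lambda_target" {
  max_capacity       = var.max_capacity
  min_capacity       = var.min_capacity
  resource_id        = "function:${var.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"

  depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}
Likewise, target_value = var.target_utilization replaces the literal 0.7 in the scaling policy.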
outputs.tf:
output "function_arn" {
value = aws_lambda_function.api_handler.arn
}
output "function_url" {
value = aws_lambda_alias.live.invoke_arn
}
output "estimated_monthly_cost" {
value = "Min: $${var.min_capacity * 0.5 * 730 * 3600 * 0.0000041667}, Max: $${var.max_capacity * 0.5 * 730 * 3600 * 0.0000041667}"
}
💡 Final Thoughts
Provisioned concurrency isn’t an all-or-nothing decision. The key is using it strategically:
- Calculate your break-even point
- Implement dynamic scaling with Terraform
- Monitor utilization and costs
- Optimize based on real data
With this approach, you can eliminate cold starts during peak hours while avoiding the $500+ monthly surprise on functions that only need it 20% of the time.
Remember: The goal isn’t zero cold starts - it’s optimal cost for your latency requirements.
Have you implemented provisioned concurrency? What’s your biggest cost optimization win with Lambda? Share in the comments! 💬
Found this helpful? Follow me for more AWS cost optimization tips with Terraform! 🚀