You’ve probably heard that AWS Lambda provisioned concurrency eliminates cold starts. What they don’t tell you? It can turn your $50 Lambda bill into $500+ overnight if you’re not careful.
Here’s the thing: provisioned concurrency is incredibly powerful, but most developers either avoid it entirely (suffering cold starts) or implement it carelessly (suffering bill shock). There’s a sweet spot in between, and Terraform makes it possible to hit it consistently.
🎯 The Problem: Cold Starts vs. Cost Explosions
Lambda cold starts happen when AWS needs to initialize a new execution environment. For a typical Node.js function, this adds 200-500ms of latency. For Java or .NET? You’re looking at 2-5 seconds.
The traditional solution is provisioned concurrency - keeping execution environments warm and ready. But here’s the catch:
- On-demand: ~$0.20 per 1M requests + duration at $0.0000166667 per GB-second
- Provisioned concurrency: $0.0000041667 per GB-second just to keep capacity warm (billed whether or not it serves traffic) + $0.20 per 1M requests + duration at a reduced $0.0000097222 per GB-second (us-east-1 prices)
For a 512MB function with 10 provisioned instances running 24/7:
Monthly cost = 10 instances × 0.5 GB × 730 hours × 3600 seconds × $0.0000041667
≈ $55 per month
Just to keep one function warm, before you've processed a single request. Bump the memory to a few GB, raise the instance count, or repeat this across a dozen functions, and you're in $500+ territory.
🧠 The Strategy: Dynamic Provisioned Concurrency
The secret is treating provisioned concurrency like autoscaling for EC2 - scale it up when you need it, down when you don’t. Here’s how to do it right with Terraform.
📊 Step 1: Calculate Your Break-Even Point
Not every function needs provisioned concurrency. Use this formula:
Break-even daily invocations = (Provisioned cost per day) / (Cold start cost savings per invocation)
Cold start cost = Cold start duration (s) × Memory (GB) × $0.0000166667
Example: 512MB function, 300ms cold start, 20% of requests are cold starts
Daily provisioned cost (1 instance) = 1 × 0.5 × 86400 × $0.0000041667 = $0.18
Cold start cost per invocation = 0.3 × 0.5 × $0.0000166667 = $0.0000025
Savings per invocation (20% cold) = $0.0000025 × 0.2 = $0.0000005
Break-even = $0.18 / $0.0000005 = 360,000 invocations/day
If you're processing fewer than 360k requests per day, provisioned concurrency on this function costs you money - you're buying latency, not saving on compute.
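If you'd rather not redo this math by hand for every function, here's a minimal sketch of the same formula as Terraform locals. The file name and local names are illustrative, and the prices assume us-east-1:
# break_even.tf (illustrative) - the break-even formula as locals
locals {
  memory_gb           = 0.5
  cold_start_seconds  = 0.3
  cold_start_rate     = 0.2           # fraction of requests that hit a cold start
  pc_price_per_gb_s   = 0.0000041667  # provisioned concurrency, us-east-1
  duration_price_gb_s = 0.0000166667  # on-demand duration, us-east-1

  # Cost of keeping one instance warm for a day
  daily_provisioned_cost = 1 * local.memory_gb * 86400 * local.pc_price_per_gb_s

  # Billed compute saved per invocation by avoiding cold starts
  savings_per_invocation = local.cold_start_seconds * local.memory_gb * local.duration_price_gb_s * local.cold_start_rate
}

output "break_even_daily_invocations" {
  # ≈ 360,000 with the defaults above
  value = local.daily_provisioned_cost / local.savings_per_invocation
}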
🛠️ Step 2: Implement Smart Scaling with Terraform
Here’s a production-ready Terraform configuration that scales provisioned concurrency based on actual demand:
Basic Lambda Function Setup
# main.tf
resource "aws_lambda_function" "api_handler" {
filename = "lambda_function.zip"
function_name = "api-handler"
role = aws_iam_role.lambda_role.arn
handler = "index.handler"
runtime = "nodejs20.x"
memory_size = 512
timeout = 30
environment {
variables = {
ENVIRONMENT = "production"
}
}
publish = true # Required for provisioned concurrency
}
# Create an alias for stable endpoint
resource "aws_lambda_alias" "live" {
name = "live"
description = "Live traffic alias"
function_name = aws_lambda_function.api_handler.function_name
function_version = aws_lambda_function.api_handler.version
}
Provisioned Concurrency with Auto Scaling
# provisioned_concurrency.tf
# Enable provisioned concurrency on the alias
resource "aws_lambda_provisioned_concurrency_config" "api_handler" {
  function_name                     = aws_lambda_alias.live.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 2 # Minimum instances

  # Auto scaling adjusts this value at runtime; ignore it here so that
  # every terraform apply doesn't reset the scaler's decision
  lifecycle {
    ignore_changes = [provisioned_concurrent_executions]
  }
}
# Auto Scaling target
resource "aws_appautoscaling_target" "lambda_target" {
  max_capacity       = 10 # Maximum instances
  min_capacity       = 2  # Minimum instances (must match provisioned concurrency)
  resource_id        = "function:${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"

  depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}
# Target tracking: add capacity when utilization climbs above 70%,
# remove it when utilization falls back
resource "aws_appautoscaling_policy" "lambda_scale_up" {
  name               = "lambda-scale-up"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.7 # The utilization metric is a 0-1 ratio, so 0.7 = 70%

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }

    scale_in_cooldown  = 300 # Wait 5 min before scaling down
    scale_out_cooldown = 60  # Wait 1 min before scaling up again
  }
}
Scheduled Scaling for Predictable Workloads
If your traffic follows a pattern (business hours, weekend drops), use scheduled actions:
# scheduled_scaling.tf
# Scale up for business hours (Mon-Fri 8 AM - 6 PM US Eastern)
resource "aws_appautoscaling_scheduled_action" "scale_up_business_hours" {
  name               = "scale-up-business-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)" # 8 AM weekdays
  timezone           = "America/New_York"        # Schedules default to UTC without this

  scalable_target_action {
    min_capacity = 5
    max_capacity = 15
  }
}
# Scale down for weekday nights
resource "aws_appautoscaling_scheduled_action" "scale_down_off_hours" {
  name               = "scale-down-off-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 18 ? * MON-FRI *)" # 6 PM weekdays
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 3
  }
}
# Weekend minimal capacity
resource "aws_appautoscaling_scheduled_action" "scale_down_weekend" {
  name               = "scale-down-weekend"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = "cron(0 0 ? * SAT *)" # Midnight Saturday
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 2
  }
}
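Three near-identical resources are a hint to refactor. Here's a hedged sketch of the same schedules driven by a single map and for_each - the schedules local and the resource label are illustrative names, not a prescribed pattern:
# One resource block, many schedules
locals {
  schedules = {
    "business-hours" = { cron = "cron(0 8 ? * MON-FRI *)", min = 5, max = 15 }
    "off-hours"      = { cron = "cron(0 18 ? * MON-FRI *)", min = 1, max = 3 }
    "weekend"        = { cron = "cron(0 0 ? * SAT *)", min = 1, max = 2 }
  }
}

resource "aws_appautoscaling_scheduled_action" "schedule" {
  for_each = local.schedules

  name               = "lambda-pc-${each.key}"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  schedule           = each.value.cron
  timezone           = "America/New_York"

  scalable_target_action {
    min_capacity = each.value.min
    max_capacity = each.value.max
  }
}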
IAM Permissions
# iam.tf
resource "aws_iam_role" "lambda_role" {
name = "api-handler-lambda-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
role = aws_iam_role.lambda_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
# Application Auto Scaling doesn't need a hand-rolled IAM role for Lambda.
# It uses a service-linked role that AWS creates automatically the first
# time you register a Lambda scalable target. To make that explicit in
# Terraform (import it first if it already exists in your account):
resource "aws_iam_service_linked_role" "lambda_autoscaling" {
  aws_service_name = "lambda.application-autoscaling.amazonaws.com"
}
📈 Step 3: Monitor and Optimize
Create CloudWatch alarms and dashboards to track your investment. Start with an alarm that flags idle capacity you're paying for:
# monitoring.tf
resource "aws_cloudwatch_metric_alarm" "high_cost_alert" {
alarm_name = "lambda-provisioned-concurrency-high-cost"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "ProvisionedConcurrencyUtilization"
namespace = "AWS/Lambda"
period = "300"
statistic = "Average"
threshold = "30" # Alert if avg utilization < 30%
alarm_description = "Provisioned concurrency may be over-provisioned"
treat_missing_data = "notBreaching"
dimensions = {
FunctionName = aws_lambda_function.api_handler.function_name
Resource = "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
}
}
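To go with the alarm, here's a minimal one-widget dashboard sketch. The dashboard name, widget layout, and us-east-1 region are assumptions - adjust to taste:
resource "aws_cloudwatch_dashboard" "provisioned_concurrency" {
  dashboard_name = "lambda-provisioned-concurrency"

  dashboard_body = jsonencode({
    widgets = [{
      type   = "metric"
      x      = 0
      y      = 0
      width  = 12
      height = 6
      properties = {
        title  = "Provisioned Concurrency Utilization (0-1)"
        region = "us-east-1"
        stat   = "Average"
        period = 300
        metrics = [[
          "AWS/Lambda", "ProvisionedConcurrencyUtilization",
          "FunctionName", aws_lambda_function.api_handler.function_name,
          "Resource", "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
        ]]
      }
    }]
  })
}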
💰 Real-World Cost Comparison
Let’s say you have an API with these characteristics:
- Traffic: 50k requests/day during business hours (10 hours), 5k requests overnight
- Function: 512MB, 200ms average execution, 400ms cold start
- Cold start rate: 20% without provisioned concurrency
Without Provisioned Concurrency
Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day
Cold start penalty: 55k × 0.2 × 0.4s × 0.5GB × $0.0000166667 = $0.04/day
Total: ~$4/month (plus the latency your users eat on every cold start)
With Always-On Provisioned Concurrency (5 instances)
Provisioned cost: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day
Compute cost: ~$0.09/day (a bit less in practice: duration on provisioned capacity bills at a lower rate)
Request cost: $0.01/day
Total: ~$30/month
With Scheduled Provisioned Concurrency (5 instances @ 10hrs, 1 instance @ 14hrs)
Provisioned cost: ((5 × 0.5 × 36000) + (1 × 0.5 × 50400)) × $0.0000041667 = $0.48/day
Compute cost: ~$0.09/day
Request cost: $0.01/day
Total: ~$18/month
Savings: ~$12/month vs. always-on, while still eliminating cold starts during peak hours. Scale this across 20 functions and you're saving roughly $2,900/year. Notice that both options cost more than running cold (~$4/month): this function sits well under the 360k/day break-even, so you're explicitly buying latency, not saving money - a trade that can still be worth it for a customer-facing API.
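If you'd rather have Terraform compute these scenario numbers than a napkin, here's a small sketch using the same simplified model as above (names illustrative, us-east-1 pricing):
# Daily provisioned-concurrency cost: always-on vs. scheduled
locals {
  mem_gb  = 0.5
  pc_rate = 0.0000041667 # $/GB-second

  # Scheduled plan: 5 instances for 10 hours, 1 instance for 14 hours
  scheduled_gb_s = (5 * local.mem_gb * 10 * 3600) + (1 * local.mem_gb * 14 * 3600)

  always_on_daily = 5 * local.mem_gb * 86400 * local.pc_rate # ≈ $0.90
  scheduled_daily = local.scheduled_gb_s * local.pc_rate     # ≈ $0.48
}

output "daily_provisioned_cost" {
  value = {
    always_on = local.always_on_daily
    scheduled = local.scheduled_daily
  }
}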
🎓 Best Practices Checklist
✅ Always use an alias - Provisioned concurrency requires versioning
✅ Start small - Begin with min_capacity = 1 or 2, monitor utilization
✅ Set cooldown periods - Prevent thrashing (scale_in_cooldown = 300s minimum)
✅ Match workload patterns - Use scheduled actions for predictable traffic
✅ Monitor utilization - Alert when average utilization < 40% (you’re wasting money)
✅ Test scaling behavior - Simulate load spikes in staging first
✅ Version carefully - Updating function code triggers reprovisioning; requests fall back to on-demand (cold starts) while the new version warms
⚠️ Common Gotchas
1. The “Insufficient Data” Trap
When capacity scales way down and traffic stops, the target-tracking CloudWatch alarms can get stuck in "INSUFFICIENT_DATA" and stop reacting. Solution: keep min_capacity at 1 rather than chasing zero, and lean on scheduled actions for known idle periods.
2. Deployment Delays
Updating a function with provisioned concurrency takes 2-3 minutes to reprovision. For CI/CD, either:
- Accept the delay
- Use blue/green deployments with two aliases (sketched below)
- Temporarily disable provisioned concurrency during deploys
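For the blue/green option, here's a hedged sketch of what two aliases might look like. The blue/green names and the var.previous_version variable are hypothetical, and the cutover itself (repointing your API Gateway integration, or using the alias routing_config for weighted traffic) is left to your deploy pipeline:
# Deploy to the idle alias, let it warm, then shift traffic
resource "aws_lambda_alias" "blue" {
  name             = "blue"
  function_name    = aws_lambda_function.api_handler.function_name
  function_version = aws_lambda_function.api_handler.version
}

resource "aws_lambda_alias" "green" {
  name             = "green"
  function_name    = aws_lambda_function.api_handler.function_name
  function_version = var.previous_version # hypothetical: tracks the prior version
}

# Each alias carries its own warm pool, so the incoming alias is already
# provisioned before any traffic moves to it
resource "aws_lambda_provisioned_concurrency_config" "blue" {
  function_name                     = aws_lambda_function.api_handler.function_name
  qualifier                         = aws_lambda_alias.blue.name
  provisioned_concurrent_executions = 2
}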
3. Region Limits
Provisioned concurrency draws from your account's regional concurrency quota (1,000 by default), and at least 100 of that must stay unreserved for on-demand invocations. Request a quota increase through Service Quotas or AWS Support if you need more.
🚀 Quick Start Commands
# Initialize Terraform
terraform init
# Preview changes
terraform plan
# Deploy
terraform apply
# Check provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
  --function-name api-handler \
  --qualifier live
# Monitor scaling activity
aws application-autoscaling describe-scaling-activities \
  --service-namespace lambda \
  --resource-id "function:api-handler:live"
🎯 When to Use Provisioned Concurrency
Great for:
- Customer-facing APIs with strict latency SLAs (<100ms p99)
- Functions with >2s cold starts (Java, .NET, large dependencies)
- Predictable, high-volume traffic patterns
- Functions processing >100k requests/day
Skip it for:
- Internal tools and batch processing
- Low-traffic endpoints (<10k requests/day)
- Functions with fast cold starts (<200ms)
- Unpredictable, sporadic workloads
📚 Complete Example Repository
Here’s a complete working example structure:
lambda-provisioned-concurrency/
├── main.tf
├── provisioned_concurrency.tf
├── scheduled_scaling.tf
├── iam.tf
├── monitoring.tf
├── variables.tf
├── outputs.tf
└── lambda_function/
├── index.js
└── package.json
variables.tf:
variable "function_name" {
description = "Lambda function name"
type = string
default = "api-handler"
}
variable "min_capacity" {
description = "Minimum provisioned concurrency instances"
type = number
default = 2
}
variable "max_capacity" {
description = "Maximum provisioned concurrency instances"
type = number
default = 10
}
variable "target_utilization" {
description = "Target utilization percentage for auto scaling"
type = number
default = 70
}
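These variables aren't wired into the earlier snippets, which hardcode values for readability. Hooking them up is mechanical - for example, the scaling target becomes:
# provisioned_concurrency.tf, variable-driven (replaces the hardcoded version)
resource "aws_appautoscaling_target" "lambda_target" {
  max_capacity       = var.max_capacity
  min_capacity       = var.min_capacity
  resource_id        = "function:${var.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"

  depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}
Likewise, target_value = var.target_utilization replaces the literal 0.7 in the scaling policy.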
outputs.tf:
output "function_arn" {
value = aws_lambda_function.api_handler.arn
}
output "function_url" {
value = aws_lambda_alias.live.invoke_arn
}
output "estimated_monthly_cost" {
value = "Min: $${var.min_capacity * 0.5 * 730 * 3600 * 0.0000041667}, Max: $${var.max_capacity * 0.5 * 730 * 3600 * 0.0000041667}"
}
💡 Final Thoughts
Provisioned concurrency isn’t an all-or-nothing decision. The key is using it strategically:
- Calculate your break-even point
- Implement dynamic scaling with Terraform
- Monitor utilization and costs
- Optimize based on real data
With this approach, you can eliminate cold starts during peak hours while avoiding the $500+ monthly surprise on functions that only need it 20% of the time.
Remember: The goal isn’t zero cold starts - it’s optimal cost for your latency requirements.
Have you implemented provisioned concurrency? What’s your biggest cost optimization win with Lambda? Share in the comments! 💬
Found this helpful? Follow me for more AWS cost optimization tips with Terraform! 🚀