Suhas Mallesh
Stop Burning Money: Master AWS Lambda Provisioned Concurrency with Terraform 💸

You’ve probably heard that AWS Lambda provisioned concurrency eliminates cold starts. What they don’t tell you? It can turn your $50 Lambda bill into $500+ overnight if you’re not careful.

Here’s the thing: provisioned concurrency is incredibly powerful, but most developers either avoid it entirely (suffering cold starts) or implement it carelessly (suffering bill shock). There’s a sweet spot in between, and Terraform makes it possible to hit it consistently.

🎯 The Problem: Cold Starts vs. Cost Explosions

Lambda cold starts happen when AWS needs to initialize a new execution environment. For a typical Node.js function, this adds 200-500ms of latency. For Java or .NET? You’re looking at 2-5 seconds.

The traditional solution is provisioned concurrency - keeping execution environments warm and ready. But here’s the catch:

  • On-demand pricing: ~$0.20 per 1M requests + compute time
  • Provisioned concurrency: $0.0000041667 per GB-second (always running) + $0.20 per 1M requests

For a 512MB function with 10 provisioned instances running 24/7:

Monthly cost = 10 instances × 0.5 GB × 730 hours × 3600 seconds × $0.0000041667
             = ~$55 per month

That's ~$55 a month per function just to keep execution environments warm - before you've processed a single request. Multiply it across a fleet of functions (or bump the memory size) and the bill shock is real.
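The always-on figure is easy to sanity-check before committing. Here's a quick sketch (the helper name is mine) using the provisioned concurrency GB-second rate quoted above:

```python
# Sanity-check the always-on provisioned concurrency cost.
GB_SECOND_PRICE = 0.0000041667  # provisioned concurrency rate, per GB-second

def warm_cost_per_month(instances: int, memory_gb: float, hours: float = 730) -> float:
    """Monthly cost of keeping `instances` execution environments warm 24/7."""
    return instances * memory_gb * hours * 3600 * GB_SECOND_PRICE

# 10 × 512MB instances, running all month
print(round(warm_cost_per_month(10, 0.5), 2))  # → 54.75
```

Swap in your own instance count and memory size - the cost scales linearly with both.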

🧠 The Strategy: Dynamic Provisioned Concurrency

The secret is treating provisioned concurrency like autoscaling for EC2 - scale it up when you need it, down when you don’t. Here’s how to do it right with Terraform.

📊 Step 1: Calculate Your Break-Even Point

Not every function needs provisioned concurrency. Use this formula:

Break-even daily invocations = (Provisioned cost per day) / (Cold start cost savings per invocation)

Cold start cost = Cold start duration (s) × Memory (GB) × $0.0000166667

Example: 512MB function, 300ms cold start, 20% of requests are cold starts

Daily provisioned cost (1 instance) = 1 × 0.5 × 86400 × $0.0000041667 = $0.18
Cold start cost per invocation = 0.3 × 0.5 × $0.0000166667 = $0.0000025
Savings per invocation (20% cold) = $0.0000025 × 0.2 = $0.0000005

Break-even = $0.18 / $0.0000005 = 360,000 invocations/day

If this function handles fewer than ~360k requests per day, a single always-on provisioned instance costs more than the cold starts it saves.
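The break-even arithmetic above generalizes to a small calculator (the function name is mine; the rates are the on-demand and provisioned GB-second prices used throughout this post):

```python
# Break-even estimate: at what daily volume does a warm instance pay for itself?
PROVISIONED_PRICE = 0.0000041667  # per GB-second, provisioned concurrency
ON_DEMAND_PRICE = 0.0000166667    # per GB-second, on-demand compute

def break_even_invocations(memory_gb: float, cold_start_s: float,
                           cold_rate: float, instances: int = 1) -> float:
    """Daily invocations needed before provisioned concurrency breaks even."""
    daily_provisioned = instances * memory_gb * 86400 * PROVISIONED_PRICE
    savings_per_invocation = cold_start_s * memory_gb * ON_DEMAND_PRICE * cold_rate
    return daily_provisioned / savings_per_invocation

# 512MB function, 300ms cold start, 20% of requests hit a cold start
print(round(break_even_invocations(0.5, 0.3, 0.2)))  # ≈ 360000 invocations/day
```

Plug in your own cold-start duration and cold-start rate (CloudWatch `InitDuration` logs will give you both) before deciding per function.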

🛠️ Step 2: Implement Smart Scaling with Terraform

Here’s a production-ready Terraform configuration that scales provisioned concurrency based on actual demand:

Basic Lambda Function Setup

# main.tf
resource "aws_lambda_function" "api_handler" {
  filename      = "lambda_function.zip"
  function_name = "api-handler"
  role          = aws_iam_role.lambda_role.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"
  memory_size   = 512
  timeout       = 30

  environment {
    variables = {
      ENVIRONMENT = "production"
    }
  }

  publish = true  # Required for provisioned concurrency
}

# Create an alias for stable endpoint
resource "aws_lambda_alias" "live" {
  name             = "live"
  description      = "Live traffic alias"
  function_name    = aws_lambda_function.api_handler.function_name
  function_version = aws_lambda_function.api_handler.version
}

Provisioned Concurrency with Auto Scaling

# provisioned_concurrency.tf

# Enable provisioned concurrency
resource "aws_lambda_provisioned_concurrency_config" "api_handler" {
  function_name                     = aws_lambda_alias.live.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 2  # Minimum instances
}

# Auto Scaling Target
resource "aws_appautoscaling_target" "lambda_target" {
  max_capacity       = 10  # Maximum instances
  min_capacity       = 2   # Minimum instances (must match provisioned concurrency)
  resource_id        = "function:${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"

  depends_on = [aws_lambda_provisioned_concurrency_config.api_handler]
}

# Scale up when utilization > 70%
resource "aws_appautoscaling_policy" "lambda_scale_up" {
  name               = "lambda-scale-up"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.7  # Target 70% utilization (the metric is a 0-1 fraction)

    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }

    scale_in_cooldown  = 300  # Wait 5 min before scaling down
    scale_out_cooldown = 60   # Wait 1 min before scaling up again
  }
}

Scheduled Scaling for Predictable Workloads

If your traffic follows a pattern (business hours, weekend drops), use scheduled actions:

# scheduled_scaling.tf

# Scale up for business hours (Mon-Fri 8 AM - 6 PM EST)
resource "aws_appautoscaling_scheduled_action" "scale_up_business_hours" {
  name               = "scale-up-business-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension

  schedule = "cron(0 8 ? * MON-FRI *)"  # 8 AM weekdays
  timezone = "America/New_York"         # scheduled-action crons default to UTC

  scalable_target_action {
    min_capacity = 5
    max_capacity = 15
  }
}

# Scale down for nights and weekends
resource "aws_appautoscaling_scheduled_action" "scale_down_off_hours" {
  name               = "scale-down-off-hours"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension

  schedule = "cron(0 18 ? * MON-FRI *)"  # 6 PM weekdays
  timezone = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 3
  }
}

# Weekend minimal capacity
resource "aws_appautoscaling_scheduled_action" "scale_down_weekend" {
  name               = "scale-down-weekend"
  service_namespace  = aws_appautoscaling_target.lambda_target.service_namespace
  resource_id        = aws_appautoscaling_target.lambda_target.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_target.scalable_dimension

  schedule = "cron(0 0 ? * SAT *)"  # Midnight Saturday
  timezone = "America/New_York"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 2
  }
}

IAM Permissions

# iam.tf

resource "aws_iam_role" "lambda_role" {
  name = "api-handler-lambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.lambda_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Application Auto Scaling uses a service-linked role for Lambda provisioned
# concurrency scaling. AWS creates it automatically the first time you register
# a scalable target, so no custom role is required - but you can manage it
# explicitly if you prefer:
resource "aws_iam_service_linked_role" "lambda_autoscaling" {
  aws_service_name = "lambda.application-autoscaling.amazonaws.com"
}

📈 Step 3: Monitor and Optimize

Create CloudWatch dashboards to track your investment:

# monitoring.tf

resource "aws_cloudwatch_metric_alarm" "high_cost_alert" {
  alarm_name          = "lambda-provisioned-concurrency-high-cost"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "ProvisionedConcurrencyUtilization"
  namespace           = "AWS/Lambda"
  period              = "300"
  statistic           = "Average"
  threshold           = "0.3"  # Alert if avg utilization < 30% (metric is 0-1)
  alarm_description   = "Provisioned concurrency may be over-provisioned"
  treat_missing_data  = "notBreaching"

  dimensions = {
    FunctionName = aws_lambda_function.api_handler.function_name
    Resource     = "${aws_lambda_function.api_handler.function_name}:${aws_lambda_alias.live.name}"
  }
}

💰 Real-World Cost Comparison

Let’s say you have an API with these characteristics:

  • Traffic: 50k requests/day during business hours (10 hours), 5k requests overnight
  • Function: 512MB, 200ms average execution, 400ms cold start
  • Cold start rate: 20% without provisioned concurrency

Without Provisioned Concurrency

Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day
Cold start penalty: 55k × 0.2 × 0.4s × 0.5GB × $0.0000166667 = $0.04/day

Total: ~$4/month

With Always-On Provisioned Concurrency (5 instances)

Provisioned cost: 5 × 0.5GB × 86400s × $0.0000041667 = $0.90/day
Compute cost: 55k × 0.2s × 0.5GB × $0.0000166667 = $0.09/day
Request cost: 55k × $0.0000002 = $0.01/day

Total: ~$30/month

With Scheduled Provisioned Concurrency (5 instances @ 10hrs, 1 instance @ 14hrs)

Provisioned cost: [(5 × 0.5 × 36000) + (1 × 0.5 × 50400)] × $0.0000041667 = $0.48/day
Compute cost: $0.09/day
Request cost: $0.01/day

Total: ~$17/month

Savings: $13/month vs always-on, while eliminating cold starts during peak hours. Scale this across 20 functions and you’re saving $3,120/year.
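All three scenarios can be recomputed in one short script (same list prices as above; variable names are mine), which makes it easy to rerun with your own traffic numbers:

```python
# Per-function daily costs for the three provisioning strategies.
PC_PRICE = 0.0000041667       # provisioned concurrency, per GB-second
COMPUTE_PRICE = 0.0000166667  # on-demand compute, per GB-second
REQUEST_PRICE = 0.0000002     # per request ($0.20 per 1M)

REQS, EXEC_S, MEM = 55_000, 0.2, 0.5  # daily requests, avg duration, GB

compute = REQS * EXEC_S * MEM * COMPUTE_PRICE          # ≈ $0.09/day
requests = REQS * REQUEST_PRICE                        # ≈ $0.01/day
cold_penalty = REQS * 0.2 * 0.4 * MEM * COMPUTE_PRICE  # 20% cold, 400ms each

no_pc = compute + requests + cold_penalty
always_on = 5 * MEM * 86_400 * PC_PRICE + compute + requests
# 5 instances for 10 business hours, 1 instance for the other 14
scheduled = (5 * MEM * 36_000 + 1 * MEM * 50_400) * PC_PRICE + compute + requests

for name, daily in [("no PC", no_pc), ("always-on", always_on), ("scheduled", scheduled)]:
    print(f"{name}: ${daily * 30:.2f}/month")
```

The scheduled strategy lands roughly $13/month under always-on for this workload, which is where the fleet-wide savings figure above comes from.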

🎓 Best Practices Checklist

  • Always use an alias - provisioned concurrency attaches to a published version or alias, never $LATEST
  • Start small - begin with min_capacity = 1 or 2 and monitor utilization
  • Set cooldown periods - prevent thrashing (scale_in_cooldown = 300s minimum)
  • Match workload patterns - use scheduled actions for predictable traffic
  • Monitor utilization - alert when average utilization < 40% (you're wasting money)
  • Test scaling behavior - simulate load spikes in staging first
  • Version carefully - deploying new code triggers reprovisioning, so requests hit cold starts until the new version is warm

⚠️ Common Gotchas

1. The “Insufficient Data” Trap
When scaling down to zero, CloudWatch alarms can get stuck in “INSUFFICIENT_DATA” state. Solution: Keep min_capacity = 1 or use scheduled scaling to go to zero during known idle periods.

2. Deployment Delays
Updating a function with provisioned concurrency takes 2-3 minutes to reprovision. For CI/CD, either:

  • Accept the delay
  • Use blue/green deployments with two aliases
  • Temporarily disable provisioned concurrency during deploys

3. Region Limits
Provisioned concurrency counts against your account's regional concurrency limit (1,000 by default), and part of that pool must stay unreserved for on-demand invocations. Request increases through Service Quotas if you need more.

🚀 Quick Start Commands

# Initialize Terraform
terraform init

# Preview changes
terraform plan

# Deploy
terraform apply

# Check provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
  --function-name api-handler \
  --qualifier live

# Monitor scaling activity
aws application-autoscaling describe-scaling-activities \
  --service-namespace lambda \
  --resource-id "function:api-handler:live"

🎯 When to Use Provisioned Concurrency

Great for:

  • Customer-facing APIs with strict latency SLAs (<100ms p99)
  • Functions with >2s cold starts (Java, .NET, large dependencies)
  • Predictable, high-volume traffic patterns
  • Functions processing >100k requests/day

Skip it for:

  • Internal tools and batch processing
  • Low-traffic endpoints (<10k requests/day)
  • Functions with fast cold starts (<200ms)
  • Unpredictable, sporadic workloads

📚 Complete Example Repository

Here’s a complete working example structure:

lambda-provisioned-concurrency/
├── main.tf
├── provisioned_concurrency.tf
├── scheduled_scaling.tf
├── iam.tf
├── monitoring.tf
├── variables.tf
├── outputs.tf
└── lambda_function/
    ├── index.js
    └── package.json

variables.tf:

variable "function_name" {
  description = "Lambda function name"
  type        = string
  default     = "api-handler"
}

variable "min_capacity" {
  description = "Minimum provisioned concurrency instances"
  type        = number
  default     = 2
}

variable "max_capacity" {
  description = "Maximum provisioned concurrency instances"
  type        = number
  default     = 10
}

variable "target_utilization" {
  description = "Target utilization for auto scaling (0-1 fraction)"
  type        = number
  default     = 0.7
}

outputs.tf:

output "function_arn" {
  value = aws_lambda_function.api_handler.arn
}

output "alias_invoke_arn" {
  value = aws_lambda_alias.live.invoke_arn
}

output "estimated_monthly_cost" {
  value = format(
    "Min: $%.2f, Max: $%.2f",
    var.min_capacity * 0.5 * 730 * 3600 * 0.0000041667,
    var.max_capacity * 0.5 * 730 * 3600 * 0.0000041667,
  )
}

💡 Final Thoughts

Provisioned concurrency isn’t an all-or-nothing decision. The key is using it strategically:

  1. Calculate your break-even point
  2. Implement dynamic scaling with Terraform
  3. Monitor utilization and costs
  4. Optimize based on real data

With this approach, you can eliminate cold starts during peak hours while avoiding the $500+ monthly surprise on functions that only need it 20% of the time.

Remember: The goal isn’t zero cold starts - it’s optimal cost for your latency requirements.


Have you implemented provisioned concurrency? What’s your biggest cost optimization win with Lambda? Share in the comments! 💬

Found this helpful? Follow me for more AWS cost optimization tips with Terraform! 🚀
