CloudWatch logs set to “Never Expire” can cost thousands. Here’s how to automate retention policies with Terraform and slash your log storage costs by up to 90%.
Pop quiz: When was the last time you checked your CloudWatch Logs retention settings?
If you’re like most AWS users, the answer is “never,” because the default retention is “Never Expire.”
Here’s what that means for your wallet:
Month 1: $10 in logs
Month 6: $60 in logs
Month 12: $120 in logs
Month 24: $240 in logs
Your logs are growing indefinitely. And you’re paying $0.03/GB per month for storage you probably never look at.
Let me show you how to fix this in 10 minutes with Terraform and save 80-90% on CloudWatch storage costs.
💸 The Hidden Cost of “Never Expire”
CloudWatch Logs pricing is deceptively simple:
- Ingestion: $0.50 per GB
- Storage: $0.03 per GB per month
- Analysis: $0.005 per GB scanned
A typical production app generates 10-50 GB of logs per month. Let’s say you’re at 20 GB/month:
Year 1 accumulation:
Month 1: 20 GB × $0.03 = $0.60
Month 2: 40 GB × $0.03 = $1.20
Month 3: 60 GB × $0.03 = $1.80
...
Month 12: 240 GB × $0.03 = $7.20
Total Year 1: $46.80 (storage alone)
Year 2:
Starting: 240 GB
Ending: 480 GB × $0.03 = $14.40/month
Total Year 2: $133.20
Year 3: $219.60
Year 4: $306.00
After 4 years you’re storing 960 GB and paying nearly $29/month just for logs you’ll never read. Multiply that across 50 similarly busy log groups and you’re over $1,400/month.
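If you want to sanity-check that arithmetic yourself, here’s a small Python sketch using the same assumptions as above (constant 20 GB/month ingestion, $0.03/GB-month storage):

```python
# Cumulative CloudWatch storage cost with no retention policy.
# Assumptions: constant 20 GB/month ingestion, $0.03 per GB-month storage.
GB_PER_MONTH = 20
STORAGE_PRICE = 0.03  # $ per GB per month

def yearly_storage_cost(year: int) -> float:
    """Total storage cost billed during the given year (1-indexed)."""
    months = range((year - 1) * 12 + 1, year * 12 + 1)
    # In month m you are storing m * GB_PER_MONTH gigabytes.
    return round(sum(m * GB_PER_MONTH * STORAGE_PRICE for m in months), 2)

for y in range(1, 5):
    print(f"Year {y}: ${yearly_storage_cost(y):.2f}")
# Year 1: $46.80, Year 2: $133.20, Year 3: $219.60, Year 4: $306.00
```

The cost keeps climbing every year because the stored volume never stops growing.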
🎯 The Solution: Smart Retention Policies
The fix is ridiculously simple: Set retention policies based on log importance.
Here’s a sensible default strategy:
| Log Type | Retention | Reasoning |
|---|---|---|
| Production errors | 90 days | Compliance & debugging |
| Application logs | 30 days | Recent troubleshooting |
| Access logs | 14 days | Security reviews |
| Debug/verbose logs | 7 days | Active development only |
| Lambda logs | 14 days | Quick investigations |
🛠️ Terraform Implementation
Basic Retention Setup
```hcl
# cloudwatch_logs.tf

# Production application logs
resource "aws_cloudwatch_log_group" "app_production" {
  name              = "/aws/application/production"
  retention_in_days = 30

  tags = {
    Environment = "production"
    Application = "web-app"
  }
}

# Lambda function logs
resource "aws_cloudwatch_log_group" "lambda_api" {
  name              = "/aws/lambda/api-handler"
  retention_in_days = 14

  tags = {
    Environment = "production"
    Function    = "api-handler"
  }
}

# Development logs (shorter retention)
resource "aws_cloudwatch_log_group" "app_dev" {
  name              = "/aws/application/dev"
  retention_in_days = 7

  tags = {
    Environment = "dev"
  }
}
```
Bulk Retention Manager Module
For existing log groups, here’s a module that discovers every group and assigns a retention policy to each. One caveat: Terraform can only manage resources in its state, so pre-existing log groups must be imported first (via `terraform import` or Terraform 1.5+ `import` blocks); otherwise the apply will fail because the groups already exist.
```hcl
# modules/cloudwatch-retention-manager/main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "default_retention_days" {
  description = "Default retention in days for all log groups"
  type        = number
  default     = 30
}

variable "retention_rules" {
  description = "Map of log group regex patterns to retention days"
  type = map(object({
    pattern        = string
    retention_days = number
  }))
  # Patterns are regular expressions (evaluated with regex() below),
  # not shell globs. Rules are checked in lexical key order.
  default = {
    production = {
      pattern        = "/aws/.+/production"
      retention_days = 90
    }
    lambda = {
      pattern        = "/aws/lambda/"
      retention_days = 14
    }
    dev = {
      pattern        = "/aws/.+/dev"
      retention_days = 7
    }
  }
}

variable "exclude_patterns" {
  description = "Log groups matching these regex patterns won't be modified"
  type        = list(string)
  default     = ["^/aws/rds/", "^/aws/audit/"] # Keep RDS and audit logs longer
}

# Data source listing all log group names in the account/region
data "aws_cloudwatch_log_groups" "all" {}

locals {
  # Filter out log groups matching any exclusion pattern
  log_groups_to_manage = [
    for lg in data.aws_cloudwatch_log_groups.all.log_group_names :
    lg if !anytrue([for pattern in var.exclude_patterns : can(regex(pattern, lg))])
  ]

  # Map each log group to the first matching rule's retention,
  # falling back to the default when no rule matches
  retention_map = {
    for lg in local.log_groups_to_manage :
    lg => try(
      [for k, v in var.retention_rules : v.retention_days if can(regex(v.pattern, lg))][0],
      var.default_retention_days
    )
  }
}

# Apply the retention policy to each log group.
# NOTE: pre-existing groups must be imported into state first.
resource "aws_cloudwatch_log_group" "managed" {
  for_each = local.retention_map

  name              = each.key
  retention_in_days = each.value

  # Prevent accidental deletion of existing log groups
  lifecycle {
    prevent_destroy = true
  }
}

# Output a summary of what was applied
output "estimated_savings" {
  value = {
    log_groups_managed = length(local.retention_map)
    retention_policies = local.retention_map
    message            = "Retention policies applied. Check AWS Cost Explorer in 30 days to see savings."
  }
}
```
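The rule-matching in `locals` is the easiest part to get wrong. Here’s a hypothetical Python equivalent of the same first-match-wins regex lookup, handy for sanity-checking which retention a given group would receive (the rules and group names below are illustrative, not from the module):

```python
import re

# Mirrors the module's retention_rules (patterns treated as regexes).
RETENTION_RULES = [
    ("/aws/lambda/", 14),
    ("/aws/.+/production", 90),
    ("/aws/.+/dev", 7),
]
DEFAULT_RETENTION = 30

def resolve_retention(log_group: str) -> int:
    """Return retention days from the first matching rule, else the default."""
    for pattern, days in RETENTION_RULES:
        if re.search(pattern, log_group):
            return days
    return DEFAULT_RETENTION

print(resolve_retention("/aws/lambda/api-handler"))  # 14
print(resolve_retention("/aws/app/production/web"))  # 90
print(resolve_retention("/aws/ecs/cluster-logs"))    # 30 (default)
```

Because the first match wins, order your rules from most specific to least specific.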
Usage Example
```hcl
# main.tf

module "cloudwatch_retention" {
  source = "./modules/cloudwatch-retention-manager"

  default_retention_days = 30

  # Patterns are regexes; the "$" anchor keeps production_app
  # from shadowing production_errors
  retention_rules = {
    production_errors = {
      pattern        = "/aws/.+/production/errors"
      retention_days = 90
    }
    production_app = {
      pattern        = "/aws/.+/production$"
      retention_days = 30
    }
    lambda = {
      pattern        = "/aws/lambda/"
      retention_days = 14
    }
    dev = {
      pattern        = "/dev/"
      retention_days = 7
    }
    staging = {
      pattern        = "/staging/"
      retention_days = 14
    }
  }

  exclude_patterns = [
    "^/aws/rds/instance/production-db/audit", # Compliance requirement
    "^/aws/cloudtrail"                        # Keep CloudTrail longer
  ]
}

output "retention_summary" {
  value = module.cloudwatch_retention.estimated_savings
}
```
Apply and Monitor
```shell
# Preview changes
terraform plan

# Apply retention policies
terraform apply

# Example output:
#   log_groups_managed = 47
#   Retention policies applied to 47 log groups
```
🔍 Find Your Biggest Offenders
Before applying retention policies, identify which log groups are costing you the most:
```shell
# List all log groups without a retention policy, with their sizes
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
  --output table

# Estimate their monthly storage cost ($0.03/GB)
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].storedBytes' \
  --output json | jq '[.[] / 1073741824] | add * 0.03'
```
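If you’d rather not chain jq filters, the same calculation in Python looks like this (the `stored_bytes` values are hypothetical stand-ins for the CLI output):

```python
# Monthly storage cost for log groups without retention.
# stored_bytes would come from `aws logs describe-log-groups`.
STORAGE_PRICE = 0.03  # $ per GB per month
GIB = 1024 ** 3       # storedBytes is in bytes; 1 GiB = 1073741824

stored_bytes = [5 * GIB, 12 * GIB, 3 * GIB]  # example values
total_gb = sum(stored_bytes) / GIB
monthly_cost = total_gb * STORAGE_PRICE
print(f"{total_gb:.1f} GB -> ${monthly_cost:.2f}/month")  # 20.0 GB -> $0.60/month
```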
Add this as a Terraform data source:
```hcl
# audit.tf

data "external" "cloudwatch_costs" {
  program = ["bash", "-c", <<-EOT
    aws logs describe-log-groups \
      --query 'logGroups[?retentionInDays==`null`]' \
      --output json | jq '{
        count: (length | tostring),
        total_gb: (([.[].storedBytes | select(. != null)] | add // 0) / 1073741824 | tostring),
        monthly_cost: (([.[].storedBytes | select(. != null)] | add // 0) / 1073741824 * 0.03 | tostring)
      }'
  EOT
  ]
}

output "current_cloudwatch_waste" {
  value = {
    log_groups_without_retention = data.external.cloudwatch_costs.result.count
    total_storage_gb             = data.external.cloudwatch_costs.result.total_gb
    # format() is needed here: a bare "$${...}" is Terraform's escape
    # sequence and would print a literal ${...} string
    estimated_monthly_cost       = format("$%s", data.external.cloudwatch_costs.result.monthly_cost)
  }
}
```
📊 Advanced: Dynamic Retention Based on Environment
```hcl
# dynamic_retention.tf

locals {
  environments = {
    production = 90
    staging    = 30
    dev        = 7
  }

  log_group_configs = {
    for env, retention in local.environments : env => {
      api_logs = {
        name      = "/aws/api/${env}"
        retention = retention
      }
      app_logs = {
        name      = "/aws/application/${env}"
        retention = retention
      }
      worker_logs = {
        name      = "/aws/worker/${env}"
        retention = retention
      }
    }
  }

  # Flatten the nested env -> service map into individual log groups
  all_log_groups = merge([
    for env, configs in local.log_group_configs : {
      for service, config in configs :
      "${env}-${service}" => config
    }
  ]...)
}

resource "aws_cloudwatch_log_group" "dynamic" {
  for_each = local.all_log_groups

  name              = each.value.name
  retention_in_days = each.value.retention

  tags = {
    ManagedBy   = "terraform"
    Environment = split("-", each.key)[0]
  }
}
```
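The `merge([ ... ]...)` flattening is the least obvious part of that pattern. This hypothetical Python version shows the same environment × service expansion in one comprehension:

```python
# Expand environments x services into one flat map of log group configs,
# mirroring the Terraform merge([ for ... ]...) pattern above.
ENVIRONMENTS = {"production": 90, "staging": 30, "dev": 7}
SERVICES = ["api", "application", "worker"]

all_log_groups = {
    f"{env}-{svc}_logs": {"name": f"/aws/{svc}/{env}", "retention": days}
    for env, days in ENVIRONMENTS.items()
    for svc in SERVICES
}

print(len(all_log_groups))                    # 9
print(all_log_groups["production-api_logs"])  # {'name': '/aws/api/production', 'retention': 90}
```

Three environments times three services yields nine log groups, each inheriting its environment’s retention.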
💰 Real Savings Example
Before retention policies:
- 50 log groups
- Average 5 GB per group after 1 year
- Total: 250 GB × $0.03 = $7.50/month
- After 3 years: 750 GB × $0.03 = $22.50/month
After implementing 30-day retention:
- 50 log groups
- Average 1.5 GB per group (30 days of data)
- Total: 75 GB × $0.03 = $2.25/month
- Savings: $5.25/month → $63/year
- After 3 years: Still $2.25/month (savings of $243/year)
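A quick check of those numbers (same assumptions as above: 50 groups, $0.03/GB-month, ~15 GB per group by year 3 without retention, ~1.5 GB with 30-day retention):

```python
PRICE = 0.03  # $ per GB per month
GROUPS = 50

before_gb = GROUPS * 15.0  # no retention: ~15 GB per group by year 3
after_gb = GROUPS * 1.5    # 30-day retention caps each group at ~1.5 GB

before = before_gb * PRICE
after = after_gb * PRICE
annual_savings = (before - after) * 12
print(f"${before:.2f}/mo -> ${after:.2f}/mo, saving ${annual_savings:.2f}/yr")
# $22.50/mo -> $2.25/mo, saving $243.00/yr
```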
For a mid-size company with 200 log groups:
- Savings: ~$1,000/year 🎉
⚠️ Important Considerations
1. Compliance Requirements
Some logs must be kept for regulatory reasons:
```hcl
# compliance.tf

resource "aws_cloudwatch_log_group" "audit_logs" {
  name = "/aws/audit/production"
  # CloudWatch only accepts specific retention values; 2557 days is the
  # closest option to 7 years (for SOX/HIPAA-style requirements)
  retention_in_days = 2557

  tags = {
    Compliance = "required"
    Retention  = "7-years"
  }
}
```
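One gotcha: `retention_in_days` only accepts a fixed set of values; anything else is rejected by the API. Here’s a small checker (the value list is taken from the CloudWatch Logs `PutRetentionPolicy` documentation as I know it; worth re-verifying against current docs):

```python
# Valid values for CloudWatch Logs retention (PutRetentionPolicy), in days.
VALID_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}

def closest_valid(days: int) -> int:
    """Return the nearest accepted retention value for a desired duration."""
    return min(VALID_RETENTION_DAYS, key=lambda v: abs(v - days))

print(closest_valid(2555))  # 2557 -- "7 years" must be expressed as 2557
```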
2. Lambda Log Groups Auto-Creation
Lambda creates its log group automatically on first invocation, with no retention set. Prevent this by defining the log group in Terraform before the function exists:

```hcl
resource "aws_cloudwatch_log_group" "lambda_api" {
  name              = "/aws/lambda/${var.function_name}"
  retention_in_days = 14
  # Created FIRST so Lambda doesn't create it without retention
}

resource "aws_lambda_function" "api" {
  # ... other config ...

  # Ensure the log group exists before the function does
  depends_on = [aws_cloudwatch_log_group.lambda_api]
}
```
3. Existing Data Isn’t Deleted Immediately
Setting a retention policy doesn’t purge old data instantly. AWS deletes expired log events asynchronously, typically within a few days.
🎓 Quick Implementation Checklist
✅ Audit current log groups - Find groups without retention
✅ Categorize by importance - Production vs dev vs debug
✅ Set retention policies - 7/14/30/90 days based on category
✅ Handle Lambda logs - Create log groups before functions
✅ Document compliance needs - Don’t auto-expire audit logs
✅ Monitor savings - Check Cost Explorer after 30 days
🚀 5-Minute Quick Start
```shell
# 1. Check your current waste
terraform init
terraform apply -target=data.external.cloudwatch_costs

# 2. Apply the retention module
terraform apply

# 3. Verify the policies
aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,retentionInDays]' \
  --output table

# 4. Celebrate! 🎉
```
💡 Pro Tips
1. Start with dev/staging
Apply aggressive retention (7 days) to non-production first. Production can stay at 30-90 days.
2. Use log exports for long-term storage
If you need logs beyond the retention period, stream them to S3 (much cheaper):

```hcl
resource "aws_cloudwatch_log_subscription_filter" "export_to_s3" {
  name            = "export-old-logs"
  log_group_name  = aws_cloudwatch_log_group.app_production.name
  filter_pattern  = "" # empty pattern = forward everything
  destination_arn = aws_kinesis_firehose_delivery_stream.logs_to_s3.arn
  # IAM role allowing CloudWatch Logs to write to Firehose (defined elsewhere);
  # required for Kinesis/Firehose destinations
  role_arn = aws_iam_role.cwl_to_firehose.arn
}
```
S3 storage: $0.023/GB vs CloudWatch: $0.03/GB (23% cheaper + Glacier options)
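The percentage works out like this (prices as quoted above; both vary by region and storage class):

```python
CLOUDWATCH = 0.03    # $ per GB-month, CloudWatch Logs storage
S3_STANDARD = 0.023  # $ per GB-month, S3 Standard
savings_pct = (CLOUDWATCH - S3_STANDARD) / CLOUDWATCH * 100
print(f"{savings_pct:.0f}%")  # 23%
```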
3. Set up alerts for high ingestion
Catch runaway logging before it costs you:
```hcl
resource "aws_cloudwatch_metric_alarm" "high_log_ingestion" {
  alarm_name          = "high-cloudwatch-ingestion"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "IncomingBytes"
  namespace           = "AWS/Logs"
  period              = 3600
  statistic           = "Sum"
  threshold           = 10737418240 # 10 GiB per hour
  alarm_description   = "Alert when log ingestion exceeds 10GB/hour"

  # IncomingBytes is published per log group -- scope the alarm to one
  dimensions = {
    LogGroupName = aws_cloudwatch_log_group.app_production.name
  }
  # Add alarm_actions (e.g. an SNS topic ARN) to actually get notified
}
```
🎯 When This Makes the Biggest Impact
This optimization shines when you have:
- Many Lambda functions (each creates a log group)
- Multiple environments (dev/staging/prod all logging)
- Verbose application logging (debug logs in production 😱)
- Long-running workloads (logs accumulating for years)
- Microservices architecture (100+ services = 100+ log groups)
📈 Summary: Why This Matters
CloudWatch Logs retention is one of those “set it and forget it” optimizations:
✅ One-time setup - 10 minutes with Terraform
✅ Automatic savings - Every month, forever
✅ Zero operational impact - Logs you need are kept, old ones purged
✅ Scales with your infrastructure - More log groups = more savings
✅ Compound benefits - Savings grow over time as log accumulation stops
The math is simple: Stop paying to store logs you’ll never read.
Set retention policies today, thank yourself every month. 💰
How much are you spending on CloudWatch Logs? Run the audit script and share in the comments! 💬
Follow for more AWS cost optimization tips with Terraform! 🚀