How to Control CloudWatch Logs Costs on ECS

Your AWS bill shows CloudWatch at $400 this month. You have 15 ECS services logging INFO-level with retention set to Never Expire. You didn't configure this — ECS did it by default.

The fix takes 4 steps.

Why CloudWatch silently eats your bill

Task definitions created through the ECS console use the awslogs log driver by default, so every container's stdout and stderr goes to CloudWatch. Log groups that ECS creates automatically have no retention policy (Never Expire), so logs accumulate forever.

Here's what that looks like for a typical 15-service fleet:

Cost component	15 services, INFO level, 3 GB/day
Ingestion ($0.50/GB)	$45/mo
Storage ($0.03/GB/month)	$54/mo (grows every month)
Insights queries ($0.005/GB)	$36/mo (5 queries/day)
Total	$135/mo

Three separate charges on the same data. Ingestion is pay-what-you-send. Storage is pay-what-you-keep. Insights is pay-what-you-scan. ECS defaults mean you pay all three — with no upper bound — on every log line your application prints.
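The table's arithmetic can be reproduced with a small helper. This is a sketch that assumes the list prices quoted in this article; the function name is hypothetical, and the sample inputs (1,800 GB stored, 7,200 GB scanned) are back-calculated from the table's dollar figures rather than measured.

```shell
# Estimate a monthly CloudWatch Logs bill from three inputs.
# Prices are the ones quoted in this article; adjust them for your region.
# cw_logs_monthly_cost is a hypothetical helper name.
cw_logs_monthly_cost() {
  # $1 = GB ingested per day, $2 = GB currently stored, $3 = GB scanned by Insights per month
  awk -v ingest_per_day="$1" -v stored="$2" -v scanned="$3" 'BEGIN {
    ingestion = ingest_per_day * 30 * 0.50   # pay-what-you-send
    storage   = stored * 0.03                # pay-what-you-keep
    insights  = scanned * 0.005              # pay-what-you-scan
    printf "ingestion=%.2f storage=%.2f insights=%.2f total=%.2f\n",
           ingestion, storage, insights, ingestion + storage + insights
  }'
}

# The 15-service example from the table
cw_logs_monthly_cost 3 1800 7200
```

With no retention policy, the second argument keeps growing month over month, which is why the storage line has no upper bound.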

Download the skill file

There's a skill file at fortem.dev that an AI agent (Claude Code, OpenCode, Codex) can run for you. It scans your CloudWatch log groups, finds the ones bleeding money, and optionally fixes them — all read-only by default, changes only with your confirmation.

Get the CloudWatch Cost Optimizer skill file → fortem.dev/blog/cloudwatch-costs-ecs

The agent runs locally against your AWS account. No data leaves your machine.

Step 1 — Set retention on every log group (90% of the impact)

One Terraform line — retention_in_days = 30 — cuts storage cost by 60-80%. This single change has the biggest impact of any step in this guide.

Find groups without retention:

aws logs describe-log-groups \
    --query 'logGroups[?retentionInDays==`null`].[logGroupName,storedBytes]' \
    --output table

Set 30-day retention:

aws logs put-retention-policy \
    --log-group-name "/aws/ecs/your-service" \
    --retention-in-days 30
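To apply the same policy to every group that has none, the two commands above can be combined in a loop. A minimal sketch, assuming the AWS CLI is configured for the right account and region; set_default_retention and the DRY_RUN switch are hypothetical names introduced here, and the default is to print rather than change anything.

```shell
# Apply a retention policy to every log group that currently has none.
# set_default_retention is a hypothetical helper; $1 = retention in days (default 30).
# DRY_RUN=1 (the default) only prints what would be changed.
set_default_retention() {
  days="${1:-30}"
  aws logs describe-log-groups \
      --query 'logGroups[?retentionInDays==`null`].logGroupName' \
      --output text |
  tr '\t' '\n' |
  while IFS= read -r group; do
    [ -n "$group" ] || continue          # skip empty lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would set $group to $days days"
    else
      aws logs put-retention-policy \
          --log-group-name "$group" \
          --retention-in-days "$days"
    fi
  done
}
```

Run `set_default_retention 30` to preview the list, then `DRY_RUN=0 set_default_retention 30` to apply it.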

Terraform:

resource "aws_cloudwatch_log_group" "ecs_service" {
  name              = "/ecs/${var.env_prefix}-${var.service_name}"
  retention_in_days = 30  # was null (Never Expire)
}

Recommended retention by environment:

Environment	Retention	Why
Production	90 days	Compliance + incident investigation
Staging	30 days	Recent deploy history
Dev / QA	7 days	Active development only
CI/CD / Build	1 day	Don't store ephemeral build logs
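If you script retention per environment, the table above can be encoded as a lookup. This is only a sketch: retention_days_for_env is a hypothetical helper, and the environment labels it accepts are assumptions about your naming.

```shell
# Map an environment label to the retention (in days) recommended in the table.
# retention_days_for_env is a hypothetical helper; the labels are assumed names.
retention_days_for_env() {
  case "$1" in
    prod|production) echo 90 ;;
    staging)         echo 30 ;;
    dev|qa)          echo 7 ;;
    ci|build)        echo 1 ;;
    *)
      # Refuse to guess for labels the table does not cover.
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}
```

The result can be passed straight to the CLI, for example `--retention-in-days "$(retention_days_for_env staging)"`.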

Step 2 — Filter by log level (5% impact, but easy)

Spring Boot, Express, Django — they all default to INFO. In practice, an INFO-level web server generates one to two orders of magnitude more log volume than the same server at WARN. Switch production to WARN.

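# Find which log streams in this group produce the most events (last 7 days)
# date -v is BSD/macOS syntax; on GNU/Linux use: date -d '7 days ago' +%s
# start-query returns a queryId; fetch the output with: aws logs get-query-results --query-id <id>
aws logs start-query \
    --log-group-name "/aws/ecs/prod-api" \
    --start-time $(date -v-7d +%s) \
    --end-time $(date +%s) \
    --query-string "stats count(*) as events by @logStream | sort events desc | limit 10"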

# Set log level by framework:
# Spring Boot: logging.level.root=WARN in application.properties
# Express: LOG_LEVEL=warn, passed to your logger (Express has no built-in log level)
# Django: LOGGING['root']['level'] = 'WARNING'
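The `date -v` flag used in these commands is BSD/macOS syntax, and GNU date on Linux rejects it. One way around that is to do the subtraction in the shell, which behaves the same with either date. A sketch; epoch_ago is a hypothetical helper name.

```shell
# Print the Unix timestamp N seconds before now.
# Uses only `date +%s` plus shell arithmetic, so it works with GNU and BSD date.
epoch_ago() {
  echo $(( $(date +%s) - $1 ))
}

# 7 days = 7 * 24 * 3600 seconds
start_time=$(epoch_ago $((7 * 24 * 3600)))
end_time=$(epoch_ago 0)
echo "start=$start_time end=$end_time"
```

The two variables can then replace the `$(date ...)` expressions passed to `--start-time` and `--end-time`.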

"CloudWatch Logs charges $0.50 per GB ingested, $0.03 per GB stored per month, and $0.005 per GB scanned by Logs Insights queries — beyond the 5 GB/month free tier." — aws.amazon.com/cloudwatch/pricing, verified June 2026

Step 3 — Use Insights instead of streaming everything

Streaming everything to Datadog adds an indexing cost on top of ingestion. Once you index for search — which is the point — the combined cost per GB is several times CloudWatch's ingest + storage combined.

For debugging, use CloudWatch Logs Insights instead — query on demand at $0.005/GB scanned, not per GB indexed.

# Find errors in the last hour
aws logs start-query \
    --log-group-name "/aws/ecs/prod-api" \
    --start-time $(date -v-1H +%s) \
    --end-time $(date +%s) \
    --query-string "fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50"

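# For compliance: send to S3 instead (cheap, durable) through a Firehose delivery stream
# A Firehose destination also needs --role-arn: an IAM role CloudWatch Logs can assume
aws logs put-subscription-filter \
    --log-group-name "/aws/ecs/prod-api" \
    --filter-name "AllToS3" \
    --filter-pattern "" \
    --destination-arn "arn:aws:firehose:..." \
    --role-arn "arn:aws:iam:..."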
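Note that start-query is asynchronous: it returns only a queryId, and the rows come from a separate get-query-results call. A minimal sketch of a wrapper that starts a query and polls until it finishes; run_insights_query is a hypothetical helper name and the one-second poll interval is an arbitrary choice.

```shell
# Start a Logs Insights query and wait for its results.
# run_insights_query is a hypothetical helper.
# $1 = log group, $2 = start (epoch seconds), $3 = end (epoch seconds), $4 = query string
run_insights_query() {
  query_id=$(aws logs start-query \
      --log-group-name "$1" \
      --start-time "$2" \
      --end-time "$3" \
      --query-string "$4" \
      --query 'queryId' --output text) || return 1

  # Poll until the query leaves the Scheduled/Running states.
  while :; do
    status=$(aws logs get-query-results --query-id "$query_id" \
        --query 'status' --output text) || return 1
    case "$status" in
      Scheduled|Running) sleep 1 ;;
      *) break ;;
    esac
  done

  # Report the final status (Complete, Failed, ...) on stderr, results on stdout.
  echo "status: $status" >&2
  aws logs get-query-results --query-id "$query_id" --output json
}
```

For example: `run_insights_query "/aws/ecs/prod-api" "$start" "$end" "fields @timestamp, @message | filter @message like /ERROR/ | limit 50"`, where `$start` and `$end` are epoch seconds.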

Step 4 — Find which service costs the most

You know CloudWatch is $400. You don't know which of your 15 services is responsible for $300 of it. This Insights query tells you in 5 minutes.

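# Pass every service's log group; @log identifies the group each event came from.
# "/aws/ecs/prod-worker" is a placeholder for your other services' groups.
aws logs start-query \
    --log-group-names "/aws/ecs/prod-api" "/aws/ecs/prod-worker" \
    --start-time $(date -v-7d +%s) \
    --end-time $(date +%s) \
    --query-string "stats sum(strlen(@message)) as totalBytes by @log | sort totalBytes desc | limit 10"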

Once you know the top offender, check three things: (1) log level, (2) whether it logs stack traces on every request, (3) whether it logs health check pings. Those three fix 90% of high-volume log problems.
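For a quick ranking that needs no query at all, the storedBytes field returned by describe-log-groups shows which groups hold the most data. This measures what is stored rather than what was ingested in the last week, so treat it as a complement to the query, not a replacement. A sketch; top_log_groups_by_size is a hypothetical helper name, and 1 GB is taken as 1024^3 bytes.

```shell
# List the log groups holding the most stored data, largest first.
# top_log_groups_by_size is a hypothetical helper; $1 = how many groups to show.
top_log_groups_by_size() {
  aws logs describe-log-groups \
      --query 'logGroups[].[storedBytes, logGroupName]' \
      --output text |
  sort -rn |
  head -n "${1:-10}" |
  awk -F '\t' '{ printf "%.2f GB  %s\n", $1 / 1073741824, $2 }'
}
```

Calling `top_log_groups_by_size 5` prints the five largest groups with their size in GB.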

FAQ

Will reducing log retention affect my ability to debug?
For production, 90 days covers both incident response and compliance. For staging, 30 days covers recent deploy history. For dev, 7 days is enough: if you haven't debugged it in a week, the logs won't help. You can always increase retention temporarily during an incident.

Can I use a different log driver instead of CloudWatch?
Yes — ECS supports awsfirelens (20+ destinations), fluentd, and Splunk. But switching the driver doesn't reduce costs — it moves them. CloudWatch with retention set and log-level filtering is often the cheapest option because you're already in the AWS ecosystem.

How do I estimate my CloudWatch costs before the bill arrives?
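Use two numbers: the IncomingBytes metric in the AWS/Logs namespace in CloudWatch Metrics, and the storedBytes field that aws logs describe-log-groups returns for each group (it is not a metric). Convert both from bytes to GB, then multiply the month's IncomingBytes by $0.50/GB for ingestion and storedBytes by $0.03/GB for monthly storage. Most importantly, count how many log groups have no retentionInDays value (Never Expire), because those are silently accumulating.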
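The conversion from bytes to dollars is easy to get wrong by a factor of a thousand, so here is the arithmetic as a sketch. bytes_to_usd is a hypothetical helper, the prices are the ones quoted in this article, and 1 GB is assumed to be 1024^3 bytes.

```shell
# Convert a byte count into an estimated charge at a given price per GB.
# bytes_to_usd is a hypothetical helper; 1 GB is taken as 1024^3 bytes.
bytes_to_usd() {
  # $1 = bytes, $2 = price per GB in USD
  awk -v bytes="$1" -v price="$2" 'BEGIN { printf "%.2f\n", bytes / 1073741824 * price }'
}

# 90 GB ingested in a month at $0.50/GB
bytes_to_usd 96636764160 0.50   # prints 45.00
```

Feed it the monthly Sum of IncomingBytes (for example from `aws cloudwatch get-metric-statistics`) with a price of 0.50, or a group's storedBytes with a price of 0.03.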

Can I set retention globally across all log groups?
No single command sets retention for all groups. Loop over the output of aws logs describe-log-groups and call aws logs put-retention-policy on each group, or add retention_in_days to every aws_cloudwatch_log_group resource in Terraform. AWS does not offer a global retention default.

Does CloudWatch Logs Insights query cost depend on retention?
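Not directly. Insights costs $0.005 per GB scanned regardless of data age, and a query only scans the time range you give it. What retention changes is how much data a wide time range can reach: at a steady log volume, a 30-day log group holds about 1/12th the data of a 365-day group, so a query over the whole group costs proportionally less.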

Full article with downloadable skill file: fortem.dev/blog/cloudwatch-costs-ecs