Automate AWS Cost Reports with Python, boto3, and SES

#aws #python #devops

Originally published on kuryzhev.cloud

By the time you open Cost Explorer, your AWS bill has already grown; here's how to make Python tell you before finance does. Last month a team I was working with got a Slack message from their finance lead: "Why did AWS spend jump 40% this month?" Nobody in engineering knew. The charges had been accumulating for three weeks. That's exactly the problem an automated Python cost report solves: it shifts visibility left, from invoice to inbox.

Symptoms: Your AWS Bill Surprises You Every Month

The pattern is always the same. Cost Allocation Tags are enabled. Someone set up Cost Explorer once during an audit. There's a vague plan to "check it regularly." And then the PDF invoice arrives, after the charges are already incurred, and the post-mortem begins.

Here are the specific symptoms that tell me a team has no real cost visibility:

Finance sends a Slack message asking about a bill spike — and no engineer can answer immediately
Cost Explorer is enabled but nobody opens it between billing cycles
The only "alerting" is a Budget alarm set to 100% of last month's spend — which fires too late to act
When someone does investigate, they spend 20 minutes clicking through the Cost Explorer UI to find the offending service
There is no audit trail of what the cost breakdown looked like on any given day

None of this is unique to small teams. I've seen the same pattern at organizations running six-figure monthly AWS spend. The tooling exists. The API is there. The problem is that nobody wired it up with a scheduled, automated delivery mechanism. A report that lives in a console is not a report — it's a manual task waiting to be skipped.

One more symptom worth calling out: teams that do attempt automation often get halfway through parsing the boto3 response, hit the nested ResultsByTime structure, and abandon the effort. The script sits in a local repo, uncommitted, never scheduled.

Root Cause: Manual Cost Visibility Doesn't Scale

The root cause isn't laziness. It's that the AWS Cost Explorer API has several non-obvious activation and usage requirements that create friction at every step.

Activation delay. Cost Explorer must be manually enabled in the AWS Console under Billing → Cost Explorer → Enable. First data appears 24–48 hours after activation. If you call get_cost_and_usage() before that window, you get a misleading error: DataUnavailableException: Data is not available. Please try to adjust the time period. Most engineers interpret this as a permissions issue and spend an hour debugging IAM before realizing the API just isn't active yet.

Response structure complexity. The get_cost_and_usage() response wraps data inside ResultsByTime[*].Groups[*], where each group carries Keys (the service name) and Metrics (the amounts). If the queried period had no spend at all, Groups is an empty list: not absent, not null, just empty. Code that assumes at least one group, for example by indexing Groups[0], will raise an IndexError in production on quiet billing periods.
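To make the shape concrete, here is a minimal sketch built on hand-written dictionaries that mimic a trimmed get_cost_and_usage() response grouped by SERVICE. The service names, dates, and amounts are illustrative assumptions, not real data, and the two function names are hypothetical. Indexing Groups[0] works on a busy period and fails on a quiet one; iterating does not.

```python
# Hand-written dictionaries shaped like a trimmed get_cost_and_usage()
# response grouped by SERVICE. Names and amounts are illustrative only.
busy_period = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2024-05-01", "End": "2024-05-15"},
            "Groups": [
                {
                    "Keys": ["Amazon Elastic Compute Cloud - Compute"],
                    "Metrics": {"UnblendedCost": {"Amount": "412.3378", "Unit": "USD"}},
                },
                {
                    "Keys": ["AWS Lambda"],
                    "Metrics": {"UnblendedCost": {"Amount": "0.0000012", "Unit": "USD"}},
                },
            ],
        }
    ]
}

# A period with no spend: "Groups" is present but empty.
quiet_period = {
    "ResultsByTime": [
        {"TimePeriod": {"Start": "2024-05-01", "End": "2024-05-02"}, "Groups": []}
    ]
}


def first_service_fragile(response: dict) -> str:
    """Indexes Groups[0]: works on busy periods, raises IndexError on quiet ones."""
    return response["ResultsByTime"][0]["Groups"][0]["Keys"][0]


def costs_by_service(response: dict) -> dict:
    """Iterates over Groups, so an empty list simply yields an empty dict."""
    costs = {}
    for result in response["ResultsByTime"]:
        for group in result.get("Groups", []):
            service = group["Keys"][0]
            costs[service] = float(group["Metrics"]["UnblendedCost"]["Amount"])
    return costs


print(costs_by_service(busy_period))
print(costs_by_service(quiet_period))  # {}
try:
    first_service_fragile(quiet_period)
except IndexError as exc:
    print("fragile parser failed:", exc)
```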

No execution layer. Even engineers who write a working script rarely schedule it. EventBridge rules and Lambda functions require a small amount of Terraform or CLI wiring that nobody gets around to. The script runs once manually, produces correct output, and then gets forgotten. Two months later the team is back to checking Cost Explorer by hand.

API cost itself. Each get_cost_and_usage() request costs $0.01. Running it hourly for a 30-day month costs $7.20; running it daily costs about $0.30 to $0.31 per month. I've seen well-intentioned automation scripts scheduled every 5 minutes by engineers who didn't read the pricing page, which works out to $86.40 for a 30-day month. Watch out for this: it's a real gotcha that adds cost to your cost reporting.
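The arithmetic is simple enough to keep next to your schedule definition. A small sketch using the $0.01-per-request price from the paragraph above, assuming one request per run (no pagination; each extra page is billed as another request). The function name monthly_api_cost is hypothetical.

```python
from decimal import Decimal

# Price per get_cost_and_usage() request, as stated above.
# Decimal avoids float rounding noise in currency math.
PRICE_PER_REQUEST = Decimal("0.01")


def monthly_api_cost(runs_per_day: int, days_in_month: int = 30) -> Decimal:
    """Monthly Cost Explorer API spend, assuming one unpaginated request per run."""
    return PRICE_PER_REQUEST * runs_per_day * days_in_month


schedules = {
    "daily": 1,
    "hourly": 24,
    "every 5 minutes": 24 * 12,
}
for name, runs in schedules.items():
    print(f"{name:>16}: ${monthly_api_cost(runs)} per 30-day month")
```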

For more AWS automation patterns we use in production, see the kuryzhev.cloud DevOps runbook archive.

Fix #1: Activate Cost Explorer API and Query It Correctly with boto3

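Before writing a single line of Python, activate Cost Explorer in the console and wait 24 hours. The python3.12 Lambda runtime already bundles boto3, but that bundled version changes whenever AWS updates the runtime, so build a layer with a pinned version if you want reproducible behavior:

pip install boto3==1.34.69 -t python/  # reproducible layer build: zip the python/ directory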

The following Lambda handler queries month-to-date costs grouped by AWS service. Read the inline comments — several of these lines exist specifically to avoid the common failure modes described above.

# lambda_function.py
# AWS Cost Explorer → HTML Email Report via SES
# Runtime: python3.12 | boto3 1.34.x
# IAM requires: ce:GetCostAndUsage, ses:SendEmail, s3:PutObject

import boto3
import json
import os
from datetime import datetime, timedelta

# --- Config (use Lambda env vars, never hardcode) ---
SENDER_EMAIL = os.environ["SENDER_EMAIL"]       # must be SES-verified
RECIPIENT_EMAIL = os.environ["RECIPIENT_EMAIL"]
REPORT_BUCKET = os.environ["REPORT_BUCKET"]     # S3 bucket for audit trail
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")

ce_client = boto3.client("ce", region_name="us-east-1")  # Cost Explorer only in us-east-1
ses_client = boto3.client("ses", region_name=AWS_REGION)
s3_client = boto3.client("s3", region_name=AWS_REGION)


def get_monthly_costs() -> dict:
    """Query Cost Explorer for current month-to-date costs grouped by SERVICE."""
    today = datetime.today()
    # End date is exclusive: use today's date; data through yesterday is returned
    end_date = today.strftime("%Y-%m-%d")
    # Cost Explorer rejects Start == End (Start must be earlier), which is what
    # "first of current month" produces on the 1st, so report last month instead
    if today.day == 1:
        start = (today - timedelta(days=1)).replace(day=1)  # first of previous month
    else:
        start = today.replace(day=1)  # first of current month
    start_date = start.strftime("%Y-%m-%d")

    response = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start_date, "End": end_date},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],  # NOT BlendedCost: blended averages RI rates, misleading
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
        # GroupBy accepts at most two entries (e.g. SERVICE plus a TAG key); a second
        # entry adds a second value to each group's Keys, so update the parsing below
    )

    costs = {}
    for result in response["ResultsByTime"]:
        for group in result.get("Groups", []):  # empty list on no-spend periods: loop is a no-op
            service = group["Keys"][0]
            amount = round(float(group["Metrics"]["UnblendedCost"]["Amount"]), 2)
            if amount > 0:  # drops lines that round to $0.00 and negative credits
                costs[service] = amount

    # Sort descending by cost: top drivers first
    return dict(sorted(costs.items(), key=lambda x: x[1], reverse=True))


def build_html_report(costs: dict, report_date: str) -> str:
    """Build an HTML email table from the costs dict."""
    total = round(sum(costs.values()), 2)
    rows = "".join(
        f"<tr><td>{svc}</td><td>${amt:.2f}</td></tr>"
        for svc, amt in costs.items()
    )
    return f"""
    <html><body>
    <h2>AWS Cost Report — {report_date}</h2>
    <table border="1" cellpadding="6" style="border-collapse:collapse;">
      <tr><th>Service</th><th>Cost (USD)</th></tr>
      {rows}
      <tr><td><strong>TOTAL</strong></td><td><strong>${total:.2f}</strong></td></tr>
    </table>
    <p>Generated by cost-reporter Lambda | kuryzhev.cloud</p>
    </body></html>
    """


def send_email(html_body: str, report_date: str) -> None:
    """Send the HTML report via SES."""
    ses_client.send_email(
        Source=SENDER_EMAIL,
        Destination={"ToAddresses": [RECIPIENT_EMAIL]},
        Message={
            "Subject": {"Data": f"AWS Cost Report {report_date}"},
            "Body": {
                "Text": {"Data": "Open in HTML-capable email client."},
                "Html": {"Data": html_body},
            },
        },
    )


def save_to_s3(costs: dict, report_date: str) -> None:
    """Persist raw cost data to S3 for audit trail."""
    key = f"cost-reports/{report_date}.json"
    s3_client.put_object(
        Bucket=REPORT_BUCKET,
        Key=key,
        Body=json.dumps(costs, indent=2),
        ContentType="application/json",
    )


def lambda_handler(event, context):
    report_date = datetime.today().strftime("%Y-%m-%d")
    costs = get_monthly_costs()
    html = build_html_report(costs, report_date)
    send_email(html, report_date)
    save_to_s3(costs, report_date)
    return {"statusCode": 200, "body": f"Report sent for {report_date}"}

Two things I want to highlight here. First, ce_client is always initialized with region_name="us-east-1" regardless of where your Lambda runs: Cost Explorer is a global service only accessible through that region's endpoint. Second, watch the date math on the 1st of the month. A naive "first of this month through today" window has the same Start and End date on that day, and Cost Explorer rejects it with a ValidationException because Start must be earlier than End, so the query has to fall back to the previous month. The result.get("Groups", []) pattern is cheap insurance, not the fix for that: an empty Groups list is safe to iterate, and what actually breaks on quiet periods is code that indexes Groups[0].

Fix #2: Parse the Response and Build a Readable Report Structure

The raw API response returns cost amounts as strings, not floats. For a lightly used service such as AWS Lambda, Cost Explorer can return an Amount like "0.0000012". If you skip the float() cast, nothing fails loudly at first: sorting strings compares them character by character and silently produces the wrong order. The errors come later, when formatting with :.2f raises ValueError on a str or sum() raises TypeError.

The get_monthly_costs() function above handles this with round(float(amount), 2). I use round() to two decimal places rather than string formatting alone, because I want the costs dict to contain actual numeric values for the S3 JSON dump, which is useful if you later feed this data into a dashboard or Slack message.
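A short demonstration of why the cast matters, using three illustrative amounts (not real billing data). Note that the sub-cent Lambda amount rounds to 0.0, which is why the report code filters on amount > 0 after rounding.

```python
# Amounts as Cost Explorer returns them: strings. Values are illustrative.
raw = {"Amazon EC2": "120.00", "Amazon S3": "9.50", "AWS Lambda": "0.0000012"}

# Sorting the strings "works" but compares character by character,
# so "9.50" outranks "120.00".
by_string = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
print([svc for svc, _ in by_string])

# Casting first gives numeric order; round() keeps real numbers for the JSON dump.
numeric = {svc: round(float(amt), 2) for svc, amt in raw.items()}
by_value = sorted(numeric.items(), key=lambda kv: kv[1], reverse=True)
print([svc for svc, _ in by_value])

# Formatting a str with a float format spec fails outright.
try:
    f"{raw['Amazon S3']:.2f}"
except ValueError as exc:
    print("ValueError:", exc)
```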

A note on templating: I deliberately avoided Jinja2 here. For a single-file Lambda that generates one HTML table, Jinja2 adds a dependency layer with little benefit. Python's str.join() over a generator expression of <tr> strings is readable, testable, and requires no extra packaging. I stopped using Jinja2 in Lambda functions after spending 45 minutes debugging a layer packaging issue that turned out to be a Jinja2 version conflict. For anything more complex, such as multi-section reports or conditional blocks, reconsider. For a cost table, it's overkill.
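The one thing plain string building gives up compared with Jinja2 is autoescaping. AWS service names are unlikely to contain markup, but if you later group by a tag value (which anyone with tagging rights controls), escaping matters. A minimal sketch with the standard library's html.escape(); the function name build_rows is hypothetical, and it also adds a thousands separator that the original table does not use.

```python
import html


def build_rows(costs: dict) -> str:
    """Render {label: amount} as escaped <tr> rows, assuming amounts are floats."""
    return "".join(
        f"<tr><td>{html.escape(label)}</td><td>${amt:,.2f}</td></tr>"
        for label, amt in costs.items()
    )


print(build_rows({"team-a & <b>ops</b>": 1234.5}))
```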

Sort order matters for the email recipient. The sorted(..., reverse=True) call puts the largest cost drivers, usually EC2 and RDS, at the top instead of burying them alphabetically below AWS CloudTrail and AWS Config. Executive readers scan the first three rows. Make those rows count.
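If you want those first three rows to stand out even more, a hypothetical top_drivers() helper can pull them out together with their share of the total, for example for a one-line summary above the table. The figures below are illustrative only; note that a plain alphabetical sorted(costs) would list "AWS CloudTrail" and "AWS Config" first.

```python
def top_drivers(costs: dict, n: int = 3) -> list:
    """Return the n most expensive (service, amount, share_of_total) tuples."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(svc, amt, amt / total if total else 0.0) for svc, amt in ranked]


# Illustrative month-to-date figures
costs = {
    "AWS CloudTrail": 3.10,
    "AWS Config": 12.40,
    "Amazon Elastic Compute Cloud - Compute": 640.00,
    "Amazon Relational Database Service": 310.00,
    "Amazon Simple Storage Service": 34.50,
}
for svc, amt, share in top_drivers(costs):
    print(f"{svc}: ${amt:.2f} ({share:.0%})")
```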

Fix #3: Send the Report via SES with HTML Formatting

SES has two operational gotchas that will break your first delivery if you miss them.

Sandbox mode. By default, SES accounts are in sandbox mode, where you can only send to verified addresses, with a cap of 200 messages per 24 hours. Sending to an unverified recipient does not quietly succeed: send_email() raises a MessageRejected error ("Email address is not verified"). For testing, verify the recipient address as well; before sending to real recipients, request production access via the AWS console under SES → Account dashboard → Request production access. See the official SES production access documentation for the process.

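Missing or unverified Source. The SES send_email() call requires Source, Destination, and Message as top-level arguments. If Source is missing entirely, boto3 validates the parameters client-side and raises ParamValidationError before any request is sent. The more confusing case is a Source address that is not verified in the region your ses client points at: SES identities are regional, so an address verified in us-east-1 is rejected with MessageRejected when the Lambda (and therefore AWS_REGION) runs in eu-west-1. Verify the sender in the same region the function runs in.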
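One way to catch an empty SENDER_EMAIL before any AWS call is to assemble the send_email() arguments in a small, testable function. A sketch under assumptions: build_send_email_kwargs is a hypothetical helper, and the explicit Charset (a real field of the SES Content structure) is set to UTF-8 as a safe choice because the report heading contains a non-ASCII dash.

```python
def build_send_email_kwargs(sender: str, recipients: list, subject: str,
                            html_body: str, text_body: str) -> dict:
    """Assemble keyword arguments for the SES v1 send_email() call.

    Fails fast on an empty sender or recipient list, before any AWS request.
    """
    if not sender:
        raise ValueError("SENDER_EMAIL is empty; SES needs a verified Source address")
    if not recipients:
        raise ValueError("at least one recipient is required")
    return {
        "Source": sender,
        "Destination": {"ToAddresses": list(recipients)},
        "Message": {
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {
                "Text": {"Data": text_body, "Charset": "UTF-8"},
                "Html": {"Data": html_body, "Charset": "UTF-8"},
            },
        },
    }


kwargs = build_send_email_kwargs(
    "reports@example.com", ["finops@example.com"],
    "AWS Cost Report 2024-05-15", "<p>report</p>", "report",
)
print(sorted(kwargs))  # ['Destination', 'Message', 'Source']
# In the Lambda: ses_client.send_email(**kwargs)
```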

Now wire up the EventBridge schedule and IAM policy with Terraform. This is the piece most automation attempts skip, and it's why scripts run once and get forgotten.

# eventbridge_lambda.tf
# Terraform: EventBridge rule to trigger Lambda daily at 08:00 UTC
# Requires: aws_lambda_function.cost_reporter, aws_iam_role.cost_reporter_exec,
# and var.report_bucket defined elsewhere

resource "aws_cloudwatch_event_rule" "daily_cost_report" {
  name                = "daily-cost-report"
  description         = "Trigger cost reporter Lambda every day at 08:00 UTC"
  schedule_expression = "cron(0 8 * * ? *)"  # AWS cron uses ? not * for dow when dom is set
}

resource "aws_cloudwatch_event_target" "cost_reporter_target" {
  rule      = aws_cloudwatch_event_rule.daily_cost_report.name
  target_id = "CostReporterLambda"
  arn       = aws_lambda_function.cost_reporter.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.cost_reporter.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.daily_cost_report.arn
}

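# Least-privilege IAM policy for the Lambda execution role; attach the AWS managed
# AWSLambdaBasicExecutionRole policy separately so the function can write CloudWatch Logs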
resource "aws_iam_role_policy" "cost_reporter_policy" {
  name = "cost-reporter-policy"
  role = aws_iam_role.cost_reporter_exec.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["ce:GetCostAndUsage"]
        Resource = "*"  # Cost Explorer does not support resource-level permissions
      },
      {
        Effect   = "Allow"
        Action   = ["ses:SendEmail"]
        Resource = "*"
      },
      {
        Effect   = "Allow"
        Action   = ["s3:PutObject"]
        Resource = "arn:aws:s3:::${var.report_bucket}/cost-reports/*"  # scoped to prefix
      }
    ]
  })
}

Watch out for the EventBridge cron syntax. AWS uses ? instead of * for the day-of-week field when day-of-month is already specified, and vice versa. cron(0 8 * * * *) will fail with a validation error. The correct form is cron(0 8 * * ? *). This trips up almost everyone the first time. See the EventBridge cron expression documentation for the full syntax reference.
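If your schedule expressions come from variables or config files, a pre-flight check in CI can catch the most common mistake before terraform apply does. This is a deliberately partial sketch, not a full parser: the hypothetical check_eventbridge_cron() only verifies the six-field layout and the rule that exactly one of day-of-month or day-of-week is ?. It does not validate ranges or the L, W, and # operators.

```python
import re

_CRON = re.compile(r"^cron\((.+)\)$")


def check_eventbridge_cron(expr: str) -> list:
    """Return a list of problems found in an EventBridge cron() expression."""
    match = _CRON.match(expr.strip())
    if not match:
        return ["expression must look like cron(...)"]
    fields = match.group(1).split()
    if len(fields) != 6:
        return [f"expected 6 fields (min hour dom month dow year), got {len(fields)}"]
    dom, dow = fields[2], fields[4]
    # Exactly one of the two day fields must be '?': both set or both '?' is invalid.
    if (dom == "?") == (dow == "?"):
        return ["exactly one of day-of-month or day-of-week must be '?'"]
    return []


for expr in ("cron(0 8 * * ? *)", "cron(0 8 * * * *)", "cron(0 8 * * *)"):
    print(expr, "->", check_eventbridge_cron(expr) or "ok")
```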

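Also set your Lambda timeout to at least 30 seconds. The default is 3 seconds, and in my experience Cost Explorer calls alone sit around 4–6 seconds at p95, before SES and S3 add their own round trips. If you leave the default in place, the first real invocation is likely to fail with "Task timed out after 3.00 seconds" in CloudWatch Logs.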

Prevention: Make Cost Reporting a Reliable, Auditable System

Getting the script working is step one. Making it reliable over months without manual intervention is the harder part. Here's what I add to every cost reporting Lambda before considering it production-ready.

CloudWatch alarm on Lambda errors. Create an alarm on the Errors metric for the function with threshold > 0 for one evaluation period. If SES throttles, IAM permissions drift after a role rotation, or Cost Explorer returns a transient error, you want to know within minutes — not after a week of missed reports when someone notices the inbox has gone quiet.
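A sketch of that alarm as keyword arguments for the CloudWatch put_metric_alarm() API, kept as a plain function so it can be unit-tested without credentials. The function name, the cost-reporter function name, the SNS topic ARN, and the 5-minute period are assumptions; TreatMissingData="notBreaching" keeps a once-a-day function from alarming on the hours it emits no data.

```python
def lambda_error_alarm_params(function_name: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm(): alarm when the
    function reports any error within a 5-minute period."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # A daily function emits no datapoints most of the day; don't alarm on that.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


params = lambda_error_alarm_params(
    "cost-reporter", "arn:aws:sns:us-east-1:123456789012:ops-alerts"
)
print(params["AlarmName"])  # cost-reporter-errors
# With credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```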

S3 audit trail with lifecycle policy. The save_to_s3() function stores each day's raw cost JSON at cost-reports/YYYY-MM-DD.json. At ~5KB per file, 365 files/year costs essentially nothing in S3 Standard. Add a lifecycle rule to expire objects after 90 days. Skip S3 Intelligent-Tiering for these objects: objects smaller than 128 KB are never monitored or moved to the cheaper access tiers, so a 5KB report always bills at Frequent Access rates and the storage class buys you nothing. That's a real gotcha I've hit on similar audit trail patterns.
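The 90-day expiry can be expressed as the LifecycleConfiguration argument of the S3 put_bucket_lifecycle_configuration() API. A minimal sketch; the helper name and rule ID are hypothetical. Be aware that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules the bucket already has.

```python
def cost_report_lifecycle(prefix: str = "cost-reports/", days: int = 90) -> dict:
    """LifecycleConfiguration that expires objects under `prefix` after `days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-cost-reports",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = cost_report_lifecycle()
print(config["Rules"][0]["Expiration"])  # {'Days': 90}
# With credentials configured (replaces ALL existing lifecycle rules on the bucket):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket=REPORT_BUCKET, LifecycleConfiguration=config)
```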

Never use environment variables for AWS credentials in Lambda. Use the IAM execution role exclusively. Environment variable secrets are visible to anyone with lambda:GetFunctionConfiguration permission — which is a surprisingly common IAM grant in developer accounts. The execution role approach is both more secure and eliminates credential rotation as a failure mode.

Pin your runtime. Set python3.12 explicitly in your deployment config, and avoid python3.8, which AWS Lambda has already deprecated. AWS retires runtimes on its own published schedule; you want to plan the upgrade on your schedule, not scramble once a deprecated runtime can no longer be updated.

Handler path must match exactly. If your file is lambda_function.py, the handler value in the Lambda configuration must be lambda_function.lambda_handler. A mismatch fails at invocation time: a wrong module name surfaces as Runtime.ImportModuleError, a wrong function name as Runtime.HandlerNotFound. It's obvious when it happens, but it wastes five minutes every time someone renames a file without updating the handler config.
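A cheap way to catch this before deploying is to resolve the handler string locally in CI, roughly the way the runtime does: import the module, then look up the function. A sketch with a hypothetical resolve_handler() helper; it is not Lambda's actual loader. Importing lambda_function runs its module-level code, which reads SENDER_EMAIL and the other variables from os.environ, so set dummy values first.

```python
import importlib


def resolve_handler(handler: str):
    """Resolve a Lambda-style "module.function" handler string locally.

    ImportError roughly corresponds to Runtime.ImportModuleError and
    AttributeError to Runtime.HandlerNotFound.
    """
    module_name, sep, func_name = handler.rpartition(".")
    if not sep or not module_name:
        raise ValueError(f"handler {handler!r} must look like module.function")
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)  # AttributeError if the function name is wrong
    if not callable(func):
        raise TypeError(f"{handler!r} is not callable")
    return func


# Stdlib demo; in the project: resolve_handler("lambda_function.lambda_handler")
print(resolve_handler("json.dumps")({"ok": True}))  # {"ok": true}
```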

Done right, automated AWS cost reporting in Python means the engineering team knows about cost spikes before the finance team does, and has the data to explain exactly which service caused them. This Lambda, scheduled daily via EventBridge, delivers that visibility with roughly 100 lines of Python and a Terraform block that takes about 10 minutes to apply.