I Built an AI AWS Cost Detective That Found $900/Year in Waste — Here's How

#devops #aws #cloud #ai

The Problem

AWS Cost Explorer shows you data. It doesn't tell you what to do about it. I was paying $127/month and knew I was wasting money but couldn't quickly identify where.

What the AI Found

Running the tool against my own account uncovered:

EC2 Waste: A t3.small running 24/7 but used maybe 2 hours a day for testing. That's $45/month, most of it spent on 22 hours of idle compute every single day.
EBS Volumes: Three EBS volumes still attached to stopped instances. No data being written, no instance using them. $8/month evaporating for nothing.
NAT Gateway: A NAT Gateway from an old VPC setup I'd completely forgotten. Nothing routing through it. $12/month for a network door with no traffic.
Snapshots: Automated snapshots from an RDS instance I deleted months ago. The database was gone but the snapshots kept accumulating — $10/month.

Total: $75/month = $900/year

How It Works

The tool chains three things together: boto3 fetches your AWS costs and resource counts, Python shapes the data, and Ollama (local Llama 3.2) turns it into actionable recommendations.

AWS Cost Explorer API  →  Python (boto3)  →  Ollama  →  Structured report
     (billing data)        (resource counts)   (local LLM)

First it pulls 30 days of costs grouped by service, then counts your live resources (EC2 instances, EBS volumes, S3 buckets, RDS databases, Lambda functions). Both datasets go into the AI prompt together — because cost numbers without resource context give you vague answers.
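The cost-fetching step can be sketched roughly like this. It is a minimal sketch, not the tool's actual code: the function name `fetch_service_costs` and its `ce_client` parameter are hypothetical, and the client is passed in so the function can be exercised without AWS credentials. It calls boto3's `get_cost_and_usage` with daily granularity, sums each service across the window, and follows `NextPageToken` if the response is paginated.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, List


def fetch_service_costs(ce_client, days: int = 30) -> List[Dict]:
    """Return [{'service': str, 'cost': Decimal}, ...], highest cost first."""
    end = date.today()  # Cost Explorer treats the End date as exclusive
    start = end - timedelta(days=days)
    params = {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }

    totals = defaultdict(Decimal)  # Decimal() is zero
    while True:
        response = ce_client.get_cost_and_usage(**params)
        for bucket in response["ResultsByTime"]:
            for group in bucket["Groups"]:
                # Amounts arrive as strings; Decimal keeps the sums exact.
                amount = group["Metrics"]["UnblendedCost"]["Amount"]
                totals[group["Keys"][0]] += Decimal(amount)
        token = response.get("NextPageToken")
        if not token:
            break
        params["NextPageToken"] = token

    services = [
        {"service": name, "cost": cost}
        for name, cost in totals.items()
        if cost >= Decimal("0.01")  # drop charges under one cent
    ]
    return sorted(services, key=lambda s: s["cost"], reverse=True)


# With real credentials (assumes boto3 is installed):
#   import boto3
#   services = fetch_service_costs(boto3.client("ce"))
```

The returned list has the shape the prompt-building method below expects: dicts with `service` and `cost` keys, already sorted so that slicing the first five gives the top services.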

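from typing import Dict, List

# Method of the tool's analyzer class (hence `self`).
# Assumes `services` is already sorted by cost, highest first.
def analyze_costs(self, services: List[Dict], resources: Dict) -> str:
    total_cost = sum(s['cost'] for s in services)
    top_services = services[:5]  # only the five most expensive go into the prompt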

    services_text = "\n".join([
        f"- {s['service']}: ${s['cost']:.2f}"
        for s in top_services
    ])

    resources_text = "\n".join([
        f"- {k.replace('_', ' ').title()}: {v}"
        for k, v in resources.items()
    ])

    prompt = f"""You are an AWS cost optimization expert.

COST SUMMARY (Last 30 Days):
Total Cost: ${total_cost:.2f}

Top Services by Cost:
{services_text}

Resources Currently Running:
{resources_text}

Provide recommendations in this format:

**COST ANALYSIS:**
**HIDDEN COSTS DETECTED:**
**OPTIMIZATION RECOMMENDATIONS:**
**ESTIMATED SAVINGS:**
"""
    return self.ask_ollama(prompt)

The fixed section headings in the prompt are what make the response parseable and useful, rather than a wall of text.
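The post doesn't show `ask_ollama` or how the reply is consumed, so here is one possible sketch of both, using only the standard library. It assumes Ollama is listening on its default local port with a model tagged `llama3.2`; the `opener` parameter and the `split_sections` helper are hypothetical names added for illustration, and the tool itself may well use a different HTTP client.

```python
import json
import re
import urllib.request
from typing import Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ask_ollama(prompt: str, model: str = "llama3.2", opener=urllib.request.urlopen) -> str:
    """Send one non-streaming generate request and return the model's text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with opener(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def split_sections(report: str) -> Dict[str, str]:
    """Split a reply on its **HEADING:** markers into {heading: body}."""
    parts = re.split(r"\*\*([A-Z ]+):\*\*", report)
    # parts == [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i].strip(): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
```

A local model won't always reproduce the headings exactly, so treat a missing key in the result of `split_sections` as a signal to fall back to the raw text.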

Setting Up Read-Only AWS Access

This is important — the tool only needs read permissions. Here's the minimal IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ce:GetCostAndUsage",
      "ec2:Describe*",
      "s3:ListAllMyBuckets",
      "rds:Describe*",
      "lambda:List*"
    ],
    "Resource": "*"
  }]
}

Create a dedicated IAM user (cost-detective), attach this policy, generate an access key, and run aws configure. Because the policy grants no write actions, the tool cannot change anything in your account; the worst case is that it reads data you didn't expect it to.
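If you edit the policy later, a quick check like the following can flag anything that doesn't look read-only. This is a rough heuristic and not part of the tool: the `non_read_actions` helper is hypothetical, and it treats an action as read-style only if its name starts with Get, List, or Describe, which is a naming convention rather than an official AWS classification.

```python
import json
from typing import List

READ_PREFIXES = ("Get", "List", "Describe")  # heuristic naming convention


def non_read_actions(policy_json: str) -> List[str]:
    """Return every allowed action whose name doesn't start with a read-style prefix."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        for action in actions:
            _, _, name = action.partition(":")  # "ec2:Describe*" -> "Describe*"
            if not name.startswith(READ_PREFIXES):
                flagged.append(action)
    return flagged
```

Run against the policy above it returns an empty list; a wildcard such as `ec2:*` or a write action such as `ec2:TerminateInstances` would be flagged.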

What Surprised Me

Two things I didn't expect:

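The Cost Explorer API is nearly free. I assumed querying billing data would itself be expensive. It isn't: the Cost Explorer console costs nothing, and the API is billed at $0.01 per paginated request, so a run that makes a handful of calls costs a few cents at most.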

Cost values end up as Python Decimal, not float. boto3 hands back each Cost Explorer amount as a string, and once those are parsed into Decimal for exact arithmetic, json.dumps() crashes when you try to save a report, because the standard JSON encoder doesn't handle Decimal. This one is a quiet killer. I had to write a custom encoder:

import json
from decimal import Decimal

class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert Decimal to float; defer every other type to the base encoder
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)

The error message doesn't tell you where the problem is. You just get "TypeError: Object of type Decimal is not JSON serializable" and have to figure out which field the Decimal came from.
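To make the failure and the fix concrete, here is a small self-contained example. The `report` dict is made-up sample data using figures from this post, not the tool's real output format.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)


# Hypothetical report shape, for illustration only
report = {
    "total_cost": Decimal("127.00"),
    "services": [{"service": "Amazon EC2", "cost": Decimal("45.00")}],
}

try:
    json.dumps(report)  # the default encoder rejects Decimal
    failure = None
except TypeError as err:
    failure = str(err)

# Passing the encoder class via `cls` makes the same call succeed
saved = json.dumps(report, cls=DecimalEncoder)
```

The same `cls=DecimalEncoder` argument works with `json.dump` when writing straight to a file. Converting to float trades exactness for convenience; returning `str(obj)` instead would keep every digit, at the cost of storing numbers as strings.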

Limitations

Being honest about what this doesn't handle well:

Data is 24–48 hours delayed. Cost Explorer isn't real-time. If you just spun up a resource today, it won't show up yet.
Single region by default. Resource counts only scan us-east-1. Multi-region setups need extra config.
Doesn't catch everything. Very small charges (under $0.01) are filtered out. Some hidden costs — like cross-AZ data transfer — aren't obvious from the Cost Explorer groupings.
AI recommendations need verification. The tool identifies patterns and suggests actions, but you should always sanity-check before terminating anything. I almost deleted an EBS volume that was actually still in use by a snapshot restore.
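For the single-region limitation, one way to widen the scan is to loop over regions and build a client per region. This is a sketch under assumptions, not the tool's own configuration mechanism: `count_instances_by_region` and its `make_client` parameter are hypothetical names, and the client factory is injected so the function can be tried without AWS access.

```python
from typing import Callable, Dict, Iterable


def count_instances_by_region(make_client: Callable, regions: Iterable[str]) -> Dict[str, int]:
    """Count EC2 instances (in any state) per region."""
    counts = {}
    for region in regions:
        ec2 = make_client("ec2", region_name=region)
        paginator = ec2.get_paginator("describe_instances")
        total = 0
        for page in paginator.paginate():
            # Instances are nested inside reservations in the response
            for reservation in page["Reservations"]:
                total += len(reservation["Instances"])
        counts[region] = total
    return counts


# With real credentials (assumes boto3 is installed):
#   import boto3
#   count_instances_by_region(boto3.client, ["us-east-1", "eu-west-1"])
```

Each extra region adds API calls and run time, so scanning only the regions you actually use is usually enough.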

Try It

GitHub: https://github.com/ThinkWithOps
Demo video: https://youtu.be/rg1Vnjjt9xk

git clone https://github.com/ThinkWithOps/ai-devops-projects
cd ai-devops-projects/03-ai-aws-cost-detective
pip install -r requirements.txt
python src/aws_cost_detective.py

# Save report to JSON
python src/aws_cost_detective.py --output report.json

Project 3 in my AI+DevOps series. Project 4 is an AI GitHub Actions Auto-Healer — it reads failing CI logs and suggests fixes. Link in my profile.

What's the most unexpected thing hiding in your AWS bill? I'd have never noticed that NAT Gateway without the tool surfacing it. Drop yours in the comments.