DEV Community

Nat Thompson

I Built an AI-Powered AWS Cost Optimizer — Here's How It Works

I'm an AWS consultant. My clients always ask the same question: "Why is my AWS bill so high?"

The answer is always the same: idle resources, oversized instances, and services nobody remembered to turn off. So I built a tool to find them automatically.

The Architecture

Sharktooth connects to a customer's AWS account via a cross-account IAM role — the same pattern used by Datadog, CloudHealth, and AWS's own tools.

Customer AWS Account          Sharktooth AWS Account
┌─────────────────┐          ┌──────────────────────┐
│                 │          │                      │
│  IAM Role ◄──────────────── STS AssumeRole         │
│  (read-only)    │          │   + ExternalId       │
│                 │          │                      │
│  Cost Explorer  │          │  Cost Analysis       │
│  CloudWatch     │────────► │  AI Recommendations  │
│  EC2 Describe   │          │  Dashboard           │
│  RDS Describe   │          │                      │
└─────────────────┘          └──────────────────────┘

Step 1: Connect (5 minutes)

The customer creates a read-only IAM role with this trust policy:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::SHARKTOOTH_ACCOUNT:root" },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "sts:ExternalId": "UNIQUE_PER_CUSTOMER" }
    }
  }]
}

The ExternalId prevents confused deputy attacks — each customer gets a unique one.
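To make the "unique per customer" part concrete, here is a minimal Python sketch of generating an ExternalId and filling in the trust policy shown above. It is an illustration only, not Sharktooth's code (the tool itself is .NET); the function names `new_external_id` and `build_trust_policy` are hypothetical, and using a random UUID as the ExternalId is an assumption.

```python
import uuid


def new_external_id() -> str:
    """Return a random, hard-to-guess ExternalId for one customer.

    Assumption: a random UUID (32 hex characters) is used. Any unique,
    unguessable string within IAM's ExternalId limits would do.
    """
    return uuid.uuid4().hex


def build_trust_policy(sharktooth_account_id: str, external_id: str) -> dict:
    """Build the trust policy the customer attaches to the read-only role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{sharktooth_account_id}:root"},
            "Action": "sts:AssumeRole",
            # The role can only be assumed by a caller that presents this value.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }
```

Because the ExternalId is stored per customer and checked by the customer's own role, a third party who knows a role ARN still cannot trick the service into assuming it on their behalf.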

Permissions requested:

ce:GetCostAndUsage
cloudwatch:GetMetricStatistics
ec2:DescribeInstances
ec2:DescribeInstanceTypes
rds:DescribeDBInstances

That's it. No write permissions. No S3, Lambda, or IAM access.
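As a rough sketch, the five permissions above could be expressed as an IAM policy document like this. The `"Resource": "*"` scope and the helper names are assumptions for illustration; the read-only check is a simple naming heuristic, not an official IAM classification.

```python
READ_ONLY_ACTIONS = [
    "ce:GetCostAndUsage",
    "cloudwatch:GetMetricStatistics",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceTypes",
    "rds:DescribeDBInstances",
]


def build_permissions_policy(actions=READ_ONLY_ACTIONS) -> dict:
    """Build a permissions policy that allows only the listed actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(actions),
            "Resource": "*",  # assumption: not scoped to specific resources
        }],
    }


def looks_read_only(policy: dict) -> bool:
    """Heuristic: every action verb starts with Get, Describe, or List."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            verb = action.split(":", 1)[1]
            if not verb.startswith(("Get", "Describe", "List")):
                return False
    return True
```

A check like `looks_read_only` is handy as a guard in tests, so a write action cannot slip into the requested permissions unnoticed.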

Step 2: Pull Cost Data

Using STS AssumeRole, we get temporary credentials (1-hour expiry) and call the Cost Explorer API:

  • Monthly spend by service (last 6 months)
  • Daily spend trend (last 30 days)
  • Service breakdown with amounts
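The request parameters for this step might look roughly like the following Python sketch. It only builds the keyword arguments, so it runs without AWS credentials. Assumptions that are not from the article: the `UnblendedCost` metric, the session name, and treating "last 6 months" as the six complete calendar months before the current one.

```python
from datetime import date, timedelta


def months_back(today: date, months: int) -> date:
    """First day of the month that lies `months` months before today's month."""
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def assume_role_kwargs(role_arn: str, external_id: str) -> dict:
    """Arguments for STS AssumeRole with a 1-hour session."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": "cost-scan",  # hypothetical session name
        "ExternalId": external_id,
        "DurationSeconds": 3600,
    }


def monthly_by_service_kwargs(today: date, months: int = 6) -> dict:
    """Arguments for Cost Explorer GetCostAndUsage: monthly spend per service."""
    return {
        # The End date is exclusive, so this covers complete months only.
        "TimePeriod": {
            "Start": months_back(today, months).isoformat(),
            "End": months_back(today, 0).isoformat(),
        },
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def daily_trend_kwargs(today: date, days: int = 30) -> dict:
    """Arguments for GetCostAndUsage: total daily spend for the last `days` days."""
    return {
        "TimePeriod": {
            "Start": (today - timedelta(days=days)).isoformat(),
            "End": today.isoformat(),
        },
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
    }
```

With boto3, these dictionaries can be passed as keyword arguments, for example `sts.assume_role(**assume_role_kwargs(arn, ext_id))` and, using the returned temporary credentials, `ce.get_cost_and_usage(**monthly_by_service_kwargs(date.today()))`.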

Step 3: Detect Idle Resources

For each running EC2 instance, we pull 7 days of CloudWatch CPUUtilization:

  • Average CPU < 5% = flagged as idle
  • Average CPU 5-20% = right-sizing candidate
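The two thresholds above boil down to a small classification function. This is a Python sketch under a few assumptions: the input is the list of CPUUtilization averages returned by CloudWatch for the 7-day window, exactly 20% still counts as a right-sizing candidate, and the labels `"idle"`, `"right-size"`, `"ok"`, and `"no-data"` are made up for the example.

```python
from statistics import mean

IDLE_CPU_PCT = 5.0        # below this average, the instance is flagged as idle
RIGHTSIZE_CPU_PCT = 20.0  # up to this average, it is a right-sizing candidate


def classify_instance(cpu_datapoints: list[float]) -> str:
    """Classify an EC2 instance from its CPUUtilization averages (percent)."""
    if not cpu_datapoints:
        # No metrics (for example, a freshly launched instance): do not guess.
        return "no-data"
    average = mean(cpu_datapoints)
    if average < IDLE_CPU_PCT:
        return "idle"
    if average <= RIGHTSIZE_CPU_PCT:
        return "right-size"
    return "ok"
```

Returning `"no-data"` instead of `"idle"` for an empty series avoids flagging instances that simply have no metrics yet.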

For RDS, we check DatabaseConnections:

  • 0 connections over 7 days = flagged as idle
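For the RDS check, a Python sketch of the CloudWatch request and the idle test could look like this. Using hourly datapoints and the `Maximum` statistic is an assumption (the article only says "0 connections over 7 days"), and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def rds_connections_kwargs(db_instance_id: str, now: datetime, days: int = 7) -> dict:
    """Arguments for CloudWatch GetMetricStatistics on DatabaseConnections."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "StartTime": now - timedelta(days=days),
        "EndTime": now,
        "Period": 3600,             # one datapoint per hour (168 over 7 days)
        "Statistics": ["Maximum"],  # peak connections within each hour
    }


def is_idle_database(datapoints: list[dict]) -> bool:
    """Idle means there is data and the hourly peak was zero every time."""
    if not datapoints:
        return False  # missing metrics are not proof of an idle database
    return all(point["Maximum"] == 0 for point in datapoints)
```

Checking the peak rather than the average means a single short-lived connection in the window is enough to keep a database off the idle list.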

Step 4: AI Analysis

This is where it gets interesting. We feed everything into Claude (via AWS Bedrock):

  • Monthly spend total and service breakdown
  • 30-day daily trend
  • List of idle resources
  • Right-sizing candidates

The AI returns structured JSON:

[
  {
    "title": "Right-size or consolidate Lightsail instances",
    "description": "Lightsail represents 42.9% of total spend...",
    "estimatedMonthlySavings": 45.36,
    "priority": "high",
    "effort": "medium"
  }
]

Cost per AI analysis: ~$0.005 (half a cent) using Haiku.
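Model output should not be trusted blindly, so the returned JSON is worth validating before it reaches a dashboard. Here is a minimal Python sketch of that step. The field names come from the JSON example above; the allowed `priority` and `effort` values (low, medium, high) are an assumption, since the example only shows "high" and "medium".

```python
import json
from dataclasses import dataclass

ALLOWED_LEVELS = {"low", "medium", "high"}  # assumed value set


@dataclass(frozen=True)
class Recommendation:
    title: str
    description: str
    estimated_monthly_savings: float
    priority: str
    effort: str


def parse_recommendations(raw: str) -> list[Recommendation]:
    """Parse and validate the model's JSON array of recommendations."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of recommendations")
    result = []
    for item in items:
        priority = str(item["priority"]).lower()
        effort = str(item["effort"]).lower()
        if priority not in ALLOWED_LEVELS or effort not in ALLOWED_LEVELS:
            raise ValueError(f"unexpected priority/effort in {item['title']!r}")
        savings = float(item["estimatedMonthlySavings"])
        if savings < 0:
            raise ValueError("savings cannot be negative")
        result.append(Recommendation(str(item["title"]), str(item["description"]),
                                     savings, priority, effort))
    return result


def total_savings(recommendations: list[Recommendation]) -> float:
    """Sum the estimated monthly savings, rounded to cents."""
    return round(sum(r.estimated_monthly_savings for r in recommendations), 2)
```

Failing loudly on a malformed response makes it easy to retry the analysis rather than show a half-parsed recommendation.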

Results

On my own account ($352/month), it found $135/month in savings:

  • Consolidate Lightsail instances ($45/mo)
  • Remove unused ELB ($20/mo)
  • Audit WorkMail seats ($22/mo)
  • Clean up VPC resources ($18/mo)
  • Plus 6 more recommendations

That's 38% waste. On an AWS consultant's own account.
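For anyone checking the math, the percentage is just savings divided by spend. A tiny Python sketch, using only the numbers quoted above (the function name is made up):

```python
def waste_percentage(monthly_savings: float, monthly_spend: float) -> float:
    """Share of the monthly bill that the recommendations would remove."""
    return 100 * monthly_savings / monthly_spend


# $135 of savings on a $352 bill rounds to the 38% quoted above.
share = round(waste_percentage(135, 352), 1)

# The four listed items add up to $105, so the remaining six
# recommendations account for the other $30 per month.
listed = 45 + 20 + 22 + 18
remaining = 135 - listed
```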

Tech Stack

  • .NET 10 — Razor Pages + Minimal API
  • SQLite — EF Core, lightweight and self-contained
  • AWS Bedrock — Claude Haiku 4.5 for AI analysis
  • Chart.js — Cost trend visualizations
  • Stripe — Billing

The whole thing runs on a single Lightsail instance alongside 7 other .NET apps.

Try It

Free tier available at sharktoothproject.com. Pro ($19/mo) adds idle detection, right-sizing, and AI recommendations.

If you manage AWS for clients, the Team tier ($39/mo) covers unlimited accounts.


Built by Nat Thompson at Obsidian River. Questions? Find me on LinkedIn or drop a comment below.
