DEV Community

Rez Moss for AWS Community Builders


Finding AWS Waste in Go, Part 3: Reporting, a $4,204 Total, and a CI Gate


By now costsweep has the Cost Explorer rightsizing scanner, three EC2 scanners, and a pricing table behind it. At this point it produces a big unordered pile of Findings, which isn't an answer yet. A six-figure-a-month AWS bill hides its leaks because no one adds the scattered charges into one number with a dollar sign and a comma in it.

This post turns that pile into something you act on: sort the findings biggest-first, total them into the annual figure, render them three ways for three audiences, bundle a reproducible demo so anyone can run the tool in two seconds, and add a CI gate so the waste you cleaned up stays cleaned up. This is where the $4,204/year prints.

The Problem with a Pile of Findings

A slice of Findings holds the data but says nothing. To be useful it needs three jobs done, jobs the scanners skip on purpose, because doing them once in a report layer beats doing them four times in four scanners:

  1. Order: biggest waste first, because that's the order you act in.
  2. Totals: the monthly sum, the annual sum, and a per-category breakdown.
  3. Rendering: a human wants an aligned table; a dashboard wants JSON; a pull request wants Markdown.

A Summary type holds the rolled-up view, and a Summarize function is the only place sorting and totaling happen:

type Summary struct {
    Findings     []finding.Finding  `json:"findings"`
    TotalMonthly float64            `json:"total_monthly_usd"`
    TotalAnnual  float64            `json:"total_annual_usd"`
    ByType       map[string]float64 `json:"annual_by_type_usd"`
    Count        int                `json:"count"`
}

// Summarize sorts biggest-waste-first and totals, so all renderers share order.
func Summarize(findings []finding.Finding) Summary {
    sorted := make([]finding.Finding, len(findings))
    copy(sorted, findings)
    slices.SortStableFunc(sorted, func(a, b finding.Finding) int {
        return cmp.Compare(b.MonthlyUSD, a.MonthlyUSD)
    })

    s := Summary{Findings: sorted, ByType: map[string]float64{}, Count: len(sorted)}
    for _, f := range sorted {
        s.TotalMonthly += f.MonthlyUSD
        s.ByType[f.Type] += f.AnnualUSD()
    }
    s.TotalAnnual = s.TotalMonthly * 12
    return s
}

Two small choices are worth defending. I copy before sorting so Summarize doesn't mutate the caller's slice; a function that reorders its input behind your back is a future bug. And I use the generics-era slices.SortStableFunc (in the standard library since Go 1.21) with cmp.Compare(b.MonthlyUSD, a.MonthlyUSD). The argument order, b before a, flips it to descending. The stable sort keeps findings with identical costs (five idle IPs at $3.65) in the order they arrived, which matters the moment you snapshot the output in a test or screenshot it for a blog post.
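Both choices are easy to see in isolation. The sketch below uses a hypothetical item type in place of finding.Finding (only the cost field matters here) and a hypothetical sortedDesc helper that does the same copy-then-stable-sort as Summarize:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// item is a hypothetical stand-in for finding.Finding.
type item struct {
	Resource   string
	MonthlyUSD float64
}

// sortedDesc returns a copy sorted by MonthlyUSD, biggest first.
// The caller's slice is left untouched.
func sortedDesc(in []item) []item {
	out := make([]item, len(in))
	copy(out, in)
	slices.SortStableFunc(out, func(a, b item) int {
		return cmp.Compare(b.MonthlyUSD, a.MonthlyUSD) // b before a: descending
	})
	return out
}

func main() {
	in := []item{
		{"eip-1", 3.65},
		{"vol-1", 8.00},
		{"eip-2", 3.65},
		{"i-1", 30.00},
	}
	for _, it := range sortedDesc(in) {
		fmt.Printf("%-6s %6.2f\n", it.Resource, it.MonthlyUSD)
	}
	// The two 3.65 entries keep their input order (eip-1, then eip-2),
	// and the input slice still starts with eip-1.
	fmt.Println("input still starts with:", in[0].Resource)
}
```

With an unstable sort the two tied entries could legally swap between runs, which is exactly the kind of flakiness a snapshot test trips over.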

Three Renderers, One Order

Each renderer takes a Summary and an io.Writer. The table is the default: hand-sized columns, biggest first, a total line across the bottom. I skipped text/tabwriter and any table library on purpose, because keeping the dependency list at "the AWS SDK and nothing else" is worth a few Fprintf format strings.

func WriteTable(w io.Writer, s Summary) {
    if s.Count == 0 {
        fmt.Fprintln(w, "No waste found. Either you're tidy or the scanners lack permissions.")
        return
    }
    fmt.Fprintf(w, "%-22s %-20s %-12s %10s %10s\n", "TYPE", "RESOURCE", "REGION", "MONTHLY", "ANNUAL")
    fmt.Fprintln(w, strings.Repeat("-", 78))
    for _, f := range s.Findings {
        fmt.Fprintf(w, "%-22s %-20s %-12s %9.2f %9.0f\n",
            trunc(f.Type, 22), trunc(f.Resource, 20), trunc(f.Region, 12), f.MonthlyUSD, f.AnnualUSD())
    }
    fmt.Fprintln(w, strings.Repeat("-", 78))
    fmt.Fprintf(w, "%-56s %9.2f %9.0f\n",
        fmt.Sprintf("TOTAL (%d findings)", s.Count), s.TotalMonthly, s.TotalAnnual)
}
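WriteTable calls a trunc helper that this post doesn't show. A minimal sketch of what such a helper might look like, assuming it cuts on runes and marks the cut with a single "…" (the identifier in main is made up for illustration):

```go
package main

import "fmt"

// trunc shortens s to at most n runes, ending in "…" when it has to cut.
// Hypothetical sketch; the project's real helper may differ.
func trunc(s string, n int) string {
	if n <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n-1]) + "…"
}

func main() {
	// A made-up 27-character identifier, cut to fit a 20-wide column.
	fmt.Printf("[%-20s]\n", trunc("vol-EXAMPLE-LONG-IDENTIFIER", 20))
	// Short values pass through and are padded by the format verb.
	fmt.Printf("[%-20s]\n", trunc("52.20.14.101", 20))
}
```

Counting runes rather than bytes matters because "…" is three bytes in UTF-8, while fmt measures column width in characters, so a byte-based cut would leave the columns ragged.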

That empty-state message earns its keep. The first time I ran the tool against a clean test account it printed "No waste found" and I almost shipped a bug: the cause was a missing IAM permission, not a tidy account. Naming both possibilities in the message saved a future me an hour.

The Markdown renderer is built for one job: pasting into a pull request or a GitHub Actions step summary, with the headline number in the heading so it's visible without expanding anything.

func WriteMarkdown(w io.Writer, s Summary) {
    fmt.Fprintf(w, "### 💸 costsweep: $%.0f/yr of estimated waste across %d findings\n\n",
        s.TotalAnnual, s.Count)
    if s.Count == 0 {
        fmt.Fprintln(w, "_No waste found._")
        return
    }
    fmt.Fprintln(w, "| Type | Resource | Region | Monthly | Annual |")
    fmt.Fprintln(w, "|------|----------|--------|--------:|-------:|")
    for _, f := range s.Findings {
        fmt.Fprintf(w, "| %s | `%s` | %s | $%.2f | $%.0f |\n",
            f.Type, f.Resource, f.Region, f.MonthlyUSD, f.AnnualUSD())
    }
}

JSON is encoding/json with indentation, because a human reads it at least once before a machine does. The struct tags on Summary do the work.

A Reproducible Demo

I wanted anyone to run costsweep without an AWS account, credentials, or setup, and I wanted this blog post's numbers reproducible. Go's embed does it: a sample dataset compiled into the binary, decoded into the same []Finding the live scanners produce.

//go:embed testdata/demo.json
var demoData []byte

func collect(demo bool, region, profile string, snapAgeDays int) ([]finding.Finding, error) {
    if demo {
        var findings []finding.Finding
        if err := json.Unmarshal(demoData, &findings); err != nil {
            return nil, fmt.Errorf("decoding demo data: %w", err)
        }
        return findings, nil
    }
    // ...otherwise build the real AWS clients and run the scanners
}

Because demo mode rejoins the pipeline at the Finding level, every line after it (sorting, totaling, all three renderers, the CI gate) runs on demo data the way it does on a live account. The demo shares the real code path and only swaps the source of findings, so it can't drift from production behavior.

Test Output

The report layer is pure functions over a slice, so it's the easiest thing in the project to test. One test pins the core of the contract (sort order, the annual total, and the per-type rollup):

func TestSummarizeSortsAndTotals(t *testing.T) {
    s := Summarize(sample()) // EIP $3.65, terminate $30, EBS $8

    // Biggest monthly waste must come first.
    if s.Findings[0].Type != "rightsizing-terminate" {
        t.Errorf("first finding = %q, want rightsizing-terminate", s.Findings[0].Type)
    }
    // 3.65 + 30 + 8 = 41.65/mo -> 499.80/yr
    if got := s.TotalAnnual; got < 499.7 || got > 499.9 {
        t.Errorf("TotalAnnual = %.2f, want 499.80", got)
    }
    if s.ByType["unattached-ebs"] != 96.0 {
        t.Errorf("ByType[unattached-ebs] = %.2f, want 96.00", s.ByType["unattached-ebs"])
    }
}

The full run on the bundled demo data, biggest waste first, totals to the number that named this series:

$ costsweep -demo
TYPE                   RESOURCE             REGION          MONTHLY     ANNUAL
------------------------------------------------------------------------------
rightsizing-terminate  i-04e1f7a9c2b3d5e60  us-east-1       138.70      1664
rightsizing-modify     i-09a2bc4d6e8f1a2b3  us-east-1        69.35       832
unattached-ebs         vol-0a1b2c3d4e5f607… us-east-1        50.00       600
stale-snapshot         snap-0d4e5f60718293… us-east-1        25.00       300
unattached-ebs         vol-0b2c3d4e5f60718… us-east-1        16.00       192
stale-snapshot         snap-0e5f60718293a4… us-east-1        15.00       180
stale-snapshot         snap-0f60718293a4b5… us-west-2        10.00       120
unattached-ebs         vol-0c3d4e5f6071829… us-west-2         8.00        96
idle-eip               52.20.14.101         us-east-1         3.65        44
idle-eip               52.20.14.102         us-east-1         3.65        44
idle-eip               52.20.14.103         us-east-1         3.65        44
idle-eip               34.210.9.51          us-west-2         3.65        44
idle-eip               34.210.9.52          us-west-2         3.65        44
------------------------------------------------------------------------------
TOTAL (13 findings)                                         350.30      4204

$350.30/month is $4,203.60/year, which the table rounds to $4,204: two instances flagged by rightsizing (one to terminate, one to modify), a few orphaned volumes, a stack of old snapshots, and five Elastic IPs no one released. Any single line is easy to skip. Added up, they're worth a morning's cleanup.
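The table doesn't print the per-type breakdown that Summary.ByType carries, so here is a small check of the arithmetic. It transcribes the MONTHLY column from the demo run above into a hypothetical row type and rolls it up the same way Summarize does (monthly cost times 12 per finding, summed by type):

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for one line of the demo table.
type row struct {
	Type       string
	MonthlyUSD float64
}

// demoRows transcribes the TYPE and MONTHLY columns of the demo output.
func demoRows() []row {
	return []row{
		{"rightsizing-terminate", 138.70},
		{"rightsizing-modify", 69.35},
		{"unattached-ebs", 50.00},
		{"stale-snapshot", 25.00},
		{"unattached-ebs", 16.00},
		{"stale-snapshot", 15.00},
		{"stale-snapshot", 10.00},
		{"unattached-ebs", 8.00},
		{"idle-eip", 3.65},
		{"idle-eip", 3.65},
		{"idle-eip", 3.65},
		{"idle-eip", 3.65},
		{"idle-eip", 3.65},
	}
}

// rollup returns the monthly total and the annual cost per type.
func rollup(rows []row) (monthly float64, byType map[string]float64) {
	byType = map[string]float64{}
	for _, r := range rows {
		monthly += r.MonthlyUSD
		byType[r.Type] += r.MonthlyUSD * 12
	}
	return monthly, byType
}

func main() {
	monthly, byType := rollup(demoRows())

	// Map iteration order is random, so sort the keys for stable output.
	types := make([]string, 0, len(byType))
	for t := range byType {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Printf("%-22s %8.2f\n", t, byType[t])
	}
	fmt.Printf("monthly %.2f, annual %.2f\n", monthly, monthly*12)
}
```

By this rollup the two rightsizing findings account for roughly $2,497 of the annual total, the three volumes for $888, the three snapshots for $600, and the five Elastic IPs for $219.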

The CI Gate

Cleaning up waste once feels good and doesn't last. Three months later you have a fresh batch of orphaned volumes, because waste is a flow, not a stock. The last feature is a gate: run costsweep in CI and fail the job when annual waste crosses a threshold you set.

The detail that makes this usable is two different non-zero exit codes. A pipeline needs to tell "the tool itself broke" from "the tool ran fine and found too much waste"; conflate them and a credentials error looks like a budget breach:

summary := report.Summarize(findings)
// ...render...

// exit 2 (over budget) != exit 1 (tool broke)
if *failOver >= 0 && summary.OverThreshold(*failOver) {
    fmt.Fprintf(os.Stderr, "\ncostsweep: annual waste $%.0f >= threshold $%.0f\n",
        summary.TotalAnnual, *failOver)
    os.Exit(2)
}

OverThreshold is a one-liner on Summary, which keeps the policy testable without spinning up the binary:

func (s Summary) OverThreshold(annualLimit float64) bool {
    return s.TotalAnnual >= annualLimit
}
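A quick table of cases shows what "testable without the binary" buys. This sketch trims Summary down to the one field OverThreshold reads; the cases are mine, chosen to cover the boundary:

```go
package main

import "fmt"

// Summary is trimmed to the single field OverThreshold uses.
type Summary struct {
	TotalAnnual float64
}

func (s Summary) OverThreshold(annualLimit float64) bool {
	return s.TotalAnnual >= annualLimit
}

func main() {
	cases := []struct {
		annual, limit float64
		want          bool
	}{
		{4204, 1000, true},
		{4204, 6000, false},
		{2000, 2000, true}, // >= means hitting the limit exactly counts as over
		{0, 0, true},       // a zero limit trips even with no waste at all
	}
	for _, c := range cases {
		got := Summary{TotalAnnual: c.annual}.OverThreshold(c.limit)
		fmt.Printf("annual=%.0f limit=%.0f over=%t want=%t\n", c.annual, c.limit, got, c.want)
	}
}
```

The last case is worth knowing about: because the comparison is >=, a threshold of 0 fails a perfectly clean account, so "fail on any waste" is better expressed as a small positive threshold.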

In practice:

$ costsweep -demo -fail-over 1000 ; echo "exit: $?"
... (table) ...
costsweep: annual waste $4204 >= threshold $1000
exit: 2

$ costsweep -demo -fail-over 6000 ; echo "exit: $?"
... (table) ...
exit: 0

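Dropped into a GitHub Actions workflow, that's a nightly job that posts the Markdown table to the run summary and goes red when the account drifts past budget. One caveat: go run does not pass the compiled binary's exit status through (it exits 1 on any failure), so the job still fails, but a pipeline that needs to tell 2 from 1 should install the binary and call it directly. The go run form is the shortest to show: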

- name: AWS waste check
  run: |
    go run github.com/rezmoss/costsweep@latest -format markdown >> "$GITHUB_STEP_SUMMARY"
    go run github.com/rezmoss/costsweep@latest -fail-over 2000

The IAM policy statement it needs is read-only. The tool deletes nothing; it only tells you what to delete:

{
  "Effect": "Allow",
  "Action": [
    "ce:GetRightsizingRecommendation",
    "ce:GetCostAndUsage",
    "ec2:DescribeVolumes",
    "ec2:DescribeAddresses",
    "ec2:DescribeSnapshots"
  ],
  "Resource": "*"
}

Where to Take It

Three posts in, costsweep is a real tool: four scanners, a pricing table, three output formats, a demo mode, and a CI gate, all testable offline behind the interface seam from Part 1. Its limits map to the next features. It scans one region per run, so a -all-regions flag that fans out with a goroutine per region is the obvious next step. RDS, idle load balancers, and old AMIs are three more scanners that slot into the same Scanner interface without touching anything else. The static price book could also fall back to the real Pricing API for exotic regions.

It already does the one thing the AWS console leaves undone: it adds the waste up. On our account that total was $4,204/year, and finding it took one go run. The full source (every scanner, every test, and the demo data behind these numbers) is on GitHub. Clone it, run costsweep -demo, then point it at your own account and see what it adds up to.
