Most cloud cost advice starts at the wrong layer. It jumps straight into optimization tactics (Reserved Instances, Spot capacity, aggressive rightsizing) without first addressing the more fundamental problem: visibility.
Because without visibility, optimization becomes guesswork. And guesswork is expensive.
This guide takes a different approach. Five days. No third-party FinOps platforms. Just native tooling, deliberate structure, and a system engineers will actually use.
Day 1: Tagging Strategy — The Foundation Everything Else Depends On
Every meaningful cost analysis begins with attribution. Without tags, cost data is a monolith. With tags, it becomes dimensional.
Core Tagging Model
A minimal, effective tagging schema:
{
  "team": "platform",
  "service": "auth-api",
  "environment": "production",
  "owner": "team-lead"
}
Enforcing Tags at Resource Creation
aws ec2 run-instances \
  --image-id ami-123456 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=team,Value=platform},{Key=service,Value=auth-api}]'
Tag Compliance Check
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=team
Why This Matters
Tags are not metadata. They are the index keys for your cost database.
No tags → no attribution → no accountability.
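The compliance check above lists tagged resources; flipping it around to find resources that are *missing* required keys is what makes audits actionable. A minimal sketch, assuming the response shape of the Resource Groups Tagging API and the four required keys from the schema above:

```python
# Sketch: audit resources for missing required tag keys.
# REQUIRED_KEYS comes from the schema above; the response shape assumed in
# find_untagged() matches what get-resources returns page by page.

REQUIRED_KEYS = {"team", "service", "environment", "owner"}

def missing_keys(tags: list, required=REQUIRED_KEYS) -> set:
    """Return the required tag keys absent from a resource's Tags list."""
    present = {t["Key"] for t in tags}
    return required - present

def find_untagged(pages) -> dict:
    """Map resource ARN -> missing keys, given get-resources response pages."""
    report = {}
    for page in pages:
        for res in page["ResourceTagMappingList"]:
            gaps = missing_keys(res.get("Tags", []))
            if gaps:
                report[res["ResourceARN"]] = gaps
    return report
```

With boto3, the pages would come from a `get_resources` paginator on the `resourcegroupstaggingapi` client; the pure helpers above can be tested without any AWS credentials.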
Day 2: AWS Cost Explorer API — Pulling Cost Data Programmatically
The console is fine for humans. Systems need APIs.
Basic Cost Query
import boto3
ce = boto3.client('ce')
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},  # End is exclusive
    Granularity="DAILY",
    Metrics=["UnblendedCost"]
)
Group by Service and Team Tag
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},  # End is exclusive
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[
        {"Type": "DIMENSION", "Key": "SERVICE"},
        {"Type": "TAG", "Key": "team"}
    ]
)
Key Insight
Cost data is delayed (~24h), but still actionable.
This becomes your source of truth. Everything else builds on it.
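The grouped response nests one amount per day per group, so most pipelines collapse it into totals before doing anything else. A small pure helper, assuming the `ResultsByTime`/`Groups` shape that `get_cost_and_usage` returns (tag group keys come back in `key$value` form):

```python
# Sketch: collapse a grouped Cost Explorer response into per-group totals.
# Assumes the ResultsByTime / Groups structure returned by get_cost_and_usage.

def total_by_group(response: dict) -> dict:
    """Sum UnblendedCost across all days, keyed by the GroupBy keys tuple."""
    totals = {}
    for day in response["ResultsByTime"]:
        for group in day["Groups"]:
            key = tuple(group["Keys"])  # e.g. ("Amazon EC2", "team$platform")
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key] = totals.get(key, 0.0) + amount
    return totals
```

The resulting `{(service, team): cost}` dict is the shape the export and digest steps on later days consume.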
Day 3: Building Per-Service Cost Dashboards in Grafana
Raw data is inert. Visualization activates it.
Architecture
AWS Cost Explorer → Export Script → JSON/Prometheus → Grafana
Example Export Script
import json
data = response["ResultsByTime"]
with open("cost.json", "w") as f:
    json.dump(data, f)
Prometheus Metric Format
aws_cost{service="EC2",team="platform"} 123.45
Grafana Panel Query
sum by(service) (aws_cost)
Dashboard Views
Cost per service
Cost per team
Daily trend lines
Top 10 spenders
Good dashboards don’t overwhelm. They illuminate.
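The export step in the architecture above has to render totals in the exposition format Grafana's Prometheus data source expects. A minimal sketch matching the `aws_cost` metric shown earlier; the input dict shape (keyed by `(service, team)`) is an assumption, and label values are not escaped:

```python
# Sketch: render per-(service, team) cost totals in Prometheus exposition
# format, matching the aws_cost metric above. Suitable for the node_exporter
# textfile collector; input shape is an assumption, labels are not escaped.

def to_prom(totals: dict) -> str:
    lines = ["# TYPE aws_cost gauge"]
    for (service, team), cost in sorted(totals.items()):
        lines.append(f'aws_cost{{service="{service}",team="{team}"}} {cost:.2f}')
    return "\n".join(lines) + "\n"
```

Writing this output to a `.prom` file on a scrape target is enough for the `sum by(service) (aws_cost)` panel query to work.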
Day 4: Anomaly Detection — Alerting When Cost Spikes Unexpectedly
Spikes happen. Some are valid. Others are not.
Detection must be immediate.
Simple Threshold Alert
aws_cost_daily > 500
Deviation-Based Alert
aws_cost_daily > avg_over_time(aws_cost_daily[7d]) * 1.5
CloudWatch Anomaly Detection
aws cloudwatch put-anomaly-detector \
  --region us-east-1 \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --stat Maximum
Alert Routing
Alert → SNS → Slack / Email
Short spikes matter. Long drifts matter more.
Both need visibility.
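For pipelines that pull daily totals themselves rather than routing through Prometheus, the deviation rule above reduces to a few lines of plain Python. A sketch of that check as a pure function:

```python
# Sketch: the deviation rule above (today > trailing average * factor)
# as a pure function, for pipelines without a Prometheus rule engine.

def is_cost_spike(today: float, history: list, factor: float = 1.5) -> bool:
    """True if today's cost exceeds the trailing-window average by `factor`."""
    if not history:
        return False  # no baseline yet; don't alert on day one
    baseline = sum(history) / len(history)
    return today > baseline * factor
```

Feeding it the last seven daily totals reproduces the `avg_over_time(...[7d]) * 1.5` alert without any extra infrastructure.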
Day 5: The Weekly Cost Digest — An Automated Slack Report Per Team
Dashboards are passive. Digests are proactive.
Engineers rarely check dashboards. They read Slack.
Weekly Cost Digest Script
# cost_digest.py: weekly per-team cost report to Slack
import datetime
import os

import boto3
from slack_sdk import WebClient

ce = boto3.client("ce", region_name="us-east-1")
slack = WebClient(token=os.environ["SLACK_TOKEN"])  # read token from env, not hardcoded

def get_team_costs(team_tag: str, days: int = 7) -> dict:
    end = datetime.date.today()
    start = end - datetime.timedelta(days=days)
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": str(start), "End": str(end)},
        Granularity="DAILY",
        Filter={"Tags": {"Key": "team", "Values": [team_tag], "MatchOptions": ["EQUALS"]}},
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}]
    )
    totals = {}
    for result in resp["ResultsByTime"]:
        for group in result["Groups"]:
            svc = group["Keys"][0]
            cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[svc] = totals.get(svc, 0) + cost
    return totals

def post_digest(team: str, channel: str):
    costs = get_team_costs(team)
    total = sum(costs.values())
    lines = [
        f"*Weekly Cloud Cost Digest — Team: {team}*",
        f"Total (last 7 days): *${total:,.2f}*",
        ""
    ]
    for svc, cost in sorted(costs.items(), key=lambda x: -x[1])[:8]:
        lines.append(f" • {svc}: ${cost:,.2f}")
    slack.chat_postMessage(channel=channel, text="\n".join(lines))

# Run weekly via an EventBridge scheduled rule
if __name__ == "__main__":
    post_digest("platform-team", "#platform-costs")
Scheduling with EventBridge
aws events put-rule \
  --name weekly-cost-digest \
  --schedule-expression "cron(0 9 ? * MON *)"
This creates a ritual. A cadence. Cost becomes visible and social.
Bonus: Cost-per-Request Metrics Using CloudWatch + Lambda
Absolute cost is useful. Unit cost is transformative.
Custom Metric Example
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_data(
    Namespace='AppMetrics',
    MetricData=[
        {
            'MetricName': 'CostPerRequest',
            'Value': 0.002
        }
    ]
)
Formula
Cost per request = Total service cost / Total requests
Now teams optimize efficiency, not just spend.
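The formula above is trivial but worth encoding once, with the one edge case (an idle service) handled explicitly. A sketch whose inputs would come from Cost Explorer and a CloudWatch request-count metric; the values below are illustrative:

```python
# Sketch of the formula above: cost per request = total cost / total requests.
# Inputs would come from Cost Explorer and a request-count metric; the
# numbers used in the test are illustrative only.

def cost_per_request(total_cost: float, total_requests: int) -> float:
    if total_requests == 0:
        return 0.0  # idle service: avoid division by zero
    return total_cost / total_requests
```

The result is what you would publish as the `CostPerRequest` metric via `put_metric_data`, as shown above.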
Azure and GCP Equivalents
Azure
Cost Management API
Azure Monitor
Tags via Resource Manager
GCP
Billing Export to BigQuery
Looker Studio dashboards
Labels for resource tagging
The principles remain identical. Only the APIs differ.
Common Tagging Mistakes (and How to Fix Them)
1. Inconsistent Tag Keys
team vs Team vs TEAM
Fix: Enforce via policy.
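Policy enforcement fixes this going forward; for data already tagged inconsistently, you still need to collapse `team` / `Team` / `TEAM` into one bucket on the analysis side. A minimal sketch:

```python
# Sketch: normalize inconsistent tag keys when aggregating cost data.
# Enforcement at creation time remains the real fix; this cleans up history.

def normalize_tags(tags: list) -> dict:
    """Lower-case keys so team / Team / TEAM collapse into one bucket."""
    return {t["Key"].lower(): t["Value"] for t in tags}
```

Run this over every resource's tag list before grouping, so historical cost rolls up under a single key.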
2. Missing Tags on Critical Resources
Fix: Use SCPs or IAM policies
{
  "Effect": "Deny",
  "Action": "ec2:RunInstances",
  "Condition": {
    "Null": {
      "aws:RequestTag/team": "true"
    }
  }
}
3. Over-Tagging
Too many tags dilute clarity.
Fix: Keep it minimal. Intentional.
FinOps does not begin with optimization. It begins with visibility.
In five days:
Costs become attributable
Dashboards become actionable
Alerts become immediate
Engineers become accountable
And something subtle happens.
Cost stops being a finance concern. It becomes an engineering signal.
That shift, quiet, structural, and profound, is where real savings begin.