varun varde

The FinOps Starter Kit: Making Cloud Cost Visible in 5 Days

Most cloud cost advice starts at the wrong layer. It jumps straight into optimization tactics (Reserved Instances, Spot capacity, aggressive rightsizing) without first addressing the more fundamental problem: visibility.

Because without visibility, optimization becomes guesswork. And guesswork is expensive.

This guide takes a different approach. Five days. No third-party FinOps platforms. Just native tooling, deliberate structure, and a system engineers will actually use.

Day 1: Tagging Strategy - The Foundation Everything Else Depends On

Every meaningful cost analysis begins with attribution. Without tags, cost data is a monolith. With tags, it becomes dimensional.

Core Tagging Model

A minimal, effective tagging schema:

{
  "team": "platform",
  "service": "auth-api",
  "environment": "production",
  "owner": "team-lead"
}

Enforcing Tags at Resource Creation

aws ec2 run-instances \
  --image-id ami-123456 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=team,Value=platform},{Key=service,Value=auth-api}]'

Tag Compliance Check

# Lists only resources that already carry a team tag; untagged resources do not appear
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=team
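Because the filtered call returns only tagged resources, finding the gaps means listing everything (the same call without --tag-filters) and checking each resource against the required keys. A minimal sketch, assuming the get-resources response shape (ResourceTagMappingList entries with ResourceARN and Tags); the helper name find_noncompliant and the sample ARNs are made up for illustration:

```python
REQUIRED_TAGS = {"team", "service", "environment", "owner"}  # schema from Day 1


def find_noncompliant(resource_mappings, required=REQUIRED_TAGS):
    """Return {arn: sorted missing tag keys} for resources lacking required tags.

    `resource_mappings` is the ResourceTagMappingList from get-resources.
    Tag keys are case-sensitive, so "Team" does not satisfy "team".
    """
    report = {}
    for item in resource_mappings:
        present = {tag["Key"] for tag in item.get("Tags", [])}
        missing = sorted(required - present)
        if missing:
            report[item["ResourceARN"]] = missing
    return report


# Hypothetical sample data, shaped like the API response
sample = [
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-aaa",
     "Tags": [{"Key": "team", "Value": "platform"},
              {"Key": "service", "Value": "auth-api"},
              {"Key": "environment", "Value": "production"},
              {"Key": "owner", "Value": "team-lead"}]},
    {"ResourceARN": "arn:aws:s3:::example-bucket",
     "Tags": [{"Key": "Team", "Value": "platform"}]},
]

print(find_noncompliant(sample))
```

Only the second resource is reported: its "Team" key does not count toward the lowercase "team" requirement.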

Why This Matters

Tags are not metadata. They are the index keys for your cost database.

No tags → no attribution → no accountability.

Day 2: AWS Cost Explorer API — Pulling Cost Data Programmatically

The console is fine for humans. Systems need APIs.

Basic Cost Query

import boto3

ce = boto3.client('ce')

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},  # End is exclusive, so this covers all of April
    Granularity="DAILY",
    Metrics=["UnblendedCost"]
)

Group by Service and Team Tag

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},  # End is exclusive, so this covers all of April
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[
        {"Type": "DIMENSION", "Key": "SERVICE"},
        {"Type": "TAG", "Key": "team"}
    ]
)
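The grouped response is nested by day and then by group, which is awkward to chart or export directly. The sketch below flattens it into one row per day, service, and team. It assumes the ResultsByTime shape returned by get_cost_and_usage, where tag group keys arrive as "team$value" (and "team$" for untagged spend); the function name flatten_costs and the sample figures are illustrative only:

```python
def flatten_costs(results_by_time, tag_key="team"):
    """Turn ResultsByTime (grouped by SERVICE, then tag) into flat rows."""
    rows = []
    prefix = tag_key + "$"
    for day in results_by_time:
        date = day["TimePeriod"]["Start"]
        for group in day.get("Groups", []):
            service, tag = group["Keys"]  # same order as the GroupBy list
            team = tag[len(prefix):] if tag.startswith(prefix) else tag
            rows.append({
                "date": date,
                "service": service,
                "team": team or "untagged",  # "team$" means no tag was set
                "cost": float(group["Metrics"]["UnblendedCost"]["Amount"]),
            })
    return rows


# Hypothetical sample, shaped like one day of the API response
sample = [{
    "TimePeriod": {"Start": "2026-04-01", "End": "2026-04-02"},
    "Groups": [
        {"Keys": ["Amazon Elastic Compute Cloud - Compute", "team$platform"],
         "Metrics": {"UnblendedCost": {"Amount": "123.45", "Unit": "USD"}}},
        {"Keys": ["Amazon Simple Storage Service", "team$"],
         "Metrics": {"UnblendedCost": {"Amount": "7.5", "Unit": "USD"}}},
    ],
}]

for row in flatten_costs(sample):
    print(row)
```

Surfacing the "untagged" bucket explicitly is a deliberate choice here: it makes the cost of missing tags visible next to every team's number.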

Key Insight

Cost data is delayed (~24h), but still actionable.

This becomes your source of truth. Everything else builds on it.

Day 3: Building Per-Service Cost Dashboards in Grafana

Raw data is inert. Visualization activates it.

Architecture

AWS Cost Explorer → Export Script → JSON/Prometheus → Grafana

Example Export Script

import json

# `response` is the Cost Explorer result from the Day 2 query
data = response["ResultsByTime"]

with open("cost.json", "w") as f:
    json.dump(data, f)

Prometheus Metric Format

aws_cost{service="EC2",team="platform"} 123.45

Grafana Panel Query

sum by(service) (aws_cost)

Dashboard Views

  • Cost per service

  • Cost per team

  • Daily trend lines

  • Top 10 spenders

Good dashboards don’t overwhelm. They illuminate.

Day 4: Anomaly Detection - Alerting When Cost Spikes Unexpectedly

Spikes happen. Some are valid. Others are not.

Detection should be as fast as the data allows; Cost Explorer figures lag by roughly a day.

Simple Threshold Alert

aws_cost_daily > 500

Deviation-Based Alert

aws_cost_daily > avg_over_time(aws_cost_daily[7d]) * 1.5
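The same deviation rule can be checked outside Prometheus, for instance in the export script itself. The sketch below is one way to express it in Python, with one stated difference: it compares the latest day against the mean of the preceding seven days, whereas avg_over_time over a 7d range includes the current sample. The function name and default values are assumptions mirroring the expression above:

```python
from statistics import mean


def is_cost_anomaly(daily_costs, window=7, factor=1.5):
    """Flag the latest day if it exceeds `factor` times the trailing mean.

    `daily_costs` is ordered oldest first; the last entry is the day under test.
    Returns False when there is not enough history to form a baseline.
    """
    if len(daily_costs) < window + 1:
        return False
    *history, today = daily_costs[-(window + 1):]
    return today > mean(history) * factor


print(is_cost_anomaly([100] * 7 + [160]))  # 160 > 1.5 * 100
print(is_cost_anomaly([100] * 7 + [140]))  # 140 is under the 150 threshold
```

Excluding the day under test from the baseline keeps a large spike from raising its own threshold.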

CloudWatch Anomaly Detection

aws cloudwatch put-anomaly-detector \
  --region us-east-1 \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=Currency,Value=USD \
  --stat Maximum

Alert Routing

Alert → SNS → Slack / Email

Short spikes matter. Long drifts matter more.

Both need visibility.

Day 5: The Weekly Cost Digest - Automated Slack Report Per Team

Dashboards are passive. Digests are proactive.

Engineers rarely check dashboards. They read Slack.

Weekly Cost Digest Script

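# cost_digest.py: weekly per-team cost report to Slack
import datetime
import os

import boto3
from slack_sdk import WebClient

ce = boto3.client('ce', region_name='us-east-1')
# Read the token from the environment instead of hard-coding it
# (SLACK_BOT_TOKEN is an assumed variable name)
slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])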

def get_team_costs(team_tag: str, days: int = 7) -> dict:
    end = datetime.date.today()
    start = end - datetime.timedelta(days=days)

    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": str(start), "End": str(end)},
        Granularity="DAILY",
        Filter={"Tags": {"Key": "team", "Values": [team_tag], "MatchOptions": ["EQUALS"]}},
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}]
    )

    totals = {}
    for result in resp["ResultsByTime"]:
        for group in result["Groups"]:
            svc = group["Keys"][0]
            cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[svc] = totals.get(svc, 0) + cost

    return totals

def post_digest(team: str, channel: str):
    costs = get_team_costs(team)
    total = sum(costs.values())

    lines = [
        f"*Weekly Cloud Cost Digest — Team: {team}*",
        f"Total (last 7 days): *${total:,.2f}*",
        ""
    ]

    for svc, cost in sorted(costs.items(), key=lambda x: -x[1])[:8]:
        lines.append(f"{svc}: ${cost:,.2f}")

    slack.chat_postMessage(channel=channel, text="\n".join(lines))

# Run weekly via EventBridge scheduled rule
# The team value must match the tag value exactly ("platform" in the Day 1 schema)
post_digest("platform", "#platform-costs")

Scheduling with EventBridge

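# --name is required; "weekly-cost-digest" is a placeholder. The cron expression is evaluated in UTC.
aws events put-rule \
  --name weekly-cost-digest \
  --schedule-expression "cron(0 9 ? * MON *)"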

This creates a ritual. A cadence. Cost becomes visible and social.

Bonus: Cost-per-Request Metrics Using CloudWatch + Lambda

Absolute cost is useful. Unit cost is transformative.

Custom Metric Example

import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='AppMetrics',
    MetricData=[
        {
            'MetricName': 'CostPerRequest',
            'Value': 0.002
        }
    ]
)

Formula

Cost per request = Total service cost / Total requests
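As a worked example of the formula, the sketch below computes the unit cost and guards against periods with no traffic. The function name and the figures are illustrative, not taken from a real bill:

```python
def cost_per_request(total_cost, total_requests):
    """Return cost per request, or None when there were no requests."""
    if total_requests <= 0:
        return None  # avoid dividing by zero on idle services
    return total_cost / total_requests


# $200.00 of service cost spread over 100,000 requests
unit_cost = cost_per_request(200.0, 100_000)
print(f"{unit_cost:.4f}")
```

Returning None for an idle period, rather than zero, keeps a service with spend but no traffic from looking perfectly efficient on a dashboard.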

Now teams optimize efficiency, not just spend.

Azure and GCP Equivalents

Azure

  • Cost Management API

  • Azure Monitor

  • Tags via Resource Manager

GCP

  • Billing Export to BigQuery

  • Looker Studio dashboards

  • Labels for resource tagging

The principles remain identical. Only the APIs differ.

Common Tagging Mistakes (and How to Fix Them)

1. Inconsistent Tag Keys

team vs Team vs TEAM

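Fix: Standardize on one casing and enforce it via policy, for example an AWS Organizations tag policy. Tag keys are case-sensitive, so these are three different tags.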
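Before enforcing a standard, it helps to know which variants already exist. A small sketch that groups tag keys differing only by case; the function name find_case_variants is hypothetical, and the input would be the tag keys collected from a resource inventory:

```python
from collections import defaultdict


def find_case_variants(tag_keys):
    """Group tag keys that differ only by letter case.

    Returns {lowercased key: sorted variants} for keys with more than one spelling.
    """
    groups = defaultdict(set)
    for key in tag_keys:
        groups[key.lower()].add(key)
    return {norm: sorted(variants)
            for norm, variants in groups.items()
            if len(variants) > 1}


print(find_case_variants(["team", "Team", "TEAM", "service"]))
```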

2. Missing Tags on Critical Resources

Fix: Use SCPs or IAM policies.

{
  "Effect": "Deny",
  "Action": "ec2:RunInstances",
  "Resource": "arn:aws:ec2:*:*:instance/*",
  "Condition": {
    "Null": {
      "aws:RequestTag/team": "true"
    }
  }
}

3. Over-Tagging

Too many tags dilute clarity.

Fix: Keep it minimal. Intentional.

FinOps does not begin with optimization. It begins with visibility.

In five days:

  • Costs become attributable

  • Dashboards become actionable

  • Alerts become immediate

  • Engineers become accountable

And something subtle happens.

Cost stops being a finance concern. It becomes an engineering signal.

That shift, quiet, structural, and profound, is where real savings begin.