InstaDevOps

Posted on Apr 17 • Originally published at instadevops.com

Cloud Cost FinOps: Cut Your AWS Bill by 40% Without Sacrificing Performance

#aws #cloud #costoptimization #finops

Introduction

Cloud spending has a way of creeping up silently. You start with a few EC2 instances and an RDS database, and eighteen months later your AWS bill is five figures and nobody can explain exactly where the money goes. Engineering says they need all the resources. Finance says the cloud was supposed to be cheaper than on-premise. Both are frustrated.

FinOps - short for Cloud Financial Operations - is the practice of bringing financial accountability to cloud spending. It is not about cutting costs to the bone. It is about making sure every dollar you spend on cloud infrastructure delivers value, and that your team can make informed trade-off decisions between cost, speed, and quality.

This guide covers the practical FinOps strategies that consistently deliver the biggest savings on AWS: commitment discounts, spot instances, right-sizing, cost allocation, and the organizational practices that make cost optimization sustainable.

Understanding Your Current Spend

Before optimizing anything, you need visibility into where your money goes. Without data, you are guessing.

Enable Cost Allocation Tags

Tags are the foundation of cloud cost management. Without them, your bill is one opaque number. With them, you can attribute costs to teams, services, environments, and projects.

At minimum, enforce these tags on all resources:

# Terraform - enforce tagging via default_tags
provider "aws" {
  region = "eu-west-1"

  default_tags {
    tags = {
      Environment = var.environment      # production, staging, development
      Team        = var.team_name        # payments, platform, data
      Service     = var.service_name     # payment-api, user-service
      CostCenter  = var.cost_center      # maps to finance department codes
      ManagedBy   = "terraform"
    }
  }
}

Activate tags in the AWS Billing console under Cost Allocation Tags. It can take up to 24 hours for newly activated tags to appear in Cost Explorer.
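As a rough complement to default_tags, a CI check can flag resources that still slip through without the required keys. The Python sketch below assumes you already have resource tags in dictionary form; the missing_tags helper and the sample inventory are hypothetical and only illustrate the idea.

```python
# Required tag keys, mirroring the default_tags block above.
REQUIRED_TAGS = {"Environment", "Team", "Service", "CostCenter", "ManagedBy"}


def missing_tags(tags):
    """Return the required tag keys that are absent or empty, sorted by name."""
    return sorted(key for key in REQUIRED_TAGS if not tags.get(key))


# Hypothetical inventory: resource ID -> tag dictionary.
resources = {
    "i-0aaa": {
        "Environment": "production",
        "Team": "payments",
        "Service": "payment-api",
        "CostCenter": "cc-101",
        "ManagedBy": "terraform",
    },
    "vol-0bbb": {"Environment": "staging", "Team": ""},
}

for resource_id, tags in resources.items():
    gaps = missing_tags(tags)
    if gaps:
        print(f"{resource_id} is missing: {', '.join(gaps)}")
```

An empty value is treated the same as a missing key here, since an empty Team tag is no more useful for cost attribution than no tag at all.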

AWS Cost Explorer and CUR

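Cost Explorer gives you quick visibility. For deeper analysis, enable the Cost and Usage Report (CUR), which exports detailed billing data to S3. The CUR API is only served from us-east-1, so the resource below needs a provider configured for that region even when the bucket lives elsewhere: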

resource "aws_cur_report_definition" "cost_report" {
  report_name                = "daily-cost-report"
  time_unit                  = "DAILY"
  format                     = "Parquet"
  compression                = "Parquet"
  additional_schema_elements = ["RESOURCES"]

  s3_bucket = aws_s3_bucket.cost_reports.id
  s3_region = "eu-west-1"
  s3_prefix = "cur"

  report_versioning = "OVERWRITE_REPORT"
}

Query the CUR data with Athena to answer questions like "How much did the payments team spend on RDS last quarter?"

Set Up Budget Alerts

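Never let a bill surprise you. Create budgets with alerts at several thresholds, such as 50%, 80%, and 100%. The example below alerts when actual spend passes 80% of the budget and when forecasted spend is on track to exceed 100%: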

resource "aws_budgets_budget" "monthly" {
  name         = "monthly-total"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["engineering@company.com"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["engineering@company.com", "finance@company.com"]
  }
}

Reserved Instances vs Savings Plans

Commitment discounts are the single largest cost lever for most AWS accounts, typically saving 30-60% over on-demand pricing.

Savings Plans (Recommended for Most Teams)

Savings Plans offer flexibility that Reserved Instances lack. You commit to a dollar amount per hour of compute usage, and AWS applies the discount automatically.

Compute Savings Plans cover EC2, Fargate, and Lambda across all regions and instance families. They offer the most flexibility:

On-demand: $0.0416/hour (t3.medium, eu-west-1)
1-year Compute SP (no upfront): $0.0270/hour - 35% savings
3-year Compute SP (all upfront): $0.0166/hour - 60% savings

EC2 Instance Savings Plans are locked to a specific instance family and region but offer slightly deeper discounts.

How to Calculate Your Commitment

Open Cost Explorer and view the last 3 months of EC2/Fargate spend
Identify your baseline (the minimum spend that is consistent every day)
Commit to 70-80% of that baseline with Savings Plans
Cover the remaining variable usage with on-demand and spot

Example:
  Average daily compute spend: $200/day
  Minimum daily spend (baseline): $160/day
  Usage to cover: $160 * 0.75 = $120/day = $5.00/hour at on-demand rates
  SP commitment at a 35% discount: $5.00 * 0.65 = $3.25/hour

  Annual savings: $120/day * 365 * 0.35 = $15,330/year
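The arithmetic above can be sketched in Python. One detail worth making explicit: AWS expresses a Savings Plan commitment in dollars per hour at the discounted rate, so the function reports both the on-demand usage being covered and the commitment needed to cover it. The function name, the 75% coverage default, and the 35% discount are assumptions taken from the example, not fixed AWS values.

```python
def savings_plan_commitment(baseline_per_day, coverage=0.75, discount=0.35):
    """Size a Savings Plan from a daily on-demand baseline.

    baseline_per_day: minimum daily compute spend at on-demand rates (USD).
    coverage: fraction of the baseline to cover (0.70-0.80 in the text).
    discount: assumed Savings Plan discount versus on-demand.
    """
    covered_per_day = baseline_per_day * coverage      # on-demand dollars
    covered_per_hour = covered_per_day / 24
    # The commitment is billed at the discounted rate, so it is smaller
    # than the on-demand usage it covers.
    commitment_per_hour = covered_per_hour * (1 - discount)
    annual_savings = covered_per_day * 365 * discount
    return {
        "covered_on_demand_per_hour": round(covered_per_hour, 2),
        "commitment_per_hour": round(commitment_per_hour, 2),
        "annual_savings": round(annual_savings, 2),
    }


plan = savings_plan_commitment(160)
for name, value in plan.items():
    print(f"{name}: ${value:,.2f}")
```

With the $160/day baseline this covers $5.00/hour of on-demand usage for a commitment of about $3.25/hour. Re-run it with your own baseline and the discount shown in the Savings Plans purchase recommendations.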

Common Mistake: Over-Committing

Do not commit to 100% of current usage. Your architecture will change, services will be rewritten, and traffic patterns will shift. Commit conservatively and re-evaluate quarterly.

Spot Instances: Up to 90% Savings

Spot instances use AWS's spare capacity at steep discounts (60-90% off on-demand), but can be interrupted with two minutes' notice.

Where Spot Works Well

CI/CD build runners
Batch processing and data pipelines
Dev/staging environments
Stateless web workers behind a load balancer (with enough capacity diversity)
Machine learning training jobs with checkpointing

Where Spot Does Not Work

Single-instance databases
Stateful services without replication
Anything where a 2-minute shutdown causes data loss

ECS Spot with Capacity Providers

resource "aws_ecs_capacity_provider" "spot" {
  name = "spot-provider"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.spot.arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      maximum_scaling_step_size = 5
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}

resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = [
    aws_ecs_capacity_provider.spot.name,
    "FARGATE",
    "FARGATE_SPOT"
  ]

  default_capacity_provider_strategy {
    base              = 2          # 2 tasks always on regular Fargate
    weight            = 1
    capacity_provider = "FARGATE"
  }

  default_capacity_provider_strategy {
    weight            = 3          # 3x weight on Fargate Spot
    capacity_provider = "FARGATE_SPOT"
  }
}

This keeps a baseline of 2 tasks on regular Fargate. Beyond that baseline, tasks are split 1:3 between Fargate and Fargate Spot, which is discounted by up to 70%.
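To get a feel for how base and weight interact, the Python sketch below approximates the split for a given task count. It is a simplified model, not ECS's actual scheduler: the distribute_tasks function is hypothetical, and the way rounding leftovers are assigned is an assumption.

```python
def distribute_tasks(total, strategy):
    """Approximate how tasks spread across capacity providers.

    strategy: list of dicts with 'provider', 'weight', and optional 'base'.
    """
    counts = {item["provider"]: 0 for item in strategy}
    remaining = total

    # Step 1: satisfy each provider's base before any weighting applies.
    for item in strategy:
        take = min(item.get("base", 0), remaining)
        counts[item["provider"]] += take
        remaining -= take

    # Step 2: split what is left in proportion to the weights.
    total_weight = sum(item["weight"] for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts
    shares = {
        item["provider"]: remaining * item["weight"] // total_weight
        for item in strategy
    }
    leftover = remaining - sum(shares.values())
    # Assumption: rounding leftovers go to the heaviest providers first.
    for item in sorted(strategy, key=lambda entry: -entry["weight"]):
        if leftover == 0:
            break
        shares[item["provider"]] += 1
        leftover -= 1

    for provider, share in shares.items():
        counts[provider] += share
    return counts


# Mirrors the default_capacity_provider_strategy blocks above.
STRATEGY = [
    {"provider": "FARGATE", "base": 2, "weight": 1},
    {"provider": "FARGATE_SPOT", "weight": 3},
]

for task_count in (2, 6, 10):
    print(task_count, distribute_tasks(task_count, STRATEGY))
```

The takeaway is that regular Fargate keeps growing with the service, just more slowly than Spot. If you want everything above the baseline on Spot, set the FARGATE weight to 0.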

Right-Sizing: Stop Paying for Idle Resources

Most organizations over-provision by 30-50%. Right-sizing means matching resource allocation to actual usage.

Identifying Oversized Instances

Use AWS Compute Optimizer (free) or query CloudWatch directly:

# Check average and peak CPU utilization for an instance over the last 2 weeks.
# The start time uses BSD/macOS date syntax (-v-14d). On GNU/Linux use:
#   date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%S
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time "$(date -u -v-14d +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 3600 \
  --statistics Average Maximum \
  --output table

Rules of thumb:

Average CPU under 20%: downsize by one tier
Average CPU under 10%: downsize by two tiers or consider Fargate
Max CPU consistently under 40%: safe to downsize one tier
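These rules of thumb are easy to encode so they can be applied across a fleet. The sketch below is one possible reading of them in Python; the function name and the order in which the rules are checked are assumptions, and the inputs are the Average and Maximum CPU percentages over the observation window.

```python
def rightsizing_recommendation(avg_cpu, max_cpu):
    """Apply the CPU rules of thumb above to percentages in the 0-100 range."""
    if avg_cpu < 10:
        return "downsize two tiers or consider Fargate"
    if avg_cpu < 20 or max_cpu < 40:
        return "downsize one tier"
    return "keep current size"


# Hypothetical fleet: instance ID -> (average CPU %, maximum CPU %).
fleet = {
    "i-0aaa": (8.0, 30.0),
    "i-0bbb": (15.0, 70.0),
    "i-0ccc": (25.0, 35.0),
    "i-0ddd": (45.0, 90.0),
}

for instance_id, (avg_cpu, max_cpu) in fleet.items():
    print(instance_id, "->", rightsizing_recommendation(avg_cpu, max_cpu))
```

CPU is only half the picture. EC2 does not publish memory metrics to CloudWatch by default, so install the CloudWatch agent and check memory before acting on a downsize recommendation.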

Graviton Instances: 20% Better Price-Performance

AWS Graviton (ARM) instances offer roughly 20% better price-performance than equivalent x86 instances. If your application runs on Linux and does not depend on x86-specific binaries, switching is straightforward:

t3.medium  (x86): $0.0416/hour
t4g.medium (ARM): $0.0336/hour - 19% cheaper for the same vCPU and memory

For containerized workloads, build multi-arch images:

docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .

Storage and Data Transfer Optimization

S3 Lifecycle Policies

Most S3 data is accessed frequently for a short period and then rarely again:

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-old-logs"
    status = "Enabled"

    # Empty filter applies the rule to every object in the bucket
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"        # ~45% cheaper than Standard
    }

    transition {
      days          = 90
      storage_class = "GLACIER_IR"         # up to 68% cheaper than Standard-IA, millisecond retrieval
    }

    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"       # ~95% cheaper than Standard
    }

    expiration {
      days = 730  # Delete after 2 years
    }
  }
}
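To estimate what a lifecycle rule like this is worth, you can compare the storage cost of one GB over its two-year life with and without the transitions. The Python sketch below does that under loud assumptions: the per-GB-month prices are illustrative list prices, a month is treated as 30 days, and transition requests, retrieval fees, and minimum storage durations are ignored.

```python
# Assumed list prices in USD per GB-month. Illustrative only; check the
# S3 pricing page for your region before relying on them.
PRICES = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

# (storage class, first day, day the object leaves the class),
# mirroring the lifecycle rule above.
SCHEDULE = [
    ("STANDARD", 0, 30),
    ("STANDARD_IA", 30, 90),
    ("GLACIER_IR", 90, 365),
    ("DEEP_ARCHIVE", 365, 730),
]


def lifetime_cost_per_gb(schedule, prices, days_per_month=30):
    """Sum the storage cost of 1 GB across every stage of the schedule."""
    return sum(
        prices[storage_class] * (end - start) / days_per_month
        for storage_class, start, end in schedule
    )


tiered = lifetime_cost_per_gb(SCHEDULE, PRICES)
flat = lifetime_cost_per_gb([("STANDARD", 0, 730)], PRICES)
print(f"with lifecycle: ${tiered:.4f} per GB over 2 years")
print(f"Standard only:  ${flat:.4f} per GB over 2 years")
print(f"storage saving: {1 - tiered / flat:.0%}")
```

Under these assumptions the tiered schedule cuts storage cost by a little over 80%. For buckets full of small objects the real figure will be lower, because per-request transition charges and minimum billable object sizes start to matter.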

EBS Volume Optimization

Switch gp2 volumes to gp3: about 20% cheaper per GB, with a 3,000 IOPS baseline that matches or beats gp2 for most volumes, and IOPS and throughput that scale independently of size
Delete unattached EBS volumes (check monthly)
Snapshot and delete volumes that hold infrequently accessed data instead of keeping them provisioned

# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}' \
  --output table

NAT Gateway Costs

NAT Gateways charge $0.045/hour ($32.85/month) plus $0.045/GB of data processed. For high-traffic workloads, this adds up fast.
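A quick way to see how fast this adds up is to plug your own traffic into the formula. The Python sketch below uses the rates quoted above as defaults and assumes a 730-hour month; the function name and the traffic volumes are hypothetical.

```python
def nat_gateway_monthly_cost(gb_processed, hourly_rate=0.045,
                             per_gb_rate=0.045, hours=730):
    """Monthly cost of one NAT Gateway: fixed hourly charge plus data processing."""
    return hourly_rate * hours + per_gb_rate * gb_processed


# Hypothetical monthly traffic volumes in GB.
for gb in (0, 500, 5000, 50000):
    print(f"{gb:>6} GB/month -> ${nat_gateway_monthly_cost(gb):,.2f}")
```

The fixed charge is the same per gateway regardless of traffic, so multiply it by the number of gateways if you run one per availability zone.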

Alternatives:

VPC Endpoints for AWS services (S3, DynamoDB, ECR) - eliminates NAT data charges for AWS API calls
NAT instances on t4g.nano ($3.07/month) for dev/staging environments

# S3 Gateway Endpoint (no hourly or data charge; removes NAT charges for S3 traffic)
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.eu-west-1.s3"
  route_table_ids = [aws_route_table.private.id]
}

# ECR Interface Endpoint (billed hourly per AZ plus per GB, so compare against NAT costs).
# Image pulls also need an ecr.dkr interface endpoint and the S3 gateway endpoint above.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.eu-west-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true
}

Building a FinOps Culture

Tools alone do not reduce cloud costs. You need organizational practices that create accountability.

Weekly Cost Reviews

Run a 15-minute weekly meeting reviewing:

Total spend vs budget
Week-over-week change and top 3 drivers of increase
Top 5 most expensive services
Any anomalies or unexpected charges
Action items from last week
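The week-over-week numbers for that agenda are simple to compute once you have per-service totals, for example exported from Cost Explorer. The Python sketch below is one way to do it; the function name and the sample figures are hypothetical.

```python
def top_increase_drivers(last_week, this_week, n=3):
    """Return the n services with the largest week-over-week cost increase."""
    services = set(last_week) | set(this_week)
    deltas = {
        service: this_week.get(service, 0) - last_week.get(service, 0)
        for service in services
    }
    increases = [(service, delta) for service, delta in deltas.items() if delta > 0]
    # Largest increase first; ties are broken alphabetically for stable output.
    return sorted(increases, key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical weekly spend per service in USD.
last_week = {"EC2": 1200, "RDS": 800, "S3": 150, "NAT Gateway": 90}
this_week = {"EC2": 1350, "RDS": 790, "S3": 210, "NAT Gateway": 260, "Lambda": 20}

total_change = sum(this_week.values()) - sum(last_week.values())
print(f"Week-over-week change: ${total_change:+,}")
for service, delta in top_increase_drivers(last_week, this_week):
    print(f"  {service}: +${delta:,}")
```

Services that appear for the first time count as a full increase, which is usually what you want: a new line item on the bill deserves a mention in the review.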

Team-Level Cost Dashboards

Give each team visibility into their own spending (this is where tags become critical). When engineers can see that their service costs $2,400/month, they start caring about optimization.

Architecture Decision Records for Cost

For significant infrastructure decisions, document the cost implications:

# ADR-015: Switch payment-api from t3.xlarge to t4g.large

## Context
Payment API currently runs on 3x t3.xlarge ($0.1664/hr each).
Average CPU: 22%. Average memory: 35%.

## Decision
Migrate to 3x t4g.large (ARM, $0.0672/hr each).
CPU and memory are sufficient based on 30-day CloudWatch data.

## Cost Impact
Before: 3 * $0.1664 * 730 = $364.42/month
After:  3 * $0.0672 * 730 = $147.17/month
Savings: $217.25/month ($2,607/year)

Quick Wins Checklist

If you are just starting with FinOps, tackle these in order. Each one can be done in a day or less:

Delete unused resources - unattached EBS volumes, old snapshots, idle load balancers, stopped instances with EBS volumes
Enable S3 Intelligent-Tiering on buckets with unpredictable access patterns
Switch gp2 to gp3 EBS volumes (20% savings, no performance loss)
Add VPC endpoints for S3 and DynamoDB (eliminates NAT data charges)
Purchase Savings Plans covering 70% of your steady-state compute
Set up budget alerts so you know before the bill arrives
Enable Compute Optimizer recommendations (free)
Review and clean up old ECR images, CloudWatch log groups, and Lambda versions

Need Help with Your DevOps?

Cloud cost optimization is an ongoing practice, not a one-time project. At InstaDevOps, we help startups and growing companies implement FinOps practices that reduce cloud spending by 30-50% while maintaining the performance and reliability your users expect.

Plans start at $2,999/mo for a dedicated fractional DevOps engineer.

Book a free 15-minute consultation to get a cloud cost assessment.
