Darian Vance

Posted on Jan 7 • Edited on Jan 20 • Originally published at wp.me

Solved: Had to hop on the Stranger Things hype, tried connecting it with FinOps. Thoughts?

#devops #programming #tutorial #cloud

🚀 Executive Summary

TL;DR: Unpredictable cloud spend, characterized by spikes and resource sprawl, mirrors the chaos of the Upside Down. This post outlines a FinOps strategy, inspired by Stranger Things, to bring order through proactive cost governance, anomaly detection, and automated remediation, ultimately closing the gate on overspending.

🎯 Key Takeaways

Proactive cost governance enforces policies like mandatory tagging (e.g., ‘CostCenter’ on AWS EC2 via AWS Config Rules or Azure Resource Groups via Azure Policy) to ensure accountability and visibility from resource deployment.
Anomaly detection systems, such as AWS Cost Anomaly Detection, utilize machine learning to identify unusual spending patterns and trigger alerts for rapid investigation and mitigation of unexpected cost surges.
Automated remediation leverages serverless functions (e.g., AWS Lambda, Azure Functions) to programmatically correct cost issues, like stopping idle EC2 instances lacking a ‘KeepAlive’ tag or deleting unattached Azure managed disks.

The unpredictable nature of cloud spend can feel like battling a creature from the Upside Down. This post explores FinOps strategies, inspired by Stranger Things, to bring order and control to your cloud costs through proactive governance, anomaly detection, and automated remediation.

The Upside Down of Cloud Spend: Connecting Stranger Things to FinOps

The world of cloud computing, for all its promise of agility and scalability, often mirrors the chaotic unpredictability of Hawkins, Indiana. Just as residents might suddenly face a Demogorgon or a flickering light indicating otherworldly interference, IT professionals frequently encounter unexpected spikes in cloud bills, unoptimized resources, or shadow IT projects consuming vast sums. This isn’t just a nuisance; it’s the Upside Down of your budget, a dimension where resources run rampant and costs spiral out of control.

Connecting the dots between the terrifying chaos of Stranger Things and the strategic discipline of FinOps might seem like a stretch, but the parallels are surprisingly apt. FinOps, at its core, is about bringing financial accountability and operational excellence to the variable spend model of cloud. It’s about taming the unknown, establishing visibility, and empowering teams to make informed, cost-conscious decisions – much like the residents of Hawkins banding together to understand and combat the threats from another dimension.

Symptoms: The Shadow Monster Lurking in Your Cloud Bills

Before we can fight the monsters, we need to recognize their signs. In the FinOps realm, these symptoms often manifest as:

Uncontrolled Spend Spikes: Sudden, unexplained surges in your monthly cloud bill, often attributed to a “mystery factor” or a resource left running by an engineer who’s since moved on. This is your Demogorgon-level surprise.
Lack of Visibility: You know you’re spending money, but you can’t easily pinpoint where it’s going, who owns what, or why certain resources exist. It’s like trying to navigate the Upside Down blindfolded.
Resource Sprawl: A proliferation of idle VMs, unattached storage volumes, old snapshots, or over-provisioned instances across multiple accounts and regions. The Mind Flayer’s tendrils reaching everywhere.
Missed Optimization Opportunities: Failing to leverage reserved instances, savings plans, spot instances, or right-sizing recommendations due to lack of awareness or process. Leaving the gate to significant savings wide open.
Siloed Teams and Blame Games: Engineering, Finance, and Operations teams speaking different languages, working with different data, and pointing fingers when cost issues arise. Different dimensions, zero communication.
Budget Overruns: Consistently exceeding allocated budgets for cloud resources, leading to difficult conversations with finance and impacting future project approvals.

Solution 1: Proactive Cost Governance and the Hawkins Lab Approach

Just as Hawkins Lab attempted (with mixed success) to study and control the anomaly, proactive cost governance establishes foundational structures to manage cloud spend from the outset. This solution focuses on prevention through strict policies, mandatory tagging, and clear accountability.

Concept: Implement policies that enforce tagging, resource lifecycle management, and cost allocation best practices. This ensures every resource deployed has an owner, purpose, and associated cost center, providing immediate visibility and accountability.
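
To make "every resource has an owner, purpose, and cost center" concrete, here is a minimal sketch of a tag check you could run in a pipeline before deployment. Only 'CostCenter' comes from this post; the 'Owner' and 'Purpose' keys and the helper name missing_required_tags are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Assumed tagging standard: only 'CostCenter' is named in this post;
# 'Owner' and 'Purpose' are illustrative placeholders.
REQUIRED_TAGS = ("CostCenter", "Owner", "Purpose")


def missing_required_tags(tags: Dict[str, str],
                          required: Iterable[str] = REQUIRED_TAGS) -> List[str]:
    """Return required tag keys that are absent or have an empty value."""
    return [key for key in required if not tags.get(key, "").strip()]


if __name__ == "__main__":
    resource_tags = {"CostCenter": "1234", "Owner": ""}
    print(missing_required_tags(resource_tags))  # ['Owner', 'Purpose']
```

An empty result means the resource meets the standard; anything else is a list you can put straight into a failed-build message.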

Real Example: Enforcing AWS Tags with AWS Config

You can use AWS Organizations Service Control Policies (SCPs) or AWS Config rules to enforce mandatory tagging. Let’s look at an AWS Config rule example to ensure all EC2 instances have a ‘CostCenter’ tag.

# Deploying an AWS Config rule using CloudFormation.
# The managed REQUIRED_TAGS rule flags EC2 instances that lack a 'CostCenter' tag.
# Assumes AWS Config recording is already enabled in the account and region.

Resources:
  EC2CostCenterTagRule:
    Type: AWS::Config::ConfigRule
    Properties:
      ConfigRuleName: RequiredTagsForEC2
      Description: Checks if EC2 instances have a 'CostCenter' tag.
      Source:
        # Managed rules define their own trigger, so SourceDetails and
        # MaximumExecutionFrequency are omitted; REQUIRED_TAGS runs on configuration changes.
        Owner: AWS
        SourceIdentifier: REQUIRED_TAGS
      Scope:
        ComplianceResourceTypes:
          - AWS::EC2::Instance
      InputParameters:
        tag1Key: CostCenter

Outputs:
  ConfigRuleARN:
    Description: ARN of the Config Rule
    Value: !GetAtt EC2CostCenterTagRule.Arn

Once deployed, any EC2 instance without a ‘CostCenter’ tag is marked as non-compliant, providing data for remediation. AWS Config only records the finding; to get alerts, route its compliance-change events through Amazon EventBridge to a target such as an SNS topic. For stricter enforcement, you can combine this with an SCP that denies resource creation if specific tags are missing.
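
For the SCP side, a minimal sketch: the helper below builds a policy document that denies ec2:RunInstances when the request carries no ‘CostCenter’ tag. The function name build_require_tag_scp and the Sid are made up for illustration, and the policy only covers instance launches, so treat it as a starting point rather than a complete guardrail.

```python
import json


def build_require_tag_scp(tag_key: str) -> dict:
    """Build an SCP that denies ec2:RunInstances when `tag_key` is not in the request."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # The Null operator evaluates to true when the request tag key is absent.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would paste into an SCP in AWS Organizations.
    print(json.dumps(build_require_tag_scp("CostCenter"), indent=2))
```

Test a policy like this in a sandbox organizational unit first, since a Deny in an SCP applies to every principal in the accounts it is attached to.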

Azure Policy for Mandatory Tagging

Azure Policy allows you to define policies that audit or enforce tagging standards. Here’s a policy definition JSON that requires a ‘CostCenter’ tag on resource groups:

{
  "properties": {
    "displayName": "Require 'CostCenter' tag on Resource Groups",
    "policyType": "Custom",
    "mode": "All",
    "description": "Denies creation of Resource Groups that do not have the tag named in the 'tagName' parameter.",
    "parameters": {
      "tagName": {
        "type": "String",
        "metadata": {
          "displayName": "Tag Name",
          "description": "Name of the tag to enforce (e.g., CostCenter)"
        },
        "defaultValue": "CostCenter"
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Resources/subscriptions/resourceGroups"
          },
          {
            "field": "[concat('tags[', parameters('tagName'), ']')]",
            "exists": "false"
          }
        ]
      },
      "then": {
        "effect": "deny"
      }
    }
  }
}

This policy, when assigned, would deny the creation of any new Resource Group that lacks the ‘CostCenter’ tag, acting as a gatekeeper against untagged resources.

Solution 2: Anomaly Detection and the Demogorgon Sighting System

Even with proactive governance, the cloud environment is dynamic, and unexpected cost surges can still occur. This is where anomaly detection comes in – a system designed to alert you to unusual spending patterns, much like the flickering lights and strange noises signaling a Demogorgon’s presence.

Concept: Utilize native cloud provider tools or third-party solutions to continuously monitor spend patterns. When spending deviates significantly from historical norms, an alert is triggered, allowing for immediate investigation and mitigation.
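
To illustrate what "deviates significantly from historical norms" can mean, here is a deliberately simple sketch that flags a day when its cost sits far above a trailing-window average. This is a plain z-score check, not the machine learning model any cloud provider uses; the function name, the 7-day window, and the thresholds are assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_spend_anomalies(daily_costs: Sequence[float],
                         window: int = 7,
                         z_threshold: float = 3.0,
                         min_impact: float = 0.0) -> List[int]:
    """Return indexes of days whose cost is far above the trailing-window norm."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)  # requires window >= 2
        impact = daily_costs[i] - baseline
        if spread == 0:
            # Perfectly flat history: any increase counts as unusual.
            is_outlier = impact > 0
        else:
            is_outlier = impact / spread > z_threshold
        # min_impact filters out statistically odd but financially trivial days.
        if is_outlier and impact >= min_impact:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    costs = [100, 102, 98, 101, 99, 103, 100, 250, 101]
    print(find_spend_anomalies(costs, min_impact=50))  # [7]
```

The min_impact parameter plays the same role as a dollar threshold on an alert: it keeps small blips from paging anyone.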

Real Example: AWS Cost Anomaly Detection

AWS Cost Anomaly Detection uses machine learning to identify unusual spending and alerts you. You can set it up via the AWS Console or programmatically.

# AWS CLI command to create an anomaly monitor.
# This monitor evaluates spend per AWS service for the account.

aws ce create-anomaly-monitor \
    --anomaly-monitor '{"MonitorName": "Daily_Account_Spend_Monitor", "MonitorType": "DIMENSIONAL", "MonitorDimension": "SERVICE"}' \
    --resource-tags Key=Project,Value=FinOps

# Create an anomaly subscription to receive a daily email summary.
# Replace the monitor ARN with the MonitorArn returned by the command above.
aws ce create-anomaly-subscription \
    --anomaly-subscription '{
        "SubscriptionName": "High_Spend_Alerts",
        "Frequency": "DAILY",
        "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/b42d1f05-b7f7-43ce-a1a7-f5c7e19d7d96"],
        "Subscribers": [{"Type": "EMAIL", "Address": "finops-alerts@yourcompany.com"}],
        "ThresholdExpression": {
            "Dimensions": {
                "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                "Values": ["100"]
            }
        }
    }'

With this setup, your FinOps team receives a daily email summarizing detected anomalies whose total cost impact is at least $100. Because the monitor evaluates each service separately (monitor dimension “SERVICE”), the alert pinpoints which service is behind the anomaly, making investigation faster.
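
Once anomalies are flowing in, a quick way to triage them is to total the impact per service. The sketch below works on plain dictionaries shaped like what I understand the Cost Explorer GetAnomalies response to contain (DimensionValue and Impact.TotalImpact); verify the field names against a real response before relying on them. The sample records and the function name impact_by_service are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable


def impact_by_service(anomalies: Iterable[dict],
                      min_impact: float = 100.0) -> Dict[str, float]:
    """Sum TotalImpact per service for anomalies at or above min_impact."""
    totals: Dict[str, float] = defaultdict(float)
    for anomaly in anomalies:
        impact = float(anomaly["Impact"]["TotalImpact"])
        if impact >= min_impact:
            # For a SERVICE monitor, DimensionValue is assumed to hold the service name.
            totals[anomaly.get("DimensionValue", "Unknown")] += impact
    # Largest offenders first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    sample = [  # made-up records for illustration
        {"DimensionValue": "EC2", "Impact": {"TotalImpact": 250.0}},
        {"DimensionValue": "S3", "Impact": {"TotalImpact": 40.0}},
        {"DimensionValue": "EC2", "Impact": {"TotalImpact": 120.5}},
    ]
    print(impact_by_service(sample))
```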

Azure Cost Management Alerts

Azure Cost Management provides budget alerts that notify you when your spend reaches a certain percentage of your budget. While not strictly “anomaly” detection in the ML sense, it serves a similar purpose for budgeted thresholds.

# Azure CLI command to create a budget.
# This creates a monthly cost budget of 10,000 for a subscription.
# To my knowledge, 'az consumption budget create' has no notification flags,
# so configure the 80% email alert on the budget in the Azure portal or through
# the Budgets REST API or an ARM/Bicep template.

az consumption budget create \
    --budget-name "MonthlyFinOpsBudget" \
    --amount 10000 \
    --time-grain monthly \
    --start-date "2023-11-01" \
    --end-date "2024-11-01" \
    --category cost \
    --subscription "your-subscription-id"
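
The logic behind a budget alert is simple enough to sketch: compare month-to-date spend with the budget and report which percentage thresholds have been reached. The 80% threshold comes from the example in this post; the 50% and 100% values and the function name crossed_thresholds are assumptions.

```python
from typing import List, Sequence


def crossed_thresholds(actual_spend: float,
                       budget_amount: float,
                       thresholds: Sequence[int] = (50, 80, 100)) -> List[int]:
    """Return the percentage thresholds that current spend has reached."""
    if budget_amount <= 0:
        raise ValueError("budget_amount must be positive")
    percent_used = actual_spend * 100 / budget_amount
    return [t for t in sorted(thresholds) if percent_used >= t]


if __name__ == "__main__":
    # 8,200 spent against a 10,000 budget is 82% used.
    print(crossed_thresholds(8200, 10000))  # [50, 80]
```

Unlike anomaly detection, this fires on a fixed line in the sand, so steady overspend triggers it even when no single day looks unusual.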

For anomaly detection in the ML sense, Azure Cost Management also offers anomaly alerts at the subscription scope. For more tailored insights, you can bring cost data into Azure Monitor and Log Analytics and run custom Kusto queries on it, or use a third-party FinOps platform that offers ML-driven anomaly detection.

Solution 3: Automated Remediation and Eleven’s Telekinetic Optimization

Once an anomaly is detected or a non-compliant resource identified, manual intervention can be slow and error-prone. This is where automated remediation comes in – programmatic actions to correct cost issues, much like Eleven using her powers to fix problems from a distance.

Concept: Develop serverless functions or automation scripts that automatically take corrective actions based on predefined rules or detected anomalies. This could include stopping idle resources, right-sizing instances, deleting old snapshots, or enforcing scaling policies.

Real Example: Automated Stopping of Idle AWS EC2 Instances

A common scenario is EC2 instances left running outside business hours. Here’s a simplified Python script for an AWS Lambda function that stops running EC2 instances lacking a ‘KeepAlive’ tag set to ‘true’. The missing tag stands in for real idle-resource detection.

# Python code for an AWS Lambda function
# This function stops running EC2 instances that do not have a 'KeepAlive' tag set to 'true'.

import os

import boto3
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    region = os.environ.get('AWS_REGION', 'us-east-1')
    ec2 = boto3.client('ec2', region_name=region)

    # Page through all running instances; a single describe_instances call
    # may return only part of the fleet.
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    instances_to_stop = []

    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']

                # Keep the instance only if it carries KeepAlive=true (value is case-insensitive)
                keep_alive = any(
                    tag['Key'] == 'KeepAlive' and tag['Value'].lower() == 'true'
                    for tag in instance.get('Tags', [])
                )

                if not keep_alive:
                    instances_to_stop.append(instance_id)
                    print(f"Instance {instance_id} does not have 'KeepAlive=true' tag. Adding to stop list.")

    if not instances_to_stop:
        print("No instances found to stop based on 'KeepAlive' tag.")
        return {'statusCode': 200, 'body': 'No instances to stop.'}

    try:
        ec2.stop_instances(InstanceIds=instances_to_stop)
        print(f"Stop requested for instances: {', '.join(instances_to_stop)}")
    except ClientError as e:
        # Report the failure instead of returning a misleading success status
        print(f"Error stopping instances: {e}")
        return {'statusCode': 500, 'body': f'Failed to stop instances: {e}'}

    return {
        'statusCode': 200,
        'body': 'Automated EC2 instance stop complete.'
    }

This Lambda function could be triggered on a schedule (e.g., nightly) or in response to a Cost Anomaly Detection alert. Instances that genuinely need to run 24/7 would simply have the ‘KeepAlive:true’ tag.
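
Since a missing tag is only a proxy for idleness, you may want to combine it with a utilization check. The sketch below assumes you have already collected average CPU datapoints per instance (for example from CloudWatch) into plain dictionaries; the keys 'id', 'tags', and 'cpu', the 5% threshold, and the 12-datapoint minimum are all assumptions for illustration.

```python
from typing import Dict, List, Sequence


def is_idle(cpu_averages: Sequence[float],
            cpu_threshold: float = 5.0,
            min_datapoints: int = 12) -> bool:
    """Treat an instance as idle only if every datapoint is below the threshold."""
    if len(cpu_averages) < min_datapoints:
        # Too little evidence (e.g., a freshly launched instance): leave it running.
        return False
    return max(cpu_averages) < cpu_threshold


def select_instances_to_stop(instances: List[Dict]) -> List[str]:
    """Return IDs of idle instances that are not tagged KeepAlive=true."""
    to_stop = []
    for inst in instances:
        keep_alive = inst.get("tags", {}).get("KeepAlive", "").lower() == "true"
        if not keep_alive and is_idle(inst.get("cpu", [])):
            to_stop.append(inst["id"])
    return to_stop


if __name__ == "__main__":
    fleet = [  # made-up fleet for illustration
        {"id": "i-idle", "tags": {}, "cpu": [1.2] * 12},
        {"id": "i-busy", "tags": {}, "cpu": [1.0] * 11 + [60.0]},
        {"id": "i-keep", "tags": {"KeepAlive": "true"}, "cpu": [0.5] * 12},
    ]
    print(select_instances_to_stop(fleet))  # ['i-idle']
```

Keeping the selection logic free of API calls makes it easy to unit test before it is allowed to stop anything.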

Azure Function for Deleting Unattached Disks

Similarly, an Azure Function could be used to identify and delete unattached managed disks, a common source of wasted storage costs.

# PowerShell code for an Azure Function
# This function deletes unattached Azure managed disks.

param($TimerInfo)

Write-Host "PowerShell timer trigger function executed at: $(Get-Date)"

try {
    # Connect to Azure (Managed Identity recommended for production)
    # Connect-AzAccount -Identity # Example for Managed Identity

    # Get all managed disks in the subscription
    $disks = Get-AzDisk | Where-Object { $_.DiskState -eq 'Unattached' }

    if ($disks.Count -gt 0) {
        Write-Host "Found $($disks.Count) unattached disks. Deleting..."
        foreach ($disk in $disks) {
            Write-Host "Deleting disk: $($disk.Name) in resource group $($disk.ResourceGroupName)"
            Remove-AzDisk -DiskName $disk.Name -ResourceGroupName $disk.ResourceGroupName -Force -ErrorAction Stop
        }
        Write-Host "Deletion complete."
    } else {
        Write-Host "No unattached disks found."
    }
}
catch {
    Write-Error "An error occurred: $($_.Exception.Message)"
}

This function, triggered by a timer, automates a crucial cleanup task, directly reducing operational costs.
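
Because deleting a disk cannot be undone, it can help to add a grace period and an opt-out tag before anything is removed. Here is a minimal sketch of that filter on plain dictionaries; the keys 'name', 'state', 'unattached_since', and 'tags', the 'DoNotDelete' tag, and the 14-day default are all hypothetical, and you would need to source the detach timestamp from your own inventory data.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def disks_safe_to_delete(disks: List[Dict],
                         now: datetime,
                         grace_days: int = 14,
                         protect_tag: str = "DoNotDelete") -> List[str]:
    """Return names of disks unattached for at least grace_days and not protected."""
    cutoff = now - timedelta(days=grace_days)
    return [
        d["name"] for d in disks
        if d["state"] == "Unattached"
        and d["unattached_since"] <= cutoff
        and protect_tag not in d.get("tags", {})
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    inventory = [  # made-up inventory for illustration
        {"name": "old-disk", "state": "Unattached",
         "unattached_since": now - timedelta(days=30)},
        {"name": "fresh-disk", "state": "Unattached",
         "unattached_since": now - timedelta(days=2)},
        {"name": "kept-disk", "state": "Unattached",
         "unattached_since": now - timedelta(days=90),
         "tags": {"DoNotDelete": "true"}},
    ]
    print(disks_safe_to_delete(inventory, now))  # ['old-disk']
```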

Choosing Your Weapon: A FinOps Strategy Comparison

Each of these FinOps solutions tackles a different aspect of cloud cost management, much like different characters in Stranger Things contribute unique skills to the fight. A truly robust FinOps strategy will likely combine elements from all three.


Strategy	Focus	Primary Goal	Proactiveness	Automation Level	Immediate Impact	Long-term Value
1. Proactive Cost Governance	Prevention, Accountability, Structure	Ensure resources are deployed with cost considerations from Day 1.	High (Pre-deployment enforcement)	Medium (Policy enforcement, audit)	Moderate (Prevents future waste)	High (Foundational cost control)
2. Anomaly Detection	Monitoring, Early Warning, Investigation	Identify unexpected cost spikes quickly to minimize impact.	Medium (Reactive to anomaly)	Medium (Automated alerting)	High (Rapid issue identification)	Medium (Requires human intervention for fix)
3. Automated Remediation	Correction, Optimization, Efficiency	Programmatically fix identified cost issues and enforce best practices.	Medium (Reactive to event/schedule)	High (Self-healing infrastructure)	High (Direct cost savings)	High (Continuous optimization, reduces toil)

Conclusion: Closing the Gate to Cloud Overspend

Connecting FinOps to the Stranger Things universe highlights a crucial truth: without vigilance, clear communication, and the right tools, your cloud environment can quickly devolve into an unpredictable, costly Upside Down. A comprehensive FinOps strategy integrates proactive governance to prevent issues, anomaly detection for early warnings, and automated remediation to swiftly correct problems.

Embrace these strategies to transform your cloud cost management from a chaotic battle against unseen forces into a disciplined, data-driven operation. By implementing these solutions, you empower your teams to not just react to the Demogorgons of cloud spend, but to anticipate, prevent, and ultimately close the gate on overspending for good.