Shailendra Singh for MechCloud


🚀 Taming Your AWS Bills: A Practical Guide to Finding and Eliminating Cloud Wastage

If you are a software developer, DevOps engineer, or Cloud Architect, you already know that the modern public cloud is a magnificent but dangerous double-edged sword. On one hand, you get global scalability and the almost magical power to provision immensely complex infrastructure with a few lines of declarative code or a handful of clicks in a web browser. On the other hand, you are haunted by the dreaded, unpredictable end-of-month cloud bill. We have all been in that uncomfortable scenario where a developer spins up a large testing environment for a quick Proof of Concept. The testing finishes successfully, the expensive EC2 instances are terminated to save money, but the attached Elastic IPs and Elastic Block Store volumes are accidentally left behind in the digital void.

Fast forward three to six months, and your organization is suddenly hemorrhaging hundreds or even thousands of dollars for completely invisible resources that are doing absolutely nothing but collecting digital dust. In the cloud native infrastructure world, we refer to these forgotten artifacts as Orphan Resources, and they are consistently ranked by FinOps professionals as one of the primary culprits behind massive cloud wastage. Today, I want to walk you through a highly effective, modern approach to managing Infrastructure as Code, visualizing your deployments in real time, and automatically eliminating this cloud wastage before it drains your engineering budget.

The Anatomy of Cloud Wastage and Why You Pay for What You Do Not Use

Before we jump right into the practical tutorial, it is critically important to fully understand exactly why orphan resources exist in the first place and why Amazon Web Services charges you for them even when you are not actively using them. Understanding the underlying billing mechanics is the absolute first step to mastering FinOps and implementing effective cloud cost optimization strategies across your entire organization.

First, let us look deeply at unattached Elastic Block Store volumes, commonly referred to simply as EBS volumes. When you launch a standard EC2 instance, it typically comes with a root EBS volume to hold the operating system, and you might attach additional secondary data volumes for active databases or large file storage. When that EC2 instance is terminated, the root volume is usually deleted automatically (assuming the default DeleteOnTermination setting in your launch configuration). However, secondary volumes are explicitly designed by AWS to persist independently of the compute instance lifecycle to protect your vital business data. This default behavior means they are almost always left behind when servers are destroyed. AWS bills you for the provisioned storage capacity in gigabyte-months, as well as provisioned IOPS and throughput, regardless of whether that EBS volume is actually attached to a running server or just floating unattached in your account. A large provisioned gp3 volume left unattached will quietly cost you a small fortune over a single year.
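To put rough numbers on that claim, here is a back-of-envelope sketch in Python. The $0.08 per GiB-month gp3 price is an approximate us-east-2 figure and an assumption on my part; check the current AWS price list before relying on it:

```python
# Approximate gp3 storage price in us-east-2 (assumption; verify against
# the current AWS price list before relying on it).
GP3_PRICE_PER_GIB_MONTH = 0.08  # USD

def ebs_monthly_cost(size_gib: int, price: float = GP3_PRICE_PER_GIB_MONTH) -> float:
    """EBS bills for provisioned capacity whether or not the volume is attached."""
    return size_gib * price

# A forgotten 1 TiB gp3 volume, attached to nothing:
monthly = ebs_monthly_cost(1024)   # roughly $82 per month
yearly = monthly * 12              # close to $1,000 per year of pure waste
print(f"${monthly:.2f}/month, ${yearly:.2f}/year")
```

The key point the arithmetic makes concrete: the bill is driven by what you provisioned, not by what you use.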

Second, consider the hidden and evolving costs of unassociated Elastic IPs. Public IPv4 addresses are a finite, increasingly scarce, and therefore highly valuable global resource. Historically, AWS only charged a penalty fee for an Elastic IP that was not attached to a running instance. However, to reflect the true market scarcity of IPv4, AWS fundamentally changed its billing model effective February 2024. Now, AWS charges $0.005 per hour for every public IPv4 address, regardless of whether it is attached to a running service or completely idle. Because every public IP now costs money every hour of the day, holding onto an unassociated Elastic IP is the very definition of pure cloud waste. You are paying a premium hourly rate for a networking asset that provides zero value to your architecture because it routes traffic to absolutely nothing.
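At $0.005 per hour, the math on a single idle address is small but relentless, and it multiplies quickly across an account with dozens of forgotten allocations:

```python
IPV4_PRICE_PER_HOUR = 0.005  # USD; applies to all public IPv4 addresses since Feb 2024

def idle_eip_cost(hours: float) -> float:
    """Cost of holding one public IPv4 address for the given number of hours."""
    return IPV4_PRICE_PER_HOUR * hours

per_month = idle_eip_cost(730)   # about $3.65 per address per month
per_year = idle_eip_cost(8760)   # about $43.80 per address per year
print(f"${per_month:.2f}/month, ${per_year:.2f}/year per idle address")
```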

Finding these hidden resources manually in the AWS Management Console is a surprisingly tedious and error-prone process. It requires clicking through multiple disconnected screens, filtering massive tables by available or unattached resource states, and then crossing your fingers that you do not accidentally delete something crucial that just happens to be temporarily offline for routine maintenance. To solve this exact problem, we are going to look at a workflow using MechCloud, a platform that merges Stateless Infrastructure as Code with visual asset discovery to make cloud management frictionless.
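If you do want to script the audit yourself, the AWS SDK makes both checks straightforward. A minimal sketch using boto3 (the AWS SDK for Python, installed separately); the filtering helpers are pure functions, so they can be unit tested without any AWS credentials:

```python
def unattached_volume_ids(volumes):
    """EBS volumes in the 'available' state are billed but attached to nothing."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]

def unassociated_allocation_ids(addresses):
    """An Elastic IP without an AssociationId routes traffic to nothing."""
    return [a["AllocationId"] for a in addresses if "AssociationId" not in a]

def audit_region(region="us-east-2"):
    import boto3  # third-party AWS SDK, imported lazily so the helpers stay dependency-free
    ec2 = boto3.client("ec2", region_name=region)
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
    addresses = ec2.describe_addresses()["Addresses"]
    return unattached_volume_ids(volumes), unassociated_allocation_ids(addresses)
```

This is exactly the kind of script the rest of this article argues you should not have to write and maintain yourself, but it is useful to know what the discovery layer is doing under the hood.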

Step 1: Governance First and Configuring Regions and Zones

Security and cost governance in the public cloud does not start with aggressively deleting things. It starts with strictly restricting where resources can be deployed in the first place. If your core customer base is located exclusively in North America, there is rarely a good business or technical reason for a junior developer to have permission to spin up expensive GPU instances in the Sydney, Mumbai, or Tokyo AWS regions. Unrestricted region access is a massive vector for both accidental overspending and malicious crypto-mining attacks via compromised IAM credentials.
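Even outside any particular platform, the same guardrail can be enforced natively with an AWS Organizations Service Control Policy using the `aws:RequestedRegion` condition key. The sketch below denies all actions outside us-east-2 while exempting global services that do not live in a region; the exact exemption list is an assumption, so tune it to the global services your organization actually uses:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "sts:*",
        "organizations:*",
        "route53:*",
        "cloudfront:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-2"]
        }
      }
    }
  ]
}
```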

In our specific workflow within the MechCloud portal, the absolute first step is configuring the strictly allowed regions for your AWS account. You navigate to the Manage Cloud Accounts section of the dashboard, select your target AWS cloud account which is a user account named Academy in our demonstration, and open the Configure Regions / Zones administrative panel.

What you will immediately notice is that by default, the platform embraces a strict Zero Trust approach to geographical cloud regions. All Regions and Availability Zones across the entire globe are completely disabled for the account by default. You must explicitly and intentionally choose the specific regions where you want to allow provisioning or where you want to actively discover existing resources. In our specific walkthrough, we scroll down to the US Regions section and select US East (Ohio), known programmatically as us-east-2. We click the region block to enable it, which subsequently enables the underlying availability zones within it like us-east-2a, us-east-2b, and us-east-2c. We then carefully save this configuration. This one simple toggle acts as a powerful, account wide guardrail, ensuring that rogue Infrastructure as Code scripts cannot quietly spin up expensive resources in unmonitored corners of your cloud environment.

Step 2: Provisioning Resources and Simulating Messy Habits

Next, we move over to the Stateless IaC module to actually build some functional infrastructure. If you are a DevOps engineer who is used to traditional tools like Terraform or Pulumi, you already intimately know the extreme pain of managing complex state files. You have to configure secure backend S3 buckets, set up specific DynamoDB tables for state locking to prevent concurrent state corruption, and constantly worry about state drift when someone makes a manual console change. Stateless IaC abstractions aim to completely handle all of the state mapping complexity on your behalf, allowing you to focus purely on the declarative, desired state of your infrastructure in a simple, highly readable YAML format.

To simulate a real world, predictably messy development environment, we are going to write a custom YAML configuration that provisions a standard, best practice web tier. We will intentionally inject some incredibly bad practices into this file. We will explicitly add an unattached Elastic IP and an unattached EBS volume to simulate the exact type of cloud waste that typically accumulates over time in enterprise accounts.

Take a close, analytical look at our specific YAML template designed for this walkthrough:

resources:
  - type: aws_ec2_vpc
    name: vpc1
    props:
      cidr_block: "10.0.0.0/16"
    resources:
      - type: aws_ec2_subnet
        name: subnet1
        props:
          cidr_block: "10.0.1.0/24"
          availability_zone: "{{CURRENT_REGION}}a"
        resources:
          - type: aws_ec2_instance
            name: vm1
            props:
              image_id: "{{Image|arm64_ubuntu_24_04}}"
              instance_type: "t4g.small"
              security_group_ids:
                - "ref:vpc1/sg1"
      - type: aws_ec2_security_group
        name: sg1
        props:
          group_name: "mc-sg1"
          group_description: "SG for EC2 instance"
          security_group_ingress:
            - ip_protocol: tcp
              from_port: 22
              to_port: 22
              cidr_ip: "{{CURRENT_IP}}/32"
  - type: aws_ec2_eip
    name: eip1
  - type: aws_ec2_volume
    name: vol1
    props:
      availability_zone: "{{CURRENT_REGION}}a"
      size: 10
      volume_type: "gp3"

Notice the elegant, nested hierarchical structure of this code. The aws_ec2_subnet and aws_ec2_security_group are explicitly nested directly inside the aws_ec2_vpc block, while the aws_ec2_instance is nested safely inside the subnet block itself. This visual nesting makes the architectural relationships immediately obvious to any engineer reading the code.

We are also leveraging dynamic platform variables to avoid hardcoding fragile values. We use {{CURRENT_REGION}}a to automatically inject our previously approved Ohio availability zone. We use {{Image|arm64_ubuntu_24_04}} to automatically fetch the correct AMI ID for a modern ARM based Ubuntu 24.04 image without having to manually search for it. We also choose the t4g.small instance type powered by AWS Graviton processors for better price performance. Finally, we use the {{CURRENT_IP}}/32 macro in our security group ingress rules to automatically restrict SSH access solely to the precise IP address of the engineer currently executing the code. This is a massive security enhancement over opening port 22 to the entire internet with 0.0.0.0/0.

But here is exactly where we introduce the simulated financial wastage. At the very end of our YAML file, completely outside the VPC hierarchy, we boldly declare an aws_ec2_eip named eip1 with absolutely no instance association properties defined. Directly below that, we declare an aws_ec2_volume named vol1 with a size of 10GB using the highly performant gp3 volume type, completely unattached to any compute resource.

Before we blindly hit the Apply button and send this massive payload to AWS, we generate a comprehensive Plan. This is where the true magic of Shift Left FinOps truly shines. The platform intelligently analyzes the delta between the current empty cloud state and our desired YAML state. The console logs output a detailed execution plan stating that exactly eight resources are to be created, zero are to be recreated, zero are to be updated, and zero are to be deleted.

However, the terminal output is additionally enriched with crucial, immediate financial data that most standard automation tools severely lack. It explicitly shows a calculated Cost Impact of a specific monthly dollar amount. It does not just give a single abstract number. It breaks the cost down granularly. It explicitly shows the ongoing Compute price for our t4g.small instance, the Root disk cost, and it explicitly highlights the precise Storage cost for our orphan gp3 data volume, as well as the idle IP cost for our unassociated Elastic IP. By surfacing these actual dollar costs directly in the planning phase before a single API call is made to provision the infrastructure, developers are empowered to make financially responsible architectural decisions immediately. We confidently click Apply, the logs execute our desired creations sequentially based on the inferred dependencies from our nested code, and our intentionally messy infrastructure is successfully built in the Ohio region.
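To sanity check numbers like these yourself, the breakdown is simple arithmetic. All prices below are approximate us-east-2 on-demand figures and assumptions on my part (as is the 8 GiB root volume size); only the $0.005/hour IPv4 rate comes from the pricing change discussed earlier:

```python
HOURS_PER_MONTH = 730           # AWS's usual monthly-hours convention

# Assumed approximate us-east-2 on-demand prices; verify before relying on them.
T4G_SMALL_PER_HOUR = 0.0168     # USD/hour for the t4g.small instance
GP3_PER_GIB_MONTH = 0.08        # USD per provisioned GiB-month
IPV4_PER_HOUR = 0.005           # USD/hour for any public IPv4 address

compute = T4G_SMALL_PER_HOUR * HOURS_PER_MONTH   # the running vm1
root_disk = 8 * GP3_PER_GIB_MONTH                # assumed 8 GiB root volume
orphan_volume = 10 * GP3_PER_GIB_MONTH           # the unattached vol1
idle_ip = IPV4_PER_HOUR * HOURS_PER_MONTH        # the unassociated eip1

total = compute + root_disk + orphan_volume + idle_ip
waste = orphan_volume + idle_ip
print(f"~${total:.2f}/month total, ~${waste:.2f}/month of it pure waste")
```

Even on this tiny demo stack, roughly a quarter of the monthly bill comes from the two resources doing nothing at all.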

Step 3: Discovering Assets and Visualizing the Wastage

Writing YAML is highly efficient for automation, but fully understanding the complex web of networking, routing, and resource dependencies through lines of text alone is incredibly difficult for the human brain. This is exactly where visual Asset Discovery becomes an indispensable tool for Cloud Architects and operations teams trying to maintain rigid order in their environments.

We navigate away from the code editor to the Discover Assets tab. We select our Academy team, choose the standard AWS Cloud Provider, pick our specific AWS target account, and target the us-east-2 region that we exclusively enabled in the very first step. We click the Update button to aggressively fetch the live state of the cloud.

Upon successfully fetching the data via the AWS API, the platform dynamically generates a beautiful and highly interactive topology map of our actual running AWS environment. This is not just a flat, boring list of resources. It is a nested, hierarchical representation that perfectly mimics real cloud architecture. The outer bounding box represents the overarching AWS Region. Inside that large box, we immediately see regional resources and specific Availability Zones like us-east-2a and us-east-2b. We can clearly see our newly created VPC bridging across the multiple zones. Inside the VPC block, we see our precise Subnet, and nested safely inside that specific subnet is our running EC2 instance complete with its private IP address, AMI details, and ARM based instance size.

This spatial visual layout makes it incredibly easy to immediately understand the potential blast radius and network flow of your deployment. But the most critical part of this entire visualization lies at the very top of the interactive canvas. Our intentionally unattached Elastic IP and our unattached EBS Volume are rendered in a separate Regional Resources block, but they are rendered with a glaring, completely unmissable visual difference. They both feature a bright, solid RED header.


The platform discovery engine automatically analyzes all resource relationships and attachment states during the live scan. Because it instantly identifies that these specific resources are not attached to any running compute instance or active network interface, they are immediately flagged visually as Cloud Wastage. There is absolutely no need to write complex Python scripts to audit your account, no need to set up expensive custom AWS Config rules, and no need to wait for a shocking billing alert at the end of the month. The cloud waste is identified visually, instantly, and undeniably.

Step 4: The Magic Button and Deleting Orphan Resources

Identifying infrastructure waste is only half of the FinOps battle. Remediating it safely and efficiently is the much harder other half. Traditionally, cleaning up these unused resources requires a DevOps engineer to context switch, log into the AWS Management Console, carefully cross-reference the exact resource IDs of the unused volumes and IPs, double-check that they are not meant to be attached to something currently offline, and then delete them one by one. It is a nerve-wracking process because simple human error can easily lead to deleting the wrong volume and causing a massive, career-ending data outage.
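For comparison, here is roughly what that manual remediation looks like when scripted with boto3 (the AWS SDK for Python, installed separately). The summary helper is pure so it can be tested offline; the deletion calls are real EC2 API operations, and delete_volume will refuse to touch a volume that is still attached:

```python
def describe_plan(volume_ids, allocation_ids):
    """Human-readable summary to show before anything is destroyed."""
    return (f"{len(volume_ids)} unattached volume(s): {', '.join(volume_ids) or 'none'}; "
            f"{len(allocation_ids)} unassociated EIP(s): {', '.join(allocation_ids) or 'none'}")

def delete_orphans(volume_ids, allocation_ids, region="us-east-2"):
    import boto3  # third-party AWS SDK, imported lazily
    ec2 = boto3.client("ec2", region_name=region)
    print(describe_plan(volume_ids, allocation_ids))
    for vol_id in volume_ids:
        ec2.delete_volume(VolumeId=vol_id)          # AWS rejects this if the volume is in use
    for alloc_id in allocation_ids:
        ec2.release_address(AllocationId=alloc_id)  # frees the idle public IPv4
```

Every ID in that script has to be gathered, verified, and typed correctly by a human, which is exactly the friction a one-click workflow removes.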

With this modern visual discovery tool, the FinOps remediation process is built directly into the UI workflow to entirely eliminate friction. In the Discover Assets visual view, we simply click the Actions dropdown menu located at the top right of the canvas. From the curated list of automated operational actions, we confidently select Delete Orphan Resources.

A confirmation modal instantly appears, acting as a crucial final safety check. It explicitly lists exactly what is about to be destroyed based on the previous system scan. It tells us it will delete one Unattached Volume named vol1 while providing the exact volume ID for manual verification, and one Unassociated Elastic IP named eip1 while providing the exact IP address. We confidently click the final Delete Orphans button.

Behind the scenes, the platform securely executes the precise API calls to AWS. Within seconds, the visual topology map refreshes itself automatically. The red warning boxes representing our expensive waste disappear entirely from the canvas. Our monthly AWS bill is instantly optimized, and our cloud environment is immediately cleaner. This single feature is a massive quality-of-life improvement for Site Reliability Engineers and FinOps teams who historically spend hours every week chasing down cloud leakage and pleading with developers to clean up their messes. When doing the right financial thing is as easy as clicking a single button, your entire engineering team will naturally maintain a leaner cloud environment.

Step 5: Clean Slate and Deprovisioning the Entire Environment

Now that we have successfully cleaned up the accidental waste, let us assume our Proof of Concept project is officially finished. It is time to completely tear down the rest of the underlying infrastructure to absolutely stop the billing meters from running.

Because we are exclusively utilizing a declarative Infrastructure as Code approach, we do not need to endure the incredibly painful process of manually deleting resources in the correct exact order. If you have ever tried to manually delete a VPC, you know the absolute frustration of AWS throwing dependency errors because you forgot to delete a security group, an internet gateway, or a hidden running instance inside a subnet first.

Instead of a manual, error prone teardown, we simply return to our Stateless IaC editor dashboard. We take our previous, lengthy YAML configuration file and replace the entire contents with a completely empty state block represented by an empty array. We simply type resources: [] to declare that our desired state is now a clean slate with zero resources.

Once again, we click the Plan button. The engine performs another delta calculation, this time comparing the active resources in our AWS account against our new, empty desired state. The logs generate a highly satisfying teardown plan:

  • Plan generated: 0 to create, 0 to recreate, 0 to update, 6 to delete
  • [DELETE] aws_ec2_instance -> vm1
  • [DELETE] aws_ec2_security_group -> vpc1/sg1
  • [DELETE] aws_ec2_subnet -> vpc1/subnet1
  • [DELETE] aws_ec2_vpc -> vpc1

One of the most psychologically rewarding aspects of this final step is looking at the newly generated FinOps plan. Because we are completely removing all compute and network resources, the platform calculates a beautiful negative cost impact. The logs clearly show a massive negative change percentage and a final Cost Impact in dollars and cents saved per month. Seeing a tangible, negative dollar amount associated directly with your infrastructure teardown is a massive motivational boost for engineering teams. It directly gamifies the concept of cost savings, making developers feel like active, highly appreciated participants in the company's overall financial health.

We click Apply to permanently execute the teardown. The platform inherently understands the complex AWS dependency graph. It natively handles the exact ordering by automatically terminating the EC2 instance first, waiting patiently for it to shut down, then deleting the specific subnet, detaching the Internet Gateway, deleting the custom Security Group, and finally deleting the overarching VPC itself. Within moments, the entire infrastructure stack is cleanly and safely wiped from our AWS account without a single frustrating dependency error.
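The ordering logic at work here is essentially a topological sort over the resource dependency graph: create in dependency order, delete in the reverse order. A minimal sketch with Python's standard graphlib module, using a simplified model of this walkthrough's stack:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each resource maps to the resources it depends on (simplified model of the demo stack).
depends_on = {
    "vm1": ["subnet1", "sg1"],   # the instance needs its subnet and security group
    "subnet1": ["vpc1"],
    "sg1": ["vpc1"],
    "vpc1": [],
}

# Creation must satisfy dependencies; deletion is simply the reverse order.
create_order = list(TopologicalSorter(depends_on).static_order())
delete_order = list(reversed(create_order))
print("create:", create_order)   # vpc1 comes first
print("delete:", delete_order)   # vm1 comes first, vpc1 last
```

This is why the declarative teardown never hits the dependency errors that plague manual VPC deletion: the engine always removes leaf resources before their parents.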

To absolutely validate this clean state, we perform one final operational step. We navigate back to the Discover Assets visual dashboard and hit the refresh button for the us-east-2 region. The hierarchical topology map vividly updates and is now completely empty. We are left with a perfectly clean slate, with absolutely zero lingering resources, zero orphan volumes, zero unattached IPs, and absolutely zero nasty surprises waiting for us on the next Amazon monthly invoice.

Why This Modern Workflow Matters

The methodology shown in this walkthrough represents a fundamental shift in how we should approach Cloud Engineering and FinOps.

1. Bridging the Gap Between Code and Visibility

Developers love and live in code (Infrastructure as Code), while Operations teams and Cloud Architects thrive on visibility (Topology graphs and dashboards). Usually, these are entirely separate tools and mindsets. By combining Stateless IaC with immediate visual asset discovery, teams create a powerful, shared language. You write the code, and you immediately see the architecture you have built, including its flaws.

2. True Shift-Left FinOps

Cost optimization is usually a reactive, painful process. A billing alert triggers, a manager gets angry, and an engineer is tasked with the stressful job of figuring out why the bill suddenly spiked. By integrating granular cost estimation directly into the IaC Plan phase, cost management is "shifted left" into the development cycle. The developer sees the price tag before they build the infrastructure, empowering them to make better, more cost effective sizing and architectural choices from the very beginning.

3. Frictionless Waste Management

Orphan resources pile up in cloud accounts because cleaning them up involves significant friction. It requires context switching away from your primary task, tedious manual console navigation, and a constant fear of accidentally breaking a production system. Highlighting orphan resources in bright red and providing a one-click Delete Orphans action removes the friction entirely. When doing the right thing (cleaning up expensive waste) is the easiest thing to do, your team will naturally and proactively maintain a leaner, more cost effective cloud environment.

Conclusion

Managing complex cloud infrastructure does not have to be a black box of unpredictable costs and forgotten, expensive resources. By adopting a modern, integrated approach that utilizes Stateless IaC for predictable and version controlled deployments, combined tightly with visual asset discovery for continuous monitoring and one-click remediation, you can finally regain complete, effortless control over your sprawling AWS environments.

The complete end to end workflow we explored today, from configuring specific regional guardrails and provisioning infrastructure with full cost awareness, to visually identifying red flagged orphan resources and executing targeted or full automated teardowns, is the definitive blueprint for maintaining a healthy, lean, and cost optimized cloud presence.

Stop letting unattached volumes and idle IP addresses quietly drain your hard earned engineering budget. Visualize your complex architecture, automate your tedious cleanup, and make your CFO happy!


What about you? What is the absolute worst case of cloud wastage you have ever personally discovered in your AWS, GCP, or Azure environments? Have you ever stumbled upon a massive unattached EBS volume sitting there quietly billing you for years? Share your absolute best cloud bill horror stories in the comments below! 👇
