Teams interested in tracing their AWS cloud costs usually turn to AWS Cost Explorer, the industry’s go-to tool for usage reporting.
Cost Explorer may be a great tool when you're starting out, but if you run a larger cloud operation, you're missing out on one important thing: real-time cost reporting.
Without it, every month the cloud bill can come with a nasty surprise.
Why is the delay in reporting so important for managing and optimizing cloud costs? Read on to find out.
What you need to know about AWS Cost Explorer and billing
You probably know how AWS billing works, but let’s go over the basics first.
AWS bills every service separately using different metrics. While some are easy to understand - for example, billing an EC2 instance per second - others present a challenge. Think Lambda, which is billed per invocation and per GB-sec used.
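To see why per-GB-second billing is harder to reason about, here is a minimal sketch of the arithmetic. The rates below are illustrative placeholders, not current AWS prices; check the Lambda pricing page for real numbers.

```python
# Illustrative rates only -- NOT current AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # per GB-second of compute

def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate the monthly charge for one Lambda function."""
    # GB-seconds = invocations x duration (s) x memory (GB)
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# 10M invocations, 120 ms average duration, 512 MB memory
print(f"${lambda_monthly_cost(10_000_000, 120, 512):.2f}")  # $12.00
```

The point: a Lambda bill depends on three variables multiplied together, so a small change in average duration or memory setting moves the total in ways that are hard to eyeball.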
Want to know how much you’ll pay at the end of the month? Log into the AWS billing console and check the data there. However, you’ll quickly notice a lag between usage and billing. That’s because AWS only commits to updating its billing data once every 24 hours.
Once a day sounds reasonable. Until you rack up a $72k bill in just a few hours by merely testing a service.
When does Cost Explorer come into play? Well, it’s all about data.
At the end of the month, AWS sends you a final invoice listing all the usage, refunds, credits, and support fees. The problem with invoices is that they show aggregate charges per service without helping you map them onto specific AWS resources. How else can you analyze your cloud spending?
What AWS gives you is access to the raw data used to create that invoice. You can dump that so-called Cost and Usage Report (CUR) into an S3 bucket in your account (this requires setup) and feed it into the Cost Explorer for visualization.
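Because the CUR is line-item data, you can aggregate it yourself instead of relying on the invoice totals. Below is a minimal sketch using the real CUR column names `lineItem/ProductCode` and `lineItem/UnblendedCost`; the inline sample stands in for the large, gzipped CSVs AWS actually delivers to S3, and all resource IDs and amounts are made up.

```python
import csv
from collections import defaultdict
from io import StringIO

# Inline stand-in for a CUR file; real reports live in S3 and are much larger.
SAMPLE_CUR = """\
lineItem/ProductCode,lineItem/ResourceId,lineItem/UnblendedCost
AmazonEC2,i-0abc,1.50
AmazonEC2,i-0def,2.25
AWSLambda,my-function,0.10
"""

def cost_by_service(cur_csv: str) -> dict:
    """Sum unblended cost per service across all CUR line items."""
    totals = defaultdict(float)
    for row in csv.DictReader(StringIO(cur_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return dict(totals)

print(cost_by_service(SAMPLE_CUR))  # {'AmazonEC2': 3.75, 'AWSLambda': 0.1}
```

The same `lineItem/ResourceId` column is what lets you go one level deeper than the invoice and attribute charges to individual resources.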
This might sound great - as if you had a source of up-to-date data to analyze. The only problem? AWS will refresh this data file three times a day at most. That’s better than once a day, but still not enough.
And there is no guarantee here - according to AWS docs, “Cost Explorer refreshes your cost data at least once every 24 hours.”
So, you’re left with a solution that gives you some degree of visibility into cloud costs. This sure is better than nothing, but it’s not enough if you want to seriously reduce your bill.
Here’s why you need real-time visibility
I already mentioned how Silicon Valley startup Milkie Way played around with a cloud service for a day only to wake up to a massive $72k bill the next morning.
Many things can go wrong when a team stops paying attention to cloud resource use. Adobe once generated a surprise cloud bill of over $500k because they left a computing job running on Azure. One alert acting on real-time usage data would have been enough to prevent this.
Real-time alerting is a good solution here, especially in areas that you expect might cause such spikes, like heavy GPU instances running for more than a few hours.
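The core of such an alert can be very simple: compare the current spend rate against a recent baseline. Here is a hypothetical sketch; the multiplier and all numbers are illustrative, and in practice the inputs would come from a real-time metrics feed rather than a list.

```python
# Hypothetical spike detector: alert when the current hour costs more than
# `multiplier` times the average of recent hours. Threshold is illustrative.
def is_cost_spike(recent_hourly_costs: list,
                  current_hour_cost: float,
                  multiplier: float = 3.0) -> bool:
    if not recent_hourly_costs:
        return False  # no baseline yet, nothing to compare against
    baseline = sum(recent_hourly_costs) / len(recent_hourly_costs)
    return current_hour_cost > multiplier * baseline

# A steady ~$2/hour workload suddenly jumps to $40/hour
# (e.g. GPU instances accidentally left running)
print(is_cost_spike([2.1, 1.9, 2.0, 2.2], 40.0))  # True
```

Crucially, this check only prevents anything if it runs on data that is minutes old, not a day old.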
Access to real-time usage data helps to avoid massive cost spikes. But can it also optimize your regular cloud bill?
Combine real-time data with automation for the best result
Having real-time visibility into your cloud costs is great, but it’s just one side of the coin. The other is doing something about it, especially when you see an unexplained cost spike.
At this point, you can choose from two solutions.
The first one is delegating a team of engineers to oversee your cloud infrastructure, at least while the most important processes are running. But engineer time is expensive. And you can be sure that people won’t stay on board for long when all they do is watch cloud dashboards.
The other solution is deploying an automation engine that takes over infrastructure management and reacts to data in real time.
For example, when the price of a spot instance goes beyond the threshold you set for it, the automation engine instantly replaces it with a cheaper solution.
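The decision rule behind that behavior can be sketched in a few lines. Everything here is illustrative: the instance names, prices, and threshold are made up, and a real engine would pull live spot prices from the cloud provider’s API.

```python
# Illustrative decision rule: when the spot price crosses the configured
# threshold, pick the cheapest alternative from a price table.
def pick_replacement(current_price: float,
                     threshold: float,
                     alternatives: dict):
    """Return the cheapest alternative if the threshold is breached, else None."""
    if current_price <= threshold:
        return None  # still within budget, keep the current instance
    return min(alternatives, key=alternatives.get)

prices = {"m5.large (spot)": 0.035, "m5a.large (spot)": 0.032}
print(pick_replacement(current_price=0.09, threshold=0.05, alternatives=prices))
# m5a.large (spot)
```

The hard part isn’t the rule itself but evaluating it continuously against fresh price and usage data, which is exactly what delayed billing reports can’t support.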
When your application suddenly starts receiving lots of buzz and people are flocking in, the automation tool will automatically scale your resources to match this new demand and deliver a great experience.
You’ll never have to worry about an instance left running by mistake. The automation engine will remove any resources that aren’t being actively used to help you avoid wasting your money.
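A minimal version of that “remove idle resources” step could look like the sketch below. The cutoff and instance names are hypothetical; real tooling would look at several utilization signals (CPU, network, disk) over a longer window before terminating anything.

```python
# Hypothetical idle-resource finder: any instance whose average CPU
# utilization stays under the cutoff is a candidate for termination.
def idle_candidates(avg_cpu_by_instance: dict,
                    cutoff_percent: float = 3.0) -> list:
    return [instance for instance, cpu in avg_cpu_by_instance.items()
            if cpu < cutoff_percent]

usage = {"i-web-1": 41.0, "i-forgotten-test": 0.4, "i-batch": 12.5}
print(idle_candidates(usage))  # ['i-forgotten-test']
```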
If you find a way to get your hands on real-time data, you’ll open the door to the most potent cost optimization tactics: autoscaling, rightsizing, spot instance use, and more.