
How did we reduce our monthly AWS bills by 20% without breaking a sweat?

Intro

One of my many tasks as a DevOps engineer at Melio was to reduce our cloud costs.

Ok…it wasn’t my task, but I made it mine.

I saw the enormous price we paid every month, and I just couldn't stand by; I wanted to do something about it.

Who am I and why do I care about cloud costs?

My name is Orel Bello, and for the last year, I've been working as a DevOps engineer on the SRE (Site Reliability Engineering) team at Melio. I started as a Deputy Commander in the Technological Control Center of the Israel Police as part of my military service. I then completed my B.Sc. in computer science and started working as a storage and virtualization engineer. After a year and a half, I realized that I wanted to be a DevOps engineer, and I landed my first DevOps position right before I started at Melio.

Since I started using AWS, I have paid attention to the price of every resource; pricing is a big part of the AWS Solutions Architect Associate certification that I completed at the beginning of my cloud journey, so I knew we had a lot to cut.

So, what’s the challenge?

As we faced plenty of challenging and more urgent tasks in our day-to-day work, reducing our cloud costs wasn't a priority. I had to find a way to do it with minimal effort and without the help of R&D.

Getting started…

I started to dig into our bills, and I saw many different metrics, but I didn't know what they meant.

One thing caught my eye: CloudWatch's cost was high (about $20,000 monthly).

After a little research, I discovered that we didn't have a retention policy for our log groups, so we were keeping them forever.

I wanted to set a lifecycle policy (similar to the one S3 has natively) that would set the retention period to 3 months and then export the log groups to an S3 bucket for archiving, since S3 is a much cheaper storage solution. However, I was amazed to see that there was no built-in automated option for this, so I had to build one of my own (using Step Functions and Lambdas; it was really fun to build).

How does it work?

At Melio, we store log groups in CloudWatch to meet compliance requirements. However, due to the high costs associated with CloudWatch, we devised a cost-effective solution: exporting log groups to a more economical storage option — S3 buckets.

We implemented a custom solution to automate this export process using AWS Step Functions triggered by an event bus. Here’s a breakdown of the process, which occurs every three months:

1. DynamoDB Table Creation:

Create a DynamoDB table containing the names of all log groups. This table acts as a registry for managing the export process.

2. Export Task Initialization:

Retrieve the last item from the DynamoDB table, initiating an export task for the corresponding log group. Subsequently, remove the item from the table.

3. Set Retention Policy:

Apply a retention policy of 3 months to the log group that was exported successfully, ensuring that only relevant data is retained in CloudWatch.

4. Task Status Monitoring:

Check if the DynamoDB table is empty. If it is, the export process is complete. If not, wait for 15 minutes and monitor the status of the ongoing export task.

5. Task Completion Check:

If the export task is marked as done, start the next export task. If not, wait for 15 minutes and recheck the status.

This systematic approach ensures that log groups are exported to S3, reducing costs while adhering to compliance requirements. The periodic execution every three months guarantees that only necessary data remains in CloudWatch, contributing to significant cost savings over time.
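The actual Lambda code isn't shown in this post, but a minimal boto3 sketch of steps 2-5 might look like the following. The registry table name, bucket name, and handler names are placeholders of mine, not our real values, and the 90-day figures simply mirror the 3-month retention described above.

```python
import time
import boto3

logs = boto3.client("logs")
dynamodb = boto3.resource("dynamodb")
registry = dynamodb.Table("log-export-registry")  # hypothetical table name

DESTINATION_BUCKET = "my-log-archive-bucket"  # hypothetical bucket
RETENTION_DAYS = 90                           # ~3 months

def export_next_log_group(event, context):
    # Step 2: pop one log group off the registry. CloudWatch Logs allows only
    # one active export task per account at a time, which is why the state
    # machine processes log groups one by one.
    items = registry.scan(Limit=1).get("Items", [])
    if not items:
        return {"done": True}  # registry empty -> the whole process is complete

    log_group = items[0]["logGroupName"]
    now_ms = int(time.time() * 1000)
    window_ms = RETENTION_DAYS * 24 * 60 * 60 * 1000  # illustrative export window

    task = logs.create_export_task(
        logGroupName=log_group,
        fromTime=now_ms - window_ms,
        to=now_ms,
        destination=DESTINATION_BUCKET,
        destinationPrefix=log_group.strip("/"),
    )
    registry.delete_item(Key={"logGroupName": log_group})

    # Step 3: from now on, keep only ~3 months of data in CloudWatch.
    logs.put_retention_policy(logGroupName=log_group, retentionInDays=RETENTION_DAYS)
    return {"done": False, "taskId": task["taskId"]}

def export_task_finished(event, context):
    # Steps 4-5: the state machine's 15-minute Wait state re-invokes this
    # until the export task reports COMPLETED, then the next task starts.
    resp = logs.describe_export_tasks(taskId=event["taskId"])
    return {"finished": resp["exportTasks"][0]["status"]["code"] == "COMPLETED"}
```

In Step Functions terms, a Choice state would route on the `done` and `finished` flags, and a Wait state would supply the 15-minute pause between status checks.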

After a month or two, I noticed the costs were decreasing less than anticipated. Our custom solution was managing data retention and export effectively, but diving into the CloudWatch billing metrics revealed another key expense: ingested data.

While this solution remains beneficial for anyone spending substantially on CloudWatch log group storage, I felt the need to delve deeper and explore additional avenues for savings.

CloudWatch: the big money lies in writing, not in storing

I dove deep into our billing metrics and saw that the price of ingested data (writes to the log groups) made up most of our CloudWatch cost, while our stored bytes (the storage of the log groups) cost was pretty low, so I had to change tactics.

I found out that we had three log groups producing so many logs that each one cost more than $1,500 monthly! Luckily, those log groups are pretty common, so you can benefit from the same fixes.

The first one was VPC Flow Logs (which record all the traffic that enters the VPC, useful for security and debugging purposes). We simply modified them to write logs into an S3 bucket instead of CloudWatch (if you don't need them, you can just disable them). Doing that saved us $1,500 monthly!
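If you want to make the same switch, a rough boto3 sketch looks like this. The VPC ID and bucket ARN are placeholders, and in practice you'd confirm the new flow log is delivering before deleting the old one.

```python
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-0123456789abcdef0"                         # placeholder
ARCHIVE_BUCKET_ARN = "arn:aws:s3:::my-flow-log-archive"  # placeholder

# Find the existing flow logs on this VPC that still write to CloudWatch Logs.
old = ec2.describe_flow_logs(
    Filters=[{"Name": "resource-id", "Values": [VPC_ID]}]
)["FlowLogs"]
cloudwatch_backed = [f["FlowLogId"] for f in old
                     if f.get("LogDestinationType") == "cloud-watch-logs"]

# Recreate the flow log so it delivers straight to S3 (much cheaper per GB).
ec2.create_flow_logs(
    ResourceIds=[VPC_ID],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination=ARCHIVE_BUCKET_ARN,
)

# Only then retire the expensive CloudWatch-backed flow logs.
if cloudwatch_backed:
    ec2.delete_flow_logs(FlowLogIds=cloudwatch_backed)
```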

CloudTrail, when not properly configured, is REALLY expensive

Next, there were the CloudTrail log groups. CloudTrail is a useful (and expensive) AWS service that records every action performed inside the AWS account.

We had two separate CloudTrail log groups that we simply disabled and deleted (we didn't even need them, since the same events were already saved in S3 and available in the CloudTrail dashboard).

And just like that, we saved another $4,000!

After I saw how expensive the CloudTrail log groups were, I decided to take another look at the trails themselves. I found out that we had a duplicate trail, so we were paying extra; I just didn't know how much extra. Disabling the additional trail resulted in saving $27,000 per month! We went from paying $30,000 monthly to only $3,000 monthly.
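A duplicate trail is easy to miss because each one quietly bills for its own copy of every management event. A small audit script, assuming only boto3 and read permissions, can surface the problem:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

trails = cloudtrail.describe_trails()["trailList"]
for t in trails:
    print(f"{t['Name']}: home={t['HomeRegion']}, "
          f"multi-region={t.get('IsMultiRegionTrail', False)}, "
          f"bucket={t.get('S3BucketName')}")

# The first copy of management events is free; every additional trail that
# records them is billed, so more than one multi-region trail usually means
# you are paying for duplicate copies of every event.
multi_region = [t for t in trails if t.get("IsMultiRegionTrail")]
if len(multi_region) > 1:
    print(f"WARNING: {len(multi_region)} multi-region trails found!")
```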

RIs and Savings Plans — the first steps toward cost optimization

One of the most common and simple ways to save costs is by purchasing Reserved Instances (RIs) and Savings Plans.

RIs and Savings Plans are similar, but with some key differences:

RIs are tied to a specific instance type in a specific region, so if you switch to a different region or instance class mid-term, you will still be paying for the RIs you bought and are no longer using. Savings Plans, on the other hand, give you the flexibility to switch between instance families, sizes, and operating systems within the same region. Both require a commitment of 1–3 years.

We already had a Compute Savings Plan, which saved around $8,000 per month (it covers EC2, ECS, and Lambda functions; our architecture is mostly serverless, so it was a good fit for our needs). I purchased RIs for RDS (Relational Database Service) with the most basic plan (a 1-year commitment with no upfront cost, so you have no reason not to use it!). That saved us another $10,500 per month.
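Commitments only save money while you actually use them, so it's worth checking utilization and coverage from time to time. Here's a rough Cost Explorer sketch; the dates and the RDS filter are illustrative, not our exact setup:

```python
import boto3

ce = boto3.client("ce")
period = {"Start": "2024-01-01", "End": "2024-02-01"}  # illustrative month

# How much of the Compute Savings Plan commitment was actually used?
sp = ce.get_savings_plans_utilization(TimePeriod=period)
print("Savings Plan utilization:",
      sp["Total"]["Utilization"]["UtilizationPercentage"], "%")

# How much of the RDS fleet's running hours did the RIs cover?
ri = ce.get_reservation_coverage(
    TimePeriod=period,
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Relational Database Service"]}},
)
for bucket in ri["CoveragesByTime"]:
    pct = bucket["Total"]["CoverageHours"]["CoverageHoursPercentage"]
    print(f"RDS RI coverage {bucket['TimePeriod']['Start']}: {pct}%")
```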

Keep an eye out for unknown bills — you might be surprised

Last but not least, I saw an odd bill for a new service called Security Lake. It was costly (around $10,000 per month), so I decided to check with the relevant team. The service didn't provide enough value to justify its expensive price tag, so we disabled it and saved another $10,000.

Conclusion

This was the first phase of reducing our cloud costs. The rest of the savings won’t be as easy to achieve, but will be worth it!

Remember that cost optimization is all about monitoring. You should check each month that you don't see unfamiliar bills or anomalies, and work constantly to reduce extra costs.

First, you need to pinpoint your most expensive services, prioritizing quality over quantity. It's important to choose your battles wisely; you can't optimize all of your costs (OK, you can, but some of them are not worth the trouble, so make sure to focus on the most impactful ones).
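One way to get that shortlist, sketched here with the Cost Explorer API (the time period is illustrative):

```python
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},  # illustrative
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Sort services by last month's cost, biggest first.
groups = resp["ResultsByTime"][0]["Groups"]
groups.sort(key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]),
            reverse=True)

for g in groups[:10]:  # the ten battles most worth picking
    amount = float(g["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{g['Keys'][0]}: ${amount:,.2f}")
```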

It’s very satisfying to help make a difference with so little effort. I encourage you to try it yourself. Saving money for your organization can impact its growth, and you can take some of the credit for it. 🙂

