DEV Community

Siddharth Rathore


How a small misconfiguration cost me $10,000 in AWS bills!!

This happened when I was just starting out with AWS. I came from a background of working on bare-metal servers and knew little about cloud platforms. I joined a startup and began working on AWS there; the company's entire production environment consisted of just two EC2 instances - one hosting the web application and the other running the database.

After some time, the startup began gaining traction, and with the growing user base the database started growing at a rapid pace. To ensure data protection and business continuity, I was asked to design and implement a backup and disaster recovery (DR) strategy.

At the time, AWS did not offer a secondary region within the same country for our geography. Due to strict data compliance requirements, storing data outside the country was not an option - effectively ruling out cross-region DR within AWS. So, after much discussion, I finalised the following plan:

  1. Primary Hosting would remain on AWS.
  2. Cold Disaster Recovery would be hosted on Google Cloud Platform (GCP), solely for worst-case scenarios.

To achieve this, I created three types of backup jobs:

  1. Full Backup - once every Friday night
  2. Differential Backup - every night
  3. Transactional Backup - every 30 minutes
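The schedule above can be sketched as a crontab. The script names and paths here are hypothetical placeholders, not the actual jobs I ran:

```shell
# Full backup: every Friday at 23:00
0 23 * * 5   /opt/backup/full_backup.sh

# Differential backup: every night at 01:00
0 1 * * *    /opt/backup/diff_backup.sh

# Transaction-log backup: every 30 minutes
*/30 * * * * /opt/backup/txn_backup.sh
```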

I uploaded the backups to AWS S3 and then synced them to GCP. In GCP, I retained only two weeks of data to keep storage costs under control.
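The sync and the two-week retention can be sketched with gsutil, which can read directly from S3 buckets once AWS credentials are configured. The bucket names below are hypothetical:

```shell
# Mirror the S3 backup bucket into the GCP DR bucket
gsutil -m rsync -r s3://example-backup-bucket gs://example-dr-bucket

# gcs-lifecycle.json - delete DR objects older than 14 days:
# {"rule": [{"action": {"type": "Delete"}, "condition": {"age": 14}}]}
gsutil lifecycle set gcs-lifecycle.json gs://example-dr-bucket
```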

For the first few months, everything appeared to work as expected. However, after roughly four months, I noticed our AWS bill rising. When I looked closely, the cost spike was primarily due to data transfer charges running into terabytes. This was puzzling: our database size was around 200 GB, and even with regular backups, my calculations suggested that monthly transfer costs should not exceed $500.
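For context, here is the kind of back-of-envelope estimate I mean. Only the 200 GB database size comes from the setup above; the per-backup sizes and the ~$0.09/GB egress rate are illustrative assumptions:

```python
# Rough estimate of monthly S3 data-transfer-out volume and cost.
# All values except DB_SIZE_GB are assumptions for illustration.

DB_SIZE_GB = 200               # full database size (from the article)
DIFF_SIZE_GB = 20              # assumed size of one nightly differential
TXN_SIZE_GB = 0.5              # assumed size of one 30-minute log backup
EGRESS_USD_PER_GB = 0.09       # typical S3 internet egress rate

full_monthly = DB_SIZE_GB * 4.3        # one full backup per week
diff_monthly = DIFF_SIZE_GB * 30       # one differential per night
txn_monthly = TXN_SIZE_GB * 48 * 30    # every 30 min = 48 backups per day

total_gb = full_monthly + diff_monthly + txn_monthly
cost = total_gb * EGRESS_USD_PER_GB

print(f"~{total_gb:,.0f} GB/month -> ~${cost:,.0f}/month")
# → ~2,180 GB/month -> ~$196/month
```

Under these assumptions the expected egress is a couple of terabytes and well under $500 a month, which is why terabytes of extra transfer stood out immediately.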

We raised a support ticket with AWS. After reviewing the case, AWS confirmed that the data transfer charges were legitimate and advised us to inspect our S3 buckets more closely.

After more investigation, I found terabytes of broken multipart uploads - incomplete files that had never been cleaned up. These broken multipart uploads were being picked up by the S3-to-GCP sync process and transferred repeatedly, massively increasing data transfer costs.
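Incomplete multipart uploads don't appear in a normal object listing, which is why they are so easy to miss. They can be surfaced with the s3api CLI; the bucket name and key below are hypothetical:

```shell
# List all in-progress (incomplete) multipart uploads in the bucket
aws s3api list-multipart-uploads --bucket example-backup-bucket

# A stale upload can be aborted manually by key and upload ID
aws s3api abort-multipart-upload \
  --bucket example-backup-bucket \
  --key backups/full/example.dump \
  --upload-id <UploadId>
```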

With the root cause found, the solution was simple: I applied an S3 lifecycle policy to automatically delete incomplete multipart uploads. Once this rule was in place, the unnecessary data transfers stopped, and the AWS bill returned to normal in subsequent months.
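A minimal version of that lifecycle rule looks like this; the seven-day window is my choice of grace period, not a required value:

```json
{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```

It can be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`.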

From that point onward, every new S3 bucket I created included a default lifecycle rule to clean up incomplete multipart uploads. This costly lesson taught me not only to make this part of our standard best practices, but also to check and verify the small configuration details in cloud setup and governance that we so often overlook.
