
Ifan Jaya Suswanto Zalukhu for AWS Community Builders



Valuable Lessons from My First AWS Setup Mistakes (and How to Optimize Costs to Avoid Bill Shock!)

When I first set up services on AWS, I learned a lot—or more accurately, I made a lot of mistakes that turned into valuable lessons. And I mean really valuable because the first one to notify me wasn’t the engineering team, nor was it a monitoring alert, but... AWS billing! If that’s not valuable, I don’t know what is—after all, it’s your cost suddenly spiking that tells you something went wrong. 😆

You can also listen to this post as a NotebookLM podcast, if that’s more your style 😊

For a bit of context, I currently work as the Head of Engineering at a SaaS startup in Medan, Indonesia.

But these mistakes happened when I was still a mobile developer at the same company. At first, I got involved in backend tasks because I often had to wait for APIs from the backend team. Once the backend was done, I then had to wait again for the DevOps team to set up and deploy it on AWS. To reduce dependency on other teams, I decided to start learning how to handle AWS and manage the backend APIs myself.

That’s where my AWS journey began—along with a series of mistakes. The funny thing is, many of these mistakes weren’t immediately obvious in the first month because they were still covered by the free tier. But after a few months, they started showing up on the billing statement.


1: Enabling CloudWatch Logs Without Retention or a Minimum Log Level

One of my first mistakes was enabling logging in CloudWatch without setting data retention and log level filters. Logging is crucial for troubleshooting—without logs, debugging a service issue is like being a fortune teller, trying to guess what went wrong. 😂

However, I stored everything—Info, Warning, Debug, and Error logs—without filtering. The result? CloudWatch log storage grew rapidly, and the costs for ingested log data, plus the data scanned by CloudWatch Logs Insights queries, became insanely expensive. This issue only became apparent after a few months, when my CloudWatch bill started climbing.

How We Fixed It

  • Adjusted the minimum log level to Error to reduce stored data.

  • Set log retention based on actual needs. If logs are only needed for troubleshooting within the past 7 days, set retention to 7 days (the default is "never expire"); see the sketch after this list.

  • Used CloudWatch Logs Insights queries only when necessary to avoid unnecessary costs.
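For the retention piece, here is a minimal boto3 sketch (the log group name and the 7-day window are placeholder assumptions; the minimum log level itself is usually configured in the application's logger rather than in CloudWatch):

```python
import boto3

# CloudWatch Logs client (assumes credentials/region are already configured)
logs = boto3.client("logs")

# Example log group name -- replace with your own
LOG_GROUP = "/ecs/my-backend-service"

# Keep log events for 7 days only; without this, the default is "never expire"
logs.put_retention_policy(
    logGroupName=LOG_GROUP,
    retentionInDays=7,  # valid values include 1, 3, 5, 7, 14, 30, ...
)

# Optionally apply the same retention to every existing log group
paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate():
    for group in page["logGroups"]:
        logs.put_retention_policy(
            logGroupName=group["logGroupName"],
            retentionInDays=7,
        )
```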



2: Not Enabling Retention / Lifecycle Policy in ECR

Most of our system runs on Docker containers, and our Docker images are stored in Amazon Elastic Container Registry (ECR). The mistake? I didn't enable a retention/lifecycle policy in the ECR repository. As a result, old images were kept indefinitely, and storage costs kept growing.

This problem wasn’t obvious in the first six months since the cost was just a few cents. But as more images accumulated, our ECR storage bill started rising.

How We Fixed It

  • Enabled a lifecycle policy in ECR based on image count, keeping only the latest versions (see the policy sketch after this list).

  • If an old image is needed, we simply rebuild it via CI/CD.
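Here is a minimal sketch of such a policy applied with boto3 (the repository name and the keep-last-10 count are placeholder assumptions):

```python
import json

import boto3

ecr = boto3.client("ecr")

# Expire everything except the 10 most recent images in the repository
lifecycle_policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Keep only the last 10 images",
            "selection": {
                "tagStatus": "any",
                "countType": "imageCountMoreThan",
                "countNumber": 10,
            },
            "action": {"type": "expire"},
        }
    ]
}

ecr.put_lifecycle_policy(
    repositoryName="my-backend-service",  # example repository name
    lifecyclePolicyText=json.dumps(lifecycle_policy),
)
```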



3: Running Web Frontend on ECS Instead of Amazon S3 + CloudFront

Since most of our services run in Docker, our frontend was initially hosted on Amazon ECS. What happened? The data transfer costs were crazy high!

After some research, we decided to migrate to Amazon S3 + CloudFront. Our CI/CD pipeline now builds the frontend and uploads the compiled assets to S3, while CloudFront serves requests from there (a deploy-step sketch follows the list below). The benefits:

  • CloudFront caching reduces data transfer costs.

  • 1TB free tier per month for outbound data transfer.

  • Better security with features like AWS WAF (Web Application Firewall).
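As a rough sketch of that deploy step (bucket name, distribution ID, and build directory are placeholders; in a real pipeline you might equally use the AWS CLI's `s3 sync`):

```python
import mimetypes
import time
from pathlib import Path

import boto3

BUILD_DIR = Path("dist")             # compiled frontend output (assumption)
BUCKET = "my-frontend-bucket"        # example bucket name
DISTRIBUTION_ID = "E1234567890ABC"   # example CloudFront distribution ID

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

# Upload every build file with a content type so browsers render it correctly
for path in BUILD_DIR.rglob("*"):
    if path.is_file():
        key = str(path.relative_to(BUILD_DIR))
        content_type, _ = mimetypes.guess_type(str(path))
        s3.upload_file(
            str(path),
            BUCKET,
            key,
            ExtraArgs={"ContentType": content_type or "application/octet-stream"},
        )

# Invalidate the CloudFront cache so users get the new build
cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # must be unique per invalidation
    },
)
```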



4: Vertically Scaling ECS Services for Temporary Spikes

When a service on ECS had traffic spikes, my first solution was to increase CPU and memory (vertical scaling). The problem? The spikes were occasional, like when users exported data. So, most of the time, the extra resources were sitting idle—wasting money.

How We Fixed It

  • Analyzed the spike patterns and root causes.

  • Used ECS auto-scaling to dynamically add tasks during spikes.

  • Moved heavy processes to AWS Lambda. For example, when users export data, the ECS service sends a message to an Amazon SQS queue, and a Lambda function processes it, storing the output in S3 (sketched below).
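A minimal sketch of the Lambda side of that export flow, assuming an SQS-triggered function and a made-up message shape (`export_id`, `rows`) plus a placeholder bucket:

```python
import json

import boto3

s3 = boto3.client("s3")
EXPORT_BUCKET = "my-export-bucket"  # example bucket name


def generate_export(payload):
    """Placeholder for the actual heavy export logic (e.g. building a CSV)."""
    rows = payload.get("rows", [])
    return "\n".join(",".join(map(str, row)) for row in rows)


def handler(event, context):
    # Lambda receives a batch of SQS messages in event["Records"]
    for record in event["Records"]:
        payload = json.loads(record["body"])  # message body sent by the ECS service
        export_data = generate_export(payload)

        # Store the finished export in S3; the app can later hand the user a download link
        s3.put_object(
            Bucket=EXPORT_BUCKET,
            Key=f"exports/{payload['export_id']}.csv",  # assumed message field
            Body=export_data.encode("utf-8"),
            ContentType="text/csv",
        )
```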


5: Not Using EC2 Spot Instances to Save Costs

Initially, all our ECS instances ran on EC2 On-Demand. But EC2 Spot Instances can reduce costs by up to 90% compared to On-Demand!

However, Spot Instances aren’t suitable for all workloads since AWS can terminate them anytime if demand rises. To mitigate this, we used Spot Instances for staging and development environments only.

To ensure availability, we used multiple instance types with the price-capacity-optimized strategy, which automatically selects the best-priced instance with the lowest termination risk.
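One common way to wire this up is an EC2 Auto Scaling group with a mixed instances policy behind the ECS cluster (typically attached via a capacity provider). A rough boto3 sketch, where the launch template, instance types, subnets, and sizes are all placeholder assumptions:

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="staging-ecs-asg",  # example name
    MinSize=1,
    MaxSize=4,
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",  # example subnet IDs
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "staging-ecs-template",  # example template
                "Version": "$Latest",
            },
            # Several interchangeable instance types improve Spot availability
            "Overrides": [
                {"InstanceType": "t3.medium"},
                {"InstanceType": "t3a.medium"},
                {"InstanceType": "m5.large"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything beyond base runs on Spot
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    },
)
```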



Tips to Avoid Cost Surprises

To help detect mistakes faster, here are a few things that have helped me recently:

  • Enable billing alerts in AWS Billing & Cost Management. Set thresholds and forecasts to avoid getting surprised at the end of the month (see the budget sketch after this list). 😃

  • Regularly review your AWS costs—whether weekly, bi-weekly, or monthly—to spot cost trends and identify services with rising expenses.

  • Follow AWS updates and recommendations. AWS often suggests optimizations in the AWS Management Console, such as cost-effective configurations.
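For the billing-alert tip, a hedged sketch using AWS Budgets via boto3 (account ID, monthly limit, threshold, and e-mail address are placeholders):

```python
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # example AWS account ID
    Budget={
        "BudgetName": "monthly-cost-guardrail",
        "BudgetLimit": {"Amount": "100", "Unit": "USD"},  # example monthly limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Warn when the forecasted spend exceeds 80% of the limit
            "Notification": {
                "NotificationType": "FORECASTED",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
            ],
        }
    ],
)
```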


Hopefully, these experiences help those of you just starting with AWS avoid these “valuable learning experiences” yourself! 😆

This article is also available in Indonesian:
Pelajaran Berharga dari Kesalahan Setup AWS Pertama Kali (dan Cara Optimasi Cost-nya) Biar Tagihan Nggak Bikin Kaget!
