There have been tons of posts about how to optimize AWS costs. I’ve read them, analyzed them, and applied what makes sense for the cloud infrastructure I manage.
At some point, your cloud infra is stable, services run smoothly, no errors, the team is happy… but then at the end of the month, you look at the bill and—well… why are you paying a few thousand dollars for just a handful of services?
Whether it’s a big company or a small one, optimizing costs for any expense always needs careful consideration. Cloud cost is no exception. If you control it well, cloud is an amazing tool. If not… your wallet slowly bleeds every month and you don’t even know why.
How I optimize my infrastructure
Clean up the garbage
Yep, you read that right. In almost every infrastructure, if you don’t clean regularly, there will be a bunch of unused stuff still sitting around—and you’re paying for things that bring zero value.
My approach:
S3 buckets: Delete buckets or objects from non-production environments that are no longer in use.
ECR (Docker images): Implement lifecycle policies to prune old versions. By keeping only the latest 10–12 images, I stopped storage costs from ballooning and cut “zombie” storage waste by over 90%.
Networking & Storage: Periodically audit for Elastic IPs and EBS volumes that exist but are not attached to any instance.
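The ECR pruning above can be expressed as a lifecycle policy. Here is a minimal sketch that builds the policy JSON and shows how it might be applied with boto3; the repository name is hypothetical and the keep count of 10 is just the number I happened to use:

```python
import json

def ecr_keep_latest_policy(keep: int = 10) -> str:
    """Build an ECR lifecycle policy that expires all but the `keep` newest images."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Keep only the {keep} most recent images",
                "selection": {
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": keep,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy)

# Applying it requires AWS credentials (repository name is a placeholder):
# import boto3
# boto3.client("ecr").put_lifecycle_policy(
#     repositoryName="payment-service",
#     lifecyclePolicyText=ecr_keep_latest_policy(10),
# )
```

Once attached, ECR enforces the rule automatically, so old image versions never pile up again.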
If you don’t use it, turn it off
Simple logic: you work 8 hours a day. After that, you don’t use the system—but if services are still running, you’re still paying :D
For example, non-production environments can be stopped from 8 PM to 7 AM.
ECS → set desired count = 0
RDS → stop at night, start in the morning
EC2 → stop/start or scale auto scaling down to 0
All of this is super easy. Just combine Lambda + EventBridge to schedule it. Fully automated, no need to click manually.
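A rough sketch of what that Lambda might look like: EventBridge cron rules fire it at 8 PM and 7 AM, and it sets the ECS desired count based on the hour. The cluster/service names and business hours are placeholders, not my actual setup:

```python
from datetime import datetime, timezone

# Hypothetical list of non-production services to scale down after hours.
NON_PROD_SERVICES = [("staging-cluster", "api-service")]

def desired_count(hour: int, business_start: int = 7, business_end: int = 20) -> int:
    """Return 0 outside business hours so non-prod ECS services scale to zero."""
    return 1 if business_start <= hour < business_end else 0

def handler(event, context):
    import boto3  # imported lazily; actually running this needs AWS credentials
    ecs = boto3.client("ecs")
    count = desired_count(datetime.now(timezone.utc).hour)
    for cluster, service in NON_PROD_SERVICES:
        ecs.update_service(cluster=cluster, service=service, desiredCount=count)
```

The same pattern works for RDS (`stop_db_instance` / `start_db_instance`) and for EC2 Auto Scaling (set the desired capacity to 0).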
Use less, pay less
This one is obvious for S3. Setting lifecycle policies can save you quite a bit.
Hot data → keep in standard storage
Cold data / logs backup → move to Glacier
You can also set lifecycle rules for ECR to auto-delete old images instead of doing it manually.
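As a concrete sketch of the S3 side, here is a lifecycle configuration that moves objects under a `logs/` prefix to Glacier after 30 days and deletes them after a year. The prefix and the day counts are illustrative; pick retention periods that match your own compliance needs:

```python
def s3_log_lifecycle(prefix: str = "logs/") -> dict:
    """Lifecycle config: transition logs to Glacier at 30 days, expire at 365."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    }

# Applying it requires AWS credentials (bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-app-logs",
#     LifecycleConfiguration=s3_log_lifecycle(),
# )
```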
Ask: "Can this be optimized further?"
This mindset applies to almost everything I do, not just AWS.
I’m a backend dev. Sometimes I finish a task but still feel the code isn’t clean enough, naming isn’t right, or it doesn’t follow SOLID / reusable principles → I refactor.
Same with AWS cost optimization, but even more frequently.
When deploying systems (EC2, ECS, EKS, RDS), we often over-provision resources “just to be safe.” But you still pay for all of it.
Example:
With ECS Fargate, I've seen payment services set to 4 vCPU, 8GB RAM. After running, CPU and memory usage were only ~10–20%. So I cut the config in half, then monitored again. Around 50–70% utilization is a good balance.
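To see why halving the config matters, here is the back-of-the-envelope math, using approximate us-east-1 x86 on-demand Fargate rates (about $0.04048 per vCPU-hour and $0.004445 per GB-hour; check the current pricing page for your region):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def fargate_monthly_cost(vcpu: float, gb: float,
                         vcpu_rate: float = 0.04048,
                         gb_rate: float = 0.004445) -> float:
    """Approximate monthly on-demand cost of one always-on Fargate task."""
    return (vcpu * vcpu_rate + gb * gb_rate) * HOURS_PER_MONTH

before = fargate_monthly_cost(4, 8)  # the over-provisioned task
after = fargate_monthly_cost(2, 4)   # after cutting the config in half
print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo")
```

Because Fargate bills linearly on vCPU and memory, halving the task size halves the cost of that task, with no other changes.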
So the question I always keep in mind is:
“Can this system (or task) be optimized further?”
And optimization isn’t just about cost—it’s also performance, scalability, and clean code.
If AWS recommends it, just follow it
Honestly, AWS engineers know their stuff. They’ve laid out best practices in the Well-Architected Framework, and tools like Amazon Q can guide you. Here’s what I implemented:
Use S3 Gateway Endpoints: This is a total “cheat code.” By adding a Gateway VPC Endpoint to your routing table, traffic to S3 stays within the AWS internal network. It doesn’t go over the public internet, it’s more secure, and most importantly, S3 Gateway Endpoints are free: no data transfer charges, no hourly fees. (Just be careful not to choose the “Interface” type unless you specifically need it, as those do have a cost!)
Switch ECS Fargate to ARM: I converted our Fargate tasks from x86_64 to ARM (Graviton). It’s the rare “better and cheaper at the same time” case: it usually performs as well or better for backend workloads and is roughly 20% cheaper right out of the box.
Reserved Instances & Savings Plans: For stable workloads like RDS or baseline EC2, this is a no-brainer. If you know you’ll be running it for a year, commit to it and take the 30–60% discount.
Ditch Public IPs: Since 2024, AWS charges $0.005 per hour for every public IPv4 address, roughly $3.65/month per IP. By keeping resources in private subnets and using internal communication, you save money and harden your security at the same time.
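The Gateway endpoint tip above boils down to a single API call. A sketch with placeholder VPC and route table IDs; note the type must be "Gateway", since "Interface" endpoints carry hourly and data processing charges:

```python
def s3_gateway_endpoint_params(vpc_id: str, route_table_ids: list,
                               region: str = "us-east-1") -> dict:
    """Parameters for a free S3 Gateway VPC endpoint (not an Interface endpoint)."""
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        "RouteTableIds": route_table_ids,
    }

# Creating it requires AWS credentials (IDs are placeholders):
# import boto3
# boto3.client("ec2").create_vpc_endpoint(
#     **s3_gateway_endpoint_params("vpc-0123456789abcdef0",
#                                  ["rtb-0123456789abcdef0"]))
```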
Free tier is great. A lot of my Lambda, SNS, SQS, and CloudWatch usage stays within the Free Tier, so I barely pay anything.
Keep a close eye on the AWS bill and set up an AWS Budget to avoid catching issues too late. Once you notice a service suddenly getting expensive, you need to understand exactly why. In general, checking Cost Explorer every 2 or 3 days is a safe habit.
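Setting up that budget takes a few lines with boto3. A hedged sketch: the budget name, $1,000 limit, 80% threshold, account ID, and email address are all placeholders:

```python
def monthly_budget(limit_usd: str = "1000") -> dict:
    """Definition of a monthly cost budget for the AWS Budgets API."""
    return {
        "BudgetName": "monthly-aws-spend",
        "BudgetLimit": {"Amount": limit_usd, "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }

# Creating it requires AWS credentials (account ID and email are placeholders):
# import boto3
# boto3.client("budgets").create_budget(
#     AccountId="123456789012",
#     Budget=monthly_budget("1000"),
#     NotificationsWithSubscribers=[{
#         "Notification": {
#             "NotificationType": "ACTUAL",
#             "ComparisonOperator": "GREATER_THAN",
#             "Threshold": 80.0,          # alert at 80% of the budget
#             "ThresholdType": "PERCENTAGE",
#         },
#         "Subscribers": [{"SubscriptionType": "EMAIL",
#                          "Address": "ops@example.com"}],
#     }],
# )
```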
If your application uses an Aurora cluster with heavy read/write activity, leading to high I/O costs (more than ~25% of the total bill), it may be worth considering Aurora I/O-Optimized storage. In that configuration, compute and storage pricing is roughly 30% higher, but the I/O charge drops to $0.
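A quick break-even check for that decision, assuming (as a simplification) a flat ~30% uplift on the non-I/O part of the bill; the dollar figures in the example are made up:

```python
def io_optimized_wins(base_usd: float, io_usd: float,
                      uplift: float = 0.30) -> bool:
    """True if Aurora I/O-Optimized would be cheaper than Standard.

    base_usd: monthly instance + storage cost on Standard
    io_usd:   monthly I/O cost on Standard
    uplift:   assumed price increase on base cost under I/O-Optimized
    """
    standard = base_usd + io_usd
    optimized = base_usd * (1 + uplift)  # I/O charges drop to $0
    return optimized < standard

# Example: $700 instance+storage plus $300 I/O (30% of a $1,000 bill).
# Standard stays at $1,000 while I/O-Optimized comes to $910.
```

Under this assumption the crossover sits at I/O being roughly a quarter of the bill, which matches the ~25% rule of thumb.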
In short, there are still many ways to save AWS costs. The most important thing is to understand the service and its pricing. Before using a service, you should first understand how it’s priced and analyze it carefully, because sometimes you jump in first, and only when the bill arrives do you ask why it’s so expensive.
Conclusion
With what I’ve done above, the AWS bill has dropped by 40% compared to before. I’m happy that everything is still running well, especially for small and medium businesses, where cost should be a top priority. Running well and cheap is still better than running well and expensive, right? 😅
“The views and optimizations shared here are my own personal engineering perspectives and do not represent the specific data or policies of any employer.”
(If you enjoy these kinds of engineering stories, you can subscribe or visit my blog to receive the next ones.)
Connect with me on LinkedIn :D