How I learned to tame my machine learning costs, and how you can avoid my mistakes!
I still remember the pit in my stomach when I opened my first AWS bill after a month of just experimenting with SageMaker. $956. For a side project. My gadget budget for the month—gone.
Sound familiar? Well, that makes two of us.
The hard truth is, AWS won’t stop you from accidentally spending your rent money on machine learning. But you can.
Here Come My Painful Lessons:
- The 'Always On' Notebook: I often left my SageMaker notebook running overnight. "I need to finish that experiment before I can close this notebook." In reality, notebook instances bill per hour, even while you sleep. An ml.t3.medium costs ~$0.058/hr. Doesn't sound like much? Leave it on 24/7 for a week: $9.74. Do nothing for a month? $41.76.
 STOP your notebooks when you're done using them. To do this, go to AWS Console > SageMaker > Notebook Instances > Stop. Alternatively, you can use SageMaker Studio Lab (it's free!).
- The Killer Training Job: I picked the biggest GPU (I mean, who doesn't like fast?) for my tiny cat/dog model. Sorry to burst your bubble, but ml.p3.16xlarge (that fancy GPU) costs $24.48/hour. Training for 4 hours? $97.92. For a model you'll tweak tomorrow? Ouch.
 Start small. An ml.m5.large at $0.115/hr will be okay, no matter how underpowered you think it is. Use SageMaker Debugger to spot failing runs early. It'll save you hours.
- The Zombie Endpoint: I deployed my model to test the API... and forgot about it. Hehe. You should know that endpoints bill per hour for the instance behind them, whether or not anyone calls them. A small ml.m5.large endpoint left running = ~$83/month. Yes. Per model.
 DELETE your endpoints after testing them and thank yourself later. Also, for sporadic traffic you can use SageMaker Serverless Inference, which bills for the compute used per request rather than per hour.
- The S3 Black Hole: This one may not be relatable to many ML folks, but hello big data people! (smiles in don't say it). 
 I saved 10 versions of my 50GB dataset. "Just in case, you know...". There was no case. And no, I don't know anything. I never returned to it. What I didn't know was that S3 charges $0.023/GB/month. 500GB = $11.50/month. Glacier storage is cheaper, true, but the retrieval fees bite.
 You can enable S3 Lifecycle Rules to auto-archive or delete old datasets and models once they reach an age you choose, so the ones you never return to stop costing you.
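
If you want to sanity-check the numbers in these lessons (or plug in your own), here is a minimal Python sketch. The rates are simply the ones I quoted above. Real prices vary by region and change over time, and the 30-day month and the function names are my own assumptions.

```python
# Hourly rates are the figures quoted in the lessons above. Real prices
# vary by region and change over time, so treat them as placeholders.
HOURLY_RATE_USD = {
    "ml.t3.medium": 0.058,    # the always-on notebook
    "ml.m5.large": 0.115,     # small training job or endpoint
    "ml.p3.16xlarge": 24.48,  # the big GPU
}
S3_STANDARD_USD_PER_GB_MONTH = 0.023
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def instance_cost(instance_type: str, hours: float) -> float:
    """Cost in USD of running one instance for the given number of hours."""
    return round(HOURLY_RATE_USD[instance_type] * hours, 2)


def s3_monthly_cost(gigabytes: float) -> float:
    """Monthly cost in USD of keeping this many GB in S3 Standard."""
    return round(S3_STANDARD_USD_PER_GB_MONTH * gigabytes, 2)


print(instance_cost("ml.t3.medium", 24 * 7))          # notebook, one week: 9.74
print(instance_cost("ml.t3.medium", HOURS_PER_MONTH))  # notebook, one month: 41.76
print(instance_cost("ml.p3.16xlarge", 4))              # 4-hour GPU job: 97.92
print(instance_cost("ml.m5.large", HOURS_PER_MONTH))   # zombie endpoint: 82.8
print(s3_monthly_cost(10 * 50))                        # 10 copies of 50 GB: 11.5
```

Swap in the rates from the pricing page for your region before trusting the output for anything bigger than a side project.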
The Cost Survival Toolkit
- Billing Alarms: Set an alarm for whatever amount hurts. If that's $50 (hello bestie!), make it a $50 alarm. In the AWS Console, open AWS Cost Management > Budgets > Create Budget, pick "Monthly Budget", set your amount, and add email alerts (or an SNS topic if you want SMS). Do this NOW and thank yourself tomorrow for a better sleep tonight.
- Cost Explorer: I call this my spending detective. You can find it in AWS Cost Management. You can filter by Service (SageMaker, S3, EC2) and Usage Type. 
 Pro Tip: Tag your resources by project (e.g., Project=cat-classifier) and activate the tag as a cost allocation tag in the Billing console so you can track costs per experiment. You're welcome.
- The Shutdown Ritual: Shut down your notebook instances, delete your trial endpoints, stop training jobs you no longer need, and clear out the S3 output left behind by failed jobs. Do these EVERY TIME. It's not a ritual if you do it only sometimes. It can feel like a chore at first, but with time it becomes routine. Bookmarking the consoles you visit most (Training Jobs, Notebooks) also speeds things up.
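
If you'd rather script the billing alarm than click through the console, here is a rough sketch of the request you could hand to the AWS Budgets `create_budget` call in boto3. The function name, the budget name, the 80% threshold, the account ID, and the email address are all placeholders I made up for illustration. The function only builds the request so you can inspect it before sending anything.

```python
def monthly_budget_request(account_id, limit_usd, email, alert_at_percent=80.0):
    """Build keyword arguments for the AWS Budgets CreateBudget call.

    Nothing is sent to AWS here; the caller decides when to submit it.
    """
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": f"monthly-{limit_usd:.0f}-usd",  # hypothetical name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # alert on real spend
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": alert_at_percent,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email}
                ],
            }
        ],
    }


# Placeholder account ID and address; replace with your own.
request = monthly_budget_request("123456789012", 50, "me@example.com")

# With boto3 installed and credentials configured, you would submit it with:
#   import boto3
#   boto3.client("budgets").create_budget(**request)
```

Building the request separately from sending it means you can print it and double-check the amount before AWS ever sees it.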
The Golden Rule
Never sign out of your AWS account without asking yourself: "Did I leave anything running?"
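
You can also let a script ask that question for you. This is a minimal sketch, assuming you pass in an object that behaves like `boto3.client("sagemaker")`. The function names are mine, it only reads the first page of each listing (plenty for a side project), and you should run the listing step by itself first before letting it stop or delete anything.

```python
def find_running_resources(sagemaker):
    """Return the names of SageMaker resources that are still billing.

    `sagemaker` is expected to behave like boto3.client("sagemaker").
    Only the first page of each listing is read.
    """
    notebooks = sagemaker.list_notebook_instances(StatusEquals="InService")
    endpoints = sagemaker.list_endpoints(StatusEquals="InService")
    jobs = sagemaker.list_training_jobs(StatusEquals="InProgress")
    return {
        "notebooks": [
            n["NotebookInstanceName"] for n in notebooks["NotebookInstances"]
        ],
        "endpoints": [e["EndpointName"] for e in endpoints["Endpoints"]],
        "training_jobs": [
            j["TrainingJobName"] for j in jobs["TrainingJobSummaries"]
        ],
    }


def shut_down(sagemaker, running):
    """Stop notebooks, delete endpoints, and stop training jobs by name."""
    for name in running["notebooks"]:
        sagemaker.stop_notebook_instance(NotebookInstanceName=name)
    for name in running["endpoints"]:
        sagemaker.delete_endpoint(EndpointName=name)
    for name in running["training_jobs"]:
        sagemaker.stop_training_job(TrainingJobName=name)


# With boto3 installed and credentials configured, usage would look like:
#   import boto3
#   client = boto3.client("sagemaker")
#   running = find_running_resources(client)
#   print(running)              # look before you delete
#   shut_down(client, running)
```

Deleting an endpoint is not reversible, so keep the "look first" step even once the ritual is a habit.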
Now, go build amazing stuff—without the heartburn, of course.