TL;DR: I built an AutoML platform on AWS that handles training and inference for pennies. By moving from always-on SageMaker endpoints to a Serverless Lambda + ONNX architecture, I eliminated idle costs completely—bringing the monthly bill for a side project from ~$150 down to practically $0.
As an AWS Community Builder, I'm always looking for ways to leverage cloud-native services to solve problems efficiently. In my previous article, I shared how to train models cheaply using AWS Fargate Spot. But training is only half the battle.
To turn a trained model into a product, you need Inference—the ability to make predictions on new data. The industry standard is Amazon SageMaker Real-time Endpoints. They are powerful, scalable, and enterprise-ready. But for a personal project or sporadic workload, they have one major downside: You pay for them 24/7, even when no one is using them.
In this post, I'll walk you through how I evolved AWS AutoML Lite into a full-stack ML platform by adding Serverless Inference, Dark Mode, and robust Model Comparison, all while keeping costs at rock bottom.
1. The Inference Challenge: $0 vs $150/month
The standard "easy" path on AWS is deploying a SageMaker Endpoint.
- Production Standard: An `ml.c5.xlarge` instance costs ~$147/month.
- Budget Option: An `ml.t3.medium` instance costs ~$36/month.
But what about traffic?
Let's compare costs for a Side Project (100k reqs) vs Startup Scale (10M reqs).
| Scenario | Monthly Requests | SageMaker (ml.t3.medium) | Lambda (Serverless) |
|---|---|---|---|
| Side Project | 100,000 | $36.00 | $0.35 |
| Startup Scale | 10,000,000 | $36.00 | ~$35.00 |
Transparent Accounting: "Serverless" isn't magic. You also pay for S3 Storage (to hold the model) and S3 GET Requests (every time a cold Lambda downloads it).
- Storage: 50MB model = ~$0.001/month.
- Access: 100k cold starts = ~$0.04 in request fees.
Even adding these "hidden" costs, the total remains ~$35.04 vs $36.00. The math holds up.
The Verdict: You would need over 10 million predictions/month just to match the cost of the cheapest, burstable-performance SageMaker instance. For the production-grade ml.c5.xlarge ($147), the break-even point is closer to 40 million requests.
Until you hit that scale, Serverless is vastly cheaper.
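For the curious, the whole table reduces to a few lines of arithmetic. Here's a sketch, assuming us-east-1 list prices and a 1 GB function running ~200 ms per prediction (which is roughly what my numbers imply; adjust for your own model):

```python
# Back-of-envelope Lambda cost model behind the table above.
# Assumptions: us-east-1 list prices, 1 GB memory, ~200 ms per invocation.
REQUEST_PRICE = 0.20 / 1_000_000   # $ per Lambda request
GB_SECOND_PRICE = 0.0000166667     # $ per GB-second of compute
MEMORY_GB, DURATION_S = 1.0, 0.2

per_request = REQUEST_PRICE + GB_SECOND_PRICE * MEMORY_GB * DURATION_S

for label, reqs in [("Side Project", 100_000), ("Startup Scale", 10_000_000)]:
    print(f"{label}: ${reqs * per_request:,.2f}/month")

print(f"Break-even vs ml.t3.medium ($36/mo): ~{36 / per_request:,.0f} reqs")
print(f"Break-even vs ml.c5.xlarge ($147/mo): ~{147 / per_request:,.0f} reqs")
```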
The Solution: ONNX Runtime on Lambda
I decided to move the inference logic to AWS Lambda. Since Lambda is event-driven, you only pay when code runs.
- Cost: ~$0.20 per 1 million requests, plus a small compute charge billed per GB-second.
- Idle Cost: $0.00.
To make this work efficiently, I used the ONNX (Open Neural Network Exchange) format.
- Export: During training (on Fargate), we convert models (Scikit-Learn, LightGBM) to `.onnx`.
- Deploy: The deployment process simply flags the job in DynamoDB; no server provisioning required.
- Predict: A Python Lambda function loads the model from S3 into memory (caching it for subsequent "warm" invocations) and uses `onnxruntime` to generate predictions in milliseconds (sketched below).
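Here is a minimal sketch of that Predict step. The bucket and key are placeholders and the real handler has more error handling, but the shape is exactly this:

```python
import json

import boto3
import numpy as np
import onnxruntime as ort

s3 = boto3.client("s3")
_session = None  # survives across "warm" invocations of the same container


def _load_session(bucket: str, key: str) -> ort.InferenceSession:
    global _session
    if _session is None:  # cold start: download once, then reuse
        s3.download_file(bucket, key, "/tmp/model.onnx")
        _session = ort.InferenceSession("/tmp/model.onnx")
    return _session


def handler(event, context):
    sess = _load_session("automl-lite-artifacts", "jobs/abc123/model.onnx")
    features = np.array(json.loads(event["body"])["features"], dtype=np.float32)
    outputs = sess.run(None, {sess.get_inputs()[0].name: features})
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": outputs[0].tolist()}),
    }
```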
This architecture is the definition of Cloud Native Efficiency: maximum value, minimum waste.
We added a "Deploy" button to the results page. Under the hood, this doesn't spin up a server. It simply flags the job as "deployed" in DynamoDB and ensures the model artifacts are ready in S3. The API then knows to route prediction requests for that Job ID to the inference engine.
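In code, "deploying" is a single attribute update; the table and attribute names below are illustrative:

```python
import boto3

# Mark the job as deployed so the API routes predictions to it.
table = boto3.resource("dynamodb").Table("automl-jobs")
table.update_item(
    Key={"job_id": "abc123"},
    UpdateExpression="SET deployed = :d",
    ExpressionAttributeValues={":d": True},
)
```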
Bonus: Portable Models
If you don't even want to run Lambda, you can just download the model. We provide both the raw .pkl (Pickle) file and the .onnx file. This means you can run your trained model locally, on your own server, or even inside a browser using ONNX Runtime Web.
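For example, once the artifacts are downloaded, local predictions need nothing but Python (the feature values below are made up):

```python
import joblib
import numpy as np
import onnxruntime as ort

x = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)  # dummy feature row

# Option 1: the raw scikit-learn pickle
print(joblib.load("model.pkl").predict(x))

# Option 2: the portable ONNX graph
sess = ort.InferenceSession("model.onnx")
print(sess.run(None, {sess.get_inputs()[0].name: x}))
```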
2. Comparing Models: Data-Driven Decisions
Training one model is rarely enough. You usually train 4-5 variations with different time budgets or datasets to see what sticks.
Version 1.1.0 introduces a Compare Page. You can select up to 4 training runs and see them side-by-side.
Once selected, the platform visualizes the differences. This is crucial for spotting trade-offs. Maybe Model A has slightly better accuracy (94%) but took 20 minutes to train, while Model B is close enough (92%) but finished in 2 minutes.
3. UI Polish: Dark Mode & UX
A modern developer tool isn't complete without Dark Mode. We implemented a system-aware theme switcher using next-themes and Tailwind CSS. It respects your OS preference by default but lets you toggle it manually.
We also moved from a "developer console" look to a cleaner design, optimizing the history table to show just the key stats you need.
4. Engineering Lessons Learned
Building features is fun, but fixing bugs teaches you more. Here are two big technical hurdles we overcame in this release.
Lesson A: The "Deleted" Resource that Wasn't
We encountered a frustrating bug: users would delete a training job, but if they clicked "Back" in their browser, the Job Details page would still load perfectly—serving stale data.
The Culprit: Browser Caching.
Even though the DELETE API call was successful, the browser had cached the previous GET request for the Job Details. When the user navigated back, the browser served the cached "200 OK" response from disk without asking the server.
The Fix:
We had to implement a strict "Trust No One" caching strategy:
- Backend: The `DELETE` response now sends aggressive anti-cache headers (`no-store`, `max-age=0`); see the sketch after this list.
- Frontend: The critical `getJobDetails` fetch call now uses `cache: 'no-cache'`, which forces the browser to revalidate with the server via an ETag check.
  - If the job exists: the server replies "304 Not Modified" (fast).
  - If the job is deleted: the server replies "404 Not Found" and the UI immediately shows the error state.
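On the backend side, the fix is just response headers. A sketch in the shape of a Lambda proxy handler, with the deletion logic elided:

```python
import json


def delete_job(event, context):
    job_id = event["pathParameters"]["jobId"]
    # ... delete the DynamoDB item and S3 artifacts for job_id ...
    return {
        "statusCode": 200,
        "headers": {
            # Tell the browser never to serve this exchange from cache.
            "Cache-Control": "no-store, max-age=0",
        },
        "body": json.dumps({"deleted": job_id}),
    }
```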
Takeaway: Deleting a resource on the server doesn't automatically purge it from your user's browser cache. You have to be explicit.
Lesson B: Preserving State via Polling
Our UI polls the backend every 5 seconds to update the training progress bar. However, we found that sometimes valid presigned URLs (for downloading models) would disappear or break during these updates.
Ideally, you shouldn't re-fetch the whole object if you only need the status. But since we do fetch the full state, we implemented a client-side merger. It takes the new status from the API but preserves any existing valid URLs from the old state. This prevents the "flickering" download buttons and ensures a smooth user experience.
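The merge itself is tiny. Our real implementation lives in the TypeScript frontend, but the logic reads like this (field names are illustrative):

```python
PRESERVED_FIELDS = ("model_url", "onnx_url")  # presigned download links


def merge_job_state(old: dict, new: dict) -> dict:
    """Take fresh status/metrics from the API, but never drop a URL we already have."""
    merged = {**old, **new}
    for field in PRESERVED_FIELDS:
        if not new.get(field) and old.get(field):
            merged[field] = old[field]
    return merged
```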
What's Next? (Roadmap)
We aren't stopping here. The roadmap for v1.2 includes:
- Multi-user Authentication (Cognito)
- Email Notifications (SNS) when long training jobs finish
- Hyperparameter Tuning UI for advanced users
Try It Yourself
AutoML Lite is open source and can be deployed to your own AWS account in about 20 minutes.
If you have questions or want to contribute, drop a comment or open an issue on GitHub!