Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and AWS Step Functions: CD4AutoML
In my previous posts (linked below), I wrote about automating machine learning workflows on Amazon Web Services (AWS) with Amazon SageMaker and AWS Step Functions. Those posts only provided GitHub Gists and minor code snippets, not fully working solutions. This left many readers asking questions about the technical solutions, either privately or in the comments. I kind of solved one problem but created more. This led me to ask myself:
How might I help my readers better achieve the job they wanted done anytime they employed my technical blog posts?
The answer is what you’re now reading. In this post, I provide a working project that automates an end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and AWS Step Functions, which I now call CD4AutoML. By end-to-end, I am not referring to calling the Amazon SageMaker Endpoint from a notebook. I am talking about having a publicly available serverless REST API (Amazon API Gateway) connected to your Amazon SageMaker Endpoint in a fully automated way. This means that you can serve predictions in your applications with a fully automated workflow requiring no developer input apart from committing code to GitHub.
Enough talk, you are already asking: How do I get started? Everything to get you started is available in the GitHub repository below:
OElesin/sagemaker-autopilot-step-functions
Architecture
This project is designed to get you up and running with CD4AutoML (I coined this word), much like CD4ML from Martin Fowler’s blog post. This project indeed completes the “Automated” part of AutoML.
Technologies:
- AWS CloudFormation
- AWS Step Functions
- Amazon SageMaker Autopilot
- AWS CodeBuild
- AWS Step Functions Data Science SDK
- AWS Serverless Application Model
- AWS Lambda
- Amazon API Gateway
- AWS Systems Manager (SSM) Parameter Store
With this project, you move Amazon SageMaker Autopilot out of play/lab mode and into running real-life applications.
State machine Workflow
The entire workflow is managed with the AWS Step Functions Data Science SDK. AWS Step Functions does not have a service integration with Amazon SageMaker Autopilot out of the box. To work around this, I used the AWS Lambda integration with Step Functions to periodically poll for the Amazon SageMaker Autopilot job status.
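To make the polling idea concrete, here is a minimal sketch of what such a status-checking Lambda could look like. It is not copied from the repository: the input key `AutoMLJobName` and the optional `client` argument (added only so the handler can be exercised with a stub) are my assumptions. The `describe_auto_ml_job` call is the regular boto3 SageMaker API.

```python
# Sketch of a status-polling Lambda for an Autopilot job.
# Assumption: the state machine passes {"AutoMLJobName": "..."} as input.


def lambda_handler(event, context, client=None):
    """Return the current Autopilot job status so the state machine can branch on it."""
    if client is None:
        # Imported lazily so the handler can be tested with a stub client.
        import boto3

        client = boto3.client("sagemaker")

    job_name = event["AutoMLJobName"]
    response = client.describe_auto_ml_job(AutoMLJobName=job_name)

    # Only the fields the state machine needs are passed along.
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobStatus": response["AutoMLJobStatus"],
        "AutoMLJobSecondaryStatus": response["AutoMLJobSecondaryStatus"],
    }
```

In the state machine, a task like this would typically sit in a loop: a Wait state, then the Lambda task, then a Choice state that goes back to the Wait state while `AutoMLJobStatus` is `InProgress` and moves on once it is `Completed`.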
Once the AutoML job is completed, a model is created using the Amazon SageMaker Autopilot Inference Containers, and an Amazon SageMaker Endpoint is deployed. But there is more…
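For readers who want to see roughly what this step involves, below is a hedged sketch using plain boto3 calls rather than the Data Science SDK steps the project itself is built on. The function names, the single shared `name` for model, endpoint config and endpoint, and the default instance type are all hypothetical choices for illustration. The sketch assumes the job has completed, so that `describe_auto_ml_job` returns a `BestCandidate` with its `InferenceContainers`.

```python
# Sketch: turn the best Autopilot candidate into a model and an endpoint.
# Assumptions: `name` is reused for the model, endpoint config and endpoint;
# the instance type default is illustrative only.


def build_deployment_requests(job_description, role_arn, name, instance_type="ml.m5.large"):
    """Build the request payloads for create_model, create_endpoint_config and create_endpoint."""
    containers = job_description["BestCandidate"]["InferenceContainers"]
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        # Autopilot candidates are inference pipelines, so all containers are kept in order.
        "Containers": [
            {
                "Image": c["Image"],
                "ModelDataUrl": c["ModelDataUrl"],
                "Environment": c.get("Environment", {}),
            }
            for c in containers
        ],
    }
    endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, endpoint_config, endpoint


def deploy_best_candidate(client, job_name, role_arn, name):
    """Create the model, endpoint config and endpoint with a boto3 SageMaker client."""
    description = client.describe_auto_ml_job(AutoMLJobName=job_name)
    model, endpoint_config, endpoint = build_deployment_requests(description, role_arn, name)
    client.create_model(**model)
    client.create_endpoint_config(**endpoint_config)
    client.create_endpoint(**endpoint)
    return endpoint["EndpointName"]
```

Keeping the payload construction in a pure function makes it easy to inspect or test what would be sent before any AWS resources are created.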
Once the Amazon SageMaker Endpoint is deployed, a state machine task triggers an AWS CodeBuild project, which deploys our Amazon API Gateway with the AWS Serverless Application Model.
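The API that gets deployed needs a small piece of code between Amazon API Gateway and the endpoint. A minimal sketch of such a Lambda handler is shown below, assuming an API Gateway proxy integration, a CSV row in the request body, and the endpoint name supplied through a hypothetical `SAGEMAKER_ENDPOINT_NAME` environment variable. The actual handler in the repository may differ. The optional `runtime` argument exists only so the handler can be tested with a stub.

```python
# Sketch of a Lambda handler that forwards an API Gateway request to a SageMaker endpoint.
# Assumptions: proxy integration, CSV payload in event["body"], and a hypothetical
# SAGEMAKER_ENDPOINT_NAME environment variable holding the endpoint name.
import json
import os


def lambda_handler(event, context, runtime=None):
    if runtime is None:
        # Imported lazily so the handler can be tested with a stub runtime client.
        import boto3

        runtime = boto3.client("sagemaker-runtime")

    response = runtime.invoke_endpoint(
        EndpointName=os.environ["SAGEMAKER_ENDPOINT_NAME"],
        ContentType="text/csv",
        Body=event["body"],
    )
    # The response body is a stream of bytes; decode it into text.
    prediction = response["Body"].read().decode("utf-8").strip()

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": prediction}),
    }
```

A client would then send a POST request with one CSV row of feature values as the body and get the prediction back as JSON.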
See workflow image below:
Future Work
I plan to abstract away all deployment details and convert this into a Python module or, better put, AutoML-as-a-Service. Users can provide either a Pandas DataFrame or local CSV/JSON data, and the service takes care of the rest. Users will get a secure REST API that they can use to make predictions in their applications.
If you’re interested in working on this together, feel free to reach out. Also feel free to extend this project as it suits you. If you experience any challenges getting started, create an issue and I will have a look as soon as I can.
Further Reading
- Part 1: Automating Machine Learning Workflows with AWS Glue, Amazon SageMaker and AWS Step Functions Data Science SDK
- Part 2: Automating Machine Learning Workflows Pt2: Amazon SageMaker Processing and AWS Step Functions Data Science SDK
- Amazon Step Functions
- Amazon Step Functions Developer Guide
- AWS Step Functions Data Science SDK
Kindly share your thoughts and comments — looking forward to your feedback. You can reach me via email, follow me on Twitter or connect with me on LinkedIn. Can’t wait to hear from you!!