#automl #aws #awsstepfunctions #machinelearning

Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and Amazon Step Functions — CD4AutoML

CD4AutoML with Amazon SageMaker Autopilot — Olalekan Elesin

In my previous posts (linked below), I wrote about automating machine learning workflows on Amazon Web Services (AWS) with Amazon SageMaker and AWS Step Functions. In those posts, I provided only GitHub Gists and small code snippets, not fully working solutions. That left many readers asking questions about the technical details, either privately or in the comments. I kinda solved a problem, but created more problems. This led me to ask myself:

How might I help my readers better achieve the job they wanted done anytime they employed my technical blog posts?

The answer is what you’re now reading. In this post, I provide a working project that automates an end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and AWS Step Functions, which I now call CD4AutoML. By end-to-end, I do not mean calling the Amazon SageMaker Endpoint from a notebook. I mean having a publicly available serverless REST API (Amazon API Gateway) connected to your Amazon SageMaker Endpoint in a fully automated way. This means you can serve predictions in your applications through a fully automated workflow that requires no developer input apart from committing code to GitHub.

Enough talk, you are already asking: How do I get started? Everything to get you started is available in the GitHub repository below:

OElesin/sagemaker-autopilot-step-functions

Architecture

Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot on Amazon Step Functions — CD4AutoML

This project is designed to get you up and running with CD4AutoML (I coined this word), much like CD4ML from Martin Fowler’s blog post. This project indeed completes the “Automated” part of AutoML.

Technologies:

AWS CloudFormation
AWS Step Functions
Amazon SageMaker Autopilot
AWS CodeBuild
AWS Step Functions Data Science SDK
AWS Serverless Application Model
AWS Lambda
Amazon API Gateway
AWS Systems Manager (SSM) Parameter Store

With this project, you move out of play/lab mode with Amazon SageMaker Autopilot and into running real-life applications with it.

State Machine Workflow

The entire workflow is managed with the AWS Step Functions Data Science SDK. AWS Step Functions does not have a service integration with Amazon SageMaker Autopilot out of the box. To work around this, I used the AWS Lambda integration with Step Functions to periodically poll for the Amazon SageMaker Autopilot job status.
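To make the polling idea concrete, here is a minimal sketch of what such a status-check Lambda could look like. It is not the code from the repository. The handler name, the event key `AutoMLJobName`, and the `Done` flag are my assumptions for illustration. The only AWS call used is boto3's `describe_auto_ml_job`. A Wait state followed by a Choice state on `Done` could then loop back to this Lambda until the job finishes.

```python
# Statuses after which an Autopilot job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def check_autopilot_job(event, context=None, sagemaker_client=None):
    """Lambda-style handler that reports the current status of an Autopilot job.

    `event["AutoMLJobName"]` is an assumed input shape, not taken from the repo.
    `sagemaker_client` is injectable so the function can be exercised without AWS.
    """
    if sagemaker_client is None:
        import boto3  # imported lazily; only needed when running against AWS

        sagemaker_client = boto3.client("sagemaker")

    job_name = event["AutoMLJobName"]
    response = sagemaker_client.describe_auto_ml_job(AutoMLJobName=job_name)
    status = response["AutoMLJobStatus"]

    # Return a small payload that a Step Functions Choice state can branch on.
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobStatus": status,
        "AutoMLJobSecondaryStatus": response.get("AutoMLJobSecondaryStatus"),
        "Done": status in TERMINAL_STATUSES,
    }
```

Returning a boolean alongside the raw status keeps the state machine definition simple: the Choice state only needs to check one field, while the raw status is still available for failure handling.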

Once the AutoML job is completed, a model is created using the Amazon SageMaker Autopilot Inference Containers, and an Amazon SageMaker Endpoint is deployed. But there is more…
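As a rough illustration of this step, the sketch below turns the best candidate from a completed Autopilot job into the request payloads for boto3's `create_model`, `create_endpoint_config`, and `create_endpoint`. It assumes the job description comes from `describe_auto_ml_job`. The function name, the naming scheme for the config and endpoint, and the default instance type are my own placeholders, not what the repository uses.

```python
def build_deployment_requests(job_description, model_name, role_arn,
                              instance_type="ml.m5.large"):
    """Build SageMaker request payloads from an Autopilot job description.

    `job_description` is assumed to be the dict returned by
    `describe_auto_ml_job` for a completed job. The resource names and the
    default instance type are illustrative choices.
    """
    best_candidate = job_description["BestCandidate"]

    # Each inference container carries its image, model artifact, and env vars.
    containers = [
        {
            "Image": container["Image"],
            "ModelDataUrl": container["ModelDataUrl"],
            "Environment": container.get("Environment", {}),
        }
        for container in best_candidate["InferenceContainers"]
    ]

    create_model = {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": containers,
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
    }
    create_endpoint = {
        "EndpointName": f"{model_name}-endpoint",
        "EndpointConfigName": f"{model_name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint
```

With a boto3 SageMaker client `sm`, the three payloads would be passed in order as `sm.create_model(**create_model)`, `sm.create_endpoint_config(**create_endpoint_config)`, and `sm.create_endpoint(**create_endpoint)`. Keeping the payload construction separate from the API calls makes it easy to inspect what will be deployed before anything is created.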

Once the Amazon SageMaker Endpoint is deployed, a state machine task triggers an AWS CodeBuild project, which deploys our Amazon API Gateway with the AWS Serverless Application Model.
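For readers wondering what sits behind the deployed API, the following is a hedged sketch of a Lambda function that an API Gateway proxy integration could call to reach the endpoint. It is an assumption-laden example, not the repository's handler: the environment variable `SAGEMAKER_ENDPOINT_NAME`, the CSV request body, and the response shape are all hypothetical. The only AWS call is `invoke_endpoint` on the `sagemaker-runtime` client.

```python
import json
import os


def predict_handler(event, context=None, runtime_client=None):
    """Forward a CSV row from an API Gateway proxy event to a SageMaker endpoint.

    Assumes the request body is a single CSV row of features and that the
    endpoint name is supplied via a (hypothetical) environment variable.
    """
    if runtime_client is None:
        import boto3  # imported lazily; only needed when running against AWS

        runtime_client = boto3.client("sagemaker-runtime")

    endpoint_name = os.environ["SAGEMAKER_ENDPOINT_NAME"]  # hypothetical name

    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="text/csv",
        Body=event["body"],
    )

    # The response body is a stream; read and decode it into plain text.
    prediction = response["Body"].read().decode("utf-8").strip()

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": prediction}),
    }
```

A client would then POST a row such as `5.1,3.5,1.4,0.2` to the API and receive a small JSON document back. Input validation and error handling are left out to keep the sketch short.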

See the workflow image below:

CD4AutoML: Continuous Delivery for AutoML with Amazon SageMaker Autopilot and Amazon Step Functions

Future Work

I plan to abstract away all deployment details and convert this into a Python module or, better put, AutoML-as-a-Service. Users can provide either a Pandas DataFrame or local CSV/JSON data, and the service takes care of the rest. Users will get a secure REST API that they can use to make predictions in their applications.

If you’re interested in working on this together, feel free to reach out. Also feel free to extend this project as it suits you. If you run into any challenges getting started, create an issue and I will have a look as soon as I can.