<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Olalekan Fuad Elesin</title>
    <description>The latest articles on DEV Community by Olalekan Fuad Elesin (@oelesin).</description>
    <link>https://dev.to/oelesin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F352098%2F96b4200d-9625-4be9-b2dd-5b990890d1c0.jpeg</url>
      <title>DEV Community: Olalekan Fuad Elesin</title>
      <link>https://dev.to/oelesin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oelesin"/>
    <language>en</language>
    <item>
      <title>AutoML Just Got a Lot More Automated with CD4AutoML and the AWS CloudFormation Registry and CLI —…</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Sun, 21 Jun 2020 10:20:38 +0000</pubDate>
      <link>https://dev.to/oelesin/automl-just-got-a-lot-more-automated-with-cd4automl-and-aws-cloudformation-registry-and-cli-10h3</link>
      <guid>https://dev.to/oelesin/automl-just-got-a-lot-more-automated-with-cd4automl-and-aws-cloudformation-registry-and-cli-10h3</guid>
      <description>&lt;h3&gt;
  
  
  AutoML Just Got a Lot More Automated with CD4AutoML and the AWS CloudFormation Registry and CLI — Developer Preview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--noHWMzyE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ApUPlJAu2jeSVze7zzbwkkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--noHWMzyE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ApUPlJAu2jeSVze7zzbwkkw.png" alt="CD4AutoML CloudFormation Registry and CLI — Developer Preview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last month I published a tutorial on &lt;a href="https://dev.to/oelesin/automate-the-end-to-end-automl-lifecycle-with-amazon-sagemaker-autopilot-and-amazon-step-functions-42di-temp-slug-901228"&gt;automating end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and Amazon Step Functions&lt;/a&gt; including &lt;a href="https://github.com/OElesin/sagemaker-autopilot-step-functions"&gt;code sample on GitHub&lt;/a&gt; and &lt;a href="https://miro.medium.com/max/1400/1*1RBNjcCy0_D7xApSL6fx_A.jpeg"&gt;the architecture&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  CD4AutoML with AWS CloudFormation
&lt;/h4&gt;

&lt;p&gt;Today, I would like to introduce you to my newest project, &lt;a href="https://github.com/OElesin/cd4automl-cloudformtion-resource"&gt;CD4AutoML via AWS CloudFormation&lt;/a&gt;, for deploying managed, end-to-end automated AutoML workflows. The CD4AutoML CloudFormation third-party resource type is now available in public beta as a developer preview.&lt;/p&gt;

&lt;h4&gt;
  
  
  CD4AutoML, AWS CloudFormation, and IaC
&lt;/h4&gt;

&lt;p&gt;According to Gartner’s Magic Quadrant for Cloud AI Developer Services report:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By 2023, 40% of development teams will be using automated machine learning services to build models that add AI capabilities to their applications, up from 2% in 2019.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that development teams will look to leverage DevOps principles to automate the processes surrounding the development and deployment of automated machine learning models. Training automated machine learning models is only one piece of the machine learning landscape. &lt;a href="https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf"&gt;Deploying, monitoring, and maintaining these models&lt;/a&gt; make up much larger parts of the landscape, which can turn into overhead for development teams. The question then is: how might we enable development teams to use automated machine learning and remove the burden of maintaining machine learning systems, whilst embracing their existing DevOps culture? Enter CD4AutoML via AWS CloudFormation.&lt;/p&gt;

&lt;p&gt;CD4AutoML allows you to train, deploy, and manage end-to-end automated machine learning workflows on managed cloud infrastructure.&lt;/p&gt;

&lt;p&gt;AWS CloudFormation allows you to define your desired AWS resources, along with their configurations and connections, in blueprint files called AWS CloudFormation templates. You then run these templates in the AWS CloudFormation console to provision the defined infrastructure.&lt;/p&gt;
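&lt;p&gt;For illustration only (this is not part of CD4AutoML, and the bucket name is a placeholder), a minimal CloudFormation template that provisions a single Amazon S3 bucket could look like this:&lt;/p&gt;

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  TrainingDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-training-data-bucket
```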

&lt;p&gt;Using CD4AutoML’s support for the AWS CloudFormation Registry and CLI, you can provision an entire end-to-end AutoML system with no more than six lines of configuration parameters. This comes with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fully managed AutoML training infrastructure with retraining capabilities.&lt;/li&gt;
&lt;li&gt;Fully managed model serving infrastructure over REST API.&lt;/li&gt;
&lt;li&gt;Model monitoring for model drift detection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This means that we can now benefit from DevOps practices, deploying AutoML workflows as Infrastructure as Code (IaC) on the AWS stack.&lt;/p&gt;

&lt;h4&gt;
  
  
  Integrate CD4AutoML with your AWS Account
&lt;/h4&gt;

&lt;p&gt;As previously mentioned, deploying a CD4AutoML workflow in your AWS account takes just six configuration parameters in your AWS CloudFormation template. You also need your training data in Amazon S3 in CSV format, and you must grant the CD4AutoML AWS account ID read access to it. Once these are set, you are only an aws cloudformation deploy command away from your managed AutoML system with CD4AutoML.&lt;/p&gt;

&lt;h4&gt;
  
  
  A Real Life Use Case
&lt;/h4&gt;

&lt;p&gt;Let’s walk through the process so you see how easy it really is.&lt;/p&gt;

&lt;h4&gt;
  
  
  Grant Access to Training Data
&lt;/h4&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2008-10-17",
  "Id": "Policy15237839393",
  "Statement": [
    {
      "Sid": "AllowCD4AutoML",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::&amp;lt;CD4AutoML-AWS-AccountID&amp;gt;:role/CD4AutoMLWorkflow"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-training-data-bucket/data-path/*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
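&lt;p&gt;If you prefer to generate the policy rather than hand-edit JSON, a small Python sketch could look like the following. The helper name and the example account ID are mine, not part of CD4AutoML:&lt;/p&gt;

```python
import json


def build_cd4automl_read_policy(account_id: str, bucket: str, prefix: str) -> str:
    """Render a bucket policy granting the CD4AutoML workflow role
    read access (s3:GetObject) to training data under the given prefix."""
    policy = {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "AllowCD4AutoML",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/CD4AutoMLWorkflow"
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Example with a dummy account ID
print(build_cd4automl_read_policy("123456789012", "my-training-data-bucket", "data-path"))
```

&lt;p&gt;You can then apply the rendered policy with aws s3api put-bucket-policy.&lt;/p&gt;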



&lt;h4&gt;
  
  
  Setting up your workflow
&lt;/h4&gt;

&lt;p&gt;Now that you have granted read access to the Amazon S3 bucket containing your training data, you can deploy your CD4AutoML workflow as an AWS CloudFormation resource. Your AWS CloudFormation template would look similar to the template below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Resources** :
**MyDirectMarketingPredictionWorkflow** :
**Type** : CD4AutoML::Workflow::Deploy
**Properties** :
**S3TrainingDataPath** : 's3://my-training-data-bucket/examples/direct-marketing/bank-additional-full.csv'
**TargetColumnName** : 'y'
**NotificationEmail** : 'test@example.com'
**WorkflowName** : 'workflow-v1'
**Schedule** : 14


**Outputs** :
**MyDirectMarketingPredictionWorkflowApi** :
**Value** : !Ref MyDirectMarketingPredictionWorkflow
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The resource parameters for the CD4AutoML::Workflow::Deploy resource can be found &lt;a href="https://github.com/OElesin/cd4automl-cloudformtion-resource"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Create Stack
&lt;/h4&gt;

&lt;p&gt;Finally, once we have our template defined, we can upload it in the AWS CloudFormation console, hit ‘Create Stack’, and watch all our required resources come to life. Alternatively, you can run the deployment from the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation deploy --stack-name &amp;lt;Your\_stack\_name&amp;gt; --template-file &amp;lt;path-to-template&amp;gt;.yaml 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y3qPxY2a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AjJgv8zdOesz8oAR5xpJxoA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y3qPxY2a--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AjJgv8zdOesz8oAR5xpJxoA.png" alt="CD4AutoML deployed CloudFormation resource in customer account"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources deployed and managed in CD4AutoML AWS Account&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zJS5J8Sl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2Am1O26YHqnLjSK9LckRvQ5Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zJS5J8Sl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2Am1O26YHqnLjSK9LckRvQ5Q.png" alt="Some deployed resources in CD4AutoML AWS Account"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  The Future with CD4AutoML and the CloudFormation Registry and CLI
&lt;/h4&gt;

&lt;p&gt;Using AWS CloudFormation templates with CD4AutoML to deploy end-to-end automated machine learning workflows helps you save time, reduce errors, cut back on repetitive work, and remove the complexity of managing ML systems. As AutoML evolves, adopting IaC will become integral to enabling development teams to continually deploy AutoML models to production with minimal management overhead.&lt;/p&gt;

&lt;p&gt;CD4AutoML is in active development, and I will continue to build features that focus on business outcomes and developer productivity with automated machine learning.&lt;/p&gt;

&lt;p&gt;To request access for CD4AutoML Developer Preview, please visit the &lt;a href="https://github.com/OElesin/cd4automl-cloudformtion-resource/issues"&gt;CD4AutoML Cloudformation resource GitHub project&lt;/a&gt; and create an issue.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>cloudcomputing</category>
      <category>aws</category>
      <category>cd4automl</category>
    </item>
    <item>
      <title>Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and Amazon Step Functions…</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Sun, 24 May 2020 13:34:48 +0000</pubDate>
      <link>https://dev.to/oelesin/automate-the-end-to-end-automl-lifecycle-with-amazon-sagemaker-autopilot-and-amazon-step-functions-g7o</link>
      <guid>https://dev.to/oelesin/automate-the-end-to-end-automl-lifecycle-with-amazon-sagemaker-autopilot-and-amazon-step-functions-g7o</guid>
      <description>&lt;h3&gt;
  
  
  Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and Amazon Step Functions — CD4AutoML
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-07rdeG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMHfdrKCO_gz9r8KYzXbXcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y-07rdeG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMHfdrKCO_gz9r8KYzXbXcg.png" alt=""&gt;&lt;/a&gt;CD4AutoML with Amazon SageMaker Autopilot — Olalekan Elesin&lt;/p&gt;

&lt;p&gt;In my previous posts (linked below), I wrote about automating machine learning workflows on Amazon Web Services (AWS) with Amazon SageMaker and Amazon Step Functions. In those posts, I only provided GitHub Gists and minor code snippets, but no fully working solutions. This left many readers asking questions about the technical solutions, either privately or in the comments. I had solved one problem but created more. This led me to ask myself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How might I help my readers better achieve the job they wanted done anytime they employed my technical blog posts&lt;/strong&gt;?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The answer is what you are reading now. In this post, I provide a working project to &lt;a href="https://github.com/OElesin/sagemaker-autopilot-step-functions"&gt;automate an end-to-end AutoML lifecycle with Amazon SageMaker Autopilot and Amazon Step Functions&lt;/a&gt;, which I now call &lt;strong&gt;CD4AutoML&lt;/strong&gt;. By end-to-end, I am not referring to calling the Amazon SageMaker Endpoint from a notebook. I am talking about having a publicly available serverless REST API (Amazon API Gateway) connected to your Amazon SageMaker Endpoint in a fully automated way. This means that you can serve predictions in your applications with a fully automated workflow requiring no developer input beyond committing code to GitHub.&lt;/p&gt;

&lt;p&gt;Enough talk; you are already asking: how do I get started? Everything you need is available in the GitHub repository below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/OElesin/sagemaker-autopilot-step-functions"&gt;OElesin/sagemaker-autopilot-step-functions&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PEd-vYny--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A1RBNjcCy0_D7xApSL6fx_A.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PEd-vYny--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A1RBNjcCy0_D7xApSL6fx_A.jpeg" alt=""&gt;&lt;/a&gt;Automate the end-to-end AutoML lifecycle with Amazon SageMaker Autopilot on Amazon Step Functions — CD4AutoML&lt;/p&gt;

&lt;p&gt;This project is designed to get you up and running with &lt;a href="https://github.com/OElesin/sagemaker-autopilot-step-functions"&gt;CD4AutoML&lt;/a&gt; (&lt;strong&gt;a term I coined&lt;/strong&gt;), much like &lt;a href="https://martinfowler.com/articles/cd4ml.html"&gt;CD4ML&lt;/a&gt; from &lt;a href="https://martinfowler.com/articles/cd4ml.html"&gt;Martin Fowler’s blog post&lt;/a&gt;. This project indeed completes the “Automated” part of AutoML.&lt;/p&gt;

&lt;h4&gt;
  
  
  Technologies:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;AWS CloudFormation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/step-functions/"&gt;Amazon Step Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/sagemaker/autopilot/"&gt;Amazon SageMaker Autopilot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS CodeBuild&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws-step-functions-data-science-sdk.readthedocs.io/"&gt;AWS Step Functions Data Science SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;AWS Serverless Application Model&lt;/li&gt;
&lt;li&gt;AWS Lambda&lt;/li&gt;
&lt;li&gt;Amazon API Gateway&lt;/li&gt;
&lt;li&gt;Amazon SSM Parameter Store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this project, you move Amazon SageMaker Autopilot out of play/lab mode and into running real-life applications.&lt;/p&gt;

&lt;h4&gt;
  
  
  State machine Workflow
&lt;/h4&gt;

&lt;p&gt;The entire workflow is managed with the AWS Step Functions Data Science SDK. Amazon Step Functions does not have out-of-the-box service integration with Amazon SageMaker Autopilot, so I leveraged the AWS Lambda integration with Step Functions to periodically poll for the Amazon SageMaker Autopilot job status.&lt;/p&gt;
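&lt;p&gt;The polling Lambda can be sketched as below. The handler and field names here are my assumptions, not the project’s actual code; the SageMaker client is passed in so the function stays testable without AWS access. The client call itself, describe_auto_ml_job, is part of the boto3 SageMaker API:&lt;/p&gt;

```python
def check_autopilot_status(event: dict, sagemaker_client) -> dict:
    """Poll the status of an Amazon SageMaker Autopilot job so the
    Step Functions state machine can branch on it."""
    response = sagemaker_client.describe_auto_ml_job(
        AutoMLJobName=event["AutoMLJobName"]
    )
    # Status is one of: Completed | InProgress | Failed | Stopped | Stopping
    return {
        "AutoMLJobName": event["AutoMLJobName"],
        "AutoMLJobStatus": response["AutoMLJobStatus"],
    }
```

&lt;p&gt;In the real Lambda, sagemaker_client would be boto3.client('sagemaker'), and a Choice state in the workflow loops back through a Wait state while the status is still InProgress.&lt;/p&gt;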

&lt;p&gt;Once the AutoML job is completed, a model is created using the Amazon SageMaker Autopilot Inference Containers, and an Amazon SageMaker Endpoint is deployed. But there is more…&lt;/p&gt;

&lt;p&gt;On completion of the Amazon SageMaker Endpoint deployment, an AWS CodeBuild project state machine task is triggered, which deploys our Amazon API Gateway with the AWS Serverless Application Model.&lt;/p&gt;
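&lt;p&gt;The CodeBuild step boils down to a buildspec that runs the SAM CLI. A minimal sketch follows; the stack name is a placeholder, and the project’s actual buildspec may differ:&lt;/p&gt;

```yaml
version: 0.2
phases:
  build:
    commands:
      - sam build
      - sam deploy --stack-name prediction-api --capabilities CAPABILITY_IAM --no-confirm-changeset
```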

&lt;p&gt;See workflow image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--r5bH9Ko_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/374/0%2AW-XSK7MB9R4ATNmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--r5bH9Ko_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/374/0%2AW-XSK7MB9R4ATNmi.png" alt="CD4AutoML: Continuous Delivery for AutoML with Amazon SageMaker Autopilot and Amazon Step Functions"&gt;&lt;/a&gt;CD4AutoML: Continuous Delivery for AutoML with Amazon SageMaker Autopilot and Amazon Step Functions&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Work
&lt;/h3&gt;

&lt;p&gt;I plan to abstract away all deployment details and convert this into a Python module, or, better put, AutoML-as-a-Service. Users can provide either a Pandas DataFrame or local CSV/JSON data, and the service takes care of the rest. Users will get a secure REST API with which they can make predictions in their applications.&lt;/p&gt;

&lt;p&gt;If you’re interested in working on this together, feel free to reach out. Also feel free to extend this project as it suits you. If you experience any challenges getting started, create an issue and I will have a look as soon as I can.&lt;/p&gt;

&lt;h3&gt;
  
  
  Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://medium.com/@elesin.olalekan/automating-machine-learning-workflows-with-aws-glue-sagemaker-and-aws-step-functions-data-science-b4ed59e4d7f9"&gt;Automating Machine Learning Workflows with AWS Glue, Amazon SageMaker and AWS Step Functions Data Science SDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://dev.to/oelesin/automating-machine-learning-workflows-pt2-sagemaker-processing-sagemaker-and-aws-step-functions-3an5-temp-slug-2013947"&gt;Automating Machine Learning Workflows Pt2: Amazon SageMaker Processing and AWS Step Functions Data Science SDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/step-functions/"&gt;Amazon Step Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html"&gt;Amazon Step Functions Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws-step-functions-data-science-sdk.readthedocs.io/"&gt;AWS Step Functions Data Science SDK&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kindly share your thoughts and comments — looking forward to your feedback. You can reach me via &lt;a href="mailto:elesin.olalekan@gmail.com"&gt;email&lt;/a&gt;, follow me on &lt;a href="https://twitter.com/elesinOlalekan"&gt;Twitter&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/elesinolalekan/"&gt;LinkedIn&lt;/a&gt;. Can’t wait to hear from you!!&lt;/p&gt;

</description>
      <category>automl</category>
      <category>aws</category>
      <category>awsstepfunctions</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Continuously deploy to Expo with Amazon CodeBuild</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Sun, 17 May 2020 09:04:18 +0000</pubDate>
      <link>https://dev.to/oelesin/continuously-deploy-to-expo-with-amazon-codebuild-47o9</link>
      <guid>https://dev.to/oelesin/continuously-deploy-to-expo-with-amazon-codebuild-47o9</guid>
      <description>&lt;p&gt;&lt;a href="https://medium.com/swlh/continuously-deploy-to-expo-with-amazon-codebuild-84d19924224?source=rss-98b05140c1ad------2"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LQolfq6m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1496/1%2AeYkLfkO7R2RlaTZR2r2fYQ.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is going to be a really short post on what I learned in the past week, during which I had been working on a mobile…&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/swlh/continuously-deploy-to-expo-with-amazon-codebuild-84d19924224?source=rss-98b05140c1ad------2"&gt;Continue reading on The Startup »&lt;/a&gt;&lt;/p&gt;

</description>
      <category>continuousdeployment</category>
      <category>amazonwebservices</category>
      <category>reactnative</category>
      <category>expo</category>
    </item>
    <item>
      <title>Using artificial intelligence to differentiate between human and synthetic hair wigs with Amazon…</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Sat, 02 May 2020 07:19:10 +0000</pubDate>
      <link>https://dev.to/oelesin/using-artificial-intelligence-to-differentiate-between-human-and-synthetic-hair-wigs-with-amazon-54cm</link>
      <guid>https://dev.to/oelesin/using-artificial-intelligence-to-differentiate-between-human-and-synthetic-hair-wigs-with-amazon-54cm</guid>
      <description>&lt;h3&gt;
  
  
  Using artificial intelligence to differentiate between human and synthetic hair wigs with Amazon SageMaker
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--62Qv-aVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A3q79lx7S4E6yRetKAx-BDw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--62Qv-aVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2A3q79lx7S4E6yRetKAx-BDw.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hair extensions industry is a multi-billion-dollar global industry. &lt;a href="https://www.scmp.com/business/china-business/article/3011657/chinas-wig-capital-has-designs-africa-us-tariffs-loom"&gt;According to China Customs&lt;/a&gt;, China’s wig exports exceeded $2 billion, with the US being the largest destination. However, wig manufacturers in China are shifting their focus from the US to Africa.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--irJdkOU6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/972/1%2ArFy9LzT4WCS1HKhZTePw_Q.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--irJdkOU6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/972/1%2ArFy9LzT4WCS1HKhZTePw_Q.jpeg" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer Problem Discovery
&lt;/h3&gt;

&lt;p&gt;After speaking with a handful of ladies, one of the major challenges experienced by shoppers is &lt;strong&gt;Trust&lt;/strong&gt;, which is rooted in vendors selling synthetic (low-quality) hair as human hair. Speaking with vendors as well, it turns out that differentiating between synthetic and human hair wigs can be challenging; one often only knows once the wig is bought and used. This looks like a classic task for &lt;a href="https://www.tensorflow.org/tutorials/images/classification"&gt;Image Classification&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;In this tutorial, we will train an image classification model using the &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html"&gt;Amazon SageMaker built-in image classification algorithm&lt;/a&gt;. Images are organized in folders named for their corresponding classes, &lt;strong&gt;human-hair&lt;/strong&gt; and &lt;strong&gt;synthetic-hair&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;images\_to\_classify

├── human-hair
│ ├── 1.jpg
│ ├── 2.jpg
| ├── 3.jpg
│ └── . . .
└── synthetic-hair
│ ├── 1.jpg
│ ├── 2.jpg
│ ├── 3.jpg
│ ├── . . .
└── . . .
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
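&lt;p&gt;With that layout, class labels can be derived directly from the folder names. A small sketch of one way to do it (the helper is mine, not from the notebook; the built-in algorithm’s actual ingest goes through .lst/RecordIO files, which can be generated from pairs like these):&lt;/p&gt;

```python
import os


def label_images(root: str) -> list:
    """Pair each relative image path with a class index derived from the
    alphabetically sorted folder names (human-hair -> 0, synthetic-hair -> 1)."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    labelled = []
    for index, class_name in enumerate(classes):
        for file_name in sorted(os.listdir(os.path.join(root, class_name))):
            labelled.append((os.path.join(class_name, file_name), index))
    return labelled
```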



&lt;p&gt;The &lt;a href="https://gram-to-store-images.s3-eu-west-1.amazonaws.com/dataset/wigs/"&gt;sample dataset&lt;/a&gt; was scraped from &lt;a href="https://www.glamorousremihair.com/"&gt;GlamorousHair.com&lt;/a&gt; and is available on &lt;a href="https://aws.amazon.com/s3/"&gt;Amazon S3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cwMyZs55--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AgonYgs08DAwcUEq7YYpXmQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cwMyZs55--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AgonYgs08DAwcUEq7YYpXmQ.png" alt=""&gt;&lt;/a&gt;Synthetic and Human Hair wig images. Courtesy: GlamorousHair.com&lt;/p&gt;

&lt;h4&gt;
  
  
  Hypothesis
&lt;/h4&gt;

&lt;p&gt;If we are able to differentiate between &lt;strong&gt;synthetic&lt;/strong&gt; and &lt;strong&gt;human hair&lt;/strong&gt; wigs from images right before customers complete their purchase, then we can help customers build trust with hair vendors. We will know that we succeeded if we achieve &lt;strong&gt;65%&lt;/strong&gt; validation accuracy with our machine learning model.&lt;/p&gt;
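&lt;p&gt;The success criterion is easy to encode as a check; a trivial sketch (the threshold comes from the text above, and the helper is mine):&lt;/p&gt;

```python
ACCEPTANCE_THRESHOLD = 0.65  # minimum validation accuracy from the hypothesis


def hypothesis_validated(validation_accuracy: float) -> bool:
    """True when the model meets the 65% validation-accuracy bar."""
    return validation_accuracy >= ACCEPTANCE_THRESHOLD


print(hypothesis_validated(0.9375))  # → True
```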

&lt;p&gt;Jupyter Notebook available below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/OElesin/aws-samples/blob/master/sagemaker/HumanHairImageClassification.ipynb"&gt;OElesin/aws-samples&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/FZX4F5ty5sw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h4&gt;
  
  
  Hypothesis Validation
&lt;/h4&gt;

&lt;p&gt;Based on the acceptance criterion defined in the hypothesis above, we can proceed to deploy our model into production. The deployment is completed as an Amazon SageMaker Endpoint.&lt;/p&gt;

&lt;p&gt;The first run, with 10 epochs, produced the following accuracy metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Train accuracy: &lt;strong&gt;1.000000&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Validation accuracy: &lt;strong&gt;0.937500&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also trained on Spot Instances, saving &lt;strong&gt;69.8%&lt;/strong&gt; of the cost.&lt;/p&gt;
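&lt;p&gt;For reference, SageMaker reports Managed Spot Training savings from the job’s TrainingTimeInSeconds and BillableTimeInSeconds. The numbers below are illustrative, not from the actual job:&lt;/p&gt;

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Managed Spot Training savings: (1 - Billable / Training) * 100."""
    return round((1 - billable_seconds / training_seconds) * 100, 1)


print(spot_savings_percent(1000, 302))  # → 69.8
```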

&lt;p&gt;For further model improvements, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html"&gt;Amazon SageMaker Automatic Model Tuning&lt;/a&gt; will be used. For real-life use, the model would be constantly retrained and evaluated with a human in the loop using &lt;a href="https://aws.amazon.com/augmented-ai/"&gt;Amazon Augmented AI (Amazon A2I)&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This is my first recipe with &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio.html"&gt;Amazon SageMaker Studio&lt;/a&gt;. A next step might be to train an &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/semantic-segmentation.html"&gt;image segmentation model&lt;/a&gt; to identify hair quality at pixel level.&lt;/p&gt;

&lt;p&gt;I hope to share more with you. If you enjoyed reading this, kindly share and comment. You can reach me via &lt;a href="mailto:elesin.olalekan@gmail.com"&gt;email&lt;/a&gt;, follow me on &lt;a href="https://twitter.com/elesinOlalekan"&gt;Twitter&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/elesinolalekan/"&gt;LinkedIn&lt;/a&gt;. Can’t wait to hear from you!!&lt;/p&gt;

</description>
      <category>humanhairwigs</category>
      <category>artificialintelligen</category>
      <category>imageclassification</category>
      <category>sagemaker</category>
    </item>
    <item>
      <title>Dear Product Manager, how NOT to fail your next Data Science Initiative</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Thu, 02 Apr 2020 07:08:12 +0000</pubDate>
      <link>https://dev.to/oelesin/dear-product-manager-how-not-to-fail-your-next-data-science-initiative-5d5p</link>
      <guid>https://dev.to/oelesin/dear-product-manager-how-not-to-fail-your-next-data-science-initiative-5d5p</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hLmLhSqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMJAV4igxK8oUVKDzl7piFA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hLmLhSqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMJAV4igxK8oUVKDzl7piFA.png" alt="How product managers should think of data science/AI initiatives — Olalekan Elesin"&gt;&lt;/a&gt;How product managers should think of data science/AI initiatives — Olalekan Elesin&lt;/p&gt;

&lt;p&gt;Over the course of my career, from Software Engineer through Data Science Engineer, Data Landscape Engineer, and Technical Product Manager to now Product Manager, Data Science, I have built both failed and very successful data science projects. When I say successful, I mean huge business impact as measured by the bottom line. See a few of my portfolio projects below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@elesin.olalekan/saving-millions-in-naira-with-simple-decision-tree-classifier-1c52ed44439c"&gt;Saving Millions in Naira with Simple Decision Tree Classifier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@elesin.olalekan/growth-hacking-with-data-science-600-increase-in-qualified-leads-with-zero-ad-budget-afe5a9b1f7c5"&gt;Growth Hacking with Data Science — 600% Increase in Qualified Leads with Zero Ad Budget&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/guide-your-next-property-investment-in-africa-with-data-science-5a9fd623bb52"&gt;Guide Your Next Property Investment in Africa with Data Science&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on these, I can share with you what works and what does not, although I am yet to write about the one(s) that failed. As a product manager, you are often informed that your company is adopting, or rather developing, an AI strategy or something close to it. However, you wonder to yourself how this translates into improving your users’ lives or into your next product feature. Please be aware that you are not alone; many a product manager struggles with this as well. The problem is neither you nor your company’s data scientists. The challenge is how to translate your product understanding into terms data scientists understand, and vice versa.&lt;/p&gt;

&lt;p&gt;In this short post, I share with you a framework that has worked for me over the years. What I share here is not based on heuristics alone; I attended training courses such as &lt;a href="https://www.cloudera.com/about/training/courses/data-scientist-training.html"&gt;Cloudera Data Science&lt;/a&gt; and recently earned a certification from the Wharton Online AI for Business course:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youracclaim.com/badges/18a51bd8-f2cc-45e7-9984-2f69ba4a6332"&gt;AI for Business was issued by Wharton Online to Olalekan Fuad Elesin.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, we define a framework for Data Science and AI projects. The goal of the framework is to create a repeatable execution model for DS/AI projects through shorter iterative cycles and a strong focus on business/product outcomes. Following this framework, you will be guided through solving business/user problems with the scientific method, building new knowledge and data along the way. It is very important to note the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;A Data Science problem is NOT a Machine Learning problem.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Science projects DO NOT necessarily need to begin with a machine learning model.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NOT all Data Science projects will result in a machine learning model.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Solving business problems with &lt;a href="https://en.wikipedia.org/wiki/Data_science"&gt;Data Science&lt;/a&gt; or &lt;a href="https://en.wikipedia.org/wiki/Artificial_intelligence"&gt;Artificial Intelligence&lt;/a&gt; originally grew out of &lt;a href="https://en.wikipedia.org/wiki/Statistical_model"&gt;Statistical&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Mathematical_model"&gt;Mathematical&lt;/a&gt; modeling techniques, which are the basis of the scientific method. In turn, these modeling techniques gained ground because traditional software programs could not efficiently accommodate variation in data using standard control flows, i.e. &lt;a href="https://en.wikipedia.org/wiki/Conditional_(computer_programming)"&gt;“if…else” statements&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Project Framework
&lt;/h4&gt;

&lt;p&gt;The framework has its foundation in &lt;a href="https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining"&gt;CRISP-DM&lt;/a&gt; (Cross-Industry Standard Process for Data Mining). The starting point of applying the framework to a data science/AI project is that three things are defined: a clear &lt;strong&gt;business problem&lt;/strong&gt;, an &lt;strong&gt;assumption&lt;/strong&gt; about the outcome of solving that problem, and &lt;strong&gt;measurable signals&lt;/strong&gt; of that outcome. Data is certainly required, as with any data project, but data is not the starting point: if it were, we would risk confining our scope to &lt;strong&gt;ONLY&lt;/strong&gt; the data we believe we have available. Technologies and machine learning techniques are the least important things to know upfront; figuring those out is the domain of our &lt;a href="https://en.wikipedia.org/wiki/Data_science"&gt;data scientists&lt;/a&gt; and machine learning experts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hLmLhSqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMJAV4igxK8oUVKDzl7piFA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hLmLhSqY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AMJAV4igxK8oUVKDzl7piFA.png" alt=""&gt;&lt;/a&gt;Dear product manager, data science should be part of your toolbox — Olalekan Elesin&lt;/p&gt;

&lt;h4&gt;
  
  
  Problem Formulation with Framework
&lt;/h4&gt;

&lt;p&gt;Having defined the individual components of the framework in the previous section, we now define the smallest testable hypothesis: one that can be implemented within a short iteration and provides learnings in the direction of the ideal world, i.e. the defined outcome. See the format below:&lt;/p&gt;

&lt;h4&gt;
  
  
  Format
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;We see/have&lt;/strong&gt; [an observation or business problem].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We believe that&lt;/strong&gt; [this change or capability] &lt;strong&gt;will result in&lt;/strong&gt; [this outcome].&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We will know&lt;/strong&gt; [we have succeeded when we see these measurable signals].&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can read more on &lt;a href="https://www.thoughtworks.com/insights/articles/how-implement-hypothesis-driven-development"&gt;Hypothesis-Driven Development in this post from ThoughtWorks&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why?
&lt;/h4&gt;

&lt;p&gt;The goal is to aim for an assumption that can be implemented and tested within a &lt;strong&gt;4-week work window&lt;/strong&gt; or less. This is mainly due to the scientific nature (&lt;em&gt;lots of uncertainties&lt;/em&gt;) of data science initiatives. Shorter iterations let us learn faster and demonstrate value quickly (&lt;em&gt;or park the idea for later if the data collected informs us otherwise&lt;/em&gt;). This way, you see measurable progress rapidly, instead of waiting 8 to 18 months to find out the idea was not worth it. Think lean.&lt;/p&gt;

&lt;p&gt;At the completion of every work window, you should evaluate your desired outcome with defined measurable signals as checks that you are moving in the right direction.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Takeaways
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Focus on the business/user problem, not the product feature.&lt;/li&gt;
&lt;li&gt;If you don’t have an AI Strategy, ask why. If you need some guidance, let’s talk.&lt;/li&gt;
&lt;li&gt;Data Scientists are people, talk to them.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.amazon.com/Goal-Process-Ongoing-Improvement/dp/0884271951"&gt;Agile and Lean&lt;/a&gt; practices are not from Software Engineering originally. They can be adopted to Data Science as well.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I hope you found this useful. Kindly share your thoughts and comments; I look forward to your feedback. You can reach me via &lt;a href="mailto:elesin.olalekan@gmail.com"&gt;email&lt;/a&gt;, follow me on &lt;a href="https://twitter.com/elesinOlalekan"&gt;Twitter&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/elesinolalekan/"&gt;LinkedIn&lt;/a&gt;. Can’t wait to hear from you!&lt;/p&gt;

</description>
      <category>productthinking</category>
      <category>artificialintelligen</category>
      <category>userresearch</category>
      <category>datascience</category>
    </item>
    <item>
      <title>How to train and serve deep learning models on low budget — $20 or less per month</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Thu, 27 Feb 2020 07:40:40 +0000</pubDate>
      <link>https://dev.to/oelesin/how-to-train-and-serve-deep-learning-models-on-low-budget-20-or-less-per-month-39ie</link>
      <guid>https://dev.to/oelesin/how-to-train-and-serve-deep-learning-models-on-low-budget-20-or-less-per-month-39ie</guid>
      <description>&lt;h3&gt;
  
  
  How to train and serve deep learning models on low budget — $20 or less per month
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SR5EcEPP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AcAyVJwjov_LS5A9DZNe9tg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SR5EcEPP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AcAyVJwjov_LS5A9DZNe9tg.png" alt=""&gt;&lt;/a&gt;How to train and serve deep learning models on low budget — $20 or less per month&lt;/p&gt;

&lt;p&gt;My friends, &lt;a href="https://medium.com/u/927b4b610b0c"&gt;Samuel James&lt;/a&gt; and &lt;a href="https://medium.com/u/471fe3ca1957"&gt;emmy adigun&lt;/a&gt;, and I recently submitted our project to &lt;a href="https://fbai1.devpost.com/"&gt;Facebook’s AI Hackathon hosted on DevPost&lt;/a&gt;. The main hackathon requirement, as stated on the hackathon page, was to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Build a creative and well-implemented solution using PyTorch that can unlock positive impact on businesses or people. The solution can be a machine learning model, an application, or focused on a creative project like art or music — all built with PyTorch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;None of us (or at least not I) had ever done anything with PyTorch. Besides, we each had our daily jobs, and we were pretty much a cross-continent distributed team. After a short call, we decided to help people eat healthier greens by identifying diseased plants and fruits with artificial intelligence. With each of us bringing expertise in a different part of the software engineering craft, we allocated our responsibilities as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://medium.com/u/471fe3ca1957"&gt;emmy adigun&lt;/a&gt;, our user experience expert, to design the mobile app&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://medium.com/u/927b4b610b0c"&gt;Samuel James&lt;/a&gt;, the Chuck Norris of code, to develop and train the deep learning model in PyTorch, plus set up the &lt;a href="https://expo.io/"&gt;mobile app with Expo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;And I, to coordinate and help where I could.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With responsibilities allocated, Samuel and I worked on the deep learning part together, as this was new to both of us. We started out searching for an available &lt;a href="https://plantvillage.psu.edu/plants"&gt;dataset&lt;/a&gt; and following the &lt;a href="https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html"&gt;image classification tutorial&lt;/a&gt; on the &lt;a href="https://pytorch.org/get-started"&gt;PyTorch&lt;/a&gt; website. Our starting point was &lt;a href="https://colab.research.google.com/"&gt;Google Colab&lt;/a&gt;, to leverage its free GPU instances. This, however, did not work out as planned: model training took longer than expected to complete, meaning we had to keep our laptops on for a long time. At this point, we split the work to deliver faster: Sam focused on developing the mobile app while I focused on PyTorch.&lt;/p&gt;

&lt;h4&gt;
  
  
  Welcome, &lt;a href="https://aws.amazon.com/sagemaker"&gt;Amazon SageMaker&lt;/a&gt;!
&lt;/h4&gt;

&lt;p&gt;We wanted to run on as low a budget as we could; thankfully, we had a &lt;a href="https://aws.amazon.com/"&gt;team AWS account&lt;/a&gt; loaded with credits from AWS. I launched an Amazon SageMaker notebook instance with a GPU (p2.xlarge) and got to work. We trained the PyTorch model successfully in good time and exported the model as a &lt;a href="https://docs.python.org/3/library/pickle.html"&gt;pickled&lt;/a&gt; object, in GPU and CPU versions. Then came the hard part.&lt;/p&gt;

&lt;p&gt;We could easily have hosted the model on Amazon SageMaker, but this would have cost us &lt;a href="https://aws.amazon.com/sagemaker/pricing/"&gt;about $100 a month&lt;/a&gt;, a large portion of our AWS credits. Hence, we decided to get creative: train on &lt;a href="https://aws.amazon.com/sagemaker"&gt;Amazon SageMaker&lt;/a&gt;, deploy on &lt;a href="https://devcenter.heroku.com/articles/container-registry-and-runtime"&gt;Heroku Containers&lt;/a&gt;. Our idea is still an experiment, so it runs on the free &lt;a href="https://www.heroku.com/pricing"&gt;Heroku pricing tier&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Training time: ~45 mins&lt;/p&gt;

&lt;p&gt;Total Cost: ~$5.00&lt;/p&gt;

&lt;h4&gt;
  
  
  Enter, &lt;a href="https://devcenter.heroku.com/articles/container-registry-and-runtime"&gt;Heroku Container Registry &amp;amp; Runtime&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;For me, this was more or less the &lt;a href="https://en.wikipedia.org/wiki/Wonders_of_the_World"&gt;8th Wonder of the World&lt;/a&gt;: a free Docker runtime. I wrote a simple &lt;a href="https://flask.palletsprojects.com/en/1.1.x/tutorial/factory/"&gt;Python Flask API&lt;/a&gt; to serve the model, following the same pattern as hosting &lt;a href="https://github.com/OElesin/autogluon-tabular-sagemaker-container"&gt;custom models on Amazon SageMaker&lt;/a&gt;, and built the Docker container on my local machine. Then it was time to push the image to Heroku Container Registry. This posed another issue: the image was about 1.8 GB in size and my internet upload speed was not fast enough, with the push running for more than an hour. How to solve this? We got more creative.&lt;/p&gt;
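The serving API itself is small. Since we are not sharing the repository, here is a minimal sketch of such a Flask app, following the /ping and /invocations convention of SageMaker-style serving containers; the app-factory shape, the predict_fn callback and the label names are illustrative assumptions, not our actual code:

```python
def top_prediction(probs, labels):
    """Pair each label with its probability and return the most likely one."""
    best = max(zip(labels, probs), key=lambda pair: pair[1])
    return {"label": best[0], "confidence": best[1]}

def create_app(predict_fn, labels):
    """App factory: predict_fn maps raw request bytes to a list of class probabilities."""
    from flask import Flask, request, jsonify  # lazy import keeps the helper above dependency-free

    app = Flask(__name__)

    @app.route("/ping", methods=["GET"])
    def ping():
        # Health check, as expected by SageMaker-style serving containers
        return "", 200

    @app.route("/invocations", methods=["POST"])
    def invocations():
        probs = predict_fn(request.get_data())
        return jsonify(top_prediction(probs, labels))

    return app
```

A real predict_fn would load the pickled PyTorch model once at startup and run inference on the uploaded image bytes; the factory shape lets you swap in a dummy predictor for local smoke tests.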

&lt;p&gt;We set up an &lt;a href="https://aws.amazon.com/codebuild/"&gt;Amazon CodeBuild&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/codebuild/latest/userguide/sample-docker.html"&gt;pipeline&lt;/a&gt; linked to our project’s GitHub repository. The idea behind the build was to push our model-serving container image to Heroku. For this to happen, we needed to include Heroku credentials in the build. As fast as we wanted to move, we are also security-first. So, we saved our Heroku credentials in &lt;a href="https://aws.amazon.com/systems-manager/"&gt;AWS Systems Manager&lt;/a&gt; &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html"&gt;Parameter Store&lt;/a&gt; and accessed them at build time. With this set up, we were ready to go, and as with every software project, nothing worked the first time. However, we got it working. Our new build time: &lt;strong&gt;5 minutes&lt;/strong&gt;, compared to more than an &lt;strong&gt;hour&lt;/strong&gt; when we ran from my local machine. Now we only commit to GitHub, and AWS CodeBuild does the rest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup time&lt;/strong&gt; : Friday night, 2 hours 30 mins&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total cost&lt;/strong&gt; : &amp;lt;$2&lt;/p&gt;

&lt;h4&gt;
  
  
  Finally, Automation. WHAT!!!@?
&lt;/h4&gt;

&lt;p&gt;Our goal, should we be named winners, would be to scale the solution from both product and technical perspectives. On the technical front, until we make money, we need a way to stay agile with our software: infrastructure as code, continuous deployment, continuous delivery. What better place to do this than the AWS Cloud? I will not be sharing the code in our repository, but I will show you a glimpse of our architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SR5EcEPP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AcAyVJwjov_LS5A9DZNe9tg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SR5EcEPP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2AcAyVJwjov_LS5A9DZNe9tg.png" alt=""&gt;&lt;/a&gt;How to train and serve deep learning models on low budget — $20 or less per month&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total cost&lt;/strong&gt; : ~ &lt;strong&gt;$20.00&lt;/strong&gt; per  &lt;strong&gt;month&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapping Up
&lt;/h3&gt;

&lt;p&gt;For now, we are not able to share our code, but I hope you got the idea and it inspired you. We are not saying this is the best approach; however, the constraints placed on us while working on this project forced us to rethink our approach and come up with a creative solution that stayed within budget. If you have any questions, feel free to reach out to me, &lt;a href="https://medium.com/u/471fe3ca1957"&gt;emmy adigun&lt;/a&gt; or &lt;a href="https://medium.com/u/927b4b610b0c"&gt;Samuel James&lt;/a&gt;. You can reach me via &lt;a href="mailto:elesin.olalekan@gmail.com"&gt;email&lt;/a&gt;, follow me on &lt;a href="https://twitter.com/elesinOlalekan"&gt;Twitter&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/elesinolalekan/"&gt;LinkedIn&lt;/a&gt;. Can’t wait to hear from you!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>sagemaker</category>
      <category>heroku</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>Automating Machine Learning Workflows Pt2: SageMaker Processing, SageMaker and AWS Step Functions…</title>
      <dc:creator>Olalekan Fuad Elesin</dc:creator>
      <pubDate>Sat, 15 Feb 2020 13:17:07 +0000</pubDate>
      <link>https://dev.to/oelesin/automating-machine-learning-workflows-pt2-sagemaker-processing-sagemaker-and-aws-step-functions-5cn4</link>
      <guid>https://dev.to/oelesin/automating-machine-learning-workflows-pt2-sagemaker-processing-sagemaker-and-aws-step-functions-5cn4</guid>
      <description>&lt;h3&gt;
  
  
  Automating Machine Learning Workflows Pt2: Amazon SageMaker Processing and AWS Step Functions Data Science SDK
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BM16dRKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ASqEFm7t97p1IQ8S2JSKPdA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BM16dRKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ASqEFm7t97p1IQ8S2JSKPdA.png" alt=""&gt;&lt;/a&gt;Automating Machine Learning Workflows with Amazon SageMaker Processing, Amazon SageMaker and AWS Step Functions Data Science SDK&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@elesin.olalekan/automating-machine-learning-workflows-with-aws-glue-sagemaker-and-aws-step-functions-data-science-b4ed59e4d7f9"&gt;In the previous blogpost&lt;/a&gt;, I demonstrated how to automate machine learning workflows with AWS Step Functions from data preparation with PySpark on &lt;a href="https://aws.amazon.com/glue/"&gt;AWS Glue&lt;/a&gt; to Model (Endpoint) Deployment with &lt;a href="https://aws.amazon.com/sagemaker/"&gt;Amazon SageMaker&lt;/a&gt;. In this tutorial, I will repeat almost the same approach, however with a little adjustment in the data preparation phase.&lt;/p&gt;

&lt;p&gt;Not all data preparation in machine learning requires the distributed processing power of PySpark. So what happens if you don’t need PySpark? Would you need to run Glue jobs anyway? You could, but you would probably be wasting Glue compute resources. One more thing to consider is the use of external libraries with AWS Glue: you have to package external libraries as zip files and upload them to an external location. This process is not particularly user-friendly, especially when you want to focus only on building your machine learning pipeline.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html"&gt;Amazon SageMaker Processing&lt;/a&gt;, a fully managed data processing and model evaluation solution. You can read more about &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html"&gt;Amazon SageMaker Processing&lt;/a&gt; in &lt;a href="https://aws.amazon.com/blogs/aws/amazon-sagemaker-processing-fully-managed-data-processing-and-model-evaluation/"&gt;this blogpost from AWS&lt;/a&gt;. As of the time of this post, Amazon SageMaker Processing is not yet part of the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/concepts-service-integrations.html"&gt;AWS Step Functions Service Integrations&lt;/a&gt;; however, AWS Step Functions offers the flexibility to orchestrate AWS services with Lambda functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prepare the workflow
&lt;/h3&gt;

&lt;p&gt;Knowing that &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html"&gt;Amazon SageMaker Processing&lt;/a&gt; is not natively integrated with AWS Step Functions, we start by creating two AWS Lambda functions: one to create the Amazon SageMaker Processing job and one to check the Processing job’s status.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
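In case the embedded gists do not render, a minimal sketch of the two functions follows; the event field names mirror the workflow inputs used later in this post, while the entrypoint, instance defaults and channel names are illustrative assumptions:

```python
def build_processing_job_request(event):
    """Translate the workflow input into a CreateProcessingJob request,
    filling in sensible defaults so callers pass only the minimum arguments."""
    return {
        "ProcessingJobName": event["JobName"],
        "RoleArn": event["IAMRole"],
        "AppSpecification": {
            "ImageUri": event["EcrContainerUri"],
            "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/preprocessing.py"],
        },
        "ProcessingResources": {
            "ClusterConfig": {
                # Defaults keep the job small and cheap; override via the event if needed
                "InstanceCount": event.get("InstanceCount", 1),
                "InstanceType": event.get("InstanceType", "ml.m5.xlarge"),
                "VolumeSizeInGB": event.get("VolumeSizeInGB", 30),
            }
        },
        "ProcessingInputs": [
            {"InputName": "raw-data",
             "S3Input": {"S3Uri": event["S3InputDataPath"],
                         "LocalPath": "/opt/ml/processing/input",
                         "S3DataType": "S3Prefix", "S3InputMode": "File"}},
            {"InputName": "code",
             "S3Input": {"S3Uri": event["S3CodePath"],
                         "LocalPath": "/opt/ml/processing/input/code",
                         "S3DataType": "S3Prefix", "S3InputMode": "File"}},
        ],
        "ProcessingOutputConfig": {
            "Outputs": [{"OutputName": "processed-data",
                         "S3Output": {"S3Uri": event["S3OutputDataPath"],
                                      "LocalPath": "/opt/ml/processing/output",
                                      "S3UploadMode": "EndOfJob"}}]
        },
    }

def create_processing_job_handler(event, context):
    """Lambda 1: create the SageMaker Processing job."""
    import boto3  # imported lazily so the request builder stays testable offline
    boto3.client("sagemaker").create_processing_job(**build_processing_job_request(event))
    return {"ProcessingJobName": event["JobName"]}

def check_processing_job_handler(event, context):
    """Lambda 2: report the job's current status for the Choice state."""
    import boto3
    response = boto3.client("sagemaker").describe_processing_job(
        ProcessingJobName=event["ProcessingJobName"])
    return {"ProcessingJobName": event["ProcessingJobName"],
            "ProcessingJobStatus": response["ProcessingJobStatus"]}
```

Keeping the request builder pure makes the defaults easy to unit-test without touching AWS.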


&lt;p&gt;Once we have both functions in place, we are ready to define our workflow with the AWS Step Functions Data Science SDK. For more details about the &lt;a href="https://aws.amazon.com/about-aws/whats-new/2019/11/introducing-aws-step-functions-data-science-sdk-amazon-sagemaker/"&gt;AWS Step Functions Data Science SDK&lt;/a&gt;, you &lt;a href="https://medium.com/@elesin.olalekan/automating-machine-learning-workflows-with-aws-glue-sagemaker-and-aws-step-functions-data-science-b4ed59e4d7f9"&gt;can read my previous blogpost&lt;/a&gt; or visit the project page on &lt;a href="https://github.com/aws/aws-step-functions-data-science-sdk-python"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Amazon SageMaker Processing jobs run processing scripts inside pre-built Docker containers hosted on Amazon ECR. Below, we create a Docker container with the packages required for our processing job.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
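In case the embedded gist does not render, the ECR side of this step can be sketched with boto3 as below (the repository name and region are placeholders); building and pushing the image itself still happens with the docker CLI:

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the fully qualified image URI that SageMaker Processing expects."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

def ensure_repository(repository, region):
    """Create the ECR repository if it does not already exist."""
    import boto3  # lazy import keeps the URI helper usable offline
    ecr = boto3.client("ecr", region_name=region)
    try:
        ecr.create_repository(repositoryName=repository)
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # idempotent: an existing repository is fine

# After the repository exists, build and push with the docker CLI, for example:
#   aws ecr get-login-password | docker login --username AWS --password-stdin "$REGISTRY"
#   docker build -t sagemaker-processing-container .
#   docker tag sagemaker-processing-container "$IMAGE_URI"
#   docker push "$IMAGE_URI"
```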


&lt;p&gt;Note that you could also achieve the above operation using &lt;a href="https://aws.amazon.com/cloudformation/"&gt;AWS CloudFormation&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Workflow Definition
&lt;/h4&gt;

&lt;p&gt;In my previous post, I demonstrated workflow definition steps from data preparation to model deployment. Since I am only replacing the AWS Glue Step, I will not go into the details of the model training and endpoint deployment steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Preparation with Amazon SageMaker Processing&lt;/strong&gt;&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We created our create-processing-job Lambda function with &lt;a href="https://en.wikipedia.org/wiki/Convention_over_configuration"&gt;sensible defaults&lt;/a&gt;, ensuring that we provide only the minimum arguments as Amazon SageMaker Processing job configuration. Next, we poll for the job status with a Lambda function step every 60 seconds.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
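In case the embedded gists do not render, the shape of these steps with the AWS Step Functions Data Science SDK might look roughly like the sketch below; the state names, Lambda function names and the 60-second wait are illustrative:

```python
def processing_job_payload(execution_input):
    """Payload forwarded to the create-processing-job Lambda; keys mirror the workflow inputs."""
    keys = ["IAMRole", "EcrContainerUri", "S3InputDataPath",
            "S3OutputDataPath", "S3CodePath", "JobName"]
    return {key: execution_input[key] for key in keys}

def build_processing_steps(create_fn_name, status_fn_name, execution_input):
    """Wire up the three states that precede the Choice state:
    create the job, wait 60 seconds, then check its status."""
    from stepfunctions.steps import LambdaStep, Wait  # lazy: only needed when building the graph

    create_processing_job_step = LambdaStep(
        state_id="CreateProcessingJob",
        parameters={"FunctionName": create_fn_name,
                    "Payload": processing_job_payload(execution_input)},
    )
    check_job_wait_state = Wait(state_id="WaitBeforeStatusCheck", seconds=60)
    get_processing_job_status = LambdaStep(
        state_id="GetProcessingJobStatus",
        parameters={"FunctionName": status_fn_name,
                    "Payload": {"ProcessingJobName": execution_input["JobName"]}},
    )
    return create_processing_job_step, check_job_wait_state, get_processing_job_status
```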


&lt;p&gt;We then need to create the processing script, which will contain our data transformations and will be executed by Amazon SageMaker Processing on our docker container.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
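In case the embedded gist does not render, a minimal sketch of such a preprocessing script is shown below, assuming a headered CSV input and the standard SageMaker Processing container paths; the cleaning and split logic are illustrative stand-ins:

```python
import csv
import os
import random

# SageMaker Processing mounts inputs and outputs at these container-local paths
INPUT_PATH = "/opt/ml/processing/input"
OUTPUT_PATH = "/opt/ml/processing/output"

def clean_rows(rows):
    """Drop rows with any missing value, a stand-in for real transformations."""
    return [row for row in rows if all(value.strip() for value in row)]

def split_rows(rows, train_ratio=0.8, seed=42):
    """Shuffle deterministically and split rows into train and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def write_csv(path, header, rows):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def main(input_path=INPUT_PATH, output_path=OUTPUT_PATH):
    with open(os.path.join(input_path, "raw.csv"), newline="") as f:
        reader = csv.reader(f)
        header, rows = next(reader), list(reader)
    train, test = split_rows(clean_rows(rows))
    write_csv(os.path.join(output_path, "train", "train.csv"), header, train)
    write_csv(os.path.join(output_path, "test", "test.csv"), header, test)
```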


&lt;p&gt;The above script is an example of data preparation logic; depending on your use case and requirements, you might have more complex data transformation logic. The last step is to upload the processing script to an &lt;a href="https://aws.amazon.com/s3/"&gt;S3&lt;/a&gt; location.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ aws s3 cp scripts/preprocessing.py s3://my-code-bucket/processing/scripts/preprocessing.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Having completed the build-up to our data preparation steps, we proceed to defining our machine learning steps. I defined these steps in my previous post, so for brevity I will not delve into the details, but show how to chain the respective steps together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@elesin.olalekan/automating-machine-learning-workflows-with-aws-glue-sagemaker-and-aws-step-functions-data-science-b4ed59e4d7f9"&gt;Automating Machine Learning Workflows with AWS Glue, SageMaker and AWS Step Functions Data Science…&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ml\_steps\_definition = Chain([
    training\_step,
    model\_step,
    endpoint\_config\_step,
    endpoint\_step,
    workflow\_success
])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;To build the entire workflow graph, we make use of &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-choice-state.html"&gt;AWS Step Functions Choice states&lt;/a&gt;, which add branching logic to our workflow. The choice state checks the status of the data processing job and proceeds to the next step based on that status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;check\_job\_choice = Choice(
    state\_id= **"IsProcessingJobComplete"**  
)

check\_job\_choice.add\_choice(
    ChoiceRule.StringEquals(variable=get\_processing\_job\_status.output()[**'Payload'**][**'ProcessingJobStatus'**], value= **'InProgress'** ),
    next\_step=get\_processing\_job\_status
)

check\_job\_choice.add\_choice(
    ChoiceRule.StringEquals(variable=get\_processing\_job\_status.output()[**'Payload'**][**'ProcessingJobStatus'**], value= **'Stopping'** ),
    next\_step=get\_processing\_job\_status
)

check\_job\_choice.add\_choice(
    ChoiceRule.StringEquals(variable=get\_processing\_job\_status.output()[**'Payload'**][**'ProcessingJobStatus'**], value= **'Failed'** ),
    next\_step=workflow\_failure
)

check\_job\_choice.add\_choice(
    ChoiceRule.StringEquals(variable=get\_processing\_job\_status.output()[**'Payload'**][**'ProcessingJobStatus'**], value= **'Stopped'** ),
    next\_step=workflow\_failure
)

check\_job\_choice.add\_choice(
    ChoiceRule.StringEquals(variable=get\_processing\_job\_status.output()[**'Payload'**][**'ProcessingJobStatus'**], value= **'Completed'** ),
    next\_step=ml\_steps\_definition
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The choice state works as follows: if the processing job is still running, it loops back to check the job status; if the job failed or was stopped, the entire workflow is terminated; and if the job completed, it proceeds to the model training and endpoint deployment steps.&lt;/p&gt;

&lt;p&gt;Our complete machine learning workflow is chained together as the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ml\_workflow\_definition = Chain(
    [
        create\_processing\_job\_step, 
        get\_processing\_job\_status,
        check\_job\_wait\_state,
        check\_job\_choice
    ]
)

ml\_workflow = Workflow(
    name= **"MyCompleteMLWorkflow\_v2"** ,
    definition=ml\_workflow\_definition,
    role=workflow\_execution\_role
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BM16dRKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ASqEFm7t97p1IQ8S2JSKPdA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BM16dRKA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/1024/1%2ASqEFm7t97p1IQ8S2JSKPdA.png" alt=""&gt;&lt;/a&gt;Automating Machine Learning Workflows with Amazon SageMaker Processing, Amazon SageMaker and AWS Step Functions Data Science SDK&lt;/p&gt;

&lt;p&gt;Create and execute the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**try** :
    workflow\_arn = ml\_workflow.create()
**except** BaseException **as** e:
    print( **"Workflow already exists"** )
    workflow\_arn = ml\_workflow.update(ml\_workflow\_definition)

# execute workflow 
ml\_workflow.execute(
    inputs={
**'IAMRole'** : **sagemaker\_execution\_role** ,
**'EcrContainerUri'** : **'1234567890.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-processing-container'** ,
**'S3InputDataPath'** : **f's3://{data\_bucket}/raw-data/'** ,
**'S3OutputDataPath'** : **f's3://{data\_bucket}/{processed\_data\_output\_path}/'** ,
**'S3CodePath'** : **'s3://my-code-bucket/processing/scripts/preprocessing.py'** ,
**'JobName'** : job\_name,   
**'ModelName'** : model\_name_,_  
**'EndpointName'** : endpoint\_name
    }
)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JksmTiDU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/302/1%2AC9aBTDFk7cao19AYTRyO2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JksmTiDU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/302/1%2AC9aBTDFk7cao19AYTRyO2w.png" alt=""&gt;&lt;/a&gt;Successful machine learning workflow with Amazon SageMaker Processing, Amazon SageMaker and AWS Step Functions Data Science SDK&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In my previous post, I demonstrated how to create an end-to-end machine learning workflow on AWS using AWS Glue for data preparation. In this post, I swapped the data preparation component for Amazon SageMaker Processing. Your choice of AWS service for data processing depends on your use case. You can also use the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html"&gt;Step Functions service integration with Amazon EMR&lt;/a&gt; to run your data preparation jobs, which is also supported natively by the &lt;a href="https://github.com/aws/aws-step-functions-data-science-sdk-python/blob/6ca96bad71b02fcf961bf4b01bb88e2273579237/src/stepfunctions/steps/service.py#L152-L284"&gt;AWS Step Functions Data Science SDK&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://medium.com/@elesin.olalekan/automating-machine-learning-workflows-with-aws-glue-sagemaker-and-aws-step-functions-data-science-b4ed59e4d7f9"&gt;Automating Machine Learning Workflows with AWS Glue, Amazon SageMaker and AWS Step Functions Data Science SDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/step-functions/"&gt;AWS Step Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html"&gt;AWS Step Functions Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws-step-functions-data-science-sdk.readthedocs.io/"&gt;AWS Step Functions Data Science SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/glue/"&gt;AWS Glue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kindly share your thoughts and comments; I look forward to your feedback. You can reach me via &lt;a href="mailto:elesin.olalekan@gmail.com"&gt;email&lt;/a&gt;, follow me on &lt;a href="https://twitter.com/elesinOlalekan"&gt;Twitter&lt;/a&gt; or connect with me on &lt;a href="https://www.linkedin.com/in/elesinolalekan/"&gt;LinkedIn&lt;/a&gt;. Can’t wait to hear from you!&lt;/p&gt;

</description>
      <category>stepfunctions</category>
      <category>workflow</category>
      <category>sagemaker</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
