<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rustem Feyzkhanov</title>
    <description>The latest articles on DEV Community by Rustem Feyzkhanov (@ryfeus).</description>
    <link>https://dev.to/ryfeus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F411360%2Fb25947df-b324-4d34-80b9-9ae47231dec4.jpeg</url>
      <title>DEV Community: Rustem Feyzkhanov</title>
      <link>https://dev.to/ryfeus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ryfeus"/>
    <language>en</language>
    <item>
      <title>Using custom docker image with SageMaker + AWS Step Functions</title>
      <dc:creator>Rustem Feyzkhanov</dc:creator>
      <pubDate>Thu, 22 Oct 2020 05:22:20 +0000</pubDate>
      <link>https://dev.to/aws-heroes/using-custom-docker-image-with-sagemaker-aws-step-functions-4b0k</link>
      <guid>https://dev.to/aws-heroes/using-custom-docker-image-with-sagemaker-aws-step-functions-4b0k</guid>
      <description>&lt;p&gt;Amazon SageMaker is extremely popular for data science projects which need to be organized in the cloud. It provides a simple and transparent way to try multiple models on your data.&lt;/p&gt;

&lt;p&gt;The good news is that it has become even cheaper to train and deploy models using SageMaker (up to an 18% price reduction on all ml.p2.* and ml.p3.* instance types), which makes it even more suitable for integration with your existing production AWS infrastructure. For example, you may want to train your model automatically on demand, or retrain it when your data changes.&lt;/p&gt;

&lt;p&gt;There are multiple challenges associated with this task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, you need a way to organize preprocessing and postprocessing steps for training the model. The former could include running ETL on the latest data, and the latter could include updating or registering models in your system.&lt;/li&gt;
&lt;li&gt;Second, you need a way to handle long-running training tasks asynchronously, with the ability to retry or restart a task when a retriable error occurs; non-retriable errors should be caught and reported as a failed process.&lt;/li&gt;
&lt;li&gt;Finally, you need a scalable way to run multiple tasks in parallel, in case you need to retrain multiple models in your system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Step Functions tackles these challenges by orchestrating deep learning training workflows: it handles multi-step processes with custom logic such as retries and error handling, and it integrates with AWS compute services like Amazon SageMaker, AWS Batch, AWS Fargate, and AWS Lambda. It also offers useful extras, such as scheduling workflows or integrating with Amazon EventBridge to connect to other services, for example for notification purposes.&lt;/p&gt;
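The retry and failure handling described above maps onto the `Retry` and `Catch` fields of an Amazon States Language Task state. Here is a minimal sketch, expressed as a Python dict for easy inspection; the error name, backoff numbers, and the `NotifyFailureStep` state are illustrative assumptions, not part of the workflow deployed later in this post.

```python
# Sketch (assumptions noted above): a Task state that retries transient
# SageMaker failures with exponential backoff and routes everything else
# to a hypothetical failure-notification state.
training_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
    "Retry": [
        {
            # Retry transient failures with exponential backoff.
            "ErrorEquals": ["SageMaker.AmazonSageMakerException"],
            "IntervalSeconds": 30,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ],
    "Catch": [
        {
            # Any other error goes to a notification/failure state.
            "ErrorEquals": ["States.ALL"],
            "Next": "NotifyFailureStep",
        }
    ],
    "Next": "PostprocessingLambdaStep",
}

# With BackoffRate 2.0, the waits between attempts grow 30s, 60s, 120s.
waits = [30 * 2.0 ** i for i in range(3)]
print(waits)
```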

&lt;p&gt;In this post, I’ll cover a method to build a serverless workflow using Amazon SageMaker with a custom Docker image to train a model, AWS Lambda for preprocessing and postprocessing, and AWS Step Functions as the orchestrator of the workflow.&lt;/p&gt;

&lt;p&gt;We will cover the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using Amazon SageMaker to run the training task, creating a custom Docker image for training, and uploading it to Amazon ECR&lt;/li&gt;
&lt;li&gt;Using AWS Lambda with AWS Step Functions to pass the training configuration to Amazon SageMaker and to upload the model&lt;/li&gt;
&lt;li&gt;Using the Serverless Framework to deploy all necessary services and return a link to invoke the Step Function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI installed&lt;/li&gt;
&lt;li&gt;Docker installed&lt;/li&gt;
&lt;li&gt;The Serverless Framework with plugins installed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code decomposition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A container folder which contains the Dockerfile for building the image and the train script for model training&lt;/li&gt;
&lt;li&gt;An index.py file which contains the code for the AWS Lambda functions&lt;/li&gt;
&lt;li&gt;A serverless.yml file which contains the configuration for AWS Lambda, the execution graph for AWS Step Functions, and the configuration for Amazon SageMaker&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Amazon SageMaker for running the training task
&lt;/h2&gt;

&lt;p&gt;Amazon SageMaker provides a great interface for running a custom Docker image on a GPU instance. It handles starting and terminating the instance, placing and running the Docker image on it, instance customization, stopping conditions, metrics, training data, and the hyperparameters of the algorithm.&lt;/p&gt;

&lt;p&gt;In our example, we will build a container that trains a classification model on the Fashion-MNIST dataset. The training code will look like a classic training example, but with two main differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An import-hyperparameters step
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/input/config/hyperparameters.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;json_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;hyperparameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hyperparameters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;A step that saves the model to S3
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/model/pipelineSagemakerModel.h5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
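One detail worth noting about the hyperparameters snippet above: SageMaker serializes every value in hyperparameters.json as a string, so numeric hyperparameters such as num_of_epochs (which the preprocessing Lambda later in this post sends) need an explicit cast before use. A minimal sketch, with the inline JSON standing in for the real file:

```python
import json

# hyperparameters.json always stores values as strings; the inline JSON
# below stands in for the file SageMaker writes under /opt/ml/input/config/.
raw = json.loads('{"num_of_epochs": "4"}')
num_of_epochs = int(raw["num_of_epochs"])  # cast before passing to model.fit
print(num_of_epochs)
```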



&lt;p&gt;Here is what the Dockerfile will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; tensorflow/tensorflow:1.12.0-gpu-py3&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;boto3
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PATH="/opt/ml/code:${PATH}"&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /opt/ml/code/&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /opt/ml/code&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;777 /opt/ml/code/train
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is how we will build and push the image to Amazon ECR (you will need to replace &amp;lt;accountId&amp;gt; and &amp;lt;regionId&amp;gt; with your AWS account ID and region):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ryfeus/stepfunctions2processing.git
&lt;span class="nb"&gt;cd &lt;/span&gt;aws-sagemaker/container
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; aws-sagemaker-example &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;span class="si"&gt;$(&lt;/span&gt;aws ecr get-login &lt;span class="nt"&gt;--no-include-email&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1&lt;span class="si"&gt;)&lt;/span&gt;
aws ecr create-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; aws-sagemaker-example
docker tag aws-sagemaker-example:latest &amp;lt;accountId&amp;gt;.dkr.ecr.&amp;lt;regionId&amp;gt;.amazonaws.com/aws-sagemaker-example:latest
docker push &amp;lt;accountId&amp;gt;.dkr.ecr.&amp;lt;regionId&amp;gt;.amazonaws.com/aws-sagemaker-example:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using AWS Lambda with AWS Step Functions to pass training configuration to Amazon SageMaker and for uploading the model
&lt;/h2&gt;

&lt;p&gt;In our case, we will use the preprocessing Lambda to generate a custom configuration for the SageMaker training task. This ensures that the SageMaker job gets a unique name and lets us generate a custom set of hyperparameters. It could also be used to provide a specific Docker image name or tag, or a custom training dataset.&lt;/p&gt;

&lt;p&gt;In our case the execution graph will consist of the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A preprocessing step which generates the config for the SageMaker task&lt;/li&gt;
&lt;li&gt;A SageMaker step which runs the training job based on the config from the previous step&lt;/li&gt;
&lt;li&gt;A postprocessing step which can handle model publishing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is what the config for the Step Functions will look like. As you can see, we define each step separately and then specify the next step in the process. We can also define parts of the SageMaker training job definition in its state config; in this case, we set the instance type, the Docker image, and whether to use Spot instances.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stepFunctions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stateMachines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;SagemakerStepFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;startFunction&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${self:service}-StepFunction&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Fn::GetAtt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;StepFunctionsRole&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Arn&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;StartAt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PreprocessingLambdaStep&lt;/span&gt;
        &lt;span class="na"&gt;States&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;PreprocessingLambdaStep&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Task&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Fn::GetAtt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;preprocessingLambda&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Arn&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
            &lt;span class="na"&gt;Next&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TrainingSagemakerStep&lt;/span&gt;
          &lt;span class="na"&gt;TrainingSagemakerStep&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Task&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:states:::sagemaker:createTrainingJob.sync&lt;/span&gt;
            &lt;span class="na"&gt;Next&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostprocessingLambdaStep&lt;/span&gt;
            &lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;TrainingJobName.$&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$.name"&lt;/span&gt;
              &lt;span class="na"&gt;ResourceConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;InstanceCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
                &lt;span class="na"&gt;InstanceType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ml.p2.xlarge&lt;/span&gt;
                &lt;span class="na"&gt;VolumeSizeInGB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
              &lt;span class="na"&gt;StoppingCondition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;MaxRuntimeInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;86400&lt;/span&gt;
                &lt;span class="na"&gt;MaxWaitTimeInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;86400&lt;/span&gt;
              &lt;span class="na"&gt;HyperParameters.$&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$.hyperparameters"&lt;/span&gt;
              &lt;span class="na"&gt;AlgorithmSpecification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;TrainingImage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#{AWS::AccountId}.dkr.ecr.#{AWS::Region}.amazonaws.com/aws-sagemaker-example:latest'&lt;/span&gt;
                &lt;span class="na"&gt;TrainingInputMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;File&lt;/span&gt;
              &lt;span class="na"&gt;OutputDataConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;S3OutputPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://sagemaker-#{AWS::Region}-#{AWS::AccountId}/&lt;/span&gt;
              &lt;span class="na"&gt;EnableManagedSpotTraining&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
              &lt;span class="na"&gt;RoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::#{AWS::AccountId}:role/SageMakerAccessRole&lt;/span&gt;
          &lt;span class="na"&gt;PostprocessingLambdaStep&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Task&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Fn::GetAtt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;postprocessingLambda&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;Arn&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
            &lt;span class="na"&gt;End&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the execution graph will look like in the AWS Step Functions dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682169-4f4f8680-132d-11eb-9e25-da3c1f42d624.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682169-4f4f8680-132d-11eb-9e25-da3c1f42d624.png" alt="image1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is what the AWS Lambda code will look like. Since Amazon SageMaker requires every training job to have a unique name, we use a random generator to produce a unique suffix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handlerPreprocessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;letters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ascii_lowercase&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;jobParameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model-trainining-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hyperparameters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_of_epochs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jobParameters&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handlerPostprocessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
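Since the generated name above is what SageMaker receives as TrainingJobName, it has to stay within SageMaker's naming constraints (alphanumeric characters and hyphens, at most 63 characters). Here is a small validation sketch of the same name-building logic; the regex is my reading of the documented pattern, so treat it as an assumption:

```python
import random
import re
import string
from datetime import datetime

# Assumed regex for SageMaker's TrainingJobName constraint:
# starts with an alphanumeric character, then alphanumerics/hyphens, max 63 chars.
JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

def make_job_name(prefix="model-training"):
    # Same scheme as the preprocessing Lambda: prefix + ISO date + random suffix.
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(10))
    return f"{prefix}-{datetime.now().date()}-{suffix}"

name = make_job_name()
print(name, len(name), bool(JOB_NAME_RE.match(name)))
```

Keeping the prefix short leaves room for the date and suffix while staying under the 63-character limit.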



&lt;h2&gt;
  
  
  Using the Serverless Framework to deploy all necessary services and return a link to invoke the Step Function
&lt;/h2&gt;

&lt;p&gt;We will use the Serverless Framework to deploy AWS Step Functions and AWS Lambda. Using it to deploy serverless infrastructure has the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plugins that provide a way to deploy and configure AWS Step Functions&lt;/li&gt;
&lt;li&gt;A resources section that lets you use CloudFormation notation to create custom resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can install dependencies and deploy services by using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;aws-sagemaker
npm &lt;span class="nb"&gt;install
&lt;/span&gt;serverless deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the output will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service DeepLearningSagemaker.zip file to S3 &lt;span class="o"&gt;(&lt;/span&gt;35.3 KB&lt;span class="o"&gt;)&lt;/span&gt;...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.............................................
Serverless: Stack update finished...
Service Information
service: DeepLearningSagemaker
stage: dev
region: us-east-1
stack: DeepLearningSagemaker-dev
resources: 15
api keys:
  None
endpoints:
functions:
  preprocessingLambda: DeepLearningSagemaker-dev-preprocessingLambda
  postprocessingLambda: DeepLearningSagemaker-dev-postprocessingLambda
layers:
  None
Serverless StepFunctions OutPuts
endpoints:
  GET - https://&amp;lt;url_prefix&amp;gt;.execute-api.us-east-1.amazonaws.com/dev/startFunction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the URL from the output to invoke the deployed AWS Step Functions workflow, which in turn runs Amazon SageMaker. This can be done, for example, with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://&amp;lt;url_prefix&amp;gt;.execute-api.us-east-1.amazonaws.com/dev/startFunction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, we can take a look at the execution graph in the AWS Step Functions dashboard (&lt;a href="https://console.aws.amazon.com/states/home" rel="noopener noreferrer"&gt;https://console.aws.amazon.com/states/home&lt;/a&gt;) and review the training job in the Amazon SageMaker dashboard (&lt;a href="https://console.aws.amazon.com/sagemaker/home" rel="noopener noreferrer"&gt;https://console.aws.amazon.com/sagemaker/home&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;AWS Step Functions dashboard:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682175-4fe81d00-132d-11eb-90fd-733cbd11bf3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682175-4fe81d00-132d-11eb-90fd-733cbd11bf3d.png" alt="image2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon SageMaker dashboard:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682180-51194a00-132d-11eb-8774-98c1cdd87879.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F3318397%2F96682180-51194a00-132d-11eb-8774-98c1cdd87879.png" alt="image3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We’ve created a deep learning training pipeline using Amazon SageMaker and AWS Step Functions. Setting everything up was simple, and you can use this example to develop more complex workflows, for example by implementing branching, parallel executions, or custom error handling.&lt;/p&gt;

&lt;p&gt;Feel free to check the project repository at &lt;a href="https://github.com/ryfeus/stepfunctions2processing" rel="noopener noreferrer"&gt;https://github.com/ryfeus/stepfunctions2processing&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>machinelearning</category>
      <category>docker</category>
    </item>
    <item>
      <title>Amazon CodeGuru Profiler for monitoring cloud applications</title>
      <dc:creator>Rustem Feyzkhanov</dc:creator>
      <pubDate>Fri, 10 Jul 2020 01:41:27 +0000</pubDate>
      <link>https://dev.to/aws-heroes/amazon-codeguru-profiler-for-monitoring-cloud-applications-2n32</link>
      <guid>https://dev.to/aws-heroes/amazon-codeguru-profiler-for-monitoring-cloud-applications-2n32</guid>
      <description>&lt;p&gt;One of the challenges with building cloud applications is finding the correct way to optimize it. To do so you need to have an insight into how your application performs in production and what are current bottlenecks and latencies which exist in your code. The usual cycle is that you monitor your application, you find existing bottlenecks, next you prioritize based on the latency or CPU/RAM impact and then you optimize your code. Finally, you need to monitor the app after the change to make sure that your fix actually helped.&lt;/p&gt;

&lt;p&gt;There are a lot of APM (Application Performance Monitoring) tools that can be used to monitor your application in production, with different ways of gathering and visualizing data about your application. Amazon CodeGuru goes a step further in handling bottlenecks and provides several great additions to the optimization cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It provides automatic insights into possible ways to optimize your app, based on a combination of a knowledge base and ML algorithms.&lt;/li&gt;
&lt;li&gt;It puts a dollar value on each bottleneck based on the instance you are using. This lets you prioritize optimization across multiple apps and calculate the ROI of reducing technical debt. It also helps you find the methods that contribute most to the app’s latency and estimate how much that latency could be reduced.&lt;/li&gt;
&lt;li&gt;It monitors data continuously and can notify you if there are any anomalies in your app. The service also generates CloudWatch metrics, which you can add to your main dashboard for low-level app monitoring and easier debugging.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post, I’ll cover a demo of how to use Amazon CodeGuru with a Java application running on AWS Batch. I will use AWS Step Functions to handle AWS Batch invocations and will build the Java application using Maven.&lt;/p&gt;

&lt;p&gt;We will cover the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building and testing a Java application with the Amazon CodeGuru Profiler&lt;/li&gt;
&lt;li&gt;Deploying the Java application with AWS Batch and providing the necessary permissions&lt;/li&gt;
&lt;li&gt;Monitoring the application using the Amazon CodeGuru dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI installed&lt;/li&gt;
&lt;li&gt;Java 8 and Maven installed (optional)&lt;/li&gt;
&lt;li&gt;Docker installed&lt;/li&gt;
&lt;li&gt;The Serverless Framework with plugins installed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code decomposition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java project files (demoApplication.java, pom.xml)&lt;/li&gt;
&lt;li&gt;A Dockerfile for the image we will run&lt;/li&gt;
&lt;li&gt;A serverless config file describing the infrastructure we will deploy&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Build and test the Java application with the Amazon CodeGuru Profiler
&lt;/h2&gt;

&lt;p&gt;We will use Maven to build our Java project. If you have Java 8 and Maven installed, you can run the following commands to build the project and run it locally. You will also need your AWS credentials set up locally (&lt;a href="https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html&lt;/a&gt;). Keep in mind that we will deploy the necessary infrastructure in the next step, so the code will only run successfully once the required AWS Lambda is deployed. Alternatively, you can deploy the AWS Lambda manually and update demoApplication.java with its name. You will also need to set up a profiling group named “demoApplication” in the CodeGuru dashboard (&lt;a href="https://console.aws.amazon.com/codeguru/profiler/search" rel="noopener noreferrer"&gt;https://console.aws.amazon.com/codeguru/profiler/search&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ryfeus/stepfunctions2processing.git
&lt;span class="nb"&gt;cd &lt;/span&gt;aws-batch-with-profiler/docker
mvn clean compile assembly:single
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run your Java code locally with the CodeGuru Profiler, and it will report profiling data to AWS. This is useful both for testing your code locally and for profiling it without deploying to the cloud.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;java &lt;span class="nt"&gt;-jar&lt;/span&gt; target/demo-1.0.0-jar-with-dependencies.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this demo, we use the CodeGuru Profiler library to run the profiler within the application. An alternative is to download the profiler JAR directly (from &lt;a href="https://d1osg35nybn3tt.cloudfront.net/" rel="noopener noreferrer"&gt;https://d1osg35nybn3tt.cloudfront.net/&lt;/a&gt;) and run it via an additional parameter on the java command. In that case, you don’t need to update your code to include profiler logic, and you don’t need to add the dependencies and repositories to your Maven project.&lt;/p&gt;

&lt;p&gt;Now let’s use docker to build the image which we will run in AWS Batch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; test-java-app-with-profiler &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
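&lt;p&gt;The Dockerfile in the repository handles packaging the jar into the image; an illustrative sketch of what such a Dockerfile might contain (the actual file in the repository may differ):&lt;/p&gt;

```dockerfile
# Illustrative only; see the Dockerfile in aws-batch-with-profiler/docker.
FROM openjdk:8-jre-slim
COPY target/demo-1.0.0-jar-with-dependencies.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
```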



&lt;h2&gt;
  
  
  Deploying Java application to the cloud
&lt;/h2&gt;

&lt;p&gt;Once the image is built, let’s create a repository in Amazon ECR and push the docker image there. We need to log in to ECR, create the repository, tag the built image, and push it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ecr get-login-password &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 | docker login &lt;span class="nt"&gt;--username&lt;/span&gt; AWS &lt;span class="nt"&gt;--password-stdin&lt;/span&gt; &amp;lt;accountId&amp;gt;.dkr.ecr.us-east-1.amazonaws.com
aws ecr create-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; test-java-app-with-profiler
docker tag test-java-app-with-profiler:latest &amp;lt;accountId&amp;gt;.dkr.ecr.us-east-1.amazonaws.com/test-java-app-with-profiler:latest
docker push &amp;lt;accountId&amp;gt;.dkr.ecr.us-east-1.amazonaws.com/test-java-app-with-profiler:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the container image is pushed, we can install the dependencies for the Serverless Framework and deploy the services with the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;aws-batch-with-profiler/aws-batch
npm &lt;span class="nb"&gt;install
&lt;/span&gt;serverless deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what the output will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service StepFuncBatchWithProfiler.zip file to S3 (32.94 KB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
..........................................................................................
Serverless: Stack update finished...
Service Information
service: StepFuncBatchWithProfiler
stage: dev
region: us-east-1
stack: StepFuncBatchWithProfiler-dev
resources: 30
api keys:
  None
endpoints:
functions:
  async: StepFuncBatchWithProfiler-dev-async
layers:
  None
Serverless StepFunctions OutPuts
endpoints:
  GET - https://&amp;lt;urlPrefix&amp;gt;.execute-api.us-east-1.amazonaws.com/dev/startFunction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the URL in the output to trigger the deployed AWS Step Functions state machine and AWS Batch job, for example with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://&amp;lt;urlPrefix&amp;gt;.execute-api.us-east-1.amazonaws.com/dev/startFunction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
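&lt;p&gt;If you prefer the CLI to the console for checking progress, recent executions can be listed with the AWS CLI (a sketch; STATE_MACHINE_ARN is a placeholder for the ARN of the state machine created by the stack above):&lt;/p&gt;

```shell
# List recent executions of the deployed state machine.
# STATE_MACHINE_ARN is a placeholder; substitute the ARN from your stack.
aws stepfunctions list-executions \
    --state-machine-arn "$STATE_MACHINE_ARN" \
    --max-results 5
```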



&lt;p&gt;After that we can take a look at the execution graph in the AWS Step Functions dashboard (&lt;a href="https://console.aws.amazon.com/states/home" rel="noopener noreferrer"&gt;https://console.aws.amazon.com/states/home&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvnqnggrq2lpeqq88tk15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvnqnggrq2lpeqq88tk15.png" alt="Step Function Console"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can check the logs produced by the application in the CloudWatch log group “/aws/batch/job” (&lt;a href="https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Fbatch$252Fjob" rel="noopener noreferrer"&gt;https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Fbatch$252Fjob&lt;/a&gt;). It should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6pb9qppsh84hpm7j9jh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6pb9qppsh84hpm7j9jh.png" alt="Cloudwatch Console"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring application using Amazon CodeGuru
&lt;/h2&gt;

&lt;p&gt;Once everything is set up and you’ve run multiple jobs with CodeGuru Profiler enabled, you can start monitoring the results. CodeGuru Profiler presents data as a flame graph: you can see the call stack of your code’s and your libraries’ methods, and for each method the CPU usage or amount of time spent in it. This shows which methods contribute the most latency and where optimizations are possible. One of the main advantages is that you can also see an estimated cost per method, which provides a way to estimate the savings from potential optimizations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fye14xsngjoaa5x4yl2h7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fye14xsngjoaa5x4yl2h7.png" alt="CodeGuru Profiler"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We deployed a demo Java application with Amazon CodeGuru Profiler on AWS Batch, orchestrated by AWS Step Functions. You can use the same project setup to deploy your own application, see how it performs in the cloud, and get recommendations. Amazon CodeGuru has a 90-day trial, so you can try it completely free.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>docker</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
