
prasanth mathesh for AWS Community Builders


Automate SageMaker Real-Time ML Inference in a Serverless Way


Amazon SageMaker is a fully managed service that enables data scientists and ML engineers to quickly create, train, and deploy models and ML pipelines in an easily scalable and cost-effective way. SageMaker was launched around November 2017, and I got to know its built-in algorithms and features from Kris Skrinak during a boot camp roadshow for Amazon partners. Since then, SageMaker has matured considerably, enabling ML engineers to deploy and track models quickly and at scale. Beyond its built-in algorithms, it has gained many new features such as Autopilot, Clarify, Feature Store, and Docker container support. This blog looks at these newer SageMaker features and at a serverless way of handling training, deployment, and real-time inference.


The steps in the reference architecture below are explained at the end of the SageMaker Pipelines section of this article.
Alt Text

SageMaker Features

A) Autopilot: Low-Code Machine Learning

  • Launched around December 2019
  • Industry-first automated ML that gives control over and visibility into ML models
  • Performs feature processing, picks the best algorithm, then trains and selects the best model with just a few clicks
  • Vertical AI services like Amazon Personalize and Amazon Forecast can be used for personalized recommendation and forecasting problems
  • Autopilot is a generic ML service for all kinds of classification and regression problems, such as fraud detection, churn analysis, and targeted marketing
  • Supports built-in SageMaker algorithms such as XGBoost and Linear Learner
  • The default maximum size of the input dataset is 5 GB; it can be increased, but only to sizes in the GB range

Autopilot Demo
Data for Autopilot Experiment
The dataset is public data provided by UCI.
Data Set Information
The survey data describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks respondents whether they would accept the coupon if they were the driver. The task performed on this dataset is classification.

AutoPilot Experiment

Import the data for training.


Once the data is uploaded, Autopilot can be set up within minutes using SageMaker Studio. Add the training input and output data paths and the label to predict, and enable auto-deployment of the model. After successful training, SageMaker deploys the best model and creates an endpoint.
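
The same setup can also be scripted instead of clicked through in Studio. The sketch below is a minimal example that only builds the request for the SageMaker `CreateAutoMLJob` API. The job name, S3 paths, role ARN, label column `Y`, and the F1 objective are placeholder assumptions, not values taken from this experiment.

```python
def build_automl_job_request(job_name, train_s3_uri, output_s3_uri, label, role_arn):
    """Build keyword arguments for the SageMaker CreateAutoMLJob API."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}
                },
                # Column that Autopilot should learn to predict.
                "TargetAttributeName": label,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "F1"},
        "RoleArn": role_arn,
        # Counterpart of the auto-deploy option: deploy the best candidate to an endpoint.
        "ModelDeployConfig": {"AutoGenerateEndpointName": True},
    }


# All values below are hypothetical placeholders.
request = build_automl_job_request(
    job_name="coupon-autopilot-demo",
    train_s3_uri="s3://my-bucket/coupon/train/",
    output_s3_uri="s3://my-bucket/coupon/output/",
    label="Y",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```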


Alternatively, you can select any candidate model you prefer and deploy it.


The endpoint configuration and endpoint details of the deployed model can be found in the console.


Infer and Evaluate Model

Take a validation record and invoke the endpoint. Autopilot handles the feature engineering, so raw feature data can be sent directly to the trained model for prediction.

No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1

Run inference on the validation dataset using the code provided on GitHub.
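
As a rough illustration of such an invocation (not the code from the repository), the sketch below serializes one raw record as CSV and sends it through the `invoke_endpoint` call of a SageMaker runtime client. The client is passed in as a parameter, and the endpoint name in the usage comment is hypothetical.

```python
import csv
import io


def to_csv_line(values):
    """Serialize one record as a single CSV line, quoting fields when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


def predict(runtime_client, endpoint_name, values):
    """Send one raw record to the endpoint and return the decoded response body."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_line(values),
    )
    return response["Body"].read().decode("utf-8")


# Usage (requires AWS credentials and a deployed endpoint; the name is a placeholder):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   print(predict(runtime, "my-autopilot-endpoint", ["No Urgent Place", "Friend(s)", "Sunny", 80]))
```

Using the `csv` module rather than a plain `",".join(...)` keeps the payload valid if a field ever contains a comma or a quote.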


B) SageMaker Clarify

  • Launched around December 2020
  • Explains how machine learning (ML) models made predictions during Autopilot experiments
  • Monitors bias drift for models in production
  • Provides components that help AWS customers build less biased and more understandable machine learning models
  • Provides explanations for individual predictions, available via API
  • Helps establish model governance for ML applications

The bias information can be generated for the AutoPilot experiment.

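# Arguments are elided in the source; the variable names and values below are placeholders.
bias_data_config = sagemaker.clarify.DataConfig(
    s3_data_input_path=train_uri, s3_output_path=bias_report_uri, label=label_column, dataset_type="text/csv")
model_config = sagemaker.clarify.ModelConfig(
    model_name=model_name, instance_type="ml.m5.xlarge", instance_count=1, accept_type="text/csv")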

C) SageMaker Feature Store

  • Launched around December 2020
  • Amazon SageMaker Feature Store is a fully managed repository to store, update, retrieve, and share machine learning (ML) features in S3.
  • The feature set that was used to train the model needs to be available to make real-time predictions (inference).
  • Data Wrangler in SageMaker Studio can be used to engineer features and ingest them into a feature store
  • Both the online and offline stores can be populated by a separate feature engineering pipeline through the SDK
  • Streaming sources can ingest features directly into the online feature store for inference or feature creation
  • Feature Store automatically builds an AWS Glue Data Catalog when feature groups are created; this can optionally be turned off

The table below shows various data stores used to maintain features. Open-source frameworks such as Feast have evolved into feature store platforms, and any key-value data store that supports fast lookups can serve as a feature store.

The feature store is the end stage of the feature engineering pipeline, and the features can also be stored in cloud data warehouses such as Snowflake and Redshift, as shown in the image of

Alt Text

import boto3
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
record_identifier_value = str(2990130)
featurestore_runtime.get_record(FeatureGroupName=transaction_feature_group_name, RecordIdentifierValueAsString=record_identifier_value)
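
The online store exchanges records as lists of name/value pairs, which is awkward to handle directly at inference time. The sketch below shows one possible way to convert between that format and a plain dictionary around `get_record`. The helper names are hypothetical, and the Feature Store runtime client is passed in as a parameter.

```python
def to_feature_record(features):
    """Convert a plain dict into the Record list format used by PutRecord."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]


def from_feature_record(record):
    """Convert the Record list returned by GetRecord back into a plain dict."""
    return {item["FeatureName"]: item["ValueAsString"] for item in record}


def fetch_features(featurestore_runtime, feature_group_name, record_id):
    """Look up one record in the online store; returns {} when no record is found."""
    response = featurestore_runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(record_id),
    )
    return from_feature_record(response.get("Record", []))


# Usage (requires AWS credentials and an existing feature group; the name is a placeholder):
#   import boto3
#   runtime = boto3.client("sagemaker-featurestore-runtime")
#   features = fetch_features(runtime, "my-feature-group", 2990130)
```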

The feature group can also be accessed as a Hive external table.

  write_time TIMESTAMP
  event_time TIMESTAMP
  is_deleted BOOLEAN
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION 's3://coupon-featurestore/onlinestore/139219451296/sagemaker/ap-south-1/offline-store/coupon-1621050755/data'

D) SageMaker Pipelines

  • Launched around December 2020
  • SageMaker natively supports MLOps through SageMaker Projects, and pipelines are created when a SageMaker project is created
  • MLOps is a practice for streamlining the continuous delivery of models. It is essential for a successful production-grade ML application.
  • A SageMaker pipeline is a series of interconnected steps, defined by a JSON pipeline definition, that performs build, train, and deploy, or only train and deploy, and so on.
  • Alternative ways to set up MLOps with SageMaker include MLflow, Airflow, Kubeflow, Step Functions, etc.

Docker Containers

SageMaker Studio itself runs from a Docker container. Docker containers can be used to migrate existing on-premises production ML pipelines and models into the SageMaker environment.

Both stateful and stateless inference pipelines can be created. For example, anomaly and fraud detection pipelines are stateless, whereas the example considered in this article is a stateful model inference pipeline.

SageMaker Container Demo
Download the GitHub folder. The container folder should contain the files shown in the image.

Alt Text

The dataset is the same one used for the Autopilot experiment.

scikit-learn is used for local training and model tuning. After several iterations, the less important features were removed, and the key features were then encoded.

The final encoded features (97 labels) are stored in coupon_train.csv and are used for local training and validation.

Docker Container Build

Perform the following steps in order.

  • Build the image
docker build -t recommend-in-vehicle-coupon:latest .

  • Train the features in local mode
./ recommend-in-vehicle-coupon:latest

  • Serve the model in local mode
./ recommend-in-vehicle-coupon:latest

The servers are up and waiting for requests.

  • Predict locally

The payload.csv file contains the features to send to the model. Run the command below to get a prediction for the features in the CSV file.

./ payload.csv

Once a request is accepted, the listening servers respond to it.
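
For reference, a SageMaker inference container answers prediction requests on its `/invocations` route on port 8080. The sketch below shows roughly what a local prediction request looks like in Python. It assumes the locally served container is reachable at `http://localhost:8080` and accepts `text/csv`; both depend on how the serve step was started.

```python
import urllib.request


def build_invocation_request(csv_payload, host="http://localhost:8080"):
    """Build a POST request for the container's /invocations route."""
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=csv_payload.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def predict_locally(csv_payload, host="http://localhost:8080"):
    """Send the payload to the locally served model and return the response text."""
    with urllib.request.urlopen(build_invocation_request(csv_payload, host)) as response:
        return response.read().decode("utf-8")


# Usage (requires the container to be serving locally):
#   with open("payload.csv") as handle:
#       print(predict_locally(handle.read()))
```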


  • Push Image

Once local testing is complete, the container image used to train, deploy, and serve the model can be pushed to Amazon ECR. After any code change, only the final build-and-push step needs to be repeated.
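
The push commands themselves are not reproduced here. As a hedged sketch, the helper below assembles the usual login, tag, and push commands for an ECR repository. The account ID is a placeholder, the repository is assumed to exist already, and the author's own build-and-push script may differ.

```python
def ecr_push_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands that log in to ECR, tag the local image, and push it."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image_uri = f"{registry}/{repository}:{tag}"
    return [
        # Authenticate the local Docker client against the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Give the locally built image its fully qualified ECR name.
        f"docker tag {repository}:{tag} {image_uri}",
        f"docker push {image_uri}",
    ]


# The account ID below is a hypothetical placeholder.
for command in ecr_push_commands("123456789012", "ap-south-1", "recommend-in-vehicle-coupon"):
    print(command)
```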



Deployed Container Image
Alt Text

Images stored in Amazon ECR can be pulled and run as containers from Lambda, Amazon EKS, etc.

Lambda Function 

The SageMaker API calls for training, deployment, and inference are implemented as Lambda functions. The deployed Lambda handler function should then be integrated with API Gateway so that the pipeline can be run by any triggering API event.
The Lambda function on GitHub has three major blocks.

Create SageMaker Training Function

The Lambda function reads the features from S3 and runs the training job.

import boto3
# region and job_name are defined elsewhere in the Lambda function
client = boto3.client("sagemaker", region_name=region)
status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
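
The status check above is only part of the training block. As a minimal sketch of the rest, the code below builds a `create_training_job` request for a custom container and polls the job until it finishes. The instance type, volume size, time limit, and channel name are assumptions, and the SageMaker client is passed in as a parameter.

```python
import time


def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Build keyword arguments for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Assumed sizing; adjust to the dataset and algorithm.
        "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def wait_for_training_job(client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll DescribeTrainingJob until the job reaches a terminal status."""
    while True:
        status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in ("Completed", "Failed", "Stopped"):
            return status
        sleep(poll_seconds)


# Usage (requires AWS credentials; all argument values are placeholders):
#   import boto3
#   client = boto3.client("sagemaker")
#   client.create_training_job(**build_training_job_request(...))
#   print(wait_for_training_job(client, "my-training-job"))
```

Because a Lambda invocation can run for at most 15 minutes, polling inside the function only suits short training jobs; for longer ones, checking the status in a separate request is the safer design.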


Create SageMaker Model and Endpoint Function

Create the model
The training job places the model artifacts in S3, and the model must then be registered with SageMaker.

Register the model in the SageMaker environment using the API call below.

create_model_response = client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=primary_container
)


Create Endpoint Config

response = client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,  # value elided in the source; variable name assumed
    ProductionVariants=[
        {
            'VariantName': 'variant-1',
            'ModelName': model_name,
            'InitialInstanceCount': 1,
            'InstanceType': 'ml.t2.medium'
        }
    ]
)


Create Endpoint

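# Arguments are elided in the source; the two required parameters are shown with assumed variable names.
response = client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)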
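
Taken together, the three calls form a short chain that starts from a completed training job. The sketch below is one possible way to wire them up, not the repository's implementation. It reads the image and artifact location from `describe_training_job`, and the naming scheme for the model, endpoint config, and endpoint is an assumption.

```python
def deploy_from_training_job(client, training_job_name, role_arn, instance_type="ml.t2.medium"):
    """Register the trained model, create an endpoint config, and create the endpoint."""
    job = client.describe_training_job(TrainingJobName=training_job_name)
    primary_container = {
        # Reuse the training image for serving and point it at the trained artifacts.
        "Image": job["AlgorithmSpecification"]["TrainingImage"],
        "ModelDataUrl": job["ModelArtifacts"]["S3ModelArtifacts"],
    }
    # Hypothetical naming scheme derived from the training job name.
    model_name = f"{training_job_name}-model"
    config_name = f"{training_job_name}-config"
    endpoint_name = f"{training_job_name}-endpoint"

    client.create_model(
        ModelName=model_name, ExecutionRoleArn=role_arn, PrimaryContainer=primary_container
    )
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "variant-1",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
    )
    client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return endpoint_name


# Usage (requires AWS credentials; the job name and role ARN are placeholders):
#   import boto3
#   deploy_from_training_job(boto3.client("sagemaker"), "my-training-job",
#                            "arn:aws:iam::123456789012:role/MySageMakerRole")
```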


Invoke SageMaker Model Function
The Lambda function invokes the endpoint based on the message in the API request body.

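# Arguments are elided in the source; client must be a "sagemaker-runtime" client, and the names below are assumed.
response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType="text/csv", Body=payload)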

The status of the InService endpoint and the requests made to it can be checked in CloudWatch Logs.

Testing Stateful Real-Time Inference

Trigger SageMaker Training

Once API Gateway and Lambda have been integrated, the training job can be triggered by passing the request body below to the Lambda function.

Alt Text

Trigger SageMaker Model and Endpoint Deployment

Once the training job is completed, deploy the model with the request body below. The training job should be the one created in the previous step.

{"key" : "deploy_model",
"training_job" :"<training job name>"}
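
A request body like this implies that the Lambda handler routes on the "key" field. The sketch below shows one possible way to structure such routing behind an API Gateway proxy integration. Only the "deploy_model" key and the "training_job" field come from this article; the handler factory, the response shape, and the stand-in action are assumptions.

```python
import json


def make_handler(actions):
    """Create a Lambda handler that dispatches on the "key" field of the request body."""

    def lambda_handler(event, context):
        # API Gateway proxy integrations deliver the request body as a JSON string.
        body = event.get("body") or "{}"
        payload = json.loads(body) if isinstance(body, str) else body
        action = actions.get(payload.get("key"))
        if action is None:
            return {"statusCode": 400, "body": json.dumps({"error": "unknown key"})}
        return {"statusCode": 200, "body": json.dumps(action(payload))}

    return lambda_handler


# Stand-in action for illustration; a real one would call the SageMaker APIs.
handler = make_handler(
    {"deploy_model": lambda payload: {"deployed_from": payload["training_job"]}}
)
```

Keeping the actions in a dictionary means that training, deployment, and inference can each be added or tested on their own without touching the dispatch logic.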

Trigger SageMaker Model Endpoint

Invoke the endpoint with the request below. The features are encoded and must match those used for training.

Alt Text

The predicted response will be as shown below.

Alt Text

The events created during invocation can be viewed in CloudWatch Logs.


Machine learning inference can account for more than 80 percent of the operational cost of running ML workloads. SageMaker capabilities such as container orchestration, multi-model endpoints, and serverless inference can reduce both operational and development costs. In addition, event-driven training and inference pipelines let non-technical users on a sales or marketing team refresh both batch and real-time predictions on an ad hoc basis before running a campaign, with a single button click in their sales portal backed by mechanisms such as APIs and webhooks.
