Prasanth Mathesh for AWS Community Builders

Posted on May 19, 2021 • Edited on May 23, 2021

Automate SageMaker Real-Time ML Inference in a ServerLess way

#machinelearning #serverless #aws #cloud

Introduction

Amazon SageMaker is a fully managed service that enables data scientists and ML engineers to quickly create, train, and deploy models and ML pipelines in a scalable and cost-effective way. SageMaker was launched around November 2017, and I first got to know its built-in algorithms and features from Kris Skrinak during a boot camp roadshow for Amazon partners. Since then, SageMaker has matured considerably and lets ML engineers deploy and track models quickly and at scale. Beyond its built-in algorithms, it has gained many new features such as Autopilot, Clarify, Feature Store, and Docker container support. This blog looks at these newer SageMaker features and at a serverless way of handling training, deployment, and real-time inference.

Architecture

The steps for the below reference architecture are explained at the end of the SageMaker Pipeline section of this article.

SageMaker Features

A) Autopilot: Low-Code Machine Learning

Launched around December 2019
Billed as the industry's first automated ML capability that gives control over and visibility into ML models
Performs feature processing, picks the best algorithm, and trains and selects the best model with just a few clicks
Vertical AI services like Amazon Personalize and Amazon Forecast can be used for personalized recommendation and forecasting problems
Autopilot is a generic ML service for all kinds of classification and regression problems, such as fraud detection, churn analysis, and targeted marketing
Supports SageMaker built-in algorithms such as XGBoost and Linear Learner
The default maximum input dataset size is 5 GB; it can be increased, but only within the gigabyte range

Autopilot Demo
Data for Autopilot Experiment
The dataset is the public in-vehicle coupon recommendation data provided by UCI.
Data Set Information
The survey data describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks respondents whether they would accept the coupon if they were the driver. The task performed on this dataset is classification.

AutoPilot Experiment

Import the data for training.

%%sh
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00603/in-vehicle-coupon-recommendation.csv

Once the data is uploaded, an Autopilot experiment can be set up within minutes in SageMaker Studio. Provide the training input and output data paths and the label to predict, and enable auto-deployment of the model. After training succeeds, SageMaker deploys the best model and creates an endpoint.
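The same setup can also be expressed programmatically. The sketch below is not part of the original demo: it builds the request for the boto3 `create_auto_ml_job` call, and the job name, S3 paths, role ARN, and the F1 objective are placeholder assumptions. The label column `Y` is the one used for this dataset in the Clarify configuration of this article.

```python
def build_automl_request(job_name, train_s3_uri, output_s3_uri, role_arn, label="Y"):
    """Build keyword arguments for the SageMaker create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}
                },
                "TargetAttributeName": label,  # column Autopilot should predict
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "F1"},  # assumed objective
        "RoleArn": role_arn,
        # Ask Autopilot to deploy the best candidate to an endpoint automatically.
        "ModelDeployConfig": {"AutoGenerateEndpointName": True},
    }


def start_autopilot_job(request, region_name=None):
    """Submit the job; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("sagemaker", region_name=region_name)
    return client.create_auto_ml_job(**request)


request = build_automl_request(
    job_name="coupon-autopilot-demo",                        # hypothetical name
    train_s3_uri="s3://my-bucket/coupon/train/",             # hypothetical path
    output_s3_uri="s3://my-bucket/coupon/output/",           # hypothetical path
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",   # hypothetical role
)
```

Keeping the request builder separate from the API call makes it easy to inspect or log the request before any job is started.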

Alternatively, you can pick any candidate model and deploy it yourself.

The endpoint configuration and endpoint details of the deployed model can be found in the console.

Infer and Evaluate Model

Take a validation record and invoke the endpoint. Autopilot handles the feature engineering, so the raw feature values can be sent to the endpoint as they are to get a prediction.

No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1

Run inference against the validation dataset using the code provided on GitHub.
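As a rough illustration of such a call (the author's actual code lives in the GitHub repository), the sketch below serializes one raw record to a CSV line and sends it through a boto3 `sagemaker-runtime` client. The client is passed in so the serialization can be tried without AWS access. Dropping the last value of the record assumes that it is the label column, which is an assumption about the dataset layout.

```python
import csv
import io


def to_csv_line(values):
    """Serialize one record as a single CSV line, quoting fields where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return buffer.getvalue().rstrip("\r\n")


def predict(runtime_client, endpoint_name, values):
    """Send one raw record to the endpoint and return the decoded prediction."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_line(values).encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8").strip()


# Raw feature values of the validation record above, without the final value
# (assumed to be the label).
record = [
    "No Urgent Place", "Friend(s)", "Sunny", 80, "10AM", "Carry out & Take away",
    "2h", "Female", 21, "Unmarried partner", 1, "Some college - no degree",
    "Unemployed", "$37500 - $49999", "", "never", "never", "", "4~8", "1~3",
    1, 1, 0, 0, 1,
]
payload = to_csv_line(record)
```

With credentials in place, a call could look like `predict(boto3.client("sagemaker-runtime"), "<endpoint name>", record)`.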

B) SageMaker Clarify

Launched around December 2020
Explains how machine learning (ML) models made predictions during the Autopilot experiments
Monitors Bias Drift for Models in Production
Provides components that help AWS customers build less biased and more understandable machine learning models
Provides explanations for individual predictions available via API
Helps in establishing the model governance for ML applications

The bias information can be generated for the AutoPilot experiment.

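import sagemaker.clarify

# training_data_s3_uri, bias_report_1_output_path, train_cols, model_name and
# train_instance_type are defined earlier in the notebook.

# Where the dataset lives, which column is the label, and where the report goes.
bias_data_config = sagemaker.clarify.DataConfig(
    s3_data_input_path=training_data_s3_uri,
    s3_output_path=bias_report_1_output_path,
    label="Y",
    headers=train_cols,
    dataset_type="text/csv",
)

# The trained model that Clarify queries for predictions while building the report.
model_config = sagemaker.clarify.ModelConfig(
    model_name=model_name,
    instance_type=train_instance_type,
    instance_count=1,
    accept_type="text/csv",
)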

C) SageMaker Feature Store

Launched around December 2020
Amazon SageMaker Feature Store is a fully managed repository to store, update, retrieve, and share machine learning (ML) features; the offline store keeps them in S3.
The feature set that was used to train the model needs to be available to make real-time predictions (inference).
Data Wrangler in SageMaker Studio can be used to engineer features and ingest them into a feature store
Both the online and offline stores can be populated from a separate feature engineering pipeline via the SDK
Streaming sources can ingest features directly into the online feature store for inference or feature creation
Feature Store automatically builds an AWS Glue Data Catalog when feature groups are created; this can optionally be turned off

The table below shows various data stores used to maintain features. Some open source frameworks like Feast have evolved into feature store platforms, and any key-value data store that supports fast lookups can be used as a feature store.

Feature stores are the end stage of the feature engineering pipeline, and features can also be stored in cloud data warehouses such as Snowflake and Redshift, as shown in the image from featurestore.org.

record_identifier_value = str(2990130)
featurestore_runtime.get_record(FeatureGroupName=transaction_feature_group_name, RecordIdentifierValueAsString=record_identifier_value)
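For context, `featurestore_runtime` in the snippet above would typically be a boto3 `sagemaker-featurestore-runtime` client. The sketch below shows one way to create it and to flatten the `get_record` response into a plain dictionary. The helper names and the sample feature names and values are made up for illustration.

```python
def record_to_dict(get_record_response):
    """Flatten a GetRecord response into {feature_name: value_as_string}."""
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in get_record_response.get("Record", [])
    }


def fetch_features(feature_group_name, record_id, region_name=None):
    """Read one record from the online store; needs boto3 and AWS credentials."""
    import boto3  # lazy import keeps record_to_dict usable without boto3

    runtime = boto3.client("sagemaker-featurestore-runtime", region_name=region_name)
    response = runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(record_id),
    )
    return record_to_dict(response)


# Shape of a GetRecord response, with made-up feature names and values.
sample_response = {
    "Record": [
        {"FeatureName": "record_id", "ValueAsString": "2990130"},
        {"FeatureName": "coupon", "ValueAsString": "Coffee House"},
    ]
}
features = record_to_dict(sample_response)
```

All values come back as strings, so numeric features need to be cast before they are passed to a model.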

The feature group's offline store can also be accessed as a Hive external table.

CREATE EXTERNAL TABLE IF NOT EXISTS sagemaker_featurestore.coupon (
  write_time TIMESTAMP,
  event_time TIMESTAMP,
  is_deleted BOOLEAN
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
  STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION 's3://coupon-featurestore/onlinestore/139219451296/sagemaker/ap-south-1/offline-store/coupon-1621050755/data'

D) SageMaker Pipelines

Launched around December 2020
SageMaker natively supports MLOps via SageMaker Projects, and pipelines are created when a SageMaker project is created
MLOps is a set of practices that streamlines the continuous delivery of models. It is essential for a successful production-grade ML application.
A SageMaker pipeline is a series of interconnected steps, defined by a JSON pipeline definition, that performs build, train, and deploy, or only train and deploy, and so on.
Alternative ways to set up MLOps with SageMaker include MLflow, Airflow, Kubeflow, and Step Functions.

Docker Containers

SageMaker Studio itself runs from a Docker container. Docker containers can be used to migrate existing on-premises production ML pipelines and models into the SageMaker environment.

Both stateful and stateless inference pipelines can be created. For example, anomaly and fraud detection pipelines are stateless, while the example considered in this article is a stateful model inference pipeline.

SageMaker Container Demo
Download the GitHub folder. The container folder should contain the files shown in the image.

The dataset is the same one used for the Autopilot experiment.

scikit-learn is used for local training and model tuning. After several iterations, the less important features were removed and the key features were then encoded.

The final encoded features (97 labels) are stored in coupon_train.csv and will be used for training and validation locally.

Docker Container Build

The following steps must be performed in order.

Build the image

docker build -t recommend-in-vehicle-coupon:latest .

Train the features in local mode

./train_local.sh recommend-in-vehicle-coupon:latest

Serve the model in local mode

./serve_local.sh recommend-in-vehicle-coupon:latest

The servers are now up and waiting for requests.

Predict locally

The payload.csv file contains the features to score. Run the command below to get a prediction for the features in the CSV.

./predict.sh payload.csv

Once a request is accepted, the listening servers respond to it.
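The contents of predict.sh are not reproduced in this article. Assuming the container follows the standard SageMaker serving contract (an HTTP server on port 8080 that answers `POST /invocations`) and that serve_local.sh publishes that port on localhost, the same request could be sketched in Python as follows. The function names and the sample feature row are made up.

```python
import urllib.request


def build_invocation_request(csv_payload, host="http://localhost:8080"):
    """Build a POST to the container's /invocations route with a CSV body."""
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=csv_payload.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def predict_locally(payload_path="payload.csv"):
    """Send the file's contents to the locally served model.

    Needs the serving container to be running, so it is not called here.
    """
    with open(payload_path, encoding="utf-8") as handle:
        request = build_invocation_request(handle.read())
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


request = build_invocation_request("1,0,0,1\n")  # made-up encoded feature row
```

Under the same contract, a `GET /ping` request against the same host is the health check SageMaker uses to decide whether the container is ready.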

Push Image

Once local testing is complete, the container image used for training, deployment, and serving can be pushed to Amazon ECR. After any later code change, rerunning this final build-and-push step alone is enough.

./build_and_push.sh

Deployed Container Image

Images stored in Amazon ECR can be pulled and run as containers from Lambda, Amazon EKS, and other services.

Lambda Function

The SageMaker API calls for training, deployment, and inference are wrapped in a Lambda function. The deployed Lambda handler should then be integrated with API Gateway so that the pipeline can be run for any triggered API event.
The Lambda function kept on GitHub has three major blocks.
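One way to picture how a single handler can host three blocks is a small dispatch table keyed on the request body. The sketch below is an illustration rather than the author's handler: `train_data` and `deploy_model` are the keys used in the request bodies in the testing section of this article, while `predict` and all function names are hypothetical, and the three functions only return placeholders instead of calling SageMaker.

```python
import json


def start_training(body):
    return {"action": "training_started"}  # placeholder for create_training_job


def deploy_model(body):
    # placeholder for create_model / create_endpoint_config / create_endpoint
    return {"action": "deploy_started", "training_job": body["training_job"]}


def invoke_model(body):
    return {"action": "prediction_requested"}  # placeholder for invoke_endpoint


# "predict" is a hypothetical key chosen for this sketch.
ACTIONS = {
    "train_data": start_training,
    "deploy_model": deploy_model,
    "predict": invoke_model,
}


def lambda_handler(event, context):
    """Route an API Gateway proxy event to one of the three blocks."""
    body = json.loads(event.get("body") or "{}")
    action = ACTIONS.get(body.get("key"))
    if action is None:
        return {"statusCode": 400, "body": json.dumps({"error": "unknown key"})}
    return {"statusCode": 200, "body": json.dumps(action(body))}
```

With the API Gateway proxy integration, the request body arrives as a JSON string in `event["body"]`, which is why the handler parses it before dispatching.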

Create SageMaker Training Function

The Lambda function starts a SageMaker training job, which reads the features from S3 and runs the training.

import boto3

# region, create_training_params and job_name are defined elsewhere in the handler.
client = boto3.client("sagemaker", region_name=region)
client.create_training_job(**create_training_params)
status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
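The contents of `create_training_params` are not shown in this article. A minimal sketch of what such a dictionary could contain for a custom container is given below. The instance type, volume size, runtime limit, channel name, bucket paths, account ID, and role ARN are all assumptions to be replaced with real values.

```python
def build_training_params(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # the image pushed to ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",   # assumed channel name
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",   # assumed instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


create_training_params = build_training_params(
    job_name="coupon-training-demo",                                   # hypothetical
    image_uri="123456789012.dkr.ecr.ap-south-1.amazonaws.com/"
              "recommend-in-vehicle-coupon:latest",                    # hypothetical account
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",             # hypothetical
    train_s3_uri="s3://my-bucket/coupon/train/",                       # hypothetical
    output_s3_uri="s3://my-bucket/coupon/model/",                      # hypothetical
)
```

`create_training_job` returns as soon as the job is accepted, so a status read immediately afterwards will usually still be `InProgress`. Because a Lambda invocation is limited to 15 minutes, it is generally safer to return the job name and check the status in a later call than to wait inside the function.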

Create SageMaker Model and Endpoint Function

Create the model
The training job writes the model artifacts to S3, and the resulting model has to be registered with SageMaker.

create_model_response = client.create_model(
              ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=primary_container
              )
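The `primary_container` argument ties the serving image to the artifacts produced by the training job. As a sketch under the assumption that the same image is used for training and serving, it can be derived from the `describe_training_job` response. The helper name and the sample values are illustrative only.

```python
def build_primary_container(training_job_description, image_uri=None):
    """Derive create_model's PrimaryContainer from a describe_training_job response."""
    if training_job_description["TrainingJobStatus"] != "Completed":
        raise ValueError("training job has not completed yet")
    algorithm = training_job_description["AlgorithmSpecification"]
    return {
        # Reuse the training image for serving unless a separate one is given.
        "Image": image_uri or algorithm["TrainingImage"],
        # S3 location of the model.tar.gz written by the training job.
        "ModelDataUrl": training_job_description["ModelArtifacts"]["S3ModelArtifacts"],
    }


# Trimmed-down example of a describe_training_job response (made-up values).
sample_description = {
    "TrainingJobStatus": "Completed",
    "AlgorithmSpecification": {"TrainingImage": "example-registry/coupon:latest"},
    "ModelArtifacts": {"S3ModelArtifacts": "s3://my-bucket/coupon/model/output/model.tar.gz"},
}
primary_container = build_primary_container(sample_description)
```

Checking the job status first avoids registering a model whose artifacts do not exist yet.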

Create Endpoint Config

response = client.create_endpoint_config(
            EndpointConfigName=endpoint_config_name,
            ProductionVariants=[
                {
                    'VariantName': 'variant-1',
                    'ModelName': model_name,
                    'InitialInstanceCount': 1,
                    'InstanceType': 'ml.t2.medium'
                }
            ]
        )

Create Endpoint

response = client.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=endpoint_config_name
        )
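`create_endpoint` also returns before the endpoint is ready; the endpoint stays in the `Creating` state for a while before it becomes `InService`. A simple polling helper, sketched here with an arbitrary poll interval and an injected client so it can be tried without AWS access, could look like this:

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint leaves the 'Creating' state."""
    for _ in range(max_polls):
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status != "Creating":
            return status  # for example "InService" or "Failed"
        time.sleep(poll_seconds)
    return "Creating"  # still not ready after max_polls attempts
```

boto3 also ships a built-in waiter for this, `client.get_waiter("endpoint_in_service")`. Either approach fits a notebook or script better than the Lambda function itself, since endpoint creation can take several minutes.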

Invoke SageMaker Model Function
The Lambda function invokes the endpoint based on the message in the API request body.

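# invoke_endpoint is served by the "sagemaker-runtime" client, not the "sagemaker" client.
client = boto3.client("sagemaker-runtime", region_name=region)
response = client.invoke_endpoint(
            EndpointName=EndpointName,
            Body=event_body.encode('utf-8'),
            ContentType='text/csv'
        )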
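The prediction comes back as a streaming body inside the response. A minimal sketch of turning it into an API Gateway proxy response is shown below; the `prediction` key in the JSON body and the function name are choices made for this example, not something defined by the original Lambda function.

```python
import io
import json


def to_api_response(invoke_response):
    """Convert an invoke_endpoint response into an API Gateway proxy response."""
    prediction = invoke_response["Body"].read().decode("utf-8").strip()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": prediction}),
    }


# Stand-in for a real response: Body is a file-like object holding the prediction.
sample_invoke_response = {"Body": io.BytesIO(b"1\n")}
api_response = to_api_response(sample_invoke_response)
```

The body can only be read once, so the decoded value should be stored if it is needed again, for example for logging.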

The status of the InService endpoint and the requests made to it can be checked in the CloudWatch logs.

Testing Stateful Real-Time Inference

Trigger SageMaker Training

Once API Gateway and Lambda have been integrated, a training job can be triggered by passing the request body below to the Lambda function.

{"key":"train_data"}

Trigger SageMaker Model and Endpoint Deployment

Once the training job has completed, deploy the model with the request body below. The training job name should be that of the job just created.

{"key" : "deploy_model",
"training_job" :"<training job name>"
}

Trigger SageMaker Model Endpoint

Invoke the endpoint with the request below. The features are encoded and must use the same encoding that was used for training.

The predicted response will be as shown below.

The events generated during invocation can be viewed in the CloudWatch logs.

Conclusion

Machine learning inference accounts for more than 80 percent of the operational cost of running ML workloads. SageMaker capabilities such as container orchestration, multi-model endpoints, and serverless inference can reduce both operational and development costs. In addition, event-driven training and inference pipelines let a non-technical person from the sales or marketing team refresh both batch and real-time predictions on an ad hoc basis before running a campaign, with the click of a button in their sales portal built on mechanisms such as APIs and webhooks.
