Julien Simon for AWS

Posted on May 20, 2019 • Originally published at Medium on May 20, 2019

#aws #ai #machinelearning #devops

Mastering the mystical art of model deployment, part 2: deploying Amazon SageMaker endpoints with AWS CloudFormation

In a previous article, I discussed several deployment strategies for machine learning models, and I showed you how to use the SageMaker SDK to deploy several models on the same prediction endpoint.

This is all fine and dandy, but in a production setting, I’d bet that most of you would prefer automating this properly with CloudFormation. One-click deployment, automatic clean-up and more: what’s not to like?

In this post, I’ll show you how to:

deploy a SageMaker endpoint backed by a single model,
scale the endpoint by adding more instances,
add an extra production variant to the endpoint, i.e. add a second model to the endpoint and set up weighted round-robin,
do things The-Right-Way, with nice YAML templates and change sets.

As usual, all code is available on GitLab.

Alright, magic time!

And this, my friend, is how you deploy two production variants.

Deploying a SageMaker endpoint

First, let’s write a template that lets us deploy a single model. We need to create CloudFormation resources for the model, the endpoint configuration and the endpoint itself. I’ve tried to keep the template as straightforward as possible, and it should work with any single-container model.

For inference pipelines, you’ll have to list all containers in the Model resource.
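The template itself isn’t reproduced in this text, so here is a minimal sketch of what a single-model template could look like. It is built as a Python dictionary and dumped to JSON (which CloudFormation accepts as well as YAML), and it is not necessarily identical to the template in the repository. The parameter names ModelName, ModelDataUrl, TrainingImage, RoleArn and InstanceCount are the ones used in the commands later in this post; the InstanceType parameter, its default value and the logical resource names are my assumptions.

```python
import json


def ref(name):
    """Shorthand for the CloudFormation Ref intrinsic function."""
    return {"Ref": name}


def get_att(resource, attribute):
    """Shorthand for the CloudFormation Fn::GetAtt intrinsic function."""
    return {"Fn::GetAtt": [resource, attribute]}


def build_template():
    """Return a single-model endpoint template as a plain dictionary."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ModelName": {"Type": "String"},
            "ModelDataUrl": {"Type": "String"},
            "TrainingImage": {"Type": "String"},
            "RoleArn": {"Type": "String"},
            # Assumption: instance type and its default are not in the post.
            "InstanceType": {"Type": "String", "Default": "ml.m4.xlarge"},
            "InstanceCount": {"Type": "Number", "Default": 1},
        },
        "Resources": {
            # Logical resource names below are illustrative.
            "Model": {
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ModelName": ref("ModelName"),
                    "ExecutionRoleArn": ref("RoleArn"),
                    "PrimaryContainer": {
                        "Image": ref("TrainingImage"),
                        "ModelDataUrl": ref("ModelDataUrl"),
                    },
                },
            },
            "EndpointConfig": {
                "Type": "AWS::SageMaker::EndpointConfig",
                "Properties": {
                    "ProductionVariants": [
                        {
                            "VariantName": "variant-1",
                            "ModelName": get_att("Model", "ModelName"),
                            "InitialInstanceCount": ref("InstanceCount"),
                            "InstanceType": ref("InstanceType"),
                            "InitialVariantWeight": 1.0,
                        }
                    ]
                },
            },
            "Endpoint": {
                "Type": "AWS::SageMaker::Endpoint",
                "Properties": {
                    "EndpointConfigName": get_att(
                        "EndpointConfig", "EndpointConfigName"
                    )
                },
            },
        },
    }


print(json.dumps(build_template(), indent=2))
```

The three resources depend on each other through Fn::GetAtt, which is what lets CloudFormation work out the creation order on its own.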

As you can see, we’ll need to pass the container name, the location of the model artefact, and the role as parameters. These strings are not user-friendly, and it would be nicer to work with the training job name.

A bit of Python will help us with that. Thanks to the SageMaker SDK, we can easily describe a training job, extract the values we need and use them to create our stack.
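Here is a rough sketch of what such a script could look like. Note that it uses boto3 directly rather than the SageMaker SDK, and that it is not necessarily the same code as create-stack.py in the repository. The function names and the template file name in the usage note are assumptions.

```python
def stack_parameters(job):
    """Map a DescribeTrainingJob response to CloudFormation stack parameters."""
    values = {
        "ModelName": job["TrainingJobName"],
        "TrainingImage": job["AlgorithmSpecification"]["TrainingImage"],
        "ModelDataUrl": job["ModelArtifacts"]["S3ModelArtifacts"],
        "RoleArn": job["RoleArn"],
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def create_stack(job_name, stack_name, template_path):
    """Describe the training job, then create the stack from its values."""
    # Imported here so that stack_parameters() can be used without AWS access.
    import boto3

    job = boto3.client("sagemaker").describe_training_job(
        TrainingJobName=job_name
    )
    with open(template_path) as template_file:
        template_body = template_file.read()
    return boto3.client("cloudformation").create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=stack_parameters(job),
    )
```

With credentials configured, a call such as create_stack("xgboost-2019-05-09-15-20-51-276", "endpoint-one-model", "endpoint-one-model.yml") would create the stack, where the template file name is hypothetical.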

All right, let’s run this.

$ python create-stack.py
xgboost-2019-05-09-15-20-51-276
{u'StackId': 'arn:aws:cloudformation:eu-west-1:123456789012:stack/endpoint-one-model/06661d80-78b5-11e9-9878-066324f10f26', 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': '065e0749-78b5-11e9-a5d5-af6ed726348b', 'HTTPHeaders': {'x-amzn-requestid': '065e0749-78b5-11e9-a5d5-af6ed726348b', 'date': 'Fri, 17 May 2019 15:04:09 GMT', 'content-length': '388', 'content-type': 'text/xml'}}}

Meh. Let’s take a look at the CloudFormation console.

Pretty much what we expected: first create the model (i.e. “register” the model artifact in S3 as something that can be deployed), then the endpoint configuration, and finally the endpoint.

After a few minutes, the endpoint is in service and visible in the SageMaker console. Woohoo.

Updating the endpoint configuration

Now, let’s say we’d like to add a second instance to this endpoint. Now, don’t you start clicking in the console, ya hear me? This would introduce drift in the stack we just created, and we don’t like that around here.

Instead, let’s define a change set, and check its impact before executing it... which is how battle-scarred veterans manage production systems ;) I’m using the CLI (because), feel free to use your favorite language SDK if that works better for you.

$ aws cloudformation create-change-set \
--stack-name endpoint-one-model \
--change-set-name add-instances \
--use-previous-template \
--parameters ParameterKey="InstanceCount",ParameterValue="2" \
             ParameterKey=ModelName,UsePreviousValue=true \
             ParameterKey="ModelDataUrl",UsePreviousValue=true \
             ParameterKey="TrainingImage",UsePreviousValue=true \
             ParameterKey="RoleArn",UsePreviousValue=true

Basically, we’re telling CloudFormation: “hey, I’d like to use the same template with the same parameters as before, except that I want two instances now. Tell me what’s going to happen”.
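If you’d rather go the SDK route, the same request could be sketched with boto3 along these lines. The helper names are mine, and the client is passed in as an argument (for example boto3.client("cloudformation")) so that the parameter-building logic can be tried without touching AWS.

```python
def change_set_parameters(overrides, keep):
    """Build the Parameters list: new values first, then untouched parameters."""
    parameters = [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in overrides.items()
    ]
    parameters += [
        {"ParameterKey": key, "UsePreviousValue": True} for key in keep
    ]
    return parameters


def create_change_set(client, stack_name, change_set_name, overrides, keep):
    """Create a change set that reuses the stack's current template."""
    return client.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        UsePreviousTemplate=True,
        Parameters=change_set_parameters(overrides, keep),
    )
```

The equivalent of the CLI call above would be create_change_set(client, "endpoint-one-model", "add-instances", {"InstanceCount": 2}, ["ModelName", "ModelDataUrl", "TrainingImage", "RoleArn"]).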

No change is applied at this stage. We could call ‘aws cloudformation describe-change-set’ to see what’s going to happen, or we can look at the console for a nicer experience.
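For the script-minded, here is a small sketch that boils a describe-change-set response down to the columns that matter. The function name is mine, and the sample dictionary is hand-written to mimic the shape of the response; it is not captured output.

```python
def summarize_changes(change_set):
    """Return (logical id, resource type, action, replacement) per change."""
    rows = []
    for change in change_set.get("Changes", []):
        resource = change["ResourceChange"]
        rows.append(
            (
                resource["LogicalResourceId"],
                resource["ResourceType"],
                resource["Action"],
                # Replacement is only reported for 'Modify' actions.
                resource.get("Replacement", "N/A"),
            )
        )
    return rows


# Illustrative sample only: logical ids are hypothetical.
sample_response = {
    "Changes": [
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "EndpointConfig",
                "ResourceType": "AWS::SageMaker::EndpointConfig",
                "Replacement": "Conditional",
            },
        },
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "Endpoint",
                "ResourceType": "AWS::SageMaker::Endpoint",
                "Replacement": "Conditional",
            },
        },
    ]
}

for row in summarize_changes(sample_response):
    print(" | ".join(row))
```

In real life, you would feed it the result of the describe_change_set call of a boto3 CloudFormation client.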

The instance count is defined in the endpoint configuration, which is itself associated with the endpoint, so it makes sense that both resources need to be updated. The harder question is: do they need to be replaced, and will the endpoint go down? A replacement flagged as ‘Conditional’ basically means ‘It depends’ (or ‘Roll the dice’ if you feel sarcastic). At this point, you would need to look at the CloudFormation documentation for these resources, and probably at the SageMaker documentation as well: in a production environment, getting this right is the difference between an invisible change and downtime, so RTFM.

I’ll save you the trouble (this time): the endpoint will not be taken down. A new endpoint configuration will be created, and applied to the endpoint without any downtime.

OK, now that we feel good about this change, let’s apply it.

$ aws cloudformation execute-change-set \
--change-set-name add-instances \
--stack-name endpoint-one-model

Done. Pretty quickly, we see the endpoint going into the ‘Updating’ status in the SageMaker console.

A few minutes later, the change set has been fully applied, and the CloudFormation console tells us the full story: a new endpoint configuration was created and applied to the endpoint.

Let’s take a quick look at the SageMaker console. Yup: the endpoint is ‘InService’ again, and it’s backed by two instances.

Pretty cool. Now let’s take things up a notch, and add a second model to the endpoint.

Adding a second model to the endpoint

Let’s say we’d like to use canary deployment to test a new model on the endpoint, starting with a fraction of live traffic and gradually increasing it to 100% (or rolling back…). In SageMaker terms, we need to create a second production variant running the new model.

Here’s the updated template. We’re simply adding a second Model resource, as well as a second production variant in the EndpointConfig resource. This second variant will get a third of the traffic (0.5/(1+0.5)), meaning that the first one will get two thirds (1/(1+0.5)).
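If the arithmetic feels a bit abstract, here’s a quick sketch: each variant receives its weight divided by the sum of all weights. The function name is mine, and I’m assuming the second variant is called variant-2.

```python
def traffic_shares(weights):
    """Convert variant weights into the fraction of traffic each one gets."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_shares({"variant-1": 1.0, "variant-2": 0.5})
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# variant-1: 66.7%
# variant-2: 33.3%
```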

In a real production setting, we would certainly start lower (say, 10%) and gradually increase that number.
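Going the other way, a tiny helper can tell us which weight to give the new variant to hit a target share, assuming the existing variant keeps a weight of 1. This simply solves share = w / (total_of_others + w) for w; the function name and the example schedule are mine.

```python
def weight_for_share(target_share, other_weights_total=1.0):
    """Weight the new variant needs in order to receive target_share of traffic."""
    if not 0 <= target_share < 1:
        # A share of 1 cannot be reached with a finite weight: set the
        # other variants' weights to 0 instead.
        raise ValueError("target_share must be in [0, 1)")
    return target_share * other_weights_total / (1 - target_share)


# A possible canary schedule: 10%, then 25%, then 50% of live traffic.
for target in (0.10, 0.25, 0.50):
    print(f"{target:.0%} of traffic -> weight {weight_for_share(target):.3f}")
```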

Here, I’d like to deploy another XGBoost model trained with the same container. If you want to use a model trained with another container, simply add another parameter, e.g. TrainingImage2.

Let’s create a change set. This time, we’re using an updated template, so we need to pass its location.

$ export MODEL2_NAME="xgboost-190509-1528-016-39860b48"

$ export MODEL2_ARTEFACT="s3://sagemaker-eu-west-1-123456789012/sagemaker/DEMO-hpo-xgboost-dm/output/xgboost-190509-1528-016-39860b48/output/model.tar.gz"

$ aws cloudformation create-change-set \
--stack-name endpoint-one-model \
--change-set-name add-production-variant \
--template-body "file://endpoint-two-models.yml" \
--parameters \
   ParameterKey=ModelName2,ParameterValue=$MODEL2_NAME \
   ParameterKey="ModelDataUrl2",ParameterValue=$MODEL2_ARTEFACT \
   ParameterKey=ModelName,UsePreviousValue=true \
   ParameterKey="ModelDataUrl",UsePreviousValue=true \
   ParameterKey="TrainingImage",UsePreviousValue=true \
   ParameterKey="RoleArn",UsePreviousValue=true

Let’s check the impact in the console.

CloudFormation will create a new model, and then a new endpoint configuration which will be applied to the endpoint. Here too, the endpoint will stay up during the whole process.

Let’s execute the change set.

$ aws cloudformation execute-change-set --change-set-name add-production-variant --stack-name endpoint-one-model

Once again, the CloudFormation console displays the updates to the stack.

After a few minutes, the update is complete. The SageMaker console confirms that the endpoint is ‘InService’, and that it’s now running two production variants. Note that variant-1 uses a single instance again, as defined in our updated template.

Once we’re satisfied that the model works as intended, we can update the stack again to shift all traffic to variant-2.

Cleaning up

What if we want to take this endpoint down? Super easy: just delete the stack, and CloudFormation will clean everything up automatically.

Careful with the ‘Delete Stack’ API: no confirmation will be asked for! Many DevOps engineers have learned this a bit too late… Needless to say, the AWS world breathed a sigh of relief when stack termination protection was introduced :)

$ aws cloudformation delete-stack --stack-name endpoint-one-model
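For stacks you’d rather not delete by accident, termination protection can be switched on from code as well. Here’s a minimal sketch with boto3; the helper name is mine, and the client is passed in (for example boto3.client("cloudformation")) to keep the function easy to try out.

```python
def protect_stack(client, stack_name, enabled=True):
    """Enable (or disable) termination protection on a CloudFormation stack."""
    return client.update_termination_protection(
        StackName=stack_name,
        EnableTerminationProtection=enabled,
    )
```

Once protection is enabled, a delete request on the stack is rejected until protection is turned off again, e.g. with protect_stack(client, "endpoint-one-model", enabled=False).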

Conclusion

As you can see, you can easily build safe and automated deployment workflows for SageMaker endpoints. Sure, the SageMaker SDK is great for development and testing, but when it comes to production, I don’t think you can beat the predictability and robustness of CloudFormation.

As always, thank you very much for reading. Happy to answer questions here or on Twitter.

Come, come to the Sabbath, let’s deploy Machine Learning models ;)