Michael Wahl for AWS Community Builders

AWS SageMaker and Canvas

In this post, I will walk through a step-by-step process for building cost/price predictions from historical data, without writing a single line of code!

Problem/Challenges
With some relatively small CSV files containing historical data, we can make some powerful predictions or forecasts. Of course, the more data you have, the better, but start with what you have. I am going to follow a Transportation & Logistics use case. There are many others you can follow in the AWS labs, or you can simply upload your own datasets once you have AWS SageMaker up and running, and then choose your own adventure.

Solution(s)
We can use a few different AWS services, including Amazon S3, SageMaker, and SageMaker Canvas. Basically, we will upload the historical data to a private S3 bucket, then use SageMaker Canvas to build a model, analyze it, and finally predict target values.

Preparation
For simplicity, I am going to use a dataset from an AWS lab: Canvas for Transportation & Logistics: Supply Chain delivery on-time.

Steps

Part 1
The first thing we need to do after signing into the AWS Console is to choose a region that works for you; not all regions support SageMaker. I will be using us-east-1. From there, open the SageMaker home page.

Next we need to create a domain: click the Get Started button.

Select Create domain; a new window opens called Set up SageMaker Domain.

Enter a name under Domain name. I called mine forecast, but you can name it whatever makes sense to you. Next, you will need to create a new IAM role to allow access to your AWS account: click Create a new role, then select Any S3 bucket and click Create New Role. Once it's done, click Submit at the bottom of the page.

If you encounter any errors, they will be displayed at the top, but if you follow these steps you should be fine. There is also an option to limit the IAM policy to specific S3 buckets; I decided not to limit it. I named my S3 bucket "sagemaker-forecasting".
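For anyone curious what the console wizard does behind the scenes, here is a minimal boto3 sketch of creating an equivalent role. The role name is hypothetical, and the two broad managed policies are my assumption mirroring the "Any S3 bucket" choice, not the exact policy the console generates.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets SageMaker assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="SageMakerCanvasForecastRole",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Broad managed policies, mirroring the "Any S3 bucket" console choice
for policy_arn in [
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]:
    iam.attach_role_policy(
        RoleName="SageMakerCanvasForecastRole",
        PolicyArn=policy_arn,
    )

print(role["Role"]["Arn"])
```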

At this point, it will take some time to create the domain, so now is a great time to grab a coffee or a cold beverage, or to make sure you have your data ready to upload to S3.
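If you would rather script the domain creation than click through the console, a rough boto3 equivalent looks like the sketch below. The VPC ID, subnet IDs, and role ARN are placeholders for values from your own account.

```python
import boto3

sm = boto3.client("sagemaker")

response = sm.create_domain(
    DomainName="forecast",
    AuthMode="IAM",
    DefaultUserSettings={
        # Execution role from the previous step (placeholder account ID)
        "ExecutionRole": "arn:aws:iam::123456789012:role/SageMakerCanvasForecastRole",
    },
    VpcId="vpc-0abc1234",           # placeholder: your VPC
    SubnetIds=["subnet-0abc1234"],  # placeholder: one or more subnets in that VPC
)
print(response["DomainArn"])
```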

As I mentioned, we will be using a synthetic dataset called the Shipping Logs Dataset. It contains complete shipping data for all delivered products, including estimated shipping time, shipping priority, carrier, and origin. The dataset can be downloaded here.
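Before uploading, it can be worth a quick local sanity check of the file. Here is a small pandas sketch, assuming you saved the download as ShippingLogs.csv in your working directory:

```python
import pandas as pd

# Load the downloaded dataset and take a quick look at its shape and columns
df = pd.read_csv("ShippingLogs.csv")
print(df.shape)    # (rows, columns)
print(df.dtypes)   # column names and types
print(df.head())   # first few shipping records
```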

Next we need to upload the dataset to an Amazon S3 bucket. If you are not familiar with it, an S3 bucket is simply a place to store and retrieve data in the cloud. You can create a new bucket or use an existing one; I will be using the S3 bucket I created earlier when we set up the domain.

If you need to create a new S3 bucket, simply go to the AWS Console and search for S3 or click here.

Click the Create bucket button. Give the bucket a unique name and keep all other parameters at their defaults. Once the S3 bucket is created and ready, click Upload and select the dataset (the CSV file from above); in my example it's the ShippingLogs.csv file.
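The same two console steps, sketched with boto3. Remember that bucket names are globally unique, so "sagemaker-forecasting" was taken the moment I used it; pick your own.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# In us-east-1 no LocationConstraint is needed; other regions require one
s3.create_bucket(Bucket="sagemaker-forecasting")  # use your own unique name

# Upload the CSV so Canvas can import it later
s3.upload_file(
    Filename="ShippingLogs.csv",
    Bucket="sagemaker-forecasting",
    Key="ShippingLogs.csv",
)
```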

At this point we can head back to the SageMaker console and check whether the domain is ready; it should now have a status of InService. Click on the domain name once it's ready and InService.
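Instead of refreshing the console, you can also poll the status with boto3. This sketch assumes the domain we just created is the only one in the account:

```python
import time
import boto3

sm = boto3.client("sagemaker")

# Grab the first (and, here, only) domain in the account
domain_id = sm.list_domains()["Domains"][0]["DomainId"]

# Poll until the domain reaches InService
while True:
    status = sm.describe_domain(DomainId=domain_id)["Status"]
    print(status)
    if status == "InService":
        break
    time.sleep(30)
```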

Select Launch to the right of the user profile and choose Canvas. The first time you log in to SageMaker Canvas, it will show an introductory prompt; you can skip this for now.
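As an aside, you can also generate a temporary presigned login URL for the domain programmatically; the domain ID and user profile name below are placeholders for your own values:

```python
import boto3

sm = boto3.client("sagemaker")

# Generates a temporary, presigned URL into the SageMaker domain
url = sm.create_presigned_domain_url(
    DomainId="d-xxxxxxxxxxxx",       # placeholder: your domain ID
    UserProfileName="default-user",  # placeholder: the profile created with the domain
)["AuthorizedUrl"]
print(url)
```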


On the left side you will see a menu; click Datasets and then click Import. Select Amazon S3 as the data source, select your CSV file, and click the Import data button at the bottom. You will see an option to preview the first 100 rows before importing. If you already see canvas-sample-shipping-logs.csv, you can skip the import step and instead click the Join datasets button. Now drag and drop the two datasets from the left side of the screen to the right: canvas-sample-product-descriptions and canvas-sample-shipping-logs (or the name of the dataset you uploaded to S3). Make sure that ProductID is used as the join key by clicking on the join icon. Finally, click Save & close to save this dataset and give it a name such as Logistics or SupplyChain.
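For the curious, the join Canvas performs here is equivalent to a simple pandas merge. This local sketch assumes you downloaded both sample CSVs, and that the join key column is spelled ProductID as shown in the Canvas UI (check the exact casing in your files):

```python
import pandas as pd

shipping = pd.read_csv("canvas-sample-shipping-logs.csv")
products = pd.read_csv("canvas-sample-product-descriptions.csv")

# Inner join on the shared product identifier, as configured in Canvas
joined = shipping.merge(products, on="ProductID", how="inner")
print(joined.shape)

# Save the combined dataset, mirroring the "Save & close" step
joined.to_csv("SupplyChain.csv", index=False)
```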


We are ready to build and train our new model.

Select the dataset with the checkbox on the left, then click Create Model at the top right, give your model a name, and click Create.


The first step in training our ML model is to choose the target variable. For this example, the target is the ExpectedShippingDays variable; this is what we want to train the model to predict. Canvas will automatically detect that this is a numeric prediction problem (also known as regression). Taking a quick pause here: this is where you may have something slightly different to predict as you experiment with different datasets and use cases. I won't cover it in this post, but we could also build a similar model for a classification use case by predicting OnTimeDelivery instead of ExpectedShippingDays.
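For context, Canvas's model building is backed by SageMaker's AutoML machinery, so a rough API-level equivalent of this training setup might look like the sketch below. The job name, S3 paths, and role ARN are placeholders; for the classification variant you would change ProblemType and the target column.

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_auto_ml_job(
    AutoMLJobName="shipping-days-forecast",  # placeholder job name
    ProblemType="Regression",                # numeric prediction
    AutoMLJobObjective={"MetricName": "MSE"},
    InputDataConfig=[{
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://sagemaker-forecasting/SupplyChain.csv",  # placeholder path
            }
        },
        # Column the model learns to predict
        "TargetAttributeName": "ExpectedShippingDays",
    }],
    OutputDataConfig={"S3OutputPath": "s3://sagemaker-forecasting/output/"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerCanvasForecastRole",  # placeholder
)
```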

To train a preview model, click the Preview model button at the top right of the screen. This can take some time; anywhere from 2-10 minutes is what I saw. When training a full model, Canvas provides two options: Quick Build (faster, but less accurate) and Standard Build (slower, but generally more accurate).

Once done, Canvas will automatically move to the Analyze tab to show us the results of our quick training.

At this point we can move on to Predict, which will give us the predicted values of ExpectedShippingDays for our example.
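Canvas handles predictions entirely in the UI, but as one final hedged sketch: if you later deployed the model to a real-time endpoint (for example, after sharing it to SageMaker Studio), invoking it from code could look roughly like this. The endpoint name and the feature row are entirely hypothetical.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# One CSV row of feature values; the actual columns depend on your dataset
payload = "1,0,UPS,NewYork"  # hypothetical example row

response = runtime.invoke_endpoint(
    EndpointName="shipping-days-endpoint",  # hypothetical endpoint name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode())  # the predicted ExpectedShippingDays
```

And that's the full journey: historical CSVs in S3, a joined dataset, a trained model, and predictions, all without leaving SageMaker Canvas.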
