selvakumar palanisamy

Posted on Sep 12, 2021

Machine learning - AWS ML

#aws #machinelearning #datascience

Machine Learning

There is an increasing focus on data and machine learning these days. Machine learning helps us make informed decisions, and we now have the data and scientific methods to derive more intelligent and accurate forecasts.

Machine learning gives systems the ability to learn and improve automatically from experience without being explicitly programmed. AWS ML is a robust, cloud-based service that makes it easy for developers of all skill levels to use machine learning technology.

Steps to build an ML model.

1. Gather the dataset.

2. Perform data analysis to produce a low-dimensional representation of a higher-dimensional dataset, using unsupervised machine learning techniques such as PCA (principal component analysis) and SOM (self-organizing maps).

3. Clean and curate the data to get the pre-processed dataset.

4. Split the dataset into two parts:
   i) Training dataset (80% of the pre-processed dataset)
   ii) Test dataset (20% of the pre-processed dataset)

5. Apply machine learning algorithms to train the model, then validate and test it.

6. Derive the predicted values from the model.

7. Evaluate the model's performance using a confusion matrix and other ML metrics.

8. Iterate through this process until the predictions are sufficiently accurate.

9. Deploy the model for use in production (enterprise level).
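The split in step 4 and the confusion matrix in step 7 can be sketched in a few lines of plain Python. This is only an illustration: the helper names `train_test_split`, `confusion_matrix`, and `accuracy` are made up for this post, and the labels are toy values, not real results.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# 100 toy records split 80/20
data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))

# Toy labels, only to show the bookkeeping
cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(cm, accuracy(cm))
```

In practice a library such as scikit-learn offers equivalent helpers; the plain version here just makes the arithmetic visible.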

AWS ML

Amazon ML supports three types of ML models.

Binary classification

Multi-class classification

Regression

Binary classification, multi-class classification, and regression all come under supervised learning, because each model is trained on labeled examples.

Binary classification: Predicts values that can have only two categories, such as true or false.

Multi-class classification: Predicts values that belong to a limited set of predefined categories.

Regression (squared loss function + SGD): Predicts a numeric value.
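To make "squared loss function + SGD" concrete, here is a minimal sketch of a one-feature linear regression trained with stochastic gradient descent. It is not Amazon ML's implementation; the function name, learning rate, epoch count, and toy data are all assumptions chosen for illustration.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b by SGD on the squared loss 0.5 * (prediction - y)**2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the squared loss with respect to w and b
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys)
print(w, b)  # should be close to 2 and 1
```

Each update uses a single sample, which is what makes the descent "stochastic"; batch gradient descent would average the gradient over all samples before each step.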

The table below provides details about the ML algorithms.

AWS ML STACK

Amazon SageMaker is a machine learning platform that abstracts away much of the software development work needed to build and deploy ML models.

SageMaker supports frameworks like TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, Horovod, Scikit-learn, and Deep Graph Library.

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning that helps developers set up an end-to-end ML solution.

Simple Machine learning Pipeline

An automated ML pipeline can be created using AWS CodePipeline and SageMaker.

Build: Compile and build your custom application source code and output a Docker image.

Training: Run training on the Docker image with the SageMaker API and output the model artefacts to an S3 bucket.

SageMaker Jupyter notebook to build, train, and deploy the model

Let's see how to build a SageMaker Jupyter notebook to solve the common bike-sharing demand prediction problem.

Problem: You are provided hourly rental data spanning two years. The training set comprises the first 19 days of each month, while the test set runs from the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
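The day-of-month rule in the problem statement can be sketched as below. The column name `datetime`, its timestamp format, and the sample rows are assumptions made for illustration; adjust them to match the actual dataset.

```python
from datetime import datetime


def split_by_day_of_month(rows, last_training_day=19):
    """Put rows dated on or before `last_training_day` in train, the rest in test."""
    train, test = [], []
    for row in rows:
        day = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S").day
        (train if day <= last_training_day else test).append(row)
    return train, test


# Made-up sample rows; test rows have no known count yet
sample = [
    {"datetime": "2011-01-01 00:00:00", "count": 10},
    {"datetime": "2011-01-19 23:00:00", "count": 25},
    {"datetime": "2011-01-20 00:00:00", "count": None},
    {"datetime": "2011-01-31 12:00:00", "count": None},
]
train_rows, test_rows = split_by_day_of_month(sample)
print(len(train_rows), len(test_rows))
```

Because the test periods sit between training periods, a validation set carved out of the training rows is the only labeled data available for checking the model before predicting.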

First, import libraries

import numpy as np
import pandas as pd
import boto3
import re
import sagemaker
from sagemaker import get_execution_role

Upload Data to S3

# Specify your bucket name
bucket_name = 'selva-ml-sagemaker'

training_folder = r'bikerental/training/'
validation_folder = r'bikerental/validation/'
test_folder = r'bikerental/test/'

s3_model_output_location = r's3://{0}/bikerental/model'.format(bucket_name)
s3_training_file_location = r's3://{0}/{1}'.format(bucket_name,training_folder)
s3_validation_file_location = r's3://{0}/{1}'.format(bucket_name,validation_folder)
s3_test_file_location = r's3://{0}/{1}'.format(bucket_name,test_folder)

def write_to_s3(filename, bucket, key):
    with open(filename,'rb') as f: # Read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_fileobj(f)

write_to_s3('bike_train.csv', 
            bucket_name,
            training_folder + 'bike_train.csv')

write_to_s3('bike_validation.csv',
            bucket_name,
            validation_folder + 'bike_validation.csv')

write_to_s3('bike_test.csv',
            bucket_name,
            test_folder + 'bike_test.csv')
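For CSV training data, the built-in SageMaker XGBoost algorithm expects the target variable in the first column and no header row. The sketch below shows one way the uploaded files could be laid out that way. The helper name `to_xgboost_csv`, the feature names, and the sample values are assumptions for illustration only.

```python
import csv
import io


def to_xgboost_csv(rows, target, features):
    """Return CSV text with the target in the first column and no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[target]] + [row[name] for name in features])
    return buffer.getvalue()


# Made-up rows with a few hypothetical feature columns
rows = [
    {"season": 1, "temp": 9.84, "humidity": 81, "count": 16},
    {"season": 1, "temp": 9.02, "humidity": 80, "count": 40},
]
csv_text = to_xgboost_csv(rows, target="count", features=["season", "temp", "humidity"])
print(csv_text)
```

The test file sent for prediction should use the same feature order, but without the target column.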

Build

use_spot_instances = True
max_run = 3600 # in seconds
max_wait = 7200 if use_spot_instances else None # in seconds

job_name = 'xgboost-bikerental-v1'

checkpoint_s3_uri = None

if use_spot_instances:
    checkpoint_s3_uri = f's3://{bucket_name}/bikerental/checkpoints/{job_name}'

print(f'Checkpoint uri: {checkpoint_s3_uri}')

sess = sagemaker.Session()
role = get_execution_role()
container = sagemaker.image_uris.retrieve("xgboost",sess.boto_region_name,version="1.2-2")

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=s3_model_output_location,
    sagemaker_session=sess,
    base_job_name = job_name,
    use_spot_instances=use_spot_instances,
    max_run=max_run,
    max_wait=max_wait,
    checkpoint_s3_uri=checkpoint_s3_uri)

estimator.set_hyperparameters(max_depth=5,
                              objective="reg:squarederror",
                              eta=0.1,
                              num_round=150)

estimator.hyperparameters()


# TrainingInput is documented under sagemaker.inputs in SageMaker Python SDK v2
training_input_config = sagemaker.inputs.TrainingInput(
    s3_data=s3_training_file_location,
    content_type='csv',
    s3_data_type='S3Prefix')

validation_input_config = sagemaker.inputs.TrainingInput(
    s3_data=s3_validation_file_location,
    content_type='csv',
    s3_data_type='S3Prefix')

data_channels = {'train': training_input_config, 'validation': validation_input_config}

Train and deploy

estimator.fit(data_channels)
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m5.xlarge',
                             endpoint_name=job_name)

Run Predictions

from sagemaker.serializers import CSVSerializer
predictor.serializer = CSVSerializer()
# Replace 'input' with a CSV row of feature values
predictor.predict('input')
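The prediction call takes CSV text and returns raw bytes, so two small helpers are handy. This is a sketch under assumptions: the helper names are hypothetical, the feature values are made up, and the response is assumed to be numbers separated by commas or newlines, which can vary with the container version.

```python
def to_csv_payload(feature_rows):
    """Join rows of feature values into newline-separated CSV text."""
    return "\n".join(",".join(str(value) for value in row) for row in feature_rows)


def parse_predictions(raw):
    """Parse an endpoint response (bytes or str) into a list of floats."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


# Made-up feature rows, in the same column order used for training
payload = to_csv_payload([[1, 9.84, 81], [1, 9.02, 80]])
print(payload)

# Made-up response bytes, standing in for what an endpoint might return
print(parse_predictions(b"12.5,40.25\n"))

# With a live endpoint (not run here):
# predictions = parse_predictions(predictor.predict(payload))
# predictor.delete_endpoint()  # avoid charges for an idle endpoint
```

Deleting the endpoint once you are done matters because a deployed endpoint is billed for as long as it is running.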

Happy Machine learning coding

DEV Community