An image classifier is a supervised learning tool: using deep learning techniques, mainly convolutional neural networks (CNNs), it learns to extract features, shapes, and textures from images in order to produce a classification model whose accuracy meets the requirements of the business context.
To build an image classification model, we first need records, i.e. images of the classes relevant to our problem, and then follow the modeling life cycle widely studied in machine learning (ML). This life cycle rests on four fundamental pillars: data engineering, application of artificial intelligence algorithms, evaluation of the resulting model's performance, and finally deployment and monitoring of the solution.
Each of these stages involves long development times, both to reach the best model and to deploy and use it within an organization, if suitable tools or work environments are not considered; the result can be a solution that is overfitted at the code level, not scalable from a deployment point of view, and therefore without business impact. In this post we will build an image classification model using the pre-built solutions offered by SageMaker Studio through the JumpStart service, adding a click-based deployment alternative within the AWS cloud. In detail, we will cover the following points:
- JumpStart service review
- Development of a solution using the ResNet 152 algorithm from the family of models offered by PyTorch.
- Solution deployment using a Lambda function and email notifications via Simple Notification Service (SNS).
JumpStart in SageMaker Studio
JumpStart can be understood as the evolution of AWS's built-in algorithms concept: it offers a series of algorithms that are not only pre-built but also pre-trained, such as computer vision models.
At the same time, JumpStart allows you to train a model with your own data and deploy the solution in the cloud with a few clicks. The following image shows all the JumpStart options:
In the example developed below, we will walk through the simple steps needed to build a deep learning solution with the JumpStart service.
Practical example
Our case applies to the classification of construction materials, where there are 6 types of objects, i.e. 6 classes that our machine learning model must identify. For this case we selected the ResNet 152 model from the PyTorch framework, one of the many solutions offered by JumpStart. This algorithm has the best performance within the ResNet family, proposed by Kaiming He et al., 2015 in the article Deep Residual Learning for Image Recognition.
One of the particularities of this algorithm is its depth: 152 layers.
Data
The amount of information per class is sufficient to start iterating directly with JumpStart: the smallest class has close to 300 observations and the largest about 2,000 records. Using code in a Jupyter notebook, we validate the extension of each file, since images must be in .jpg format, and generate one dataset for train and another for test, in a proportion of 90/10 respectively.
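A minimal sketch of that notebook step, assuming the images sit in local folders named after each class (the paths and seed are illustrative):

import random
import shutil
from pathlib import Path

SRC = Path("data")      # assumed layout: data/<class-name>/*.jpg
DST = Path("dataset")
random.seed(42)

for class_dir in SRC.iterdir():
    if not class_dir.is_dir():
        continue
    # Keep only .jpg files, the format required for training.
    images = [p for p in class_dir.iterdir() if p.suffix.lower() == ".jpg"]
    random.shuffle(images)
    split = int(len(images) * 0.9)  # 90/10 train/test proportion
    for subset, files in (("train", images[:split]), ("test", images[split:])):
        out = DST / subset / class_dir.name
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out / f.name)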
Train and evaluation
Once the data is staged in S3, we launch the SageMaker Studio service, which facilitates the use of JumpStart. Once the Studio interface is up, we go to the section containing the models, search for the ResNet 152 algorithm, select it, and fill in the information requested for training with the data from our exercise.
Finally, in 3 clicks we are already training a solution, which takes approximately 20 minutes to finish, with the training metrics tracked in CloudWatch.
The most important output of this process is our .pth file compressed into a .tar.gz file, also called the model artifact.
Batch Transform
A common alternative is to test by deploying an endpoint and configuring a routine that runs inference over a batch of images. However, a question arose: what would it cost to infer over thousands of images that do not necessarily need real-time responses? We therefore decided to explore how to run a batch job with the artifact generated by the JumpStart training.
For this process, the key requirement is that the model artifact must have a specific structure, including a Python inference script, the file we usually call the entry point when developing a model locally. The structure of that artifact is shown below:
model.tar.gz
|-- label_info.json
|-- model.pth
`-- code
    |-- __init__.py
    |-- inference.py
    |-- version
    `-- lib
        `-- constants
            |-- __init__.py
            `-- constants.py
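If you need to inspect or rebuild the artifact by hand, the standard library is enough; a sketch, where the local file names are illustrative:

import tarfile

# Unpack the artifact downloaded from S3.
with tarfile.open("model.tar.gz", "r:gz") as tar:
    tar.extractall("artifact")

# After placing code/inference.py and its helpers, repackage with the same layout.
with tarfile.open("model_repacked.tar.gz", "w:gz") as tar:
    for name in ("label_info.json", "model.pth", "code"):
        tar.add(f"artifact/{name}", arcname=name)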
Where the file inference.py contains the following functions:
import json
import logging
import os
import re

import numpy as np
import torch
from constants import constants
from PIL import Image
from sagemaker_inference import content_types
from sagemaker_inference import encoder
from six import BytesIO

logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class PyTorchIC:
    """Model class that wraps around the Torch model."""

    def __init__(self, model, model_dir):
        """Image classification class that wraps around the Torch model.

        Stores the model inside the class and reads the labels from a JSON file in the model directory.
        """
        self.model = model
        with open(os.path.join(model_dir, constants.LABELS_INFO)) as f:
            self.labels = json.loads(f.read())[constants.LABELS]

    def forward(self, tensors):
        """Make a forward pass."""
        input_batch = tensors.unsqueeze(0)
        input_batch = input_batch.to(DEVICE)
        with torch.no_grad():
            output = self.model(input_batch)
        return torch.nn.functional.softmax(output[0], dim=0)

    @classmethod
    def tensorize(cls, input_data):
        """Prepare the image, return the tensor."""
        try:
            from torchvision import transforms
        except ImportError:
            raise
        transform = transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ]
        )
        return transform(input_data).to(DEVICE)

    @classmethod
    def decode(cls, input_data, content_type):
        """Decode input with content_type."""
        _ = content_type
        return input_data

    @classmethod
    def encode(cls, predictions, accept):
        """Encode results with accept."""
        return encoder.encode(predictions, accept)

    def __call__(self, input_data, content_type=content_types.JSON, accept=content_types.JSON, **kwargs):
        """Decode the image, tensorize it, make a forward pass and return the encoded prediction."""
        input_data = self.decode(input_data=input_data, content_type=content_type)
        tensors = self.tensorize(input_data)
        predictions = self.forward(tensors)
        predictions = predictions.cpu()
        output = {constants.PROBABILITIES: predictions}
        if accept.endswith(constants.VERBOSE_EXTENSION):
            output[constants.LABELS] = self.labels
            predicted_label_idx = np.argmax(predictions)
            output[constants.PREDICTED_LABEL] = self.labels[predicted_label_idx]
            accept = accept.rstrip(constants.VERBOSE_EXTENSION)
        return self.encode(output, accept=accept)


def model_fn(model_dir):
    """Create our inference task as a delegate to the model.

    This runs only once per worker.
    """
    for root, dirs, files in os.walk(model_dir):
        for file in files:
            if re.compile(".*\\.pth").match(file):
                # Build the full path so loading does not depend on the working directory.
                checkpoint = os.path.join(root, file)
                try:
                    model = torch.load(checkpoint)
                    if torch.cuda.is_available():
                        model.to("cuda")
                    model.eval()
                    return PyTorchIC(model=model, model_dir=model_dir)
                except Exception:
                    logging.exception("Failed to load model")
                    raise


def transform_fn(task: PyTorchIC, input_data, content_type, accept):
    """Make predictions against the model and return a serialized response.

    The function signature conforms to the SM contract.

    Args:
        task (obj): model loaded by model_fn, in our case is one of the Task.
        input_data (obj): the request data.
        content_type (str): the request content type.
        accept (str): accept header expected by the client.

    Returns:
        obj: the serialized prediction result or a tuple of the form
            (response_data, content_type)
    """
    # input_data = decoder.decode(input_data, content_type)
    if content_type == "application/x-image":
        input_data = Image.open(BytesIO(input_data)).convert("RGB")
        try:
            output = task(input_data=input_data, content_type=content_type, accept=accept)
            return output
        except Exception:
            logging.exception("Failed to do transform")
            raise
    raise ValueError('{{"error": "unsupported content type {}"}}'.format(content_type or "unknown"))
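Before running batch jobs, the pair of handlers can be smoke-tested locally; a sketch, assuming an extracted artifact directory, a sample image, and that sagemaker_inference and the constants module are importable (the ";verbose" accept suffix assumes the value of constants.VERBOSE_EXTENSION):

# Hypothetical local smoke test for model_fn/transform_fn above.
task = model_fn("artifact")  # directory containing model.pth and label_info.json
with open("sample.jpg", "rb") as f:
    payload = f.read()
print(transform_fn(task, payload, "application/x-image", "application/json;verbose"))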
With this structure in place, we run a batch transform job for each class and build a multi-class confusion matrix from the results.
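For reference, such a job can also be launched programmatically with the SageMaker Python SDK; a sketch, where the S3 paths, IAM role, and instance type are hypothetical placeholders:

from sagemaker.pytorch import PyTorchModel

# Wrap the JumpStart artifact (S3 paths and the role are placeholders).
model = PyTorchModel(
    model_data="s3://my-bucket/jumpstart-output/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="1.10",
    py_version="py38",
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    output_path="s3://my-bucket/batch-output/",
)

# Run inference over every image stored under one class prefix.
transformer.transform(
    data="s3://my-bucket/test/class-1/",
    content_type="application/x-image",
)
transformer.wait()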
While the metrics show high values, two of the classes, class-1 and class-4, do not perform well on data outside the training sample, so we decided to apply data augmentation to those classes.
Data Augmentation
For this process we used functions predefined by PyTorch: transforms that are applied to the images to obtain new files, incorporating modifications to the base image. The transforms applied are shown below:
from PIL import Image
from torchvision import transforms as tr
from torchvision.transforms import Compose

# Load a sample image (the path is illustrative).
im = Image.open("sample.jpg").convert("RGB")

# Chain two random rotations: each picks an angle in [-degrees, +degrees].
pipeline1 = Compose(
    [
        tr.RandomRotation(degrees=90),
        tr.RandomRotation(degrees=270),
    ]
)

augmented_image1 = pipeline1(im)
augmented_image1.show()
An example is shown below. The original image:
The new image generated:
The process was applied to the training data of the two lowest-performing classes: we randomly selected 20% of the images from each class and ran these transformations on them. After retraining the algorithm, we obtained a new confusion matrix with improved metrics across this evaluation tool for classification problems, and this became the model to deploy.
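A sketch of that selection step, under the same folder layout assumed earlier (class names, paths, and seed are illustrative):

import random
from pathlib import Path

from PIL import Image
from torchvision import transforms as tr

random.seed(0)
pipeline = tr.Compose([tr.RandomRotation(degrees=90), tr.RandomRotation(degrees=270)])

for class_name in ("class-1", "class-4"):
    images = list(Path(f"dataset/train/{class_name}").glob("*.jpg"))
    # Augment a random 20% of the images of each weak class.
    for path in random.sample(images, k=int(len(images) * 0.2)):
        augmented = pipeline(Image.open(path).convert("RGB"))
        augmented.save(path.with_name(path.stem + "_aug.jpg"))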
Model use: inference notification by email
For this exercise, we chose to use an AWS Lambda function, which invokes the model whenever an image lands in an S3 bucket.
We also added an e-mail notification for every inference: each time an image is uploaded to the bucket, the user receives the model's prediction by e-mail.
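A minimal sketch of such a handler, assuming the model sits behind a real-time endpoint; the endpoint name and topic ARN are hypothetical placeholders, and the verbose response key follows the constants used by the inference script:

import json

import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")
sns = boto3.client("sns")

ENDPOINT_NAME = "jumpstart-pt-ic-resnet152"                         # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:inference-results"  # placeholder

def lambda_handler(event, context):
    # Triggered by S3: read the uploaded image.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    image = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Ask the endpoint for a verbose response, which includes the predicted label.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",
        Accept="application/json;verbose",
        Body=image,
    )
    result = json.loads(response["Body"].read())

    # Notify subscribers by e-mail through the SNS topic.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="New inference result",
        Message=json.dumps({"image": key, "predicted_label": result.get("predicted_label")}),
    )
    return {"statusCode": 200}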
Closing words
In this post we developed an end-to-end process with deep learning algorithms, applying the four fundamental pillars of artificial intelligence modeling and building a simple application on top of them. The next challenge is to deploy this model through the AWS MLOps orchestrator and test the consumption of the solution under a production architecture. The core value of this publication was to demonstrate how the JumpStart service lets us obtain artificial intelligence solutions in a few clicks, training with your own data and deploying the solution in the cloud.