DEV Community

Architecting MLOps for Computer Vision using AWS SageMaker

In this post, we look at Machine Learning Operations (MLOps) and how it streamlines the process of creating, deploying, and managing machine learning models. We also walk through a practical computer vision use case to show how MLOps principles apply in the real world and how AWS SageMaker supports each stage of the journey.

What is MLOps?

Let's start with the basics. MLOps, short for Machine Learning Operations, is about making the process of creating, deploying, and managing machine learning models as smooth and efficient as possible. It takes the principles of DevOps, which help streamline software development, and applies them to machine learning. Think of it as the engine that powers the machine learning lifecycle, making it easier to build and deploy intelligent solutions.

https://neptune.ai/blog/mlops

Now, let's break MLOps down into four key principles that guide us through this journey. First, 'Version control' means keeping track of changes to our machine learning assets, just as you would with documents. Next, 'Automation' means creating repeatable and consistent processes using ML pipelines. Then, 'Continuous X' covers continuous training and deployment of models so that they stay up to date. Lastly, 'Model governance' gives us a structured process to review and validate our models so that we can rely on them.

There are also three levels of MLOps implementation, which we can think of as steps in our journey. Level 0 is the manual phase, where everything is done by hand. Level 1 brings in continuous model training through pipelines. Level 2 is the gold standard: it covers everything from data collection and training all the way to deployment in one seamless MLOps process.

Use case

Before we design the pipeline, consider this practical scenario: improving quality control in a metal luggage tag manufacturing line using machine learning. Our goal is to detect defects such as scratches in real time, with inference running either at the edge for immediate monitoring or in a cloud environment like AWS SageMaker.

https://aws.amazon.com/blogs/machine-learning/

MLOps pipeline

Now, let's outline the architecture of our MLOps pipeline. At this stage, we're focusing on the big picture, not specific technologies. Our goal is to automate the entire process, starting with capturing raw images of metal tags using an edge camera. We then label these images with bounding boxes and keep the labels under version control for traceability. Next, we train, fine-tune, and evaluate the model before deploying it, either at the edge for real-time inference or in the cloud with AWS SageMaker.
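
One simple way to make labels traceable is to derive a version ID from the label contents themselves, so that any change to the annotations produces a new ID. The snippet below is a minimal sketch of that idea and not the mechanism used in this architecture; the function name `label_set_version` and the record layout are assumptions made for illustration.

```python
import hashlib
import json


def label_set_version(labels):
    """Return a short, deterministic version ID for a set of bounding-box labels.

    `labels` is a list of dicts such as
    {"image": "tags/0001.png", "boxes": [[x, y, w, h], ...]} (hypothetical layout).
    """
    # Sort records by image key so the ID does not depend on listing order.
    canonical = sorted(labels, key=lambda rec: rec["image"])
    # Serialize with sorted keys and fixed separators for a stable byte string.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


labels_v1 = [
    {"image": "tags/0002.png", "boxes": []},
    {"image": "tags/0001.png", "boxes": [[34, 50, 20, 8]]},
]
# Adding a scratch annotation to the second image yields a new version ID.
labels_v2 = [
    {"image": "tags/0001.png", "boxes": [[34, 50, 20, 8]]},
    {"image": "tags/0002.png", "boxes": [[10, 12, 5, 30]]},
]

print(label_set_version(labels_v1))
print(label_set_version(labels_v2))
```

Storing this ID next to each trained model would let us trace a model back to the exact label set it was trained on.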

MLOps Solution Architecture

Let's now take a closer look at how we can build this architecture using AWS services. We've designed a workflow that integrates these services across the whole MLOps pipeline. We start by storing raw image data in Amazon S3 for cost-efficiency. Next, we orchestrate the labeling process with AWS Step Functions and SageMaker Ground Truth, using AWS Lambda for data preparation. For model training and evaluation, we rely on SageMaker Pipelines, SageMaker Training jobs, and SageMaker Processing jobs. The trained model is registered in Amazon SageMaker Model Registry for management and version control. For edge deployment, we automate the process with Step Functions and use AWS IoT Greengrass as the runtime environment on the edge device. Alternatively, we can deploy the model to SageMaker endpoints in the cloud for flexibility. Because it relies on managed and serverless services, this architecture keeps operational effort low while maintaining the integrity of our MLOps pipeline.

CI/CD pipeline

Now, let's explore how we integrate our data labeling, model training, and edge deployment steps into a fully automated CI/CD system using AWS services. We use AWS CodeCommit as our Git repository and the AWS CDK to automate infrastructure deployment, which allows each part to be deployed independently. We use AWS CodePipeline to automate both code and infrastructure deployments, with separate pipelines for assets and workflows. With this setup, changes in our code or data trigger the orchestration process, so the end-to-end pipeline responds to both code updates and data modifications.
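
To make the idea of independent deployments concrete, here is a small sketch of how a change in the repository could be mapped to the part of the system that needs to be redeployed. This is illustrative only: the folder names, the pipeline names, and the path-based routing itself are assumptions, not details of the actual CodePipeline setup.

```python
from pathlib import PurePosixPath

# Hypothetical repository layout: top-level folder -> pipeline to re-run.
PIPELINE_BY_FOLDER = {
    "labeling": "labeling-pipeline",
    "training": "training-pipeline",
    "deployment": "deployment-pipeline",
}


def pipelines_to_trigger(changed_paths):
    """Return the sorted names of pipelines affected by a list of changed file paths."""
    triggered = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Files outside the known folders (e.g. README.md) trigger nothing.
        if parts and parts[0] in PIPELINE_BY_FOLDER:
            triggered.add(PIPELINE_BY_FOLDER[parts[0]])
    return sorted(triggered)


# A commit that touches only the training code re-runs only the training pipeline.
print(pipelines_to_trigger(["training/train.py", "README.md"]))
```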

Orchestrating Data Labeling

We use AWS Step Functions to automate our labeling pipeline, which comprises four major steps: checking for new images, preparing the data, starting the labeling and label verification jobs, and writing the final labels to the feature store.
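
The steps above can be sketched as a Step Functions state machine definition written in Amazon States Language and built here as a Python dictionary. Treat it as a simplified outline under several assumptions: the state names, the Lambda function names, the `$.new_image_count` field, and the extra Choice state that stops early when there is nothing to label are all hypothetical, and the labeling and verification jobs are shown as two separate states. A real workflow would also have to wait for the Ground Truth jobs to finish.

```python
import json


def lambda_arn(name):
    """Build a placeholder Lambda ARN; REGION and ACCOUNT_ID must be filled in."""
    return f"arn:aws:lambda:REGION:ACCOUNT_ID:function:{name}"


def task(function_name, next_state=None):
    """Return a Task state that invokes a (hypothetical) Lambda function."""
    state = {"Type": "Task", "Resource": lambda_arn(function_name)}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


labeling_workflow = {
    "Comment": "Sketch of the labeling pipeline",
    "StartAt": "CheckForNewImages",
    "States": {
        "CheckForNewImages": task("check-for-new-images", "AnyNewImages"),
        # Assumed addition: skip the rest of the workflow when no images arrived.
        "AnyNewImages": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.new_image_count",
                    "NumericGreaterThan": 0,
                    "Next": "PrepareData",
                }
            ],
            "Default": "NothingToLabel",
        },
        "NothingToLabel": {"Type": "Succeed"},
        "PrepareData": task("prepare-data", "RunLabelingJob"),
        "RunLabelingJob": task("start-labeling-job", "RunVerificationJob"),
        "RunVerificationJob": task("start-verification-job", "WriteLabels"),
        "WriteLabels": task("write-labels-to-feature-store"),
    },
}


def dangling_transitions(definition):
    """Return transition targets that do not name a state in the definition."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.append(state[key])
        for choice in state.get("Choices", []):
            targets.append(choice["Next"])
    return [target for target in targets if target not in states]


print(json.dumps(labeling_workflow, indent=2))
print("dangling transitions:", dangling_transitions(labeling_workflow))
```

The small `dangling_transitions` check is a cheap way to catch a misspelled state name before the definition is deployed.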

Orchestrating Model Training

In this workflow, we have three key steps: data processing, model training, and model registration. First, the data is loaded and split. Then the model is trained and evaluated using mean average precision (mAP). If the mAP meets a predefined threshold, the model is registered in the SageMaker Model Registry.
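
The logic of this workflow can be sketched in plain Python, independent of SageMaker. The example below shows a reproducible train/validation split and the quality gate that decides whether a model gets registered. The 80/20 split, the 0.8 threshold, the report layout, and the function names are assumptions for illustration; in the actual pipeline these steps run as SageMaker Processing and Training jobs.

```python
import random


def split_dataset(items, val_fraction=0.2, seed=42):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = list(items)
    # A fixed seed makes the split reproducible across pipeline runs.
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def should_register(evaluation_report, map_threshold):
    """Return True when the reported mAP meets or exceeds the threshold."""
    return evaluation_report["mAP"] >= map_threshold


images = [f"tags/{i:04d}.png" for i in range(100)]
train, val = split_dataset(images)
print(len(train), "training images,", len(val), "validation images")

report = {"mAP": 0.87}  # hypothetical output of the evaluation step
if should_register(report, map_threshold=0.8):
    print("register model version")
else:
    print("skip registration")
```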

Model deployment

After training and evaluation, a model only has real-world impact once it is deployed, so we now turn to automating the deployment phase of our end-to-end MLOps pipeline. There are two approaches to model deployment. The first is AWS IoT Greengrass for real-time edge inference. The second, which we focus on in this post due to time constraints, is AWS SageMaker endpoints, which offer a more straightforward deployment process. To keep the model portable and optimized, we export it in ONNX format and register it in Amazon SageMaker Model Registry. Our deployment package consists of the trained model in ONNX format and, alongside it, a private component that contains the inference code. This component handles data preparation, communication with the model, and post-processing of results.
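
As an illustration of what the post-processing part of such an inference component might do for an object detection model, here is a minimal sketch that drops low-confidence detections and then removes near-duplicate boxes with greedy non-maximum suppression. The post does not specify the actual post-processing, so the technique, the thresholds, and the `(box, score)` layout are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, score_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence detections, then apply greedy non-maximum suppression.

    Each detection is a (box, score) pair; this layout is an assumption.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not heavily overlap a higher-scoring one.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


raw = [
    ((10, 10, 50, 20), 0.92),    # scratch
    ((12, 11, 52, 21), 0.80),    # near-duplicate of the first box
    ((100, 40, 130, 48), 0.65),  # second scratch elsewhere on the tag
    ((200, 5, 210, 9), 0.30),    # below the confidence threshold
]

for box, score in postprocess(raw):
    print(box, score)
```

Here the second box is suppressed because it overlaps the first one heavily, and the last box is filtered out by the confidence threshold, so two scratches are reported.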

Summary

MLOps streamlines machine learning by applying DevOps principles: version control, automation, continuous training, and model governance. We applied these principles to an industrial computer vision use case, improving quality control in luggage tag manufacturing. Using AWS services, we built an MLOps pipeline that manages data, labeling, training, and deployment end to end. Automated CI/CD with AWS tools keeps that pipeline responsive to both code and data changes.
