amananandrai

Tools for efficient MLOps (Machine Learning DevOps)

What is MLOps?

To answer the basic question "What is MLOps?", we first need to understand what DevOps is. DevOps is a set of practices that combines software development and IT operations. It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile software development, and several DevOps aspects came from Agile methodology. In that sense, DevOps is the offspring of Agile, born from the need to keep up with the increased software velocity and throughput that Agile methods have achieved.

The basic definition of DevOps

DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.

Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes, these two teams are merged into a single team where the engineers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function.

Goals of DevOps

DevOps tries to achieve the following goals:

  • Improve deployment frequency
  • Achieve faster time to market
  • Lower failure rate of new releases
  • Shorten lead time between fixes
  • Improve mean time to recovery
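Several of these goals can be tracked as simple ratios and averages. The sketch below is only an illustration: the deployment log, its field layout, and the function name are made up for this example. It computes deployment frequency, the failure rate of releases, and mean time to recovery.

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (deployed_at, failed, minutes_to_recover)
deployments = [
    (datetime(2021, 1, 4), False, 0),
    (datetime(2021, 1, 6), True, 45),
    (datetime(2021, 1, 11), False, 0),
    (datetime(2021, 1, 13), True, 15),
    (datetime(2021, 1, 18), False, 0),
]

def deployment_metrics(records):
    """Return deployments per week, failure rate, and mean time to recovery (minutes)."""
    first = min(r[0] for r in records)
    last = max(r[0] for r in records)
    # Length of the observed period in weeks, never less than one week
    weeks = max((last - first) / timedelta(weeks=1), 1.0)
    failures = [r for r in records if r[1]]
    return {
        "deploys_per_week": len(records) / weeks,
        "failure_rate": len(failures) / len(records),
        "mttr_minutes": sum(r[2] for r in failures) / len(failures) if failures else 0.0,
    }

metrics = deployment_metrics(deployments)
print(metrics)
```

Watching how these numbers move from one period to the next is one way to tell whether a change in process is actually helping.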

How to achieve these goals

The following DevOps best practices help in achieving the goals listed above:

  • Continuous Integration
  • Continuous Delivery
  • Microservices
  • Infrastructure as Code
  • Monitoring and Logging
  • Communication and Collaboration

MLOps is essentially DevOps for machine learning models. Deploying an ML model is a tedious job, and ML models can't be deployed the way traditional software is. MLOps tools help machine learning engineers fast-track the development of models and their delivery for client use. They also help with monitoring models, providing feedback, and comparing different models.

Some well-known MLOps tools are:

1 - Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. It provides a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

Services provided by Kubeflow

  • Notebooks
    Kubeflow includes services to create and manage interactive Jupyter notebooks. You can customize your notebook deployment and your compute resources to suit your data science needs. Experiment with your workflows locally, then deploy them to a cloud when you're ready.

  • TensorFlow model training
    Kubeflow provides a custom TensorFlow training job operator that you can use to train your ML model. In particular, Kubeflow's job operator can handle distributed TensorFlow training jobs. Configure the training controller to use CPUs or GPUs and to suit various cluster sizes.

  • Model serving
    Kubeflow supports a TensorFlow Serving container to export trained TensorFlow models to Kubernetes. Kubeflow is also integrated with Seldon Core, an open source platform for deploying machine learning models on Kubernetes, and NVIDIA Triton Inference Server for maximized GPU utilization when deploying ML/DL models at scale.

  • Pipelines
    Kubeflow Pipelines is a comprehensive solution for deploying and managing end-to-end ML workflows. Use Kubeflow Pipelines for rapid and reliable experimentation. You can schedule and compare runs, and examine detailed reports on each run.

  • Multi-framework
    Kubeflow's development plans extend beyond TensorFlow. The project is working to extend support to PyTorch, Apache MXNet, MPI, XGBoost, Chainer, and more. It also integrates with Istio and Ambassador for ingress, Nuclio as a fast multi-purpose serverless framework, and Pachyderm for managing data science pipelines.
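To give a rough feel for what an end-to-end ML workflow looks like as a pipeline, here is a toy sketch in plain Python. It does not use the Kubeflow Pipelines SDK. The step names, the dataset, and the `run_pipeline` helper are all assumptions made for illustration. Each step declares which steps it depends on, and the runner executes them in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def load_data(ctx):
    # Hypothetical dataset: points on the line y = 2x + 1
    ctx["data"] = [(x, 2 * x + 1) for x in range(10)]

def split(ctx):
    ctx["train"], ctx["test"] = ctx["data"][:8], ctx["data"][8:]

def train(ctx):
    # Least-squares fit of y = slope * x + intercept
    xs, ys = zip(*ctx["train"])
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in ctx["train"]) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    ctx["model"] = (slope, mean_y - slope * mean_x)

def evaluate(ctx):
    slope, intercept = ctx["model"]
    ctx["max_error"] = max(abs(slope * x + intercept - y) for x, y in ctx["test"])

# Each step maps to the set of steps that must run before it.
steps = {load_data: set(), split: {load_data}, train: {split}, evaluate: {train, split}}

def run_pipeline(steps):
    """Run every step once, in an order that respects the dependencies."""
    ctx, order = {}, []
    for step in TopologicalSorter(steps).static_order():
        step(ctx)
        order.append(step.__name__)
    return ctx, order

ctx, order = run_pipeline(steps)
print(order, ctx["model"], ctx["max_error"])
```

A real pipeline tool would additionally run each step in its own container, pass artifacts between steps, and record every run so that runs can be scheduled and compared.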

2 - Comet

Comet provides a self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models.

Services provided by Comet

  • Fast Integration
    Add a single line of code to your notebook or script and start tracking your experiments. Works wherever you run your code, with any machine learning library, and for any machine learning task.

# Import comet_ml at the top of your file
from comet_ml import Experiment

# Add the following code anywhere in your machine learning file
experiment = Experiment(project_name="my-project", workspace="my-workspace")

  • Compare Experiments
    Easily compare experiments—code, hyperparameters, metrics, predictions, dependencies, system metrics, and more—to understand differences in model performance.

  • Debug Your Models
    View, analyze, and gain insights from your model predictions. Visualize samples with dedicated modules for vision, audio, text and tabular data to detect over-fitting and easily identify issues with your dataset.

  • Meta Machine Learning
    Build better models faster by using state-of-the-art hyperparameter optimizations and supervised early stopping.
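Comparing experiments, as described in the list above, largely comes down to finding what changed between two runs. The following is a minimal sketch of that idea in plain Python, not Comet's API. The run records and the `diff_runs` helper are hypothetical.

```python
def diff_runs(run_a, run_b):
    """Return {key: (value_a, value_b)} for every key whose value differs."""
    keys = sorted(set(run_a) | set(run_b))
    return {
        k: (run_a.get(k), run_b.get(k))
        for k in keys
        if run_a.get(k) != run_b.get(k)
    }

# Hypothetical records of two experiments (hyperparameters plus one metric)
run_a = {"lr": 0.01, "batch_size": 32, "optimizer": "adam", "val_acc": 0.91}
run_b = {"lr": 0.001, "batch_size": 32, "optimizer": "adam", "dropout": 0.2, "val_acc": 0.94}

changes = diff_runs(run_a, run_b)
for key, (a, b) in changes.items():
    print(f"{key}: {a} -> {b}")
```

A tracking platform applies the same kind of diff to code, dependencies, and system metrics, which are much harder to compare by hand.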

3 - Weights & Biases

Weights & Biases is similar to Comet and provides tools for experiment tracking, model optimization, and dataset versioning in machine learning.

Services provided by Weights & Biases

  • Central dashboard
    Add a few lines to your script, and each time you train a new version of your model, you'll see a new experiment stream live to your dashboard.

  • Fast integration
    Add a few lines to your script to start logging results. The lightweight integration works with any Python script.

import torch
import torch.nn as nn

import wandb
wandb.init(project="pedestrian-detection")

# Log any metric from your training script
# (accuracy and val_accuracy are assumed to be computed by your training loop)
wandb.log({"acc": accuracy, "val_acc": val_accuracy})

  • Collaborative reports
    Explain how your model works, show graphs of how your model versions improved, discuss bugs, and demonstrate progress towards milestones.

  • Hyperparameter sweeps
    Optimize models with W&B's scalable hyperparameter search tool. Sweeps are lightweight, fast to set up, and plug in to your existing infrastructure for running models.

  • Reproducible models
    Save everything you need to reproduce models later: the latest git commit, hyperparameters, model weights, and even sample test predictions. You can save experiment files directly to W&B or store pointers to your own storage.

  • System metrics
    Visualize live metrics like GPU utilization to identify training bottlenecks and avoid wasting expensive resources.

  • Visualize predictions
    Log model predictions to see how your model is performing, and identify problem areas during training. W&B supports rich media including images, video, audio, and 3D objects.
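As a rough picture of what a hyperparameter sweep does, the sketch below runs a plain grid search in standard-library Python. It is not the W&B sweeps API, and grid search is used only because it is the simplest strategy to show. The `validation_score` function is a made-up stand-in for a real training run, and the parameter grid is hypothetical.

```python
import itertools

def validation_score(lr, batch_size):
    # Stand-in for training a model: a made-up score that peaks at lr=0.01, batch_size=64
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 64) / 1000

# Hypothetical search space
grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}

def grid_sweep(grid, objective):
    """Evaluate every combination in the grid and return (best, all_results)."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((objective(**params), params))
    best = max(results, key=lambda r: r[0])
    return best, results

(best_score, best_params), results = grid_sweep(grid, validation_score)
print(best_score, best_params)
```

In practice each combination is a full training run, so a sweep tool distributes the runs across machines and logs each one to the dashboard.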

4 - MLflow

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Services provided by MLflow

  • MLflow Tracking
    Record and query experiments: code, data, config, and results

  • MLflow Projects
    Package data science code in a format to reproduce runs on any platform

  • MLflow Models
    Deploy machine learning models in diverse serving environments

  • Model Registry
    Store, annotate, discover, and manage models in a central repository
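The tracking and registry ideas above can be pictured with a small in-memory sketch. This is not MLflow's API. The `ToyTracker` class, its methods, and the model name are hypothetical, and the code only illustrates the concepts of recording runs, querying them, and registering a versioned model.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    params: dict
    metrics: dict = field(default_factory=dict)

class ToyTracker:
    """In-memory stand-in for an experiment tracker plus a model registry."""

    def __init__(self):
        self.runs = []
        self.registry = {}  # model name -> list of registered runs

    def log_run(self, params, metrics):
        # Record one experiment: its configuration and its results
        run = Run(params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric):
        # Query the recorded runs for the highest value of a metric
        return max(self.runs, key=lambda r: r.metrics[metric])

    def register(self, name, run):
        # Store the run under a model name and return its version number
        versions = self.registry.setdefault(name, [])
        versions.append(run)
        return len(versions)

tracker = ToyTracker()
tracker.log_run({"max_depth": 3}, {"accuracy": 0.88})
tracker.log_run({"max_depth": 5}, {"accuracy": 0.92})
tracker.log_run({"max_depth": 8}, {"accuracy": 0.90})

best = tracker.best_run("accuracy")
version = tracker.register("churn-classifier", best)
print(best.params, version)
```

A real platform persists this information to a server or database so that a whole team can query runs and promote model versions.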


This is my summary of some MLOps tools. Please upvote the article if you liked it or if it helped you in some way.

Top comments (3)

Yetunde Dada

This is a fantastic article! Have you had the chance to check out Kedro? It puts the software engineering back into data science code and is being thought of as a tool that helps with MLOps for that reason.

Here's a link: github.com/quantumblacklabs/kedro/...

amananandrai

I have heard about Kedro but haven't checked it yet. I will now surely look into it. Thanks for liking the article.

Victor Warno

What framework would you use for Model Monitoring? Or would that be part of the Cloud you are deploying your model service to?