Fei

Originally published at dev.to

MLflow Tutorial with Image Recognition Example (MNIST)

Introduction

What is MLOps?

MLOps (Machine Learning Operations) is a core function of machine learning engineering that aims to simplify the deployment, maintenance, and monitoring of machine learning models in production, reliably and efficiently. MLOps requires collaboration among data scientists, DevOps engineers, and operations engineers [1].

Fig 1. Process of MLOps [2]

What is MLflow?

MLflow is an open-source MLOps platform developed by Databricks. It helps machine learning engineers track and manage models, code, and dependencies, and deploy models to production. In other words, MLflow simplifies the process of building, training, deploying, and monitoring machine learning models. For more details, see https://mlflow.org/docs/latest/what-is-mlflow.html

Fig 2. MLflow components [3]

Why MLOps and MLflow?

When training neural networks or other machine learning models, we often face the challenges listed below. We could manually record model parameters in a text file or back up the model after each experiment, but this is cumbersome and hinders collaboration. With the help of MLOps and MLflow, we can train, deploy, and track models with far less effort.

  • Challenge 1: How do you track model versions when you need to retrain?
    Training involves repeatedly adjusting parameters, environments, dependencies, and datasets. Keeping track of the resulting models is hard, especially when collaborating with others.

  • Challenge 2: How do you deploy a model and serve it to the public?
    After training a model, we need to publish it, expose it through APIs, and keep the service reliable 24/7.

  • Challenge 3: How do you monitor the models?
    Once a model is deployed, we need to keep tracking and maintaining the service.

MLflow Demo:

In this example, we use an image recognition model built with Keras/TensorFlow and the MNIST dataset. If you are interested, see [4] for a scikit-learn logistic regression example.

Installation & Setup

  • Step 1: Install conda
    https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html

  • Step 2: Create a virtual environment for MLflow (you can choose your own environment name)
    cd $your_folder
    conda create -n mlflow

  • Step 3: Activate the conda environment
    conda activate mlflow

  • Step 4: Install MLflow (alternatively, use pip install mlflow)
    conda install mlflow

  • Step 5: Run an MLflow server with a file-store backend. The default port is 5000; you can change it with the ‘-p’ option (e.g., mlflow server -p 5111). You can also choose your own backend store path.
    mlflow server -h 0.0.0.0 --backend-store-uri /home/xxx/mlruns

  • Step 6: Open the web UI at http://localhost:5000 to check that the server is running

Prepare and Run the MLflow Project

  • Step 1: Clone the MLflow project [5][6]:
    git clone https://github.com/RoboticsAndCloud/mlflow-examples.git

  • Step 2: Go to the mlflow-examples - Keras/TensorFlow - MNIST example:
    cd mlflow-examples/python/keras_tf_mnist

Note the three files in this folder: MLproject, conda.yaml, and train.py.

MLproject: a YAML file that defines the MLflow project structure, including the project name, dependencies (in conda.yaml), parameters, and entry points. See https://mlflow.org/docs/latest/projects.html for more details.

conda.yaml: a YAML file that defines your project’s dependencies

train.py: a Python script that shows how to train your model and log the parameters to MLflow
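
As a rough illustration of the logging part of such a script, the sketch below shows how hyperparameters and per-epoch metrics could be sent to MLflow. It is not the repository's actual train.py: the function names history_to_epoch_metrics and log_training_run are made up for this example, and the input is assumed to be a dictionary shaped like the history attribute of the object Keras returns from model.fit. Logging the model itself is left out.

```python
def history_to_epoch_metrics(history):
    """Turn {"loss": [a, b], "accuracy": [c, d]} into one metrics dict per epoch."""
    names = list(history)
    return [
        {name: float(value) for name, value in zip(names, values)}
        for values in zip(*(history[name] for name in names))
    ]


def log_training_run(params, epoch_metrics, run_name=None):
    """Log hyperparameters and per-epoch metrics; return the MLflow run ID."""
    # Imported here so the pure helper above can be used without MLflow installed.
    import mlflow

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)  # e.g. {"epochs": 5, "batch_size": 128}
        for epoch, metrics in enumerate(epoch_metrics):
            mlflow.log_metrics(metrics, step=epoch)  # one point per epoch
        return run.info.run_id
```

For example, after history = model.fit(...), one could call log_training_run({"epochs": 5, "batch_size": 128}, history_to_epoch_metrics(history.history)); the parameter names and values here are placeholders.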

  • Step 3: Set the tracking URI
    Export the environment variable so that results are logged to our MLflow server. The variable name must be all upper case.
    export MLFLOW_TRACKING_URI=http://localhost:5000

  • Step 4: Run the MLflow experiment
    mlflow run . --experiment-name=keras_mnist --run-name runname_first_keras_keras_mnist

  • Step 5: Check the MLflow dashboard at http://localhost:5000
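
The same steps can also be driven from Python instead of the shell. The sketch below is one possible way to do it, under a few assumptions: the helper names resolve_tracking_uri and run_project are hypothetical, the default URI is the local server started earlier, and the run_name argument of mlflow.projects.run is assumed to be available, which requires a reasonably recent MLflow release.

```python
import os

# MLflow reads exactly this name; environment variable names are case-sensitive on Linux.
TRACKING_URI_VAR = "MLFLOW_TRACKING_URI"


def resolve_tracking_uri(env=None, default="http://localhost:5000"):
    """Return the tracking URI from the environment, or the local default."""
    env = os.environ if env is None else env
    return env.get(TRACKING_URI_VAR, default)


def run_project(project_dir=".", experiment_name="keras_mnist", run_name=None):
    """Run the MLflow project in project_dir and return the resulting run ID."""
    # Imported here so resolve_tracking_uri can be used without MLflow installed.
    import mlflow
    import mlflow.projects

    mlflow.set_tracking_uri(resolve_tracking_uri())
    submitted = mlflow.projects.run(
        uri=project_dir,
        experiment_name=experiment_name,
        run_name=run_name,
    )
    return submitted.run_id
```

Calling run_project(run_name="runname_first_keras_keras_mnist") from the keras_tf_mnist folder would roughly correspond to the mlflow run command in Step 4.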

Model Serving

Model Serving exposes the models, allowing us to access the service through REST API endpoints.

  • Step 1: Get the run ID

  • Step 2: Serve the model
    mlflow models serve -m runs:/5c6a476d67d84239b01f874241f4009f/keras-model --port 5001

  • Step 3: Send recognition request and test the service

Target image file, a handwritten ‘0’ (you can find similar images in the MNIST dataset):

Set the tracking URI

export MLFLOW_TRACKING_URI=http://localhost:5000

Send request

python3 keras_predict.py --model-uri runs:/5c6a476d67d84239b01f874241f4009f/keras-model  --data-path /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png
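
The run ID in the model URI above is typically copied by hand from the dashboard. As an alternative, the sketch below looks up the most recent run of an experiment and builds the runs:/ URI from it. The helper names are hypothetical, the artifact path keras-model is taken from the commands in this article, and the experiment_names argument of mlflow.search_runs is assumed to be supported by the installed MLflow version.

```python
def model_uri_for_run(run_id, artifact_path="keras-model"):
    """Build a runs:/ model URI such as runs:/<run_id>/keras-model."""
    if not run_id or "/" in run_id:
        raise ValueError(f"not a valid run ID: {run_id!r}")
    return f"runs:/{run_id}/{artifact_path}"


def latest_run_id(experiment_name="keras_mnist"):
    """Return the ID of the most recently started run in the experiment."""
    # Imported here so model_uri_for_run can be used without MLflow installed.
    import mlflow

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )  # returns a pandas DataFrame by default
    if runs.empty:
        raise LookupError(f"no runs found in experiment {experiment_name!r}")
    return runs.iloc[0]["run_id"]
```

With the tracking URI set, model_uri_for_run(latest_run_id()) yields a URI that can be passed to --model-uri or to mlflow models serve -m.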

Results

We can see that ‘0’ has the highest probability.

Send an HTTP request and check the results
The port must match the model server's port (5001 here).

python convert_png_to_mlflow_json.py /home/ascc/LF_Workspace/mnist_png/testing/0/9993.png | curl -X POST -H "Content-Type:application/json"   -d @-   http://localhost:5001/invocations
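
The curl call above can also be made from Python with only the standard library. The sketch below is an illustration under assumptions, not a description of what convert_png_to_mlflow_json.py produces: it assumes an MLflow 2.x scoring server, which accepts a JSON body with an "instances" key, and it assumes the image has already been converted to a nested list of pixel values in whatever shape the model expects. The function names are hypothetical.

```python
import json
import urllib.request


def build_payload(pixels):
    """Wrap one image (a nested list of numbers) in an {"instances": [...]} JSON body."""
    return json.dumps({"instances": [pixels]})


def predict(pixels, url="http://localhost:5001/invocations"):
    """POST one image to the model server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(pixels).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Older MLflow 1.x servers may expect a different request body, so the payload format is worth checking against the documentation for the installed version.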

Model version control:

Once the model has been verified, you can proceed to register it, verify its version, and deploy it in another production environment.

Version control is crucial when deploying models in a production environment. It enables you to monitor changes and revert to a specific version if needed.

The development process can be divided into multiple stages for model verification. MLflow defines three key stages: Staging, Production, and Archived.
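
As a sketch of how registration and stage transitions might look in code, the example below registers a logged model and moves the new version to a stage. The helper names and the model name passed by the caller are hypothetical. It relies on mlflow.register_model and MlflowClient.transition_model_version_stage; note that newer MLflow releases deprecate stages in favor of model version aliases, so this applies to versions where stages are still supported. A newly registered version starts in the stage named "None".

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def normalize_stage(stage):
    """Return the canonical spelling of a stage name, ignoring case."""
    for valid in VALID_STAGES:
        if stage.lower() == valid.lower():
            return valid
    raise ValueError(f"unknown stage {stage!r}; expected one of {VALID_STAGES}")


def register_and_promote(model_uri, name, stage="Staging"):
    """Register the model at model_uri under name and move it to the given stage."""
    # Imported here so normalize_stage can be used without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    target_stage = normalize_stage(stage)  # fail early on a misspelled stage
    version = mlflow.register_model(model_uri, name)
    MlflowClient().transition_model_version_stage(
        name=name,
        version=version.version,
        stage=target_stage,
    )
    return version.version
```

For instance, register_and_promote("runs:/<run_id>/keras-model", "keras_mnist_model") would register the served model under the placeholder name keras_mnist_model and move it to Staging.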

Errors you may encounter and their solutions:

  • Error 1: ModuleNotFoundError: No module named 'pip._vendor.six'
Solution: Upgrade pip and pipenv.
pip install pip -U
pip install pipenv -U
  • Error 2: AttributeError: module 'virtualenv.create.via_global_ref.builtin.cpython.mac_os' has no attribute 'CPython2macOsFramework'
Solution: virtualenv is installed twice (once via apt and once via pip). Remove the apt copy:
sudo apt purge python3-virtualenv
  • Error 3: mlflow.exceptions.MlflowException: Run '37011fce0ac847dbaa31efb5fabf842d' not found
Solution: The tracking URI is not set. Export it before running the command:
export MLFLOW_TRACKING_URI=http://localhost:5000
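
One easy way to hit Error 3 is a miscased variable name: on Linux and macOS, environment variable names are case-sensitive, so a name such as MLfLOW_TRACKING_URI is silently ignored and MLflow falls back to its default local store. The small check below, with a hypothetical function name, lists any such near-miss names in the current environment.

```python
import os

EXPECTED_NAME = "MLFLOW_TRACKING_URI"


def miscased_tracking_vars(env=None):
    """Return variable names that match MLFLOW_TRACKING_URI except for letter case."""
    env = os.environ if env is None else env
    return sorted(
        name for name in env
        if name.upper() == EXPECTED_NAME and name != EXPECTED_NAME
    )


if __name__ == "__main__":
    for name in miscased_tracking_vars():
        print(f"{name} is set, but MLflow only reads {EXPECTED_NAME}")
```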

Summary:

This article demonstrated how to train, track, serve, and version a model using the MLflow platform. MLflow simplifies building, deploying, and monitoring machine learning models. Furthermore, we believe that MLOps can significantly benefit work in the field of AI.

Reference:

[1] MLOps. https://www.databricks.com/glossary/mlops
[2] What Is MLOps. https://ml-ops.org/content/mlops-principles
[3] MLflow component. https://www.datacamp.com/tutorial/mlflow-streamline-machine-learning-workflow
[4] Getting Started With MLflow. https://saturncloud.io/blog/getting-started-with-mlflow/
[5] MLflow examples. https://github.com/amesar/mlflow-examples
[6] MLflow examples. https://github.com/RoboticsAndCloud/mlflow-examples
