Jayesh

Posted on Jun 28, 2022 • Edited on Jul 2, 2022

Not just another MLflow on Kubernetes article

There's a ton of content on the internet about deploying an MLflow Tracking Server to Kubernetes and it's great! But if that's the case, then why does this article exist?

When I was looking for a way to run MLflow on our cloud, I wanted a solution that let me choose which cloud and storage to use, and that gave me complete flexibility in how the server operates (more on that ahead). In this article, I share what I learned while developing such a solution and how you can adapt it to your own needs. Come, join me 🙋‍♂️

A refresher on MLflow
Why is it a challenge?
The outline for the solution
Preparing the Helm Release
Setting up the Ingress
Test your MLflow server
YAYY! 🥳

A refresher on MLflow

MLflow is an essential tool for a great number of ML practitioners and ML engineers. If you have ever felt the need to track your experiments, whether as logs and metadata or as artifacts from your pipeline runs, you know MLflow Tracking is the go-to tool! 😎

Discussing what MLflow is and its pros and cons is beyond the scope of this article, but there's wonderful documentation available, along with many third-party resources online, that you can use to get started!

Even with its popularity and utility, it is not always straightforward to deploy it on your own systems 😕, especially in the cloud, with extensive customization and authentication built in.

Why is it a challenge?

To make the situation and the challenges clearer, take a look at the number of scenarios possible 😯 for your MLflow tracking server deployment.

We won't jump into details as you can read the documentation page which explains them very well. Our concern here is to understand that each of these scenarios requires you to run the server with certain flags. As such, our solution should fundamentally allow easy customization.

Besides being customizable for different cloud environments, our solution also needs a way to control traffic to the service, to protect the data stored on the server 🥷.

The outline

There are three broad steps that I can think of when it comes to getting MLflow running on your cluster.

✅ Prepare a Helm chart for the tracking server that allows maximum customization.
✅ Set up an ingress controller and an ingress resource to control traffic to your MLflow service.
✅ Add authentication to secure your service and test if it works!

Now, for someone who hasn't dealt much with Kubernetes before, some of these tasks may sound too daunting. Fret not! In this article, I use a tool called Devtron to make Kubernetes deployments much simpler. Setting up and managing full Helm releases can be done very intuitively through the Devtron UI.

However, Devtron is not a prerequisite. I have also explained how you can achieve the same outcomes the traditional way. 💫

The code in this article is taken from my repository on GitHub. Feel free to experiment with it 🧪

Preparing the Helm Release

When it comes to deploying apps to your cluster, nothing comes close to Helm in terms of simplicity and ease of use. Therefore, I will walk you through the entire journey, from scratch to a full-fledged chart ready for sailing! ⛵

If that sounded too challenging or if you need only some basic customizations, I have a fast track just for you 😉

Going All Gas, No Brakes!

For this route, we'll deploy the MLflow tracking server using a community Helm chart that exposes a few simple configuration options. It's hosted here!

To deploy it to our cluster, we are going to make use of the Devtron Dashboard. However, you can also do a simple helm install.

Setting up Devtron is simple - I'll let its documentation guide you better. Once deployed, follow the steps below.

We'll first add the repository for this chart to our Devtron configuration. Go into the Global Configurations tab and select "Add Repository".
Fill in the details as above and save it.
Now you can go to the chart store and click "Deploy" on the MLflow chart, and that's it.

You can now skip directly to the next section 😎 on setting up the ingress!

For folks who want finer control over the tracking server and more customizations, follow along. This is going to be fun!

What we're going to do can be split into three smaller steps.

Prepare a Docker image of the MLflow tracking server.
Create a Helm project.
Examine the manifests and create customizable values as necessary.

Let's jump into each of them to get a grip on what's brewing ♨️.

Prepare a Docker Image

The heart of this step lies in the Dockerfile. It's a file that acts as the recipe 🥗 for your image. You can define what goes into your image, what commands are run when this image is executed, the environment of this image and much more.

To know what to put inside the Dockerfile, we have to think about what happens when you run an MLflow tracking server locally. To put it very simply, there's this main command that needs to be executed in your environment and it takes care of setting up the server and making it listen on a port.



mlflow server  \
--backend-store-uri "./mlflow/..." \
--default-artifact-root "./mlflow/..." \
--host "127.0.0.1" \
--port 5000

Note 💁
For this command to work inside a container, set the host to "0.0.0.0" so that the server listens on all network interfaces. With "127.0.0.1", it would only accept connections coming from inside the container itself.

Now, what we need is for this command to be run inside our container once the image starts, but with the added option to modify the value of any of its flags. Therefore we'll require a bash script that can take a variety of input flags and apply them accordingly to the base command.

I have created a script ✍️ that can take inputs for an artifact store and metadata store which you can use. It's a basic one but you can extend it to include more options like the proxied access scenario.
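The actual script lives in the repository linked above. As a rough idea of what its core could look like, here is a minimal sketch that assumes the same -m (metadata store) and -a (artifact store) flags the Helm chart passes later on. The function name build_server_args and the SERVER_ARGS array are illustrative, not taken from the original script.

```bash
# Sketch of the flag-parsing core of a run.sh-style wrapper (hypothetical,
# not the author's exact script).
#   -m: backend (metadata) store URI
#   -a: default artifact root
# The resulting `mlflow server` options are collected in the SERVER_ARGS array.
build_server_args() {
  local metadata_store="" artifact_store="" opt OPTIND=1
  while getopts "m:a:" opt; do
    case "$opt" in
      m) metadata_store="$OPTARG" ;;
      a) artifact_store="$OPTARG" ;;
      *) echo "usage: run.sh [-m metadata_store] [-a artifact_store]" >&2
         return 1 ;;
    esac
  done

  # Listen on all interfaces so the server is reachable from outside the
  # container; PORT comes from the Dockerfile's ENV and falls back to 5000.
  SERVER_ARGS=(--host 0.0.0.0 --port "${PORT:-5000}")

  # Empty values (like the chart's defaults) leave MLflow's own defaults in place.
  if [[ -n "$metadata_store" ]]; then
    SERVER_ARGS+=(--backend-store-uri "$metadata_store")
  fi
  if [[ -n "$artifact_store" ]]; then
    SERVER_ARGS+=(--default-artifact-root "$artifact_store")
  fi
}
```

In a complete script, the last line would hand the result to MLflow, for example `build_server_args "$@" && exec mlflow server "${SERVER_ARGS[@]}"`. Using exec makes MLflow replace the shell as the container's main process, so it receives signals such as SIGTERM from Kubernetes directly.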

We will now put this script inside our Dockerfile to be run when the container gets created.



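FROM python:3.9

# mlflow for the server; boto3 (pinned) lets MLflow use S3 as the artifact store;
# awscli is handy for debugging. A single pip call lets the resolver pick
# mutually compatible versions, and no --user flag keeps the tools on PATH.
RUN pip install --no-cache-dir mlflow awscli boto3==1.24.10

# Port for the server (available to run.sh as $PORT)
ENV PORT=5000

# Copy the startup script into the image and make it executable
COPY scripts/run.sh /
WORKDIR /
RUN chmod +x run.sh

# Run the script when the container starts
ENTRYPOINT ["./run.sh"]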

Some observations 🤓:

It uses a pre-built Python image as the base so that we don't have to install pip and other tools ourselves.
We then install the mlflow library along with awscli and boto3. I'm installing the AWS tools because I plan to use S3 as the artifact store.
Finally, we copy the script (which contains the server command) into the container, make it executable, and set it to run once the container is spun up.

Note 💁
Depending on the scenario you deploy for, the server may also need credentials for storage access. You could add them as ENV variables in the Dockerfile, but passing them at runtime (for example, from a Kubernetes Secret) keeps them out of the image.

Building and pushing the image

You can use a GitHub Actions workflow to build the image from your Dockerfile and push it to the registry of your choice. Take a look at my workflow file here for inspiration!
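If you would rather build and push by hand, the equivalent Docker commands might look like the sketch below. The helper names and the IMAGE placeholder are hypothetical; point IMAGE at a repository you can push to. The second helper runs the image locally for a quick smoke test: anything after the image name is appended to the ENTRYPOINT, so it reaches run.sh as its flags, and AWS credentials are forwarded from your shell rather than baked into the image.

```bash
# Hypothetical helpers for building, pushing and smoke-testing the image by hand.
# IMAGE is a placeholder: replace it with a repository you can push to.
IMAGE="${IMAGE:-your-user/mlflow-tracking-server}"

build_and_push() {
  local tag="${1:-latest}"
  # Run from the directory that contains the Dockerfile and scripts/run.sh.
  docker build -t "${IMAGE}:${tag}" . &&
    docker push "${IMAGE}:${tag}"
}

run_locally() {
  local tag="$1" metadata_store="$2" artifact_store="$3"
  # Arguments after the image name are passed to the ENTRYPOINT (run.sh).
  # "-e NAME" without a value forwards NAME from the current shell.
  docker run --rm -p 5000:5000 \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
    "${IMAGE}:${tag}" -m "$metadata_store" -a "$artifact_store"
}
```

For example, `build_and_push v1` followed by `run_locally v1 sqlite:///mlflow.db s3://my-bucket/mlflow` should make the UI available at http://localhost:5000.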

If you prefer to follow along with a video, I have just the thing you need 👇

Create a Helm project

Once the image is built, we are ready to deploy our applications to a Kubernetes cluster. The fastest way to set up the deployment, the service, and all other related resources is to create a Helm chart.

Run this command inside your desired directory to get some starter code for your chart.



helm create mlflow-tracking

You will now see a whole bunch of files that get created in the directory. Our interest lies in the deployment.yaml file inside the templates folder, specifically in the following lines of code.



containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: ["./run.sh"]
          args: ["-m", "{{ .Values.metadata_store }}", "-a", "{{ .Values.artifact_store }}"]

Some observations 🤓:

We've used some variables to denote the image repository and tag.
Also modified are the command and the args parameters.
- The command is what will be executed when the container runs. This overrides what we had set in the Dockerfile.
- The args parameter specifies the metadata store and the artifact store, matching the -m and -a flags our script accepts.

Values

Where do you set these values, though? An obvious and important question, with a simple answer. 🥁🥁 Inside values.yaml!



image:
  repository: wjayesh/mlflow-tracking-server
  pullPolicy: Always
  # Overrides the image tag whose default is the chart appVersion.
  tag: latest

containers:
  port: 5000

metadata_store: ""    # add the metadata store URL
artifact_store: ""    # add the artifact store URL

As you can see, I've added my image name and tag here for the chart to access while deploying the server.

Note 💁
You can also consider changing the service type inside values.yaml to LoadBalancer instead of the default ClusterIP, if you want to access it on a public IP address outside of your cluster environment.
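Before installing anything, it can help to check that the chart renders the container arguments you expect. A minimal sketch using helm lint and helm template, assuming the chart directory is mlflow-tracking/ as created above; the helper name, the release name "preview" and any store URIs you pass are placeholders.

```bash
# Hypothetical helper: lint the chart, render it locally (nothing is sent to
# the cluster) and print only the container's command and args lines.
preview_container_args() {
  local metadata_store="$1" artifact_store="$2"
  helm lint mlflow-tracking/ &&
    helm template preview mlflow-tracking/ \
      --set-string metadata_store="$metadata_store" \
      --set-string artifact_store="$artifact_store" |
    grep -E '^[[:space:]]*(command|args):'
}
```

For instance, `preview_container_args postgresql://user:pw@db:5432/mlflow s3://my-bucket/mlflow` should print the command line and an args line containing both URIs.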

That's all you need to do to set up your chart. We're now ready for launch! 🧑‍🚀

Deploy to cluster 🚀

Run this command to install the server to the cluster.



helm install <release-name> mlflow-tracking/ \
    --values mlflow-tracking/values.yaml
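If you would rather not edit values.yaml for every environment, Helm's --set-string flag can override the same keys at install time. The sketch below is one way to wrap that; the helper name is hypothetical, and it uses helm upgrade --install so that re-running it updates an existing release instead of failing. The optional fourth argument sets service.type (a key in the values.yaml that helm create generates), for instance to LoadBalancer as mentioned in the note above.

```bash
# Hypothetical helper: install or upgrade the release, overriding the stores
# from values.yaml. Note: --set treats commas as separators, so values that
# contain commas would need to be escaped.
deploy_mlflow() {
  local release="$1" metadata_store="$2" artifact_store="$3"
  local service_type="${4:-ClusterIP}"
  helm upgrade --install "$release" mlflow-tracking/ \
    --values mlflow-tracking/values.yaml \
    --set-string metadata_store="$metadata_store" \
    --set-string artifact_store="$artifact_store" \
    --set service.type="$service_type"
}
```

Naming the release mlflow-tracking keeps the generated service name equal to mlflow-tracking, because the fullname helper from helm create reuses the release name when it already contains the chart name. That is the service name the ingress later in this article points at.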

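Now, verify that the MLflow service is up and running with the following command ✅. Note the service name and port in the output: the ingress resource we create later expects a service named mlflow-tracking on port 5000.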



kubectl get svc
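To check that the server actually answers before any ingress is involved, you can port-forward to the service and request the UI. A minimal sketch, assuming the service is called mlflow-tracking and listens on port 5000 (as the ingress later in this article expects); the helper name and the fixed two-second wait are simplifications.

```bash
# Hypothetical helper: forward a local port to the MLflow service, request
# the UI's root page and succeed only if it answers with HTTP 200.
check_mlflow_service() {
  local svc="${1:-mlflow-tracking}" port="${2:-5000}" status pf_pid
  kubectl port-forward "svc/${svc}" "${port}:${port}" >/dev/null &
  pf_pid=$!
  sleep 2  # crude: give the port-forward a moment to start listening
  # curl prints 000 as the status code if it cannot connect at all.
  status=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:${port}/")
  kill "$pf_pid" 2>/dev/null
  echo "HTTP ${status}"
  [ "$status" = "200" ]
}
```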

Setting up the Ingress

Once the MLflow tracking server is deployed, you can start making requests to its endpoint. All good, then why this extra step? Security!💂
We don't want to keep the server exposed to the outside world without any sort of checks in place since it holds critical information like your models, artifacts and logs.

We'll set up something called an Ingress in Kubernetes. It manages external HTTP(S) access to the services in a cluster, and you can add routing rules, paths, and authentication very easily by defining an ingress resource.

Three more steps and I promise that will be the end 🤐😂

Creating an Ingress Controller
Creating a secret which stores the credentials you want to apply to your server.
Finally, the ingress resource with your MLflow service as its backend.

Ingress Controller

An ingress controller is what actually accepts traffic from outside Kubernetes and sends it to the services, following the rules set in the ingress resources that you define. Naturally, we need to have a controller running first for the ingress definitions to work.

Let's head back to our Devtron dashboard. (If you'd rather not use it, you can also do a plain Helm install.)

Go into the Charts Store
You can see the "NGINX" logo. That's the ingress controller we'll install for this example. Click on "Deploy"!

The controller should be running inside the ingress-nginx namespace. Confirm by executing this command.



kubectl get pods -n ingress-nginx
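Without Devtron, the ingress-nginx project documents a Helm-based install along the following lines. The wrapper function is hypothetical; the kubectl wait call blocks until the controller pod reports ready (or the timeout expires), which is handy in scripts.

```bash
# Hypothetical wrapper around the Helm install of the NGINX ingress controller.
install_ingress_nginx() {
  helm upgrade --install ingress-nginx ingress-nginx \
    --repo https://kubernetes.github.io/ingress-nginx \
    --namespace ingress-nginx --create-namespace &&
    kubectl wait --namespace ingress-nginx \
      --for=condition=ready pod \
      --selector=app.kubernetes.io/component=controller \
      --timeout=120s
}
```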

Creating a username and password for the server 🗝️

1. Head over to any website that can generate an .htpasswd file, enter your desired username and password, and you'll get output similar to the following.

jayesh:$apr1$i4yl6mjs$aq3wqiZGbiJYqgypQeYGK/

2. Store the contents in a file called auth. It's important 🧐 that it be called auth: the resulting secret must contain a key named auth, otherwise NGINX returns a 503 error.

3. Now execute the following command to create a Kubernetes secret.

kubectl create secret generic basic-auth --from-file=auth
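If you would rather not type a password into a third-party website, the auth file can be generated locally instead. A minimal sketch, assuming OpenSSL is installed: openssl passwd -apr1 produces the same Apache MD5 ($apr1$...) format as the example above. The helper name is hypothetical.

```bash
# Hypothetical helper: write a one-line htpasswd-style file named "auth".
# Passing the password as an argument exposes it in the process list and shell
# history; run "openssl passwd -apr1" without a password to be prompted instead.
make_auth_file() {
  local user="$1" password="$2" hash
  hash=$(openssl passwd -apr1 "$password") || return 1
  printf '%s:%s\n' "$user" "$hash" > auth
}
```

After running, say, `make_auth_file jayesh 'your-password'`, create the secret with the same `kubectl create secret generic basic-auth --from-file=auth` command.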

Creating the ingress rules

We can now finally create the rules that govern access to our MLflow service.

Create an ingress resource by saving the following YAML to a file (for example, mlflow-ingress.yaml) and applying it with kubectl apply -f.



apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mlflow-ingress
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # your secret with user credentials
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display 
    nginx.ingress.kubernetes.io/auth-realm: 'Please authenticate first'
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mlflow-tracking
            port:
              number: 5000
  ingressClassName: nginx

We're now ready to test the MLflow tracking server with our username and password 🥳

Test the MLflow server

The following command shows the tracking URL for the MLflow server. Its EXTERNAL-IP column is the address of the ingress controller, and the path "/" is already configured (inside our ingress resource) to route to the MLflow tracking server.



kubectl get service ingress-nginx-controller -n ingress-nginx
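To capture the address in a variable rather than reading it off the table, a jsonpath query works. A sketch with a hypothetical helper name: some cloud providers (AWS, for example) publish a hostname instead of an IP for load balancers, so it prints whichever of the two is set.

```bash
# Hypothetical helper: print the ingress controller's external IP or hostname.
# An empty result usually means the load balancer is still being provisioned.
ingress_address() {
  kubectl get service ingress-nginx-controller -n ingress-nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}
```

Usage: `MLFLOW_HOST=$(ingress_address)`, then pass `http://$MLFLOW_HOST` to curl as the hostname.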

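Run the following command first without the "-u" flag and then with it. Without credentials, NGINX should respond with 401 Unauthorized; with valid credentials, you should get a 200 from the MLflow UI, confirming that the tracking server is now protected by authentication!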



curl -v <hostname> -u '<username>:<password>'
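The same check can be scripted so that it fails loudly if authentication is not enforced. A minimal sketch with a hypothetical helper name: it expects 401 without credentials and 200 with them.

```bash
# Hypothetical helper: verify that the ingress enforces basic auth.
check_basic_auth() {
  local url="$1" user="$2" password="$3" anon authed
  anon=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  authed=$(curl -s -o /dev/null -w '%{http_code}' -u "${user}:${password}" "$url")
  echo "without credentials: ${anon}, with credentials: ${authed}"
  [ "$anon" = "401" ] && [ "$authed" = "200" ]
}
```

For MLflow clients themselves (the Python API or the mlflow CLI), setting MLFLOW_TRACKING_URI to http://<EXTERNAL-IP> together with MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD sends the same basic-auth credentials with every request.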

Yay! 👏👏

If you have come this far, pat yourself on the back 💪. You have successfully created a customized image for the MLflow tracking server, built a Helm chart for it, defined traffic access control for the service and tested it with your custom username and password combination.

Although this is a fun way to get MLflow running on your cluster, it is certainly not the simplest 😂. In my next article, I'll reduce all of this work to a handful of commands that get everything we discussed running on your setup! Keep an eye out; I'm just as excited as you 👀

In the meantime, feel free to get in touch with me if you have any questions or ambitious ideas about what we should hack on next 👷
Here's everything you need: bio.link/wjayesh

See ya! And happy coding 🙋‍♂️

The above blog is submitted as part of 'Devtron Blogathon 2022' - https://devtron.ai/
Check out Devtron's GitHub repo - https://github.com/devtron-labs/devtron/ and give a ⭐ to show your love & support.
Follow Devtron on LinkedIn - https://www.linkedin.com/company/devtron-labs/ and Twitter - https://twitter.com/DevtronL/, to keep yourself updated on this Open Source project.

DEV Community