DEV Community


Putting an ML model into production using Feast and Kubeflow on Azure (Part I)

Jayesh (wjayesh) ・ Updated ・ 17 min read


Introduction

A lot of teams are adopting machine learning (ML) for their products to enable them to achieve more and deliver value. When we think of implementing an ML solution, there are two parts that come to mind:

  • Model development.
  • Model deployment and CI/CD.

The first is the job of the data scientist, who researches what model architecture to use, which optimization algorithms work best, and everything else that goes into building a working model.

Once the model is performing satisfactorily on local inputs, it is time to put it into production, where it can be served to the public and to client applications.

The requirements from such a production system include, but are not limited to, the following:

  • The ability to train the model at scale, across different compute resources as required.
  • Workflow automation that covers preprocessing of data, training, and serving the model.
  • An easy-to-configure model-serving system that works the same for all major machine learning frameworks, like TensorFlow.
  • Platform agnosticism; something that runs as well on your local setup as on any cloud provider.
  • A user-friendly, intuitive interface that allows data scientists with zero knowledge of Kubernetes to leverage its power.

Kubeflow is an open-source machine learning toolkit for cloud-native applications that covers all that we discussed above and provides an easy way for anyone to get started with deploying ML in production.


The Kubeflow Central Dashboard

In addition, we need some common storage for features that our model would use for training and serving.
This storage should allow:

  • Serving of features for both training and inference.
  • Data consistency and accurate merging of data from multiple sources.
  • Prevention of errors like data leakage.

Many teams working on machine learning have their own pipelines for fetching data, creating features, and storing and serving them. In this article, though, I'll introduce and work with Feast, an open-source feature store for ML.


From tecton.ai

I'll link an amazing article here (What is a Feature Store) that explains in detail what feature stores are, why they are important, and how Feast works in this context.

We'll talk about Feast in the next article in this series.

Contents

  • Why do we need Kubeflow?
  • Prerequisites
  • Setup
  • Feast for feature store
  • Kubeflow Pipelines
  • Pipeline steps
  • Run the pipeline
  • Conclusion

Why do we need Kubeflow?

Firstly, most of the folks developing machine learning systems are not experts in distributed systems. To deploy a model efficiently, a fair amount of experience in GitOps, Kubernetes, containerization, and networking is expected. Managing these services is complex, even for relatively simple solutions, and a lot of time is wasted preparing the environment and tweaking configurations before model training can even begin.


Secondly, building a highly customized and "hacked-together" solution means that any plans to change the environment or orchestration provider will require a sizeable re-write of code and infrastructure configuration.

Kubeflow is a composable, portable solution that runs on Kubernetes and makes using your ML stack easy and extensible.

Prerequisites

Before we begin, I recommend that you get familiar with the following in order to better understand the code and the workflow:

  • Docker and Kubernetes. (I have built a hands-on, step-by-step course that covers both these concepts. Find it here and the associated YouTube playlist here!)

  • Azure Kubernetes Service. We'll be using it as the platform of choice for deploying our solution. You can check out this seven-part tutorial on Microsoft Learn (an awesome site to learn new tech really fast!) that will help you get started with using AKS for deploying your application.
    This page has other quickstarts and helpful tutorials.

Although we're using AKS for this project, you can always take the same code and deploy it on any provider or locally using platforms like Minikube, kind and such.

  • A typical machine learning workflow and the steps involved. You don't need the knowledge to build a model, but at least an idea of what the process looks like.
    This is an example workflow, from this article by Google Cloud.

  • Some knowledge about Kubeflow and its components. If you're a beginner, you can watch a talk I delivered on Kubeflow at Azure Kubernetes Day, starting at the 4:55:00 mark.

Setup

1) Create an AKS cluster and deploy Kubeflow on it.
The images above show how you can create an AKS cluster from the Azure portal.

This tutorial will walk you through all the steps required, from creating a new cluster (if you haven't already) to finally having a running Kubeflow deployment.

2) Once Kubeflow is installed, you'll be able to visit the Kubeflow Central Dashboard by either port-forwarding, using the following command:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

or by getting the address for the Istio ingress resource, if Kubeflow has configured one:

kubectl get ingress -n istio-system

The address field in the output of the command above can be visited to open up the dashboard. The following image shows how the dashboard looks on start-up.

The default credentials are the email admin@kubeflow.org and the password 12341234. You can change these by following the docs here.

3) Install Feast on your cluster. Follow the guide here.

4) Finally, clone this repository to get all the code for the next steps.

GitHub: wjayesh / production_ml (Code to take an ML model into production using Feast and Kubeflow)


Feast for feature store

After reading the article I linked above, I assume you now have some idea of what Feast is used for and why it is important.

For now, we won't go into the details of how Feast is implemented; we'll reserve that for the next edition for the sake of readability.
The code below doesn't fetch features from Feast, but from local files. In the next article, we'll see how we can plug Feast into our solution without affecting a major portion of the code.

Kubeflow Pipelines

Kubeflow Pipelines is a component of Kubeflow that provides a platform for building and deploying ML workflows, called pipelines. Pipelines are built from self-contained sets of code called pipeline components. They are reusable, help you perform a fixed set of tasks together, and can be managed through the Kubeflow Pipelines UI.

Kubeflow pipelines offer the following features:

  • A user interface (UI) for managing and tracking experiments, jobs, and runs.
  • An engine for scheduling multi-step ML workflows.
  • An SDK for defining and manipulating pipelines and components.
  • Integration with Jupyter Notebooks to allow running the pipelines from code itself.


This is what the Kubeflow Pipelines homepage looks like. It contains some sample pipelines that you can run to observe how each step progresses in a rich UI.

You can also create a new pipeline with your own code, using the following screen.


More hands-on footage can be found in the talk I delivered on Kubeflow at Azure Kubernetes Day, linked again here.

How to build a pipeline

The following image summarises the steps I'll take in the next section, to create a pipeline.

  • The first step is to write your code into a file and then build a Docker image that contains the file.

  • Once an image is ready, you can use it in your pipeline code to create a step, using something called a ContainerOp. It is part of the Python SDK for Kubeflow.
    A ContainerOp defines a container operation as a step in the pipeline: it takes in details about which image to run, and with what parameters and commands.

  • The code that helps you define and interact with Kubeflow pipelines and components lives in the kfp.dsl package. DSL stands for domain-specific language; it allows you to write code in Python, which is then converted into Argo pipeline configuration behind the scenes.

Argo pipelines use containers as steps and have a YAML definition for each step; the DSL package transforms your Python code into definitions that the Argo backend can use to run a workflow.
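For intuition, here's a simplified sketch of the kind of Argo template a ContainerOp compiles into. This is not the exact compiler output, and the image name and paths are illustrative:

```yaml
# Simplified sketch of an Argo workflow template generated from a ContainerOp.
# Field values are illustrative, not the exact compiler output.
templates:
  - name: fetch
    container:
      image: myregistry/fetch:latest
      command: [python3]
      args: ["/scripts/fetch_from_source.py", "--base_path", "/mnt/azure"]
```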

All of the concepts discussed above will be used in the next section when we start writing our pipeline steps and the code can be found at this GitHub repository.


Pipeline steps


The pipeline we'll build consists of four steps, each built to perform an independent task.

  • Fetch - get data from the Feast feature store into a persistent volume.
  • Train - use the training data to train the ML model and store the model in the persistent volume.
  • Export - move the model to an S3 bucket.
  • Serve - deploy the model for inference using KFServing.

Reminder to keep drinking water.

Fetch

The fetch step acts as a bridge between the feature store (or some other source of training data) and our pipeline code. While designing the code in this step, we should try to avoid any dependency on future steps. This ensures that we can switch feature-store implementations without affecting the code of the entire pipeline.
In the development phase, I have used a .csv file for sourcing the training data; later, when we switch to Feast, we won't have to make any changes to our application code.

The directory structure looks like the following:


  • fetch_from_source.py - the Python code that talks to the source of training data, downloads the data, processes it, and stores it in a directory inside a persistent volume.
  • userinput.csv (only in dev environment) - the file that serves as the input data. In production, it'll be redundant, as the input will be sourced through Feast. You can find that the default values of the arguments inside the code point to this location.
  • Dockerfile - the image definition for this step, which copies the Python code to the container, installs the dependencies, and runs the file.

Writing the pipeline step

# fetch data
operations['fetch'] = dsl.ContainerOp(
    name='fetch',
    image='insert image name:tag',
    command=['python3'],
    arguments=[
        '/scripts/fetch_from_source.py',
        '--base_path', persistent_volume_path,
        '--data', training_folder,
        '--target', training_dataset,
    ]
)

A few things to note:

  • The step is defined as a ContainerOp which allows you to specify an image for the pipeline step which is run as a container.
  • The file fetch_from_source.py will be run when this step is executed.
  • The arguments here include
    • base_path: The path to the volume mount.
    • data: The directory where the input training data is to be stored.
    • target: The filename to store good input data.

How does it work?

The code inside the fetch_from_source.py file first parses the arguments passed while running it.
It then defines the paths for retrieving and storing the input data.

base_path = Path(args.base_path).resolve(strict=False) 
data_path = base_path.joinpath(args.data).resolve(strict=False)
target_path = Path(data_path).resolve(strict=False).joinpath(args.target)
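For reference, the argument parsing that precedes these lines likely uses argparse. Here's a minimal, self-contained sketch; the defaults are illustrative placeholders, and the argument names mirror the ContainerOp definition:

```python
import argparse
from pathlib import Path

# Parse the arguments supplied in the ContainerOp definition;
# the default values here are illustrative placeholders
parser = argparse.ArgumentParser(description='fetch training data')
parser.add_argument('--base_path', default='/mnt/azure')
parser.add_argument('--data', default='data')
parser.add_argument('--target', default='train.csv')
args, _ = parser.parse_known_args()

# Resolve the output location on the mounted volume
base_path = Path(args.base_path).resolve(strict=False)
data_path = base_path.joinpath(args.data).resolve(strict=False)
target_path = Path(data_path).resolve(strict=False).joinpath(args.target)
```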

The target_path exists on the mounted volume. This ensures that the training data is visible to subsequent pipeline steps.
Next, the .csv file is read as a pandas DataFrame and, after some processing, stored at the target_path using the following line.

df.to_csv(target_path, index=False)

Train

The training step contains the code for our ML model. Generally, there will be a number of files hosting different parts of the solution and one file (say, the critical file) that calls all the other functions. Therefore, the first step in containerizing your ML code is to identify the dependencies between the components in your code.
In our case, the file central to training is run.py. It was created from the run.ipynb notebook because it is more straightforward to execute a plain file inside the container using terminal commands.

The directory structure looks like the following:


  • code - directory to host all your ML application files.
  • train (only in dev environment) - contains the file that serves as the input data. In production, it'll be redundant as the input will be sourced from the persistent volume. You can find that the default values of the arguments inside the code point to this location.
  • Dockerfile - the image definition for this step, which copies all the Python files to the container, installs the dependencies, and runs the critical file.

Writing the pipeline step

# train
operations['training'] = dsl.ContainerOp(
    name='training',
    image='insert image name:tag',
    command=['python3'],
    arguments=[
        '/scripts/code/run.py',
        '--base_path', persistent_volume_path,
        '--data', training_folder,
        '--outputs', model_folder,
    ]
)

A few things to note:

  • The step is defined as a ContainerOp which allows you to specify an image for the pipeline step which is run as a container.
  • The file run.py will be run when this step is executed.
  • The arguments here include
    • base_path: The path to the volume mount.
    • data: The directory where the input training data is stored. This is the same as the target_path from the previous step.
    • outputs: The directory to store the trained model in.

How does it work?

The code inside the run.py file first parses the arguments passed while running it.
It then defines the paths for retrieving and storing the input data. This is similar to the code in the previous step.

The first thing to do is retrieve input values from the data directory and then pass them to the training function in your code. A sample is shown below; the pandas DataFrame is built from the input .csv file.

df = pd.read_csv(data_file)

# populate input values for the model to train on
feature_1 = df['feature_1'].iloc[0]
...
...


These features are then passed to your ML code for training, which ensures a separation of concerns between your retrieval and model code. The model code can be developed independently by an ML engineer without worrying about how the features will be fetched.
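As a minimal, hypothetical illustration of that boundary (the real code would build and fit an actual model), the hand-off can be a plain function call:

```python
def train_model(features: dict) -> dict:
    """Hypothetical training entry point: it receives plain feature values
    and knows nothing about where they came from (CSV today, Feast later)."""
    # ... build and fit the model here; we only return a summary ...
    return {'n_features': len(features), 'status': 'trained'}

# The retrieval code stays on the other side of the boundary
features = {'feature_1': 0.42, 'feature_2': 1.7}  # e.g. parsed from the CSV
result = train_model(features)
```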

To make the model accessible to the "export" step later, we'll have to save it on the persistent volume. We can do that using TensorFlow's model-saving API.

def save(self, target_path):
    self.model.save(target_path)

where the path would be the output path constructed using the arguments supplied while executing.

target_path = Path(args.base_path).resolve(
    strict=False).joinpath(args.outputs)

if not os.path.exists(target_path):
    os.mkdir(target_path)
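One caveat worth noting: os.mkdir only creates the final path component and fails if intermediate directories are missing. os.makedirs with exist_ok=True is a more forgiving alternative; here's a quick sketch, rooted in a temporary directory for illustration:

```python
import os
import tempfile
from pathlib import Path

# Build a nested output path the same way the training step does,
# but rooted in a temp dir for this illustration
base_path = Path(tempfile.mkdtemp())
target_path = base_path.joinpath('outputs', 'model')  # nested on purpose

# makedirs creates intermediate directories and tolerates reruns
os.makedirs(target_path, exist_ok=True)
os.makedirs(target_path, exist_ok=True)  # second call is a harmless no-op
```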

Export

Once the model is trained and stored on the persistent volume, we can then export it to an S3 bucket.

You can use the MinIO service, which gets installed with Kubeflow, as S3 storage. Learn how to add Amazon S3 compatibility to Microsoft Azure Blob Storage here.

This step ensures that the model can later be made available to the public through an inference service. You may ask why we can't just use the existing storage; the answer lies in this file. The KFServing (explained later) component that we use for inference requires an "S3 or GCS compatible directory containing default model" for the model_uri parameter.

This step has a single goal of fetching the model from the disk and uploading it for the next step.

The directory structure looks like the following:

image

  • export_to_s3.py - the Python code that uses the Boto3 AWS SDK to connect to an S3 bucket and perform the upload.
  • model (only in dev environment) - the directory hosting the model to be uploaded. In production, this functionality will be achieved using a persistent volume at a path supplied to this step as input.
  • Dockerfile - the image definition for this step, which copies the Python code to the container, installs the dependencies, and runs the file.

Writing the pipeline step

# export model
operations['export'] = dsl.ContainerOp(
    name='export',
    image='insert image name:tag',
    command=['python3'],
    arguments=[
        '/scripts/export_to_s3.py',
        '--base_path', persistent_volume_path,
        '--model', 'model',
        '--s3_bucket', export_bucket
    ]
)

A few things to note:

  • The step is defined as a ContainerOp which allows you to specify an image for the pipeline step which is run as a container.
  • The file export_to_s3.py will be run when this step is executed.
  • The arguments here include
    • base_path: The path to the volume mount.
    • model: The directory where the model is stored on the persistent volume.
    • s3_bucket: The name of the bucket to upload the model to.

How does it work?

The code inside the export_to_s3.py file first parses the arguments passed while running it.
It then defines the path for retrieving the model. This will be used for specifying the source for the boto3 code below.

Uploading to a bucket

Boto3 is the AWS SDK for Python. It allows you to create, update, and delete AWS resources directly from your Python scripts. You can check out examples of using the SDK to access S3 resources here. The upload_file method is of interest to us. Following standard practice, we can write the following code to perform the upload operation.

s3.Bucket(args.s3_bucket).upload_file(
    Filename=str(model_file), Key='model_latest.h5')

Before we can run this step, we need to create the S3 client. The code below does that using boto3's resource function. You can look at the source code here and here to get an understanding of what parameters to supply and what the resource function returns.

s3 = boto3.resource('s3', region_name='us-east-1',
                    endpoint_url='http://minio-service.kubeflow:9000',
                    aws_access_key_id='AK...',
                    aws_secret_access_key='zt...')

You might notice that I've added the access key and the secret right into the code while initializing the s3 resource. This is not best practice, as it might lead to an accidental leak when you share your code online.
Thankfully, there are other ways of passing credentials to the boto3 client.
You can either use environment variables or create a file for storing your keys. Check out the docs here for more options.
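For instance, boto3 automatically reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment, so the keys can come from a Kubernetes Secret exposed as environment variables rather than from source code. A sketch (the placeholder key values are illustrative; the endpoint is the in-cluster MinIO service):

```python
import os

# In the cluster, these would be injected from a Kubernetes Secret;
# the values below are placeholders for illustration only
os.environ.setdefault('AWS_ACCESS_KEY_ID', 'minio')
os.environ.setdefault('AWS_SECRET_ACCESS_KEY', 'minio123')

s3_endpoint = 'http://minio-service.kubeflow:9000'

# boto3 then picks the credentials up without them appearing in code:
# import boto3
# s3 = boto3.resource('s3', region_name='us-east-1', endpoint_url=s3_endpoint)
```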

Once the upload is done, we can move to implement our final step, the serving of the model.

Serve

To make our model accessible to other applications and users, we need to set up an inference service. In other words, put our model in production. The following things need to be determined before we start implementing this step.

  • The framework to be used. Kubeflow has support for TensorFlow Serving and Seldon Core, among others. See here.

  • The inputs to the model. This will be used to construct a JSON payload to be used for scoring.

Kubeflow Pipelines comes with a pre-defined KFServing component, which can be imported from the GitHub repo and reused across pipelines without the need to define it every time. KFServing is Kubeflow's solution for "productionizing" your ML models and works with many frameworks, including TensorFlow, scikit-learn, and PyTorch.


The directory structure looks like the following:

image

  • kfserving-component.yaml - this is the KFServing component, imported from GitHub. Any customizations required can be made after reading through the comments.

Writing the pipeline step

kfserving = components.load_component_from_file(
    '/serve/kfserving-component.yaml')

operations['serving'] = kfserving(
    action="apply",
    default_model_uri=f"s3://{export_bucket}/model_latest.h5",
    model_name="demo_model",
    framework="tensorflow",
)

A few things to note:

  • The kfserving component is loaded using a function from the Python SDK.
  • The component has several configurable parameters:
    • default_model_uri: This is the address to the model, expressed as an S3 URI.
    • model_name: The name of the model.
    • framework: The framework used for developing the model; could be TensorFlow, PyTorch, and such.

How does it work?

When this code is executed, KFServing will create an inference service. A sample is shown below:

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sample-serve
  namespace: default
spec:
  default:
    predictor:
      serviceAccountName: serve-sa
      tensorflow:
        resources:
          requests:
            cpu: 1
            memory: 1Gi
        runtimeVersion: 1.x.x
        storageUri: s3://bucket/model

When the component is executed, you'll find the YAML definition of the InferenceService in the output logs. This is because the component file has the outputs parameter defined as below:

outputs:
  - {name: InferenceService Status, type: String, description: 'Status JSON output of InferenceService'}

The status of this resource will contain an HTTP URL, which can be accessed using the following command.

kubectl get inferenceservice <name of service> -n kfserving -o jsonpath='{.status.url}'

We can then POST requests to this endpoint to get back predictions. A sample request is shown below:

url = f"http://{model}.{NAMESPACE}.svc.cluster.local/v1/models/{model}:predict"

! curl -L $url -d@input.json
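The input.json posted above follows TensorFlow Serving's REST predict format: a JSON object with an "instances" list. Building it in Python might look like this (the feature names are hypothetical):

```python
import json

# TF Serving's REST API expects {"instances": [...]} for :predict calls
payload = {'instances': [{'feature_1': 0.42, 'feature_2': 1.7}]}

# Write out the file that the curl command above posts
with open('input.json', 'w') as f:
    json.dump(payload, f)
```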

Run the pipeline

Once the pipeline code is ready, we can compile it to generate a tar file, which can be uploaded to the Pipelines UI to create runs and experiments. You can also choose to run the pipeline code directly, using the SDK, from a Jupyter Notebook.
The code for that could be the following:

client = kfp.Client()
run_result = client.create_run_from_pipeline_func(
    pipeline_func=sample_pipeline,
    experiment_name='experiment_name',
    run_name='run_name',
    arguments={},  # a dict mapping pipeline parameter names to values
)

On running the pipeline, you can see all the steps listed out on a graph as they execute. The properties pertaining to each step can be seen in the sidebar.
You can find input/output values, metadata, information about the volumes used, the pods running your step, and other helpful information from the UI itself.

Conclusion

Congratulations on making it this far. I acknowledge that this turned out to be longer than most typical articles, but I couldn't bring myself to leave out any of the details that made their way into this article.

To recap, we understood why Kubeflow is an important tool to help people with less experience in distributed systems and Kubernetes deploy and manage their machine learning models.

We looked at how a pipeline can be designed, what things to keep in mind while containerizing your code into different steps, and how we can stitch all the different tasks into one single reusable pipeline. The complete pipeline code can be found at the root directory in my repository linked earlier.

In the next article in the series, we'll take our solution one step closer to production by integrating Feast into our pipeline. The directories we used in our pipeline so far can then be discarded, and all exchange of data will be handled by the persistent volume and Feast.

I hope you have a better idea after reading this piece on how to start building machine learning pipelines using some amazing open-source tools. Feel free to reach out to me if you have any questions 👋

LinkedIn - wjayesh
Twitter - wjayesh

