martinbald81

Deploying ML Models to Production Azure Databricks Integration with Wallaroo

As the world moves toward more data-driven decision making, especially with the advent of big data, ML, and AI, ML Operations (MLOps) has established itself as the discipline that makes data insights actionable.

Insights only create business value once they are actionable. To get there, data scientists and ML engineers collaborate, using shared tools and processes to integrate machine learning insights into core business operations and drive strategic business outcomes.

The Wallaroo production ML platform integrates with the existing tools in your ML ecosystem and slots seamlessly into your ML process, helping you achieve faster ROI on the AI-enabled initiatives driving your strategic business outcomes.

Businesses have invested in tools that help with data preparation and model development, yet often struggle to get those models into production. Azure Databricks is one such tool, used in solutions from BI to machine learning to process, store, clean, share, analyze, model, and monetize datasets. In machine learning, Azure Databricks can be used to train models and to track training parameters and models using experiments.

Wallaroo is especially powerful when paired with Databricks because it picks up where Databricks leaves off: your existing connections to data stores, model registries, and repos can be leveraged by the production deployment capabilities Wallaroo offers. This ensures a tight feedback loop between your training and production environments, with the appropriate corrective and preventive actions as models start to present anomalies or drift.

MLOps Lifecycle showing Wallaroo integration with Azure Databricks

In the figure above we see that in the MLOps life cycle, Databricks can be leveraged for loading and prepping data from your data sources and for developing ML models, while Wallaroo provides the production deployment, management, optimization, and observability capabilities that bring scale and efficiency to operationalizing your ML and moving your business initiatives forward. How does Wallaroo integrate with Azure Databricks? It does so by providing a unified platform for model upload, deployment, and inferencing, with anomaly detection and model drift observability. We will step through an example of this in this article.

Once you have a trained model that you want to put into production you can access the Wallaroo SDK from within an Azure Databricks notebook. In this example, we will be using a well-known Boston house pricing model.

We’ll start from the Azure Portal, and go into Azure Databricks:

Azure Portal

From here, select the Azure Databricks instance you want to use:

Azure Databricks in the Azure Portal

We’ll use our Wallaroo-Sales-Demo instance, so we select that and click “Launch Workspace”.

Launch Azure Databricks screen

This will open the Azure Databricks instance. The first time we use the Wallaroo SDK, it needs to be installed on the cluster. To do that, select Compute from the menu on the left side and select the cluster this instance will be using.

Compute Console in Azure Databricks

Once the cluster is selected, go to the Libraries tab and click “Install new”.

Adding the library to the cluster

In the pop-up, we select PyPI as the Library Source, fill in the package name as ‘wallaroo==2022.4.0’, and click Install.
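If you prefer not to attach the library at the cluster level, the same package can be installed from a notebook cell using the Databricks `%pip` magic (a per-notebook alternative; the version pin matches the one used above):

```shell
%pip install wallaroo==2022.4.0
```

Cluster-level installation makes the SDK available to every notebook on the cluster, while `%pip` scopes it to the current notebook session.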

Selecting PyPI as the Library Source and filling in the package

Now, we’ll want to open our notebook, so we’ll select Workspace from the left menu and, in this case, we will select wallaroo-anomaly-detection.

Open the Notebook in Azure Databricks

And this loads our notebook:

Model Deployment and Anomaly Detection Notebook

Once loaded, we need to import the required libraries, including Wallaroo’s, into the notebook itself.

Importing the required libraries

After that, we will connect to a Wallaroo instance, where all of the deployment, management, and observability will take place. Run the code block and click the URL that appears.

Connecting to the Wallaroo Instance

You will be asked to log in (or be automatically redirected if SSO is set up), then click Yes to grant Wallaroo the rights it needs to operate.

Login access granted button Yes No

You will see a successful login, and can close that tab.

Successful Login Screen

Once logged in, we need to create a Wallaroo workspace (like Azure Databricks, this is a collaboration space in which all of Wallaroo’s functionality exists).

Workspace creation

Now that we have the workspace created, you can upload your model which, in our example, is the house pricing model coming from an Azure Databricks repo we cloned from GitHub.

Upload ML Model

With the model uploaded, we create our pipeline (an inference workflow that can chain preprocessing, postprocessing, validation, and one or more model steps) which, in this example, contains our model and a validation step for the output.
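Conceptually, a pipeline is an ordered chain of steps applied to each input, with the validation step flagging out-of-range outputs as anomalies. The pure-Python sketch below illustrates that idea only; it is not the Wallaroo SDK API, and the toy "model", thresholds, and step names are invented for illustration:

```python
# Minimal illustration of a pipeline with a model step followed by a
# validation step. Not the Wallaroo API -- names and the toy "model"
# are invented for this sketch.

def model_step(features):
    # Toy stand-in for the house pricing model: predicted price
    # in thousands of dollars.
    return 30.0 * features["rooms"] + 2.0 * features["sqft_x100"]

def validation_step(price):
    # Flag predictions outside an assumed expected range as anomalies.
    return {"price": price, "anomaly": not (50.0 <= price <= 1500.0)}

def pipeline(features):
    # Run the steps in order, as pipeline steps are chained together.
    return validation_step(model_step(features))

# A typical input passes validation:
ok = pipeline({"rooms": 6, "sqft_x100": 18})
# An extreme input fails validation and raises the anomaly flag:
bad = pipeline({"rooms": 40, "sqft_x100": 200})
```

In the real pipeline, the model step is the uploaded house pricing model and the validation expression runs against every inference result automatically.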

Building the pipeline and validation

With the pipeline configured, we can run a test inference to check that things are working as expected, both passing and failing validation.

Test Inference

We can also run multiple test inferences against a large data set.
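Running many inferences simply applies the same validation check row by row, so the failing rows can be counted and pulled out for inspection. A self-contained sketch with synthetic predictions (both the threshold and the data are illustrative, not from the article's dataset):

```python
# Count validation failures across a batch of predictions.
# Synthetic prices in dollars -- illustrative only.
predictions = [412_000, 385_000, 3_550_000, 298_000, 3_600_000, 510_000]

THRESHOLD = 1_500_000  # assumed upper bound for a "normal" prediction

# Flag each prediction, then collect the anomalous ones.
flags = [p > THRESHOLD for p in predictions]
anomalies = [p for p, flagged in zip(predictions, flags) if flagged]

print(f"{len(anomalies)} of {len(predictions)} predictions failed validation")
```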

Multiple Test Inference

In our case, we are looking to identify anomalies in the house price predictions against expected results so that we can decide on preventive or corrective actions to address them. We visualize the data as a distribution in order to understand the frequency of our anomalies.

Anomaly Output Chart

From the distribution chart above we can see that there are some house pricing anomalies in the $3.5 million range.
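The idea behind the distribution chart can be reproduced with a quick histogram over the inference outputs. A sketch using NumPy with synthetic data (the counts, means, and outlier values are invented to mimic the shape of the chart above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic predictions: mostly typical prices, plus a few
# outliers near $3.5M like the ones in the chart above.
typical = rng.normal(500_000, 120_000, size=995)
outliers = rng.normal(3_500_000, 50_000, size=5)
prices = np.concatenate([typical, outliers])

# Bucket the predictions to see where the mass of the distribution
# sits and where the anomalous tail is.
counts, edges = np.histogram(prices, bins=10)
for count, left in zip(counts, edges[:-1]):
    print(f"${left / 1e6:4.1f}M+ : {count}")
```

The near-empty bins between the main cluster and the tail are what make the anomalies easy to spot visually.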

Apart from visualization, we can also view anomalies in the inference logs.

Anomaly Output Table

As a general environment cleanliness step, we like to undeploy the pipeline, which returns the resources back to the Wallaroo instance and helps reduce unnecessary cloud costs.

Undeploying the pipeline

From the example above we have seen that integrating Wallaroo with Azure Databricks gives AI and ML practitioners and teams easy, end-to-end MLOps capabilities, from testing and model development through to repeatable production model deployment, management, and observability. The process scales as the needs of the business grow, works with existing and familiar ML tools, reduces change-management overhead, and helps the business realize the value of its data sooner.

You can learn and get hands-on experience with the example above as well as other ML use cases with our free Wallaroo Community Edition, Tutorials, and YouTube channel.
