If you are using Managed MLflow in a Databricks workspace to train and save your models and can't figure out how to download and serve them outside the Databricks environment using Docker, you are in luck!
In this article, I will touch upon the following points:
- Downloading an MLflow model from the Databricks workspace model registry
- Packaging the downloaded model and serving it in a container using Docker
Downloading an MLflow model from the Databricks workspace
Databricks provides a managed version of MLflow for running our experiments in notebooks and registering models in the built-in MLflow model registry.
We'll use MLFlow's Python API to download a model.
To download a model from a Databricks workspace you need to do two things:
- Set the MLflow tracking URI to `databricks` using the Python API
- Set up Databricks authentication. I prefer authenticating by setting the following environment variables (shown below); you can also use the Databricks CLI to authenticate:
  - `DATABRICKS_HOST`
  - `DATABRICKS_TOKEN`
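Exporting both variables in a shell looks like this (the host URL and token below are placeholders, not real values):

```bash
# Workspace URL and a personal access token generated from
# User Settings -> Access Tokens in the Databricks UI
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
```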
Here's a basic code snippet to download a model from the Databricks workspace model registry:
```python
import os

import mlflow
from mlflow.store.artifact.models_artifact_repo import ModelsArtifactRepository

model_name = "example-model-name"
model_stage = "Staging"  # Should be either 'Staging' or 'Production'

# Point MLflow at the Databricks-managed tracking server;
# authentication is picked up from DATABRICKS_HOST / DATABRICKS_TOKEN
mlflow.set_tracking_uri("databricks")

# Download all artifacts of the registered model into ./model
os.makedirs("model", exist_ok=True)
local_path = ModelsArtifactRepository(
    f"models:/{model_name}/{model_stage}"
).download_artifacts("", dst_path="model")

print(f"{model_stage} model {model_name} is downloaded at {local_path}")
```
Running the above Python script will download the model into the `model` directory.
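Here's what that looks like, assuming the snippet is saved as `download_model.py` (a filename I've picked for this example); the exact contents of `model/` depend on the model flavor:

```bash
python download_model.py

# For a typical scikit-learn model, model/ now contains something like:
#   MLmodel       - MLflow model metadata
#   conda.yaml    - the conda environment used for serving
#   model.pkl     - the serialized model itself
ls model/
```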
Containerizing MLflow model serving with Docker
The next step is to package the downloaded model into a Docker image that serves the model when you run it.
Here's a basic Dockerfile to do the same:
```dockerfile
FROM continuumio/miniconda3

ENV MLFLOW_HOME /opt/mlflow
ENV MLFLOW_VERSION 1.12.1
ENV PORT 5000

RUN conda install -c conda-forge mlflow=${MLFLOW_VERSION}

COPY model/ ${MLFLOW_HOME}/model
WORKDIR ${MLFLOW_HOME}

# Build the model's conda environment at image build time,
# so the first request doesn't have to pay that cost
RUN mlflow models prepare-env -m ${MLFLOW_HOME}/model

# Serve as a non-root user that owns the MLflow home directory
RUN useradd -d ${MLFLOW_HOME} mlflow
RUN chown mlflow: ${MLFLOW_HOME}
USER mlflow

CMD mlflow models serve -m ${MLFLOW_HOME}/model --host 0.0.0.0 --port ${PORT}
```
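With the downloaded `model/` directory sitting next to the Dockerfile, building and running the image looks like this (`mlflow-model-serving` is just an example image name):

```bash
# Build the image from the directory containing the Dockerfile and model/
docker build -t mlflow-model-serving .

# Start the container and publish the serving port on the host
docker run --rm -p 5000:5000 mlflow-model-serving
```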
A few things to note from the Dockerfile:
- We use `continuumio/miniconda3` as the base image because MLflow by default uses conda to install the model's dependencies while preparing the serving environment
- We run the `mlflow models serve` process as a non-root `mlflow` user with limited permissions, which is the more secure option
- We set the host to `0.0.0.0` because the default, `127.0.0.1`, only listens inside the container and would not let us reach the web server started by MLflow from outside it
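Once the container is up, you can send prediction requests to the model server's `/invocations` endpoint. Here's a sketch of a request for an MLflow 1.x server using the pandas-split JSON format; the feature names and values are made up, so replace them with your model's actual input schema:

```bash
curl -X POST http://localhost:5000/invocations \
  -H "Content-Type: application/json; format=pandas-split" \
  -d '{"columns": ["feature_1", "feature_2"], "data": [[1.0, 2.0]]}'
```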
Wrapping up
This is a very simple, minimal example of how you can use Docker to serve an MLflow model trained in a Databricks workspace.
I hope this article saves you some time connecting the dots between the different sets of documentation.
So that is it, fellas. Thank you for reading the article. Wish you a great day. Peace out ✌.
References
- https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
- https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html
- https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication