DEV Community

martinbald81


ML Production: The Importance of Integrated Tools.

Machine learning projects often involve complex tasks such as data preprocessing, model building, training, evaluation, deployment, and monitoring. In addition, the roles tasked with ML projects, such as Data Scientists and ML Engineers, typically work with a collection of tools that, when it comes to production, do little to support successful deployment and sustainability of the ML solution. This is why it is crucial to have an ML production platform with an SDK that plugs into the tools and software already used in the pre-production stages, creating the connective tissue for the end-to-end ML process.

Machine Learning End to End Journey to Production

Fig 1. ML End To End Journey

Examples of the production stages are available at the following links: Model Deployment, Model Validation & Checking, and Model Monitoring & Observability.

At Wallaroo, our SDK was designed with data scientists in mind and incorporates direct feedback from our customers. It gives Data Scientists and ML Engineers a simple, secure, and scalable way to deploy models that fits into their ML ecosystem, so they can move ML models into production while using a development environment that is familiar to them. The Wallaroo SDK has a number of benefits:

Efficiency: Providing pre-written code and functions that save the time and effort of building ML solutions from scratch.

Consistency: Providing a set of tools and resources that ensure consistency across different applications and platforms. As a result, practitioners do not have to learn a new tool or process and can keep working with the software tools that they or their company have already invested in.

Performance: Helping to optimize the performance and scalability of ML models by leveraging features and capabilities of the underlying ML platform and framework, such as low-latency inference.

Support: Providing documentation, tutorials, and examples, as well as community support that can help data scientists and ML Engineers learn and troubleshoot ML models.

All of these benefits help to avoid costly delays in getting ML models into production, and they lower the costs associated with software ownership, retraining, and learning new tools.

The SDK Install Guides help you get plugged into production ML without leaving the familiar tools and software that you work in day to day. For example, if you use Azure Databricks to train models and to track training parameters and models with experiments, the Wallaroo SDK is especially powerful because it picks up where Databricks leaves off. You already have your connections to data stores, model registries, and repos, and these can be leveraged by the production deployment capabilities Wallaroo offers. This ensures a tight feedback loop, with the appropriate corrective and preventive actions across your training and production environments, as models start to present anomalies or drift.

Similarly, if you use Azure ML for model training and development, you can continue on to production ML by deploying models to Wallaroo through the Wallaroo SDK. With this integration, data scientists can easily upload their models and specify modeling pipelines via the Wallaroo SDK with just a few lines of Python, using the notebook environment that they are most comfortable with. This reduces change management overhead for production ML, leading to improved scale and repeatable production model operations.

The same applies to Data Scientists using AWS SageMaker and Google Vertex AI for model training and development: they can deploy models to Wallaroo through the Wallaroo SDK without leaving these familiar environments.
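As a rough illustration of what "a few lines of Python" might look like from a notebook, here is a minimal sketch. The method names (upload_model, configure, build_pipeline, add_model_step, deploy) are assumptions based on the Wallaroo SDK documentation for the 2023.1 release and may differ in other versions, so check them against the SDK Reference before relying on them. The helper name deploy_model, the model and pipeline names, and the file path are hypothetical.

```python
def deploy_model(client, model_name, model_path, pipeline_name):
    """Upload a model file and deploy it in a single-step pipeline.

    `client` is expected to be an already connected Wallaroo client.
    The method names used here are assumptions; verify them against
    the SDK Reference for your installed version.
    """
    # Upload the trained model artifact (for example, an ONNX file).
    model = client.upload_model(model_name, model_path).configure()
    # Create a pipeline and add the model as its only step.
    pipeline = client.build_pipeline(pipeline_name)
    pipeline.add_model_step(model)
    # Deploy the pipeline so that it can serve inference requests.
    pipeline.deploy()
    return pipeline


# Hypothetical usage (placeholders throughout; how the client is
# constructed depends on your install, see the SDK Install Guides):
# import wallaroo
# wl = wallaroo.Client()
# pipeline = deploy_model(wl, "my-model", "./model.onnx", "my-pipeline")
```

Passing the client in as an argument keeps the sketch independent of how you authenticate from Databricks, Azure ML, SageMaker, or Vertex AI.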

Finally, installing the Wallaroo SDK is straightforward and takes only a few minutes with the commands below, and you can use your own Jupyter Notebook environment.

Steps: To set up the Python virtual environment for use of the Wallaroo SDK:

1: From a terminal shell, create the Python virtual environment with conda. Replace wallaroosdk with the name of the virtual environment required by your organization. Note that Python 3.8.6 or above is required for the Python libraries used with the Wallaroo SDK. The following command installs the latest version of Python 3.8.

conda create -n wallaroosdk python=3.8

2: Activate the new environment.

conda activate wallaroosdk

3: Optional steps for those who want to use the Wallaroo SDK from within Jupyter and similar environments:

a: Install the ipykernel library. This allows the JupyterHub notebooks to access the Python virtual environment as a kernel, and it is required for the second part of this tutorial.


    conda install ipykernel


b: Install the new virtual environment as a python kernel.


    ipython kernel install --user --name=wallaroosdk

4: Install the Wallaroo SDK. This process may take several minutes while the other required Python libraries are added to the virtual environment.

pip install wallaroo==2023.1.0
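Once the install finishes, a quick sanity check can confirm that the environment meets the Python 3.8.6 requirement and that the wallaroo package is visible to the interpreter. This is a minimal sketch using only the Python standard library; the function name check_environment is hypothetical and is not part of the Wallaroo SDK.

```python
import sys
from importlib import metadata


def check_environment(package="wallaroo", min_python=(3, 8, 6)):
    """Report the Python version and the installed version of a package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": tuple(sys.version_info[:3]) >= min_python,
        "package": package,
        "package_version": installed,
    }


if __name__ == "__main__":
    print(check_environment())
```

Run it from the activated wallaroosdk environment, or from a notebook using the wallaroosdk kernel. If package_version comes back as None, the notebook is most likely running on a different kernel than the one the SDK was installed into.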

If you would like to try the Wallaroo SDK with any of the above environments, you can use the SDK Guides, along with the SDK Essentials and SDK Reference docs. You can also build your ML production skills with the free Wallaroo Community Edition.
