Fridolín Pokorný
How to beat Python’s pip: Inspecting the quality of machine learning software

Following the previous article on solving Python dependencies, we will take a look at software quality. This article covers "inspections" of software stacks and links to a free dataset available on Kaggle. Even though the title says the quality of "machine learning software", the principles and ideas can be reused for inspecting the quality of any software.


Application (Software & Hardware) Stack

Let’s consider a Python machine learning application. This application can use a machine learning library, such as TensorFlow. TensorFlow is in that case a direct dependency of the application; by installing it, the application uses TensorFlow directly and TensorFlow’s own dependencies indirectly. Examples of such indirect dependencies of our application are NumPy or absl-py, which are used by TensorFlow.
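To see this distinction in practice, you can list the direct dependencies a package declares using only the standard library (a quick check on Python 3.8+, assuming tensorflow is installed in the current environment):

```python
from importlib.metadata import requires

# Direct dependencies declared by the installed tensorflow distribution;
# each of these may in turn pull in its own (transitive) dependencies.
for requirement in requires("tensorflow") or []:
    print(requirement)
```

On a TensorFlow 2.x installation you will typically see entries such as absl-py and numpy among the results.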

Our machine learning Python application and all the Python libraries run on top of a Python interpreter in some specific version. Moreover, they can use additional native dependencies provided by the operating system, such as glibc, or CUDA if running computations on a GPU. To visualize this fact, let’s stack all the items that make up the application running on top of some hardware.

(Figure: the application stack, from the Python application and its libraries down through the Python interpreter and native dependencies to the hardware)
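The lower layers of this stack can be inspected from within the application itself. Here is a minimal sketch using only the standard library:

```python
import platform

# Python interpreter layer
print("Python:", platform.python_version())
print("Implementation:", platform.python_implementation())

# Operating system and native-dependency layer
print("libc:", platform.libc_ver())  # e.g. ('glibc', '2.28') on glibc systems
print("OS:", platform.platform())

# Hardware layer, as reported by the operating system
print("Machine:", platform.machine())
```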

Note that an issue in any of the described layers can cause our Python application to misbehave, produce wrong output, raise runtime errors, or simply not start at all.

Let’s try to identify possible issues in the described stack by building the software and running it on our hardware. By doing so we can spot issues before pushing our application to a production environment, or fine-tune the software so that we get the best out of our application on the hardware available.

On-demand software stack creation

If our application depends on a TensorFlow release starting with version 2.0.0 (e.g. requirements on the API offered by tensorflow>=2.0.0), we can test our application with different versions of TensorFlow, up to the 2.3.0 release that is the latest available on PyPI as of this writing. The same can be applied to transitive dependencies of TensorFlow, e.g. absl-py, NumPy, or any other. A version change of any transitive dependency can be performed analogously to any other dependency in our software stack.
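As a plain-pip illustration of this idea (the pinned versions below are just examples from the 2.0.0–2.3.0 range discussed above):

```
# requirements.txt - the version range specification our application declares
tensorflow>=2.0.0

# To test a concrete stack, pin one release at a time, e.g.:
#   pip install "tensorflow==2.1.0" && python -m pytest
#   pip install "tensorflow==2.3.0" && python -m pytest
```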

Dependency Monkey

Note that a single version change can completely change (or even invalidate) which dependencies, in which versions, will be present in the application stack, considering the dependency graph and the version range specifications of libraries present in the software stack. To create a pinned-down list of packages in specific versions to be installed, a resolver needs to be run to resolve packages and their version range requirements.
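Outside of Thoth, a familiar example of such a resolver run is pip-tools: it takes version range specifications as input and produces a fully pinned list. The exact pins below are illustrative, not real resolver output:

```
# requirements.in
tensorflow>=2.0.0

# $ pip-compile requirements.in
# produces a requirements.txt with every transitive dependency pinned, e.g.:
#   tensorflow==2.3.0
#   numpy==1.18.5        # via tensorflow
#   absl-py==0.9.0       # via tensorflow
```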

Do you remember the state space described in the first article of the "How to beat Python’s pip" series? Dependency Monkey can in fact create the state space of all the possible software stacks that can be resolved while respecting version range specifications. If the state space is too large to resolve in a reasonable time, it can be sampled.

(Figure: Dependency Monkey deriving different software stacks from the dependency graph)
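To make the state space idea concrete, here is a toy sketch (not Thoth’s actual implementation) that enumerates all version combinations for a few packages and samples from them when the space gets too large:

```python
import itertools
import random

# Toy candidate versions per package; a real resolver would also have to
# respect the version range requirements between the packages.
candidates = {
    "tensorflow": ["2.0.0", "2.1.0", "2.2.0", "2.3.0"],
    "numpy": ["1.17.5", "1.18.5"],
    "absl-py": ["0.8.1", "0.9.0"],
}

# The full state space: every combination of versions.
state_space = list(itertools.product(*candidates.values()))
print(f"{len(state_space)} possible stacks")  # 4 * 2 * 2 = 16

# If the space is too large, sample it instead of resolving everything.
for stack in random.sample(state_space, k=5):
    print(dict(zip(candidates.keys(), stack)))
```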

A component called "Dependency Monkey" is capable of creating different software stacks considering the dependency graph and the version specifications of packages in it. All of this is done offline, based on pre-computed results from Thoth’s solver runs (see the previous article from the "How to beat Python’s pip" series). The results of solver runs are synced into Thoth’s database so that they are available in a queryable form. Doing so enables Dependency Monkey to resolve software stacks at a fast pace (see a YouTube video on optimizing Thoth’s resolver). Moreover, the underlying algorithm can consider Python packages published on different Python package indices (besides PyPI, it can also use custom TensorFlow builds from an index such as the AICoE one). We will give a more in-depth explanation of Dependency Monkey in one of the upcoming articles. If you are too eager, feel free to browse its online documentation.
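For instance, with Pipenv a software stack can already mix multiple package sources today; the sketch below assumes a second index, and the AICoE index URL shown is only a placeholder (check the Thoth documentation for the real one):

```toml
# Pipfile - an assumed sketch with two package sources

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[[source]]
name = "aicoe"
# Hypothetical URL - substitute the actual AICoE index.
url = "https://example.com/aicoe-tensorflow/simple"
verify_ssl = true

[packages]
tensorflow = {version = ">=2.0.0", index = "aicoe"}
```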

Amun API

Now, let’s utilize a service called "Amun". This service was designed to accept a specification of the software stack and hardware, and to execute the application based on it.

Amun is an OpenShift-native application that utilizes OpenShift features (such as builds and the container image registry) and Argo Workflows to run the desired software on specific hardware using a specific software environment. The specification is accepted in JSON format and is subsequently translated into the respective steps that need to be done in order to build and run the given stack.
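The snippet below sketches what submitting such a specification could look like. Note that the endpoint path and the exact schema are illustrative assumptions, not the documented Amun API, so consult Amun’s documentation for the real interface:

```python
import requests

# A hypothetical inspection specification: which base image to build on,
# which packages to install, what hardware to request, and what to run.
specification = {
    "base": "registry.access.redhat.com/ubi8/python-38",  # assumed base image
    "python": {
        "requirements": ["tensorflow==2.3.0"],
    },
    "requests": {
        "hardware": {"cpu_family": 6},  # assumed hardware selector
    },
    "script": "python3 app.py",
}

# The endpoint path is an assumption for illustration only.
response = requests.post("https://amun.example.com/api/v1/inspect", json=specification)
response.raise_for_status()
print(response.json())
```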

The video linked above shows how Amun inspections are run and how the knowledge created is aggregated using OpenShift, Argo Workflows, and Ceph. You can see inspections of different TensorFlow builds: tensorflow, tensorflow-cpu, intel-tensorflow, and community builds of TensorFlow with AVX2 instruction set support, available on the AICoE index.

Thoth’s inspection dataset on Kaggle

We (Red Hat) have produced multiple inspections as part of Project Thoth, where we tested different TensorFlow releases and different TensorFlow builds.

One such dataset is Thoth’s performance dataset, version 1, on Kaggle. It consists of nearly 4,000 files capturing information about inspection runs of TensorFlow stacks. A notebook published together with the dataset can help you explore it.
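Here is a minimal sketch of loading the dataset for your own analysis, assuming the inspection results are stored as JSON documents (see the dataset’s notebook for the actual layout):

```python
import json
from pathlib import Path

import pandas as pd

# Path to the unpacked Kaggle dataset; adjust to your local copy.
dataset_dir = Path("thoth-performance-dataset")

records = []
for document in dataset_dir.rglob("*.json"):
    with open(document) as f:
        records.append(json.load(f))

# Flatten the nested inspection documents into a table for exploration.
df = pd.json_normalize(records)
print(df.shape)
```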

Project Thoth

Project Thoth is an application that aims to help Python developers. If you wish to stay updated on the improvements and progress we make in Project Thoth, feel free to subscribe to our YouTube channel, where we post updates as well as recordings from scrum demos. Follow us on Twitter as well if you want to be informed about what’s new.

Stay tuned for any updates!
