In this guide, you will learn how to quickly create a virtual development environment for your ML projects using DevPod, and how you can easily package and share your ML artifacts as a ModelKit using KitOps.
For machine learning (ML) engineers, inconsistent environments, dependency mismatches, and local hardware limitations are common bottlenecks. These issues not only delay progress but also hinder collaboration and onboarding: a new team member can lose hours, or even days, just replicating the team's environment before writing a single line of code. Is there any way out of this mess?
DevPod offers containerized, pre-configured environments that streamline onboarding, guarantee consistency, and support remote development, enabling teams to work more efficiently. It’s open source and infrastructure-independent, meaning it works with the stack your team already uses and prefers.
Similarly, ModelKits facilitate collaboration by letting developers create a single artifact for an ML project that is easily shared, versioned, managed, and deployed. With containerized storage and versioning, organizations can ensure everyone works with the latest models and configurations. The checksum associated with each version further strengthens the security of the application being developed, and KitOps' standards-based packaging format means your ModelKit is compatible with the tools your team has come to love and trust.
Installing and configuring DevPod
You can install and configure DevPod using its command-line tools, but to keep things simple, this guide uses the graphical user interface (GUI). To work with DevPod, you will need to connect it to a backend: a Kubernetes cluster, a cloud provider, or a Docker container. Since Docker is the easiest to install and set up, this guide connects DevPod to a Docker container.
- Download and install Docker.
- Download the DevPod executable for your operating system from the official site and install it.
- Open DevPod and click the + Create button to create a new workspace.
- In the dialog that appears, choose Docker as your provider and Python as the interpreter. If you have VSCode installed, you can choose VSCode as the default editor, or use VSCode in the browser by choosing VSCode Browser.
You can also provide your own container image, with all the necessary libraries pre-installed, via the **Enter Workspace Source** field. Developers can then start working directly without the hassle of installing common libraries and tools.
- Provide a meaningful name for your workspace in the Workspace Name section.
- Click the Create Workspace button to create the workspace.
Note: You will need to have Docker running.
- You should now see a new VSCode window or a webpage with VSCode.
You are now free to use this newly created workspace for your project: write scripts to load data, clean data, or train an ML model, just as you would in your local editor. Let's use the new workspace to build a wine quality classifier.
Train the wine classifier model
- Download the wine quality dataset from Kaggle and save it as `winequality.csv` inside the `dataset` folder in your workspace.
- Install `pandas` and `scikit-learn`, then freeze the requirements by executing the commands below:
pip install pandas scikit-learn
pip freeze > requirements.txt
- Create a new file, `train.py`, to train and save the final model. The file should contain the following code:
# Importing necessary libraries
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load the dataset
file_path = 'dataset/winequality.csv'  # Update with the correct file path if needed
df = pd.read_csv(file_path)

# Preprocessing: separate features and target
X = df.drop('quality', axis=1)
y = df['quality']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Save the model to disk (create the directory first if it doesn't exist)
os.makedirs('saved_model', exist_ok=True)
model_path = 'saved_model/wine_quality_model.pkl'
joblib.dump(model, model_path)
print(f"Model saved to {model_path}")
- Run the `train.py` script using `python train.py`. You should now see the final saved model in the `saved_model` directory.
At this point, your directory structure should look something like this:
.
├── dataset
│ └── winequality.csv
├── requirements.txt
├── saved_model
│ └── wine_quality_model.pkl
└── train.py
Packaging and sharing with ModelKit
Once your model is trained, the next step is packaging and sharing it. ModelKit simplifies this process by bundling code, datasets, and models into an OCI-compliant package that enforces consistency and security.
Key benefits of ModelKit
- **Version-controlled and secured packaging:** Combine all project artifacts into a single bundle with versioning and SHA checksums for integrity.
- **Seamless integration:** Works with OCI-compliant registries (e.g., Docker Hub and Jozu Hub) and integrates with popular tools like Hugging Face, ZenML, and Git.
- **Effortless dependency management:** Ship dependencies alongside code for hassle-free execution.
How to use ModelKit
Installing Kit is easy: download the package, extract it, and move the `kit` executable to a location on your system's PATH. In a Linux-based container image, you can achieve this by running the following commands:
wget https://github.com/jozu-ai/kitops/releases/latest/download/kitops-linux-x86_64.tar.gz
tar -xzvf kitops-linux-x86_64.tar.gz
sudo mv kit /usr/local/bin/
Verify your installation by running `kit version`. Your output should look something like this:
Version: 0.2.5-29dbdc4
Commit: 29dbdc48bf2b5f9ee801d6454974e0b8474e916b
Built: 2024-06-06T17:53:35Z
Go version: go1.21.6
Once you have installed Kit, you will need to write a Kitfile specifying the components of your project that should be packaged. Using any text editor, create a new file named `Kitfile` (without any extension) and enter the following details:
manifestVersion: "1.0"
package:
  name: Wine Classification
  version: 0.0.1
  authors: ["Bhuwan Bhatt"]
model:
  name: wine-classification-v1
  path: ./saved_model
  description: Wine classification using sklearn
datasets:
  - description: Dataset for the wine quality data
    name: training data
    path: ./dataset
code:
  - description: Code for training
    path: .
There are five major components in the Kitfile above:
- manifestVersion: Specifies the version for the Kitfile.
- package: Specifies the metadata for the package.
- model: Specifies the model details, which contain the model's name, its path, and human-readable description.
- datasets: Specifies the name, path, and human-readable description for each dataset, similar to model.
- code: Specifies the directory containing code that needs to be packaged.
Once the Kit command-line tools are installed and the Kitfile is ready, you will need to log in to a container registry. To log in to Docker Hub, use the command below:
kit login docker.io # Then enter details like username and password, password is hidden
You can then package the artifacts into a ModelKit using the following command:
kit pack . -t docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: `kit pack . -t docker.io/bhattbhuwan13/wine_classification:v1`
Finally, you can push the ModelKit to the remote hub:
kit push docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit push docker.io/bhattbhuwan13/wine_classification:v1
Now your fellow developers can pull individual components, or the entire package, from the ModelKit into their own DevPod instance with a single command. To unpack specific components from the ModelKit:
kit unpack --datasets docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack --datasets docker.io/bhattbhuwan13/wine_classification:v1
Or, they can unpack the entire ModelKit in their DevPod instance:
kit unpack docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack docker.io/bhattbhuwan13/wine_classification:v1
At this point, your fellow developer can run the necessary tests to verify that the model or code works as expected. Once the tests pass, they can build a container with only the necessary components (scripts and the model artifact) and push it to the registry, triggering the deployment pipeline to deploy the latest model.
Why do these tools work so well?
By combining DevPod with KitOps ModelKits, you can realize three key benefits.
Consistency across teams: DevPod gives all team members identical environments, and because it can also launch workspaces on cloud providers or a Kubernetes cluster, teams can allocate more compute than individual machines provide. ModelKits standardize how projects are shared.
Improved collaboration: Share pre-configured DevPods and packaged ModelKits for seamless handoffs between teams. Organizations can create a container image with KitOps and the necessary ML libraries (pandas, scikit-learn, PyTorch, etc.) pre-installed, then use that image to launch new DevPod instances. This way, every developer gets the same environment and an easier time developing, collaborating, and sharing project artifacts, whether within a team or across cross-functional teams.
Enhanced efficiency: Reduce setup time and focus on innovation by automating dependencies and artifact sharing.
With these tools, you can streamline development, enhance collaboration, and accelerate deployment.
Explore our resources, join the conversation on Discord, or check out our guide to get started.