In this guide, you will learn how to quickly create a virtual development environment for your ML projects using DevPod, and how you can easily package and share your ML artifacts as a ModelKit using KitOps.
For machine learning (ML) engineers, inconsistent environments, dependency mismatches, and local hardware limitations are common bottlenecks. These issues not only delay progress but also hinder collaboration and onboarding: a new team member can lose hours, or even days, just replicating the team's environment before writing a single line of code. Is there any way out of this mess?
DevPod offers containerized, pre-configured environments that streamline onboarding, guarantee consistency, and support remote development, enabling teams to work more efficiently. It’s open source and infrastructure-independent, meaning it works with the stack your team already uses and prefers.
Similarly, ModelKits facilitate collaboration by letting developers create a single artifact for an ML project that is easily shared, versioned, managed, and deployed. With containerized storage and versioning, organizations can ensure everyone works with the latest models and configurations. The checksum associated with each version further strengthens the security of the application being developed, and KitOps' standards-based packaging format means your ModelKit is compatible with the tools your team has come to love and trust.
Installing and configuring DevPod
You can install and configure DevPod using its command-line tools, but to keep things simple, this guide uses the graphical user interface (GUI). To work with DevPod, you will need to connect it to a backend: a Kubernetes cluster, a cloud provider, or a Docker container. Since Docker is the easiest to install and set up, this guide connects DevPod to a Docker container.
- Download and install Docker.
- Download the DevPod executable for your operating system from the official site and install it.
- Open DevPod and click the + Create button to create a new workspace.
- In the dialog that appears, choose Docker as your provider and Python as the interpreter. If you have VSCode installed, you can choose VSCode as the default editor, or use VSCode in the browser by choosing VSCode Browser.
You can also provide your own container image, with all the necessary libraries pre-installed, via the **Enter Workspace Source** field. Developers can then start working directly without the hassle of installing common libraries and tools.
- Provide a meaningful name for your workspace in the Workspace Name section.
- Click the Create Workspace button to create the workspace.
Note: You will need to have Docker running.
- You should now see a new VSCode window or a webpage with VSCode.
You are now free to use this newly created workspace for your project: write scripts to load data, clean data, or train an ML model, just as you would in your local editor. Let's use the new workspace to build a wine quality classifier.
Train the wine classifier model
- Download the wine quality dataset from Kaggle and save it as `winequality.csv` inside the `dataset` folder in your workspace.
- Install `pandas` and `scikit-learn`, then freeze the requirements by executing the commands below:
pip install pandas scikit-learn
pip freeze > requirements.txt
- Create a new file, `train.py`, to train and save the final model. The file should contain the following code:
# Importing necessary libraries
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load the dataset
file_path = 'dataset/winequality.csv'  # Update with the correct file path if needed
df = pd.read_csv(file_path)

# Preprocessing: separate features and target
X = df.drop('quality', axis=1)
y = df['quality']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Save the model to disk (create the directory first if it doesn't exist)
os.makedirs('saved_model', exist_ok=True)
model_path = 'saved_model/wine_quality_model.pkl'
joblib.dump(model, model_path)
print(f"Model saved to {model_path}")
- Run the `train.py` script using `python train.py`. You should now see the final saved model in the `saved_model` directory.
At this point, your directory structure should look something like this:
.
├── dataset
│ └── winequality.csv
├── requirements.txt
├── saved_model
│ └── wine_quality_model.pkl
└── train.py
Packaging and sharing with ModelKit
Once your model is trained, the next step is packaging and sharing it. ModelKit simplifies this process by bundling code, datasets, and models into an OCI-compliant package that enforces consistency and security.
Key benefits of ModelKit
- **Version-controlled and secured packaging:** Combine all project artifacts into a single bundle with versioning and SHA checksums for integrity.
- **Seamless integration:** Works with OCI-compliant registries (e.g., Docker Hub and Jozu Hub) and integrates with popular tools like Hugging Face, ZenML, and Git.
- **Effortless dependency management:** Ship dependencies alongside code for hassle-free execution.
How to use ModelKit
Installing Kit is easy: download the package, extract it, and move the `kit` executable to a location on your system's PATH. In a Linux-based container image, you can achieve this by running the following commands:
wget https://github.com/jozu-ai/kitops/releases/latest/download/kitops-linux-x86_64.tar.gz
tar -xzvf kitops-linux-x86_64.tar.gz
sudo mv kit /usr/local/bin/
Verify your installation by running `kit version`. Your output should look something like this:
Version: 0.2.5-29dbdc4
Commit: 29dbdc48bf2b5f9ee801d6454974e0b8474e916b
Built: 2024-06-06T17:53:35Z
Go version: go1.21.6
Once you have installed Kit, you will need to write a Kitfile specifying the components of your project that should be packaged. Using any text editor, create a new file named `Kitfile` (without any extension) and enter the following details:
manifestVersion: "1.0"
package:
  name: Wine Classification
  version: 0.0.1
  authors: ["Bhuwan Bhatt"]
model:
  name: wine-classification-v1
  path: ./saved_model
  description: Wine classification using sklearn
datasets:
  - description: Dataset for the wine quality data
    name: training data
    path: ./dataset
code:
  - description: Code for training
    path: .
There are five major components in the Kitfile above:
- manifestVersion: Specifies the version for the Kitfile.
- package: Specifies the metadata for the package.
- model: Specifies the model details, which contain the model's name, its path, and human-readable description.
- datasets: Specifies the name, path, and human-readable description for each dataset, similar to model.
- code: Specifies the directory containing code that needs to be packaged.
Once the Kit command-line tools are installed and the Kitfile is ready, you will need to log in to a container registry. To log in to Docker Hub, use the command below:
kit login docker.io # Then enter details like username and password, password is hidden
You can then package the artifacts into a ModelKit using the following command:
kit pack . -t docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: `kit pack . -t docker.io/bhattbhuwan13/wine_classification:v1`
Finally, you can push the ModelKit to the remote hub:
kit push docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit push docker.io/bhattbhuwan13/wine_classification:v1
Now your fellow developers can pull individual components, or the entire package, from the ModelKit into their own DevPod instance with a single command. To unpack specific components from the ModelKit:
kit unpack --datasets docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack --datasets docker.io/bhattbhuwan13/wine_classification:v1
Or, they can unpack the entire ModelKit in their DevPod instance:
kit unpack docker.io/<USERNAME>/<CONTAINER_NAME>:<CONTAINER_TAG>
# Example: kit unpack docker.io/bhattbhuwan13/wine_classification:v1
At this point, your fellow developer can run the necessary tests to verify that the model or code works as expected. Once the tests pass, they can build a container with only the necessary components (scripts and the model artifact) and push it to the registry, triggering the deployment pipeline to deploy the latest model.
Why do these tools work so well?
By combining DevPod with KitOps ModelKits, you can realize three key benefits.
Consistency across teams: DevPod gives all team members identical environments, and because it can also launch workspaces on cloud providers or a Kubernetes cluster, teams can allocate more compute than individual machines provide. ModelKits standardize how projects are shared.
Improved collaboration: Share pre-configured DevPods and packaged ModelKits for seamless handoffs between teams. Organizations can create a container image with KitOps and the necessary ML libraries (pandas, scikit-learn, PyTorch, etc.) pre-installed, then use that image to launch new DevPod instances. This way, every developer gets the same environment and an easier time developing, collaborating, and sharing project artifacts, whether within a team or across cross-functional teams.
Enhanced efficiency: Reduce setup time and focus on innovation by automating dependencies and artifact sharing.
With these tools, you can streamline development, enhance collaboration, and accelerate deployment.
Explore our resources, join the conversation on Discord, or check out our guide to get started.