DEV Community

parmarjatin4911@gmail.com


How to Integrate Comet with Catboost Workflows



Catboost is one of the most versatile gradient-boosting libraries. Its key capability is handling categorical data without requiring you to convert it to numerical data first: you only need to tell the model which features are categorical. An added benefit is that a data practitioner can quickly establish a baseline with minimal data transformations.

Unfortunately, as you scroll through Comet’s supported libraries, you will discover the glaring lack of Catboost support. Fortunately, I have a clever but straightforward workaround leveraging Comet’s versatile capabilities.

Comet’s support for TensorFlow’s Tensorboard can be a saving grace for anyone who wants to use Catboost with Comet. Catboost records information about each training run in a form Tensorboard can read and stores it locally on the machine’s disk. Let’s look at the general workflow for this integration.

Requirements

You will need the following before you start:

  1. Catboost.

  2. Comet’s official library.

  3. A Comet account, which you can get by signing up on the Comet website.

  4. Visual Studio Code (or any IDE that supports Tensorboard Integration).
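Before going further, a quick check can confirm that the Python packages are installed. This is a minimal sketch using only the standard library; the module list is an assumption based on the imports used later in this article (catboost, comet_ml, sklearn, pandas, numpy), and the pip package names can differ slightly from the module names (for example, the scikit-learn package provides sklearn).

```python
import importlib.util

# Module names assumed from the imports used later in this article.
REQUIRED = ("catboost", "comet_ml", "sklearn", "pandas", "numpy")


def check_requirements(modules=REQUIRED):
    # Map each module name to True if Python can locate it, without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'missing, install it with pip'}")
```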

With the above, it is now possible to proceed.

Catboost Integration with Tensorboard

Catboost integrates with Tensorboard so that we can visualize training runs. Tensorboard plots the metric specified for the training run at each iteration and lets you customize the graphs. The picture below is an example of what I mean.

An added advantage is that Catboost stores this information in a local directory, “catboost_info” by default, or in a directory you specify with the train_dir parameter before the training run, as seen below.

Catboost directory by author

In this article, we use this feature to our advantage: Comet lets us upload this information and build a custom panel that gives a clear view of the data. Although Comet does not support Catboost directly, its support for TensorFlow’s Tensorboard works in our favor, because we can take the training logs Catboost writes to disk and upload them through an existing Comet function.

Let’s code!
Simple Project

This simple project focuses only on the capabilities that Catboost, Tensorboard, and Comet offer. All data transformation has already been done, so a reasonably clean dataset will be fed into the model.

The dataset of our choice is the Bengaluru House Prediction Dataset from MachineHack, a competition that tests your skills on regression problems. We can now check our preprocessed data with Pandas to get a general overview of it.

import pandas as pd

# reading data in local directory
df = pd.read_csv("preprocessed_train_data.csv")

# defining features (X) and target (y)
X = df.drop(["price"], axis=1)
y = df["price"]

# printing the first 5 rows of X and y
print(X.head())
print(y.head())

X data by author
y data by author

Now that we have seen our features (including one categorical feature), we can bring Comet into the workflow.
Comet Incorporation

Note: Import the Comet library and initialize the project before the rest of the machine learning code. Comet recommends importing comet_ml before other machine learning libraries so that it can hook into them.

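First, import the Comet library, initialize our project under the name “catboost_comet,” and create the experiment object that we will use later to log files.

import comet_ml
comet_ml.init(project_name="catboost_comet")
experiment = comet_ml.Experiment(project_name="catboost_comet")  # used later by log_table() and end()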

We then define our categorical features and perform train-validation splits for the training of our model. Catboost requires a user to specify the categorical features that a dataset has.

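from sklearn.model_selection import train_test_split
import numpy as np

# defining categorical features: indices of all columns that are not float
# (np.float was removed in NumPy 1.24, so compare against np.float64)
categorical_features_indices = np.where(X.dtypes != np.float64)[0]

# train-validation split: 80% training, 20% validation
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.8, random_state=12)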
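The rule above marks every column whose dtype is not float64 as categorical, so integer columns are included too. The sketch below uses a hypothetical toy frame (the column names and values are made up for illustration) to compare that rule with a stricter one that selects only non-numeric columns; which one fits depends on how the preprocessed data encodes its features.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a text column, an integer column and a float column.
X_toy = pd.DataFrame({
    "location": ["Whitefield", "Hebbal", "Whitefield"],
    "bath": [2, 3, 2],
    "total_sqft": [1200.0, 1500.5, 980.0],
})

# Rule used in the article: every column whose dtype is not float64.
non_float = np.where(X_toy.dtypes != np.float64)[0].tolist()

# Stricter alternative: only columns whose dtype is not numeric.
non_numeric = [
    i for i, dtype in enumerate(X_toy.dtypes)
    if not pd.api.types.is_numeric_dtype(dtype)
]

print(non_float)    # [0, 1]
print(non_numeric)  # [0]
```

Either list can be passed as cat_features; the difference is only whether integer columns such as the toy "bath" are treated as categories.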

Now, we can feed the above information into Catboost’s regressor and perform training.

from catboost import CatBoostRegressor

# defining model params
model = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE', early_stopping_rounds=5)

# training model on data; plot=True draws an interactive training chart in Jupyter notebooks
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_validation, y_validation), plot=True)

# performing inference on the validation set
y_valid = model.predict(X_validation)

Training run and results by author
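Since the model is trained with loss_function='RMSE', it can help to see what that metric computes. This is a minimal standard-library sketch; the rmse helper is hypothetical (it is not part of Catboost), and the numbers in the example call are arbitrary illustration values.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error: sqrt(mean((y_true - y_pred) ** 2)).
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


print(round(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]), 4))  # 1.291
```

With this article's variables, rmse(y_validation.tolist(), y_valid.tolist()) would give the RMSE of the validation predictions made above.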

After this run, Catboost stores information about the training run in a folder that Tensorboard can visualize. This folder sits within the project’s directory, and two files in it contain the critical information we need.
Local directory by author
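Before uploading, a small check that both files exist can save a confusing empty panel later. This is a sketch using pathlib; find_catboost_logs is a hypothetical helper, the default folder name is Catboost’s “catboost_info”, and the “./src/catboost_info” path only mirrors the one used in this article’s logging code, so adjust it to your own project layout.

```python
from pathlib import Path

# The two log files this article uploads to Comet.
LOG_FILES = ("learn_error.tsv", "test_error.tsv")


def find_catboost_logs(train_dir="catboost_info"):
    # Return {file name: Path} for each expected log file present in train_dir.
    directory = Path(train_dir)
    return {name: directory / name for name in LOG_FILES if (directory / name).is_file()}


found = find_catboost_logs("./src/catboost_info")
missing = [name for name in LOG_FILES if name not in found]
if missing:
    print("Not found yet:", ", ".join(missing))
```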

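Within the “catboost_info” directory, we will find the “learn_error.tsv” and “test_error.tsv” files, which hold the metric values per iteration on the training data and on the evaluation set, respectively. We will log these files into Comet using “log_table().”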
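To inspect a log file locally before uploading it, it can be parsed with the standard csv module. This sketch assumes each file has a header row with “iter” followed by one column per metric (RMSE here) and tab-separated values below it; read_catboost_tsv and best_iteration are hypothetical helpers, and the sample text is made up only to demonstrate the parsing.

```python
import csv
import io


def read_catboost_tsv(f):
    # Parse a tab-separated Catboost error log into a list of dicts.
    # Assumes the header is "iter" followed by one column per metric.
    rows = []
    for row in csv.DictReader(f, delimiter="\t"):
        rows.append({k: int(v) if k == "iter" else float(v) for k, v in row.items()})
    return rows


def best_iteration(rows, metric="RMSE"):
    # Row with the lowest value of `metric` (lower is better for RMSE).
    return min(rows, key=lambda r: r[metric])


# Hypothetical content, only to demonstrate parsing; real values come from training.
sample = "iter\tRMSE\n0\t10.5\n1\t9.8\n2\t9.9\n"
rows = read_catboost_tsv(io.StringIO(sample))
print(best_iteration(rows))  # {'iter': 1, 'RMSE': 9.8}

# Against a real file written by the training run:
# with open("./src/catboost_info/test_error.tsv", newline="") as f:
#     print(best_iteration(read_catboost_tsv(f)))
```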

# logging both tables to Comet (paths assume catboost_info was created under ./src)
experiment.log_table("./src/catboost_info/learn_error.tsv")
experiment.log_table("./src/catboost_info/test_error.tsv")

Once those are complete, we can end the experiment and open Comet to see whether it was successful.

# ending the experiment
experiment.end()

Comet Visualization

On the Comet website, we find and open our “catboost_comet” project.
Project image by author
Screenshot of project by author

Pressing the prominent blue button in the middle of the page opens the menu below, where we pick “Data Panel.”
Comet Menu by author

Once the Data Panel opens, a drop-down menu called “Data Selection” lists the two tables we uploaded earlier.
Screenshot by author

Picking the first table shows a preview of its data. After we confirm the selection, the table appears as a panel on the original page, which previously had no panels, as seen below.
Screenshot by author.

Pressing “Add” in the top right corner offers the option to add another panel, and we can repeat the process for the other table we logged.

We have now logged information about a Catboost training run to Comet by building on Comet’s support for Tensorboard. In this tutorial, we have integrated Comet with a library that it does not officially support.
