How to Integrate Comet with CatBoost Workflows
CatBoost is one of the most versatile gradient-boosting libraries. Its key capability is handling categorical data natively: once you tell the model which features are categorical, it encodes them internally, so you do not have to convert them to numbers yourself. An added benefit is that a data practitioner can quickly establish a baseline with minimal data transformations.
Unfortunately, if you scroll through Comet’s list of supported libraries, you will notice that CatBoost is missing. Fortunately, I have a straightforward workaround that leverages Comet’s existing capabilities.
Comet’s support for TensorFlow’s TensorBoard can be a saving grace for anyone who wants to use CatBoost with Comet. CatBoost records each training run in a form that TensorBoard can read and stores that information locally on the machine’s disk. Let’s look at the general workflow for this integration.
Requirements
There are a few things you need before you start:
CatBoost.
Comet’s official library.
A Comet account that you can get by signing up here.
Visual Studio Code (or any IDE that supports TensorBoard integration).
With the above, it is now possible to proceed.
CatBoost Integration with TensorBoard
CatBoost integrates with TensorBoard so that we can visualize training runs. TensorBoard graphs the metric specified for the training run and lets you customize the view. The picture below shows an example.
An added advantage is that CatBoost stores this information on disk, either in its default local directory (“catboost_info”) or in a directory you specify before the training run, as seen below.
Catboost directory by author
In this article, we use this feature to our advantage: Comet lets us upload this information and build a custom panel that gives a clear view of the data. Although Comet does not support CatBoost directly, we can take the files that CatBoost writes to disk for TensorBoard and upload the critical information through an existing Comet function.
Let’s code!
Simple Project
Our simple project focuses only on the capabilities that CatBoost, TensorBoard, and Comet offer. All data transformation has already been done, so we have a reasonably clean dataset to feed into the model.
The dataset of our choice is the Bengaluru House Prediction Dataset from MachineHack, a competition that tests your skills on regression problems. We can now load our preprocessed data with pandas to get a general overview of it.
import pandas as pd
# reading data in local directory
df = pd.read_csv("preprocessed_train_data.csv")
# defining features (X) and targets (y)
X = df.drop(["price"], axis=1)
y = df["price"]
# printing the top 5 rows of X and y
print(X.head())
print(y.head())
X data by author
y data by author
Now that we have seen our myriad of features (including one categorical one), we can incorporate this into Comet.
Comet Incorporation
Note: Import the Comet library and initialize the project before any of the machine-learning code below, so that Comet is set up before training starts.
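First, import the Comet library, initialize our project under the name “catboost_comet,” and create the experiment object that the later logging calls use.
import comet_ml
# initializing the project and creating the experiment used for logging later
comet_ml.init(project_name="catboost_comet")
experiment = comet_ml.Experiment(project_name="catboost_comet")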
We then define our categorical features and perform a train-validation split for training our model. CatBoost requires the user to specify which features in a dataset are categorical.
from sklearn.model_selection import train_test_split
import numpy as np
# defining categorical features (every column whose dtype is not float);
# np.float was removed from recent NumPy releases, so the built-in float is used
categorical_features_indices = np.where(X.dtypes != float)[0]
# train-validation split
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.8, random_state=12)
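As a quick sanity check on the categorical-feature step, here is a minimal sketch on a toy frame. The column names and values are made up for illustration and are not taken from the competition data. It shows that the dtype test flags every non-float column, integers included, and contrasts it with a stricter variant (an alternative I am suggesting, not part of the original workflow) that keeps only non-numeric columns.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy data: hypothetical column names, used only to illustrate the dtype check.
toy = pd.DataFrame(
    {
        "total_sqft": [1056.0, 2600.0, 1440.0],               # float column
        "bath": [2, 5, 2],                                    # integer column
        "location": ["Whitefield", "Hebbal", "Uttarahalli"],  # text column
    }
)

# Same idea as in the article: every column whose dtype is not float.
loose = np.where(toy.dtypes != float)[0].tolist()

# Stricter alternative: only columns that are not numeric at all.
strict = [i for i, dtype in enumerate(toy.dtypes) if not is_numeric_dtype(dtype)]

print("non-float columns:  ", [toy.columns[i] for i in loose])
print("non-numeric columns:", [toy.columns[i] for i in strict])
```

If your preprocessed data contains integer columns that are really numeric, the stricter list is probably the one you want to pass as the categorical features.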
Now, we can feed the above information into CatBoost’s regressor and perform training.
from catboost import CatBoostRegressor
# defining model params
model = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE', early_stopping_rounds=5)
# training model on data
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_validation, y_validation), plot=True)
# performing inference
y_valid = model.predict(X_validation)
Training run and results by author
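Since the model optimizes RMSE, it can help to compute the same metric on the validation predictions yourself. The helper below is a plain-Python sketch; the function name `rmse` is my own and is not part of CatBoost or Comet.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences of numbers."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    print(rmse([100.0, 150.0, 200.0], [110.0, 140.0, 205.0]))
```

With the variables from the training step, the call would be `rmse(y_validation, y_valid)`.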
After this run, information on the training run should be stored in a folder that TensorBoard can read for visualization. This folder sits within the project’s directory. From it, we can extract two files containing the critical information we need.
Local directory by author
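If you want to confirm what the run wrote to disk before uploading anything, a small sketch like the following lists every file under the training directory. The helper name `list_run_files` is hypothetical, and the default folder name assumes you did not change CatBoost’s `train_dir` setting.

```python
from pathlib import Path


def list_run_files(train_dir="catboost_info"):
    """Return the relative paths of all files under a training directory, sorted."""
    root = Path(train_dir)
    if not root.is_dir():
        # Nothing has been written yet, or the path is wrong.
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    for name in list_run_files():
        print(name)
```

An empty result usually means the script was run from a different working directory than the one the model was trained in.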
Within the “catboost_info” directory, we will find the “learn_error.tsv” and “test_error.tsv” files. We shall log these files into Comet using “log_table().”
# logging both tables to Comet
experiment.log_table("./src/catboost_info/learn_error.tsv")
experiment.log_table("./src/catboost_info/test_error.tsv")
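Uploading the tables is enough for the Data Panel used in this article. As an optional extra, the same files can be replayed into Comet’s built-in charts by logging each row as a metric. This is a sketch under the assumption that each file starts with a header row whose first column is named `iter`, followed by one column per metric; check your own files before relying on it. The helper name `log_error_curve` is hypothetical, and the only Comet call it uses is `log_metric`.

```python
import csv


def log_error_curve(experiment, tsv_path, prefix):
    """Log each metric column of an error TSV as a per-iteration metric.

    Assumes a tab-separated file with a header row and an "iter" column.
    Returns the number of metric values that were logged.
    """
    logged = 0
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            step = int(row["iter"])
            for name, value in row.items():
                if name == "iter":
                    continue
                # e.g. "learn_RMSE" at step 0, 1, 2, ...
                experiment.log_metric(f"{prefix}_{name}", float(value), step=step)
                logged += 1
    return logged
```

Before ending the experiment, it could be called as `log_error_curve(experiment, "./src/catboost_info/learn_error.tsv", "learn")`, and likewise for the test file with the prefix `"test"`.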
Once those are complete, we can end the experiment and open Comet to see whether it was successful.
# ending experiment
experiment.end()
Comet Visualization
Once we open Comet, we find our project on the projects page and open it.
Project image by author
Screenshot of project by author
After pressing the prominent blue button in the middle, you will get the menu below, where you pick “Data Panel.”
Comet Menu by author
After opening that menu, we will find a drop-down menu called “Data Selection,” where we will see the two tables we initially uploaded.
Screenshot by author
We then pick the first table, which gives a preview of the data. After we confirm the selection, the table appears as a panel on the original page, which previously had no panels, as seen below.
Screenshot by author.
When we press “Add” in the top right corner, we see the option to add another panel, and we can repeat the process for the other table that we logged.
We have now logged the information from a CatBoost training run to Comet, using the files that CatBoost writes for TensorBoard. In this tutorial, we have successfully integrated a library that Comet does not officially support.