
TildAlice

Posted on • Originally published at tildalice.io

XGBoost vs LightGBM vs CatBoost: Kaggle Tabular Benchmark

The 3-Way Race Nobody Expected

I ran the same Kaggle-style tabular dataset through XGBoost, LightGBM, and CatBoost with default settings. LightGBM trained in 2.3 seconds. XGBoost took 18.7 seconds. CatBoost? 47.2 seconds.

But here's the twist: CatBoost won on validation AUC by 0.008 points.

This mirrors what I see in Kaggle competitions — speed doesn't always correlate with leaderboard position. The library that takes longest to train often squeezes out that last 0.5% accuracy that separates gold from silver medals. But is it worth the wait?

I'll benchmark all three on a real dataset (Home Credit Default Risk), measure training time, memory usage, and predictive performance, then show you which hyperparameters actually matter. The results challenge the conventional wisdom that "LightGBM is always faster and good enough."
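The headline numbers above came from running each library with its defaults and timing the fit. A minimal sketch of that harness is below — it uses synthetic data from scikit-learn as a stand-in for the Home Credit tables (so it runs anywhere), and the `benchmark` helper, the synthetic-data shape, and the import guards are my assumptions, not the article's exact setup:

```python
# Sketch of a defaults-only benchmark harness (assumed structure, not the
# article's original script). Synthetic data stands in for Home Credit.
import time

from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def benchmark(name, model, X_train, y_train, X_valid, y_valid):
    """Fit a model with library defaults; return (train_seconds, valid_AUC)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    return elapsed, auc


# Imbalanced binary target, roughly mimicking a credit-default problem.
X, y = make_classification(n_samples=5000, n_features=30,
                           weights=[0.92], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=0)

# Each library is optional here; skip any that isn't installed.
models = {}
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier()
except ImportError:
    pass
try:
    from lightgbm import LGBMClassifier
    models["LightGBM"] = LGBMClassifier()
except ImportError:
    pass
try:
    from catboost import CatBoostClassifier
    models["CatBoost"] = CatBoostClassifier(verbose=0)  # silence per-iteration logs
except ImportError:
    pass

for name, model in models.items():
    seconds, auc = benchmark(name, model, X_tr, y_tr, X_va, y_va)
    print(f"{name:>8}: {seconds:6.2f}s  AUC={auc:.4f}")
```

On the real competition data you would swap the synthetic split for the Home Credit train/validation tables; the timing and AUC comparison works the same way.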

[Cover photo: close-up of graffiti "Das Boot Ist Voll" on a post in Bubenreuth, Germany — by Markus Spiske on Pexels]

The Test Setup: Home Credit Default Risk Data


Continue reading the full article on TildAlice
