
Jashwanth

I Benchmarked 8 ML Models on CPU (No Tuning, No Tricks). Here’s What Happened

What I Did

All models were tested under the same rules:

  • Default settings from their libraries
  • No hyperparameter tuning
  • Same preprocessing for every model
  • The same encoding for categorical features across models
  • No dataset-specific tricks
  • Metrics reported as 3-fold cross-validation means
  • CPU only
  • Single-inference P95 latency measured

Features were scaled for Logistic Regression and KNN, for fairness.
That’s it. No magic sauce.
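The setup can be sketched roughly like this with scikit-learn. This is illustrative, not the exact benchmark harness: the model set is trimmed and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Every model runs with library defaults; only LR and KNN get scaling.
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "rf": RandomForestClassifier(),
}

# Stand-in dataset (the real runs used Adult, Credit Default, etc.).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 3-fold CV mean, no tuning anywhere.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f}")
```

The point of the pipeline wrapper is that scaling happens inside each CV fold, so the scaled models get no information leak from the test fold.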


What I Measured

For classification:

  • Accuracy (CV Mean)
  • Macro F1 (CV Mean)
  • Single Inference P95 (ms)

For regression:

  • CV RMSE
  • Test RMSE
  • Single Inference P95 (ms)

Because accuracy without latency is like buying a sports car without checking fuel cost.
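A simplified sketch of how single-inference P95 can be measured: time many one-row `predict` calls and take the 95th percentile. Real harnesses add warmup and more repeats; this is the core idea only.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = RandomForestClassifier().fit(X, y)

# Time 200 single-row predictions, in milliseconds.
times_ms = []
for i in range(200):
    row = X[i % len(X)].reshape(1, -1)  # one sample, as the model sees in serving
    t0 = time.perf_counter()
    model.predict(row)
    times_ms.append((time.perf_counter() - t0) * 1000)

# P95: 95% of individual predictions were at least this fast.
p95 = np.percentile(times_ms, 95)
print(f"P95 single-inference latency: {p95:.2f} ms")
```

Single-row timing matters because batch throughput hides per-request cost: a model that screams through a 10k-row batch can still be slow when requests arrive one at a time.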


Classification Results: What Surprised Me

Tree Models Still Dominate Accuracy

Across datasets like:

  • Adult
  • Credit Default
  • Santander
  • Fraud Detection

CatBoost, LightGBM, and XGBoost were very strong.

Example:
On Adult:

  • LightGBM → 0.8734 accuracy
  • CatBoost → 0.8726
  • XGBoost → 0.8594

Solid.

But here’s the twist.


Random Forest Is Slow. Like… Really Slow.

On almost every dataset:

RandomForest P95 latency ≈ 24–38 ms

If you serve millions of predictions per hour, that gap is not “small.”

That’s server bills.


Accuracy Differences Are Small. Latency Differences Are Massive.

Example: Credit Card Fraud

Accuracy:

  • CatBoost → 0.9996
  • RandomForest → 0.9995
  • SmartKNN → 0.9995
  • XGBoost → 0.9995

All basically identical.

Latency:

  • RandomForest → 25 ms
  • SmartKNN → 0.31 ms
  • XGBoost → 0.63 ms

Same accuracy.
80x latency difference.

That hit me.
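Back-of-envelope on what that gap costs, treating the fraud-dataset P95 numbers as if they were the per-prediction cost (an approximation; P95 is a tail value, not the mean, so this overstates a bit):

```python
# Illustrative traffic assumption: 1M single predictions per hour.
preds_per_hour = 1_000_000
rf_p95_ms = 25.0    # RandomForest, fraud benchmark
xgb_p95_ms = 0.63   # XGBoost, fraud benchmark

rf_cpu_s = preds_per_hour * rf_p95_ms / 1000    # 25,000 CPU-seconds
xgb_cpu_s = preds_per_hour * xgb_p95_ms / 1000  # 630 CPU-seconds

print(f"RandomForest: ~{rf_cpu_s / 3600:.1f} CPU-hours per wall-clock hour")
print(f"XGBoost:      ~{xgb_cpu_s / 3600:.2f} CPU-hours per wall-clock hour")
```

Roughly seven CPU-hours of compute per hour of traffic versus a fraction of one. Same accuracy, very different bill.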


KNN Is Fast… Until It Isn’t

Regular KNN sometimes exploded in latency.

Example:
Porto Seguro dataset:

  • KNN → 34.67 ms
  • SmartKNN → 0.35 ms

Same idea. Different implementation.

Distance methods are tricky.
In low dimensions they behave nicely… in high dimensions they fall apart.

Curse of dimensionality is not theory. It’s pain.
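Here is a tiny demo of why, on synthetic uniform data: as dimension grows, pairwise distances concentrate, and the spread between the nearest and farthest point shrinks relative to the mean distance. "Nearest" stops meaning much.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 random points plus one query point, at increasing dimension.
for d in (2, 100, 10_000):
    points = rng.random((200, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative spread of distances: shrinks as d grows.
    ratio = (dists.max() - dists.min()) / dists.mean()
    print(f"dim={d:>6}: (max - min) / mean distance = {ratio:.3f}")
```

At d=2 the ratio is large (some points are genuinely close, some genuinely far); at d=10,000 every point sits at almost the same distance from the query.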


Sometimes Simple Models Win

On Bank Marketing:

  • SmartKNN → 0.9982 accuracy
  • KNN → 0.9982
  • CatBoost → 0.9973
  • LightGBM → 0.9918

Tiny dataset-specific patterns matter.

No model wins everywhere.


Regression Results: Same Story

Tree models are strong.

But again, latency changes everything.

Example: Diamonds dataset

Best CV RMSE:

  • SmartKNN → 892
  • KNN → 933
  • RandomForest → 1153

But RandomForest P95 latency: 34 ms
SmartKNN: 0.19 ms

That gap is wild.
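For reference, this is how CV RMSE and test RMSE fall out of scikit-learn. Synthetic data again, so the numbers mean nothing; the mechanics are the point. One gotcha: scikit-learn scorers are "higher is better", so RMSE comes back negated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor()

# CV RMSE: negate the "neg_root_mean_squared_error" scorer.
neg_rmse = cross_val_score(model, X_train, y_train, cv=3,
                           scoring="neg_root_mean_squared_error")
cv_rmse = -neg_rmse.mean()

# Test RMSE on the held-out split.
model.fit(X_train, y_train)
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(f"CV RMSE: {cv_rmse:.1f}  Test RMSE: {test_rmse:.1f}")
```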


On California Housing:

Tree models dominate accuracy.

But distance models:

  • SmartKNN → 0.18 ms
  • KNN → 0.65 ms

Speed monsters.

Lower accuracy, yes.
But ultra-cheap inference.

Engineering is about tradeoffs.


Big Things I Learned

  1. No Model Wins Everywhere
  2. Accuracy Differences Are Often Tiny
  3. Default Models Are Already Very Strong
  4. P95 Latency Matters More Than You Think
  5. Tree Models Are Systems

So What Actually Matters?

If you’re doing Kaggle:
Maximize metric.

If you’re deploying:
Balance:

  • Accuracy
  • Latency
  • Memory
  • Predictability
  • Stability

Engineering is constraint optimization.

Not leaderboard chasing.

