
Jashwanth

I Benchmarked 8 ML Models on CPU (No Tuning, No Tricks). Here’s What Happened

What I Did

All models were tested under the same rules:

  • Default settings from their libraries
  • No hyperparameter tuning
  • Same preprocessing for every model
  • The same encoding for categorical features across models
  • No dataset-specific tricks
  • Metrics reported as 3-fold cross-validation means
  • CPU only
  • Single-inference P95 latency measured

Features were scaled for Logistic Regression and KNN, for fairness.
That’s it. No magic sauce.
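The setup can be sketched roughly like this with scikit-learn. This is illustrative, not the exact benchmark harness: the model set is trimmed and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Every model runs with library defaults; only LR and KNN get scaling.
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "rf": RandomForestClassifier(),
}

# Stand-in dataset (the real runs used Adult, Credit Default, etc.).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 3-fold CV mean, no tuning anywhere.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f}")
```

The point of the pipeline wrapper is that scaling happens inside each CV fold, so the scaled models get no information leak from the test fold.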


What I Measured

For classification:

  • Accuracy (CV Mean)
  • Macro F1 (CV Mean)
  • Single Inference P95 (ms)

For regression:

  • CV RMSE
  • Test RMSE
  • Single Inference P95 (ms)

Because accuracy without latency is like buying a sports car without checking fuel cost.
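A simplified sketch of how single-inference P95 can be measured: time many one-row `predict` calls and take the 95th percentile. Real harnesses add warmup and more repeats; this is the core idea only.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = RandomForestClassifier().fit(X, y)

# Time 200 single-row predictions, in milliseconds.
times_ms = []
for i in range(200):
    row = X[i % len(X)].reshape(1, -1)  # one sample, as the model sees in serving
    t0 = time.perf_counter()
    model.predict(row)
    times_ms.append((time.perf_counter() - t0) * 1000)

# P95: 95% of individual predictions were at least this fast.
p95 = np.percentile(times_ms, 95)
print(f"P95 single-inference latency: {p95:.2f} ms")
```

Single-row timing matters because batch throughput hides per-request cost: a model that screams through a 10k-row batch can still be slow when requests arrive one at a time.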


Classification Results: What Surprised Me

Tree Models Still Dominate Accuracy

Across datasets like:

  • Adult
  • Credit Default
  • Santander
  • Fraud Detection

CatBoost, LightGBM, and XGBoost were very strong.

Example:
On Adult:

  • LightGBM → 0.8734 accuracy
  • CatBoost → 0.8726
  • XGBoost → 0.8594

Solid.

But here’s the twist.


Random Forest Is Slow. Like… Really Slow.

On almost every dataset:

RandomForest P95 latency ≈ 24–38 ms

If you serve millions of predictions per hour, that gap is not “small.”

That’s server bills.


Accuracy Differences Are Small. Latency Differences Are Massive.

Example: Credit Card Fraud

Accuracy:

  • CatBoost → 0.9996
  • RandomForest → 0.9995
  • SmartKNN → 0.9995
  • XGBoost → 0.9995

All basically identical.

Latency:

  • RandomForest → 25 ms
  • SmartKNN → 0.31 ms
  • XGBoost → 0.63 ms

Same accuracy.
80x latency difference.

That hit me.
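Back-of-envelope on what that gap costs, treating the fraud-dataset P95 numbers as if they were the per-prediction cost (an approximation; P95 is a tail value, not the mean, so this overstates a bit):

```python
# Illustrative traffic assumption: 1M single predictions per hour.
preds_per_hour = 1_000_000
rf_p95_ms = 25.0    # RandomForest, fraud benchmark
xgb_p95_ms = 0.63   # XGBoost, fraud benchmark

rf_cpu_s = preds_per_hour * rf_p95_ms / 1000    # 25,000 CPU-seconds
xgb_cpu_s = preds_per_hour * xgb_p95_ms / 1000  # 630 CPU-seconds

print(f"RandomForest: ~{rf_cpu_s / 3600:.1f} CPU-hours per wall-clock hour")
print(f"XGBoost:      ~{xgb_cpu_s / 3600:.2f} CPU-hours per wall-clock hour")
```

Roughly seven CPU-hours of compute per hour of traffic versus a fraction of one. Same accuracy, very different bill.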


KNN Is Fast… Until It Isn’t

Regular KNN sometimes exploded in latency.

Example:
Porto Seguro dataset:

  • KNN → 34.67 ms
  • SmartKNN → 0.35 ms

Same idea. Different implementation.

Distance methods are tricky.
In low dimensions they behave nicely… in high dimensions they fall apart.

Curse of dimensionality is not theory. It’s pain.
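Here is a tiny demo of why, on synthetic uniform data: as dimension grows, pairwise distances concentrate, and the spread between the nearest and farthest point shrinks relative to the mean distance. "Nearest" stops meaning much.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 random points plus one query point, at increasing dimension.
for d in (2, 100, 10_000):
    points = rng.random((200, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative spread of distances: shrinks as d grows.
    ratio = (dists.max() - dists.min()) / dists.mean()
    print(f"dim={d:>6}: (max - min) / mean distance = {ratio:.3f}")
```

At d=2 the ratio is large (some points are genuinely close, some genuinely far); at d=10,000 every point sits at almost the same distance from the query.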


Sometimes Simple Models Win

On Bank Marketing:

  • SmartKNN → 0.9982 accuracy
  • KNN → 0.9982
  • CatBoost → 0.9973
  • LightGBM → 0.9918

Tiny dataset-specific patterns matter.

No model wins everywhere.


Regression Results: Same Story

Tree models are strong.

But again, latency changes everything.

Example: Diamonds dataset

Best CV RMSE:

  • SmartKNN → 892
  • KNN → 933
  • RandomForest → 1153

But RandomForest P95 latency: 34 ms
SmartKNN: 0.19 ms

That gap is wild.
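For reference, this is how CV RMSE and test RMSE fall out of scikit-learn. Synthetic data again, so the numbers mean nothing; the mechanics are the point. One gotcha: scikit-learn scorers are "higher is better", so RMSE comes back negated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor()

# CV RMSE: negate the "neg_root_mean_squared_error" scorer.
neg_rmse = cross_val_score(model, X_train, y_train, cv=3,
                           scoring="neg_root_mean_squared_error")
cv_rmse = -neg_rmse.mean()

# Test RMSE on the held-out split.
model.fit(X_train, y_train)
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(f"CV RMSE: {cv_rmse:.1f}  Test RMSE: {test_rmse:.1f}")
```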


On California Housing:

Tree models dominate accuracy.

But distance models:

  • SmartKNN → 0.18 ms
  • KNN → 0.65 ms

Speed monsters.

Lower accuracy, yes.
But ultra-cheap inference.

Engineering is about tradeoffs.


Big Things I Learned

  1. No Model Wins Everywhere
  2. Accuracy Differences Are Often Tiny
  3. Default Models Are Already Very Strong
  4. P95 Latency Matters More Than You Think
  5. Tree Models Are Systems

So What Actually Matters?

If you’re doing Kaggle:
Maximize metric.

If you’re deploying:
Balance:

  • Accuracy
  • Latency
  • Memory
  • Predictability
  • Stability

Engineering is constraint optimization.

Not leaderboard chasing.

