Jashwanth

Posted on Dec 28, 2025

SmartKNN - Large Scale Classification Benchmarks (CPU)

#algorithms #machinelearning #performance

This release presents initial classification benchmarks for SmartKNN on million-row datasets, focusing on single-prediction p95 latency and Macro-F1 under production-style serving constraints.

All benchmarks use:

CPU-only execution
Single-query inference (batch timing is also reported)
Non-parametric, nonlinear models (gradient-boosted trees and SmartKNN)
Datasets of roughly 1M to 2M rows
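The median and p95 columns in the tables come from timing individual predictions. Below is a minimal sketch of such a harness, assuming the model is exposed as a callable that accepts a 2-D NumPy array (as scikit-learn-style `predict` methods do). The `measure_single_query_latency` helper, the warm-up count, and the dummy linear model are hypothetical; this is not the script that produced these results.

```python
import time

import numpy as np


def measure_single_query_latency(predict, X, n_warmup=50):
    """Time predict() on one row at a time and return (median_ms, p95_ms).

    Hypothetical helper: `predict` is any callable that accepts a 2-D array.
    """
    # Warm up caches and lazy initialization before recording timings.
    for i in range(min(n_warmup, len(X))):
        predict(X[i : i + 1])

    timings_ms = np.empty(len(X))
    for i in range(len(X)):
        row = X[i : i + 1]  # keep the (1, D) shape most predict() APIs expect
        start = time.perf_counter()
        predict(row)
        timings_ms[i] = (time.perf_counter() - start) * 1000.0

    # Median describes the typical request; p95 is the tail that 5% of requests exceed.
    return float(np.median(timings_ms)), float(np.percentile(timings_ms, 95))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_test = rng.normal(size=(1000, 15))
    weights = rng.normal(size=15)

    # Stand-in "model" (a linear scorer), used only to exercise the harness.
    def dummy_predict(batch):
        return (batch @ weights > 0).astype(int)

    med, p95 = measure_single_query_latency(dummy_predict, X_test)
    print(f"median={med:.4f} ms  p95={p95:.4f} ms")
```

Taking the 95th percentile over per-row timings means 5% of requests are slower than the reported p95, which is the tail behavior a median alone hides.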

More benchmarks (higher-dimensional datasets, regression tasks, mixed feature spaces) will be released soon.

Datasets Used

Dataset	OpenML ID	Approx Rows	Features (D)	Task	Source
BNG (Adult)	1180	~1M	15	Classification	OpenML / Kaggle
BNG (Australian)	1205	~1M	15	Classification	OpenML / Kaggle
BNG (Credit-G)	40514	~1M	21	Classification	OpenML / Kaggle
Click Prediction (Small)	1218	~2M	12	Classification	OpenML / Kaggle
Click	45556	~1M	12	Classification	OpenML / Kaggle
Census (Augmented)	43489	~1M	15	Classification	OpenML / Kaggle

Benchmark Results

BNG (Adult) — OpenML ID 1180

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.9261	0.6365	29.05	0.003	0.261	0.309
LightGBM	0.9260	0.6373	20.38	0.009	0.704	0.790
CatBoost	0.9261	0.6353	44.66	0.016	0.453	0.495
SmartKNN	0.9039	0.6641	334.10	0.061	0.424	0.468
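On BNG (Adult), SmartKNN trails the boosted trees on accuracy (0.9039 vs 0.9261) but leads on Macro-F1 (0.6641 vs at most 0.6373). One likely explanation is class imbalance: Macro-F1 is the unweighted mean of per-class F1 scores, so a model that recovers more minority-class rows can score higher on Macro-F1 while making more errors overall. The toy example below uses made-up labels (not benchmark data) and a from-scratch `macro_f1` helper written for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy imbalanced labels: 90 negatives, 10 positives (illustrative only).
y_true = [0] * 90 + [1] * 10

# Model A always predicts the majority class.
pred_a = [0] * 100
# Model B finds 8 of the 10 positives but raises 12 false alarms.
pred_b = [0] * 78 + [1] * 12 + [1] * 8 + [0] * 2

for name, pred in [("A (always majority)", pred_a), ("B (finds minority)", pred_b)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f} "
          f"macro_f1={macro_f1(y_true, pred):.3f}")
# A (always majority): accuracy=0.900 macro_f1=0.474
# B (finds minority): accuracy=0.860 macro_f1=0.725
```

scikit-learn's `f1_score(y_true, y_pred, average="macro")` should return the same Macro-F1 values for these labels.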

BNG (Australian) — OpenML ID 1205

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.8753	0.8723	15.97	0.003	0.274	0.356
LightGBM	0.8753	0.8724	13.96	0.010	0.704	0.800
CatBoost	0.8748	0.8717	24.72	0.001	0.356	0.403
SmartKNN	0.8473	0.8435	63.30	0.033	0.361	0.410

BNG (Credit-G) — OpenML ID 40514

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.8245	0.7790	29.68	0.004	0.265	0.309
LightGBM	0.8275	0.7834	21.82	0.016	0.708	0.786
CatBoost	0.8229	0.7753	54.90	0.023	0.501	0.532
SmartKNN	0.7682	0.7085	493.94	0.069	0.518	0.559

Click Prediction (Small) — OpenML ID 1218

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.8411	0.5325	26.08	0.004	0.509	0.558
LightGBM	0.8413	0.5358	24.92	0.011	0.879	0.958
CatBoost	0.8392	0.5154	47.49	0.000	0.444	0.588
SmartKNN	0.8158	0.5792	159.64	0.076	0.555	0.597

Click — OpenML ID 45556

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.7521	0.7521	12.05	0.004	0.531	0.588
LightGBM	0.7520	0.7520	12.74	0.012	0.911	1.345
CatBoost	0.7504	0.7504	20.62	0.001	0.419	0.466
SmartKNN	0.7005	0.7005	43.44	0.032	0.346	0.373

Census (Augmented) — OpenML ID 43489

Model	Accuracy	Macro-F1	Train (s)	Batch (ms)	Single Median (ms)	Single p95 (ms)
XGBoost	0.8859	0.8668	32.18	0.005	0.521	0.646
LightGBM	0.8861	0.8668	15.20	0.012	0.974	1.017
CatBoost	0.8861	0.8668	61.91	0.036	0.752	0.789
SmartKNN	0.8653	0.8427	718.21	0.107	0.699	0.811

Notes

SmartKNN is a non-parametric, instance-based model that uses approximate nearest neighbor (ANN) search to speed up neighbor lookup.
Benchmarks emphasize tail latency (p95) rather than median inference time, although both are reported.
All results are reproducible using publicly available datasets.
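The post does not say which ANN index SmartKNN uses. As a generic illustration only, the sketch below shows one common ANN scheme, an inverted-file (IVF) index: training rows are bucketed by their nearest k-means centroid, and a query scans only the `n_probe` closest buckets instead of every row. The class name and the `n_lists` / `n_probe` parameters are hypothetical; this is not SmartKNN's implementation.

```python
import numpy as np


class IVFKNNClassifier:
    """Toy KNN classifier with inverted-file (IVF) approximate search.

    Hypothetical illustration of the ANN idea; not SmartKNN's code.
    """

    def __init__(self, k=5, n_lists=16, n_probe=2, n_iter=10, seed=0):
        self.k, self.n_lists, self.n_probe = k, n_lists, n_probe
        self.n_iter, self.seed = n_iter, seed

    @staticmethod
    def _nearest_centroids(X, centroids, n_nearest):
        # Squared Euclidean distance from every row to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return np.argsort(d2, axis=1)[:, :n_nearest]

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        self.X, self.y = X, np.asarray(y)
        rng = np.random.default_rng(self.seed)
        # Coarse quantizer: a few rounds of Lloyd's k-means.
        centroids = X[rng.choice(len(X), self.n_lists, replace=False)]
        for _ in range(self.n_iter):
            assign = self._nearest_centroids(X, centroids, 1)[:, 0]
            for c in range(self.n_lists):
                members = X[assign == c]
                if len(members):
                    centroids[c] = members.mean(axis=0)
        self.centroids = centroids
        assign = self._nearest_centroids(X, centroids, 1)[:, 0]
        # Inverted lists: centroid id -> indices of training rows in that cell.
        self.lists = [np.flatnonzero(assign == c) for c in range(self.n_lists)]
        return self

    def predict_one(self, x):
        x = np.asarray(x, dtype=np.float64)
        # Probe only the n_probe closest cells instead of scanning all rows.
        cells = self._nearest_centroids(x[None, :], self.centroids, self.n_probe)[0]
        candidates = np.concatenate([self.lists[c] for c in cells])
        if len(candidates) == 0:  # all probed cells empty: fall back to full scan
            candidates = np.arange(len(self.X))
        d2 = ((self.X[candidates] - x) ** 2).sum(axis=1)
        k = min(self.k, len(candidates))
        nearest = candidates[np.argpartition(d2, k - 1)[:k]]
        # Majority vote among the k approximate neighbors.
        labels, counts = np.unique(self.y[nearest], return_counts=True)
        return labels[np.argmax(counts)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-4, 1, size=(300, 2)), rng.normal(4, 1, size=(300, 2))])
    y = np.array([0] * 300 + [1] * 300)
    clf = IVFKNNClassifier(k=5, n_lists=8, n_probe=2).fit(X, y)
    print(clf.predict_one([-4.0, -4.0]), clf.predict_one([4.0, 4.0]))
```

Raising `n_probe` scans more cells, trading latency for recall; a search that probes every cell reduces to exact KNN.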

Further benchmarks covering regression tasks and higher-dimensional datasets will be released soon.

Positioning & Claim (Carefully Worded)

Among non-parametric, instance-based (KNN-style) models, SmartKNN demonstrates state-of-the-art p95 single-prediction latency on CPU at million-row scale, while preserving instance-based decision behavior.

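Tree-based models remain stronger on accuracy across all six datasets and usually on median latency, and XGBoost has the lowest p95 on five of the six; SmartKNN's Macro-F1 is higher only on BNG (Adult) and Click Prediction (Small). SmartKNN's p95 nevertheless stays in the same sub-millisecond range, is lower than LightGBM's on every dataset, and is the lowest of all four models on Click. This shows that KNN-style models can be competitive in tail latency, which is often the dominant concern in production systems.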
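The tail-latency comparison can be checked directly against the tables. This snippet copies the reported single-query p95 values and prints, for each dataset, the fastest model, SmartKNN's rank among the four models, and its p95 relative to the fastest. It only re-reads the published numbers and does not re-run any benchmark.

```python
# Single-query p95 latency (ms), copied from the result tables above.
P95_MS = {
    "BNG (Adult)": {"XGBoost": 0.309, "LightGBM": 0.790, "CatBoost": 0.495, "SmartKNN": 0.468},
    "BNG (Australian)": {"XGBoost": 0.356, "LightGBM": 0.800, "CatBoost": 0.403, "SmartKNN": 0.410},
    "BNG (Credit-G)": {"XGBoost": 0.309, "LightGBM": 0.786, "CatBoost": 0.532, "SmartKNN": 0.559},
    "Click Prediction (Small)": {"XGBoost": 0.558, "LightGBM": 0.958, "CatBoost": 0.588, "SmartKNN": 0.597},
    "Click": {"XGBoost": 0.588, "LightGBM": 1.345, "CatBoost": 0.466, "SmartKNN": 0.373},
    "Census (Augmented)": {"XGBoost": 0.646, "LightGBM": 1.017, "CatBoost": 0.789, "SmartKNN": 0.811},
}

for dataset, row in P95_MS.items():
    ranking = sorted(row, key=row.get)  # models ordered from lowest to highest p95
    fastest = ranking[0]
    rank = ranking.index("SmartKNN") + 1
    ratio = row["SmartKNN"] / row[fastest]
    print(f"{dataset:26s} fastest={fastest:9s} "
          f"SmartKNN rank={rank}/4  SmartKNN/fastest={ratio:.2f}x")
```

Reading ranks alongside ratios separates "slowest" from "slower by a few tens of microseconds", which matters when judging whether a p95 gap is operationally significant.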

To our knowledge, SmartKNN is among the fastest CPU-only nonlinear, instance-based models evaluated at this scale with reported p95 single-query latency.

Reproducibility & Community Benchmarks

We strongly encourage the community to:

Run these benchmarks on different hardware
Test alternative ANN configurations
Compare against additional models
Share results publicly
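Results from different hardware are easiest to compare when each one carries its machine details. Below is a minimal sketch using only the Python standard library; the `benchmark_record` helper and the JSON field names are hypothetical, not a format the SmartKNN project prescribes.

```python
import json
import os
import platform
import sys
from datetime import datetime, timezone


def benchmark_record(dataset, openml_id, model, metrics):
    """Bundle one benchmark result with the environment it was measured on."""
    return {
        "dataset": dataset,
        "openml_id": openml_id,
        "model": model,
        "metrics": metrics,  # e.g. accuracy, macro_f1, latency percentiles in ms
        "environment": {
            "cpu": platform.processor() or platform.machine(),
            "logical_cores": os.cpu_count(),  # may be None if undeterminable
            "os": platform.platform(),
            "python": sys.version.split()[0],
        },
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    # Placeholder metrics: fill these in from your own run on your own hardware.
    record = benchmark_record(
        dataset="BNG (Adult)",
        openml_id=1180,
        model="SmartKNN",
        metrics={"accuracy": None, "macro_f1": None, "single_p95_ms": None},
    )
    print(json.dumps(record, indent=2))
```

Publishing one such record per (dataset, model) pair lets readers tell whether a latency difference comes from the model or from the machine.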

If you:

Find a performance regression - open a GitHub Issue
Have questions, ideas, or improvements - start a GitHub Discussion
Run new benchmarks - post your results

Community validation and feedback will directly shape future releases.

Links

To learn more about SmartKNN:
