It’s been a while since I revisited KNN-style models for regression, so I decided to run a clean benchmark.
No tricks. No tuning wars. Just default settings and a fair comparison.
This post summarizes how SmartKNN performs against classical KNN variants across multiple real-world datasets.
## Benchmark Setup
- 14 regression datasets
- All models run with default settings
- No dataset-specific tuning
- Final ranking based on average R² score
Models compared:
- SmartKNN
- KNN (Manhattan)
- KNN (KDTree)
- KNN (BallTree)
- KNN (Distance)
- KNN (Uniform)
- KNN (Chebyshev)
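The baseline side of this setup is easy to reproduce with scikit-learn defaults. The sketch below uses synthetic data as a stand-in for the 14 real datasets, and omits SmartKNN itself since its API isn't shown in this post; the loop and seed counts are placeholders, not the actual benchmark code:

```python
# Sketch of the baseline benchmark loop (synthetic stand-in data;
# SmartKNN is omitted because its API isn't part of this post).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

baselines = {
    "KNN_manhattan": KNeighborsRegressor(metric="manhattan"),
    "KNN_kdtree": KNeighborsRegressor(algorithm="kd_tree"),
    "KNN_balltree": KNeighborsRegressor(algorithm="ball_tree"),
    "KNN_distance": KNeighborsRegressor(weights="distance"),
    "KNN_uniform": KNeighborsRegressor(weights="uniform"),
    "KNN_chebyshev": KNeighborsRegressor(metric="chebyshev"),
}

scores = {name: [] for name in baselines}
for seed in range(3):  # stand-in for the 14 real datasets
    X, y = make_regression(n_samples=300, n_features=8, noise=10.0,
                           random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    for name, model in baselines.items():
        model.fit(X_tr, y_tr)
        scores[name].append(r2_score(y_te, model.predict(X_te)))

# Rank models by average R² across datasets, highest first.
ranking = sorted(scores, key=lambda n: -np.mean(scores[n]))
```

Every model runs at its defaults, so the only difference between entries is the metric, tree algorithm, or weighting scheme.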
## Final Ranking (Average Performance)
| Rank | Model | Avg R² | Avg RMSE | Avg MAE |
|---|---|---|---|---|
| 1 | SmartKNN | 0.708249 | 18727.286422 | 10333.612683 |
| 2 | KNN_manhattan | 0.701272 | 18268.360893 | 10060.939069 |
| 3 | KNN_balltree | 0.692006 | 19154.367392 | 10651.626496 |
| 4 | KNN_kdtree | 0.692002 | 19154.366302 | 10651.625834 |
| 5 | KNN_distance | 0.691661 | 19154.367327 | 10651.626319 |
| 6 | KNN_uniform | 0.685943 | 19250.752618 | 10746.872163 |
| 7 | KNN_chebyshev | 0.668124 | 20885.061901 | 11864.294204 |
## Key Takeaways
- SmartKNN ranked #1 overall by average R²
- Achieved this with default settings (no tuning)
- Won 7 out of 14 datasets (highest among all models)
- KNN_manhattan was the strongest baseline (6 wins)
- Even before tuning, SmartKNN already leads
## Dataset Win Count
| Model | Dataset Wins |
|---|---|
| SmartKNN | 7 |
| KNN_manhattan | 6 |
| KNN_uniform | 1 |
| KNN_distance | 0 |
| KNN_kdtree | 0 |
| KNN_balltree | 0 |
| KNN_chebyshev | 0 |
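A "win" here just means the best R² on a given dataset. A minimal sketch of the counting, using a small fragment of the scores quoted in the per-dataset highlights (the full run covers all 14 datasets):

```python
# Count dataset wins: the winner on each dataset is the model
# with the highest R². Scores below are a fragment of the results.
from collections import Counter

r2_by_dataset = {
    "pol":          {"SmartKNN": 0.978, "KNN_manhattan": 0.955},
    "elevator":     {"SmartKNN": 0.726, "KNN_manhattan": 0.66},
    "NASA_PHM2008": {"SmartKNN": 0.568, "KNN_manhattan": 0.570},
}

wins = Counter(
    max(scores, key=scores.get) for scores in r2_by_dataset.values()
)
```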
## Per-Dataset Highlights

Instead of dumping every table, here are some interesting cases:

**Strong wins (SmartKNN dominates)**

- pol: SmartKNN 0.978 R² vs KNN_manhattan 0.955
- elevator: SmartKNN 0.726 vs baselines around 0.66
- brazilian_houses: SmartKNN 0.933, with a clear gap over the rest

**Competitive cases**

- NASA_PHM2008: KNN_manhattan slightly ahead; SmartKNN very close (0.568 vs 0.570)
- diamonds: KNN_manhattan wins, but the margin is small

**Tough / noisy datasets**

- dating_profile: SmartKNN still leads (0.304), though all models struggle here
## Interesting Observation
Even when SmartKNN doesn’t win:
- It consistently stays near the top
- Rarely collapses like weaker baselines
- Performance is stable across datasets
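One way to quantify that stability is to rank the models within each dataset and look at the spread of ranks. A sketch with a hypothetical R² grid (the first two columns borrow numbers from the highlights above; the Chebyshev column is an illustrative placeholder):

```python
# Rank stability sketch: rank models per dataset (1 = best),
# then compare mean rank and rank spread across datasets.
import numpy as np

models = ["SmartKNN", "KNN_manhattan", "KNN_chebyshev"]
r2 = np.array([
    [0.978, 0.955, 0.90],   # pol
    [0.726, 0.66,  0.60],   # elevator
    [0.568, 0.570, 0.50],   # NASA_PHM2008
])

# Convert R² scores to within-dataset ranks.
order = np.argsort(-r2, axis=1)
ranks = np.empty_like(order)
for i, row in enumerate(order):
    ranks[i, row] = np.arange(1, len(models) + 1)

mean_rank = ranks.mean(axis=0)  # lower = better on average
rank_std = ranks.std(axis=0)    # how much each model's rank varies
```

A model that "rarely collapses" shows up as a low mean rank with a small rank spread, even on the datasets it doesn't win.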
## What This Means
This benchmark is important for one reason:
No hyperparameter tuning was used
That means:
- These are not cherry-picked results
- No grid search advantage
- Just raw, default behavior
And even in that setup:
SmartKNN still comes out on top.
KNN_manhattan is a very strong baseline:
- Wins multiple datasets
- Often very close to SmartKNN
- Lower RMSE in some cases
So this is not a “destroyed everything” story. It’s more like:

SmartKNN consistently edges ahead across diverse datasets.

Across all 14 regression datasets:
- SmartKNN achieves the best average performance
- Leads in both ranking and win count
- Maintains stable results across different data types
And importantly:
This is before any dedicated tuning.
## Links

Notebook

## Note
The results presented in this benchmark correspond to SmartKNN v0.2.2.
In the latest release (v0.2.3), SmartKNN introduces a new parameter, `global_lambda`, which integrates global dataset structure into the neighbor selection process. This lets the model go beyond purely local distance calculations and better capture broader patterns within the data.
This enhancement is especially impactful for:
- Noisy datasets
- Complex or non-uniform distributions
- Scenarios where traditional KNN methods struggle with local-only similarity
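SmartKNN's actual `global_lambda` implementation isn't shown in this post, so the sketch below is only one plausible interpretation of "integrating global structure": blend raw local distances with distances computed on globally standardized features, weighted by a lambda parameter. The function name and the blending rule are assumptions for illustration:

```python
# Illustrative only: one way a global_lambda-style blend could work.
# Not SmartKNN's actual implementation.
import numpy as np

def blended_distances(X_train, x_query, global_lambda=0.5):
    """Mix raw local distances with distances in a globally
    standardized (z-scored) feature space."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12  # avoid division by zero

    local = np.linalg.norm(X_train - x_query, axis=1)
    global_ = np.linalg.norm(
        (X_train - mu) / sigma - (x_query - mu) / sigma, axis=1
    )
    # global_lambda = 0 → purely local; 1 → purely global-aware.
    return (1 - global_lambda) * local + global_lambda * global_

# Neighbors would then be the k smallest blended distances.
```

With `global_lambda = 0` this reduces to ordinary Euclidean KNN, which matches the described behavior of moving beyond local-only similarity as lambda grows.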
With this update, SmartKNN is expected to deliver stronger and more consistent performance on certain datasets, and in many cases where it previously trailed or matched baseline methods, it may take a clear lead.
Updated benchmarks with v0.2.3 will be shared soon.