Jashwanth
SmartKNN vs Classical KNN: Regression Benchmark Results

It’s been a while since I revisited KNN-style models for regression, so I decided to run a clean benchmark.

No tricks. No tuning wars. Just default settings and fair comparison.

This post summarizes how SmartKNN performs against classical KNN variants across multiple real-world datasets.


Benchmark Setup

  • 14 regression datasets
  • All models run with default settings
  • No dataset-specific tuning
  • Final ranking based on average R² score

Models compared:

  • SmartKNN
  • KNN (Manhattan)
  • KNN (KDTree)
  • KNN (BallTree)
  • KNN (Distance)
  • KNN (Uniform)
  • KNN (Chebyshev)
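For context, the classical baselines above map naturally onto scikit-learn's KNeighborsRegressor. The configuration below is a hypothetical reconstruction based only on the baseline names (the post does not show the actual benchmark code, and SmartKNN's own API is omitted); everything else is left at scikit-learn defaults (k = 5):

```python
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical reconstruction of the classical baselines; names mirror
# the benchmark tables. SmartKNN itself is omitted since its API is not
# shown in this post. All other settings are scikit-learn defaults (k=5).
baselines = {
    "KNN_manhattan": KNeighborsRegressor(metric="manhattan"),
    "KNN_kdtree":    KNeighborsRegressor(algorithm="kd_tree"),
    "KNN_balltree":  KNeighborsRegressor(algorithm="ball_tree"),
    "KNN_distance":  KNeighborsRegressor(weights="distance"),
    "KNN_uniform":   KNeighborsRegressor(weights="uniform"),
    "KNN_chebyshev": KNeighborsRegressor(metric="chebyshev"),
}
```

Note that "KDTree" and "BallTree" only change how neighbors are searched, not which neighbors are found, which is why their scores land so close together in the table below.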

Final Ranking (Average Performance)

Rank  Model           Avg R²      Avg RMSE        Avg MAE
1     SmartKNN        0.708249    18727.286422    10333.612683
2     KNN_manhattan   0.701272    18268.360893    10060.939069
3     KNN_balltree    0.692006    19154.367392    10651.626496
4     KNN_kdtree      0.692002    19154.366302    10651.625834
5     KNN_distance    0.691661    19154.367327    10651.626319
6     KNN_uniform     0.685943    19250.752618    10746.872163
7     KNN_chebyshev   0.668124    20885.061901    11864.294204
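The ranking itself is a simple aggregation: average each model's R² across datasets and sort. A minimal sketch with pandas, using two illustrative rows taken from the per-dataset highlights (the column names are my assumption, not the benchmark's actual schema):

```python
import pandas as pd

# One row per (model, dataset) result; the figures here are just the
# per-dataset R² scores quoted in the highlights section of this post.
results = pd.DataFrame({
    "model":   ["SmartKNN", "SmartKNN", "KNN_manhattan", "KNN_manhattan"],
    "dataset": ["pol", "elevator", "pol", "elevator"],
    "r2":      [0.978, 0.726, 0.955, 0.660],
})

# Average R² per model, highest first -- the basis of the final ranking.
ranking = results.groupby("model")["r2"].mean().sort_values(ascending=False)
print(ranking)
```

One caveat worth noticing in the full table: the ranking is by average R², so SmartKNN can sit at #1 even though KNN_manhattan has the lower average RMSE and MAE.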

Key Takeaways

  • SmartKNN ranked #1 overall by average R²
  • Achieved this with default settings (no tuning)
  • Won 7 out of 14 datasets (highest among all models)
  • KNN_manhattan was the strongest baseline (6 wins)
  • Even before tuning, SmartKNN already leads

Dataset Win Count

Model           Dataset Wins
SmartKNN        7
KNN_manhattan   6
KNN_uniform     1
KNN_distance    0
KNN_kdtree      0
KNN_balltree    0
KNN_chebyshev   0
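A "win" here just means the best R² on a given dataset. That tally is easy to sketch; the two-dataset example below uses figures from the highlights section (this is an illustration of the counting rule, not the benchmark's actual code):

```python
import pandas as pd

# Rows = models, columns = datasets. Per dataset, the winner is the
# model with the highest R²; tallying winners gives the win count.
scores = pd.DataFrame(
    {"pol": [0.978, 0.955], "elevator": [0.726, 0.660]},
    index=["SmartKNN", "KNN_manhattan"],
)
wins = scores.idxmax(axis=0).value_counts()
print(wins)  # SmartKNN takes both of these example datasets
```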

Per-Dataset Highlights

Instead of dumping all tables, here are some interesting cases:

Strong Wins (SmartKNN dominates)

pol

  • SmartKNN: 0.978 R²
  • KNN_manhattan: 0.955

elevator

  • SmartKNN: 0.726
  • Baselines ~0.66

brazilian_houses

  • SmartKNN: 0.933
  • Strong gap over others

Competitive Cases

NASA_PHM2008

  • KNN_manhattan slightly ahead
  • SmartKNN very close (0.568 vs 0.570)

diamonds

  • KNN_manhattan wins, but the margin is small

Tough / Noisy Datasets

dating_profile

  • SmartKNN still leads (0.304)
  • All models struggle overall

Interesting Observation

Even when SmartKNN doesn’t win:

  • It consistently stays near the top
  • Rarely collapses like weaker baselines
  • Performance is stable across datasets

What This Means

This benchmark is important for one reason:

No hyperparameter tuning was used

That means:

  • These are not cherry-picked results
  • No grid search advantage
  • Just raw, default behavior

And even in that setup:

SmartKNN still comes out on top.


KNN_manhattan is a very strong baseline:

  • Wins multiple datasets
  • Often very close to SmartKNN
  • Lower RMSE in some cases

So this is not a "SmartKNN destroyed everything" story.

It's more like:

SmartKNN consistently edges out strong baselines across diverse datasets.


Summary

Across 14 regression datasets:

  • SmartKNN achieves the best average performance
  • Leads in both ranking and win count
  • Maintains stable results across different data types

And importantly:
This is before any dedicated tuning.


Links
Notebook

Repo

Note

The results presented in this benchmark correspond to SmartKNN v0.2.2.

In the latest release (v0.2.3), SmartKNN introduces a new parameter: global_lambda, which integrates global dataset structure into the neighbor selection process. This enables the model to go beyond purely local distance calculations and better capture broader patterns within the data.

This enhancement is especially impactful for:

  • Noisy datasets
  • Complex or non-uniform distributions
  • Scenarios where traditional KNN methods struggle with local-only similarity

With this update, SmartKNN is expected to deliver stronger and more consistent performance on certain datasets, and in many cases where it previously trailed or merely matched the baselines, it may take a clear lead.

Updated benchmarks with v0.2.3 will be shared soon.
