It’s been a while since I revisited KNN-style models for regression, so I decided to run a clean benchmark.
No tricks. No tuning wars. Just default settings and a fair comparison.
This post summarizes how SmartKNN performs against classical KNN variants across multiple real-world datasets.
## Benchmark Setup
- 14 regression datasets
- All models run with default settings
- No dataset-specific tuning
- Final ranking based on average R² score
Models compared:
- SmartKNN
- KNN (Manhattan)
- KNN (KDTree)
- KNN (BallTree)
- KNN (Distance)
- KNN (Uniform)
- KNN (Chebyshev)
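The baseline side of this setup is easy to reproduce with scikit-learn defaults. The sketch below uses synthetic data as a stand-in for the 14 real datasets, and omits SmartKNN itself since its API isn't shown in this post; the loop and seed counts are placeholders, not the actual benchmark code:

```python
# Sketch of the baseline benchmark loop (synthetic stand-in data;
# SmartKNN is omitted because its API isn't part of this post).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

baselines = {
    "KNN_manhattan": KNeighborsRegressor(metric="manhattan"),
    "KNN_kdtree": KNeighborsRegressor(algorithm="kd_tree"),
    "KNN_balltree": KNeighborsRegressor(algorithm="ball_tree"),
    "KNN_distance": KNeighborsRegressor(weights="distance"),
    "KNN_uniform": KNeighborsRegressor(weights="uniform"),
    "KNN_chebyshev": KNeighborsRegressor(metric="chebyshev"),
}

scores = {name: [] for name in baselines}
for seed in range(3):  # stand-in for the 14 real datasets
    X, y = make_regression(n_samples=300, n_features=8, noise=10.0,
                           random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    for name, model in baselines.items():
        model.fit(X_tr, y_tr)
        scores[name].append(r2_score(y_te, model.predict(X_te)))

# Rank models by average R² across datasets, highest first.
ranking = sorted(scores, key=lambda n: -np.mean(scores[n]))
```

Every model runs at its defaults, so the only difference between entries is the metric, tree algorithm, or weighting scheme.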
## Final Ranking (Average Performance)
| Rank | Model | Avg R² | Avg RMSE | Avg MAE |
|---|---|---|---|---|
| 1 | SmartKNN | 0.708249 | 18727.286422 | 10333.612683 |
| 2 | KNN_manhattan | 0.701272 | 18268.360893 | 10060.939069 |
| 3 | KNN_balltree | 0.692006 | 19154.367392 | 10651.626496 |
| 4 | KNN_kdtree | 0.692002 | 19154.366302 | 10651.625834 |
| 5 | KNN_distance | 0.691661 | 19154.367327 | 10651.626319 |
| 6 | KNN_uniform | 0.685943 | 19250.752618 | 10746.872163 |
| 7 | KNN_chebyshev | 0.668124 | 20885.061901 | 11864.294204 |
## Key Takeaways
- SmartKNN ranked #1 overall by average R²
- Achieved this with default settings (no tuning)
- Won 7 out of 14 datasets (highest among all models)
- KNN_manhattan was the strongest baseline (6 wins)
- Even before tuning, SmartKNN already leads
## Dataset Win Count
| Model | Dataset Wins |
|---|---|
| SmartKNN | 7 |
| KNN_manhattan | 6 |
| KNN_uniform | 1 |
| KNN_distance | 0 |
| KNN_kdtree | 0 |
| KNN_balltree | 0 |
| KNN_chebyshev | 0 |
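A "win" here just means the best R² on a given dataset. A minimal sketch of the counting, using a small fragment of the scores quoted in the per-dataset highlights (the full run covers all 14 datasets):

```python
# Count dataset wins: the winner on each dataset is the model
# with the highest R². Scores below are a fragment of the results.
from collections import Counter

r2_by_dataset = {
    "pol":          {"SmartKNN": 0.978, "KNN_manhattan": 0.955},
    "elevator":     {"SmartKNN": 0.726, "KNN_manhattan": 0.66},
    "NASA_PHM2008": {"SmartKNN": 0.568, "KNN_manhattan": 0.570},
}

wins = Counter(
    max(scores, key=scores.get) for scores in r2_by_dataset.values()
)
```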
## Per-Dataset Highlights

Instead of dumping every table, here are some interesting cases:

**Strong wins (SmartKNN dominates)**

- pol: SmartKNN 0.978 R² vs KNN_manhattan 0.955
- elevator: SmartKNN 0.726 vs baselines around 0.66
- brazilian_houses: SmartKNN 0.933, with a clear gap over the rest

**Competitive cases**

- NASA_PHM2008: KNN_manhattan slightly ahead; SmartKNN very close (0.568 vs 0.570)
- diamonds: KNN_manhattan wins, but the margin is small

**Tough / noisy datasets**

- dating_profile: SmartKNN still leads (0.304), though all models struggle here
## Interesting Observation
Even when SmartKNN doesn’t win:
- It consistently stays near the top
- Rarely collapses like weaker baselines
- Performance is stable across datasets
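One way to quantify that stability is to rank the models within each dataset and look at the spread of ranks. A sketch with a hypothetical R² grid (the first two columns borrow numbers from the highlights above; the Chebyshev column is an illustrative placeholder):

```python
# Rank stability sketch: rank models per dataset (1 = best),
# then compare mean rank and rank spread across datasets.
import numpy as np

models = ["SmartKNN", "KNN_manhattan", "KNN_chebyshev"]
r2 = np.array([
    [0.978, 0.955, 0.90],   # pol
    [0.726, 0.66,  0.60],   # elevator
    [0.568, 0.570, 0.50],   # NASA_PHM2008
])

# Convert R² scores to within-dataset ranks.
order = np.argsort(-r2, axis=1)
ranks = np.empty_like(order)
for i, row in enumerate(order):
    ranks[i, row] = np.arange(1, len(models) + 1)

mean_rank = ranks.mean(axis=0)  # lower = better on average
rank_std = ranks.std(axis=0)    # how much each model's rank varies
```

A model that "rarely collapses" shows up as a low mean rank with a small rank spread, even on the datasets it doesn't win.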
## What This Means
This benchmark is important for one reason:
No hyperparameter tuning was used
That means:
- These are not cherry-picked results
- No grid search advantage
- Just raw, default behavior
And even in that setup:
SmartKNN still comes out on top.
KNN_manhattan is a very strong baseline:
- Wins multiple datasets
- Often very close to SmartKNN
- Lower RMSE in some cases
So this is not a “destroyed everything” story. It’s more like:

SmartKNN consistently edges ahead across diverse datasets.

Across all 14 regression datasets:
- SmartKNN achieves the best average performance
- Leads in both ranking and win count
- Maintains stable results across different data types
And importantly:
This is before any dedicated tuning.
## Links

Notebook

## Note
The results presented in this benchmark correspond to SmartKNN v0.2.2.
In the latest release (v0.2.3), SmartKNN introduces a new parameter, `global_lambda`, which integrates global dataset structure into the neighbor selection process. This lets the model go beyond purely local distance calculations and better capture broader patterns within the data.
This enhancement is especially impactful for:
- Noisy datasets
- Complex or non-uniform distributions
- Scenarios where traditional KNN methods struggle with local-only similarity
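SmartKNN's actual `global_lambda` implementation isn't shown in this post, so the sketch below is only one plausible interpretation of "integrating global structure": blend raw local distances with distances computed on globally standardized features, weighted by a lambda parameter. The function name and the blending rule are assumptions for illustration:

```python
# Illustrative only: one way a global_lambda-style blend could work.
# Not SmartKNN's actual implementation.
import numpy as np

def blended_distances(X_train, x_query, global_lambda=0.5):
    """Mix raw local distances with distances in a globally
    standardized (z-scored) feature space."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12  # avoid division by zero

    local = np.linalg.norm(X_train - x_query, axis=1)
    global_ = np.linalg.norm(
        (X_train - mu) / sigma - (x_query - mu) / sigma, axis=1
    )
    # global_lambda = 0 → purely local; 1 → purely global-aware.
    return (1 - global_lambda) * local + global_lambda * global_

# Neighbors would then be the k smallest blended distances.
```

With `global_lambda = 0` this reduces to ordinary Euclidean KNN, which matches the described behavior of moving beyond local-only similarity as lambda grows.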
With this update, SmartKNN is expected to deliver stronger and more consistent performance on certain datasets, and in many cases where it previously trailed or matched baseline methods, it may take a clear lead.
Updated benchmarks with v0.2.3 will be shared soon.