SmartKNN Regression Benchmarks — How Far Can a Smarter KNN Really Go?

Over the past few weeks, we benchmarked SmartKNN against KNN, SVR, Decision Tree Regression, Random Forest, and XGBoost on real regression datasets (dataset IDs 622–712) to test predictive accuracy, stability, and robustness.

The goal was simple:

Can a more intelligent KNN compete with modern regressors — and not just as a “baseline”?

Turns out… yes, and sometimes it punches way above its weight.


Benchmark Setup

Each model was run on 30 regression datasets with varying:

  • Feature dimensionality
  • Sample sizes (small → large)
  • Target distribution complexity
  • Noise levels

All models were run with minimal dataset-specific tuning to simulate real production usage.
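To make the setup concrete, here is a minimal sketch of the kind of comparison loop this implies: every regressor at (near-)default hyperparameters, the same train/test split, and MSE plus R² recorded per dataset. The synthetic datasets and the model list below are illustrative stand-ins, not the actual Kaggle notebook code, and SmartKNN/XGBoost are left as comments since they are extra dependencies.

```python
# Benchmark sketch: same split, default hyperparameters, MSE/R2 per dataset.
# Synthetic data stands in for the real benchmark suite (illustrative only).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

models = {
    "KNN": KNeighborsRegressor(),
    "SVR": SVR(),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(random_state=0),
    # "SmartKNN": SmartKNN(...),          # the library's own estimator
    # "XGBoost": xgboost.XGBRegressor(),  # optional extra dependency
}

results = []
for seed in range(3):  # a few synthetic datasets stand in for the 30 real ones
    X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results.append({
            "dataset": seed,
            "model": name,
            "mse": mean_squared_error(y_te, pred),
            "r2": r2_score(y_te, pred),
        })

for row in results:
    print(row)
```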


Performance Highlights

Consistent Outperformance Over Classical KNN

Across most datasets, SmartKNN achieved the lowest MSE among KNN-based approaches.

Notable wins: datasets 622, 634, 637, 644, 645, 653, 656

Meaning: the weighted neighbor distance + aggregation strategy generalizes well across different scales and feature distributions.
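SmartKNN's actual weighting scheme lives in the repo; as a rough illustration of the general idea (per-feature weights inside the distance metric, followed by distance-weighted aggregation of neighbor targets), here is a small NumPy sketch. The weight vector `w` and the inverse-distance aggregation are assumptions for illustration, not the library's implementation.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, w, k=5, eps=1e-9):
    """Toy distance-weighted KNN regression with per-feature weights.

    w scales each feature's contribution to the distance; neighbor targets
    are then combined with inverse-distance weights. Illustrative only.
    """
    # Weighted Euclidean distance from the query to every training point.
    diff = (X_train - x_query) * w
    dist = np.sqrt((diff ** 2).sum(axis=1))

    # Take the k nearest neighbors.
    idx = np.argsort(dist)[:k]

    # Inverse-distance aggregation of their targets.
    inv = 1.0 / (dist[idx] + eps)
    return float((inv * y_train[idx]).sum() / inv.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=200)
w = np.array([1.0, 0.2, 0.2])  # hand-picked weights emphasizing the informative feature
print(weighted_knn_predict(X, y, X[0], w))
```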


Stability Under Variance

SmartKNN maintains stable and high R² scores, especially in:

| Scenario | Outcome |
| --- | --- |
| Noisy datasets | SmartKNN > KNN, DT |
| Medium-sized datasets | SmartKNN ≈ SVR or better |
| High-variance targets | SmartKNN reduces the collapse seen in KNN |

Even when SVR occasionally scored the lowest MSE, SmartKNN kept competitive performance with almost no tuning.
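If you want to sanity-check this kind of stability claim yourself, a quick recipe is to inject increasing Gaussian noise into the targets and watch how each model's test R² degrades. The noise levels and model list below are illustrative, not the benchmark's exact protocol (SmartKNN itself is omitted here).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=15, noise=0.0, random_state=0)
rng = np.random.default_rng(0)

# Add progressively more Gaussian noise to the target and re-fit each model.
for noise_scale in (0.0, 0.1, 0.3):
    y_noisy = y + rng.normal(scale=noise_scale * y.std(), size=y.shape)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_noisy, test_size=0.2, random_state=0)
    for name, model in [
        ("KNN", KNeighborsRegressor()),
        ("SVR", SVR()),
        ("DecisionTree", DecisionTreeRegressor(random_state=0)),
    ]:
        r2 = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        print(f"noise={noise_scale:.1f}  {name:12s}  R2={r2:.3f}")
```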


Classical KNN Isn’t Enough — Smart Weighting Matters

Standard KNN struggled badly on datasets like 637, 645, 697, while SmartKNN showed significantly lower error.

| Model | Avg MSE | Notes |
| --- | --- | --- |
| SmartKNN | 2.88e7 | Strong, consistent |
| KNN | 2.91e7 | Weaker on complex/high-variance data |

The gap isn’t massive in absolute error — but SmartKNN wins reliably across datasets.


Comparison Against Other Regressors

| Model | Avg MSE | Behavior Pattern |
| --- | --- | --- |
| SmartKNN | 2.88e7 | Balanced accuracy + robustness + interpretability |
| SVR | 2.47e7 | Very strong on clean/low-noise datasets; brittle on noisy/high-dim data |
| Decision Tree | 6.12e7 | High variance; huge errors on outliers |
| Random Forest | Slightly better than SmartKNN | Most robust across noise and complexity |
| XGBoost | 2.897e7 | Dominates dataset-by-dataset |

The Surprise

Across all datasets globally, SmartKNN edges out XGBoost in average MSE:
SmartKNN -> 2.883e7
XGBoost -> 2.897e7

Not a huge difference — but a big revelation:

A KNN-based method can statistically match boosting-based regressors across many datasets.
On individual datasets, XGBoost usually came out ahead of SmartKNN and the other models, but SmartKNN's MSE stayed far closer to XGBoost's than classical KNN's did.
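The distinction between "best on average" and "best dataset-by-dataset" is easy to compute from a per-dataset results table. Here is a tiny pandas sketch; the column names and MSE values are made up purely to illustrate how a model can win on average while losing most head-to-head comparisons.

```python
import pandas as pd

# Hypothetical per-dataset results: one MSE per (dataset, model) pair.
results = pd.DataFrame({
    "dataset": [622, 622, 634, 634, 637, 637],
    "model":   ["SmartKNN", "XGBoost"] * 3,
    "mse":     [1.0e7, 0.9e7, 2.0e7, 1.9e7, 3.0e7, 4.0e7],
})

# "Best on average": mean MSE across all datasets.
print(results.groupby("model")["mse"].mean())

# "Best per dataset": which model wins each head-to-head comparison.
wins = results.loc[results.groupby("dataset")["mse"].idxmin(), "model"]
print(wins.value_counts())
```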


Pattern Observations

| Dataset Type | Best Models |
| --- | --- |
| Clean, small datasets | SmartKNN ≈ RF ≈ XGBoost |
| Noisy / high-variance | RF > XGBoost > SmartKNN |
| High-dimensional | XGBoost > SmartKNN |
| Extremely weird / mis-scaled targets | Everything fails without preprocessing (see the sketch below) |
| Structured & medium-noise | **SmartKNN dominates** |
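On the mis-scaled-targets row: standardizing the target (and the features) usually brings every model back into a sane range. A hedged sketch using scikit-learn's TransformedTargetRegressor, which was not part of the benchmark code:

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
y = y * 1e6 + 3e8  # simulate a badly scaled target

# Scale the features, and fit/predict on a standardized copy of the target.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), KNeighborsRegressor()),
    transformer=StandardScaler(),
)
model.fit(X, y)
print(model.predict(X[:3]))
```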

Interpretability Matters

Unlike XGBoost or Random Forest, SmartKNN is still a locally transparent model.

Benefits:

  • You can inspect which neighbors influenced the prediction (see the sketch below)
  • Weighted distance makes each feature contribution traceable
  • Works well when explainability is a requirement
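As a minimal example of the first two points, here is how neighbor inspection looks with scikit-learn's KNeighborsRegressor; SmartKNN exposes its own interface, so treat this only as an illustration of the general idea.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, 0.5, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)

model = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)

x_query = rng.normal(size=(1, 4))
dist, idx = model.kneighbors(x_query)  # which neighbors drive this prediction
print("prediction:        ", model.predict(x_query))
print("neighbor indices:  ", idx[0])
print("neighbor targets:  ", y[idx[0]].round(3))
print("neighbor distances:", dist[0].round(3))
```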

Takeaways

| If your priority is… | Choose |
| --- | --- |
| Best performance with little tuning | XGBoost / Random Forest |
| Balanced performance + interpretability | SmartKNN |
| Simple baseline | KNN |
| Small, clean dataset | SmartKNN or SVR |

SmartKNN proves that KNN doesn’t need to stay a weak baseline.

With feature weighting + neighbor aggregation, it becomes a practical, general-purpose regression tool.


What’s Next for SmartKNN

Upcoming improvements (in active development):

  • ANN + brute hybrid backend
  • RL-based accuracy boost
  • Interpretability reports
  • Model save/load with full state

Final Thoughts

SmartKNN won’t dethrone boosting on every dataset — but:

It delivers accuracy + stability + interpretability without the cost of heavy tuning.

That makes it a solid choice for real-world datasets where consistency and explainability matter.


Benchmarks

  1. https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-svr-vs-decisiontrees-vs-knn
  2. https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-randomforest-vs-knn
  3. https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-xgboost-vs-knn

GitHub: https://github.com/thatipamula-jashwanth/smart-knn

  • If you like it, give it a ⭐
  • Try SmartKNN in your next project

Jashwanth Thatipamula, creator of SmartKNN
