Over the past weeks, we benchmarked SmartKNN against KNN, SVR, Decision Tree Regression, Random Forest, and XGBoost across 30 real regression datasets (IDs 622–712) to test predictive accuracy, stability, and robustness.
The goal was simple:
Can a more intelligent KNN compete with modern regressors — and not just as a “baseline”?
Turns out… yes, and sometimes it punches way above its weight.
Benchmark Setup
Each model was run on 30 regression datasets with varying:
- Feature dimensionality
- Sample sizes (small → large)
- Target distribution complexity
- Noise levels
All models were run with minimal dataset-specific tuning to simulate real production usage.
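For readers who want to reproduce the setup, the comparison loop looked roughly like the sketch below. It is a simplified outline, not the exact notebook code: the `SmartKNN` import path and the `load_benchmark_datasets()` helper are assumptions standing in for the Kaggle data loading.

```python
# Minimal sketch of the benchmark protocol (not the exact notebook code).
# Assumptions: SmartKNN exposes a scikit-learn-style fit/predict API, and
# load_benchmark_datasets() is a hypothetical helper yielding
# (dataset_id, X, y) tuples for the 30 regression datasets.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from smartknn import SmartKNN  # assumed import path

models = {
    "SmartKNN": SmartKNN(),              # default settings, no per-dataset tuning
    "KNN": KNeighborsRegressor(),
    "SVR": SVR(),
    "DecisionTree": DecisionTreeRegressor(random_state=42),
    "RandomForest": RandomForestRegressor(random_state=42),
    "XGBoost": XGBRegressor(random_state=42),
}

results = []
for dataset_id, X, y in load_benchmark_datasets():  # hypothetical loader
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results.append({
            "dataset": dataset_id,
            "model": name,
            "mse": mean_squared_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        })
```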
Performance Highlights
Consistent Outperformance Over Classical KNN
Across most datasets, SmartKNN achieved the lowest MSE among KNN-based approaches.
Notable wins: datasets 622, 634, 637, 644, 645, 653, and 656
Meaning: the weighted neighbor distance + aggregation strategy generalizes well across different scales and feature distributions.
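To make that idea concrete, here is a minimal sketch of what feature-weighted distances plus distance-weighted neighbor aggregation can look like. It is a generic illustration of the technique, not the exact SmartKNN implementation; using mutual information as the feature-weighting signal is an assumption made for this example.

```python
# Generic illustration of feature-weighted distances + distance-weighted
# neighbor aggregation. NOT the exact SmartKNN algorithm; the
# mutual-information weighting is an assumption for the example.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.neighbors import NearestNeighbors

def weighted_knn_predict(X_train, y_train, X_query, k=5):
    """X_train, y_train, X_query are NumPy arrays."""
    # Learn per-feature weights (here: mutual information with the target).
    w = mutual_info_regression(X_train, y_train)
    w = w / (w.sum() + 1e-12)

    # Scaling features by their weights turns plain Euclidean distance
    # into a feature-weighted distance.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train * w)
    dist, idx = nn.kneighbors(X_query * w)

    # Aggregate neighbor targets with inverse-distance weights.
    inv = 1.0 / (dist + 1e-12)
    return (inv * y_train[idx]).sum(axis=1) / inv.sum(axis=1)
```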
Stability Under Variance
SmartKNN maintains stable and high R² scores, especially in:
| Scenario | Outcome |
|---|---|
| Noisy datasets | SmartKNN > KNN, DT |
| Medium-sized datasets | SmartKNN ≈ SVR or better |
| High-variance targets | SmartKNN reduces collapse seen in KNN |
Even when SVR occasionally scored the lowest MSE, SmartKNN remained competitive with almost no tuning.
Classical KNN Isn’t Enough — Smart Weighting Matters
Standard KNN struggled badly on datasets like 637, 645, 697, while SmartKNN showed significantly lower error.
| Model | Avg MSE | Notes |
|---|---|---|
| SmartKNN | 2.88 × 10⁷ | Strong, consistent |
| KNN | 2.91 × 10⁷ | Weaker on complex/high-variance data |
The gap isn’t massive in absolute error — but SmartKNN wins reliably across datasets.
Comparison Against Other Regressors
| Model | Avg MSE | Behavior Pattern |
|---|---|---|
| SmartKNN | 2.88 × 10⁷ | Balanced accuracy + robustness + interpretability |
| SVR | 2.47 × 10⁷ | Very strong on clean/low-noise datasets; brittle on noisy or high-dimensional data |
| Decision Tree | 6.12 × 10⁷ | High variance; huge errors on outliers |
| Random Forest | Slightly lower than SmartKNN's | Most robust across noise and complexity |
| XGBoost | 2.897 × 10⁷ | Usually the best on individual datasets |
The Surprise
Across all datasets globally, SmartKNN edges out XGBoost in average MSE:
SmartKNN -> 2.883 × 10⁷
XGBoost -> 2.897 × 10⁷
Not a huge difference, but a telling one:
A KNN-based method can match boosting-based regressors on average across many datasets.
On individual datasets, however, XGBoost usually outperformed SmartKNN and the other models; even there, SmartKNN's MSE stayed far closer to XGBoost's than classical KNN's did.
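The split between "best on average" and "best per dataset" is easy to check from the benchmark records. A small sketch, assuming the `results` list collected in the setup sketch above:

```python
# Global average MSE per model vs. per-dataset winners, using the `results`
# records collected in the benchmark sketch above (assumed structure).
import pandas as pd

df = pd.DataFrame(results)

# Global view: mean MSE across all datasets for each model.
print(df.groupby("model")["mse"].mean().sort_values())

# Per-dataset view: how often each model achieves the lowest MSE.
winners = df.loc[df.groupby("dataset")["mse"].idxmin(), "model"]
print(winners.value_counts())
```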
Pattern Observations
| Dataset Type | Best Models |
|---|---|
| Clean, small datasets | SmartKNN ≈ RF ≈ XGBoost |
| Noisy / high-variance | RF > XGBoost > SmartKNN |
| High-dimensional | XGBoost > SmartKNN |
| Extremely weird / mis-scaled targets | Everything fails without preprocessing |
| Structured & medium-noise | **SmartKNN dominates** |
Interpretability Matters
Unlike XGBoost or Random Forest, SmartKNN is still a locally transparent model.
Benefits:
- You can inspect which neighbors influenced the prediction (sketched below)
- Weighted distance makes each feature contribution traceable
- Works well when explainability is a requirement
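SmartKNN's own inspection API isn't shown here, so the sketch below illustrates the kind of neighbor-level explanation a KNN-style regressor makes possible, using plain scikit-learn utilities as a stand-in.

```python
# Illustration of neighbor-level explanations for a KNN-style prediction.
# Built with scikit-learn utilities; this is not SmartKNN's actual API.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def explain_prediction(X_train, y_train, x_query, k=5):
    """X_train, y_train, x_query are NumPy arrays."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, idx = nn.kneighbors(x_query.reshape(1, -1))

    # Inverse-distance weights, normalized so influences sum to 100%.
    weights = 1.0 / (dist[0] + 1e-12)
    weights = weights / weights.sum()

    # Show which training points were used and how much each contributed.
    for i, w in zip(idx[0], weights):
        print(f"neighbor #{i}: target={y_train[i]:.3f}, influence={w:.2%}")

    return float(np.dot(weights, y_train[idx[0]]))
```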
Takeaways
| If your priority is… | Choose |
|---|---|
| Best performance with little tuning | XGBoost / Random Forest |
| Balanced performance + interpretability | SmartKNN |
| Simple baseline | KNN |
| Small clean dataset | SmartKNN or SVR |
SmartKNN proves that KNN doesn’t need to stay a weak baseline.
With feature weighting + neighbor aggregation, it becomes a practical, general-purpose regression tool.
What’s Next for SmartKNN
Upcoming improvements (in active development):
- ANN + brute hybrid backend
- RL-based accuracy boost
- Interpretability reports
- Model save/load with full state
Final Thoughts
SmartKNN won’t dethrone boosting on every dataset — but:
It delivers accuracy + stability + interpretability without the cost of heavy tuning.
That makes it a solid choice for real-world datasets where consistency and explainability matter.
Benchmarks
- https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-svr-vs-decisiontrees-vs-knn
- https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-randomforest-vs-knn
- https://www.kaggle.com/code/jashwanththatipamula/smartknn-vs-xgboost-vs-knn
GitHub - https://github.com/thatipamula-jashwanth/smart-knn
- If you like it, give it a ⭐
- Try SmartKNN in your next project
Jashwanth Thatipamula - Creator of SmartKNN