“KNN secretly behaves like metric learning once you stop being lazy with distances.”
KNN is famous for being simple.
Too simple, in fact - which is why most people ignore it after their first ML course.
But SmartKNN flips that assumption.
Instead of treating KNN as a static, distance-based dinosaur, SmartKNN treats it as a geometry problem:
Before searching neighbors… what if we reshape the entire feature space itself?
This article isn’t just a code explanation - it’s a walkthrough of the underlying logic that makes SmartKNN behave like a lightweight metric-learning model.
Let’s walk through how SmartKNN learns feature weights internally by analyzing the actual code that powers the system.
Feature Weighting as Geometry Engineering
Most models learn parameters.
SmartKNN learns geometry.
Each feature weight defines how much that axis matters in the distance function:
distance = sqrt( Σ weight[i] * (x[i] - y[i])² )
So learning good feature weights = learning the shape of the search space.
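To make that formula concrete, here is a minimal sketch of the weighted distance in NumPy - the function name and the example values are illustrative, not SmartKNN’s actual internals:

```python
import numpy as np

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(weights, dtype=float) * diff ** 2)))

x = np.array([1.0, 5.0, 0.2])
y = np.array([2.0, 5.0, 0.9])
w = np.array([1.0, 0.5, 0.0])      # a zero weight erases that axis from the geometry
print(weighted_distance(x, y, w))  # 1.0
```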
SmartKNN builds this shape using three independent “geometric sensors”:
- Linear signal detector
- Nonlinear dependency detector
- Structural importance detector
Let’s look at how each sensor works - not just the code, but the reasoning behind the code.
Sensor A - The Linear Sensitivity Test
_univariate_mse_weights
Imagine asking each feature:
“If I let you predict the target alone, can you do it consistently?”
This is exactly what univariate regression does.
Under the hood:
- Compute variance
- Compute slope via covariance
- Compute predicted y
- Measure MSE
- Convert to importance via 1/MSE
The key idea:
If a feature on its own can predict the target with low error, that feature carries signal - so importance scales with 1/MSE.
This gives SmartKNN a first-order approximation of which axes point toward the target.
Linear, yes.
But fast, inexpensive, and surprisingly revealing.
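Here is a minimal sketch of that recipe. The name mirrors _univariate_mse_weights, but the exact internals are my reconstruction of the steps above, not the library’s code:

```python
import numpy as np

def univariate_mse_weights(X, y, eps=1e-12):
    """For each feature j, fit a one-variable linear regression y ~ a + b * X[:, j]
    and score the feature by 1 / MSE of that fit (lower error -> higher importance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        xj = X[:, j]
        var = xj.var()
        if var < eps:                      # constant feature: no usable signal
            continue
        slope = np.cov(xj, y, bias=True)[0, 1] / var   # slope via covariance / variance
        intercept = y.mean() - slope * xj.mean()
        y_pred = intercept + slope * xj
        mse = np.mean((y - y_pred) ** 2)
        weights[j] = 1.0 / (mse + eps)     # convert error to importance
    return weights
```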
Sensor B - The Nonlinear Dependency Map
_mi_weights
Linear correlation misses shape.
Mutual Information does not.
This module answers:
Does this feature change the uncertainty of y, even if the pattern is nonlinear?
It works by:
- Sampling if dataset is large
- Binning both X[:, j] and y
- Estimating joint probabilities
- Computing MI via p(x,y) * log(p(x,y)/(p(x)p(y)))
MI shines when relationships are:
- thresholded
- curved
- discontinuous
- multi-modal
This is SmartKNN’s curvature detector.
It identifies axes where interesting nonlinear structure lives.
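A histogram-based sketch of that estimator, following the sampling-and-binning recipe above - the bin count, sample cap, and function name are my assumptions, not SmartKNN’s actual parameters:

```python
import numpy as np

def mi_weights(X, y, n_bins=16, max_samples=10_000, seed=0):
    """Histogram-based mutual information between each feature and y.
    Subsamples large datasets, bins X[:, j] and y, then sums
    p(x, y) * log(p(x, y) / (p(x) * p(y))) over the non-empty bins."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.shape[0] > max_samples:                     # sampling step for large datasets
        rng = np.random.default_rng(seed)
        idx = rng.choice(X.shape[0], max_samples, replace=False)
        X, y = X[idx], y[idx]
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        joint, _, _ = np.histogram2d(X[:, j], y, bins=n_bins)
        p_xy = joint / joint.sum()                   # joint probabilities
        p_x = p_xy.sum(axis=1, keepdims=True)        # marginal over x-bins
        p_y = p_xy.sum(axis=0, keepdims=True)        # marginal over y-bins
        nz = p_xy > 0
        weights[j] = np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz]))
    return weights
```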
Sensor C — The Structural Importance Map
_rf_weights
If the first two sensors look at signal, this one looks at structure.
ExtraTrees can discover:
- interactions
- splits
- multi-stage decision paths
- mixed linear & nonlinear behavior
So SmartKNN asks a tree model:
If you had to cut the space into decision regions, which features would you use most?
ExtraTrees importance becomes a third vector of weights - a structural summary of how the space organizes itself.
This is SmartKNN’s topology detector.
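A sketch of that idea with scikit-learn’s ExtraTreesRegressor - the hyperparameters here are placeholders, not SmartKNN’s defaults:

```python
from sklearn.ensemble import ExtraTreesRegressor

def rf_weights(X, y, n_estimators=100, seed=0):
    """Fit an ExtraTrees ensemble and use its impurity-based
    feature_importances_ as a structural importance vector."""
    model = ExtraTreesRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_
```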
Fusion: Turning Three Signals Into One Geometry
learn_feature_weights
Now SmartKNN blends all three vectors:
weights = α*MSE + β*MI + γ*RF
with defaults:
α = 0.4
β = 0.3
γ = 0.3
Finally, SmartKNN normalizes the result:
safe_normalize → remove NaNs, clip noise, sum to 1
At this point, SmartKNN has built its custom metric space - learned, not assumed.
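A sketch of the fusion step under those defaults. Note two assumptions on my part: safe_normalize here is a reconstruction of the behavior described above, and each sensor is normalized before blending so no single vector dominates by scale:

```python
import numpy as np

def safe_normalize(w, eps=1e-12):
    """Replace NaNs, clip negative noise to zero, and rescale to sum to 1."""
    w = np.nan_to_num(np.asarray(w, dtype=float), nan=0.0)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return w / total if total > eps else np.full_like(w, 1.0 / len(w))

def learn_feature_weights(mse_w, mi_w, rf_w, alpha=0.4, beta=0.3, gamma=0.3):
    """Blend the three sensors into one weight vector, then normalize.
    Normalizing each input first is an assumption, not confirmed internals."""
    return safe_normalize(alpha * safe_normalize(mse_w) +
                          beta  * safe_normalize(mi_w) +
                          gamma * safe_normalize(rf_w))
```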
The Algorithm’s Behavior Changes Completely
Once weights are learned, distance becomes:
- stretched along informative dimensions
- compressed along noisy ones
- zero for irrelevant ones (automatic feature selection)
This single weight vector does the work of:
- dimensionality reduction
- metric learning
- noise suppression
- structure amplification
- stability improvement
…all before SmartKNN looks at even one neighbor.
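A tiny demonstration of that effect: the same weighted distance as before, one query, two candidate neighbors, and a noisy axis whose weight flips the nearest-neighbor decision once it drops to zero. The points and weights are made up for illustration:

```python
import numpy as np

def weighted_distance(x, y, w):
    """Same weighted Euclidean distance as the earlier sketch."""
    return float(np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(y)) ** 2)))

query     = np.array([0.0, 0.0])
neighbors = np.array([[0.1, 9.0],   # close on the informative axis, far on the noisy one
                      [5.0, 0.1]])  # far on the informative axis, close on the noisy one

uniform = np.array([1.0, 1.0])
learned = np.array([1.0, 0.0])      # noisy axis compressed away

for w, label in [(uniform, "uniform"), (learned, "learned")]:
    d = [weighted_distance(query, nb, w) for nb in neighbors]
    print(label, "-> nearest neighbor:", int(np.argmin(d)), "distances:", np.round(d, 2))
# uniform -> nearest neighbor: 1  (the noisy axis dominates)
# learned -> nearest neighbor: 0  (the informative axis decides)
```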
Why This Works Better Than "Normal" Feature Selection
Most feature selectors choose features before training a model.
SmartKNN does the opposite:
It chooses features to shape the model’s geometry itself.
It’s not selecting features for a model -
it’s selecting features for the metric that defines the model.
This is closer to:
- Mahalanobis metric learning
- attention mechanisms
- embedding weighting
but distilled into a fast, interpretable, classical ML approach.
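The Mahalanobis connection is exact in the diagonal case: with weight vector w, the distance from earlier is a Mahalanobis distance with matrix diag(w). A quick numerical check, assuming SciPy is available (the vectors are arbitrary examples):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 4.0, 3.5])
w = np.array([0.6, 0.3, 0.1])                 # a learned, normalized weight vector

# Weighted Euclidean distance with weights w ...
d_weighted = np.sqrt(np.sum(w * (x - y) ** 2))

# ... equals the Mahalanobis distance with the diagonal matrix diag(w).
d_mahalanobis = mahalanobis(x, y, np.diag(w))

print(np.isclose(d_weighted, d_mahalanobis))  # True
```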
Final Thoughts: SmartKNN Is KNN With a Learning Brain
KNN’s failure in high dimensions wasn’t because the idea was bad.
It was because the geometry was frozen.
SmartKNN unfreezes it.
It turns the dataset into three signals:
linear, nonlinear, structural -
then fuses them to build a custom metric space.
The result is a lightweight, interpretable, geometry-driven learner.
Not deep learning.
Not classical KNN.
Something in between.
SmartKNN didn’t just edge out KNN - it surpassed WeightedKNN and several baseline models on OpenML datasets.
And this is only V1.
V2 is loading…
`pip install smart-knn`
Jashwanth Thatipamula - Creator of SmartKNN