How SmartKNN Learns Feature Weights Internally

“KNN secretly behaves like metric learning once you stop being lazy with distances.”

KNN is famous for being simple.
Too simple, in fact - which is why most people ignore it after their first ML course.

But SmartKNN flips that assumption.

Instead of treating KNN as a static, distance-based dinosaur, SmartKNN treats it as a geometry problem:

Before searching neighbors… what if we reshape the entire feature space itself?

This article isn’t just a code explanation - it walks through the underlying logic that makes SmartKNN behave like a lightweight metric-learning model.

Let’s walk through how SmartKNN learns feature weights internally by analyzing the actual code that powers the system.


Feature Weighting as Geometry Engineering

Most models learn parameters.
SmartKNN learns geometry.

Each feature weight defines how much that axis matters in the distance function:

distance = sqrt( Σ weight[i] * (x[i] - y[i])² )


So learning good feature weights = learning the shape of the search space.
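To make the formula concrete, here is a minimal NumPy sketch of that weighted distance (the names are illustrative, not SmartKNN’s internals):

import numpy as np

def weighted_distance(x, y, weights):
    # Weighted Euclidean distance: sqrt( Σ weight[i] * (x[i] - y[i])² )
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.sqrt(np.sum(weights * diff ** 2))

w = np.array([0.7, 0.3, 0.0])                                   # the third axis is ignored entirely
print(weighted_distance([1.0, 2.0, 5.0], [2.0, 2.0, -5.0], w))  # ≈ 0.837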

SmartKNN builds this shape using three independent “geometric sensors”:

  • Linear signal detector
  • Nonlinear dependency detector
  • Structural importance detector

Let’s look at how each sensor works - not just the code, but the reasoning behind the code.


Sensor A - The Linear Sensitivity Test

_univariate_mse_weights

Imagine asking each feature:

“If I let you predict the target alone, can you do it consistently?”

This is exactly what univariate regression does.

Under the hood:

  • Compute variance
  • Compute slope via covariance
  • Compute predicted y
  • Measure MSE
  • Convert to importance via 1/MSE

The key idea:
If a feature on its own can predict the target with low error, that feature carries signal.

This gives SmartKNN a first-order approximation of which axes point toward the target.

Linear, yes.
But fast, inexpensive, and surprisingly revealing.
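A rough reconstruction of that idea - not SmartKNN’s exact implementation - looks like this:

import numpy as np

def univariate_mse_weights(X, y, eps=1e-12):
    # For each feature: fit a one-variable linear model and score it by 1 / MSE.
    y = np.asarray(y, dtype=float)
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        xj = X[:, j].astype(float)
        var = xj.var()
        if var < eps:                                  # constant feature carries no signal
            continue
        slope = np.cov(xj, y, bias=True)[0, 1] / var   # slope via covariance / variance
        intercept = y.mean() - slope * xj.mean()
        mse = np.mean((y - (slope * xj + intercept)) ** 2)
        weights[j] = 1.0 / (mse + eps)                 # low error -> high importance
    return weights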


Sensor B - The Nonlinear Dependency Map

_mi_weights

Linear correlation misses shape.
Mutual Information does not.

This module answers:

Does this feature change the uncertainty of y, even if the pattern is nonlinear?

It works by:

  • Sampling if the dataset is large
  • Binning both X[:, j] and y
  • Estimating joint probabilities
  • Computing MI via Σ p(x,y) · log( p(x,y) / (p(x)·p(y)) )

MI shines when relationships are:

  • thresholded
  • curved
  • discontinuous
  • multi-modal

This is SmartKNN’s curvature detector.
It identifies axes where interesting nonlinear structure lives.
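A compact sketch of that binning approach - illustrative, not SmartKNN’s exact code - might look like:

import numpy as np

def mi_weights(X, y, bins=10, max_samples=5000, seed=0):
    # Histogram-based mutual information between each feature and the target.
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.shape[0] > max_samples:                     # subsample large datasets
        idx = rng.choice(X.shape[0], max_samples, replace=False)
        X, y = X[idx], y[idx]
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        joint, _, _ = np.histogram2d(X[:, j], y, bins=bins)
        pxy = joint / joint.sum()                    # joint probabilities p(x, y)
        px = pxy.sum(axis=1, keepdims=True)          # marginal p(x)
        py = pxy.sum(axis=0, keepdims=True)          # marginal p(y)
        nz = pxy > 0
        weights[j] = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    return weights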


Sensor C - The Structural Importance Map

_rf_weights

If the first two sensors look at signal, this one looks at structure.

ExtraTrees can discover:

  • interactions
  • splits
  • multi-stage decision paths
  • mixed linear & nonlinear behavior

So SmartKNN asks a tree model:

If you had to cut the space into decision regions, which features would you use most?

ExtraTrees importance becomes a third vector of weights - a structural summary of how the space organizes itself.

This is SmartKNN’s topology detector.
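In code, this is essentially scikit-learn’s ExtraTrees feature importances - a sketch with illustrative hyperparameters that may differ from SmartKNN’s:

from sklearn.ensemble import ExtraTreesRegressor

def rf_weights(X, y, n_estimators=100, random_state=0):
    # Ask an ExtraTrees ensemble how usefully it splits on each feature.
    model = ExtraTreesRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X, y)
    return model.feature_importances_   # impurity-based importances, summing to 1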


Fusion: Turning Three Signals Into One Geometry

learn_feature_weights

Now SmartKNN blends all three vectors:

weights = α*MSE + β*MI + γ*RF

with defaults:

  • α = 0.4
  • β = 0.3
  • γ = 0.3

Finally, SmartKNN normalizes the result:

safe_normalize → remove NaNs, clip noise, sum to 1

At this point, SmartKNN has built its custom metric space - learned, not assumed.
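A sketch of the fusion and normalization step - illustrative names, not SmartKNN’s exact API:

import numpy as np

def safe_normalize(w, eps=1e-12):
    # Replace NaNs, clip negatives, and rescale so the weights sum to 1.
    w = np.nan_to_num(np.asarray(w, dtype=float), nan=0.0)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    return w / total if total > eps else np.full_like(w, 1.0 / len(w))

def fuse_weights(w_mse, w_mi, w_rf, alpha=0.4, beta=0.3, gamma=0.3):
    # Blend the three sensor outputs into one geometry (defaults 0.4 / 0.3 / 0.3).
    blended = (alpha * safe_normalize(w_mse)
               + beta * safe_normalize(w_mi)
               + gamma * safe_normalize(w_rf))
    return safe_normalize(blended)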


The Algorithm’s Behavior Changes Completely

Once weights are learned, distance becomes:

  • stretched along informative dimensions
  • compressed along noisy ones
  • zero for irrelevant ones (automatic feature selection)

This single weight vector does the work of:

  • dimensionality reduction
  • metric learning
  • noise suppression
  • structure amplification
  • stability improvement

…all before SmartKNN looks at even one neighbor.
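One simple way to see the effect in practice: fold the learned weights into the data by rescaling each feature by √weight, so a stock Euclidean KNN reproduces the weighted metric. This is an illustrative trick, not necessarily how SmartKNN implements its search:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fit_weighted_knn(X, y, weights, k=5):
    # Euclidean distance on X * sqrt(w) equals the weighted distance on X.
    scale = np.sqrt(weights)
    model = KNeighborsRegressor(n_neighbors=k).fit(X * scale, y)
    return model, scale

def predict_weighted_knn(model, scale, X_new):
    return model.predict(X_new * scale)   # apply the same rescaling at query time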

Why This Works Better Than "Normal" Feature Selection

Most feature selectors choose features before training a model.

SmartKNN does the opposite:

It chooses features to shape the model’s geometry itself.

It’s not selecting features for a model -
it’s selecting features for the metric that defines the model.

This is closer to:

  • Mahalanobis metric learning
  • attention mechanisms
  • embedding weighting

but distilled into a fast, interpretable, classical ML approach.


Final Thoughts: SmartKNN Is KNN With a Learning Brain

KNN’s failure in high dimensions wasn’t because the idea was bad.
It was because the geometry was frozen.

SmartKNN unfreezes it.

It turns the dataset into three signals:
linear, nonlinear, structural -
then fuses them to build a custom metric space.

The result is a lightweight, interpretable, geometry-driven learner.

Not deep learning.
Not classical KNN.
Something in between.


SmartKNN didn’t just edge out KNN - it surpassed WeightedKNN
and a bunch of baseline models on OpenML datasets.
And this is only V1.
V2 is loading…

pip install smart-knn

Jashwanth Thatipamula - Creator of SmartKNN
