Chakraborty Arya
Curse of dimensionality...

Prerequisites: familiarity with KNN

Most of us assume that more features (more data) lead to better accuracy for a Machine Learning model. But for the K-Nearest Neighbours classifier, things are a bit different. If the dataset contains an irrelevant feature, KNN still treats it as useful information, and it ends up adding noise to the distance calculation.
There is another case. Suppose the data has 2 or more features with roughly the same meaning/values. The classifier treats them as separate features, so that information effectively gets more weightage/importance than it should. This can also lead to bad results, as the small sketch below shows.
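
Here is a minimal NumPy sketch of the duplicated-feature problem (the point values are made up for illustration): copying feature2 into a third column makes it count twice in the Euclidean distance, so the neighbour search starts to be dominated by it.

```python
import numpy as np

# Hypothetical points with two features: [feature1, feature2]
a = np.array([1.0, 5.0])
b = np.array([4.0, 9.0])

# Same points, but feature2 is duplicated as a third column
a_dup = np.array([1.0, 5.0, 5.0])
b_dup = np.array([4.0, 9.0, 9.0])

print(np.linalg.norm(a - b))          # 5.0  -> feature1 and feature2 contribute 3 and 4
print(np.linalg.norm(a_dup - b_dup))  # ~6.4 -> feature2 now contributes twice
```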

One way to deal with this is to assign weights to the features.
Instead of using the plain Euclidean/Manhattan distance, we use a weighted distance: each feature's contribution is multiplied by weight[i] before the per-feature differences are summed. Finding the weights is not particularly difficult: start with random weights for each feature, then tune weight[i] by minimising a cost (error) function, for example with Gradient Descent.
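
A minimal sketch of the weighting idea is below. Since plain KNN accuracy is not differentiable, this sketch tunes the weights with a simple random search scored by cross-validation rather than Gradient Descent; a gradient-based metric-learning method (e.g. NCA) would be the differentiable analogue. The Iris dataset, the number of neighbours, and the weight range are assumptions made purely for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

def make_weighted_metric(w):
    # Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2)
    def dist(a, b):
        return np.sqrt(np.sum(w * (a - b) ** 2))
    return dist

best_w, best_score = None, -np.inf
for _ in range(20):                               # random search over candidate weights
    w = rng.uniform(0, 1, size=X.shape[1])
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute",
                               metric=make_weighted_metric(w))
    score = cross_val_score(knn, X, y, cv=3).mean()
    if score > best_score:
        best_w, best_score = w, score

print("best weights:", np.round(best_w, 2), "cv accuracy:", round(best_score, 3))
```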

Another way is to use feature selection.
For KNN this is commonly done with backward elimination: we compare the model's accuracy with a feature kept and with that feature removed, and keep or eliminate the feature depending on which gives the higher accuracy. This is easier to implement on a dataset than the weighting approach above.
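
Below is a minimal sketch of backward elimination, again assuming the Iris dataset and a 5-NN classifier for illustration: a feature is dropped whenever removing it does not lower the cross-validated accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def knn_accuracy(columns):
    # Cross-validated accuracy of a 5-NN classifier using only the given columns
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, columns], y, cv=5).mean()

kept = list(range(X.shape[1]))        # start with every feature
improved = True
while improved and len(kept) > 1:
    improved = False
    base = knn_accuracy(kept)
    for col in list(kept):
        trial = [c for c in kept if c != col]
        if knn_accuracy(trial) >= base:   # dropping the feature does not hurt accuracy
            kept = trial
            improved = True
            break

print("features kept:", kept)
```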

Question :
Consider the dataset given below:

(Dataset image: feature1 and feature2 take nearly identical values in every row.)

If we decide to use feature weights in KNN and our data looks like the table above, what weights might we assign to feature1 and feature2? (Other features are also present in the dataset but are not shown, for clarity.) Assume weights range from 0 to 100, where 100 is the maximum and 0 the minimum.

Answer - Since both features have nearly the same values, they carry essentially the same information, so we give importance to only one of them and assign a much lower weight to the other. Either of the two features can be the one that receives the higher weight.
