Nilavya Das


Bias & Variance

In machine learning, these two terms come up all the time. So today let's talk about what bias and variance actually mean and why they are so important in machine learning.

Before getting into bias and variance, let's first talk about what underfitting, overfitting, and a well-balanced model look like.

In the first plot, the data follows a polynomial trend, but we have fit it with linear regression. A straight line cannot capture the curve, so the error is high on both the train set and the test set. The model is too simple for the data, and we say it is underfitting.

In the second plot, we have used polynomial regression with degree 4, so the curve passes through every single point and fits the training data perfectly. But there is a drawback: if we give this model any new data (test or cross-validation data), it won't predict well, because it has essentially memorised the training set. The error on the test or cross-validation set will be high, so this is not a good fit either. We call it overfitting.

So now we have a good idea of what overfitting and underfitting are. How do we train a model with low error on both the train set and the test (or cross-validation) set? Look at the third plot: the curve does not pass through every point, but it is not far from any of them either, because it follows the overall trend of the data. With this model the error rate is low on the train set and also low on the test or cross-validation set, so we call it a well-balanced model.
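The three situations above can be sketched numerically. This is a minimal example, assuming made-up synthetic data (a quadratic trend plus noise); the `mse` helper and the degree choices are illustrative, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic trend with added noise.
x = np.linspace(-3, 3, 40)
y = x**2 + rng.normal(scale=1.0, size=x.size)

# Simple split: even indices form the train set, odd indices the test set.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def mse(degree):
    """Fit a polynomial of the given degree on the train set
    and return (train error, test error) as mean squared errors."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

# Degree 1 underfits, a very high degree overfits,
# and degree 2 (the true trend) is well balanced.
for degree, label in [(1, "underfit"), (15, "overfit"), (2, "balanced")]:
    train_err, test_err = mse(degree)
    print(f"degree {degree:2d} ({label}): train={train_err:.2f}, test={test_err:.2f}")
```

Running this, the degree-1 model has high error everywhere, the degree-15 model has a tiny train error but a much larger test error, and the degree-2 model keeps both errors low.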


So let's now come to bias and variance.

What do we mean by bias and variance?
Bias is the error a model makes because its assumptions are too simple; roughly, it shows up as high error on the training data. Variance is the model's sensitivity to the particular training set it saw; roughly, it shows up as the test error being much higher than the train error. Now let's relate this to the examples above. In underfitting, the error is high on both the train and the test data, so the model has high bias (and typically low variance, since it behaves the same on any dataset). In overfitting, the train error is low but the test error is high, so the model has low bias and high variance. A good model has low error on both train and test data, so it has low bias and low variance.
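These two quantities can also be measured directly. The sketch below, assuming the same kind of synthetic quadratic data as before, refits each model on many noisy resamples and measures, at one hypothetical point `x0`, the squared bias (how far the average prediction is from the truth) and the variance (how much the predictions jump around between fits); the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Assumed "true" underlying function for this synthetic example.
    return x**2

x = np.linspace(-3, 3, 30)
x0 = 1.5  # the point where we measure bias and variance

def predictions_at_x0(degree, n_datasets=200):
    """Fit polynomials of the given degree on many noisy resamples
    of the data and collect each fitted model's prediction at x0."""
    preds = []
    for _ in range(n_datasets):
        y = true_f(x) + rng.normal(scale=1.0, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

for degree in (1, 2, 12):
    p = predictions_at_x0(degree)
    bias_sq = (p.mean() - true_f(x0)) ** 2   # squared bias at x0
    variance = p.var()                       # variance of predictions at x0
    print(f"degree {degree:2d}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

The simple degree-1 model shows large squared bias, the flexible degree-12 model shows large variance, and degree 2 keeps both small, which is exactly the trade-off described above.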
