
Edward Amor

Originally published at edwardamor.xyz

The Bias Variance Tradeoff

Foundational to any data science curriculum is the introduction of the terms bias and variance, and the trade-off that exists between the two. As machine learning continues to grow, it is imperative that we understand these concepts, as they directly affect the predictions we make and the business value we can derive from our models. One of the more difficult parts of machine learning is optimizing your models: push the optimization too far and your model can over-fit, keep the model too simple and it may under-fit your data. The inevitable trade-off between these two failure modes greatly impacts the validity of your model and the predictions you make. But what is bias, what is variance, and what is this trade-off?

Bias

When we speak of bias, we aren’t talking about the everyday bias we humans are susceptible to. In machine learning, bias refers to the difference between our model’s predictions and the true values we are trying to predict (prediction − reality). A model with high bias consistently makes wrong predictions because it fails to capture the complexity of our data; it under-fits during training and continues to do so on testing/validation data.

We can identify if our model has high bias if the following occur:

  1. We tend to get high training errors.
  2. The validation error or test error will be similar to the training error.

We can compensate for high bias by doing the following:

  1. We need to gather more input features, or generate new ones using feature engineering techniques.
  2. We can add polynomial features in order to increase the complexity.
  3. If we are using any regularization terms in our model, we can try reducing their strength (a rough sketch of these fixes follows below).
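
As a rough sketch of what high bias looks like in practice (this uses scikit-learn and synthetic data of my own, not code from the original post), a straight line fit to clearly nonlinear data gives similar, high errors on both the training and test sets; adding polynomial features lowers the bias:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data: y = x^2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A plain linear model under-fits: training and test errors are both high and similar.
linear = LinearRegression().fit(X_train, y_train)
print("linear train MSE:", mean_squared_error(y_train, linear.predict(X_train)))
print("linear test  MSE:", mean_squared_error(y_test, linear.predict(X_test)))

# Adding polynomial features increases model complexity and lowers the bias.
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
quadratic = LinearRegression().fit(X_train_poly, y_train)
print("quadratic train MSE:", mean_squared_error(y_train, quadratic.predict(X_train_poly)))
print("quadratic test  MSE:", mean_squared_error(y_test, quadratic.predict(X_test_poly)))
```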

Variance

Similar to the statistical term, variance refers to the variability of our model’s predictions. A model with high variance does not generalize well; instead it pays too much attention to the training data. What ends up happening is we get a model which performs very well during training, but shows very high error rates when introduced to testing/validation data or any other unseen data. One way to think about it is like a travel route: if you took a route built for one trip and mapped it onto a completely different area, it wouldn’t work, because it only fits the particular origin and destination it was made for. We aren’t making routes here, but the idea still holds: we want models that generalize well to unseen data that resembles the data used during training.

We can identify whether the model has high variance if:

  1. We tend to get low training errors.
  2. The validation error or test error will be very high.

We can fix high variance by:

  1. Gathering more training data, so that the model learns from the underlying patterns rather than the noise.
  2. We can even try to reduce the input features or do feature selection, reducing model complexity.
  3. If we are using any regularization terms in our model, we can try increasing their strength (see the sketch after this list).
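
To make the high-variance case concrete, here is a small sketch along the same lines (again scikit-learn on synthetic data I made up for illustration): an unregularized high-degree polynomial fit achieves a very low training error but a much higher test error, and adding a ridge penalty narrows the gap:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# A small, noisy sample encourages over-fitting.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Degree-15 polynomial with no regularization: low training error, high test error.
overfit = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), LinearRegression())
overfit.fit(X_train, y_train)
print("unregularized train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))
print("unregularized test  MSE:", mean_squared_error(y_test, overfit.predict(X_test)))

# Same features with a ridge penalty: the variance drops and the train/test gap narrows.
regularized = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), Ridge(alpha=1.0))
regularized.fit(X_train, y_train)
print("ridge train MSE:", mean_squared_error(y_train, regularized.predict(X_train)))
print("ridge test  MSE:", mean_squared_error(y_test, regularized.predict(X_test)))
```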

The Trade-Off

Now that we know what bias and variance are, it is key to understand that reducing one tends to increase the other: a model with high bias will typically have low variance, and vice versa. For a given model, we can say its expected error is the sum of three parts: the (squared) bias, the variance, and irreducible noise (Error = Bias² + Variance + Noise).
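
Written out for squared error (this is the standard textbook statement rather than a formula from the original post), the decomposition at a point x, with true function f, learned model f̂, and noise variance σ², is:

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```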

The bias–variance decomposition is a way of analyzing a learning algorithm’s expected generalization error with respect to a particular problem as a sum of three terms, the bias, variance, and a quantity called the irreducible error, resulting from noise in the problem itself.

— Wikipedia 1

The trade-off, therefore, is finding the bias and variance levels that minimize our overall error. Using the steps listed previously for reducing each of the two, we iteratively improve the models we generate until we arrive at one which is reasonably balanced between bias and variance. If we don’t balance these two terms, we end up with a model that either under- or over-fits our data, which gives us little value when it comes to making predictions.
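
One simple way to search for that balance is to sweep model complexity and watch where the validation error bottoms out. The sketch below (my own illustrative code, using polynomial degree as a stand-in for complexity) shows low degrees under-fitting (high bias) and high degrees over-fitting (high variance):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

# Sweep model complexity and track training vs. validation error:
# the degree where validation error is lowest is the best trade-off.
for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:2d}  train MSE {train_mse:.3f}  val MSE {val_mse:.3f}")
```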


  1. https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff ↩︎
