The bias-variance trade-off is one of the most important aspects of machine learning projects. To approximate reality, different algorithms use mathematical and statistical techniques to estimate model parameters as well as possible. During this optimization task, algorithms inevitably run into a dramatic term called error.
Errors can be divided into two kinds: reducible and irreducible. Reducible error is further divided into bias error and variance error. Irreducible error, or uncertainty, is associated with the natural variability in a system and cannot be removed by a better model. Data scientists try to reduce bias and variance errors in order to build an optimized, accurate model that can be taken into the real world. However, there is a trade-off between bias and variance when selecting the best model.
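For squared-error loss this split has a standard closed form: the expected prediction error of a fitted model f̂ at a point x decomposes into squared bias, variance, and the irreducible noise σ²:

$$
\mathbb{E}\big[(y - \hat f(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible error}}
$$

The first two terms are what we can influence through model choice; the last one is there no matter what we do.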
The term bias error describes how much a model's predictions (y-hat) differ, on average, from the actual or expected outcome (y) over the training data. A high-bias model oversimplifies its assumptions about the data, which in turn underfits the training set. I would say this is purely related to the model selection procedure. To check for it, data scientists can re-sample the data, build the model again, and average the predictions to see whether the issue still exists. If this average shows a significant difference from y, we should suspect high bias error.
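As a minimal sketch of that resampling check (the synthetic data, noise level, and choice of a straight-line model here are my own illustrative assumptions, not from the original post), we can fit a deliberately simple model on many resampled training sets and compare its averaged prediction to the truth:

```python
import numpy as np

# Synthetic ground truth: a nonlinear signal plus noise.
rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)        # fixed evaluation points
n_datasets, n_samples = 200, 30

# Fit a deliberately simple model (a straight line) on many
# resampled training sets and average its predictions.
preds = np.empty((n_datasets, x_grid.size))
for i in range(n_datasets):
    x = rng.uniform(0, 1, n_samples)
    y = true_fn(x) + rng.normal(0, 0.2, n_samples)
    coefs = np.polyfit(x, y, deg=1)   # underfitting on purpose
    preds[i] = np.polyval(coefs, x_grid)

avg_pred = preds.mean(axis=0)
bias_sq = np.mean((avg_pred - true_fn(x_grid)) ** 2)
print(f"squared bias of the linear model: {bias_sq:.3f}")
# A large gap between the averaged prediction and y signals high bias:
# no amount of extra data will make a straight line follow a sine wave.
```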
The error due to variance is the amount by which the prediction learned from one training set differs from the expected prediction averaged over all training sets. When a model becomes too complex, it grows sensitive to even small variations in the training data. By overfitting the data rather than finding the best fit, the same model may behave erratically on another, similar dataset. As with bias, you can repeat the entire model-building process multiple times to estimate this spread, as sketched below.
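Here is the same resampling experiment with a much more flexible model (again, the degree-12 polynomial and the synthetic setup are illustrative assumptions of mine); the spread of its predictions across resampled training sets is exactly the variance component:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_fn(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)        # fixed evaluation points
n_datasets, n_samples = 200, 30

# Fit a deliberately flexible model (a high-degree polynomial) on many
# resampled training sets; how much its predictions swing from one
# training set to the next at each point is the variance error.
preds = np.empty((n_datasets, x_grid.size))
for i in range(n_datasets):
    x = rng.uniform(0, 1, n_samples)
    y = true_fn(x) + rng.normal(0, 0.2, n_samples)
    coefs = np.polyfit(x, y, deg=12)  # overfitting on purpose
    preds[i] = np.polyval(coefs, x_grid)

variance = preds.var(axis=0).mean()
print(f"average prediction variance of the degree-12 model: {variance:.3f}")
# Compared with the linear model, this fit hugs each training set's
# noise, so its predictions vary wildly across datasets: high variance.
```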
Find Part 2 of this post here.