Shubhanshu Shukla

Too Simple or Too Complex? The Bias–Variance Tradeoff in Machine Learning

The central question in supervised machine learning is not whether a model can fit the data it has seen — almost any model can do that. The real question is whether what it has learned will hold when the data changes. That distinction is what the bias-variance tradeoff makes precise, and understanding it separates a practitioner who can build a model from one who can diagnose why it behaves the way it does.

Bias is the extent to which a model's average prediction deviates from the true value it is intended to estimate. A high-bias model has learned a representation too simple for the problem — a straight line fitted through data that actually curves. It will be consistently wrong, not randomly wrong. Variance is the other side of the same problem: not systematic error, but instability. A model that has overfit a specific dataset will shift its predictions significantly if even a handful of training examples change, because it has encoded the particular quirks of that sample rather than the underlying pattern.
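Both failure modes can be made concrete with a small simulation. The sketch below (a toy setup with an assumed sine-shaped ground truth and arbitrary sample sizes, not anything from the article) refits a straight line and a degree-9 polynomial to many resampled noisy datasets: the line's *average* prediction misses the curve (bias), while the polynomial's predictions swing between resamples (variance).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)   # assumed curved ground truth

grid = np.linspace(0.05, 0.95, 50)  # evaluation points, away from the edges

def resampled_predictions(degree, n_fits=200, n_train=30):
    """Fit the same model class to many fresh noisy training sets."""
    preds = []
    for _ in range(n_fits):
        xs = rng.uniform(0, 1, n_train)
        ys = true_fn(xs) + rng.normal(0, 0.1, n_train)
        preds.append(np.polyval(np.polyfit(xs, ys, degree), grid))
    return np.array(preds)

line = resampled_predictions(degree=1)   # high bias
poly = resampled_predictions(degree=9)   # high variance

# Bias: how far the average prediction sits from the truth.
line_bias = np.mean((line.mean(axis=0) - true_fn(grid)) ** 2)
poly_bias = np.mean((poly.mean(axis=0) - true_fn(grid)) ** 2)

# Variance: how much predictions move between resampled datasets.
line_var = np.mean(line.var(axis=0))
poly_var = np.mean(poly.var(axis=0))

print(f"line:     bias^2 ~ {line_bias:.3f}, variance ~ {line_var:.4f}")
print(f"degree 9: bias^2 ~ {poly_bias:.3f}, variance ~ {poly_var:.4f}")
```

The straight line is consistently wrong in the same way across all 200 resamples; the degree-9 polynomial is close to right on average but unstable from one sample to the next.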

The tradeoff arises because reducing one tends to raise the other. A more expressive model — one with more parameters, more layers, or greater flexibility in how it draws decision boundaries — can reduce bias by capturing finer relationships in the data. But the same expressiveness that allows it to fit the training signal also allows it to fit the training noise. As model complexity rises, variance climbs with it. The total expected prediction error can be decomposed into three quantities: squared bias, variance, and irreducible noise — the randomness baked into the data itself that no model can eliminate.
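That decomposition can be verified numerically. A Monte Carlo sketch (assumed sine target, a cubic fit, and an arbitrary noise level σ = 0.3, all invented for illustration): train on many fresh datasets, record the prediction at one query point, and compare the average squared error against squared bias + variance + σ².

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3                      # noise std; sigma**2 is the irreducible error

def true_fn(x):
    return np.sin(2 * np.pi * x)

x0 = 0.3                         # single query point for the decomposition
preds, sq_errors = [], []
for _ in range(5000):
    xs = rng.uniform(0, 1, 40)
    ys = true_fn(xs) + rng.normal(0, sigma, 40)
    pred = np.polyval(np.polyfit(xs, ys, 3), x0)
    preds.append(pred)
    # a fresh noisy observation at x0 to measure prediction error against
    y0 = true_fn(x0) + rng.normal(0, sigma)
    sq_errors.append((pred - y0) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - true_fn(x0)) ** 2
variance = preds.var()
total = np.mean(sq_errors)

print(f"bias^2 + variance + noise = {bias_sq + variance + sigma**2:.4f}")
print(f"measured expected error   = {total:.4f}")
```

The two printed numbers should agree up to Monte Carlo error: no term is missing, and no model choice can touch the σ² contribution.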

Consider the range this creates. A linear regression model fitted to a nonlinear dataset exhibits high bias: the model is too constrained to reflect the true relationship between inputs and outputs. Push toward maximum complexity — a decision tree with no depth limit, or a polynomial of very high degree — and the curve fits every data point, including the random noise surrounding each one. Training error collapses toward zero; generalisation error climbs. Neither extreme is useful. The goal is the complexity level at which the sum of bias and variance is minimised, which varies with dataset size, noise level, and data distribution.
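The polynomial end of that range is easy to demonstrate. In this sketch (same assumed sine target and invented noise level as above), training error falls monotonically as the degree rises, while held-out error is lowest at a moderate degree and deteriorates at degree 15:

```python
import numpy as np

rng = np.random.default_rng(2)

def true_fn(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 30)
y_train = true_fn(x_train) + rng.normal(0, 0.2, 30)
x_test = rng.uniform(0, 1, 200)
y_test = true_fn(x_test) + rng.normal(0, 0.2, 200)

train_err, test_err = {}, {}
for deg in (1, 3, 15):
    c = np.polyfit(x_train, y_train, deg)
    train_err[deg] = np.mean((np.polyval(c, x_train) - y_train) ** 2)
    test_err[deg] = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_err[deg]:.4f}, "
          f"test MSE {test_err[deg]:.4f}")
```

Degree 1 is too constrained (high bias on both sets), degree 15 drives training error toward zero while test error climbs (high variance), and degree 3 sits near the minimum of the sum.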

Regularisation addresses this directly. Ridge regression adds a penalty proportional to the squared magnitude of the model's coefficients, discouraging large weights and thereby reducing variance at the cost of a modest bias increase. Ensemble methods exploit the tradeoff differently: bagging trains multiple models on bootstrapped subsets and averages their predictions, which reduces variance without substantially affecting bias. Boosting moves in the opposite direction — sequentially training models to correct one another's errors, progressively reducing bias at the cost of increased variance if run too long.
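The ridge mechanism can be shown in a few lines. A minimal sketch, assuming a synthetic dataset with two nearly collinear features and the closed-form ridge solution (XᵀX + λI)⁻¹Xᵀy; the data and λ value are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Ill-conditioned design: the second feature is almost a copy of the first,
# which makes unregularised least squares highly unstable.
n = 60
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.5, n)     # true weights are (1, 1)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge(X, y, 0.0)     # lam = 0 recovers ordinary least squares
w_ridge = ridge(X, y, 1.0)   # the penalty pulls the weights back in

print("OLS weights:  ", w_ols)
print("ridge weights:", w_ridge)
```

With collinear features, OLS tends to produce large offsetting coefficients whose individual values jump around between samples; the penalty shrinks the weight vector, trading that variance for a small, controlled bias.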

The bias-variance tradeoff is not a problem to be solved. It is a constraint to be managed. Every modelling decision — model family, regularisation strength, training set size, feature representation — implicitly shifts the balance. Recognising how and in which direction is what allows those choices to be made deliberately rather than by trial and error.
