Bias and Variance

Evaluating a machine learning model involves various metrics, such as Mean Squared Error (MSE) for regression and Precision, Recall, and the ROC curve (or its AUC) for classification problems. Alongside these evaluation metrics, bias and variance are critical concepts that help in parameter tuning and in selecting well-fitted models.

What is Bias?

Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model. It occurs when the model makes incorrect assumptions about the data.

In statistical terms, bias is defined as the difference between the expected prediction of a model and the actual value. Mathematically, it can be expressed as:

Bias(Ŷ) = E(Ŷ) - Y

Where:

  • ( Y ) is the true value being predicted.
  • ( Ŷ ) is the model's prediction, treated as an estimator computed from a training sample.
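
To see this definition in action, here is a minimal sketch that estimates the bias of a linear model at a single test point by refitting it on many freshly drawn training sets and averaging the predictions to approximate E(Ŷ). The quadratic ground-truth function, sample sizes, and noise level are assumptions chosen purely for illustration.

```python
# Minimal sketch: empirically estimating Bias(Ŷ) = E(Ŷ) - Y for a linear model.
# The quadratic ground truth, sample sizes, and noise level are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_fn = lambda x: x ** 2            # the real (non-linear) relationship
x_test, y_test = 1.5, true_fn(1.5)    # fixed point where bias is measured

preds = []
for _ in range(200):
    # Draw a fresh training sample each round to approximate E(Ŷ)
    X = rng.uniform(-2, 2, size=(50, 1))
    y = true_fn(X).ravel() + rng.normal(0, 0.3, size=50)
    model = LinearRegression().fit(X, y)
    preds.append(model.predict([[x_test]])[0])

bias = np.mean(preds) - y_test        # Bias(Ŷ) = E(Ŷ) - Y
print(f"Estimated bias at x = {x_test}: {bias:.3f}")
```

Because a straight line cannot represent the quadratic trend, the averaged prediction sits away from the true value, which is exactly what a high-bias model looks like.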

Characteristics of Bias

  • Low Bias: The model makes few assumptions about the data and closely matches the training dataset.
  • High Bias: The model makes strong assumptions, which leads to underfitting: it fails to capture the underlying trend in the data. For example, a linear regression model exhibits high bias when the relationship in the data is non-linear.

Ways to Reduce High Bias

  1. Use a More Complex Model: Increase model complexity by adding layers in neural networks or using polynomial regression for non-linear datasets (see the sketch after this list).
  2. Increase the Number of Features: Adding relevant features can help the model capture underlying patterns better.
  3. Reduce Regularization: If the model has high bias, reducing or removing regularization can improve performance.
  4. Increase the Size of the Training Data: More data can provide the model with additional examples to learn from.
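
As a rough illustration of the first point, the sketch below fits a plain linear model and a polynomial model to the same non-linear data. The sine-shaped target, the polynomial degree, and the other parameters are assumptions for the demo, not a recipe from this post.

```python
# Minimal sketch: reducing high bias on non-linear data by adding model
# complexity (polynomial regression). Degree and dataset are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)   # non-linear target

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

print("Linear MSE:    ", mean_squared_error(y, linear.predict(X)))
print("Polynomial MSE:", mean_squared_error(y, poly.predict(X)))
# The polynomial model's much lower training error reflects its lower bias.
```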

What is Variance?

Variance measures how much a model's predictions fluctuate when trained on different subsets of the training data. High variance indicates that the model is sensitive to small changes in the training data, which can lead to overfitting.

The variance can be expressed mathematically as:

Variance = E[(Ŷ - E[Ŷ])²]

Where ( E[Ŷ] ) is the expected value of the model's predictions.
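
As with bias, this expectation can be approximated by refitting the model on many resampled training sets and measuring how much the prediction at one point moves around. The sketch below does this for a deliberately flexible degree-10 polynomial; the model choice, the sine target, and the sample sizes are assumptions for the demo.

```python
# Minimal sketch: estimating Variance = E[(Ŷ - E[Ŷ])²] at a single test point
# by refitting a flexible (degree-10 polynomial) model on fresh training sets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x_test = np.array([[2.0]])

preds = []
for _ in range(200):
    X = rng.uniform(-3, 3, size=(30, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.2, size=30)
    model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(X, y)
    preds.append(model.predict(x_test)[0])

preds = np.array(preds)
variance = np.mean((preds - preds.mean()) ** 2)   # E[(Ŷ - E[Ŷ])²]
print(f"Estimated prediction variance at x = 2.0: {variance:.4f}")
```

Because the degree-10 polynomial chases the noise in each small training sample, its prediction at the test point swings widely from one resample to the next.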

Characteristics of Variance

  • Low Variance: The model produces consistent estimates across different training sets but may underfit the data.
  • High Variance: The model is overly complex and fits the training data too closely, resulting in poor performance on unseen data.

Ways to Reduce Variance

  1. Cross-Validation: This technique helps in identifying overfitting and tuning hyperparameters.
  2. Feature Selection: Choosing only relevant features can decrease model complexity and variance.
  3. Regularization: Techniques like L1 and L2 regularization can help control variance (see the sketch after this list).
  4. Ensemble Methods: Combining multiple models can improve generalization performance.
  5. Simplifying the Model: Reducing the number of parameters or layers in a neural network can help lower variance.
  6. Early Stopping: This technique stops training when performance on a validation set starts to degrade, preventing overfitting.
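
For the regularization point above, the sketch below reuses the resampling idea to compare the prediction variance of an unregularized polynomial model with an L2-regularized (Ridge) version. The alpha value, the degree, and the dataset are assumptions for the demo.

```python
# Minimal sketch: L2 regularization (Ridge) shrinking prediction variance
# relative to an unregularized polynomial model. All parameters are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(7)
x_test = np.array([[2.0]])

def prediction_variance(regressor, n_trials=200):
    """Refit on a fresh training sample each trial; return Var(Ŷ) at x_test."""
    preds = []
    for _ in range(n_trials):
        X = rng.uniform(-3, 3, size=(30, 1))
        y = np.sin(X).ravel() + rng.normal(0, 0.2, size=30)
        model = make_pipeline(PolynomialFeatures(degree=10),
                              StandardScaler(), regressor).fit(X, y)
        preds.append(model.predict(x_test)[0])
    return np.var(preds)

print("Unregularized:     ", prediction_variance(LinearRegression()))
print("Ridge (alpha=1.0): ", prediction_variance(Ridge(alpha=1.0)))
```

On this toy setup the regularized model's predictions fluctuate noticeably less across resamples, at the cost of a little extra bias.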

Examples of Bias and Variance in Machine Learning Algorithms

| Algorithm | Bias | Variance |
| --- | --- | --- |
| Linear Regression | High | Low |
| Decision Tree | Low | High |
| Random Forest | Low | High (but lower than a single decision tree) |
| Bagging | Low | High (but lower than a single decision tree) |
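
The "lower than a single decision tree" caveat is easy to check empirically. The sketch below compares a single decision tree with a random forest on a synthetic regression task using cross-validation; the dataset and hyperparameters are assumptions for the demo, and fold-to-fold score spread is only a rough proxy for variance.

```python
# Minimal sketch: averaging many trees (random forest) typically generalizes
# better and more stably than a single deep decision tree.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

models = {
    "single tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R² = {scores.mean():.3f}, fold std = {scores.std():.3f}")
```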

Understanding bias and variance is essential for building robust machine learning models. By managing these two components, you can improve your model's performance and ensure it generalizes well to new, unseen data.

For more content, follow me at —  https://linktr.ee/shlokkumar2303
