likhitha manikonda

Machine Learning Basics: Bias, Variance, and Regularization with Intuition and Formulas

Machine Learning (ML) is about teaching computers to learn patterns from data. But models often fail to predict well on new data, and the two main sources of that error are bias and variance. To balance them, we use regularization. Let’s break this down step by step.


🧩 Bias (Too Simple)

  • Bias is the error caused when a model makes overly simple assumptions.
  • Example: Predicting house prices using only the number of rooms.
  • High bias → underfitting: The model performs poorly on both training and test data because it hasn’t learned enough.

👉 Analogy: Bias is like a student who always answers “42” no matter the question. Simple, but wrong most of the time.
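
To make this concrete, here is a minimal sketch (not from the original post) of what underfitting looks like: a straight line fitted to data that is actually quadratic. The dataset and numbers are purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Curved (quadratic) data that a straight line cannot capture
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.squeeze() ** 2 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A plain linear model is too simple for this data -> high bias
line = LinearRegression().fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, line.predict(X_train)))  # high
print("Test MSE: ", mean_squared_error(y_test, line.predict(X_test)))    # also high

Both errors stay high because the model simply cannot represent the curve, which is the signature of high bias.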


🎭 Variance (Too Sensitive)

  • Variance is the error caused when a model is too sensitive to training data.
  • Example: A student memorizes last year’s exam questions word‑for‑word. When the teacher changes the questions, the student fails.
  • High variance → overfitting: The model does great on training data but fails on new data.

👉 Analogy: Variance is like a student who copies every detail of the textbook but struggles when asked to explain in their own words.
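
And the mirror image, again only a sketch with made-up data: a very flexible polynomial trained on just a handful of noisy points memorizes them and then stumbles on fresh data.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# A tiny, noisy training set plus a very flexible model -> high variance
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 1, size=(15, 1))
y_train = np.sin(2 * np.pi * X_train.squeeze()) + rng.normal(0, 0.2, size=15)
X_test = rng.uniform(0, 1, size=(100, 1))
y_test = np.sin(2 * np.pi * X_test.squeeze()) + rng.normal(0, 0.2, size=100)

# A degree-12 polynomial (13 coefficients) nearly memorizes the 15 training points
flexible = make_pipeline(PolynomialFeatures(degree=12), LinearRegression()).fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, flexible.predict(X_train)))  # close to zero
print("Test MSE: ", mean_squared_error(y_test, flexible.predict(X_test)))    # typically much larger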


⚖️ Bias–Variance Tradeoff Formula

The total error can be broken down as:

Total Error = Bias² + Variance + σ²

Where:

  • Bias² = error from oversimplification.
  • Variance = error from sensitivity to training data.
  • σ² = irreducible error (noise in data).

👉 Analogy: Imagine aiming arrows at a target.

  • Bias² = how far the arrows are from the bullseye (systematic error).
  • Variance = how spread out the arrows are (consistency).
  • σ² = wind blowing unpredictably (noise you can’t control).
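
If you want to see the decomposition with actual numbers, here is a rough simulation sketch. The “true” sine function, the noise level, and the two models below are all assumptions chosen for illustration. It estimates Bias² and Variance by retraining the same kind of model on many freshly sampled training sets and watching how its predictions move.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
noise_sigma = 0.3
x_grid = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)

def true_f(x):
    return np.sin(x)

def estimate_bias_variance(make_model, n_datasets=200, n_points=30):
    # Train the same kind of model on many freshly sampled training sets,
    # then look at its predictions on a fixed grid of test inputs.
    preds = np.empty((n_datasets, len(x_grid)))
    for i in range(n_datasets):
        X = rng.uniform(0, 2 * np.pi, size=(n_points, 1))
        y = true_f(X.squeeze()) + rng.normal(0, noise_sigma, size=n_points)
        preds[i] = make_model().fit(X, y).predict(x_grid)
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_grid.squeeze())) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

print("Straight line (bias², variance):", estimate_bias_variance(LinearRegression))
print("Degree-5 poly (bias², variance):", estimate_bias_variance(
    lambda: make_pipeline(PolynomialFeatures(5), LinearRegression())))
print("Irreducible error σ²:", noise_sigma ** 2)

The straight line typically shows a large Bias² and a tiny Variance, the polynomial the opposite, and neither can do anything about σ².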

📊 Training Error vs Test Error

We diagnose bias and variance by comparing errors:

Situation        Training Error   Test Error   Diagnosis
High bias        High             High         Underfitting
High variance    Low              High         Overfitting
Balanced         Low              Low          Just right

👉 Analogy: Training error is how well you do on practice exams. Test error is how well you do on the real exam. If you ace practice but fail the real one, you’re overfitting.
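
As a toy illustration of the table, here is a tiny heuristic diagnoser. The baseline value and the 1.5× gap threshold are made-up numbers, not a standard rule; in practice you compare against whatever error level is acceptable for your problem.

def diagnose(train_error, test_error, baseline):
    # baseline = the error level you would consider acceptable for this problem
    if train_error > baseline:
        return "High bias (underfitting): even the training data isn't fit well"
    if test_error > 1.5 * train_error:
        return "High variance (overfitting): large gap between training and test error"
    return "Looks balanced"

print(diagnose(train_error=0.30, test_error=0.32, baseline=0.10))  # underfitting
print(diagnose(train_error=0.02, test_error=0.25, baseline=0.10))  # overfitting
print(diagnose(train_error=0.08, test_error=0.10, baseline=0.10))  # balanced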


📈 Learning Curves

Learning curves show how errors change as you add more training data:

  • Training error (J_train): Mistakes on the data the model learned from.
  • Cross-validation error (J_cv): Mistakes on unseen data.

Key patterns:

  • As training set size increases:
    • Training error goes up (harder to fit everything perfectly).
    • Cross-validation error goes down (model generalizes better).

Diagnosing bias vs variance:

  • High bias (underfitting): Both J_train and J_cv flatten out at high error. Adding more data doesn’t help.
  • High variance (overfitting): J_train is very low, J_cv much higher. Adding more data helps J_cv come down closer to J_train.

👉 Analogy:

  • High bias = studying only one chapter, so you always miss key topics.
  • High variance = memorizing practice questions but failing when the exam changes.
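
scikit-learn can generate these curves for you with learning_curve. Below is a minimal sketch on a synthetic dataset; the Ridge model, dataset sizes, and scoring choice are illustrative assumptions, not recommendations.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

# Synthetic regression problem, purely for illustration
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# learning_curve refits the model on growing subsets of the training data
sizes, train_scores, cv_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
)

# Scores are negative MSE, so flip the sign to get J_train and J_cv
for n, j_train, j_cv in zip(sizes, -train_scores.mean(axis=1), -cv_scores.mean(axis=1)):
    print(f"training examples = {n:3d}   J_train = {j_train:7.1f}   J_cv = {j_cv:7.1f}")

Plotting J_train and J_cv against the training set size gives exactly the curves described above.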

🛠️ Fixing Bias vs Variance

Different strategies help depending on the problem:

  • High variance fixes (overfitting):

    • Get more training data.
    • Use fewer features (simplify the model).
    • Increase regularization (higher λ).
  • High bias fixes (underfitting):

    • Add more features (give the model more information).
    • Add polynomial features (make the model more flexible).
    • Decrease regularization (lower λ).

👉 Rule of thumb:

  • High variance → simplify or add more data.
  • High bias → make the model more powerful.
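
Here is a small sketch of both fixes in action; the sine data, polynomial degrees, and sample sizes are arbitrary choices for illustration.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def sample(n):
    # Noisy sine data, used only to illustrate the two failure modes
    X = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.2, size=n)
    return X, y

X_test, y_test = sample(1000)

def test_mse(model, n_train):
    X_tr, y_tr = sample(n_train)
    return mean_squared_error(y_test, model.fit(X_tr, y_tr).predict(X_test))

# High bias fix: add polynomial features so the linear model can bend
print("Straight line, 50 points:  ", test_mse(LinearRegression(), 50))
print("Degree-5 poly, 50 points:  ", test_mse(make_pipeline(PolynomialFeatures(5), LinearRegression()), 50))

# High variance fix: keep the flexible model but give it much more data
print("Degree-12 poly, 15 points: ", test_mse(make_pipeline(PolynomialFeatures(12), LinearRegression()), 15))
print("Degree-12 poly, 500 points:", test_mse(make_pipeline(PolynomialFeatures(12), LinearRegression()), 500))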

🛠️ Regularization Formulas

1. Linear Regression Loss (no regularization)

J(w, b) = (1/2m) Σᵢ (ŷᵢ − yᵢ)²

Here ŷᵢ is the model’s prediction for example i, yᵢ is the true value, and m is the number of training examples.
👉 Analogy: Measuring how far your guesses are from the correct answers, averaged across all questions.


2. Ridge Regression (L2 Regularization)

J(w, b) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² + λ Σⱼ wⱼ²

The L2 penalty shrinks all weights toward zero but rarely makes them exactly zero.
👉 Analogy: Teacher says “don’t use too many fancy words.” Keeps writing simple and consistent.


3. Lasso Regression (L1 Regularization)

J(w, b) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² + λ Σⱼ |wⱼ|

The L1 penalty can drive some weights exactly to zero, which removes those features entirely.
👉 Analogy: Cleaning your room — throw away things you don’t need. Keeps only the most important features.


4. Elastic Net (Combination of L1 + L2)

J(w, b) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² + λ₁ Σⱼ |wⱼ| + λ₂ Σⱼ wⱼ²
👉 Analogy: Dieting with two rules: eat fewer sweets (L1) and smaller portions overall (L2).
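
In scikit-learn the regularization strength is called alpha rather than λ, and Elastic Net exposes the L1/L2 balance as l1_ratio. A minimal sketch with made-up data:

import numpy as np
from sklearn.linear_model import ElasticNet

# 10 features, but only the first two actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)

# alpha = overall penalty strength, l1_ratio = mix (1.0 = pure Lasso, 0.0 = pure Ridge)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("Coefficients:", np.round(enet.coef_, 2))  # irrelevant features are shrunk toward or exactly to zero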


🌦️ Everyday Analogy for λ (Lambda)

  • Small λ → model is free to be complex (risk of overfitting).
  • Large λ → model is forced to be simple (risk of underfitting).

👉 Analogy: λ is like the volume knob on a speaker. Too low → noisy and chaotic. Too high → too quiet. Just right → clear sound.
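
To see the knob in action, the sketch below sweeps the regularization strength on a deliberately flexible model and scores each setting with cross-validation. The data, degree, and alpha grid are arbitrary illustrations.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Small noisy dataset and an intentionally flexible degree-12 model
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, size=60)

for alpha in [1e-6, 1e-3, 1e-1, 1.0, 10.0, 1000.0]:
    model = make_pipeline(PolynomialFeatures(12, include_bias=False), StandardScaler(), Ridge(alpha=alpha))
    cv_mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"alpha = {alpha:>8}   cross-validation MSE = {cv_mse:.3f}")

# Tiny alpha tends to overfit, huge alpha underfits; the best value usually sits in between.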


🖥️ Python Demo

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate sample data
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2 * X.squeeze() + np.random.randn(100) * 2  # noisy linear relation

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)  # fixed split for reproducible results

# Linear Regression (no regularization)
lr = LinearRegression().fit(X_train, y_train)

# Ridge Regression (L2 regularization)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

# Lasso Regression (L1 regularization)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print("Linear Regression Test Error:", mean_squared_error(y_test, lr.predict(X_test)))
print("Ridge Regression Test Error:", mean_squared_error(y_test, ridge.predict(X_test)))
print("Lasso Regression Test Error:", mean_squared_error(y_test, lasso.predict(X_test)))

📉 Visualizing the Tradeoff (Imagine This)

Picture a U‑shaped curve:

  • On the left: High bias → model too simple, high error.
  • On the right: High variance → model too complex, high error.
  • In the middle: Sweet spot → balanced bias and variance, lowest error.
  • Regularization (λ) helps push the model toward this middle ground.
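
You don’t have to imagine it: the sketch below plots training and test error against polynomial degree on synthetic data (all choices here are illustrative) and typically traces out exactly this U shape for the test error.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy sine data; polynomial degree plays the role of "model complexity"
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(80, 1))
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

degrees = range(1, 13)
train_err, test_err = [], []
for d in degrees:
    model = make_pipeline(PolynomialFeatures(d), LinearRegression()).fit(X_tr, y_tr)
    train_err.append(mean_squared_error(y_tr, model.predict(X_tr)))
    test_err.append(mean_squared_error(y_te, model.predict(X_te)))

plt.plot(degrees, train_err, marker="o", label="Training error")
plt.plot(degrees, test_err, marker="o", label="Test error")
plt.xlabel("Model complexity (polynomial degree)")
plt.ylabel("MSE")
plt.legend()
plt.show()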

🚀 Key Takeaways

  • Bias = too simple → underfitting.
  • Variance = too complex → overfitting.
  • Training vs test errors and learning curves are your diagnostic tools.
  • Regularization (λ) controls complexity:
    • λ ↑ → simpler model, higher bias, lower variance.
    • λ ↓ → more complex model, lower bias, higher variance.
  • L1 (Lasso) → feature selection.
  • L2 (Ridge) → weight shrinkage.
  • Elastic Net → mix of both.
  • Fixing bias vs variance:
    • High variance → more data, fewer features, stronger regularization.
    • High bias → more features, polynomial terms, weaker regularization.
  • The goal: a model that learns enough but doesn’t memorize noise.
