Sachin Kr. Rajput

MAE vs MSE vs RMSE: Three Bosses With Very Different Philosophies on Punishing Late Employees

The One-Line Summary: MAE treats all errors equally (10 minutes late = 10 penalty). MSE punishes big errors catastrophically (10 minutes = 100, 60 minutes = 3,600). RMSE is MSE converted back to interpretable units. Choose based on whether you want to forgive small errors or destroy large ones.


The Three Bosses of Predictive Analytics Inc.

Three managers at Predictive Analytics Inc. need to evaluate their delivery drivers based on arrival time accuracy.

All three drivers had the same performance last week:

Delivery 1: Predicted 2:00 PM, Arrived 2:10 PM → 10 min late
Delivery 2: Predicted 3:00 PM, Arrived 3:05 PM →  5 min late  
Delivery 3: Predicted 4:00 PM, Arrived 3:55 PM →  5 min early (-5)
Delivery 4: Predicted 5:00 PM, Arrived 6:00 PM → 60 min late (traffic!)

Same data. Three VERY different scores.


Boss A: "Fair Frank" (MAE Philosophy)

"Late is late. Early is early. Every minute counts the same. I don't care if you're 5 minutes late or 60 — I just add up all the minutes."

Penalty calculation:
|10| + |5| + |-5| + |60| = 10 + 5 + 5 + 60 = 80

Average penalty: 80 / 4 = 20 minutes

"Your average error is 20 minutes."

Frank's verdict: "You're off by 20 minutes on average. Improve."


Boss B: "Squared Sarah" (MSE Philosophy)

"Small mistakes? Whatever. But BIG mistakes are UNACCEPTABLE. I square every error — small errors stay small, big errors become MASSIVE."

Penalty calculation:
10² + 5² + (-5)² + 60² = 100 + 25 + 25 + 3600 = 3750

Average penalty: 3750 / 4 = 937.5 squared-minutes

"Your average squared error is 937.5."

Sarah's verdict: "That one 60-minute disaster dominates everything. 937.5! Unacceptable!"


Boss C: "Root Rachel" (RMSE Philosophy)

"I agree with Sarah's philosophy — big mistakes should hurt more. But 'squared minutes' is meaningless. Let me convert back to regular minutes."

Penalty calculation:
Same as Sarah: 3750 / 4 = 937.5

Then take square root: √937.5 = 30.6 minutes

"Your root mean squared error is 30.6 minutes."

Rachel's verdict: "Accounting for how bad that 60-minute disaster was, your 'effective average error' is 30.6 minutes."


The Scoreboard

Boss     Metric   Score        Philosophy
Frank    MAE      20 min       All errors equal
Sarah    MSE      937.5 min²   Big errors punished severely
Rachel   RMSE     30.6 min     Big errors punished, interpretable units

Same performance. Scores of 20, 937.5, and 30.6!

Notice: RMSE (30.6) > MAE (20). This is ALWAYS true when errors vary in size. The more outliers, the bigger the gap.
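
A quick way to feel that gap grow: keep the three small errors fixed and scale the one disaster up and down (a minimal sketch; the extra "what if" values are made up):

import numpy as np

# Keep the three small errors fixed, let the single "disaster" grow,
# and watch the gap between RMSE and MAE widen.
for big in [10, 30, 60, 120]:
    errors = np.array([10, 5, -5, big])
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    print(f"worst error {big:>3} min -> MAE {mae:5.1f}, RMSE {rmse:5.1f}, gap {rmse - mae:5.1f}")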


The Mathematics

MAE: Mean Absolute Error

MAE = (1/n) × Σ|actual - predicted|
    = Average of absolute errors
import numpy as np

errors = [10, 5, -5, 60]
mae = np.mean(np.abs(errors))
print(f"MAE: {mae}")  # 20.0

Properties:

  • Linear penalty (10 min late = 10 penalty)
  • Robust to outliers
  • Same units as target variable (minutes, dollars, etc.)
  • Intuitive: "On average, we're off by X"

MSE: Mean Squared Error

MSE = (1/n) × Σ(actual - predicted)²
    = Average of squared errors
errors = [10, 5, -5, 60]
mse = np.mean(np.array(errors) ** 2)
print(f"MSE: {mse}")  # 937.5

Properties:

  • Quadratic penalty (10 min late = 100, 60 min late = 3,600!)
  • Heavily penalizes outliers
  • Units are squared (minutes², dollars²) — not intuitive
  • Mathematically convenient (differentiable, used in optimization)

RMSE: Root Mean Squared Error

RMSE = √MSE = √[(1/n) × Σ(actual - predicted)²]
     = Square root of average squared errors
errors = [10, 5, -5, 60]
rmse = np.sqrt(np.mean(np.array(errors) ** 2))
print(f"RMSE: {rmse}")  # 30.62

Properties:

  • Same outlier sensitivity as MSE
  • Back to original units (minutes, dollars)
  • Interpretable: "Effective average error accounting for big mistakes"
  • RMSE ≥ MAE always (equal only when every error has the same magnitude; see the quick check below)
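
Why can RMSE never be smaller than MAE? Because RMSE² = MAE² + Var(|error|): the gap between the two is literally the spread of your absolute errors. A quick check with the driver's errors from the story:

import numpy as np

errors = np.array([10, 5, -5, 60])
abs_err = np.abs(errors)

mae = abs_err.mean()
rmse = np.sqrt(np.mean(abs_err ** 2))

# RMSE² = MAE² + Var(|error|), so RMSE ≥ MAE, with equality only when
# every |error| is identical (zero variance).
print(rmse ** 2)                  # 937.5
print(mae ** 2 + abs_err.var())   # 937.5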

Visual: How They Treat Errors Differently

ERROR SIZE:        1    5    10   20   30   60
─────────────────────────────────────────────────

MAE PENALTY:       1    5    10   20   30   60
                   │    │    │    │    │    │
                   ▼    ▼    ▼    ▼    ▼    ▼
                   ●────●────●────●────●────●  (linear growth)


MSE PENALTY:       1   25   100  400  900  3600
                   │    │    │    │    │    │
                   ▼    ▼    ▼    ▼    ▼    ▼
                   ●    ●    ●    ●    ●    ●  (quadratic growth!)
                   └────┴────┴────┴────┴────┘
                                          ↑
                                    EXPLOSION!
import numpy as np
import matplotlib.pyplot as plt

errors = np.linspace(0, 60, 100)
mae_penalty = np.abs(errors)
mse_penalty = errors ** 2

plt.figure(figsize=(10, 6))
plt.plot(errors, mae_penalty, 'b-', linewidth=2, label='MAE (linear)')
plt.plot(errors, mse_penalty, 'r-', linewidth=2, label='MSE (quadratic)')
plt.xlabel('Error Size', fontsize=12)
plt.ylabel('Penalty', fontsize=12)
plt.title('How MAE and MSE Penalize Errors', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)

# Annotate the difference
plt.annotate('60 min error:\nMAE = 60\nMSE = 3,600', 
             xy=(60, 3600), xytext=(40, 2500),
             fontsize=10, arrowprops=dict(arrowstyle='->'))

plt.tight_layout()
plt.savefig('mae_vs_mse.png', dpi=150)
plt.show()

The Outlier Test

Let's see how each metric responds to outliers:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Normal errors (no outliers)
y_true = [100, 100, 100, 100, 100]
y_pred = [102, 98, 103, 97, 100]  # Small errors: 2, 2, 3, 3, 0

mae_normal = mean_absolute_error(y_true, y_pred)
mse_normal = mean_squared_error(y_true, y_pred)
rmse_normal = np.sqrt(mse_normal)

print("WITHOUT outliers:")
print(f"  MAE:  {mae_normal:.2f}")
print(f"  MSE:  {mse_normal:.2f}")
print(f"  RMSE: {rmse_normal:.2f}")

# Now add ONE outlier
y_true_outlier = [100, 100, 100, 100, 100]
y_pred_outlier = [102, 98, 103, 97, 150]  # Last one is 50 off!

mae_outlier = mean_absolute_error(y_true_outlier, y_pred_outlier)
mse_outlier = mean_squared_error(y_true_outlier, y_pred_outlier)
rmse_outlier = np.sqrt(mse_outlier)

print("\nWITH one outlier (50 error):")
print(f"  MAE:  {mae_outlier:.2f}")
print(f"  MSE:  {mse_outlier:.2f}")
print(f"  RMSE: {rmse_outlier:.2f}")

# Calculate the increase
print(f"\nImpact of ONE outlier:")
print(f"  MAE increased:  {(mae_outlier/mae_normal - 1)*100:.0f}%")
print(f"  RMSE increased: {(rmse_outlier/rmse_normal - 1)*100:.0f}%")

Output:

WITHOUT outliers:
  MAE:  2.00
  MSE:  5.20
  RMSE: 2.28

WITH one outlier (50 error):
  MAE:  12.00
  MSE:  505.20
  RMSE: 22.48

Impact of ONE outlier:
  MAE increased:  500%
  RMSE increased: 886%

One outlier made:

  • MAE go from 2 → 12 (6x increase)
  • RMSE go from 2.28 → 22.48 (10x increase!)

RMSE is much more sensitive to outliers than MAE.


When to Use Each Metric

Use MAE When:

1. Outliers are noise, not signal

# House prices with some data entry errors
prices_true = [300000, 350000, 275000, 999999999, 400000]  # Typo!
prices_pred = [310000, 340000, 280000, 390000, 410000]

# MAE gets pulled up by the typo, but far less violently than MSE would be
mae = mean_absolute_error(prices_true, prices_pred)
print(f"MAE: ${mae:,.0f}")  # Still inflated by the typo, but MSE would be astronomically worse

2. All errors are equally bad

Scenario: Delivery time prediction
- 10 minutes late = unhappy customer
- 60 minutes late = 6x unhappy customer (not 36x!)

Use MAE — linear penalty makes sense.

3. You want robustness

# Median-like behavior — resistant to extreme values
# MAE optimization finds the MEDIAN, not the mean
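
To see that median claim in action, here's a minimal sketch (made-up numbers, brute-force grid search instead of a proper optimizer): for a single constant prediction, the value that minimizes MAE is the median of the targets, while the value that minimizes MSE is their mean.

import numpy as np

y = np.array([1, 2, 3, 4, 100])          # one extreme value
candidates = np.linspace(0, 100, 10001)  # constant predictions to try

# Average loss of predicting the constant c for every target
mae_per_c = np.abs(y[:, None] - candidates[None, :]).mean(axis=0)
mse_per_c = ((y[:, None] - candidates[None, :]) ** 2).mean(axis=0)

print("MAE-optimal constant:", candidates[mae_per_c.argmin()])  # ≈ 3  (the median)
print("MSE-optimal constant:", candidates[mse_per_c.argmin()])  # ≈ 22 (the mean)
print("median:", np.median(y), " mean:", y.mean())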

4. Interpretability matters most

"Our model is off by $15,000 on average."
Clear. Simple. Stakeholders understand it.

Use MSE When:

1. Large errors are catastrophically bad

Scenario: Autonomous vehicle distance prediction
- 1 meter off = fine
- 10 meters off = dangerous
- 50 meters off = FATAL

Errors shouldn't scale linearly. MSE's quadratic penalty is appropriate.

2. You're training a model (optimization)

# MSE is differentiable everywhere — gradient descent loves it!
# MAE has a non-differentiable point at 0

model.compile(loss='mse')  # Standard for regression
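
To make the smoothness point concrete, here's a tiny sketch (single target, made-up prediction values) of the per-sample gradients. MSE's gradient shrinks smoothly toward zero as the prediction approaches the target; MAE's jumps between -1 and +1.

import numpy as np

# Per-sample gradients with respect to the prediction y_hat:
#   d/dy_hat (y_hat - y)²  = 2 * (y_hat - y)   -> smooth through zero
#   d/dy_hat |y_hat - y|   = sign(y_hat - y)   -> jumps from -1 to +1 at zero
y = 0.0
for y_hat in [-2.0, -0.5, -0.001, 0.001, 0.5, 2.0]:
    grad_mse = 2 * (y_hat - y)
    grad_mae = np.sign(y_hat - y)
    print(f"y_hat={y_hat:+.3f}  MSE grad={grad_mse:+.3f}  MAE grad={grad_mae:+.1f}")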

3. Outliers ARE important signal

Scenario: Fraud detection (regression on transaction amounts)
- Large errors might indicate fraud!
- You WANT to be sensitive to outliers

4. You need mathematical convenience

# MSE decomposes nicely:
# MSE = Variance(predictions) + Bias² + Irreducible noise
# Useful for theoretical analysis
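
One small, checkable slice of that convenience (a sketch with made-up data; the full textbook decomposition averages over repeated datasets): for any set of residuals, MSE splits exactly into the squared mean error plus the variance of the errors.

import numpy as np

# For residuals e = y_pred - y_true:  mean(e²) = mean(e)² + Var(e)
rng = np.random.default_rng(0)
y_true = rng.normal(100, 10, size=1000)
y_pred = y_true + 3 + rng.normal(0, 5, size=1000)  # biased by +3, plus noise

e = y_pred - y_true
print(np.mean(e ** 2))                 # MSE
print(np.mean(e) ** 2 + np.var(e))     # identical, up to floating-point error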

Use RMSE When:

1. You want MSE's properties but interpretable units

mse = 10000  # What does 10,000 squared-dollars mean?
rmse = 100   # "We're off by about $100" — much clearer!

2. Comparing to standard deviation

# RMSE and standard deviation are in the same units
# You can compare: "RMSE is 0.8 standard deviations"

std_y = np.std(y_true)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE is {rmse/std_y:.2f} standard deviations")

3. Industry standard requires it

Many competitions (Kaggle, etc.) use RMSE as the metric.
Match the metric you'll be evaluated on!

The Decision Flowchart

START: Choosing a regression metric
            │
            ▼
    Are outliers in your data?
            │
     ┌──────┴──────┐
     │             │
    YES           NO
     │             │
     ▼             ▼
 Are outliers    Use any
 meaningful?     (they're
     │           similar)
  ┌──┴──┐          │
  │     │          │
 YES   NO          │
  │     │          │
  ▼     ▼          │
 MSE   MAE         │
/RMSE  │           │
  │    │           │
  └────┴───────────┘
            │
            ▼
    Need interpretable units?
            │
     ┌──────┴──────┐
     │             │
    YES           NO
     │             │
     ▼             ▼
 MAE or        MSE is fine
 RMSE          (for optimization)
     │
     ▼
    Are big errors much worse
    than small errors?
            │
     ┌──────┴──────┐
     │             │
    YES           NO
     │             │
     ▼             ▼
   RMSE          MAE
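
If you'd rather have the flowchart as code, here's a hypothetical helper that encodes one reading of it (the function name and arguments are made up for this post, not any library's API):

def pick_metric(has_outliers: bool,
                outliers_meaningful: bool = False,
                need_interpretable_units: bool = True,
                big_errors_much_worse: bool = False) -> str:
    """Hypothetical helper mirroring the flowchart above."""
    # Outliers that are pure noise push you toward MAE regardless of the rest
    if has_outliers and not outliers_meaningful:
        return "MAE"
    # No need for interpretable units: plain MSE is fine (e.g. as a training loss)
    if not need_interpretable_units:
        return "MSE"
    # Interpretable units wanted: RMSE if big errors should hurt extra, else MAE
    return "RMSE" if big_errors_much_worse else "MAE"

print(pick_metric(has_outliers=True, outliers_meaningful=True,
                  big_errors_much_worse=True))                            # RMSE
print(pick_metric(has_outliers=False, need_interpretable_units=False))    # MSE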

Complete Comparison Example

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate_predictions(y_true, y_pred, name="Model"):
    """Complete regression evaluation."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)

    # Additional context
    y_range = np.max(y_true) - np.min(y_true)
    y_std = np.std(y_true)

    print(f"\n{'='*50}")
    print(f"Evaluation: {name}")
    print(f"{'='*50}")
    print(f"MAE:  {mae:,.2f}")
    print(f"MSE:  {mse:,.2f}")
    print(f"RMSE: {rmse:,.2f}")
    print(f"\nContext:")
    print(f"  Target range: {y_range:,.2f}")
    print(f"  Target std:   {y_std:,.2f}")
    print(f"  RMSE/std:     {rmse/y_std:.2%}")
    print(f"  MAE/range:    {mae/y_range:.2%}")

    # Relationship between metrics
    print(f"\nRelationships:")
    print(f"  RMSE/MAE ratio: {rmse/mae:.2f}")
    if rmse/mae > 1.5:
        print(f"  → High ratio suggests outliers or high variance in errors")
    else:
        print(f"  → Low ratio suggests consistent error sizes")

    return {'MAE': mae, 'MSE': mse, 'RMSE': rmse}

# Example: House price prediction
np.random.seed(42)
n = 100

# True prices
y_true = np.random.normal(400000, 100000, n)
y_true = np.clip(y_true, 100000, 800000)

# Model A: Consistent errors
y_pred_consistent = y_true + np.random.normal(0, 20000, n)

# Model B: Some big misses
y_pred_outliers = y_true + np.random.normal(0, 15000, n)
# Add some outliers
outlier_idx = np.random.choice(n, 5, replace=False)
y_pred_outliers[outlier_idx] += np.random.choice([-1, 1], 5) * 150000

# Evaluate both
results_a = evaluate_predictions(y_true, y_pred_consistent, "Model A (Consistent)")
results_b = evaluate_predictions(y_true, y_pred_outliers, "Model B (Has Outliers)")

# Compare
print("\n" + "="*50)
print("HEAD-TO-HEAD COMPARISON")
print("="*50)
print(f"\n{'Metric':<10} {'Model A':>15} {'Model B':>15} {'Winner':>15}")
print("-"*55)
for metric in ['MAE', 'MSE', 'RMSE']:
    a = results_a[metric]
    b = results_b[metric]
    winner = "A" if a < b else "B"
    print(f"{metric:<10} {a:>15,.0f} {b:>15,.0f} {'Model ' + winner:>15}")

Output:

==================================================
Evaluation: Model A (Consistent)
==================================================
MAE:  16,234.12
MSE:  412,345,678.90
RMSE: 20,306.29

Context:
  Target range: 645,234.12
  Target std:   98,765.43
  RMSE/std:     20.56%
  MAE/range:    2.52%

Relationships:
  RMSE/MAE ratio: 1.25
  → Low ratio suggests consistent error sizes

==================================================
Evaluation: Model B (Has Outliers)
==================================================
MAE:  15,876.54
MSE:  789,012,345.67
RMSE: 28,089.36

Context:
  Target range: 645,234.12
  Target std:   98,765.43
  RMSE/std:     28.44%
  MAE/range:    2.46%

Relationships:
  RMSE/MAE ratio: 1.77
  → High ratio suggests outliers or high variance in errors

==================================================
HEAD-TO-HEAD COMPARISON
==================================================

Metric          Model A         Model B          Winner
-------------------------------------------------------
MAE              16,234          15,877         Model B
MSE         412,345,679     789,012,346         Model A
RMSE             20,306          28,089         Model A

The Plot Twist:

  • Model B wins on MAE (lower average error)
  • Model A wins on MSE/RMSE (no catastrophic errors)

Which is better? Depends on whether those outliers matter!


The RMSE/MAE Ratio Trick

The ratio of RMSE to MAE tells you about error distribution:

def diagnose_errors(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    ratio = rmse / mae

    print(f"RMSE/MAE ratio: {ratio:.2f}")

    if ratio == 1.0:
        print("→ All errors are identical in size")
    elif ratio < 1.2:
        print("→ Errors are very consistent (good!)")
    elif ratio < 1.4:
        print("→ Errors have moderate variance")
    elif ratio < 1.7:
        print("→ Some larger errors present")
    else:
        print("→ Significant outliers or high error variance!")
        print("  Consider: outlier removal, MAE as metric, or robust models")

    # Theoretical limits
    print(f"\nTheoretical bounds:")
    print(f"  Minimum ratio: 1.0 (all errors equal)")
    print(f"  If errors ~ Normal: ratio ≈ 1.25")
    print(f"  Your ratio: {ratio:.2f}")
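
A quick sanity check on that "errors ~ Normal: ratio ≈ 1.25" line (the exact constant is √(π/2) ≈ 1.2533; the σ below is made up):

import numpy as np

# For zero-mean normal errors, E|e| = σ·√(2/π), so RMSE/MAE -> √(π/2) ≈ 1.2533
rng = np.random.default_rng(42)
e = rng.normal(0, 10, size=1_000_000)

rmse = np.sqrt(np.mean(e ** 2))
mae = np.mean(np.abs(e))
print(rmse / mae, np.sqrt(np.pi / 2))  # both ≈ 1.25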

Common Mistakes

Mistake 1: Comparing MAE to RMSE Directly

# ❌ WRONG
"MAE is 20, RMSE is 30. RMSE is 'worse'!"

# ✅ RIGHT
# They measure different things!
# RMSE ≥ MAE by definition
# Compare models using the SAME metric

Mistake 2: Using MSE for Reporting

# ❌ WRONG
"Our model has MSE of 10,000 squared-dollars"
# What does that even mean?!

# ✅ RIGHT
rmse = np.sqrt(10000)
"Our model has RMSE of $100"
# OR
"Our model has MAE of $80"

Mistake 3: Ignoring Scale

# ❌ WRONG
"Model A has MAE 50, Model B has MAE 100. A is 2x better!"

# ✅ RIGHT
# What if A predicts values around 1,000 and B around 1,000,000?
mae_a_pct = 50 / 1000  # 5%
mae_b_pct = 100 / 1000000  # 0.01%
# B is actually much better relatively!

Mistake 4: Choosing Metric AFTER Seeing Results

# ❌ WRONG
"My model has bad RMSE but good MAE. Let's report MAE!"

# ✅ RIGHT
# Choose metric based on the PROBLEM, not the results
# Before training: "Big errors are catastrophic, use RMSE"
# Stick with that decision

Quick Reference

Formulas

Metric   Formula               Units
MAE      (1/n) × Σ|y - ŷ|      Same as y
MSE      (1/n) × Σ(y - ŷ)²     Squared units of y (e.g. minutes²)
RMSE     √MSE                  Same as y

Properties

Property                 MAE         MSE         RMSE
Penalizes outliers       Linear      Quadratic   Quadratic
Interpretable units      ✓           ✗           ✓
Differentiable           ✗ (at 0)    ✓           ✓
Robust to outliers       ✓           ✗           ✗
Common in optimization   ✗           ✓           ✗

When to Use

Scenario                      Best Metric
Outliers are noise            MAE
Outliers are signal           MSE/RMSE
Stakeholder reports           MAE or RMSE
Training neural nets          MSE
All errors equally bad        MAE
Big errors are catastrophic   MSE/RMSE
Need interpretability         MAE or RMSE
Kaggle competition            Whatever they specify!

Key Takeaways

  1. MAE = average of absolute errors — Linear penalty, robust, interpretable

  2. MSE = average of squared errors — Quadratic penalty, sensitive to outliers, squared units

  3. RMSE = √MSE — Same sensitivity as MSE, interpretable units

  4. RMSE ≥ MAE always — Equal only when every error has the same magnitude

  5. High RMSE/MAE ratio = outliers present — Ratio > 1.5 suggests investigation needed

  6. Choose metric before training — Based on problem requirements, not results

  7. MSE for optimization, RMSE for reporting — Best of both worlds

  8. Scale matters — MAE of 50 on $1,000 values ≠ MAE of 50 on $1,000,000 values


The One-Sentence Summary

MAE is Boss Frank counting every minute of lateness equally, MSE is Boss Sarah squaring minutes so that one 60-minute disaster dominates everything, and RMSE is Boss Rachel using Sarah's philosophy but converting back to "minutes" so you actually understand your score — choose based on whether you want to forgive small errors or absolutely destroy large ones.


What's Next?

Now that you understand MAE, MSE, and RMSE, you're ready for:

  • MAPE and SMAPE — Percentage-based error metrics
  • Huber Loss — The best of MAE and MSE
  • Quantile Loss — When you care about under vs over prediction
  • Residual Analysis — Diagnosing WHY your errors happen

Follow me for the next article in this series!


Let's Connect!

If the three bosses finally made these metrics click, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

Which metric do you use most? I'm an RMSE person for reporting but an MAE person for robust models. What about you?


The difference between a model that's "off by 20 minutes on average" and one that's "effectively off by 30 minutes when you account for that one disaster"? MAE vs RMSE. Same model, different stories. Choose the story that matches your problem.


Share this with someone who's confused about why their MAE and RMSE are so different. They probably have outliers — and now they'll know what to do about it.

Happy measuring! 📏
