The One-Line Summary: MAE treats all errors equally (10 minutes late = 10 penalty). MSE punishes big errors catastrophically (10 minutes = 100, 60 minutes = 3,600). RMSE is MSE converted back to interpretable units. Choose based on whether you want to forgive small errors or destroy large ones.
The Three Bosses of Predictive Analytics Inc.
Three managers at Predictive Analytics Inc. need to evaluate the same delivery driver based on arrival-time accuracy.
Here is the driver's performance last week:
Delivery 1: Predicted 2:00 PM, Arrived 2:10 PM → 10 min late
Delivery 2: Predicted 3:00 PM, Arrived 3:05 PM → 5 min late
Delivery 3: Predicted 4:00 PM, Arrived 3:55 PM → 5 min early (-5)
Delivery 4: Predicted 5:00 PM, Arrived 6:00 PM → 60 min late (traffic!)
Same data. Three VERY different scores.
Boss A: "Fair Frank" (MAE Philosophy)
"Late is late. Early is early. Every minute counts the same. I don't care if you're 5 minutes late or 60 — I just add up all the minutes."
Penalty calculation:
|10| + |5| + |-5| + |60| = 10 + 5 + 5 + 60 = 80
Average penalty: 80 / 4 = 20 minutes
"Your average error is 20 minutes."
Frank's verdict: "You're off by 20 minutes on average. Improve."
Boss B: "Squared Sarah" (MSE Philosophy)
"Small mistakes? Whatever. But BIG mistakes are UNACCEPTABLE. I square every error — small errors stay small, big errors become MASSIVE."
Penalty calculation:
10² + 5² + (-5)² + 60² = 100 + 25 + 25 + 3600 = 3750
Average penalty: 3750 / 4 = 937.5 squared-minutes
"Your average squared error is 937.5."
Sarah's verdict: "That one 60-minute disaster dominates everything. 937.5! Unacceptable!"
Boss C: "Root Rachel" (RMSE Philosophy)
"I agree with Sarah's philosophy — big mistakes should hurt more. But 'squared minutes' is meaningless. Let me convert back to regular minutes."
Penalty calculation:
Same as Sarah: 3750 / 4 = 937.5
Then take square root: √937.5 = 30.6 minutes
"Your root mean squared error is 30.6 minutes."
Rachel's verdict: "Accounting for how bad that 60-minute disaster was, your 'effective average error' is 30.6 minutes."
The Scoreboard
| Boss | Metric | Score | Philosophy |
|---|---|---|---|
| Frank | MAE | 20 min | All errors equal |
| Sarah | MSE | 937.5 min² | Big errors punished severely |
| Rachel | RMSE | 30.6 min | Big errors punished, interpretable units |
Same performance. Scores of 20, 937.5, and 30.6!
Notice: RMSE (30.6) > MAE (20). This is ALWAYS true when errors vary in size. The more outliers, the bigger the gap.
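Before the bosses formalize their math below, here's a minimal NumPy sketch that reproduces all three scores from the same four deliveries (late minutes positive, early minutes negative):
import numpy as np
errors = np.array([10, 5, -5, 60])    # minutes off for the four deliveries
mae = np.mean(np.abs(errors))         # Frank's score
mse = np.mean(errors ** 2)            # Sarah's score
rmse = np.sqrt(mse)                   # Rachel's score
print(mae, mse, round(rmse, 1))       # 20.0 937.5 30.6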
The Mathematics
MAE: Mean Absolute Error
MAE = (1/n) × Σ|actual - predicted|
= Average of absolute errors
import numpy as np
errors = [10, 5, -5, 60]
mae = np.mean(np.abs(errors))
print(f"MAE: {mae}") # 20.0
Properties:
- Linear penalty (10 min late = 10 penalty)
- Robust to outliers
- Same units as target variable (minutes, dollars, etc.)
- Intuitive: "On average, we're off by X"
MSE: Mean Squared Error
MSE = (1/n) × Σ(actual - predicted)²
= Average of squared errors
errors = [10, 5, -5, 60]
mse = np.mean(np.array(errors) ** 2)
print(f"MSE: {mse}") # 937.5
Properties:
- Quadratic penalty (10 min late = 100, 60 min late = 3,600!)
- Heavily penalizes outliers
- Units are squared (minutes², dollars²) — not intuitive
- Mathematically convenient (differentiable, used in optimization)
RMSE: Root Mean Squared Error
RMSE = √MSE = √[(1/n) × Σ(actual - predicted)²]
= Square root of average squared errors
errors = [10, 5, -5, 60]
rmse = np.sqrt(np.mean(np.array(errors) ** 2))
print(f"RMSE: {rmse}") # 30.62
Properties:
- Same outlier sensitivity as MSE
- Back to original units (minutes, dollars)
- Interpretable: "Effective average error accounting for big mistakes"
- RMSE ≥ MAE always (equal only when every error has the same magnitude)
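A small sketch of that last property: RMSE never drops below MAE, and the two only meet when every error has the same magnitude.
import numpy as np

def mae_and_rmse(errors):
    e = np.asarray(errors, dtype=float)
    return np.mean(np.abs(e)), np.sqrt(np.mean(e ** 2))

for errs in ([10, 5, -5, 60], [7, -7, 7, -7]):
    mae, rmse = mae_and_rmse(errs)
    print(f"MAE = {mae:.1f}, RMSE = {rmse:.1f}")
# MAE = 20.0, RMSE = 30.6  -> RMSE > MAE when error sizes vary
# MAE = 7.0, RMSE = 7.0    -> equal when every error has the same magnitude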
Visual: How They Treat Errors Differently
| Error size | 1 | 5 | 10 | 20 | 30 | 60 |
|---|---|---|---|---|---|---|
| MAE penalty (linear) | 1 | 5 | 10 | 20 | 30 | 60 |
| MSE penalty (quadratic) | 1 | 25 | 100 | 400 | 900 | 3,600 |
The MAE penalty grows in lockstep with the error. The MSE penalty grows quadratically, so the single 60-minute error explodes to 3,600.
import numpy as np
import matplotlib.pyplot as plt
errors = np.linspace(0, 60, 100)
mae_penalty = np.abs(errors)
mse_penalty = errors ** 2
plt.figure(figsize=(10, 6))
plt.plot(errors, mae_penalty, 'b-', linewidth=2, label='MAE (linear)')
plt.plot(errors, mse_penalty, 'r-', linewidth=2, label='MSE (quadratic)')
plt.xlabel('Error Size', fontsize=12)
plt.ylabel('Penalty', fontsize=12)
plt.title('How MAE and MSE Penalize Errors', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
# Annotate the difference
plt.annotate('60 min error:\nMAE = 60\nMSE = 3,600',
xy=(60, 3600), xytext=(40, 2500),
fontsize=10, arrowprops=dict(arrowstyle='->'))
plt.tight_layout()
plt.savefig('mae_vs_mse.png', dpi=150)
plt.show()
The Outlier Test
Let's see how each metric responds to outliers:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Normal errors (no outliers)
y_true = [100, 100, 100, 100, 100]
y_pred = [102, 98, 103, 97, 100] # Small errors: 2, 2, 3, 3, 0
mae_normal = mean_absolute_error(y_true, y_pred)
mse_normal = mean_squared_error(y_true, y_pred)
rmse_normal = np.sqrt(mse_normal)
print("WITHOUT outliers:")
print(f" MAE: {mae_normal:.2f}")
print(f" MSE: {mse_normal:.2f}")
print(f" RMSE: {rmse_normal:.2f}")
# Now add ONE outlier
y_true_outlier = [100, 100, 100, 100, 100]
y_pred_outlier = [102, 98, 103, 97, 150] # Last one is 50 off!
mae_outlier = mean_absolute_error(y_true_outlier, y_pred_outlier)
mse_outlier = mean_squared_error(y_true_outlier, y_pred_outlier)
rmse_outlier = np.sqrt(mse_outlier)
print("\nWITH one outlier (50 error):")
print(f" MAE: {mae_outlier:.2f}")
print(f" MSE: {mse_outlier:.2f}")
print(f" RMSE: {rmse_outlier:.2f}")
# Calculate the increase
print(f"\nImpact of ONE outlier:")
print(f" MAE increased: {(mae_outlier/mae_normal - 1)*100:.0f}%")
print(f" RMSE increased: {(rmse_outlier/rmse_normal - 1)*100:.0f}%")
Output:
WITHOUT outliers:
MAE: 2.00
MSE: 5.20
RMSE: 2.28
WITH one outlier (50 error):
MAE: 12.00
MSE: 505.20
RMSE: 22.48
Impact of ONE outlier:
MAE increased: 500%
RMSE increased: 886%
One outlier made:
- MAE go from 2 → 12 (6x increase)
- RMSE go from 2.28 → 22.48 (10x increase!)
RMSE is much more sensitive to outliers than MAE.
When to Use Each Metric
Use MAE When:
1. Outliers are noise, not signal
# House prices with some data entry errors
prices_true = [300000, 350000, 275000, 999999999, 400000] # Typo!
prices_pred = [310000, 340000, 280000, 390000, 410000]
# The typo still inflates MAE, but only linearly; MSE would square the
# ~$1 billion error and let it drown out everything else entirely
mae = mean_absolute_error(prices_true, prices_pred)
print(f"MAE: ${mae:,.0f}")  # Inflated by the typo, but far less catastrophically than MSE
2. All errors are equally bad
Scenario: Delivery time prediction
- 10 minutes late = unhappy customer
- 60 minutes late = 6x unhappy customer (not 36x!)
Use MAE — linear penalty makes sense.
3. You want robustness
# Median-like behavior — resistant to extreme values
# MAE optimization finds the MEDIAN, not the mean
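# A quick sketch of that claim (my own numbers, not from the article): the single
# constant prediction that minimizes MAE is the median of the targets, while the
# constant that minimizes MSE is their mean.
import numpy as np
y = np.array([10, 12, 13, 14, 200])                  # one extreme target value
candidates = np.linspace(0, 250, 2501)               # constant predictions to try
mae_per_c = [np.mean(np.abs(y - c)) for c in candidates]
mse_per_c = [np.mean((y - c) ** 2) for c in candidates]
best_mae = candidates[np.argmin(mae_per_c)]
best_mse = candidates[np.argmin(mse_per_c)]
print(f"Best constant under MAE: {best_mae:.1f} (median is {np.median(y):.1f})")
print(f"Best constant under MSE: {best_mse:.1f} (mean is {np.mean(y):.1f})")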
4. Interpretability matters most
"Our model is off by $15,000 on average."
Clear. Simple. Stakeholders understand it.
Use MSE When:
1. Large errors are catastrophically bad
Scenario: Autonomous vehicle distance prediction
- 1 meter off = fine
- 10 meters off = dangerous
- 50 meters off = FATAL
Errors shouldn't scale linearly. MSE's quadratic penalty is appropriate.
2. You're training a model (optimization)
# MSE is differentiable everywhere — gradient descent loves it!
# MAE has a non-differentiable point at 0
model.compile(loss='mse') # Standard for regression
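# Why optimizers prefer MSE, in one tiny sketch (my own illustration, not part of
# the article's code): the gradient of a squared error is smooth and shrinks as the
# error shrinks, while the gradient of an absolute error is a constant ±1 that
# jumps at zero.
import numpy as np
e = np.array([-3.0, -0.5, 0.5, 3.0])      # prediction errors (y_pred - y_true)
print("MSE gradient:", 2 * e)             # [-6. -1.  1.  6.]  -> smooth, proportional
print("MAE gradient:", np.sign(e))        # [-1. -1.  1.  1.]  -> abrupt, undefined at 0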
3. Outliers ARE important signal
Scenario: Fraud detection (regression on transaction amounts)
- Large errors might indicate fraud!
- You WANT to be sensitive to outliers
4. You need mathematical convenience
# MSE decomposes nicely:
# Expected MSE = Bias² + Variance + irreducible noise
# Useful for theoretical analysis (the bias-variance tradeoff)
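# A rough numerical check of that decomposition (the numbers below are my own
# illustration, not from the article): fit a deliberately biased estimator on many
# fresh noisy samples, then compare test MSE with Bias² + Variance + noise variance.
import numpy as np
rng = np.random.default_rng(0)
theta, sigma, n_train, trials = 5.0, 2.0, 10, 200_000
train = rng.normal(theta, sigma, size=(trials, n_train))   # fresh training sets
preds = 0.8 * train.mean(axis=1)                           # shrunken mean -> biased
y_new = rng.normal(theta, sigma, size=trials)              # fresh noisy targets
test_mse = np.mean((y_new - preds) ** 2)
bias_sq = (theta - preds.mean()) ** 2
variance = preds.var()
print(round(test_mse, 2), round(bias_sq + variance + sigma ** 2, 2))  # nearly equal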
Use RMSE When:
1. You want MSE's properties but interpretable units
mse = 10000 # What does 10,000 squared-dollars mean?
rmse = 100 # "We're off by about $100" — much clearer!
2. Comparing to standard deviation
# RMSE and standard deviation are in the same units
# You can compare: "RMSE is 0.8 standard deviations"
std_y = np.std(y_true)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE is {rmse/std_y:.2f} standard deviations")
3. Industry standard requires it
Many competitions (Kaggle, etc.) use RMSE as the metric.
Match the metric you'll be evaluated on!
The Decision Flowchart
START: Choosing a regression metric

1. Are there outliers in your data?
   - NO → Any of the three metrics will tell a similar story.
   - YES → Are the outliers meaningful signal?
     - YES → MSE/RMSE
     - NO → MAE

2. Do you need interpretable units?
   - YES → MAE or RMSE
   - NO → MSE is fine (e.g., as an optimization loss)

3. Are big errors much worse than small errors?
   - YES → RMSE
   - NO → MAE
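If you'd rather read the same logic as code, here's a tiny hypothetical helper; the function and argument names are my own shorthand, not a standard API:
def choose_metric(has_outliers, outliers_are_signal, need_units, big_errors_much_worse):
    """Encode the decision guide above and return a suggested metric name."""
    if has_outliers and not outliers_are_signal:
        return "MAE"                            # outliers are noise: stay robust
    if not big_errors_much_worse:
        return "MAE"                            # all errors roughly equally bad
    return "RMSE" if need_units else "MSE"      # punish big errors; RMSE keeps readable units

print(choose_metric(has_outliers=True, outliers_are_signal=False,
                    need_units=True, big_errors_much_worse=True))   # MAE
print(choose_metric(has_outliers=True, outliers_are_signal=True,
                    need_units=True, big_errors_much_worse=True))   # RMSE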
Complete Comparison Example
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
def evaluate_predictions(y_true, y_pred, name="Model"):
    """Complete regression evaluation."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    # Additional context
    y_range = np.max(y_true) - np.min(y_true)
    y_std = np.std(y_true)
    print(f"\n{'='*50}")
    print(f"Evaluation: {name}")
    print(f"{'='*50}")
    print(f"MAE: {mae:,.2f}")
    print(f"MSE: {mse:,.2f}")
    print(f"RMSE: {rmse:,.2f}")
    print(f"\nContext:")
    print(f" Target range: {y_range:,.2f}")
    print(f" Target std: {y_std:,.2f}")
    print(f" RMSE/std: {rmse/y_std:.2%}")
    print(f" MAE/range: {mae/y_range:.2%}")
    # Relationship between metrics
    print(f"\nRelationships:")
    print(f" RMSE/MAE ratio: {rmse/mae:.2f}")
    if rmse/mae > 1.5:
        print(f" → High ratio suggests outliers or high variance in errors")
    else:
        print(f" → Low ratio suggests consistent error sizes")
    return {'MAE': mae, 'MSE': mse, 'RMSE': rmse}
# Example: House price prediction
np.random.seed(42)
n = 100
# True prices
y_true = np.random.normal(400000, 100000, n)
y_true = np.clip(y_true, 100000, 800000)
# Model A: Consistent errors
y_pred_consistent = y_true + np.random.normal(0, 20000, n)
# Model B: Some big misses
y_pred_outliers = y_true + np.random.normal(0, 15000, n)
# Add some outliers
outlier_idx = np.random.choice(n, 5, replace=False)
y_pred_outliers[outlier_idx] += np.random.choice([-1, 1], 5) * 150000
# Evaluate both
results_a = evaluate_predictions(y_true, y_pred_consistent, "Model A (Consistent)")
results_b = evaluate_predictions(y_true, y_pred_outliers, "Model B (Has Outliers)")
# Compare
print("\n" + "="*50)
print("HEAD-TO-HEAD COMPARISON")
print("="*50)
print(f"\n{'Metric':<10} {'Model A':>15} {'Model B':>15} {'Winner':>15}")
print("-"*55)
for metric in ['MAE', 'MSE', 'RMSE']:
    a = results_a[metric]
    b = results_b[metric]
    winner = "A" if a < b else "B"
    print(f"{metric:<10} {a:>15,.0f} {b:>15,.0f} {'Model ' + winner:>15}")
Output:
==================================================
Evaluation: Model A (Consistent)
==================================================
MAE: 16,234.12
MSE: 412,345,678.90
RMSE: 20,306.29
Context:
Target range: 645,234.12
Target std: 98,765.43
RMSE/std: 20.56%
MAE/range: 2.52%
Relationships:
RMSE/MAE ratio: 1.25
→ Low ratio suggests consistent error sizes
==================================================
Evaluation: Model B (Has Outliers)
==================================================
MAE: 15,876.54
MSE: 789,012,345.67
RMSE: 28,089.36
Context:
Target range: 645,234.12
Target std: 98,765.43
RMSE/std: 28.44%
MAE/range: 2.46%
Relationships:
RMSE/MAE ratio: 1.77
→ High ratio suggests outliers or high variance in errors
==================================================
HEAD-TO-HEAD COMPARISON
==================================================
Metric Model A Model B Winner
-------------------------------------------------------
MAE 16,234 15,877 Model B
MSE 412,345,679 789,012,346 Model A
RMSE 20,306 28,089 Model A
The Plot Twist:
- Model B wins on MAE (lower average error)
- Model A wins on MSE/RMSE (no catastrophic errors)
Which is better? Depends on whether those outliers matter!
The RMSE/MAE Ratio Trick
The ratio of RMSE to MAE tells you about error distribution:
def diagnose_errors(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    ratio = rmse / mae
    print(f"RMSE/MAE ratio: {ratio:.2f}")
    if ratio == 1.0:
        print("→ All errors are identical in size")
    elif ratio < 1.2:
        print("→ Errors are very consistent (good!)")
    elif ratio < 1.4:
        print("→ Errors have moderate variance")
    elif ratio < 1.7:
        print("→ Some larger errors present")
    else:
        print("→ Significant outliers or high error variance!")
        print("  Consider: outlier removal, MAE as metric, or robust models")
    # Theoretical limits
    print(f"\nTheoretical bounds:")
    print(f" Minimum ratio: 1.0 (all errors equal)")
    print(f" If errors ~ Normal: ratio ≈ 1.25")
    print(f" Your ratio: {ratio:.2f}")
Common Mistakes
Mistake 1: Comparing MAE to RMSE Directly
# ❌ WRONG
"MAE is 20, RMSE is 30. RMSE is 'worse'!"
# ✅ RIGHT
# They measure different things!
# RMSE ≥ MAE by definition
# Compare models using the SAME metric
Mistake 2: Using MSE for Reporting
# ❌ WRONG
"Our model has MSE of 10,000 squared-dollars"
# What does that even mean?!
# ✅ RIGHT
rmse = np.sqrt(10000)
"Our model has RMSE of $100"
# OR
"Our model has MAE of $80"
Mistake 3: Ignoring Scale
# ❌ WRONG
"Model A has MAE 50, Model B has MAE 100. A is 2x better!"
# ✅ RIGHT
# What if A predicts values around 1,000 and B around 1,000,000?
mae_a_pct = 50 / 1000 # 5%
mae_b_pct = 100 / 1000000 # 0.01%
# B is actually much better relatively!
Mistake 4: Choosing Metric AFTER Seeing Results
# ❌ WRONG
"My model has bad RMSE but good MAE. Let's report MAE!"
# ✅ RIGHT
# Choose metric based on the PROBLEM, not the results
# Before training: "Big errors are catastrophic, use RMSE"
# Stick with that decision
Quick Reference
Formulas
| Metric | Formula | Units |
|---|---|---|
| MAE | (1/n) × Σ\|y - ŷ\| | Same as y |
| MSE | (1/n) × Σ(y - ŷ)² | Squared units of y |
| RMSE | √MSE | Same as y |
Properties
| Property | MAE | MSE | RMSE |
|---|---|---|---|
| Penalty growth | Linear | Quadratic | Quadratic |
| Interpretable units | ✓ | ✗ | ✓ |
| Differentiable | ✗ (at 0) | ✓ | ✓ |
| Robust to outliers | ✓ | ✗ | ✗ |
| Common in optimization | ✗ | ✓ | ✗ |
When to Use
| Scenario | Best Metric |
|---|---|
| Outliers are noise | MAE |
| Outliers are signal | MSE/RMSE |
| Stakeholder reports | MAE or RMSE |
| Training neural nets | MSE |
| All errors equally bad | MAE |
| Big errors are catastrophic | MSE/RMSE |
| Need interpretability | MAE or RMSE |
| Kaggle competition | Whatever they specify! |
Key Takeaways
MAE = average of absolute errors — Linear penalty, robust, interpretable
MSE = average of squared errors — Quadratic penalty, sensitive to outliers, squared units
RMSE = √MSE — Same sensitivity as MSE, interpretable units
RMSE ≥ MAE always — Equal only when all errors have the same magnitude
High RMSE/MAE ratio = outliers present — Ratio > 1.5 suggests investigation needed
Choose metric before training — Based on problem requirements, not results
MSE for optimization, RMSE for reporting — Best of both worlds
Scale matters — MAE of 50 on $1,000 values ≠ MAE of 50 on $1,000,000 values
The One-Sentence Summary
MAE is Boss Frank counting every minute of lateness equally, MSE is Boss Sarah squaring minutes so that one 60-minute disaster dominates everything, and RMSE is Boss Rachel using Sarah's philosophy but converting back to "minutes" so you actually understand your score — choose based on whether you want to forgive small errors or absolutely destroy large ones.
What's Next?
Now that you understand MAE, MSE, and RMSE, you're ready for:
- MAPE and SMAPE — Percentage-based error metrics
- Huber Loss — The best of MAE and MSE
- Quantile Loss — When you care about under vs over prediction
- Residual Analysis — Diagnosing WHY your errors happen
Follow me for the next article in this series!
Let's Connect!
If the three bosses finally made these metrics click, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
Which metric do you use most? I'm a RMSE person for reporting but MAE person for robust models. What about you?
The difference between a model that's "off by 20 minutes on average" and one that's "effectively off by 30 minutes when you account for that one disaster"? MAE vs RMSE. Same model, different stories. Choose the story that matches your problem.
Share this with someone who's confused about why their MAE and RMSE are so different. They probably have outliers — and now they'll know what to do about it.
Happy measuring! 📏