Mean Absolute Percentage Error (MAPE) is the default forecast accuracy metric in most business forecasting contexts. It is also deeply flawed. It explodes when actuals approach zero. It is asymmetric: for non-negative forecasts an under-forecast can never score worse than 100%, while an over-forecast has no upper bound, so MAPE rewards models that forecast low. And it provides no baseline comparison: a model with 15% MAPE might be excellent or terrible depending on what a naive forecast would produce.
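The first two failure modes are easy to reproduce. The following is a minimal sketch rather than a reference implementation: the helper name `mape` and every number in it are made up for illustration.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Illustrative numbers only, not taken from any real dataset.

# 1) Near-zero actuals: every forecast misses by the same 5 units,
#    but a single tiny actual dominates the average.
steady = mape([100, 100, 100], [105, 105, 105])
near_zero = mape([100, 100, 0.5], [105, 105, 5.5])

# 2) Asymmetry: a forecast of zero caps the error at 100%,
#    while an over-forecast can score arbitrarily high.
under = mape([100], [0])
over = mape([100], [400])

print(f"steady={steady:.1f}%  near_zero={near_zero:.1f}%")
print(f"under={under:.1f}%  over={over:.1f}%")
```

The absolute miss is identical in both series of the first case, so the jump in MAPE comes entirely from dividing by a small actual.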
The Four-Metric Stack
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

def forecast_evaluation(actual, forecast):
    # Accept lists or pandas Series; work on float arrays
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = actual - forecast

    # MAPE: use only where actuals are safely nonzero
    mape = np.mean(np.abs(errors / actual)) * 100

    # RMSE: penalizes large errors; same units as the series
    rmse = np.sqrt(np.mean(errors**2))

    # MASE: MAE scaled by the MAE of a naive (lag-1) forecast; > 1 means worse than naive.
    # The naive error is computed over the evaluation window passed in.
    naive_mae = np.mean(np.abs(np.diff(actual)))
    mase = np.mean(np.abs(errors)) / naive_mae

    # Theil's U: ratio of model RMSE to naive (lag-1) RMSE; < 1 means better than naive
    naive_rmse = np.sqrt(np.mean(np.diff(actual)**2))
    theils_u = rmse / naive_rmse

    # Ljung-Box: tests residuals for autocorrelation up to lag 10
    # (p > 0.05 = no significant autocorrelation detected).
    # The test lives in statsmodels, not scipy; recent versions return a DataFrame.
    lb = acorr_ljungbox(errors, lags=[10])

    return {
        'MAPE': mape,
        'RMSE': rmse,
        'MASE': mase,
        "Theil's U": theils_u,
        'Ljung-Box p-value': lb['lb_pvalue'].iloc[0]
    }
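A complete forecast evaluation reports all five outputs: the four metrics plus the Ljung-Box p-value. MASE and Theil's U below 1.0 indicate that the model outperforms a naive lag-1 baseline. A Ljung-Box p-value above 0.05 means the test found no significant autocorrelation in the residuals up to lag 10, which is consistent with white noise and suggests the model is not leaving systematic patterns unmodeled. It does not prove that all available signal has been extracted.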
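To see why the baseline-scaled metrics matter, it can help to compare two series where MAPE and MASE disagree. This is a rough sketch under stated assumptions: `mape` and `mase` are hypothetical helpers, the data is invented, and the naive lag-1 error is computed on the evaluation window itself. Many references scale MASE by the naive error on the training data instead.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def mase(actual, forecast):
    """MAE scaled by the MAE of a lag-1 naive forecast on the same window."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    naive_mae = np.mean(np.abs(np.diff(actual)))
    return float(np.mean(np.abs(actual - forecast)) / naive_mae)

# Invented data: a smooth series that a naive forecast tracks closely...
smooth = np.array([100, 101, 102, 103, 104, 105], dtype=float)
smooth_forecast = smooth + 5        # always 5 units too high

# ...and a volatile series where the naive forecast does badly.
volatile = np.array([100, 150, 90, 160, 80, 140], dtype=float)
volatile_forecast = volatile + 15   # always 15 units too high

smooth_mape, smooth_mase = mape(smooth, smooth_forecast), mase(smooth, smooth_forecast)
volatile_mape, volatile_mase = mape(volatile, volatile_forecast), mase(volatile, volatile_forecast)

print(f"smooth:   MAPE={smooth_mape:.1f}%  MASE={smooth_mase:.2f}")
print(f"volatile: MAPE={volatile_mape:.1f}%  MASE={volatile_mase:.2f}")
```

On these numbers the smooth series has the lower MAPE but a MASE well above 1, so its forecast is worse than simply repeating the last value. The volatile series shows the reverse: a higher MAPE, yet a MASE well below 1.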