Retiago Drago
So I Explored Forecasting Metrics... Now I Want Your Two Cents 💭


Introduction 🌟

Diving into the world of regression and forecasting metrics can be a real head-scratcher, especially if you're a newcomer. Trust me, I've been there: haphazardly applying every popular metric scholars swear by, only to be left puzzled by the results.

Ever wondered why your MAE, MSE, and RMSE values look stellar, but your MAPE is through the roof? Yep, me too.

That's why I set out on this journey to create an experimental notebook, aiming to demystify how different metrics actually behave.

The Objective 🎯

The goal of this notebook isn't to find the "one metric to rule them all" for a specific dataset. Instead, I want to understand how various metrics respond to controlled conditions in both dataset and model. Think of this as a comparative study, a sort of "Metrics 101" through the lens of someone who's still got that new-car smell in the field. This way, when I'm plunged into real-world scenarios, I'll have a better grip on interpreting my metrics.

Metrics Investigated 🔍

To get a comprehensive view, I've opted to explore a selection of metrics that are commonly leveraged in regression and forecasting problems. Here's the lineup:

  1. Mean Absolute Error (MAE):

    • Definition: It measures the average magnitude of the errors between predicted and observed values.
    • Formula:
      MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|
      where y_i is the actual value, \hat{y}_i is the predicted value, and n is the number of observations.
  2. Mean Squared Error (MSE):

    • Definition: It measures the average of the squares of the errors between predicted and observed values. It gives more weight to large errors.
    • Formula:
      MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  3. Root Mean Squared Error (RMSE):

    • Definition: The square root of MSE. It expresses the error in the same units as the data while, like MSE, still weighting large errors more heavily.
    • Formula:
      RMSE = \sqrt{MSE}
  4. Mean Absolute Percentage Error (MAPE):

    • Definition: It measures the average of the absolute percentage errors between predicted and observed values.
    • Formula:
      MAPE = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
      Note: MAPE can be problematic if the actual value y_i is zero for some observations.
  5. Mean Absolute Scaled Error (MASE):

    • Definition: It measures the accuracy of forecasts relative to a naive baseline method. If MASE is lower than 1, the forecast is better than the naive forecast.
    • Formula:
      MASE = \frac{\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\frac{1}{n-1}\sum_{i=2}^{n} |y_i - y_{i-1}|}
  6. R-squared (Coefficient of Determination):

    • Definition: It indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
    • Formula:
      R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
      where \bar{y} is the mean of the observed data.
  7. Symmetric Mean Absolute Percentage Error (sMAPE):

    • Definition: It's a variation of MAPE that addresses some of its issues, especially when the actual value is zero.
    • Formula:
      sMAPE = \frac{100}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2}
  8. Mean Bias Deviation (MBD):

    • Definition: It calculates the average percentage bias in the predicted values.
    • Formula:
      MBD = \frac{100}{n}\sum_{i=1}^{n} \frac{y_i - \hat{y}_i}{y_i}
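
To make these definitions concrete, here's a minimal NumPy sketch of all eight metrics as written above. This is my own illustration rather than the notebook's code; note that the MAPE and MBD helpers blow up if any actual value is zero, and the MASE denominator uses the one-step naive forecast on the same series, per the formula:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

def mape(y, y_hat):
    # Undefined when any actual value is zero
    return 100 * np.mean(np.abs((y - y_hat) / y))

def mase(y, y_hat):
    # Scale by the mean absolute error of a one-step naive forecast
    return mae(y, y_hat) / np.mean(np.abs(np.diff(y)))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def smape(y, y_hat):
    return 100 * np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2))

def mbd(y, y_hat):
    # Positive: the model underestimates on average; negative: it overestimates
    return 100 * np.mean((y - y_hat) / y)
```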

Severity and Directional Emojis 🔥👉

Let's face it, numbers alone can be dry, and if you're like me, you might crave a more visceral sense of how well your model is doing. Enter severity and directional emojis. These little symbols provide a quick visual cue for interpreting metric results, ranging from "you're nailing it" to "back to the drawing board."

Disclaimer: Keep in mind that these categorizations are user-defined and could vary depending on the context in which you're working.

Standard Error Metrics (MAE, MSE, RMSE) Categorization 📊

To clarify, the Normalized Error Range comes from dividing the error by the range (max - min) of the training data; the snippet after the table sketches this.

| Category | Normalized Error Range |
| --- | --- |
| Perfect | Exactly 0 |
| Very Acceptable | 0 < x ≤ 0.05 |
| Acceptable | 0.05 < x ≤ 0.1 |
| Moderate | 0.1 < x ≤ 0.2 |
| High | 0.2 < x ≤ 0.3 |
| Very High | 0.3 < x ≤ 1 |
| Exceedingly High | x > 1 |
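
Here's a quick, runnable sketch of that normalization. The series and predictions are placeholder data of my own; the point is only the final two lines:

```python
import numpy as np

rng = np.random.default_rng(0)
y_train = rng.normal(size=200)             # placeholder training series
y_test = rng.normal(size=50)               # placeholder test series
y_pred = y_test + rng.normal(0, 0.1, 50)   # placeholder predictions

# Divide the raw error by the spread of the training data so the result
# can be read against the thresholds in the table above
error_range = y_train.max() - y_train.min()
normalized_mae = np.mean(np.abs(y_test - y_pred)) / error_range
```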

Percentage Error (MAPE, sMAPE, MBDev) Categorization 📉

| Category | Error Magnitude (%) | Direction |
| --- | --- | --- |
| Perfect | Exactly 0% | - |
| Very Acceptable | 0 < x ≤ 5 | Over/Under |
| Acceptable | 5 < x ≤ 10 | Over/Under |
| Moderate | 10 < x ≤ 20 | Over/Under |
| High | 20 < x ≤ 30 | Over/Under |
| Very High | 30 < x ≤ 100 | Over/Under |
| Exceedingly High | x > 100 | Over/Under |

R² Score Categorization 📈

| Category | R² Value Range |
| --- | --- |
| Perfect | Exactly 1 |
| Very Acceptable | 0.95 ≤ x < 1 |
| Acceptable | 0.9 ≤ x < 0.95 |
| Moderate | 0.8 ≤ x < 0.9 |
| High | 0.7 ≤ x < 0.8 |
| Very High | 0.5 ≤ x < 0.7 |
| Exceedingly High | 0 < x < 0.5 |
| Doesn't Explain Variability | Exactly 0 |
| Worse Than Simple Mean Model | x < 0 |

MASE Categorization 📋

| Category | MASE Value Range |
| --- | --- |
| Perfect | Exactly 0 |
| Very Acceptable | 0 < x ≤ 0.1 |
| Acceptable | 0.1 < x ≤ 0.5 |
| Moderate | 0.5 < x ≤ 0.9 |
| High | 0.9 < x < 1 |
| Equivalent to Naive Model | Exactly 1 |
| Worse Than Naive Forecast Model | x > 1 |

Severity Emojis 🚨

| Category | Emoji |
| --- | --- |
| Perfect | 💯 |
| Very Acceptable | 👌 |
| Acceptable | ✔️ |
| Moderate | ❗ |
| High | ❌ |
| Very High | 💀 |
| Exceedingly High | ☠ |
| Doesn't Explain Variability | 🚫 |
| Worse Than Simple Mean Model | 🛑 |
| Equivalent to Naive Model | ⚖ |
| Worse Than Naive Forecast Model | 🤬 |

Directional Emojis ➡️

| Direction | Emoji |
| --- | --- |
| Overestimation | 📈 |
| Underestimation | 📉 |
| NaN / None | 🙅‍♂️ |
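
Putting the thresholds and emojis together, a categorizer for the standard-error table might look like this minimal sketch. The function name and structure are my own illustration, not the notebook's actual code (negative inputs aren't handled, since normalized errors are non-negative):

```python
# Hypothetical helper mapping a normalized error to the severity
# categories and emojis defined in the tables above
def categorize_normalized_error(x):
    if x == 0:
        return "Perfect", "💯"
    if x <= 0.05:
        return "Very Acceptable", "👌"
    if x <= 0.1:
        return "Acceptable", "✔️"
    if x <= 0.2:
        return "Moderate", "❗"
    if x <= 0.3:
        return "High", "❌"
    if x <= 1:
        return "Very High", "💀"
    return "Exceedingly High", "☠"

print(categorize_normalized_error(0.07))  # ('Acceptable', '✔️')
```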

Methodology 📚

For this experiment, I synthesized datasets using mathematical functions like sine and cosine, which offer a controlled level of predictability. On the modeling end, I used statsmodels.tsa.ar_model.AutoReg and OffsetModel. I chose AutoReg for its foundational role in time series forecasting, while OffsetModel mimics good performance by simply shifting the test data. This entire endeavor is laser-focused on forecasting problems; as far as I understand, every forecasting problem is essentially a regression problem, but not the other way around.
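
Here's a minimal sketch of that setup. The series length, train/test split, lag order, and offset value are all my own placeholder choices, and OffsetModel is reduced to its described behavior (returning the test values shifted by a small constant) rather than the notebook's actual implementation:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Synthetic, perfectly periodic series
x = np.linspace(0, 10 * np.pi, 500)
y = np.sin(x)
y_train, y_test = y[:400], y[400:]

# AutoReg: a foundational autoregressive baseline
model = AutoReg(y_train, lags=10).fit()
y_pred_ar = model.predict(start=400, end=499)

# "OffsetModel" stand-in: mimic good performance by returning
# the test data shifted by a small offset
y_pred_offset = y_test + 0.05
```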

Highlights of Findings ✨

To navigate the labyrinth of metrics, I've laid out my explorations in a tree graph, which you can check out below:

[Tree graph: metrics exploration]

The table here provides just a glimpse into the first phase of my deep dive into metrics. For those hungry for the full rundown, it's available right here. Click 📊 for the plot.

| Plot | Based on | Variant | Dataset | Model | R² | MAE | MSE | RMSE | MASE | MAPE | sMAPE | MBDev |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 📊 | Test Size | Small=1 | cos(x) | AutoReg | 🙅‍♂️ | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | | OffsetModel | 🙅‍♂️ | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | sin(x) | AutoReg | 🙅‍♂️ | 👌 | 👌 | 👌 | 🤬 | ☠ | ☠ | ☠📉 |
| 📊 | | | | OffsetModel | 🙅‍♂️ | 👌 | 👌 | 👌 | 🤬 | ☠ | ☠ | ☠📈 |
| 📊 | | Small=2 | cos(x) | AutoReg | 🛑 | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | | OffsetModel | 🛑 | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | sin(x) | AutoReg | 🛑 | 👌 | 👌 | 👌 | 🤬 | ☠ | ☠ | ☠📉 |
| 📊 | | | | OffsetModel | 🛑 | 👌 | 👌 | 👌 | 🤬 | ☠ | ☠ | ☠📈 |
| 📊 | | Mid | cos(x) | AutoReg | 🛑 | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | | OffsetModel | 🛑 | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | sin(x) | AutoReg | 🛑 | 👌 | 👌 | 👌 | 🤬 | ☠ | 💀 | ☠📉 |
| 📊 | | | | OffsetModel | 🛑 | 👌 | 👌 | 👌 | 🤬 | ☠ | ☠ | ☠📈 |
| 📊 | | Large | cos(x) | AutoReg | 🛑 | ❗ | ✔️ | ❌ | 🤬 | 👌 | ❌ | 💀📈 |
| 📊 | | | | OffsetModel | ❗ | 👌 | 👌 | 👌 | 🤬 | 👌 | 👌 | 👌📈 |
| 📊 | | | sin(x) | AutoReg | 🛑 | ❌ | ❗ | ❌ | 🤬 | ☠ | 💀 | ☠📉 |
| 📊 | | | | OffsetModel | 👌 | 👌 | 👌 | 👌 | 🤬 | ☠ | ❗ | ☠📈 |

Key Insights 🔑

  1. Inconsistent R² Scores: Almost all of the AutoReg and OffsetModel experiments yielded R² scores that were either nonexistent (🙅‍♂️) or worse than a simple mean model (🛑). Only one OffsetModel experiment on a large dataset achieved a "Very Acceptable" R² score (👌).

  2. Good Performance on Standard Errors: Across various test sizes and datasets, both AutoReg and OffsetModel generally scored "Very Acceptable" (👌) on the MAE, MSE, and RMSE metrics.

  3. Problematic MASE Scores: Every model configuration led to "Worse Than Naive Forecast Model" (🤬) MASE scores. This suggests that, on these series, neither model beats a simple one-step naive forecast.

  4. Diverse MAPE and sMAPE Responses: The models varied significantly in their MAPE and sMAPE scores, ranging from "Very Acceptable" (👌) to "Very High" (💀) and "Exceedingly High" (☠), especially on sine (sin(x)) datasets; the sketch after this list shows why these percentage metrics explode when actual values approach zero.

  5. Bias Direction: The Directional Emojis indicate a tendency for the models to either overestimate (📈) or underestimate (📉) the values. The direction of bias appears consistent within the same dataset but varies between datasets.

  6. Complexity vs. Error: Larger test sizes didn't necessarily yield better error metrics. In fact, some larger test sizes led to "High" (❌) and even "Very High" (💀) errors, as seen in the last few rows of the table.

  7. Dataset Sensitivity: The models' performance was noticeably different between the sine (sin(x)) and cosine (cos(x)) datasets, showing that dataset characteristics heavily influence metric values.

  8. Best Scenario: If one had to pick, OffsetModel with a large dataset and the sine function (sin(x)) yielded the most balanced outcome, achieving "Very Acceptable" (👌) ratings on R², MAE, MSE, and RMSE, though its MASE (🤬) and percentage metrics (☠) still suffered.

  9. Limitations & Risks: It's important to remember that these experiments used synthetic data and specific models; thus, the results may not be universally applicable. Caution should be exercised when generalizing these insights.
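
To illustrate insight 4 (and the puzzle from the intro), here's a tiny worked example of how near-zero actuals make MAPE explode even when absolute errors are tiny. The numbers are my own toy data, using the same arithmetic as the metric helpers defined earlier:

```python
import numpy as np

# Actual values cross zero, as sin(x) does; predictions are off by only 0.05
y_true = np.array([0.01, -0.02, 0.5, -0.5])
y_pred = y_true + 0.05

print(np.mean(np.abs(y_true - y_pred)))                   # MAE = 0.05 (tiny)
print(100 * np.mean(np.abs((y_true - y_pred) / y_true)))  # MAPE = 192.5% (huge)
```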

Please note that these insights are derived from synthetic data and controlled experiments. They are intended to offer a glimpse into the behavior of different metrics and should be used with caution in practical applications.

Points for Critique 🤔

I'm all ears for any constructive feedback on various fronts:

  • Did I get the interpretation of these metrics right?
  • Are there any hidden biases that I might have missed?
  • Is there a more suitable metric that should be on my radar?
  • Did you spot a typo? Yes, those bother me too.

Digging into metrics is a lot like treasure hunting; you don't really know what you've got until you put it under the microscope. That's why I'm so eager to get your feedback. I've listed a few questions above, but let's delve a bit deeper.

  • Interpretation of Metrics: I've given my best shot at understanding these metrics, but it's entirely possible that I've overlooked some nuances. If you think I've missed the mark or if you have a different angle, I'm keen to hear it.
  • Potential Biases: When you're neck-deep in numbers, it's easy to develop tunnel vision and miss out on the bigger picture. Have I fallen into this trap? Your external perspective could provide invaluable insights.
  • Alternative Metrics: While I've focused on some of the most commonly used metrics, the field is vast. If there's a gem I've missed, do let me know. I'm always up for adding another tool to my analytical toolbox.
  • Typos and Errors: Mistakes are the bane of any data scientist's existence, and not just in code. If you've spotted a typo, I'd appreciate the heads up. After all, clarity is key when it comes to complex topics like this.

So, am I on the right track, or is there room for improvement?

Your input could be the missing puzzle piece in my metrics exploration journey.

Conclusion 🤝✅

So there it is: my metric safari in a nutshell. It's been an enlightening experience for me, and I hope it shines some light for you too. I'm still on the learning curve, and I'd love to hear your thoughts. Whether it's a critique or a thumbs-up, all feedback is golden.

If this sparked your curiosity, let's keep the conversation going. Feel free to drop a comment below, write your own post in response, or reach out to me directly at my link. If you'd like to delve deeper, the full summary of my findings is available here. Better yet, why not conduct your own investigations? I'd be thrilled to see where you take it. You can follow my progress and check out my portfolio repository here.

GitHub: ranggakd / DAIly

A bunch of Data Analysis and Artificial Intelligence notebooks 🤖 I'd worked on almost a daiLY basis 👨‍💻


Let's keep exploring and innovating together!

Check out all the links on my beacons.ai page

Top comments (2)

Pratik Sharma

Such a nice article, man. Thank you so much!

Retiago Drago

Thanks! You could share it so others can find it as useful as you did 🤗
Here's the Medium version if you like.