Dron Bhattacharya

Posted on Feb 13 • Edited on Feb 14

R2-Score Explained

#machinelearning #ai #datascience

Learning Outcomes

Define the R2-score and its role in regression analysis
Interpret numerical R2 values to assess model fit
Identify specific scenarios where R2 is helpful or misleading

Why do we need R2-score?

Imagine we are asked to guess the CO2 emissions of a car we have never seen. Without any data (e.g. engine size, fuel consumption), our safest bet is to guess the average of all known cars. In other words, for every car we always predict the mean CO2 emission.

But can a prediction be better than the mean? Yes.

We can build a linear regression model which uses fuel consumption as a feature to predict the CO2 emissions. Now our predictions are much closer to the actual values than just guessing the mean!
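
As a minimal sketch of that comparison, the snippet below fits a one-feature least-squares line and compares its total squared error with that of the "always guess the mean" baseline. The numbers are invented for illustration and do not come from a real dataset; the helper names `fit_line` and `sum_squared_error` are hypothetical too.

```python
# Hypothetical toy data, invented for illustration only.
fuel_consumption = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]           # L/100 km
co2_emissions = [118.0, 139.0, 165.0, 183.0, 210.0, 229.0]   # g/km


def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Baseline: predict the mean CO2 emission for every car.
mean_co2 = sum(co2_emissions) / len(co2_emissions)
baseline_predictions = [mean_co2] * len(co2_emissions)

# Model: a straight line fitted on fuel consumption.
slope, intercept = fit_line(fuel_consumption, co2_emissions)
model_predictions = [slope * x + intercept for x in fuel_consumption]

baseline_error = sum_squared_error(co2_emissions, baseline_predictions)
model_error = sum_squared_error(co2_emissions, model_predictions)
print(f"baseline squared error: {baseline_error:.1f}")
print(f"model squared error:    {model_error:.1f}")
```

On this toy data the fitted line leaves far less squared error than the mean baseline, which is the improvement we want to put a number on.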

But how can we quantify how much it has improved with respect to the mean? This is exactly what the R2-score tells us.

A score of 1.0 indicates a perfect fit, where every data point sits exactly on the regression line. Conversely, a score of 0.0 suggests that the model is no better at predicting the outcome than simply guessing the average value every single time. And if the model is somehow worse than guessing the average, R2-score becomes negative.

Formula

R² = 1 - (Variance remaining with model / Total variance)

Components:

Total Variance = How spread out the actual values are from their mean
- Sum of squared differences over all data points: Σ (actual - mean)²
Variance Remaining with Model = How spread out the prediction errors are
- Sum of squared errors over all data points: Σ (actual - predicted)²
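
The formula above can be sketched in a few lines of plain Python. The function name `r2_score` and the sample values are assumptions made for this illustration; the three prediction lists are chosen to show a perfect fit, the mean baseline, and a model that is worse than the baseline.

```python
def r2_score(actual, predicted):
    """R2 = 1 - (variance remaining with model / total variance).

    Assumes the actual values are not all identical, otherwise the
    total variance is zero and the ratio is undefined.
    """
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)
    remaining = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - remaining / total


actual = [100.0, 150.0, 200.0, 250.0]          # hypothetical values, mean is 175
perfect_fit = [100.0, 150.0, 200.0, 250.0]     # every prediction is exact
mean_guess = [175.0, 175.0, 175.0, 175.0]      # always predict the mean
reversed_trend = [250.0, 200.0, 150.0, 100.0]  # trend predicted backwards

print(r2_score(actual, perfect_fit))     # 1.0
print(r2_score(actual, mean_guess))      # 0.0
print(r2_score(actual, reversed_trend))  # -3.0
```

If scikit-learn is available, `sklearn.metrics.r2_score(y_true, y_pred)` computes the same quantity.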

How to interpret R2-score?

To truly grasp the R2-score, we must understand Variance. Variance measures how far the data points are "spread out" from the mean.

Let's assume that we got an R2-score of 0.73 using Engine-size as a feature to predict CO2 emissions.

What does this mean?

It means that the model explains 73% of the variance in CO2 emissions, i.e. 73% of how much the actual values spread around their mean. The other 27% is not captured by the model; it may come from features we left out, such as fuel consumption, number of cylinders or fuel type, or from plain noise. So if the target variable changes because of that remaining 27%, our model will not be able to predict it.
From that statement we can also say that, in this dataset, a linear model on Engine-size alone accounts for 73% of the variance in CO2 emission. This is a statement about fit, not proof that Engine-size causes 73% of the emissions.

When to use R2-score and what are its limitations?

The R2 score is a fantastic tool for comparing different models on the same dataset. We can quickly see if "Fuel Consumption" is a stronger predictor than "Engine Size". If Model A has an R2 of 0.92 and Model B has 0.75 on the same data, Model A fits that data better.
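
A minimal sketch of such a comparison, assuming two single-feature linear models fitted on the same small, invented dataset. All values and the helper names `fit_line` and `r2_score` are hypothetical and only serve the illustration.

```python
# Hypothetical toy data for the same six cars, invented for illustration only.
engine_size = [1.6, 2.0, 2.0, 2.5, 3.0, 3.5]                 # litres
fuel_consumption = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]           # L/100 km
co2_emissions = [118.0, 139.0, 165.0, 183.0, 210.0, 229.0]   # g/km


def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(actual, predicted):
    """R2 = 1 - (sum of squared errors / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)
    remaining = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - remaining / total


def r2_for_feature(feature, target):
    """Fit a line on one feature and score it on the same data."""
    slope, intercept = fit_line(feature, target)
    predictions = [slope * x + intercept for x in feature]
    return r2_score(target, predictions)


r2_engine = r2_for_feature(engine_size, co2_emissions)
r2_fuel = r2_for_feature(fuel_consumption, co2_emissions)
print(f"R2 using engine size:      {r2_engine:.3f}")
print(f"R2 using fuel consumption: {r2_fuel:.3f}")
```

Because both models are scored against the same CO2 values, the total variance in the denominator is identical, so the two scores are directly comparable.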

But note that R2 is not suitable for comparing models across different datasets, because the score depends on how spread out the target values are in each dataset.
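
To see why, here is a small sketch with two hypothetical datasets in which the predictions are off by exactly the same amounts. Only the spread of the actual values differs, yet the scores are very different. The data and the function name `r2_score` are assumptions for this illustration.

```python
def r2_score(actual, predicted):
    """R2 = 1 - (sum of squared errors / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)
    remaining = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - remaining / total


# The same prediction errors are applied to both datasets.
errors = [5.0, -5.0, 5.0, -5.0]

narrow_actual = [170.0, 175.0, 180.0, 185.0]  # cars with similar emissions
wide_actual = [100.0, 150.0, 200.0, 250.0]    # cars with very different emissions

narrow_predicted = [a + e for a, e in zip(narrow_actual, errors)]
wide_predicted = [a + e for a, e in zip(wide_actual, errors)]

r2_narrow = r2_score(narrow_actual, narrow_predicted)
r2_wide = r2_score(wide_actual, wide_predicted)
print(f"narrow dataset R2: {r2_narrow:.3f}")  # about 0.200
print(f"wide dataset R2:   {r2_wide:.3f}")    # about 0.992
```

The model is equally wrong in absolute terms on both datasets; the higher score on the second one comes purely from its larger total variance.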

Can we use the R2 score for non-linear models? R2 can technically be calculated for any regression model, but its "share of variance explained" interpretation is only guaranteed for linear regression with an intercept, fitted by least squares. For non-linear models it may not accurately reflect the fit, so error metrics such as RMSE are often more appropriate.

Another very important thing to remember is that a high R2 does not guarantee that the model is accurate enough for real-world use. It is a relative measure, not an absolute one. The model might explain 99% of the variance, but if the remaining 1% results in an RMSE of 500 tons of CO2, the model might still be useless.

While R2 tells us how well the model fits the trend, RMSE tells us how big the actual error is, in actual units.
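
The sketch below illustrates that gap on invented numbers: the predictions follow the trend almost perfectly, so R2 is very high, yet every prediction is off by 500 tons. The values and the function names `r2_score` and `rmse` are assumptions for this illustration.

```python
import math


def r2_score(actual, predicted):
    """R2 = 1 - (sum of squared errors / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)
    remaining = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - remaining / total


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical emissions in tons of CO2, spread over a very wide range.
actual = [0.0, 10000.0, 20000.0, 30000.0, 40000.0]
predicted = [500.0, 9500.0, 20500.0, 29500.0, 40500.0]  # each off by 500 tons

r2 = r2_score(actual, predicted)
error = rmse(actual, predicted)
print(f"R2:   {r2:.5f}")           # 0.99875
print(f"RMSE: {error:.1f} tons")   # 500.0 tons
```

Whether an error of 500 tons is acceptable depends on the application, which is something R2 alone cannot tell us.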

Summary

The R2 score measures how much better a model is compared to a "dumb" baseline that just guesses the average. A score of 0 means no value is added, a negative score means things got worse, and a score between 0 and 1 means the model accounts for variance better than the average.
Think of R2 as the percentage of the data's "wiggle" that a model understands. If R2 is 0.80, the model explains 80% of the variance in the target variable.
Never rely on R2 alone. It tells us about correlation and fit, but not about prediction precision. Always validate high scores with error metrics like RMSE to ensure the model is practical.
High R2 on training data doesn't guarantee good predictions outside the data range.
R2-Score works best for linear models.

Review Questions

If a model has an R2 score of -0.5, what does that imply about its predictive power compared to a simple mean?
Why is an R2 of 1.0 often a red flag for "overfitting" in real-world data?
Why might a model with a high R2 still be dangerous to deploy in a high-stakes environment (like healthcare)?

DEV Community