
Chanchal Singh

Day 5: Is Your Model Actually Good? - Evaluation Metrics

You prepare for an exam.

You take a mock test.
You get 72 marks.

Now the real question is not:

“Did I pass or fail?”

The real question is:

“How good is 72?”

Is it:

  • Better than before?
  • Good enough?
  • Just lucky?

That’s exactly what model evaluation is about.


Why We Need Evaluation

A model can always give predictions.

But predictions alone mean nothing.

We must ask:

  • Can I trust this model?
  • Will it work on new data?
  • Is it learning patterns or memorizing data?

Evaluation answers these questions.


R-squared (R²): The Most Popular Metric

Imagine this.

You’re trying to predict house prices.

Before using ML, your best guess is:

“All houses cost around ₹50 lakh.”

That’s your baseline.

Now your model predicts different prices for different houses.


R² asks:

“How much better is your model compared to this dumb guess?”


R² in simple words

R² tells you how much of the variation in the data your model explains.


  • R² = 0.80 → model explains 80% of the pattern
  • R² = 0.20 → model explains very little
  • R² = 1.00 → perfect fit (rare, and usually suspicious)
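
Here's a minimal sketch of that comparison in Python. The prices are made up for illustration; the point is that R² measures your model's squared errors against the squared errors of the "always guess the mean" baseline:

```python
# A minimal sketch of what R² measures (illustrative numbers, not real data).
import numpy as np

actual = np.array([45, 52, 60, 48, 55])      # actual prices (₹ lakh)
predicted = np.array([47, 50, 58, 49, 56])   # model's predictions

baseline = actual.mean()                     # the "dumb guess": one price for every house

ss_res = np.sum((actual - predicted) ** 2)   # model's squared errors
ss_tot = np.sum((actual - baseline) ** 2)    # baseline's squared errors

r2 = 1 - ss_res / ss_tot                     # R² = 1 − SS_res / SS_tot
print(f"R² = {r2:.2f}")                      # ≈ 0.90 → much better than the baseline
```

In practice you'd call sklearn.metrics.r2_score(actual, predicted), which computes the same thing.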

Important Truth About R²

A high R² does not always mean a good model.

Why?

  • It can overfit
  • It can memorize
  • It can fail on new data

That’s why we never trust R² alone.


Residuals: Listening to the Model’s Mistakes

Residual = actual value − predicted value.

Think of residuals as the model's complaints.

If residuals look:

  • Random → model is healthy
  • Patterned → model is missing something

Residual plots help us see:

“Is the model behaving logically?”
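
A quick way to check is a residual plot: predictions on the x-axis, residuals on the y-axis. Here's a small sketch with hypothetical numbers, using matplotlib:

```python
# A small sketch of a residual plot (hypothetical numbers).
import numpy as np
import matplotlib.pyplot as plt

actual = np.array([45, 52, 60, 48, 55])
predicted = np.array([47, 50, 58, 49, 56])

residuals = actual - predicted               # residual = actual − predicted

plt.scatter(predicted, residuals)            # residuals vs. predictions
plt.axhline(0, color="red", linestyle="--")  # the "zero mistake" line
plt.xlabel("Predicted price (₹ lakh)")
plt.ylabel("Residual")
plt.title("Healthy residuals scatter randomly around zero")
plt.show()
```

If the points form a curve, a funnel, or any visible shape, the model is missing something.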


Standard Error (SE): How Confident Is the Model?

Imagine two friends predicting house prices.

Friend A:

  • Usually wrong by ₹5,000

Friend B:

  • Usually wrong by ₹50,000

Who do you trust more?

Standard Error tells you:

“On average, how far predictions fall from the truth.”

Lower SE = more reliable model.
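
Here's a rough sketch of that comparison, measuring the typical miss as root-mean-squared error (one common way to compute it; the friends' numbers are invented):

```python
# Comparing two "friends" by the typical size of their errors.
# (Measured here as root-mean-squared error; prices in ₹ lakh.)
import numpy as np

actual = np.array([50.0, 55.0, 60.0])

friend_a = np.array([50.05, 54.95, 60.05])   # off by ~0.05 lakh (₹5,000) each time
friend_b = np.array([50.5, 54.5, 60.5])      # off by ~0.5 lakh (₹50,000) each time

def typical_error(pred):
    return np.sqrt(np.mean((actual - pred) ** 2))  # typical distance from the truth

print(f"Friend A: ₹{typical_error(friend_a) * 100_000:,.0f}")  # ≈ ₹5,000
print(f"Friend B: ₹{typical_error(friend_b) * 100_000:,.0f}")  # ≈ ₹50,000
```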


Train vs Test Performance (Very Important)

If:

  • Training accuracy is very high
  • Testing accuracy is low

That means:

The model memorized instead of learning.

This is how we detect overfitting.

Like a student who learns answers by heart but fails when the questions change: the model knows the past too well, but can't handle anything new.
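
Here's a minimal sketch of that check, assuming scikit-learn and a deliberately memorization-prone model (a decision tree with unlimited depth) on synthetic data:

```python
# Detecting overfitting by comparing train vs. test scores (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 2, size=200)   # simple pattern + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor().fit(X_train, y_train)  # unlimited depth → can memorize

print("Train R²:", model.score(X_train, y_train))  # ≈ 1.0 (memorized, noise and all)
print("Test R²:", model.score(X_test, y_test))     # noticeably lower → overfitting
```

A big gap between the two scores is the red flag.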


Tiny Real-Life Thought 🧠

If someone always scores high in practice tests
but fails in the real exam —

you know something is wrong.

Same with ML models.


3-Line Takeaway

  • Evaluation tells you whether a model is trustworthy
  • R² shows how much variation the model explains
  • SE shows how reliable the predictions are

What’s Coming Next 👀

Now the big question:

Why do some models fail even when metrics look good?

That leads us to:

👉 Day 6 — Why Linear Regression Breaks (Assumptions & Multicollinearity)

I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!

Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots
