likhitha manikonda
How to Evaluate ML Models Step by Step

πŸ“˜ Model Evaluation: Accuracy, Precision, Recall, and Cross-Validation
βœ… Why Model Evaluation?
When you build a machine learning model, you need to check how good it is.

Evaluation metrics tell you:

Is the model making correct predictions?
Does it generalize well to new data?
Is it biased toward certain classes?
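A quick way to check the second point, generalization, is to hold back part of the data for testing and score the model only on examples it never saw during training. A minimal sketch, using load_iris and LogisticRegression purely as placeholder data and model:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Hold out 20% of the data so the model is scored on unseen examples
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)  # predictions to compare against y_test

The metrics below are computed by comparing the true labels (y_test here) with the model's predictions (y_pred). The short examples that follow use small hard-coded lists instead, to keep the arithmetic easy to follow.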

πŸ” 1. Accuracy
Definition:
The percentage of predictions your model got right.
Formula:
Accuracy = (Correct Predictions) / (Total Predictions)

Example:

If your model predicts 90 out of 100 correctly β†’ Accuracy = 90%.
Code Example:
from sklearn.metrics import accuracy_score

# Ground-truth labels and the model's predictions
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)  # 4 of 5 predictions match -> 0.8

πŸ” 2. Precision
Definition:
Of all the items predicted as positive, how many were actually positive?
Formula:
Precision = TP / (TP + FP)

Why Important:
When false positives are costly (e.g., a spam filter that wrongly flags a legitimate email as spam).
Example:
If your model predicts 10 positives, but only 8 are correct β†’ Precision = 8/10 = 0.8.
Code Example:

from sklearn.metrics import precision_score

# Reusing y_true and y_pred from the accuracy example
precision = precision_score(y_true, y_pred)
print("Precision:", precision)  # 2 predicted positives, both correct -> 1.0

πŸ” 3. Recall
Definition:
Of all actual positives, how many did the model correctly identify?
Formula:
Recall = TP / (TP + FN)

Why Important:
When missing positives is costly (e.g., detecting diseases).
Example:
If there are 12 actual positives and your model finds 8 β†’ Recall = 8/12 = 0.67.
Code Example:

from sklearn.metrics import recall_score

# Reusing y_true and y_pred from the accuracy example
recall = recall_score(y_true, y_pred)
print("Recall:", recall)  # 3 actual positives, 2 found -> about 0.67


βœ… Precision vs Recall

Precision: How accurate are your positive predictions?
Recall: How many actual positives did you find?
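To see both at once, scikit-learn's classification_report prints precision, recall, and F1 for every class in a single table. A minimal sketch, reusing y_true and y_pred from the earlier examples:

from sklearn.metrics import classification_report

# Per-class precision, recall, F1, and support, plus overall averages
print(classification_report(y_true, y_pred))

This is especially handy for imbalanced data, where one class can have high precision but low recall (or the other way around).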

πŸ” 4. F1-Score
Definition:
The harmonic mean of Precision and Recall.
Formula:
F1 = 2 * (Precision * Recall) / (Precision + Recall)

Why Important:
Balances both metrics when you need a single score.
Code Example:

from sklearn.metrics import f1_score

# Reusing y_true and y_pred from the accuracy example
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)  # harmonic mean of 1.0 and 0.67 -> 0.8


πŸ” 5. Cross-Validation
Definition:
Instead of testing your model on one split, test it on multiple splits.
Why:
Ensures your model works well on different subsets and isn’t just lucky.
Method:
k-fold cross-validation: split the data into k parts, train on k-1 parts and test on the remaining part, and repeat k times so every part serves as the test set once.
Code Example:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import numpy as np

# Example dataset (any feature matrix X and label vector y will do)
X, y = load_iris(return_X_y=True)

model = RandomForestClassifier()
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print("Cross-Validation Scores:", scores)
print("Average Score:", np.mean(scores))

βœ… Visual Example
Imagine a confusion matrix, where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives:

Accuracy = (TP + TN) / Total
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
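A minimal sketch of computing these quantities with scikit-learn's confusion_matrix, again reusing y_true and y_pred from the earlier examples:

from sklearn.metrics import confusion_matrix

# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)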

βœ… Real-Life Analogy
Accuracy: How often you guess correctly overall.
Precision: When you say β€œyes,” how often you’re right.
Recall: How many β€œyes” cases you actually find.
Cross-Validation: Testing your recipe in different kitchens.
