Machine Learning models are often judged by numbers, but many beginners (and even practitioners) misunderstand what those numbers actually mean. A model showing 95% accuracy might still be useless in real-world scenarios.
In this post, we’ll break down:
- Types of errors in Machine Learning
- Confusion Matrix
- Accuracy
- Precision
- Recall
- F1 Score
All explained intuitively, with examples you can confidently explain in interviews or apply in projects.
1️⃣ Types of Errors in Machine Learning
In a classification problem, predictions fall into four categories:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
🔴 False Positive (Type I Error)
- Model predicts Positive, but the actual result is Negative
- Example: Email marked as Spam but it is actually Not Spam
🔵 False Negative (Type II Error)
- Model predicts Negative, but the actual result is Positive
- Example: Medical test says No Disease but the patient actually has it
These errors directly impact evaluation metrics.
2️⃣ Confusion Matrix (The Foundation)
A confusion matrix summarizes prediction results:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
All metrics are derived from this table.
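As a quick sketch (hypothetical toy labels, assuming scikit-learn is installed), you can pull these four counts straight from the true and predicted labels:

```python
from sklearn.metrics import confusion_matrix

# toy labels: 1 = Positive, 0 = Negative (hypothetical values for illustration)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# for 0/1 labels, sklearn orders the matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=3
```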
3️⃣ Accuracy
📌 Definition
Accuracy measures how often the model is correct.
📐 Formula
Accuracy = (TP + TN) / (TP + FP + FN + TN)
❗ Problem with Accuracy
Accuracy can be misleading in imbalanced datasets.
Example:
- 99 normal patients
- 1 patient with disease
- Model predicts No Disease for everyone
Accuracy = 99%, but the model is dangerous.
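Here is a minimal sketch of that exact scenario (hypothetical data, assuming scikit-learn is installed):

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 99 healthy patients, 1 patient with the disease
y_pred = [0] * 100        # model predicts "No Disease" for everyone

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks great on paper
print(recall_score(y_true, y_pred))    # 0.0  -> misses every sick patient
```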
👉 Accuracy alone is not enough.
4️⃣ Precision
📌 Definition
Precision answers:
Of all predicted positives, how many are actually positive?
📐 Formula
Precision = TP / (TP + FP)
🎯 When to focus on Precision?
When False Positives are costly.
Examples:
- Spam detection
- Fraud detection
You don’t want to wrongly flag legitimate cases.
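A small sketch with hypothetical spam-filter labels (assuming scikit-learn), where precision tells us how trustworthy the positive predictions are:

```python
from sklearn.metrics import precision_score

# hypothetical labels: 1 = Spam, 0 = Not Spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Of everything the filter flagged as spam, how much really was spam?
print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75
```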
5️⃣ Recall (Sensitivity)
📌 Definition
Recall answers:
Of all actual positives, how many did the model correctly identify?
📐 Formula
Recall = TP / (TP + FN)
🎯 When to focus on Recall?
When False Negatives are dangerous.
Examples:
- Cancer detection
- Accident detection
Missing a positive case can have severe consequences.
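And a matching sketch with hypothetical screening labels (again assuming scikit-learn), where recall tells us how many real positives the model actually caught:

```python
from sklearn.metrics import recall_score

# hypothetical labels: 1 = Disease, 0 = Healthy
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

# Of all patients who really have the disease, how many did we catch?
print(recall_score(y_true, y_pred))  # 2 TP / (2 TP + 2 FN) = 0.5
```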
6️⃣ Precision vs Recall Tradeoff
Increasing Precision often decreases Recall, and vice versa.
| Scenario | Priority |
|---|---|
| Spam Filter | Precision |
| Disease Detection | Recall |
| Fraud Detection | Recall |
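In practice this tradeoff is usually controlled by the decision threshold on the model's predicted probability. A minimal sketch with made-up probabilities (assuming scikit-learn and NumPy) shows precision rising and recall falling as the threshold goes up:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# hypothetical predicted probabilities from some classifier
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.95])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.57, recall=1.00
# threshold=0.5: precision=0.80, recall=1.00
# threshold=0.7: precision=1.00, recall=0.75
```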
This tradeoff leads us to F1 Score.
7️⃣ F1 Score
📌 Definition
F1 Score is the harmonic mean of Precision and Recall.
📐 Formula
F1 = 2 × (Precision × Recall) / (Precision + Recall)
✅ Why F1 Score?
- Balances Precision & Recall
- Best for imbalanced datasets
- Penalizes extreme values
If either Precision or Recall is low, F1 score drops sharply.
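A quick sketch of that last point (reusing the hypothetical spam-filter labels from earlier, assuming scikit-learn): because F1 is a harmonic mean, one weak component drags the score down far more than an arithmetic average would.

```python
from sklearn.metrics import f1_score

# same hypothetical labels as the precision example above
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75, since precision = recall = 0.75

# the harmonic mean drops sharply when the two disagree
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.9))  # 0.90
print(f1(0.9, 0.1))  # 0.18 -- far below the arithmetic mean of 0.5
```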
8️⃣ Summary Table
| Metric | Best Used When | Focus |
|---|---|---|
| Accuracy | Balanced data | Overall correctness |
| Precision | FP costly | Prediction quality |
| Recall | FN costly | Detection completeness |
| F1 Score | Imbalanced data | Balanced performance |
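If you want all of these at once, scikit-learn's classification_report prints per-class precision, recall, F1, and overall accuracy in a single table (same hypothetical labels as above):

```python
from sklearn.metrics import classification_report

# hypothetical labels reused from the earlier examples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

print(classification_report(y_true, y_pred))
```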
9️⃣ Real-World Case Studies
Understanding metrics becomes much clearer when we map them to real-world problems. Below are some common and interview-relevant case studies.
🏥 Case Study 1: Disease Detection (Cancer / COVID)
Scenario:
A model predicts whether a patient has a disease.
Critical Error: False Negative (FN)
- Predicting Healthy when the patient actually has the disease
Why Recall Matters More:
- Missing a sick patient can delay treatment and cost lives
- It is acceptable to have some false alarms (FPs), but not to miss real cases
✅ Primary Metric: Recall
In healthcare, we prioritize recall over precision.
💳 Case Study 2: Credit Card Fraud Detection
Scenario:
The model identifies fraudulent transactions.
Critical Error: False Negative (FN)
- Fraud transaction marked as legitimate
Tradeoff:
- Too many false positives annoy customers.
- Too many false negatives cause financial loss.
If we optimize only for Recall, even slightly unusual transactions get flagged as fraud, flooding customers with false alarms. If we optimize only for Precision, the model flags fraud only when it is very confident, and more real fraud slips through.
✅ Best Metric: F1 Score
F1 balances customer experience and fraud prevention.
📧 Case Study 3: Spam Email Detection
Scenario:
Classifying emails as Spam or Not Spam.
Critical Error: False Positive (FP)
- Important email marked as spam
Why Precision Matters:
- Users may miss critical emails (job offers, OTPs, invoices)
✅ Primary Metric: Precision
🚗 Case Study 4: Autonomous Driving (Pedestrian Detection)
Scenario:
Detecting pedestrians using camera and sensor data.
Critical Error: False Negative (FN)
- Pedestrian not detected
Why Recall is Crucial:
- Missing even one pedestrian can be fatal
✅ Primary Metric: Recall
🏭 Case Study 5: Manufacturing Defect Detection
Scenario:
Detecting defective products on an assembly line.
Critical Error Depends On:
- High FP → Waste and increased cost
- High FN → Faulty product reaches customer
Balanced Approach:
- Use Precision + Recall together
✅ Best Metric: F1 Score
🔚 Final Thoughts
Never blindly trust accuracy.
Always ask:
- What kind of error is more dangerous?
- Is my dataset imbalanced?
- What is the real-world cost of FP vs FN?
Understanding these metrics makes you a better ML engineer, not just a model builder.
If this helped you, feel free to share it or comment with your favorite ML pitfall!