Machine Learning models are often judged by numbers, but many beginners (and even practitioners) misunderstand what those numbers actually mean. A model showing 95% accuracy might still be useless in real-world scenarios.
In this post, we’ll break down:
- Types of errors in Machine Learning
- Confusion Matrix
- Accuracy
- Precision
- Recall
- F1 Score
All explained intuitively, with examples you can confidently use in interviews or apply in projects.
1️⃣ Types of Errors in Machine Learning
In a classification problem, predictions fall into four categories:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
🔴 False Positive (Type I Error)
- Model predicts Positive, but the actual result is Negative
- Example: an email is marked as Spam, but it is actually Not Spam
🔵 False Negative (Type II Error)
- Model predicts Negative, but the actual result is Positive
- Example: a medical test says No Disease, but the patient actually has the disease
These errors directly impact evaluation metrics.
2️⃣ Confusion Matrix (The Foundation)
A confusion matrix summarizes prediction results:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
All metrics are derived from this table.
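To make the four cells concrete, here is a minimal Python sketch; the labels are made up purely for illustration (1 = Positive, 0 = Negative):

```python
# Toy labels (invented for illustration): 1 = Positive, 0 = Negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions

# Count the four cells of the confusion matrix
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={tp}  FN={fn}  FP={fp}  TN={tn}")   # TP=3  FN=1  FP=1  TN=3
```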
3️⃣ Accuracy
📌 Definition
Accuracy measures how often the model is correct.
📐 Formula
Accuracy = (TP + TN) / (TP + FP + FN + TN)
❗ Problem with Accuracy
Accuracy can be misleading in imbalanced datasets.
Example:
- 99 normal patients
- 1 patient with disease
- Model predicts No Disease for everyone
Accuracy = 99%, but the model is dangerous.
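The same example as a quick sketch (the labels are hypothetical):

```python
# 99 healthy patients, 1 sick patient; the model predicts "No Disease" (0) for everyone
y_true = [0] * 99 + [1]
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)   # 0.99 -> looks great, yet the one sick patient is missed
```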
👉 Accuracy alone is not enough.
4️⃣ Precision
📌 Definition
Precision answers:
Of all predicted positives, how many are actually positive?
📐 Formula
Precision = TP / (TP + FP)
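A tiny worked example, reusing the toy counts (TP=3, FP=1) from the confusion-matrix sketch above:

```python
# Precision with the toy counts from the confusion-matrix sketch (TP=3, FP=1)
tp, fp = 3, 1
precision = tp / (tp + fp)
print(precision)   # 0.75 -> 75% of predicted positives were truly positive
```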
🎯 When to focus on Precision?
When False Positives are costly.
Examples:
- Spam detection (a legitimate email wrongly sent to spam)
- Fraud detection (a legitimate transaction wrongly blocked)
You don’t want to wrongly flag legitimate cases.
5️⃣ Recall (Sensitivity)
📌 Definition
Recall answers:
Of all actual positives, how many did the model correctly identify?
📐 Formula
Recall = TP / (TP + FN)
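The same toy counts (TP=3, FN=1) give:

```python
# Recall with the same toy counts (TP=3, FN=1)
tp, fn = 3, 1
recall = tp / (tp + fn)
print(recall)   # 0.75 -> 75% of actual positives were caught
```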
🎯 When to focus on Recall?
When False Negatives are dangerous.
Examples:
- Cancer detection
- Accident detection
Missing a positive case can have severe consequences.
6️⃣ Precision vs Recall Tradeoff
Increasing Precision often decreases Recall, and vice versa.
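One way to see this is to sweep the decision threshold over hypothetical predicted probabilities; a minimal sketch (the scores are invented for illustration):

```python
# Hypothetical predicted probabilities for 8 samples (the first 4 are actually Positive)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

def precision_recall(threshold):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

for th in (0.3, 0.5, 0.8):
    p, r = precision_recall(th)
    print(f"threshold={th}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.67, recall=1.00
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.8: precision=1.00, recall=0.50
```

Raising the threshold makes the model more conservative: precision climbs, recall falls.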
| Scenario | Priority |
|---|---|
| Spam Filter | Precision |
| Disease Detection | Recall |
| Fraud Detection | Recall |
This tradeoff leads us to F1 Score.
7️⃣ F1 Score
📌 Definition
F1 Score is the harmonic mean of Precision and Recall.
📐 Formula
F1 = 2 × (Precision × Recall) / (Precision + Recall)
✅ Why F1 Score?
- Balances Precision & Recall
- Best for imbalanced datasets
- Penalizes extreme values
If either Precision or Recall is low, F1 score drops sharply.
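A quick numeric check with the toy values from above, plus an extreme case to show how sharply F1 penalizes an unbalanced pair:

```python
# F1 with the toy values from the earlier sketches
precision, recall = 0.75, 0.75
print(2 * precision * recall / (precision + recall))   # 0.75

# If one side collapses, F1 collapses with it
precision, recall = 0.95, 0.10
print(round(2 * precision * recall / (precision + recall), 3))   # 0.181, far below the 0.525 arithmetic mean
```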
8️⃣ Summary Table
| Metric | Best Used When | Focus |
|---|---|---|
| Accuracy | Balanced data | Overall correctness |
| Precision | FP costly | Prediction quality |
| Recall | FN costly | Detection completeness |
| F1 Score | Imbalanced data | Balanced performance |
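If you use scikit-learn, all of these metrics come from one-line calls; a minimal sketch assuming scikit-learn is installed, with the same toy labels as before:

```python
# Assumes scikit-learn is installed; same toy labels as the earlier sketches
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))   # [[3 1]  rows = actual 0/1,
                                          #  [1 3]] cols = predicted 0/1
print(accuracy_score(y_true, y_pred))     # 0.75
print(precision_score(y_true, y_pred))    # 0.75
print(recall_score(y_true, y_pred))       # 0.75
print(f1_score(y_true, y_pred))           # 0.75
```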
9️⃣ Real-World Case Studies
Understanding metrics becomes much clearer when we map them to real-world problems. Below are some common and interview-relevant case studies.
🏥 Case Study 1: Disease Detection (Cancer / COVID)
Scenario:
A model predicts whether a patient has a disease.
Critical Error: False Negative (FN)
- Predicting Healthy when the patient actually has the disease
Why Recall Matters More:
- Missing a sick patient can delay treatment and cost lives
- It is acceptable to have some false alarms (FPs), but not to miss real cases
✅ Primary Metric: Recall
In healthcare, we prioritize recall over precision.
💳 Case Study 2: Credit Card Fraud Detection
Scenario:
The model identifies fraudulent transactions.
Critical Error: False Negative (FN)
- Fraud transaction marked as legitimate
Tradeoff:
- Too many false positives annoy customers
- Too many false negatives cause financial loss
✅ Best Metric: F1 Score
F1 balances customer experience and fraud prevention.
📧 Case Study 3: Spam Email Detection
Scenario:
Classifying emails as Spam or Not Spam.
Critical Error: False Positive (FP)
- Important email marked as spam
Why Precision Matters:
- Users may miss critical emails (job offers, OTPs, invoices)
✅ Primary Metric: Precision
🚗 Case Study 4: Autonomous Driving (Pedestrian Detection)
Scenario:
Detecting pedestrians using camera and sensor data.
Critical Error: False Negative (FN)
- Pedestrian not detected
Why Recall is Crucial:
- Missing even one pedestrian can be fatal
✅ Primary Metric: Recall
🏭 Case Study 5: Manufacturing Defect Detection
Scenario:
Detecting defective products on an assembly line.
Critical Error Depends On:
- High FP → Waste and increased cost
- High FN → Faulty product reaches customer
Balanced Approach:
- Use Precision + Recall together
✅ Best Metric: F1 Score
🔚 Final Thoughts
Never blindly trust accuracy.
Always ask:
- What kind of error is more dangerous?
- Is my dataset imbalanced?
- What is the real-world cost of FP vs FN?
Understanding these metrics makes you a better ML engineer, not just a model builder.
If this helped you, feel free to share it or comment with your favorite ML pitfall!