Ananya S

Understanding Errors in Machine Learning: Accuracy, Precision, Recall & F1 Score

Machine Learning models are often judged by numbers, but many beginners (and even practitioners) misunderstand what those numbers actually mean. A model showing 95% accuracy might still be useless in real-world scenarios.

In this post, we’ll break down:

  • Types of errors in Machine Learning
  • Confusion Matrix
  • Accuracy
  • Precision
  • Recall
  • F1 Score

Everything is explained intuitively, with examples you can use confidently in interviews or apply in your projects.


1️⃣ Types of Errors in Machine Learning

In a classification problem, predictions fall into four categories:

| Actual \ Predicted | Positive | Negative |
| --- | --- | --- |
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |

🔴 False Positive (Type I Error)

  • Model predicts Positive, but the actual result is Negative
  • Example: Email marked as Spam but it is actually Not Spam

🔵 False Negative (Type II Error)

  • Model predicts Negative, but the actual result is Positive
  • Example: Medical test says No Disease but the patient actually has it

These errors directly impact evaluation metrics.


2️⃣ Confusion Matrix (The Foundation)

A confusion matrix summarizes prediction results:

                Predicted
               P       N
Actual P      TP      FN
Actual N      FP      TN

All metrics are derived from this table.
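
If you want to see these four counts in code, here is a minimal sketch using scikit-learn (assuming it is installed); the labels below are made up purely for illustration:

```python
# Minimal sketch: computing TP, FP, FN, TN with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (1 = Positive, 0 = Negative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

# For labels [0, 1], scikit-learn orders the matrix as [[TN, FP], [FN, TP]],
# so ravel() returns the four counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=3
```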


3️⃣ Accuracy

📌 Definition

Accuracy measures how often the model is correct.

📐 Formula

Accuracy = (TP + TN) / (TP + FP + FN + TN)

❗ Problem with Accuracy

Accuracy can be misleading in imbalanced datasets.

Example:

  • 99 normal patients
  • 1 patient with disease
  • Model predicts No Disease for everyone

Accuracy = 99%, but the model is dangerous.

👉 Accuracy alone is not enough.
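
Here is that example as a quick sketch in code (assuming scikit-learn is installed); the "model" simply predicts No Disease for everyone:

```python
# 99 healthy patients, 1 sick patient, and a model that always predicts No Disease.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 0 = No Disease, 1 = Disease
y_pred = [0] * 100        # the model predicts No Disease for every patient

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks excellent
print(recall_score(y_true, y_pred))    # 0.0  -> the one sick patient is missed
```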


4️⃣ Precision

📌 Definition

Precision answers:

Of all predicted positives, how many are actually positive?

📐 Formula

Precision = TP / (TP + FP)

🎯 When to focus on Precision?

When False Positives are costly.

Examples:

  • Spam detection
  • Fraud detection (when blocking legitimate transactions is the main concern)

You don’t want to wrongly flag legitimate cases.
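
As a tiny illustration, with hypothetical spam-filter counts:

```python
# Hypothetical counts, purely for illustration.
tp = 90   # spam emails correctly flagged as spam
fp = 10   # legitimate emails wrongly flagged as spam

precision = tp / (tp + fp)
print(precision)  # 0.9 -> 1 in 10 "spam" flags is actually a legitimate email
```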


5️⃣ Recall (Sensitivity)

📌 Definition

Recall answers:

Of all actual positives, how many did the model correctly identify?

📐 Formula

Recall = TP / (TP + FN)

🎯 When to focus on Recall?

When False Negatives are dangerous.

Examples:

  • Cancer detection
  • Accident detection

Missing a positive case can have severe consequences.
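
Again as a tiny illustration, with hypothetical screening counts:

```python
# Hypothetical counts, purely for illustration.
tp = 80   # sick patients correctly identified
fn = 20   # sick patients the model missed

recall = tp / (tp + fn)
print(recall)  # 0.8 -> 1 in 5 sick patients is sent home undetected
```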


6️⃣ Precision vs Recall Tradeoff

Increasing Precision often decreases Recall, and vice versa.

| Scenario | Priority |
| --- | --- |
| Spam Filter | Precision |
| Disease Detection | Recall |
| Fraud Detection | F1 Score (both errors are costly) |
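
One way to see the tradeoff in action is to vary the decision threshold on predicted probabilities. The scores and labels below are made up purely for illustration (and assume numpy and scikit-learn are installed):

```python
# Raising the threshold makes the model more conservative about predicting Positive:
# precision tends to go up, recall goes down.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
scores = np.array([0.35, 0.05, 0.6, 0.15, 0.75, 0.4, 0.85, 0.55, 0.95, 0.25])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    p = precision_score(y_true, y_pred, zero_division=0)
    r = recall_score(y_true, y_pred, zero_division=0)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```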

This tradeoff leads us to F1 Score.


7️⃣ F1 Score

📌 Definition

F1 Score is the harmonic mean of Precision and Recall.

📐 Formula

F1 = 2 × (Precision × Recall) / (Precision + Recall)

✅ Why F1 Score?

  • Balances Precision & Recall
  • Best for imbalanced datasets
  • Penalizes extreme values

If either Precision or Recall is low, F1 score drops sharply.
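
A quick illustration of why the harmonic mean is stricter than a simple average (the values are made up):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.9))     # 0.90 -> balanced precision and recall, F1 stays high
print(f1(0.9, 0.1))     # 0.18 -> one weak metric drags F1 down sharply
print((0.9 + 0.1) / 2)  # 0.50 -> an arithmetic mean would hide the problem
```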


8️⃣ Summary Table

| Metric | Best Used When | Focus |
| --- | --- | --- |
| Accuracy | Balanced data | Overall correctness |
| Precision | False Positives are costly | Prediction quality |
| Recall | False Negatives are costly | Detection completeness |
| F1 Score | Imbalanced data | Balanced performance |
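
If you use scikit-learn, classification_report prints precision, recall, and F1 for each class in one call (the labels below are just illustrative):

```python
from sklearn.metrics import classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
```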

9️⃣ Real-World Case Studies

Understanding metrics becomes much clearer when we map them to real-world problems. Below are some common and interview-relevant case studies.


🏥 Case Study 1: Disease Detection (Cancer / COVID)

Scenario:
A model predicts whether a patient has a disease.

Critical Error: False Negative (FN)

  • Predicting Healthy when the patient actually has the disease

Why Recall Matters More:

  • Missing a sick patient can delay treatment and cost lives
  • It is acceptable to have some false alarms (FPs), but not to miss real cases

Primary Metric: Recall

In healthcare, we prioritize recall over precision.


💳 Case Study 2: Credit Card Fraud Detection

Scenario:
The model identifies fraudulent transactions.

Critical Error: False Negative (FN)

  • Fraud transaction marked as legitimate

Tradeoff:

  • Too many false positives annoy customers
  • Too many false negatives cause financial loss

Best Metric: F1 Score

F1 balances customer experience and fraud prevention.


📧 Case Study 3: Spam Email Detection

Scenario:
Classifying emails as Spam or Not Spam.

Critical Error: False Positive (FP)

  • Important email marked as spam

Why Precision Matters:

  • Users may miss critical emails (job offers, OTPs, invoices)

Primary Metric: Precision


🚗 Case Study 4: Autonomous Driving (Pedestrian Detection)

Scenario:
Detecting pedestrians using camera and sensor data.

Critical Error: False Negative (FN)

  • Pedestrian not detected

Why Recall is Crucial:

  • Missing even one pedestrian can be fatal

Primary Metric: Recall


🏭 Case Study 5: Manufacturing Defect Detection

Scenario:
Detecting defective products on an assembly line.

Critical Error Depends On:

  • High FP → Waste and increased cost
  • High FN → Faulty product reaches customer

Balanced Approach:

  • Use Precision + Recall together

Best Metric: F1 Score


🔚 Final Thoughts

Never blindly trust accuracy.
Always ask:

  • What kind of error is more dangerous?
  • Is my dataset imbalanced?
  • What is the real-world cost of FP vs FN?

Understanding these metrics makes you a better ML engineer, not just a model builder.


If this helped you, feel free to share it or comment with your favorite ML pitfall!
