Machine Learning models are often judged by numbers, but many beginners (and even practitioners) misunderstand what those numbers actually mean. A model showing 95% accuracy might still be useless in real-world scenarios.
In this post, we’ll break down:
- Types of errors in Machine Learning
- Confusion Matrix
- Accuracy
- Precision
- Recall
- F1 Score
All explained intuitively, with examples you can confidently explain in interviews or apply in projects.
1️⃣ Types of Errors in Machine Learning
In a classification problem, predictions fall into four categories:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
🔴 False Positive (Type I Error)
- Model predicts Positive, but the actual result is Negative
- Example: Email marked as Spam but it is actually Not Spam
🔵 False Negative (Type II Error)
- Model predicts Negative, but the actual result is Positive
- Example: Medical test says No Disease but the patient actually has it
These errors directly impact evaluation metrics.
2️⃣ Confusion Matrix (The Foundation)
A confusion matrix summarizes prediction results:
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
All metrics are derived from this table.
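As a quick sketch (hypothetical toy labels, assuming scikit-learn is installed), you can pull these four counts straight from the true and predicted labels:

```python
from sklearn.metrics import confusion_matrix

# toy labels: 1 = Positive, 0 = Negative (hypothetical values for illustration)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# for 0/1 labels, sklearn orders the matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=3
```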
3️⃣ Accuracy
📌 Definition
Accuracy measures how often the model is correct.
📐 Formula
Accuracy = (TP + TN) / (TP + FP + FN + TN)
❗ Problem with Accuracy
Accuracy can be misleading in imbalanced datasets.
Example:
- 99 normal patients
- 1 patient with disease
- Model predicts No Disease for everyone
Accuracy = 99%, but the model is dangerous.
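Here is a minimal sketch of that exact scenario (hypothetical data, assuming scikit-learn is installed):

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 99 healthy patients, 1 patient with the disease
y_pred = [0] * 100        # model predicts "No Disease" for everyone

print(accuracy_score(y_true, y_pred))  # 0.99 -> looks great on paper
print(recall_score(y_true, y_pred))    # 0.0  -> misses every sick patient
```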
👉 Accuracy alone is not enough.
4️⃣ Precision
📌 Definition
Precision answers:
Of all predicted positives, how many are actually positive?
📐 Formula
Precision = TP / (TP + FP)
🎯 When to focus on Precision?
When False Positives are costly.
Examples:
- Spam detection
- Fraud detection
You don’t want to wrongly flag legitimate cases.
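A small sketch with hypothetical spam-filter labels (assuming scikit-learn), where precision tells us how trustworthy the positive predictions are:

```python
from sklearn.metrics import precision_score

# hypothetical labels: 1 = Spam, 0 = Not Spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# Of everything the filter flagged as spam, how much really was spam?
print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75
```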
5️⃣ Recall (Sensitivity)
📌 Definition
Recall answers:
Of all actual positives, how many did the model correctly identify?
📐 Formula
Recall = TP / (TP + FN)
🎯 When to focus on Recall?
When False Negatives are dangerous.
Examples:
- Cancer detection
- Accident detection
Missing a positive case can have severe consequences.
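And a matching sketch with hypothetical screening labels (again assuming scikit-learn), where recall tells us how many real positives the model actually caught:

```python
from sklearn.metrics import recall_score

# hypothetical labels: 1 = Disease, 0 = Healthy
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

# Of all patients who really have the disease, how many did we catch?
print(recall_score(y_true, y_pred))  # 2 TP / (2 TP + 2 FN) = 0.5
```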
6️⃣ Precision vs Recall Tradeoff
Increasing Precision often decreases Recall, and vice versa.
| Scenario | Priority |
|---|---|
| Spam Filter | Precision |
| Disease Detection | Recall |
| Fraud Detection | Recall |
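In practice this tradeoff is usually controlled by the decision threshold on the model's predicted probability. A minimal sketch with made-up probabilities (assuming scikit-learn and NumPy) shows precision rising and recall falling as the threshold goes up:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# hypothetical predicted probabilities from some classifier
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.3, 0.45, 0.5, 0.6, 0.7, 0.8, 0.95])

for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.57, recall=1.00
# threshold=0.5: precision=0.80, recall=1.00
# threshold=0.7: precision=1.00, recall=0.75
```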
This tradeoff leads us to F1 Score.
7️⃣ F1 Score
📌 Definition
F1 Score is the harmonic mean of Precision and Recall.
📐 Formula
F1 = 2 × (Precision × Recall) / (Precision + Recall)
✅ Why F1 Score?
- Balances Precision & Recall
- Best for imbalanced datasets
- Penalizes extreme values
If either Precision or Recall is low, F1 score drops sharply.
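A quick sketch of that last point (reusing the hypothetical spam-filter labels from earlier, assuming scikit-learn): because F1 is a harmonic mean, one weak component drags the score down far more than an arithmetic average would.

```python
from sklearn.metrics import f1_score

# same hypothetical labels as the precision example above
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75, since precision = recall = 0.75

# the harmonic mean drops sharply when the two disagree
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.9))  # 0.90
print(f1(0.9, 0.1))  # 0.18 -- far below the arithmetic mean of 0.5
```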
8️⃣ Summary Table
| Metric | Best Used When | Focus |
|---|---|---|
| Accuracy | Balanced data | Overall correctness |
| Precision | FP costly | Prediction quality |
| Recall | FN costly | Detection completeness |
| F1 Score | Imbalanced data | Balanced performance |
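If you want all of these at once, scikit-learn's classification_report prints per-class precision, recall, F1, and overall accuracy in a single table (same hypothetical labels as above):

```python
from sklearn.metrics import classification_report

# hypothetical labels reused from the earlier examples
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

print(classification_report(y_true, y_pred))
```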
9️⃣ Real-World Case Studies
Understanding metrics becomes much clearer when we map them to real-world problems. Below are some common and interview-relevant case studies.
🏥 Case Study 1: Disease Detection (Cancer / COVID)
Scenario:
A model predicts whether a patient has a disease.
Critical Error: False Negative (FN)
- Predicting Healthy when the patient actually has the disease
Why Recall Matters More:
- Missing a sick patient can delay treatment and cost lives
- It is acceptable to have some false alarms (FPs), but not to miss real cases
✅ Primary Metric: Recall
In healthcare, we prioritize recall over precision.
💳 Case Study 2: Credit Card Fraud Detection
Scenario:
The model identifies fraudulent transactions.
Critical Error: False Negative (FN)
- Fraud transaction marked as legitimate
Tradeoff:
- Too many false positives annoy customers.
- Too many false negatives cause financial loss.
If we optimize only for Recall, even slightly unusual transactions get flagged as fraud, flooding customers with false alarms. If we optimize only for Precision, the model flags fraud only when it is very confident, and more real fraud slips through.
✅ Best Metric: F1 Score
F1 balances customer experience and fraud prevention.
📧 Case Study 3: Spam Email Detection
Scenario:
Classifying emails as Spam or Not Spam.
Critical Error: False Positive (FP)
- Important email marked as spam
Why Precision Matters:
- Users may miss critical emails (job offers, OTPs, invoices)
✅ Primary Metric: Precision
🚗 Case Study 4: Autonomous Driving (Pedestrian Detection)
Scenario:
Detecting pedestrians using camera and sensor data.
Critical Error: False Negative (FN)
- Pedestrian not detected
Why Recall is Crucial:
- Missing even one pedestrian can be fatal
✅ Primary Metric: Recall
🏭 Case Study 5: Manufacturing Defect Detection
Scenario:
Detecting defective products on an assembly line.
Critical Error Depends On:
- High FP → Waste and increased cost
- High FN → Faulty product reaches customer
Balanced Approach:
- Use Precision + Recall together
✅ Best Metric: F1 Score
🔚 Final Thoughts
Never blindly trust accuracy.
Always ask:
- What kind of error is more dangerous?
- Is my dataset imbalanced?
- What is the real-world cost of FP vs FN?
Understanding these metrics makes you a better ML engineer, not just a model builder.
If this helped you, feel free to share it or comment with your favorite ML pitfall!