The One-Line Summary: A confusion matrix is a table (2×2 for a yes/no classifier) showing exactly HOW your model is right and HOW it's wrong — distinguishing between "said yes correctly," "said yes incorrectly," "said no correctly," and "said no incorrectly." Each type of error has different consequences.
The Four Verdicts of Justice
Judge Harrison had presided over 1,000 criminal trials in her career.
Every trial ended in one of four ways:
Verdict 1: Guilty Person Convicted ✓
Reality: GUILTY
Verdict: GUILTY
Outcome: Justice served. Criminal behind bars.
Name: TRUE POSITIVE (TP)
The system worked. A guilty person was correctly identified as guilty.
Verdict 2: Innocent Person Acquitted ✓
Reality: INNOCENT
Verdict: NOT GUILTY
Outcome: Justice served. Free person stays free.
Name: TRUE NEGATIVE (TN)
The system worked. An innocent person was correctly identified as innocent.
Verdict 3: Innocent Person Convicted ✗
Reality: INNOCENT
Verdict: GUILTY
Outcome: DISASTER. Innocent person in prison.
Name: FALSE POSITIVE (FP)
Also called: "Type I Error"
Legal term: "Wrongful conviction"
The system failed catastrophically. An innocent person was wrongly condemned.
Verdict 4: Guilty Person Acquitted ✗
Reality: GUILTY
Verdict: NOT GUILTY
Outcome: FAILURE. Criminal walks free.
Name: FALSE NEGATIVE (FN)
Also called: "Type II Error"
Legal term: "Guilty person escapes justice"
The system failed. A guilty person escaped punishment.
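If you prefer seeing the four verdicts in code, here's a tiny sketch (the helper name is mine, not a standard function) that maps an actual/predicted pair to its verdict type:
def verdict_type(actual: int, predicted: int) -> str:
    """Map an (actual, predicted) pair to its confusion-matrix cell (1 = guilty, 0 = innocent)."""
    if actual == 1 and predicted == 1:
        return "TP"   # guilty person convicted
    if actual == 0 and predicted == 0:
        return "TN"   # innocent person acquitted
    if actual == 0 and predicted == 1:
        return "FP"   # wrongful conviction (Type I error)
    return "FN"       # guilty person escaped (Type II error)

print(verdict_type(actual=0, predicted=1))  # FP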
Judge Harrison's Career Summary
After 1,000 trials:
                    ACTUAL STATUS
                 Guilty     Innocent
              ┌───────────┬───────────┐
      Guilty  │    180    │    30     │
VERDICT       │    TP     │    FP     │
              │ (Correct) │ (Wrongful │
              │           │conviction)│
              ├───────────┼───────────┤
  Not Guilty  │    20     │    770    │
              │    FN     │    TN     │
              │ (Criminal │ (Correct  │
              │  escaped) │ acquittal)│
              └───────────┴───────────┘
This is a confusion matrix.
It shows EXACTLY how the judge performed — not just "right vs wrong" but WHICH kind of right and WHICH kind of wrong.
Anatomy of a Confusion Matrix
                  ACTUAL CLASS
              Positive     Negative
           ┌────────────┬────────────┐
           │            │            │
 Positive  │     TP     │     FP     │
           │            │            │
PREDICTED  │   "Hit"    │   "False   │
CLASS      │            │   Alarm"   │
           ├────────────┼────────────┤
           │            │            │
 Negative  │     FN     │     TN     │
           │            │            │
           │   "Miss"   │  "Correct  │
           │            │ Rejection" │
           └────────────┴────────────┘
The Four Cells:
| Cell | Name | Meaning | Good or Bad? |
|---|---|---|---|
| TP | True Positive | Predicted YES, was YES | ✓ GOOD |
| TN | True Negative | Predicted NO, was NO | ✓ GOOD |
| FP | False Positive | Predicted YES, was NO | ✗ BAD (False alarm) |
| FN | False Negative | Predicted NO, was YES | ✗ BAD (Missed it) |
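Before reaching for a library, it helps to see that these four cells are just tallies over (actual, predicted) pairs. A minimal sketch with made-up labels (1 = positive, 0 = negative):
y_actual = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred   = [1, 0, 0, 1, 1, 0, 0, 0]

# Count each cell by checking every (actual, predicted) pair
TP = sum(1 for a, p in zip(y_actual, y_pred) if a == 1 and p == 1)
TN = sum(1 for a, p in zip(y_actual, y_pred) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(y_actual, y_pred) if a == 0 and p == 1)
FN = sum(1 for a, p in zip(y_actual, y_pred) if a == 1 and p == 0)

print(TP, TN, FP, FN)  # 2 3 1 2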
Reading the Matrix: Step by Step
Let's decode Judge Harrison's matrix:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
# Judge Harrison's 1000 trials
# Actual: 200 guilty (1), 800 innocent (0)
# Predicted verdicts
y_actual = [1]*200 + [0]*800
y_verdict = [1]*180 + [0]*20 + [1]*30 + [0]*770
# Create confusion matrix
cm = confusion_matrix(y_actual, y_verdict)
print("Confusion Matrix:")
print(cm)
Output:
Confusion Matrix:
[[770  30]
 [ 20 180]]
Wait, that looks different! Let me explain the orientation.
Orientation Matters!
Scikit-learn arranges it as:
                PREDICTED
           Neg (0)    Pos (1)
         ┌──────────┬──────────┐
 Neg (0) │    TN    │    FP    │
ACTUAL   │   770    │    30    │
         ├──────────┼──────────┤
 Pos (1) │    FN    │    TP    │
         │    20    │   180    │
         └──────────┴──────────┘
Reading guide:
- Top-left (770): Actual Innocent, Predicted Innocent → TN (Correct acquittal)
- Top-right (30): Actual Innocent, Predicted Guilty → FP (Wrongful conviction!)
- Bottom-left (20): Actual Guilty, Predicted Innocent → FN (Criminal escaped!)
- Bottom-right (180): Actual Guilty, Predicted Guilty → TP (Justice served)
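A handy trick once you know the layout: ravel() flattens the 2×2 matrix row by row, so you can unpack the four values by name (reusing the cm computed above):
# ravel() returns [TN, FP, FN, TP] for binary labels (0, 1)
TN, FP, FN, TP = cm.ravel()
print(f"TN={TN}, FP={FP}, FN={FN}, TP={TP}")
# TN=770, FP=30, FN=20, TP=180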
Visualizing the Confusion Matrix
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# The confusion matrix
cm = np.array([[770, 30],
[20, 180]])
# Create visualization
fig, ax = plt.subplots(figsize=(8, 6))
# Plot heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Innocent\n(Predicted)', 'Guilty\n(Predicted)'],
            yticklabels=['Innocent\n(Actual)', 'Guilty\n(Actual)'],
            annot_kws={'size': 16}, ax=ax)
# Add labels for each cell
cell_labels = [['TN\n(Correct Acquittal)', 'FP\n(Wrongful Conviction)'],
               ['FN\n(Criminal Escaped)', 'TP\n(Justice Served)']]
for i in range(2):
    for j in range(2):
        ax.text(j + 0.5, i + 0.75, cell_labels[i][j],
                ha='center', va='center', fontsize=9, color='gray')
plt.title('Judge Harrison\'s Confusion Matrix\n(1,000 Trials)', fontsize=14)
plt.ylabel('ACTUAL', fontsize=12)
plt.xlabel('PREDICTED', fontsize=12)
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=150)
plt.show()
Deriving ALL Metrics from the Matrix
The confusion matrix is the source of truth. Every metric comes from it:
# The four values
TN, FP = 770, 30
FN, TP = 20, 180
# Total
total = TN + FP + FN + TP # 1000
# === ACCURACY ===
# How often was the judge correct overall?
accuracy = (TP + TN) / total
print(f"Accuracy: {accuracy:.1%}") # 95.0%
# === PRECISION ===
# When the judge said "guilty," how often was the person actually guilty?
precision = TP / (TP + FP)
print(f"Precision: {precision:.1%}") # 85.7%
# === RECALL (Sensitivity) ===
# Of all the guilty people, how many did the judge correctly convict?
recall = TP / (TP + FN)
print(f"Recall: {recall:.1%}") # 90.0%
# === SPECIFICITY ===
# Of all the innocent people, how many did the judge correctly acquit?
specificity = TN / (TN + FP)
print(f"Specificity: {specificity:.1%}") # 96.3%
# === F1 SCORE ===
# Harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
print(f"F1 Score: {f1:.1%}") # 87.8%
# === FALSE POSITIVE RATE ===
# Of all innocent people, how many were wrongly convicted?
fpr = FP / (FP + TN)
print(f"False Positive Rate: {fpr:.1%}") # 3.8%
# === FALSE NEGATIVE RATE ===
# Of all guilty people, how many escaped justice?
fnr = FN / (FN + TP)
print(f"False Negative Rate: {fnr:.1%}") # 10.0%
Output:
Accuracy: 95.0%
Precision: 85.7%
Recall: 90.0%
Specificity: 96.3%
F1 Score: 87.8%
False Positive Rate: 3.8%
False Negative Rate: 10.0%
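As a sanity check, sklearn's built-in metric functions give the same numbers (reusing the y_actual and y_verdict lists from the earlier code block):
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print(f"Accuracy:  {accuracy_score(y_actual, y_verdict):.1%}")   # 95.0%
print(f"Precision: {precision_score(y_actual, y_verdict):.1%}")  # 85.7%
print(f"Recall:    {recall_score(y_actual, y_verdict):.1%}")     # 90.0%
print(f"F1 Score:  {f1_score(y_actual, y_verdict):.1%}")         # 87.8%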
The Visual Cheat Sheet
                     ACTUAL
              Positive     Negative
           ┌────────────┬────────────┐
           │            │            │
 Positive  │     TP     │     FP     │──►  Precision = TP/(TP+FP)
           │            │            │     "When I say YES, am I right?"
PREDICTED  ├────────────┼────────────┤
           │            │            │
 Negative  │     FN     │     TN     │
           │            │            │
           └────────────┴────────────┘
                 │            │
                 ▼            ▼
               Recall      Specificity
             TP/(TP+FN)    TN/(TN+FP)
            "Did I find   "Did I correctly
             all YES?"     reject all NO?"
Accuracy = (TP + TN) / Total = Diagonal / Everything
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Why Both Errors Matter (Differently)
Let's compare two judges:
Judge A: "Better Safe Than Sorry"
Convicts cautiously. Would rather let guilty go free than convict innocent.
                  PREDICTED
             Innocent    Guilty
           ┌──────────┬──────────┐
 Innocent  │   790    │    10    │  ← Only 10 wrongful convictions!
ACTUAL     ├──────────┼──────────┤
 Guilty    │   100    │   100    │  ← But 100 criminals escaped!
           └──────────┴──────────┘
Accuracy: 89%
Precision: 90.9% (When guilty verdict, usually correct)
Recall: 50% (Only half of criminals caught!)
Judge B: "Zero Tolerance"
Convicts aggressively. Would rather wrongly convict than let criminal escape.
                  PREDICTED
             Innocent    Guilty
           ┌──────────┬──────────┐
 Innocent  │   650    │   150    │  ← 150 wrongful convictions!
ACTUAL     ├──────────┼──────────┤
 Guilty    │    10    │   190    │  ← Only 10 criminals escaped!
           └──────────┴──────────┘
Accuracy: 84%
Precision: 55.9% (Many guilty verdicts are wrong!)
Recall: 95% (Almost all criminals caught!)
Same job. Different philosophies. Different errors.
| Metric | Judge A | Judge B | What it means |
|---|---|---|---|
| Accuracy | 89% | 84% | Overall correctness |
| Precision | 90.9% | 55.9% | Trust in guilty verdict |
| Recall | 50% | 95% | Criminals caught |
| FP (Wrongful) | 10 | 150 | Innocent in prison |
| FN (Escaped) | 100 | 10 | Criminals free |
Which is better? Depends on your values!
- Criminal justice system: "Better 10 guilty go free than 1 innocent suffer" → Judge A
- Airport security: "Can't let any threat through" → Judge B philosophy
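Want to verify those numbers yourself? Here's a short sketch that derives each judge's metrics straight from their matrices (the dictionary layout is just for illustration):
import numpy as np

# sklearn layout: rows = actual (innocent, guilty), columns = predicted
judges = {
    "Judge A (cautious)":   np.array([[790, 10], [100, 100]]),
    "Judge B (aggressive)": np.array([[650, 150], [10, 190]]),
}

for name, cm in judges.items():
    TN, FP, FN, TP = cm.ravel()
    accuracy = (TP + TN) / cm.sum()
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    print(f"{name}: accuracy={accuracy:.0%}, precision={precision:.1%}, recall={recall:.1%}")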
Multi-Class Confusion Matrices
Real problems often have more than 2 classes:
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
# Animal classifier: Cat vs Dog vs Bird
y_true = ['cat']*100 + ['dog']*100 + ['bird']*100
y_pred = (['cat']*85 + ['dog']*10 + ['bird']*5 + # True cats
['cat']*15 + ['dog']*80 + ['bird']*5 + # True dogs
['cat']*5 + ['dog']*10 + ['bird']*85) # True birds
# Create confusion matrix
labels = ['cat', 'dog', 'bird']
cm = confusion_matrix(y_true, y_pred, labels=labels)
print("Confusion Matrix:")
print(cm)
print("\nClassification Report:")
print(classification_report(y_true, y_pred, labels=labels))
Output:
Confusion Matrix:
[[85 10  5]
 [15 80  5]
 [ 5 10 85]]

Classification Report:
              precision    recall  f1-score   support

         cat       0.81      0.85      0.83       100
         dog       0.80      0.80      0.80       100
        bird       0.89      0.85      0.87       100

    accuracy                           0.83       300
   macro avg       0.83      0.83      0.83       300
weighted avg       0.83      0.83      0.83       300
Reading the Multi-Class Matrix
                 PREDICTED
           Cat      Dog      Bird
        ┌────────┬────────┬────────┐
 Cat    │   85   │   10   │   5    │  ← 85 cats correct
        │        │        │        │    10 cats called dogs
ACTUAL  │        │        │        │    5 cats called birds
        ├────────┼────────┼────────┤
 Dog    │   15   │   80   │   5    │  ← 15 dogs called cats
        │        │        │        │    80 dogs correct
        │        │        │        │    5 dogs called birds
        ├────────┼────────┼────────┤
 Bird   │   5    │   10   │   85   │  ← 5 birds called cats
        │        │        │        │    10 birds called dogs
        │        │        │        │    85 birds correct
        └────────┴────────┴────────┘
Diagonal = Correct predictions
Off-diagonal = Errors (which class confused with which)
Insight: Dogs and cats get confused with each other more than with birds!
(15 + 10 cat-dog confusions vs 5 + 5 bird-cat, 5 + 10 bird-dog)
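That kind of insight is easy to automate. Here's a minimal sketch (reusing the 3×3 matrix and labels from above) that ranks class pairs by how often they're confused in either direction:
import numpy as np

cm = np.array([[85, 10, 5],
               [15, 80, 5],
               [5, 10, 85]])
labels = ['cat', 'dog', 'bird']

# Sum confusion in both directions for every pair of classes
pairs = []
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        pairs.append((cm[i, j] + cm[j, i], labels[i], labels[j]))

for total, a, b in sorted(pairs, reverse=True):
    print(f"{a} <-> {b}: {total} confusions")
# cat <-> dog: 25 confusions
# dog <-> bird: 15 confusions
# cat <-> bird: 10 confusions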
Visualizing Multi-Class
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Confusion matrix
cm = np.array([[85, 10, 5],
[15, 80, 5],
[5, 10, 85]])
labels = ['Cat', 'Dog', 'Bird']
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=labels, yticklabels=labels,
annot_kws={'size': 14})
plt.title('Animal Classifier Confusion Matrix', fontsize=14)
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
# Add percentage annotations
for i in range(3):
    for j in range(3):
        total_actual = cm[i].sum()
        pct = cm[i, j] / total_actual * 100
        plt.text(j + 0.5, i + 0.7, f'({pct:.0f}%)',
                 ha='center', va='center', fontsize=9, color='gray')
plt.tight_layout()
plt.savefig('multiclass_cm.png', dpi=150)
plt.show()
Normalized Confusion Matrices
Raw counts can be misleading with imbalanced classes. Normalize!
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Imbalanced: 900 cats, 100 dogs
y_true = ['cat']*900 + ['dog']*100
y_pred = ['cat']*850 + ['dog']*50 + ['cat']*30 + ['dog']*70
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Raw counts
cm = confusion_matrix(y_true, y_pred, labels=['cat', 'dog'])
ConfusionMatrixDisplay(cm, display_labels=['cat', 'dog']).plot(ax=axes[0], cmap='Blues')
axes[0].set_title('Raw Counts')
# Normalized by TRUE class (rows sum to 1)
cm_recall = confusion_matrix(y_true, y_pred, labels=['cat', 'dog'], normalize='true')
ConfusionMatrixDisplay(cm_recall, display_labels=['cat', 'dog']).plot(ax=axes[1], cmap='Blues', values_format='.2f')
axes[1].set_title('Normalized by Actual\n(Recall per class)')
# Normalized by PREDICTED class (columns sum to 1)
cm_precision = confusion_matrix(y_true, y_pred, labels=['cat', 'dog'], normalize='pred')
ConfusionMatrixDisplay(cm_precision, display_labels=['cat', 'dog']).plot(ax=axes[2], cmap='Blues', values_format='.2f')
axes[2].set_title('Normalized by Predicted\n(Precision per class)')
plt.tight_layout()
plt.savefig('normalized_cm.png', dpi=150)
plt.show()
Normalization options:
| Normalize | Diagonal is | What it shows |
|---|---|---|
| 'true' (rows) | Recall per class | How many of actual class X did we find? |
| 'pred' (cols) | Precision per class | How many predicted X were correct? |
| 'all' | Proportion of total | Percentage of all predictions |
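Under the hood, normalize='true' isn't magic: it just divides each row by its row sum. A quick sketch of the NumPy equivalent (reusing y_true and y_pred from the imbalanced example above):
# Divide each row by its row sum (same as normalize='true')
cm = confusion_matrix(y_true, y_pred, labels=['cat', 'dog'])
cm_by_row = cm / cm.sum(axis=1, keepdims=True)
print(cm_by_row)
# Roughly [[0.944, 0.056],
#          [0.300, 0.700]]  <- per-class recall sits on the diagonal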
What the Matrix Reveals
Pattern 1: Diagonal Dominance = Good
┌─────┬─────┬─────┐
│ 95  │  3  │  2  │
├─────┼─────┼─────┤
│  4  │ 92  │  4  │
├─────┼─────┼─────┤
│  1  │  5  │ 94  │
└─────┴─────┴─────┘
Strong diagonal = model correctly classifies most samples
Pattern 2: One Row is Scattered = Class is Hard to Classify
┌─────┬─────┬─────┐
│ 90  │  5  │  5  │  ← Class A is well-classified
├─────┼─────┼─────┤
│ 30  │ 40  │ 30  │  ← Class B is confused with everything!
├─────┼─────┼─────┤
│  5  │ 10  │ 85  │  ← Class C is okay
└─────┴─────┴─────┘
Class B needs: more data, better features, or is inherently ambiguous
Pattern 3: Symmetric Off-Diagonal = Mutual Confusion
┌─────┬─────┬─────┐
│ 70  │ 25  │  5  │
├─────┼─────┼─────┤
│ 22  │ 73  │  5  │  ← A and B confuse each other!
├─────┼─────┼─────┤
│  3  │  2  │ 95  │
└─────┴─────┴─────┘
A↔B confusion suggests: similar features, need better discrimination
Pattern 4: Asymmetric = One-Way Confusion
┌─────┬─────┐
│ 90  │ 10  │  ← Some A predicted as B
├─────┼─────┤
│  2  │ 98  │  ← Almost no B predicted as A
└─────┴─────┘
B "steals" from A, but A doesn't steal from B
Maybe: B is more "general" or has broader features
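You can screen for these patterns programmatically too. A rough sketch (the helper name and threshold are my own choices) that flags classes whose diagonal value, i.e. per-class recall, is low:
import numpy as np

def flag_weak_classes(cm, labels, recall_threshold=0.7):
    """Flag classes whose per-class recall (diagonal / row sum) is below the threshold."""
    recalls = np.diag(cm) / cm.sum(axis=1)
    for label, r in zip(labels, recalls):
        status = "OK" if r >= recall_threshold else "NEEDS ATTENTION"
        print(f"{label}: recall {r:.0%} -> {status}")

# Pattern 2 example: class B is scattered across the other classes
cm = np.array([[90, 5, 5],
               [30, 40, 30],
               [5, 10, 85]])
flag_weak_classes(cm, ['A', 'B', 'C'])
# A: recall 90% -> OK
# B: recall 40% -> NEEDS ATTENTION
# C: recall 85% -> OK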
Complete Code: Confusion Matrix Analysis
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
def analyze_confusion_matrix(y_true, y_pred, labels=None, title="Confusion Matrix"):
    """
    Complete confusion matrix analysis with visualization.

    `labels` is used for display names only (in sorted class order);
    the matrix itself is computed from the raw values in y_true / y_pred.
    """
    # Create confusion matrix (rows = actual, columns = predicted).
    # Computing it from the raw values avoids a mismatch when the display
    # names differ from the label values (e.g. iris.target_names vs 0/1/2).
    cm = confusion_matrix(y_true, y_pred)

    # Calculate metrics for binary classification
    if len(cm) == 2:
        TN, FP, FN, TP = cm.ravel()
        print("=" * 50)
        print("CONFUSION MATRIX ANALYSIS")
        print("=" * 50)
        print("\nRaw Matrix:")
        print(f"  TN={TN}, FP={FP}")
        print(f"  FN={FN}, TP={TP}")
        print("\nDerived Metrics:")
        print(f"  Accuracy:    {(TP+TN)/(TP+TN+FP+FN):.1%}")
        print(f"  Precision:   {TP/(TP+FP) if (TP+FP) > 0 else 0:.1%}")
        print(f"  Recall:      {TP/(TP+FN) if (TP+FN) > 0 else 0:.1%}")
        print(f"  Specificity: {TN/(TN+FP) if (TN+FP) > 0 else 0:.1%}")
        print(f"  F1 Score:    {2*TP/(2*TP+FP+FN) if (2*TP+FP+FN) > 0 else 0:.1%}")
        print("\nError Analysis:")
        print(f"  False Positives: {FP} (Type I Error)")
        print(f"  False Negatives: {FN} (Type II Error)")

    # Full classification report for any number of classes
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred, target_names=labels))

    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Raw counts
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=labels, yticklabels=labels, ax=axes[0])
    axes[0].set_title(f'{title}\n(Raw Counts)')
    axes[0].set_ylabel('Actual')
    axes[0].set_xlabel('Predicted')

    # Normalized by row (recall)
    cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    sns.heatmap(cm_normalized, annot=True, fmt='.2f', cmap='Blues',
                xticklabels=labels, yticklabels=labels, ax=axes[1])
    axes[1].set_title(f'{title}\n(Normalized by Actual)')
    axes[1].set_ylabel('Actual')
    axes[1].set_xlabel('Predicted')

    plt.tight_layout()
    plt.savefig('cm_analysis.png', dpi=150)
    plt.show()

    return cm
# Example usage with Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.3, random_state=42
)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = analyze_confusion_matrix(
y_test, y_pred,
labels=iris.target_names,
title="Iris Classifier"
)
Common Mistakes
Mistake 1: Misreading Row vs Column
# The confusion matrix in sklearn:
# - Rows = ACTUAL (true) class
# - Columns = PREDICTED class
# ❌ WRONG interpretation
"Row 0, Column 1 means: Predicted class 0, actual class 1"
# ✅ RIGHT interpretation
"Row 0, Column 1 means: Actual class 0, predicted as class 1 (FP for class 1)"
Mistake 2: Ignoring Off-Diagonal Patterns
# ❌ WRONG: Only looking at diagonal
"Accuracy is 85%, we're good!"
# ✅ RIGHT: Analyze WHERE errors occur
cm = confusion_matrix(y_true, y_pred)
# Which classes confuse each other?
# Is the confusion symmetric?
# Is one class responsible for most errors?
Mistake 3: Not Normalizing for Imbalanced Classes
# ❌ WRONG: Raw counts with imbalanced data
# Class A: 950 samples, Class B: 50 samples
# Raw CM might show 900 correct for A, only 20 for B
# ✅ RIGHT: Normalize to see true per-class performance
cm_normalized = confusion_matrix(y_true, y_pred, normalize='true')
# Now you see: A recall = 94.7%, B recall = 40% ← problem revealed!
Mistake 4: Confusing FP and FN
Remember:
- FALSE POSITIVE: Predicted Positive, Actually Negative
"Cried wolf when there was no wolf"
"Convicted an innocent person"
- FALSE NEGATIVE: Predicted Negative, Actually Positive
"Said no wolf when there was one"
"Let guilty person go free"
Mnemonic: The second word (Positive/Negative) is what you PREDICTED
"False" means you were WRONG about it
Quick Reference
The Matrix Layout (sklearn)
                  PREDICTED
             Class 0     Class 1
          ┌───────────┬───────────┐
 Class 0  │    TN     │    FP     │
ACTUAL    ├───────────┼───────────┤
 Class 1  │    FN     │    TP     │
          └───────────┴───────────┘
All Metrics Derived
| Metric | Formula | From Matrix |
|---|---|---|
| Accuracy | (TP+TN)/Total | Diagonal / All |
| Precision | TP/(TP+FP) | Bottom-right / Right column |
| Recall | TP/(TP+FN) | Bottom-right / Bottom row |
| Specificity | TN/(TN+FP) | Top-left / Top row |
| F1 | 2×P×R/(P+R) | Harmonic mean |
| FPR | FP/(FP+TN) | Top-right / Top row |
| FNR | FN/(FN+TP) | Bottom-left / Bottom row |
Key Takeaways
- A confusion matrix shows HOW you're right and HOW you're wrong — not just overall performance
- Four cells: TP, TN, FP, FN — True/False × Positive/Negative
- Rows = Actual, Columns = Predicted — in sklearn's convention
- Every metric comes from these four numbers — accuracy, precision, recall, F1, all of them
- Normalize for imbalanced classes — raw counts hide poor performance on minority classes
- Analyze patterns — which classes confuse each other? Why?
- Different errors have different costs — FP ≠ FN in real applications
- Visualize it — heatmaps reveal patterns numbers hide
The One-Sentence Summary
A confusion matrix is Judge Harrison's career report card showing not just how often she was right (accuracy), but exactly how she failed — 30 innocent people wrongly convicted (FP) and 20 guilty criminals who walked free (FN) — because "wrong" isn't just wrong, it's which KIND of wrong that determines real-world consequences.
What's Next?
Now that you can read a confusion matrix, you're ready for:
- ROC Curves — Visualizing the FP vs TP tradeoff
- Precision-Recall Curves — For imbalanced problems
- Cost-Sensitive Analysis — When FP ≠ FN in dollars
- Multi-Label Classification — When one sample has multiple classes
Follow me for the next article in this series!
Let's Connect!
If the confusion matrix finally makes sense, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
What's the most surprising thing you've discovered in a confusion matrix? I once found a model that confused "airplane" with "bird" 60% of the time — feature engineering fixed it!
The difference between knowing your model is "85% accurate" and knowing it wrongly convicts 30 innocent people while letting 20 criminals go free? The confusion matrix. Accuracy is a summary. The matrix is the full story.
Share this with someone who only looks at accuracy. They're missing where their model actually fails.
Happy debugging! ⚖️