Banso D. Wisdom

Posted on Apr 22, 2019 • Edited on Apr 30, 2019

Understanding the Confusion Matrix

#machinelearning #deeplearning #python

A confusion matrix is a table that describes the performance of a classification model (a classifier). It records the actual and predicted classifications for a set of inputs, and this information is used to evaluate the performance of the classifier.

Note that the confusion matrix is only used for classification tasks, and as such cannot be used in regression models or other non-classification models.

Before we go on, let's look at some terms.

Classifier: A classifier is an algorithm that uses "knowledge" learned from training data to map input data to a particular category or class. A classifier is either binary (two classes) or multi-class (more than two classes).
Training and test data: When building a classifier, a dataset is split into training data and test data, both of which have associated labels. A label is the expected output, that is, the category or class the data belongs to.
Actual classifications: These are the expected outputs (labels) for the data.
Predicted classifications: These are the outputs given by the classifier for particular input data.

An example: Let's say we have built a classifier to categorize an input image of a car as either a sedan or not. One image in our dataset has been labeled as a non-sedan, but the classifier classifies it as a sedan.
In this scenario, the actual classification is non-sedan while the predicted classification is sedan.

Types of Confusion Matrices

There are two types of confusion matrices:

2-class confusion matrix
Multi-class confusion matrix

2-Class Confusion Matrix

A 2-class confusion matrix, as the name implies, is a confusion matrix that describes the performance of a binary classification model. A 2-class matrix for the sedan classifier I described earlier can be visualized as such:

                      Predicted: Non-sedan    Predicted: Sedan
Actual: Non-sedan              a                     b
Actual: Sedan                  c                     d

In this visualization, we have two outlined sections: the predicted classifications section and the actual classifications section. Each contains two sub-sections, one for each class.

If this is your first time seeing a confusion matrix, I know you must be wondering what all the variables in the table represent. It is quite simple actually, and I will explain as simply as I can. Before I do, it is important to know that each variable is a count of predictions.

The variable a

The variable a falls under the Non-sedan sub-section in both the Actual and Predicted classification sections. This means a predictions were made that correctly classified an image of a non-sedan [as a non-sedan].

The variable b

The variable b falls under the Non-sedan sub-section in the Actual classification section and under the Sedan sub-section in the Predicted classification section. This means b predictions were made that incorrectly classified an image of a non-sedan as a sedan.

The variable c

The variable c falls under the Sedan sub-section in the Actual classification section and under the Non-sedan sub-section in the Predicted classification section. This means c predictions were made that incorrectly classified an image of a sedan as a non-sedan.

The variable d

The variable d falls under the Sedan sub-section in both the Actual and Predicted classification sections. This means d predictions were made that correctly classified an image of a sedan [as a sedan].
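To make the four cells concrete, here is a minimal sketch that tallies a, b, c and d from paired lists of actual and predicted labels. The label strings and the toy data are invented for illustration; they are not taken from the sedan classifier in this post.

```python
# Toy data, invented for illustration only.
actual    = ["sedan", "non-sedan", "sedan", "non-sedan", "sedan", "non-sedan"]
predicted = ["sedan", "sedan", "non-sedan", "non-sedan", "sedan", "non-sedan"]

a = b = c = d = 0
for truth, guess in zip(actual, predicted):
    if truth == "non-sedan" and guess == "non-sedan":
        a += 1  # non-sedan correctly classified as non-sedan
    elif truth == "non-sedan" and guess == "sedan":
        b += 1  # non-sedan incorrectly classified as sedan
    elif truth == "sedan" and guess == "non-sedan":
        c += 1  # sedan incorrectly classified as non-sedan
    else:
        d += 1  # sedan correctly classified as sedan

# Rows are actual classes, columns are predicted classes.
matrix = [[a, b],
          [c, d]]
print(matrix)  # [[2, 1], [1, 2]]
```

Every prediction lands in exactly one cell, so the four counts always add up to the number of predictions made.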

Easy peasy lemon squeezy. (I hope? 😅)

But wait, we're not done yet.......

Now we have our confusion matrix for our sedan classifier, but how does this help us ascertain our classifier's performance?
To do that, there are some standard metrics that we can calculate using the data (variables) in the confusion matrix.

Accuracy

Accuracy in a 2-Class confusion matrix is the ratio of the total number of correct predictions to the total number of predictions.
From our confusion matrix, we can see that a and d predictions were made that correctly classified the input image and b and c predictions were made that incorrectly classified the input image.

Therefore, accuracy can be calculated as:

Accuracy = (a + d) / (a + b + c + d)

Where, a + d is the total number of correct predictions and a + b + c + d is the total number of predictions made.
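As a quick sketch, the formula translates directly into a small function. The counts passed in below are hypothetical, and the empty-matrix check is my own addition rather than part of the definition.

```python
def accuracy(a, b, c, d):
    """Fraction of all predictions that were correct."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (a + d) / total

# Hypothetical counts: 50 + 40 correct out of 100 predictions.
print(accuracy(a=50, b=5, c=5, d=40))  # 0.9
```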

True Positives, True Negatives, False Positives and False Negatives

With relation to our classifier and confusion matrix:

True positives (TP) are the number of predictions where an image of a sedan is correctly classified [as a sedan].
From our confusion matrix, the variable d is also the TP.

True negatives (TN) are the number of predictions where an image of a non-sedan is correctly classified [as a non-sedan].
From our confusion matrix, the variable a is also our TN.

False positives (FP) are the number of predictions where an image of a non-sedan is incorrectly classified as a sedan.
From our confusion matrix, the variable b is also our FP.

False negatives (FN) are the number of predictions where an image of a sedan is incorrectly classified as a non-sedan.
From our confusion matrix, the variable c is also our FN.
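One way to keep the mapping straight in code is to give the four cells names. The sketch below assumes the matrix is stored as a nested list laid out as [[a, b], [c, d]], with rows for actual classes and columns for predicted classes; the `Counts` type and `unpack` helper are hypothetical names used only for this illustration.

```python
from typing import NamedTuple

class Counts(NamedTuple):
    tn: int  # a: non-sedan predicted as non-sedan
    fp: int  # b: non-sedan predicted as sedan
    fn: int  # c: sedan predicted as non-sedan
    tp: int  # d: sedan predicted as sedan

def unpack(matrix):
    """Read TN, FP, FN, TP out of a 2x2 matrix laid out as [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    return Counts(tn=a, fp=b, fn=c, tp=d)

counts = unpack([[50, 7], [3, 40]])  # hypothetical counts
print(counts)
```

This is the same order (TN, FP, FN, TP) that you get from flattening a scikit-learn binary confusion matrix whose negative class has the lower label, which is the case when Non-sedan is 0 and Sedan is 1.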

True Positive Rate

The true positive rate is a ratio of the true positives to the sum of the true positives and false negatives. It shows how often the classifier classifies an image of a sedan as a sedan.

Therefore, the true positive rate can be calculated as:

True Positive Rate = d / (c + d)
Where d is TP and c is FN

The true positive rate is also known as recall or sensitivity.
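A minimal sketch of the calculation follows. Returning `None` when the data contains no sedans is my own choice for handling the undefined 0 / 0 case, and the counts are hypothetical.

```python
def true_positive_rate(tp, fn):
    """Recall / sensitivity: share of actual sedans that were caught."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return None  # undefined when the data contains no sedans
    return tp / actual_positives

print(true_positive_rate(tp=40, fn=10))  # 0.8, i.e. 40 of 50 sedans found
print(true_positive_rate(tp=0, fn=0))    # None
```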

False Positive Rate

The false positive rate is a ratio of the false positives to the sum of the true negatives and false positives. It shows how often the classifier classifies an image of a non-sedan as a sedan.

Therefore, the false positive rate can be calculated as:

False Positive Rate = b / (a + b)
Where a is TN and b is FP
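The false positive rate is most useful when read next to the true positive rate. As a hedged illustration with made-up numbers, consider a degenerate "classifier" that labels every image as a sedan: it catches every sedan, but it also mislabels every non-sedan.

```python
def false_positive_rate(fp, tn):
    """Share of actual non-sedans that were wrongly called sedans."""
    return fp / (fp + tn)

def true_positive_rate(tp, fn):
    """Share of actual sedans that were correctly called sedans."""
    return tp / (tp + fn)

# Hypothetical test set: 30 sedans and 70 non-sedans.
# Labelling every image as a sedan produces these counts:
tp, fn = 30, 0   # every sedan is caught
fp, tn = 70, 0   # but every non-sedan is also called a sedan

print(true_positive_rate(tp, fn))   # 1.0
print(false_positive_rate(fp, tn))  # 1.0
```

A perfect true positive rate on its own says little; here it comes at the cost of the worst possible false positive rate.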

True Negative Rate

The true negative rate is a ratio of the true negatives to the sum of the true negatives and false positives. It shows how often the classifier classifies an image of a non-sedan as a non-sedan.

Therefore, the true negative rate can be calculated as:

True Negative Rate = a / (a + b)
Where a is TN and b is FP

The true negative rate is also known as specificity.

False Negative Rate

The false negative rate is a ratio of the false negatives to the sum of the false negatives and true positives. It shows how often the classifier classifies an image of a sedan as a non-sedan.

Therefore, the false negative rate can be calculated as:

False Negative Rate = c / (c + d)
Where d is TP and c is FN
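The four rates come in two complementary pairs: the true positive and false negative rates share the denominator c + d, and the true negative and false positive rates share the denominator a + b. A short check with hypothetical counts:

```python
a, b, c, d = 50, 10, 5, 35  # hypothetical TN, FP, FN, TP

tpr = d / (c + d)  # sedans correctly classified
fnr = c / (c + d)  # sedans missed
tnr = a / (a + b)  # non-sedans correctly classified
fpr = b / (a + b)  # non-sedans wrongly called sedans

# Each pair shares a denominator, so it sums to 1.
print(round(tpr + fnr, 10), round(tnr + fpr, 10))  # 1.0 1.0
```

In practice this means you only need to compute one rate from each pair; the other is 1 minus it.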

Precision

The precision is a ratio of the true positives to the sum of the true positives and false positives. It shows how often the classifier classifies an input image as a sedan and it turns out to be correct.

It is calculated as:

Precision = d / (b + d)
Where d is TP and b is FP
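Precision and the true positive rate (recall) share a numerator but answer different questions, so they can differ a lot. The sketch below uses made-up counts for a cautious classifier that rarely predicts "sedan".

```python
def precision(tp, fp):
    """Share of images predicted as sedans that really are sedans."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual sedans that were predicted as sedans."""
    return tp / (tp + fn)

# Hypothetical cautious classifier: it rarely says "sedan".
tp, fp, fn = 20, 1, 30

print(round(precision(tp, fp), 2))  # 0.95
print(recall(tp, fn))               # 0.4
```

When this classifier does say "sedan" it is almost always right, yet it misses most of the sedans in the data.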

An Example

Suppose the confusion matrix for our classifier contains the values listed below. We can use the metrics defined above to evaluate its performance.

From the confusion matrix, we can see that:

4252 predictions were made that correctly classified a non-sedan [as a non-sedan]. Therefore, our variable a and the True Negative (TN) is 4252.

875 predictions were made that incorrectly classified a non-sedan as a sedan. Therefore, our variable b and the False Positive (FP) is 875.

421 predictions were made that incorrectly classified a sedan as a non-sedan. Therefore, our variable c and the False Negative (FN) is 421.

4706 predictions were made that correctly classified a sedan [as a sedan]. Therefore, our variable d and the True Positive (TP) is 4706.

Using the data we've "extracted", we can calculate the aforementioned metrics and ascertain the performance of the classifier. A first hint that the classifier performs reasonably well is that the correct predictions (4252 + 4706 = 8958) far outnumber the incorrect ones (875 + 421 = 1296).

Accuracy

Accuracy = (a + d) / (a + b + c + d)
= (4252 + 4706) / (4252 + 875 + 421 + 4706)
= (8958) / (10254)
= 0.8736102984201287
Accuracy = 0.87

Therefore, the classifier has an accuracy of 0.87, which is 87%.

True Positive Rate

TPR = TP / (TP + FN)
= 4706 / (4706 + 421)
= 4706 / 5127
= 0.917885703140238
TPR = 0.92

Therefore, the classifier has a True Positive Rate of 0.92, which is 92%.

False Positive Rate

FPR = FP / (FP + TN)
= 875 / (875 + 4252)
= 875 / 5127
= 0.1706651062999805
FPR = 0.17

Therefore, the classifier has a False Positive Rate of 0.17, which is 17%.

True Negative Rate

TNR = TN / (TN + FP)
= 4252 / (4252 + 875)
= 4252 / 5127
= 0.8293348937000195
TNR = 0.83

Therefore, the classifier has a True Negative Rate of 0.83, which is 83%.

False Negative Rate

FNR = FN / (FN + TP)
= 421 / (421 + 4706)
= 421 / 5127
= 0.082114296859762
FNR = 0.08

Therefore, the classifier has a False Negative Rate of 0.08, which is 8%.

Precision

Precision = TP / (TP + FP)
= 4706 / (4706 + 875)
= 4706 / 5581
≈ 0.8432
Precision = 0.84

Therefore, the classifier has a Precision of 0.84, which is 84%.
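The arithmetic in this example can be double-checked in a few lines. This is only a sketch that plugs the four counts from the example into the formulas defined earlier; the dictionary name is arbitrary.

```python
a, b, c, d = 4252, 875, 421, 4706  # TN, FP, FN, TP from the example

metrics = {
    "accuracy": (a + d) / (a + b + c + d),
    "true positive rate": d / (c + d),
    "false positive rate": b / (a + b),
    "true negative rate": a / (a + b),
    "false negative rate": c / (c + d),
    "precision": d / (b + d),
}

# Print each metric rounded to two decimal places.
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```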

How to generate a confusion matrix using Python

import itertools
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes, normalize=False):
    # Normalize each row first so the colours and the printed numbers agree.
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.figure(figsize=(5, 5))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion matrix')
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.yticks(tick_marks, classes)

    # Write each value in its cell, in a colour that contrasts with the background.
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        value = '{:.2f}'.format(cm[i, j]) if normalize else cm[i, j]
        plt.text(j, i, value,
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.tight_layout()

# Maps each class index to its display name.
dict_characters = {0: 'Non-sedan', 1: 'Sedan'}

# model, test_data and test_labels are assumed to exist already.
y_pred = model.predict(test_data)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(test_labels, axis=1)

confusion_mat = confusion_matrix(y_true, y_pred_classes)
plot_confusion_matrix(confusion_mat, classes=list(dict_characters.values()))

To generate a confusion matrix, we use numpy, matplotlib.pyplot to visualize the matrix, the confusion_matrix function from the sklearn.metrics package to generate the confusion matrix, and itertools to iterate over the cells of the matrix.

First, we define a function plot_confusion_matrix that takes the generated confusion matrix and the list of class names as arguments and then uses matplotlib.pyplot to visualize the confusion matrix.

In the snippet, we assume we already have our trained model, as well as training and test data with associated labels.

dict_characters is a dictionary mapping each class index to its name, in our case 0 to "Non-sedan" and 1 to "Sedan".
model is our trained classifier.
test_data is our test data.
y_pred is a numpy array of the predictions made by the classifier on the test data, with one score per class for each input.
y_pred_classes is a numpy array holding, for each input, the index of the highest-scoring class in y_pred, that is, the predicted class.
test_labels holds the labels of the test data. Since np.argmax is applied to it, it is assumed to be one-hot encoded.
y_true is a numpy array holding, for each input, the index of the actual (correct) class taken from test_labels.
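The two np.argmax calls are what turn per-class scores and labels into class indices. The plain-Python sketch below mimics that step on made-up data; it assumes, as the snippet's use of axis=1 suggests, that y_pred holds one row of class scores per image and that test_labels is one-hot encoded. The `argmax` helper is written here only for illustration.

```python
def argmax(row):
    """Index of the largest value in a row (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical model output: one row of class scores per test image,
# in the order [non-sedan, sedan].
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
# Hypothetical one-hot labels for the same three images.
test_labels = [[1, 0], [0, 1], [0, 1]]

y_pred_classes = [argmax(row) for row in y_pred]
y_true = [argmax(row) for row in test_labels]
print(y_pred_classes, y_true)  # [0, 1, 0] [0, 1, 1]
```

If your labels are already plain class indices rather than one-hot vectors, the argmax step on the labels is not needed.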

With these in place, we call the confusion_matrix function from sklearn.metrics, passing in the correct values (y_true) and the values predicted by the classifier (y_pred_classes). We store the generated confusion matrix in the variable confusion_mat.

We then pass the confusion matrix (confusion_mat) and a list of the names of our possible classes (the values of dict_characters) as arguments to the plot_confusion_matrix function, which then visualizes the confusion matrix.

In my next post, I hope to write about the multi-class confusion matrix.
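The normalize option of plot_confusion_matrix divides every cell by the total of its row, so each row shows proportions of the actual class instead of raw counts. Here is a plain-Python sketch of that row normalization, applied to the counts from the worked example; the `normalize_rows` helper is a hypothetical stand-in for the numpy expression in the function.

```python
def normalize_rows(matrix):
    """Divide every cell by its row total so each row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized

cm = [[4252, 875],   # actual non-sedan
      [421, 4706]]   # actual sedan

for row in normalize_rows(cm):
    print([round(cell, 2) for cell in row])
# [0.83, 0.17]
# [0.08, 0.92]
```

After normalization the diagonal holds the true negative rate and the true positive rate, and the off-diagonal cells hold the false positive and false negative rates.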

Top comments (3)

Manza • May 16 '19

hi I have 101 classes and their accuracies, I want to draw confusion matrix for them. My code is in Pytorch

A portion of my code is following

video_pred = [np.argmax(x[0]) for x in output]

video_labels = [x[1] for x in output]

print('Accuracy {:.02f}% ({})'.format(
    float(np.sum(np.array(video_pred) == np.array(video_labels))) / len(video_pred) * 100.0,
    len(video_pred)))       

#Accuracy per Class
for i in range(num_class):

    indicies_correct = np.where(np.array(video_pred) == np.array(video_labels))

    class_total = np.sum(np.array(video_labels) == i)

    class_correct = np.sum(np.array(video_pred)[indicies_correct ] == i)

    print('Class {}: Accuracy {:.02f}%'.format(i, class_correct / class_total * 100))



confusion_matrix = torch.zeros(num_class, num_class)

data_gen = enumerate(data_loader)
#with torch.no_grad():

for i, (data, label) in data_gen:
    data = data
    label = label
    outputs = model_ft(data)
    _, preds = torch.max(outputs, 1)
    for t, p in zip(label.view(-1), preds.view(-1)):
        print(t, p)
        confusion_matrix[t.long(), p.long()] += 1

print(confusion_matrix)

print(confusion_matrix.diag()/confusion_matrix.sum(1))

Banso D. Wisdom • May 18 '19 • Edited

Hi, Manza. I've not worked with PyTorch, but I believe it should be similar to the example I gave.

From the snippet you gave, since you already have the confusion matrix (confusion_matrix), what you need to do is create a dictionary of all the classes like this:

class_names = { 0: 'Class 1', 1: 'Class 2', 2: 'Class 3',..., n - 1: 'Class n' }

Then pass your confusion matrix (confusion_matrix) and a list of the class names (list(class_names.values())) to the function plot_confusion_matrix in the example I gave, or to a similar function, perhaps one you've written to your preferences.

That should output a visualization of your confusion matrix.

Sebastian • Apr 23 '19

Nice little summary. Not confusing at all ;)