DEV Community

Performance measures for Imbalanced Classes

Imbalanced data is data in which one class or category has many more samples than all the other classes. It is a common problem in classification, one of the most significant tasks in supervised machine learning. Classification means predicting the labels of the samples in a dataset (a collection of data with features and labels). It comes in two types: binary classification and multi-class classification. In binary classification the samples are divided into two categories, known as classes in machine learning terms. A simple example of binary classification is identifying whether an image shows a dog or a cat. An example of multi-class classification is identifying digits from images, where there are 10 classes, one for each digit.

Imbalanced data appears in most real-world scenarios. In spam detection, very few e-mails are spam; in cancer detection, very few cases are cancerous. Accuracy, which is the ratio of correctly predicted samples to the total number of samples, is therefore a poor way to evaluate a model on an imbalanced dataset. For example, suppose a dataset contains 1000 e-mails, of which 10 are spam. A model that predicts every sample to be non-spam gets 990 of the 1000 predictions right, so its accuracy is 99.0% even though it never detects a single spam e-mail. Thus accuracy is not a good measure of model performance for this dataset, and we need other performance measures to evaluate a model on an imbalanced dataset.
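The e-mail example above can be checked with a few lines of plain Python. This is only a sketch: the label encoding (1 for spam, 0 for non-spam) and the variable names are assumptions made for illustration.

```python
# Hypothetical data matching the example: 1000 e-mails, 10 of them spam.
y_true = [1] * 10 + [0] * 990   # assumed encoding: 1 = spam, 0 = non-spam
y_pred = [0] * 1000             # a "model" that always predicts non-spam

# Accuracy = correctly predicted samples / total samples
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of spam e-mails the model actually caught
spam_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy = {accuracy:.2%}")               # accuracy = 99.00%
print(f"spam e-mails detected = {spam_found} of 10")
```

The accuracy looks excellent, yet the model is useless for the task it was built for.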

Some of the performance measures for models trained on imbalanced datasets are:

  • Precision
  • Recall
  • Specificity
  • F1-score
  • Geometric Mean
  • Index Balanced Accuracy

To understand these performance measures we need to understand some terms first. These terms are True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

True Positive means that a sample of the positive class is correctly predicted as positive, and True Negative means that a sample of the negative class is correctly predicted as negative. If the actual class is positive and it is predicted as negative, it is known as a False Negative. If the actual class is negative and the predicted class is positive, it is called a False Positive. A summary is given in the table below.

|                 | Predicted Positive  | Predicted Negative  |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
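The four cells of the table can be counted directly from the true and predicted labels. The sketch below is a minimal illustration; the function name `confusion_counts` and the sample labels are hypothetical, not taken from any library.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1   # actual positive, predicted positive
            else:
                fn += 1   # actual positive, predicted negative
        else:
            if predicted == positive:
                fp += 1   # actual negative, predicted positive
            else:
                tn += 1   # actual negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels, 1 = positive class, 0 = negative class
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

print(confusion_counts(y_true, y_pred, positive=1))  # (2, 1, 1, 4)
```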

[Image: diagram describing TP, TN, FP, and FN]


Precision

Precision is the ratio of correctly predicted positive samples to the total number of samples predicted as positive. The formula for precision is

Precision = TP/(TP+FP)
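As a small sketch of the formula, the function below computes precision from TP and FP counts. The counts are hypothetical, and returning 0.0 when nothing is predicted positive is a convention assumed here, since the ratio is undefined in that case.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Assumption: treat the undefined 0/0 case as 0.0
        return 0.0
    return tp / predicted_positive


# Hypothetical spam filter: 20 e-mails flagged as spam, 8 of them really spam.
print(precision(tp=8, fp=12))  # 0.4

# A model that never predicts spam flags nothing, so precision is undefined;
# the convention chosen above reports it as 0.0.
print(precision(tp=0, fp=0))   # 0.0
```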


Recall

Recall is also known as Sensitivity and True Positive Rate. It is the ratio of correctly predicted positive samples to the total number of actual positive samples. The formula for recall is

Recall = TP/(TP+FN)


Specificity

Specificity is also known as True Negative Rate. It is the ratio of correctly predicted negative samples to the total number of actual negative samples. The formula for specificity is

Specificity = TN/(TN+FP)
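Recall and specificity are easiest to appreciate side by side. The sketch below compares the always-non-spam model from the earlier e-mail example with a hypothetical spam filter; the counts for the filter are invented for illustration, and both functions assume each class has at least one sample.

```python
def recall(tp, fn):
    """Recall (True Positive Rate) = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Specificity (True Negative Rate) = TN / (TN + FP)."""
    return tn / (tn + fp)


# Always-non-spam model on 1000 e-mails with 10 spam: TP=0, FN=10, FP=0, TN=990
trivial_recall = recall(tp=0, fn=10)
trivial_specificity = specificity(tn=990, fp=0)

# Hypothetical spam filter on the same data: TP=8, FN=2, FP=12, TN=978
filter_recall = recall(tp=8, fn=2)
filter_specificity = specificity(tn=978, fp=12)

print("trivial model:", trivial_recall, trivial_specificity)
print("spam filter:  ", filter_recall, round(filter_specificity, 3))
```

The trivial model has perfect specificity but zero recall, which is exactly the failure that accuracy hides.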


F1-score

F1-score is the harmonic mean of precision and recall. The formula for F1-score is

F1-score = 2*Precision*Recall/(Precision+Recall)
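To see why a harmonic mean is used, the sketch below compares it with a plain arithmetic mean for a lopsided model. The precision and recall values are hypothetical, and returning 0.0 when both are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """F1 = 2 * Precision * Recall / (Precision + Recall)."""
    if precision + recall == 0:
        return 0.0  # assumption: define F1 as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Hypothetical model: every positive prediction is right, but it finds
# only 10% of the actual positives.
p, r = 1.0, 0.1
arithmetic = (p + r) / 2
harmonic = f1_score(p, r)

print(f"arithmetic mean = {arithmetic:.3f}")
print(f"F1 (harmonic)   = {harmonic:.3f}")
```

The harmonic mean stays close to the smaller of the two values, so a model cannot get a high F1-score by doing well on only one of precision and recall.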

Geometric Mean

Geometric mean is the square root of the product of Recall and Specificity. The formula for Geometric Mean is

Geometric Mean = √(Recall*Specificity) or Geometric Mean = √(True Positive Rate * True Negative Rate)
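A short sketch of the geometric mean, reusing the rates of the always-non-spam model (recall 0, specificity 1) and of a hypothetical spam filter with TP=8, FN=2, FP=12, TN=978. The filter's counts are invented for illustration.

```python
import math


def geometric_mean(recall, specificity):
    """G-mean = sqrt(Recall * Specificity)."""
    return math.sqrt(recall * specificity)


# Always-non-spam model: recall = 0, specificity = 1
trivial_gmean = geometric_mean(0.0, 1.0)

# Hypothetical spam filter: recall = 8/10, specificity = 978/990
filter_gmean = geometric_mean(8 / 10, 978 / 990)

print(trivial_gmean)            # 0.0
print(round(filter_gmean, 3))
```

Because the two rates are multiplied, ignoring either class drives the score to zero.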

Index Balanced Accuracy

Index Balanced Accuracy (IBA) is a more recent performance metric. First, Dominance is calculated, which is the difference between the True Positive Rate and the True Negative Rate: Dominance = Recall - Specificity. Dominance is then weighted by a factor α, and the result scales the squared geometric mean: IBA = (1 + α*Dominance)*(GMean²). Since GMean² = Recall*Specificity, in simplified terms it is

IBA = (1 + α*(Recall-Specificity))*(Recall*Specificity)
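The effect of the dominance term shows up when two models have the same geometric mean. The sketch below implements the simplified formula directly; the function name and the recall and specificity values are hypothetical.

```python
def index_balanced_accuracy(recall, specificity, alpha=0.1):
    """IBA = (1 + alpha * (Recall - Specificity)) * (Recall * Specificity)."""
    dominance = recall - specificity
    return (1 + alpha * dominance) * (recall * specificity)


# Two hypothetical models with the same G-mean but opposite strengths
iba_high_recall = index_balanced_accuracy(recall=0.9, specificity=0.6)
iba_high_specificity = index_balanced_accuracy(recall=0.6, specificity=0.9)

print(round(iba_high_recall, 4))       # 0.5562
print(round(iba_high_specificity, 4))  # 0.5238

# With alpha = 0 the dominance term vanishes and IBA reduces to GMean²
print(round(index_balanced_accuracy(0.9, 0.6, alpha=0), 4))  # 0.54
```

For a positive α, the model with the higher recall on the positive class receives the higher score.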

The imbalanced-learn library for Python can report all of these metrics for imbalanced classes. It can be imported as follows

from imblearn import metrics

An example of the code for finding different measures is

y_true = [1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2]
y_pred = [1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
target_names = ['class 1', 'class 2']
# Build a text report with one row of metrics per class
result = metrics.classification_report_imbalanced(y_true, y_pred, target_names=target_names)
print(result)


[Image: output of classification_report_imbalanced for the example above]

Here pre is precision, rec is recall, spe is specificity, f1 is f1-score, geo is geometric mean, iba is index balanced accuracy and sup is support. The default value of α is 0.1 for IBA.
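As a cross-check, the values for class 1 can be recomputed by hand from the formulas in this article. This sketch does not call the library, so it only shows what the formulas give for this example; the layout and rounding of the library's printed report may differ.

```python
import math

y_true = [1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2]
y_pred = [1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
positive = 1   # treat class 1 as the positive class
alpha = 0.1    # weight of the dominance term in IBA

# Confusion counts for class 1 versus the rest
pairs = list(zip(y_true, y_pred))
tp = sum(t == positive and p == positive for t, p in pairs)
fn = sum(t == positive and p != positive for t, p in pairs)
fp = sum(t != positive and p == positive for t, p in pairs)
tn = sum(t != positive and p != positive for t, p in pairs)

pre = tp / (tp + fp)
rec = tp / (tp + fn)
spe = tn / (tn + fp)
f1 = 2 * pre * rec / (pre + rec)
geo = math.sqrt(rec * spe)
iba = (1 + alpha * (rec - spe)) * (rec * spe)
sup = tp + fn  # support: number of actual samples of class 1

print(f"pre={pre:.2f} rec={rec:.2f} spe={spe:.2f} "
      f"f1={f1:.2f} geo={geo:.2f} iba={iba:.3f} sup={sup}")
```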
