penthaapatel

Originally published at Medium

Using Multiple Linear Regression to Solve the Banknote Authentication Problem

The goal is to predict whether a given banknote is authentic, based on a number of measurements taken from a photograph of it.

Dataset used: Banknote Authentication Data Set

This is a binary classification problem. The dataset consists of 5 columns, four attributes and one label, as follows:

Attributes:

  1. Variance of Wavelet Transformed image (continuous).
  2. Skewness of Wavelet Transformed image (continuous).
  3. Kurtosis of Wavelet Transformed image (continuous).
  4. Entropy of image (continuous).

Labels (Target):

  1. Output (0 for authentic, 1 for inauthentic).

Load the dataset

Load the dataset from the CSV file into a pandas DataFrame.
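A minimal sketch of this step is shown here. It assumes the file has no header row, so the column names below are my own choice, and the three rows in `sample_csv` are placeholder values for illustration only, not real measurements. `load_banknotes` is a hypothetical helper name; with the real file you would pass its path instead of the in-memory buffer.

```python
import io

import pandas as pd

# Column names are an assumption; they are not read from the file itself.
COLUMNS = ["variance", "skewness", "kurtosis", "entropy", "label"]


def load_banknotes(source):
    """Read a headerless CSV (path or file-like object) into a DataFrame."""
    return pd.read_csv(source, header=None, names=COLUMNS)


# Placeholder rows, made up for this example only.
sample_csv = io.StringIO(
    "1.5,2.0,-0.5,0.1,0\n"
    "-2.0,-1.0,3.0,-0.7,1\n"
    "0.3,4.2,-1.1,0.0,0\n"
)

df = load_banknotes(sample_csv)
print(df.shape)
print(df["label"].value_counts())
```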

Analyse the data

Use a scatter matrix to inspect the pairwise relationships between the attributes.

Prepare the data

Break the data (labels and attributes) into two subsets: a test set and a training set.
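One way to do the split is sketched below with plain NumPy. The function name `train_test_split_rows`, the 30% test fraction, and the fixed seed are assumptions made for this example; a library helper such as scikit-learn's `train_test_split` would serve the same purpose. The arrays are synthetic stand-ins for the real attributes and labels.

```python
import numpy as np


def train_test_split_rows(X, y, test_fraction=0.3, seed=0):
    """Shuffle row indices once, then cut them into test and training parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Synthetic stand-in data: 10 rows, 4 attributes, alternating labels.
X = np.arange(40, dtype=float).reshape(10, 4)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split_rows(X, y)
print(X_train.shape, X_test.shape)
```

Shuffling before cutting matters when the file is sorted by label, because an unshuffled cut could leave one class out of the test set entirely.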

Create the model

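Use multiple linear regression: the model outputs a real-valued score, which is turned into a class label by comparing it against a decision threshold.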

Train the model

Train the classifier using the training set.
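The fitting step can be sketched with ordinary least squares in NumPy. This is an illustration under assumptions, not necessarily how the linked code does it: the helper names `fit_linear_regression` and `predict_scores` are hypothetical, the training data is synthetic, and the 0.5 threshold is just one possible choice.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares fit; returns [intercept, w1, ..., wk]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_scores(coef, X):
    """Real-valued regression output for each row of X."""
    return coef[0] + X @ coef[1:]


# Synthetic training data: label is 1 when the first attribute is negative.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] < 0).astype(float)

coef = fit_linear_regression(X_train, y_train)
scores = predict_scores(coef, X_train)
predicted = (scores >= 0.5).astype(int)  # threshold the score to get a class
print("training accuracy:", (predicted == y_train).mean())
```

Because the regression output is continuous, the threshold is a free parameter, which is why the evaluation looks at several threshold values rather than a single one.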

Evaluate the model

Evaluate the model on the test set:
Generate the confusion matrix for varying threshold values.
Calculate the misclassification rate.
Calculate the area under the ROC curve.
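These three measures can be sketched in plain Python as follows. The function names are hypothetical, the labels and scores are made-up example values, and label 1 (inauthentic) is treated as the positive class. The AUC here is computed from its pairwise definition (the fraction of positive/negative pairs in which the positive example scores higher, ties counting half), which is simple but quadratic in the number of examples.

```python
def confusion_matrix(labels, scores, threshold):
    """Return (tp, fn, fp, tn), with label 1 as the positive class."""
    tp = fn = fp = tn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def misclassification_rate(labels, scores, threshold):
    """Fraction of examples whose thresholded prediction is wrong."""
    tp, fn, fp, tn = confusion_matrix(labels, scores, threshold)
    return (fn + fp) / (tp + fn + fp + tn)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and regression scores, for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

for threshold in (0.25, 0.5, 0.75):
    print(threshold,
          confusion_matrix(labels, scores, threshold),
          misclassification_rate(labels, scores, threshold))
print("AUC:", roc_auc(labels, scores))
```

Running the same functions on the training-set scores and on the test-set scores gives the in-sample and out-of-sample figures respectively.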

Dependence of Misclassification Error on Decision Threshold

[Figure: error rate vs. threshold]

In-sample ROC for banknote classifier

[Figure: in-sample ROC]

Out-of-sample ROC for banknote classifier

[Figure: out-of-sample ROC]

The complete code for this problem is available on my GitHub profile:
Code on GitHub - github.com/penthaapatel/BankNoteAuthentication
