Rafael Rocha

Dimensionality Reduction with Fisher’s Linear Discriminant: An investigation of decision boundaries

Introduction

In classification problems, each input vector x is assigned to one of K discrete classes Ck. The input space is divided into decision regions whose boundaries are called decision boundaries. Datasets (sets of input vectors) whose classes can be separated exactly by linear decision boundaries are said to be linearly separable.

One of the simplest approaches to classification is to build discriminant functions that assign each vector x to a specific class Ck.

One way to better visualize classification through discriminant functions is in terms of dimensionality reduction. This blog post uses Fisher's linear discriminant, which takes the D-dimensional input vector x and projects it down to a vector of smaller dimension.

Dataset

The dataset used is the Iris dataset, which contains 3 classes (Setosa, Versicolour and Virginica) of 50 examples each, where each class refers to a type of iris plant. The dataset has 4 attributes or features (in cm): sepal length, sepal width, petal length, and petal width. The figure below shows three rows of the dataset, each from a specific class, where Setosa, Versicolour and Virginica are assigned as classes 0, 1 and 2, respectively.

Iris dataset
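
The post does not show how the data is loaded; one possible way is through scikit-learn's built-in copy of the dataset (a sketch; the original code may load it differently):

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data         # shape (150, 4): sepal length, sepal width, petal length, petal width (cm)
labels = iris.target  # shape (150,): 0 = Setosa, 1 = Versicolour, 2 = Virginica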

Fisher's linear discriminant

Fisher's linear discriminant takes the D-dimensional input vector x and projects it down to an R-dimensional vector y, using the equation below:

y = w^{T}x

Where x is the input vector, w is the weight vector and y is the projected vector of smaller dimension. Since the whole dataset is used rather than a single input vector, x, w and y become the matrices X, W and Y, respectively.

The Iris dataset has D = 4 dimensions since it has 4 features; thus, R must be smaller than 4.
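
As a shape check, projecting the whole dataset at once multiplies the N x D data matrix by the D x R weight matrix (a sketch with placeholder random data, not the actual Iris values):

import numpy as np

N, D, R = 150, 4, 2
X = np.random.rand(N, D)  # placeholder data with the Iris shape
W = np.random.rand(D, R)  # placeholder weight matrix
Y = X @ W                 # equivalent to (W.T @ X.T).T
print(Y.shape)            # (150, 2)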

The purpose of Fisher's linear discriminant is to find the matrix W of size D x R. First, the matrix SW, called the within-class covariance matrix, is obtained by the following equation:

S_{W} = \sum_{k=1}^{K} S_{k}

Where

S_{k} = \sum_{n \in C_{k}} \left( x_{n} - m_{k} \right) \left( x_{n} - m_{k} \right)^{T}
m_{k} = \frac{1}{N_{k}} \sum_{n \in C_{k}} x_{n}

Where mk is the mean of the examples of a specific class and Nk is the number of examples in class Ck.
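
A direct NumPy translation of these formulas might look like the sketch below (the function name is illustrative; the post's own code later uses np.cov, which differs only by a 1/(N_k - 1) normalization):

import numpy as np

def within_class_scatter(X, labels):
    # S_W as the sum of the per-class scatter matrices S_k defined above
    D = X.shape[1]
    SW = np.zeros((D, D))
    for k in np.unique(labels):
        Xc = X[labels == k]
        diff = Xc - Xc.mean(axis=0)  # x_n - m_k for every example of class k
        SW += diff.T @ diff          # adds S_k = sum of (x_n - m_k)(x_n - m_k)^T
    return SW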

Next, the between-class covariance matrix SB is obtained by the equation below:

S_{B} = \sum_{k=1}^{K} N_{k} \left( m_{k} - m \right) \left( m_{k} - m \right)^{T}

Where m is the mean of the total dataset.
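
This formula can also be translated directly into NumPy (again a sketch with an illustrative function name, not the original code):

import numpy as np

def between_class_scatter(X, labels):
    # S_B = sum over classes of N_k (m_k - m)(m_k - m)^T
    D = X.shape[1]
    m = X.mean(axis=0)  # mean of the total dataset
    SB = np.zeros((D, D))
    for k in np.unique(labels):
        Xc = X[labels == k]
        diff = (Xc.mean(axis=0) - m).reshape(D, 1)  # m_k - m as a column vector
        SB += len(Xc) * (diff @ diff.T)             # N_k (m_k - m)(m_k - m)^T
    return SB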

The values of the matrix W are determined by the eigenvectors of (SW^-1)SB that correspond to the R largest eigenvalues.

The code

The code is written in Python, and two loops form the core of Fisher's linear discriminant. These loops compute the SW and SB matrices, which are later used to determine W from the eigenvectors of (SW^-1)SB. The first loop is shown below:

for k in Ck:
  Xc = X[labels==k]          # examples of class k
  Sk.append(np.cov(Xc.T))    # per-class covariance matrix S_k
  mk.append(np.mean(Xc, 0))  # class mean m_k
  Nk[k] = len(Xc)            # number of examples N_k

The initial line in the loop gets the examples of the specific class Ck. Next, the matrix Sk is obtained, which is used to compute SW later. The last two lines get mk and Nk, used to calculate SB in the next loop.
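
Both loops assume a few variables are initialized beforehand; a possible setup, with names inferred from the snippets rather than taken from the original repository:

import numpy as np

K = 3                        # number of classes
D = X.shape[1]               # number of features (4 for Iris)
R = 2                        # target dimension after projection
Ck = range(K)                # class labels 0, 1, 2
Sk, mk = [], []              # per-class covariance matrices and means
Nk = np.zeros(K, dtype=int)  # number of examples per class
m = np.mean(X, 0)            # mean of the whole dataset
SW = np.zeros((D, D))        # within-class covariance matrix
SB = np.zeros((D, D))        # between-class covariance matrix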

The second loop is shown below. The first line inside the loop accumulates the within-class covariance matrix SW, and the subsequent lines compute the between-class covariance matrix SB.

for k in Ck:
  SW += Sk[k]                       # accumulate the within-class covariance matrix
  temp = mk[k]-m                    # difference between class mean and overall mean
  temp.shape = (D, 1)               # reshape to a column vector
  SB += np.dot(Nk[k], temp*temp.T)  # accumulate N_k (m_k - m)(m_k - m)^T

Lastly, we obtain the matrix W from the eigenvectors of (SW^-1)SB, as shown in the code below:

invSw = np.linalg.pinv(SW)
invSw_by_SB = np.dot(invSw, SB)

eigenvalues, eigenvectors = np.linalg.eig(invSw_by_SB)

sort_eigval_index = np.argsort(eigenvalues)[::-1]  # indices of eigenvalues in descending order

W = eigenvectors[:, sort_eigval_index[0:R]] # Weight matrix

Y = np.dot(W.T, X.T)
Y = Y.T # Projected data

Initially, (SW^-1)SB is calculated, followed by its eigenvalues and eigenvectors. W is then formed from the eigenvectors corresponding to the R largest eigenvalues, taken in descending order.
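
To visualize the projected data (as in the figures of the next section), one can scatter-plot the two projected features colored by class; this is a sketch, not the author's plotting code:

import numpy as np
import matplotlib.pyplot as plt

for k, name in zip(range(3), ["Setosa", "Versicolour", "Virginica"]):
    pts = np.real(Y[labels == k])  # np.linalg.eig may return a complex dtype with ~0 imaginary parts
    plt.scatter(pts[:, 0], pts[:, 1], label=name)
plt.xlabel("First discriminant")
plt.ylabel("Second discriminant")
plt.legend()
plt.show()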

Results

The figure below shows the scatter plot of sepal length and sepal width. Analyzing those two features, it is difficult to visualize the decision boundaries between the three classes, specifically between Versicolour and Virginica.

Scatter plot of Sepal length and Sepal width.

Applying Fisher's linear discriminant with R = 2 to the Iris dataset, we obtain the projected data with 2 features, as shown in the figure below. With the two new features, a better distinction between the three classes is observed, showing an improvement in the visualization of the decision boundaries.

Projected data with R = 2.

Similarly, performing the dimensionality reduction with R = 1, we also achieve good decision boundaries between the classes.

Projected data with R = 1.

Thus, Fisher's linear discriminant is a good way to investigate decision boundaries in classification tasks, such as the Iris dataset, through dimensionality reduction.


The complete code is available on GitHub and Colab. Follow the blog if this post was helpful for you.
