# Implementing Simple PCA using NumPy

### Sıddık AÇIL ・2 min read

*I am open to job offers, feel free to contact me for any vacancies abroad.*

In this article, I will implement PCA algorithm from scratch using Python's NumPy. To test my results, I used PCA implementation of scikit-learn.

```
from sklearn.decomposition import PCA
import numpy as np
k = 1 # target dimension(s)
pca = PCA(k) # Create a new PCA instance
data = np.array([[0.5, 1], [0, 0]]) # 2x2 data matrix
print("Data: ", data)
print("Reduced: ", pca.fit_transform(data)) # fit and transform
```

This results in:

```
[[-0.55901699]
[ 0.55901699]]
```

### Centering Data Points

Make origin the centroid of our data.

```
data = data - data.mean(axis=0) # Center data points
print("Centered Matrix: ", data)
```

### Get Covariance Matrix

Get covariance matrix of our features.

```
cov = np.cov(data.T) / data.shape[0] # Get covariance matrix
print("Covariance matrix: ", cov)
```

### Perform Eigendecomposition on Covariance Matrix

Eigendecomposition extracts eigenvalues and corresponding eigenvectors of a matrix

```
v, w = np.linalg.eig(cov)
```

### Sort Eigenvectors According to Eigenvalues

Most numerical libraries offer eigenvectors pre-sorted, However, this is not the case for NumPy. Therefore, we need to argsort the eigenvalue vector to get sorting indices and perform sorting on columns of eigenvalues.

```
idx = v.argsort()[::-1] # Sort descending and get sorted indices
v = v[idx] # Use indices on eigv vector
w = w[:,idx] #
print("Eigenvalue vektoru: ", v)
print("Eigenvektorler: ", w)
```

###
Get First *K* Eigenvectors

Our aim in PCA is to **construct a new feature space**. Eigenvectors are the **axes** of this new feature space and eigenvalues denote the **magnitude of variance** along that axis. In other words, a higher eigenvalue means more variance on the corresponding principal axis. Therefore, the set of axes with the highest variances are the most important features in this new feature space since they hold most of the information. By getting the first K columns of eigenvector matrix, which have the K biggest eigenvalues, we form what is called a projection matrix. The dot product of our data matrix and projection matrix, *which sounds pretty cool but it is actually pretty straightforward*, is the reduced feature space, the result of PCA.

```
print("Sonuc: ", data.dot(w[:, :k]))
# Get the dot product of w with the first K columns of eigenvector matrix
# (a.k.a) projection matrix
```

Which also results in:

```
[[-0.55901699]
[ 0.55901699]]
```

This is it. Any corrections or suggestions are welcome.

### Extras

Square vs. Non-Square: Eigendecomposition and SVD

Original Turkish article at

## CPP Eigen ve Python Numpy ile PCA Gerçeklemesi - Sıddık Açıl - Medium

### Sıddık Açıl ・ 4 min read

My go to article on PCA