PCA From Scratch: Compress Data, Keep the Signal

#machinelearning #ai #beginners #datascience

Got 100 features and can't visualize them or train quickly? PCA finds the few directions that carry most of the variance and discards the rest, and you can watch it happen on a 2D scatter plot. Here's PCA computed for real in the browser.

📉 See the principal axes (drag the correlation slider): https://dev48v.infy.uk/ml/day14-pca.html

The idea: rotate to where the variance is

PCA finds new axes, called principal components, that point along the directions in which your data spreads the most. PC1 is the direction of greatest variance, PC2 is the direction of greatest remaining variance perpendicular to PC1, and so on. Keep the top few and you've compressed the data while losing as little variance as that number of axes allows.
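To make "the direction your data spreads the most" concrete, here is a rough brute-force sketch in plain Python. It is not how PCA is normally computed: it just tries every whole-degree direction and measures the variance of the points projected onto it. The toy data, the `variance_along` helper, and the 0.8 slope are all made up for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: y roughly follows x, so the cloud stretches along a diagonal.
points = []
for _ in range(500):
    x = random.gauss(0, 1)
    y = 0.8 * x + random.gauss(0, 0.3)
    points.append((x, y))

def variance_along(points, angle_deg):
    """Sample variance of the points after projecting onto a unit vector at angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [x * ux + y * uy for x, y in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / (len(proj) - 1)

# Try every whole-degree direction in [0, 180) and keep the one with the most spread.
best_angle = max(range(180), key=lambda a: variance_along(points, a))
perp_angle = (best_angle + 90) % 180

print("direction of max variance (deg):", best_angle)
print("variance along it:", round(variance_along(points, best_angle), 3))
print("variance perpendicular to it:", round(variance_along(points, perp_angle), 3))
```

The winning direction plays the role of PC1 and the perpendicular one plays the role of PC2. The sweep only works because there is a single angle to search in 2D, which is why real PCA uses eigenvectors instead.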

How it's actually computed

1. Center the data (subtract each feature's mean).
2. Build the covariance matrix (how each pair of features varies together).
3. Find its eigenvectors (the component directions) and eigenvalues (how much variance lies along each direction).
4. Sort by eigenvalue (largest first), keep the top k eigenvectors, and project the centered data onto them.
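The steps above can be sketched in plain Python for the 2D case the demo shows. This is a minimal sketch, not the demo's actual code: the names `pca_2d` and `project` and the five sample points are hypothetical, and the eigen step uses the closed-form solution for a symmetric 2x2 matrix, which does not extend to more features.

```python
import math

def pca_2d(points):
    """PCA for 2D points. Returns (mean, [(eigenvalue, unit_vector), ...]), largest first."""
    n = len(points)

    # Step 1: center the data.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Step 2: sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues and eigenvectors (closed form, symmetric 2x2 only).
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the PC1 axis
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))     # perpendicular to PC1

    # Step 4a: already sorted, because lam1 >= lam2 by construction.
    return (mx, my), [(lam1, pc1), (lam2, pc2)]

def project(points, mean, components, k):
    """Step 4b: coordinates of each point along the top-k components."""
    mx, my = mean
    top = [vec for _, vec in components[:k]]
    return [tuple((x - mx) * ux + (y - my) * uy for ux, uy in top) for x, y in points]

# Hypothetical sample points lying close to a diagonal line.
points = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]
mean, components = pca_2d(points)
scores = project(points, mean, components, k=1)

for lam, vec in components:
    print("eigenvalue", round(lam, 4), "direction", tuple(round(v, 3) for v in vec))
print("1D scores:", [round(s[0], 3) for s in scores])
```

With more than two features you would replace the closed-form step with a general symmetric eigen-solver (for example `numpy.linalg.eigh`); the other steps stay the same.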

The demo draws PC1 and PC2 through the cloud and shows the "variance explained" percentages (each eigenvalue divided by the sum of all eigenvalues). Collapse to PC1 and watch how much of the variance survives.
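If you want to reproduce the slider experiment offline, here is a small sketch in plain Python. The `make_cloud` generator is a hypothetical stand-in for whatever the demo does internally: it draws two unit-variance features with a chosen correlation. `explained_by_pc1` then computes the share of variance held by PC1, using the closed-form eigenvalues of the 2x2 covariance matrix.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def make_cloud(correlation, n=400):
    """Hypothetical slider stand-in: 2D points whose features have this correlation."""
    pts = []
    for _ in range(n):
        x = random.gauss(0, 1)
        noise = random.gauss(0, 1)
        y = correlation * x + math.sqrt(1 - correlation ** 2) * noise
        pts.append((x, y))
    return pts

def explained_by_pc1(points):
    """Fraction of the total variance that lies along PC1 (2D points only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    return lam1 / (lam1 + lam2)

ratios = {}
for rho in (0.0, 0.5, 0.9, 0.99):
    ratios[rho] = explained_by_pc1(make_cloud(rho))
    print(f"correlation {rho:4}: PC1 explains {100 * ratios[rho]:.1f}% of the variance")
```

For two unit-variance features with correlation ρ, the exact covariance matrix has eigenvalues 1 + |ρ| and 1 − |ρ|, so PC1's share is (1 + |ρ|) / 2. The sampled numbers should land near that: about half when the features are uncorrelated, and close to all of it when they are nearly collinear.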