Got 100 features and can't visualize them or train quickly? PCA finds the few directions that carry most of the variance (which is what "information" means to PCA) and throws away the rest, and you can watch it happen on a 2D scatter. Here's PCA computed for real in the browser.
📉 See the principal axes (drag the correlation slider): https://dev48v.infy.uk/ml/day14-pca.html
The idea: rotate to where the variance is
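PCA finds new axes, called principal components, that point along the directions in which your data spreads the most. PC1 is the direction of greatest variance, PC2 is the direction of greatest remaining variance perpendicular to PC1, and so on. Keep the top few and you've compressed the data while losing as little variance as any linear projection to that many dimensions can.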
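To make "the direction your data spreads the most" concrete, here is a rough sketch that tries every direction in 2D (in 1-degree steps), measures the variance of the data projected onto each, and compares the winner with PC1 from an eigen-decomposition. The toy data, the seed, and the variable names are assumptions made up for illustration; they are not taken from the demo page.

```python
import numpy as np

# Hypothetical toy data: two correlated features.
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
X = np.column_stack([x, 0.6 * x + 0.5 * rng.normal(size=2000)])
Xc = X - X.mean(axis=0)  # center first

# Unit vectors at 0..180 degrees (a direction and its opposite are equivalent).
angles = np.linspace(0, np.pi, 181)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])

# Variance of the data projected onto each candidate direction.
variances = (Xc @ dirs.T).var(axis=0, ddof=1)
best = dirs[np.argmax(variances)]

# PC1 from the covariance matrix: eigh returns eigenvalues in ascending
# order, so the last eigenvector belongs to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]

# |cosine| close to 1 means the brute-force direction lines up with PC1.
print(abs(best @ pc1))
```

The brute-force search only works in 2D, which is why the real computation goes through eigenvectors instead.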
How it's actually computed
- Center the data (subtract the mean).
- Build the covariance matrix (how features vary together).
- Find its eigenvectors (the component directions) and eigenvalues (how much variance each holds).
- Sort by eigenvalue, keep the top-k, project onto them.
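The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the demo page: the function name `pca`, the toy data, and the seed are all assumptions.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    # 1. Center: subtract each feature's mean.
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (n_features x n_features).
    cov = np.cov(Xc, rowvar=False)
    # 3. Eigenvectors and eigenvalues. eigh is meant for symmetric matrices
    #    and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue (descending), keep the top k, project.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]
    return Xc @ components, components, eigvals

# Hypothetical toy data: 500 points, 2 correlated features.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])

Z, components, eigvals = pca(X, k=1)
print(Z.shape)  # (500, 1)
```

A handy sanity check: the variance of the projected data along each kept component equals that component's eigenvalue.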
The demo draws PC1/PC2 through the cloud and shows the "variance explained" percentages — collapse to PC1 and watch how much information survives.
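Here is one way the "variance explained" numbers and the collapse to PC1 could be computed. It is a sketch under assumptions (made-up data, made-up variable names) and not necessarily how the demo does it: each eigenvalue divided by the sum of all eigenvalues gives that component's share, and collapsing to PC1 means projecting onto it and mapping back to 2D.

```python
import numpy as np

# Hypothetical toy data: two strongly correlated features.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
X = np.column_stack([x, 0.9 * x + 0.2 * rng.normal(size=1000)])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first

# Share of total variance held by each component.
explained = eigvals / eigvals.sum()
print(np.round(explained * 100, 1))  # percentages for PC1, PC2

# Collapse to PC1: project onto it, then map back into the original 2D space.
pc1 = eigvecs[:, :1]
X_flat = (Xc @ pc1) @ pc1.T

# Fraction of the total spread lost by the collapse.
lost = np.sum((Xc - X_flat) ** 2) / np.sum(Xc ** 2)
print(round(lost * 100, 1))
```

The fraction lost matches PC2's share of the variance, which is exactly what dropping PC2 should cost.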
When to reach for it
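Visualization (high-dim → 2D), compression, denoising, speeding up training. Scale your features first, because PCA ranks directions by variance and a feature measured in large units will otherwise dominate. And remember it only captures linear structure.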
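A small experiment shows why scaling matters. The two features below (a height in metres and an income) are hypothetical and generated independently, so neither deserves to dominate. The helper `first_component` is likewise an illustrative name.

```python
import numpy as np

# Hypothetical, independent features on very different scales.
rng = np.random.default_rng(2)
n = 1000
height_m = rng.normal(1.7, 0.1, size=n)
income = rng.normal(50_000, 10_000, size=n)
X = np.column_stack([height_m, income])

def first_component(X):
    """Return PC1 and its share of the total variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # eigh sorts ascending, so the last entry is the largest.
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

# Raw data: PC1 is essentially the income axis.
pc1_raw, share_raw = first_component(X)

# Standardized data: zero mean and unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
pc1_std, share_std = first_component(X_std)

print(np.round(np.abs(pc1_raw), 3), round(share_raw, 3))
print(np.round(np.abs(pc1_std), 3), round(share_std, 3))
```

On the raw data PC1 points almost entirely along income simply because its numbers are bigger. After standardizing, both features contribute equally and PC1's share drops to roughly half, which is the honest answer for two unrelated features.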
🔨 Built from scratch (center → covariance → eigen → project) on the page: https://dev48v.infy.uk/ml/day14-pca.html
Part of MachineLearningFromZero. 🌐 https://dev48v.infy.uk