This is part two of a series on Principal Components in TypeScript.
TL;DR
If you need a TL;DR, just read the code here:
https://www.npmjs.com/package/pca-js?activeTab=code
Not a Code Blog
This is not a code blog. There’s no easy copy-paste solution here.
If that’s what you want, go straight to the source code above.
Step 1: Normalize the Data
Our first step is to normalize the data by subtracting the mean.
An elegant way to do this:
- Create a unit square matrix
- Multiply it with the data
- Scale by `1 / number_of_rows`
This gives you the mean matrix, which you subtract from the original data.
This is formally known as the Deviation Matrix.
Example Code

```typescript
let unit = unitSquareMatrix(matrix.length);
let deviationMatrix = subtract(matrix, multiplyAndScale(unit, matrix, 1 / matrix.length));
const D = deviationMatrix;
```
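The library's helpers aren't shown here, so here is a self-contained sketch of the same trick, with stand-in implementations of `unitSquareMatrix`, `multiply`, and `subtract` (illustrative names, not the library's actual internals):

```typescript
type Matrix = number[][];

// n×n matrix of all ones
function unitSquareMatrix(n: number): Matrix {
  return Array.from({ length: n }, () => new Array(n).fill(1));
}

// Naive matrix product (no fusing, just for illustration)
function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

function subtract(a: Matrix, b: Matrix): Matrix {
  return a.map((row, i) => row.map((v, j) => v - b[i][j]));
}

const X: Matrix = [[1, 2], [3, 4], [5, 6]];
const n = X.length;

// (1/n) · ones(n) @ X puts each column's mean in every row
const means = multiply(unitSquareMatrix(n), X).map(r => r.map(v => v / n));
const D = subtract(X, means); // the deviation matrix
// Column means of X are [3, 4], so D = [[-2,-2],[0,0],[2,2]]
```

Every row of `means` is identical — that is exactly why multiplying by the all-ones matrix works.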
Where multiplyAndScale fuses matrix multiplication and scaling:
```typescript
/**
 * Fix for #11, OOM on moderately large datasets; fuses scale and multiply
 * into a single operation to save memory.
 */
export function multiplyAndScale(a: Matrix, b: Matrix, factor: number): Matrix {
  assertValidMatrices(a, b, "a", "b");
  const aRows = a.length;
  const aCols = a[0].length;
  const bCols = b[0].length;
  const flat = new Float64Array(aRows * bCols);
  for (let i = 0; i < aRows; i++) {
    for (let k = 0; k < aCols; k++) {
      const aVal = a[i][k] * factor;
      const iOffset = i * bCols;
      for (let j = 0; j < bCols; j++) {
        flat[iOffset + j] += aVal * b[k][j];
      }
    }
  }
  const result: Matrix = [];
  for (let i = 0; i < aRows; i++) {
    result[i] = Array.from(flat.subarray(i * bCols, (i + 1) * bCols));
  }
  return result;
}
```
But… Is This Optimal?
Not really.
- Triple loop → O(n³) worst case
- Can still cause memory issues
A simpler approach:
```typescript
matrix.map(row =>
  row.map((v, i) => v - matrix.reduce((s, r) => s + r[i], 0) / matrix.length)
);
```
The library prefers elegance over optimization, assuming eventual GPU acceleration.
Step 2: Deviation Scores
Next step:
Dᵀ @ D
- Multiply the transpose of the deviation matrix with itself
- `@` = matrix multiplication (Python notation)

Then divide by the number of rows → Variance-Covariance Matrix
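Assuming plain `number[][]` matrices and the same illustrative helpers as before (not the library's API), this step might look like:

```typescript
type Matrix = number[][];

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}

function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// D is an already-centered deviation matrix (rows = samples, cols = features)
function covarianceMatrix(D: Matrix): Matrix {
  const n = D.length;
  // Dᵀ @ D, then divide by the number of rows
  return multiply(transpose(D), D).map(row => row.map(v => v / n));
}

const D: Matrix = [[-2, -2], [0, 0], [2, 2]];
const C = covarianceMatrix(D);
// C is symmetric: variances on the diagonal, covariances off it
```

(Statistics texts often divide by `n - 1` for the sample covariance; dividing by `n` as the post does gives the population covariance. The eigenvectors come out the same either way, since it's a uniform scaling.)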
Why Are We Doing This?
Because raw data is useless for analysis.
We reshape it into a form that:
- Has structure
- Encodes relationships
- Can be mathematically decomposed
Variance-Covariance Matrix
For 3 features:
```
[
  [var(f1),    cov(f1,f2), cov(f1,f3)],
  [cov(f2,f1), var(f2),    cov(f2,f3)],
  [cov(f3,f1), cov(f3,f2), var(f3)]
]
```
- Diagonal → variance
- Off-diagonal → covariance
What is SVD?
Singular Value Decomposition factors a matrix A as:

A = U · Σ · Vᵀ
- U → row characteristics
- V → column characteristics
- Σ (Sigma) → importance (weights)
For PCA:
- We care about columns

So:
- V → eigenvectors
- Σ → eigenvalues
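A full SVD implementation is beyond a blog snippet, but if all you need is the dominant eigenvector of the covariance matrix (i.e. PC1), power iteration is a compact alternative. A sketch — not how the library does it:

```typescript
type Matrix = number[][];

// Power iteration: repeatedly multiply a vector by the matrix and
// renormalize. For a symmetric positive semi-definite matrix like a
// covariance matrix, it converges to the top eigenvector.
function dominantEigen(C: Matrix, iters = 200): { value: number; vector: number[] } {
  let v: number[] = new Array(C.length).fill(1); // arbitrary non-zero start
  for (let t = 0; t < iters; t++) {
    const w = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.hypot(...w);
    v = w.map(x => x / norm);
  }
  // Rayleigh quotient vᵀCv gives the corresponding eigenvalue
  const Cv = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
  const value = v.reduce((s, vi, i) => s + vi * Cv[i], 0);
  return { value, vector: v };
}

// [[2, 0], [0, 1]] has eigenvalues 2 and 1; the dominant eigenvector is [±1, 0]
const { value, vector } = dominantEigen([[2, 0], [0, 1]]);
```

For the remaining components you would deflate the matrix and repeat, or fall back to a proper SVD routine.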
What Are Eigenvectors?
“Eigen” = German for “own” (yes really)
Think of eigenvectors as:
- Characteristic directions in your data
Eigenvalues:
- Tell you how important each direction is
Choosing Components
Sort eigenvalues in descending order.
Then:
percentage_explained = Σ(selected eigenvalues) / Σ(all eigenvalues)
- Pick top 1 → decent approximation
- Pick top 2 → better
- Pick all → original data
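The formula above is a one-liner to compute. A sketch, using the eigenvalues from the student example later in this post:

```typescript
// Fraction of total variance explained by the top k eigenvalues
function explainedVariance(eigenvalues: number[], k: number): number {
  const sorted = [...eigenvalues].sort((a, b) => b - a); // descending
  const total = sorted.reduce((s, v) => s + v, 0);
  const selected = sorted.slice(0, k).reduce((s, v) => s + v, 0);
  return selected / total;
}

// Eigenvalues 520.1, 78.1, 18.5 (order doesn't matter; we sort inside)
const ratio = explainedVariance([78.1, 520.1, 18.5], 1);
// ratio ≈ 0.843, i.e. PC1 alone explains ~84.3% of the variance
```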
Often:
- The first component explains the bulk of the variance (think Pareto principle)
Compressed Data
compressed = selected_eigenvectors × centered_dataᵀ
Example:
- Original: 4 columns × 30 rows
- Reduced: 2 columns × 30 rows
Now scale that to millions of rows 👀
Reconstructing Data
original ≈ selected_eigenvectorsᵀ × compressed + mean
- This is lossy compression
- Some information is gone forever
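Both directions can be sketched under the same plain-matrix assumptions as before (selected eigenvectors as rows, `k × d`; centered data `n × d`). Keeping all eigenvectors makes the round trip exact, which is a handy sanity check:

```typescript
type Matrix = number[][];

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}

function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// E: k selected eigenvectors as rows (k × d), D: centered data (n × d)
function compress(E: Matrix, D: Matrix): Matrix {
  return multiply(E, transpose(D)); // k × n
}

function reconstruct(E: Matrix, compressed: Matrix, means: number[]): Matrix {
  // Eᵀ @ compressed is d × n; transpose back to n × d and re-add the means
  return transpose(multiply(transpose(E), compressed)).map(row =>
    row.map((v, j) => v + means[j])
  );
}

// With ALL eigenvectors kept the reconstruction is exact; identity
// "eigenvectors" here just to keep the demo checkable by hand
const E: Matrix = [[1, 0], [0, 1]];
const D: Matrix = [[-2, -2], [0, 0], [2, 2]];
const back = reconstruct(E, compress(E, D), [3, 4]);
// back === [[1, 2], [3, 4], [5, 6]]
```

Drop rows from `E` and the same round trip becomes lossy — that's the compression.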
What Did We Actually Achieve?
Compression… but let’s make it real.
Example: Student Scores
You’re a teacher. Three exams, varying difficulty.
Raw Data
| Student | Exam 1 | Exam 2 | Exam 3 |
|---|---|---|---|
| 1 | 40 | 50 | 60 |
| 2 | 50 | 70 | 60 |
| 3 | 80 | 70 | 90 |
| 4 | 50 | 60 | 80 |
Means
| Exam 1 | Exam 2 | Exam 3 |
|---|---|---|
| 55 | 62.5 | 72.5 |
- Exam 1 = hardest
- Student 3 dominates
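Those column means are easy to verify:

```typescript
// Raw score table from above: rows = students, columns = exams
const scores = [
  [40, 50, 60],
  [50, 70, 60],
  [80, 70, 90],
  [50, 60, 80],
];

// Mean of each column
const means = scores[0].map((_, j) =>
  scores.reduce((s, row) => s + row[j], 0) / scores.length
);
// means → [55, 62.5, 72.5], matching the table
```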
Simple Averages
| Student | Avg Score |
|---|---|
| 1 | 50.00 |
| 2 | 60.00 |
| 3 | 80.00 |
| 4 | 63.33 |
Problem:
- Student 2 vs 4 is unclear
PCA Results
| PC | Eigenvalue | Eigenvector | % Variance |
|---|---|---|---|
| PC1 | 520.1 | [0.74, 0.28, 0.60] | 84.3% |
| PC2 | 78.1 | [0.23, 0.74, -0.63] | 12.7% |
| PC3 | 18.5 | [0.63, -0.61, -0.48] | 3.0% |
We take PC1 (Pareto win).
Construct New Score
Score = (Exam1 × 0.74) + (Exam2 × 0.28) + (Exam3 × 0.60)
Interpretation:
- A combined "true ability" score along the main direction of variation
- It captures ~84.3% of the total variance (not "84.3% accurate" in a predictive sense)
Final Scores
| Student | Score |
|---|---|
| 1 | 31 |
| 2 | 44 |
| 3 | 84 |
| 4 | 53 |
Now:
- Student 4 > Student 2 (consistency across difficulty)
- Student 3 = clearly top
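You can sanity-check this ranking with the PC1 weights. Using raw (uncentered) scores here, the absolute numbers won't match the table above — they depend on centering and scaling — but the ordering comes out the same:

```typescript
const scores = [
  [40, 50, 60], // student 1
  [50, 70, 60], // student 2
  [80, 70, 90], // student 3
  [50, 60, 80], // student 4
];
const pc1 = [0.74, 0.28, 0.60]; // PC1 eigenvector from the table

// Weighted sum of each student's exam scores
const combined = scores.map(row =>
  row.reduce((s, v, j) => s + v * pc1[j], 0)
);
// Student 3 comes out on top, and student 4 beats student 2
```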
Reality Check
Yes… this is a bit hand-wavy.
That’s why PCA is mainly used for:
- Dimensionality reduction
- Not deep semantic interpretation
Example issue:
- Physics vs Chemistry vs Math
- Hard to interpret combined score meaningfully
Where This Goes Next
Instead of:
- Collapsing variables blindly
We move toward:
- Interpretable components
- Naming eigenvectors based on weights
What’s Next?
Before that… we go to Part 3.
We’ll answer:
What is a neural network really doing?
If GPT = brain
Then ConvNet = eyes 👀
Outro
See you in Part 3…
Or don’t. idc 😄