This is part two of a series on Principal Components in TypeScript.
TL;DR
If you need a TL;DR, just read the code here:
https://www.npmjs.com/package/pca-js?activeTab=code
Not a Code Blog
This is not a code blog. There’s no easy copy-paste solution here.
If that’s what you want, go straight to the source code above.
Step 1: Normalize the Data
Our first step is to normalize the data by subtracting the mean.
An elegant way to do this:
- Create a unit square matrix
- Multiply it with the data
- Scale by `1 / number_of_rows`
This gives you the mean matrix, which you subtract from the original data.
This is formally known as the Deviation Matrix.
Example Code

```typescript
let unit = unitSquareMatrix(matrix.length);
let deviationMatrix = subtract(matrix, multiplyAndScale(unit, matrix, 1 / matrix.length));
const D = deviationMatrix;
```
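The library's helpers aren't shown here, so here is a self-contained sketch of the same trick, with stand-in implementations of `unitSquareMatrix`, `multiply`, and `subtract` (illustrative names, not the library's actual internals):

```typescript
type Matrix = number[][];

// n×n matrix of all ones
function unitSquareMatrix(n: number): Matrix {
  return Array.from({ length: n }, () => new Array(n).fill(1));
}

// Naive matrix product (no fusing, just for illustration)
function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

function subtract(a: Matrix, b: Matrix): Matrix {
  return a.map((row, i) => row.map((v, j) => v - b[i][j]));
}

const X: Matrix = [[1, 2], [3, 4], [5, 6]];
const n = X.length;

// (1/n) · ones(n) @ X puts each column's mean in every row
const means = multiply(unitSquareMatrix(n), X).map(r => r.map(v => v / n));
const D = subtract(X, means); // the deviation matrix
// Column means of X are [3, 4], so D = [[-2,-2],[0,0],[2,2]]
```

Every row of `means` is identical — that is exactly why multiplying by the all-ones matrix works.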
Where multiplyAndScale fuses matrix multiplication and scaling:
```typescript
/**
 * Fix for #11, OOM on moderately large datasets; fuses scale and multiply
 * into a single operation to save memory.
 */
export function multiplyAndScale(a: Matrix, b: Matrix, factor: number): Matrix {
  assertValidMatrices(a, b, "a", "b");
  const aRows = a.length;
  const aCols = a[0].length;
  const bCols = b[0].length;
  const flat = new Float64Array(aRows * bCols);
  for (let i = 0; i < aRows; i++) {
    for (let k = 0; k < aCols; k++) {
      const aVal = a[i][k] * factor;
      const iOffset = i * bCols;
      for (let j = 0; j < bCols; j++) {
        flat[iOffset + j] += aVal * b[k][j];
      }
    }
  }
  const result: Matrix = [];
  for (let i = 0; i < aRows; i++) {
    result[i] = Array.from(flat.subarray(i * bCols, (i + 1) * bCols));
  }
  return result;
}
```
But… Is This Optimal?
Not really.
- Triple loop → O(n³) worst case
- Can still cause memory issues
A simpler approach:
```typescript
matrix.map(row =>
  row.map((v, i) => v - matrix.reduce((s, r) => s + r[i], 0) / matrix.length)
);
```
The library prefers elegance over optimization, assuming eventual GPU acceleration.
Step 2: Deviation Scores
Next step:
Dᵀ @ D
- Multiply the transpose of the deviation matrix with itself
- `@` = matrix multiplication (Python notation)

Then divide by the number of rows → Variance-Covariance Matrix
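Assuming plain `number[][]` matrices and the same illustrative helpers as before (not the library's API), this step might look like:

```typescript
type Matrix = number[][];

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}

function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// D is an already-centered deviation matrix (rows = samples, cols = features)
function covarianceMatrix(D: Matrix): Matrix {
  const n = D.length;
  // Dᵀ @ D, then divide by the number of rows
  return multiply(transpose(D), D).map(row => row.map(v => v / n));
}

const D: Matrix = [[-2, -2], [0, 0], [2, 2]];
const C = covarianceMatrix(D);
// C is symmetric: variances on the diagonal, covariances off it
```

(Statistics texts often divide by `n - 1` for the sample covariance; dividing by `n` as the post does gives the population covariance. The eigenvectors come out the same either way, since it's a uniform scaling.)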
Why Are We Doing This?
Because raw data is useless for analysis.
We reshape it into a form that:
- Has structure
- Encodes relationships
- Can be mathematically decomposed
Variance-Covariance Matrix
For 3 features:
```
[
  [var(f1),    cov(f1,f2), cov(f1,f3)],
  [cov(f2,f1), var(f2),    cov(f2,f3)],
  [cov(f3,f1), cov(f3,f2), var(f3)]
]
```
- Diagonal → variance
- Off-diagonal → covariance
What is SVD?
Singular Value Decomposition factors a matrix A as:

A = U · Σ · Vᵀ
- U → row characteristics
- V → column characteristics
- Σ (Sigma) → importance (weights)
For PCA:
- We care about columns

So:
- V → eigenvectors
- Σ → eigenvalues
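A full SVD implementation is beyond a blog snippet, but if all you need is the dominant eigenvector of the covariance matrix (i.e. PC1), power iteration is a compact alternative. A sketch — not how the library does it:

```typescript
type Matrix = number[][];

// Power iteration: repeatedly multiply a vector by the matrix and
// renormalize. For a symmetric positive semi-definite matrix like a
// covariance matrix, it converges to the top eigenvector.
function dominantEigen(C: Matrix, iters = 200): { value: number; vector: number[] } {
  let v: number[] = new Array(C.length).fill(1); // arbitrary non-zero start
  for (let t = 0; t < iters; t++) {
    const w = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.hypot(...w);
    v = w.map(x => x / norm);
  }
  // Rayleigh quotient vᵀCv gives the corresponding eigenvalue
  const Cv = C.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
  const value = v.reduce((s, vi, i) => s + vi * Cv[i], 0);
  return { value, vector: v };
}

// [[2, 0], [0, 1]] has eigenvalues 2 and 1; the dominant eigenvector is [±1, 0]
const { value, vector } = dominantEigen([[2, 0], [0, 1]]);
```

For the remaining components you would deflate the matrix and repeat, or fall back to a proper SVD routine.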
What Are Eigenvectors?
“Eigen” = German for “own” (yes really)
Think of eigenvectors as:
- Characteristic directions in your data
Eigenvalues:
- Tell you how important each direction is
Choosing Components
Sort eigenvalues in descending order.
Then:
percentage_explained = Σ(selected eigenvalues) / Σ(all eigenvalues)
- Pick top 1 → decent approximation
- Pick top 2 → better
- Pick all → original data
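The formula above is a one-liner to compute. A sketch, using the eigenvalues from the student example later in this post:

```typescript
// Fraction of total variance explained by the top k eigenvalues
function explainedVariance(eigenvalues: number[], k: number): number {
  const sorted = [...eigenvalues].sort((a, b) => b - a); // descending
  const total = sorted.reduce((s, v) => s + v, 0);
  const selected = sorted.slice(0, k).reduce((s, v) => s + v, 0);
  return selected / total;
}

// Eigenvalues 520.1, 78.1, 18.5 (order doesn't matter; we sort inside)
const ratio = explainedVariance([78.1, 520.1, 18.5], 1);
// ratio ≈ 0.843, i.e. PC1 alone explains ~84.3% of the variance
```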
Often:
- The first component explains the bulk of the variance (think Pareto principle)
Compressed Data
compressed = selected_eigenvectors × centered_dataᵀ
Example:
- Original: 4 columns × 30 rows
- Reduced: 2 columns × 30 rows
Now scale that to millions of rows 👀
Reconstructing Data
original ≈ selected_eigenvectorsᵀ × compressed + mean
- This is lossy compression
- Some information is gone forever
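Both directions can be sketched under the same plain-matrix assumptions as before (selected eigenvectors as rows, `k × d`; centered data `n × d`). Keeping all eigenvectors makes the round trip exact, which is a handy sanity check:

```typescript
type Matrix = number[][];

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}

function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// E: k selected eigenvectors as rows (k × d), D: centered data (n × d)
function compress(E: Matrix, D: Matrix): Matrix {
  return multiply(E, transpose(D)); // k × n
}

function reconstruct(E: Matrix, compressed: Matrix, means: number[]): Matrix {
  // Eᵀ @ compressed is d × n; transpose back to n × d and re-add the means
  return transpose(multiply(transpose(E), compressed)).map(row =>
    row.map((v, j) => v + means[j])
  );
}

// With ALL eigenvectors kept the reconstruction is exact; identity
// "eigenvectors" here just to keep the demo checkable by hand
const E: Matrix = [[1, 0], [0, 1]];
const D: Matrix = [[-2, -2], [0, 0], [2, 2]];
const back = reconstruct(E, compress(E, D), [3, 4]);
// back === [[1, 2], [3, 4], [5, 6]]
```

Drop rows from `E` and the same round trip becomes lossy — that's the compression.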
What Did We Actually Achieve?
Compression… but let’s make it real.
Example: Student Scores
You’re a teacher. Three exams, varying difficulty.
Raw Data
| Student | Exam 1 | Exam 2 | Exam 3 |
|---|---|---|---|
| 1 | 40 | 50 | 60 |
| 2 | 50 | 70 | 60 |
| 3 | 80 | 70 | 90 |
| 4 | 50 | 60 | 80 |
Means
| Exam 1 | Exam 2 | Exam 3 |
|---|---|---|
| 55 | 62.5 | 72.5 |
- Exam 1 = hardest
- Student 3 dominates
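Those column means are easy to verify:

```typescript
// Raw score table from above: rows = students, columns = exams
const scores = [
  [40, 50, 60],
  [50, 70, 60],
  [80, 70, 90],
  [50, 60, 80],
];

// Mean of each column
const means = scores[0].map((_, j) =>
  scores.reduce((s, row) => s + row[j], 0) / scores.length
);
// means → [55, 62.5, 72.5], matching the table
```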
Simple Averages
| Student | Avg Score |
|---|---|
| 1 | 50.00 |
| 2 | 60.00 |
| 3 | 80.00 |
| 4 | 63.33 |
Problem:
- Student 2 vs 4 is unclear
PCA Results
| PC | Eigenvalue | Eigenvector | % Variance |
|---|---|---|---|
| PC1 | 520.1 | [0.74, 0.28, 0.60] | 84.3% |
| PC2 | 78.1 | [0.23, 0.74, -0.63] | 12.7% |
| PC3 | 18.5 | [0.63, -0.61, -0.48] | 3.0% |
We take PC1 (Pareto win).
Construct New Score
Score = (Exam1 × 0.74) + (Exam2 × 0.28) + (Exam3 × 0.60)
Interpretation:
- A combined "true ability" score along the main direction of variation
- It captures ~84.3% of the total variance (not "84.3% accurate" in a predictive sense)
Final Scores
| Student | Score |
|---|---|
| 1 | 31 |
| 2 | 44 |
| 3 | 84 |
| 4 | 53 |
Now:
- Student 4 > Student 2 (consistency across difficulty)
- Student 3 = clearly top
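You can sanity-check this ranking with the PC1 weights. Using raw (uncentered) scores here, the absolute numbers won't match the table above — they depend on centering and scaling — but the ordering comes out the same:

```typescript
const scores = [
  [40, 50, 60], // student 1
  [50, 70, 60], // student 2
  [80, 70, 90], // student 3
  [50, 60, 80], // student 4
];
const pc1 = [0.74, 0.28, 0.60]; // PC1 eigenvector from the table

// Weighted sum of each student's exam scores
const combined = scores.map(row =>
  row.reduce((s, v, j) => s + v * pc1[j], 0)
);
// Student 3 comes out on top, and student 4 beats student 2
```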
Reality Check
Yes… this is a bit hand-wavy.
That’s why PCA is mainly used for:
- Dimensionality reduction
- Not deep semantic interpretation
Example issue:
- Physics vs Chemistry vs Math
- Hard to interpret combined score meaningfully
Where This Goes Next
Instead of:
- Collapsing variables blindly
We move toward:
- Interpretable components
- Naming eigenvectors based on weights
What’s Next?
Before that… we go to Part 3.
We’ll answer:
What is a neural network really doing?
If GPT = brain
Then ConvNet = eyes 👀
Outro
See you in Part 3…
Or don’t. idc 😄