bitanath
Principal Components in TypeScript (Part 2)

This is part two of the series Principal Components in TypeScript.

TL;DR

If you need a TL;DR, just read the code here:
https://www.npmjs.com/package/pca-js?activeTab=code


Not a Code Blog

This is not a code blog. There’s no easy copy-paste solution here.
If that’s what you want, go straight to the source code above.


Step 1: Normalize the Data

Our first step is to normalize data by subtracting the mean.

An elegant way to do this:

  • Create a square matrix of all ones (a "unit" square matrix), with one row per data row
  • Multiply it with the data
  • Scale by 1 / number_of_rows

This gives you the mean matrix, which you subtract from the original data.

This is formally known as the Deviation Matrix.

Example Code

let unit = unitSquareMatrix(matrix.length);
let deviationMatrix = subtract(matrix, multiplyAndScale(unit, matrix, 1 / matrix.length));
const D = deviationMatrix;

Where multiplyAndScale fuses matrix multiplication and scaling:

/**
 * Fix for #11, OOM on moderately large datasets, fuses scale and multiply into a single operation to save memory
 */
export function multiplyAndScale(a: Matrix, b: Matrix, factor: number): Matrix {
    assertValidMatrices(a, b, "a", "b")

    const aRows = a.length;
    const aCols = a[0].length;
    const bCols = b[0].length;

    const flat = new Float64Array(aRows * bCols);

    for (let i = 0; i < aRows; i++) {
        const iOffset = i * bCols; // row offset is constant per i, hoist it out of the inner loops
        for (let k = 0; k < aCols; k++) {
            const aVal = a[i][k] * factor;
            for (let j = 0; j < bCols; j++) {
                flat[iOffset + j] += aVal * b[k][j];
            }
        }
    }

    const result: Matrix = [];
    for (let i = 0; i < aRows; i++) {
        result[i] = Array.from(flat.subarray(i * bCols, (i + 1) * bCols));
    }
    return result;
}

But… Is This Optimal?

Not really.

  • Triple loop → O(n³) worst case
  • Can still cause memory issues

A simpler (and cheaper) approach: precompute the column means once, then subtract:

const means = matrix[0].map((_, i) =>
  matrix.reduce((s, r) => s + r[i], 0) / matrix.length
);
matrix.map(row => row.map((v, i) => v - means[i]));

The library prefers elegance over optimization, assuming eventual GPU acceleration.


Step 2: Deviation Scores

Next step:

Dᵀ @ D
  • Multiply transpose of deviation matrix with itself
  • @ = matrix multiplication (Python notation)

Then divide by number of rows → Variance-Covariance Matrix
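The step above can be sketched directly in TypeScript. Helper names like `transpose` and `covariance` are illustrative here, not the library's API:

```typescript
type Matrix = number[][];

// Transpose of an m×n matrix.
function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map(row => row[j]));
}

// Plain matrix product (a sketch; no fusing like the library's multiplyAndScale).
function multiply(a: Matrix, b: Matrix): Matrix {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0))
  );
}

// Variance-covariance matrix: (Dᵀ · D) / n, where D is the deviation matrix.
function covariance(D: Matrix): Matrix {
  const n = D.length;
  return multiply(transpose(D), D).map(row => row.map(v => v / n));
}
```

For a deviation matrix D of shape n×d, this yields the d×d variance-covariance matrix described below.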


Why Are We Doing This?

Because raw data is useless for analysis.

We reshape it into a form that:

  • Has structure
  • Encodes relationships
  • Can be mathematically decomposed

Variance-Covariance Matrix

For 3 features:

[
  [var(f1),    cov(f1,f2), cov(f1,f3)],
  [cov(f2,f1), var(f2),    cov(f2,f3)],
  [cov(f3,f1), cov(f3,f2), var(f3)]
]
  • Diagonal → variance
  • Off-diagonal → covariance

What is SVD?

Singular Value Decomposition:

A = U · Σ · Vᵀ
  • U → row characteristics
  • V → column characteristics
  • Σ (Sigma) → importance (weights)

For PCA:

  • We care about columns
  • So:

    • V → eigenvectors
    • Σ → eigenvalues
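Libraries compute these with a full SVD routine. As an illustrative alternative (a sketch, not the library's method), power iteration finds the dominant eigenvector of a symmetric matrix such as the covariance matrix, i.e. the first principal component:

```typescript
type Matrix = number[][];

// Power iteration: repeatedly apply the matrix and renormalize. For a
// symmetric matrix this converges to the eigenvector with the largest
// eigenvalue. Deterministic all-ones start vector keeps the sketch simple.
function dominantEigenpair(m: Matrix, iterations = 100): { value: number; vector: number[] } {
  let v = m.map(() => 1);
  for (let t = 0; t < iterations; t++) {
    const w = m.map(row => row.reduce((s, x, j) => s + x * v[j], 0));
    const norm = Math.hypot(...w);
    v = w.map(x => x / norm);
  }
  // Rayleigh quotient vᵀ · M · v recovers the eigenvalue.
  const mv = m.map(row => row.reduce((s, x, j) => s + x * v[j], 0));
  const value = v.reduce((s, x, i) => s + x * mv[i], 0);
  return { value, vector: v };
}
```

For the symmetric matrix [[2, 1], [1, 2]] this converges to eigenvalue 3 with eigenvector proportional to [1, 1].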

What Are Eigenvectors?

“Eigen” = German for “own” (yes really)

Think of eigenvectors as:

  • Characteristic directions in your data

Eigenvalues:

  • Tell you how important each direction is
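That definition boils down to A · v = λ · v: the matrix only stretches an eigenvector, never rotates it. A quick numeric sanity check with a hand-picked example:

```typescript
// For the diagonal matrix [[2, 0], [0, 3]], the vector [0, 1] is an
// eigenvector with eigenvalue 3: applying A just scales it by 3.
const A = [[2, 0], [0, 3]];
const v = [0, 1];
const lambda = 3;

// Apply A to v.
const Av = A.map(row => row.reduce((s, x, j) => s + x * v[j], 0));

// Check A·v equals λ·v component-wise.
const isEigen = Av.every((x, i) => Math.abs(x - lambda * v[i]) < 1e-12);
```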

Choosing Components

Sort eigenvalues in descending order.

Then:

percentage_explained = Σ(selected eigenvalues) / Σ(all eigenvalues)
  • Pick top 1 → decent approximation
  • Pick top 2 → better
  • Pick all → original data

Usually:

  • The first component often explains the bulk of the variance (a Pareto-style 80/20 split)
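That percentage is a one-liner to compute. A minimal sketch, assuming the eigenvalues are already sorted in descending order:

```typescript
// Fraction of total variance explained by the top-k eigenvalues.
function explainedVariance(eigenvalues: number[], k: number): number {
  const total = eigenvalues.reduce((s, v) => s + v, 0);
  const selected = eigenvalues.slice(0, k).reduce((s, v) => s + v, 0);
  return selected / total;
}
```

With the eigenvalues from the worked example later in this post, [520.1, 78.1, 18.5], keeping only the first component gives 520.1 / 616.7 ≈ 84.3%.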

Compressed Data

compressed = selected_eigenvectors × centered_dataᵀ

Example:

  • Original: 4 columns × 30 rows
  • Reduced: 2 columns × 30 rows

Now scale that to millions of rows 👀
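The projection itself can be sketched as below, with the selected eigenvectors stored one per row (`compress` is an illustrative name, not the library's API):

```typescript
type Matrix = number[][];

// Project centered data onto the selected eigenvectors: compressed = E · Dᵀ.
// E is k×d (one eigenvector per row), centered is n×d; the result is k×n.
function compress(eigenvectors: Matrix, centered: Matrix): Matrix {
  return eigenvectors.map(e =>
    // Each output row holds one component's score for every data row.
    centered.map(row => row.reduce((s, v, j) => s + v * e[j], 0))
  );
}
```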


Reconstructing Data

original ≈ selected_eigenvectorsᵀ × compressed + mean
  • This is lossy compression
  • Some information is gone forever
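A sketch of the reconstruction, assuming the same row-per-eigenvector layout as the compression step (`reconstruct` is an illustrative name):

```typescript
type Matrix = number[][];

// Lossy reconstruction: Eᵀ · compressed, then add the column means back.
// When fewer components than columns are kept, some detail is gone for good.
function reconstruct(eigenvectors: Matrix, compressed: Matrix, means: number[]): Matrix {
  const k = eigenvectors.length;   // components kept
  const n = compressed[0].length;  // data rows
  const d = eigenvectors[0].length; // original columns
  const out: Matrix = [];
  for (let i = 0; i < n; i++) {
    const row = new Array(d).fill(0);
    for (let c = 0; c < k; c++) {
      for (let j = 0; j < d; j++) {
        row[j] += eigenvectors[c][j] * compressed[c][i];
      }
    }
    out.push(row.map((v, j) => v + means[j]));
  }
  return out;
}
```

Keeping all components round-trips the data exactly; dropping components is where the loss comes from.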

What Did We Actually Achieve?

Compression… but let’s make it real.


Example: Student Scores

You’re a teacher. Three exams, varying difficulty.

Raw Data

Student   Exam 1   Exam 2   Exam 3
1         40       50       60
2         50       70       60
3         80       70       90
4         50       60       80

Means

Exam 1   Exam 2   Exam 3
55       62.5     72.5
  • Exam 1 = hardest
  • Student 3 dominates

Simple Averages

Student   Avg Score
1         50.00
2         60.00
3         80.00
4         63.33

Problem:

  • Student 2 vs 4 is unclear

PCA Results

PC    Eigenvalue   Eigenvector            % Variance
PC1   520.1        [0.74, 0.28, 0.60]     84.3%
PC2   78.1         [0.23, 0.74, -0.63]    12.7%
PC3   18.5         [0.63, -0.61, -0.48]   3.0%

We take PC1 (Pareto win).


Construct New Score

Score = (Exam1 × 0.74) + (Exam2 × 0.28) + (Exam3 × 0.60)

Interpretation:

  • A combined "true ability" score
  • Capturing ~84.3% of the variance in the raw marks

Final Scores

Student   Score
1         31
2         44
3         84
4         53

Now:

  • Student 4 > Student 2 (consistency across difficulty)
  • Student 3 = clearly top

Reality Check

Yes… this is a bit hand-wavy.

That’s why PCA is mainly used for:

  • Dimensionality reduction
  • Not deep semantic interpretation

Example issue:

  • Physics vs Chemistry vs Math
  • Hard to interpret combined score meaningfully

Where This Goes Next

Instead of:

  • Collapsing variables blindly

We move toward:

  • Interpretable components
  • Naming eigenvectors based on weights

What’s Next?

Before that… we go to Part 3.

We’ll answer:

What is a neural network really doing?

If GPT = brain
Then ConvNet = eyes 👀


Outro

See you in Part 3…

Or don’t. idc 😄
