Principal Components in TypeScript (Part 3)

#typescript #analytics #datascience

This is part three of the series Principal Components in TypeScript. It focuses on applying PCA to visually explain what vision neural networks are looking at.

TL;DR

If you need a TL;DR, just read the code or grab the package here on npm

Not a Code Blog

This is not a code blog. There’s no easy copy-paste solution here.
If that’s what you want, go straight to the source code above.

This post takes PCA in a totally different direction from the vanilla dimensionality reduction and data compression discussed in the earlier posts. I'm going to walk you through how to use it to uncover insights from CNN features. But before that, we must answer a very important question.

Are CNNs outdated? Is this the age of Transformers 🤖

Not exactly. As ConvNeXt showed in 2022, CNNs still have a role to play in most vision tasks, and they are pretty much the only option for highly performant edge deployment on extremely compute-limited devices.

Since we have now answered the above question unconvincingly, let us figure out how to do it. A note of warning: while I have specified TypeScript here (since that is what my library is written in), we still do not have a convincing CNN solution in pure JS or TypeScript. The good news is that I am working on one, and even though it isn't close to release, I am going to leave the link to my GitHub here for posterity's sake for when it actually does release.

Channel/Self-Attention Visualization - What is it and What does it look like?

A CNN learns in terms of features, not in terms of pixels. As a three-channel image passes through the network, its width and height shrink while the number of channels grows, and each output channel responds to some learned feature. So you could feed in an image with red, green and blue channels and get 128 channels out of a single layer, each of which identifies some key feature of the image.
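To make the shapes concrete, here's a minimal TypeScript sketch of the standard convolution output-size arithmetic. The `convOutputShape` helper and its parameter names are made up for illustration, and the kernel, stride and padding values are just one plausible way to get from 3 channels to 128.

```typescript
interface Shape3D { channels: number; height: number; width: number; }
interface ConvSpec { outChannels: number; kernel: number; stride: number; padding: number; }

// Standard output size for a square-kernel convolution (dilation 1):
// out = floor((in + 2 * padding - kernel) / stride) + 1
function convOutputShape(input: Shape3D, conv: ConvSpec): Shape3D {
  const size = (n: number): number =>
    Math.floor((n + 2 * conv.padding - conv.kernel) / conv.stride) + 1;
  // The output channel count is simply the number of filters in the layer
  return { channels: conv.outChannels, height: size(input.height), width: size(input.width) };
}

const rgbImage: Shape3D = { channels: 3, height: 224, width: 224 };
const afterOneLayer = convOutputShape(rgbImage, { outChannels: 128, kernel: 3, stride: 2, padding: 1 });
console.log(afterOneLayer); // { channels: 128, height: 112, width: 112 }
```

Width and height halve while the channel count jumps from 3 to 128, which is exactly the trade described above.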

Self-attention, though, usually works in terms of pixels: every single pixel is related to every other pixel in some way. Is this really useful to a CNN? Opinions are split. What I have found in my experiments is that it adds a certain degree of training uncertainty and, while it performs better than the baseline, the gain is not much bigger than what you get from simply making the network wider. There are a lot of architectures out there that make much better use of self-attention than a CNN.
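Here's a rough sketch of what "every pixel relates to every other pixel" means in code. It is deliberately simplified: queries, keys and values are the raw pixel vectors, with none of the learned projections a real attention layer has, and `pixelSelfAttention` is an illustrative name rather than anything from a library.

```typescript
// pixels: [N][D], where N = H * W flattened positions and D = channels per position
function pixelSelfAttention(pixels: number[][]): { weights: number[][]; output: number[][] } {
  const dim = pixels[0].length;
  const scale = 1 / Math.sqrt(dim);
  // weights[i][j]: how much position i attends to position j (an N x N matrix)
  const weights = pixels.map(query => {
    const scores = pixels.map(key =>
      scale * query.reduce((acc, q, i) => acc + q * key[i], 0));
    const peak = Math.max(...scores); // subtract the max for numerical stability
    const exps = scores.map(s => Math.exp(s - peak));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / total); // softmax: each row sums to 1
  });
  // Each output position is a weighted average of all input positions
  const output = weights.map(row =>
    pixels[0].map((_, d) => row.reduce((acc, w, j) => acc + w * pixels[j][d], 0)));
  return { weights, output };
}
```

The N x N weight matrix is the thing to notice: it grows quadratically with the number of spatial positions.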

Features Visualization

Let's visualize features instead! Very simply put, this is an ideal job for dimensionality reduction. A well-respected technique is Grad-CAM, which picks an output, backpropagates its gradient to a chosen layer and uses it to highlight that layer's activations.

As you might have guessed, this process is quite slow and cumbersome, but highly accurate. You get to see exactly what the layer is focusing on and can tweak your training examples accordingly.

There are other techniques similar to Grad-CAM that either adjust the scoring parameters or use input perturbation to build an understanding, but these too are cumbersome and time-consuming.
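The combination step of Grad-CAM is small enough to sketch in plain TypeScript. This assumes you already have the layer's activations and the gradients of the chosen output with respect to them (getting those is the backpropagation part); `gradCamMap` and `ActivationVolume` are illustrative names, not part of any library.

```typescript
type ActivationVolume = number[][][]; // [C][H][W]

// Weight each channel by the spatial mean of its gradient, sum over channels,
// then keep only the positive evidence (ReLU).
function gradCamMap(activations: ActivationVolume, gradients: ActivationVolume): number[][] {
  const height = activations[0].length;
  const width = activations[0][0].length;
  const cam: number[][] = Array.from({ length: height }, () => new Array<number>(width).fill(0));
  activations.forEach((channel, c) => {
    // alpha: global average of d(score) / d(activation) over this channel
    let total = 0;
    for (const row of gradients[c]) for (const g of row) total += g;
    const alpha = total / (height * width);
    channel.forEach((row, y) => row.forEach((a, x) => { cam[y][x] += alpha * a; }));
  });
  return cam.map(row => row.map(v => Math.max(0, v)));
}
```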
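To see why perturbation approaches are slow, here's a minimal occlusion sketch. The `score` callback is hypothetical and stands in for a full forward pass of a model returning the score of one class; `occlusionMap` is likewise just an illustrative name.

```typescript
type GrayImage = number[][]; // [H][W]
type ScoreFn = (image: GrayImage) => number; // hypothetical stand-in for a model's class score

// Blank out one patch at a time and record how much the score drops.
function occlusionMap(image: GrayImage, score: ScoreFn, patch: number, fill = 0): number[][] {
  const height = image.length;
  const width = image[0].length;
  const baseline = score(image);
  const drops: number[][] = Array.from({ length: height }, () => new Array<number>(width).fill(0));
  for (let top = 0; top < height; top += patch) {
    for (let left = 0; left < width; left += patch) {
      const bottom = Math.min(top + patch, height);
      const right = Math.min(left + patch, width);
      const occluded = image.map(row => row.slice());
      for (let y = top; y < bottom; y++)
        for (let x = left; x < right; x++) occluded[y][x] = fill;
      // One extra forward pass per patch: this is where the time goes
      const drop = baseline - score(occluded);
      for (let y = top; y < bottom; y++)
        for (let x = left; x < right; x++) drops[y][x] = drop;
    }
  }
  return drops;
}
```

The number of forward passes grows with the number of patches, so a fine-grained map over a large image means a lot of model evaluations.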

However, what if you wanted to do it quickly, in one pass, with minimal code and get a directional answer?

Eigen CAM

From the article here applying Eigen-CAM to YOLO: Released in 2020 by Muhammad et al., it is based on class activation maps (CAM), focusing on making sense of what a model learns from the visual data in order to arrive at the predictions that it makes.

These class activation maps are useful for visualization in the model explainability space as the concept of significant features is aligned with how humans generally comprehend vision, so anyone is capable of looking at a class activation map, comparing it with the contents of the original image, and determining whether the model is truly grasping the important visual concepts that the human is seeing as well.

In simpler terms

This is just PCA used to determine which areas of the image the feature maps focus on.

Here's the PyTorch version in Python (NumPy works the same way). Keep in mind that both have an svd method and both have a way to get eigenvectors as a basis transformation, so the bulk of the work is done for us and all we need to do is provide an input:

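        import torch
        # feature: activations of one layer, shape [N, C, H, W]
        # Batched SVD over the last two dims, i.e. one [H, W] matrix per channel
        _, _, vT = torch.linalg.svd(feature)
        # First right singular vector of every channel, shape [N, C, 1, W]
        v1 = vT[:, :, 0, :][..., None, :]
        # Stack v1 W times ([N, C, W, W]), multiply, then sum over channels -> [N, H, W]
        heatmap = feature @ v1.repeat(1, 1, v1.shape[3], 1)
        heatmap = heatmap.sum(1)
        # Min-max scale to the 0..255 range
        heatmap -= heatmap.min()
        heatmap = heatmap / heatmap.max() * 255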

It's that simple: you now get a visual heatmap of the layer. Here's how to use it in an actual model. I am taking a pretrained MobileNetV3 from torchvision as my backbone, but it could be pretty much any CNN.

from typing import Tuple

import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision.models import mobilenet_v3_large

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.preprocess = T.Compose([T.Resize(768), T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
        self.layer_name = "features.16.0"
        # weights="DEFAULT" needs torchvision >= 0.13; older versions take pretrained=True
        self.model = mobilenet_v3_large(weights="DEFAULT").eval()
        self.hooked = {}

    @torch.no_grad()
    def forward(self, x):
        # x: float image tensor of shape [3, H, W], scaled to [0, 1]
        # Identify the layer you want to study
        # This is usually either a middle hidden layer or the final layer before FC
        hook = self.model.features[16][0].register_forward_hook(self._forward_hook)
        try:
            tensor = self.preprocess(x).unsqueeze(0)
            self.model(tensor)  # the class scores are not needed, only the hooked feature
        finally:
            hook.remove()
        feature = self.hooked['output']
        _, _, vT = torch.linalg.svd(feature)
        v1 = vT[:, :, 0, :][..., None, :]
        heatmap = feature @ v1.repeat(1, 1, v1.shape[3], 1)
        heatmap = heatmap.sum(1)
        heatmap -= heatmap.min()
        heatmap = heatmap / heatmap.max() * 255
        return heatmap

    def _forward_hook(self, module, inputs: Tuple[torch.Tensor], outputs):
        # Stash the layer output so forward() can read it after the pass
        self.hooked['output'] = outputs

The JS equivalent, using PCA-JS

Thanks to yours truly, the library also exposes a really simple-to-use API that provides both SVD and the actual eigenvectors. So here's what you can do (with the caveat of not quite having a neural network lib in pure JS yet):

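import PCA from 'pca-js';

// feature is a [C][H][W] array of arrays (one image, i.e. n = 1)
function heatmapFromFeature(feature) {
  const height = feature[0].length;
  const width = feature[0][0].length;
  // Accumulate the per-channel maps, mirroring heatmap.sum(1) in the PyTorch version
  const heatmap = Array.from({ length: height }, () => new Array(width).fill(0));
  for (const channel of feature) {
    // Leading eigenvector of this [H][W] channel (length W);
    // pca-js returns objects of the form { eigenvalue, vector }
    const v1 = PCA.getEigenVectors(channel)[0].vector;
    channel.forEach((row, y) => {
      const rowSum = row.reduce((a, b) => a + b, 0);
      v1.forEach((c, x) => { heatmap[y][x] += c * rowSum; });
    });
  }
  // Min-max scale to the 0..255 range
  let lo = Infinity, hi = -Infinity;
  for (const v of heatmap.flat()) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  const range = hi - lo || 1;
  return heatmap.map(row => row.map(v => ((v - lo) / range) * 255));
}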

Basically any feature map in NCHW format with n=1, i.e. a [C][H][W] array, should suffice.
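If you want to see the PCA step with no dependencies at all, here's a rough sketch. It is a simplification for illustration, not the pca-js implementation and not the exact recipe of the snippets above: it flattens the [C][H][W] feature into one row per spatial position, finds the first principal direction by power iteration (uncentered, like a plain SVD), and projects every position onto it. `firstDirection` and `eigenCamSketch` are made-up names.

```typescript
type Feature3D = number[][][]; // [C][H][W]

// Dominant right singular direction of `rows` ([N][C]) via power iteration on X^T X.
function firstDirection(rows: number[][], iterations = 100): number[] {
  const dims = rows[0].length;
  let v = new Array<number>(dims).fill(1 / Math.sqrt(dims));
  for (let it = 0; it < iterations; it++) {
    // next = X^T (X v), computed without forming X^T X explicitly
    const next = new Array<number>(dims).fill(0);
    for (const r of rows) {
      const proj = r.reduce((acc, x, i) => acc + x * v[i], 0);
      r.forEach((x, i) => { next[i] += proj * x; });
    }
    const norm = Math.hypot(...next);
    if (norm === 0) break; // all-zero input: keep the current vector
    v = next.map(x => x / norm);
  }
  return v;
}

function eigenCamSketch(feature: Feature3D): number[][] {
  const height = feature[0].length;
  const width = feature[0][0].length;
  // One row per spatial position, one column per channel
  const rows: number[][] = [];
  for (let y = 0; y < height; y++)
    for (let x = 0; x < width; x++) rows.push(feature.map(channel => channel[y][x]));
  const v1 = firstDirection(rows);
  // Project every position onto the first principal direction
  const projected = rows.map(r => r.reduce((acc, a, c) => acc + a * v1[c], 0));
  // Min-max scale to the 0..255 range
  let lo = Infinity, hi = -Infinity;
  for (const p of projected) { if (p < lo) lo = p; if (p > hi) hi = p; }
  const range = hi - lo || 1;
  return Array.from({ length: height }, (_, y) =>
    Array.from({ length: width }, (_, x) => ((projected[y * width + x] - lo) / range) * 255));
}
```

Power iteration is used here only because it fits in a few lines; for anything serious you would lean on a proper SVD.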

Does it actually work?

Now that we've gone through the mechanics of the thing, we come to the most important question ever asked: does the damn thing work?

As you can see from the above, eigenvectors are a fast and cheap way to get directional insights rather than detailed explanations. For a full understanding of what your feature layer is learning, techniques like Grad-CAM++ are still SOTA.

In the next (and final) article, we will look at another off-the-beaten-track application of PCA. Namely, how to utilize it in a pure data scenario to get actual insights.