Sajjad Rahman
πŸ“˜ The Science of Un-Mixing Data (PCA & ICA)

Part 1: The Math Toolbox (Prerequisites)

Before we can understand PCA and ICA, we need to understand the tools they use. Think of these as the "rules of the game" for handling data.

πŸ”Ή Basic Concepts

  • The Matrix (The Table): Data is organized into a matrix, which is just a giant grid of numbers. The columns usually represent different types of measurements (like different microphones or cameras), and the rows represent each specific moment in time we recorded.

  • The Vector (The Arrow): A single row or column from that table is called a vector. Mathematically, a vector is like an arrow pointing to a specific spot in a multi-dimensional space.

  • Inner Product (The Shadow): This is a way to multiply two vectors together. It tells us how much one vector "overlaps" with another. We use this to project our data onto new axes to see it from a better angle.

  • Basis Vectors (The Directions): These are the "original" directions we use to measure things, like the X and Y axes on a graph.


πŸ”Ή Statistical Concepts

  • Covariance (Redundancy): This measures how much two measurements "change together". If Measurement A always goes up when Measurement B goes up, they are redundant (highly correlated), meaning we don't really need both.

πŸ‘‰ Covariance Equation (for measurements that have already been mean-centered):

cov(A, B) = (1/n) Ξ£ aα΅’ bα΅’
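A minimal sketch of this formula, using small made-up vectors: we center each measurement, multiply element-wise, and average. The result matches NumPy's built-in (population) covariance.

```python
import numpy as np

# Two hypothetical measurement vectors (example data, not from the article)
a = np.array([1.0, -2.0, 3.0, -2.0])
b = np.array([2.0, -1.0, 2.0, -3.0])

# Center each measurement so its mean is 0 (the formula assumes this)
a = a - a.mean()
b = b - b.mean()

# cov(A, B) = (1/n) * sum(a_i * b_i)
cov_ab = (a * b).sum() / len(a)
print(cov_ab)
```

If the two measurements always move together, this number is large; if they are unrelated, it is near zero.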

  • Gaussian Distribution (The Bell Curve): This is a smooth, bell-shaped curve that represents "randomness" or "noise". The Central Limit Theorem says that if you mix many different independent signals together, the result tends to look like a Gaussian bell curve.

  • Kurtosis (The Peakedness): This is a math score that measures how "sharp" or "peaked" a distribution of numbers is. A high kurtosis means the data has a sharp peak and heavy tails, while a Gaussian curve has an excess kurtosis of zero.
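A quick way to see this, assuming SciPy is available: `scipy.stats.kurtosis` uses the Fisher convention (it subtracts 3), so Gaussian samples score near zero while a peaked distribution like the Laplace scores clearly positive.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)   # bell curve
laplace = rng.laplace(size=100_000)   # sharp peak, heavy tails

# Fisher convention: Gaussian -> ~0, peaked distributions -> positive
print(kurtosis(gaussian))
print(kurtosis(laplace))
```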


Part 2: PCA (Principal Component Analysis)


The Goal

To simplify a giant pile of data by finding the "best angle" to look at it.


How it Works

  • PCA looks at the Covariance Matrix of the data to find where the measurements are repeating each other.

  • Eigenvectors and Eigenvalues: PCA calculates special directions called eigenvectors. The "largest" eigenvector points in the direction where the most "action" (variance) is happening.


Core Equation

P = D β‹… E

  • (D): data matrix
  • (E): eigenvectors of the covariance matrix
  • (P): principal components

πŸ‘‰ Also:

  • After PCA, the covariance matrix of the projected data is diagonal (the components are uncorrelated)

Key Properties

  • Dimensionality Reduction: By ignoring the tiny eigenvectors (which usually represent noise) and keeping only the big ones, we can make a huge data set much smaller without losing the important stuff.

  • The Rule of Orthogonality: In PCA, the new axes we find must always be orthogonalβ€”which is a fancy way of saying they must be at 90-degree right angles to each other.


Algorithm Steps

PCA Steps

  1. Center data (mean = 0)
  2. Compute covariance matrix
  3. Find eigenvectors
  4. Project data
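The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic, correlated 2-D data (the variable names are my own, not from the article): after projection, the covariance of P is diagonal, as claimed.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: two measurements that mostly repeat each other
t = rng.normal(size=(500, 1))
D = np.hstack([t, 0.5 * t + 0.1 * rng.normal(size=(500, 1))])

# 1. Center data (mean = 0)
D = D - D.mean(axis=0)
# 2. Compute covariance matrix
C = np.cov(D, rowvar=False)
# 3. Find eigenvectors (eigh handles symmetric matrices)
eigvals, E = np.linalg.eigh(C)
# 4. Project data: P = D . E
P = D @ E

# Covariance of the projected data is diagonal -> decorrelated components
print(np.cov(P, rowvar=False))
```

Dropping the column of P with the smallest eigenvalue is exactly the dimensionality reduction described above.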

Why PCA Can Fail

πŸ‘‰ PCA fails when:

  • Data is non-linear
  • Or variance β‰  true structure

Example:

  • Ferris wheel β†’ PCA cannot find circular motion

πŸ’‘ Key idea:

PCA only captures linear structure; ICA can handle more complex separation.


Part 3: ICA (Independent Component Analysis)


The Goal

To solve the "Cocktail Party Problem": taking a messy mixture of signals and separating them into their original, clear sources.


How it Works

  • Blind Source Separation: ICA is used when we have mixtures (like two microphones recording two people) but we don't know exactly how they were mixed.

  • The Weight Matrix ($W$): ICA tries to find a mathematical "unmixing" tool called a weight matrix. When we multiply our messy data by this matrix, the original signals should pop out.


Core Equation

Y = X β‹… W

  • (X): mixed signals
  • (W): unmixing matrix
  • (Y): independent sources

πŸ‘‰ This is THE most important ICA equation


Key Concepts

  • The Search for Independence: ICA is stricter than PCA. It doesn't just want the data to be "not repeating"; it wants the signals to be statistically independent, meaning what happens in one signal tells you absolutely nothing about the other.

  • Non-Gaussianity (The Secret Trick): Because mixed-up signals look like smooth bell curves, ICA rotates the data until it finds the directions of maximum non-Gaussianity, often measured by kurtosis (the most peaked shapes). A sharp peak usually means you've found a pure, unmixed source.

πŸ‘‰ Improved understanding:

  • Mixtures β†’ Gaussian (Central Limit Theorem)
  • Sources β†’ non-Gaussian (peaked)

πŸ’‘ Key idea:

ICA works because mixing makes data more Gaussian, so unmixing searches for the most non-Gaussian directions.


  • Flexibility: Unlike PCA, the axes in ICA do not have to be at right angles. They can point in any direction needed to find the sources.
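The "mixtures become Gaussian" effect can be checked numerically. A minimal sketch with synthetic sources (my own example, not from the article): two peaked Laplace sources are blended, and the mixture's excess kurtosis drops toward the Gaussian value of zero.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
# Two peaked (non-Gaussian) sources
s1 = rng.laplace(size=50_000)
s2 = rng.laplace(size=50_000)

# Blend them: by the Central Limit Theorem the mix looks more Gaussian
mix = 0.6 * s1 + 0.4 * s2

# The mixture's excess kurtosis is lower than either pure source's
print(kurtosis(s1), kurtosis(mix))
```

This is exactly why ICA hunts for peaked directions: they are the ones that look least mixed.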

Algorithm Steps

ICA Steps

  1. Center + whiten data
  2. Initialize weights
  3. Maximize non-Gaussianity
  4. Iterate until convergence
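These steps are implemented by FastICA in scikit-learn (centering and whitening happen internally). A minimal sketch on synthetic data: two hypothetical non-Gaussian sources are mixed by an "unknown" matrix, and ICA recovers them blind. All names here are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000
# Two hypothetical non-Gaussian sources (e.g. two speakers)
S = np.column_stack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])   # mixing matrix, unknown to ICA
X = S @ A.T                   # what the "microphones" record

# Steps 1-4: whiten, initialize, maximize non-Gaussianity, iterate
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
Y = ica.fit_transform(X)      # estimated independent sources
```

Note that ICA cannot recover the original order or scale of the sources; each column of Y matches some source only up to sign and amplitude.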

Part 4: PCA vs ICA (Comparison Table)

| Feature    | PCA          | ICA                 |
|------------|--------------|---------------------|
| Goal       | Max variance | Independence        |
| Output     | Uncorrelated | Independent         |
| Axes       | Orthogonal   | Not required        |
| Uses       | Compression  | Signal separation   |
| Assumption | Gaussian OK  | Non-Gaussian needed |

πŸ‘‰ Lecture explicitly says:

  • PCA β†’ decorrelation
  • ICA β†’ independence (stronger)

Part 5: Why do we use these?

These tools are used in many cool ways:

  1. Fetal Heart Monitoring: Separating a baby's tiny heartbeat from the mother's much louder heartbeat.
  2. EEG (Brain Waves): Removing "trash" signals like eye blinks or heartbeats from recordings of brain activity.
  3. fMRI (Brain Imaging): Finding which specific parts of the brain are working together during a task.
  4. Computer Vision: Understanding how our eyes and brain recognize edges and shapes in the world around us.
