Henri Wang
Why does a similar smooth gradient in all images indicate strong covariance? And how is strong covariance related to high variance?

Why Smooth Gradients → Strong Covariance → High Variance in PCA

To understand why smooth, consistent gradients across images lead to strong covariance and high variance in PCA, let’s break it down step-by-step with intuition, math, and examples.


1. Definitions Recap

  • Covariance: Measures how two pixels (or features) vary together across images.
    • High covariance: Pixels increase/decrease in sync.
    • Low covariance: Pixels change independently.
  • Variance: A special case of covariance (how a single pixel varies across images).

For a centered dataset ( X ) (size ( N \times D )), the covariance matrix ( C ) is:

[
C_{jk} = \frac{1}{N} \sum_{i=1}^N x_{ij} x_{ik},
]

where ( x_{ij} ) is the value of pixel ( j ) in image ( i ).
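
As a minimal sketch of this formula (the array `X` below is a made-up toy, not real image data), the covariance matrix can be computed in NumPy like so:

```python
import numpy as np

# Toy "dataset": N = 4 images, each flattened to D = 3 pixels (made-up values).
X = np.array([
    [12.0, 6.0, 3.0],
    [ 8.0, 4.0, 2.0],
    [10.0, 5.0, 2.5],
    [ 6.0, 3.0, 1.5],
])

N, D = X.shape
Xc = X - X.mean(axis=0)            # center each pixel (column) across images

C = Xc.T @ Xc / N                  # C_jk = (1/N) * sum_i x_ij * x_ik

print(C.shape)                                              # (3, 3): one row/column per pixel
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))   # True: matches NumPy's biased covariance
```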


2. Smooth Gradients → Strong Covariance

Intuition

  • A smooth gradient (e.g., left-to-right lighting in faces) means:
    • Pixel values change slowly and predictably across the image.
    • All images share this pattern (e.g., left cheeks are always brighter than right cheeks).

Example

Consider two pixels, ( p_1 ) (left cheek) and ( p_2 ) (right cheek), across 3 face images:

| Image | ( p_1 ) | ( p_2 ) |
|-------|----------|----------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |

  • Covariance calculation: [ C_{12} = \frac{(10 \cdot 5) + (8 \cdot 4) + (12 \cdot 6)}{3} = \frac{50 + 32 + 72}{3} \approx 51.3. ]
    • High positive value because ( p_1 ) and ( p_2 ) scale together across images.

Why?

  • Smooth gradients create consistent pixel relationships.
  • If ( p_1 ) increases, ( p_2 ) also increases (but slightly less, due to gradient).
  • This consistency across images → large ( C_{jk} ).
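
To make this concrete, here is a tiny check of the covariance above (the values come straight from the table; following the covariance formula, they are treated as already-centered deviations):

```python
import numpy as np

# Deviations of p1 (left cheek) and p2 (right cheek) across the 3 faces.
p1 = np.array([10.0, 8.0, 12.0])
p2 = np.array([ 5.0, 4.0,  6.0])

C12 = (p1 * p2).mean()   # (10*5 + 8*4 + 12*6) / 3
print(C12)               # ~51.33: large and positive, the two pixels move together
```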

3. Strong Covariance → High Variance

Link to Eigenvalues

PCA’s eigenvalues ( \lambda ) (variances) come from the covariance matrix ( C ):

[
C v = \lambda v.
]

  • Eigenvectors ( v ): Directions where pixel values co-vary strongly.
  • Eigenvalues ( \lambda ): Variance along those directions.

Why Smooth Gradients Maximize Variance

  1. Shared Structure: If all images have a left-to-right lighting gradient, PCA finds a direction ( v ) where:

    • Projecting the images onto ( v ) spreads them out widely (high variance).
    • Example: ( v ) might assign larger positive weights to left pixels and smaller positive weights to right pixels, mirroring the gradient's shape.
  2. Variance Calculation:

    For eigenvector ( v ) aligned with the gradient:

    [
    \text{Var}(v) = \lambda = v^T C v.
    ]

    • Since ( C ) has large values for gradient-related pixels, ( \lambda ) is large.

Numerical Example

Suppose ( v = \frac{1}{\sqrt{2}} [1, 1]^T ) (left and right cheek weighted together, matching the shared gradient):

[
\text{Var}(v) = v^T \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} v = \tfrac{1}{2}\left( C_{11} + C_{22} + 2C_{12} \right).
]

If ( C_{12} ) is large and positive (strong covariance), ( \text{Var}(v) ) is large. The anti-aligned direction ( \frac{1}{\sqrt{2}} [1, -1]^T ) gives ( \tfrac{1}{2}(C_{11} + C_{22} - 2C_{12}) ) instead, which is small.
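
A quick numerical check of this quadratic form, reusing the toy two-pixel data from above (treated as centered deviations, as in the rest of the post):

```python
import numpy as np

# Rows = images, columns = (left cheek, right cheek), treated as centered deviations.
X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])
C = X.T @ X / len(X)                           # 2x2 covariance matrix

v_sum  = np.array([1.0,  1.0]) / np.sqrt(2)    # pixels weighted together
v_diff = np.array([1.0, -1.0]) / np.sqrt(2)    # pixels weighted against each other

print(v_sum @ C @ v_sum)     # ~115.5: captures most of the total variance (trace ~128.3)
print(v_diff @ C @ v_diff)   # ~12.8: the anti-aligned direction gets very little
```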


4. Contrast with High-Frequency Noise

  • Noise/Edges: Pixel values change randomly across images.
    • Example: Freckles appear at different positions → ( C_{jk} \approx 0 ).
    • No consistent direction to maximize → small ( \lambda ).
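
A small simulation makes the contrast visible (the "freckle" pixel, the image count, and the noise model here are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 500

strength = rng.normal(size=n_images)     # how strongly each face is lit from the left
left  = 10 * strength                    # left-cheek pixel follows the lighting gradient
right =  5 * strength                    # right-cheek pixel follows it too, half as strongly
freckle = rng.normal(size=n_images)      # "freckle" pixel: independent from image to image

X = np.column_stack([left, right, freckle])
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n_images

print(np.round(C, 2))
# The left/right block has large entries (~100, ~50, ~25),
# while the freckle's row and column stay near zero.
```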

5. Key Takeaways

  1. Smooth Gradients:

    • Create predictable pixel relationships → high covariance ( C_{jk} ).
    • Allow PCA to find a direction ( v ) where projections vary strongly → high variance ( \lambda ).
  2. PCA’s Priority:

    • Top PCs align with globally consistent patterns (gradients, lighting).
    • Discards inconsistent patterns (noise, high-frequency details).
  3. Semantic Meaning:

    • Smooth gradients often correspond to lighting, pose, or shape → top PCs look "meaningful".

Visualization

Imagine stretching a rubber band over the data:

  • PCA’s first eigenvector ( v_1 ) is the direction where the band is most stretched (max variance).
  • Smooth gradients stretch it far; noise barely moves it.

Final Answer:

Smooth gradients indicate strong covariance because they make pixels co-vary consistently across images. PCA’s variance-maximizing objective then assigns large eigenvalues to these directions, prioritizing them as top principal components. This is why low-frequency patterns dominate in PCA, while high-frequency noise vanishes.
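
The sketch below reproduces the whole chain (smooth gradient → strong covariance → dominant eigenvalue) on synthetic data; the ramp pattern, image count, and noise level are arbitrary choices for illustration, not anything from a real dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_pixels = 200, 64

ramp = np.linspace(1.0, 0.0, n_pixels)               # smooth left-to-right gradient template
strength = rng.normal(size=(n_images, 1))            # per-image lighting strength
images = strength * ramp + 0.05 * rng.normal(size=(n_images, n_pixels))

Xc = images - images.mean(axis=0)                    # center each pixel across images
C = Xc.T @ Xc / n_images
eigvals, eigvecs = np.linalg.eigh(C)                 # eigenvalues in ascending order

print(eigvals[-1], eigvals[-2])                      # the top eigenvalue dwarfs the rest
print(np.round(eigvecs[:, -1][:5], 2))               # top PC is (up to sign) proportional to the ramp
```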

Deep Dive: How PCA Discovers Lighting Gradients

Let’s break down exactly why a left-to-right lighting gradient across images leads PCA to find a direction ( v ) that maximizes variance. We’ll use a concrete example with numbers to illustrate the math.


1. The Dataset Setup

Suppose we have 3 grayscale face images, each with 2 pixels (simplified for clarity):

  • Pixel 1 (Left Cheek)
  • Pixel 2 (Right Cheek)

All images share a left-to-right lighting gradient: the left pixel is always brighter than the right.

Here’s the centered data matrix ( X ) (each row is an image):

| Image | Pixel 1 (Left) | Pixel 2 (Right) |
|-------|----------------|-----------------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |

(Note: for simplicity, treat these values as already-centered deviations; in a real pipeline you would first subtract each pixel's mean across all images.)


2. Covariance Matrix Calculation

The covariance matrix ( C = \frac{1}{N} X^T X ) quantifies how pixels co-vary:

[
C = \frac{1}{3} \begin{bmatrix}
10 & 8 & 12 \\
5 & 4 & 6
\end{bmatrix}
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
= \frac{1}{3} \begin{bmatrix}
308 & 154 \\
154 & 77
\end{bmatrix}
\approx
\begin{bmatrix}
102.67 & 51.33 \\
51.33 & 25.67
\end{bmatrix}
]

  • Key Observation: ( C_{12} = C_{21} \approx 51.33 ) is large and positive → Pixels 1 and 2 are strongly correlated.

3. Eigenvectors and Eigenvalues

PCA solves ( C v = \lambda v ). Let’s compute them:

  • Eigenvalues (( \lambda )):
    [
    \text{det}(C - \lambda I) = 0 \implies \lambda_1 \approx 128.33, \quad \lambda_2 \approx 0.
    ]

    • ( \lambda_1 ) is large (dominant), ( \lambda_2 \approx 0 ) (negligible).
  • Eigenvector ( v_1 ) (First PC):

    [
    C v_1 = \lambda_1 v_1 \implies v_1 \approx \begin{bmatrix} 0.894 \\ 0.447 \end{bmatrix}.
    ]

    (This is exactly the gradient direction [2, 1] scaled to unit length, since 10/5 = 8/4 = 12/6 = 2.)
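
These eigenvalues and this eigenvector are easy to verify with NumPy (only the 3×2 toy matrix from above is assumed):

```python
import numpy as np

X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])
C = X.T @ X / len(X)                    # the 2x2 covariance matrix from above

eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
print(np.round(eigvals, 2))             # ~[0, 128.33]
print(np.round(eigvecs[:, -1], 3))      # ~[0.894, 0.447] (possibly sign-flipped): the [2, 1] direction
```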


4. Projecting Data onto ( v_1 )

Now, project all images onto ( v_1 ):

[
\text{Scores} = X v_1 =
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
\begin{bmatrix}
0.894 \\
0.447
\end{bmatrix}
\approx
\begin{bmatrix}
11.18 \\
8.94 \\
13.42
\end{bmatrix}
]

  • Variance of Scores: [ \text{Var}(\text{scores}) = \frac{11.18^2 + 8.94^2 + 13.42^2}{3} \approx 128.33 = \lambda_1. ]
    • This matches the eigenvalue, confirming ( v_1 ) captures maximal variance.
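
The same projection can be checked in a couple of lines (again assuming the toy matrix and the exact gradient direction ( [2, 1]/\sqrt{5} )):

```python
import numpy as np

X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])

v1 = np.array([2.0, 1.0]) / np.sqrt(5)   # top eigenvector: the gradient direction
scores = X @ v1                          # project each image onto v1

print(np.round(scores, 2))               # [11.18  8.94 13.42]
print(np.round((scores**2).mean(), 2))   # 128.33: mean squared score equals the top eigenvalue
```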

5. Why Does This Direction Work?

  • Geometric Intuition:

    The eigenvector ( v_1 ) points along the "axis of variation" in the data.

    • In our 2D pixel space, the data points lie almost on a line with slope ( \approx 0.5 ) (since Pixel 1 ≈ 2 × Pixel 2).
    • ( v_1 ) aligns with this line, so projecting onto it stretches the data maximally.
  • Algebraic Intuition:

    The scores ( X v_1 ) are large because:

    • ( v_1 ) assigns positive weights to both pixels, but more to Pixel 1 (left cheek).
    • Since Pixel 1 is consistently brighter, the weighted sum ( X v_1 ) amplifies this pattern → high variance.

6. Contrast with Noise (Low Variance)

Imagine adding a high-frequency noise pixel (e.g., a freckle at random positions):

  • Its covariance with other pixels would be near-zero (no consistent pattern).
  • The corresponding eigenvalue would be tiny → PCA ignores it.
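
To see the eigenvalue gap directly, here is a sketch that appends a made-up, inconsistent "freckle" pixel to the toy data (the third column is invented for illustration):

```python
import numpy as np

# Toy data with a third, inconsistent pixel appended to each image.
X = np.array([[10.0, 5.0,  1.0],
              [ 8.0, 4.0, -2.0],
              [12.0, 6.0,  1.0]])
C = X.T @ X / len(X)

eigvals = np.linalg.eigvalsh(C)      # eigenvalues in ascending order
print(np.round(eigvals, 2))          # ~[0, 1.96, 128.37]: the noise pixel adds only a tiny eigenvalue
```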

Key Takeaways

  1. Consistent Gradients → Strong Covariance:

    • When pixels co-vary predictably (e.g., left cheek always brighter), ( C ) has large off-diagonal values.
  2. PCA’s Magic:

    • The top eigenvector ( v_1 ) points where the data is "stretched" most (lighting gradient direction).
    • Projections onto ( v_1 ) amplify this shared structure → high variance (( \lambda_1 )).
  3. Semantic Meaning:

    • ( v_1 ) isn’t arbitrary; it reflects a real-world pattern (lighting) because that’s what dominates the data’s covariance.

Visualization

Think of the data as points in a 2D pixel space:

  • The points cluster along a line (slope = 0.5).
  • ( v_1 ) is the direction of that line.
  • Projecting onto ( v_1 ) preserves the gradient; projecting onto ( v_2 ) (orthogonal) loses it.

This is why PCA’s top components often "make sense" for images!
