Why Smooth Gradients → Strong Covariance → High Variance in PCA
To understand why smooth, consistent gradients across images lead to strong covariance and high variance in PCA, let’s break it down step-by-step with intuition, math, and examples.
1. Definitions Recap
- Covariance: Measures how two pixels (or features) vary together across images.
  - High covariance: Pixels increase/decrease in sync.
  - Low covariance: Pixels change independently.
- Variance: A special case of covariance (how a single pixel varies across images).
For a centered dataset ( X ) (size ( N \times D )), the covariance matrix ( C ) is:
[
C_{jk} = \frac{1}{N} \sum_{i=1}^N x_{ij} x_{ik},
]
where ( x_{ij} ) is the value of pixel ( j ) in image ( i ).
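If it helps to see the formula as code, here is a minimal NumPy sketch (the data values are made up purely for illustration; note the ( 1/N ) convention above, whereas `np.cov` defaults to ( 1/(N-1) )):

```python
import numpy as np

# Toy centered data matrix X: N = 4 images (rows), D = 3 pixels (columns).
X = np.array([[ 1.0,  2.0, -1.0],
              [-1.0, -2.0,  1.0],
              [ 2.0,  4.0, -2.0],
              [-2.0, -4.0,  2.0]])

N = X.shape[0]
C = (X.T @ X) / N   # C[j, k] = (1/N) * sum_i x_ij * x_ik
print(C)            # symmetric D x D covariance matrix
```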
2. Smooth Gradients → Strong Covariance
Intuition
- A smooth gradient (e.g., left-to-right lighting in faces) means:
  - Pixel values change slowly and predictably across the image.
  - All images share this pattern (e.g., left cheeks are always brighter than right cheeks).
Example
Consider two pixels, ( p_1 ) (left cheek) and ( p_2 ) (right cheek), across 3 face images:
| Image | ( p_1 ) | ( p_2 ) |
|-------|----------|----------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |
- Covariance calculation:
[
C_{12} = \frac{(10 \cdot 5) + (8 \cdot 4) + (12 \cdot 6)}{3} = \frac{50 + 32 + 72}{3} \approx 51.3.
]
- High positive value because ( p_1 ) and ( p_2 ) scale together across images.
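The same arithmetic takes two lines of NumPy, treating the table values as already-centered pixel values:

```python
import numpy as np

p1 = np.array([10.0, 8.0, 12.0])   # left-cheek values across the 3 faces
p2 = np.array([ 5.0,  4.0,  6.0])  # right-cheek values

C12 = np.mean(p1 * p2)             # (1/N) * sum of pairwise products
print(C12)                         # 51.333...
```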
Why?
- Smooth gradients create consistent pixel relationships.
- If ( p_1 ) increases, ( p_2 ) also increases (but slightly less, due to gradient).
- This consistency across images → large ( C_{jk} ).
3. Strong Covariance → High Variance
Link to Eigenvalues
PCA’s eigenvalues ( \lambda ) (variances) come from the covariance matrix ( C ):
[
C v = \lambda v.
]
- Eigenvectors ( v ): Directions where pixel values co-vary strongly.
- Eigenvalues ( \lambda ): Variance along those directions.
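Both facts are easy to check numerically. Here is a small sketch with a random symmetric matrix standing in for ( C ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
C = A @ A.T / 4                       # any symmetric "covariance-like" matrix

eigvals, eigvecs = np.linalg.eigh(C)  # eigh: for symmetric matrices, ascending eigenvalues
v, lam = eigvecs[:, -1], eigvals[-1]  # top eigenvector / eigenvalue

print(np.allclose(C @ v, lam * v))    # True: C v = lambda v
print(np.isclose(v @ C @ v, lam))     # True: variance along v equals lambda
```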
Why Smooth Gradients Maximize Variance
- Shared Structure: If all images have a left-to-right lighting gradient, PCA finds a direction ( v ) where:
  - Projecting images onto ( v ) yields large, consistent values (high variance).
  - Example: ( v ) might weight the left pixels more heavily than the right ones, mirroring the shape of the gradient.
- Variance Calculation:
For a unit eigenvector ( v ) aligned with the gradient:
[
\text{Var}(v) = \lambda = v^T C v.
]
  - Since ( C ) has large entries for the gradient-related pixels, ( \lambda ) is large.
Numerical Example
Suppose ( v = [1, 1] ) (weighting the left and right cheek together; ignore normalization for the moment):
[
\text{Var}(v) = [1, 1] \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = C_{11} + C_{22} + 2C_{12}.
]
If ( C_{12} ) is large and positive (strong covariance), ( \text{Var}(v) ) is large. A contrast direction like ( [1, -1] ) would instead cancel the covariance, giving ( C_{11} + C_{22} - 2C_{12} ), which is small.
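To see the effect of the covariance term concretely, here is a quick check using the covariance matrix computed in the deep dive further down, with unit-length versions of the two directions so they are comparable:

```python
import numpy as np

C = np.array([[102.67, 51.33],
              [ 51.33, 25.67]])               # covariance matrix from the worked example below

v_sum  = np.array([1.0,  1.0]) / np.sqrt(2)   # weights both cheeks together
v_diff = np.array([1.0, -1.0]) / np.sqrt(2)   # contrasts the cheeks

print(v_sum  @ C @ v_sum)    # ~115.5: the covariance term adds in
print(v_diff @ C @ v_diff)   # ~12.8:  the covariance term cancels out
```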
4. Contrast with High-Frequency Noise
- Noise/Edges: Pixel values change randomly across images.
  - Example: Freckles appear at different positions → ( C_{jk} \approx 0 ).
  - No consistent direction to maximize → small ( \lambda ).
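A quick simulation makes the contrast visible (the names `strength` and `freckle` are just illustrative stand-ins for "how strongly each face is lit" and "a noise pixel"):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
strength = rng.normal(size=N)        # how strongly each face is lit

left  = 2.0 * strength               # gradient pixels move together...
right = 1.0 * strength               # ...just with different magnitudes
freckle = rng.normal(size=N)         # noise pixel: independent in every image

print(np.mean(left * right))         # large and positive (~2): consistent relationship
print(np.mean(left * freckle))       # near zero: no shared structure, tiny covariance
```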
5. Key Takeaways
- Smooth Gradients:
  - Create predictable pixel relationships → high covariance ( C_{jk} ).
  - Allow PCA to find a direction ( v ) where projections vary strongly → high variance ( \lambda ).
- PCA’s Priority:
  - Top PCs align with globally consistent patterns (gradients, lighting).
  - Inconsistent patterns (noise, high-frequency details) are discarded.
- Semantic Meaning:
  - Smooth gradients often correspond to lighting, pose, or shape → top PCs look "meaningful".
Visualization
Imagine stretching a rubber band over the data:
- PCA’s first eigenvector ( v_1 ) is the direction where the band is most stretched (max variance).
- Smooth gradients stretch it far; noise barely moves it.
Final Answer:
Smooth gradients indicate strong covariance because they make pixels co-vary consistently across images. PCA’s variance-maximizing objective then assigns large eigenvalues to these directions, prioritizing them as top principal components. This is why low-frequency patterns dominate in PCA, while high-frequency noise vanishes.
Deep Dive: How PCA Discovers Lighting Gradients
Let’s break down exactly why a left-to-right lighting gradient across images leads PCA to find a direction ( v ) that maximizes variance. We’ll use a concrete example with numbers to illustrate the math.
1. The Dataset Setup
Suppose we have 3 grayscale face images, each with 2 pixels (simplified for clarity):
- Pixel 1 (Left Cheek)
- Pixel 2 (Right Cheek)
All images share a left-to-right lighting gradient: the left pixel is always brighter than the right.
Here’s the centered data matrix ( X ) (each row is an image):
| Image | Pixel 1 (Left) | Pixel 2 (Right) |
|-------|----------------|-----------------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |
(Note: For simplicity, treat these values as already centered; in a real pipeline you would first subtract each pixel's mean across images.)
2. Covariance Matrix Calculation
The covariance matrix ( C = \frac{1}{N} X^T X ) quantifies how pixels co-vary:
[
C = \frac{1}{3} \begin{bmatrix}
10 & 8 & 12 \\
5 & 4 & 6
\end{bmatrix}
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
= \frac{1}{3} \begin{bmatrix}
308 & 154 \\
154 & 77
\end{bmatrix}
\approx
\begin{bmatrix}
102.67 & 51.33 \\
51.33 & 25.67
\end{bmatrix}
]
- Key Observation: ( C_{12} = C_{21} \approx 51.33 ) is large and positive → Pixels 1 and 2 are strongly correlated.
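The same computation in a few lines of NumPy, reusing the (treated-as-centered) data matrix:

```python
import numpy as np

X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])   # rows = faces, columns = (left cheek, right cheek)

C = X.T @ X / X.shape[0]      # C = (1/N) X^T X
print(C)
# [[102.67  51.33]
#  [ 51.33  25.67]]  (approximately)
```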
3. Eigenvectors and Eigenvalues
PCA solves ( C v = \lambda v ). Let’s compute them:
- Eigenvalues (( \lambda )):
[
\text{det}(C - \lambda I) = 0 \implies \lambda_1 \approx 128.33, \quad \lambda_2 \approx 0.
]
  - ( \lambda_1 ) is large (dominant), ( \lambda_2 \approx 0 ) (negligible: the three points lie exactly on a line).
- Eigenvector ( v_1 ) (First PC):
[
C v_1 = \lambda_1 v_1 \implies v_1 \approx \begin{bmatrix} 0.894 \\ 0.447 \end{bmatrix}.
]
(This direction roughly aligns with the gradient [2, 1], since 10/5 = 8/4 = 12/6 = 2.)
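NumPy confirms these numbers:

```python
import numpy as np

C = np.array([[308.0, 154.0],
              [154.0,  77.0]]) / 3.0         # exact C from step 2

eigvals, eigvecs = np.linalg.eigh(C)         # ascending order for symmetric matrices
print(eigvals)                               # ~[0.0, 128.33]
print(eigvecs[:, -1])                        # ~[0.894, 0.447] (sign may be flipped)
```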
4. Projecting Data onto ( v_1 )
Now, project all images onto ( v_1 ):
[
\text{Scores} = X v_1 =
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
\begin{bmatrix}
0.894 \\
0.447
\end{bmatrix}
\approx
\begin{bmatrix}
11.18 \\
8.94 \\
13.42
\end{bmatrix}
]
- Variance of Scores:
[
\text{Var}(\text{scores}) = \frac{11.18^2 + 8.94^2 + 13.42^2}{3} \approx 128.33 = \lambda_1.
]
- This matches the eigenvalue, confirming ( v_1 ) captures maximal variance.
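The projection-and-variance step, end to end:

```python
import numpy as np

X  = np.array([[10.0, 5.0],
               [ 8.0, 4.0],
               [12.0, 6.0]])
v1 = np.array([2.0, 1.0]) / np.sqrt(5)   # unit vector along the [2, 1] gradient direction

scores = X @ v1
print(scores)                # ~[11.18, 8.94, 13.42]
print(np.mean(scores**2))    # ~128.33, i.e. lambda_1
```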
5. Why Does This Direction Work?
- Geometric Intuition:
  The eigenvector ( v_1 ) points along the "axis of variation" in the data.
  - In our 2D pixel space, the data points lie exactly on a line with slope ( \approx 0.5 ) (since Pixel 2 ≈ 0.5 × Pixel 1).
  - ( v_1 ) aligns with this line, so projecting onto it stretches the data maximally.
- Algebraic Intuition:
  The scores ( X v_1 ) are large because:
  - ( v_1 ) assigns positive weights to both pixels, but more to Pixel 1 (left cheek).
  - Since Pixel 1 is consistently brighter, the weighted sum ( X v_1 ) amplifies this pattern → high variance.
6. Contrast with Noise (Low Variance)
Imagine adding a high-frequency noise pixel (e.g., a freckle at random positions):
- Its covariance with other pixels would be near-zero (no consistent pattern).
- The corresponding eigenvalue would be tiny → PCA ignores it.
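A quick way to see this with the toy data is to tack on a hypothetical noise pixel and look at the eigenvalue spectrum (the 0.5 noise scale is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])
noise = rng.normal(scale=0.5, size=(3, 1))   # a "freckle" pixel, different in each image
X = np.hstack([X, noise])                    # now 3 images x 3 pixels

C = X.T @ X / X.shape[0]
print(np.round(np.linalg.eigvalsh(C), 2))    # one eigenvalue near 128, the others tiny
```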
Key Takeaways
- Consistent Gradients → Strong Covariance:
  - When pixels co-vary predictably (e.g., left cheek always brighter), ( C ) has large off-diagonal values.
- PCA’s Magic:
  - The top eigenvector ( v_1 ) points where the data is "stretched" most (the lighting-gradient direction).
  - Projections onto ( v_1 ) amplify this shared structure → high variance (( \lambda_1 )).
- Semantic Meaning:
  - ( v_1 ) isn’t arbitrary; it reflects a real-world pattern (lighting) because that’s what dominates the data’s covariance.
Visualization
Think of the data as points in a 2D pixel space:
- The points cluster along a line (slope = 0.5).
- ( v_1 ) is the direction of that line.
- Projecting onto ( v_1 ) preserves the gradient; projecting onto ( v_2 ) (orthogonal) loses it.
This is why PCA’s top components often "make sense" for images!