Why Smooth Gradients → Strong Covariance → High Variance in PCA
To understand why smooth, consistent gradients across images lead to strong covariance and high variance in PCA, let’s break it down step-by-step with intuition, math, and examples.
1. Definitions Recap
- Covariance: Measures how two pixels (or features) vary together across images.
  - High covariance: Pixels increase/decrease in sync.
  - Low covariance: Pixels change independently.
- Variance: A special case of covariance (how a single pixel varies across images).
For a centered dataset ( X ) (size ( N \times D )), the covariance matrix ( C ) is:
[
C_{jk} = \frac{1}{N} \sum_{i=1}^N x_{ij} x_{ik},
]
where ( x_{ij} ) is the value of pixel ( j ) in image ( i ).
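If it helps to see the formula as code, here is a minimal NumPy sketch (the data values are made up purely for illustration; note the ( 1/N ) convention above, whereas `np.cov` defaults to ( 1/(N-1) )):

```python
import numpy as np

# Toy centered data matrix X: N = 4 images (rows), D = 3 pixels (columns).
X = np.array([[ 1.0,  2.0, -1.0],
              [-1.0, -2.0,  1.0],
              [ 2.0,  4.0, -2.0],
              [-2.0, -4.0,  2.0]])

N = X.shape[0]
C = (X.T @ X) / N   # C[j, k] = (1/N) * sum_i x_ij * x_ik
print(C)            # symmetric D x D covariance matrix
```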
2. Smooth Gradients → Strong Covariance
Intuition
- A smooth gradient (e.g., left-to-right lighting in faces) means:
  - Pixel values change slowly and predictably across the image.
  - All images share this pattern (e.g., left cheeks are always brighter than right cheeks).
Example
Consider two pixels, ( p_1 ) (left cheek) and ( p_2 ) (right cheek), across 3 face images:
| Image | ( p_1 ) | ( p_2 ) |
|-------|----------|----------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |
- Covariance calculation:
[
C_{12} = \frac{(10 \cdot 5) + (8 \cdot 4) + (12 \cdot 6)}{3} = \frac{50 + 32 + 72}{3} \approx 51.3.
]
- High positive value because ( p_1 ) and ( p_2 ) scale together across images.
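The same arithmetic takes two lines of NumPy, treating the table values as already-centered pixel values:

```python
import numpy as np

p1 = np.array([10.0, 8.0, 12.0])   # left-cheek values across the 3 faces
p2 = np.array([ 5.0,  4.0,  6.0])  # right-cheek values

C12 = np.mean(p1 * p2)             # (1/N) * sum of pairwise products
print(C12)                         # 51.333...
```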
Why?
- Smooth gradients create consistent pixel relationships.
- If ( p_1 ) increases, ( p_2 ) also increases (but slightly less, due to gradient).
- This consistency across images → large ( C_{jk} ).
3. Strong Covariance → High Variance
Link to Eigenvalues
PCA’s eigenvalues ( \lambda ) (variances) come from the covariance matrix ( C ):
[
C v = \lambda v.
]
- Eigenvectors ( v ): Directions where pixel values co-vary strongly.
- Eigenvalues ( \lambda ): Variance along those directions.
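Both facts are easy to check numerically. Here is a small sketch with a random symmetric matrix standing in for ( C ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
C = A @ A.T / 4                       # any symmetric "covariance-like" matrix

eigvals, eigvecs = np.linalg.eigh(C)  # eigh: for symmetric matrices, ascending eigenvalues
v, lam = eigvecs[:, -1], eigvals[-1]  # top eigenvector / eigenvalue

print(np.allclose(C @ v, lam * v))    # True: C v = lambda v
print(np.isclose(v @ C @ v, lam))     # True: variance along v equals lambda
```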
Why Smooth Gradients Maximize Variance
- Shared Structure: If all images have a left-to-right lighting gradient, PCA finds a direction ( v ) where:
  - Projecting images onto ( v ) yields large, consistent values (high variance).
  - Example: ( v ) might weight the left pixels more heavily than the right ones, mirroring the shape of the gradient.
- Variance Calculation:
For a unit eigenvector ( v ) aligned with the gradient:
[
\text{Var}(v) = \lambda = v^T C v.
]
  - Since ( C ) has large entries for the gradient-related pixels, ( \lambda ) is large.
Numerical Example
Suppose ( v = [1, 1] ) (weighting the left and right cheek together; ignore normalization for the moment):
[
\text{Var}(v) = [1, 1] \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = C_{11} + C_{22} + 2C_{12}.
]
If ( C_{12} ) is large and positive (strong covariance), ( \text{Var}(v) ) is large. A contrast direction like ( [1, -1] ) would instead cancel the covariance, giving ( C_{11} + C_{22} - 2C_{12} ), which is small.
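To see the effect of the covariance term concretely, here is a quick check using the covariance matrix computed in the deep dive further down, with unit-length versions of the two directions so they are comparable:

```python
import numpy as np

C = np.array([[102.67, 51.33],
              [ 51.33, 25.67]])               # covariance matrix from the worked example below

v_sum  = np.array([1.0,  1.0]) / np.sqrt(2)   # weights both cheeks together
v_diff = np.array([1.0, -1.0]) / np.sqrt(2)   # contrasts the cheeks

print(v_sum  @ C @ v_sum)    # ~115.5: the covariance term adds in
print(v_diff @ C @ v_diff)   # ~12.8:  the covariance term cancels out
```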
4. Contrast with High-Frequency Noise
- Noise/Edges: Pixel values change randomly across images.
  - Example: Freckles appear at different positions → ( C_{jk} \approx 0 ).
  - No consistent direction to maximize → small ( \lambda ).
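A quick simulation makes the contrast visible (the names `strength` and `freckle` are just illustrative stand-ins for "how strongly each face is lit" and "a noise pixel"):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
strength = rng.normal(size=N)        # how strongly each face is lit

left  = 2.0 * strength               # gradient pixels move together...
right = 1.0 * strength               # ...just with different magnitudes
freckle = rng.normal(size=N)         # noise pixel: independent in every image

print(np.mean(left * right))         # large and positive (~2): consistent relationship
print(np.mean(left * freckle))       # near zero: no shared structure, tiny covariance
```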
5. Key Takeaways
- Smooth Gradients:
  - Create predictable pixel relationships → high covariance ( C_{jk} ).
  - Allow PCA to find a direction ( v ) where projections vary strongly → high variance ( \lambda ).
- PCA’s Priority:
  - Top PCs align with globally consistent patterns (gradients, lighting).
  - Inconsistent patterns (noise, high-frequency details) are discarded.
- Semantic Meaning:
  - Smooth gradients often correspond to lighting, pose, or shape → top PCs look "meaningful".
Visualization
Imagine stretching a rubber band over the data:
- PCA’s first eigenvector ( v_1 ) is the direction where the band is most stretched (max variance).
- Smooth gradients stretch it far; noise barely moves it.
Final Answer:
Smooth gradients indicate strong covariance because they make pixels co-vary consistently across images. PCA’s variance-maximizing objective then assigns large eigenvalues to these directions, prioritizing them as top principal components. This is why low-frequency patterns dominate in PCA, while high-frequency noise vanishes.
Deep Dive: How PCA Discovers Lighting Gradients
Let’s break down exactly why a left-to-right lighting gradient across images leads PCA to find a direction ( v ) that maximizes variance. We’ll use a concrete example with numbers to illustrate the math.
1. The Dataset Setup
Suppose we have 3 grayscale face images, each with 2 pixels (simplified for clarity):
- Pixel 1 (Left Cheek)
- Pixel 2 (Right Cheek)
All images share a left-to-right lighting gradient: the left pixel is always brighter than the right.
Here’s the centered data matrix ( X ) (each row is an image):
| Image | Pixel 1 (Left) | Pixel 2 (Right) |
|-------|----------------|-----------------|
| Face1 | +10 | +5 |
| Face2 | +8 | +4 |
| Face3 | +12 | +6 |
(Note: For simplicity, treat these values as already centered; in a real pipeline you would first subtract each pixel's mean across images.)
2. Covariance Matrix Calculation
The covariance matrix ( C = \frac{1}{N} X^T X ) quantifies how pixels co-vary:
[
C = \frac{1}{3} \begin{bmatrix}
10 & 8 & 12 \\
5 & 4 & 6
\end{bmatrix}
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
= \frac{1}{3} \begin{bmatrix}
308 & 154 \\
154 & 77
\end{bmatrix}
\approx
\begin{bmatrix}
102.67 & 51.33 \\
51.33 & 25.67
\end{bmatrix}
]
- Key Observation: ( C_{12} = C_{21} \approx 51.33 ) is large and positive → Pixels 1 and 2 are strongly correlated.
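The same computation in a few lines of NumPy, reusing the (treated-as-centered) data matrix:

```python
import numpy as np

X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])   # rows = faces, columns = (left cheek, right cheek)

C = X.T @ X / X.shape[0]      # C = (1/N) X^T X
print(C)
# [[102.67  51.33]
#  [ 51.33  25.67]]  (approximately)
```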
3. Eigenvectors and Eigenvalues
PCA solves ( C v = \lambda v ). Let’s compute them:
- Eigenvalues (( \lambda )):
[
\text{det}(C - \lambda I) = 0 \implies \lambda_1 \approx 128.33, \quad \lambda_2 \approx 0.
]
  - ( \lambda_1 ) is large (dominant), ( \lambda_2 \approx 0 ) (negligible: the three points lie exactly on a line).
- Eigenvector ( v_1 ) (First PC):
[
C v_1 = \lambda_1 v_1 \implies v_1 \approx \begin{bmatrix} 0.894 \\ 0.447 \end{bmatrix}.
]
(This direction roughly aligns with the gradient [2, 1], since 10/5 = 8/4 = 12/6 = 2.)
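NumPy confirms these numbers:

```python
import numpy as np

C = np.array([[308.0, 154.0],
              [154.0,  77.0]]) / 3.0         # exact C from step 2

eigvals, eigvecs = np.linalg.eigh(C)         # ascending order for symmetric matrices
print(eigvals)                               # ~[0.0, 128.33]
print(eigvecs[:, -1])                        # ~[0.894, 0.447] (sign may be flipped)
```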
4. Projecting Data onto ( v_1 )
Now, project all images onto ( v_1 ):
[
\text{Scores} = X v_1 =
\begin{bmatrix}
10 & 5 \\
8 & 4 \\
12 & 6
\end{bmatrix}
\begin{bmatrix}
0.894 \\
0.447
\end{bmatrix}
\approx
\begin{bmatrix}
11.18 \\
8.94 \\
13.42
\end{bmatrix}
]
- Variance of Scores:
[
\text{Var}(\text{scores}) = \frac{11.18^2 + 8.94^2 + 13.42^2}{3} \approx 128.33 = \lambda_1.
]
- This matches the eigenvalue, confirming ( v_1 ) captures maximal variance.
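The projection-and-variance step, end to end:

```python
import numpy as np

X  = np.array([[10.0, 5.0],
               [ 8.0, 4.0],
               [12.0, 6.0]])
v1 = np.array([2.0, 1.0]) / np.sqrt(5)   # unit vector along the [2, 1] gradient direction

scores = X @ v1
print(scores)                # ~[11.18, 8.94, 13.42]
print(np.mean(scores**2))    # ~128.33, i.e. lambda_1
```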
5. Why Does This Direction Work?
- Geometric Intuition:
  The eigenvector ( v_1 ) points along the "axis of variation" in the data.
  - In our 2D pixel space, the data points lie exactly on a line with slope ( \approx 0.5 ) (since Pixel 2 ≈ 0.5 × Pixel 1).
  - ( v_1 ) aligns with this line, so projecting onto it stretches the data maximally.
- Algebraic Intuition:
  The scores ( X v_1 ) are large because:
  - ( v_1 ) assigns positive weights to both pixels, but more to Pixel 1 (left cheek).
  - Since Pixel 1 is consistently brighter, the weighted sum ( X v_1 ) amplifies this pattern → high variance.
6. Contrast with Noise (Low Variance)
Imagine adding a high-frequency noise pixel (e.g., a freckle at random positions):
- Its covariance with other pixels would be near-zero (no consistent pattern).
- The corresponding eigenvalue would be tiny → PCA ignores it.
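A quick way to see this with the toy data is to tack on a hypothetical noise pixel and look at the eigenvalue spectrum (the 0.5 noise scale is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[10.0, 5.0],
              [ 8.0, 4.0],
              [12.0, 6.0]])
noise = rng.normal(scale=0.5, size=(3, 1))   # a "freckle" pixel, different in each image
X = np.hstack([X, noise])                    # now 3 images x 3 pixels

C = X.T @ X / X.shape[0]
print(np.round(np.linalg.eigvalsh(C), 2))    # one eigenvalue near 128, the others tiny
```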
Key Takeaways
- Consistent Gradients → Strong Covariance:
  - When pixels co-vary predictably (e.g., left cheek always brighter), ( C ) has large off-diagonal values.
- PCA’s Magic:
  - The top eigenvector ( v_1 ) points where the data is "stretched" most (the lighting-gradient direction).
  - Projections onto ( v_1 ) amplify this shared structure → high variance (( \lambda_1 )).
- Semantic Meaning:
  - ( v_1 ) isn’t arbitrary; it reflects a real-world pattern (lighting) because that’s what dominates the data’s covariance.
Visualization
Think of the data as points in a 2D pixel space:
- The points cluster along a line (slope = 0.5).
- ( v_1 ) is the direction of that line.
- Projecting onto ( v_1 ) preserves the gradient; projecting onto ( v_2 ) (orthogonal) loses it.
This is why PCA’s top components often "make sense" for images!