Samyak Jain

Understanding SVD's Intuition (Singular Value Decomposition)

What is SVD?

Singular Value Decomposition (SVD) is a fundamental matrix factorisation technique in linear algebra that decomposes a matrix into three simpler component matrices. It's incredibly versatile and powerful, serving as the backbone for numerous applications across various fields.

Why Was SVD Developed?

SVD was developed to solve the problem of finding the best approximation of a matrix by a lower-rank matrix. Mathematicians needed a way to:

  1. Factorise the original matrix into three smaller matrices that capture hidden relationships, i.e. find latent features that explain the patterns in a given matrix.
  2. Understand the fundamental structure of linear transformations.
  3. Analyse the underlying properties of matrices regardless of their dimensions.

The core purpose was to find a way to decompose any matrix into simpler, more manageable components that reveal its essential properties - specifically its rank, range (column space), and null space.

What does it mean to find the best approximation of a matrix by a lower-rank matrix?

When we have a large, complex matrix A with rank r, we often want to simplify it to save computational resources while preserving as much of the original information as possible.

Finding the "best approximation" means:

  • Creating a new matrix Â with lower rank k (where k < r)
  • Ensuring this approximation minimises the error (typically measured as the Frobenius norm ||A - Â||)
  • Capturing the most important patterns/structures in the original data

This is valuable because lower-rank matrices:

  • Require less storage space
  • Allow faster computations
  • Often filter out noise while preserving signal
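
As a rough sketch of what this looks like in practice (this is not from the article; it uses NumPy with a small random matrix standing in for real data), the loop below builds rank-k approximations from a truncated SVD and prints how the Frobenius error ||A - Â|| shrinks as k grows:

```python
import numpy as np

# Placeholder data standing in for a real matrix (assumed, not from the article)
rng = np.random.default_rng(42)
A = rng.random((8, 5))

# Economy SVD: A = U @ diag(s) @ Vt, singular values sorted largest first
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in range(1, len(s) + 1):
    A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation Â
    err = np.linalg.norm(A - A_hat, ord="fro")      # Frobenius norm ||A - Â||
    print(f"k={k}  ||A - Â||_F = {err:.4f}")
```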

What is a Latent Feature?

A latent feature is:

A property or concept that isn't explicitly observed in the data, but is inferred from patterns across the matrix.

In simple terms:

  • It's a hidden factor that explains why people behave the way they do.
  • You don’t know what it is, but you see its effects.

Analogy: Music Taste

Imagine a user-song matrix: rows = users, columns = songs, entries = ratings.

You don’t label features like "likes punk rock" or "prefers instrumental", but...

  • If two users rate the same songs highly,
  • And those songs share some vibe,

Then you can infer there's some latent preference shared.
That hidden dimension — like “preference for energetic music” — is a latent feature.

Definition

Given any real matrix A ∈ ℝᵐˣⁿ, SVD says:

$A = U \Sigma V^T$

Where:

| Component | Shape | Role |
| --- | --- | --- |
| U | m×k | Orthonormal matrix — maps original rows into a k-dimensional latent space. Each row is a latent vector for a row of A. |
| Σ | k×k | Diagonal matrix with non-negative real numbers called singular values, sorted from largest to smallest. These represent the importance (energy) of each dimension in the latent space. |
| Vᵀ | k×n | Orthonormal matrix — maps original columns into the same k-dimensional latent space. Each column is a latent vector for a column of A. |

The value k = rank(A) in exact decomposition, or you can choose a smaller k for approximation (low-rank SVD).
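
To make the shapes concrete, here is a minimal NumPy sketch (the matrix values are arbitrary placeholders, not the ratings table used later) that computes the three factors and checks that they reconstruct A:

```python
import numpy as np

# A small 3×4 example matrix (arbitrary values, just to show the shapes)
A = np.array([[5., 4., 1., 2.],
              [4., 5., 1., 1.],
              [1., 2., 5., 5.]])

# full_matrices=False returns the "economy" SVD with k = min(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                      # singular values on the diagonal, largest first

print(U.shape, Sigma.shape, Vt.shape)   # (3, 3) (3, 3) (3, 4)
print(np.allclose(A, U @ Sigma @ Vt))   # True: A is recovered up to floating-point error
```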

📊 What Each Matrix Encodes (in abstract terms)

**U**: Each row in A becomes a vector in a latent space — capturing how each row projects onto abstract "directions" in the data.

This matrix shows how each user aligns with the latent features discovered by SVD.

It does not label them explicitly as “loves punk rock” or “loves romantic music,” but it captures underlying preferences.

In essence, U describes what kind of latent features (or preferences) each user has, without explicitly telling us the names of those features. It just tells us the degree to which each user aligns with each latent feature.
**Σ**: Singular values show how much of A’s structure lies in each direction (feature). The higher the value, the more "energy" or "information" it captures.

The singular values in Σ represent how important each latent feature is.

**Vᵀ**: Each column in A becomes a vector in the same latent space — capturing how each column contributes along those same directions.

This matrix describes how each song correlates with the latent features.

For example,

Song A (Punk) might have a high score for the latent feature "energetic music,"

while Song B (Romantic) might also score highly on that feature, but could also have a moderate score for a latent feature related to "emotional depth" (which both romantic and punk music might share).

Song C (Jazz) and Song D (Classical) would likely score higher on a latent feature related to "calm or soothing music."

In essence, Vᵀ describes what kind of latent features (or qualities) each song has, again without explicitly telling us what the features are. It just tells us the degree to which each song aligns with each latent feature.

Example Scenario

Let’s start with a user-song rating matrix where users rate songs from different genres. Each user has their own tastes, and some might enjoy music from different genres.

|  | Song A (Punk) | Song B (Romantic) | Song C (Jazz) | Song D (Classical) |
| --- | --- | --- | --- | --- |
| Alice | 5 | 4 | ? | 2 |
| Bob | 4 | 5 | ? | 1 |
| Carol | 1 | 2 | 5 | 5 |

  • Alice: Likes Song A (Punk) and Song B (Romantic).
  • Bob: Likes Song B (Romantic) and Song A (Punk) as well, but with slightly different ratings.
  • Carol: Prefers Song C (Jazz) and Song D (Classical).

#### The Problem: How Do We Predict Missing Ratings?

We want to predict ratings for the missing entries (denoted by ?). For example, we want to predict how Alice might rate Song C (Jazz) or Song D (Classical). Similarly, we want to predict how Bob might rate Song C.

We don't explicitly know which genres each user likes. But SVD can discover latent features or hidden preferences based on their ratings.

Breakdown of U, Σ, Vᵀ

  1. U (User-to-Latent Features) – Describes the users in terms of the latent features.
  2. Σ (Singular Values) – Indicates the importance of each latent feature.
  3. Vᵀ (Song-to-Latent Features) – Describes the songs in terms of the latent features.

#### U: User-to-Latent Feature Matrix

  • Rows (users): Each row in U corresponds to one user.
  • Columns (latent features): Each column in U corresponds to one latent feature that was discovered by SVD.
  • Values: Each entry in U shows how strongly a user aligns with each latent feature. A high value in a column indicates that the user has a strong preference for that particular latent feature. A low value means they don’t have much of a preference for that feature.
Let’s say after applying SVD, we get a matrix U for our music example that looks like this:

|  | Feature 1 (Energetic Music) | Feature 2 (Calm Music) |
| --- | --- | --- |
| Alice | 0.8 | 0.2 |
| Bob | 0.9 | 0.1 |
| Carol | -0.3 | 0.9 |

  • Alice has a strong preference for Feature 1 (Energetic Music) with a value of 0.8, and a weak preference for Feature 2 (Calm Music) with a value of 0.2.

  • Bob also has a strong preference for Feature 1 (Energetic Music) (0.9), but he has a much weaker preference for Feature 2 (Calm Music) (0.1).

  • Carol, on the other hand, has a strong preference for Feature 2 (Calm Music) (0.9) and a weak preference for Feature 1 (Energetic Music) (-0.3).

  • What this Means:

    • Alice and Bob both like energetic music, which is why they have high values for Feature 1.
    • Carol prefers calm music, as indicated by her high value for Feature 2.

#### Σ: Singular Value Matrix
  • A larger singular value indicates that the latent feature explains more of the variance in the ratings. For instance, the first latent feature might explain a significant portion of the ratings data (because both Alice and Bob like energetic music), while the second latent feature (which explains Carol’s preference for calm music) might explain less.

#### Vᵀ: Song-to-Latent Feature Matrix

  • Rows (songs): Each row in Vᵀ corresponds to a particular song.

  • Columns (latent features): Each column corresponds to one latent feature discovered by SVD.

  • Values: Each entry in Vᵀ tells us how much the song aligns with a particular latent feature.

Here’s how the matrix Vᵀ might look for our music example:

|  | Feature 1 (Energetic Music) | Feature 2 (Calm Music) |
| --- | --- | --- |
| Song A (Punk) | 0.9 | -0.1 |
| Song B (Romantic) | 0.8 | 0.2 |
| Song C (Jazz) | -0.2 | 0.9 |
| Song D (Classical) | -0.5 | 0.8 |

  • Song A (Punk) has a high value for Feature 1 (Energetic Music) (0.9), meaning that this song aligns with energetic or lively music, but it has a low value for Feature 2 (Calm Music) (-0.1), meaning it doesn't align with calm or soothing music.

  • Song B (Romantic) also has a high value for Feature 1 (Energetic Music) (0.8) but also has a moderate value for Feature 2 (Calm Music) (0.2), showing it may combine both energetic and calming elements.

  • Song C (Jazz) has a low value for Feature 1 (Energetic Music) (-0.2) and a high value for Feature 2 (Calm Music) (0.9), meaning it's a calm, soothing song.

  • Song D (Classical) also has a low value for Feature 1 (Energetic Music) (-0.5) and a high value for Feature 2 (Calm Music) (0.8), making it more of a calm or soothing piece of music.

  • What this Means:

    • Vᵀ shows us how each song relates to the latent features.
    • Song A (Punk) is closely related to Feature 1 (Energetic Music), while Song C (Jazz) is closely related to Feature 2 (Calm Music).

### Key Insight: What SVD Actually Reveals

Here’s where SVD’s magic comes in:

  • Alice and Bob like both Song A (Punk) and Song B (Romantic).

    • SVD captures that they share a preference for energetic, lively music, even though one song is punk and the other is romantic.
    • This shared preference shows up as a latent feature in U for Alice and Bob: both have high scores for this feature.
  • Carol, on the other hand, likes Song C (Jazz) and Song D (Classical), which are more calm and soothing.

    • SVD identifies this preference as another latent feature, and Carol scores highly on this second latent feature in U.
  • SVD doesn’t explicitly label these features as genres like “punk” or “romantic.”

    • It simply sees that Alice and Bob share some ratings for energetic music, and Carol shares ratings for calming music.
    • This means SVD will identify latent features that explain why Alice and Bob like Songs A and B, and why Carol prefers Songs C and D.

Predicting Missing Ratings with SVD

Now that we understand what each matrix represents, let's see how SVD helps us predict missing ratings:

  1. We've decomposed our original ratings matrix into U, Σ, and Vᵀ
  2. To predict Alice's rating for Song C (Jazz), we multiply her latent feature values by the importance of each feature, then by Song C's latent feature values:

Alice's predicted rating for Song C = (U_Alice × Σ × Vᵀ_SongC)

Using our example values:

  • Alice's latent feature values: [0.8, 0.2]
  • Singular values (assuming Σ = [[3, 0], [0, 2]]): 3 for Feature 1, 2 for Feature 2
  • Song C's latent feature values: [-0.2, 0.9]

Calculation:

  1. Alice's preference for Feature 1 × Importance of Feature 1 × Song C's alignment with Feature 1:
    0.8 × 3 × (-0.2) = -0.48

  2. Alice's preference for Feature 2 × Importance of Feature 2 × Song C's alignment with Feature 2:
    0.2 × 2 × 0.9 = 0.36

  3. Sum these values: -0.48 + 0.36 = -0.12

  4. Since ratings are typically positive, we might scale this to a rating range (e.g., 1-5):
    Adjusted rating ≈ 2.5 (neutral/slightly negative)

This makes intuitive sense: Alice strongly prefers energetic music (Feature 1), but Song C (Jazz) is negatively associated with that feature and strongly associated with calm music (Feature 2), which Alice only weakly prefers. Therefore, SVD predicts Alice would give Song C a relatively low rating.
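
The same calculation in NumPy, using the illustrative numbers above (including the assumed Σ = [[3, 0], [0, 2]]):

```python
import numpy as np

u_alice  = np.array([0.8, 0.2])    # Alice's row of U (her latent feature values)
Sigma    = np.diag([3.0, 2.0])     # assumed singular values from the example above
v_song_c = np.array([-0.2, 0.9])   # Song C's latent feature values (its column of Vᵀ)

# Predicted rating = U_Alice × Σ × Vᵀ_SongC
prediction = u_alice @ Sigma @ v_song_c
print(prediction)                  # ≈ -0.12, a low raw score before rescaling to the 1-5 range
```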

Application in Dimensionality Reduction

In real-world data, there’s often a lot of redundancy — users rate songs similarly, or products have overlapping qualities. SVD helps by:

  • Capturing the most important features (patterns) in the data.
  • Removing noise and redundancy by ignoring less significant singular values.
  • Compressing the data: Instead of storing the full original matrix, we store a low-rank approximation.

How the Reduction Works

To reduce dimensionality:

  1. Choose a smaller rank k, where k < full rank.
  2. Keep only the top k singular values and their corresponding vectors in U and Vᵀ.
  3. Reconstruct the matrix approximately:

$A \approx A_k = U_k \Sigma_k V_k^T$

This reduced version of the matrix retains most of the meaningful information but uses far fewer numbers.
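
A minimal sketch of that truncation step (the helper name and matrix sizes are illustrative; any matrix works, and k is whatever rank you choose), showing both the reconstruction and the storage saving:

```python
import numpy as np

def truncated_svd(A, k):
    """Keep only the top-k singular values and the matching parts of U and Vᵀ."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]

A = np.random.rand(100, 50)               # placeholder data
U_k, Sigma_k, Vt_k = truncated_svd(A, k=10)

A_k = U_k @ Sigma_k @ Vt_k                # low-rank reconstruction: A_k ≈ A
print(A_k.shape)                          # (100, 50)

full_numbers   = A.size                   # 100 × 50 = 5000 values to store
stored_numbers = U_k.size + 10 + Vt_k.size  # 1000 + 10 + 500 = 1510 values
print(full_numbers, stored_numbers)
```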


👥 Example: Alice, Bob, and Carol

Imagine we have a matrix of 3 users (Alice, Bob, Carol) rating 4 songs:

|  | Punk | Rock | Love | Ballad |
| --- | --- | --- | --- | --- |
| Alice | 5 | 4 | 1 | 1 |
| Bob | 4 | 5 | 1 | 0 |
| Carol | 1 | 1 | 5 | 4 |

This is a 3×4 matrix.

After applying SVD and reducing it to k = 2, we get:

  • U_k: 3×2 matrix — Each user represented by just 2 latent features.
  • Σ_k: 2×2 diagonal matrix — Strength of these 2 features.
  • V_kᵀ: 2×4 matrix — Each song represented using just 2 features.

This reduced version:

  • Reveals that Alice and Bob prefer a “Punk/Rock” dimension, while Carol prefers a “Love/Ballad” dimension.
  • Allows us to reconstruct an approximation of the original matrix using just the most important patterns.
  • Reduces noise and dimensionality without losing much of the core structure.
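
Here is a minimal sketch of that reduction applied to this exact 3×4 ratings matrix, using NumPy's `np.linalg.svd`. The signs of the latent vectors depend on the implementation, but Alice and Bob should land close together in the 2-dimensional latent space while Carol sits apart:

```python
import numpy as np

ratings = np.array([[5., 4., 1., 1.],   # Alice
                    [4., 5., 1., 0.],   # Bob
                    [1., 1., 5., 4.]])  # Carol

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

k = 2
U_k, Sigma_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
print(U_k.shape, Sigma_k.shape, Vt_k.shape)   # (3, 2) (2, 2) (2, 4)

print(np.round(U_k, 2))                       # each user's coordinates on the 2 latent features
print(np.round(U_k @ Sigma_k @ Vt_k, 1))      # close to the original ratings matrix
```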

Why perform this reduction?

1. Captures Core Patterns, Not Raw Values

The original matrix told you explicitly:

“Alice rated Punk 5, Love 1…”

But those are surface-level observations.

The reduced version asks:

“What underlying factors might explain why Alice likes Punk and Rock more?”

For example:

  • Maybe Factor 1 represents a preference for high-energy music.
  • Maybe Factor 2 represents a preference for emotional or romantic themes.

In the reduced matrix, Alice might be represented as:

  • Alice → [2.1, 0.1] → Strong on Factor 1, Weak on Factor 2
  • Carol → [0.1, 2.3] → Weak on Factor 1, Strong on Factor 2

This tells us more than raw numbers — it shows why people like what they like.


2. Good Generalization — Not Just Memorization

The full matrix memorizes exact numbers.

The reduced matrix generalizes, letting us:

  • Predict missing values more robustly.
  • Spot similar users or items even if their ratings don’t match exactly.
  • Cluster users or songs by deeper, shared preferences.

This is crucial in recommendation systems like Netflix or Spotify — we often have incomplete data, and we want smart predictions, not perfect reconstructions.


3. Removes Noise

In real-world data, some ratings are noisy:

  • A person misclicked and entered a 1 instead of a 4.
  • Someone rated a song randomly.

SVD smooths over such inconsistencies by focusing on consistent trends, not one-off values.


Conclusion: Why SVD Matters

SVD's power lies in its ability to:

  1. Discover hidden patterns (latent features) in data
  2. Reduce dimensionality while preserving important information
  3. Enable accurate predictions for missing values
  4. Filter noise from data

Whether you're building recommendation systems, processing images, or analyzing text, SVD provides a mathematical foundation for understanding and working with complex data relationships.

