Principal Component Analysis, PCA for short, is a bit of a “deep” topic in machine learning. By deep I mean that you need to know about clusters and covariance before you even get to principal component analysis, and who knows what else comes after PCA. Learning about PCA is like learning about glucose: you can’t really understand glucose unless you already know what sugar is. I say this to emphasize that PCA is hard to grasp without some prior knowledge. That being said, here’s what PCA means.
When clustering data, we often deal with multi-dimensional data points. For me it’s easy to picture a point on a 2D plane (x, y), or even in 3D (x, y, z). But once it goes beyond 3D into 4D or more, it becomes nearly impossible to visualize mentally. Visualizing the data itself is already hard; now imagine trying to rotate or move one of these high-dimensional points. At that point, I just trust that the math works. Enough venting. So, what is PCA?
PCA comes from simplifying the covariance matrix.
What is a covariance matrix?
A covariance matrix is a 2D table that shows how each feature varies together with every other feature. For example, suppose we have a dataset with the following features:
• calories_consumed
• amount_of_sleep
• amount_of_social_media
• miles_ran
The covariance matrix for these might look like the sketch below (values are illustrative, not from real data). A few things to notice:
• The diagonal values (25.0, 9.0, 30.0, 40.0) are the variances of each feature.
• The off-diagonal values show the covariance between two features. For example:
• Calories consumed and miles ran = -20.0 (negative covariance → running more tends to go with fewer calories consumed).
• Sleep and social media = -15.0 (negative covariance → more sleep, less scrolling).
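Here’s a minimal sketch of that matrix in NumPy. The diagonal variances and the two covariances called out above are the illustrative numbers from the text; every other off-diagonal entry is a made-up placeholder I’ve filled in just so the 4×4 matrix is complete.

```python
import numpy as np

# Order: calories_consumed, amount_of_sleep, amount_of_social_media, miles_ran
features = ["calories_consumed", "amount_of_sleep",
            "amount_of_social_media", "miles_ran"]

# Illustrative covariance matrix (symmetric; diagonal = variances).
# Only the values called out above come from the text; the remaining
# off-diagonal entries are placeholders to fill out the 4x4 shape.
cov = np.array([
    [ 25.0,   2.0,   1.0, -20.0],  # calories_consumed
    [  2.0,   9.0, -15.0,   2.0],  # amount_of_sleep
    [  1.0, -15.0,  30.0,  -3.0],  # amount_of_social_media
    [-20.0,   2.0,  -3.0,  40.0],  # miles_ran
])

# With real data you would estimate it instead, e.g.:
# cov = np.cov(data, rowvar=False)   # data shaped (n_samples, n_features)
```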
A matrix like this is already hard to interpret with just 4 features. Imagine 100+ features: the relationships get too complex to reason about directly.
Why PCA?
That’s where PCA comes in. PCA finds the principal components — the directions in which the data varies the most. By rotating the data into this new coordinate system, we can:
• Reduce dimensions (e.g., compress 100 features down to 2 or 3 while keeping most of the information; see the sketch after this list).
• Better visualize high-dimensional data.
• Understand the strongest underlying patterns.
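To make the dimension-reduction bullet concrete, here’s a rough sketch using scikit-learn’s PCA on a made-up dataset. Everything about the data is an assumption for illustration: 500 samples, 100 features whose variation mostly lives in a 2-dimensional subspace, plus a bit of noise.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 500 samples x 100 features, but most of the variation
# secretly lives in a 2-dimensional subspace (plus a little noise).
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))            # 2 hidden directions of variation
mixing = rng.normal(size=(2, 100))            # spread them across 100 features
X = latent @ mixing + 0.1 * rng.normal(size=(500, 100))

# Compress 100 features down to 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                             # (500, 2) -- now easy to plot
print(pca.explained_variance_ratio_.sum())    # close to 1.0: most variance kept
```

The explained variance ratio is the honest check here: if its sum is low, two components aren’t enough and you’ve thrown real information away.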
In essence, PCA transforms the messy covariance relationships into a simpler structure. Instead of asking “how does feature A affect feature B,” PCA lets us ask, “what combination of features explains most of the variation in this dataset?”
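One way to see where those “combinations of features” come from: the principal components are the eigenvectors of the covariance matrix, and the eigenvalues say how much of the total variance each component explains. A minimal sketch, reusing the illustrative 4×4 matrix from earlier:

```python
import numpy as np

# The illustrative covariance matrix from earlier.
features = ["calories_consumed", "amount_of_sleep",
            "amount_of_social_media", "miles_ran"]
cov = np.array([
    [ 25.0,   2.0,   1.0, -20.0],
    [  2.0,   9.0, -15.0,   2.0],
    [  1.0, -15.0,  30.0,  -3.0],
    [-20.0,   2.0,  -3.0,  40.0],
])

# Eigendecomposition of a symmetric matrix: eigenvectors are the principal
# components, eigenvalues are the variances along those components.
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so flip to largest-first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance explained by each component:", eigvals / eigvals.sum())

# The first principal component is a weighted combination of the original features.
for name, weight in zip(features, eigvecs[:, 0]):
    print(f"{name:>25}: {weight:+.2f}")
```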