In my previous article, I explored several dimensionality reduction techniques, including PCA, t-SNE, UMAP, LDA, Sammon Mapping, KNN Graphs, and Autoencoders.
Going into that project, my goal was fairly simple.
I wanted to understand how these algorithms worked, where they were useful, and whether they could improve model performance.
Like many people learning machine learning, I saw dimensionality reduction as a classical ML topic.
Something you learn alongside feature engineering and data preprocessing.
Useful knowledge, but not something I expected to connect to modern AI systems.
I was wrong.
Not because PCA powers Large Language Models.
Not because dimensionality reduction is secretly the most important topic in machine learning.
But because the project forced me to think about a question I hadn't considered before.
The Question I Didn't Expect
While comparing different algorithms, I noticed something strange.
The same dataset could look completely different depending on the technique I used.
PCA produced one view.
t-SNE produced another.
UMAP showed patterns that weren't obvious in either of them.
Autoencoders created their own representation altogether.
At first, I was focused on which visualization looked better.
Then I started asking a different question:
If all of these algorithms are looking at exactly the same data, what are they actually trying to preserve?
That question turned out to be far more interesting than finding the "best" dimensionality reduction algorithm.
PCA Isn't Really About Reducing Dimensions
Most tutorials introduce PCA as a way to reduce features.
For example, a dataset with 100 features might be compressed into 10 principal components while retaining most of the variance.
That explanation is correct.
But while experimenting with PCA, I realized I was focusing on the wrong part of the process.
The interesting part wasn't that 100 dimensions became 10.
The interesting part was that the transformed data still retained much of the structure of the original dataset.
Some information was discarded.
Some information was preserved.
The algorithm had effectively made a decision about what mattered: directions of high variance were kept, and low-variance directions were dropped.
And that idea kept showing up across other dimensionality reduction techniques.
Each algorithm compressed the data differently because each algorithm had a different definition of what should be preserved.
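To make the 100-to-10 example concrete, here is a minimal sketch of PCA written with NumPy's SVD. The function name `pca_fit_transform` and the synthetic dataset are assumptions made for illustration, not code from the original project. The data is deliberately built from 10 hidden factors, so the high retained variance is a property of this toy setup and not something to expect from real data.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the fraction of total variance
    carried by each kept component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    Z = X_centered @ Vt[:n_components].T
    return Z, ratio[:n_components]

# Synthetic data (an assumption for illustration): 500 samples with 100
# features that are noisy mixtures of only 10 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 100))

Z, ratio = pca_fit_transform(X, n_components=10)
retained = ratio.sum()

print(Z.shape)  # (500, 10)
print(f"variance retained: {retained:.3f}")
```

The 90 discarded components are exactly the "decision about what mattered": they are the directions along which this data varies least.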
The Moment Things Started Clicking
The more I explored these techniques, the less I thought about dimensions and the more I thought about representations.
PCA preserves variance.
t-SNE focuses heavily on local neighborhoods.
UMAP also works from local neighborhoods, aiming to keep more of the broader layout intact.
Autoencoders learn their own compressed representation from data.
Different algorithms.
Different mathematics.
Different outputs.
But they all seemed to revolve around the same challenge:
How do you transform information into a form that still captures the important patterns?
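One way to see that "important" depends on the criterion is to score two one-dimensional views of the same data in two different ways. The sketch below uses made-up 2-D data and two hypothetical scoring functions, `variance_retained` and `class_separation`. Keeping the x axis is roughly what a variance-based method like PCA would favor here, while keeping the y axis is roughly what a label-aware method like LDA would favor. It is a toy comparison, not an implementation of either algorithm.

```python
import random
import statistics

random.seed(0)

# Synthetic 2-D data (an assumption for illustration): two classes that
# differ only slightly along y but are spread widely along x.
points, labels = [], []
for label, y_center in ((0, -1.0), (1, 1.0)):
    for _ in range(200):
        points.append((random.gauss(0.0, 5.0), random.gauss(y_center, 0.2)))
        labels.append(label)

def variance_retained(values, points):
    """Share of the total variance kept by projecting onto a single axis."""
    total = sum(statistics.pvariance(axis) for axis in zip(*points))
    return statistics.pvariance(values) / total

def class_separation(values, labels):
    """Fisher-style ratio: squared gap between class means over within-class variance."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    gap = statistics.mean(a) - statistics.mean(b)
    return gap**2 / (statistics.pvariance(a) + statistics.pvariance(b))

keep_x = [p[0] for p in points]  # the high-variance axis
keep_y = [p[1] for p in points]  # the class-separating axis

scores = {
    name: (variance_retained(vals, points), class_separation(vals, labels))
    for name, vals in (("keep_x", keep_x), ("keep_y", keep_y))
}
for name, (var_share, separation) in scores.items():
    print(f"{name}: variance retained={var_share:.2f}, class separation={separation:.2f}")
```

Each view wins on one score and loses on the other, so neither is "better" until you say what you want preserved.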
Once I started thinking that way, I began noticing the same idea outside dimensionality reduction.
Why This Matters Beyond Classical Machine Learning
When you're learning machine learning, topics tend to get filed into separate mental boxes.
Classical ML.
Deep Learning.
LLMs.
Recommendation Systems.
Computer Vision.
NLP.
They can feel like completely different worlds.
But sometimes the same ideas appear in all of them.
Take image classification.
A neural network doesn't treat an image the same way at every layer.
Early layers respond to simple patterns.
Edges.
Textures.
Basic shapes.
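As a rough illustration of what "responding to an edge" means, the sketch below slides a hand-written Sobel-style kernel over a tiny made-up image. This is an assumption-laden stand-in: real networks learn their filters from data, and the helper `filter2d` is a hypothetical name, not a library function.

```python
# Tiny 5x6 grayscale "image" (made up): dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (Sobel-style). A trained network would
# learn its own filter weights instead of using fixed ones like these.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def filter2d(img, k):
    """Valid-mode cross-correlation, the sliding-window operation conv layers compute."""
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + a][j + b] * k[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

response = filter2d(image, kernel)
for row in response:
    print(row)  # [0, 4, 4, 0] on every row
```

The output is zero over the flat regions and non-zero only where dark meets bright, so the filtered image is already a different representation of the same pixels.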
Deeper layers work with increasingly abstract representations.
By the time the model makes a prediction, it is operating on something very different from the original pixels.
The representation has changed multiple times.
The model has transformed the data into a form that makes the task easier.
That sounded surprisingly familiar.
Then I Started Looking at LLMs
The same thing happened when I started learning more about embeddings.
Consider the words "car" and "vehicle."
They are different words, yet most embedding models place them relatively close together in vector space.
The model isn't storing dictionary definitions.
It is learning a representation that captures part of the relationship between those words.
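"Close together in vector space" is usually measured with cosine similarity. The sketch below uses three hand-made toy vectors, which are assumptions for illustration and not the output of any real embedding model. The extra word "banana" is added only to provide an unrelated point of comparison.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made toy vectors, NOT taken from a real embedding model.
toy_embeddings = {
    "car":     [0.9, 0.8, 0.1],
    "vehicle": [0.8, 0.9, 0.2],
    "banana":  [0.1, 0.2, 0.9],
}

sim_car_vehicle = cosine_similarity(toy_embeddings["car"], toy_embeddings["vehicle"])
sim_car_banana = cosine_similarity(toy_embeddings["car"], toy_embeddings["banana"])

print(f"car vs vehicle: {sim_car_vehicle:.2f}")
print(f"car vs banana:  {sim_car_banana:.2f}")
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way: related words point in similar directions.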
The exact mechanism is very different from PCA.
The mathematics is different.
The scale is different.
But the underlying idea felt familiar.
Once again, information was being transformed into a representation that preserved what mattered for the task.
That was the connection I hadn't expected when I started learning dimensionality reduction.
A Better Mental Model
Many beginners think about machine learning like this:
Data → Model → Prediction
After working through these dimensionality reduction techniques, I think a more useful mental model is:
Data → Representation → Model → Prediction
The extra step matters because the way information is represented often determines what patterns a model can learn.
Two systems can work with the same underlying data and arrive at different outcomes simply because they represent that data differently.
That's true for PCA.
It's true for Autoencoders.
It's true for embeddings.
And it's true for many modern AI systems.
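The claim that the same data can lead to different outcomes under different representations can be sketched with a toy experiment. Everything here is an assumption made for illustration: the two-ring dataset is synthetic, and the "model" is just the best single cut-off on one feature, scored on the same points it was fitted to.

```python
import math
import random

random.seed(0)

def make_rings(n_per_class=200):
    """Two concentric rings: class 0 near radius 1, class 1 near radius 3."""
    data = []
    for label, radius in ((0, 1.0), (1, 3.0)):
        for _ in range(n_per_class):
            angle = random.uniform(0.0, 2.0 * math.pi)
            r = radius + random.gauss(0.0, 0.1)
            data.append(((r * math.cos(angle), r * math.sin(angle)), label))
    return data

def best_threshold_accuracy(values, labels):
    """The 'model': the best single cut-off on one feature, in either direction."""
    best = 0.0
    for t in sorted(set(values)):
        correct = sum((v > t) == bool(lab) for v, lab in zip(values, labels))
        acc = correct / len(labels)
        best = max(best, acc, 1.0 - acc)
    return best

data = make_rings()
labels = [lab for _, lab in data]

raw_x = [x for (x, _), _ in data]                   # representation 1: a raw coordinate
radius = [math.hypot(x, y) for (x, y), _ in data]   # representation 2: distance from the origin

acc_raw = best_threshold_accuracy(raw_x, labels)
acc_radius = best_threshold_accuracy(radius, labels)
print(f"raw x: {acc_raw:.2f}, radius: {acc_radius:.2f}")
```

The model and the data never change. Only the representation does, and that alone moves the same simple classifier from mediocre to near-perfect on this toy problem.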
Final Thoughts
I started learning dimensionality reduction because I wanted to understand PCA, t-SNE, UMAP, and a few other algorithms.
What I didn't expect was that those techniques would change how I think about machine learning itself.
The biggest lesson wasn't about reducing dimensions.
It wasn't about preprocessing.
And it wasn't about improving model accuracy.
It was realizing that many AI systems, regardless of how different they appear on the surface, spend a significant amount of effort answering the same question:
How should information be represented so that useful patterns become easier to discover?
For me, dimensionality reduction was the first place where that idea became visible.
And once I noticed it, I started seeing it everywhere.