PCA vs t-SNE vs UMAP: Real Performance on 10K Samples

#dimensionalityreduct #pca #tsne #umap

The 10-Minute t-SNE That Killed My Demo

I was prepping a cluster visualization for a client review when t-SNE decided to run for 10 minutes on a 10,000-sample dataset. The Jupyter cell just sat there, kernel busy, while I frantically Googled "t-SNE slow fix". By the time it finished, the meeting had moved on.

That's when I learned the hard truth: dimensionality reduction isn't one-size-fits-all. PCA finishes in under a second. UMAP takes 15 seconds and gives you something that looks like t-SNE. t-SNE? Still chugging along at 600 seconds.

Here's what actually matters when you're staring at 50 dimensions and need to get to 2.

What Each Algorithm Actually Does (No, Not the Math First)

PCA projects your data onto orthogonal axes that capture maximum variance. It's linear, which means it fundamentally can't untangle spirals or clusters that wrap around each other. The upside? It runs in $O(nd^2 + d^3)$ time, where $n$ is the number of samples and $d$ is the number of dimensions. With $d = 50$, that cost grows only linearly in $n$, which is why 10,000 samples finish in under a second.
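To make the two terms in that complexity concrete, here is a minimal NumPy sketch of covariance-based PCA. It is an illustration, not the benchmark code from this post: the function name `pca_project`, the synthetic 10,000 x 50 matrix, and the stretched first feature are all assumptions made for the example.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project X (n samples x d features) onto its top principal axes."""
    # Center each feature so variance is measured about the mean.
    X_centered = X - X.mean(axis=0)
    # Building the d x d covariance matrix is the O(n d^2) step.
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    # Eigendecomposition of a symmetric d x d matrix is the O(d^3) step.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # The projection is a plain matrix product: a linear map, nothing more.
    return X_centered @ components, eigvals[order]


# Synthetic stand-in for a 10,000-sample, 50-dimension dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
X[:, 0] *= 5.0  # stretch one feature so there is a dominant direction

embedding, variances = pca_project(X)
print(embedding.shape)  # (10000, 2)
```

Because the final step is a single matrix multiplication, any structure that is not linearly separable in the original space, such as a spiral, stays tangled in the 2D output. In practice you would likely reach for a library implementation, which typically uses an SVD instead of forming the covariance matrix explicitly.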
