
Raymond

UNSUPERVISED ML

What is Unsupervised ML?

Unsupervised ML is a branch of ML that operates without the luxury of labeled data, unlike supervised ML, which makes it a powerful tool for discovery. I'd say the main principle behind unsupervised learning is pattern recognition. The algorithm must identify hidden structures, group similar data points, reduce dimensionality, or detect anomalies based only on the characteristics of the data itself. Among the various techniques within unsupervised ML, clustering is perhaps the most intuitive and widely applicable approach to understanding data.

How does it work?

Unsupervised ML achieves its tasks through a few core techniques. **Clustering** is the most common: it groups data points that are similar to each other, such as when a company sorts its customers into segments based on their buying habits without anyone telling the algorithm what a "frequent shopper" looks like. **Association** methods find rules that show relationships between different items; market basket analysis is a well-known example that might discover that people who buy product A are also likely to buy product B. **Dimensionality reduction** techniques reduce the number of features or variables in a dataset, which makes complex data easier to visualize and work with without losing the important information.
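To make dimensionality reduction concrete, here is a minimal sketch using scikit-learn's PCA. The dataset is made up for illustration: 10 correlated features that secretly depend on only 2 underlying factors, so 2 components can capture most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
# Hypothetical dataset: 200 samples, 10 features built from 2 hidden factors
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 10)) + rng.normal(scale=0.05, size=(200, 10))

# Compress 10 features down to 2 while keeping most of the variance
pca = PCA(n_components=2)
X_small = pca.fit_transform(X)

print(X_small.shape)                          # 200 samples, now only 2 features
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained
```

Because the data was constructed from 2 hidden factors, the two components retain nearly all the variance; with real data you would inspect `explained_variance_ratio_` to decide how many components to keep.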

Common Models and Examples

K-Means Clustering
K-means is the most popular clustering method because it is simple and works well. It splits data into k groups by finding the best center point for each group and assigning every data point to its nearest center. The algorithm keeps moving these centers and reassigning points until the arrangement stops changing. K-means is fast, but it works best with roughly round-shaped groups, and you need to decide how many groups you want beforehand.
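A minimal sketch of the steps above with scikit-learn's `KMeans`, on made-up data: two well-separated blobs of 2-D points, with k chosen as 2 up front (in practice choosing k is the hard part).

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data for illustration: two well-separated blobs of 2-D points
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# k must be decided beforehand; here we "know" there are 2 groups
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_            # cluster index assigned to each point
centers = kmeans.cluster_centers_  # the learned center points
```

On data this clean, all points in each blob end up with the same label and the two centers land near (0, 0) and (5, 5); with messier, non-round clusters the results degrade.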

Hierarchical Clustering
This method creates a family tree of groups, working either from bottom-up or top-down. The bottom-up approach starts by treating each data point as its own group, then combines the closest groups together step by step. The top-down approach does the opposite - it starts with all data in one big group and keeps splitting it into smaller groups. This creates a tree diagram that shows how groups relate to each other at different levels.
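The bottom-up approach can be sketched with SciPy, which records every merge so the tree diagram (dendrogram) can be drawn or cut at any level. The data here is illustrative: two small blobs of points.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data for illustration: two compact blobs of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(20, 2)),
    rng.normal([4, 4], 0.3, size=(20, 2)),
])

# Bottom-up merging: start with every point as its own group, then
# repeatedly combine the closest groups (Ward's criterion here).
# Z records each merge and is what a dendrogram plot is drawn from.
Z = linkage(X, method="ward")

# Cut the tree at the level that gives exactly 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
```

Unlike K-means, you don't have to commit to a number of groups before running the algorithm; you can build the full tree once and then cut it at whatever level makes sense.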

Gaussian Mixture Models (GMM)
GMM assumes that the data comes from a mix of bell-shaped (Gaussian) distributions and uses the Expectation-Maximization (EM) algorithm to figure out the best parameters for each one. Unlike clustering methods that put each data point in exactly one group, GMM gives each point a probability of belonging to every group. This "soft" grouping approach is really helpful when it's unclear which group a data point should belong to, since a point can partly belong to multiple groups at once.
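A short sketch of that soft assignment with scikit-learn's `GaussianMixture`, on made-up 1-D data drawn from two overlapping bell curves. `predict_proba` returns, for each point, its probability of belonging to each component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data for illustration: samples from two bell curves centered at 0 and 6
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0, 1, size=(100, 1)),
    rng.normal(6, 1, size=(100, 1)),
])

# Fit a 2-component mixture via Expectation-Maximization
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# "Soft" grouping: one row per point, one probability per component,
# and each row sums to 1
probs = gmm.predict_proba(X)
```

Points near the overlap between the two curves get probabilities like 0.6/0.4 rather than a hard label, which is exactly the ambiguity a hard clustering method like K-means cannot express.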

My two cents (opinions)

As someone who's just started exploring clustering, I'm genuinely amazed by what these algorithms can do. What strikes me most is how clustering reveals hidden connections. I've noticed that the groups it creates often make perfect sense once you see them, but would have been nearly impossible to identify manually, especially with lots of data points. I've learned that choosing the right number of clusters is trickier than it first appears. Sometimes the "best" mathematical answer doesn't match what makes sense in the real world, and that's taught me that these tools need human judgment too. The results aren't always perfect, but they consistently give me new ways to think about my data.
