Unsupervised Machine Learning

#data #datascience #machinelearning #ai

Machine learning is one of the core concepts of data science, and it forms the foundation for much of modern AI.

Have you ever wondered how recommendation systems, such as the one behind YouTube, work? How are they able to recommend just the right content? A big part of the answer is unsupervised machine learning.

Unsupervised Machine Learning

It is a type of machine learning where the model is fed raw data without any labels, so nobody tells it in advance what the "right answer" for each data point is.

The model then learns to make sense of the data by discovering patterns and relationships on its own.

How does it work?

To learn these patterns and relationships, unsupervised machine learning relies heavily on mathematical concepts, such as measuring the distance between data points.
Data points with similar features are then grouped together, with each set of similar points forming its own group.
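
To make "similar features" concrete, here is a minimal sketch of two common distance measures between data points. The viewers and their two features are made up purely for illustration; a smaller distance means the points are more alike.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences along each feature."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical viewers: (hours watched per week, videos liked per week).
viewer_a = (2.0, 3.0)
viewer_b = (5.0, 7.0)
viewer_c = (40.0, 60.0)

print(euclidean(viewer_a, viewer_b))  # 5.0
print(manhattan(viewer_a, viewer_b))  # 7.0

# viewer_a is far closer to viewer_b than to viewer_c,
# so a and b are the more likely pair to share a group.
print(euclidean(viewer_a, viewer_b) < euclidean(viewer_a, viewer_c))  # True
```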

Models of Unsupervised Machine Learning

Two common types of unsupervised machine learning are covered here: clustering and dimensionality reduction.

i. Clustering

Just as the name suggests, clustering involves grouping data into different clusters such that data points in the same cluster have very similar features while data points in different clusters are very different.

There are different algorithms used in clustering:

K-means

In K-means clustering, the user defines the number of desired clusters (K) that the algorithm is supposed to form.

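Distance metrics such as the Euclidean distance and Manhattan distance come into play here. The algorithm measures the distance from each data point to every centroid (the center of a cluster) and assigns the point to the centroid it is closest to. Each centroid is then recomputed from the points assigned to it, and the two steps repeat until the assignments stop changing. Standard K-means uses the Euclidean distance; swapping in the Manhattan distance gives a closely related variant usually called K-medians.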
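
The loop can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: it assumes Euclidean distance, uses the first K points as starting centroids (real libraries usually pick them randomly or more carefully), and runs on a tiny made-up dataset.

```python
import math

def kmeans(points, k, iterations=100):
    """Minimal K-means sketch using Euclidean distance."""
    # Assumption: the first k points are used as the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up 2D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Notice that the second point starts out as its own centroid but ends up in the first cluster once the centroids have moved, which is exactly the kind of correction the repeated steps are for.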

Hierarchical clustering

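It involves forming a hierarchy of data points, thus creating a tree of clusters. There are two types of hierarchical clustering: agglomerative, which is a bottom-up approach, and divisive, which is a top-down approach. Agglomerative clustering starts with every data point in its own cluster and repeatedly merges the two closest clusters, while divisive clustering starts with one big cluster and repeatedly splits it.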
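
Here is a rough sketch of the agglomerative (bottom-up) approach in plain Python. It assumes "single linkage", meaning the distance between two clusters is the distance between their two closest members; other linkage rules exist and can give different trees. The data is made up, and the merging stops once a chosen number of clusters is left instead of building the full tree.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering sketch with single linkage (an assumption)."""
    # Start with every point in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters...
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        # ...and merge them, which moves one level up the tree.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up 2D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

clusters = agglomerative(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```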

ii. Dimensionality Reduction

At times, we encounter datasets that have a very large number of features, many of which add no meaningful value to the data we are trying to make sense of.

In such a case, we use dimensionality reduction, which works by reducing the number of variables while preserving key information. The model filters out the noise and gets rid of the unnecessary dimensions.
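
One widely used dimensionality reduction technique, which this article does not cover by name, is principal component analysis (PCA). The sketch below is a simplified, plain-Python take on it: it finds the single direction along which the data varies the most (using a method called power iteration) and replaces each row's three features with one number along that direction. The dataset is made up so that the second feature is roughly double the first and the third is just noise.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the feature means and the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Covariance matrix of the centered features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly applying cov pulls the vector toward
    # the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Replace each row with a single coordinate along the component."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component))
            for row in rows]

# Made-up data: feature 2 is about twice feature 1, feature 3 is noise.
rows = [(1.0, 2.0, 0.1), (2.0, 4.1, 0.0), (3.0, 6.0, 0.1),
        (4.0, 7.9, 0.0), (5.0, 10.0, 0.1)]

means, component = first_principal_component(rows)
reduced = project(rows, means, component)

print([round(c, 2) for c in component])  # noise feature gets a weight near 0
print([round(x, 2) for x in reduced])    # one number per row instead of three
```

The noisy third feature ends up with almost no weight, so dropping from three dimensions to one loses very little of the pattern in this particular dataset.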