
Oluwafemi Paul Adeyemi

Unsupervised Learning

Unsupervised Learning involves a set of algorithms that are used to learn patterns from data without targets (labels). Unsupervised learning does not require that each data point in the dataset be labelled. This is contrary to supervised learning, which requires that each data point have a label, so that the dataset consists of both features and targets.
Basically, in unsupervised learning the features are not labelled: there is no target to be predicted, and the interest is in finding patterns in the features. Fundamentally, unsupervised learning involves clustering or dimensionality reduction. Dridi [1] discussed four types of task that can be carried out with unsupervised learning: clustering, association, anomaly detection and autoencoders. On examination, however, principal component analysis (PCA) and autoencoders are just two approaches to dimensionality reduction, which is itself a task that can be carried out using unsupervised learning:

Clustering

The interest is to put unlabelled data into categories called clusters, such that objects (data points) within a cluster are more similar to one another than to objects in other clusters. A cluster is thus a collection of objects (data points) with similar characteristics, and the objects in one collection are dissimilar to the objects in other collections. The types of clustering are: partition, hierarchical, overlapping (fuzzy) and probabilistic clustering.
a. Partition: an object can belong to one and only one cluster.
b. Hierarchical: starts with every data point as a separate cluster (so if the dataset has 1000 data points, there are initially 1000 clusters) and then iteratively merges points that are close together, ending with fewer clusters once the required number of clusters is reached.
c. Overlapping: each data point can belong to two or more clusters, with some degree of membership in each cluster.
d. Probabilistic: this method uses probability to assign data points to clusters (a short sketch of the hierarchical and probabilistic variants follows this list).
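
To make the hierarchical and probabilistic variants concrete, here is a minimal sketch using scikit-learn [2]. The toy data, the choice of AgglomerativeClustering and GaussianMixture, and all parameter values are my own illustration rather than anything prescribed above.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture

# Toy 2-D data: two loose blobs (invented values).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(50, 2)),
])

# Hierarchical (agglomerative): every point starts as its own cluster
# and the closest clusters are merged until n_clusters remain.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Probabilistic: a Gaussian mixture gives each point a probability of
# belonging to each cluster instead of a hard label.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
memberships = gmm.predict_proba(X)  # soft degrees of membership

print(hier_labels[:5])
print(memberships[:2])
```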

Types of partitioning clustering algorithms

  1. K-means: this algorithm divides a group of data points into non-overlapping clusters using centroids, under the assumptions that the clusters are of equal size, that the joint distribution of the features has equal variance, and that the features are independent with similar cluster density [2]. Basically, K centroids are chosen, and points that are close to each centroid are put in a cluster with that centroid. Closeness is determined by some measure of distance between points, such as the Euclidean distance. Then the two steps below follow: (a) a new centroid is obtained for each cluster by finding the average of the points in the cluster; (b) using the distance metric again, points that are close to a new centroid are put in the same cluster with it [2].
    (a) and (b) continue until the difference between the centroids from one iteration and the next is lower than a threshold value (see the code sketch after this list).

    For small K, computation is usually fast, but as K increases, the computation becomes more complex and slower. This algorithm is probably the most popular clustering algorithm in unsupervised learning. There are other variants of this algorithm, like mini-batch K-means [2].

  2. Mini-Batch K-Means: this is a variant of K-means which uses subsets of the input data, randomly sampled at every iteration; these subsets are called mini-batches. This technique saves computation time and may give only slightly worse results than traditional K-means [2].

  3. Bisecting K-Means: this algorithm starts with a single centroid, which implies a single cluster for the whole dataset, and then repeatedly splits a cluster into two until the desired number of clusters is reached [2].

  4. Mean Shift: here, instead of taking all the centroids at once as in K-means, a single centroid is taken, and only the points within a circle of radius r around it are used to compute a new centroid; as the centroid shifts, the circle shifts with it, until it converges, which is the birth of the first cluster. Then an entirely new centroid is chosen and undergoes the same process by which the first cluster was obtained, and the process continues until the desired number of clusters is reached [2].
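
Here is a minimal sketch of the four partitioning algorithms above, using scikit-learn [2]. The toy data and all parameter values are invented for illustration, and BisectingKMeans assumes scikit-learn >= 1.1.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans, BisectingKMeans, MeanShift

# Toy 2-D data: three blobs (invented values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(100, 2)) for c in (0.0, 3.0, 6.0)])

# K-means: iterate (a) recompute centroids, (b) reassign points,
# until the centroids move less than a threshold (tol).
km = KMeans(n_clusters=3, n_init=10, tol=1e-4, random_state=0).fit(X)

# Mini-Batch K-Means: same idea, but each iteration uses a random
# mini-batch of the data, trading a little accuracy for speed.
mbkm = MiniBatchKMeans(n_clusters=3, batch_size=64, random_state=0).fit(X)

# Bisecting K-Means: starts from one cluster and repeatedly splits
# a cluster in two until n_clusters is reached.
bkm = BisectingKMeans(n_clusters=3, random_state=0).fit(X)

# Mean Shift: shifts a candidate centroid towards denser regions of the
# data within a window (bandwidth plays the role of the radius r above).
ms = MeanShift(bandwidth=1.0).fit(X)

print(km.cluster_centers_)
print(ms.cluster_centers_)
```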

Association

The interest here is to find a defining rule which represents a relationship, e.g. if A is connected to B and B is connected to C, then A is connected to C; or customers who purchase item A also purchase item B. Association includes the following applications (a small market-basket sketch follows the list):
1. Market basket analysis
2. Customer clustering
3. Price bundling
4. Assortment decisions
5. Cross-selling, etc. [1]
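
As a tiny illustration of the "customers who purchase A also purchase B" idea, here is a minimal market-basket sketch in plain Python. The baskets and thresholds are invented for illustration; a real analysis would use a dedicated association-rule algorithm such as Apriori.

```python
from itertools import combinations
from collections import Counter

# Invented toy transactions (illustrative only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

n = len(baskets)
item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

# Support: fraction of baskets containing the pair.
# Confidence of A -> B: fraction of baskets with A that also contain B.
for (a, b), count in pair_counts.items():
    support = count / n
    confidence = count / item_counts[a]
    if support >= 0.4:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```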

Anomaly detection

This is concerned with detecting outliers. It is useful for detecting hacked networks (through anomalous network traffic), military surveillance, data cleaning and crime detection [1].
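
The post does not name a particular algorithm, so as one concrete possibility, here is a minimal outlier-detection sketch using scikit-learn's IsolationForest; the data and parameters are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly "normal" points plus a few injected outliers (invented toy
# data, standing in for e.g. network-traffic features).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=6.0, high=9.0, size=(5, 2))
X = np.vstack([normal, outliers])

# IsolationForest isolates points with random splits; points that are
# easy to isolate get label -1 (outlier), the rest get +1 (inlier).
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X)
print("flagged as outliers:", np.where(labels == -1)[0])
```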

Dimensionality reduction

This can be carried out using PCA or autoencoders. Here, the data is reduced to a lower dimension [1].

Principal Component Analysis: Principal Component Analysis (PCA) is a data-reduction technique used when there is multicollinearity between the features. The features are reduced into components using eigenvectors, and the corresponding eigenvalues are used to determine the variation in the data. The sum of the eigenvalues equals the trace of the variance-covariance matrix, which is the total variation in the data; hence the variation explained by a component = (eigenvalue of the component) / (sum of the eigenvalues). PCA can be used for facial recognition, image compression, movie recommendation systems, optimizing power allocation in various communication channels, etc.
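
Here is a minimal PCA sketch with scikit-learn [2]; explained_variance_ratio_ is exactly the eigenvalue-over-sum-of-eigenvalues proportion described above. The data and the number of components are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data with deliberate multicollinearity: the third feature is
# (almost) a linear combination of the first two (invented values).
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, 2 * a + 3 * b + rng.normal(scale=0.05, size=100)])

pca = PCA(n_components=2).fit(X)

# Each entry is one component's eigenvalue divided by the sum of all
# eigenvalues, i.e. the share of total variation that it explains.
print(pca.explained_variance_ratio_)

X_reduced = pca.transform(X)  # the data in the lower-dimensional space
print(X_reduced.shape)        # (100, 2)
```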

References

  1. Dridi, S. (2022, April 4). Unsupervised Learning - A Systematic Literature Review. https://doi.org/10.31219/osf.io/kpqr6
  2. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning (pp. 108–122). (See also the scikit-learn User Guide.)
