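We all know what clustering is, so here is a quick revision.
Clustering: It is the task of dividing the population or data points into several groups such that data points in the same group are more similar to each other than to data points in other groups.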
There are two types of clustering: 1. Hard Clustering 2. Soft Clustering.
There are also four broad types of clustering algorithms, which are:
1) Connectivity Models: As the name suggests, these models are based on the notion that data points closer together in data space are more similar to each other than data points lying farther away.
There are two approaches in this type:
I. Agglomerative (bottom-up): start with every data point in its own cluster, then repeatedly merge the closest clusters.
II. Divisive (top-down): start with all data points in a single cluster, then repeatedly split it into smaller clusters.
Ex: Hierarchical clustering, etc.
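To make approach I concrete, here is a minimal sketch of bottom-up merging in plain Python. The function name `agglomerative`, the choice of single linkage (distance between the two nearest members) and the toy points are assumptions made for illustration, not a reference implementation.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Start with one cluster per point and merge the closest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (an assumption): distance between the nearest members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
result = agglomerative(points, 2)
```

With these toy points, the three points near the origin end up in one cluster and the two points near (10, 10) in the other. This brute-force version rescans every pair on each merge, so it is only meant for small inputs.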
2) Centroid Models: These are iterative clustering algorithms in which similarity is defined by how close a data point is to the centroid of a cluster.
Ex: K-Means Algorithm, etc.
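The iterative idea can be sketched as follows: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The function name `k_means`, the hand-picked starting centroids and the toy data are assumptions for illustration; real implementations usually choose the starting centroids more carefully.

```python
from math import dist

def k_means(points, centroids, n_iter=100):
    """Alternate between assigning points to centroids and updating the centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid as is
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
```

Here `labels` holds one cluster index per point and `centroids` holds the final cluster centres.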
3) Distribution Models: These clustering models are based on how probable it is that all the data points in a cluster belong to the same distribution (for example, a Gaussian). Models of this type often suffer from overfitting.
Ex: Expectation-Maximization Algorithm, etc.
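As a rough illustration, the sketch below runs Expectation-Maximization on a one-dimensional mixture of Gaussians. The function names `gaussian_pdf` and `em_gmm_1d`, the starting means, the fixed iteration count and the variance floor are all assumptions chosen for this example. The floor is there because a component that collapses onto a single point is one well-known way such models overfit.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            scores = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor (an assumption) to avoid collapse
    return weights, means, variances, resp

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
weights, means, variances, resp = em_gmm_1d(data, means=[0.0, 6.0])
```

Each row of `resp` is a set of membership probabilities that sums to 1, so this is also an example of soft clustering.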
4) Density Models: These models search the data space for regions of differing density and group the data points within a dense region into the same cluster.
Ex: DBSCAN & OPTICS, etc.
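A simplified DBSCAN-style sketch shows the idea: a point with enough neighbours within a radius starts a cluster, the cluster grows through other dense points, and isolated points are marked as noise. The parameter names `eps` and `min_pts`, the `NOISE` marker and the toy data are assumptions for illustration, and the brute-force neighbour search is only suitable for small inputs.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE if it lies in a sparse region."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # dense point: keep growing the cluster
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike the centroid sketch, the number of clusters is not given in advance; it falls out of the density parameters.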
A short explanation of hard and soft clustering:
Hard Clustering: In this, each data point either belongs to a cluster completely or not at all.
Soft Clustering: In this, instead of putting each data point into exactly one cluster, a probability or likelihood of that data point belonging to each cluster is assigned.
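The difference can be seen in a small sketch. Here the soft membership is computed with a softmax over negative squared distances to two fixed centroids; that scoring rule, the function names `soft_assign` and `hard_assign`, and the centroids are assumptions picked for illustration, not the only way to define soft membership.

```python
from math import dist, exp

def soft_assign(point, centroids):
    """Return one membership probability per cluster (they sum to 1)."""
    d2 = [dist(point, c) ** 2 for c in centroids]
    m = min(d2)  # shift by the smallest distance so exp() cannot underflow to all zeros
    scores = [exp(-(d - m)) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]

def hard_assign(point, centroids):
    """Return the index of the single most likely cluster."""
    probs = soft_assign(point, centroids)
    return probs.index(max(probs))

centroids = [(0.0, 0.0), (4.0, 0.0)]
near_first = soft_assign((0.5, 0.0), centroids)   # almost all weight on cluster 0
in_between = soft_assign((2.0, 0.0), centroids)   # weight split evenly
hard_label = hard_assign((3.5, 0.0), centroids)   # a single cluster index
```

A point halfway between the two centroids gets equal membership in both under soft clustering, while hard clustering still has to pick exactly one cluster for it.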